Knowledge base search API relevance ranking - keyword matching vs semantic search

We’re experiencing poor search relevance in our knowledge base API and evaluating whether to enhance traditional keyword matching or move to semantic search with embeddings.

Our current implementation uses basic keyword matching with some TF-IDF weighting, but users complain they can’t find relevant articles even when they search with exact terminology from the content. For example, searching “refund processing time” ranks articles about payment methods above our actual refund policy article.

I’ve been researching BM25 keyword ranking algorithms and semantic search using embeddings. BM25 seems like a natural evolution of our current approach, while embeddings promise better conceptual matching but require significant infrastructure for indexing and vector similarity search.
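For concreteness, BM25's scoring formula is compact enough to sketch in pure Python (this is a minimal sketch with naive whitespace tokenization; k1 and b are the standard defaults):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against query_terms with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency for each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "refund policy refund processing time details".split(),
    "payment methods credit card processing".split(),
]
print(bm25_scores("refund processing time".split(), docs))
```

Unlike raw TF-IDF, the k1 term saturates repeated keywords and the b term penalizes long documents, which already helps with the "payment methods ranked above refund policy" failure mode.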

Has anyone implemented hybrid ranking strategies that combine both approaches? Also curious about A/B testing methodology for measuring search relevance improvements - what metrics actually correlate with user satisfaction?

A/B testing methodology is critical for validating improvements. We ran a 50/50 traffic split for 4 weeks comparing old vs. new search. Primary metrics: click-through rate on the top 3 results, zero-result query rate, search refinement rate (users modifying their query), and time to article view. We also tracked session-level metrics like successful case resolution without escalation. The semantic search variant reduced zero-result queries by 42% and search refinements by 35%, both strong signals that relevance improved significantly.
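Roughly how we compute the per-query metrics from search event logs, as a sketch; the event fields (results, clicked_rank, refined) are illustrative, not our real schema:

```python
def search_metrics(events):
    """Aggregate per-query search events into relevance metrics."""
    n = len(events)
    # CTR@3: fraction of searches with a click on one of the top 3 results
    ctr_top3 = sum(1 for e in events
                   if e["clicked_rank"] is not None and e["clicked_rank"] <= 3) / n
    # Fraction of searches returning no results at all
    zero_result_rate = sum(1 for e in events if e["results"] == 0) / n
    # Fraction of searches where the user reformulated the query afterwards
    refinement_rate = sum(1 for e in events if e["refined"]) / n
    return {"ctr@3": ctr_top3,
            "zero_result_rate": zero_result_rate,
            "refinement_rate": refinement_rate}

events = [
    {"results": 10, "clicked_rank": 1, "refined": False},
    {"results": 0,  "clicked_rank": None, "refined": True},
    {"results": 5,  "clicked_rank": 4, "refined": True},
    {"results": 8,  "clicked_rank": 2, "refined": False},
]
print(search_metrics(events))  # {'ctr@3': 0.5, 'zero_result_rate': 0.25, 'refinement_rate': 0.5}
```

Compare these per-variant and run a significance test before declaring a winner; refinement rate in particular is noisy on low-traffic query segments.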

Embedding indexing was our biggest challenge. With 50K knowledge articles, the initial embedding generation took 6 hours. We optimized by batching articles (32 per batch) and running parallel workers; a full reindex now completes in 45 minutes. For incremental updates, we generate embeddings on article publish/update events and upsert them into our vector store (we use Pinecone). Query-time embedding generation is fast, 25-35ms for typical search queries. The infrastructure investment is real, but the relevance improvement justifies it.
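A minimal sketch of the batch-and-parallelize pipeline, under stated assumptions: embed_batch is a stand-in for your embedding model call, and the upsert callable stands in for the vector store client (neither is a real Pinecone API):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32  # matches the batch size described above

def chunked(items, size):
    """Yield fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_batch(articles):
    # Placeholder: call your embedding model/API here for the whole batch.
    return [[0.0, 0.0, 0.0] for _ in articles]

def index_articles(articles, upsert, workers=4):
    """Embed articles in parallel batches, then upsert (id, vector) pairs."""
    batches = list(chunked(articles, BATCH_SIZE))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order, so ids and vectors stay aligned
        for batch, vectors in zip(batches, pool.map(embed_batch, batches)):
            upsert([(a["id"], v) for a, v in zip(batch, vectors)])

store = []
articles = [{"id": f"kb-{i}", "text": "..."} for i in range(100)]
index_articles(articles, store.extend)
print(len(store))  # one vector per article
```

Threads work here because the time is spent waiting on the embedding API; if you embed locally on CPU, use processes instead. The same index_articles path handles both the full reindex and the publish/update event hook.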

Implementation tip for hybrid ranking: use a weighted scoring approach so you can tune the balance between keyword and semantic signals. We use a 0.6 weight for the BM25 score and 0.4 for embedding similarity, but the split varies by query characteristics: short queries (1-2 words) weight BM25 higher (0.75/0.25), while longer natural-language queries weight semantic similarity higher (0.35/0.65). This adaptive weighting improved relevance a further 8% beyond static weighting. Query classification happens at search time using simple heuristics: query length, presence of boolean operators, and exact-phrase quotes.
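In sketch form, the adaptive weighting looks like this; the length thresholds and the keyword-heavy treatment of boolean/quoted queries are illustrative assumptions, and both scores should be normalized to comparable ranges (e.g. [0, 1]) before mixing:

```python
def hybrid_weights(query):
    """Choose (BM25 weight, semantic weight) from simple query heuristics."""
    terms = query.split()
    if '"' in query or any(op in terms for op in ("AND", "OR", "NOT")):
        return 0.75, 0.25   # exact phrases / boolean operators: favor keywords
    if len(terms) <= 2:
        return 0.75, 0.25   # short queries: favor BM25
    if len(terms) >= 5:
        return 0.35, 0.65   # longer natural-language queries: favor semantic
    return 0.6, 0.4         # default static split

def hybrid_score(bm25_norm, sem_norm, query):
    """Blend normalized BM25 and embedding-similarity scores per query."""
    w_kw, w_sem = hybrid_weights(query)
    return w_kw * bm25_norm + w_sem * sem_norm

print(hybrid_weights("refund"))                                # (0.75, 0.25)
print(hybrid_weights("how long does a refund take to process"))  # (0.35, 0.65)
```

Normalization matters more than the exact weights: raw BM25 scores are unbounded while cosine similarity is not, so min-max normalize per result set or the keyword signal will dominate regardless of the split.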