
Re-rankers maximize vector database retrieval results

By Jelani Harper

There’s a good reason almost every credible vector database, and nearly every enterprise application of this technology, incorporates re-ranking models, or re-rankers. It’s not just because these deep neural networks score a vector retrieval engine’s results so they’re more useful for search and Retrieval-Augmented Generation (RAG).

Quite simply, it’s because in most vector database use cases, no one has actually fine-tuned the embedding model on an organization’s specific, or proprietary, data. When the content is embedded, the model has no knowledge of the questions users will eventually ask, so the resulting embeddings aren’t tuned to those questions.

Re-ranking the results of vector-based retrieval systems with machine learning models, however, rectifies this issue. “With a re-ranker, the way it works, you have the query now and you have the document,” explained Gareth Jones, Pinecone Staff Product Manager. “You have the query and the document at once, so you can actually run a much more powerful comparison saying this document is specific to this query.”

That comparison produces two effects. First, it enables the model to score the vector search results. Second, those scores are used to reorder the results (and, in some cases, trim how many are returned) so that the most relevant appear at the top of the list.

Or, as Jones termed it, re-ranking models “produce a score for every query-document pair. We found that has a pretty significant boost in performance for that very reason. You have the query and the document and you can use both of these things together to make a prediction about how well the score is.”
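
In practice, that pairwise scoring is typically done with a cross-encoder. The following is a minimal sketch using the open-source sentence-transformers library and one of its public MS MARCO checkpoints; hosted re-ranking services (Pinecone’s and Cohere’s among them) follow the same pattern of scoring each query-document pair and sorting by the result.

```python
# Minimal re-ranking sketch: score every (query, document) pair with a
# cross-encoder, then sort the first-stage retrieval results by that score.
from sentence_transformers import CrossEncoder

# A public MS MARCO cross-encoder checkpoint; any cross-encoder works the same way.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, documents: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k documents ordered by cross-encoder relevance score."""
    # One score per query-document pair: the model sees both texts at once, which
    # is what makes the comparison more powerful than plain embedding similarity.
    scores = model.predict([(query, doc) for doc in documents])
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Example: candidates returned by a first-stage vector search.
candidates = [
    "Re-rankers score query-document pairs after retrieval.",
    "Vector databases store dense embeddings for similarity search.",
    "Our cafeteria menu changes every Tuesday.",
]
for doc, score in rerank("How do re-rankers improve retrieval?", candidates):
    print(f"{score:.3f}  {doc}")
```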

Better RAG context

There are multiple techniques for re-ranking search results; doing so with neural networks is just one of them, although it’s among the more popular methods currently used. Organizations that implement vector database retrieval systems without some form of re-ranking risk squandering their investment in this technology and burying the best results in a litany of less relevant ones. The consequences also include poorer RAG outputs for summarization and question answering.

Users who implement re-ranking models in their vector database applications, however, see an improvement in the overall contextual understanding. This is particularly true for RAG applications, in which re-rankers provide fewer, and more relevant, results for language models to consider when answering questions. “There’s a lot of evidence to show that too much, or wrong context, just confuses [language models], and it leads to more hallucinations,” Jones noted. By using re-rankers, organizations can reduce the number of results language models have to parse, as well as the number of tokens sent, which decreases cost.
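
The workflow that follows from this is to over-retrieve from the vector index and let the re-ranker decide what actually reaches the model. A sketch, assuming a hypothetical vector_search helper and a rerank function like the one sketched above:

```python
# Sketch of a RAG step where re-ranking trims the context before the LLM call.
# `vector_search` is a hypothetical stand-in for your retrieval engine; `rerank`
# is any function that scores and orders (query, document) pairs.

def build_rag_prompt(query: str, vector_search, rerank, top_k: int = 3) -> str:
    # Over-retrieve for recall, then keep only the passages the re-ranker scores
    # as most relevant: fewer, better passages means fewer tokens sent and less
    # chance of confusing the language model.
    candidates = vector_search(query, top_n=25)
    best = [doc for doc, _score in rerank(query, candidates, top_k=top_k)]
    context = "\n\n".join(best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```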

Re-ranking dense and sparse vectors

Re-rankers are routinely employed for sparse vectors, which underpin lexical, keyword-based search, and for dense vectors, which represent embedded content. They’re even applied in hybrid scenarios in which both vector types serve the same search. Traditionally, re-rankers were designed for specialized use cases; their growing adoption in vector-based AI retrieval systems is tied to advances in the models themselves. According to Jones, “Re-rankers that are general purpose for a wide variety of users and use cases is pretty recent. I think Cohere, frankly, really pioneered sort of offering re-rankers that could work on the vast majority of user’s use cases to do this dense-sparse [vector] combination in the past year and a half, I would say.”
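
As a rough illustration of that dense-sparse combination, a hybrid pipeline can simply pool candidates from both indexes and let one re-ranker order them. The function names below are placeholders for whichever engines and re-ranking model are in use, not any specific vendor API.

```python
# Sketch of hybrid retrieval: take candidates from a dense (embedding) index and
# a sparse (keyword/BM25) index, merge them, and let a single re-ranker order
# the combined pool. All three callables are illustrative stand-ins.

def hybrid_retrieve(query: str, dense_search, sparse_search, rerank, top_k: int = 5):
    dense_hits = dense_search(query, top_n=20)    # semantic-similarity candidates
    sparse_hits = sparse_search(query, top_n=20)  # exact-keyword candidates
    # Deduplicate while preserving order, then score everything against the query.
    pool = list(dict.fromkeys(dense_hits + sparse_hits))
    return rerank(query, pool, top_k=top_k)
```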

The endurance of keyword search in the age of vector databases shouldn’t be surprising; since keyword search and vector search both provide information retrieval, each is a manifestation of AI. Although the multifaceted nature of the content in dense vector embeddings (which can encompass audio, video, images, and text) is impressive, the sheer precision of lexical search is, in some ways, unsurpassable. Thus, vector database vendors offer it both on its own and as part of hybrid search, and re-rankers may be applied to either. “Having that ability to find a rare keyword, and recall those results, is quite powerful,” Jones remarked.

Lexical search

Lexical search excels in deployments in which organizations need results predicated on a specific keyword. Although dense vector models are helpful in a range of ways, they’re less useful when exact information is required. Dense models work by returning embedded content that sits close to an embedded question in vector space, not content that matches it exactly.

For example, “you have the name of a person, or a part number, or a telephone number,” Jones postulated. “A dense embedding model has a couple of issues. One, it just has no idea what that is. It doesn’t know if that’s a phone number or a part number. It wasn’t trained with that knowledge.” Thus, a dense model returns results that are merely similar to the part number, whereas lexical search returns only results containing the exact number.
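
A toy example makes the distinction plain; the documents and part number below are invented for illustration.

```python
# Toy illustration of why exact identifiers favor lexical search: a literal
# keyword match finds the precise part number, while an embedding model can
# only surface passages that are "similar" to it in vector space.

docs = [
    "Replacement filter, part number X-4471-B, ships in 3 days.",
    "Replacement filter, part number X-4471-C, discontinued.",
    "General guide to choosing air filters for HVAC systems.",
]

def lexical_match(query_term: str, documents: list[str]) -> list[str]:
    # Exact token containment: only documents holding the literal string qualify.
    return [d for d in documents if query_term in d]

print(lexical_match("X-4471-B", docs))
# Only the first document is returned; a dense model might also surface the
# near-identical X-4471-C passage because the two embeddings sit close together.
```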

A solid practice

There are numerous ways to implement RAG systems. There are also myriad intricacies to consider when utilizing vector databases for similarity search and other types of search. All of these paradigms, however, are tremendously improved by re-ranking results with neural networks. Moreover, as Jones indicated, “It’s easy to add re-ranking to an existing retrieval pipeline. It’s usually simple to add in, or swap out, different re-rankers so you don’t have to re-embed all the data.”   
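
That pluggability comes from how little of the pipeline a re-ranker touches: the embedding index is left alone, and only the post-retrieval scoring step changes. A sketch, with illustrative names rather than any specific vendor API:

```python
# Sketch of the re-ranker as a pluggable step: the embedding index is untouched,
# so swapping re-rankers only changes the scoring applied after retrieval.
from typing import Callable

Reranker = Callable[[str, list[str]], list[tuple[str, float]]]

def make_pipeline(vector_search, reranker: Reranker):
    def run(query: str, top_k: int = 5):
        candidates = vector_search(query, top_n=25)   # same index, same embeddings
        return reranker(query, candidates)[:top_k]    # only this step changes
    return run

# Swapping re-rankers is a one-line change; no data is re-embedded:
# pipeline_a = make_pipeline(my_index.search, cross_encoder_rerank)
# pipeline_b = make_pipeline(my_index.search, keyword_aware_rerank)
```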
