
Although numerous vendors gloss over this fact, there’s much more to reaping the enterprise benefits of generative AI than implementing a vector database. Organizations must also select a model for generating their vector embeddings; shrewd users will take the time to fine-tune or train that model.
Additionally, as part of creating those embeddings, it’s necessary to devise a chunking strategy to get results that are actually worthy of GenAI investments for question answering, Retrieval-Augmented Generation (RAG), and deployments of intelligent agents.
Chunking is the prerequisite step of determining how much content is included in a single embedding. For text, the options are nearly limitless: chunks can comprise entire pages, paragraphs, sentences, lines, specific sections of a document, or almost any other unit.
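To make these options concrete, the sketch below shows three common granularities in Python. It is purely illustrative and does not reflect any particular vendor’s implementation; the window size and overlap values are arbitrary defaults.

```python
import re

def chunk_by_paragraph(text: str) -> list[str]:
    """Split on blank lines, treating each paragraph as one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def chunk_by_sentence(text: str) -> list[str]:
    """Naive sentence splitter; production systems typically use an NLP library."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_by_window(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character windows with overlap, ignoring document structure."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```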
Not surprisingly, some vendors have taken the liberty of developing a chunking strategy for organizations when they select their vector database or specific applications involving it. Pinecone—which recently released Pinecone Assistant, an application that, among other capabilities, automates many of the organizational requirements for implementing RAG—handles the chunking step for organizations as part of that offering. According to Nathan Cordeiro, Principal Product Manager, GenAI, at Pinecone, one use case for Pinecone Assistant is “where you provide us documents, and we take the action to parse them, chunk them, and embed them.”
Those embeddings are then available in the vector store for organizations to search through, answer questions with, or provide summaries of content—without puzzling over an optimal chunking strategy.
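For readers who want to see what is being automated, the following is a do-it-yourself sketch of the same parse, chunk, embed, and store flow. It assumes the Pinecone Python client and the sentence-transformers library; the index name, embedding model, and chunking helper are illustrative placeholders, not Pinecone Assistant’s internals.

```python
# DIY version of the flow Pinecone Assistant automates. Assumes the pinecone and
# sentence-transformers packages and a pre-created index whose dimension matches
# the embedding model (384 for all-MiniLM-L6-v2).
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-index")                    # illustrative index name
model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model

def ingest(doc_id: str, text: str, chunker) -> None:
    """Chunk a parsed document, embed each chunk, and upsert it with metadata."""
    chunks = chunker(text)
    vectors = model.encode(chunks).tolist()
    index.upsert(vectors=[
        {"id": f"{doc_id}-{i}", "values": vec,
         "metadata": {"doc": doc_id, "text": chunk}}
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ])
```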
Chunking considerations
Chunking is imperative for obtaining credible results from different forms of vector-based search. If the chunks of embedded content are too large, models might not produce answers that are specific enough for critical questions. If they’re too small, models might not obtain adequate context to form correct responses. It’s not uncommon for organizations to rely on potentially costly trial-and-error methods to determine a chunking strategy that suits their content and use cases.
Pinecone Assistant users can skip this expenditure of resources and benefit from the vendor’s foray into creating effective chunking strategies. That work involved “trying a variety of different chunking strategies, and then we benchmarked them to see what the quality was for each of those chunking strategies for this factual question answering problem,” Cordeiro commented.
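Organizations that still want to run their own comparison can approximate this kind of benchmark cheaply. The sketch below assumes the chunking helpers from the earlier example plus a small hand-labeled question-and-answer set, and scores each strategy by how often the chunk containing the known answer lands in the top retrieved results; the scoring rule is an assumption for illustration, not Pinecone’s methodology.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def hit_rate(corpus: str, qa_pairs: list[tuple[str, str]], chunker, top_k: int = 3) -> float:
    """Fraction of questions whose answer string appears in a top-k retrieved chunk."""
    chunks = chunker(corpus)
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    hits = 0
    for question, answer in qa_pairs:
        q_vec = model.encode([question], normalize_embeddings=True)[0]
        top = np.argsort(chunk_vecs @ q_vec)[::-1][:top_k]
        if any(answer.lower() in chunks[i].lower() for i in top):
            hits += 1
    return hits / len(qa_pairs)

# Example comparison across candidate strategies:
# for name, chunker in [("paragraph", chunk_by_paragraph), ("sentence", chunk_by_sentence)]:
#     print(name, hit_rate(corpus, qa_pairs, chunker))
```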
Structure variations
The results of Pinecone’s research were influenced by the type of content used to produce embeddings. Specific factors include structure variations in the data (which might involve semi-structured or unstructured data) from which the vectors are generated. “We handle chunking differently depending on the document type that is provided,” Cordeiro revealed. “We will chunk a JSON file differently than a PDF, because a JSON file has structure implicit in it, and a PDF doesn’t necessarily.”
At this point, Pinecone doesn’t rely on a language model to chunk content for Pinecone Assistant. Instead, its approach is one that Cordeiro characterized as more “deterministic. For structured information we use the structure as the mechanism to chunk. For more freeform unstructured information, we use a specific combination of paragraph breaks and token counts.”
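Pinecone has not published the exact rules it uses, but a rough approximation of that deterministic split between structured and unstructured content might look like the following; the token thresholds and the JSON handling are assumptions for illustration only.

```python
import json
import re

def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real system would use the embedding model's tokenizer."""
    return len(text.split())

def chunk_unstructured(text: str, min_tokens: int = 100, max_tokens: int = 400) -> list[str]:
    """Merge paragraphs until a chunk reaches the minimum token count,
    starting a new chunk before it would exceed the maximum."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and count_tokens(" ".join(current + [para])) > max_tokens:
            chunks.append("\n\n".join(current))
            current = []
        current.append(para)
        if count_tokens(" ".join(current)) >= min_tokens:
            chunks.append("\n\n".join(current))
            current = []
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def chunk_json(raw: str) -> list[str]:
    """Use the document's own structure: one chunk per top-level key or array item."""
    data = json.loads(raw)
    items = data.items() if isinstance(data, dict) else enumerate(data)
    return [json.dumps({str(key): value}) for key, value in items]
```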
Chunking enrichment
Creating effective chunks for vector embeddings of unstructured data, which might include emails, social media feeds, clinical notes about patients, and more, is particularly challenging. The token counting and paragraph break methodology Cordeiro described focuses on “getting minimum amounts of data in…and then ensuring that related information is all contained within a given chunk.” One of the more exacting facets of this approach is incorporating all the related information in a particular chunk, even when that chunk is derived from structured or semi-structured data. Two techniques help achieve this objective. The first involves enriching chunks with metadata to expand the context around them.
“You can imagine things like the document name could be used to enrich the chunk to ensure that each chunk has the relevant metadata associated with it to maximize the retrieval quality,” Cordeiro remarked. The second technique, chunk expansion, helps supply models with more of the relevant information surrounding particular chunks. Cordeiro described this concept as one in which, “Depending on the content of the chunk and the question being asked, we might expand the number of chunks to the ones surrounding the ones that were retrieved. Depending on the question and the data being retrieved, we might actually provide more information at the retrieval step based on the task being performed.”
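As a rough illustration of both ideas, the sketch below enriches each chunk with its document name and position at indexing time, then expands retrieved chunks to their immediate neighbors at query time. The field names and the one-chunk-either-side window are assumptions, not Pinecone’s implementation.

```python
def enrich(doc_name: str, chunks: list[str]) -> list[dict]:
    """Attach the document name and chunk position so each chunk carries its own context."""
    return [
        {"id": f"{doc_name}-{i}", "text": chunk,
         "metadata": {"document": doc_name, "position": i}}
        for i, chunk in enumerate(chunks)
    ]

def expand(retrieved_ids: list[str], store: dict[str, dict], window: int = 1) -> list[dict]:
    """Add the chunks immediately before and after each retrieved chunk."""
    expanded = []
    for rid in retrieved_ids:
        doc, pos = rid.rsplit("-", 1)
        for offset in range(-window, window + 1):
            neighbor = f"{doc}-{int(pos) + offset}"
            if neighbor in store and store[neighbor] not in expanded:
                expanded.append(store[neighbor])
    return expanded
```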
Key takeaways
Granted, what works best for chunking will almost always depend on the specific use case, the type of data involved, and other vector database considerations for generative AI, such as re-ranking and query planning. The key takeaway, however, is that vendors such as Pinecone are providing organizations with a starting point for forming their own chunking best practices.
As such, devising a chunking strategy is no longer an inhibitor to employing vector databases for RAG and what many are calling Agentic AI. And, with time, organizations can refine their chunking approaches and form their own methodologies for optimizing results when working with vector embeddings.