Should Document Chunks in RAG Systems Be Stored With or Apart from Vector Databases?

Seeking advice on storing document fragments in RAG implementations. Should chunks reside within vector databases or separate storage? Interested in metadata linking, recommended platforms, and potential challenges.

In my experience, storing document chunks directly in the vector database can simplify retrieval and reduce system complexity, because it keeps the vectors, the chunk text, and the associated metadata together. However, pairing it with a dedicated document store often gives you more robust indexing and better control over document versions. A separate store also lets you link richer metadata back to the original documents, which helps with provenance. A hybrid model, where the vector database holds embeddings plus a chunk ID and the document store holds the text and metadata under that same ID, can thus offer both fast retrieval and flexibility when updating or revising indexed content.
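To make the linking idea concrete, here is a minimal sketch of a hybrid layout, assuming a shared chunk ID is the only link between the two sides. Both stores are plain in-memory dicts standing in for a real vector database and document store, and the names (`ChunkRecord`, `add_chunk`, `search`) and the toy two-dimensional embeddings are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ChunkRecord:
    chunk_id: str
    doc_id: str
    text: str
    source: str  # provenance metadata, kept outside the vector index


vector_index = {}  # chunk_id -> embedding (stand-in for the vector database)
doc_store = {}     # chunk_id -> ChunkRecord (stand-in for the document store)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def add_chunk(record, embedding):
    """Write the text and metadata to one store and the vector to the other."""
    doc_store[record.chunk_id] = record
    vector_index[record.chunk_id] = embedding


def search(query_embedding, k=2):
    """Rank chunk IDs by similarity, then resolve them in the document store."""
    ranked = sorted(
        vector_index,
        key=lambda cid: cosine(query_embedding, vector_index[cid]),
        reverse=True,
    )
    return [doc_store[cid] for cid in ranked[:k]]


add_chunk(ChunkRecord("c1", "doc-a", "Refund policy text", "policies.pdf"), [1.0, 0.0])
add_chunk(ChunkRecord("c2", "doc-b", "Shipping times text", "shipping.md"), [0.0, 1.0])
top = search([0.9, 0.1], k=1)
print(top[0].chunk_id, top[0].source)
```

The point of the design is that the vector side never needs to know about provenance fields; anything beyond the ID can change in the document store without re-embedding.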

I lean towards keeping chunks in your vector DB for easier retrieval. If you need extra metadata depth, a separate store might work but adds overhead. It's all tradeoffs depending on what you value more.

From my experience, storing document chunks directly within a vector database is effective for quick retrieval, especially when metadata requirements are basic. However, when the application demands extensive metadata management and comprehensive versioning, an external document store provides better control over data history and indexing flexibility. This approach may require additional synchronization between systems but offers improved scalability for complex datasets. In practice, I have transitioned to a dual approach that leverages both systems, balancing operational simplicity with the need for enriched, persistent metadata.
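For the versioning side, a rough sketch of what I mean, assuming the document store tracks a version number and a content hash per document and the vector side is re-populated whenever the hash changes. Everything here is illustrative: `fake_embed` is a placeholder for a real embedding model, `chunk` is a naive fixed-width splitter, and the dicts stand in for the two systems.

```python
import hashlib

doc_versions = {}  # doc_id -> {"version": int, "hash": str} (document store side)
vector_index = {}  # chunk_id -> {"doc_id", "version", "embedding"} (vector side)


def fake_embed(text):
    """Placeholder embedding; a real system would call an embedding model."""
    return [float(len(text)), float(text.count(" "))]


def chunk(text, size=20):
    """Naive fixed-width chunking, only for illustration."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def upsert_document(doc_id, text):
    """Re-index a document only when its content hash has changed."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    current = doc_versions.get(doc_id)
    if current and current["hash"] == digest:
        return current["version"]  # unchanged, nothing to sync

    version = current["version"] + 1 if current else 1

    # Drop vectors from the previous version so stale chunks cannot be retrieved.
    stale = [cid for cid, entry in vector_index.items() if entry["doc_id"] == doc_id]
    for cid in stale:
        del vector_index[cid]

    for i, piece in enumerate(chunk(text)):
        vector_index[f"{doc_id}:v{version}:{i}"] = {
            "doc_id": doc_id,
            "version": version,
            "embedding": fake_embed(piece),
        }

    doc_versions[doc_id] = {"version": version, "hash": digest}
    return version


v1 = upsert_document("guide", "first draft of the guide text")
v1_again = upsert_document("guide", "first draft of the guide text")
v2 = upsert_document("guide", "second draft with revised guide text")
print(v1, v1_again, v2)
```

Putting the version into the chunk ID is one possible choice; it makes leftover chunks from an old version easy to spot, at the cost of IDs changing on every revision.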

I reckon storing chunks in the vector DB gives quick retrieval, but if you need rich metadata and better history tracking, separate storage is an option. Syncing the two might be tricky though, so assess your project scale carefully.
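On the syncing worry, one simple safeguard is a periodic consistency check that compares the chunk IDs on each side. A small sketch, assuming both systems can list their IDs; `find_drift` and the sample IDs are hypothetical.

```python
def find_drift(vector_ids, store_ids):
    """Report chunk IDs that exist in only one of the two systems."""
    vector_ids, store_ids = set(vector_ids), set(store_ids)
    return {
        # Vectors whose text or metadata is gone: retrieval would hit a dead link.
        "orphaned_vectors": sorted(vector_ids - store_ids),
        # Stored chunks that were never embedded: they can never be retrieved.
        "unindexed_chunks": sorted(store_ids - vector_ids),
    }


report = find_drift(
    vector_ids=["c1", "c2", "c9"],
    store_ids=["c1", "c2", "c3"],
)
print(report)
```

How often to run something like this, and whether to repair automatically or just alert, depends on the project scale.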

Hey, I'm curious whether storing directly in the vector DB might ease retrieval despite the limited metadata. Got any insights on handling scalability and versioning with external stores? Keen to hear more of your experiences and ideas.