Document Storage Options in RAG Systems: Combined with Vector Databases or Separate?

JumpingBear · February 11, 2025, 11:01pm

Query: I'm implementing a RAG system on AWS. Should document fragments be stored alongside the vector data or in separate storage (e.g., object storage or a document database)? Any thoughts on metadata linking and scalability?

Sophia39 · February 16, 2025, 11:35am

Hmm, I'm leaning towards separate storage, but I wonder whether metadata sync issues could pop up. Has anyone faced bottlenecks when linking data between systems? I'd love to hear more about your real experiences with this. What are your thoughts?

Iris72 · February 16, 2025, 5:41pm

In my experience, separating document fragments from vector data often provides a more scalable and flexible system design. Storing text and metadata in a dedicated document store or object storage, while keeping vector embeddings in a specialized vector database, allows each component to be optimized for its primary use case: the document side handles complex queries and metadata updates on its own, and the vector database concentrates on similarity search. Although the integration requires additional coordination, it results in a more organized architecture that adapts more easily to evolving system requirements.
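To make the linking concrete, here is a minimal sketch of the split described above, assuming a shared chunk ID is the only thing the two stores have in common. `VectorIndex`, `DocumentStore`, and `retrieve` are hypothetical in-memory stand-ins for illustration, not the API of any particular vector database or AWS service.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Hypothetical stand-in for a vector database: holds only IDs and embeddings."""
    vectors: dict = field(default_factory=dict)

    def upsert(self, chunk_id, embedding):
        self.vectors[chunk_id] = list(embedding)

    def search(self, query, k=3):
        """Return the IDs of the k embeddings most similar to the query (cosine)."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(
            self.vectors,
            key=lambda cid: cosine(query, self.vectors[cid]),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class DocumentStore:
    """Hypothetical stand-in for a document database or object store."""
    chunks: dict = field(default_factory=dict)

    def put(self, chunk_id, text, metadata):
        self.chunks[chunk_id] = {"id": chunk_id, "text": text, "metadata": metadata}

    def get(self, chunk_id):
        return self.chunks.get(chunk_id)


def retrieve(index, store, query, k=3):
    """Search the vector side first, then resolve each ID on the document side."""
    results = []
    for chunk_id in index.search(query, k):
        record = store.get(chunk_id)
        if record is not None:  # skip IDs whose text is missing from the store
            results.append(record)
    return results
```

With this layout, a metadata update is a single `put` on the document side and never touches the embeddings, which is the decoupling the post argues for. The cost is the extra lookup per hit in `retrieve`.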

Liam27 · February 19, 2025, 10:59pm

I lean towards separate storage because vectors need tight tuning for search, while doc fragments scale more easily on their own. The extra integration is a bit of a hassle, but it avoids messier, coupled systems.
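On the sync concern raised earlier in the thread, one possible way to keep the integration hassle manageable is a periodic reconciliation pass that compares the IDs on both sides. The sketch below assumes both systems can list their chunk IDs; `find_drift` is a hypothetical helper, not part of any library.

```python
def find_drift(vector_ids, document_ids):
    """Compare the ID sets of the two stores and report what is out of sync."""
    vector_ids, document_ids = set(vector_ids), set(document_ids)
    return {
        # Embeddings whose text is gone: search would return unresolvable hits.
        "missing_text": vector_ids - document_ids,
        # Fragments that were stored but never embedded: invisible to search.
        "missing_vector": document_ids - vector_ids,
    }


drift = find_drift(["a", "b"], ["b", "c"])
print(drift["missing_text"], drift["missing_vector"])
```

How often to run such a check, and whether to repair or just alert, depends on how the two stores are written to in the first place.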