Optimizing Document Storage in RAG Systems: Unified or Separate?

Inquiry

Should document fragments be stored with vectors in a single database or managed separately (e.g., object storage)? What strategies effectively associate metadata with vectors in scalable RAG setups?

I lean toward separate storage for documents and vectors in large setups, since it lets you optimize each system independently. Unified systems work fine for light loads, but decoupling gives finer control over metadata and scaling, with unique ID mappings linking the two.
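A minimal sketch of the ID-mapping idea, assuming two in-memory dictionaries stand in for a real vector database and a real object store (e.g. S3). `VectorIndex`, `DocumentStore`, `ingest`, and `retrieve` are hypothetical names for illustration, not any product's API. The only thing the two systems share is the generated ID.

```python
import math
import uuid


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Stands in for a vector database: stores only ID -> vector."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, doc_id, vector):
        self._vectors[doc_id] = vector

    def search(self, query, k=3):
        scored = [(cosine(query, v), doc_id) for doc_id, v in self._vectors.items()]
        scored.sort(reverse=True)  # highest similarity first
        return [doc_id for _, doc_id in scored[:k]]


class DocumentStore:
    """Stands in for object storage: stores ID -> fragment text and metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, doc_id, text, metadata):
        self._objects[doc_id] = {"text": text, "metadata": metadata}

    def get_many(self, doc_ids):
        return [self._objects[i] for i in doc_ids if i in self._objects]


def ingest(index, store, text, vector, metadata):
    doc_id = str(uuid.uuid4())         # shared key linking the two systems
    store.put(doc_id, text, metadata)  # write the content first...
    index.upsert(doc_id, vector)       # ...then make it searchable
    return doc_id


def retrieve(index, store, query_vector, k=3):
    ids = index.search(query_vector, k)  # similarity search returns IDs only
    return store.get_many(ids)           # resolve IDs to full fragments


index, store = VectorIndex(), DocumentStore()
ingest(index, store, "Refund policy: 30 days.", [1.0, 0.0], {"source": "faq.md"})
ingest(index, store, "Shipping takes 5 days.", [0.0, 1.0], {"source": "faq.md"})
hits = retrieve(index, store, [0.9, 0.1], k=1)
print(hits[0]["text"])  # Refund policy: 30 days.
```

Writing the content before the vector means a search hit should never point at an ID the document store has not seen yet; the reverse order would need the `if i in self._objects` guard to hide dangling IDs.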

My experience indicates that storing document fragments alongside their vectors in a unified system simplifies retrieval and avoids cross-system overhead. This approach has worked well in smaller projects and where latency is a major concern. For large-scale or archival workloads, however, separating metadata may pay off: keeping extensive metadata and large objects in dedicated storage and linking them to vectors through unique identifiers adds flexibility and scalability. A hybrid strategy, combining unified vector storage for rapid queries with object storage for detailed metadata, tends to work best.
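One way to picture the hybrid strategy is the sketch below. It assumes (this is not stated in the answer) that each vector-store record keeps a few small filterable fields inline plus an object key pointing at the full fragment. A list and a dictionary stand in for the vector database and the object store, and `HybridStore`, `VectorRecord`, and the `fragments/<id>.json` key layout are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VectorRecord:
    doc_id: str
    vector: list      # embedding kept in the vector store for fast similarity search
    filters: dict     # small, filterable fields stored inline with the vector (assumption)
    object_key: str   # pointer to the full fragment in object storage


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HybridStore:
    def __init__(self):
        self.records = []         # stands in for the vector database
        self.object_storage = {}  # stands in for an object store: key -> blob

    def add(self, doc_id, vector, filters, text, detailed_metadata):
        key = f"fragments/{doc_id}.json"  # hypothetical key layout
        # Large or detailed content goes to object storage...
        self.object_storage[key] = {"text": text, "metadata": detailed_metadata}
        # ...while the vector store keeps only what queries need.
        self.records.append(VectorRecord(doc_id, vector, filters, key))

    def query(self, query_vector, k=3, **where):
        # 1. Filter and rank entirely inside the vector store (fast path).
        candidates = [
            r for r in self.records
            if all(r.filters.get(name) == value for name, value in where.items())
        ]
        candidates.sort(key=lambda r: cosine(query_vector, r.vector), reverse=True)
        # 2. Fetch full content from object storage only for the top-k hits.
        return [(r.doc_id, self.object_storage[r.object_key]) for r in candidates[:k]]


store = HybridStore()
store.add("a1", [1.0, 0.0], {"tenant": "acme"}, "Invoices are due in 30 days.",
          {"page": 4, "section": "billing"})
store.add("b1", [0.9, 0.1], {"tenant": "globex"}, "Invoices are due in 45 days.",
          {"page": 2, "section": "billing"})
doc_id, payload = store.query([1.0, 0.0], k=1, tenant="globex")[0]
print(doc_id, payload["text"])  # b1 Invoices are due in 45 days.
```

Because filtering and ranking happen before any object-storage access, the number of fetches per query is bounded by `k` rather than by the size of the corpus. In a real deployment those fetches would likely be batched.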