Document Storage in RAG Systems: Unified or Separated from Vector Databases?

I'm exploring strategies for storing document chunks in a RAG system on AWS. Should the chunk text live alongside the vector data, or be stored and handled separately? Insights on metadata binding, efficiency, and scalability are welcome.

In my experience, storing the chunk text alongside its vector keeps indexing simple and makes look-ups faster, because a single query returns both the match and its text, and the metadata stays bound to each chunk. Separating them, on the other hand, can help with scalability and flexibility during updates, since the text and the embeddings are managed in different components and can change independently. It can also help to distinguish between what is needed for immediate retrieval and auxiliary metadata that can be stored elsewhere; that split may improve efficiency under some workloads. Ultimately the choice depends on your system's demands: weigh query latency against how well the design copes with periods of heavy updates.
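To make the two layouts concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical: `VectorIndex` is a toy in-memory stand-in for a vector database, `text_store` is a plain dict standing in for whatever separate document store you would use, and the embeddings are made-up two-dimensional vectors. It only illustrates where the chunk text lives in each design, not how any particular AWS service behaves.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class VectorIndex:
    """Toy in-memory stand-in for a vector database (hypothetical)."""
    rows: dict = field(default_factory=dict)  # chunk_id -> (embedding, metadata)

    def upsert(self, chunk_id, embedding, metadata):
        self.rows[chunk_id] = (embedding, metadata)

    def query(self, embedding, top_k=1):
        # Brute-force ranking by cosine similarity, highest first.
        ranked = sorted(
            self.rows.items(),
            key=lambda kv: cosine(embedding, kv[1][0]),
            reverse=True,
        )
        return [(chunk_id, meta) for chunk_id, (_, meta) in ranked[:top_k]]


# Layout 1: unified. The chunk text is stored in the vector's metadata,
# so one query returns the match and its text together.
unified = VectorIndex()
unified.upsert("doc1#0", [1.0, 0.0], {"source": "doc1", "text": "Chunk about S3."})
unified.upsert("doc1#1", [0.0, 1.0], {"source": "doc1", "text": "Chunk about IAM."})


def retrieve_unified(query_embedding, top_k=1):
    return [meta["text"] for _, meta in unified.query(query_embedding, top_k)]


# Layout 2: separated. The index keeps only light metadata; the text lives
# in a separate store keyed by chunk_id and is fetched in a second step.
separated = VectorIndex()
text_store = {}  # stand-in for a separate document store (assumption)


def add_separated(chunk_id, embedding, source, text):
    separated.upsert(chunk_id, embedding, {"source": source})
    text_store[chunk_id] = text


def retrieve_separated(query_embedding, top_k=1):
    hits = separated.query(query_embedding, top_k)
    return [text_store[chunk_id] for chunk_id, _ in hits]


add_separated("doc1#0", [1.0, 0.0], "doc1", "Chunk about S3.")
add_separated("doc1#1", [0.0, 1.0], "doc1", "Chunk about IAM.")
```

In the separated layout, correcting a chunk's text is just `text_store["doc1#1"] = "..."` and the index is untouched, which is the update flexibility mentioned above. The cost is the extra lookup in `retrieve_separated`, plus the need to keep the ids in the two stores consistent.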