RAG Systems: Unified or Separate Storage for Documents with Vector DB?

Quick Question

Implementing a RAG system on AWS: Should document chunks be stored with vector data or in separate repositories? Seeking insights on efficient storage and metadata management.

I lean towards keeping docs and vectors separate. It gives better flexibility when scaling or updating data. Even though a merged store might be simpler in small cases, separate stores usually help manage metadata better over time.
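
To make the separate-store layout concrete, here is a minimal sketch in plain Python. The classes `VectorIndex` and `DocumentStore` and the function `retrieve` are hypothetical in-memory stand-ins, not any AWS API; in a real deployment each would be backed by whichever vector database and document repository you pick. The only thing the two stores share is the chunk id.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Hypothetical stand-in for the vector DB: holds only ids and embeddings."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, chunk_id, embedding):
        self._vectors[chunk_id] = embedding

    def top_k(self, query, k):
        # Brute-force ranking; a real vector DB would use an ANN index instead.
        ranked = sorted(
            self._vectors,
            key=lambda cid: cosine(query, self._vectors[cid]),
            reverse=True,
        )
        return ranked[:k]


class DocumentStore:
    """Hypothetical stand-in for the document repository: text plus metadata."""

    def __init__(self):
        self._docs = {}

    def put(self, chunk_id, text, metadata):
        self._docs[chunk_id] = {"text": text, "metadata": dict(metadata)}

    def get(self, chunk_id):
        return self._docs[chunk_id]

    def update_metadata(self, chunk_id, **changes):
        # Metadata changes never touch the vector index.
        self._docs[chunk_id]["metadata"].update(changes)


def retrieve(index, docs, query, k=2):
    """Two-step lookup: nearest ids from the index, then text from the doc store."""
    return [(cid, docs.get(cid)) for cid in index.top_k(query, k)]
```

The flexibility shows up in `update_metadata`, which leaves the embeddings alone. The cost is the second lookup in `retrieve`, plus the need to keep ids in sync across both stores whenever chunks are added or deleted.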

In my experience designing scalable retrieval systems, the choice comes down to a trade-off between operational simplicity and flexibility. Storing document chunks together with their vectors simplifies metadata management and saves a second lookup at query time, and it reduces the risk of mismatches between documents and their corresponding vectors. Keeping them in separate repositories, on the other hand, makes it easier to change storage strategies later or to scale each store independently. Ultimately, the decision should be driven by your system requirements: how strict consistency needs to be, how much query latency matters, and how likely you are to change the data structure or indexing method down the line.
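
For comparison, a co-located layout could look roughly like the sketch below. `ChunkRecord` and `CombinedStore` are hypothetical names used for illustration, and the brute-force cosine ranking is a placeholder for whatever similarity search the chosen database provides. Each record carries its embedding, text, and metadata together, so a single write replaces all three at once.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ChunkRecord:
    """One record holding the embedding, the chunk text, and its metadata."""
    chunk_id: str
    embedding: list
    text: str
    metadata: dict = field(default_factory=dict)


class CombinedStore:
    """Hypothetical single store: vectors and documents live in the same record."""

    def __init__(self):
        self._records = {}

    def upsert(self, record):
        # One write replaces text, metadata, and embedding together,
        # so they cannot drift apart.
        self._records[record.chunk_id] = record

    def search(self, query, k=2):
        # Results already contain the text; no second lookup is needed.
        ranked = sorted(
            self._records.values(),
            key=lambda r: cosine(query, r.embedding),
            reverse=True,
        )
        return ranked[:k]
```

The flip side is that changing the embedding model or the index type means rewriting every record, text included, which is where the separate layout tends to be more forgiving.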