Query: Implementing a RAG system on AWS. Should document fragments be stored with vector data or in separate storage (e.g., object storage, document databases)? Thoughts on metadata linking and scalability?
Hmm, I'm leaning towards separate storage, but I wonder if metadata sync issues could pop up. Has anyone faced bottlenecks when linking data between systems? Would love to hear more about your real experiences with this. What are your thoughts?
In my experience, separating document fragments from vector data often provides a more scalable and flexible system design. Storing text and metadata in a dedicated document store or object storage, while keeping vector embeddings in a specialized vector database, allows each component to be optimized for its primary use case. The document side can handle complex queries and metadata updates on its own schedule, while the vector database stays focused on similarity search. The integration does require additional coordination, typically a shared identifier that links each vector to its fragment, but it results in a more organized architecture that adapts more easily to evolving system requirements.
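To make the linking concrete, here is a minimal sketch of the split described above. It uses in-memory stand-ins rather than real AWS services, and every name in it (DocumentStore, VectorIndex, ingest, retrieve, chunk_id) is hypothetical. It assumes the only thing the two stores share is a chunk ID, and that writing the fragment before the vector is enough to keep a search hit from pointing at nothing.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Stand-in for object storage or a document database, keyed by chunk ID."""
    chunks: dict = field(default_factory=dict)

    def put(self, chunk_id, text, metadata):
        self.chunks[chunk_id] = {"text": text, "metadata": metadata}

    def get(self, chunk_id):
        return self.chunks.get(chunk_id)


@dataclass
class VectorIndex:
    """Stand-in for a vector database: holds only embeddings and chunk IDs."""
    vectors: dict = field(default_factory=dict)

    def upsert(self, chunk_id, embedding):
        self.vectors[chunk_id] = embedding

    def search(self, query, top_k=2):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Brute-force ranking; a real vector database would use an ANN index.
        ranked = sorted(self.vectors, key=lambda cid: cosine(query, self.vectors[cid]), reverse=True)
        return ranked[:top_k]


def ingest(docs, index, chunk_id, text, metadata, embedding):
    # Write the fragment first, then the vector, so that a vector
    # never becomes searchable before its text exists.
    docs.put(chunk_id, text, metadata)
    index.upsert(chunk_id, embedding)


def retrieve(docs, index, query_embedding, top_k=2):
    results = []
    for chunk_id in index.search(query_embedding, top_k):
        record = docs.get(chunk_id)
        if record is None:
            # Orphaned vector (fragment deleted or never written): skip it.
            continue
        results.append((chunk_id, record["text"]))
    return results
```

The sync risk raised in the question shows up here as the orphaned-vector case: if the two stores drift apart, the lookup by chunk ID returns nothing and the hit is skipped. In a real deployment you would probably also want a periodic reconciliation job, but that is beyond this sketch.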
I lean towards separate storage because vectors need tight tuning for search, while doc fragments scale more easily on their own. The extra integration is a bit of a hassle, but it avoids messier, coupled systems.