RAG Systems: Should Document Segments and Vectors Share Storage?

I’m building a RAG system on AWS. Should document segments be stored with vectors or separately? How do you handle metadata linking and scalability?

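In my work with RAG systems on AWS, I have found that storing document segments separately from their vector embeddings works better than keeping both in one store. Give each segment a stable ID. The vector index then holds only the embedding plus a small set of metadata fields: that ID, the source document ID, and any attributes you want to filter on. The full text lives in a separate document or key-value store keyed by the same ID, and that shared ID is what makes cross-referencing reliable. This separation gives you more flexibility in managing each component. For example, you can re-embed the corpus with a new model without rewriting the text store. It also keeps the vector index lean, which helps metadata filtering stay efficient as the volume of data grows. You can then size and tune the text store for lookups by ID independently of the vector store's similarity search. The trade-off is a second lookup at query time: you get the top-k IDs from the vector search, then fetch their text in one batched read.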
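Here is a minimal sketch of that linking pattern. It assumes nothing about your specific services: two in-memory dicts stand in for the vector index and the separate text store, and a brute-force cosine search stands in for real approximate nearest-neighbour search. The names `VectorIndex`, `ChunkStore`, `ingest`, `retrieve`, and the `doc_id#position` ID scheme are hypothetical illustrations, not any AWS API.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; a real vector index would use approximate search.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class VectorIndex:
    # chunk_id -> (vector, metadata). Metadata holds only small link/filter
    # fields such as doc_id and position, never the chunk text itself.
    records: dict = field(default_factory=dict)

    def upsert(self, chunk_id, vector, metadata):
        self.records[chunk_id] = (vector, metadata)

    def delete_where(self, doc_id):
        # Remove every vector belonging to one source document; return its IDs.
        stale = [cid for cid, (_, m) in self.records.items() if m["doc_id"] == doc_id]
        for cid in stale:
            del self.records[cid]
        return stale

    def search(self, query, k, doc_id=None):
        # The optional metadata filter is checked before a record is scored.
        hits = [
            (cid, cosine(query, vec))
            for cid, (vec, meta) in self.records.items()
            if doc_id is None or meta["doc_id"] == doc_id
        ]
        hits.sort(key=lambda h: h[1], reverse=True)
        return hits[:k]


@dataclass
class ChunkStore:
    # chunk_id -> text; stored and scaled independently of the vectors.
    texts: dict = field(default_factory=dict)

    def batch_get(self, chunk_ids):
        return {cid: self.texts[cid] for cid in chunk_ids if cid in self.texts}


def ingest(doc_id, chunks, index, store):
    # chunks: list of (text, vector). Re-ingesting a document first removes its
    # old chunks from BOTH stores so no link key is left dangling.
    for cid in index.delete_where(doc_id):
        store.texts.pop(cid, None)
    for i, (text, vector) in enumerate(chunks):
        chunk_id = f"{doc_id}#{i}"  # hypothetical ID scheme: doc_id + position
        store.texts[chunk_id] = text
        index.upsert(chunk_id, vector, {"doc_id": doc_id, "position": i})


def retrieve(query, k, index, store, doc_id=None):
    hits = index.search(query, k, doc_id)                # step 1: top-k IDs
    texts = store.batch_get([cid for cid, _ in hits])    # step 2: one batched read
    return [(cid, score, texts[cid]) for cid, score in hits if cid in texts]


if __name__ == "__main__":
    index, store = VectorIndex(), ChunkStore()
    ingest("faq", [("Returns take 30 days.", [1.0, 0.0]),
                   ("Shipping is free.", [0.0, 1.0])], index, store)
    print(retrieve([0.9, 0.1], 1, index, store))
```

Running it returns the `faq#0` chunk with its text. Deriving the ID from the document ID and position is one possible choice: re-ingesting a document overwrites or deletes exactly its own entries in both stores, and the `doc_id` field in the vector metadata doubles as a filter.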