Efficient Document Management in RAG: Integrated with Vectors or Kept Separate?

With a RAG approach, should I store document fragments and their metadata together in the vector store, or keep them in separate storage? What strategies work well for managing metadata and scaling?

In my experience managing documents for RAG, keeping metadata in its own store, separate from the vector store, has clear advantages. Complex metadata can be modeled, indexed, and updated independently, so changes to document structure don't ripple into the vector embeddings. Integrating the two is appealing for its simplicity, but separation scales better because metadata indexing and updates get their own specialized handling. For me, this approach has stayed more adaptable as the volume and complexity of documents have grown.
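To make the separate-store idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the `vectors` dict stands in for a real vector store, SQLite stands in for the metadata store, and the `chunk_meta` table, its columns, and the helper names are hypothetical. The only link between the two stores is a shared `chunk_id`.

```python
import math
import sqlite3

# Stand-in for a vector store: chunk_id -> embedding (assumption, not a real client).
vectors = {}

# Separate metadata store, joined to the vectors only by chunk_id.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunk_meta (chunk_id TEXT PRIMARY KEY, source TEXT, version INTEGER)"
)


def add_chunk(chunk_id, embedding, source, version=1):
    """Write the embedding and the metadata to their respective stores."""
    vectors[chunk_id] = embedding
    db.execute(
        "INSERT INTO chunk_meta VALUES (?, ?, ?)", (chunk_id, source, version)
    )


def update_version(chunk_id, version):
    """Metadata-only update: the embedding is never read or rewritten."""
    db.execute(
        "UPDATE chunk_meta SET version = ? WHERE chunk_id = ?", (version, chunk_id)
    )


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, k=2):
    """Rank by similarity first, then look up metadata for the top-k ids."""
    ranked = sorted(vectors, key=lambda cid: cosine(query, vectors[cid]), reverse=True)
    results = []
    for cid in ranked[:k]:
        source, version = db.execute(
            "SELECT source, version FROM chunk_meta WHERE chunk_id = ?", (cid,)
        ).fetchone()
        results.append((cid, source, version))
    return results


# Toy data with 2-dimensional embeddings.
add_chunk("a", [1.0, 0.0], "guide.pdf")
add_chunk("b", [0.0, 1.0], "faq.md")
update_version("a", 2)  # touches only the metadata table
print(search([0.9, 0.1], k=1))
```

The trade-off shows up in `search`: every query needs a second lookup to hydrate metadata, which is the extra overhead you accept in exchange for metadata updates that leave the vectors alone.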

I lean toward merging for stable, basic metadata, since it avoids extra overhead. But when updates are frequent or the metadata is complex, keeping them separate might save headaches. Depends on your use case.
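For comparison, a rough sketch of the merged option, assuming a small, rarely-changing field (a hypothetical `lang` tag) is stored right next to each embedding and used as a filter at query time. The `Record` class and `search` function are illustrative names, not any particular vector database's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    chunk_id: str
    embedding: list[float]
    text: str
    lang: str  # basic, stable metadata kept inline with the vector


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query, lang, k=3):
    """Filter on the inline metadata, then rank the survivors by similarity."""
    candidates = [r for r in records if r.lang == lang]
    candidates.sort(key=lambda r: cosine(query, r.embedding), reverse=True)
    return candidates[:k]


records = [
    Record("a", [1.0, 0.0], "Install steps", "en"),
    Record("b", [0.9, 0.1], "Installationsschritte", "de"),
    Record("c", [0.0, 1.0], "Billing FAQ", "en"),
]
top = search(records, [1.0, 0.0], lang="en", k=1)
print(top[0].chunk_id)
```

One read returns both the text and its metadata, so there is no second lookup. The cost is that changing `lang` means rewriting the whole record, which is fine when the field rarely changes.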