Should Document Chunks and Vectors Be Stored Together or Separately in RAG Systems?

Working on an AWS RAG solution: should I store document chunks with vectors or separately (using S3), and how do I handle metadata? Which vector DBs perform well?

Hey, I'm leaning towards keeping vectors and doc chunks together, since it makes indexing easier. Extra metadata can live on S3 if you need the flexibility. I've seen Pinecone work well, though Milvus is pretty solid too. Experiment to see what fits best for your use case!

In my work with AWS-based RAG systems, I found that storing document chunks and vectors together simplifies retrieval and synchronization. Keeping them in the same record helps each vector stay aligned with the text it was computed from, which reduces the chance of metadata pointing at the wrong chunk. During implementation, I embedded metadata directly with the vectors to streamline queries, and this improved search performance in my case. It is crucial, however, to test performance on your own data, because results vary from system to system. Choosing a dedicated vector database that fits your specific operational needs has proven essential across multiple projects.
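
To make the co-located layout concrete, here is a minimal sketch, assuming a plain in-memory list stands in for the vector database. The `Record` class, the `search` function, the `where` filter, and the sample vectors are all hypothetical names and values made up for illustration; they are not the API of Pinecone, Milvus, or any AWS service.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Record:
    """One chunk stored next to its vector and metadata (hypothetical layout)."""
    chunk_id: str
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, top_k=3, where=None):
    """Filter on metadata first, then rank the survivors by similarity."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r.metadata.get(k) == v for k, v in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(r.vector, query_vector), reverse=True)
    return ranked[:top_k]


# Toy 3-dimensional vectors; a real system would use embeddings from a model.
records = [
    Record("a1", "Refunds are issued within 14 days.", [0.9, 0.1, 0.0], {"source": "policy.pdf"}),
    Record("b1", "Install the CLI with pip.", [0.1, 0.9, 0.0], {"source": "readme.md"}),
    Record("a2", "Refund requests need an order ID.", [0.8, 0.2, 0.1], {"source": "policy.pdf"}),
]

hits = search(records, [1.0, 0.0, 0.0], top_k=2, where={"source": "policy.pdf"})
for hit in hits:
    print(hit.chunk_id, hit.text)
```

Because the text and metadata travel with the vector, a single lookup returns everything the prompt needs, with no second fetch from S3. If chunks were stored separately, each hit would instead carry a key and require a follow-up read, which is the synchronization cost mentioned above.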