Storing Documents in RAG Systems: Integrated with Vector DB or Separate Setup?

In a RAG setup on AWS, should document segments live inside the vector database (e.g., PostgreSQL with a vector extension) or in separate storage (such as S3)? And how is the metadata linked to the vectors?

I lean toward a unified approach. Storing docs and vectors together cuts down on sync problems and makes metadata linking more transparent. For small apps it works fine, but if you're planning for massive scale, you might want to consider a separate setup. In my experience, simplicity often wins.
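A minimal sketch of the unified layout, with assumptions stated up front: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL with a vector extension, stores embeddings as JSON text, and ranks by brute-force cosine similarity. The table name `chunks`, the helpers `add_chunk` and `search`, and the toy 2-dimensional embeddings are all hypothetical. The point is only that text, vector, and metadata sit in one row, so there is nothing to keep in sync.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "id TEXT PRIMARY KEY, source TEXT, body TEXT, embedding TEXT)"
)


def add_chunk(chunk_id, source, body, embedding):
    """Write text, metadata, and vector in a single transaction."""
    with conn:
        conn.execute(
            "INSERT INTO chunks VALUES (?, ?, ?, ?)",
            (chunk_id, source, body, json.dumps(embedding)),
        )


def search(query_embedding, k=1):
    """Return the k most similar chunks as (id, source, body) tuples."""
    rows = conn.execute(
        "SELECT id, source, body, embedding FROM chunks"
    ).fetchall()
    ranked = sorted(
        rows,
        key=lambda r: cosine(query_embedding, json.loads(r[3])),
        reverse=True,
    )
    return [(r[0], r[1], r[2]) for r in ranked[:k]]


add_chunk("doc1#0", "handbook.pdf", "Refunds take five days.", [1.0, 0.0])
add_chunk("doc1#1", "handbook.pdf", "Shipping is free over $50.", [0.0, 1.0])
```

Calling `search([0.9, 0.1])` returns the refund chunk together with its source, without a second lookup in another system.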

In my experience, separating document storage from the vector index can improve both performance and scalability. Storing segments in a dedicated system such as S3, while keeping the vectors and their metadata in PostgreSQL with a vector extension, helps optimize resource usage. I found that this separation makes metadata easier to update and manage, and it reduces indexing overhead. Linking is usually done with a unique identifier stored as part of the vector record. This approach also simplifies scaling, since document storage can grow independently of the indexing system.
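A rough sketch of the identifier-based linking described above, under loud assumptions: a plain dict stands in for the S3 bucket, a list of dicts stands in for the vector table, and the key scheme `chunks/<id>.txt` plus the helpers `ingest` and `retrieve` are hypothetical. In a real deployment the dict operations would become S3 put and get calls and the list would be a database table.

```python
import math
import uuid

object_store = {}  # stand-in for an S3 bucket: object key -> bytes
vector_index = []  # stand-in for the vector table: one dict per row


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ingest(text, embedding, source):
    """Store the text in the object store and only a pointer in the index."""
    chunk_id = str(uuid.uuid4())
    object_key = f"chunks/{chunk_id}.txt"  # hypothetical key scheme
    object_store[object_key] = text.encode("utf-8")
    vector_index.append(
        {
            "id": chunk_id,
            "embedding": embedding,
            "source": source,
            "object_key": object_key,  # the link back to the document
        }
    )
    return chunk_id


def retrieve(query_embedding, k=1):
    """Rank rows by similarity, then fetch each text via its object key."""
    ranked = sorted(
        vector_index,
        key=lambda row: cosine(query_embedding, row["embedding"]),
        reverse=True,
    )
    return [
        (row["id"], row["source"], object_store[row["object_key"]].decode("utf-8"))
        for row in ranked[:k]
    ]


refund_id = ingest("Refunds take five days.", [1.0, 0.0], "handbook.pdf")
shipping_id = ingest("Shipping is free over $50.", [0.0, 1.0], "handbook.pdf")
```

The index rows never contain the text itself, so the object store can grow or be reorganized without touching the vectors. The cost is that a write now touches two systems, so an ingest that fails halfway can leave an orphaned object or a dangling key.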