Exploring a RAG implementation on AWS: should document chunks be stored alongside their vectors (e.g., in PostgreSQL with a vector extension) or separately (e.g., in S3)? How do you handle the metadata linking between them, and the concerns that come with it?
hey, i’ve tried a hybrid method: raw docs stored in s3, with the metadata links living in a vector db. does syncing between 'em cause latency? has anyone faced similar challenges? curious what kinds of issues you’ve run into, really keen to hear your thoughts
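A minimal sketch of the split layout described above, assuming the vector index stores only a pointer per chunk (bucket, key, and the ETag seen at indexing time) and the chunk text is fetched from S3 after the search. `ChunkRef` and `resolve_hits` are hypothetical names, and `fetch` is injected so the sketch runs without AWS credentials. It shows where the extra latency comes from: every hit costs one additional object read after the vector query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical vector-DB entry: the embedding plus a pointer into S3, not the text itself."""
    chunk_id: str
    embedding: tuple[float, ...]
    s3_bucket: str
    s3_key: str
    etag: str  # ETag recorded when the chunk was embedded, useful later for drift checks


def resolve_hits(hits: list[ChunkRef], fetch: Callable[[str, str], bytes]) -> dict[str, str]:
    """Turn vector-search hits into chunk text.

    Each hit triggers one fetch(bucket, key) call, i.e. one extra S3 round trip
    on top of the vector query; this is the latency cost of the split layout.
    """
    return {h.chunk_id: fetch(h.s3_bucket, h.s3_key).decode("utf-8") for h in hits}
```

With boto3, `fetch` could be `lambda b, k: s3.get_object(Bucket=b, Key=k)["Body"].read()` where `s3 = boto3.client("s3")`. Issuing those reads concurrently (e.g. with a thread pool) is one way to hide part of the cost.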
i’ve been testing a similar setup. storing vectors in a db and docs in s3 works, but the metadata linkage can slow things down if it isn’t tuned right. i’d suggest careful caching plus periodic sync checks to avoid lag; trial and error works best
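One way to sketch "caching plus periodic sync checks", assuming each indexed chunk records the S3 ETag it was embedded from. `TTLCache` and `diff_etags` are illustrative helpers, not part of any AWS SDK; the TTL and how often `diff_etags` runs are the knobs you'd tune by trial and error.

```python
import time
from typing import Callable


class TTLCache:
    """Minimal time-based cache for chunk text pulled from S3 (illustrative, not thread-safe)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}  # key -> (time loaded, value)

    def get(self, key: str, load: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh enough: skip the S3 round trip
        value = load(key)    # missing or expired: reload from the source
        self._data[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        self._data.pop(key, None)


def diff_etags(indexed: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Periodic sync check: compare ETags recorded at indexing time with what S3 reports now.

    Returns (stale, unindexed): stale keys changed or were deleted in S3 and need
    re-embedding or removal; unindexed keys exist in S3 but not in the vector DB.
    """
    stale = sorted(k for k, etag in indexed.items() if current.get(k) != etag)
    unindexed = sorted(k for k in current if k not in indexed)
    return stale, unindexed
```

In practice `current` could be built from an S3 `list_objects_v2` listing (paginated beyond 1,000 keys), mapping each `Key` to its `ETag`. Keys in `stale` get re-embedded and invalidated in the cache; keys in `unindexed` get embedded for the first time.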
In my experience managing similar setups, an integrated approach to document storage can simplify the overall architecture significantly. Storing metadata alongside the vector representations in a single dedicated system reduces complexity and removes some potential sources of latency, since retrieval no longer has to follow a link into a second store. Thorough indexing and caching are still essential to keep performance acceptable. During implementation, I tuned the periodic synchronization interval so that data stayed consistent without overloading the system. The approach demands careful tuning, but it ultimately gave me smoother retrieval and better alignment between the document data and its metadata.
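For contrast, a sketch of the integrated layout, assuming each record holds the chunk text, its metadata and its embedding together. It uses a brute-force in-memory search purely for illustration; in PostgreSQL with the pgvector extension the same idea would be a single table with a `vector` column, ordered by the `<=>` cosine-distance operator and backed by an approximate index. `Chunk` and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """One co-located record: text, metadata and embedding stored together (hypothetical schema)."""
    chunk_id: str
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(chunks: list[Chunk], query: list[float], k: int = 3,
           filters: Optional[dict] = None) -> list[Chunk]:
    """Brute-force top-k by cosine similarity, with exact-match metadata filters.

    Text and metadata come back with each match, so no second lookup is needed.
    """
    filters = filters or {}
    candidates = [
        c for c in chunks
        if all(c.metadata.get(name) == value for name, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(c.embedding, query), reverse=True)
    return ranked[:k]
```

The design choice here is that one query returns the text and metadata with each match, so there is no second store to keep in sync. The trade-off is that storing large raw documents in the same table can bloat it, so keeping originals in S3 and co-locating only the chunks is a possible middle ground.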