Can a Git repo serve as a document database?

I’m working on a project that needs to manage a large number of structured documents. We’re talking about a tree of around 1,000 categories, each holding up to 10,000 docs, so up to roughly 10 million docs in total. Each doc is a few KB, probably in YAML or JSON.

The system needs to:

  • Fetch docs by ID
  • Search docs based on their content
  • Allow editing with change tracking
  • Show edit history (who, when, why)

I know using a doc database like MongoDB is the usual way. But I had this crazy idea: why not use Git as the backend?

Here’s my rough plan:

  • Use folders for categories and files for documents
  • Retrieve documents by reading files directly
  • Treat each edit as a commit to track changes
  • Get history from Git logs
  • Implement search by exporting data to a conventional database
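A minimal sketch of the plan above, assuming Python 3 and the `git` command-line tool on PATH. The class name `GitDocStore`, the `<category>/<doc_id>.json` layout, and the fixed committer identity are illustrative assumptions, not an existing library. Each edit becomes one commit: `--author` records who, the commit message records why, and Git stores the date.

```python
from __future__ import annotations

import json
import subprocess
import tempfile
from pathlib import Path


class GitDocStore:
    """Hypothetical store: folders are categories, one JSON file per doc, one commit per edit."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)
        if not (self.root / ".git").exists():
            self._git("init", "--quiet")

    def _git(self, *args: str) -> str:
        # Fixed committer identity (an assumption); the author is passed per edit.
        cmd = ["git", "-c", "user.name=docstore", "-c", "user.email=docstore@example.invalid",
               "-c", "commit.gpgsign=false", *args]
        return subprocess.run(cmd, cwd=self.root, check=True,
                              capture_output=True, text=True).stdout

    @staticmethod
    def _rel(category: str, doc_id: str) -> str:
        return f"{category}/{doc_id}.json"

    def put(self, category: str, doc_id: str, doc: dict, author: str, message: str) -> None:
        rel = self._rel(category, doc_id)
        path = self.root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        # Stable formatting keeps diffs between versions small and readable.
        path.write_text(json.dumps(doc, indent=2, sort_keys=True) + "\n", encoding="utf-8")
        self._git("add", "--", rel)
        # --author ("Name <email>") records who, -m records why; Git records when.
        self._git("commit", "--quiet", "--author", author, "-m", message)

    def get(self, category: str, doc_id: str) -> dict:
        # Fetch by ID is a plain file read; no Git command involved.
        return json.loads((self.root / self._rel(category, doc_id)).read_text(encoding="utf-8"))

    def history(self, category: str, doc_id: str) -> list[dict]:
        # Newest first: commit hash, author name, ISO 8601 author date, subject line.
        out = self._git("log", "--format=%H%x1f%an%x1f%aI%x1f%s", "--",
                        self._rel(category, doc_id))
        keys = ("commit", "author", "date", "message")
        return [dict(zip(keys, line.split("\x1f"))) for line in out.splitlines()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = GitDocStore(tmp)
        store.put("fruit", "apple", {"color": "red"}, "Alice <alice@example.com>", "Add apple")
        store.put("fruit", "apple", {"color": "green"}, "Bob <bob@example.com>", "Fix color")
        print(store.get("fruit", "apple"))
        for entry in store.history("fruit", "apple"):
            print(entry["author"], entry["date"], entry["message"])
```

Saving a doc that hasn’t changed makes `git commit` exit with an error (nothing to commit), which surfaces here as `subprocess.CalledProcessError`; a real service would probably check for that before committing.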

Has anyone tried this approach before? How significant might the performance impact be, and can it scale effectively?

I’m curious to hear if this method might actually work or if it’s bound to run into issues.

Hmm, that’s an interesting approach! Have you thought about how you’d handle concurrent edits? Git’s great for version control, but it might get tricky with multiple users making changes simultaneously. What about another distributed version control system like Mercurial? It could handle branching better. Just curious, what’s your main reason for considering Git over traditional databases?

Interesting idea, but Git might struggle with performance at this scale. Have you considered using Git for version control and a separate DB for querying? Could be a nice hybrid solution. Worth testing to see if it meets your needs. Good luck with your project!

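While using Git as a document database is an intriguing idea, it’s important to consider the potential drawbacks. Git wasn’t designed for this purpose, so you might run into performance issues at this scale: with millions of files, commands such as git status, which check every tracked file, get slower, and searches suffer the most. Git keeps no index of document contents, so every query has to read the files again, which could slow searches down significantly.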
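To illustrate what search looks like without an index, here is a minimal sketch, assuming the `git` CLI is installed, that asks `git grep` for tracked files containing a literal string. It can be perfectly adequate for small repositories, but every query reads the tracked files again, so the work grows with the total size of the docs. `grep_docs` is a hypothetical helper name.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def grep_docs(repo: Path, needle: str) -> list[str]:
    """Paths of tracked files containing `needle` as a literal string."""
    # -l: list matching files only; -F: fixed string, not a regex; -e: needle may start with '-'.
    result = subprocess.run(["git", "grep", "-l", "-F", "-e", needle],
                            cwd=repo, capture_output=True, text=True)
    if result.returncode == 1:  # git grep exits with 1 when nothing matches
        return []
    result.check_returncode()  # any other non-zero status is a real error
    return sorted(result.stdout.splitlines())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
        (repo / "fruit").mkdir()
        (repo / "fruit" / "apple.json").write_text('{"color": "red"}\n')
        (repo / "fruit" / "lime.json").write_text('{"color": "green"}\n')
        # git grep only searches tracked files, so stage them first.
        subprocess.run(["git", "add", "."], cwd=repo, check=True)
        print(grep_docs(repo, '"red"'))
```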

That said, Git does offer excellent version control and change tracking out of the box. If these features are crucial to your project, it might be worth exploring. However, for optimal performance and scalability, you may want to consider a hybrid approach. Use Git for version control and history tracking, but implement a separate database or search engine for efficient querying and retrieval.
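A minimal sketch of the hybrid approach, assuming docs are stored as `.json` files under category folders and using Python’s built-in `sqlite3` module as the “separate database”. It builds a tiny inverted index (word to doc) from the working tree, so a query hits an indexed table instead of scanning files. `build_index`, `search`, and the naive word splitting are illustrative assumptions; a real deployment would more likely use a full-text search engine and re-index only the files changed by each commit (for example, the list from `git diff --name-only OLD NEW`).

```python
from __future__ import annotations

import json
import re
import sqlite3
import tempfile
from pathlib import Path

WORD = re.compile(r"\w+")


def _words(value):
    """Yield lowercase words from every scalar value in a parsed JSON document."""
    if isinstance(value, dict):
        for item in value.values():
            yield from _words(item)
    elif isinstance(value, list):
        for item in value:
            yield from _words(item)
    elif value is not None:
        yield from (w.lower() for w in WORD.findall(str(value)))


def build_index(repo: Path, db: sqlite3.Connection) -> None:
    """Rebuild a word -> doc table from the JSON files in the working tree."""
    # The composite primary key also serves as an index for lookups by term.
    db.execute("CREATE TABLE IF NOT EXISTS terms (term TEXT, doc TEXT, PRIMARY KEY (term, doc))")
    db.execute("DELETE FROM terms")
    for path in repo.rglob("*.json"):
        rel = path.relative_to(repo)
        if rel.parts[0] == ".git":  # skip Git's own metadata directory
            continue
        words = set(_words(json.loads(path.read_text(encoding="utf-8"))))
        db.executemany("INSERT INTO terms VALUES (?, ?)", [(w, rel.as_posix()) for w in words])
    db.commit()


def search(db: sqlite3.Connection, query: str) -> list[str]:
    """Return paths of docs that contain every word in the query."""
    terms = {w.lower() for w in WORD.findall(query)}
    if not terms:
        return []
    marks = ",".join("?" * len(terms))
    rows = db.execute(
        f"SELECT doc FROM terms WHERE term IN ({marks}) "
        "GROUP BY doc HAVING COUNT(DISTINCT term) = ? ORDER BY doc",
        (*terms, len(terms)),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "fruit").mkdir()
        (repo / "fruit" / "cherry.json").write_text(json.dumps({"name": "Cherry", "color": "dark red"}))
        db = sqlite3.connect(":memory:")
        build_index(repo, db)
        print(search(db, "dark red"))
```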

Ultimately, the feasibility depends on your specific use case and performance requirements. It’s an interesting concept, but careful benchmarking would be essential before committing to this approach in a production environment.