I’m working on a Python application that processes a huge dataset and stores it in memory using custom data structures. The app has several functions that query this processed data.
I need to create a web interface for this Python app, but there’s an important requirement: the dataset should only be loaded once when the server starts. If I reload the data for every web request, it will be way too slow and use too much memory.
Basically, I need the Python process to keep running and maintain all the data in memory between different web requests. The memory state must persist so users can query the data quickly.
I know that using PHP’s exec() won’t work because it creates a fresh Python process each time someone makes a request. I’ve heard that mod_python might be a solution for this kind of problem, but I’m not sure how to implement it properly.
What’s the best approach to keep a Python process alive with persistent memory state for web requests?
Have you thought about using Redis or Memcached for caching? It would let multiple processes share the same data instead of each one loading its own copy. Also, how big is your dataset? Does it need regular updates, or is it mainly read-only?
FastAPI with uvicorn workers handles this really well. I did something similar: loaded a 2GB dataset into memory at startup using the startup event handler (newer FastAPI versions recommend a lifespan handler for the same job). The trick is that ASGI servers like uvicorn keep process state between requests, unlike traditional CGI, which spawns a new process per request. Just define your data structures as module-level variables and load them once during init. For production, keep in mind that every worker process loads its own copy of the data, so if the dataset is too big to duplicate per worker, look at shared memory solutions like multiprocessing.shared_memory. If requests can modify the data, also set up graceful shutdown handlers to save any changes before the process dies. This worked great for me when serving ML models where loading time was killing performance.
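To make the multiprocessing.shared_memory suggestion concrete, here is a minimal sketch using only the standard library. It assumes the dataset can be flattened into fixed-size numeric records; the `publish` and `read_value` helpers and the sample values are hypothetical, and custom Python objects would need their own serialization before they could live in a shared block.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical dataset: a flat list of float64 values.
VALUES = [1.5, 2.5, 4.0]
ITEM = struct.Struct("d")  # one 8-byte double per record


def publish(values):
    """Copy the values into a new shared block and return its handle.

    The creating process owns the block: it must keep the handle alive
    and call close() and unlink() when the data is no longer needed.
    """
    shm = shared_memory.SharedMemory(create=True, size=ITEM.size * len(values))
    for i, value in enumerate(values):
        ITEM.pack_into(shm.buf, i * ITEM.size, value)
    return shm


def read_value(block_name, index):
    """Attach to an existing block by name and read one record.

    This is what a worker would do; it detaches afterwards but does not
    unlink, because the block belongs to the process that created it.
    """
    shm = shared_memory.SharedMemory(name=block_name)
    try:
        (value,) = ITEM.unpack_from(shm.buf, index * ITEM.size)
        return value
    finally:
        shm.close()
```

The owner would call `publish(VALUES)` once at startup and hand `shm.name` to the workers, for example through an environment variable; each worker then reads records with `read_value(name, i)` without holding its own copy of the data.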
Using Flask or Django along with gunicorn or uWSGI is a good idea. Load your data into a global variable when your app starts, and it will stay in memory for all requests handled by that worker. Just be careful if requests modify the data: with threaded workers you need a lock around reads and writes, and with multiple worker processes each one has its own copy.
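The load-once-into-a-global pattern can be sketched without any framework as a plain WSGI callable, which is the interface gunicorn and uWSGI serve. The `load_dataset` function, the sample data, and the query parameters are hypothetical stand-ins for the real loader and queries; a Flask or Django app would follow the same shape.

```python
import json
import threading
from urllib.parse import parse_qs


def load_dataset():
    """Hypothetical stand-in for the expensive one-time load."""
    return {"alice": 3, "bob": 7}


# Runs once per worker process, when the server imports this module.
DATASET = load_dataset()
# Only needed if requests mutate DATASET while other threads read it.
DATASET_LOCK = threading.Lock()


def app(environ, start_response):
    """WSGI entry point.

    GET  /?key=alice          returns the stored value for "alice"
    POST /?key=alice&value=5  stores 5 under "alice" and returns it
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    key = params.get("key", [""])[0]

    with DATASET_LOCK:
        if environ.get("REQUEST_METHOD") == "POST":
            DATASET[key] = int(params.get("value", ["0"])[0])
        result = DATASET.get(key)

    body = json.dumps({"key": key, "value": result}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Saved as, say, `server.py`, this could be started with `gunicorn --workers 1 server:app`. With more than one worker, each process imports the module and loads its own copy, so writes made through one worker are not visible to the others.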