Setting up Celery with an Elasticsearch cluster as the result backend

I’m trying to configure Celery to use Elasticsearch (ES) as its result backend. The tricky part is that my ES setup is a multi-node cluster. I’ve looked at the official docs and even dug into the source code, but I can’t find any clear info on how to handle this scenario.

Right now, my config looks something like this:

result_backend = 'elasticsearch://single-node:9200/my_index/my_type'

But that’s just for a single ES node. How can I modify this to work with my cluster setup? Do I need to list all the nodes somehow? Or is there a special way to specify the cluster endpoint?

I’m hoping someone here has tackled this before and can point me in the right direction. Any tips or examples would be super helpful!

I’ve dealt with a similar setup before. One approach that worked well was using the ‘elasticsearch-py’ library’s ConnectionPool feature. You can specify multiple nodes in your connection string like this:

result_backend = 'elasticsearch://node1:9200,node2:9200,node3:9200/my_index/my_type'

This way, the underlying elasticsearch-py ConnectionPool distributes requests across your cluster nodes, which makes the backend more resilient and lets it handle individual node failures gracefully.
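
For context, here's roughly what my full config looked like. Treat it as a sketch rather than a drop-in: the node names, index, and doc type are placeholders, and the elasticsearch_* retry settings should be checked against the docs for your Celery version before you rely on them.

# celeryconfig.py (sketch; hostnames, index, and doc type are placeholders)
result_backend = 'elasticsearch://node1:9200,node2:9200,node3:9200/my_index/my_type'

# Optional tuning for the ES backend; verify these settings exist in your Celery version.
elasticsearch_retry_on_timeout = True
elasticsearch_max_retries = 3
elasticsearch_timeout = 10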

Remember to install the 'elasticsearch' package alongside Celery (the celery[elasticsearch] bundle pulls it in). Also, ensure your ES cluster is configured for cross-node communication and that all nodes are reachable from your Celery workers.
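
If you want to sanity-check reachability from a worker host before wiring it into Celery, a quick elasticsearch-py snippet along these lines can help. The node addresses are placeholders, and the hosts format differs slightly between client major versions (8.x wants full URLs like 'http://node1:9200').

# connectivity_check.py - rough sketch, assuming the 7.x elasticsearch-py client
from elasticsearch import Elasticsearch

es = Elasticsearch(["node1:9200", "node2:9200", "node3:9200"])

# ping() returns True if at least one configured node answers
print("cluster reachable:", es.ping())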

hey ryan, elasticsearch clusters can be tricky. have you considered using the sniffing feature in the elasticsearch client? it automatically discovers and connects to all nodes in the cluster. might be worth a shot! just make sure your firewall allows connections between celery workers and all ES nodes. something like the sketch below is what i mean.
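
just a sketch: the option names below are from the 7.x python client (8.x renames some of them, e.g. sniff_on_node_failure), and this is the raw client, so whether celery's backend lets you pass these options through is something you'd have to check for your version.

# sniffing sketch (elasticsearch-py 7.x); node1 is a placeholder seed node
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["node1:9200"],                 # seed node; sniffing discovers the rest
    sniff_on_start=True,            # fetch the cluster's node list at startup
    sniff_on_connection_fail=True,  # re-sniff when a node stops responding
    sniffer_timeout=60,             # re-sniff at most once a minute
)

print("connected:", es.ping())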

hey there! have you tried using a load-balancer URL for your ES cluster? that might solve the multi-node issue. also, what's your ES version? some versions handle this differently. curious to know more about your setup: how many nodes are we talking about? hope we can figure this out together!
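
for example, if you already have something in front of the cluster, the celery side stays a single-host URL. 'es-lb.internal' here is just a made-up name for whatever load balancer or VIP you use:

# Point Celery at the load balancer instead of an individual node
# ('es-lb.internal' is a hypothetical hostname).
result_backend = 'elasticsearch://es-lb.internal:9200/my_index/my_type'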