At work, we use Ruby heavily, and Sidekiq is an essential part of our stack. Sometimes, I long for the concurrency primitives from Elixir, but that’s not what today’s post is about.
A few days ago, I caused a minor incident by overloading our databases. Having been away from Ruby for a bit, I had forgotten that Sidekiq runs multiple threads per worker instance. So, I ended up enqueuing about 10K jobs on Sidekiq, and Sidekiq started executing them immediately. We have 20 worker instances and run Sidekiq with a concurrency of 20. So, essentially we had 400 worker threads ready to start crunching these jobs. Coincidentally, we have 400 database connections available, and my batch background job ended up consuming all the connections for 5 minutes, during which the other parts of the application were connection starved and started throwing errors 😬.
That was a dumb mistake. Whenever you find yourself making a dumb mistake, make sure that no one else can repeat it. To fix that, we could set up our database with multiple users, so that the web app connects with a user that can open at most 100 connections, the background workers connect with a user that has its own limit, and so on. This would stop these kinds of problems from happening again. However, we’ll get there when we get there, as this would require infrastructure changes.
I had another batch job lined up, which had to process millions of rows in a similar fashion, so I started looking for solutions. One suggestion was to run these jobs on a single worker or a small set of workers. You can do this by having a custom queue for the job and running a separate Sidekiq instance just for that queue. However, that would require some infrastructure work, so I started looking at other options.
I thought that Redis might have something to help us here, and it did! Redis allows you to make blocking pops from a list using the BLPOP command. If you run BLPOP myjob 10, it pops the first available element in the list. However, if the list is empty, it blocks for up to 10 seconds; if an element is inserted during that time, it pops that element and returns its value. Using this knowledge, I thought we could control the enqueuing based on the elements in the list. The idea is simple.
- Before the background job starts, I would seed this list with n elements, where n is the desired concurrency. So, if I seed this list with 2 elements, Sidekiq would execute only 2 jobs at any point in time, regardless of the number of worker instances or the concurrency of the Sidekiq workers.
- This is enforced by having the enqueue function call BLPOP before it enqueues each job. So, as soon as the enqueuer starts, it pops the first 2 elements from the Redis list and enqueues 2 jobs. At this point, the enqueuer is stuck until we add more elements to the list.
- That’s where the background jobs come into play. At the end of each background job, we add one element back to the list using LPUSH. As soon as an element is added, the enqueuer, which is blocked at BLPOP, pops this element and enqueues another job. This goes on until all your background jobs are enqueued, all the while making sure that there are never more than 2 jobs running at any given time.
Let’s put this into concrete Ruby code.
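Here is a minimal sketch of the idea, not necessarily the code in the repository linked below. So that it runs without a Redis server or Sidekiq, it makes a few assumptions: `FakeRedis` is a hypothetical in-memory stand-in that mimics only `del`, `lpush` and `blpop`, plain threads stand in for Sidekiq jobs, and the names `ConcurrencyLimiter`, `simulate` and the key `myjob` are mine, made up for illustration. As far as I know, the `redis` gem's `blpop(key, timeout: n)` behaves the same way and returns `[key, value]` or `nil` on timeout, so a real client should be able to replace the fake.

```ruby
# In-memory stand-in for the three Redis list commands used below, so the
# sketch runs without a server. Assumption: with the redis gem, a Redis.new
# client would take its place (it also offers del, lpush and blpop).
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
  end

  def del(key)
    @mutex.synchronize { @lists.delete(key) }
  end

  def lpush(key, value)
    @mutex.synchronize do
      @lists[key].unshift(value)
      @cond.broadcast # wake up any blocked blpop
      @lists[key].size
    end
  end

  # Returns [key, value], or nil if nothing arrives within `timeout` seconds.
  def blpop(key, timeout:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      while @lists[key].empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      [key, @lists[key].shift]
    end
  end
end

# The list holds one token per job that is allowed to be in flight.
class ConcurrencyLimiter
  def initialize(redis, key:, timeout: 10)
    @redis = redis
    @key = key
    @timeout = timeout
  end

  # Reset the list and put one token in it per allowed in-flight job.
  def seed(concurrency)
    @redis.del(@key)
    concurrency.times { @redis.lpush(@key, "token") }
  end

  # Called by the enqueuer: blocks until a token is free.
  def acquire
    loop { return if @redis.blpop(@key, timeout: @timeout) }
  end

  # Called at the end of every job: hands the token back.
  def release
    @redis.lpush(@key, "token")
  end
end

# Threads stand in for Sidekiq jobs here. Returns the highest number of jobs
# that were ever running at the same time.
def simulate(jobs:, limit:)
  limiter = ConcurrencyLimiter.new(FakeRedis.new, key: "myjob", timeout: 1)
  limiter.seed(limit)
  running = 0
  max_running = 0
  stats = Mutex.new

  threads = Array.new(jobs) do
    limiter.acquire # the enqueuer waits here once `limit` jobs are in flight
    Thread.new do   # a real enqueuer would call something like MyJob.perform_async
      begin
        stats.synchronize do
          running += 1
          max_running = [max_running, running].max
        end
        sleep 0.01 # pretend to do some work
      ensure
        stats.synchronize { running -= 1 }
        limiter.release # give the token back even if the job raised
      end
    end
  end
  threads.each(&:join)
  max_running
end

puts "max concurrent jobs: #{simulate(jobs: 10, limit: 2)}"
```

In a real setup, the enqueuer would call `acquire` before each `perform_async`, and the job's `perform` method would call `release` in an `ensure` block. Doing the release in `ensure` matters: a job that raises without returning its token would permanently lower the effective concurrency.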
That’s all folks! Hope you find this useful!
The full code for this can be found at: https://github.com/minhajuddin/sidekiq-controlled-concurrency