Optimizing Sidekiq For Maximum CPU Performance on Multicore Systems

I have been working a lot with somewhat large datasets (millions of records) that benefit from parallel processing. I thought Sidekiq's multi-threading was going to be a great solution for this, but upon further investigation, I noticed my work was only marginally faster and my CPU was never at 100%. In fact, it was hovering around 25%… what gives? Maybe my jobs were IO bound? Nope, that wasn't the case: $ top showed CPU wait time at 0.0, so the CPU wasn't waiting on IO. What could be the issue?

Global Interpreter Lock (GIL) Sadness

On further research, I learned that MRI Ruby threads run one at a time, even on a multi-core system! The Global Interpreter Lock exists to protect MRI's internals and non-thread-safe C extensions. JRuby and Rubinius have threads that can run in parallel, but I didn't have a chance to try them. Reading this Toptal article was very informative for understanding the difference between concurrency and parallelism in Ruby. (Sorry for using "parallel" incorrectly in previous posts!)
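
Here's a minimal sketch that makes the GIL visible (the iteration and thread counts are arbitrary): CPU-bound work split across MRI threads finishes in roughly the same wall-clock time as the single-threaded version, because only one thread executes Ruby code at any moment.

require 'benchmark'

# Pure-Ruby busy work; MRI holds the GIL the whole time this runs.
def burn_cpu(iterations)
  x = 0
  iterations.times { x += 1 }
  x
end

single = Benchmark.realtime { burn_cpu(40_000_000) }

threaded = Benchmark.realtime do
  4.times.map { Thread.new { burn_cpu(10_000_000) } }.each(&:join)
end

puts "1 thread:  #{single.round(2)}s"
puts "4 threads: #{threaded.round(2)}s"
# On MRI both times come out roughly equal; on JRuby or Rubinius the
# threaded run should approach a 4x speedup on a 4-core machine.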

Solution For Maxing Out CPU in Sidekiq

So the only way to max out CPU utilization with Sidekiq is to use more processes. All you have to do is spawn more Sidekiq processes with the same configuration file and they will be added to the pool of workers. Neat and simple! Note that more worker processes means more memory: while worker threads can share memory, worker processes do not, and if you spawn too many processes, you'll run out of memory quickly. Also, as a rule of thumb, only spawn as many worker processes as you have logical CPUs.
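
For example, starting two processes by hand is just running the same command twice (assuming your app boots from config/environment.rb, like the script below does):

$ bundle exec sidekiq -r ./config/environment.rb -c 4 &
$ bundle exec sidekiq -r ./config/environment.rb -c 4 &

Each process gets its own GIL, so the two together can keep two cores busy, each with 4 threads pulling from the same Redis queues.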

I wrote a quick script to manage starting/stopping Sidekiq worker processes. Feel free to use it if you'd like:

#!/bin/bash
 
NUM_WORKERS=2     # threads per Sidekiq process (passed to -c)
NUM_PROCESSES=4   # number of Sidekiq processes to spawn
 
# http://www.ostricher.com/2014/10/the-right-way-to-get-the-directory-of-a-bash-script/
get_script_dir () {
     SOURCE="${BASH_SOURCE[0]}"
     # While $SOURCE is a symlink, resolve it
     while [ -h "$SOURCE" ]; do
          DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
          SOURCE="$( readlink "$SOURCE" )"
          # If $SOURCE was a relative symlink (no "/" prefix), resolve it relative to the symlink's directory
          [[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"
     done
     DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
     echo "$DIR"
}
 
start_sidekiq_workers() {
  echo "Starting $NUM_PROCESSES sidekiq processes with $NUM_WORKERS threads each."
  for n in $(seq 1 "$NUM_PROCESSES"); do
    bundle exec sidekiq -r "$(get_script_dir)/../config/environment.rb" -c $NUM_WORKERS &
  done
}
 
case $1 in
  stop)
  # Match the version string ("sidekiq 3...") in the process title;
  # adjust if you are on a different Sidekiq major version.
  ps aux|grep "sidekiq 3"|grep -v grep|awk '{print $2}'|xargs kill
  ;;
  start)
  start_sidekiq_workers
  ;;
  status)
  ps aux|grep "sidekiq 3"|grep -v grep
  ;;
  *)
  start_sidekiq_workers
  ;;
esac
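
Assuming you save it as script/sidekiq_workers.sh (so the ../config/environment.rb path resolves) and make it executable, usage looks like:

$ ./script/sidekiq_workers.sh start   # spawn the pool of worker processes
$ ./script/sidekiq_workers.sh status  # list running sidekiq processes
$ ./script/sidekiq_workers.sh stop    # kill them all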

Results

So after playing around with worker threads and processes, here are the results of the job I was working on with different parameters:

Completed importing all files in 01:39:48:119774276. – 25 workers, 1 process
Completed importing all files in 00:44:06:214396540. – 10 workers, 2 processes
Completed importing all files in 00:28:21:940166878. – 5 workers, 4 processes
Completed importing all files in 00:17:51:737359697. – 4 workers, 4 processes
Completed importing all files in 00:11:04:804641568. – 2 workers, 8 processes
Completed importing all files in 00:09:59:336971420. – 1 worker, 16 processes

Clearly, using more processes is faster than just adding more worker threads. Just make sure you have enough memory! For my test run, I could only divide the work into 16 jobs, so I couldn't test with more processes… but I suspect that at a certain point, adding more processes would stop making the job faster and would start slowing the system down with overhead. I recommend running benchmarks on a small subset of your data to find the right balance before processing the whole thing! You can save a lot of time if you can turn a 20-hour job into a 2-hour job.
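
Here's a rough sketch of what such a benchmark sweep could look like. It's hypothetical: it assumes a rake task import:sample that enqueues jobs for a small slice of the data and blocks until they complete, which you would have to write for your own workload.

#!/bin/bash
# Sweep process counts over a small data sample and print elapsed times.
for procs in 1 2 4 8; do
  for n in $(seq 1 $procs); do
    bundle exec sidekiq -r ./config/environment.rb -c 2 &
  done
  /usr/bin/time -f "$procs processes: %E elapsed" bundle exec rake import:sample
  pkill -f sidekiq   # stop the workers before trying the next configuration
done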

I would love to see how multi-threaded processes work on Rubinius. It's been pretty fun learning about concurrency and parallel computing in a Ruby context.
