Speeding up Celery Backends, Part 2

Posted by Alexander Todorov on Fri 07 November 2014

In the first part of this post I looked at a few celery backends and discovered they didn't meet my needs. Why is the Celery stack slow? How slow is it actually?

How slow is Celery in practice

  • Queue: 500`000 msg/sec
  • Kombu: 14`000 msg/sec
  • Celery: 2`000 msg/sec

Detailed test description

There are three main components of the Celery stack:

  • Celery itself
  • Kombu which handles the transport layer
  • Python Queue()'s underlying everything

Using the Queue and Kombu tests run for 1 000 000 messages I got the following results:

  • Raw Python Queue: Msgs per sec: 500`000
  • Raw Kombu without Celery where kombu/utils/__init__.py:uuid() is set to return 0
    • with json serializer: Msgs per sec: 5`988
    • with pickle serializer: Msgs per sec: 12`820
    • with the custom mem_serializer from part 1: Msgs per sec: 14`492

Note: when the test is executed with 100K messages mem_serializer yielded 25`000 msg/sec then the performance is saturated. I've observed similar behavior with raw Python Queue()'s. I saw some cache buffers being managed internally to avoid OOM exceptions. This is probably the main reason performance becomes saturated over a longer execution.

  • Using celery_load_test.py modified to loop 1 000 000 times I got 1908.0 tasks created per sec.

Another interesting this worth outlining - in the kombu test there are these lines:

with producers[connection].acquire(block=True) as producer:
    for j in range(1000000):

If we swap them the performance drops down to 3875 msg/sec which is comparable with the Celery results. Indeed inside Celery there's the same with producer.acquire(block=True) construct which is executed every time a new task is published. Next I will be looking into this to figure out exactly where the slowliness comes from.

tags: Python, Django, QA

Comments !