In the first part of this post I looked at a few Celery backends and discovered they didn't meet my needs. Why is the Celery stack slow? How slow is it actually?
How slow is Celery in practice
- Queue: 500`000 msg/sec
- Kombu: 14`000 msg/sec
- Celery: 2`000 msg/sec
Detailed test description
There are three main components of the Celery stack:
- Celery itself
- Kombu which handles the transport layer
- Python's Queue(), which underlies everything
Using the Queue and Kombu tests run for 1 000 000 messages I got the following results:
- Raw Python Queue: Msgs per sec: 500`000
- Raw Kombu without Celery, where kombu/utils/__init__.py:uuid() is set to return 0:
  - with json serializer: Msgs per sec: 5`988
  - with pickle serializer: Msgs per sec: 12`820
  - with the custom mem_serializer from part 1: Msgs per sec: 14`492
Note: when the test is executed with 100K messages, mem_serializer yields 25`000 msg/sec; after that the performance saturates. I've observed similar behavior with raw Python Queue()s. I saw some cache buffers being managed internally to avoid OOM exceptions, which is probably the main reason performance saturates over a longer execution.
- Using celery_load_test.py modified to loop 1 000 000 times I got 1908.0 tasks created per sec.
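For reference, the raw Queue figure can be approximated with a few lines of standard-library Python. This is a minimal sketch, not the script used for the numbers above. It assumes Python 3's queue module (named Queue in Python 2) and a single thread that first puts and then gets every message. The helper name measure_queue_throughput is hypothetical. Absolute numbers will differ from machine to machine.

```python
import time
from queue import Queue  # the module is named "Queue" in Python 2


def measure_queue_throughput(n_messages=1000000):
    """Put and then get n_messages through a Queue; return messages per second.

    Hypothetical helper for illustration, not the original test script.
    """
    q = Queue()
    start = time.perf_counter()
    for i in range(n_messages):
        q.put(i)
    for _ in range(n_messages):
        q.get()
    elapsed = time.perf_counter() - start
    return n_messages / elapsed


if __name__ == "__main__":
    print("Msgs per sec: %d" % measure_queue_throughput())
```

Running it with 100K and then 1M messages is one way to check whether throughput levels off on longer runs, as described in the note above.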
Another interesting thing worth pointing out: in the Kombu test there are these lines:
with producers[connection].acquire(block=True) as producer:
    for j in range(1000000):
If we swap them, the performance drops down to 3875 msg/sec, which is comparable with the
Celery results. Indeed, inside Celery there's the same with producer.acquire(block=True)
construct, which is executed every time a new task is published. Next I will be looking
into this to figure out exactly where the slowness comes from.
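To make the two orderings concrete, here is a small standard-library sketch. It does not use Kombu: ToyPool is a hypothetical stand-in for a producer pool, and the loop bodies are placeholders where producer.publish(...) would go. It only shows how many acquire/release cycles each ordering performs. It says nothing about how expensive a single acquire is inside Kombu or Celery.

```python
from contextlib import contextmanager
from queue import Queue


class ToyPool:
    """Hypothetical stand-in for a producer pool; counts acquire() calls."""

    def __init__(self):
        self._resources = Queue()
        self._resources.put(object())  # a single pooled "producer"
        self.acquire_calls = 0

    @contextmanager
    def acquire(self, block=True):
        self.acquire_calls += 1
        resource = self._resources.get(block=block)
        try:
            yield resource
        finally:
            self._resources.put(resource)  # release back to the pool


def acquire_once(pool, n):
    # Ordering used in the Kombu test: one acquire, many publishes.
    with pool.acquire(block=True) as producer:
        for j in range(n):
            pass  # producer.publish(...) would go here


def acquire_per_message(pool, n):
    # Swapped ordering: acquire and release around every single publish.
    for j in range(n):
        with pool.acquire(block=True) as producer:
            pass  # producer.publish(...) would go here


if __name__ == "__main__":
    once, per_message = ToyPool(), ToyPool()
    acquire_once(once, 1000)
    acquire_per_message(per_message, 1000)
    print(once.acquire_calls, per_message.acquire_calls)
```

With the first ordering the pool is touched once for the whole run. With the second it is touched once per message, so any fixed cost of acquiring and releasing a producer is paid a million times in the 1 000 000 message test.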