
Migrate to a Kafka Job Queue
Open, Low, Public

Description

Setting as high due to the recent JobQueue incidents, feel free to re-triage.

See https://wikitech.wikimedia.org/wiki/Kafka_Job_Queue

In addition to giving more resources (T5994), we should consider migrating to the new Kafka system.

It includes de-duplication and concurrency limiting, which should in theory allow faster and more balanced processing of jobs, because the load on the system is better managed.
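To illustrate what de-duplication means here, the following is a minimal sketch, not the actual Kafka job queue implementation. It assumes a job is identified by its type and parameters. The helper names (`job_signature`, `deduplicate`) and the hashing scheme are hypothetical, and the job names are only examples.

```python
import hashlib
import json


def job_signature(job_type, params):
    """Stable hash of a job's type and parameters (hypothetical scheme)."""
    payload = json.dumps({"type": job_type, "params": params}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def deduplicate(jobs):
    """Keep the first occurrence of each (type, params) pair, preserving order."""
    seen = set()
    unique = []
    for job_type, params in jobs:
        sig = job_signature(job_type, params)
        if sig in seen:
            continue  # an identical job is already queued, so skip it
        seen.add(sig)
        unique.append((job_type, params))
    return unique


# Example queue: the third entry repeats the first one.
queue = [
    ("htmlCacheUpdate", {"title": "Main_Page"}),
    ("refreshLinks", {"title": "Main_Page"}),
    ("htmlCacheUpdate", {"title": "Main_Page"}),
]
unique_jobs = deduplicate(queue)
```

With this input, only two of the three queued jobs would actually be run, which is where the reduced load would come from.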

I still think we should add more resources as well. In theory this migration would ease and manage the pressure by reducing duplicate jobs and processing things in a more balanced way, but we still had a very large backlog and are expanding quickly, so I think more resources would make a noticeable difference.

Event Timeline

RhinosF1 triaged this task as High priority. Aug 2 2020, 07:55
RhinosF1 created this task.
John lowered the priority of this task from High to Low. Aug 2 2020, 08:07
John added a subscriber: John.

Would not solve problems, and would require more resources to implement - potentially more than we are able to suitably give.

RhinosF1 added a comment. Aug 2 2020, 08:15

Would not solve problems

There's 100% duplication in the way jobs are processed. If the duplicate detection works as I understand it, it would reduce the load, as there would be fewer jobs to process. Concurrency limiting would also cap the amount of resources a single type of job can take up, and therefore reduce the impact of a spike in edits on loginwiki.
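As a rough sketch of the concurrency-limiting idea (again, not how the Kafka job queue actually does it), a scheduler could cap how many jobs of each type run at once, so a burst of one type cannot starve the others. The function `schedule`, the `limits` mapping and the default cap are all hypothetical.

```python
from collections import Counter


def schedule(pending, limits, default_limit=2):
    """Split pending jobs into those started now and those deferred,
    never exceeding each job type's concurrency cap."""
    running = Counter()
    started, deferred = [], []
    for job_type, job_id in pending:
        cap = limits.get(job_type, default_limit)
        if running[job_type] < cap:
            running[job_type] += 1
            started.append((job_type, job_id))
        else:
            deferred.append((job_type, job_id))  # waits for a free slot
    return started, deferred


# A spike of four refreshLinks jobs followed by one job of another type.
pending = [
    ("refreshLinks", 1),
    ("refreshLinks", 2),
    ("refreshLinks", 3),
    ("refreshLinks", 4),
    ("htmlCacheUpdate", 5),
]
started, deferred = schedule(pending, limits={"refreshLinks": 2})
```

Here the spike is held to two running jobs, and the job of the other type still gets a slot instead of waiting behind the burst.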

I'd also note that I doubt the old system will be maintained or supported forever; the jobrunner services repo hasn't had a single commit since 2017.

John added a comment. Aug 2 2020, 08:26

“would require more resources to implement - potentially more than we are able to suitably give.” is a vital point you did not reply to, and the more critical part.

RhinosF1 added a comment. Edited Aug 2 2020, 08:44
In T6006#117420, @John wrote:

“would require more resources to implement - potentially more than we are able to suitably give.” is a vital point you did not reply to, and the more critical part.

That's something I'm going to look into more as I think we should seriously consider it.