Setting priority to high because of the recent JobQueue incidents; feel free to re-triage.
See https://wikitech.wikimedia.org/wiki/Kafka_Job_Queue
In addition to giving more resources (T5994), we should consider migrating to the new Kafka system.
It includes de-duplication and concurrency limiting, which should in theory allow faster and more balanced job processing by keeping the load on the system better managed.
I still think we should add more resources as well. In theory the migration will ease and manage the pressure by reducing duplicate jobs and processing them in a more balanced way, but we still had a very large backlog and are expanding quickly, so I think more resources would make a noticeable difference.
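For anyone unfamiliar with the two features mentioned above, here is a rough sketch of the general idea of de-duplication and per-type concurrency limiting. It is not how the Kafka job queue actually implements them; the function names (`deduplicate`, `run_jobs`), the `(type, params)` job shape, and the limit value are all assumptions made up for illustration.

```python
import asyncio
from collections import defaultdict


def deduplicate(jobs):
    """Drop jobs whose (type, params) pair was already seen, keeping order."""
    seen = set()
    unique = []
    for job_type, params in jobs:
        key = (job_type, params)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


async def run_jobs(jobs, limit_per_type, handler):
    """Run de-duplicated jobs, allowing at most limit_per_type
    concurrent executions for each job type (hypothetical policy)."""
    # One semaphore per job type, created lazily on first use.
    semaphores = defaultdict(lambda: asyncio.Semaphore(limit_per_type))

    async def run_one(job_type, params):
        async with semaphores[job_type]:
            return await handler(job_type, params)

    return await asyncio.gather(
        *(run_one(t, p) for t, p in deduplicate(jobs))
    )
```

With something like this, a burst of identical jobs collapses into one execution, and a single noisy job type cannot take every worker slot, which is the "more balanced" behaviour we are hoping for. It does not change the total amount of unique work, which is why extra resources would still help with the backlog.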