
Segmentation fault in jobrunner
Open, Normal, Public

Description

There seem to have been some problems with jobrunner recently (see comments from T7626#157135), but this issue is unrelated to that task. The jobrunner application logs on each MediaWiki server indicate that the problem is caused by a segmentation fault.

Full details are hard to determine, but the kernel reports error 14, which indicates an attempt to execute code from an unmapped area.
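
To show how that reading of "error 14" can be derived, here is a minimal sketch that decodes the value as an x86 page-fault error code. It assumes the servers are x86-64 and that the kernel prints the field in hexadecimal (so 14 means 0x14); the function name and the flag descriptions are illustrative, not taken from any tool used on this task.

```python
# Hypothetical helper: decode the "error" field of a kernel segfault line.
# Assumption: x86 page-fault error code, printed by the kernel in hexadecimal.
FLAGS = [
    # (bit mask, meaning when set, meaning when clear)
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_page_fault_error(code_hex):
    """Return human-readable descriptions for a hex error code string."""
    code = int(code_hex, 16)
    descriptions = []
    for mask, if_set, if_clear in FLAGS:
        text = if_set if code & mask else if_clear
        if text:
            descriptions.append(text)
    return descriptions


if __name__ == "__main__":
    print(decode_page_fault_error("14"))
```

Under these assumptions, "14" decodes to a user-mode instruction fetch from a page that is not present, which matches the interpretation above.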

For debugging purposes, the error in the kernel log has the following format:
segfault at 7fXXXXXXXb80 ip 00007fXXXXXXXb80 sp 00007fXXXXXfea08 error 14
(I've kept everything that is the same across entries and masked the parts that vary with X.)

Some also come with a bit more information, such as:
in gconv-modules.cache[7fXXXXXXX000+7000]
in LC_NUMERIC[7fXXXXXXX000+1000]
in libnss_dns-2.28.so[7fXXXXXXX000+1000]
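
For anyone collecting these entries from the logs, the following is a rough sketch of a parser for the line format quoted above, including the optional "in name[base+size]" suffix. The regular expression and the `parse_segfault` helper are hypothetical, and the pattern deliberately accepts the letter X so that the masked lines from this task can be fed in as-is.

```python
import re

# Hypothetical pattern for the kernel segfault line quoted in this task.
# "X" is accepted alongside hex digits only so masked examples still parse.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-fX]+) "
    r"ip (?P<ip>[0-9a-fX]+) "
    r"sp (?P<sp>[0-9a-fX]+) "
    r"error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-fX]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return a dict of fields from a segfault log line, or None if no match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Strip zero padding before comparing the faulting address with the
    # instruction pointer; equal values mean the fault hit the code address.
    # With masked input this compares the masked strings, so it is indicative only.
    fields["ip_equals_addr"] = fields["addr"].lstrip("0") == fields["ip"].lstrip("0")
    return fields


if __name__ == "__main__":
    sample = (
        "segfault at 7fXXXXXXXb80 ip 00007fXXXXXXXb80 sp 00007fXXXXXfea08 "
        "error 14 in libnss_dns-2.28.so[7fXXXXXXX000+1000]"
    )
    print(parse_segfault(sample))
```

Grouping the parsed entries by the `object` field would be one way to see whether the faults cluster in a particular mapping, though that is only a suggestion.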

Event Timeline

Void triaged this task as High priority. Aug 13 2021, 20:42
Void created this task.

For the record, this task is about the software, not the servers, which didn't exist when this task was created.

John lowered the priority of this task from High to Normal. Aug 26 2021, 10:50
John added a subscriber: John.

Spoke with @Reception123 and this isn't high priority for the MW-SRE team.

Just noting that this is still occurring (confirmed via Graylog).