PHP-FPM issues
Closed, Resolved (Public)

Description

We are seeing constant alerts and 50x errors caused by high php-fpm child process usage.

Event Timeline

MacFan4000 triaged this task as Unbreak Now! priority. Apr 8 2022, 01:42
MacFan4000 created this task.
MacFan4000 lowered the priority of this task from Unbreak Now! to High. Apr 8 2022, 03:01

This appears to have stabilized for now, but it should probably be investigated.

> Appears to have stabilized for now, but this should probably be investigated

Agreed with the above, and thanks to all SRE who were on hand to stabilize the issues.

Is this not likely related as a whole to T8876?

> Is this not likely related as a whole to T8876?

It’s definitely the same line of enquiry in terms of investigation.

Though given the user impact, an incident report should be generated for this.

MacFan4000 raised the priority of this task from High to Unbreak Now!. Apr 9 2022, 00:42

Happening again tonight (8 April)

@Universal_Omega and I have looked and have no idea why this is happening or how to fix it. Restarts of services and reboots of servers aren't helping. When/if this stabilizes, this task will remain UBN as we really need to look into this.

Unknown Object (User) added a comment. Apr 9 2022, 01:58

Yes, I have tried rebooting mw*, which had absolutely no effect either. So +1 to this remaining UBN, as resolving it should be the #1 priority, but I am also out of things I am able to do here.

There is a bit of a pattern with these outages: both times test101 was also affected, and both times it was at the same time of day (roughly 8:00 PM EST/0:00 UTC).

RhinosF1 changed the visibility from "Public (No Login Required)" to "Custom Policy". Apr 9 2022, 09:16
RhinosF1 changed the edit policy from "All Users" to "Custom Policy".
RhinosF1 added projects: Performance, Security.

This is a DOS

Unknown Object (User) removed a subscriber: Dmehus. Apr 9 2022, 09:21

> This is a DOS

That's what I figured also.

I've had a quick glance through the traffic patterns and there is no spike in actual requests. It must be something specific being accessed, but there is no hint as to what.

It's very strange for it to happen at the exact same time of day.

> There is a bit of a pattern with these outages: both times test101 was also affected, and both times it was at the same time of day (roughly 8:00 PM EST/0:00 UTC).

I'm not sure how related test101's issues are.

>> There is a bit of a pattern with these outages: both times test101 was also affected, and both times it was at the same time of day (roughly 8:00 PM EST/0:00 UTC).
>
> I'm not sure how related test101's issues are.

This was an observation made by @Universal_Omega.

Unknown Object (User) added a comment. Apr 9 2022, 16:20

It also seemed very strange to me that test101 was affected, given that it was not even working at the time; it is down right now.

This has not been seen since backups were disabled.

Agent_Isai lowered the priority of this task from Unbreak Now! to High. Apr 12 2022, 15:55

A complete outage like the one we saw on 7/8 April has not occurred since database backups were disabled, so I am lowering this from UBN to High, as it is no longer impacting us.

Unknown Object (User) added a comment. Apr 15 2022, 00:52

Is there a reason this is still a security task? It is not an issue that users can reproduce themselves, so I see no reason for it.

Unknown Object (User) closed this task as Resolved. Apr 15 2022, 01:10
Unknown Object (User) claimed this task.
Unknown Object (User) changed the visibility from "Custom Policy" to "Public (No Login Required)".
Unknown Object (User) changed the edit policy from "Custom Policy" to "All Users".