PHP-FPM issues
Closed, Resolved · Public

Assigned To

Unknown Object (User)

Authored By

	MacFan4000
	Apr 8 2022, 01:42

Description

We are having constant alerts and 50x errors due to high PHP-FPM child process usage.

Related Objects

Mentioned Here: T8876: Investigate MW 502/503 errors/high load (poor optimisation)

Event Timeline

MacFan4000 triaged this task as Unbreak Now! priority. · Apr 8 2022, 01:42

MacFan4000 created this task.

Herald added subscribers: Agent_Isai, Unknown Object (User), RhinosF1, Reception123. · Apr 8 2022, 01:42

MacFan4000 added subscribers: John, Paladox, Void. · Apr 8 2022, 02:17

Appears to have stabilized for now, but this should probably be investigated

In T9049#183124, @MacFan4000 wrote:

Appears to have stabilized for now, but this should probably be investigated

Agreed with the above, and thanks to all SRE who were on hand to stabilize the issues.

Dmehus awarded a token. · Apr 8 2022, 03:31

Is this not likely related as a whole to T8876?

In T9049#183204, @Reception123 wrote:

Is this not likely related as a whole to T8876?

It’s definitely the same line of enquiry in terms of investigation.

Though given the user impact, an incident report should be generated for this.

https://meta.miraheze.org/wiki/Special:IncidentReports/49 I've put in everything I know

Happening again tonight (8 April)

@Universal_Omega and I have looked and have no idea why this is happening or how to fix it. Restarts of services and reboots of servers aren't helping. When/if this stabilizes, this task will remain UBN as we really need to look into this.

Yes, I have tried rebooting mw*, which had absolutely no effect either. So yes, +1 to this remaining UBN, as resolving this should be the #1 priority, but I am also out of things I'm able to do here.

There is a bit of a pattern with these outages, both times test101 was also affected, and both times it was at the same time of day. (roughly 8:00 PM EST/0:00 UTC)

Appears to have stabilized

This is a DOS

https://grafana.miraheze.org/d/dsHv5-4nz/mediawiki?orgId=1&from=1649366728843&to=1649489031546 shows a ten-fold jump in connections active at the same time

In T9049#183315, @RhinosF1 wrote:

This is a DOS

That's what I figured also.

I've had a quick glance through traffic patterns and there's no spike in actual requests. It must be something being accessed but no hint as to what.

It's very strange for it to happen at the exact same time.

In T9049#183296, @MacFan4000 wrote:

There is a bit of a pattern with these outages, both times test101 was also affected, and both times it was at the same time of day. (roughly 8:00 PM EST/0:00 UTC)

I'm not sure how related test101's issues are.

In T9049#183324, @RhinosF1 wrote:

In T9049#183296, @MacFan4000 wrote:

There is a bit of a pattern with these outages, both times test101 was also affected, and both times it was at the same time of day. (roughly 8:00 PM EST/0:00 UTC)

I'm not sure how related test101's issues are.

This was an observation made by @Universal_Omega

test101 having the issues also seemed very strange to me, seeing as it doesn't even work currently; it is down right now.

Status?

Not seen since backups were disabled.

A complete outage like the one we saw on 7/8 April has not occurred since database backups were disabled, so I'm lowering this from UBN to High, as it is not currently impacting us.

Is there a reason this is still a security task? It is not an issue that users can reproduce themselves, so I see no reason why.

Unknown Object (User) closed this task as Resolved. · Apr 15 2022, 01:10

Unknown Object (User) claimed this task.

Unknown Object (User) changed the visibility from "Custom Policy" to "Public (No Login Required)".

Unknown Object (User) changed the edit policy from "Custom Policy" to "All Users".
