Redis server keeps failing due to unable to allocate memory
Closed, ResolvedPublic

Description

From 04:08 to 07:46, Redis was failing because it was unable to allocate memory, leaving us with the issues below. This happened again at 08:30. redis-server was restarted both times, but we need to prevent this from happening again, as it renders the site unusable.

---Earlier description---
At 04:20, users started reporting that the following actions were failing globally due to session-related issues.

"There seems to be a problem with your login session; this action has been canceled as a precaution against session hacking.  Please resubmit the form."

Users cannot log in, and cannot do any of the following while logged in:

  • log out
  • edit
  • any other logged-in action

See below for a Redis error starting at 04:08. Let me or Reception know if you need more information.

Event Timeline

Olaroll created this task. Dec 18 2019, 04:34
Olaroll updated the task description. Dec 18 2019, 04:38
dross triaged this task as Unbreak Now! priority. Dec 18 2019, 05:37
dross added a subscriber: dross.

Sitewide login issues. Reportedly affecting several well-known users, including myself.

Pmattes added a subscriber: Pmattes. Dec 18 2019, 06:17

Thanks for reporting! I’m working with @Reception123 to find a cause now.

RhinosF1 renamed this task from "Unable to log in" to "Unable to act while logged in". Dec 18 2019, 07:39
RhinosF1 updated the task description.
RhinosF1 added a project: MediaWiki.
RhinosF1 renamed this task from "Unable to act while logged in" to "Unable to act while logged in or log in". Dec 18 2019, 07:39

So far I've seen

360:M 18 Dec 2019 07:40:46.030 # Can't save in background: fork: Cannot allocate memory
360:M 18 Dec 2019 07:40:52.036 * 1 changes in 60 seconds. Saving...

but don't have enough time to investigate further.
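
For anyone searching redis-server logs for the same failure, here is a minimal sketch that pulls out the timestamps of failed background saves. It assumes the log layout seen in the two lines quoted above (pid, role, date, time, level marker, message). The names `LOG_LINE`, `FORK_FAILURE` and `find_fork_failures` are made up for illustration and are not part of Redis or any Miraheze tooling.

```python
import re
from datetime import datetime

# Assumed layout, taken from the quoted lines:
# "<pid>:<role> <day> <Mon> <year> <HH:MM:SS.mmm> <level> <message>"
LOG_LINE = re.compile(
    r"(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)"
)

# Exact message text from the log excerpt in this task.
FORK_FAILURE = "Can't save in background: fork: Cannot allocate memory"


def find_fork_failures(lines):
    """Return a datetime for every failed background save found in `lines`."""
    failures = []
    for line in lines:
        # search() rather than match() so a pasted prefix before the pid is tolerated
        match = LOG_LINE.search(line)
        if match and match.group("msg").strip() == FORK_FAILURE:
            failures.append(
                datetime.strptime(match.group("ts"), "%d %b %Y %H:%M:%S.%f")
            )
    return failures


if __name__ == "__main__":
    sample = [
        "360:M 18 Dec 2019 07:40:46.030 # Can't save in background: fork: Cannot allocate memory",
        "360:M 18 Dec 2019 07:40:52.036 * 1 changes in 60 seconds. Saving...",
    ]
    for when in find_fork_failures(sample):
        print(when.isoformat())
```

Run against the full log, the list of timestamps would show how often the fork failed and whether the failures line up with the outage windows reported in this task.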

RhinosF1 closed this task as Resolved. Dec 18 2019, 07:47
RhinosF1 claimed this task.
RhinosF1 updated the task description.

07:46:02 <Reception123> !log restarted redis-server

RhinosF1 reopened this task as Open. Dec 18 2019, 08:25

Reopening; the issue looks to have hit again.

And @Reception123 has restarted redis-server again

RhinosF1 added a comment. Edited Dec 18 2019, 08:33

We can edit, but we've noticed on bgo.miraheze.org that even though https://bgo.miraheze.org/wiki/File:SantasSleigh.png has been deleted, MediaWiki doesn't seem to think so.

Redis also handles the JobQueue (see 57592e8a26532c52833cfe6f in exception.log).

RhinosF1 removed Reception123 as the assignee of this task. Dec 18 2019, 08:43

Unassigning @Reception123, as they are no longer available and are unable to implement a more permanent fix or work out the root cause.

RhinosF1 renamed this task from "Unable to act while logged in or log in" to "Redis server keeps failing due to unable to allocate memory". Dec 18 2019, 08:45
RhinosF1 updated the task description.
RhinosF1 removed a project: MediaWiki.
J-Josyu added a subscriber: J-Josyu. Dec 18 2019, 08:54

I can't use the private message feature of the social profile. Is it related to this problem?

When it fails, there is very little you will be able to do while logged in. Please state the full error, though.

10:38:22 <+SPF|Cloud> !log install cron on mics2 to restart redis each 15 minutes

RhinosF1 lowered the priority of this task from Unbreak Now! to High. Dec 18 2019, 12:35

Seems stable. Thanks @Reception123 and @Southparkfan for the response today.

Constantly restarting Redis doesn't seem like a long-term solution, so I'm leaving this open for further action/review.
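
One possible alternative to restarting on a fixed schedule is to check whether the last background save actually failed. The sketch below assumes the failures here are failed background saves like the one in the log lines quoted earlier, which this task has not confirmed as the root cause. It reads the `rdb_last_bgsave_status` field that Redis reports in `INFO persistence`. The helper names `parse_info` and `bgsave_failed` and the sample text are hypothetical.

```python
def parse_info(text):
    """Parse Redis INFO output (key:value lines, '#' section headers) into a dict."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Persistence"
        key, _, value = line.partition(":")
        info[key] = value
    return info


def bgsave_failed(info_text):
    """Return True only when Redis reports the last background save as failed."""
    return parse_info(info_text).get("rdb_last_bgsave_status") == "err"


if __name__ == "__main__":
    # Hypothetical sample in the shape `redis-cli INFO persistence` returns.
    sample = "# Persistence\r\nloading:0\r\nrdb_last_bgsave_status:err\r\n"
    print(bgsave_failed(sample))
```

A check like this could feed an alert or gate the restart, so that Redis is only touched when a save has really failed. It does not address why the fork runs out of memory in the first place.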

Also, can we get an incident report for this?

I'm getting the same message as I got this morning. Unable to save a changed page.

Can you clear your cache?

Paladox added a subscriber: Paladox. Dec 18 2019, 15:33

@Timboliu999 Hi, could you log back in through Special:UserLogin? Sorry for the inconvenience.

RhinosF1 reassigned this task from Southparkfan to Paladox. Dec 18 2019, 15:54
RhinosF1 added a subscriber: Southparkfan.

We've had more reports; @Paladox is working on it.

Raising to UBN! to reflect reality.

RhinosF1 raised the priority of this task from High to Unbreak Now! Dec 18 2019, 15:54
RhinosF1 added subscribers: Pfyh, Zppix.

Logging in through Special:UserLogin is not possible.
When I try to log out, I get the following message: "Log out failed due to session error. Please try again."

We're aware and working on it.

Pfyh added a comment. Dec 18 2019, 15:59

@RhinosF1 Adding $wgCookieSecure = false; to the bottom of my LocalSettings.php file solved the problem. Can you do this, please?

Wedhro added a subscriber: Wedhro. Dec 18 2019, 16:10
Pfyh added a comment. Dec 18 2019, 16:11

@RhinosF1 Now it's back.

In T4995#95133, @Pfyh wrote:

@RhinosF1 Now it's back.

@Paladox managed to find a solution.

Great work! I can save pages again. Thanks!

Pfyh added a comment. Dec 18 2019, 16:25

@RhinosF1 and @Paladox thank you so much!

RhinosF1 closed this task as Resolved. Dec 18 2019, 16:29

Seems to be fine now

A few things in the timeline are wrong. It would also be nice to have a public, more detailed explanation than ‘redis had a technical issue’.