Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails
Closed, Resolved (Public)

Description

@Dmehus reported an issue declining a wiki request:

I saw 1983 entries in the log saying "Redis server error" on mw6 alone.

Example:

2020-07-20 12:44:13 mw6 metawiki: [6e724fc4e36bcfc852763030] /wiki/Special:RequestWikiQueue/13281 JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: Could not insert 1 EchoNotificationDeleteJob job(s).

Checking the redis log gives:

2020-07-20 12:47:59 mw6 {something}wiki: Lua script error on server "51.89.160.135:6379": ERR Error running script (call to f_2aac70021ade78213b297e3e8316fa24f82d2897): @user_script:19: @user_script: 19: -MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
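The MISCONF message above is what Redis returns when the last background RDB save failed and `stop-writes-on-bgsave-error` is enabled. As a rough sketch of how that state could be detected, the snippet below parses text in the format printed by `redis-cli INFO persistence` and checks the `rdb_last_bgsave_status` field. The function names and the sample text are hypothetical and not taken from this incident, and the check is a simplification of Redis's actual condition.

```python
def parse_info(text: str) -> dict:
    """Parse INFO-style output: "key:value" lines, with "#" section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def writes_blocked(info_text: str, stop_writes_on_bgsave_error: str = "yes") -> bool:
    """Return True if the last RDB save failed and the stop-writes option is on.

    The second argument is the value of the `stop-writes-on-bgsave-error`
    config option, which has to be looked up separately (it is not part of INFO).
    """
    fields = parse_info(info_text)
    last_save_failed = fields.get("rdb_last_bgsave_status") == "err"
    return last_save_failed and stop_writes_on_bgsave_error == "yes"


# Illustrative sample only; these values were not captured from the affected server.
SAMPLE = """# Persistence
loading:0
rdb_changes_since_last_save:42
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:err
"""

if __name__ == "__main__":
    print(writes_blocked(SAMPLE))
```

If this reports a blocked state, the Redis server log is the place to look for the underlying RDB error, as the message itself suggests.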

Event Timeline

RhinosF1 triaged this task as Unbreak Now! priority. Jul 20 2020, 12:52
RhinosF1 created this task.

This was the specific error message returned when trying to create a wiki:

  • [3c36b21f9e3e057f62fcd1e1] 2020-07-20 12:43:49: Fatal exception of type "JobQueueError"

Cheers,
Doug

Given the amount of log spam and the number of things that require Redis/JobQueue, I declared this Unbreak Now!

I have just been able to create and decline wikis with no issues.

We fixed it; we are now deciding how to re-run the affected jobs.

RhinosF1 lowered the priority of this task from Unbreak Now! to High. Jul 20 2020, 13:40

180 wikis had failed jobs during the outage. Most of the failed jobs were related to recent changes or links. We're now checking whether they can be rebuilt.
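One way to produce a per-wiki tally like the one above is to scan the exception log for "Could not insert" lines. This is a minimal sketch that assumes every line follows the format of the example in the task description (timestamp, host, wiki, then the error text). The `summarize` function and the regular expression are illustrative, not the tooling that was actually used.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <host> <wiki>: ... Could not insert <n> <JobType> job(s)."
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<host>\S+) (?P<wiki>\S+): .*"
    r"Could not insert (?P<count>\d+) (?P<job>\w+) job\(s\)"
)


def summarize(lines):
    """Count failed job insertions per wiki and per job type."""
    wikis = Counter()
    jobs = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a failed-insert line
        n = int(match.group("count"))
        wikis[match.group("wiki")] += n
        jobs[match.group("job")] += n
    return wikis, jobs


# The example log line quoted in the task description.
EXAMPLE = (
    "2020-07-20 12:44:13 mw6 metawiki: [6e724fc4e36bcfc852763030] "
    "/wiki/Special:RequestWikiQueue/13281 JobQueueError from line 778 of "
    "/srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: "
    "Could not insert 1 EchoNotificationDeleteJob job(s)."
)

if __name__ == "__main__":
    wikis, jobs = summarize([EXAMPLE])
    print(len(wikis), "wiki(s) affected")
    print(jobs.most_common())
```

The number of keys in the first counter gives the count of affected wikis, and the second counter shows which job types failed most often.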

RhinosF1 added a subscriber: Paladox.

I'll let @Paladox close this when they're happy, but all the jobs we can do anything about are now running. We can't do anything about some of them, and the rest will fix themselves automatically.

Should we document how we resolved this as @Void suggested on Discord? I can partially document, but may need you and @Paladox to fill in certain back-end things I didn't see.

I simply killed redis-server using pkill -9 <pid>.