Investigate cause of redis server error (socket error on read socket) when CreateWiki Extension creates a wiki
Closed, Duplicate (Public)

Description

Problem: This has happened again, and with increasing frequency: this wiki was created without the requesting user's account being locally attached and without a Main Page being created. It has happened a couple of times in the preceding 30 days or so, to the best of my personal recollection.

CreateWiki Extension error returned: Exception experienced creating the wiki. Error is: Redis server error: socket error on read socket

Investigation required? Yes.

Suggested Investigation:

  1. Review CreateWiki Extension debugging error logs
  2. Review redis server error logs

I do wonder whether a small coding change is needed in how the CreateWiki Extension communicates with the redis server, particularly since this redis server socket read error has been occurring with increasing frequency since we forked redis server. Anyway, just one possible avenue to explore.
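As an illustration of the kind of client-side change that could be explored, here is a minimal sketch of retrying an operation when a connection-level error occurs. It is written in Python for brevity and is not CreateWiki's code; `call_with_retry`, `attempts`, and `delay_seconds` are hypothetical names. Retrying is only safe if the wrapped step can be repeated without side effects, which has not been established for wiki creation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 0.1,
) -> T:
    """Run `operation`, retrying when a connection-level error is raised.

    Hypothetical helper for illustration only. The wait grows linearly
    with the attempt number, and the last error is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds * attempt)
    raise AssertionError("unreachable")
```

A caller would wrap only the redis call itself, for example `call_with_retry(lambda: client.get(key))` with whatever client object is in use, so that a single dropped socket does not abort the whole request.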

Related tasks, on-wiki, and other log entries:

  1. T7372 - where @Paladox ran the createLocalAccount.php for me on nebulaprojectwiki to locally create and attach @WikiJS's Bukkit user account
  2. ManageWiki log entry - where I manually granted local bureaucrat and sysop to WikiJS' Bukkit user account

Event Timeline

Dmehus triaged this task as High priority. May 27 2021, 02:33
Dmehus created this task.
Dmehus moved this task from Backlog to Short Term on the MediaWiki (SRE) board.
Dmehus added a project: Production Error.

Adding Production Error project, as I'm never sure whether to use that or Configuration

Dmehus renamed this task from Investigate cause of wiki being created without locally attaching the requesting user and creating a default Main Page to Investigate cause of redis server error (socket error on read socket) when CreateWiki Extension creates a wiki. May 27 2021, 02:40
Dmehus edited projects, added Redis-JobRunner; removed Production Error.

Removed Production Error and added Redis-JobRunner as I'm guessing the former was not correct.

Unknown Object (User) reopened this task as Open. May 27 2021, 02:58
Unknown Object (User) lowered the priority of this task from High to Normal.

Actually, this may be a different error. I remember it first happened when John did work on Redis-JobRunner, so this must be a different error, I guess.

Unknown Object (User) added a comment. May 27 2021, 03:08

I think this would be an Infrastructure (SRE) task if Redis-JobRunner is causing the issue, though I'm not certain whether Infrastructure or ourselves should handle/investigate this, so leaving as is for now.

No objection, really, to lowering the priority to Normal. Even though the Redis-JobRunner server is involved, this concerns the redis server's communication with the CreateWiki extension. Redis-JobRunner is essentially a shared responsibility between the MediaWiki (SRE) and Infrastructure (SRE) teams, depending on the infrastructure involved. In this case, it relates to MediaWiki jobs, so it is in scope of the MediaWiki (SRE) team.

Dmehus moved this task from Backlog to Bugs on the CreateWiki board.

Actually, this may be a different error. I remember it first happened when John did work on Redis-JobRunner, so this must be a different error, I guess.

But yes, I agree 100% that, while potentially tangentially related to T7338, this is a different error, as it relates to CreateWiki's communication issues with the Redis-JobRunner server.

Note: On this occasion, we got an actual error in the description whereas T7338 was a silent fail.

Redis-JobRunner is additional software. It’s like saying MediaWiki is the cause of Matomo’s SSL certificate failing just because they have the same certificate - they’re entirely unrelated but confirmation bias suggests there’s a link because it’s easier to explain than an unknown cause.

Unknown Object (User) added a comment. May 31 2021, 17:10

I think, but am not 100% certain, that this happens when redis is using 100% of its available memory. It could possibly be fixed by allowing more memory.

I could be totally wrong here though.
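One way to test the memory hypothesis is to compare `used_memory` against `maxmemory` in the output of the redis `INFO memory` command. Below is a minimal Python sketch that parses that text output; the helper names and the sample values are made up for illustration and are not taken from the production server.

```python
from typing import Dict, Optional


def parse_redis_info(info_text: str) -> Dict[str, str]:
    """Parse the `key:value` lines of redis INFO output into a dict."""
    fields: Dict[str, str] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_usage_fraction(info_text: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is configured."""
    fields = parse_redis_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        return None  # maxmemory 0 means redis enforces no memory limit
    return used / limit


# Made-up sample values, NOT real data from this incident.
SAMPLE_INFO = """# Memory
used_memory:1047527424
maxmemory:1073741824
maxmemory_policy:noeviction
"""

print(memory_usage_fraction(SAMPLE_INFO))
```

A fraction close to 1.0 around the time of the error would support the hypothesis; a value well below 1.0 would point elsewhere.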

Unknown Object (User) added a comment (edited). May 31 2021, 17:12

Also see the similar error on T7112. I remember that when that was happening, CreateWiki also gave this error almost 100% of the time.

Unknown Object (User) added a comment (edited). May 31 2021, 17:14

And yes, this doesn't have anything to do with CreateWiki directly, so it shouldn't have been tagged with that (although there's no way you could've known that, @Dmehus, so no problem). I wasn't sure about Redis-JobRunner, but I guess it's not that either.

It's happened again here in this request.

Exception experienced creating the wiki. Error is: Redis server error: socket error on read socket

Seems to be an issue with the CreateWiki extension's ability to communicate with the Redis-JobRunner. I do think the issue is likely on the CreateWiki extension side now, though.

Edit: Oddly, this time the Main Page was created and the user account was attached. Very strange. But there is still no farmer log entry for the creation. This definitely needs to be fixed, though.

Unknown Object (User) added a comment. Jun 1 2021, 23:32

I don't think it is, see what I said at T7373#147442

I did, and I guess that's possible, though I talked to @Reception123 and he thinks either this bug or the other related bug might be related to your CreateWiki improvements?

Unknown Object (User) added a comment (edited). Jun 1 2021, 23:37

I'm 99% sure it isn't anything to do with CreateWiki directly. (Leaving the 1% in the event I am wrong, which I'm almost certain I'm not.) I'm pretty sure it has to do with redis memory. But again, I could be wrong.

Unknown Object (User) added a comment. Jun 1 2021, 23:58

OK, I did a bit of investigation. It seems that when the CreateWiki extension gave that warning in the mentioned request (23:22, 1 June 2021), there was actually a drop in memory usage (from 999MB to 743MB total memory usage).

All the graphs in Grafana seemed to change drastically for <1 minute. https://grafana.miraheze.org/d/HZGjmu_Zz/redis?orgId=1&from=1622589600000&to=1622590200000

Not 100% sure this is related but it seems to be. I am still certain (even more so now) it isn't related to CreateWiki.
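For reference, the numbers above can be checked with a short Python sketch. It decodes the `from` and `to` query parameters of the Grafana URL (epoch milliseconds) and computes the size of the memory drop. The function names are hypothetical, and the sketch assumes the 23:22 timestamp is in UTC, which the thread does not state.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

GRAFANA_URL = (
    "https://grafana.miraheze.org/d/HZGjmu_Zz/redis"
    "?orgId=1&from=1622589600000&to=1622590200000"
)


def grafana_window(url: str):
    """Decode the `from`/`to` epoch-millisecond parameters as UTC datetimes."""
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    return start, end


def memory_drop(before_mb: float, after_mb: float):
    """Return the absolute drop in MB and the drop as a percentage of `before_mb`."""
    drop = before_mb - after_mb
    return drop, 100 * drop / before_mb


start, end = grafana_window(GRAFANA_URL)
drop_mb, drop_percent = memory_drop(999, 743)
print(start.isoformat(), end.isoformat())
print(drop_mb, round(drop_percent, 1))
```

The linked window runs from 23:20 to 23:30 UTC on 1 June 2021, so it would cover the 23:22 request if that time is UTC, and the drop from 999MB to 743MB is 256MB, or about 25.6% of the earlier figure.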

Unknown Object (User) added a comment. Jul 16 2021, 03:11

Has this happened recently?

Unknown Object (User) added a comment. Jul 16 2021, 04:14
In T7373#153617, @Void wrote:

Could be T7626?

Yes it seems likely.

Feel free to reopen if not, but closing as duplicate since it would seem to be connected, which would make a lot of sense.