
John (John Lewis)
Engineering Manager, Infrastructure

User Details

User Since
Apr 17 2016, 19:20 (413 w, 1 d)
Availability
Available
IRC Nickname
JohnLewis
GitHub User
JohnFLewis
Miraheze User
John [ Global Accounts ]

Hi, I'm John. I'm the Co-Founder of Miraheze and the Engineering Manager for the Infrastructure team.

Recent Activity

Mar 17 2023

John added a comment to T10617: Access Removal for John.
In T10617#213708, @John wrote:

John is pretty much threatening legal action against Miraheze right now

Where have I said this?

@John Your user page on Meta where you stated you were taking action against some users with NDAs. If I’m mistaken, I apologize, but that’s what it seemed to imply.

Mar 17 2023, 20:19 · Site Reliability Engineering
Bukkit awarded T10617: Access Removal for John a Heartbreak token.
Mar 17 2023, 19:14 · Site Reliability Engineering
John added a comment to T10617: Access Removal for John.

John is pretty much threatening legal action against Miraheze right now

Mar 17 2023, 18:42 · Site Reliability Engineering
Void removed the administrator role from John.
Mar 17 2023, 18:07
John triaged T10617: Access Removal for John as Normal priority.
Mar 17 2023, 13:27 · Site Reliability Engineering

Mar 3 2023

John added a comment to T10564: Phabricator MW OAuth broken.
In T10564#212732, @Void wrote:

Even if John is available to refresh the key tomorrow, it is still most likely worth changing the owner of the consumer to a shared staff account, such as the existing Miraheze Operations account.

Mar 3 2023, 21:41 · Universal Omega, Phabricator, Infrastructure (SRE), MediaWiki (SRE), MediaWiki

Feb 24 2023

John closed T10536: db112 is running out of disk space as Resolved.

https://github.com/miraheze/puppet/commit/bedbbf259236895187b13d9dde21e980787117bd is a temporary solution until we have more disk space to expand.

Feb 24 2023, 22:16 · Infrastructure (SRE), Monitoring
John closed T10510: Fix swift logging a 404 as 500 as Resolved.

Change deployed locally

Feb 24 2023, 21:44 · Swift, Infrastructure (SRE)
John changed the visibility for T10441: Convert the private miraheze.org key from rsa to pkcs8.
Feb 24 2023, 21:43 · Site Reliability Engineering
John lowered the priority of T10441: Convert the private miraheze.org key from rsa to pkcs8 from High to Low.
Feb 24 2023, 21:43 · Site Reliability Engineering
John added a comment to T10536: db112 is running out of disk space.

Will take a look over this later tonight

Feb 24 2023, 15:24 · Infrastructure (SRE), Monitoring
John closed T10357: Backups remain stored locally on db112; causing disk space full as Invalid.

So the idea itself is invalid - backups don't remain stored locally; the server runs out of space, which causes the backup to fail.

Feb 24 2023, 15:20 · Infrastructure (SRE)

Feb 16 2023

John added a comment to T10510: Fix swift logging a 404 as 500.

Could this be why T10434 failed? If so, I think this task is high priority given the timeline left on Feb's SLO reporting

Feb 16 2023, 19:30 · Swift, Infrastructure (SRE)

Feb 15 2023

John added a comment to T10434: Infrastructure - Swift - SLO Errors Failure.

@Paladox half way through Feb, we really need to look into this ASAP

Feb 15 2023, 08:34 · Infrastructure (SRE), Swift, SLO

Feb 12 2023

John added a comment to T10483: Slow disk response on cloud14.

I've just run that command 25 times on puppet141, where the average was 0.007s (max of 0.017s), and 25 times on cloud14, which consistently gave 0.003s or 0.004s (max of 0.005s).

Feb 12 2023, 17:27 · Infrastructure (SRE)
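
A comparison like the one above (25 runs per host, reporting average and maximum) could be scripted roughly as follows. This is only a sketch: the comment does not say which command was timed, so the command used here (starting a no-op Python process) is a placeholder, and the helper name `time_command` is hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=25):
    """Run argv `runs` times; return (mean, max) wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are not timed silently.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)


# Placeholder command (assumption): replace with the command under test.
placeholder = [sys.executable, "-c", "pass"]
mean_s, max_s = time_command(placeholder, runs=25)
print(f"average {mean_s:.3f}s (max of {max_s:.3f}s)")
```

Running the same script on each host gives figures that can be compared directly, although wall-clock timings include process start-up cost as well as disk latency.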
John added a comment to T10483: Slow disk response on cloud14.

Grafana does not have data to suggest it is slower?

Feb 12 2023, 16:33 · Infrastructure (SRE)

Feb 11 2023

John closed T10218: MediaWiki - JobQueue - SLO Error/Availability Failure as Resolved.

I've looked into this and the metric being used in Grafana was wildly wrong.

Feb 11 2023, 17:58 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO

Feb 4 2023

John placed T10218: MediaWiki - JobQueue - SLO Error/Availability Failure up for grabs.

For January 2023 SLO Reporting - JobQueue failed the SLO for Errors.

Feb 4 2023, 12:28 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John moved T10218: MediaWiki - JobQueue - SLO Error/Availability Failure from Failure Stage 1 to Failure Stage 2 on the SLO board.
Feb 4 2023, 12:26 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John triaged T10434: Infrastructure - Swift - SLO Errors Failure as Normal priority.
Feb 4 2023, 12:25 · Infrastructure (SRE), Swift, SLO

Jan 28 2023

John added a comment to T10144: Renew miraheze.wiki.

@Collei why did you mark this Dec 2022 task as a duplicate of a Dec 2021 task?

Jan 28 2023, 09:55 · Infrastructure (SRE)

Jan 24 2023

John closed T10357: Backups remain stored locally on db112; causing disk space full as Resolved.

The backups are automatically deleted - backup/dbs was full of backups from early 2022, before the backup system even existed.

Jan 24 2023, 19:53 · Infrastructure (SRE)

Jan 23 2023

John added a comment to T8793: Create a formal Incident Response/Management Process.

The blocker on this task is actually unresolved

Jan 23 2023, 19:12 · Site Reliability Engineering
John added a comment to T10351: CreateWiki not granting bureaucrat/sysop permissions.

I do not - I haven't touched the code in about 2 years and the fact this started to happen immediately after an upgrade suggests something changed and CreateWiki wasn't tested correctly.

Jan 23 2023, 19:12 · MediaWiki (SRE), CreateWiki, Universal Omega

Jan 22 2023

John committed rPUPC78b37d4a4625: T8847: Add some infra monitoring docs.
T8847: Add some infra monitoring docs
Jan 22 2023, 22:29
John updated the task description for T8847: Icinga docs entries for all Infrastructure monitoring.
Jan 22 2023, 22:28 · Documentation, Monitoring, Infrastructure (SRE)
John closed T10161: Add "Scratch" to CSP whitelist as Resolved.
Jan 22 2023, 14:01 · Trust & Safety, MediaWiki (SRE), CSP Review
John closed T10153: [Elevation request] Expanded access for ssl-admins (dns) as Declined.

Boldly going to mark as declined due to concerns raised above.

Jan 22 2023, 12:43 · Site Reliability Engineering
John added a comment to T10219: MediaWiki - MediaWiki - SLO Availability Failure.

https://grafana.miraheze.org/d/pfjAbhf7k/mediawiki-slos?orgId=1&from=now-3h&to=now&viewPanel=22 now shows no data.

Jan 22 2023, 12:33 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO

Jan 4 2023

John triaged T9658: Can't access my wiki as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9669: Configuration request: $egAutoCreatePageNamespaces as Normal priority.
Jan 4 2023, 23:09 · Universal Omega, MediaWiki (SRE), Extensions, Configuration
John triaged T9838: File thumbnails not working: File missing (rucultrowiki.miraheze.org) as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9660: Site down as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9893: Files appear missing for the wiki since migrated to new file storage software (Swift) as Normal priority.
Jan 4 2023, 23:09 · MediaWiki, MediaWiki (SRE)
John triaged T9866: Vector2022 - Selected logo does not display as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9844: World Trigger Wiki - Google site verification as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9963: 503 Backend fetch failed as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9965: Unable to upload file as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE), MediaWiki, Swift
John triaged T9986: Trouble with wiki infobox as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T9973: Connection timed out (db141) as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T10154: Wiki recovery as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T10077: Missing files on The Whisperers Wiki as Normal priority.
Jan 4 2023, 23:09 · Gluster, Swift
John triaged T10249: All The Tropes - database locked with no explanation as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John triaged T10175: Requesting Yoshipedia and Wariopedia to be reopened as Normal priority.
Jan 4 2023, 23:09 · MediaWiki (SRE)
John added a project to T10091: Import request for: [floogsims].miraheze.org (CLOSED - Self-resolved): MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9797: Renaming queue: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9934: Generate a dumps using DataDump: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9919: Recent Changes are not showing my edits: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10024: Request backup dump for nenawiki.org and ndg.nenawiki.org: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10003: Recovery of Wiki Domain/Upload of XML: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10234: Cannot view most pages or edit thevibrantlands wiki: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10240: $wgCompressRevisions activated: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10187: "Wikimedia\Rdbms\DBQueryError" error when logging in: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9658: Can't access my wiki: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9660: Site down: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9866: Vector2022 - Selected logo does not display: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9838: File thumbnails not working: File missing (rucultrowiki.miraheze.org): MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9844: World Trigger Wiki - Google site verification: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9963: 503 Backend fetch failed: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9973: Connection timed out (db141): MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T9986: Trouble with wiki infobox: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10154: Wiki recovery: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10175: Requesting Yoshipedia and Wariopedia to be reopened: MediaWiki (SRE).
Jan 4 2023, 23:08 · MediaWiki (SRE)
John added a project to T10249: All The Tropes - database locked with no explanation: MediaWiki (SRE).
Jan 4 2023, 23:07 · MediaWiki (SRE)

Jan 2 2023

John added a comment to T10230: Restrict ability to delete bureaucrat and sysop group without Steward assistance.

A dialogue box if the group contains managewiki rights is a rather easy solution to implement as all the logic would just be checking the group rights that are already exposed for the rights selection pages.

Jan 2 2023, 23:23 · Joritochip, MediaWiki (SRE), ManageWiki
John added a comment to T10230: Restrict ability to delete bureaucrat and sysop group without Steward assistance.
In T10230#206299, @John wrote:

Personally, I think we should just provide a warning rather than restrict the ability entirely - a user can correctly reconfigure the wiki ecosystem and this would then generate a new type of workload of requesting stewards correctly delete a bureaucrat or sysop group thereby changing the problem, not fixing it.

Even better would be to make it so the user can't delete a user group if it would remove *their* access to ManageWiki - this would not be group specific and would solve the problem above while also resolving the core issue.

A Steward would only delete the group if an alternative exists for it. I discussed this with CosmicAlpha, who said it would be harder to implement, which is why this was suggested as an alternative. If anyone wants to give it a try then that'd be most welcome.

Jan 2 2023, 23:05 · Joritochip, MediaWiki (SRE), ManageWiki
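
The check proposed in the comment above (refuse to delete a group only when doing so would remove the acting user's own ManageWiki access) might look roughly like this. It is a sketch in Python rather than ManageWiki's actual code: the function name, the data shapes, and the right name `managewiki` are all illustrative assumptions.

```python
def deletion_locks_user_out(group_rights, user_groups, group_to_delete,
                            right="managewiki"):
    """Return True if deleting the group would remove the user's last source of `right`.

    group_rights: dict mapping group name -> set of rights (hypothetical shape)
    user_groups:  groups the acting user currently belongs to
    """
    def has_right(groups):
        return any(right in group_rights.get(g, set()) for g in groups)

    remaining = [g for g in user_groups if g != group_to_delete]
    # Only block when the user has the right now and would lose it entirely.
    return has_right(user_groups) and not has_right(remaining)


# Illustrative data only; group and right names are assumptions.
rights = {
    "bureaucrat": {"managewiki"},
    "sysop": {"delete"},
    "wiki-manager": {"managewiki"},
}

print(deletion_locks_user_out(rights, ["bureaucrat", "sysop"], "bureaucrat"))
print(deletion_locks_user_out(rights, ["bureaucrat", "wiki-manager"], "bureaucrat"))
```

Because the check looks at rights rather than group names, it is not specific to bureaucrat or sysop, which matches the point made in the comment.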
John added a comment to T10230: Restrict ability to delete bureaucrat and sysop group without Steward assistance.

Personally, I think we should just provide a warning rather than restrict the ability entirely - a user can correctly reconfigure the wiki ecosystem and this would then generate a new type of workload of requesting stewards correctly delete a bureaucrat or sysop group thereby changing the problem, not fixing it.

Jan 2 2023, 22:48 · Joritochip, MediaWiki (SRE), ManageWiki
John closed T10232: mydump is causing issues with php-fpm as Resolved.

New run schedules:

Jan 2 2023, 12:18 · Infrastructure (SRE)
John added a comment to T10232: mydump is causing issues with php-fpm.

What probably doesn't help is all db servers take backups at the same time. I suggest we stagger these a bit more and then watch the next impact.

Jan 2 2023, 11:21 · Infrastructure (SRE)
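
As an illustration of the staggering suggested above, start times could be spread out by giving each database server its own offset. This is a sketch only: the host names, first hour, and gap below are made up, and the schedules actually adopted are not reproduced here.

```python
from datetime import time


def stagger_start_times(hosts, first_hour=0, gap_hours=2):
    """Give each host its own start hour, wrapping around midnight."""
    return {
        host: time(hour=(first_hour + index * gap_hours) % 24)
        for index, host in enumerate(hosts)
    }


# Hypothetical hosts and spacing, for illustration only.
schedule = stagger_start_times(["db-a", "db-b", "db-c"], first_hour=1, gap_hours=3)
for host, start in schedule.items():
    print(host, start.strftime("%H:%M"))
```

With a fixed gap, no two servers begin their dump in the same hour as long as the number of hosts times the gap stays within 24 hours.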
John added a comment to T10232: mydump is causing issues with php-fpm.

This was actually the major reason we didn't have backups for so long - this performance problem.

Jan 2 2023, 11:11 · Infrastructure (SRE)
John added a comment to T10232: mydump is causing issues with php-fpm.

Suggestions are welcome, but the current approach is the best option we have.

Jan 2 2023, 11:10 · Infrastructure (SRE)

Jan 1 2023

John closed T10216: Infrastructure - Mail - SLO Error Failure as Resolved.

This has been fixed. This was generating around 1440 failures a day - in order to meet the error threshold with these numbers, we'd need to have sent 144000 emails a day, or 100 a minute. As we don't operate at these volumes, this was always going to be the case.

Jan 1 2023, 23:35 · Mail, Infrastructure (SRE), SLO
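
The arithmetic in the closing comment for T10216 can be checked with a few lines. The 1% error threshold used here is an assumption inferred from the quoted figures (1440 failures against 144000 emails); the comment itself does not state the threshold.

```python
FAILURES_PER_DAY = 1440          # figure quoted in the comment
ERROR_THRESHOLD_PERCENT = 1      # assumption: inferred from 1440 / 144000
MINUTES_PER_DAY = 24 * 60


def required_daily_volume(failures_per_day, threshold_percent):
    """Daily volume at which the given failures sit exactly on the threshold."""
    return failures_per_day * 100 // threshold_percent


daily_volume = required_daily_volume(FAILURES_PER_DAY, ERROR_THRESHOLD_PERCENT)
per_minute = daily_volume // MINUTES_PER_DAY
print(f"{daily_volume} emails a day, or {per_minute} a minute")
```

Integer arithmetic is used so the result is exact; under the assumed threshold it reproduces the 144000 a day and 100 a minute given in the comment.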
John closed T10217: Infrastructure - MariaDB - SLO Availability/Error Failure as Resolved.

Availability - having reviewed this, I am certain that the failure here is attributed to two things - one beyond our control and one where we have an open task that is blocked on MediaWiki (SRE) for a resolution.

Jan 1 2023, 22:04 · Database, Infrastructure (SRE), SLO
John assigned T9033: Decide whether to switch to opensearch as graylog seems to not be supporting versions passed 7.10 to Paladox.
Jan 1 2023, 19:41 · Goal-2023-Jan-Jun, Infrastructure (SRE)
John moved T9033: Decide whether to switch to opensearch as graylog seems to not be supporting versions passed 7.10 from Backlog to Infrastructure on the Goal-2023-Jan-Jun board.
Jan 1 2023, 19:41 · Goal-2023-Jan-Jun, Infrastructure (SRE)
John added a project to T9033: Decide whether to switch to opensearch as graylog seems to not be supporting versions passed 7.10: Goal-2023-Jan-Jun.
Jan 1 2023, 19:40 · Goal-2023-Jan-Jun, Infrastructure (SRE)
John set the color for Goal-2022-Jul-Dec to Green.
Jan 1 2023, 17:24

Dec 31 2022

John moved T9839: cloud11 intermittently showing drives as fault or logical drives not showing from Short Term to External on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Cloud Infrastructure, Infrastructure (SRE)
John moved T10171: cloud13: new disks need to be looked at from Short Term to External on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Cloud Infrastructure, Infrastructure (SRE)
John moved T10169: [New] Server Resource Request for db132 from Short Term to External on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Infrastructure (SRE)
John moved T10216: Infrastructure - Mail - SLO Error Failure from Incoming to Short Term on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Mail, Infrastructure (SRE), SLO
John moved T10217: Infrastructure - MariaDB - SLO Availability/Error Failure from Incoming to Short Term on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Database, Infrastructure (SRE), SLO

Dec 30 2022

John triaged T10219: MediaWiki - MediaWiki - SLO Availability Failure as Normal priority.
Dec 30 2022, 22:38 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO
John triaged T10218: MediaWiki - JobQueue - SLO Error/Availability Failure as Normal priority.
Dec 30 2022, 22:37 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John triaged T10217: Infrastructure - MariaDB - SLO Availability/Error Failure as Normal priority.
Dec 30 2022, 22:35 · Database, Infrastructure (SRE), SLO
John triaged T10216: Infrastructure - Mail - SLO Error Failure as Normal priority.
Dec 30 2022, 22:33 · Mail, Infrastructure (SRE), SLO
John closed T8801: Formalise and agree on Infrastructure SLOs as Resolved.
Dec 30 2022, 22:29 · Goal-2022-Jul-Dec, Goal-2022-Jan-Jun, Infrastructure (SRE)
John set the image for SLO to F1982242: fa-tags-yellow.png.
Dec 30 2022, 20:30
John created SLO.
Dec 30 2022, 20:30

Dec 29 2022

John moved T10169: [New] Server Resource Request for db132 from Incoming to Short Term on the Infrastructure (SRE) board.
Dec 29 2022, 20:15 · Infrastructure (SRE)
John edited projects for T10117: db121 frequently OOMs, added: MediaWiki (SRE); removed Infrastructure (SRE).

Re-assigning as the action above identified is one for the MediaWiki team and not Infrastructure

Dec 29 2022, 20:15 · Infrastructure (SRE), Database

Dec 28 2022

John updated the task description for T8801: Formalise and agree on Infrastructure SLOs.
Dec 28 2022, 22:19 · Goal-2022-Jul-Dec, Goal-2022-Jan-Jun, Infrastructure (SRE)
John updated the task description for T8801: Formalise and agree on Infrastructure SLOs.
Dec 28 2022, 21:55 · Goal-2022-Jul-Dec, Goal-2022-Jan-Jun, Infrastructure (SRE)
John updated the task description for T8801: Formalise and agree on Infrastructure SLOs.
Dec 28 2022, 21:46 · Goal-2022-Jul-Dec, Goal-2022-Jan-Jun, Infrastructure (SRE)
John updated the task description for T8801: Formalise and agree on Infrastructure SLOs.
Dec 28 2022, 21:21 · Goal-2022-Jul-Dec, Goal-2022-Jan-Jun, Infrastructure (SRE)
John committed rPUPCa64825a09210: prometheus: add basic graylog node.d exporter.
prometheus: add basic graylog node.d exporter
Dec 28 2022, 20:47

Dec 27 2022

John closed T8350: Redesign backup handling as Resolved.
Dec 27 2022, 23:07 · Goal-2022-Jul-Dec, Bacula, Infrastructure (SRE), Goal-2022-Jan-Jun
John added a comment to T8350: Redesign backup handling.

Backup schedules defined:

  • Private - weekly
  • SSL Keys - weekly
  • SQL - fortnightly
  • mediawiki-xml - MediaWiki (SRE) - can someone propose a time frame for XML dumps please? - 3 monthly?
  • Phabricator Static - fortnightly
Dec 27 2022, 19:05 · Goal-2022-Jul-Dec, Bacula, Infrastructure (SRE), Goal-2022-Jan-Jun
John added a comment to T8350: Redesign backup handling.
root@puppet141:~/private# /usr/local/bin/miraheze-backup backup private
Starting backup of 'private' for date 2022-12-27...
Completed! This took 8.501368522644043s
root@puppet141:~/private# /usr/local/bin/miraheze-backup backup sslkeys
Starting backup of 'sslkeys' for date 2022-12-27...
Completed! This took 7.49277400970459s
Dec 27 2022, 17:42 · Goal-2022-Jul-Dec, Bacula, Infrastructure (SRE), Goal-2022-Jan-Jun

Dec 26 2022

John added a comment to T9478: Add monitoring for high MariaDB connections.

This task doesn't indicate what we'd want to consider as 'high connection' (for warning and critical).

Dec 26 2022, 19:19 · Universal Omega, Monitoring, Infrastructure (SRE), Database