
John (John Lewis)
User

Projects (24)

User Details

User Since
Apr 17 2016, 19:20 (264 w, 1 d)
Availability
Available
IRC Nickname
JohnLewis
GitHub User
JohnFLewis
Miraheze User
John [ Global Accounts ]

Hi, I'm John. I'm the Co-Founder of Miraheze and a Steward.

Recent Activity

Mon, May 3

Reception123 defrocked John.
Mon, May 3, 17:08
John triaged T7238: Removal of access for John as Normal priority.
Mon, May 3, 16:37 · Site Reliability Engineering
John updated John.
Mon, May 3, 16:35

Sun, May 2

John added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.
In T7230#143535, @John wrote:

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

As far as we know so far, it's caused by gluster.

Sun, May 2, 22:29 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
John edited projects for T7230: High I/O on cloud nodes affecting GlusterFS, added: MediaWiki (SRE); removed Site Reliability Engineering.

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

Sun, May 2, 21:05 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
John assigned T7224: Uncompressed puppetserver json logs fill up disk to Paladox.

https://github.com/miraheze/puppet/commit/8fdd5bd235142e5103bdeadef3d2e7b9ab62b489 ?

Sun, May 2, 00:14 · Puppet, Infrastructure (SRE)

Thu, Apr 29

John closed T7067: Subscribe SRE to OpenCVE for notifications as Resolved.

I have created an account on OpenCVE and populated it with products/services we are using. Password can be found on Private Git.

Thu, Apr 29, 17:18 · Security, Site Reliability Engineering
John added a comment to T7214: Write docs for GHSA.
In T7214#143206, @Void wrote:

Our main focus is on allowing others to view and manage private patches.

I could be wrong, but I think what @Void is suggesting with this comment is that GitHub's own docs for creating security advisories already exist and are, presumably, fairly adequate, so there's no real need to create our own tech docs?

Thu, Apr 29, 13:44 · Security, MediaWiki (SRE)
John removed a project from T7216: Private configs are also exposed by DataDump: DataDump.
Thu, Apr 29, 13:19 · MediaWiki (SRE), Security
John added a comment to T7213: ManageWiki API allows viewing configs that shouldn't be viewed publicly.
In T7213#143182, @Void wrote:

It seems that the visibility check is awkwardly implemented if each interface that exposes a setting needs to independently check the visibility.

Thu, Apr 29, 10:38 · ManageWiki, MediaWiki (SRE), Security

Wed, Apr 28

John added a comment to T7213: ManageWiki API allows viewing configs that shouldn't be viewed publicly.

If this can’t be done as a single source of truth, my personal opinion would be to get rid of public/private settings until they can be done safely and securely.

Wed, Apr 28, 18:35 · ManageWiki, MediaWiki (SRE), Security
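
The "single source of truth" idea in the two T7213 comments above can be sketched roughly as follows. This is a minimal illustration, not ManageWiki's actual code: the `Setting` class, `visible_settings`, `api_view` and `dump_view` are all hypothetical names. The point is only that every interface calls one shared visibility function instead of re-implementing the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """Hypothetical setting record; not ManageWiki's real data model."""
    name: str
    value: object
    private: bool = False


def visible_settings(settings, can_view_private):
    """The one place where visibility is decided."""
    return {
        s.name: s.value
        for s in settings
        if can_view_private or not s.private
    }


def api_view(settings, can_view_private=False):
    """Hypothetical API interface: delegates the check, never repeats it."""
    return visible_settings(settings, can_view_private)


def dump_view(settings, can_view_private=False):
    """Hypothetical dump interface: same delegation, different output shape."""
    return sorted(visible_settings(settings, can_view_private).items())
```

With this shape, adding a new interface cannot accidentally expose a private setting unless it bypasses `visible_settings` entirely.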

Mon, Apr 26

John added a comment to T7195: Gluster crashed on mw8-10 - possible OOM.
In T7195#142778, @John wrote:

@Paladox can you take a look just to see if it’s only an OOM and nothing more serious?

Why is this tagged with the MediaWiki team? We have minimal control over and zero knowledge of gluster.

Mon, Apr 26, 07:46 · MediaWiki (SRE)

Sun, Apr 25

John edited projects for T7195: Gluster crashed on mw8-10 - possible OOM, added: MediaWiki (SRE); removed Site Reliability Engineering.

@Paladox can you take a look just to see if it’s only an OOM and nothing more serious?

Sun, Apr 25, 21:00 · MediaWiki (SRE)
Dmehus awarded T5397: Create a logbot for server actions a Like token.
Sun, Apr 25, 15:18 · Infrastructure (SRE)

Fri, Apr 23

John closed T5397: Create a logbot for server actions as Resolved.

/usr/local/bin/logsalmsg test

Fri, Apr 23, 21:37 · Infrastructure (SRE)
John committed rPUPC3b0191964ddb: add logsalmsg script (authored by John).
add logsalmsg script
Fri, Apr 23, 21:37
John committed rPUPC3e03caad3e46: change path for python script (authored by John).
change path for python script
Fri, Apr 23, 20:25
John committed rPUPC159e22a614f9: udp_port not port (authored by John).
udp_port not port
Fri, Apr 23, 20:20
John committed rPUPCf0b421ea01cc: add irclogserverbot manifest (authored by John).
add irclogserverbot manifest
Fri, Apr 23, 20:18

Thu, Apr 22

John committed rPUPCb060d404de73: fix hiera key (authored by John).
fix hiera key
Thu, Apr 22, 16:41
John committed rPUPC6330d1b62500: introduce jobrunner::intensive for high memory tasks (authored by John).
introduce jobrunner::intensive for high memory tasks
Thu, Apr 22, 15:28

Wed, Apr 21

John added a comment to T7173: 15m load average on mw* has been steadily rising since 6am.

@Reception123; can this be closed? Grafana shows this isn’t irregular and there’s a task already opened to increase capacity that is blocked in MWSRE.

Wed, Apr 21, 21:41 · MediaWiki (SRE)

Tue, Apr 20

John added a comment to T5877: Revise MariaDB backup strategy.

@Southparkfan, any updates on the above?

Tue, Apr 20, 12:52 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John added a comment to T7173: 15m load average on mw* has been steadily rising since 6am.

Grafana shows an increase in requests/s on both cp10 and cp11.

Tue, Apr 20, 07:27 · MediaWiki (SRE)
John edited projects for T7173: 15m load average on mw* has been steadily rising since 6am, added: MediaWiki (SRE); removed Site Reliability Engineering.
Tue, Apr 20, 07:22 · MediaWiki (SRE)

Mon, Apr 19

John added a comment to T4425: Fix all mysql tables that are using latin rather then utf8mb4.

@Southparkfan See the above please

Mon, Apr 19, 20:10 · Infrastructure (SRE)

Fri, Apr 16

John added a comment to T7011: Renames getting stuck on deleted wikis.

Rename worked when I did:

Fri, Apr 16, 20:02 · MediaWiki (SRE), MediaWiki
John added a comment to T7011: Renames getting stuck on deleted wikis.

Ignore the above - problem was caused by aaawiki not having a cache file on test3.

Fri, Apr 16, 19:37 · MediaWiki (SRE), MediaWiki
John added a comment to T7011: Renames getting stuck on deleted wikis.
root@test3:~# php /srv/mediawiki/w/maintenance/eval.php --wiki metawiki
> $jQ = JobQueueGroup::singleton('metawiki');
Fri, Apr 16, 19:23 · MediaWiki (SRE), MediaWiki
John added a comment to T7139: MediaWiki Capacity Proposal.

Deployment of jobrunner on all servers has now happened. Per the above, this is blocked on MediaWiki (SRE) deciding when they wish to deploy an additional two servers.

Fri, Apr 16, 18:16 · MediaWiki (SRE)
John committed rPUPC36d66f5f1324: reduce jobrunner processes to 1, but deploy to mw* (authored by John).
reduce jobrunner processes to 1, but deploy to mw*
Fri, Apr 16, 17:45
John added a comment to T7139: MediaWiki Capacity Proposal.

Steps to enact the above would be:

Fri, Apr 16, 17:05 · MediaWiki (SRE)

Wed, Apr 14

John triaged T7139: MediaWiki Capacity Proposal as Normal priority.
Wed, Apr 14, 19:30 · MediaWiki (SRE)

Tue, Apr 13

John assigned T7134: Puppet cannot remount GlusterFS mount if directory exists to Paladox.

@Paladox are you okay to have a look at this?

Tue, Apr 13, 23:45 · Puppet, Infrastructure (SRE)
John edited projects for T7135: Ingest PHP-FPM slowlogs into Graylog, added: Monitoring; removed Production Error.

Added monitoring, removed production error as no stack trace/ID/link was provided for a production error

Tue, Apr 13, 23:40 · Monitoring, MediaWiki (SRE)

Sun, Apr 11

John removed a project from T7127: Add more jobrunner rate tasks to Grafana: Redis-JobRunner.
Sun, Apr 11, 17:09 · MediaWiki (SRE), Monitoring
John closed T7108: Remove abandoned l-unclaimed entries as Resolved.

https://github.com/miraheze/jobrunner-service/compare/de7d72b68abc...7e6175d56b4e

Sun, Apr 11, 15:02 · Redis-JobRunner, Infrastructure (SRE)

Apr 9 2021

John added a comment to T7067: Subscribe SRE to OpenCVE for notifications.

It looks like a useful service, so we should definitely give it a try and see from a security perspective.

Apr 9 2021, 10:42 · Security, Site Reliability Engineering
John committed rPUPC63c60c548bf2: rm double keystroke (authored by John).
rm double keystroke
Apr 9 2021, 10:37
John closed T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket as Resolved.

Changes never got deployed on the server, this has been fixed now.

Apr 9 2021, 10:22 · Infrastructure (SRE)
John committed rPUPC222d2ffbc71c: jobrunner: ensure latest not present (authored by John).
jobrunner: ensure latest not present
Apr 9 2021, 10:19

Apr 8 2021

John closed T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket as Resolved.

T7107

Apr 8 2021, 11:27 · Infrastructure (SRE)
John closed T7107: Remove :rootjobs: periodically as Resolved.
Apr 8 2021, 11:26 · Redis-JobRunner, Infrastructure (SRE)
John moved T7107: Remove :rootjobs: periodically from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Redis-JobRunner, Infrastructure (SRE)
John moved T7108: Remove abandoned l-unclaimed entries from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Redis-JobRunner, Infrastructure (SRE)
John moved T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Infrastructure (SRE)
John added a comment to T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket.

Because of our monitoring, we’re running fairly intensive Lua scripts over almost 100k keys, which can take up to 2 seconds to run. We have set our Redis connectTimeout to 2s (https://github.com/miraheze/mw-config/blob/master/GlobalCache.php#L48).

Apr 8 2021, 10:17 · Infrastructure (SRE)
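
The failure mode described in the T7112 comment above (a server busy for about as long as the client is willing to wait) can be reproduced in miniature. This is only an illustrative sketch under stated assumptions: it uses a local socket pair rather than Redis or the PHP client, it assumes the timeout applies to reads as the comment implies, and `slow_reply` and `request_with_timeout` are hypothetical helper names.

```python
import socket
import threading
import time


def slow_reply(conn, delay):
    """Stand-in server: read the request, stay busy for `delay`, then answer."""
    conn.recv(16)
    time.sleep(delay)  # models a long-running script blocking the server
    try:
        conn.sendall(b"+OK\r\n")
    except OSError:
        pass  # the client already gave up and closed its end


def request_with_timeout(delay, timeout):
    """Send one request and wait at most `timeout` seconds for the reply."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=slow_reply, args=(server, delay))
    worker.start()
    client.settimeout(timeout)
    try:
        client.sendall(b"PING\r\n")
        client.recv(16)
        return "ok"
    except socket.timeout:
        return "timeout"
    finally:
        client.close()
        worker.join()
        server.close()
```

Calling `request_with_timeout(0.3, 0.05)` gives up before the reply arrives, while `request_with_timeout(0.05, 2.0)` succeeds; a timeout close to the worst-case busy time sits on the boundary between the two.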
John edited projects for T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket, added: Infrastructure (SRE); removed Redis-JobRunner.

Redis software, not the jobqueue software, as this is manually run, not a job.

Apr 8 2021, 10:09 · Infrastructure (SRE)

Apr 7 2021

John moved T7108: Remove abandoned l-unclaimed entries from To Triage to Bugs on the Redis-JobRunner board.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John moved T7107: Remove :rootjobs: periodically from To Triage to Features on the Redis-JobRunner board.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John triaged T7108: Remove abandoned l-unclaimed entries as Normal priority.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John triaged T7107: Remove :rootjobs: periodically as Low priority.
Apr 7 2021, 20:26 · Redis-JobRunner, Infrastructure (SRE)
John set the image for Redis-JobRunner to F1420607: fa-briefcase-blue.png.
Apr 7 2021, 20:20
John created Redis-JobRunner.
Apr 7 2021, 20:20
John committed rPUPC8bf8f9bce546: jobrunner: only run jobchron on one server (authored by John).
jobrunner: only run jobchron on one server
Apr 7 2021, 20:03
John committed rPUPC2229561d9ad9: jobrunner: use Miraheze repo not Wikimedia (authored by John).
jobrunner: use Miraheze repo not Wikimedia
Apr 7 2021, 15:42
Reception123 awarded T6974: Jobs Statistics in Grafana a Haypence token.
Apr 7 2021, 04:35 · Monitoring, MediaWiki (SRE)
Dmehus awarded T6974: Jobs Statistics in Grafana a Like token.
Apr 7 2021, 01:54 · Monitoring, MediaWiki (SRE)
John closed T6974: Jobs Statistics in Grafana as Resolved.

https://grafana.miraheze.org/d/3L3WYylMz/mediawiki-job-queue?orgId=1

Apr 7 2021, 01:19 · Monitoring, MediaWiki (SRE)
John committed rPUPCadcfd9f89f8f: wiki:jobqueue not global:jobqueue (authored by John).
wiki:jobqueue not global:jobqueue
Apr 7 2021, 00:25
John committed rPUPCa4a8da3589f0: remove claimed and delayed data collection (authored by John).
remove claimed and delayed data collection
Apr 7 2021, 00:17

Apr 6 2021

John added a comment to T6974: Jobs Statistics in Grafana.

https://github.com/miraheze/puppet/blob/master/modules/prometheus/files/redis/jobQueueCollector.lua

Apr 6 2021, 23:49 · Monitoring, MediaWiki (SRE)
John committed rPUPC7429c584a9bf: add jobQueueCollector script to Redis Prometheus exporter (authored by John).
add jobQueueCollector script to Redis Prometheus exporter
Apr 6 2021, 23:32
John added a comment to T6974: Jobs Statistics in Grafana.

Basic Lua script to handle this:

Apr 6 2021, 18:48 · Monitoring, MediaWiki (SRE)

Apr 5 2021

John claimed T6974: Jobs Statistics in Grafana.
Apr 5 2021, 11:41 · Monitoring, MediaWiki (SRE)
John added a comment to T7073: Install prometheus-es-exporter for prometheus <-> graylog integration.

Since there are more uses than MediaWiki, should this be tagged as MediaWiki (SRE) only?

Apr 5 2021, 11:16 · MediaWiki (SRE), Monitoring

Apr 1 2021

John edited projects for T7073: Install prometheus-es-exporter for prometheus <-> graylog integration, added: MediaWiki (SRE); removed Infrastructure (SRE).
Apr 1 2021, 00:08 · MediaWiki (SRE), Monitoring

Mar 31 2021

John added a comment to T7073: Install prometheus-es-exporter for prometheus <-> graylog integration.

Is there a use case for this that the ES data source wouldn’t fulfil? Is this the approach MediaWiki (SRE) wish to take? If so, this would fall to the MW team to implement as part of their task: without a use case for Infra, what’s the point in implementing something unused?

Mar 31 2021, 23:41 · MediaWiki (SRE), Monitoring

Mar 28 2021

John changed the status of T6984: High load on dbbackup servers, a subtask of T5877: Revise MariaDB backup strategy, from Stalled to Open.
Mar 28 2021, 23:07 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John changed the status of T6984: High load on dbbackup servers from Stalled to Open.

Not blocked on external entity

Mar 28 2021, 23:07 · Database, Monitoring, Infrastructure (SRE)
John moved T7033: Restart services running on older openssl binaries from Incoming to Short Term on the Infrastructure (SRE) board.
Mar 28 2021, 19:26 · Infrastructure (SRE), Security
John assigned T7033: Restart services running on older openssl binaries to Southparkfan.
Mar 28 2021, 19:25 · Infrastructure (SRE), Security
John removed a project from T7046: New Resource Request for MediaWiki-Extension-Updates: MediaWiki (SRE).
Mar 28 2021, 19:20 · Infrastructure (SRE)
John closed T7046: New Resource Request for MediaWiki-Extension-Updates as Declined.

We still need something to test on though. I suggest we use test3 at first. We therefore just need to know which db to put the cached info on.

Mar 28 2021, 19:18 · Infrastructure (SRE)

Mar 27 2021

John closed T7042: salt-ssh broken due to unknown minion as Invalid.

Sounds like there isn't a problem then?

Mar 27 2021, 10:15 · Infrastructure (SRE)
John added a comment to T7033: Restart services running on older openssl binaries.

Do we have an update on this? Also, who is taking responsibility for this?

Mar 27 2021, 10:14 · Infrastructure (SRE), Security

Mar 26 2021

John reassigned T7046: New Resource Request for MediaWiki-Extension-Updates from John to Reception123.
Mar 26 2021, 18:24 · Infrastructure (SRE)
John added a comment to T7046: New Resource Request for MediaWiki-Extension-Updates.

Before I can review this, more information needs to be provided.

Mar 26 2021, 18:18 · Infrastructure (SRE)

Mar 25 2021

John closed T7038: Existing Server Resource Request for bacula2 as Resolved.

Approved, with spending authorisation by @Southparkfan

Mar 25 2021, 22:41 · Infrastructure (SRE)
John edited P386 Resources Table.
Mar 25 2021, 22:32 · Cloud Infrastructure, Infrastructure (SRE)
John closed T7037: [New] Server Resource Request for ats as Resolved.

Approved for cloud4.

Mar 25 2021, 22:32 · Infrastructure (SRE)
John added a project to T7033: Restart services running on older openssl binaries: Infrastructure (SRE).
Mar 25 2021, 18:42 · Infrastructure (SRE), Security

Mar 23 2021

John closed T4191: Redesign compression of content inside NGINX and Varnish as Declined.

T4302 - if that task gets declined in the future then this task would need re-opening.

Mar 23 2021, 17:09 · Infrastructure (SRE), Varnish
John committed rPUPC086bcaa22a85: grafana: add sre-mediawiki as Editors (authored by John).
grafana: add sre-mediawiki as Editors
Mar 23 2021, 16:13

Mar 22 2021

John assigned T4302: Deploy Apache Traffic Server to Paladox.
Mar 22 2021, 20:29 · Infrastructure (SRE)
John claimed T5397: Create a logbot for server actions.
Mar 22 2021, 20:09 · Infrastructure (SRE)
John committed rPUPC704701df93d8: set service_count to 0 (authored by John).
set service_count to 0
Mar 22 2021, 19:04
John added a comment to T6974: Jobs Statistics in Grafana.

Get a list of all h-sha1ById keys and loop over them: running HLEN on each key returns how many unclaimed jobs there are for that job type. Add these up and the data exists for the whole jobqueue as well as per job type (and, if you want to go further, for each job type on each wiki).

Mar 22 2021, 17:06 · Monitoring, MediaWiki (SRE)
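
The counting approach from the T6974 comment above could be sketched along these lines. It is a rough illustration, not the collector that was eventually deployed: the key layout `<wiki>:jobqueue:<type>:h-sha1ById` is an assumption, and `FakeRedis` is a hypothetical in-memory stand-in that exposes only `scan_iter(match=...)` and `hlen(name)`, mirroring the method names of the redis-py client so a real client could be passed in instead.

```python
import fnmatch
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in offering the two client calls used below."""

    def __init__(self, hashes):
        self._hashes = hashes

    def scan_iter(self, match):
        return (k for k in self._hashes if fnmatch.fnmatchcase(k, match))

    def hlen(self, name):
        return len(self._hashes.get(name, {}))


def count_jobs(client, pattern="*:jobqueue:*:h-sha1ById"):
    """Sum HLEN over every h-sha1ById hash, grouped by type and by wiki."""
    per_type = defaultdict(int)
    per_wiki_type = defaultdict(int)
    for key in client.scan_iter(match=pattern):
        # Assumed layout: <wiki>:jobqueue:<type>:h-sha1ById
        wiki, _, job_type, _ = key.split(":", 3)
        size = client.hlen(key)
        per_type[job_type] += size
        per_wiki_type[(wiki, job_type)] += size
    total = sum(per_type.values())
    return total, dict(per_type), dict(per_wiki_type)
```

The three return values correspond to the three levels mentioned in the comment: the whole queue, each job type, and each job type on each wiki.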
John committed rPUPC8107e7fc79c8: change mailname on all non-mail servers (authored by John).
change mailname on all non-mail servers
Mar 22 2021, 13:09
John closed T6976: General Mail Statistics as Resolved.

Reviewing the stats already put up by @Paladox and looking into Dovecot's stats facility in more detail, I don't believe we would gain any new information from Dovecot stats as the Postfix ones already cover all bases of mail, including connections, logins and auth failures.

Mar 22 2021, 13:00 · Monitoring, Mail, Infrastructure (SRE)

Mar 21 2021

Dmehus awarded T7008: Investigate database server/cache proxy issues and extreme load times this evening an Orange Medal token.
Mar 21 2021, 19:24 · Infrastructure (SRE)
John changed the start date for E239: Infrastructure SRE Weekly Meeting from Mar 21 2021, 20:00 to Mar 22 2021, 20:00.
Mar 21 2021, 18:25 · Infrastructure (SRE)
John changed the start date for E239: Infrastructure SRE Weekly Meeting from Mar 21 2021, 19:00 to Mar 21 2021, 20:00.
Mar 21 2021, 18:24 · Infrastructure (SRE)
John set E239: Infrastructure SRE Weekly Meeting to repeat weekly.
Mar 21 2021, 18:22 · Infrastructure (SRE)
John created E239: Infrastructure SRE Weekly Meeting.
Mar 21 2021, 18:22 · Infrastructure (SRE)
John cancelled E106: SRE Duty.
Mar 21 2021, 18:18
John cancelled E237: SRE Duty.
Mar 21 2021, 18:18
John cancelled E236: SRE Duty.
Mar 21 2021, 18:18
John cancelled E234: SRE Duty.
Mar 21 2021, 18:18
John cancelled E235: SRE Duty.
Mar 21 2021, 18:18