
Yesterday

Southparkfan added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

Paladox changed the scheduler on cloud5. Let's wait for a day to see the impact on I/O performance (regular operations).

Thu, May 13, 19:10 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Paladox updated the task description for T7299: Upgrade to gluster 8.5.
Thu, May 13, 18:54 · Infrastructure (SRE)
Paladox updated the task description for T7299: Upgrade to gluster 8.5.
Thu, May 13, 18:50 · Infrastructure (SRE)
Paladox updated the task description for T7299: Upgrade to gluster 8.5.
Thu, May 13, 18:49 · Infrastructure (SRE)
Southparkfan lowered the priority of T7288: Upgrade gluster to 9.2 from Normal to Low.
Thu, May 13, 18:49 · Infrastructure (SRE)
Southparkfan moved T7288: Upgrade gluster to 9.2 from Incoming to Long Term on the Infrastructure (SRE) board.
Thu, May 13, 18:49 · Infrastructure (SRE)
Paladox moved T7299: Upgrade to gluster 8.5 from Incoming to Short Term on the Infrastructure (SRE) board.
Thu, May 13, 18:48 · Infrastructure (SRE)
Paladox triaged T7299: Upgrade to gluster 8.5 as High priority.
Thu, May 13, 18:48 · Infrastructure (SRE)

Tue, May 11

Paladox triaged T7288: Upgrade gluster to 9.2 as Normal priority.
Tue, May 11, 19:23 · Infrastructure (SRE)

Sun, May 9

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Going to decom dbbackup2 (we'll be using dbbackup1).

Sun, May 9, 19:49 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Wed, May 5

Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

Ah, ignore the above. Having looked around, I don't see how we can do it if you cannot use ionice within the script and outside it at the same time.

Wed, May 5, 12:41 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance

Tue, May 4

Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

Ohh, should we run with --low?

Tue, May 4, 22:18 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

I even ran the command with ionice, and the load on the disk was still high.

Tue, May 4, 22:13 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

On https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592149 it says ionice -c 3 /usr/share/mdadm/checkarray --cron --all
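For reference, `ionice -c 3` places a command in the idle I/O scheduling class, so it should only get disk time when no other process is asking for it; whether that has any effect depends on the I/O scheduler in use. A minimal Python sketch that builds (and, only if asked, runs) the invocation suggested in the bug report; the helper names are hypothetical and the checkarray path is the one quoted above.

```python
import shlex
import shutil
import subprocess
from typing import List

# Path quoted in the Debian bug report above.
CHECKARRAY = "/usr/share/mdadm/checkarray"


def build_checkarray_cmd(io_class: int = 3) -> List[str]:
    """Return argv for running checkarray under ionice.

    io_class 3 is ionice's idle class.
    """
    return ["ionice", "-c", str(io_class), CHECKARRAY, "--cron", "--all"]


def run_checkarray(dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    cmd = build_checkarray_cmd()
    print(shlex.join(cmd))
    if dry_run or shutil.which("ionice") is None:
        return  # nothing is executed in dry-run mode
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_checkarray(dry_run=True)
```

Keeping the argv as a list (rather than a shell string) avoids quoting issues if the path or flags change later.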

Tue, May 4, 16:50 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance

Mon, May 3

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Test backup: mydumper -G -E -R -v 3 -t 2 -c -L "/home/dbcopy/dbbackup1-mnt/$(date +"%Y%m%d%H%M%S").log" --trx-consistency-only

  • db11
    • Duration: still going on
    • Size: ?
    • Tables: ?
  • db12
    • Duration: 1615 minutes (26.9 hours)
    • Size: 26 GB
    • Tables: 156104
  • db13
    • Duration: 1359 minutes (22.7 hours)
    • Size: 35 GB
    • Tables: 125530
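To compare the finished runs, a short Python sketch that derives hours and rough throughput from the figures above (db11 is left out because it had not finished). The derived rates are simple arithmetic on the reported numbers, not separate measurements.

```python
# Figures copied from the test backup above (db11 had not finished yet).
RESULTS = {
    "db12": {"minutes": 1615, "size_gb": 26, "tables": 156104},
    "db13": {"minutes": 1359, "size_gb": 35, "tables": 125530},
}


def summarize(name: str) -> dict:
    """Derive duration in hours and simple throughput figures for one host."""
    r = RESULTS[name]
    hours = r["minutes"] / 60
    return {
        "hours": hours,
        "gb_per_hour": r["size_gb"] / hours,
        "tables_per_minute": r["tables"] / r["minutes"],
    }


for host in RESULTS:
    s = summarize(host)
    print(f"{host}: {s['hours']:.2f} h, "
          f"{s['gb_per_hour']:.2f} GB/h, "
          f"{s['tables_per_minute']:.0f} tables/min")
```

db13 wrote roughly 1.5 GB/h against roughly 1 GB/h for db12, even though db12 has more tables; one possible reading is that per-table overhead matters more than raw size, but further tests would be needed to confirm that.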
Mon, May 3, 22:42 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

The script runs with --idle which uses ionice, so it's already using it?

Mon, May 3, 18:19 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

On https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592149 it says ionice -c 3 /usr/share/mdadm/checkarray --cron --all

Mon, May 3, 18:11 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan updated the task description for T5044: Setup centralised logging for services.
Mon, May 3, 17:54 · Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Southparkfan added a comment to T6759: Automate the adding of SSL private keys to puppet3.

As discussed; candidate for Goal-2021-Jul-Dec.

Mon, May 3, 17:51 · Infrastructure (SRE), SSL
Southparkfan added a comment to T4302: Deploy Apache Traffic Server.

Discussed; some of the tasks are being handed over to me (see subtasks), so we won't delay this.

Mon, May 3, 17:47 · Infrastructure (SRE)
Southparkfan lowered the priority of T6839: Upgrade puppet to puppet 7 from Normal to Low.

Puppet 6 reaches EOL in December 2022, so there is no need to rush this. Scheduled for Q4 2021 / Q1-Q2 2022.

Mon, May 3, 17:46 · Puppet, Infrastructure (SRE)
Southparkfan moved T7241: ATS: Deploy healthchecker that depools/repools from Incoming to Long Term on the Infrastructure (SRE) board.
Mon, May 3, 17:41 · Infrastructure (SRE)
Southparkfan moved T7240: ATS: Review security from Incoming to Long Term on the Infrastructure (SRE) board.
Mon, May 3, 17:41 · Infrastructure (SRE)
Southparkfan moved T7239: ATS: Review performance from Incoming to Long Term on the Infrastructure (SRE) board.
Mon, May 3, 17:41 · Infrastructure (SRE)
Southparkfan lowered the priority of T7240: ATS: Review security from Normal to Low.

Until the configuration has been (mostly) synced with Varnish.

Mon, May 3, 17:41 · Infrastructure (SRE)
Southparkfan lowered the priority of T7239: ATS: Review performance from Normal to Low.

Until the configuration has been (mostly) synced with Varnish.

Mon, May 3, 17:41 · Infrastructure (SRE)
Paladox triaged T7241: ATS: Deploy healthchecker that depools/repools as Normal priority.
Mon, May 3, 17:40 · Infrastructure (SRE)
Paladox triaged T7240: ATS: Review security as Normal priority.
Mon, May 3, 17:39 · Infrastructure (SRE)
Paladox triaged T7239: ATS: Review performance as Normal priority.
Mon, May 3, 17:39 · Infrastructure (SRE)
Paladox updated the task description for T4302: Deploy Apache Traffic Server.
Mon, May 3, 17:37 · Infrastructure (SRE)
Paladox updated the task description for T4302: Deploy Apache Traffic Server.
Mon, May 3, 17:35 · Infrastructure (SRE)
Paladox added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

I found https://github.com/kopchik/scripts/blob/master/checkarray#L70 while searching.

Mon, May 3, 17:34 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a comment to T4425: Fix all mysql tables that are using latin rather then utf8mb4.

Discussed; paladox will contact Wikimedia DBAs.

Mon, May 3, 17:34 · Infrastructure (SRE)
Southparkfan moved T7230: High I/O on cloud nodes affecting GlusterFS from Incoming to Short Term on the Infrastructure (SRE) board.
Mon, May 3, 17:28 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan assigned T7230: High I/O on cloud nodes affecting GlusterFS to Paladox.
Mon, May 3, 17:28 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a member for Infrastructure (SRE): Southparkfan.
Mon, May 3, 17:22
Southparkfan removed a member for Infrastructure (SRE): John.
Mon, May 3, 17:22
Reception123 closed T7237: Create email account for Dmehus as Resolved.
Mon, May 3, 15:05 · Mail, Infrastructure (SRE)
Owen triaged T7237: Create email account for Dmehus as Normal priority.
Mon, May 3, 14:44 · Mail, Infrastructure (SRE)

Sun, May 2

Southparkfan added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

@RhinosF1 The task is for the Infrastructure team now, but JohnLewis couldn't have known that in the first place.

Sun, May 2, 22:45 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

Scheduler for sda and sdb: [mq-deadline] none
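The line above is the format the kernel exposes in /sys/block/<dev>/queue/scheduler: every available scheduler, with the active one in square brackets. Writing one of the listed names to the same file (as root) switches the active scheduler at runtime. A small Python sketch for reading it; the function names are illustrative.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_scheduler(line: str) -> Tuple[Optional[str], List[str]]:
    """Split a sysfs scheduler line into (active, available)."""
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the bracketed entry is the active one
            active = token
        available.append(token)
    return active, available


def read_scheduler(device: str) -> Tuple[Optional[str], List[str]]:
    """Read and parse the scheduler file for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


if __name__ == "__main__":
    for dev in ("sda", "sdb"):
        try:
            active, available = read_scheduler(dev)
        except OSError:
            continue  # device not present on this machine
        print(f"{dev}: active={active} available={available}")
```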

Sun, May 2, 22:32 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
John added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.
In T7230#143535, @John wrote:

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

As far as we know at this point, it's caused by Gluster.

Sun, May 2, 22:29 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a comment to T7230: High I/O on cloud nodes affecting GlusterFS.

Facts:

Sun, May 2, 22:24 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan renamed T7230: High I/O on cloud nodes affecting GlusterFS from Load times high enough to cause depool to High I/O on cloud nodes affecting GlusterFS.
Sun, May 2, 21:58 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added projects to T7230: High I/O on cloud nodes affecting GlusterFS: Cloud Infrastructure, Infrastructure (SRE).
Sun, May 2, 21:57 · Infrastructure (SRE), Cloud Infrastructure, MediaWiki (SRE), Performance
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Running on db1{2,3,4} simultaneously:

mydumper -G -E -R -v 3 -t 2 -c -L "/home/dbcopy/dbbackup1-mnt/$(date +"%Y%m%d%H%M%S").log"
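A Python sketch of how the command above is assembled, assuming it is run locally on each host: the flags are copied verbatim, `db1{2,3,4}` is shell brace expansion for db12, db13 and db14, and the log name reproduces `$(date +"%Y%m%d%H%M%S")`. The helper name and the `extra` parameter are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Sequence

# Flags from the command above: -G/-E/-R include triggers, events and
# routines; -v 3 is verbosity; -t 2 is two dump threads; -c compresses.
BASE_FLAGS = ["-G", "-E", "-R", "-v", "3", "-t", "2", "-c"]
LOG_DIR = "/home/dbcopy/dbbackup1-mnt"

# db1{2,3,4} is shell brace expansion for these three hosts.
HOSTS = [f"db1{n}" for n in (2, 3, 4)]


def mydumper_cmd(now: Optional[datetime] = None,
                 extra: Sequence[str] = ()) -> List[str]:
    """Build the mydumper argv with a timestamped log file name."""
    now = now or datetime.now()
    # Equivalent of the shell's $(date +"%Y%m%d%H%M%S") in local time.
    stamp = now.strftime("%Y%m%d%H%M%S")
    return ["mydumper", *BASE_FLAGS, "-L", f"{LOG_DIR}/{stamp}.log", *extra]


if __name__ == "__main__":
    # The later test run also appended --trx-consistency-only.
    cmd = mydumper_cmd(extra=["--trx-consistency-only"])
    for host in HOSTS:
        print(f"{host}: {' '.join(cmd)}")
```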
Sun, May 2, 18:39 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Paladox closed T7224: Uncompressed puppetserver json logs fill up disk as Resolved.

That fixed it, yup! Thanks!

Sun, May 2, 01:09 · Puppet, Infrastructure (SRE)
John assigned T7224: Uncompressed puppetserver json logs fill up disk to Paladox.

https://github.com/miraheze/puppet/commit/8fdd5bd235142e5103bdeadef3d2e7b9ab62b489 ?

Sun, May 2, 00:14 · Puppet, Infrastructure (SRE)

Sat, May 1

Southparkfan triaged T7224: Uncompressed puppetserver json logs fill up disk as High priority.
Sat, May 1, 17:41 · Puppet, Infrastructure (SRE)

Mon, Apr 26

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Other tests required:

  • A test with the following settings: 1) -t 4 (true core count of each virtual machine) 2) --triggers --events --routines
  • Another test, but with -t 2 (to lessen server load)
  • What happens to performance if we back up three masters simultaneously? (reason: to maximise backup consistency)
Mon, Apr 26, 21:38 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.
In T5877#142347, @John wrote:

@Southparkfan updates on the above?

Sorry for the lack of response. Still working on this: 16:36:25 <+SPF|Cloud> !log https://phabricator.miraheze.org/T5877#140588: run test backup on db11 with six threads. I stopped the backup from T5877#141278 mid-way by accident.

Command: mydumper -t 6 -v 3 -c --trx-consistency-only
Start: 2021-04-24 14:36 UTC
End: 2021-04-26 04:39 UTC (38 hours)
Backup size: 14 GB
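A quick Python check of the duration and average throughput from the start and end times above (treated as UTC, as stated); nothing here beyond arithmetic on the reported figures.

```python
from datetime import datetime, timezone

# Start and end times from the comment above (UTC).
start = datetime(2021, 4, 24, 14, 36, tzinfo=timezone.utc)
end = datetime(2021, 4, 26, 4, 39, tzinfo=timezone.utc)

elapsed = end - start
hours = elapsed.total_seconds() / 3600
size_gb = 14

print(f"elapsed: {elapsed} ({hours:.2f} hours)")
print(f"throughput: {size_gb / hours:.2f} GB/hour")
```

This gives 1 day, 14:03 (about 38.05 hours) and roughly 0.37 GB per hour for this six-thread run.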

Mon, Apr 26, 21:08 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Owen closed T7196: Create email accounts for new Trust and Safety team as Resolved.
Mon, Apr 26, 13:41 · Mail, Infrastructure (SRE)
Reception123 added a comment to T7196: Create email accounts for new Trust and Safety team.

@Owen Could you confirm whether you got an email from me?

Mon, Apr 26, 12:58 · Mail, Infrastructure (SRE)
Owen triaged T7196: Create email accounts for new Trust and Safety team as High priority.
Mon, Apr 26, 12:42 · Mail, Infrastructure (SRE)

Sun, Apr 25

Southparkfan closed T6984: High load on dbbackup servers, a subtask of T5877: Revise MariaDB backup strategy, as Invalid.
Sun, Apr 25, 12:08 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan closed T6984: High load on dbbackup servers as Invalid.

This won't be an issue anymore.

Sun, Apr 25, 12:08 · Database, Monitoring, Infrastructure (SRE)

Sat, Apr 24

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.
In T5877#142347, @John wrote:

@Southparkfan updates on the above?

Sat, Apr 24, 14:36 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T4425: Fix all mysql tables that are using latin rather then utf8mb4.
In T4425#142254, @John wrote:

@Southparkfan See the above please

Sat, Apr 24, 14:20 · Infrastructure (SRE)

Fri, Apr 23

John closed T5397: Create a logbot for server actions as Resolved.

/usr/local/bin/logsalmsg test

Fri, Apr 23, 21:37 · Infrastructure (SRE)

Tue, Apr 20

John added a comment to T5877: Revise MariaDB backup strategy.

@Southparkfan updates on the above?

Tue, Apr 20, 12:52 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mon, Apr 19

Paladox updated the task description for T5044: Setup centralised logging for services.
Mon, Apr 19, 21:44 · Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

There's one other log for Proxmox that I didn't think we need to send (I don't think it had any information we needed).

Mon, Apr 19, 21:44 · Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

Added pve* logging via https://github.com/miraheze/puppet/pull/1713

Mon, Apr 19, 21:44 · Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

I will try and finish this now (for cloud*)

Mon, Apr 19, 20:48 · Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T4425: Fix all mysql tables that are using latin rather then utf8mb4.

@Southparkfan See the above please

Mon, Apr 19, 20:10 · Infrastructure (SRE)

Sat, Apr 17

Reception123 closed T7150: /mnt/mediawiki-static/requestmodel.phpml is not writeable by www-data / mediawiki-admins as Resolved.
Sat, Apr 17, 10:22 · Infrastructure (SRE)
RhinosF1 updated the task description for T7150: /mnt/mediawiki-static/requestmodel.phpml is not writeable by www-data / mediawiki-admins.
Sat, Apr 17, 09:08 · Infrastructure (SRE)
RhinosF1 triaged T7150: /mnt/mediawiki-static/requestmodel.phpml is not writeable by www-data / mediawiki-admins as High priority.
Sat, Apr 17, 09:08 · Infrastructure (SRE)

Wed, Apr 14

Paladox closed T7134: Puppet cannot remount GlusterFS mount if directory exists as Resolved.
Wed, Apr 14, 19:32 · Puppet, Infrastructure (SRE)
Paladox added a comment to T7134: Puppet cannot remount GlusterFS mount if directory exists.

I've applied https://phabricator.miraheze.org/rPUPC55c092fa5c56048e26e8104896a8accae825a383 and https://phabricator.miraheze.org/rPUPCce40d4fa9ce7f390a91da6599a314df97be0d5cf to see if this helps.

Wed, Apr 14, 19:21 · Puppet, Infrastructure (SRE)
Paladox added a comment to T7134: Puppet cannot remount GlusterFS mount if directory exists.
In T7134#141593, @John wrote:

@Paladox are you okay to have a look at this?

Wed, Apr 14, 14:14 · Puppet, Infrastructure (SRE)

Apr 14 2021

RhinosF1 added a comment to T7134: Puppet cannot remount GlusterFS mount if directory exists.

As long as the OOM is a one-off incident, I am not very concerned

Search for remount in the SAL or check the Icinga history. It doesn't happen often, but it unmounts every so often.

Apr 14 2021, 06:50 · Puppet, Infrastructure (SRE)

Apr 13 2021

John assigned T7134: Puppet cannot remount GlusterFS mount if directory exists to Paladox.

@Paladox are you okay to have a look at this?

Apr 13 2021, 23:45 · Puppet, Infrastructure (SRE)
Southparkfan created T7134: Puppet cannot remount GlusterFS mount if directory exists.
Apr 13 2021, 23:30 · Puppet, Infrastructure (SRE)

Apr 11 2021

John closed T7108: Remove abandoned l-unclaimed entries as Resolved.

https://github.com/miraheze/jobrunner-service/compare/de7d72b68abc...7e6175d56b4e

Apr 11 2021, 15:02 · Redis-JobRunner, Infrastructure (SRE)

Apr 9 2021

Southparkfan updated subscribers of T5877: Revise MariaDB backup strategy.

Running dump from db11 to dbbackup1:/srv/backups/db11. @Paladox and I are around to monitor.

Apr 9 2021, 22:21 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Paladox closed T6975: LDAP Statistics as Resolved.
Apr 9 2021, 21:24 · Monitoring, Infrastructure (SRE)
Paladox added a comment to T6975: LDAP Statistics.

I've added LDAP monitoring. You can view it at https://grafana.miraheze.org/d/uOLD33lMz/ldap?orgId=1

Apr 9 2021, 21:23 · Monitoring, Infrastructure (SRE)
John closed T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket as Resolved.

Changes never got deployed on the server, this has been fixed now.

Apr 9 2021, 10:22 · Infrastructure (SRE)
Reception123 reopened T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket as "Open".

@John I'm afraid I've run into the error again (though this time the dump ran for much longer before it happened).

Apr 9 2021, 09:34 · Infrastructure (SRE)

Apr 8 2021

John closed T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket as Resolved.

T7107

Apr 8 2021, 11:27 · Infrastructure (SRE)
John closed T7107: Remove :rootjobs: periodically as Resolved.
Apr 8 2021, 11:26 · Redis-JobRunner, Infrastructure (SRE)
John moved T7107: Remove :rootjobs: periodically from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Redis-JobRunner, Infrastructure (SRE)
John moved T7108: Remove abandoned l-unclaimed entries from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Redis-JobRunner, Infrastructure (SRE)
John moved T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket from Incoming to Short Term on the Infrastructure (SRE) board.
Apr 8 2021, 11:21 · Infrastructure (SRE)
John added a comment to T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket.

Because of our monitoring, we're running fairly intensive Lua scripts over almost 100k keys, which can take up to 2 seconds to run. Our Redis connectTimeout is set to 2s (https://github.com/miraheze/mw-config/blob/master/GlobalCache.php#L48), so a request that arrives while one of these scripts is running can time out.

Apr 8 2021, 10:17 · Infrastructure (SRE)
John edited projects for T7112: JobQueueError from line 778 of /srv/mediawiki/w/includes/jobqueue/JobQueueRedis.php: Redis server error: socket error on read socket, added: Infrastructure (SRE); removed Redis-JobRunner.

This is about the Redis software, not the job queue software, as this is run manually rather than as a job.

Apr 8 2021, 10:09 · Infrastructure (SRE)

Apr 7 2021

John moved T7108: Remove abandoned l-unclaimed entries from To Triage to Bugs on the Redis-JobRunner board.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John moved T7107: Remove :rootjobs: periodically from To Triage to Features on the Redis-JobRunner board.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John triaged T7108: Remove abandoned l-unclaimed entries as Normal priority.
Apr 7 2021, 20:31 · Redis-JobRunner, Infrastructure (SRE)
John triaged T7107: Remove :rootjobs: periodically as Low priority.
Apr 7 2021, 20:26 · Redis-JobRunner, Infrastructure (SRE)

Apr 5 2021

Paladox added a comment to T4425: Fix all mysql tables that are using latin rather then utf8mb4.

@Southparkfan Could I have some assistance on this, please? This is a really big change and could lead to data loss.

Apr 5 2021, 19:09 · Infrastructure (SRE)

Apr 4 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

New performance test (using sshfs setup, 4 mydumper threads):

  • Uncompressed: 290 seconds
  • Compressed: 210 seconds
Apr 4 2021, 22:07 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

For reference: mydumper is superior to mysqldump because of its better performance (multiple threads) and its flexibility (PCRE-based table inclusion/exclusion), combined with transactional consistency and (almost) no locking (no read-only time required during backups). However, mydumper does not support TLS connections, so dumping must happen on the database master.
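To illustrate the regex-based table selection mentioned above: mydumper's `--regex` option is matched against `database.table` names, and a negative lookahead can exclude whole databases. The Python sketch below only demonstrates how such a pattern selects names (Python's `re` is not PCRE, but supports this lookahead syntax); the pattern and table names are hypothetical examples, not our configuration.

```python
import re
from typing import Iterable, List

# Hypothetical pattern: skip every table in the mysql and test databases.
EXCLUDE_SYSTEM = re.compile(r"^(?!(mysql\.|test\.))")


def select_tables(names: Iterable[str], pattern: re.Pattern) -> List[str]:
    """Keep the 'database.table' names that the pattern matches."""
    return [name for name in names if pattern.match(name)]


tables = ["mysql.user", "test.t1", "examplewiki.page", "examplewiki.revision"]
print(select_tables(tables, EXCLUDE_SYSTEM))
```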

Apr 4 2021, 21:37 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 31 2021

Southparkfan added a comment to T7073: Install prometheus-es-exporter for prometheus <-> graylog integration.

Proof of concept:
/etc/prometheus-es-exporter/mediawiki.cfg:

[query_log_mediawiki]
QueryIntervalSecs = 900
QueryIndices = <graylog_deflector>
QueryJson = {
    "size": 0,
    "track_total_hits": true,
    "query": {
        "bool": {
            "must": [
                { "match": { "application_name": "mediawiki" } }
            ],
            "filter": [
                { "range": { "timestamp": { "gte": "now-15m", "lte": "now" } } }
            ]
        }
    },
    "aggs": {
        "mediawiki-channels": {
            "terms": { "field": "mediawiki_channel" }
        }
    }
    }
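A typo in the embedded QueryJson is easy to miss. A minimal Python sketch that checks the proof of concept parses, assuming the file is read like a standard INI file with an indented multi-line value (the exporter's own loader may differ); `<graylog_deflector>` is kept as the placeholder it is above.

```python
import configparser
import json

# Copy of the proof-of-concept config above.
CONFIG = """
[query_log_mediawiki]
QueryIntervalSecs = 900
QueryIndices = <graylog_deflector>
QueryJson = {
    "size": 0,
    "track_total_hits": true,
    "query": {
        "bool": {
            "must": [
                { "match": { "application_name": "mediawiki" } }
            ],
            "filter": [
                { "range": { "timestamp": { "gte": "now-15m", "lte": "now" } } }
            ]
        }
    },
    "aggs": {
        "mediawiki-channels": {
            "terms": { "field": "mediawiki_channel" }
        }
    }
    }
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)
section = parser["query_log_mediawiki"]

# Indented continuation lines are joined into one value, then parsed as JSON.
query = json.loads(section["QueryJson"])
interval = section.getint("QueryIntervalSecs")
print(interval, query["aggs"]["mediawiki-channels"]["terms"]["field"])
```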
Mar 31 2021, 23:56 · MediaWiki (SRE), Monitoring
John added a comment to T7073: Install prometheus-es-exporter for prometheus <-> graylog integration.

Is there a use case for this that the ES data source wouldn't fulfil? Is this the approach MediaWiki (SRE) wishes to take? If so, implementing it would fall to the MW team as part of their task: without a use case for Infra, what's the point of implementing something unused?

Mar 31 2021, 23:41 · MediaWiki (SRE), Monitoring
Southparkfan triaged T7073: Install prometheus-es-exporter for prometheus <-> graylog integration as Normal priority.
Mar 31 2021, 23:01 · MediaWiki (SRE), Monitoring
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

More testing is required to determine the final backup sizes.

Mar 31 2021, 15:10 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

A maintenance window is required for dumping directly from the masters. Not because impact is guaranteed, but because dumping may lock the database for several seconds, increasing save times or knocking wikis offline.

Mar 31 2021, 14:27 · Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 29 2021

Southparkfan added a comment to T4302: Deploy Apache Traffic Server.

To properly verify the backend certificate (CN), we have tested using ENFORCE. However, the Host header from the client (e.g. allthetropes.org) is what the CN is checked against at the backend. Therefore, the allthetropes.org certificate would still be required on the backend, even though I would prefer to remove all certificates (including our wildcard one) except a single domain (such as ats-internal.miraheze.wiki) from the MediaWiki servers.

Mar 29 2021, 00:46 · Infrastructure (SRE)