
Jun 25 2022

Paladox closed T5044: Setup centralised logging for services as Resolved.

Resolved

Jun 25 2022, 15:54 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox updated the task description for T5044: Setup centralised logging for services.
Jun 25 2022, 15:54 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T5044: Setup centralised logging for services.

@Paladox less than a week until end of goal period - do we have an update on this?

Jun 25 2022, 13:02 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

May 9 2022

Unknown Object (User) moved T5044: Setup centralised logging for services from Backlog to Central Logging on the Monitoring board.
May 9 2022, 19:26 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) added a project to T5044: Setup centralised logging for services: Monitoring.
May 9 2022, 19:26 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Apr 16 2022

Paladox updated the task description for T5044: Setup centralised logging for services.
Apr 16 2022, 23:18 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Feb 21 2022

Paladox claimed T5044: Setup centralised logging for services.
Feb 21 2022, 15:09 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox updated the task description for T5044: Setup centralised logging for services.
Feb 21 2022, 14:46 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Jan 1 2022

John added a comment to T5044: Setup centralised logging for services.

This task has taken a back seat to other work that currently has higher priority, such as T8469 and T8350.

Jan 1 2022, 10:45 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) moved T5044: Setup centralised logging for services from Backlog to Infrastructure on the Goal-2022-Jan-Jun board.
Jan 1 2022, 03:22 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) added a project to T5044: Setup centralised logging for services: Goal-2022-Jan-Jun.
Jan 1 2022, 03:15 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Dec 5 2021

John added a comment to T5044: Setup centralised logging for services.

I am going to start progress on this task, firstly by cleaning up how we define all of this in puppet. I'll introduce simple logging stanzas that we can define over and over again for each log file, and that handle all of the syslog-ng logic + logrotate configuration for the new system.

Dec 5 2021, 21:06 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Nov 7 2021

Unknown Object (User) merged T7135: Ingest PHP-FPM slowlogs into Graylog into T5044: Setup centralised logging for services.
Nov 7 2021, 00:59 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Oct 20 2021

John added a comment to T5044: Setup centralised logging for services.

New server list for checking the above plan against:

Oct 20 2021, 12:45 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T5044: Setup centralised logging for services.

Plan for resolving this task:

  • All services will have their logs ingested into Graylog; this isn't negotiable.
  • Where logs are ingested, we will maintain 24-48 hours of *local* logs on the server. This will be supported by log rotation.
Oct 20 2021, 12:33 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
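
The local-retention point in the plan above could be met in several ways. As a minimal sketch, assuming that daily rotation with a single rotated copy is an acceptable reading of "24-48 hours" (the live file covers up to 24 hours and the one rotated file covers the 24 hours before that), the Python helper below renders a logrotate stanza. The function name, the example path, and the choice of directives are illustrative assumptions, not the configuration that was actually deployed through puppet.

```python
def render_logrotate_stanza(log_path: str, keep_rotated: int = 1) -> str:
    """Render a logrotate stanza that rotates daily and keeps `keep_rotated` old files.

    Hypothetical helper for illustration only; the real puppet-managed
    configuration may differ.
    """
    if keep_rotated < 0:
        raise ValueError("keep_rotated must be non-negative")
    directives = [
        "daily",                   # rotate once per day
        f"rotate {keep_rotated}",  # number of rotated files kept before deletion
        "missingok",               # do not error if the log file is absent
        "notifempty",              # skip rotation when the log is empty
        "compress",                # gzip rotated files
    ]
    body = "\n".join(f"    {directive}" for directive in directives)
    return f"{log_path} {{\n{body}\n}}\n"


if __name__ == "__main__":
    # The path is an assumption, not taken from the task.
    print(render_logrotate_stanza("/var/log/example/service.log"))
```

Generating the stanza from one function mirrors the idea of defining the same logging block repeatedly per log file; raising `keep_rotated` would lengthen the local window if 48 hours proved too short.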

Oct 15 2021

John closed T5877: Revise MariaDB backup strategy as Resolved.

This is now resolved.

Oct 15 2021, 17:11 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Oct 13 2021

John added a comment to T5877: Revise MariaDB backup strategy.

db13:

  • Time taken: 2 hours and 20 minutes
  • Size: 33G
Oct 13 2021, 19:35 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John added a comment to T5877: Revise MariaDB backup strategy.

https://github.com/miraheze/puppet/compare/6d6dcbc15b0e...139bf730eb26 automates this as a daily run, so we should have a live, accessible copy for a 24-hour RPO, and bacula will store backups for a longer period of time (TBD).

Oct 13 2021, 13:27 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John added a comment to T5877: Revise MariaDB backup strategy.

The backup ran for 14 hours before I killed it as it caused T8163.

Oct 13 2021, 10:04 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Oct 12 2021

John added a comment to T5877: Revise MariaDB backup strategy.

Currently running the above command, but over an NFS mount to dbbackup1, which is in the US. This will take significantly longer, and that is the main thing I am interested in right now.

Oct 12 2021, 20:14 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John added a comment to T5877: Revise MariaDB backup strategy.
mydumper -G -E -R -m -v 3 -t 2 -c -x "^(?!([0-9a-z]+wiki.(objectcache|querycache|querycachetwo|recentchanges|searchindex)))" -L "/home/johnflewis/$(date +"%Y%m%d%H%M%S").log" --trx-consistency-only

On db12:

  • Time taken: 103 minutes (1 hour and 43 minutes)
  • Size: 30G
Oct 12 2021, 18:53 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
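
The `-x` filter in the command above relies on a negative lookahead to skip cache-like tables. As a rough sketch of how that pattern behaves, the Python snippet below applies the same regular expression to a few made-up `database.table` names. It assumes that mydumper tests the pattern against the qualified `database.table` string, and it uses Python's `re` engine in place of PCRE, which supports the same lookahead syntax; the database and function names are hypothetical.

```python
import re

# Pattern copied verbatim from the -x option in the command above.
EXCLUDE_CACHES = re.compile(
    r"^(?!([0-9a-z]+wiki.(objectcache|querycache|querycachetwo|recentchanges|searchindex)))"
)


def would_be_dumped(qualified_table: str) -> bool:
    """Return True if a 'database.table' name passes the filter (is not excluded)."""
    return EXCLUDE_CACHES.search(qualified_table) is not None


if __name__ == "__main__":
    # All names below are invented examples, not real Miraheze databases.
    for name in (
        "examplewiki.page",
        "examplewiki.objectcache",
        "examplewiki.querycache_info",
        "mhglobal.objectcache",
    ):
        print(name, "dumped" if would_be_dumped(name) else "skipped")
```

Two things stand out in this sketch: only databases whose names end in `wiki` are affected, and because the pattern has no trailing anchor it also skips tables whose names merely begin with a listed name (such as `querycache_info`). Whether the second effect is intended is not stated in the task.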

Oct 11 2021

John added a comment to T5877: Revise MariaDB backup strategy.

Trying to optimise the dump by reducing amount of data carried over (because not everything in MediaWiki is irreplaceable!)

Oct 11 2021, 22:12 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
John claimed T5044: Setup centralised logging for services.
Oct 11 2021, 18:08 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T5044: Setup centralised logging for services.

T7740 is likely to be influenced by work done on this task.

Oct 11 2021, 18:06 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Oct 9 2021

John claimed T5877: Revise MariaDB backup strategy.
Oct 9 2021, 20:26 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Sep 28 2021

John placed T5877: Revise MariaDB backup strategy up for grabs.

De-assigned per lack of progress.

Sep 28 2021, 11:25 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Sep 21 2021

John added a comment to T5877: Revise MariaDB backup strategy.

@Southparkfan Any updates on this task? If there isn't an update provided in a week, I'll reassign the task to ensure it gets completed.

Sep 21 2021, 20:14 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Aug 10 2021

Paladox added a comment to T5044: Setup centralised logging for services.
In T5044#156437, @John wrote:

@Paladox has raised concerns with centralised-only logging. We should explore these concerns before pushing for things like nginx access logs as these are critical for debugging some traffic influx/DoS attacks.

Aug 10 2021, 16:57 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Reception123 added a comment to T5044: Setup centralised logging for services.

I agree with that. At least for some logs it's definitely useful to have logs stored locally in case something goes wrong and the logs don't get transmitted to graylog.

Aug 10 2021, 14:14 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John updated subscribers of T5044: Setup centralised logging for services.

@Paladox has raised concerns with centralised-only logging. We should explore these concerns before pushing for things like nginx access logs as these are critical for debugging some traffic influx/DoS attacks.

Aug 10 2021, 12:20 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
John added a comment to T5877: Revise MariaDB backup strategy.

Updates since last one on June 1st?

Aug 10 2021, 11:24 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Jul 31 2021

Unknown Object (User) updated subscribers of T5044: Setup centralised logging for services.
Jul 31 2021, 00:25 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) updated the task description for T5044: Setup centralised logging for services.
Jul 31 2021, 00:25 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Jul 28 2021

Excelsis added a comment to T5412: Review changes made to a wiki via Special:ManageWiki before submitting them.

This has now been deployed.

Jul 28 2021, 19:18 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) closed T5412: Review changes made to a wiki via Special:ManageWiki before submitting them as Resolved.

This has now been deployed.

Jul 28 2021, 17:45 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki

Jul 26 2021

Unknown Object (User) added a comment to T5412: Review changes made to a wiki via Special:ManageWiki before submitting them.

Currently blocked on community consensus.

Jul 26 2021, 20:40 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) added a comment to T5412: Review changes made to a wiki via Special:ManageWiki before submitting them.

https://github.com/miraheze/ManageWiki/pull/290

Jul 26 2021, 17:41 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Long Term to Goals on the Universal Omega board.
Jul 26 2021, 06:38 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Long Term to Goals on the MediaWiki (SRE) board.
Jul 26 2021, 06:38 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Backlog to Miraheze Extensions on the Goal-2021-Jul-Dec board.
Jul 26 2021, 06:38 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) added a project to T5412: Review changes made to a wiki via Special:ManageWiki before submitting them: Goal-2021-Jul-Dec.

I guess this wasn't moved over to the next goal period, so doing that.

Jul 26 2021, 06:38 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) claimed T5412: Review changes made to a wiki via Special:ManageWiki before submitting them.

I drafted a bit of JS for this, using the oojs dialogs. This should be fairly good to do, with a "review" button, next to the save button, so it does not annoy users if they don't want to review them.

Jul 26 2021, 02:51 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki

Jul 3 2021

Unknown Object (User) moved T5877: Revise MariaDB backup strategy from Backlog to Infrastructure on the Goal-2021-Jul-Dec board.
Jul 3 2021, 18:45 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Unknown Object (User) added a project to T5877: Revise MariaDB backup strategy: Goal-2021-Jul-Dec.

Moving over to the new goal period. Feel free to remove it if it shouldn't be carried over.

Jul 3 2021, 18:45 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Unknown Object (User) moved T5044: Setup centralised logging for services from Backlog to Infrastructure on the Goal-2021-Jul-Dec board.
Jul 3 2021, 18:44 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) added a project to T5044: Setup centralised logging for services: Goal-2021-Jul-Dec.

Moving over to the new goal period. Feel free to remove it if it shouldn't be carried over.

Jul 3 2021, 18:43 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Unknown Object (User) added a project to T5412: Review changes made to a wiki via Special:ManageWiki before submitting them: Universal Omega.
Jul 3 2021, 18:27 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Goals to Long Term on the MediaWiki (SRE) board.
Jul 3 2021, 18:26 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki

Jun 14 2021

Void added a comment to T5044: Setup centralised logging for services.

I could look into taking this over from @Paladox. Is there anything not on this task that I should be aware of if I do?

Jun 14 2021, 19:57 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox placed T5044: Setup centralised logging for services up for grabs.
Jun 14 2021, 18:59 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Jun 13 2021

Unknown Object (User) added a comment to T4420: Introduce stats for IncidentReports.

This went live after T7117: Upgrade to MediaWiki 1.36.0 was done.

Jun 13 2021, 02:53 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Unknown Object (User) closed T4420: Introduce stats for IncidentReports as Resolved.
Jun 13 2021, 02:51 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

Jun 1 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

The latency between db and dbbackup causes the slowness in the dump process. Moving the dbbackup VM to NL should improve the performance, but NL is much closer to UK than the US is. A disaster impacting both UK and NL is not very likely, but still...

Jun 1 2021, 19:06 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

May 27 2021

Unknown Object (User) claimed T4420: Introduce stats for IncidentReports.

https://github.com/miraheze/IncidentReporting/pull/22 should complete this.

May 27 2021, 19:28 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

May 9 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Going to decom dbbackup2 (we'll be using dbbackup1).

May 9 2021, 19:49 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

May 3 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Test backup: mydumper -G -E -R -v 3 -t 2 -c -L "/home/dbcopy/dbbackup1-mnt/$(date +"%Y%m%d%H%M%S").log" --trx-consistency-only

  • db11
    • Duration: 2095 minutes (34.9 hours)
    • Size: 14 GB
    • Tables: 204,174
  • db12
    • Duration: 1615 minutes (26.9 hours)
    • Size: 26 GB
    • Tables: 156,104
  • db13
    • Duration: 1359 minutes (22.7 hours)
    • Size: 35 GB
    • Tables: 125,530
May 3 2021, 22:42 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
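
The three results above can be compared per minute of run time. The sketch below is only a reading of the posted figures: the class and property names are made up for illustration, and 1 GB is treated as 1024 MB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRun:
    host: str
    minutes: int
    size_gb: int
    tables: int

    @property
    def tables_per_minute(self) -> float:
        return self.tables / self.minutes

    @property
    def mb_per_minute(self) -> float:
        # Assumes 1 GB = 1024 MB; the comment does not say which unit was meant.
        return self.size_gb * 1024 / self.minutes


# Figures copied from the test-backup results above.
RUNS = [
    BackupRun("db11", 2095, 14, 204_174),
    BackupRun("db12", 1615, 26, 156_104),
    BackupRun("db13", 1359, 35, 125_530),
]

if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.host}: {run.tables_per_minute:.1f} tables/min, "
            f"{run.mb_per_minute:.1f} MB/min"
        )
```

On these three data points the table rate stays within roughly 10% across hosts while the byte rate varies more than threefold, which would be consistent with per-table overhead, rather than data volume, dominating the run time. Three samples are not enough to be sure.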
Southparkfan updated the task description for T5044: Setup centralised logging for services.
May 3 2021, 17:54 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

May 2 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Running on db1{2,3,4} simultaneously:

mydumper -G -E -R -v 3 -t 2 -c -L "/home/dbcopy/dbbackup1-mnt/$(date +"%Y%m%d%H%M%S").log"

EDIT: trying again with --trx-consistency-only

May 2 2021, 18:39 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 26 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Other tests required:

  • A test with the following settings: 1) -t 4 (true core count of each virtual machine) 2) --triggers --events --routines
  • Another test, but with -t 2 (to lessen server load)
  • What happens to performance if we backup three masters simultaneously? (reason: to maximise backup consistency)
Apr 26 2021, 21:38 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.
In T5877#142347, @John wrote:

@Southparkfan updates on the above?

Sorry for the lack of response. Still working on this: 16:36:25 <+SPF|Cloud> !log https://phabricator.miraheze.org/T5877#140588: run test backup on db11 with six threads. I stopped the backup from T5877#141278 mid-way by accident.

Command: mydumper -t 6 -v 3 -c --trx-consistency-only
Start: 2021-04-24 14:36 UTC
End: 2021-04-26 04:39 UTC (38 hours)
Backup size: 14 GB

Apr 26 2021, 21:08 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
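
As a quick check of the "38 hours" figure, the elapsed time between the posted start and end timestamps can be computed directly. This is a small sketch; the constant and function names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps copied from the comment above (both UTC).
START = datetime(2021, 4, 24, 14, 36, tzinfo=timezone.utc)
END = datetime(2021, 4, 26, 4, 39, tzinfo=timezone.utc)


def elapsed_hours(start: datetime, end: datetime) -> float:
    """Return the time between two timestamps in hours."""
    return (end - start).total_seconds() / 3600


if __name__ == "__main__":
    print(f"{elapsed_hours(START, END):.2f} hours")
```

The result is 38 hours and 3 minutes, matching the rounded duration given in the comment.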

Apr 25 2021

Southparkfan closed T6984: High load on dbbackup servers, a subtask of T5877: Revise MariaDB backup strategy, as Invalid.
Apr 25 2021, 12:08 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 24 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.
In T5877#142347, @John wrote:

@Southparkfan updates on the above?

Apr 24 2021, 14:36 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 20 2021

John added a comment to T5877: Revise MariaDB backup strategy.

@Southparkfan updates on the above?

Apr 20 2021, 12:52 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 19 2021

Paladox updated the task description for T5044: Setup centralised logging for services.
Apr 19 2021, 21:44 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

There's one other log I didn't think we needed to send for proxmox (I don't think it contained any info we needed).

Apr 19 2021, 21:44 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

Added pve* logging via https://github.com/miraheze/puppet/pull/1713

Apr 19 2021, 21:44 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

I will try and finish this now (for cloud*)

Apr 19 2021, 20:48 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Apr 9 2021

Southparkfan updated subscribers of T5877: Revise MariaDB backup strategy.

Running dump from db11 to dbbackup1:/srv/backups/db11. @Paladox and I are around to monitor.

Apr 9 2021, 22:21 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 4 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

New performance test (using sshfs setup, 4 mydumper threads):

  • Uncompressed: 290 seconds
  • Compressed: 210 seconds
Apr 4 2021, 22:07 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

For reference: mydumper is superior to mysqldump due to its better performance (using multiple threads) and the flexibility (PCRE based table inclusion/exclusion) in conjunction with transaction consistency and (almost) no locking (no read-only time required during backups). However, mydumper does not support TLS in connections, so dumping must happen at the database master.

Apr 4 2021, 21:37 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Apr 3 2021

Unknown Object (User) removed a project from T5412: Review changes made to a wiki via Special:ManageWiki before submitting them: Universal Omega.
Apr 3 2021, 06:55 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) removed a project from T4420: Introduce stats for IncidentReports: Universal Omega.
Apr 3 2021, 06:53 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Unknown Object (User) placed T4420: Introduce stats for IncidentReports up for grabs.
Apr 3 2021, 06:53 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

Mar 31 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

More testing is required to determine the final backup sizes.

Mar 31 2021, 15:10 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

A maintenance window is required for dumping from masters directly. Not because impact is guaranteed, but because dumping may cause database locks for multiple seconds, hence increasing save time or knocking wikis offline.

Mar 31 2021, 14:27 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 28 2021

John changed the status of T6984: High load on dbbackup servers, a subtask of T5877: Revise MariaDB backup strategy, from Stalled to Open.
Mar 28 2021, 23:07 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan changed the status of T6984: High load on dbbackup servers, a subtask of T5877: Revise MariaDB backup strategy, from Open to Stalled.
Mar 28 2021, 22:44 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 27 2021

Unknown Object (User) moved T5105: Investigate and Implement basic Machine Learning concepts for automatic wiki creation from Backlog to Goals on the MediaWiki (SRE) board.
Mar 27 2021, 17:06 · MediaWiki (SRE), Goal-2021-Jan-Jun, Universal Omega, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun, CreateWiki

Mar 25 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

A maintenance window is required for dumping from masters directly. Not because impact is guaranteed, but because dumping may cause database locks for multiple seconds, hence increasing save time or knocking wikis offline.

Mar 25 2021, 22:08 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 21 2021

Unknown Object (User) moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Unsorted to Long Term on the Universal Omega board.
Mar 21 2021, 19:47 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki
Unknown Object (User) moved T4420: Introduce stats for IncidentReports from Unsorted to Goals on the Universal Omega board.
Mar 21 2021, 19:45 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

Mar 18 2021

Southparkfan added a comment to T5877: Revise MariaDB backup strategy.

Perhaps, it may be possible to directly dump from the masters, with very little interruption: https://stackoverflow.com/q/56715657.
In that case, we can use the RamNode VMs to store the logical dumps (mydumper to stdout | ssh - local file). The disadvantage is that we won't have a live replica at all times (if a master crashes for good, the data between <most recent backup> and <crash> will be lost), but it's much cheaper: I/O limit is not much of an issue and since data is not replicated, there is more space for storing logical dumps.

Mar 18 2021, 23:08 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Southparkfan added a subtask for T5877: Revise MariaDB backup strategy: T6984: High load on dbbackup servers.
Mar 18 2021, 13:36 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec
Unknown Object (User) added a comment to T4420: Introduce stats for IncidentReports.
In T4420#138213, @John wrote:
In T4420#138210, @John wrote:

When I try this and select ‘show number of incidents’ and ‘show all services’, all the rows turn up empty with no numbers. This is the same for visible outage and total outage.

Oh, hmm. That didn't happen to me when I was testing this. I will attach screenshots of local test shortly

If this is deployed, a local test bears no value to the point here because it’s deployed in production now.

Mar 18 2021, 00:11 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Unknown Object (User) added a comment to T4420: Introduce stats for IncidentReports.
Mar 18 2021, 00:05 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
John added a comment to T4420: Introduce stats for IncidentReports.
In T4420#138210, @John wrote:

When I try this and select ‘show number of incidents’ and ‘show all services’, all the rows turn up empty with no numbers. This is the same for visible outage and total outage.

Oh, hmm. That didn't happen to me when I was testing this. I will attach screenshots of local test shortly

Mar 18 2021, 00:05 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

Mar 17 2021

Unknown Object (User) added a comment to T4420: Introduce stats for IncidentReports.
In T4420#138210, @John wrote:

When I try this and select ‘show number of incidents’ and ‘show all services’, all the rows turn up empty with no numbers. This is the same for visible outage and total outage.

Mar 17 2021, 23:58 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
John reopened T4420: Introduce stats for IncidentReports as "Open".

When I try this and select ‘show number of incidents’ and ‘show all services’, all the rows turn up empty with no numbers. This is the same for visible outage and total outage.

Mar 17 2021, 23:50 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Unknown Object (User) added a comment to T4420: Introduce stats for IncidentReports.

Done with https://github.com/miraheze/IncidentReporting/pull/17

Mar 17 2021, 21:10 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Unknown Object (User) closed T4420: Introduce stats for IncidentReports as Resolved.
Mar 17 2021, 21:10 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting

Mar 15 2021

Paladox added a comment to T5044: Setup centralised logging for services.

Done with:

Mar 15 2021, 00:30 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Mar 14 2021

Paladox closed T6071: Set up replicas for all database clusters, a subtask of T5877: Revise MariaDB backup strategy, as Resolved.
Mar 14 2021, 23:51 · Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Database, Goal-2020-Jul-Dec

Mar 11 2021

Reception123 moved T4420: Introduce stats for IncidentReports from Backlog to Goals on the MediaWiki (SRE) board.
Mar 11 2021, 07:03 · Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, IncidentReporting
Reception123 moved T5412: Review changes made to a wiki via Special:ManageWiki before submitting them from Backlog to Goals on the MediaWiki (SRE) board.
Mar 11 2021, 07:02 · Goal-2021-Jul-Dec, Universal Omega, MediaWiki (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, ManageWiki

Mar 10 2021

John closed T5105: Investigate and Implement basic Machine Learning concepts for automatic wiki creation as Resolved.

https://github.com/miraheze/CreateWiki/pull/200 resolves this task; only a configuration setting in LS is now required to enable it.

Mar 10 2021, 16:40 · MediaWiki (SRE), Goal-2021-Jan-Jun, Universal Omega, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun, CreateWiki

Mar 9 2021

Southparkfan added a comment to T5044: Setup centralised logging for services.

We switched off syslog-ng logging on the cloud servers. Not sure if we want to switch it back on @John @Southparkfan ?

Yes, let's see if we can receive proxmox logs without further tweaking.

Mar 9 2021, 11:45 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox updated the task description for T5044: Setup centralised logging for services.
Mar 9 2021, 01:19 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun
Paladox added a comment to T5044: Setup centralised logging for services.

So I've created and merged this pull request: https://github.com/miraheze/puppet/pull/1695. Essentially, logs for puppetserver/puppetdb are now read and sent to graylog.

Mar 9 2021, 01:15 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun

Mar 8 2021

Paladox added a comment to T5044: Setup centralised logging for services.

We switched off syslog-ng logging on the cloud servers. Not sure if we want to switch it back on @John @Southparkfan ?

Mar 8 2021, 01:01 · Monitoring, Goal-2022-Jan-Jun, Goal-2021-Jul-Dec, Infrastructure (SRE), Goal-2021-Jan-Jun, Goal-2020-Jul-Dec, Goal-2020-Jan-Jun