
Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Details

Description

This tag is used to track everything related to SLOs (service level objectives): new SLOs that need to be added for services being deployed, as well as tasks collating SLO failures and review processes.

Recent Activity

Jun 16 2023

AmandaCath removed a member for SLO: Reception123.
Jun 16 2023, 18:09

May 19 2023

MacFan4000 removed a member for SLO: John.
May 19 2023, 20:00

Mar 21 2023

Void closed T10434: Infrastructure - Swift - SLO Errors Failure as Resolved.

Because the error rate in Swift dropped off sharply in late February, at about the time T10510 was closed, I'm assuming that task was both the cause and the solution.

Mar 21 2023, 01:53 · Infrastructure (SRE), Swift, SLO

Feb 15 2023

John added a comment to T10434: Infrastructure - Swift - SLO Errors Failure.

@Paladox we're halfway through February; we really need to look into this ASAP.

Feb 15 2023, 08:34 · Infrastructure (SRE), Swift, SLO

Feb 11 2023

John closed T10218: MediaWiki - JobQueue - SLO Error/Availability Failure as Resolved.

I've looked into this and the metric being used in Grafana was wildly wrong.

Feb 11 2023, 17:58 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO

Feb 4 2023

John placed T10218: MediaWiki - JobQueue - SLO Error/Availability Failure up for grabs.

For January 2023 SLO reporting, JobQueue failed the Errors SLO.

Feb 4 2023, 12:28 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John moved T10218: MediaWiki - JobQueue - SLO Error/Availability Failure from Failure Stage 1 to Failure Stage 2 on the SLO board.
Feb 4 2023, 12:26 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John triaged T10434: Infrastructure - Swift - SLO Errors Failure as Normal priority.
Feb 4 2023, 12:25 · Infrastructure (SRE), Swift, SLO

Feb 2 2023

Reception123 added a comment to T10218: MediaWiki - JobQueue - SLO Error/Availability Failure.

For January 2023 SLO reporting, JobQueue failed the Errors SLO.
The agreed Errors SLO was: 1.5%.
The error rate achieved was: 3.4%.

Feb 2 2023, 16:15 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
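To make the January JobQueue result concrete, here is a minimal sketch of how an achieved error rate can be checked against an agreed Errors SLO. The function names and the "times the error budget consumed" framing are illustrative assumptions, not Miraheze's actual reporting tooling; only the 1.5% and 3.4% figures come from the comment above.

```python
# Minimal sketch (hypothetical helpers): check an achieved error rate
# against an agreed Errors SLO, where lower error rates are better.

def meets_error_slo(achieved_error_pct: float, slo_error_pct: float) -> bool:
    """Return True if the achieved error rate is within the agreed SLO."""
    return achieved_error_pct <= slo_error_pct


def budget_overrun(achieved_error_pct: float, slo_error_pct: float) -> float:
    """Ratio of achieved errors to allowed errors (1.0 means exactly on budget)."""
    return achieved_error_pct / slo_error_pct


# January 2023 JobQueue figures from the SLO report.
slo, achieved = 1.5, 3.4
print(meets_error_slo(achieved, slo))           # False
print(round(budget_overrun(achieved, slo), 2))  # 2.27
```

In other words, JobQueue used roughly 2.3 times the error allowance the SLO permitted for the month.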

Jan 22 2023

Unknown Object (User) added a comment to T10218: MediaWiki - JobQueue - SLO Error/Availability Failure.

The availability failure was likely caused by cloud14 being down at one point, which also took down mwtask141 and the mw141/mw142 jobrunners. The availability issue does not appear to have recurred in the past 30 days.

Jan 22 2023, 22:58 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
Unknown Object (User) closed T10219: MediaWiki - MediaWiki - SLO Availability Failure as Resolved.

This was originally caused by the cloud14 outage, as every request to the wikis that were down returned a 500 status code. This was outside of our control.

Jan 22 2023, 22:37 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO
Unknown Object (User) added a comment to T10219: MediaWiki - MediaWiki - SLO Availability Failure.
In T10219#208245, @John wrote:

https://grafana.miraheze.org/d/pfjAbhf7k/mediawiki-slos?orgId=1&from=now-3h&to=now&viewPanel=22 now shows no data.

This needs to be resolved before the next round of reporting on Jan 31st.

Jan 22 2023, 20:13 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO
John added a comment to T10219: MediaWiki - MediaWiki - SLO Availability Failure.

https://grafana.miraheze.org/d/pfjAbhf7k/mediawiki-slos?orgId=1&from=now-3h&to=now&viewPanel=22 now shows no data.

Jan 22 2023, 12:33 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO

Jan 21 2023

Reception123 assigned T10219: MediaWiki - MediaWiki - SLO Availability Failure to Unknown Object (User).
Jan 21 2023, 08:54 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO
Reception123 assigned T10218: MediaWiki - JobQueue - SLO Error/Availability Failure to Unknown Object (User).
Jan 21 2023, 08:54 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO

Jan 1 2023

John closed T10216: Infrastructure - Mail - SLO Error Failure as Resolved.

This has been fixed. It was generating around 1,440 failures a day (about one a minute). To stay within the error threshold at that failure rate, we would have needed to send 144,000 emails a day, or 100 a minute. As we don't operate at anywhere near those volumes, this SLO was always going to fail.

Jan 1 2023, 23:35 · Mail, Infrastructure (SRE), SLO
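The arithmetic behind the Mail SLO closure can be sketched as follows. The comment does not state the error threshold explicitly; 1,440 / 144,000 works out to 1%, so this sketch assumes a 1% threshold. The function name is hypothetical, and integer arithmetic is used so the results are exact.

```python
# Minimal sketch: how many emails per day are needed so that a fixed
# number of daily failures stays within an error threshold.
# ASSUMPTION: the threshold is 1%, inferred from 1440 / 144000.

MINUTES_PER_DAY = 24 * 60  # 1440


def required_daily_volume(failures_per_day: int, threshold_percent: int) -> int:
    """Smallest daily volume V with failures / V <= threshold_percent / 100."""
    # Ceiling division: V >= failures * 100 / threshold_percent
    numerator = failures_per_day * 100
    return (numerator + threshold_percent - 1) // threshold_percent


failures = 1440  # roughly one failure per minute, from the comment above
daily = required_daily_volume(failures, threshold_percent=1)
print(daily)                     # 144000
print(daily // MINUTES_PER_DAY)  # 100 emails per minute
```

Because the failures were a constant background rate rather than proportional to traffic, the only way to pass the SLO would have been to send far more mail, which is why fixing the source of the failures was the actual resolution.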
John closed T10217: Infrastructure - MariaDB - SLO Availability/Error Failure as Resolved.

Availability: having reviewed this, I am certain the failure can be attributed to two things: one beyond our control, and one for which we have an open task that is blocked on MediaWiki (SRE) for resolution.

Jan 1 2023, 22:04 · Database, Infrastructure (SRE), SLO

Dec 31 2022

John moved T10216: Infrastructure - Mail - SLO Error Failure from Incoming to Short Term on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Mail, Infrastructure (SRE), SLO
John moved T10217: Infrastructure - MariaDB - SLO Availability/Error Failure from Incoming to Short Term on the Infrastructure (SRE) board.
Dec 31 2022, 17:03 · Database, Infrastructure (SRE), SLO

Dec 30 2022

John triaged T10219: MediaWiki - MediaWiki - SLO Availability Failure as Normal priority.
Dec 30 2022, 22:38 · Universal Omega, MediaWiki, MediaWiki (SRE), SLO
John triaged T10218: MediaWiki - JobQueue - SLO Error/Availability Failure as Normal priority.
Dec 30 2022, 22:37 · Infrastructure (SRE), Universal Omega, MediaWiki, SLO
John triaged T10217: Infrastructure - MariaDB - SLO Availability/Error Failure as Normal priority.
Dec 30 2022, 22:35 · Database, Infrastructure (SRE), SLO
John triaged T10216: Infrastructure - Mail - SLO Error Failure as Normal priority.
Dec 30 2022, 22:33 · Mail, Infrastructure (SRE), SLO
John set the image for SLO to F1982242: fa-tags-yellow.png.
Dec 30 2022, 20:30
John created SLO.
Dec 30 2022, 20:30