
[ACCESS REQUEST] Expanded access for Universal Omega
Closed, Resolved
Public

Assigned To
Authored By
Unknown Object (User)
Aug 12 2022, 09:46
Referenced Files
None
Tokens
"Like" token, awarded by Reception123.
"Like" token, awarded by John.

Description

Shell name: universalomega
Requested access: Site Reliability Engineer

Rationale for access: During the recent outage (T9656), db141 has been down for over two hours at this point. While obviously I can't do anything about it right now, if I had this access I would have been able to resolve it a lot sooner, as I am the only one online. Up until now, I have done everything I possibly could to avoid this level of full access, as I originally had no interest in it. I had instead opted to request only the lower-level access I felt a need for.

However, at this point, I do feel that, in order to help prevent this, and given the lack of volunteers with the needed access, particularly in certain timezones, this is what I need to actually deal with these types of outages in the future.

I do feel I have the necessary experience for this elevation. However, I would like to note that I would likely still use it on most servers only in emergency situations. Holding the access does not mean I am suddenly going to be comfortable using it on most servers in most circumstances. This will at least be the case at first, for the first couple of weeks to months, until I feel more comfortable with it. I will also assist where needed. If any of this is an issue, this can be declined with no issues from me, though I do think it would still be good for me to be able to handle things when no one else is able to.

Also, just to note, this is to be a Site Reliability Engineer role, while remaining under MediaWiki Engineering.

  • shell
  • icinga
  • GitHub (sre team)
  • GitHub (owner)
  • phab projects/badges
  • OVH access
  • Proxmox access
  • phab admin
  • 1&1 access
  • matomo superuser
  • grafana admin
  • graylog admin

Event Timeline

Unknown Object (User) created this task. Aug 12 2022, 09:46
Unknown Object (User) moved this task from Radar to Access on the Site Reliability Engineering board. Aug 12 2022, 09:49
Unknown Object (User) added a subscriber: John. Aug 12 2022, 15:54

As usual I think it would be useful to ask a few questions:

  1. Other than emergencies, what would you mainly want to work on eventually? (e.g. Varnish, MariaDB, etc.) (still focusing on MW-related services, of course)
  2. Do you expect to continue to be as active as now in the future?
  3. How will you ensure a good relationship with the Infrastructure team, considering that some work that you'd be doing could overlap with Infrastructure work?

Unknown Object (User) added a comment. Aug 14 2022, 18:07

As usual I think it would be useful to ask a few questions:

  1. Other than emergencies, what would you mainly want to work on eventually? (e.g. Varnish, MariaDB, etc.) (still focusing on MW-related services, of course)

Well, Varnish is probably one of the services I know best, at least to a decent extent, and I would hopefully be able to work on some issues caused by Varnish that affect MediaWiki services, such as an apparent issue with the way Varnish cache splitting is done, which causes some problems with UniversalLanguageSelector. I also suspect (though I am not 100% certain) that the issue with MobileFrontend not loading mobile resources is caused by Varnish. I would hopefully (most likely in the longer term) be able to investigate this and other MediaWiki-related issues.

Another thing I would likely be able to do is handle wiki database reset or rename requests. I don't know that I would want to do those (or anything else MariaDB-related) immediately, not until I was fully comfortable with it, as the database is a highly sensitive service where a mistake could be catastrophic. I do use sql.php often and know SQL pretty well, so query-wise I would be fine, and I would probably do them eventually. While those requests are fairly rare and are handled by Reception123 in a reasonable amount of time, it would not hurt to have someone else able to handle them.

  2. Do you expect to continue to be as active as now in the future?

I do not expect to be as active as now, in the sense of actively doing work here, but I do expect to be around at least as much as I am now when I am needed, when there is an issue requiring attention, or when something else is needed. I do not mean I won't work on anything, but I also won't be actively working on things 14 (or more) hours a day as I have been in the past.

  3. How will you ensure a good relationship with the Infrastructure team, considering that some work that you'd be doing could overlap with Infrastructure work?

I think that I already work with Infrastructure sometimes. For example, I worked on getting Infrastructure the MTR reports for the recent packet loss issues, and worked with them on debugging some other issues. I have also liaised with paladox in the past for things like rebooting Memcached, to ensure there were no production issues, since it affects MediaWiki services. I would continue to do this, and would assist Infrastructure when requested and when necessary, to the best of my ability.

After reading the responses to the questions and discussing with John, we've decided to approve this request.

Unless someone else wants to, I can do the onboarding tomorrow.

I can do some of the onboarding now:

  • shell
  • icinga
  • GitHub
  • phab projects/badges
  • OVH access
  • Proxmox access
MacFan4000 updated the task description.
Unknown Object (User) updated the task description. Aug 15 2022, 02:44
Unknown Object (User) updated the task description. Aug 15 2022, 02:50
Unknown Object (User) updated the task description. Aug 15 2022, 05:33
Unknown Object (User) updated the task description. Aug 15 2022, 05:58
Unknown Object (User) closed this task as Resolved. Aug 15 2022, 06:27
Unknown Object (User) assigned this task to Reception123.
Unknown Object (User) updated the task description.
Unknown Object (User) updated the task description.