[Access Request] Operations for NDKilla
Closed, Resolved · Public

Description

I'd like to request Operations access. I already have root on mw* and I'm a cache admin. I'd like to expand my work (especially with DNS and parsoid config).

Below is a list of services and related comments (copied from here):

List of services run by Miraheze in production (correct as of 8/1/2016 and sourced from manifests/site.pp).

Next to each service, comment on your understanding, whether you feel comfortable enough handling the service and whether or not you want to get involved more with the service (through pairing, learning, redesigning etc.).

  • Bacula - LEARN
    • No knowledge besides what is on Meta Wiki. I tried to create more user-friendly documentation for our backup procedures. I'd like to know how to fetch backups and use them if needed, but do not plan on interfering with or changing the process at all.
      • That's fine. Bacula documentation is really basic, I need to expand on the restore command (how you fetch backups).
  • Varnish - WORK
    • Slight knowledge of how the software works. I'm currently a cache admin and know enough to purge objects from the cache or bypass the cache myself (debug header).
      • This'll be SPF area but some knowledge with the desire to expand - sounds good.
  • NFS - LEARN
    • Knowledge of where it's mounted on our MediaWiki servers. Understanding how to create directories there (as www-data) for extensions etc, not sure what software is used or why really.
      • The software is NFS ;) but this is something we really want to kill, so knowledge of this isn't any sort of deal breaker or maker - just knowing how to start the service if it's down is good enough for me.
  • MariaDB - WORK
    • Somewhat more extensive knowledge here. I use MediaWiki maintenance scripts and run direct queries through sql.php. I tweaked the createwiki query to figure out stats for approved requests only.
      • Practical usage good - operational usage, none? This is fine as well, MariaDB is mostly tweaked to a good degree and we don't even notice it's running half the time as it's perhaps our 'best performing software'.
  • Icinga - LEARN
    • Pretty basic knowledge of how the software operates. I probably couldn't set it up but I find the monitoring extremely useful, and I read several pages of the documentation learning about the difference between active/passive checks, service/host checks, and what 'volatile services' are. If added to operations I'd add myself to the notifications.
      • Icinga is one of the 'one-time set up' things anyway. Just knowing how to add and remove service checks and hosts is really all that is necessary.
  • Mail - NO
    • I have a mailbox instead of an alias but have no knowledge of how it was set up or how to do anything server-wise. Just using the mailbox.
      • "<SPF|Cloud> Mail is complicated" is all I need to say in response to this.
  • DNS - WORK
    • Not very much knowledge about the software behind this, but I have made commits to DNS. I could probably set up DNS for most new sites, including setting up SSL for wikis with custom domains (I've handled several tasks related to this).
      • Good.
  • Phabricator - WORK/LEARN?
    • Basic knowledge. I had revi help set up an installation on a 1 GB VPS I had before getting rid of that DO droplet. Much more knowledge for on-phab administrator usage etc than server knowledge. I could start/stop daemons but wouldn't feel comfortable upgrading it or performing any other maintenance (this might change).
      • Sounds okay.
  • Redis - LEARN
    • I understand its purpose (mainly session data) and what happens if the service fails, but no knowledge of its setup or how it operates.
      • Acknowledged.
  • Ganglia - LEARN?
    • Really not much knowledge of this at all. I just googled it and I actually have no idea what we use it for.
      • It's just server aggregation of internal data (network usage, disk space, CPU, load). It's useful when something is acting really strangely, as Ganglia can give you the picture of all the stats over time and compared to other servers right now.
  • Piwik - NO
    • I know this is related to analytics but I have no idea what information it collects or what it's used for.
      • In all defense, neither do I :P It's just the open source version of Google Analytics I guess.
  • MediaWiki - WORK
    • As a MediaWiki root I have extensive knowledge of this. I have repeatedly performed clean installs of MediaWiki, and have extensive knowledge of maintenance scripts. Not a very skilled PHP coder (like 0) so can't really review much, but I understand how it operates. I've only performed one or two upgrades (namely from 1.24 -> 1.25) and would have to be cautious doing anything like this (probably with depooling and later reverting like SPF).
      • This is null by default as he already has root on mw*.
  • Parsoid - WORK
    • Limited knowledge, but I did manage to get a Parsoid server set up on my Linux machine at work and installed (and got working) VisualEditor. I know enough to add MediaWiki requirements for VE on Miraheze and custom domains.
      • This is mostly what this entails. The configuration is fairly static.
  • Puppet - WORK/LEARN?
    • Limited knowledge. I understand the different repositories and how they are used. I understand the basic layout of the modules inside Puppet, including how to change certain settings, difference between ensure/absent, etc. Submitted a change to the git::clone module.
    • Not really sure how to puppetize new files though.
      • This is a hard requirement but puppet is not overly first-timer friendly. Could be mitigated by some basic tutorials. SPF learned puppet by doing what he does now, and he's writing modules for it! Though he mostly defers the complex stuff to me still (e.g. https://github.com/miraheze/puppet/pull/107 )
  • nginx - WORK
    • Basic knowledge as I just used Apache on all my VPSes. I know enough to modify existing config files, perform a config test, and restart the service.
      • Unsure about nginx v apache. I think we'd want to deprecate one for the other.
  • apache - (see above:?)
    • Slightly more knowledgeable here. Had it running on several VPSes (and currently still do, https://forums.publictestwiki.com ), I had SSL working (not on that domain) but besides SSL not sure what I could do. Don't know much about advanced config or optimization.
      • See above.
  • base OS work (networking, server reinstalls etc.) - WORK
    • Fairly extensive knowledge of Ubuntu 14.04 (haven't checked out 16.* yet) and Debian 8.3 (note that I haven't upgraded my VPSes past 8.3 for whatever reason). Pretty sure there are no major differences (for the end user anyways).
      • Seems okay.

Fun statistics

  • Commits to mw-config: 180
  • Commits to puppet: 34
  • Commits to DNS: 2

WORK - would work with if access was granted; not really much to elaborate on except perhaps work practices, requirements of standards, and what to do in "nuclear" situations.

  • Varnish
  • MariaDB
  • DNS
    • Would just require a walk through of the standards, what to do in the nuclear situation.
  • MediaWiki
  • Parsoid
  • nginx
  • apache
  • base OS work
    • This would require a walk through of how installs happen.

LEARN - would probably need to follow the pairing system and work with either John or SPF on the area of infrastructure to get this up to a WORK level.

  • Bacula
    • Would like to learn how to restore backups.
      • Needs better docs.
  • NFS
    • Would like to know more about the general usage of the software. I know what it's used for and where it's mounted on mw* but I don't know where the original file location is or how to access it (I believe it's somewhere on cp1).
      • cp1:/srv/mediawiki-static.
  • Icinga
    • Would just like to know how to add/remove services/hosts. I could probably read the documentation myself for this and look at the icinga module.
      • Pair with John.
  • Phabricator
    • Would just like to know if/when/why daemons are restarted and how to upgrade (probably with another ops on hand (@John) during upgrades).
  • Redis
    • Again I know what the service does in a general sense (sessions) but I'd like to know where it's at (misc2?) and how to fix it if it breaks (or the node goes down).
      • Pair with SPF probably, he seems to be doing a lot with this lately.
  • Ganglia
    • I guess I'd just like to know how to add hosts to be monitored.
      • Puppet does it.
  • Puppet
    • Most critical. Going hand in hand with 'base OS work', I'd need to know how to set up a new server and make it work with puppet.
    • Additionally, I understand the basic layout of ensure => present and ensure => running for packages and services, but don't know where to put generic package installs (like mtr, that aren't specific to a machine).
      • Pair with John. Something I'd think is best taken in an introductory phase.
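
For illustration, the ensure => present and ensure => running pattern mentioned above looks roughly like this in a Puppet manifest. This is only a minimal sketch: the class name base::packages is hypothetical and is not taken from the Miraheze puppet repository, mtr is the package named above, and nginx is used purely as an example service.

```puppet
# Hypothetical class for tools wanted on every machine.
# The class name is an assumption, not the real repository layout.
class base::packages {
    # Install a generic package that is not tied to one host.
    package { 'mtr':
        ensure => present,
    }
}

# Keep a service running and start it at boot.
service { 'nginx':
    ensure => running,
    enable => true,
}
```

Where such a class actually belongs in the real repository is exactly the kind of question the pairing sessions would need to answer.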

NO - well, you can lead a horse to water but you can't make it drink.

  • Mail
  • Piwik
    • I'd be willing to learn about this but not sure what good it would offer :)
      • Pair with SPF.

base OS work

Event Timeline

I know I don't have any "vote" in this but I know that NDKilla is a very good sysadmin. Being in Operations will have advantages and he will be able to do more.

John triaged this task as Normal priority. Aug 1 2016, 17:02
John added subscribers: Corey, labster.

Fixed some Phabricator markup.

As I'm only a mw-admin, I'll abstain from voting. However, we do need more operations admins and NDKilla definitely has the capacity to learn what's needed to be successful in operations.

Macro southparkfan-approves:
After talking on IRC, approved.

All looks good.

Approval from me, I'll go about with implementing and working with @NDKilla for a few days to make sure he fits in well.

  • Gave ops (root on all servers)
  • Made Phab administrator
  • Added to Icinga monitoring
  • Added to mail aliases
  • Created RamNode/Backupsy Account.

TODO:

  • Give SolusVM access.
  • Talk through private git

Private git talked through.

This is now resolved.