- JobChron service
- Redis service
- php-fpm
- Memcached
- JobRunner service
- MediaWiki Rendering
- MirahezeRenewSSL
- SSL validity checks
- Reverse DNS checks
@Reception123: I'm happy to do these. Pretty sure I've caused or requested most of them in some way.
https://github.com/miraheze/puppet/pull/2446 covers the puppet part of this, but the on-wiki docs still need to be done.
Is this supposed to be done in the same style as https://meta.miraheze.org/wiki/Tech:Icinga/Base_Monitoring ? As in, what exactly needs to be documented?
What needs to be documented for each check is:
- Why the check exists/what does it monitor?
- Is an alert a bad thing?
- If it's warning/critical, how do we fix it? Does it need fixing? Does it need further investigation?
Essentially, anyone should be able to go to any alert which triggers and know what to do about it or who to notify if necessary.
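As a rough illustration of the three questions above, here is a minimal sketch of how a doc entry could be tracked for completeness. The `CheckDoc` class, its field names, and the `undocumented` helper are all hypothetical and not part of the puppet repo or the wiki; the entry contents are placeholders, not real documentation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical field names, one per question in the list above.
REQUIRED_FIELDS = ("purpose", "alert_meaning", "remediation")


@dataclass
class CheckDoc:
    """One documentation entry for a single Icinga check (illustrative only)."""
    name: str
    purpose: str = ""        # why the check exists / what it monitors
    alert_meaning: str = ""  # is an alert a bad thing?
    remediation: str = ""    # how to fix it, or who to notify

    def missing_fields(self) -> List[str]:
        """Return the required fields that are still empty or whitespace."""
        return [f for f in REQUIRED_FIELDS if not getattr(self, f).strip()]


def undocumented(docs: List[CheckDoc]) -> Dict[str, List[str]]:
    """Map each incomplete check name to the fields it still lacks."""
    return {d.name: d.missing_fields() for d in docs if d.missing_fields()}


if __name__ == "__main__":
    # Placeholder entries; the text here is not actual documentation.
    docs = [
        CheckDoc("php-fpm", purpose="TODO", alert_meaning="TODO", remediation="TODO"),
        CheckDoc("Redis service", purpose="TODO"),
    ]
    print(undocumented(docs))
```

An entry counts as complete only when all three questions have an answer, which matches the goal that anyone seeing an alert knows what to do about it.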
https://meta.miraheze.org/wiki/Tech:Icinga/MediaWiki_Monitoring created and reviewed. Thanks to @Universal_Omega for helping out with some of the sections!
On second thought, I will reopen this until the PR is merged. The doc entries on icinga are technically the main part of this task, and they aren't done until the PR is merged.