Setup centralised logging for services
Open, NormalPublic

Description

Currently, access/audit and error logs for all of our services are stored locally, which makes exploring logs harder. Miraheze should aim for a centralised (probably ELK, though there are other options) logging stack providing role-based access control.

Rollout (last update: 2021-01-01 19:59 UTC) status:

  • bacula2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @John to verify all bacula logs are sent to syslog instead of local files
  • cloud3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @Paladox to verify all proxmox logs are sent to syslog instead of local files
  • cloud4.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @Paladox to verify all proxmox logs are sent to syslog instead of local files
  • cloud5.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @Paladox to verify all proxmox logs are sent to syslog instead of local files
  • cp3.miraheze.org: NOT done yet @Southparkfan
  • cp10.miraheze.org: NOT done yet @Southparkfan
  • cp11.miraheze.org: NOT done yet @Southparkfan
  • cp12.miraheze.org: NOT done yet @Southparkfan
  • db11.miraheze.org: syslog daemon present and logging, MariaDB logging NOT DONE yet @Southparkfan
  • db12.miraheze.org: syslog daemon present and logging, MariaDB logging NOT DONE yet @Southparkfan
  • db13.miraheze.org: syslog daemon present and logging, MariaDB logging NOT DONE yet @Southparkfan
  • gluster3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @Paladox to verify all gluster logs are sent to syslog instead of local files
  • gluster4.miraheze.org: syslog daemon present and remote logging (graylog) enabled, needs @Paladox to verify all gluster logs are sent to syslog instead of local files
  • graylog2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, graylog internal logging not checked yet (@Paladox)
  • jobrunner3.miraheze.org: jobrunner/jobchron logging DONE, php-fpm logging DONE, cron logs NOT DONE yet @Southparkfan
  • jobrunner4.miraheze.org: jobrunner/jobchron logging DONE, php-fpm logging DONE, cron logs NOT DONE yet @Southparkfan
  • ldap2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, slapd logs seem to be fine
  • mail2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, postfix/dovecot/roundcubemail logs are sent to syslog
  • mon2.miraheze.org: NOT done yet, dependency check for icinga logs (i.e. are local logs needed for the icinga-miraheze IRC bot?)
  • mw8.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, php-fpm logging DONE
  • mw9.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, php-fpm logging DONE
  • mw10.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, php-fpm logging DONE
  • mw11.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, php-fpm logging DONE
  • ns1.miraheze.org: syslog daemon present and remote logging (graylog) enabled, GDNSD logging done, confirmed working
  • ns2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, GDNSD logging done, confirmed working
  • phab2.miraheze.org: syslog daemon present and remote logging (graylog) enabled, phd logs go to syslog
  • puppet3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, probably still a lot of puppet daemons logging to local files (@Paladox)
  • rdb3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, redis logging done, confirmed working
  • rdb4.miraheze.org: syslog daemon present and remote logging (graylog) enabled, redis logging done, confirmed working
  • services3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, citoid/proton/restbase/electron logging DONE
  • services4.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, citoid/proton/restbase/electron logging DONE
  • test3.miraheze.org: syslog daemon present and remote logging (graylog) enabled, nginx logging DONE, php-fpm logging DONE

Event Timeline

Paladox triaged this task as Normal priority. Dec 31 2019, 16:02
Southparkfan renamed this task from "Setup logstash and send the logs to it" to "Setup centralised logging for services". Jan 29 2020, 23:34
Southparkfan updated the task description.

Conversation from the staff channel is:

@Zppix brought up that the WMF uses logstash; @Southparkfan brought up graylog. Based on reading up, graylog takes less to install than the logstash setup.

The logstash setup is made for big data, whereas graylog is made for logs.

Here's one source: https://medium.com/@logicify/advantages-of-graylog-grafana-compared-to-elk-stack-a7c86d58bc2c

Some questions we can use for logstash vs graylog:

<+SPF|Cloud> what are the differences between solutions X and Y? what are the pros and cons of both? does either solution lack something we consider to be critical? what are the recommendations of people on internet in discussions comparing both solutions?

<+SPF|Cloud> is one solution more secure than the other (support for transit and at-rest encryption? security track record)? are there performance differences (requires less resources to do the same thing)? how easy is it to setup new log sources? is it to be expected that one of the solutions requires less maintenance than the other?

<+SPF|Cloud> this list is not exhaustive, you may find some questions to be irrelevant or missing (use your own judgement) - but they allow you to make a good comparison between these solutions

Note from staff channel: ELK stack vs graylog (though both use Elasticsearch).

I've deployed the new logging infrastructure to jobrunner[12] and mw[45].

Sent an email to the team notifying of the deployment to mw[45].

I've deployed the new logging infrastructure to mw[67]

@Paladox are you able to look over the ones that SPF has marked for you to review, please?

mon1 marked as done. Icinga logs need to stay local for the IRC bots; however, I set up icinga logs to go to graylog separately under T6798.

Quite a few actions are blocked on you.

Puppet-agent logs to syslog in addition to logging to a file.

See application_name:"puppet-agent"

Also, cron seems to log as application_name:"CRON".

So we only need to parse puppetdb/puppetserver files.

We can follow https://puppet.com/docs/puppet/7.4/server/config_logging_advanced.html for puppetserver (including its access logs). We can probably do the same for puppetdb (as it also uses logback).

We switched off syslog-ng logging on the cloud servers. Not sure if we want to switch it back on @John @Southparkfan ?

So I've created and merged this pull request: https://github.com/miraheze/puppet/pull/1695. Essentially, logs for puppetserver/puppetdb are now read and sent to graylog.

I've created 3 new streams:

https://graylog.miraheze.org/streams/6046c9f7259fd27d6737dc89/search

https://graylog.miraheze.org/streams/6046c9c4259fd27d6737dc3d/search

https://graylog.miraheze.org/streams/6046c9de259fd27d6737dc63/search

But we have an issue now. Syslog-ng seems to have a problem when the file is rotated. So we have some choices:

  • Have a cron job that restarts syslog-ng at midnight.
  • Use rsyslog, which doesn't appear to have this issue?
  • Don't rotate the logs on the puppetserver/puppetdb side, but then we'll likely have this issue with other services when we try to load their logs (if they rotate). Gluster rotates its logs, and that cannot be disabled.

@Southparkfan

I was reading up on this, and it says you have to restart: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide/86. Granted, that's if you are using the software's built-in log rotation, but it seems similar to the issue I'm having, where I have to restart syslog-ng to get it to read the logs.
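For anyone following along, the general mechanism behind this kind of symptom can be sketched in a few lines of Python. This is only an illustration of how a reader that keeps a file open behaves on a POSIX system when the file is rotated by rename; it is not a description of syslog-ng's internals, and the file name and the `rotated` helper are made up for the example.

```python
import os
import tempfile


def rotated(path, handle):
    """Return True if `path` no longer refers to the file behind `handle`."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True  # rotated away and not recreated yet
    opened = os.fstat(handle.fileno())
    return (on_disk.st_ino, on_disk.st_dev) != (opened.st_ino, opened.st_dev)


def demo_reader():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "example.log")  # hypothetical file name

    with open(path, "w") as f:
        f.write("line 1\n")

    reader = open(path)
    seen = [reader.readline()]

    # Rotate by rename, then let the service create a fresh file at the old path.
    os.rename(path, path + ".1")
    with open(path, "w") as f:
        f.write("line 2\n")

    stale_read = reader.readline()       # handle still points at example.log.1
    was_rotated = rotated(path, reader)  # the path now names a different file

    # Reopening by path is roughly what a reload or restart achieves.
    reader.close()
    reader = open(path)
    seen.append(reader.readline())
    reader.close()
    return seen, stale_read, was_rotated
```

The reader only sees "line 2" after it reopens the path, which matches the "have to restart it to get it to read logs" behaviour described above.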

We switched off syslog-ng logging on the cloud servers. Not sure if we want to switch it back on @John @Southparkfan ?

Yes, let's see if we can receive proxmox logs without further tweaking.

@Southparkfan

I was reading up on this, and it says you have to restart: https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/administration-guide/86. Granted, that's if you are using the software's built-in log rotation, but it seems similar to the issue I'm having, where I have to restart syslog-ng to get it to read the logs.

You can invoke postrotate scripts in logrotate. For example, for ufw.log we do the following on puppet3:

{
        rotate 4
        weekly
        missingok
        notifempty
        compress
        delaycompress
        sharedscripts
        postrotate
                invoke-rc.d rsyslog rotate >/dev/null 2>&1 || true
        endscript
}

After each file rotation, a SIGHUP signal is sent to rsyslog in order to reopen the log file; otherwise rsyslog won't deal nicely with the rotation. In the past, we have had issues with mariadb keeping old slow log file descriptors open, so deleted files were still 'present' and their disk space was still reported as in use.
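The writer side of this can be shown with a small Python sketch, assuming a POSIX filesystem. It is a stand-in for what a daemon does, not rsyslog or mariadb code: the close-and-reopen step plays the role of the SIGHUP handler, and the file name is hypothetical.

```python
import os
import tempfile


def demo_writer():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "service.log")  # hypothetical log file
    rotated_path = path + ".1"

    writer = open(path, "a")
    writer.write("before rotation\n")
    writer.flush()

    # logrotate renames the file; the open descriptor follows the rename.
    os.rename(path, rotated_path)
    writer.write("after rotation\n")
    writer.flush()
    with open(rotated_path) as f:
        rotated_lines = f.read().splitlines()

    # Deleting the rotated file does not free its space while it is still open.
    os.unlink(rotated_path)
    info = os.fstat(writer.fileno())
    held = (info.st_nlink, info.st_size)  # no directory entry, but data remains

    # Stand-in for the SIGHUP handler: close the old descriptor, reopen by path.
    writer.close()
    writer = open(path, "a")
    writer.write("after reopen\n")
    writer.close()
    with open(path) as f:
        fresh_lines = f.read().splitlines()
    return rotated_lines, held, fresh_lines
```

Until the reopen happens, new lines land in the rotated file, and a deleted file keeps its size with a link count of zero, which is the "claimed but unused" disk space effect.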

The reload command of syslog-ng's systemd unit ensures a SIGHUP signal is sent to syslog-ng, which will reopen the source file. Changing the postrotate script to:

postrotate
        invoke-rc.d syslog-ng reload >/dev/null 2>&1 || true
endscript

should fix your issue, but you'll need to add it to every logrotate configuration file.
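Since the same postrotate hook has to be repeated per file, one way to keep it consistent is to render the stanzas from a single template. The sketch below is only an illustration in Python (in practice this would more likely be a Puppet template); the options are copied from the ufw.log example above, and the function names and log paths are hypothetical.

```python
# Hook taken from the postrotate snippet above.
POSTROTATE = "invoke-rc.d syslog-ng reload >/dev/null 2>&1 || true"

# Options copied from the ufw.log example; adjust per service as needed.
OPTIONS = [
    "rotate 4",
    "weekly",
    "missingok",
    "notifempty",
    "compress",
    "delaycompress",
    "sharedscripts",
]


def render_stanza(log_path):
    """Render one logrotate stanza that reloads syslog-ng after rotation."""
    lines = [log_path + " {"]
    lines += ["        " + option for option in OPTIONS]
    lines += [
        "        postrotate",
        "                " + POSTROTATE,
        "        endscript",
        "}",
    ]
    return "\n".join(lines) + "\n"


def render_all(log_paths):
    """Render stanzas for several log files, separated by blank lines."""
    return "\n".join(render_stanza(p) for p in log_paths)


if __name__ == "__main__":
    # Placeholder paths, not the real locations on any server.
    print(render_all(["/var/log/example/app.log", "/var/log/example/access.log"]))
```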

I will try and finish this now (for cloud*)

Added pve* logging via https://github.com/miraheze/puppet/pull/1713

Needs to be fixed to parse the timestamp, but other than that this is done (cloud*).

There's one other log I didn't think we needed to send for proxmox (there wasn't really any info in it that we needed, I think).

I could look into taking this over from @Paladox. Is there anything not on this task that I should be aware of if I do?

Moving over to the new goal period. Feel free to remove it if it shouldn't be moved over.