
Puppet cannot remount GlusterFS mount if directory exists
Closed, Resolved (Public)

Description

'glusterfs' OOM'd on a server earlier today:

Apr 13 23:03:57 mw9 kernel: Out of memory: Kill process 20373 (glusterfs) score 82 or sacrifice child

Puppet cannot properly remount GlusterFS:

Apr 13 23:13:20 mw9 puppet-agent[21626]: (/Stage[main]/Role::Mediawiki/Gluster::Mount[/mnt/mediawiki-static]/Exec[/mnt/mediawiki-static]/returns) /bin/mkdir: cannot create directory ‘/mnt/mediawiki-static’: File exists
Apr 13 23:13:20 mw9 puppet-agent[21626]: '/bin/mkdir -p '/mnt/mediawiki-static'' returned 1 instead of one of [0]
Apr 13 23:13:20 mw9 puppet-agent[21626]: (/Stage[main]/Role::Mediawiki/Gluster::Mount[/mnt/mediawiki-static]/Exec[/mnt/mediawiki-static]/returns) change from 'notrun' to ['0'] failed: '/bin/mkdir -p '/mnt/mediawiki-static'' returned 1 instead of one of [0] (corrective)
Apr 13 23:13:22 mw9 puppet-agent[21626]: (/Stage[main]/Role::Mediawiki/Gluster::Mount[/mnt/mediawiki-static]/Mount[/mnt/mediawiki-static]) Dependency Exec[/mnt/mediawiki-static] has failure
Apr 13 23:13:22 mw9 puppet-agent[21626]: (/Stage[main]/Role::Mediawiki/Gluster::Mount[/mnt/mediawiki-static]/Mount[/mnt/mediawiki-static]) Skipping because of failed dependencies
Apr 13 23:13:22 mw9 puppet-agent[21626]: (Stage[main]) Unscheduling all events on Stage[main]

A umount -l /mnt/mediawiki-static fixed the situation:

Apr 13 23:15:59 mw9 systemd[15896]: mnt-mediawiki\x2dstatic.mount: Succeeded.
Apr 13 23:15:59 mw9 systemd[1]: mnt-mediawiki\x2dstatic.mount: Succeeded.
Apr 13 23:15:59 mw9 systemd[24940]: mnt-mediawiki\x2dstatic.mount: Succeeded.
Apr 13 23:15:59 mw9 systemd[1]: mnt-mediawiki\x2dstatic.automount: Got automount request for /mnt/mediawiki-static, triggered by 894 (nginx)
Apr 13 23:15:59 mw9 systemd[1]: Mounting /mnt/mediawiki-static...

As long as the OOM is a one-off incident, I am not very concerned, but services must self-heal after failures, and that didn't happen here. The -p flag should make mkdir succeed when the directory already exists, yet it still fails with 'File exists'. In the Puppet tree we run mkdir manually through an Exec; can't we change this to file { '/mnt/mediawiki-static': ensure => directory, <put other parameters here> }?
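
A minimal sketch of what such a change might look like, assuming the mount point only needs to exist as a directory and that a lazy unmount is an acceptable recovery step (it is what fixed the situation by hand above). The exec title and the guard command are hypothetical and are not taken from the actual gluster::mount define; whether a file resource copes with a dead FUSE mount any better than mkdir -p is untested, which is why the guard runs first.

```puppet
# Sketch only: resource titles and the onlyif guard are assumptions,
# not the contents of the real gluster::mount define.

# Lazily unmount the path when it is still listed in /proc/mounts but
# can no longer be stat'ed (e.g. after the glusterfs client was killed).
exec { 'unmount-stale-mediawiki-static':
  command  => 'umount -l /mnt/mediawiki-static',
  onlyif   => 'grep -qs " /mnt/mediawiki-static " /proc/mounts && ! stat /mnt/mediawiki-static > /dev/null 2>&1',
  path     => ['/bin', '/usr/bin', '/sbin', '/usr/sbin'],
  provider => shell,
  before   => File['/mnt/mediawiki-static'],
}

# Replace the manual mkdir Exec with a native file resource.
file { '/mnt/mediawiki-static':
  ensure => directory,
  # <put other parameters here>
}
```

On a healthy mount the guard's stat succeeds, so the exec is skipped and the run stays idempotent. Note that a file resource on a mount point manages whatever is mounted there at the time, so owner or mode parameters would need care.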

Event Timeline

John triaged this task as Low priority.
John added a project: Puppet.
John subscribed.

@Paladox are you okay to have a look at this?

As long as the OOM is a one-off incident, I am not very concerned

Search for "remount" in SAL or check the Icinga history. It doesn't happen often, but it does unmount every so often.

In T7134#141593, @John wrote:

@Paladox are you okay to have a look at this?

Yes.