
High I/O on cloud nodes affecting GlusterFS
Open, High, Public

Description

We just had an ns report cp3 as down because load times are so high. @RhinosF1 thinks the same happened with mw10 this morning.

Event Timeline

RhinosF1 triaged this task as Unbreak Now! priority. Sun, May 2, 21:00
RhinosF1 created this task.
John added a subscriber: John.

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

In T7230#143535, @John wrote:

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

As far as we know right now, it's caused by Gluster.

Southparkfan lowered the priority of this task from Unbreak Now! to High. Edited Sun, May 2, 21:56
Southparkfan added subscribers: Paladox, Southparkfan.

@Paladox and I are investigating the possibility of the /etc/cron.d/mdadm check (on cloud nodes only) being the cause of high I/O.
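A quick way to confirm whether a check is actually running while the I/O spikes is to read each array's sync_action file in sysfs, which reads "check" while a checkarray-style redundancy check is in progress. This is a minimal sketch assuming Linux MD arrays exposed under /sys/block/md*; md_sync_actions is a hypothetical helper name, and it takes an optional root directory so it can be pointed at a copy of the tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print "<array> <sync_action>" for
# every MD array, e.g. "md0 check" while a redundancy check is running.
# $1 (optional): sysfs block directory, defaults to /sys/block.
md_sync_actions() {
    root="${1:-/sys/block}"
    for f in "$root"/md*/md/sync_action; do
        [ -r "$f" ] || continue    # glob did not match: no MD arrays here
        array=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s %s\n' "$array" "$(cat "$f")"
    done
}

# Example: md_sync_actions | grep ' check$'
```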

Southparkfan renamed this task from "Load times high enough to cause depool" to "High I/O on cloud nodes affecting GlusterFS". Sun, May 2, 21:57

Facts:

Checkarray starts at 00:57 on the first Sunday of the month:

#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft <madduck@madduck.net>
# distributed under the terms of the Artistic Licence 2.0
#

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
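The guard in that cron line can be checked against specific dates. The sketch below re-implements the same condition as a shell function: cron fires every Sunday (day-of-week 0), and the test only lets checkarray start when the day of the month is 7 or less. checkarray_would_run is a hypothetical name, and date -d assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper mirroring the cron.d/mdadm schedule:
# "57 0 * * 0" (every Sunday) plus the [ $(date +%d) -le 7 ] guard.
# $1: a date GNU date(1) can parse, e.g. 2021-05-02.
# Returns 0 if checkarray would be started on that date, 1 otherwise.
checkarray_would_run() {
    dow=$(date -d "$1" +%w)    # day of week, 0 = Sunday (same as cron)
    dom=$(date -d "$1" +%d)    # zero-padded day of month, as in the guard
    [ "$dow" -eq 0 ] && [ "$dom" -le 7 ]
}

# Example: checkarray_would_run 2021-05-02 && echo "check starts at 00:57"
```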

The same behavior has been seen on other first Sundays of the month.
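To line up graphs against earlier occurrences, it helps to list the dates on which the check would have started. A minimal sketch, again assuming GNU date; first_sundays is a hypothetical helper that prints the first Sunday of every month of a given year.

```shell
#!/bin/sh
# Hypothetical helper: print the first Sunday of each month of year $1,
# i.e. the days on which cron.d/mdadm starts checkarray at 00:57.
first_sundays() {
    year="$1"
    for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
        for day in 01 02 03 04 05 06 07; do
            # %w prints 0 for Sunday; exactly one of days 1-7 matches.
            if [ "$(date -d "$year-$month-$day" +%w)" -eq 0 ]; then
                echo "$year-$month-$day"
                break
            fi
        done
    done
}

# Example: first_sundays 2021
```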

In T7230#143535, @John wrote:

Load times for mw*, so this is MediaWiki infrastructure. Please tag tasks correctly in future.

As far as we know right now, it's caused by Gluster.

That information wasn’t known when I tagged it as MWSRE, so the tag was valid at the time. However, you have a pattern of mistagging tasks, which is beginning to become annoying.

@RhinosF1 The task is for the Infrastructure team now, but JohnLewis couldn't have known that in the first place.

The script runs with --idle, which uses ionice, so isn't it already using it?

The Debian bug report at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592149 suggests running ionice -c 3 /usr/share/mdadm/checkarray --cron --all
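Applied to the stock cron entry quoted above, that suggestion would amount to wrapping the checkarray call in ionice -c 3 (the idle I/O class). This is only a sketch of that one change, keeping the original schedule, guard and flags; whether it helps depends on the disk's I/O scheduler honouring ionice classes.

```
# Sketch: stock cron.d/mdadm entry with checkarray run in the idle I/O class.
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then ionice -c 3 /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
```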

See T7230#143573:
"However, since the deadline i/o scheduler does ignore ionice, even though the --idle argument is passed, the raid check (which is very long) will just not run with a low i/o priority…"

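Which scheduler a disk is using can be read from sysfs: /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets. A minimal sketch with hypothetical helper names (active_scheduler, honours_ionice), both taking the path to a scheduler file; it treats only cfq and bfq as honouring ionice classes, which matches the quoted comment but may not hold for every kernel version.

```shell
#!/bin/sh
# Hypothetical helpers. $1: a scheduler file such as
# /sys/block/sda/queue/scheduler, whose content looks like
# "mq-deadline kyber [bfq] none" (active scheduler in brackets).

# Print the active (bracketed) scheduler name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Succeed if the active scheduler is one assumed to apply ionice classes.
# Assumption: only cfq and bfq count; anything else (e.g. deadline) is
# treated as ignoring ionice, as the quoted comment says.
honours_ionice() {
    case "$(active_scheduler "$1")" in
        cfq|bfq) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: for f in /sys/block/sd*/queue/scheduler; do
#     echo "$f: $(active_scheduler "$f")"; done
```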
I even ran the command with ionice, and the load on the disk was still high.
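One way to put a number on the disk load with and without ionice is the io_ticks counter in /proc/diskstats (the 13th column: milliseconds the device spent doing I/O), sampled twice; the delta divided by the interval gives roughly the %util figure that iostat -x reports. A minimal sketch with hypothetical helper names; the device name and the one-second interval are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers for estimating disk utilisation from /proc/diskstats.

# Print io_ticks (column 13: ms spent doing I/Os) for device $2 in file $1.
io_ticks() {
    awk -v dev="$2" '$3 == dev { print $13 }' "$1"
}

# Print integer % utilisation from two io_ticks samples taken $3 ms apart.
util_percent() {
    echo $(( ($2 - $1) * 100 / $3 ))
}

# Sample device $1 (e.g. sda) over one second and print its % utilisation.
disk_util() {
    before=$(io_ticks /proc/diskstats "$1")
    sleep 1
    after=$(io_ticks /proc/diskstats "$1")
    util_percent "$before" "$after" 1000
}

# Example: disk_util sda
```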

Ohh, should we run it with --low?

Ah ignore the above, having looked around I don't see how we can do it then if you cannot use ionice within the script && also using it outside it too.