
cp7 load check flapping
Closed, Resolved, Public

Description

The load check on cp7 is flapping constantly.

@Paladox said this is due to it being on an HDD.

We shouldn't have critical notifications going off inappropriately, for multiple reasons.

If the check isn't appropriate for an HDD server, it needs to be replaced with a more appropriate one.
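A check "flaps" when its state keeps changing between consecutive runs. As a rough illustration of how a monitoring system can recognise this, the sketch below computes the percentage of state changes across a recent window of check results and applies hysteresis (a higher threshold to start flapping than to stop). Real implementations such as Icinga's weight recent changes more heavily and use configurable thresholds; the function names and threshold values here are hypothetical.

```python
from typing import Sequence

# Hypothetical thresholds, in percent of possible state changes.
# Icinga's actual algorithm and defaults differ; these are placeholders.
FLAP_HIGH = 30.0
FLAP_LOW = 25.0


def percent_state_change(states: Sequence[str]) -> float:
    """Unweighted percentage of consecutive check results that differ."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def update_flapping(states: Sequence[str], was_flapping: bool) -> bool:
    """Hysteresis: enter flapping above FLAP_HIGH, leave it below FLAP_LOW."""
    pct = percent_state_change(states)
    if was_flapping:
        return pct >= FLAP_LOW
    return pct > FLAP_HIGH
```

For example, `update_flapping(["OK", "CRITICAL", "OK", "WARNING", "OK", "OK"], False)` sees 4 changes out of 5 possible (80%) and reports flapping. The gap between the two thresholds stops a service hovering near one value from toggling in and out of the flapping state itself.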

Event Timeline

RhinosF1 triaged this task as High priority. Jun 15 2020, 21:02
RhinosF1 created this task.
Paladox lowered the priority of this task from High to Normal. Jun 15 2020, 21:21
Paladox added subscribers: John, Southparkfan.

@John / @Southparkfan thoughts?

John added a comment. Jun 16 2020, 02:15

HDD or SSD, a load warning is a warning. It should not be happening.

@John maybe you have ideas on how to resolve this?

John added a comment. Edited Jun 16 2020, 04:58

> @John maybe you have ideas on how to resolve this?

Decreasing CPU utilisation or increasing CPU allocation is the typical way to reduce load; essentially, reduce the amount of work the server has to do. This is extremely basic stuff for SRE to know, though.

@John, this seems to be caused by I/O. Increasing CPU allocation will make the check less likely to go off, but won't guarantee it stops, as load could still exceed the check's threshold.

I guess we could try upping cp6/7 to 3 cores or even 4.
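Load average counts tasks rather than CPU percentage, so the same load means less on a host with more cores. The sketch below assumes a check that divides the load average by the core count before comparing it against thresholds (the monitoring-plugins `check_load` offers a similar per-CPU option, `-r`). It shows why adding cores would quiet the alert for an unchanged load. The thresholds and function names are hypothetical, and the core counts in the example are illustrative, not cp6/cp7's actual configuration.

```python
import os
from typing import Optional

# Hypothetical per-core thresholds; the real cp7 check's values are not stated.
WARN_PER_CORE = 1.5
CRIT_PER_CORE = 2.0


def load_state(load: float, cores: int,
               warn: float = WARN_PER_CORE, crit: float = CRIT_PER_CORE) -> str:
    """Classify a load average after normalising it by the number of cores."""
    per_core = load / cores
    if per_core >= crit:
        return "CRITICAL"
    if per_core >= warn:
        return "WARNING"
    return "OK"


def current_state(cores: Optional[int] = None) -> str:
    """Evaluate this host's 1-minute load average (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_state(one_minute, cores or os.cpu_count() or 1)
```

With these values, a load of 4.5 is `CRITICAL` on 2 cores (2.25 per core) but `OK` on 4 cores (1.125 per core). As pointed out above, though, more cores only raise the bar; load caused by tasks waiting on I/O does not go away.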

John added a comment. Jun 16 2020, 15:05

So then I/O is the issue. So then workload is the issue.
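On Linux, the load average counts tasks in uninterruptible sleep (usually waiting on disk I/O) as well as runnable tasks, which is how a host with idle CPU can still report high load. A minimal way to see how much of the current activity is I/O-bound is to compare the `procs_running` and `procs_blocked` counters in `/proc/stat`. The sketch below assumes a Linux host; the function names are hypothetical.

```python
from typing import Dict


def parse_proc_counts(stat_text: str) -> Dict[str, int]:
    """Extract the procs_running and procs_blocked counters from /proc/stat text."""
    counts: Dict[str, int] = {}
    for line in stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            counts[key] = int(value)
    return counts


def blocked_share(stat_text: str) -> float:
    """Fraction of currently active tasks that are blocked waiting on I/O."""
    counts = parse_proc_counts(stat_text)
    running = counts.get("procs_running", 0)
    blocked = counts.get("procs_blocked", 0)
    total = running + blocked
    return blocked / total if total else 0.0
```

On a Linux host, `blocked_share(open("/proc/stat").read())` gives a single snapshot; sampling it several times over a minute follows the 1-minute load average more closely. A consistently high share points at storage rather than CPU as the bottleneck.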

I have moved the SHM log to RAM, so we should have reduced the I/O on most of the cache proxies. Is the result satisfactory?
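Assuming the "SHM log" here is Varnish's shared memory log (the task does not name the software), it lives in a memory-mapped file under the instance's working directory, so putting that directory on tmpfs keeps its writes off the disk. One way to confirm such a move is to check which filesystem the directory sits on. The sketch below parses `/proc/mounts` (Linux only); the function names are hypothetical.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point containing path.

    Mount points containing spaces are escaped as \\040 in /proc/mounts and
    are not decoded here.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Match whole path components, so /var/lib/varnishfoo is not inside /var/lib/varnish.
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # >= lets a later entry for the same mount point (an overmount) win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_in_ram(path: str, mounts_text: str) -> bool:
    """True if path is on a RAM-backed tmpfs filesystem."""
    return fs_type_for(path, mounts_text) == "tmpfs"
```

For example, `is_in_ram("/var/lib/varnish", open("/proc/mounts").read())` should return `True` once that directory is mounted as tmpfs. The `/var/lib/varnish` path is an assumption about where the log lives, not something stated in this task.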

> I have moved the SHM log to RAM, so we should have reduced the I/O on most of the cache proxies. Is the result satisfactory?

Let's give it 24 hours and see how busy the service history log is: https://icinga.miraheze.org/monitoring/service/history?host=cp7&service=cp7%20Current%20Load

RhinosF1 closed this task as Resolved. Jun 18 2020, 17:07

Looks fine now.