
[Existing] Server Resource Request for prometheus
Closed, Resolved, Public

Description

Number of servers requested: N/A
Service: prometheus
Processor: N/A
Memory: Increase by 1g
Disk: N/A
Network: N/A

Justification for request: Memory usage spikes, and running a large query can cause an OOM because usage is already so close to the total available memory.

Endorsement by Engineering Manager (MediaWiki) or Site Reliability Engineer: @John

Event Timeline

Paladox renamed this task from [New/Existing] Server Resource Request for prometheus to [Existing] Server Resource Request for prometheus. Jul 29 2022, 05:11

It's OOM'ing:

Aug  9 16:07:22 prometheus131 kernel: [684331.615568] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/prometheus.service,task=prometheus,pid=797999,uid=109
Aug  9 16:07:22 prometheus131 kernel: [684331.615591] Out of memory: Killed process 797999 (prometheus) total-vm:86609720kB, anon-rss:759032kB, file-rss:0kB, shmem-rss:0kB, UID:109 pgtables:3868kB oom_score_adj:0
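
For anyone triaging a similar report, the kill line above can be pulled apart with a few lines of Python to see how much memory the process actually held when it was killed. This is only a rough sketch: the regular expression and the `parse_oom_kill` helper are illustrative names made up here, not part of any Miraheze tooling, and the pattern only assumes the field layout seen in this particular log line.

```python
import re

# Sample line copied verbatim from the kernel log in this task.
LINE = (
    "Aug  9 16:07:22 prometheus131 kernel: [684331.615591] Out of memory: "
    "Killed process 797999 (prometheus) total-vm:86609720kB, anon-rss:759032kB, "
    "file-rss:0kB, shmem-rss:0kB, UID:109 pgtables:3868kB oom_score_adj:0"
)

# Hypothetical pattern: matches only the fields visible in the line above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, command name and memory figures (as reported, in kB), or None."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "total_vm_kb": int(m["total_vm"]),
        "anon_rss_kb": int(m["anon_rss"]),
    }


info = parse_oom_kill(LINE)
# The kernel's "kB" figures are divided by 1024 here to get MiB.
print(info["comm"], info["pid"], round(info["anon_rss_kb"] / 1024), "MiB anonymous RSS")
```

For this line the resident (anon-rss) figure works out to roughly 741 MiB, while total-vm is far larger because it counts reserved virtual address space rather than memory actually in use.
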
Paladox raised the priority of this task from Normal to Unbreak Now!. Aug 9 2022, 16:13

Prometheus won't start now because it keeps OOM'ing, so I'm upping the priority.

Approved, as this is an emergency.

Paladox assigned this task to Reception123.

Resolved, thanks!

Just for the record, this had already been informally approved, as communicated on IRC a few weeks ago :)