
db121 frequently OOMs
Open, High, Public

Description

I hesitated to make this task high priority, given that there are already so many high-priority tasks and that the OOMs aren't that frequent. However, the issue isn't stopping, and some users appear to be worried; given the recent db141 incident, it's not a good image to have another db server go down every once in a while. As far as I'm aware from looking at SAL, it OOM'd on 10 December, 22 November, 13 November, and 8 November.

Event Timeline

Reception123 created this task.

There's minimal opportunity to grow memory on db121. As far as I know, the cause is likely parsercache, which would mean the easiest fix is to reduce the amount of caching on the MW side.
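Reducing caching on the MW side would most likely mean lowering the parser cache retention window; $wgParserCacheExpireTime is the relevant core setting. A minimal sketch, with an illustrative (not recommended or tested) value:

```php
# LocalSettings.php — sketch of reducing parser cache retention.
# MediaWiki's default is 86400 seconds (24 hours); the 21600 value
# below is illustrative only, not a tuned recommendation.
$wgParserCacheExpireTime = 21600; // 6 hours
```

A shorter expiry shrinks the steady-state size of the parsercache tables at the cost of more re-parses.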

In T10117#203977, @John wrote:

There's minimal opportunity to grow memory on db121. As far as I know, the cause is likely parsercache, which would mean the easiest fix is to reduce the amount of caching on the MW side.

I have also mentioned a couple of times that the cause is parsercache.

I was thinking, however: what if we moved parsercache to, say, db141, and potentially increased memory on db141, where we have more available on cloud14? Another option I was considering is investing in a smaller db server, on cloud14 for instance, dedicated to parsercache, so that it going down would not also bring down MediaWiki wikis (a configuration sketch follows below).

I am not certain, but decreasing caching on the MW side also seems less than ideal to me; caching less just doesn't seem like the preferred route, since caching more lets us keep more and benefits overall performance.
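For the dedicated-server option, MediaWiki core already supports routing the parser cache to a separate database backend via $wgObjectCaches and SqlBagOStuff. A minimal sketch, assuming a hypothetical host dbpc1.example.net and placeholder credentials:

```php
# LocalSettings.php — sketch of a parser cache on its own DB server.
# The host and credentials below are hypothetical placeholders.
$wgObjectCaches['parsercache-db'] = [
    'class'  => SqlBagOStuff::class,
    'server' => [
        'type'     => 'mysql',
        'host'     => 'dbpc1.example.net', // hypothetical dedicated host
        'user'     => 'parsercache',       // placeholder credentials
        'password' => 'changeme',
        'dbname'   => 'parsercache',
    ],
];
# Route the parser cache to that backend instead of the main cluster.
$wgParserCacheType = 'parsercache-db';
```

With this split, an OOM on the parsercache host would only degrade performance (pages re-parse on demand) rather than take the wikis down with it.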

Once the cloud13 reboots happen, a request for a smaller db server could be considered; moving parsercache to another existing server is also an option.

Re-assigning, as the action identified above is one for the MediaWiki team and not Infrastructure.

Noting that the last OOM was on January 4.