Tag to identify any tasks that affect gluster.
Details
Wed, Jan 4
Dec 15 2022
Dec 12 2022
This is now done.
Dec 6 2022
Your wiki is private, and we have changed it so files are now genuinely private: accessing them requires going through img_auth rather than being able to fetch them anonymously over static.miraheze.org.
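For anyone unfamiliar with what that change means in practice: img_auth serves uploads through MediaWiki so read permission is checked before a file is returned, instead of the file being served directly. The snippet below is only a rough illustration of that idea in Python; the path, function name, and checks are hypothetical, not Miraheze's actual code.

```
from pathlib import Path

# Hypothetical upload root; the real path on Miraheze's servers may differ.
UPLOAD_ROOT = Path("/mnt/mediawiki-static")

def serve_private_file(wiki: str, rel_path: str, user_can_read: bool) -> tuple[int, bytes]:
    """Serve an uploaded file only if the requester may read the (private) wiki."""
    if not user_can_read:
        # Unauthorised/anonymous requests are refused instead of being served
        # directly, which is what the old static.miraheze.org path allowed.
        return 403, b"Forbidden"
    target = (UPLOAD_ROOT / wiki / rel_path).resolve()
    if UPLOAD_ROOT not in target.parents or not target.is_file():
        return 404, b"Not found"
    return 200, target.read_bytes()
```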
Status update: We have now fully switched to swift. Gluster will be dismantled from the 11th of December.
Nov 14 2022
Due to the slow (SATA) disks on cloud11 we’ve had to change things slightly: we’re migrating Gluster from cloud12 onto the slower disks on cloud11. This is done for gluster122; gluster121 is currently in progress. swiftobject121 has been set up, and once gluster121 is finished, swiftobject122 will be set up.
Oct 28 2022
Oct 27 2022
So just to update this task a little bit now that migration is ongoing.
Oct 26 2022
Oct 22 2022
Oct 18 2022
Created https://meta.miraheze.org/wiki/Special:IncidentReports/53 and am working on it; it should be published for public viewing once ready.
Oct 17 2022
This happens if you do a "reset", and also:
Oh, and on a plain reboot it sometimes fails to boot (unable to find a drive), but a cold reboot works.
OK, this is blocked on someone having a look at cloud11; I think it may require manual intervention. UO said it keeps coming up with "logical drive not found" and also keeps having issues with the drives, for example drive 7 showing as faulty and then not. Maybe the cable, or maybe something else? @John, could you have a look please? We've spent a month on this and cannot seem to fix it, so maybe you'll know.
Oct 16 2022
While MediaWiki was visibly affected, it had nothing to do with this issue; this was overallocation on Cloud Infrastructure.
Oct 4 2022
Reopening and lowering priority pending an incident report.
Given how long the visible outage/issues lasted, there should probably be an incident report.
Here are some details on the problem:
Yes, Gluster is definitely at least a part of what is causing issues.
It was said that Gluster may be causing some of the issues.
Oct 1 2022
We also probably want to adjust the load check for the object servers (we should have done this for Gluster too), since high load is expected there because of the I/O; we need to figure out what a good threshold is. The object servers don't seem to use much CPU, whereas the proxies do (we may have to increase them to 4 cores or even 6).
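As a rough illustration of the kind of tuning meant here (the multipliers, core counts, and role names below are placeholders, not decided values), the thresholds could be scaled per core and per role:

```
def load_thresholds(cores: int, per_core_warn: float, per_core_crit: float) -> tuple[float, float]:
    """Return (warning, critical) load-average thresholds for a load check."""
    return cores * per_core_warn, cores * per_core_crit

# Object servers spend a lot of time in iowait, so tolerate a higher load per
# core; proxies are CPU-bound, so keep a stricter multiplier.
print(load_thresholds(cores=4, per_core_warn=2.0, per_core_crit=4.0))  # e.g. swiftobject hosts
print(load_thresholds(cores=4, per_core_warn=1.0, per_core_crit=2.0))  # e.g. swiftproxy hosts
```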
Sep 18 2022
Aug 28 2022
We made the above calculations with https://platform.swiftstack.com/docs/admin/hardware.html. RAM will be determined on a trial-and-error basis.
So we’ve decided, resource-wise:
Aug 15 2022
That's really disappointing. I understand that incidents happen, and a lot of things are out of our control once it's started. But there were no actions identified to take in the future, which means Miraheze learned nothing from a week of partial downtime. I am now significantly more worried about the future of Miraheze than I was when I was getting 503 errors half of the time.
Aug 10 2022
Aug 9 2022
Aug 7 2022
As the best person in a position to do this.
Jun 21 2022
Was this fixed by https://github.com/miraheze/puppet/commit/1eb05e3?
Ran Puppet on both gluster101 and gluster111 and there were no issues. Not sure what caused this, but it seems resolved now.