
[GOAL] Gluster -> Swift
Closed, ResolvedPublic

Description

This task tracks the goal of switching from Gluster to Swift.

  • Create a plan for migrating data.

The VMs will be set up first; then all data will be migrated, starting in numerical order, then alphabetical order.
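The ordering above can be sketched as a simple sort: purely numeric names first (in numeric order), then the rest alphabetically. The names used below are hypothetical examples, not from this task:

```python
def migration_order(names):
    """Order data sets for migration: numeric names first (sorted
    numerically), then the remaining names alphabetically.
    Illustrative sketch only."""
    numeric = sorted((n for n in names if n.isdigit()), key=int)
    alpha = sorted(n for n in names if not n.isdigit())
    return numeric + alpha

print(migration_order(["betawiki", "10", "2", "alphawiki"]))
# -> ['2', '10', 'alphawiki', 'betawiki']
```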

  • Plan resource allocation.

Base host:

  • DISK 30G

VMs:

proxy:

  • CPU 2 cores
  • RAM 4-5G
  • DISK 30G

object:

  • CPU 2 cores
  • RAM 3G
  • DISK all available space per disk (each VM will have its own HDD)

account/container:

  • CPU 2 cores
  • RAM 4-5G
  • DISK around 160G, give or take

  • Create a public notice in advance of the migration.

Event Timeline

John moved this task from Backlog to Infrastructure on the Goal-2022-Jul-Dec board.
John moved this task from Incoming to Goals on the Infrastructure (SRE) board.
Paladox triaged this task as High priority. Aug 28 2022, 15:41
John lowered the priority of this task from High to Normal. Aug 28 2022, 15:41

So we’ve decided resource wise:

Base host:

  • DISK 30G

proxy:

  • CPU 2 cores
  • RAM 4-5G
  • DISK 30G

object:

  • CPU 2 cores
  • RAM 3G
  • DISK all available space per disk (each VM will have its own HDD)

account/container:

  • CPU 2 cores
  • RAM 4-5G
  • DISK around 160G, give or take

We made the above calculations with https://platform.swiftstack.com/docs/admin/hardware.html. The RAM allocation will be tuned by trial and error.
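As a rough cross-check on the object sizing, Swift stores multiple replicas of each object (three by default), so usable capacity is the total raw disk divided by the replica count. A minimal sketch with hypothetical disk sizes (the real sizing came from the SwiftStack docs linked above):

```python
def usable_capacity_gb(disks_gb, replicas=3):
    """Usable Swift capacity: total raw disk divided by the replica
    count (Swift's default is 3). Disk sizes here are hypothetical."""
    return sum(disks_gb) / replicas

# e.g. two object VMs, each with a dedicated 1000 GB HDD:
print(usable_capacity_gb([1000, 1000]))  # roughly 666.7 GB usable
```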

We also probably want to adjust the load check for the object servers (we should have done this for Gluster), since high load is expected because of the I/O. We need to figure out what a good number is. The object servers don't seem to use much CPU, whereas the proxies do (we may have to increase them to 4 cores or even 6).
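One common way to make a load check tolerant of I/O-bound servers is to scale the warning/critical thresholds by core count, Nagios check_load style. A sketch with illustrative multipliers (not values chosen in this task):

```python
import os

def load_thresholds(warn_per_core=2.0, crit_per_core=4.0, cores=None):
    """Per-core load-average thresholds for a server where high load
    from I/O wait is expected. The multipliers are illustrative only;
    the task left the actual numbers to be determined."""
    cores = cores or os.cpu_count()
    return warn_per_core * cores, crit_per_core * cores

print(load_thresholds(cores=2))  # -> (4.0, 8.0)
```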


Fine-tuning is something we need to do once in production; fine-tuning without both a working model and real traffic is a pointless exercise, especially for monitoring and determining resources (given we've already based this on the documentation's recommendations).

In T9708#197874, @John wrote:

We also probably want to adjust the load check for the object servers (we should have done this for Gluster), since high load is expected because of the I/O. We need to figure out what a good number is. The object servers don't seem to use much CPU, whereas the proxies do (we may have to increase them to 4 cores or even 6).

Fine-tuning is something we need to do once in production; fine-tuning without both a working model and real traffic is a pointless exercise, especially for monitoring and determining resources (given we've already based this on the documentation's recommendations).

Yes, that's true. But based on just uploading, I'm seeing half the CPU used and load at around 1 or more; see https://grafana.miraheze.org/d/W9MIkA7iz/miraheze-cluster?orgId=1&var-job=node&var-node=swiftproxy111.miraheze.org&var-port=9100.

To be fair, the documentation does say that the proxy will be the most CPU-intensive component.

I'm testing uploads to see how things go and whether I need to tune anything further.

OK, this is blocked on someone having a look at cloud11; I think it may require manual intervention. UO said it keeps coming up with "logical drive not found", and it also keeps having intermittent drive issues, for example drive 7 showing as faulty and then not. Maybe the cable? Or maybe something else? @John, could you have a look please? We've spent a month on this and cannot seem to fix it, so maybe you'll know.

The below is something I got a while after, though:

image.png (1×2 px, 187 KB)

Oh, and on a plain reboot it sometimes doesn't manage to boot (it's not able to find a drive), but a cold reboot works.

(It seems this was when I pressed F12.)

It happens if you do "reset", and also:

image.png (1×2 px, 217 KB)

Unknown Object (User) added a comment. Oct 27 2022, 23:05

So just to update this task a little bit now that migration is ongoing.

From the MediaWiki side, everything should be working with it, with the sole exception of DataDump, for which generation has been temporarily disabled in order to start the migration.

  • Extension:SocialProfile and Skin:Mirage fixed for compatibility upstream, and we managed to get those fixes merged
  • ImportDump fixed for Swift support
  • CreateWiki (persistent model file) fixed for Swift support
  • MirahezeMagic fixed for Swift support (fixed and tested for creating, renaming, and deleting wikis, plus generateManageWikiBackup.php, generateMirahezeSitemap.php, and generateSitemapIndex.py)
  • Everything in config supports it
  • /static now rewrites to static.miraheze.org instead of aliasing /mnt/mediawiki-static, so it will work with Swift
  • robots.php and sitemap.php now stream from static-new so they keep working while the Swift migration is in progress
  • All new wikis are currently on Swift, and the migration is ongoing for remaining wikis
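The /static rewrite in the list above can be sketched as a simple URL mapping (the path layout below is hypothetical; the real change lives in the web-server configuration, not application code):

```python
def rewrite_static(path, backend="https://static.miraheze.org"):
    """Sketch: map /static/<rest> to the Swift-backed static domain
    instead of aliasing a local /mnt/mediawiki-static mount.
    Illustrative only; not the actual rewrite rule."""
    prefix = "/static/"
    if path.startswith(prefix):
        return backend + "/" + path[len(prefix):]
    return None  # non-/static paths are untouched

print(rewrite_static("/static/examplewiki/logo.png"))
# -> https://static.miraheze.org/examplewiki/logo.png
```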
Unknown Object (User) updated the task description. (Show Details) Oct 28 2022, 01:18

Due to the slow (SATA) disks on cloud11, we've had to change things slightly. We're migrating Gluster from cloud12 to cloud11, onto the slower disks. This has been done for gluster122; gluster121 is currently in the process. Swiftobject121 has been set up, and once gluster121 is done, swiftobject122 will be set up.

Status update: we have now fully switched to Swift. Gluster will be dismantled from the 11th of December.

This is now done.