Install elasticsearch
Closed, ResolvedPublic

Description

db4.miraheze.org is currently using 341G (333G for /srv/mariadb) out of the available 377G, which results in 96% usage, far beyond Icinga's CRITICAL threshold for disk usage.

db4.miraheze.org already costs about half of our budget, very close to $1k/year, so this situation is not sustainable. We are working on reducing the number of empty databases and our schema size, but there is one thing responsible for lots of data usage: MediaWiki's search functionality.

All searchindex.ibd files (the actual content of the searchindex table) take up 13.84GB and all FTS_*.ibd files (see StackExchange) take up 87.30GB, totalling over 101GB. 30.3% of our used space is going to those stupid InnoDB FULLTEXT search indexes, which still don't fully support MediaWiki.
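For anyone wanting to reproduce these numbers, here is a minimal sketch of how the two file groups could be totalled. It assumes the files sit somewhere under the MariaDB data directory (/srv/mariadb per the description); the function name `search_index_bytes` is made up for illustration, and whether the figures above were measured in GB or GiB is not stated, so the unit in the printout is an assumption too.

```python
import fnmatch
import os


def search_index_bytes(datadir):
    """Sum on-disk sizes of searchindex.ibd and FTS_*.ibd files under datadir."""
    totals = {"searchindex": 0, "fts": 0}
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            path = os.path.join(root, name)
            if name == "searchindex.ibd":
                totals["searchindex"] += os.path.getsize(path)
            elif fnmatch.fnmatchcase(name, "FTS_*.ibd"):
                totals["fts"] += os.path.getsize(path)
    return totals


if __name__ == "__main__":
    # /srv/mariadb is the data directory named in the task description.
    sizes = search_index_bytes("/srv/mariadb")
    for kind, size in sizes.items():
        # Assumption: reporting in GiB (1024**3 bytes).
        print(f"{kind}: {size / 1024**3:.2f} GiB")
```

This only reads file sizes, so it should be safe to run on a live database host, though it needs read access to the data directory.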

Buying another 200GB VDS for $50/mo just to host those search indexes seems like a giant waste of money. Buying one 1024MB CVZ (200GB disk, $10/mo) and installing Elasticsearch on it would reduce disk usage on db4 to a mere 64%, improve MediaWiki search, stop us wasting money on more space and give us more headroom to work on long-term projects such as utilising InnoDB compression.
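As a quick sanity check, the 64% figure can be reproduced from the numbers in this description. This is only a sketch of one way to arrive at it: it assumes all space held by the search index files is freed once they are dropped, and that the percentage is taken against the 377G disk.

```python
# Figures quoted in the task description (all in GB).
USED_TOTAL = 341.0     # total used on db4
USED_MARIADB = 333.0   # of which /srv/mariadb
DISK_SIZE = 377.0      # available disk
SEARCHINDEX = 13.84    # all searchindex.ibd files
FTS = 87.30            # all FTS_*.ibd files

search_total = SEARCHINDEX + FTS
share_of_mariadb = search_total / USED_MARIADB

# Assumption: dropping the search tables frees all of their space.
projected_usage = (USED_TOTAL - search_total) / DISK_SIZE

print(f"search indexes: {search_total:.2f} GB")
print(f"share of /srv/mariadb: {share_of_mariadb:.0%}")
print(f"projected usage without them: {projected_usage:.0%}")
```

The projected usage comes out at 64% after rounding, and the search indexes account for roughly 30% of /srv/mariadb.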

Event Timeline

Southparkfan created this task.

Why don't we already use Elasticsearch? Is it intensive on some other resource? I've installed it and use it on the wiki at work, but that wiki gets basically no traffic; I just wanted better search queries.

John lowered the priority of this task from High to Low. Feb 7 2019, 19:42
Paladox raised the priority of this task from Low to High. Feb 26 2019, 12:44

Per chat with @Southparkfan.

We would like to do this shortly to increase disk space on db4. Objections, anyone?

Give it a shot. The current searches don't work well, either.

This is really critical now (we missed db4 running out of space).

Southparkfan renamed this task from Evaluate Elasticsearch for search functionality to Install elasticsearch. Mar 1 2019, 00:24
Southparkfan claimed this task.
Southparkfan raised the priority of this task from High to Unbreak Now!.

Time to do this right now.

Any progress? Since this is UBN.

John lowered the priority of this task from Unbreak Now! to High. Mar 2 2019, 11:49
John subscribed.

Yes, not possible atm due to upstream resources + planning alternatives.

elasticsearch1.miraheze.org has been bought and test1wiki is actually using CirrusSearch now. We are manually migrating a few wikis now for testing.

Steps for migration (don't change the order!):

  1. Set wmgUseCirrusSearch and wmgDisableSearchUpdate to true for the wiki
  2. Run php /srv/mediawiki/w/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki <wiki>
  3. Set wmgDisableSearchUpdate to false
  4. Run php /srv/mediawiki/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip --wiki <wiki>
  5. Run php /srv/mediawiki/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse --wiki <wiki>
  6. Finally, set wmgSearchType to true for the wiki
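To make the ordering harder to get wrong, the steps above could be written down as a dry-run checklist generator. This is a sketch only: it prints the steps for one wiki and executes nothing. The helper names `migration_plan` and `print_plan` are made up for illustration, and the config steps are left as plain descriptions because how the wmg* settings get changed is not covered here. The script paths and flags are copied from the list above.

```python
import shlex

MAINT = "/srv/mediawiki/w/extensions/CirrusSearch/maintenance"


def migration_plan(wiki):
    """Return the six migration steps for one wiki, in the required order.

    Config steps are ("config", description); command steps are ("run", argv).
    """
    return [
        ("config", "set wmgUseCirrusSearch and wmgDisableSearchUpdate to true"),
        ("run", ["php", f"{MAINT}/updateSearchIndexConfig.php", "--wiki", wiki]),
        ("config", "set wmgDisableSearchUpdate to false"),
        ("run", ["php", f"{MAINT}/forceSearchIndex.php",
                 "--skipLinks", "--indexOnSkip", "--wiki", wiki]),
        ("run", ["php", f"{MAINT}/forceSearchIndex.php",
                 "--skipParse", "--wiki", wiki]),
        ("config", "set wmgSearchType to true"),
    ]


def print_plan(wiki):
    """Print the numbered checklist; nothing is executed."""
    for number, (kind, step) in enumerate(migration_plan(wiki), start=1):
        text = shlex.join(step) if kind == "run" else step
        print(f"{number}. [{kind}] {text}")


if __name__ == "__main__":
    print_plan("test1wiki")
```

Keeping the plan as data rather than running commands directly means the order can be reviewed per wiki before anyone touches the config.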

I think this task as-is is technically resolved. Should we make a new task, or rename this task, for migrating all wikis to use Elasticsearch? Or just the largest wikis?

We should either move on this, or figure out another way to get more space on db4, as the disk space has been pretty much stuck in critical for some time now.

The installation was done a while ago. Follow up in T4259 and T4260.