Page MenuHomeMiraheze

MediaWiki response time can fluctuate due to messages
Open, NormalPublic

Description

While doing work on our new infrastructure, I noticed response times can skyrocket when the message cache is not present. @Paladox
and I already discussed message caching before, since I suspected it was primarily responsible for the slowness or fluctuations for MediaWiki.

It is worth investigating if we are using the right configuration settings for caching.

$ curl https://test2.miraheze.org/wiki/Main_Page?forceprofile=1 | grep -i message
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 53573    0 53573    0     0  57114      0 --:--:-- --:--:-- --:--:-- 57114 65.32% 197.303    271 - Message::toString
 57.42% 173.443    385 - Message::fetchMessage
 55.41% 167.379    316 - MessageCache::get
 54.53% 164.715    316 - MessageCache::getMessageFromFallbackChain
 54.50% 164.626    353 - MessageCache::getMessageForLang
 53.97% 163.018     30 - Message::plain
 53.30% 161.000    318 - MessageCache::getMsgFromNamespace
 52.11% 157.395    318 - MessageCache::load
 51.15% 154.506      1 - MessageCache::loadFromDBWithLock
 43.38% 131.022      1 - MessageCache::saveToCaches
  6.23% 18.829    217 - Message::text
  5.66% 17.088      1 - MessageCache::loadFromDB
  4.87% 14.700      8 - Message::parse
  4.69% 14.167      4 - Message::parseText
  4.67% 14.092      4 - MessageCache::parse
  4.26% 12.861    235 - Message::transformText
  4.13% 12.473    239 - MessageCache::transform
  2.05% 6.184     13 - MessageCache::getParser
  1.85% 5.602    557 - Message::getLanguage
  0.99% 2.979    352 - Language::getMessage
  0.98% 2.960     88 - Message::exists
  0.72% 2.172     64 - Language::getMessageFromDB
  0.59% 1.768      1 - MessageCache::setValidationHash
  0.48% 1.456      2 - Language::getMessageKeysFor
  0.41% 1.240    443 - MediaWiki\Languages\LanguageNameUtils::getMessagesFileName

$ curl https://test2.miraheze.org/wiki/Main_Page?forceprofile=1 | grep -i message
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 49602    0 49602    0     0  75497      0 --:--:-- --:--:-- --:--:-- 75497 28.17% 41.517    271 - Message::toString
 13.73% 20.237    385 - Message::fetchMessage
 11.30% 16.654    217 - Message::text
  9.93% 14.636      8 - Message::parse
  9.55% 14.076      4 - Message::parseText
  9.51% 14.011      4 - MessageCache::parse
  9.28% 13.682    316 - MessageCache::get
  7.42% 10.928    316 - MessageCache::getMessageFromFallbackChain
  7.33% 10.798    353 - MessageCache::getMessageForLang
  7.17% 10.564    235 - Message::transformText
  6.90% 10.171    239 - MessageCache::transform
  6.42% 9.456     30 - Message::plain
  4.86% 7.155    318 - MessageCache::getMsgFromNamespace
  4.16% 6.126    557 - Message::getLanguage
  3.23% 4.764     13 - MessageCache::getParser
  2.28% 3.362    318 - MessageCache::load
  2.15% 3.163     88 - Message::exists
  2.05% 3.016    352 - Language::getMessage
  1.60% 2.351     64 - Language::getMessageFromDB
  1.21% 1.785      1 - Language::getMessageKeysFor
  0.93% 1.365    443 - MediaWiki\Languages\LanguageNameUtils::getMessagesFileName

Event Timeline

Southparkfan created this task.

Is anyone taking responsibility for this? A month as high priority with no commitment or ownership means this can't be high priority.

Paladox lowered the priority of this task from High to Normal.Mar 19 2020, 20:00

@Southparkfan i guess.

@Southparkfan does this still happen? We've made a lot of changes since this task was made, so wondering if it was indirectly fixed or not.

@Universal_Omega I am not sure if it still happens, but I know MediaWiki reads the l10n .cdb files from the disk at each visit. Sometimes, the linux page cache works and the cdb file is in memory (no performance issues), but it happens frequently that the cdb file is not in the page cache. Since MediaWiki servers run on HDDs that are shared with other VMs, reading the cdb file from the disk is subject to the I/O blender effect in which case message parsing will be awfully slow. The same happens for reading the .json files of ManageWiki (at times not in the page cache either).

Try refreshing test2wiki's main page a few times and note the response times. Afterwards, run vmtouch -tl /srv/mediawiki/w/cache/l10n/l10n_cache-en.cdb /srv/mediawiki/w/cache/test2wiki.json on test2 and note the response times of a few refreshes. I performed various tests some months ago, though I am not sure if the impact nowadays is as huge compared to months ago.

Hmmm: https://meta.miraheze.org/w/index.php?title=Miraheze&oldid=159&forceprofile=1 (proxied to test2)

100.00% 836.687      1 - main()
 73.80% 617.440      1 - wfIndexMain
 73.79% 617.414      1 - MediaWiki::run
 73.75% 617.022      1 - MediaWiki::main
 50.06% 418.813      1 - MediaWiki::performRequest
 49.97% 418.061      1 - MediaWiki::performAction
 49.84% 417.030      1 - ViewAction::show
 41.77% 349.489    325 - LCStoreCDB::get
 39.62% 331.456    325 - Cdb\Reader\DBA::get
 39.57% 331.079    325 - dba_fetch
 37.03% 309.823      1 - Article::view
 34.62% 289.692    587 - Message::fetchMessage
 34.40% 287.842    464 - MessageCache::get
 33.30% 278.595    464 - MessageCache::getMessageFromFallbackChain
 33.26% 278.267    574 - MessageCache::getMessageForLang
 30.74% 257.205    342 - Message::toString
 30.66% 256.563    739 - LocalisationCache::getSubitem
 30.58% 255.896    303 - LocalisationCache::loadSubitem
 29.17% 244.072   1664 - MediaWiki\HookContainer\HookContainer::run
 29.12% 243.685     26 - LocalisationCache::loadItem
 27.43% 229.534   1313 - MediaWiki\HookContainer\HookContainer::callLegacyHook
 20.12% 168.335    520 - MessageCache::getMsgFromNamespace
 18.96% 158.619      1 - MediaWiki::{closure}
 18.96% 158.615      1 - OutputPage::output
 18.79% 157.239    520 - MessageCache::load
 18.36% 153.611      1 - SkinTemplate::outputPage
 18.34% 153.489      2 - OutputPage::setPageTitle

325 calls to https://github.com/wikimedia/cdb/blob/master/src/Reader/DBA.php#L43, over 331 ms spent in reading keys from a cdb file? Not very fast