Page MenuHomeMiraheze

Consistent 50x errors
Closed, InvalidPublic

Description

Hello, I get several times a 503 or 504 error, especially on certain pages (for example https://socdemwiki.miraheze.org/wiki/Germany).

Here is the ticket as an example:

Error 503 Backend fetch failed, forwarded for <ip-address>
(Varnish XID 97781317) via cp10 at Mon, 12 Apr 2021 20:03:14 GMT.

Several users and readers have complained that the system on SocDemWiki is slow.

Related Objects

StatusAssignedTask
InvalidNone
OpenNone

Event Timeline

R4356th triaged this task as High priority.Tue, Apr 13, 03:50
  1. I don't personally receive any errors on the mentioned page.
  2. That wiki seems to load instantaneously for me. (load time: 216ms)

Weird, because I got a "504 Gateway Time-out" just right now (trying to access https://socdemwiki.miraheze.org/wiki/Iceland).

Southparkfan added a subscriber: Southparkfan.

@GodlessRaven's first request:

*   << BeReq    >> 97781317
-   Begin          bereq 97781316 pass
-   Timestamp      Start: 1618257674.184309 0.000000 0.000000
-   BereqMethod    POST
-   BereqURL       /w/index.php?title=Germany&action=submit
-   BereqProtocol  HTTP/1.1
-   BereqHeader    Host: socdemwiki.miraheze.org
<REDACTED>
-   BereqHeader    sec-fetch-dest: document
-   BereqHeader    referer: https://socdemwiki.miraheze.org/w/index.php?title=Germany&action=submit
<REDACTED>
-   BereqHeader    X-Device: desktop
-   BereqHeader    X-Use-Mobile: 0
-   BereqHeader    X-Varnish: 97781317
-   VCL_call       BACKEND_FETCH
-   VCL_return     fetch
-   BackendOpen    89 mw11 127.0.0.1 8088 127.0.0.1 60006
-   BackendStart   127.0.0.1 8088
-   Timestamp      Bereq: 1618257674.184397 0.000087 0.000087
-   FetchError     HTC idle (3)
-   BackendClose   89 mw11
-   Timestamp      Beresp: 1618257794.150665 119.966356 119.966269
-   Timestamp      Error: 1618257794.150675 119.966366 0.000010
-   BerespProtocol HTTP/1.1
-   BerespStatus   503
-   BerespReason   Service Unavailable
-   BerespReason   Backend fetch failed
-   BerespHeader   Date: Mon, 12 Apr 2021 20:03:14 GMT
-   BerespHeader   Server: Varnish
-   VCL_call       BACKEND_ERROR
-   BerespHeader   Content-Type: text/html; charset=utf-8
-   VCL_return     deliver
-   Storage        malloc Transient
-   Length         2934
-   BereqAcct      1981 7270 9251 0 0 0

The varnish backends were healthy. NGINX logs (redacted) on mw11 indicate a 504 gateway timeout:

{
   "timestamp":"1618257814.190",
   "remote_addr":"<REDACTED>",
   "time_local":"12/Apr/2021:20:03:34 +0000",
   "request_method":"POST",
   "request_uri":"https://socdemwiki.miraheze.org/w/index.php?title=Germany&action=submit",
   "status":"504",
   "request_time":"140.004",
   "nginx_access_log":true
}

NGINX error log:

2021/04/12 20:03:34 [error] 2801#2801: *24358458 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <REDACTED>, server: *.miraheze.org, request: "POST /w/index.php?title=Germany&action=submit HTTP/1.1", upstream: "fastcgi://unix:/run/php/fpm-www.sock", host: "socdemwiki.miraheze.org", referrer: "https://socdemwiki.miraheze.org/w/index.php?title=Germany&action=submit"

Yeah, this is definitely strange that the 503s are localized for only select users, as I'm like @Universal_Omega. My response times are even better than a couple days ago, when they were around 400-500ms, if I remember correctly.

Just now...Queued Jobs: 0194ms (PHP7 via socdemwikiwiki@mw10 / cp12

Weird, because I got a "504 Gateway Time-out" just right now (trying to access https://socdemwiki.miraheze.org/wiki/Iceland).

Thank you for your report. I confirm there's something going on with the Germany article you have linked. We have made various adjustments on our side to start an investigation. It would be great if you can trigger the timeout at least once, since our systems will provide us with more detailed data now regarding your error.

Submitting an edit to the Germany article is enough to reproduce: https://socdemwiki.miraheze.org/w/index.php?diff=8321&oldid=8307&rcid=8653

[13-Apr-2021 23:57:16]  [pool www] pid 17715
script_filename = /srv/mediawiki/w/index.php
[0x00007f0ee24207f0] curl_exec() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Handler/CurlHandler.php:40
[0x00007f0ee2420750] __invoke() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:28
[0x00007f0ee24206a0] GuzzleHttp\Handler\{closure}() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Handler/Proxy.php:51
[0x00007f0ee2420600] GuzzleHttp\Handler\{closure}() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Middleware.php:233
[0x00007f0ee2420560] GuzzleHttp\{closure}() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/PrepareBodyMiddleware.php:37
[0x00007f0ee2420480] __invoke() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/RedirectMiddleware.php:54
[0x00007f0ee24203e0] __invoke() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Middleware.php:57
[0x00007f0ee2420350] GuzzleHttp\{closure}() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/HandlerStack.php:71
[0x00007f0ee24202c0] __invoke() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Client.php:351
[0x00007f0ee24201b0] transfer() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Client.php:112
[0x00007f0ee2420130] sendAsync() /srv/mediawiki/w/vendor/guzzlehttp/guzzle/src/Client.php:129
[0x00007f0ee24200b0] send() /srv/mediawiki/w/includes/http/GuzzleHttpRequest.php:204
[0x00007f0ee241ff70] execute() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:532
[0x00007f0ee241fe80] httpGet() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:576
[0x00007f0ee241fd90] {closure}() /srv/mediawiki/w/includes/libs/objectcache/wancache/WANObjectCache.php:1529
[0x00007f0ee241fab0] fetchOrRegenerate() /srv/mediawiki/w/includes/libs/objectcache/wancache/WANObjectCache.php:1376
[0x00007f0ee241f960] getWithSetCallback() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:586
[0x00007f0ee241f8b0] httpGetCached() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:202
[0x00007f0ee241f820] fetchImageQuery() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:273
[0x00007f0ee241f740] getThumbUrl() /srv/mediawiki/w/includes/filerepo/ForeignAPIRepo.php:344

Slow image fetching / InstantCommons? Removing image embedding helps to avoid a timeout on submit.

Actually, I think I'm closer now. Top profile entries for the Germany article without parsercache hit:

100.00% 4987.024      1 - main()
 99.22% 4947.946      1 - wfIndexMain
 99.22% 4947.914      1 - MediaWiki::run
 99.21% 4947.640      1 - MediaWiki::main
 96.67% 4820.735      1 - MediaWiki::performRequest
 96.58% 4816.498      1 - MediaWiki::performAction
 96.57% 4816.175      1 - ViewAction::show
 96.42% 4808.691      1 - Article::view
 95.22% 4748.519      1 - PoolCounterWork::execute
 95.22% 4748.501      1 - PoolWorkArticleView::doWork
 94.86% 4730.743    364 - call_user_func
 94.73% 4724.182      1 - MediaWiki\Revision\RenderedRevision::getRevisionParserOutput
 94.73% 4724.173      1 - MediaWiki\Revision\RevisionRenderer::MediaWiki\Revision\{closure}
 94.73% 4724.172      1 - MediaWiki\Revision\RevisionRenderer::combineSlotOutput
 94.73% 4724.165      1 - MediaWiki\Revision\RenderedRevision::getSlotParserOutput
 94.73% 4724.150      1 - MediaWiki\Revision\RenderedRevision::getSlotParserOutputUncached
 94.73% 4724.147      1 - AbstractContent::getParserOutput
 94.50% 4712.590      9 - Parser::parse
 94.39% 4707.457      1 - WikitextContent::fillParserOutput
 92.26% 4601.219      9 - Parser::internalParse
 50.38% 2512.279    442 - Parser::replaceVariables
 50.32% 2509.301    262 - PPFrame_Hash::expand
 49.79% 2483.122     65 - Parser::braceSubstitution
 48.95% 2440.924    241 - PPFrame_Hash::expand@1
 48.88% 2437.498     66 - Parser::braceSubstitution@1
 48.63% 2424.996      2 - PPTemplateFrame_Hash::cachedExpand
 48.56% 2421.826    206 - PPFrame_Hash::expand@2
 48.48% 2417.873     34 - Parser::braceSubstitution@2
 48.46% 2416.850    219 - PPFrame_Hash::expand@3
 48.43% 2415.128    235 - Parser::argSubstitution
 48.38% 2412.822    234 - PPTemplateFrame_Hash::getArgument
 48.30% 2408.795    243 - PPTemplateFrame_Hash::getNumberedArgument
 47.81% 2384.088    156 - PPFrame_Hash::expand@4
 47.80% 2383.565     95 - PPTemplateFrame_Hash::getNamedArgument
 47.75% 2381.453    166 - PPFrame_Hash::expand@5
 47.74% 2380.743    299 - Parser::braceSubstitution@3
 47.73% 2380.329     55 - Parser::argSubstitution@1
 47.72% 2379.760     55 - PPTemplateFrame_Hash::getArgument@1
 46.37% 2312.559    506 - PPFrame_Hash::expand@6
 44.65% 2226.903    250 - Parser::braceSubstitution@4
 44.36% 2212.446    353 - Parser::callParserFunction
 43.84% 2186.357    254 - MediaWiki\Extensions\ParserFunctions\ParserFunctions::switch
 41.80% 2084.648     32 - Parser::handleInternalLinks
 41.80% 2084.477     32 - Parser::handleInternalLinks2
 38.41% 1915.588  72583 - PPFrame_Hash::expand@7
 29.46% 1469.174    257 - Parser::braceSubstitution@5
 28.39% 1415.838   1003 - PPFrame_Hash::expand@8
 28.10% 1401.562    247 - Parser::braceSubstitution@6
 27.84% 1388.267    307 - Parser::callParserFunction@1
 27.75% 1383.777    247 - MediaWiki\Extensions\ParserFunctions\ParserFunctions::switch@1
 27.60% 1376.460   4138 - PPFrame_Hash::expand@9
 26.82% 1337.538    730 - Parser::braceSubstitution@7
 25.18% 1255.556 177103 - MediaWiki\Extensions\ParserFunctions\ParserFunctions::decodeTrimExpand
 24.88% 1240.986    254 - MediaWiki\BadFileLookup::isBadFile
 24.86% 1239.964    508 - RepoGroup::findFile
 24.19% 1206.336   1954 - PPFrame_Hash::expand@10
 21.63% 1078.468    730 - Parser::braceSubstitution@8
 21.26% 1060.474    738 - FileRepo::findFile
 20.30% 1012.231   1857 - WANObjectCache::getWithSetCallback
 20.08% 1001.627    733 - Parser::callParserFunction@2
 19.92% 993.603   1849 - WANObjectCache::fetchOrRegenerate
 19.79% 987.131    730 - MediaWiki\Extensions\ParserFunctions\ParserFunctions::switch@2
 15.57% 776.300    254 - Parser::makeImage
 14.30% 713.180   1356 - Wikimedia\Rdbms\DBConnRef::__call
 14.08% 702.115    801 - Wikimedia\Rdbms\Database::select
 13.42% 669.325    813 - Wikimedia\Rdbms\Database::query
 13.31% 663.683    816 - Wikimedia\Rdbms\Database::executeQuery
 13.29% 662.669    743 - Wikimedia\Rdbms\DBConnRef::selectRow
 13.25% 660.912    744 - Wikimedia\Rdbms\Database::selectRow
 13.17% 656.632    816 - Wikimedia\Rdbms\Database::executeQueryAttempt
 12.99% 647.844 102832 - PPFrame_Hash::expand@11
 12.52% 624.529   1852 - WANObjectCache::get
 12.45% 621.126   1852 - WANObjectCache::getMulti
 12.05% 600.810   1855 - MediumSpecificBagOStuff::getMulti
 11.92% 594.457    816 - Wikimedia\Rdbms\DatabaseMysqli::doQuery
 11.89% 592.824   1855 - MemcachedPhpBagOStuff::doGetMulti
 11.81% 589.080   1855 - MemcachedClient::get_multi
 11.77% 587.036    816 - mysqli::query
 10.96% 546.385   3715 - MemcachedClient::_fgets
 10.72% 534.668   1863 - MemcachedClient::_load_items
 10.71% 534.110   3715 - fgets
  9.54% 475.907    492 - LocalRepo::checkRedirect
  9.05% 451.240   1008 - ForeignAPIRepo::fetchImageQuery
  8.59% 428.321   1008 - ForeignAPIRepo::httpGetCached
  8.37% 417.328    254 - Linker::makeImageLink
  8.12% 404.924    984 - LocalFile::load
  8.11% 404.227    492 - LocalFile::loadFromCache
  7.65% 381.575 381284 - call_user_func@1
  7.64% 381.088    762 - ForeignAPIFile::transform
  7.46% 371.800 380846 - ParserOptions::getOption
  6.70% 333.928    486 - spl_autoload_call
  6.69% 333.546    762 - ForeignAPIRepo::getThumbUrlFromCache
  6.66% 332.034    762 - ForeignAPIRepo::getThumbUrl
  6.35% 316.879    478 - AutoLoader::autoload
  6.17% 307.496    246 - LocalFile::loadFromDB
  5.92% 295.367 177199 - PPNode_Hash_Tree::splitArg
  5.31% 264.929 177199 - PPNode_Hash_Tree::splitRawArg
  5.07% 252.726 185159 - ParserOptions::getMaxPPNodeCount
  4.78% 238.492    246 - section.query-m: SELECT img_name,img_size,img_width,img_height,img_metadata,img_bits,img_media_type,img_major_mime,img_minor_mime,img_timestamp,img_sha1,comment_img_description.comment_text AS `img_description_text`,comment_img_description.comment_data AS `img_description
  4.57% 227.997    247 - Parser::parseLinkParameter
  4.51% 225.003    630 - LinkCache::addLinkObj
  4.51% 224.969    460 - Title::getArticleID
  4.50% 224.621    254 - Linker::processResponsiveImages
  4.39% 218.950    243 - LocalRepo::{closure}
  4.34% 216.456    623 - ParserOutput::addLink
  3.86% 192.463    245 - LinkCache::fetchPageRow
  3.61% 180.236   4960 - PPNode_Hash_Tree::splitRawTemplate
  3.60% 179.284 185159 - ParserOptions::getMaxPPExpandDepth
  3.56% 177.691   6729 - MediaWiki\HookContainer\HookContainer::run
  3.51% 174.833 380846 - ParserOptions::optionUsed
  3.48% 173.330    738 - FileRepo::newFile
  3.26% 162.529      1 - RepoGroup::initialiseRepos
  3.26% 162.513      3 - RepoGroup::newRepo
  3.24% 161.691    243 - section.query-m: SELECT rd_namespace,rd_title FROM `page`,`redirect` WHERE page_namespace = N AND page_title = 'X' AND (rd_from = page_id) LIMIT N 
  3.18% 158.616    245 - section.query-m: SELECT page_id,page_len,page_is_redirect,page_latest,page_restrictions,page_content_model FROM `page` WHERE page_namespace = N AND page_title = 'X' LIMIT N 
  3.00% 149.794    246 - ForeignAPIRepo::newFile
  2.94% 146.832    246 - ForeignAPIFile::newFromTitle
  2.89% 144.363      3 - FileRepo::__construct
  2.89% 144.294      2 - LocalRepo::__construct
  2.62% 130.767      3 - FileBackendGroup::get

NewPP report:

{
   "wgPageParseReport":{
      "limitreport":{
         "cputime":"3.488",
         "walltime":"4.692",
         "ppvisitednodes":{
            "value":185066,
            "limit":1000000
         },
         "postexpandincludesize":{
            "value":176839,
            "limit":2097152
         },
         "templateargumentsize":{
            "value":61315,
            "limit":2097152
         },
         "expansiondepth":{
            "value":14,
            "limit":40
         },
         "expensivefunctioncount":{
            "value":0,
            "limit":99
         },
         "unstrip-depth":{
            "value":0,
            "limit":20
         },
         "unstrip-size":{
            "value":6561,
            "limit":5000000
         },
         "timingprofile":[
            "100.00% 2493.819      1 -total",
            " 97.25% 2425.354      1 Template:Geonav",
            " 96.85% 2415.340      1 Template:Navbottom",
            " 93.89% 2341.491    247 Template:Cc",
            " 68.14% 1699.328      4 Template:Nbs2",
            " 57.18% 1425.870    247 Template:Cf",
            " 49.34% 1230.372    730 Template:Cf/codes",
            " 27.88%  695.269      1 Template:Nbs1",
            "  0.69%   17.169     14 Template:Nbar",
            "  0.53%   13.104      1 Template:Reflist"
         ]
      },
      "loops":{
         "limitreport-count-unlimited":[
            0
         ]
      },
      "cachereport":{
         "origin":"test3",
         "timestamp":"20210414001326",
         "ttl":86400,
         "transientcontent":false
      }
   }
}

This suggests Template:Geonav is the cause. Visting the Template page gives a 504. The template tries to load 200+ images of country flags (using Template:Cc).

@GodlessRaven Unfortunately, our servers have trouble loading the images of country flags. Please reduce the number of images in Template:Geonav.

Loading a huge number of files from Wikimedia Commons is a known cause for issues like these. The way our Miraheze <-> Wikimedia Commons integration works is not ideal.