
Automatically generate Sitemaps
Closed, ResolvedPublic

Description

Sitemaps are used in SEO to tell search engines what's available on a site. While there are extensions that generate sitemaps automatically, they are not that great (T1558). What I'd like to see is a script that automatically generates sitemaps for public wikis using generateSitemap.php. These wouldn't have to be updated that often: maybe once a week, every two weeks, or once a month, depending on how long generation takes.

If possible, it would be best to add the sitemaps to each wiki's respective robots.txt file, as:

Sitemap: http://example.com/sitemap_location.xml

I'm not sure what the process to do that would be, as it's currently a static file. Puppet templates? Do I need to write a PHP script to serve it? Without this, wikis would have to manually submit sitemaps with webmaster tools, which is less than awesome.


Event Timeline

labster created this task. May 8 2017, 07:39
John added a subscriber: John. May 8 2017, 08:30

It would have to be a PHP script.

Things to overcome:
a) all static files are shared
b) limited space
c) there's one robots.txt for all wikis
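
To illustrate point (c): a single robots.txt can still advertise a per-wiki sitemap if it is generated per request from the Host header instead of being served as a static file. The sketch below shows only that logic, in Python for brevity; the real thing would presumably be PHP as noted above. The static rules, the function name and the `/sitemap.xml` path are all hypothetical placeholders, not Miraheze's actual configuration.

```python
# Sketch only: STATIC_RULES and the default sitemap path are assumptions
# made for illustration, not the contents of any real robots.txt.
STATIC_RULES = "User-agent: *\nDisallow: /w/\n"


def robots_txt(host: str, sitemap_path: str = "/sitemap.xml") -> str:
    """Return a robots.txt body for the wiki identified by the Host header."""
    host = host.strip().lower()
    # The Sitemap directive takes an absolute URL, so the requesting
    # wiki's hostname is interpolated on every request.
    return f"{STATIC_RULES}Sitemap: https://{host}{sitemap_path}\n"


if __name__ == "__main__":
    print(robots_txt("meta.miraheze.org"), end="")
```

The shared rules stay in one place, and only the final Sitemap line differs between wikis.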

John triaged this task as Normal priority. May 12 2017, 15:48
Reception123 mentioned this in Unknown Object (Diffusion Commit). Jul 14 2017, 08:43
John added a comment. Jun 28 2018, 18:55

With Swift we lose our static data point, so this task has potentially gotten more complicated.

John lowered the priority of this task from Normal to Low. Jun 28 2018, 18:55
Paladox added a subscriber: Paladox. Aug 3 2018, 02:24

We have moved from Swift to LizardFS.

Paladox raised the priority of this task from Low to Normal. Sep 15 2018, 15:43
Paladox lowered the priority of this task from Normal to Low. Sep 20 2018, 22:05
eldrago added a subscriber: eldrago. Oct 13 2018, 21:04
John assigned this task to Paladox. Nov 16 2018, 18:27
John moved this task from Backlog to Operations on the Goal-2018-Jul-Dec board.
Paladox raised the priority of this task from Low to Normal. Nov 16 2018, 18:28
Paladox added a commit: Unknown Object (Diffusion Commit). Nov 16 2018, 20:48
Paladox added a comment. Nov 16 2018, 22:13

The infrastructure now supports sitemaps (see https://meta.miraheze.org/sitemap !)

We just need to create a script that runs generateSitemap.php for each wiki, like:

php maintenance/generateSitemap.php --fspath=/mnt/mediawiki-static/sitemaps/meta.miraheze.org --identifier=metawiki --urlpath=https://meta.miraheze.org/ --server=https://meta.miraheze.org --compress=yes --wiki=metawiki

So I'm thinking something like:

php maintenance/generateSitemap.php --fspath="/mnt/mediawiki-static/sitemaps/$wgServer" --identifier="$wgDBname" --urlpath="https://$wgServer/" --server="https://$wgServer" --compress=yes --wiki="$wgDBname"
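
As a rough illustration of the template above, the per-wiki command could be assembled from a database name and a bare hostname (no protocol). This is a minimal sketch, not the script that was eventually committed: the function name `sitemap_command` is hypothetical, and the flags are copied from the metawiki example in this comment.

```python
import shlex
from typing import List


def sitemap_command(db_name: str, host: str,
                    base_dir: str = "/mnt/mediawiki-static/sitemaps") -> List[str]:
    """Build the generateSitemap.php argument list for one wiki.

    `host` is the bare hostname (e.g. "meta.miraheze.org"); the https://
    prefix is added here, mirroring the template in the comment above.
    """
    return [
        "php", "maintenance/generateSitemap.php",
        f"--fspath={base_dir}/{host}",
        f"--identifier={db_name}",
        f"--urlpath=https://{host}/",
        f"--server=https://{host}",
        "--compress=yes",
        f"--wiki={db_name}",
    ]


if __name__ == "__main__":
    # Print the command as a shell-safe string instead of running it.
    print(shlex.join(sitemap_command("metawiki", "meta.miraheze.org")))
```

Returning an argument list rather than a single string avoids the nested-quoting problem when the command is eventually passed to something like `subprocess.run`.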

Paladox added a comment. Dec 2 2018, 02:10

I've started /usr/bin/nice -19 /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblist/all.dblist /srv/mediawiki/w/extensions/MirahezeMagic/maintenance/generateMirahezeSitemap.php

Paladox added a comment. Dec 2 2018, 02:41

Turns out storage is not a problem.

It's only 37MB for all wikis.

Paladox closed this task as Resolved by committing Unknown Object (Diffusion Commit). Dec 2 2018, 02:44
Paladox added a commit: Unknown Object (Diffusion Commit).