Import of nonciclopedia.miraheze.org is not progressing
Closed, ResolvedPublic

Description

Almost 2 weeks ago I asked for my wiki to be imported by providing an XML dump on Google Drive (link), and @Reception123 started the process (see T3553#68345), but a week ago it stopped without notice and there are still a ton of pages and images to import.

Event Timeline

AmandaCath triaged this task as Normal priority. Sep 23 2018, 18:36

This is taking too much time. People are losing interest in the wiki because editing the old one is pointless, and editing the new one is impossible until the transfer is done, so they're no longer showing up. Can I get an update on what's going on, so at least I can tell something to the users?

For reference @Reception123 started the image import on mw2.

I started both the image and XML imports. Not sure how far @Reception123's run had progressed.

We apologise for any inconvenience.

Update: The script was killed, likely due to RAM / Redis. I've restarted it and it should resume from where it left off.

After 6 days there's still no sign of progress. Can you tell me something so I can tell the users?

Hi, as your import is over 19 GB, it will take a few weeks from when we first started it.

Hi, status update: we have had to put this on hold, as we now have limited storage on the db server (~19 GB left) and your wiki is 50+ GB. We are hoping to run a funding round soon to buy more db servers.

Paladox changed the task status from Open to Stalled. Oct 23 2018, 14:25

I can understand this kind of issue, but the dump I provided (https://drive.google.com/file/d/1UbDuh0KW8xnYoadRtPkVK6CHy-RhnhNK/view?usp=sharing) is not 50 GB but 7.7 GB, and the files were already uploaded together with 63,623 pages out of 171,977, so maybe there's only 3 GB left to upload. Are you sure you're uploading that dump? We don't want the whole wiki, just what we provided as a dump.

Hi, even though the dump may be small (unzipped it's 19 GB), once imported it can be much bigger on the db side.

Unzipped, these are the files:

-rw-r--r-- 1 root root 19G Aug 22 16:28 nonciclopediawikiacom-20180818-history.xml
-rw-r--r-- 1 root root 7.7M Aug 24 15:05 nonciclopediawikiacom-20180818-images.txt
-rw-r--r-- 1 root root 4.5M Aug 18 07:46 nonciclopediawikiacom-20180818-titles.txt

Ok, I didn't know that. So that means we won't be able to move to the new wiki until more money comes in?
Maybe you should warn people before they start moving their wikis because now we're stuck with a wiki that nobody wants to edit anymore since we thought we were about to leave, and a new wiki that is not ready and won't be for who knows how long. I don't think the community will survive this and I wish I knew before starting this process.

I am putting some work in today to try and mitigate this issue and allow the import to continue.

Status update, since @John did some work on reducing db sizes, we have managed to successfully import all of your wiki!

... but it isn't live yet, is it? Compare https://nonciclopedia.org/wiki/Special:Statistics with https://nonciclopedia.wikia.com/wiki/Special:Statistics and you'll see there are still a lot of pages (and some files) missing.

Note that I am currently running some scripts to update those statistics (importing does not automatically update them).

Hmm just ran the statistic script and it still shows 65,000.

I wonder if the reason why it didn't import the rest is because of:

PHP Warning:  XMLReader::next(): uploadsource://a8e46cd537f7373d2d577761c09abaee:213481028: parser error : Opening and ending tag mismatch: page line 65535 and mediawiki in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next(): uploadsource://a8e46cd537f7373d2d577761c09abaee:213481028: parser error : Opening and ending tag mismatch: page line 65535 and mediawiki in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
PHP Warning:  XMLReader::next(): </mediawiki> in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next(): </mediawiki> in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
PHP Warning:  XMLReader::next():             ^ in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next():             ^ in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
Done!

I have no idea what the above means. I can only tell the import in the main namespace ends at the letter O and that the pages in old namespaces (which should go into the main one since they don't exist in the new wiki) were not imported at all.
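For readers wondering what the warning means: "Opening and ending tag mismatch: page ... and mediawiki" says the parser reached the closing </mediawiki> tag while a <page> element was still open, which usually points to a truncated or damaged dump. The sketch below is one way to check a dump before importing it. It uses Python's standard-library parser rather than the PHP XMLReader that WikiImporter.php uses, and the function name count_complete_pages is made up for this illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_complete_pages(source):
    """Stream through a dump and count <page> elements that close cleanly.

    `source` is a file path or a binary file object.
    Returns (page_count, error_message), where error_message is None
    if the whole document is well-formed.
    """
    pages = 0
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            # Dump elements are usually namespaced ("{uri}page"), so compare
            # only the local part of the tag name.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                pages += 1
                elem.clear()  # drop the page's children to keep memory low
    except ET.ParseError as err:
        # Pages counted so far parsed cleanly; the error tells where it broke.
        return pages, str(err)
    return pages, None


# Tiny hand-made sample: the second <page> is never closed.
broken = io.BytesIO(
    b"<mediawiki><page><title>A</title></page>"
    b"<page><title>B</title></mediawiki>"
)
count, error = count_complete_pages(broken)
print(count, error)
```

On a real dump you would pass the path of the .xml file instead of the in-memory sample. If the count stops well short of the expected number of pages, the file itself is the likely culprit rather than the importer.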

@Wedhro which name spaces do we need to add to your wiki?

The new namespaces I asked for were already added, thanks. I will only need to move pages starting with "OldNamespace:Title" to the new namespaces, but that requires the import to be finished. Then I'll need a ton of other changes, but let's take one small step at a time.

I see you used the WikiTeam tool to create this dump. Wikia provides its own dump too, http://s3.amazonaws.com/wikia_xml_dumps/n/no/nonciclopedia_pages_full.xml.7z, which is 23 GB uncompressed. I'm using that to import and hoping there are no syntax errors or mismatched lines.

It's still importing and may take a long while it seems.

Status update: it has now imported (after running with "--no-updates"). Now I'm running rebuildall.php, which will rebuild the search index, recent changes and other tables.

This is now done :)

So we may have found the problem. We would like to try and reimport your wiki (by deleting it and recreating with your permission of course)

Status update: We have installed mw 1.30 on test1, recreated the nonciclopediawiki db using mw 1.30, and are now importing the dump you provided (using --uploads too).

You will get a mw error until we update your wiki to 1.31, so please do not be alarmed (we are aware of it) :)

"We would like to try and reimport your wiki (by deleting it and recreating with your permission of course)"

Go for it.

If it's still possible (if not, never mind), I'd like you to import the pages directly from Wikia, because I just requested an updated dump; just wait until its date is 2018-11-08. That dump doesn't include images, though, which should still be uploaded from the XML dump I provided.

EDIT: ... aaand, it's live.

@Wedhro thank you! It seems the import is still hitting a syntax error (we think it's an overflow), so we are going to import using mw 1.30.

I see the wiki is online again and pages with full histories and user contribs are imported. Good! Two things:

  • Images are missing, friendly reminder that they should be imported from the XML dump I provided.
  • EditCount doesn't work, it shows 0 edits. Can this be fixed?

"Images are missing, friendly reminder that they should be imported from the XML dump I provided." Is there a way you could generate a dump that only includes files please? (ie just the file pages).

I can try, but first I'd like to know why I should, because I'm on a limited data plan and every GB is precious.

Oh, I didn't realise you were on a limited plan. It's because the other dump has a syntax error, so it will likely fail to import most or all of the file pages.

Is there a way for Wikia to give you a dump with files included?

No, it's only possible with WikiTeam. Anyway, IIRC the syntax error came after importing images, maybe you could try importing images only (not the pages in the File namespace, they were already imported) and see if any error shows up?

"(not the pages in the File namespace, they were already imported)" so you mean just upload the images (actual images)?

Exactly (images, videos, audio etc.), not pages themselves, that are already imported; if you look at https://nonciclopedia.org/w/index.php?title=Speciale%3APrefissi&prefix=&namespace=6 you can see file pages were already imported but there's no actual file, just the page itself with history, categories etc. Then maybe it would need a script to "bond" images with the related File pages, create the history with thumbnails that appears at the bottom of the page etc., but I don't really know.
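The difference between a File page and the actual file can also be checked through the MediaWiki web API. To my understanding, an action=query request with prop=imageinfo returns an "imageinfo" entry only when a real file is attached to the page. The sketch below assumes that api.php sits next to index.php under /w/ (not confirmed in this task), and the two sample responses in the usage lines are hand-written for illustration, not real output from the wiki.

```python
from urllib.parse import urlencode

# Assumption: the API endpoint lives next to index.php on this wiki.
API_URL = "https://nonciclopedia.org/w/api.php"


def imageinfo_query_url(title, api_url=API_URL):
    """Build an API URL asking for the file information of one File page."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def has_actual_file(api_response):
    """Return True if any page in the decoded JSON response carries imageinfo."""
    pages = api_response.get("query", {}).get("pages", {})
    return any("imageinfo" in page for page in pages.values())


# Hand-written sample responses (illustrative only).
page_only = {"query": {"pages": {"1": {"title": "File:Example.jpg",
                                       "imagerepository": ""}}}}
page_and_file = {"query": {"pages": {"1": {"title": "File:Example.jpg",
                                           "imagerepository": "local",
                                           "imageinfo": [{"user": "SomeUser"}]}}}}

print(imageinfo_query_url("File:Snow_con_bicchiere.jpg"))
print(has_actual_file(page_only), has_actual_file(page_and_file))
```

Fetching the printed URL and passing the decoded JSON to has_actual_file would tell, page by page, which File pages still lack their upload.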

Ok, I’ve started the import of images.

There is an archive http://download.uncyc.org/it-images.zip of about 4.0G of Nonciclopedia images if it's of any use to you. I'd downloaded these from Wikia a couple of months ago.

I hope it will be possible to fix histories so that images won't be all attributed to mr. 127.0.0.1 in the file's page.

Since the actual, correct history was loaded when the pages were imported, this would "only" require erasing the last edit and showing the real history on the files' pages.

For example, https://nonciclopedia.org/wiki/File:Snow_con_bicchiere.jpg only shows 127.0.0.1 as editor but in https://nonciclopedia.org/w/index.php?title=File:Snow_con_bicchiere.jpg&action=history we see it's actually Fuffoloschiomancio who loaded it, and Eeeeee who categorized it.

@Wedhro hi, I'm not sure how easy it would be to do that. I do know that it's somewhere in the db. I just don't know where exactly.

This is resolved as far as importing is concerned. If you would like image attribution to be fixed, please open a separate task (though I'm not sure there's even a script that can change the user recorded as the uploader).