
Import of nonciclopedia.miraheze.org is not progressing
Closed, Resolved (Public)

Description

Almost 2 weeks ago I asked for my wiki to be imported by providing an XML dump on Google Drive (link), and @Reception123 started the process (see T3553#68345), but a week ago it stopped without notice and there are still a lot of pages and images to import.

Event Timeline

Wedhro created this task. Sep 22 2018, 05:42
Paladox added a subscriber: Paladox. Sep 23 2018, 02:10

@Reception123 did you start the import? I can do it.

AmandaCath triaged this task as Normal priority. Sep 23 2018, 18:36
Wedhro added a comment. Sep 27 2018, 09:24

This is taking too much time. People are losing interest in the wiki because editing the old one is pointless, and editing the new one is impossible until the transfer is done, so they're no longer showing up. Can I get an update on what's going on, so at least I can tell something to the users?

Paladox added a comment. Sep 28 2018, 21:53

For reference @Reception123 started the image import on mw2.

Paladox claimed this task (edited). Oct 6 2018, 23:25

I started both the image and XML imports. I'm not sure how far @Reception123 had got.

We apologise for any inconvenience.

Paladox added a comment. Oct 10 2018, 21:51

Update: the script was killed, likely due to RAM / Redis. I've restarted it and it should continue from where it left off.

Wedhro added a comment. Oct 16 2018, 17:47

After 6 days there's still no sign of progress. Can you tell me something so I can tell the users?

Paladox added a comment. Oct 16 2018, 17:49

Hi, as your import is over 19 GB, it will take a few weeks from when we first started it.

Paladox added a comment. Oct 23 2018, 14:25

Hi, status update: we have had to put this on hold as we have limited storage on the DB server now (~19 GB left) and your wiki is 50+ GB. We are hoping to do a funding round to secure funding to buy more DB servers soon.

Paladox changed the task status from Open to Stalled. Oct 23 2018, 14:25
Wedhro added a comment. Oct 23 2018, 14:45

I can understand these kinds of issues, but the dump I provided (https://drive.google.com/file/d/1UbDuh0KW8xnYoadRtPkVK6CHy-RhnhNK/view?usp=sharing) is not 50 GB but 7.7 GB, and files were already uploaded together with 63,623 pages out of 171,977, so maybe there are only 3 GB left to upload. Are you sure you're uploading that dump? We don't want the whole wiki, just what we provided as a dump.

Paladox added a comment. Oct 23 2018, 14:48

Hi, even though the dump may be small, unzipped it's 19 GB, and once imported it can be much bigger on the DB side.

Paladox added a comment. Oct 23 2018, 14:49

Unzipped, these are the files:

-rw-r--r-- 1 root root 19G Aug 22 16:28 nonciclopediawikiacom-20180818-history.xml
-rw-r--r-- 1 root root 7.7M Aug 24 15:05 nonciclopediawikiacom-20180818-images.txt
-rw-r--r-- 1 root root 4.5M Aug 18 07:46 nonciclopediawikiacom-20180818-titles.txt

Wedhro added a comment (edited). Oct 23 2018, 14:56

OK, I didn't know that. So that means we won't be able to move to the new wiki until more money comes in?
Maybe you should warn people before they start moving their wikis, because now we're stuck with a wiki that nobody wants to edit anymore, since we thought we were about to leave, and a new wiki that is not ready and won't be for who knows how long. I don't think the community will survive this, and I wish I had known before starting this process.

John added a subscriber: John. Oct 23 2018, 15:00

I am putting some work in today to try and mitigate this issue and allow the import to continue.

Paladox added a comment. Oct 28 2018, 14:43

Status update: since @John did some work on reducing DB sizes, we have managed to successfully import all of your wiki!

Paladox closed this task as Resolved. Oct 28 2018, 14:46
Wedhro added a comment (edited). Oct 28 2018, 14:53

... but it isn't live yet, is it? Compare https://nonciclopedia.org/wiki/Special:Statistics with https://nonciclopedia.wikia.com/wiki/Special:Statistics and you'll see there are still a lot of pages (and some files) missing.

Paladox added a comment. Oct 28 2018, 14:56

Note that I am currently running some scripts to update those statistics (importing does not update them automatically).

Wedhro added a comment. Oct 28 2018, 14:57

OK, I'll wait.

Paladox added a comment. Oct 28 2018, 15:02

Hmm, I just ran the statistics script and it still shows 65,000.

I wonder if the reason why it didn't import the rest is because of:

PHP Warning:  XMLReader::next(): uploadsource://a8e46cd537f7373d2d577761c09abaee:213481028: parser error : Opening and ending tag mismatch: page line 65535 and mediawiki in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next(): uploadsource://a8e46cd537f7373d2d577761c09abaee:213481028: parser error : Opening and ending tag mismatch: page line 65535 and mediawiki in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
PHP Warning:  XMLReader::next(): </mediawiki> in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next(): </mediawiki> in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
PHP Warning:  XMLReader::next():             ^ in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755

Warning: XMLReader::next():             ^ in /srv/mediawiki/w/includes/import/WikiImporter.php on line 755
Done!
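One way to check whether a dump is well-formed, and where it first breaks, is to stream-parse it outside MediaWiki before importing. The sketch below is an assumption, not a tool used in this task: it uses Python's standard-library `xml.etree.ElementTree.iterparse` and returns the line and column of the first error. Its messages come from expat rather than libxml2, so the wording will differ from the PHP warning above. Also, the `page line 65535` figure possibly reflects libxml2's historical 65535 cap on stored line numbers, so it may not be the page's real line in a 19 GB file.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Stream-parse an XML dump (a path or a binary file object).

    Returns (line, column, message) for the first well-formedness error,
    or None if the whole document parses cleanly.
    """
    try:
        # iterparse reads incrementally, so a multi-GB dump is never held
        # in memory at once; clearing each finished element frees its text.
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None
```

For example, `print(first_xml_error("nonciclopediawikiacom-20180818-history.xml"))` would report the first broken spot in the dump listed earlier, if there is one.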
Wedhro added a comment. Oct 28 2018, 15:11

I have no idea what the above means. I can only tell that the import in the main namespace ends at the letter O, and that the pages in old namespaces (which should go into the main one, since those namespaces don't exist in the new wiki) were not imported at all.
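To see how many pages each namespace should end up with, including the old namespaces mentioned above, the dump itself can be counted per `<ns>` value. This is a minimal sketch under the same assumption as any check outside MediaWiki (Python standard library, not something run in this task). It is only a rough cross-check: the "Content pages" figure on Special:Statistics excludes redirects and non-content namespaces, so it will not match these raw counts exactly.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    # MediaWiki export tags are namespaced, e.g.
    # "{http://www.mediawiki.org/xml/export-0.10/}page" -> "page".
    return tag.rsplit("}", 1)[-1]


def count_pages_by_ns(source):
    """Count <page> elements per namespace number in an XML dump.

    Pages without an <ns> child (very old export formats) are counted
    under the key None.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        ns = None
        for child in elem:
            if local_name(child.tag) == "ns":
                ns = int(child.text)
                break
        counts[ns] += 1
        elem.clear()  # free the page's revisions once it has been counted
    return counts
```

Custom namespaces show up as their own numbers (typically 100 and above), so their totals indicate how many pages should later be moved into the main namespace.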

Paladox reopened this task as Open. Oct 28 2018, 15:16

@Wedhro which namespaces do we need to add to your wiki?

Wedhro added a comment. Oct 28 2018, 15:22

The new namespaces I asked for were already added, thanks. I will only need to move pages titled "OldNamespace:Title" to the new namespaces, but that requires the import to be finished. Then I'll need a ton of other changes, but let's take it one step at a time.

Paladox added a comment. Oct 28 2018, 20:50

I see you used the WikiTeam tool to create this dump. Wikia provides its own dump too, http://s3.amazonaws.com/wikia_xml_dumps/n/no/nonciclopedia_pages_full.xml.7z, which is 23 GB uncompressed. I'm using that to import and hoping there are no syntax errors or mismatched lines.

@Paladox What is the status on the import?

Paladox added a comment. Oct 30 2018, 19:19

It's still importing, and it seems it may take a long while.

Paladox added a comment. Nov 1 2018, 00:26

Status update: the import has now finished (after running it with "--no-updates"). Now I'm running rebuildall.php, which will rebuild the search index, recent changes and other tables too.
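The sequence described here (import with `--no-updates`, which skips link-table updates, then run `rebuildall.php`, then refresh the statistics) could be scripted roughly as below. This is a minimal sketch, not what was actually run: the plain `php <script>.php --wiki=...` invocation and the use of `initSiteStats.php --update` as "the statistics script" are assumptions. The install path comes from the PHP warnings and the database name from the Nov 8 status update in this task.

```python
import subprocess

# From this task: the install path appears in the PHP warnings and the
# database name in the Nov 8 status update. The invocation style is assumed.
MW_DIR = "/srv/mediawiki/w"
WIKI_DB = "nonciclopediawiki"


def import_commands(dump_path, mw_dir=MW_DIR, wiki=WIKI_DB):
    """Return the maintenance steps, in order, as argument lists."""
    maint = f"{mw_dir}/maintenance"
    return [
        # Import revisions without link-table updates: faster, but leaves
        # derived tables stale until they are rebuilt.
        ["php", f"{maint}/importDump.php", f"--wiki={wiki}", "--no-updates", dump_path],
        # Rebuild the search index, recent changes and link tables.
        ["php", f"{maint}/rebuildall.php", f"--wiki={wiki}"],
        # Assumed statistics step: recount the Special:Statistics figures.
        ["php", f"{maint}/initSiteStats.php", f"--wiki={wiki}", "--update"],
    ]


def run_all(commands):
    for cmd in commands:
        # check=True raises CalledProcessError, stopping at the first failure.
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to review (or log) each step before anything touches the database.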

Paladox closed this task as Resolved. Nov 1 2018, 17:15

This is now done :)

Paladox reopened this task as Open. Nov 7 2018, 23:22

So we may have found the problem. We would like to try to reimport your wiki (by deleting it and recreating it, with your permission of course).

Paladox added a comment. Nov 8 2018, 01:40

Status update: we have installed MW 1.30 on test1, recreated the nonciclopediawiki DB using MW 1.30, and are now importing the dump you provided (using --uploads too).

You will get a MediaWiki error until we update your wiki to 1.31, so please do not be alarmed (we are aware of it) :)

Wedhro added a comment. Nov 8 2018, 15:23

We would like to try and reimport your wiki (by deleting it and recreating with your permission of course)

Go for it.

Wedhro added a comment (edited). Nov 8 2018, 19:51

In case it's still possible (if not, never mind), I'd like you to import the pages directly from Wikia, because I just requested an updated dump (just wait until its date shows 2018-11-08). That dump doesn't include images, though, so those should still be uploaded from the XML dump I provided.

EDIT: ... aaand, it's live.

Paladox added a comment. Nov 8 2018, 21:57

@Wedhro thank you! It seems it's still hitting a syntax error (we think an overflow), so we are going to import using MW 1.30.

Wedhro added a comment. Nov 10 2018, 16:46

I see the wiki is online again and pages with full histories and user contribs are imported. Good! Two things:

  • Images are missing, friendly reminder that they should be imported from the XML dump I provided.
  • EditCount doesn't work, it shows 0 edits. Can this be fixed?

Paladox added a comment. Nov 10 2018, 16:52

"Images are missing, friendly reminder that they should be imported from the XML dump I provided." Is there a way you could generate a dump that only includes files please? (ie just the file pages).

Wedhro added a comment. Nov 10 2018, 16:54

I can try, but first I'd like to know why I should, because I'm on a limited data plan and every GB is precious.

Paladox added a comment. Nov 10 2018, 16:55

Oh, I didn't realise you were on a limited plan. It's because the other dump has a syntax error, and because of that it will likely fail to import most or all of the file pages.

Is there a way for Wikia to give you a dump with files included?

Wedhro added a comment. Nov 10 2018, 17:00

No, it's only possible with WikiTeam. Anyway, IIRC the syntax error came after importing images, maybe you could try importing images only (not the pages in the File namespace, they were already imported) and see if any error shows up?

Paladox added a comment. Nov 10 2018, 17:06

"(not the pages in the File namespace, they were already imported)" so you mean just upload the images (actual images)?

Wedhro added a comment (edited). Nov 10 2018, 17:14

Exactly (images, videos, audio, etc.), not the pages themselves, which are already imported. If you look at https://nonciclopedia.org/w/index.php?title=Speciale%3APrefissi&prefix=&namespace=6 you can see the file pages were already imported, but there's no actual file, just the page itself with its history, categories, etc. Then maybe it would need a script to "bond" the images with their File pages, create the history with thumbnails that appears at the bottom of the page, etc., but I don't really know.

Paladox added a comment. Nov 11 2018, 01:34

Ok, I’ve started the import of images.

Carlb added a subscriber: Carlb. Nov 11 2018, 02:52

There is an archive http://download.uncyc.org/it-images.zip of about 4.0G of Nonciclopedia images if it's of any use to you. I'd downloaded these from Wikia a couple of months ago.

Wedhro added a comment. Nov 11 2018, 05:55

I hope it will be possible to fix the histories so that images won't all be attributed to Mr. 127.0.0.1 on the file pages.

Since the actual, correct history was loaded when the pages were imported, this would "only" require erasing the last edit and showing the real history on the file pages.

For example, https://nonciclopedia.org/wiki/File:Snow_con_bicchiere.jpg only shows 127.0.0.1 as editor but in https://nonciclopedia.org/w/index.php?title=File:Snow_con_bicchiere.jpg&action=history we see it's actually Fuffoloschiomancio who loaded it, and Eeeeee who categorized it.

Paladox added a comment. Nov 11 2018, 18:10

@Wedhro hi, I'm not sure how easy it would be to do that. What I do know is that it's somewhere in the DB; I just don't know where exactly.

Paladox closed this task as Resolved. Nov 18 2018, 01:34

This is resolved as far as importing is concerned. If you would like the image attribution to be fixed, please open a separate task (though I'm not sure there's even a script for that, i.e. one that can change the uploader).