
Migrate Wikimedia Australia wikis to Miraheze
Closed, Resolved (Public)

Description

Wikimedia Australia would like to move two wikis to Miraheze:

(Decided by the WMAU Committee here.)

We just have a couple of questions...

Is it possible to provide database dumps and images/ directory tarballs to be used for the import? Or do we request the wikis normally and then import via XML dumps?

How can we provide the images, especially for the comm wiki, which is private? (For the public one, we can upload a dump to the Internet Archive; we want to do that anyway.)

Thanks!

Event Timeline

Samwilson triaged this task as Normal priority. Sep 9 2018, 04:01
Samwilson created this task.

@Samwilson That's great news :)

  1. We prefer XML dumps, but if you think it's better for you we can also import SQL dumps (though we will import them into an empty database and then make an XML dump, so it shouldn't change much anyway). Images can be provided in whichever format you wish (zip, tar, etc.) and we will import them. A rough sketch of the commands is at the end of this comment.
  2. The private files should be able to be uploaded via Phabricator with a custom policy, but first we must ask: what is their size? (Would you like img_auth.php for them as well?)

Note that we will set wgServer to https://comm.wikimedia.org.au for https://wmaucomm.miraheze.org/wiki/Main_Page once the wiki (and its images) has been uploaded and the domain points at it.

You may want to think about adding both wikis to the wgCreateWikiInactiveWikisWhitelist whitelist.
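
For reference, a rough sketch of how those dumps could be produced on your side, assuming shell access to the MediaWiki installation (the database name and file names below are placeholders, not the actual WMAU values):

# XML dump of all revisions (the format we prefer to import)
php maintenance/dumpBackup.php --full > wmauwiki.xml
# Optional SQL dump, if you would rather provide the raw database
mysqldump -u root -p wmauwikidb > wmauwikidb.sql
# Tarball of the uploaded files for the image import
tar -czf images_wmauwiki.tar.gz images/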

@Samwilson Another way would be for you to use Dropbox, Google Drive or OneDrive and share the link with us via staff[at]miraheze[dot]org

Thanks!

No, it's okay, we can supply XML dumps, no worries. And we'll upload tarballs of the two images directories to Google Drive and send you a link.

And yes, we'd like to use img_auth.php for the private wiki; is this possible?

A couple of other questions:

  • This means that we'll lose all log entries, doesn't it?
  • Should we recreate all user accounts here before importing, so that imported revisions are associated with the correct accounts?

The uncompressed images directories are:

  • 427M images_commwiki
  • 849M images_wmauwiki

@Samwilson

  1. I believe that if the XML dump script is run with the correct options, log entries should remain (see the sketch just below this list).
  2. Yes, that would be helpful :) If not, we can always associate the accounts manually if needed (we have done so for Wikimedia Indonesia and some other wikis).
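
For point 1, a minimal sketch of the dump commands with log entries included (the file names are arbitrary; --full and --logs are standard dumpBackup.php options):

php maintenance/dumpBackup.php --full > pages.xml
php maintenance/dumpBackup.php --logs > logs.xml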

I will wait for your email containing the Google Drive link (perhaps it's easiest to share the XML there as well).

And yes, img_auth.php is definitely possible :)

@Samwilson you can use my script here (which only creates users):

#!/usr/bin/env bash

# Needs curl and jq

USERNAME="paladox"
USERPASS="xxxxx"
WIKIAPI="https://xxx.miraheze.org/w/api.php"
cookie_jar="openhatchwiki"
# Cookies will be stored in this file

echo "UTF8 check: ☠"
################# login
echo "Logging into $WIKIAPI as $USERNAME..."

###############
# Login part 1: get a login token
echo "Get login token..."
CR=$(curl -S \
 --location \
 --retry 2 \
 --retry-delay 5 \
 --cookie $cookie_jar \
 --cookie-jar $cookie_jar \
 --user-agent "Curl Shell Script" \
 --keepalive-time 60 \
 --header "Accept-Language: en-us" \
 --header "Connection: keep-alive" \
 --compressed \
 --request "GET" "${WIKIAPI}?action=query&meta=tokens&type=login&format=json")

echo "$CR" | jq .

rm -f login.json
echo "$CR" > login.json
TOKEN=$(jq --raw-output '.query.tokens.logintoken' login.json)
TOKEN="${TOKEN//\"/}" # strip any double quotes

# Remove carriage return!
printf "%s" "$TOKEN" > token.txt
TOKEN=$(cat token.txt | sed 's/\r$//')

if [ "$TOKEN" == "null" ]; then
 echo "Getting a login token failed."
 exit
else
 echo "Login token is $TOKEN"
 echo "-----"
fi

###############
# Login part 2: log in with the token
echo "Logging in..."
CR=$(curl -S \
 --location \
 --cookie $cookie_jar \
 --cookie-jar $cookie_jar \
 --user-agent "Curl Shell Script" \
 --keepalive-time 60 \
 --header "Accept-Language: en-us" \
 --header "Connection: keep-alive" \
 --compressed \
 --data-urlencode "username=${USERNAME}" \
 --data-urlencode "password=${USERPASS}" \
 --data-urlencode "rememberMe=1" \
 --data-urlencode "logintoken=${TOKEN}" \
 --data-urlencode "token=${TOKEN}" \
 --data-urlencode "loginreturnurl=https://xxx.miraheze.org" \
 --request "POST" "${WIKIAPI}?action=clientlogin&format=json")

echo "$CR" | jq .

STATUS=$(echo "$CR" | jq '.clientlogin.status')
if [[ $STATUS == *"PASS"* ]]; then
 echo "Successfully logged in as $USERNAME, STATUS is $STATUS."
 echo "-----"
else
 echo "Unable to log in, is logintoken ${TOKEN} correct?"
 exit
fi

###############
# Get a createaccount token
echo "Get createaccount token..."
CR=$(curl -S \
 --location \
 --retry 2 \
 --retry-delay 5 \
 --cookie $cookie_jar \
 --cookie-jar $cookie_jar \
 --user-agent "Curl Shell Script" \
 --keepalive-time 60 \
 --header "Accept-Language: en-us" \
 --header "Connection: keep-alive" \
 --compressed \
 --request "GET" "${WIKIAPI}?action=query&meta=tokens&type=createaccount&format=json")

echo "$CR" | jq .

rm -f logins.json
echo "$CR" > logins.json
TOKENSS=$(jq --raw-output '.query.tokens.createaccounttoken' logins.json)
TOKENSS="${TOKENSS//\"/}" # strip any double quotes

# Remove carriage return!
printf "%s" "$TOKENSS" > tokens.txt
TOKENSS=$(cat tokens.txt | sed 's/\r$//')

if [ "$TOKENSS" == "null" ]; then
 echo "Getting a createaccount token failed."
 exit
else
 echo "CreateAccount token is $TOKENSS"
 echo "-----"
fi

# For every user with edits in the old wiki's database, create the same account
# on the new wiki via the API (mailpassword=1 emails them a temporary password).
# Adjust the MySQL credentials and replace xxxwikidb with the old wiki's database name.
mysql -u root -toor xxxwikidb -e 'SELECT rev_user_text, user.user_email, COUNT(*) AS CONTRIBS FROM revision JOIN user ON revision.rev_user = user.user_id GROUP BY revision.rev_user_text;' | while read -r column1 user column2 email value;
do
 CR=$(curl -S \
 --location \
 --cookie $cookie_jar \
 --cookie-jar $cookie_jar \
 --user-agent "Curl Shell Script" \
 --keepalive-time 60 \
 --header "Accept-Language: en-us" \
 --header "Connection: keep-alive" \
 --compressed \
 --data-urlencode "createreturnurl=https://meta.miraheze.org" \
 --data-urlencode "createtoken=${TOKENSS}" \
 --data-urlencode "username=${user}" \
 --data-urlencode "mailpassword=1" \
 --data-urlencode "email=${email}" \
 --data-urlencode "realname=" \
 --data-urlencode "reason=Per+request" \
 --request "POST" "${WIKIAPI}?action=createaccount&format=json")

 echo "$CR" | jq .

done

Steps to use

  1. Replace any xxx with the wiki's subdomain (not the custom domain, and excluding .miraheze.org since that's already there :)).
  2. Add your password to the USERPASS= field, and your username to USERNAME=.
  3. Replace the cookie_jar value with your wiki's database name on Miraheze, i.e. subdomainwiki.
  4. Replace --data-urlencode "createreturnurl=https://meta.miraheze.org" \ with --data-urlencode "createreturnurl=https://<wiki_subdomain>.miraheze.org" \
  5. Replace xxxwikidb with subdomainwiki.

Now you can run the script.
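
For example, assuming you saved it as create_users.sh (the file name is just an illustration) on a machine that can reach both the old database and the Miraheze API:

chmod +x create_users.sh
./create_users.sh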

We could also grant you an exemption if you hit a rate limit or something?

I'm attempting to run dumpBackup.php and hitting a bug: it hangs with no error after a couple of dozen pages. It seems to happen even after restoring the DB locally and trying with a clean MW install. Not sure what's going on. :-(

I might try your script @Paladox, but I'll try a bit more to get the XML dump working first. It might be an actual problem with the data somewhere.

You could copy the wiki to your local computer and run the dump there (so it's the same MW version), or we can do it as ops with the SQL dump.

By "clean MW install", is that MW 1.31? Did you run update.php too?

Try: php dumpBackup.php --full > dump.xml

I don't know what the problem was, but I ran fixDoubleRedirects.php and rebuildall.php, and everything now exports correctly. Some wayward link record, I guess.
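
For anyone who hits the same hang, the sequence was roughly as follows (run from the wiki's MediaWiki directory; the output file name is arbitrary):

php maintenance/fixDoubleRedirects.php
php maintenance/rebuildall.php
php maintenance/dumpBackup.php --full > dump.xml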

I've made comm.wikimedia.org.au read-only, dumped the latest database, XML, and images directory, zipped them, and sent a link to staff@.

So I hope everything's ready to import! :-)

We've received the email to staff@ :)

Reception123 mentioned this in Unknown Object (Diffusion Commit). Sep 12 2018, 06:47

Terrific! :)

By the way, is wmaucomm correctly set to private? I seem to be able to browse without being logged in.

@Samwilson It is private. The Main Page is exempted by default; if you would like, we can either change this or you can create a redirect.

The image and file imports have begun (images are all served via img_auth.php).

The XML import has finished on https://wmaucomm.miraheze.org/wiki/

So, as a progress update: we have now imported the XML + images onto https://wmaucomm.miraheze.org/wiki/ :)

You'll probably want to request another wiki for https://wikimedia.org.au :)

(and send the link for the XML + images for that wiki too) :)

Thank you! Looks great. And yep, I'll get on to the main wiki shortly. :) This is exciting!

The only issue I can see is that the file imports have overwritten the image description pages, e.g.:
https://wmaucomm.miraheze.org/w/index.php?title=File%3AMemberdb-approvingmembers.png&type=revision&diff=221&oldid=1146

Oh, you can revert the description change. :) (I think that's safe?)

Good point; yeah, that should be fine.

Would someone be able to run rollbackEdits.php --user Maintenance_script?

Any more progress on this? :)

The WMAU committee has a meeting tomorrow, where the next stage will be decided. Gotta get people to approve where we're up to so far. :-)

So the Committee has decided not to proceed with the migration.

I'm very sorry about this! Thank you so much for all your help and the work you did getting our data imported.

Could you please delete the wmaucomm wiki and the data dumps we provided?

We've decided to continue hosting our own MediaWiki instances, because we have keen volunteers who want to do it and others who want to learn how.

Thanks again for your support. Sorry again for the hassle.

@Samwilson Sorry to hear that, but thank you anyway for recommending us :)

I hope of course you will still be able to help us with things from time to time.

Paladox claimed this task.

Thanks for that. WMAU is sending a donation to Miraheze in appreciation.

And yup I'm still around and will do what I can with extensions and whatnot.