
Animated Feet at Miraheze
Closed, Resolved · Public

Description

Wiki URL: https://animatedfeet.miraheze.org/wiki/Main_Page

Because the wiki doesn't have as many pages as the older wiki, which will sooner or later be defunct, I was wondering whether you could transfer the content from https://animefeet.fandom.com to https://animatedfeet.miraheze.org.

The pages are listed below in the email:


Event Timeline

Paladox subscribed.

Hello,

Sorry for the delay, I will start the import now!

Hello Paladox, thank you for claiming this task! I was the bureaucrat at the project's previous location, and Delta has extended those rights here as well. Wikia/Fandom appears to have closed the project now. Do you still have the XML file? Pages_full zipped in a 7z was 17 MB, so it was very easy to download, but I noticed that decompressed it was 461 MB.

Do you know if there is a way to track what percentage of the XML data has been imported so far? When it wouldn't let me upload the 461 MB file (I figured because of the 250 MB cap on file uploads), I tried pages_current since it was smaller (8 MB compressed, 90 MB decompressed), but it only added about 5 pages before giving an error.

What confuses me is that when I view https://animatedfeet.miraheze.org/wiki/Special:Log?type=import I can see my 5 additions, but I can also see that a lot of history has been imported that just doesn't show up in the log.
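One rough way to answer the percentage question, assuming you have the uncompressed dump locally: count the `<page>` elements in the XML and compare that with the number of pages already on the wiki (for example, as read off Special:Statistics). This is only an estimate, since pages that existed before the import skew it, and the function names are hypothetical. It also assumes each `<page>` tag sits on its own line, as in standard MediaWiki exports.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a rough import-progress estimate.
# Assumes an uncompressed MediaWiki XML dump with each <page> tag on its own line.

# Print the number of <page> elements in the dump ($1).
count_dump_pages() {
    # grep -c prints the number of matching lines (and still prints 0 on no match)
    grep -c -e '<page>' -- "$1"
}

# Print the integer percentage imported.
# $1 = dump file, $2 = number of pages already imported (supplied by you)
import_progress() {
    local total
    total=$(count_dump_pages "$1")
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    # Integer arithmetic, so the result is rounded down
    echo $(( $2 * 100 / total ))
}
```

For example, `import_progress pages_full.xml 1234` prints the rounded-down percentage for 1234 imported pages.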

Am I complicating matters by doing this? I don't want to get in the way. I didn't know there was already a Phabricator task open for this until today, when I saw Void's reply at https://meta.miraheze.org/wiki/Community_noticeboard#ways_to_check_import_logs, and I only just set up Phabricator because the activation emails were going to my junk mail.

I'm not sure how to monitor the progress of the XML import (is it at 100% now that you've moved on to help with AB's XML?), but for whenever there is enough storage available, I believe archive.org has a .torrent and even a direct download of the images.

It is very big, though, so if you only have 70 GB left, we can wait however many months it takes to expand storage.

https://archive.org/download/wiki-animefeetfandomcom shows the dump's .7z file to be 60.9 GB; I have no idea how much bigger it would get decompressed. My guess is more than the 70 GB you have available, and given that AnimeBaths was shut down first, it only seems fair to prioritize its recovery first, which is more feasible since less data is needed for its image files (about 25%).

The overview at https://archive.org/details/wiki-animefeetfandomcom also links to https://archive.org/download/wiki-animefeetfandomcom/wiki-animefeetfandomcom_archive.torrent, if that might be an easier way to acquire it.

I didn't set this up (it was the mysterious "Feetlord 3000" from /a/ in February), so I'm not really sure how they did it. I've never known how to back up more than individual webpages on archive.org, so entire sites are beyond me. I'm not sure how hard it would be to get the format to match here.

Hello, I still have the XML file (and have restarted the import). Sorry for the late response; it seems I missed your reply :(.

It would be good to hold off on the 60 GB files for now.

Also, even though the file is smaller, what counts is the processing of the file, which can take longer than the timeout we have configured.

Hello, after such a long delay (sorry), this has finally been imported!

I was just thinking it might be more appropriate to change the status from Resolved to Stalled.

Paladox, thank you for your wonderful work in importing the XML file and restoring the text source code of all our pages. This has been a great help and is a very important first step, giving us the tools to rebuild a lot of the work.

I realize the final step (importing the images) must be postponed indefinitely until we know Miraheze's storage capacity is up to the task.

Until that can finally happen (2021? 2022?), one of the most valuable things from the old projects is still missing :(

Tycio lowered the priority of this task from Normal to Low. Nov 10 2019, 00:00

One thing I was wondering about image importing: since we know there isn't capacity for ALL of the images, is there some way we could download just a portion of them (~1%?) based on some kind of criteria, and put off the rest until later?

I'm not sure how feasible it would be to selectively choose files from an archive like that. I was thinking perhaps we could start with the ones I uploaded?

I'm not sure if there's a way to find that list in the archive. It doesn't appear to have restored the old file pages/descriptions from the old wiki, which might have put them on the uploads list. I do see "No file by this name exists", but if I click History it does show the file appearing. For example, https://animatedfeet.miraheze.org/w/index.php?title=File:WilykitTheUnholyAlliance.png&action=history shows I originally added that file on 17 Oct 2013, even though https://animatedfeet.miraheze.org/w/index.php?title=Special:ListFiles/Tycio&ilshowall=1 only shows files from June 2019 onward.
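If the imported XML still holds the File: page histories (the history link above suggests it does), a rough list of someone's original uploads could be pulled straight out of the dump by taking the author of each File: page's first revision. This is a sketch under assumptions: revisions appear oldest-first (the default in MediaWiki exports), the first revision's author is the uploader, and XML entities such as `&amp;` in titles are left undecoded. `list_first_uploads` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list File: pages whose first revision was made by a given user.
# Assumptions: uncompressed MediaWiki XML dump, revisions oldest-first,
# one XML tag per line, titles left XML-escaped.
list_first_uploads() {
    # $1 = dump file, $2 = username
    awk -v user="$2" '
        # A new page starts: remember its title and wait for its first contributor
        /<title>/ {
            title = $0
            sub(/.*<title>/, "", title)
            sub(/<\/title>.*/, "", title)
            first = 1
        }
        # Anonymous first edit: no username to match, so skip this page
        /<ip>/ && first { first = 0 }
        # First named contributor of the page
        /<username>/ && first {
            name = $0
            sub(/.*<username>/, "", name)
            sub(/<\/username>.*/, "", name)
            if (name == user && title ~ /^File:/) print title
            first = 0
        }
    ' "$1"
}
```

Usage would be along the lines of `list_first_uploads pages_full.xml Tycio > my_uploads.txt`.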

One problem we had, which I regret not having dealt with prior to archival, was allowing a lot of needlessly bloated (i.e. upscaled) screenshots, as well as excessive detail like people copying EVERY SINGLE FRAME of a given scene, when there was so little variation between frames that it wasn't necessary at all.
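Once the images are extracted locally, oversized uploads like these upscaled screenshots could be triaged by listing the largest files first. A minimal sketch: `largest_files` is a hypothetical helper, and it ranks by byte size only, so it cannot tell an upscale from a legitimately large image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N largest files under a directory, biggest first,
# one "<size in bytes><TAB><path>" per line.
largest_files() {
    # $1 = directory, $2 = how many files to show
    find "$1" -type f | while IFS= read -r f; do
        # tr strips the padding some wc implementations add
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -rn | head -n "$2"
}
```

Something like `largest_files ./images 200` would give a review list of the 200 biggest files.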

I know I personally never did that; I only did minimalist representations of scenes, so I think starting with mine would mean prioritizing good files.

I'm sure there were other users who made more ideal uploads like that, whose files I'd feel comfortable prioritizing for restoration over others, so after my uploads we could move on to theirs as soon as additional space becomes available and they can remind me of their habits.

Any updates on this? There have been no comments since November, and I can't identify a clear status.

We have more storage now, how much do you need to upload?

I have just seen the download link: 60 GB compressed.

So uncompressed would mean hundreds of GB, I guess? We would need to increase capacity by a lot.

Stalling for now.

Paladox changed the task status from Stalled to Open. May 12 2020, 23:03
Unknown Object (User) changed the task status from Open to Stalled. Oct 27 2020, 02:07

I'm not sure that this will be able to be imported ever, it's such a large wiki....

Unknown Object (User) removed a subscriber: BrandonWM. Nov 19 2020, 05:07
Tycio raised the priority of this task from Low to Normal. Nov 19 2020, 08:52

One thing I am trying to figure out: for importing the files/pictures from the old wiki in 2018, would we open a new request or just talk about it in this existing thread?
I'm hoping it could be pulled off archive.org.

@Tycio You can just mention it in this task if you'd like

Unknown Object (User) moved this task from Backlog to Maintenance Script Run on the MediaWiki board. Nov 27 2020, 08:35
Unknown Object (User) lowered the priority of this task from Normal to Low. Dec 8 2020, 22:12

Stalled tasks should be low priority.

John claimed this task.
John subscribed.

If I am reading the status of this task right:

  • The import itself is done, bar images.
  • Given the length and fractured discussions on this task, it's becoming hard to understand the status of it.
  • It would be great if a new task can be opened (or this replied to) with a clear understanding of what is now required from us.

Thanks.

60.9 GB wiki dump from 24 Feb 2019 at https://archive.org/download/wiki-animefeetfandomcom
I figure this must include the images.
Basically threefold:

  1. is it feasible to recover and import?
  2. does miraheze have storage capacity for that?
  3. After such an import, are there tools we could use to purge low-quality / bloated / duplicate images to help lessen Miraheze's storage burden?

I know that in many cases when images get deleted, they're actually still there in storage (mods can view them, just not normal users), which doesn't help lessen Miraheze's storage burden,
so I figure this would only be possible via some kind of "hard delete" option on files.

Prior to such a massive import, so that we are prepared to give the userbase tools to purge undesired files and lessen our footprint, I would like to know what tools could be used to remove them quickly and FOREVER.

For example, to pare down 60 GB to 30 GB as a start, we could create some kind of nomination system where any user could nominate images for a permanent purge, subject to admin approval, to allow collaboration on such an effort.

I will readily admit that because Wikia/Fandom had seemingly unlimited storage, this was never a concern, and we pretty much weren't picky about people adding dupe images, bloated scale-ups, uncropped images, frame-by-frame spam, etc.
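For the exact-duplicate part of that cleanup, a content hash catches byte-identical copies uploaded under different names. A minimal sketch, assuming GNU coreutils (`sha1sum`, plus `uniq -w` and `--all-repeated`, which are GNU extensions); `find_duplicate_files` is a hypothetical name. It will not catch rescaled or recropped near-duplicates, and removing what it finds from the wiki would still need the kind of hard delete described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list groups of byte-identical files under a directory.
# Requires GNU coreutils: sha1sum, and uniq's -w / --all-repeated options.
find_duplicate_files() {
    # Hash every file, sort so identical hashes are adjacent, then keep only
    # lines whose first 40 characters (the SHA-1 hex digest) repeat.
    # Each group of duplicates is separated by a blank line.
    find "$1" -type f -exec sha1sum {} + | sort | uniq -w 40 --all-repeated=separate
}
```

Running `find_duplicate_files ./images` would give a candidate list for the nomination step rather than deleting anything itself.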

I regret not being pickier then; we can be picky now, but will need help.

Unknown Object (User) removed John as the assignee of this task. Dec 29 2020, 07:33
John changed the task status from Stalled to Open. Dec 29 2020, 08:43
In T4166#130811, @Tycio wrote:

Basically threefold:

  1. is it feasible to recover and import?
  2. does miraheze have storage capacity for that?
  3. after such an import, are there tools we could use to purge low-quality / bloated / duplicate images to help lessen Miraheze's storage burden?

@Paladox

We have more storage now, could this be imported over the coming weeks?

Paladox removed Paladox as the assignee of this task. Mar 6 2021, 22:16

I'm going to unassign myself and leave it up to the MediaWiki (SRE) team.

@Universal_Omega / @RhinosF1 / @Reception123 want to do this?

Unknown Object (User) moved this task from Backlog to Long Term on the MediaWiki (SRE) board. Mar 24 2021, 02:11
Unknown Object (User) unsubscribed. Apr 3 2021, 19:59
John claimed this task.

Going to mark this as resolved as the dump has been waiting for a long period of time.

During this time, I'm sure images will have been uploaded.

If the original dump of images is what is required, please re-open this task and I will get around to it immediately.

Yeah, it's the original dump that's required, though I was thinking AnimeBaths would be a better test run since it has less data to recover, a longer history, and higher-quality content.

New images have been uploaded to both but I don't think that would interfere with restoring the old files.

John changed the task status from Stalled to Open. Aug 24 2021, 16:15

The dump is 60 GB in size. I'll download it and check the fully uncompressed size to figure out whether we are able to import it; if so, I'll start on it.

I haven't forgotten this task; it just turned out more complicated than originally anticipated, as all file names are in the revision-grabber format rather than the direct filenames. So I have had to create a script to not only rename them but also categorise them by author, to ensure attribution is retained.

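fileRename.sh
```bash
#!/usr/bin/env bash
# Rename grabber-style files and group them into one directory per author.
# fileData is tab-separated: [0] current filename, [1] long title string, [2] author.
# Negative substring lengths (${var::-34}) need bash 4.2 or newer.
while IFS=$'\t' read -r -a fileArray
do
        # Create the author's directory if it doesn't exist yet
        mkdir -p -- "${fileArray[2]}"

        # Drop the fixed 34-character suffix, then the fixed 67-character prefix
        shortTitle=${fileArray[1]::-34}

        # Quote everything so names with spaces or glob characters survive
        mv -- "${fileArray[0]}" "${fileArray[2]}/${shortTitle:67}"
done < fileData
```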

About 50% of files should have been imported by now, this should go up to 60-65% overnight.

Tomorrow I will start the last 25% import, which I have already prepped, to bring it to 85-90%. The remaining files are classified as 'hash files', which contain hashes and special characters that the MediaWiki importImages script is not able to handle for one reason or another. Tomorrow I will attempt to sort as many of these files as possible to make them importable.
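The comment doesn't say exactly which characters tripped importImages, so this is only a guess at a first sorting pass: list files whose names contain characters MediaWiki never allows in page titles (`#`, `<`, `>`, `[`, `]`, `|`, `{`, `}`), since those cannot become valid File: titles as-is. `list_problem_names` is a hypothetical helper, and this character set is a subset, not the full rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files whose names contain characters that are
# illegal in MediaWiki page titles (subset checked here: # < > [ ] | { }).
list_problem_names() {
    # $1 = directory to scan
    # In the bracket expression, the leading ] is literal, so this matches
    # any name containing at least one of ] [ # < > | { }
    find "$1" -type f -name '*[][#<>|{}]*'
}
```

The output could then be fed to a rename step before re-running the import.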

I expect this task will be completed tomorrow evening or Thursday morning at the latest.

This should now be fully resolved.

Yes, it seems like it. Thanks for your help!
I just wish there had been a similar archive.org download of AnimeBaths.
The MEGA.nz link also appears to be down...
Now I'm trying to remember whether I downloaded it before my old PC died; I need to check my external drives.