Consider creating an import usage policy
Closed, ResolvedPublic

Description

(TL;DR What kind of rules should we have for large wiki/image imports?)

While this isn't very frequent and not something I'd worry about yet, I think we should start considering a fair use policy.

This would mainly affect imports. For example, it would make it harder for the sole user of a wiki to request a large number of Wikipedia pages for import, or for a single wiki to request, say, 30 GB worth of images. Currently we do tend to ask questions or refuse such requests, but there's no clear guideline or policy. For example, right now I've started an import of 15 GB worth of images for a wiki. I wouldn't say that's too much, but we do want to think about these things in general.

Therefore, we should start thinking about what kind of policy we want. I'd not personally be in favour of any maximum limits for imports/images at this time, but I would be in favour of some sort of policy or constraints. For example, we could have a few questions that people need to answer when requesting an XML import or image import (e.g. How many active users are there on your wiki? Why do you need this import? Do you expect many contributors?). And if we do want some sort of very minimal limit, we could perhaps accept very large imports only if the wiki has a certain number of active users.

Any other ideas/thoughts?

Event Timeline

Reception123 created this task.

100% in favour of this. This should've been done a year ago. I'd say 5-10 GB/wiki is a reasonable limit. :)

Unknown Object (User) added a comment. Jan 20 2022, 08:25

I think it's a good idea. I'm not sure exactly what the limit should be yet; 5-10GB does seem a bit low, but it would all depend on the circumstances. That is still quite a lot for an XML import, but not a huge amount for files.

Yeah, what I think is best isn't a specific number but a guideline that we can be flexible about. The most important factors, I feel, are the number of active users on a wiki and the purpose of the import. The main issue I want to avoid is a wiki that has a single user requesting an unreasonable amount of content when it's not physically possible for one user to need or work on that much content. Perhaps if we want some (flexible) limits we should set them in relation to the number of active users. For example, if a wiki only has 1 active user, they shouldn't be able to request more than 5GB worth of images or, say, 500MB worth of XML content. And we should also be stricter with Wikipedia imports than with transfers from old wikis.

A solution would be to ask the following questions:

  • What is the purpose of this import? (Are you transferring a wiki from another provider? Are you forking content from somewhere else?)
  • How many active users are you expecting will join your wiki? (realistically)
  • How many active users do you currently have on your Miraheze wiki?

As for size limits, it's difficult to come up with exact numbers but perhaps the following could work (feel free to propose changes).

  • 1 active user: 500MB XML max; 2.5GB images max. (Can be exceeded if it's approved by a MWE following the response to the question below or if a reasonable use case exists)
    • Imports from Wikipedia (or any other website that didn't belong to the user before) are limited to 100MB unless justification is provided.
  • 1-5 active users: 1GB XML; 5GB images max (Can be exceeded if it's approved by a MWE following the response to the question below or if a reasonable use case exists)
  • 5-15 active users: 5GB XML; 10GB images max (Can be exceeded if it's approved by a MWE following the response to the question below or if a reasonable use case exists)
  • 15-25 active users: 10GB XML; 20 GB images max (Can be exceeded if it's approved by a MWE following the response to the question below or if a reasonable use case exists)
  • > 25 active users: quite rare, discretion to be used but unlikely justifiable to exceed 15GB for an XML and 30GB for an image dump.

If the imports exceed these sizes, an additional question should be asked: "Is this content essential for the functioning of your wiki? Can your wiki successfully work without this content?"
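
As a rough illustration only, the proposed tiers could be written as a lookup table. This is a sketch under assumptions, not an agreed implementation: the names `ImportLimits`, `suggested_limits` and `needs_extra_question` are hypothetical, 1GB is taken as 1000MB, and because the tiers as written share their boundary values (1, 5 and 15 users), a boundary count is assumed to fall into the lower, stricter tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportLimits:
    """Suggested soft maximums, in MB (assumes 1GB = 1000MB)."""
    xml_mb: int
    images_mb: int


# (highest active-user count in the tier, suggested limits).
# Assumption: boundary counts (1, 5, 15) belong to the lower tier.
TIERS = [
    (1, ImportLimits(xml_mb=500, images_mb=2_500)),
    (5, ImportLimits(xml_mb=1_000, images_mb=5_000)),
    (15, ImportLimits(xml_mb=5_000, images_mb=10_000)),
    (25, ImportLimits(xml_mb=10_000, images_mb=20_000)),
]
# More than 25 active users: discretion applies, but these sizes are
# described as unlikely to be justifiably exceeded.
ABOVE_25 = ImportLimits(xml_mb=15_000, images_mb=30_000)


def suggested_limits(active_users: int) -> ImportLimits:
    """Return the suggested limits for a wiki with this many active users."""
    if active_users < 1:
        raise ValueError("a wiki needs at least one active user")
    for upper_bound, limits in TIERS:
        if active_users <= upper_bound:
            return limits
    return ABOVE_25


def needs_extra_question(active_users: int, xml_mb: float = 0, images_mb: float = 0) -> bool:
    """True if the request exceeds a suggested size, so the extra
    'is this content essential?' question should be asked."""
    limits = suggested_limits(active_users)
    return xml_mb > limits.xml_mb or images_mb > limits.images_mb
```

For instance, `needs_extra_question(1, xml_mb=600)` flags a 600MB XML import for a single-user wiki. The limits stay soft: exceeding them only triggers the extra question and MWE review, it does not reject the request.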

What exactly does this have to do with fair use?

Naleksuh renamed this task from Consider creating a fair use (imports) policy to Consider creating an import usage policy. Jan 20 2022, 23:21

Just my two cents, but I would also hope that this would help avoid some of the attitude around "I moved all of the old site's media to Miraheze, but it isn't exactly what I/we wanted. Could you export our media for us?"

Unknown Object (User) moved this task from Backlog to Short Term on the MediaWiki (SRE) board. Jan 22 2022, 00:22

100% agree with these numbers. No objections from me. It would be absolutely silly for a wiki with only 1 user to have XML imports of over 1GB.

After a discussion with Raidarr, I think it's a good idea to have a separation between two main categories of imports:

  1. Migrations where the whole wiki community is moving. We should be more lenient towards these wikis and they will usually include a larger number of active users
  2. Forking a wiki where not all the community is moving - we should be more strict on these especially if there are few users (or one user only) who are part of the attempt. If there is a large 'chunk' of the community moving, we can be more lenient as with 1.

2a. Wikipedia import requests are a sort of forking, and as such the same considerations as 2 apply, however we can be more strict here if it's a single user who wishes to import large portions of Wikipedia without good justification.

I will think about how we could make this into a policy.

Unknown Object (User) unsubscribed. Feb 12 2022, 07:24

Here is my current draft: https://meta.miraheze.org/wiki/User:Reception123/imports_policy_draft_(SRE)

It may be a bit complicated and need fixing, but it gives the general idea.

Regarding this, I would encourage us to be very strict with these. Wikipedia fork wikis usually have only one, maybe two, editors, and it is just mathematically impossible for one user to maintain, repair, edit, and keep free of vandalism and spam a wiki with 100,000 pages imported from English Wikipedia (or a similar Wikipedia). So, for these, in addition to file size, we should have a maximum number of pages that may be imported, either in one shot or collectively, based on the number of unique persons actively editing the wiki. If there is only one unique editor, I'd suggest a very generous maximum of 25,000 pages in total, with 5,000 pages per import (a total of 5 imports).
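
The page caps suggested above for a single-editor Wikipedia fork can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and it covers just the one-editor case, since no figures were proposed for wikis with more editors.

```python
# Suggested caps for a Wikipedia fork with a single unique editor.
PER_IMPORT_PAGE_CAP = 5_000
TOTAL_PAGE_CAP = 25_000


def max_full_imports() -> int:
    """Number of maximum-size imports that fit under the total cap."""
    return TOTAL_PAGE_CAP // PER_IMPORT_PAGE_CAP


def pages_allowed(already_imported: int, requested: int) -> bool:
    """True if a new request respects both the per-import and total caps."""
    if already_imported < 0 or requested <= 0:
        raise ValueError("page counts must be positive")
    within_single_import = requested <= PER_IMPORT_PAGE_CAP
    within_total = already_imported + requested <= TOTAL_PAGE_CAP
    return within_single_import and within_total
```

With these numbers, a wiki that has already imported 22,000 pages could not request another 5,000, because the total would exceed 25,000.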

I'd also suggest that anything over 5 GB should require the approval of the EM MediaWiki, or of the EM Infrastructure if the EM MediaWiki is on a leave of absence, on vacation, or otherwise unable to approve or decline within a reasonable period of time. MWEs would be allowed to approve imports of up to 5 GB for wikis, in line with the policy of course.
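
The approval rule suggested here could be sketched as follows. The function name and the availability flag are hypothetical, and treating exactly 5 GB as within the MWE limit is an assumption based on the "up to 5 GB" wording.

```python
MWE_APPROVAL_LIMIT_GB = 5.0


def required_approver(import_size_gb: float, em_mediawiki_available: bool = True) -> str:
    """Return which role would need to approve an import of this size."""
    if import_size_gb < 0:
        raise ValueError("import size cannot be negative")
    if import_size_gb <= MWE_APPROVAL_LIMIT_GB:
        # Any MWE may approve, in line with the guideline.
        return "MWE"
    # Larger imports go to the EM MediaWiki, falling back to the
    # EM Infrastructure when the EM MediaWiki cannot respond in time.
    return "EM MediaWiki" if em_mediawiki_available else "EM Infrastructure"
```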

If no one objects then I think this should formally be implemented

That should be fine but my draft isn't final yet as I was still thinking of some minor changes. I aim to get this done after the general normal tasks backlog is stabilised.

A new version of the draft is now available. Please feel free to review and give suggestions. I plan on having it as a guideline rather than policy since it's not imposing strict conditions, it's just a guideline that should be followed.

Reception123 claimed this task.

With no extra suggestions or objections, this can now be adopted. Since it is a guideline minor changes can be made over time. All import requests created after this post should be assessed following the guideline if possible.