
Investigate and Implement basic Machine Learning concepts for automatic wiki creation
Open, Low, Public

Description

It is my understanding that automatic wiki creation has been low-hanging fruit for Miraheze for a while. Machine learning is becoming an increasingly trusted and commonplace technology in the industry, and given that part of my studies aligns with computational applications of neural networks (AI/ML), I'm creating this task to work with @John on investigating whether CreateWiki and Miraheze can capitalise on it.

Ideally, most simple or moderately complex wiki requests would no longer need human input, while more complex or irregular requests would be flagged for a human to review. This would give users an almost automatic wiki creation process and reduce the strain on volunteers, leaving them free to work on the more complex cases.

Event Timeline

Owen triaged this task as Low priority. Jan 13 2020, 21:19
Owen created this task.
Owen added a comment. Jan 14 2020, 17:54

Considering this, the likely implementation is a web interface that accepts input parameters: the extension makes web requests to it and receives a score or response to process.

The backend language is likely going to be Python, as it is the industry standard for machine learning. This could potentially be extended with R (via rpy2) if what R offers over Python in terms of advanced neural learning outweighs its added complexity.
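To make the idea above concrete, here is a minimal sketch of such a scoring service using only the Python standard library. Everything in it is an assumption made for illustration: the `description` input field, the `APPROVE_AT` threshold, the port, and the placeholder scorer (which just rewards longer descriptions) are hypothetical and stand in for whatever model and parameters are eventually chosen.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical threshold: at or above this score a request could be
# auto-approved; anything lower is flagged for a human reviewer.
APPROVE_AT = 0.8


def score_request(params):
    """Placeholder scorer standing in for a trained model.

    It only rewards longer descriptions, capped at 1.0, so the example
    stays self-contained. A real model would replace this function.
    """
    description = params.get("description", "")
    return min(len(description.split()) / 50.0, 1.0)


def build_response(params):
    """Turn input parameters into the JSON-serialisable reply."""
    score = score_request(params)
    decision = "approve" if score >= APPROVE_AT else "needs_review"
    return {"score": score, "decision": decision}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            params = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "Body must be JSON")
            return
        if not isinstance(params, dict):
            self.send_error(400, "Body must be a JSON object")
            return
        body = json.dumps(build_response(params)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Bind to localhost only; the extension would POST JSON to this address.
    HTTPServer(("127.0.0.1", 8080), ScoreHandler).serve_forever()
```

Keeping `build_response` separate from the HTTP handler means the scoring logic can be tested, or swapped for a real model, without touching the web layer.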

How would this deal with more subtle things like spotting potential socks requesting wikis?

Could it check CU data?

How will we train datasets?

Owen added a comment. Jan 15 2020, 21:35

How would this deal with more subtle things like spotting potential socks requesting wikis?

It wouldn't, as that would require developing a critical test that lets everyone spot them all the time with no false positives.

Could it check CU data?

No.

How will we train datasets?

Existing request data, with information that is known about them.
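As a rough illustration of training on existing request data, the sketch below fits a small multinomial naive Bayes classifier on (description, outcome) pairs in pure Python. Naive Bayes is swapped in here only because it is short and dependency-free; it is not a technique anyone in this task has proposed. The `TOY_REQUESTS` examples and the `approved`/`declined` labels are made up and are not real Miraheze requests.

```python
import math
from collections import Counter, defaultdict

# Made-up examples for illustration only; real training data would come
# from past wiki requests and their known outcomes.
TOY_REQUESTS = [
    ("a wiki documenting the lore and characters of my tabletop game", "approved"),
    ("community wiki for a local history society and its archives", "approved"),
    ("test wiki please ignore", "declined"),
    ("asdf asdf test test", "declined"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Fit a multinomial naive Bayes model on (description, outcome) pairs."""
    word_counts = defaultdict(Counter)  # outcome -> word frequencies
    label_counts = Counter()            # outcome -> number of requests
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the outcome with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in label_counts.items():
        logp = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            # Laplace smoothing so unseen words do not zero the probability.
            logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train(TOY_REQUESTS)
```

For example, `predict(model, "test test please")` returns `"declined"` on this toy data. With only four examples the output says nothing about real accuracy.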

It could be a good idea but we’ll have to see how accurate it works out to be.

It’d be interesting to see how things like past requests, account age etc. could impact as well.

I suppose I could train it on patterns of LTAs.

A big thing would be auto-closing things like duplicate requests and wikis that already exist.
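Auto-closing duplicates and requests for wikis that already exist does not need machine learning; a plain rule-based check could run before any scoring. The sketch below assumes each request is a dict with hypothetical `requester` and `subdomain` fields, and that lists of existing wikis and open requests are available. None of these names come from CreateWiki itself.

```python
import re


def normalise(name):
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def auto_close_reason(request, existing_wikis, open_requests):
    """Return a reason to auto-close the request, or None to keep it open."""
    wanted = normalise(request["subdomain"])
    # Rule 1: the requested wiki already exists.
    if wanted in {normalise(w) for w in existing_wikis}:
        return "wiki already exists"
    # Rule 2: the same user already has an open request for the same name.
    for other in open_requests:
        same_user = other["requester"] == request["requester"]
        if same_user and normalise(other["subdomain"]) == wanted:
            return "duplicate request"
    return None
```

Normalising names first means that `Example-Wiki` and `examplewiki` are treated as the same subdomain.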

Owen added a comment. Jan 15 2020, 21:44

It’d be interesting to see how things like past requests, account age etc. could impact as well.

Would likely be part of an exploratory analysis to evaluate factors outside of text reasoning alone.

A big thing would be auto-closing things like duplicate requests and wikis that already exist.

These, with changes, would never get this far.

John moved this task from Backlog to Features on the CreateWiki board. Apr 12 2020, 20:55
AmandaCath added a subscriber: AmandaCath.

How would this deal with more subtle things like spotting potential socks requesting wikis?

Could it check CU data?

How will we train datasets?

As far as I know, unless I've missed something, just because you are blocked or are a sock of a blocked user on Meta doesn't mean that you can't create your own wiki... I would think that the only cases where someone would be prohibited from creating a wiki would be if they were under a global community ban, which appears to only apply to two specific users at the moment.

Global Ban != Global Lock
Local action could also be taken to prevent wiki requests via block.

Someone under a ToU ban (2 people) would also be banned from creating a wiki.

I think the issue of socks of blocked users and banned users is far too complex for this. For now, this task's main idea is to implement a simple system; such specific things can be discussed at a later time.

Owen added a comment. Apr 26 2020, 14:38

If it helps, the scope of this task is purely the implementation of such a system - not necessarily the enabling of such on Miraheze - but making enabling technically possible.

Tali64 added a subscriber: Tali64. Jul 7 2020, 17:05

This should be a high-priority task. Fandom has over 400,000 communities because it has automatic wiki creation. On a side note, you should visit my Wiki Gazetteer at Fandom.

Hispano76 added a subscriber: Hispano76. Edited Jul 7 2020, 17:14

This should be a high-priority task. Fandom has over 400,000 communities because it has automatic wiki creation. On a side note, you should visit my Wiki Gazetteer at Fandom.

Changing priorities would not change the research process.

As my colleagues have said, we want to prevent wikis from being used for harassment and content that violates Miraheze's terms of use. It should also be added, however, that some users say the wiki will be for video games, for example, when it turns out to be used for content that violates Miraheze's policies.

This should be a high-priority task. Fandom has over 400,000 communities because it has automatic wiki creation. On a side note, you should visit my Wiki Gazetteer at Fandom.

I would just like to add that I doubt having this would give us more wikis; I don't see the logic in that reasoning.

Void added a subscriber: Void. Jul 7 2020, 23:25

Something that would really help with this process is exposing the RequestWikiQueue to the API, so that requests could be fetched programmatically by an external process without having to parse raw HTML. With that alone, I'm fairly sure I could build and test a dataset and implement this in a Python-based bot/script.

Tali64 added a comment. Jul 16 2020, 20:41

This should be a high-priority task. Fandom has over 400,000 communities because it has automatic wiki creation. On a side note, you should visit my Wiki Gazetteer at Fandom.

Changing priorities would not change the research process.

As my colleagues have said, we want to prevent wikis from being used for harassment and content that violates Miraheze's terms of use. It should also be added, however, that some users say the wiki will be for video games, for example, when it turns out to be used for content that violates Miraheze's policies.

Well, why not make this a normal priority task?

Well, why not make this a normal priority task?

All extension development tasks are low priority. This one is towards the top of the low-priority tasks.