Reconsider production usage of Elastic/CirrusSearch
Closed, DeclinedPublic

Description

After activating the AdvancedSearch extension, I cannot search by categories as advertised.

For example, searching the wiki poserdazfreebies for the word "Outfit" returns many pages with the word Outfit in the title, including the page "Genesis Babydoll Outfit" that has Category:Genesis. Searching for the word "Outfit" and the category "Genesis" returns "Create the page "Outfit deepcat:genesis" on this wiki! There were no results matching the query."
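For anyone wanting to reproduce the two searches outside the search box, here is a minimal sketch that builds the equivalent MediaWiki API requests. It assumes the wiki is served at poserdazfreebies.miraheze.org with the API at /w/api.php (neither is stated in this report), and `search_url` is just an illustrative helper name. It only constructs the URLs; it does not send them.

```python
from urllib.parse import urlencode

# Assumed API endpoint; the report only gives the wiki name, not its URL.
API = "https://poserdazfreebies.miraheze.org/w/api.php"


def search_url(api_base: str, query: str) -> str:
    """Build a list=search request URL for the given search string."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,  # the same text typed into the search box
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# Plain search, which returns results according to the report.
print(search_url(API, "Outfit"))
# Search with the category keyword, which returns no results according to the report.
print(search_url(API, "Outfit deepcat:genesis"))
```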

Some web searching indicates that this functionality requires something called CirrusSearch to be enabled in order to work correctly, but I didn't see that on the Extensions list.

Please fix this.

Event Timeline

All The Tropes has enabled this extension as well and has similar issues. More here:

https://allthetropes.org/wiki/Blog:Tech_blog_for_August_2021

I understand Miraheze tried and failed to enable CirrusSearch due to high load requirements. If it still cannot be enabled, we may need the tweak mentioned in the first item of my linked post: disable the CirrusSearch functions and use "incategory" searching only.

We've proved this is possible with graylog so it might be worth investigating.

RhinosF1 renamed this task from Extension:AdvancedSearch does not work as documented - cannot search by categories to Reconsider production usage of Elastic/Circussearch. Aug 4 2021, 13:28
Universal_Omega moved this task from Incoming to Long Term on the Infrastructure (SRE) board.
Universal_Omega moved this task from Backlog to Long Term on the MediaWiki (SRE) board.
Universal_Omega moved this task from Unsorted to Long Term on the Universal Omega board.

We've proved this is possible with graylog so it might be worth investigating.

This isn't correct. Our use of ES for Graylog isn't comparable to all wikis using ES. Different usage and different load too.

If we were to back ES up with SSDs and distribute processing nodes, do you think ES for wiki search would be possible? Can we also explore the size we would save on db* by moving the searchindex away as this could assist in reducing the complexity of database backups potentially.

Yes that would work (though emphasis on that it has to distribute processing).

Universal_Omega renamed this task from Reconsider production usage of Elastic/Circussearch to Reconsider production usage of Elastic/CirrusSearch. Aug 11 2021, 05:05

It's been nearly two weeks. AdvancedSearch is still broken. Please provide a status report on the efforts being made to fix it.

I also request that the priority of this task be bumped up. It affects at least two of the top-ten wikis on Miraheze by usage, after all.

Please provide a status report on the efforts being made to fix it.

While we have the disk space necessary to accommodate an ES cluster on cloud3, we don't have the memory resources available to do so currently.

This would be blocked on gaining more physical resources, or somehow reducing the usage by the database servers (which in theory may come by adding ElasticSearch for searchindex).

We would save the following disk space per server:

db11 ~ 110G
db12 ~ 74G
db13 ~ 88G
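
For a rough sense of scale, the per-server figures above can be added up. This is a minimal sketch that assumes the approximate values quoted in this comment and reads "G" as gigabytes; the variable names are only illustrative.

```python
# Approximate searchindex savings per database server, as quoted above (in G).
# These are the estimates from the comment, not measured values.
approx_savings_g = {"db11": 110, "db12": 74, "db13": 88}

total_g = sum(approx_savings_g.values())
print(f"Approximate total across db11-db13: ~{total_g}G")
```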

Coming up on four weeks now, and AdvancedSearch is still broken...

We've not had this for any significant period of time in the whole history of Miraheze. Big infra changes take time. There's far more to benefit than just 1 extension when we can do it.

It has now been eight weeks since I reported this broken behaviour. Please increase the priority of this - it isn't "Low" priority any more.

(I work in government IT. I know how long "big infra" takes.)

Currently, we are not likely to be in a position to give this serious thought until early next quarter (Jan/Feb), as it would require changes to our physical hardware to support such an intensive piece of software.

We have attempted to run this on our current hardware before at the scale MediaWiki requires and have found it not to be reliable or efficient.

I don't know why this became a discussion of Elastic/CirrusSearch. My report was that an extension is broken.

Now it's been nine weeks. Does the Advanced Search really need this elastic search component, or is there a different search component that can be used instead? Or should the Advanced Search extension just be grayed out?

According to https://www.mediawiki.org/wiki/Extension:AdvancedSearch it requires CirrusSearch which requires ElasticSearch. I think the extension should be undeployed as not currently supported by our infra. CC @RhinosF1 @Reception123 @Universal_Omega.

AdvancedSearch can be removed I think.

Universal_Omega claimed this task.

I have just removed AdvancedSearch for now at least.

John removed Universal_Omega as the assignee of this task.

This task remains on Infra’s radar.

Ah. Alright, my bad.

We are setting up an ElasticSearch cluster in January. We should be able to start progress on this in February, once the air is clear from January's migration.

Re-assigning team.

The current es* cluster setup should be able to handle a limited deploy of CirrusSearch to wikis. Note that it is on HDDs, though. If we want to go for SSDs, this is going to be blocked until probably July at the earliest, I would imagine, and would need additional memory, as a new cluster would have to be deployed (we don't want log storage on SSDs, as it provides no benefit to us when HDD is available).

Is it worth trying what it's like on a few wikis?

I'm happy to try and set some config up and test perf on experimental wikis.

You can use it for:

  • search
  • text
  • file metadata - probably minor and easily switchable back if fails?
  • ...?

If you want to work on it sure, I think that would be a good idea to test out a few select wikis to see how it goes rather than suddenly deploying everywhere and who knows what happens.

gratispaideiawiki is open for testing :)

RhinosF1 raised the priority of this task from Low to Normal. Jan 23 2022, 15:43
RhinosF1 moved this task from Infrastructure to MediaWiki on the Goal-2022-Jan-Jun board.
RhinosF1 moved this task from Long Term to Short Term on the MediaWiki (SRE) board.

What is the URL for gratispaideiawiki? I'd like to see if you can include hyphens in the search term.

https://gratispaideia.miraheze.org

What was meant is that the user is offering their wiki to be included in our initial testing. We've not yet started the process of configuring CirrusSearch, so no wikis have it for the time being. Either way, the URL would just be the name given (without the wiki suffix).miraheze.org
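
The naming rule described above can be sketched as follows. This assumes every wiki follows the default pattern of database name minus the "wiki" suffix plus .miraheze.org, so it would not cover wikis served from a custom domain. `wiki_url` is a hypothetical helper name, not something from Miraheze's tooling.

```python
def wiki_url(db_name: str) -> str:
    """Derive the default wiki URL from a database name such as 'gratispaideiawiki'."""
    suffix = "wiki"
    # Strip the trailing "wiki" suffix only when it is actually present.
    subdomain = db_name[: -len(suffix)] if db_name.endswith(suffix) else db_name
    return f"https://{subdomain}.miraheze.org"


print(wiki_url("gratispaideiawiki"))
```

For the wiki offered for testing, this gives the same address as the one posted earlier in the thread.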

Oh, you're right. It hasn't started yet.

Please use this form if you would like it on your wiki.

It will be available starting later this week. Please be aware that search will be unavailable while we enable it.

RhinosF1 edited projects, added Infrastructure (SRE); removed MediaWiki (SRE).

ES 7 is not compatible with MediaWiki.

No action required from Infra. ES7 is deployed with no plan to downgrade.

It sounds like an upstream problem that needs to be worked on.

RhinosF1 changed the task status from Open to Stalled. Jan 25 2022, 17:05
RhinosF1 lowered the priority of this task from Normal to Low.
RhinosF1 moved this task from Short Term to Long Term on the MediaWiki (SRE) board.
RhinosF1 edited projects, added Upstream; removed Notice.

Removing goal; please re-tag when appropriate.

Universal_Omega claimed this task.

Because this isn't compatible with ES7, there are upstream tasks for adding support, and we've always closed tasks that rely solely on upstream, I'm going to go ahead and close this one as declined. This can be reopened if/when support is added. However, it does not look like there will be support before MediaWiki 1.39 at the earliest, based on:

MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x (6.8.23+ recommended)
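
The requirement quoted above can be expressed as a small check. This is only a sketch of that one line: it says nothing about releases outside 1.33 to 1.38, and the version strings in the usage lines are hypothetical examples, not the versions actually deployed here.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string like '6.8.23' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def es_supported(mediawiki: str, elasticsearch: str) -> Optional[bool]:
    """Apply the quoted rule: MediaWiki 1.33-1.38 requires Elasticsearch 6.5-6.8.

    Returns None when the MediaWiki release is outside the quoted range,
    because the quoted line makes no statement about it.
    """
    mw = parse_version(mediawiki)[:2]
    if not ((1, 33) <= mw <= (1, 38)):
        return None
    es = parse_version(elasticsearch)[:2]
    return (6, 5) <= es <= (6, 8)


# Hypothetical version numbers, for illustration only.
print(es_supported("1.37.1", "7.10.2"))  # an ES 7 release against a covered MediaWiki
print(es_supported("1.37.1", "6.8.23"))  # the recommended 6.8.23+ line
```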