Reconsider production usage of Elastic/CirrusSearch
Open, Low, Public

Description

After activating the AdvancedSearch extension, I cannot search by categories as advertised.

For example, searching the wiki poserdazfreebies for the word "Outfit" returns many pages with the word Outfit in the title, including the page "Genesis Babydoll Outfit" that has Category:Genesis. Searching for the word "Outfit" and the category "Genesis" returns "Create the page "Outfit deepcat:genesis" on this wiki! There were no results matching the query."

Some web searching indicates that this functionality requires something called CirrusSearch to be enabled in order to work correctly, but I didn't see that on the Extensions list.

Please fix this.

Event Timeline

All The Tropes has enabled this extension as well and has similar issues. More here:

https://allthetropes.org/wiki/Blog:Tech_blog_for_August_2021

I understand Miraheze tried and failed to enable CirrusSearch because of its high load requirements. If it still cannot be enabled, we may need the tweak mentioned in the first item of my linked post, which disables the CirrusSearch functions and uses "incategory" searching only.
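To make the two query forms concrete, here is a minimal sketch that builds a search request for the standard MediaWiki API (`action=query&list=search`). The API base URL and the helper name `build_search_url` are hypothetical. Whether a given keyword (`deepcat:` or `incategory:`) is actually honored depends on the search backend installed on the wiki, so this only shows how the query string is assembled, not that it will return results.

```python
from urllib.parse import urlencode


def build_search_url(api_base: str, term: str, category: str,
                     keyword: str = "incategory") -> str:
    """Build a MediaWiki search API URL that combines a term with a category filter.

    api_base is a placeholder such as "https://example.org/w/api.php" (an assumption).
    keyword is the category operator to use, e.g. "incategory" or "deepcat".
    """
    # Quote the category name if it contains spaces so it stays one token.
    cat = f'"{category}"' if " " in category else category
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"{term} {keyword}:{cat}",
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# The query AdvancedSearch generated in the report, and the proposed fallback.
deepcat_url = build_search_url("https://example.org/w/api.php", "Outfit", "genesis", "deepcat")
fallback_url = build_search_url("https://example.org/w/api.php", "Outfit", "Genesis")
print(deepcat_url)
print(fallback_url)
```

Comparing the two responses on a test wiki would be one way to check which keyword the current backend supports.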

We've proved this is possible with graylog so it might be worth investigating.

RhinosF1 renamed this task from Extension:AdvancedSearch does not work as documented - cannot search by categories to Reconsider production usage of Elastic/Circussearch. Aug 4 2021, 13:28

We've proved this is possible with graylog so it might be worth investigating.

This isn't correct. Our use of ES for Graylog isn't comparable to all wikis using ES. Different usage and different load too.

If we were to back ES up with SSDs and distribute processing nodes, do you think ES for wiki search would be possible? Can we also explore how much space we would save on db* by moving the searchindex away? This could potentially reduce the complexity of database backups.

In T7740#156440, @John wrote:

If we were to back ES up with SSDs and distribute processing nodes, do you think ES for wiki search would be possible? Can we also explore how much space we would save on db* by moving the searchindex away? This could potentially reduce the complexity of database backups.

Yes, that would work (though with emphasis on the fact that processing has to be distributed).

Universal_Omega renamed this task from Reconsider production usage of Elastic/Circussearch to Reconsider production usage of Elastic/CirrusSearch. Aug 11 2021, 05:05

It's been nearly two weeks. AdvancedSearch is still broken. Please provide a status report on the efforts being made to fix it.

I also request that the priority of this task be bumped up. It affects at least two of the top ten wikis on Miraheze by usage, after all.

Please provide a status report on the efforts being made to fix it.

While we have the disk space necessary to accommodate an ES cluster on cloud3, we don't have the memory resources available to do so currently.

This would be blocked on gaining more physical resources, or on somehow reducing the usage by the database servers (which in theory may come from moving searchindex to Elasticsearch).

We would save the following disk space per server:

db11 ~ 110G
db12 ~ 74G
db13 ~ 88G
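For a rough sense of scale, the per-server figures above can be added up. This is only a sketch of the arithmetic: the inputs are the approximate values quoted in this comment, and the unit is kept as "G" exactly as written.

```python
# Approximate searchindex disk savings per database server, as quoted above.
savings_g = {"db11": 110, "db12": 74, "db13": 88}

# Sum the estimates to get a combined figure across the three servers.
total_g = sum(savings_g.values())
print(f"Approximate combined saving: ~{total_g}G across {len(savings_g)} servers")
```

Because each input is itself an estimate, the combined figure should be read as a ballpark, not a measured value.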

Coming up on four weeks now, and AdvancedSearch is still broken...

Coming up on four weeks now, and AdvancedSearch is still broken...

We've not had this for any significant period of time in the whole history of Miraheze. Big infra changes take time. There's far more to benefit than just one extension once we can do it.