
Indexed, though blocked by robots.txt
Open, Normal, Public

Description

Google Search Console reveals that, in the last few days, a new problem has started to occur:


https://radioscanningtw.miraheze.org/w/api.php?uselang=en&action=feedrecentchanges : Indexed, though blocked by robots.txt

Event Timeline

Jidanni updated the task description.
Universal_Omega added a project: MediaWiki.

Hello, can you please give us a little more explanation of your issue? I am unclear on what the issue is and therefore unable to assist at the moment. Thanks!

Universal_Omega claimed this task.

Looking at your charts, the only URL in your issue is api.php, and I'm pretty sure that it is intentionally blocked so it won't be crawled. This won't have much effect on your site. Thank you!

RhinosF1 removed Universal_Omega as the assignee of this task.

You've misread the issue.

Pages like Special:* and api.php that we block can still be displayed on Google, because the crawler never gets to see the noindex tag: the page itself is blocked by robots.txt.

It's a perfectly valid issue and triggers quite a number of the alerts we have on Search Console.
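
To illustrate the mechanism described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rule, the `X-Robots-Tag` header, and the helper `crawler_sees_noindex` are assumptions made for illustration only; the actual Miraheze robots.txt and response headers are not quoted in this task.

```python
from urllib.robotparser import RobotFileParser

# URL reported in the task description.
API_URL = (
    "https://radioscanningtw.miraheze.org/w/api.php"
    "?uselang=en&action=feedrecentchanges"
)

# Hypothetical rule set; the real robots.txt is not shown in this task.
ROBOTS_TXT = "User-agent: *\nDisallow: /w/api.php\n"

# Hypothetical headers the wiki might send if the crawler fetched the URL.
PAGE_HEADERS = {"X-Robots-Tag": "noindex"}


def crawler_sees_noindex(url: str, robots_txt: str, headers: dict) -> bool:
    """Return True only if a robots.txt-respecting crawler would fetch
    the URL and find a noindex directive in the response headers."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("Googlebot", url):
        # Blocked by robots.txt: the page is never requested,
        # so the noindex directive is never read.
        return False
    return "noindex" in headers.get("X-Robots-Tag", "").lower()


if __name__ == "__main__":
    print(crawler_sees_noindex(API_URL, ROBOTS_TXT, PAGE_HEADERS))
```

With the Disallow rule in place the function returns False: the crawler cannot learn that the page asks not to be indexed, so a URL discovered through links elsewhere can still be listed. If the rule is dropped, the same crawler fetches the page and can honor noindex.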

Apologies for misreading.

So do you have any ideas on how to approach the issue then? Should we remove it from robots.txt or something else? Because if we don't remove it from robots.txt then the issue is still invalid as it can't be solved, to my knowledge. That's really our only option, but is it a good one?

It needs a bit of wider debate from Site Reliability Engineering as to whether we definitely want it in robots.txt or would rather trust MediaWiki's noindex tag, although I'll say, for what it's worth, that our robots.txt is useless for non-English wikis anyway.