Addition to robots.txt for newlaborforum.cuny.edu
I am doing some SEO cleanup for newlaborforum.cuny.edu and have found that Google crawls a number of URLs that point to the main login page. To address this, I would like to make a change to the robots.txt file, which I believe lives on the server. Specifically, I would like the following line added to the robots.txt file:
This should prevent Google from crawling those specific pages. If this is not an option, could you please advise on alternatives to this? Please let me know if you have any questions, and thank you for your help.
Web Designer, New Labor Forum
#1 Updated by Boone Gorges almost 2 years ago
- Status changed from New to Rejected
Hi Diane - As discussed in https://redmine.gc.cuny.edu/issues/9684#note-2, our robots.txt is shared by the entire installation, so it cannot be customized for the needs of specific subsites. Moreover, the robots.txt standard does not allow paths to be written in absolute form (using http://...); paths must be relative. See, e.g., https://developers.google.com/search/reference/robots_txt#group-member-records.
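To illustrate the relative-path point, here is a minimal sketch using Python's standard-library `urllib.robotparser` (not Googlebot itself, so treat it as an approximation of how crawlers match rules). The login URL and the `/wp-login.php` path are hypothetical stand-ins for whatever the real login address is, and `is_allowed` is a helper made up for this example. This does not change the fact that the shared robots.txt cannot be edited per subsite.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical login URL; the real path on the site may differ.
LOGIN_URL = "https://newlaborforum.cuny.edu/wp-login.php?redirect_to=%2Fsome-page"


def is_allowed(robots_lines, url, agent="Googlebot"):
    """Parse robots.txt rules from a list of lines and test one URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Rule written with an absolute URL: not a valid robots.txt path,
# so it never matches the path portion of a crawled URL.
absolute_rule = [
    "User-agent: *",
    "Disallow: http://newlaborforum.cuny.edu/wp-login.php",
]

# The same rule written as a relative path beginning with "/".
relative_rule = [
    "User-agent: *",
    "Disallow: /wp-login.php",
]

if __name__ == "__main__":
    print(is_allowed(absolute_rule, LOGIN_URL))  # True: absolute rule is ignored
    print(is_allowed(relative_rule, LOGIN_URL))  # False: relative rule blocks it
```

Rules are matched against the URL's path (plus query string), which is why only the form starting with "/" has any effect.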
It's worth noting that the login page on Commons sites already has a meta tag like this:
<meta name='robots' content='noindex,follow' />
So no changes at the robots.txt level should be necessary in this specific case.
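One way to confirm that a page carries this directive is to parse its HTML and collect any `robots` meta values. The sketch below uses Python's standard-library `html.parser`; the class and function names are hypothetical, and fetching the actual login page is left out, so the sample HTML is just the tag quoted above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are also routed here
        # by HTMLParser's default handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    login_head = "<head><meta name='robots' content='noindex,follow' /></head>"
    print("noindex" in robots_directives(login_head))  # True
```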
For further customizations, you may want to look into the Yoast SEO plugin, which allows the site admin to set up noindex meta tags for various pieces of content.
#2 Updated by Diane Krauthamer almost 2 years ago
Thanks, I understand that. However, it seems that the feature usually available in Yoast (the ability to edit the robots.txt file) is not included in this edition. It does allow users to block indexing of some types of URLs, but not of duplicates such as these. I understand that figuring this out is beyond the scope of the development team, so I will search for another plugin or another way to do this.