Done in https://github.com/cuny-academic-commons/cac/commit/fb29c879f5ce46963dfdfbabb359cbba7a102612.
In dev environments, the generated robots.txt will use:

```
User-agent: *
Disallow: /
```
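A minimal sketch of how that override could work, assuming WordPress's standard `robots_txt` filter (the function name `cac_maybe_block_robots` is hypothetical, not from the commit):

```php
<?php
// Hypothetical sketch: WordPress passes the generated robots.txt body
// through the 'robots_txt' filter, so a callback can swap in a blanket
// disallow when we're not in production.
function cac_maybe_block_robots( string $output, bool $is_production ): string {
    if ( $is_production ) {
        return $output; // Leave the normal rules untouched on the live site.
    }
    // In dev, replace everything with a disallow-all body.
    return "User-agent: *\nDisallow: /\n";
}

// In WordPress this would be hooked up roughly as:
// add_filter( 'robots_txt', fn( $output ) => cac_maybe_block_robots( $output, cac_is_production() ) );
```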
I've also added the following to the HTML markup for good measure:

```html
<meta name='robots' content='max-image-preview:none, noindex, noarchive, nofollow, noimageindex, nosnippet, notranslate' />
```
More info about these rules can be found here: https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag
These rules are only honored by well-behaved crawlers, but that's better than nothing.
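One plausible way to emit that tag is WordPress's `wp_robots` filter (available since WP 5.7), where each array key becomes a directive in the meta tag printed in the `<head>`. The function name `cac_dev_robots_meta` is mine, not from the commit:

```php
<?php
// Hypothetical sketch: add the dev-only robots directives via the
// 'wp_robots' filter. Boolean values become bare directives (noindex),
// string values become key:value directives (max-image-preview:none).
function cac_dev_robots_meta( array $robots, bool $is_production ): array {
    if ( $is_production ) {
        return $robots; // Don't block indexing on the live site.
    }
    $robots['noindex']           = true;
    $robots['nofollow']          = true;
    $robots['noarchive']         = true;
    $robots['noimageindex']      = true;
    $robots['nosnippet']         = true;
    $robots['notranslate']       = true;
    $robots['max-image-preview'] = 'none';
    return $robots;
}

// In WordPress this would be hooked up roughly as:
// add_filter( 'wp_robots', fn( $robots ) => cac_dev_robots_meta( $robots, cac_is_production() ) );
```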
I've also introduced a helper function, cac_is_production(), to determine whether we're viewing the production environment.
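A sketch of one plausible implementation (I haven't checked the commit's actual version), leaning on WordPress's wp_get_environment_type() from WP 5.5+, with an environment-variable fallback for contexts where that function isn't loaded:

```php
<?php
// Hypothetical sketch of a cac_is_production() helper.
function cac_is_production(): bool {
    // wp_get_environment_type() reads WP_ENVIRONMENT_TYPE and defaults
    // to 'production' when nothing is configured.
    if ( function_exists( 'wp_get_environment_type' ) ) {
        return 'production' === wp_get_environment_type();
    }
    // Fallback outside WordPress: check the env var directly,
    // defaulting to 'production' to stay on the safe side.
    $env = getenv( 'WP_ENVIRONMENT_TYPE' ) ?: 'production';
    return 'production' === $env;
}
```

Defaulting to production when the environment is unknown is the safer failure mode here: it avoids accidentally serving a disallow-all robots.txt on the live site.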