Bug #25065
Open
Cloudflare captcha gating anonymous HTML fetches of *.commons.gc.cuny.edu
Description
Quick follow-up related to #24987 (thanks again for the WAF exception that fixed the wp-json case).
Noticed a related behavior on the public-facing side worth a look. Anonymous HTML fetches of my site's home page get captcha-walled by Cloudflare, even from non-suspicious user agents.
Reproducer
curl -A "Mozilla/5.0 (compatible; LinkPreview)" https://khatchad.commons.gc.cuny.edu/
Response is a "Captcha Required" interstitial (HTML body containing the challenge, no actual page content).
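The same check can be scripted so it is easy to repeat with different user agents. This is a minimal sketch, not a definitive detector: it assumes the interstitial can be recognized either by Cloudflare's `cf-mitigated: challenge` response header or by the "Captcha Required" text in the body. The function names `is_challenge` and `probe` are made up for illustration.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def is_challenge(headers: dict, body: str) -> bool:
    """Heuristic: does this response look like a challenge page, not site content?"""
    normalized = {key.lower(): str(value).lower() for key, value in headers.items()}
    # Assumption: Cloudflare marks challenge responses with this header.
    if normalized.get("cf-mitigated") == "challenge":
        return True
    # Fallback: the interstitial text reported in this ticket.
    return "captcha required" in body.lower()


def probe(url: str, user_agent: str) -> bool:
    """Fetch url with the given User-Agent and report whether it was challenged."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=15) as response:
            headers = dict(response.headers)
            body = response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        # urllib raises on 4xx/5xx, but the headers and body are still readable.
        headers = dict(error.headers)
        body = error.read().decode("utf-8", errors="replace")
    return is_challenge(headers, body)


# Example call (performs a real network request):
# probe("https://khatchad.commons.gc.cuny.edu/", "Mozilla/5.0 (compatible; LinkPreview)")
```

Keeping the classification in `is_challenge` separate from the fetch means the heuristic can be checked against saved responses without hitting the site.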
Two questions
- Is this anonymous-fetch challenge intentional, or is the WAF being aggressive with user agents Cloudflare doesn't recognize?
- Are there explicit exemptions configured for social-media preview bots (LinkedIn, Twitter/X, Slack, Mastodon, Facebook)? My main concern is link previews for shared posts coming up blank.
Major search-engine crawlers (Googlebot, Bingbot) typically have verified-IP allowlists that bypass these, so I'm less worried about indexing impact -- but worth confirming.
Thanks,
Raffi
Updated by Raymond Hoh 27 days ago
Raffi, I've edited the ticket description so it uses the ticket's "Description" field rather than the "Deployment Actions" field meant for internal usage.
Just going to quickly reply to your points.
Is this anonymous-fetch challenge intentional, or is the WAF being aggressive with user agents Cloudflare doesn't recognize?
Probably the latter, which would explain the curl command not working.
Are there explicit exemptions configured for social-media preview bots (LinkedIn, Twitter/X, Slack, Mastodon, Facebook)? My main concern is link previews for shared posts coming up blank.
We've had issues where the WAF has been a bit more aggressive. See #24092 for Facebook and LinkedIn. If you are running into issues with other social media sites, let us know.
Updated by Boone Gorges 26 days ago
I agree with Ray that this is very likely part of Cloudflare's general algorithms and heuristics for identifying malicious bot traffic. I've sent off an inquiry to our host, and perhaps they'll provide us with some more context. In the meantime, let us know if you discover any actual functional problems that arise from the behavior.
Updated by Raffi Khatchadourian 26 days ago
Yup, I'm seeing it on LinkedIn; no link preview, which used to come up.
Updated by Raffi Khatchadourian 23 days ago
Following up with concrete details and a possible remediation.
Confirmed user-facing impact: LinkedIn link previews for shared *.commons.gc.cuny.edu URLs now come up blank (per note #5)—so this breaks real functionality, not just curl tests.
Suggested allowlist (link-preview bots): a WAF/Cloudflare rule that skips the Managed Challenge for known preview crawlers would fix previews directly. User agents to allow:
- facebookexternalhit (Facebook/Meta)
- LinkedInBot (LinkedIn)
- Twitterbot (X/Twitter)
- Slackbot-LinkExpanding (Slack)
- Discordbot (Discord)
- WhatsApp (WhatsApp)
- TelegramBot (Telegram)
- Mastodon/* (Mastodon instances)
- Embedly/Iframely (generic preview services)
Cloudflare's built-in "Verified Bots" allowance, plus a custom rule (e.g. skip the challenge when cf.client.bot is true OR the UA matches the list above), should cover it.
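To make the proposed rule concrete, here is a rough Python sketch. It builds an expression string in what is assumed to be Cloudflare's rules syntax (`cf.client.bot`, `http.user_agent contains "..."`) and models the same logic locally. The helper names and the split of "Embedly/Iframely" into two tokens are assumptions, and the host's actual rule may look quite different. Matching on user agent alone is easy to spoof, so this only illustrates the shape of the rule.

```python
# Substrings taken from the list in this ticket; "Mastodon/*" is reduced to a prefix.
PREVIEW_BOT_TOKENS = [
    "facebookexternalhit",
    "LinkedInBot",
    "Twitterbot",
    "Slackbot-LinkExpanding",
    "Discordbot",
    "WhatsApp",
    "TelegramBot",
    "Mastodon/",
    "Embedly",
    "Iframely",
]


def build_skip_expression(tokens):
    """Build a rule expression: verified bot OR user agent contains a known token."""
    ua_clauses = [f'(http.user_agent contains "{token}")' for token in tokens]
    return " or ".join(["(cf.client.bot)"] + ua_clauses)


def would_skip_challenge(user_agent, verified_bot):
    """Local model of the same rule, using case-sensitive substring matching."""
    return verified_bot or any(token in user_agent for token in PREVIEW_BOT_TOKENS)
```

For example, `build_skip_expression(["LinkedInBot"])` yields `(cf.client.bot) or (http.user_agent contains "LinkedInBot")`, which could be pasted into a custom rule whose action skips the challenge.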
Updated by Boone Gorges 23 days ago
Hi Raffi - Thanks for the additional info.
I've been communicating with the host about this issue. They confirmed that, as your previous update suggests, Cloudflare has general rules that use various heuristics to block suspicious traffic. As you might guess, it's not possible for us to turn this off across the board, but they'll work with us to loosen rules as necessary to allow legitimate functionality. (User agent strings like the ones that you have assembled are not ideal for this purpose because they can be, and in fact are, easily and frequently spoofed, though they can of course be part of a heuristic.)
It's interesting that you say that LinkedIn links in particular are not working. Our host indicated that they made WAF rule changes on Friday to update the IP ranges used by Microsoft for LinkedIn, and that the LinkedIn Post Inspector was correctly fetching your site. See https://www.linkedin.com/post-inspector/inspect/https:%2F%2Fkhatchad.commons.gc.cuny.edu If you're seeing otherwise, can you please share specific steps to reproduce, such as a link to a page on LinkedIn that ought to be pulling in content from the Commons but is not?
Updated by Raffi Khatchadourian 8 days ago
Confirmed on my end—after the Friday IP-range update, LinkedIn previews are working again. The Post Inspector fetches the site (title and description), and a real shared post renders with its title and image: https://khatchad.commons.gc.cuny.edu/2026/05/12/selected-for-the-cra-emerging-leaders-cohort/ showed the card and the Hunter logo. Thanks for getting this sorted with the host.
Updated by Raffi Khatchadourian 8 days ago
Separately—a related case under the same Cloudflare behavior: authenticated Gravity Forms file downloads (gf-download links) still return the "Captcha Required" interstitial, even for me as the site owner, so I can't fetch my own form attachments programmatically. I've sent Boone the specific reproducer by email rather than post it here, since it's a live download link to a student applicant's CV. Could the WAF allow authenticated or token-bearing gf-download requests to bypass the managed challenge?
Updated by Raffi Khatchadourian 8 days ago
The gf-download case can be worked around using the authenticated wp-json REST API (per #24987)—authenticated REST requests aren't challenged, so I can pull form uploads programmatically that way.
It's only a partial workaround, though: REST returns data, not rendered pages, and not everything is reachable through it. Direct authenticated curl access to the site is still needed, so I wouldn't consider this issue fully resolved.
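For readers wanting to try the REST route, a minimal sketch of building such a request follows. Several details are assumptions not stated in this ticket: that the Gravity Forms REST API v2 is enabled on the site, that entries are listed at `/wp-json/gf/v2/forms/<id>/entries`, and that HTTP Basic auth with a WordPress application password is accepted. The function name and the credentials shown are hypothetical.

```python
import base64
from urllib.request import Request


def build_entries_request(site: str, form_id: int, username: str, app_password: str) -> Request:
    """Build (but do not send) an authenticated GET for a form's entries."""
    # Basic auth: base64("username:password"), sent in the Authorization header.
    token = base64.b64encode(f"{username}:{app_password}".encode("utf-8")).decode("ascii")
    # Assumed endpoint layout for the Gravity Forms REST API v2.
    url = f"{site.rstrip('/')}/wp-json/gf/v2/forms/{form_id}/entries"
    return Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


# Sending it would look like this (real network call, hypothetical credentials):
# from urllib.request import urlopen
# with urlopen(build_entries_request("https://example.org", 3, "alice", "app-pass")) as r:
#     entries = r.read()
```

As noted above, this returns entry data rather than rendered pages, so it does not replace direct access to the front end.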
Updated by Raymond Hoh 8 days ago
- Category name changed from SEO to Server
Separately—a related case under the same Cloudflare behavior: authenticated Gravity Forms file downloads (gf-download links) still return the "Captcha Required" interstitial, even for me as the site owner, so I can't fetch my own form attachments programmatically.
Cloudflare's WAF is most likely blocking the call due to the usage of cURL. The WAF is generally looking for legitimate visitors to the site via a web browser, not traffic made by external scripts.
I've asked our webhost to allow URLs through the Cloudflare WAF matching the specific Gravity Form query parameters and will let you know once they have responded. In the meantime, please continue to use the JSON API to access your Gravity Form attachments.
Updated by Raffi Khatchadourian 7 days ago
Thanks, Ray. No urgency on this now—I have a working path: a headless browser executes the challenge JS, so I can fetch rendered HTML and pull my authenticated gf-download attachments with my own session cookie, no WAF change needed. The query-param allowlist and the JSON API both help in the meantime.
For the longer term, if the host is open to it, the clean general fix would be to skip the Managed Challenge for requests carrying a valid wordpress_logged_in_ cookie—the same "authenticated traffic is legitimate" treatment the REST API already gets. That isn't spoofable the way a user-agent string is (the cookie is WP-signed; a forged one just fails at the origin), and it applies uniformly to every logged-in user rather than being a one-off exception for me. It would let plain curl reach the front end without driving a full browser. Not blocking me—just flagging it as the tidy resolution.
Updated by Raymond Hoh 7 days ago
The webhost has configured the WAF to allow Gravity Form attachment URLs to work through cURL now. I've tested the change and it is working for me. Raffi, can you verify?
That isn't spoofable the way a user-agent string is (the cookie is WP-signed; a forged one just fails at the origin)
While allowing the correct cookie would work for those that have it, this wouldn't stop traffic from bots trying to automate requests with the same cookie name or cookie header. The majority of our users are not using cURL to do automated management of their Commons site, so I'm reluctant to ask the webhost to make additional changes here. Though if the rest of the team thinks otherwise, I'd be okay with that as well.
I think your use of a headless browser is a suitable workaround for anything that requires authentication. If there is something you cannot accomplish with the headless browser approach or if the WAF is blocking something that should be allowed, let us know and we'll try to accommodate the request.
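The two positions on the cookie idea can be illustrated with a small sketch. It assumes the WAF could only test for the presence of a cookie whose name starts with `wordpress_logged_in_`, while the origin verifies a signature. The HMAC scheme, secret, and function names below are simplified stand-ins, not WordPress's actual cookie format or validation code.

```python
import hashlib
import hmac

COOKIE_PREFIX = "wordpress_logged_in_"
SECRET = b"origin-only-secret"  # hypothetical key, known only to the origin


def sign(username: str, expiry: int) -> str:
    """Produce a simplified signed cookie value: username|expiry|mac."""
    payload = f"{username}|{expiry}"
    mac = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}|{mac}"


def edge_sees_login_cookie(cookies: dict) -> bool:
    """Name-only check: all an edge rule can do without knowing the secret."""
    return any(name.startswith(COOKIE_PREFIX) and value for name, value in cookies.items())


def origin_accepts(cookies: dict, now: int) -> bool:
    """Origin-side check: the signature must verify and the cookie must not be expired."""
    for name, value in cookies.items():
        if not name.startswith(COOKIE_PREFIX):
            continue
        parts = value.split("|")
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        username, expiry, mac = parts
        expected = sign(username, int(expiry)).rsplit("|", 1)[1]
        if hmac.compare_digest(mac, expected) and int(expiry) > now:
            return True
    return False
```

A forged value such as `raffi|2000|deadbeef` passes `edge_sees_login_cookie` but fails `origin_accepts`. Under these assumptions, the forgery gains no authenticated access, yet it would still get past a name-only WAF exemption and reach the origin unchallenged.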
Updated by Raffi Khatchadourian 6 days ago
Thanks, Ray—verified: the gf-download reproducer now returns the PDF through plain cURL (HTTP 200, no captcha), no auth or allowlisting needed. Gravity Form fix works end to end; appreciate you sorting it with the host.
Understood on the cookie—I won't push it. Headless covers me for now, so nothing's blocking me; I'll just leave plain-curl rendered-HTML access on the list as a long-term nice-to-have. Thanks again.