Maybe this is related to Jetpack or some other plugin interfering during the cloning process? Is there a bulk process we can run to fix these image URLs?
Yes, it's related to Jetpack. Jetpack appears to have a feature where images are loaded through wp.com rather than directly from commons.gc.cuny.edu (probably offloaded to a CDN for performance reasons). I turned Jetpack off on the site https://testworkshop1.commons.gc.cuny.edu/ just to test, and the images then loaded properly on the front end. After re-enabling Jetpack on that site, it appears that the images still work. From this I surmise something like the following:
1. When you have Jetpack active on a non-public site (https://testworkshop1.commons.gc.cuny.edu/ is set to blog_public=-2), wordpress.com is unable to reach the site and its images.
2. Therefore, in this case, Jetpack doesn't try to do the URL swapout for images. The failure must somehow be cached as a configuration setting.
3. However, when you clone a public source site to a non-public destination site, Jetpack must carry over some cached data about available URLs. In other words, Jetpack was able to access the source site, this accessibility must be cached somewhere in the database, and the cached flag is cloned along with the rest of the site. Post-clone, Jetpack therefore continues to believe that it can proxy the images in question.
4. When I disabled Jetpack and turned it back on, these cached values must have been reset, and Jetpack correctly determined that the images are not accessible.
Assuming that something like this is correct, here are my takeaways:
a. There may be a way to update the cloning process such that we tell Jetpack to flush its internal cache. But Jetpack is very complicated and I don't know whether it'll be possible to do this flushing in a way that's forward-compatible and doesn't have any weird side effects.
b. Not all of Jetpack's features work well on a non-public site, especially when the non-public site is a clone of a public site; Jetpack doesn't know anything about the clone process.
c. In the short-to-medium term, I think we can probably solve this with documentation: if you are cloning a site that has Jetpack, deactivate and reactivate the plugin after the clone so that it can reestablish proper contact with wordpress.com.
Perhaps you could test this yourself on the second cloned site you reference, and let me know whether this seems like an OK path forward.
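If it helps with that test, here is a rough Python sketch for checking whether a page's front-end markup still routes images through the wp.com proxy. It rests on an assumption that is not confirmed anywhere in this thread: that proxied image hosts look like i0.wp.com, i1.wp.com, and so on. The helper name `proxied_images` and the sample file paths are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: proxied images are served from hosts such as i0.wp.com, i1.wp.com.
# Adjust this pattern if the site's markup shows a different host.
PROXY_HOST = re.compile(r"^i\d+\.wp\.com$")


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def proxied_images(html):
    """Return the image URLs in `html` whose host matches the proxy pattern."""
    collector = ImgCollector()
    collector.feed(html)
    return [
        src
        for src in collector.sources
        if PROXY_HOST.match(urlparse(src).hostname or "")
    ]


if __name__ == "__main__":
    # Hypothetical markup: one proxied image, one loaded directly from the site.
    sample = (
        '<img src="https://i0.wp.com/testworkshop1.commons.gc.cuny.edu/files/a.jpg?w=300">'
        '<img src="https://testworkshop1.commons.gc.cuny.edu/files/b.jpg">'
    )
    for url in proxied_images(sample):
        print(url)
```

Run it against the saved front-end HTML of a cloned site before and after toggling Jetpack. If the hypothesis above is right, the list should be non-empty before the toggle and empty afterwards.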
Not sure, but maybe this is related to either of these previous issues?
A good thought, but this issue appears to be specific to Jetpack. The cloned sites you've mentioned had their URLs properly swapped and their content successfully copied.