Bug #21312

closed

Image URLs on Cloned Site

Added by Laurie Hurson 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority name:
Normal
Assignee:
-
Category name:
-
Target version:
Start date:
2024-10-24
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

Hi All,

We are moving the workshops archive out of the TLC site (https://tlc.commons.gc.cuny.edu/) into its own site (https://testworkshop1.commons.gc.cuny.edu/).

To facilitate this, we cloned the TLC site and then erased most TLC content, leaving only workshops archive content.

We want to use all the same images for the workshop archive, and in theory they should have been moved over during the cloning process. The images appear correct in the editor and in the media library but are not working on the front end. In the media library the image URL appears to be correct, but when you view the image on the front end the URL is different, with an additional prefix tacked on. Most URLs are similar to this:
https://i0.wp.com/testworkshop1.commons.gc.cuny.edu/wp-content/blogs.dir/35025/files/2021/01/NARA-abstract-by-davidjoyner.jpg?w=520&ssl=1
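The prefix in that URL reads as a proxy host (`i0.wp.com`) followed by the original host and path, plus a query string. A minimal Python sketch of how such a URL could be mapped back to the direct one, for spot-checking pages; the function name `unproxy` is hypothetical, and including `i1.wp.com` and `i2.wp.com` alongside the host seen here is an assumption:

```python
from urllib.parse import urlsplit, urlunsplit

# i0.wp.com is the host from the report; i1/i2 are assumed siblings.
PROXY_HOSTS = {"i0.wp.com", "i1.wp.com", "i2.wp.com"}


def unproxy(url: str) -> str:
    """Return the direct image URL if `url` goes through the proxy, else `url` unchanged."""
    parts = urlsplit(url)
    if parts.hostname not in PROXY_HOSTS:
        return url
    # The proxied path has the shape "/<original host>/<original path>".
    host, _, path = parts.path.lstrip("/").partition("/")
    # Drop the proxy's query string (e.g. w=520&ssl=1) and assume https.
    return urlunsplit(("https", host, "/" + path, "", ""))
```

This only decodes the URL for inspection; it does not change what the site outputs.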

I have tried deleting the image and re-adding it to the page, and I have downloaded and re-uploaded an image. Neither process resolved the issue.

Maybe this is related to Jetpack or some other plugin interference during the cloning process? Is there a bulk process we can run to fix these image URLs?

Actions #1

Updated by Laurie Hurson 3 months ago

  • Description updated (diff)
Actions #3

Updated by Boone Gorges 3 months ago

  • Status changed from New to Reporter Feedback

> Maybe this is related to Jetpack or some other plugin interference during the cloning process? Is there a bulk process we can run to fix these image URLs?

Yes, it's related to Jetpack. Jetpack appears to have a feature where images are loaded through wp.com rather than directly from commons.gc.cuny.edu (probably offloaded to a CDN for performance reasons). I turned Jetpack off on the site https://testworkshop1.commons.gc.cuny.edu/ just to test, and the images then loaded properly on the front end. After re-enabling Jetpack on that site, it appears that the images still work. From this I surmise something like the following:
1. When you have Jetpack active on a non-public site (https://testworkshop1.commons.gc.cuny.edu/ is set to blog_public=-2), wordpress.com is unable to reach the site and its images.
2. Therefore, in this case, Jetpack doesn't try to do the URL swapout for images. The failure must somehow be cached as a configuration setting.
3. However, when you clone a public source site to a non-public destination site, Jetpack must have some cached data about available URLs. In other words, Jetpack was able to access the source site, and this accessibility must be cached somewhere in the database, and the cached accessibility flag is cloned along with the rest of the site, so post-clone Jetpack continues to believe that it can proxy the images in question.
4. When I disabled Jetpack and turned it back on, these cached values must have been reset, and Jetpack correctly determined that the images are not accessible.
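The four steps above can be sketched as a toy model. This is not Jetpack's code: the option name `proxy_reachable` and all four functions are hypothetical, and the model only illustrates how a cached flag copied by a clone would produce the observed behavior.

```python
import copy

PROXY_PREFIX = "https://i0.wp.com/"  # proxy host seen in the reported URL


def activate_jetpack(site: dict) -> None:
    # Hypothetical: on activation, cache whether the proxy can reach the site.
    site["options"]["proxy_reachable"] = site["public"]


def deactivate_jetpack(site: dict) -> None:
    # Hypothetical: deactivation discards the cached flag.
    site["options"].pop("proxy_reachable", None)


def clone_site(source: dict, host: str, public: bool) -> dict:
    # The clone copies stored options verbatim, including the cached flag.
    return {"host": host, "public": public,
            "options": copy.deepcopy(source["options"])}


def front_end_url(site: dict, path: str) -> str:
    # Image URLs are swapped only while the cached flag says the proxy works.
    if site["options"].get("proxy_reachable"):
        return f"{PROXY_PREFIX}{site['host']}{path}"
    return f"https://{site['host']}{path}"


source = {"host": "tlc.commons.gc.cuny.edu", "public": True, "options": {}}
activate_jetpack(source)
clone = clone_site(source, "testworkshop1.commons.gc.cuny.edu", public=False)
stale = front_end_url(clone, "/files/a.jpg")  # still proxied: flag was cloned

deactivate_jetpack(clone)
activate_jetpack(clone)
fresh = front_end_url(clone, "/files/a.jpg")  # direct: flag was recomputed
```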

Assuming that something like this is correct, here are my takeaways:

a. There may be a way to update the cloning process such that we tell Jetpack to flush its internal cache. But Jetpack is very complicated and I don't know whether it'll be possible to do this flushing in a way that's forward-compatible and doesn't have any weird side effects.
b. Not all of Jetpack's features work well on a non-public site, especially when the non-public site is a clone of a public site; Jetpack doesn't know anything about the clone process.
c. In the short-to-medium term, I think we can probably solve this with documentation: if you are cloning a site that has Jetpack, toggle the plugin after the clone so that you can reestablish the proper contact with wordpress.com
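For anyone who would rather script the toggle in (c) than click through the dashboard, here is a rough sketch that shells out to WP-CLI. It assumes WP-CLI is installed on the host and that deactivating and reactivating from the command line has the same effect as toggling in the admin, which was not tested in this ticket. The helper names are hypothetical.

```python
import subprocess


def toggle_commands(site_url: str, plugin: str = "jetpack") -> list[list[str]]:
    """Build the WP-CLI calls that deactivate, then reactivate, a plugin on one site."""
    return [
        ["wp", "plugin", "deactivate", plugin, f"--url={site_url}"],
        ["wp", "plugin", "activate", plugin, f"--url={site_url}"],
    ]


def toggle_plugin(site_url: str, plugin: str = "jetpack", dry_run: bool = True) -> None:
    """Print the commands by default; run them only when dry_run is False."""
    for cmd in toggle_commands(site_url, plugin):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if WP-CLI reports an error
```

Calling `toggle_plugin("https://testworkshop1.commons.gc.cuny.edu/")` prints the two commands without running them, which makes it easy to review before applying to a freshly cloned site.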

Perhaps you could test this yourself on the second cloned site you reference, and let me know whether this seems like an OK path forward.

> Not sure but maybe this is related to either of these previous issues?

A good thought, but this issue appears to be specific to Jetpack. The cloned sites you've mentioned had their URLs properly swapped and their content successfully copied.

Actions #4

Updated by Laurie Hurson 3 months ago

Thanks so much for this Boone!

> b. Not all of Jetpack's features work well on a non-public site, especially when the non-public site is a clone of a public site; Jetpack doesn't know anything about the clone process.

Yes, this makes sense. Good to know re: public v private sites and jetpack functionality.

> c. In the short-to-medium term, I think we can probably solve this with documentation: if you are cloning a site that has Jetpack, toggle the plugin after the clone so that you can reestablish the proper contact with wordpress.com

Yes, this sounds good. I can add some information to our documentation on cloning on the HELP site.

> Perhaps you could test this yourself on the second cloned site you reference, and let me know whether this seems like an OK path forward.

Just tested this, and yes, it resolved the issue on the tlcdev clone site, so adding some documentation will help alert users to this workaround.

Thanks again!

Actions #5

Updated by Boone Gorges 3 months ago

  • Status changed from Reporter Feedback to Resolved
  • Target version set to Not tracked

This all sounds good. Let's mark this as resolved. If it ends up being a more broadly-reported issue, we can reconsider a code-based fix.
