Support #14538

Weebly To Commons

Added by Laurie Hurson 4 months ago. Updated about 1 month ago.

Status: Reporter Feedback
Priority name: Normal
Assignee: -
Category name: -
Target version:
Start date: 2021-06-08
Due date:
% Done: 0%
Estimated time:
Description

Hi All,

I met with a faculty member who runs the Lehman Lab for Social Analysis: https://www.lehmanlab.org/

She would like to move the Lehman Lab site onto the Commons, along with several websites for the courses the lab offers, linked below:

https://www.lehman-guns-research.org/

https://nycethnography.weebly.com/

https://foodhealthmigrationlehman.weebly.com/

These 4 sites are on Weebly currently but she would like to move them over to the Commons so they can be further built out and used each semester, in tandem with the courses that will now also run on the Commons.

I have looked online for methods of exporting Weebly to WordPress, but the one exporter I found (https://weeblytowp.com/) does not work in this case because the sites are all pages, not blogs.

There seems to be another method for moving to WordPress that involves exporting an HTML archive of the site and then uploading that archive to WordPress. However, I do not believe we have a way to upload HTML archives to the Commons (I think this is a security issue).

Moving these sites to the Commons will be a one-time thing since, once they are on the Commons, she will continue building out the sites in the CAC space.

Any suggestions for how we might export/import these Weebly sites onto the Commons (preferably avoiding a manual copy/paste method, if possible)?

History

#1 Updated by Laurie Hurson 4 months ago

An update...

I was testing the Weebly-CAC import with the Weebly site below:

https://globalizationlehman.weebly.com/

And was planning to upload the site here

https://weeblyimport.commons.gc.cuny.edu/

The HTML archive ZIP I was able to get for this site is attached.

I will keep poking around in the back end of Weebly to see if I can get a WXR file. When I try to use the weeblytowp tool, I get an error message that the URL does not contain a blog (screenshot attached).

#2 Updated by Boone Gorges 4 months ago

  • Status changed from New to Reporter Feedback

Thanks, Laurie!

After doing some reading, it looks to me like the weeblytowp.com tool works only on blog posts because it uses Weebly's RSS feed as a source, and the RSS feed only contains blog content. This makes the tool a non-starter for our purposes.
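
To illustrate why a feed-based converter comes up empty on a pages-only site, here is a minimal Python sketch, assuming a standard RSS 2.0 layout. The sample feed and the function name `feed_item_titles` are made up for illustration. This is not how weeblytowp.com is actually implemented, only the general idea of reading `<item>` entries from a feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed for a site that has pages but no blog posts:
# channel metadata is present, but there are no <item> entries.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example pages-only site</title>
    <link>https://example.invalid/</link>
    <description>No blog posts here</description>
  </channel>
</rss>"""


def feed_item_titles(rss_xml: str) -> list[str]:
    """Return the titles of all <item> entries in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


print(feed_item_titles(SAMPLE_FEED))  # prints []
```

A tool that builds its import from this list has nothing to work with when the feed carries no items, which matches the "URL does not contain a blog" error.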

I've had a look at the HTML export. An automated plugin for importing HTML is not only insecure, but it's going to produce very bad results with an archive like this one, since the majority of the HTML markup in each file is related to navigation and other elements that aren't relevant when importing to WordPress. So we would need a custom tool. I could build such a tool, but I should first caution that the results are not likely to be pretty:
a. Most of the CSS-based styling included on the Weebly site will not be included on import. WordPress uses a centralized theming system to dictate the styles for a site, so the team will have to more or less start from scratch.
b. The left-hand navigation will not be retained, and will need to be rebuilt somehow (probably using the WP Menu system).
c. Individual page headers, like the banner at the top of https://globalizationlehman.weebly.com/beauty-standards.html, can be imported, but they're likely to be redundant with the page titles that are already built into WordPress themes. This might require manual cleanup on each post later on. (I can also skip these headers if that's what's best.)
d. Some formatting of page content is not likely to come through the import. For example, the "column" layout on https://globalizationlehman.weebly.com/beauty-standards.html ("The United States..." and the Big Bill Broonzy video appear on the left, the Laurie Cooper photograph on the right) uses some Weebly-specific styling that can't easily be imported. Another example on the same page is the different font sizes in the Willie Lynch letter section, which would likely be lost in the import process.

It would take me a couple of hours to write a tool that could be used to parse the export, find the content in each HTML file, import the pages, and process the embedded images. And we'll still end up with a bit of a mess, since much of the formatting would be lost. Cleaning up this formatting would require manual work on each post, in which case you may as well do the manual copy-paste in the first place. So, on balance, it doesn't seem like a great idea to attempt the import - I'd end up doing a bunch of work, but it wouldn't save much time, if any, for the team.

If the purpose of the import was simple preservation, I may be able to offer the ability to host a version of the archived course on the Commons webspace, with a Commons URL. But in this case it would not be editable as WordPress content.

Laurie, can you think about this and maybe talk it over with the Lehman folks? Let them know that, for technical reasons, we can't offer an import process that would preserve the appearance of the content (that they've obviously worked hard to establish on the Weebly site), and get a feel for how they'd like to move forward.

#3 Updated by Raymond Hoh 4 months ago

Boone, you might want to check out #9492 where I did something similar for sites hosted from Wikispaces. See https://github.com/cuny-academic-commons/wikispaces-to-wordpress.

The script does not save each image into its own attachment though. The images would be manually uploaded and saved to a static location like /wp-content/static/.

Other than what you've already mentioned, I took a quick look at the Weebly export and it appears that we could parse the Weebly page contents from the wsite-section-elements div container. Some amending would need to be done for Weebly, but it should be relatively easy to modify the Wikispaces scripts to work for it.
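
As a rough illustration of the parsing step, the Python sketch below pulls the inner HTML out of the content container using only the standard library. It assumes the container is a `<div>` whose `class` attribute includes `wsite-section-elements`, since the exact markup is not spelled out in this ticket. The names `ContentExtractor` and `extract_content` are hypothetical, and this is not the Wikispaces converter's code.

```python
from html.parser import HTMLParser

# Assumption: the content container is a <div> whose class list includes
# this name. Adjust if the export uses an id or a different element.
CONTENT_CLASS = "wsite-section-elements"


class ContentExtractor(HTMLParser):
    """Collect the inner HTML of every <div> carrying CONTENT_CLASS."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0    # <div> nesting depth inside a container; 0 = outside
        self.chunks = []  # raw HTML pieces found inside containers

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.chunks.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif tag == "div" and CONTENT_CLASS in (dict(attrs).get("class") or "").split():
            self.depth = 1  # entered a container; its own tag is not kept

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> never change the depth.
        if self.depth:
            self.chunks.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the container itself
        self.chunks.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.chunks.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.chunks.append(f"&#{name};")


def extract_content(page_html: str) -> str:
    """Return the concatenated inner HTML of all content containers."""
    parser = ContentExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.chunks).strip()
```

Navigation, header, and footer markup outside the container is simply ignored, and HTML comments are dropped. The extracted markup would still carry Weebly-specific classes and inline styles that WordPress may strip.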

#4 Updated by Boone Gorges 4 months ago

Thanks, Ray! I recall your Wikispaces work but I'd neglected to look up the ticket number, so thanks for connecting the dots :)

Parsing out the wsite-section-elements elements was pretty much what I had in mind. But they use lots of garbage markup that will be stripped by WordPress, either at the time of post creation or when the post is first edited. This is why I have all the warnings about formatting being lost.

I'm not too worried about the media part of it - I have some existing code laying around that identifies "local" src elements, moves the files to the proper location in wp_upload_dir(), and then swaps out the src attribute. It doesn't bother adding attachment items, so they can't be managed through the Media Library. Anyway, this is a technical hurdle that I think is small next to the larger usability issues I described above.
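
For reference, the src-swapping step could look something like the Python sketch below. It is a rough illustration, not the existing code mentioned above: the function names and the upload URL are hypothetical, a regular expression stands in for a real HTML parser, and copying the files into the uploads directory is left out entirely.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches the src attribute of an <img> tag, with either quote style.
# Requiring whitespace before "src" avoids matching attributes like data-src.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def is_local(src: str) -> bool:
    """Treat a src as local when it has no scheme and no host."""
    parts = urlsplit(src)
    return bool(src) and not parts.scheme and not parts.netloc


def rewrite_local_srcs(html: str, upload_base_url: str) -> tuple[str, list[str]]:
    """Point local <img> srcs at upload_base_url.

    Returns the rewritten HTML and the list of original local paths, so the
    caller knows which files still need to be copied into place.
    """
    moved = []
    base = upload_base_url.rstrip("/")

    def swap(match):
        prefix, quote, src = match.groups()
        if not is_local(src):
            return match.group(0)  # remote images are left untouched
        moved.append(src)
        # Simplification: keep only the file name, so two images with the
        # same name in different folders would collide.
        filename = posixpath.basename(urlsplit(src).path)
        return f"{prefix}{quote}{base}/{filename}{quote}"

    return IMG_SRC.sub(swap, html), moved
```

A call might look like `rewrite_local_srcs(page_html, "https://example.invalid/wp-content/uploads/2021/06")`, where the URL is a placeholder for whatever `wp_upload_dir()` reports on the target site.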

#5 Updated by Laurie Hurson 4 months ago

Thank you Boone and Ray for exploring these options.

It sounds like trying to do any export-import will create a significant amount of work on the dev side and ultimately does not eliminate the need for re-formatting on the front end. Overall, it sounds like the manual copy/paste method might be the best route.

If there is any way to use the HTML files so that the copy/paste method retains some of the styling, and in particular the media on pages, let me know if you have ideas.

I tried uploading the media from the Globalization HTML archive along with copy/pasting the HTML for a page (linked below), but it looks like the images don't embed properly, probably because the src link structure is wrong in the HTML?

Media library: https://weeblyimport.commons.gc.cuny.edu/wp-admin/upload.php

Test page: https://weeblyimport.commons.gc.cuny.edu/substance-dependency-during-pregnancy-from-html-file/

Thanks again for your help with this.

#6 Updated by Laurie Hurson 4 months ago

To clarify my last question--

I understand that there is little we can do to retain the weebly formatting and styling either through import or the copy/paste method.

But: is there a way to do the copy/paste method and have the images appear in the content without having to manually re-add them in? So for example- we add the images that were on the weebly site to the Media library for the new commons site and then they appear in the Commons content on the frontend where they were in the original weebly content?

#7 Updated by Boone Gorges 4 months ago

Hi Laurie - No, there is no way to do this. The original Weebly content uses a certain kind of relative path structure for the src attribute on the img elements. There's no way to mirror this structure on the Commons, meaning that each image's src attribute needs to be updated. This can be done via script if we write an importer, but outside of this, it must be done manually.

#8 Updated by Laurie Hurson 4 months ago

Okay, good to know. Thanks for this info, Boone.

I am going to touch base with the Lehman folks and see what they want to do. I may circle back about this idea:

"If the purpose of the import was simple preservation, I may be able to offer the ability to host a version of the archived course on the Commons webspace, with a Commons URL. But in this case it would not be editable as WordPress content."

If we can host a static version and then start a Commons site as the new editable V2 of the site, this might be an option.

#9 Updated by Raymond Hoh 4 months ago

"This can be done via script if we write an importer, but outside of this, it must be done manually."

That's true, the images could be uploaded statically and the images will work once the import is done.

It would be relatively easy to do a quick import if we set low expectations and make clear that the Lehman editors will need to adjust parts of the page content, layout, and site manually themselves. This will depend on how comfortable the editors are with WordPress. It would probably be good to find a basic theme as well.

I should note that the ZIP file Laurie provided has missing internal images. Check the beauty-standards.html in the ZIP file for one example. Here's what the page should look like: https://globalizationlehman.weebly.com/beauty-standards.html. Also, I modified my converter to work with the Weebly ZIP file and I've attached a screenshot of the same page on the Twenty Sixteen theme with the Classic Editor. Columns are retained, but styling is a little off as expected.
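
One way to spot missing images before attempting an import is to compare each page's `<img>` references against the files actually present in the ZIP. The Python sketch below does that under a few assumptions: pages end in `.html` or `.htm`, root-relative paths resolve against the top of the archive, and a regular expression is good enough to find `src` attributes. The function name `missing_images` is hypothetical.

```python
import posixpath
import re
import zipfile
from urllib.parse import unquote, urlsplit

# Captures the quote character and the src value of each <img> tag.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def missing_images(archive) -> dict[str, list[str]]:
    """Map each HTML file to the local image srcs that the ZIP lacks.

    `archive` may be a file path or a binary file-like object.
    """
    missing = {}
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
        for name in sorted(names):
            if not name.lower().endswith((".html", ".htm")):
                continue
            page = zf.read(name).decode("utf-8", errors="replace")
            for _quote, src in IMG_SRC.findall(page):
                parts = urlsplit(src)
                if not src or parts.scheme or parts.netloc:
                    continue  # empty or remote reference, nothing to check
                path = unquote(parts.path)
                if path.startswith("/"):
                    # Assumption: root-relative paths start at the archive root.
                    target = path.lstrip("/")
                else:
                    target = posixpath.join(posixpath.dirname(name), path)
                if posixpath.normpath(target) not in names:
                    missing.setdefault(name, []).append(src)
    return missing
```

Running this over an export returns an empty dictionary when every local image is present. Otherwise it lists, per page, the references that would end up broken after import.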

We could set up one site import for them to see if they will like it or not. If they find that the import is sufficient, we can move forward with the other three sites.

#10 Updated by Boone Gorges about 1 month ago

  • Target version set to Not tracked
