Project

General

Profile

Actions

Feature #18194

open

Migration routine for CVs

Added by Boone Gorges 10 months ago. Updated 2 months ago.

Status:
New
Priority name:
Normal
Assignee:
Category name:
CV
Target version:
Start date:
2023-05-09
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

When we introduce the new CV feature #17768 we will need to migrate over existing user data. Some initial thoughts for discussion:

- We should not create CVs for all existing users. How do we decide? I propose that we should only create them for users who have at least one "bottom" section filled in - Education, Positions, Publications, etc. Does anyone else have suggestions for how this should work?
- We'll have to build block markup using a pipeline like serialize_blocks(). This is likely to be subject to all sorts of issues with character encoding, etc, so we'll have to find some outlier profiles to use for testing. Ones with lots of fields built in, etc.
- Should old data - ie the BP profile data - be deleted or kept? It depends in part on whether we will continue to use the BP profile data. I'm thinking in part of directory search. If we do this, we'll need a mechanism in the regular CV save routine that syncs the relevant fields to BP's profile data system.

The migrator will have to be one of the last things built, since it'll depend on the specifics of the CV block implementation. See #18192, #18193.


Files

Social Icons.zip (6.6 KB) Social Icons.zip Sara Cannon, 2023-09-19 05:18 PM
Social Icons.zip (8.4 KB) Social Icons.zip Sara Cannon, 2023-09-19 06:04 PM
Screenshot 2023-10-05 at 12.02.41 AM.png (113 KB) Screenshot 2023-10-05 at 12.02.41 AM.png Sara Cannon, 2023-10-05 01:03 AM

Related issues

Related to CUNY Academic Commons - Feature #17768: CV Editing and PublishingResolvedJeremy Felt2023-03-04

Actions
Related to CUNY Academic Commons - Feature #18467: "Positions" migration to new CV editorResolvedBoone Gorges2023-07-17

Actions
Related to CUNY Academic Commons - Feature #19131: Add DBLP to Commons profileNewBoone Gorges2023-10-30

Actions
Related to CUNY Academic Commons - Feature #18192: CV "top" sectionResolvedBoone Gorges2023-08-11

Actions
Actions #1

Updated by Boone Gorges 10 months ago

Actions #2

Updated by Matt Gold 10 months ago

It would be good to discuss this at the team meeting, but I think your proposed metric ("at least one 'bottom' section filled in") is reasonable

Actions #3

Updated by Jeremy Felt 10 months ago

We'll have to build block markup using a pipeline like serialize_blocks(). This is likely to be subject to all sorts of issues with character encoding, etc, so we'll have to find some outlier profiles to use for testing.

Is any of the existing data in HTML? Dealing with arbitrary tags while importing into a block structure would be the most annoying.

Actions #4

Updated by Boone Gorges 10 months ago

Is any of the existing data in HTML? Dealing with arbitrary tags while importing into a block structure would be the most annoying.

Yes, though it's a pretty limited subset. Aside from inline elements like <em> and <a>, there are paragraphs and, IIRC, lists. So I don't think it'll be overly finicky to translate.

Actions #5

Updated by Boone Gorges 8 months ago

  • Related to Feature #18467: "Positions" migration to new CV editor added
Actions #6

Updated by Boone Gorges 6 months ago

Work-in-progress migration routine https://gist.github.com/boonebgorges/60467a056ec9f26a044ace3e4d52fdc9

It's not fully rigged up in that it doesn't yet create the CV post type, but this is trivial to add. I've run this with various types of profiles and it seems to handle most types of content pretty well.

Some outstanding issues, for my own reference:
- The following social-ish fields exist on the Commons but are not yet being migrated: ORCID ID, Google Scholar, Academia.edu, Delicious. We may want to scrap Delicious. The other three aren't natively supported by WordPress, so we need to write our own custom variations.
- A couple cac-cv-editor blocks aren't properly rendering for me. Not sure yet whether this is a bug in my migrator or in cac-cv-editor. These are: cv-last-active and cv-profile-navigation.
- When we come up with the placeholder text for various fields, I'll have to update the script.
- There's no logic built in yet for determining whether a user should in fact have a CV created, ie my suggestion that we create if a user has at least one "bottom" section.

Actions #7

Updated by Boone Gorges 6 months ago

Quick follow-up that I figured out non-rendering blocks. The block-renderer requests were 403ing, which was happening because no postId was being sent by ServerSideRender. I've added the necessary urlQueryArgs: https://github.com/cuny-academic-commons/cac/commit/956c123d765c73db2f7506224cf0ee11d286de59

Actions #8

Updated by Colin McDonald 6 months ago

Regarding the fields like Delicious to be scrapped, it seems that we never closed the loop on ticket #17018 and the data we pulled on how active these fields are. Here's the main takeaway:

I don't have the ability to determine when people added the data. As you'd probably guess, things like IM, Flickr, Delicious have been defunct for years, and all existing instances are probably ancient. Here are current raw numbers for each (again, out of ~39,000 user accounts):

- IM: 272
- Flickr ID: 227
- Delicious ID: 194
- ORCID ID: 152
- Google Scholar: 161
- Blog URL: 804
- I don't see an RSS field, but the Website field has: 2343

All of these except Blog URL and Website seem like candidates for removal to me. People could always add this info in a freeform block/field. But perhaps we can ask Luke/Matt/Laurie/etc on an upcoming call if things like ORCID and Google Scholar are particularly important in an academic setting to offer.

Actions #9

Updated by Sara Cannon 6 months ago

Ray - Here are some SVG's of social icons to refresh. Should we remove the IM, Flickr, and Delicious fields?

Actions #10

Updated by Boone Gorges 6 months ago

Sara, we're not going to remove anything for this interim release - it's a reskin only. We'll discuss whether and how to remove other fields as part of this ticket.

Actions #11

Updated by Sara Cannon 6 months ago

Sounds good. I've re-attached the zip file with the icons so it include those. (it was pretty funny trying to find a modern Instant Messenger icon)

Actions #12

Updated by Sara Cannon 6 months ago

I know I gave you a different size, but seeing it in the browser, could we shrink the size of the social icons height to 24px while maintaining a touch target of 42?

Actions #13

Updated by Boone Gorges 5 months ago

  • Target version changed from 2.2.0 to 2.3.0
Actions #14

Updated by Boone Gorges 5 months ago

Actions #15

Updated by Boone Gorges 3 months ago

Jeremy, the info of interest is here: https://redmine.gc.cuny.edu/issues/18194#note-8

Upshots are, I think:
1. Delicious and IM will not be migrated
2. Flickr should probably come over, and I think WP has a variation for this already
3. Google Scholar and ORCID will need their own variations

Actions #16

Updated by Jeremy Felt 3 months ago

I've pushed up a data-test-template.html file on the feature/17769-cv-editor branch that has a complete version of Matt's CV. It's making use of everything.

Some notes from that:

  • I've added social link variations for ORCID, Google Scholar, and X. * WordPress 6.4 ships with X, so our version is more of a shim until that happens.
  • I've updated the profile image block so that the full URL is not stored, only the filename.

For images in general: When a new image is uploaded, I'm prepending the filename with wp_generate_uuid4() to avoid conflicts. That's not strictly necessary for media being copied from elsewhere, as long as uniqueness is already there. Profile images go in files/cv/avatars/ and cover images go in files/cv/covers.

And with that, I think it's ready, Boone!

Actions #17

Updated by Sara Cannon 3 months ago

How do I view the complete version of Matt's CV? Is this decoupled from his profile?

Actions #18

Updated by Boone Gorges 3 months ago

Jeremy, is 'subdir' in this line a bug? I think it should be 'url' https://github.com/cuny-academic-commons/cac/blob/34dfa73798a5f341931a3b818cf12373bfc1c538/wp-content/plugins/cac-cv-editor/includes/media.php#L232

How do I view the complete version of Matt's CV? Is this decoupled from his profile?

You can't view it yet. It's stored in an HTML file as a reference for me, and its primary purpose is to demonstrate the syntax for the CV for the purposes of the migration script. I will create a live version of it for you to look at, but we should probably not use it as a public exemplar of CV (as suggested in #19410). For that, we may either want a single "fake" CV, or perhaps a few various types of real-but-fleshed-out exemplars.

Actions #19

Updated by Sara Cannon 3 months ago

Gotcha! Sorry, I was thinking that this was the sample one.

Actions #20

Updated by Jeremy Felt 3 months ago

Jeremy, is 'subdir' in this line a bug? I think it should be 'url'

It is - and I should have noticed when I wrote more code to work around it. I just pushed up a fix in a620ad02

Actions #21

Updated by Boone Gorges 3 months ago

Thanks, Jeremy!

I have a fully-working version of the migrator, which I've added as a "bin" file to the plugin for better tracking. See https://github.com/cuny-academic-commons/cac/blob/6ea20b48d1ca7ca7fd71a46868c52d0817de7afb/wp-content/plugins/cac-cv-editor/bin/cac-migrate-cvs.php

Jeremy, perhaps you could take a few minutes to have a look. A couple notes for your consideration:
- The format_xml() method just pretty-prints the block HTML for easier comparison. It's not used in the production pipeline because all the DOMDocument juggling tends to break encoding (UTF-8 linebreaks, etc)
- The imageURL attribute on the cac/cv-profile-image block now contains only the basename: https://github.com/cuny-academic-commons/cac/blob/98af440ed3476dc6c156188dc6205bccca3b861a/wp-content/plugins/cac-cv-editor/bin/cac-migrate-cvs.php#L179
- In your data-test-template.html, some blocks have 'id' attributes. Eg: <h2 class="wp-block-heading" id="matthew-k-gold">Matthew K. Gold</h2> Are these added by cac-cv-editor or are they added automatically by the block editor? I didn't build any IDs as part of the migration as I'm not sure whether they're functionally necessary and they seem a bit finicky (sanitize_title_with_dashes??)
- If you test, you may want to set a limit on the user_ids: $user_ids = $wpdb->get_col( "SELECT ID FROM {$wpdb->users} WHERE deleted = 0 ORDER BY RAND() LIMIT 100" );
- As discussed, users only get CVs if they have at least one "below the fold" section: https://github.com/cuny-academic-commons/cac/blob/6ea20b48d1ca7ca7fd71a46868c52d0817de7afb/wp-content/plugins/cac-cv-editor/bin/cac-migrate-cvs.php#L839
- cac-cv-editor doesn't have a method for programatically creating CVs. Could you double-check that this is the correct structure of a CV post object? I think the post_author property is the important one: https://github.com/cuny-academic-commons/cac/blob/6ea20b48d1ca7ca7fd71a46868c52d0817de7afb/wp-content/plugins/cac-cv-editor/bin/cac-migrate-cvs.php#L892

As I give Jeremy time to have a look, I'll be working on next steps. This includes:
1. Working up a matrix of various migration possibilities that we need to test. This includes different combinations of social fields (and lack thereof), below-the-fold fields (and lack thereof), various kinds of markup that appear in sections like 'Publications', various types of positions, and so forth.
2. Creating a test environment. This will probably involve: Identifying profiles on CDEV that meet the criteria in my matrix (or importing from production if necessary); running the migration on CDEV; creating a backdoor on CDEV (?show_legacy_portfolio=1) that allows you to do side-by-side comparison between the old portfolio and the new CV.

I will aim to have this all put together by our Tuesday call so that we can test in the latter half of next week.

Actions #22

Updated by Jeremy Felt 3 months ago

Things are looking good—wow that's a lot of HTML building. :)

I'm troubleshooting some stuff with WP-CLI and my local Commons environment, so I haven't run a full test yet, but a couple answers:

In your data-test-template.html, some blocks have 'id' attributes.

These are added by the block editor for any heading. It's okay if they're missing from the migration—we don't really make use of them and the block editor will generate them automatically whenever the user edits their CV.

Could you double-check that this is the correct structure of a CV post object? I think the post_author property is the important one.

This structure is correct and yes, post_author is set by the REST endpoint as the current user.

Actions #23

Updated by Raymond Hoh 3 months ago

Hi Boone,

You mentioned on the call about having trouble detecting if a Gravatar is set or not. I actually wrote a function to detect if a Gravatar is valid -- cac_is_valid_gravatar() . To use, pass the user's email address. I'm not sure if this function will help or not for your migration script, but I mention it anyway.

Actions #24

Updated by Jeremy Felt 3 months ago

I have the migration script working well locally and am doing some spot checking. Boone, let me know if you run into anything strange that I can assist with.

Actions #25

Updated by Boone Gorges 3 months ago

Ray, thanks! https://github.com/cuny-academic-commons/cac/commit/d5a70ac43a93e2d91561e57e0fac9ee06d7b2d4a

Jeremy, I'll let you know. I'm writing up my final instructions and something may come up :)

Actions #26

Updated by Boone Gorges 3 months ago

Hi everyone,

The migration routine is ready for testing.

For testing purposes, I've built a tool that allows you to view the legacy "Public Portfolio" on CDEV. To use, append ?legacy-portfolio=1 to any user URL. So, for example:

- https://commons.gc.cuny.edu/members/boonebgorges is my migrated CV
- https://commons.gc.cuny.edu/members/boonebgorges?legacy-portfolio=1 is the old Public Portfolio

There seems to be a bug in the Public Portfolio that causes field in the header ("website" etc) to be improperly escaped. I assume this is due to changes made elsewhere as part of the CV migration. You can ignore this, since the Public Portfolio is going away.

On CDEV, CVs have been created for each user who has at least one completed "widget" (ie, below-the-fold section) on their Public Portfolio. These profile URLs will redirect to the /activity/ page for that user, while you can view their old Public Portfolio at the corresponding legacy-portfolio=1 URL:
- https://commons.gc.cuny.edu/members/packerman https://commons.gc.cuny.edu/members/packerman?legacy-portfolio=1
- https://commons.gc.cuny.edu/members/rphillips/ https://commons.gc.cuny.edu/members/rphillips/?legacy-portfolio=1

A few examples of profiles that are quite built out:
- https://commons.gc.cuny.edu/members/admin/ https://commons.gc.cuny.edu/members/admin/?legacy-portfolio=1
- https://commons.gc.cuny.edu/members/gotte/ https://commons.gc.cuny.edu/members/gotte/?legacy-portfolio=1

Some profiles have a Twitter widget. On the old system, this showed a dynamic list of tweets. Very few users ever used this, and it's possible that the widget has been broken for a while because Twitter is generally broken. See https://commons.gc.cuny.edu/members/boonebgorges, https://commons.gc.cuny.edu/members/boonebgorges?legacy-portfolio=1 Do we want to discard this?

Some tests for avatars:
- https://commons.gc.cuny.edu/members/josswinn/ is a real Gravatar that's been pulled over
- https://commons.gc.cuny.edu/members/gotte/ is an uploaded avatar that's been pulled over
- https://commons.gc.cuny.edu/members/mbarr/ has a default avatar

I recommend looking at the CVs above, looking at your own CV, scrolling through https://commons.gc.cuny.edu/members/ and doing side-by-side comparisons.

You may also consider walking through the Edit CV interface for your own migrated CV, to ensure that the right buttons are appearing through the interface ("Edit CV", not "Create") and that you're able to edit as expected.

If you find issues, please be sure to share specific URLs and/or steps to reproduce.

Actions #27

Updated by Raymond Hoh 3 months ago

Great to see the portfolio to CV migration up-and-running on cdev, Boone.

There seems to be a bug in the Public Portfolio that causes field in the header ("website" etc) to be improperly escaped. I assume this is due to changes made elsewhere as part of the CV migration. You can ignore this, since the Public Portfolio is going away.

You assume correctly! For the deployment actions in #19278 (and on cdev), I changed the Blog and Website profile fields to use the URL profile field type so they would work better on a user's Commons Profile member header. Before, these fields were using the Textbox profile field type. This explains the wonkiness in the Public Portfolio view, which we can ignore as we're getting rid of the Portfolio as you mentioned.

Some profiles have a Twitter widget. On the old system, this showed a dynamic list of tweets. Very few users ever used this, and it's possible that the widget has been broken for a while because Twitter is generally broken. See https://commons.gc.cuny.edu/members/boonebgorges, https://commons.gc.cuny.edu/members/boonebgorges?legacy-portfolio=1 Do we want to discard this?

Let's get rid of the old Twitter widget.

Actions #28

Updated by Boone Gorges 3 months ago

Thanks for confirming the escaping behavior, Ray!

Agreed on the Twitter widget. I've made the change to the migrator in https://github.com/cuny-academic-commons/cac/commit/5c11ce2f7af0a34233218aa4ec513180a2f05a8e. I haven't re-run it on cdev, so you'll still see Twitter widgets on some CVs there.

Actions #29

Updated by Colin McDonald 2 months ago

This all looking great to me as well Boone, many thanks, and I agree about getting rid of the Twitter widget. Here are a couple of other small issues I noticed while testing, in case they still help, but I think this is basically good to go as it is currently.

https://commons.gc.cuny.edu/members/cirasella/?legacy-portfolio=1
https://commons.gc.cuny.edu/members/cirasella/
- Not crucial, but would be nice if the CV recognized and used the line breaks in the Education section in the legacy so it's easier to follow the three different degrees.

https://commons.gc.cuny.edu/members/msmith/?legacy-portfolio=1
https://commons.gc.cuny.edu/members/msmith/
- The From the Blog and Recently Interesting bottom sections here seem to use a legacy feature to auto-display RSS feeds. Likely very edge cases that this would apply to, but perhaps we exclude these sections if we can't migrate and show the auto-display easily?

https://commons.gc.cuny.edu/members/cstein/?legacy-portfolio=1
https://commons.gc.cuny.edu/members/cstein/
- Something similar to msmith going on here with the Most Viewed Videos bottom section, though for legacy it isn't showing anything at all and for CV it shows a broken RSS link. Perhaps another reason to just exclude these kinds of sections entirely, as a lot may have older links in them anyway.

https://commons.gc.cuny.edu/members/lwaltzer/?legacy-portfolio=1
https://commons.gc.cuny.edu/members/lwaltzer/
- Why was the Program Website bottom section of legacy excluded from the CV?

Actions #30

Updated by Boone Gorges 2 months ago

Thanks for the thorough review, Colin.

- Not crucial, but would be nice if the CV recognized and used the line breaks in the Education section in the legacy so it's easier to follow the three different degrees.

This is a case of some weird legacy markup that must have been allowed in an early version of the Public Portfolio tool. In https://github.com/cuny-academic-commons/cac/commit/8c9b0c909bb04c35ea195f643049cf4048613a0a I made a change so that line breaks are preserved.

- The From the Blog and Recently Interesting bottom sections here seem to use a legacy feature to auto-display RSS feeds. Likely very edge cases that this would apply to, but perhaps we exclude these sections if we can't migrate and show the auto-display easily?

I had forgotten about RSS widgets in the old system. I would prefer to avoid excluding content where possible, and I realized that WP has an RSS widget. So I've made a change that migrates the old RSS widgets into RSS blocks. It's basic but it works. https://github.com/cuny-academic-commons/cac/commit/8b4a783de806c05e87aa88ef21553172d384be32 https://github.com/cuny-academic-commons/cac/commit/523c0aba20a3a9de49fe0151ff222bfa9547c74b In cases where the source content is defunct, you'll now see an RSS error; see https://commons.gc.cuny.edu/members/cstein/

- Why was the Program Website bottom section of legacy excluded from the CV?

Another case of old markup, where Luke's profile included a text block with no wrapping element. This should be accounted for by https://github.com/cuny-academic-commons/cac/commit/90102615c9cef5fa28ec21aa5ee15bd65f611fe9

I have re-run the CV migration for the users mentioned in your message, but for no other users on cdev.

Actions #31

Updated by Colin McDonald 2 months ago

Thanks for the quick work on these, Boone. The only other thing I noticed confirming all of the fixes on CDEV is that these two legacy portfolios have full URLs pasted into bottom sections that are formatted as clickable, but on the CV version they're only recognized as text. Perhaps there's a way to address that?

https://commons.gc.cuny.edu/members/lwaltzer/?legacy-portfolio=1
https://commons.gc.cuny.edu/members/cirasella/?legacy-portfolio=1

https://commons.gc.cuny.edu/members/lwaltzer/
https://commons.gc.cuny.edu/members/cirasella/

Actions #32

Updated by Boone Gorges 2 months ago

Thanks for the quick work on these, Boone. The only other thing I noticed confirming all of the fixes on CDEV is that these two legacy portfolios have full URLs pasted into bottom sections that are formatted as clickable, but on the CV version they're only recognized as text. Perhaps there's a way to address that?

In the Public Portfolio, BuddyPress handled this by filtering the content before display: https://github.com/cuny-academic-commons/cac/blob/8e6135b8897a0226462633e7bebe2b6d16a39c1d/wp-content/plugins/buddypress/bp-xprofile/bp-xprofile-filters.php#L26 In the case of the new system, we're using the block editor. If we were to take a similar approach, it would mean that content would be linked on the front-end which is not linked when editing. This creates a confusing experience for users, especially because the filter can't be disabled on a case-by-case basis. The other approach would be to handle these at the time of migration: detect when a field contains a URL (perhaps nothing but a URL?) and then to turn it into a proper link in the saved content. This way, it would show up as a standard link when editing the CV. I feel uneasy about this because it means modifying user-generated content, using regular expressions that can be finicky when run over a big set of data like this. As such, my preference is to do nothing, and to let users who notice this discrepancy to "correct" it themselves.

Actions #33

Updated by Colin McDonald 2 months ago

Seems fine to me, especially because this is likely a very edge case, and it wouldn't be hard for a user to fix or for a visitor to get to the URL anyway with a copy/paste. Thanks for investigating it further.

Actions

Also available in: Atom PDF