Feature #14181

Design/UX #13998: Homepage Redesign

"Suggested" tool for members, groups, sites

Added by Boone Gorges 4 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee:
Category: Home Page
Target version:
Start date: 2021-03-16
Due date:
% Done: 0%
Estimated time:

Description

The new homepage design has sections for "Groups you might like", "Sites you might like", and "People you may know". See the "Members Homepage" section at https://redmine.gc.cuny.edu/issues/13999#note-21.

We still have to sort out the precise logic that drives these suggestions. For example, they may simply be based on campus affiliation, i.e., 'Groups you may like' are those associated with at least one of the campuses on your profile. We will also have some minimal filters on the eligible items: groups and sites should only be suggested if they are public, and we may also want to exclude items that have not been recently active. Further, we may start with a fairly simple mechanism like 'campus affiliation', but later add more complex logic (for example, suggesting groups that have a lot of membership overlap with my own groups). So whatever set of tools we develop should make it possible to modify the logic down the road.

Data is stored in very different ways for groups, sites, and members, so it's likely that these tools will look pretty different under the hood. But I think there's some advantage to building them all at the same time, so that we can use a similar set of higher-level API functions for pulling up the items on the homepage. (For example, they'll all need similar parameters like "per_page", they'll all require that items be pulled up in random order(?), etc.)

I believe Colin is going to provide some more specifics about the required logic.

Ray, once we have a spec, could you take the lead on this?

History

#1 Updated by Colin McDonald 4 months ago

Thanks for starting this ticket, Boone. It would be great if we could leave a foundation here for more complex logic down the road. But for starters, I think for all three of People/Groups/Sites we've been coalescing around:

-From the user's campus
-Most public privacy level only
-Most recently active

For groups I think we could potentially get more complicated because we have the added measurement of total members, and perhaps some groups will dominate the recent activity metric (or it's just too noisy to seem like a real recommendation). I don't think we just want to show the biggest groups, as that could have little variance over time also. It would be nice if new groups worked their way in sometimes, whatever we choose. Is it possible to go with highest growth rate over a time period, like two weeks, rate being new members in that span divided by total members? Or open to other thoughts.

#2 Updated by Boone Gorges 4 months ago

It would be nice if new groups worked their way in sometimes, whatever we choose.

"Creation" counts as "activity", so they'll end up in the most-recently-active list.

Is it possible to go with highest growth rate over a time period, like two weeks, rate being new members in that span divided by total members? Or open to other thoughts.

We don't have any mechanism for tracking growth over time, so I recommend that we table this suggestion. I think that activity is likely to be a good proxy for this anyway.

In fact, activity is likely to be a good proxy for lots of measures of "interestingness" that I can think of, so I agree that it's a good place to start with all three content types. As you note, the one danger is that extremely busy groups could create noise. But I think we should wait to address this "problem" until we've seen it in action. So, I think that your suggested criteria are good for now: show the most recently active items that are public and from the user's campus(es).
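
To make these criteria concrete, here is a minimal sketch in plain PHP. It is not BuddyPress code: the item arrays, their keys (`id`, `public`, `campuses`, `last_activity`) and the function name are hypothetical stand-ins for whatever the real group, site and member queries return.

```php
<?php
/**
 * Pick suggestions: public items that share a campus with the user,
 * most recently active first. All names here are hypothetical.
 *
 * @param array $items         Each item: id, public (bool), campuses (string[]),
 *                             last_activity (Unix timestamp).
 * @param array $user_campuses Campus keys from the user's profile.
 * @param int   $per_page      Maximum number of suggestions to return.
 * @return array Item IDs, most recently active first.
 */
function sketch_suggest_items( array $items, array $user_campuses, int $per_page = 4 ): array {
    // Keep only public items with at least one campus in common with the user.
    $eligible = array_filter(
        $items,
        function ( $item ) use ( $user_campuses ) {
            return $item['public'] && array_intersect( $item['campuses'], $user_campuses );
        }
    );

    // Sort by last activity, newest first.
    usort(
        $eligible,
        function ( $a, $b ) {
            return $b['last_activity'] <=> $a['last_activity'];
        }
    );

    return array_column( array_slice( $eligible, 0, $per_page ), 'id' );
}
```

Because "creation" counts as activity, a brand-new public group would carry a fresh `last_activity` value in this model and sort to the top without any special handling.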

#3 Updated by Colin McDonald 4 months ago

Sounds good to me. I'll keep an eye on this ticket if you hit any snags with that logic or want to tweak it.

#4 Updated by Raymond Hoh about 2 months ago

Based on the following criteria:

So, I think that your suggested criteria are good for now: show the most recently active items that are public and from the user's campus(es).

Here's a brief tech overview:

Users:

  • User's campus is stored in BP xprofile field data under the field "College".
  • A BuddyPress user does not have an option for privacy, so we can ignore this.
  • User's recently active timestamp is stored in the BuddyPress activity component.
  • BP members loop (bp_has_members()) cannot be filtered by date_query. This could be an upstream improvement to BuddyPress.

Groups:

  • Group's campus is stored in group meta under the "cac_campus" meta key. (See cac_get_group_campuses().)
  • A BuddyPress group can be filtered by public status, so we will only query for public groups.
  • Group's recently active timestamp is stored in group meta under the "last_activity" meta key.
  • BP groups loop (bp_has_groups()) cannot be filtered by date_query. This could be an upstream improvement to BuddyPress.

Sites:

  • Site's campus is stored in BP site meta under the "cac_campus" meta key. (See cac_get_site_campuses().)
  • Only public sites are recorded into the BP blogs table, so we can just do a regular BP site loop query.
  • Site's recently active timestamp is stored in BP site meta under the "last_activity" meta key.
  • BP sites loop (bp_has_blogs()) cannot be filtered by date_query. This could be an upstream improvement to BuddyPress.

Implementation notes / questions:

  • Will use a custom cac/suggestions REST API endpoint for this. Will probably extend the existing BuddyPress REST controller classes, but will override the register_routes() and get_items() methods to add our custom campus filter.
  • The date_query improvements would allow us to do fine-grained recently active lookups. For example, recently active groups from X campus over the last three months. However, this is not a requirement.
  • What should we display if there are no suggestions or not enough suggestions? A message to visit the respective directories?
  • How often should a suggested item be seen? A simple random order is mentioned in the initial post. However, if we wanted to do something more complicated, do we want to record how many times a certain item is shown with a transient and to omit those items from the suggestions query for X number of days? Perhaps the recording should only take place for the first paginated set of suggestions. Feedback appreciated.

Regarding Colin's question:

Is it possible to go with highest growth rate over a time period, like two weeks, rate being new members in that span divided by total members?

This might be possible by:

  1. Asking BuddyPress to pull up all the "X joined the group" (joined_group) activity items over the last two weeks.
  2. From the results, we parse out the group for each joined_group activity item and tally the count. Then we can calculate the "new members / total members" rate that way.

This is only possible for groups, since we do not record an activity item when someone joins a site.
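
The tally in step 2 can be sketched in plain PHP. This is an illustration only: it assumes the activity query has already been reduced to a flat list of group IDs (one per joined_group item) and that member counts are available as a simple map. The function name and both inputs are hypothetical.

```php
<?php
/**
 * Rank groups by "new members / total members" over a time window.
 *
 * @param array $joined_group_ids One group ID per joined_group activity item in the window.
 * @param array $total_members    Map of group ID => current member count.
 * @return array Map of group ID => growth rate, highest rate first.
 */
function sketch_group_growth_rates( array $joined_group_ids, array $total_members ): array {
    // Tally how many joins each group had in the window.
    $new_members = array_count_values( $joined_group_ids );

    $rates = array();
    foreach ( $new_members as $group_id => $joined ) {
        // Skip groups with unknown or zero membership to avoid dividing by zero.
        if ( empty( $total_members[ $group_id ] ) ) {
            continue;
        }
        $rates[ $group_id ] = $joined / $total_members[ $group_id ];
    }

    // Highest rate first; arsort() keeps the group IDs as keys.
    arsort( $rates );

    return $rates;
}
```

Note that a small group scores high with only a few joins (one new member in a group of four gives 0.25), so a minimum-size threshold may be worth considering if this approach is pursued.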

#5 Updated by Boone Gorges about 2 months ago

What should we display if there are no suggestions or not enough suggestions? A message to visit the respective directories?

Will this ever actually happen? If we're filtering only by shared Campus, it seems like there will always be results, since each campus is fairly well represented in our site/group/user directories. But, in case it ever does happen, perhaps we could simply lift the Campus requirement and randomly pull from the entire Commons community. (Or, if we are operating using a restriction like "active in the last month", we could bump that number up to two or three months in order to fill in the missing slots.)
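
One way to read this fallback, sketched in plain PHP: take the campus-matched results first, and only if slots remain, top up with random picks from the wider pool. The function name and the two ID lists are hypothetical; how each list is actually queried is left open.

```php
<?php
/**
 * Fill suggestion slots, falling back to the wider community when needed.
 *
 * @param array $campus_ids IDs that matched the campus filter, in display order.
 * @param array $all_ids    IDs of every eligible item across the whole community.
 * @param int   $slots      Number of suggestions to show.
 * @return array Up to $slots unique IDs, campus matches first.
 */
function sketch_fill_slots( array $campus_ids, array $all_ids, int $slots = 4 ): array {
    $picked  = array_slice( $campus_ids, 0, $slots );
    $missing = $slots - count( $picked );

    if ( $missing > 0 ) {
        // Candidates from the wider pool that are not already picked.
        $fallback = array_values( array_diff( $all_ids, $picked ) );
        shuffle( $fallback );
        $picked = array_merge( $picked, array_slice( $fallback, 0, $missing ) );
    }

    return $picked;
}
```

The same shape would work for the alternative of widening the activity window: pass the IDs from the longer window as the second argument instead of the whole community.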

How often should a suggested item be seen? A simple random order is mentioned in the initial post. However, if we wanted to do something more complicated, do we want to record how many times a certain item is shown with a transient and to omit those items from the suggestions query for X number of days? Perhaps the recording should only take place for the first paginated set of suggestions. Feedback appreciated.

This feels like it's bound to get complicated. If the pool of available suggestions is large enough, then I propose that random suggestions will be "good enough". In other words, if there are four slots for suggested groups, and they're being pulled from a pool of 50 qualifying groups, then on average, you'll only see a duplicate every couple of page refreshes. So I guess my suggestion is to make the suggestion criteria loose enough so that the pool of suggestions is fairly large, rather than trying to force the appearance of randomness.
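
The estimate above can be checked with a short calculation. The sketch below assumes each page load draws its slots uniformly at random, without replacement, and independently of the previous load. It computes the probability that a refresh repeats at least one item from the previous set. Function names are hypothetical.

```php
<?php
/**
 * Binomial coefficient C(n, k), computed iteratively as a float.
 */
function sketch_choose( int $n, int $k ): float {
    if ( $k < 0 || $k > $n ) {
        return 0.0;
    }
    $result = 1.0;
    for ( $i = 1; $i <= $k; $i++ ) {
        $result *= ( $n - $k + $i ) / $i;
    }
    return $result;
}

/**
 * Probability that a fresh random draw of $slots items from $pool
 * shares at least one item with the previous draw.
 */
function sketch_repeat_probability( int $pool, int $slots ): float {
    if ( $slots <= 0 || $pool <= 0 ) {
        return 0.0;
    }
    if ( $pool < 2 * $slots ) {
        return 1.0; // Two draws of this size cannot avoid each other.
    }
    // 1 - P(all new slots come from the items not shown last time).
    return 1.0 - sketch_choose( $pool - $slots, $slots ) / sketch_choose( $pool, $slots );
}
```

For a pool of 50 and four slots this gives roughly 0.29, i.e. a repeat about once every three to four refreshes, which is in line with "every couple of page refreshes". Under the same assumptions, the expected number of repeated items per refresh is slots² / pool = 16 / 50 = 0.32.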

At some point down the road, if our suggestions are based on more sophisticated logic, it could turn out that they're "too" specific. In that case, we might consider a dismissal mechanism for individual suggestions. (Twitter had this at one time - not sure if it still exists.)

Is it possible to go with highest growth rate over a time period, like two weeks, rate being new members in that span divided by total members?

This sounds really complex to me, and I suggest that we set it aside for a future iteration. Keep the suggestion logic simple for now, and we can complicate it down the road.

A question for Ray and the group: Will the suggestion query exclude items that you are already connected to? In other words, don't suggest groups that a user is a member of, members that the user is friends with, sites that the user is a member of?

#6 Updated by Raymond Hoh about 2 months ago

Will this ever actually happen?

(Or, if we are operating using a restriction like "active in the last month", we could bump that number up to two or three months in order to fill in the missing slots.)

Good call. Maybe this will occur for groups when we go with some more complicated logic as discussed above. We can increase the time period if we decide to go that route as you mentioned.

If the pool of available suggestions is large enough, then I propose that random suggestions will be "good enough". In other words, if there are four slots for suggested groups, and they're being pulled from a pool of 50 qualifying groups, then on average, you'll only see a duplicate every couple of page refreshes.

Now that you mention it, random makes more sense and is easier!

Is it possible to go with highest growth rate over a time period, like two weeks, rate being new members in that span divided by total members?

I actually think what I suggested is not that complicated. The BP activity component already has the ability to filter by date. But as I mentioned above, this would only work for groups due to the joined_group activity item. Update: the one catch is that private groups would have to be filtered out after the initial activity query, so we might end up with fewer results.

Will the suggestion query exclude items that you are already connected to?

I would probably recommend excluding since if you're already connected, you probably wouldn't be as interested in viewing the suggested item.

#7 Updated by Raymond Hoh about 1 month ago

I've got a first pass of this ready for testing on cdev.

Once you are logged in, navigate to the homepage and view the widgets in the third column under "Recent Posts". I've added a suggestion widget for each component -- Members, Groups and Sites. The widgets are lazy-loaded, meaning the suggestions only load when they appear on screen and not before.

Right now, the algorithm is as suggested: show the most recently active items that are public and from the user's campus(es). I primarily did this so we can see results on cdev :)

I didn't add a date query to limit the results to, say, the most recently active items from the last X months with a random sort, but the functionality is already coded and ready to go if we decide to go in that direction. I've also added a patch upstream for BuddyPress to limit results by date for members, groups and sites here.

To test suggestions with a different college, you will need to go to https://commons.gc.cuny.edu/wp-admin/users.php?page=bp-profile-edit to select a different one for your user account. I tested with CUNY Graduate Center.

Technical notes:

  • While testing on cdev, I found that the College data for older users was saved as serialized data before moving to a string format. So I had to account for that in the member xprofile_query here. Question for Boone, were users able to select multiple Colleges on the registration page in the past? If so, I'll have to rework the suggestions queries to handle this.
  • Unlike the member's College data, the group and site's College meta is saved with the college key, not the college full name. So I created a helper function called cac_get_cuny_campus_key_by_field() to fetch the college key from the fullname (I haven't committed this function yet).
  • I decided to utilize the existing BuddyPress REST endpoints rather than implement a new REST namespace and used a custom type=suggestions in the query loops to modify the query.

To do:

  • I haven't included the "Add Friend" button to the members suggestions widget as outlined in the new homepage wireframes yet.
  • I haven't added pagination support or the ability to refresh the suggestions with a new set yet.
  • Styling adjustments once the new homepage is at a suitable stage in development.

#8 Updated by Boone Gorges about 1 month ago

Awesome! Thanks for sharing your progress, Ray. I've looked through the cac-suggestions repo and it's looking awesome.

While testing on cdev, I found that the College data for older users was saved as serialized data before moving to a string format. So I had to account for that in the member xprofile_query here. Question for Boone, were users able to select multiple Colleges on the registration page in the past? If so, I'll have to rework the suggestions queries to handle this.

I don't recall what the registration screen used to look like. But I just did a query on the production site and it looks like 3064 out of 5444 values of 'College' are serialized arrays, so we have to do something about it. I'd like to move toward a unified treatment of College/Positions data, but this will take a good amount of work across the codebase. So, for now, a LIKE query like the one you've linked to is probably best.
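
A LIKE match works on the serialized rows because PHP's serialize() embeds each string verbatim, so the college name still appears as a substring of the stored value. Where the values need to be compared in PHP instead (for example, in a one-off audit), a normalizer along the following lines could treat both formats alike. This is a sketch only: the function name is hypothetical and it is not part of the cac-suggestions code.

```php
<?php
/**
 * Normalize a raw "College" value to a list of college names.
 * Handles the older serialized-array format and the newer plain string.
 *
 * @param string $raw Value as stored in the xprofile data table.
 * @return array List of college names (possibly empty).
 */
function sketch_normalize_college_value( string $raw ): array {
    $raw = trim( $raw );
    if ( '' === $raw ) {
        return array();
    }

    // Serialized PHP arrays start with "a:<count>:{".
    if ( preg_match( '/^a:\d+:\{/', $raw ) ) {
        // Suppress the notice raised on malformed data; never instantiate objects.
        $decoded = @unserialize( $raw, array( 'allowed_classes' => false ) );
        if ( is_array( $decoded ) ) {
            return array_values( array_filter( $decoded, 'is_string' ) );
        }
    }

    // Plain string, or something that only looked serialized.
    return array( $raw );
}
```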

Unlike the member's College data, the group and site's College meta is saved with the college key, not the college full name. So I created a helper function called cac_get_cuny_campus_key_by_field() to fetch the college key from the fullname (I haven't committed this function yet).

Blergh. See above. I intentionally moved to storing the key rather than the name, because the display name occasionally changes. If you're going to go with a single source of truth, I'd make it the key (cac_get_cuny_campuses()) and then build a translation map to the name(s) of each college. Sounds like this might be the opposite of what you're suggesting, but there's more than one way to skin this cat.

I think we can hold off on additional interface mods until after we have the first round of visual designs.

#9 Updated by Raymond Hoh about 1 month ago

But I just did a query on the production site and it looks like 3064 out of 5444 values of 'College' are serialized arrays, so we have to do something about it.

Were you able to quickly see if any of those serialized arrays contained multiple colleges?

I'd like to move toward a unified treatment of College/Positions data, but this will take a good amount of work across the codebase.

Ahh, I just noticed the Positions field from the CAC Advanced Profiles plugin and on the Groups Directory filter interface. Did you want me to follow similar logic for the filtering?

If you're going to go with a single source of truth, I'd make it the key (cac_get_cuny_campuses()) and then build a translation map to the name(s) of each college.

I chose the opposite approach because the value from a user's xprofile data is the full college name, not the key.

This is the code for cac_get_cuny_campus_key_by_field:

/**
 * Fetch the campus key by campus data field.
 *
 * @param  string $value Value to search for.
 * @param  string $field Field to search from. Default: 'full_name'.
 * @return string Campus key, or an empty string if no campus matches.
 */
function cac_get_cuny_campus_key_by_field( $value = '', $field = 'full_name' ) {
    foreach ( cac_get_cuny_campuses() as $key => $data ) {
        // Skip campuses that lack the requested field to avoid an undefined-index notice.
        if ( isset( $data[ $field ] ) && $data[ $field ] === $value ) {
            return $key;
        }
    }

    return '';
}

I'll commit this function once I get your sign-off :)

#10 Updated by Boone Gorges about 1 month ago

Were you able to quickly see if any of those serialized arrays contained multiple colleges?

mysql> select count(*) from wp_bp_xprofile_data where field_id = 2 and value like 'a:1:%';
+----------+
| count(*) |
+----------+
|     2676 |
+----------+
1 row in set (0.01 sec)

So it's about 400 (3064 - 2676 = 388) that have more than one. We could simply convert these to separate entries in the data table, but I'm unsure if this would break a lot of other stuff.

Ahh, I just noticed the Positions field from the CAC Advanced Profiles plugin and on the Groups Directory filter interface. Did you want me to follow similar logic for the filtering?

In theory, data from Positions should be synced to College. See https://github.com/cuny-academic-commons/cac/blob/1.18.x/wp-content/plugins/cacap-cac/includes/widgets/positions.php#L80, https://github.com/cuny-academic-commons/cac/blob/1.18.x/wp-content/plugins/cacap-cac/cacap-cac.php#L13

In fact, it could be that this is where the serialized entries for College are coming from.

In any case, maybe you could do a couple of queries to see if Positions is, in fact, mirrored in College. (For example, compare the list of user IDs in each place: there should be none in Positions that aren't also reflected in College.) If so, there's probably no need to check Positions.

All this is demonstrating how the separation between the two is really problematic, so if you have bright ideas about how to tackle the larger problem, I'd be glad to hear them.

I chose the opposite approach because the value from a user's xprofile data is the full college name, not the key.

As long as the name of the college doesn't change, this should work OK. We went through a process of changing a couple college names in the last year or two, but at that time I ran a script that did the search-replace in the bp_xprofile_data table as well. So they should match up.
