Auto Scheduled Tasks #23586
closedPruning unconfirmed users
Description
We talked in the call today about pruning (or perhaps just deleting, within a certain timeframe?) the ~50K unconfirmed users we realized in #23545 are being held after signing up for the Commons but not completing the process. It is probably a good idea security-wise, but perhaps it helps reduce our database load too.
Files
Related issues
Updated by Colin McDonald 6 months ago
- Related to Support #23545: over 50,000 unconfirmed users on my private site added
Updated by Boone Gorges 6 months ago
Do we want an automated tool to do this? I think we'd want something like:
1. Run on a schedule (how often? once monthly?)
2. Identify pending items of a certain age (how old? 1 month? 6 months?)
3. Delete them (what kind of logging do we need?)
If we want to do this in chunks, we could first build the tool that does 2 and 3, and run it manually at first. Then we can decide how and whether it's worth adding the automation (1).
Updated by Raymond Hoh 6 months ago
WP-CLI does have a command to delete all signups with:
wp user signup delete --all
But we don't want to do that.
For age range, I'm thinking delete any signup older than four months as that is the length of a semester:
wp db query "SELECT signup_id FROM wp_signups WHERE registered < DATE_SUB(NOW(), INTERVAL 4 MONTH)" --format=ids | xargs -n 1 wp user signup delete
Then for a scheduled task, we can create a scheduled task to run every month. If the remaining number is small, can make the interval longer, like every two months or so.
Updated by Boone Gorges 6 months ago
Those limits sound good to me. Would be nice to get confirmation from Colin.
Updated by Boone Gorges 20 days ago
Colin, do we want to pick this back up and consider it?
Updated by Colin McDonald 13 days ago
As discussed in the monthly meeting on Friday, Boone is going to manually remove the large backlog of these "users" soon, and then he will set a reminder to do so again annually. We also discussed whether a user can re-register with one of these emails once the backlog is removed. Could we record a few emails before removal and try them to see, or figure this out another way?
Updated by Boone Gorges 13 days ago
Thanks, Colin. So here's what I'm thinking: Ray sketched a command using xargs above, but wp db query doesn't allow you to return --format=ids in that way (at least so far as I can see), and plus passing each individual ID to wp user signup delete would take ages. Internally, WP-CLI's delete method just removes the row from the database: https://github.com/wp-cli/entity-command/blob/36afb61fb7058123f33558990ac49c7fc3987d19/src/Signup_Command.php#L317 So i think that, instead, we should just do something simple like this:
DELETE FROM wp_signups WHERE active = 0 AND registered < DATE_SUB(NOW(), INTERVAL 6 MONTH)
Note that this only deletes unconfirmed items, those that were never activated. We need to keep the activated ones around, since they are our record of the email address originally used at signup.
If this logic seems OK to others, I'll run it soon, and set an alert to run it every 6 months. This first run will delete 4971 accounts (at the time of writing), out of a total 5,266 pending signups. Future runs will of course delete far fewer.
Regarding reusable email addresses, I've spent some time looking into this, and here's how it works. When you register, a "signup" object is created with your CUNY email address. When you activate the signup, a user account is created with that same address. You can then change the email address on your account, but that doesn't change the email address on the signup. If you then try to register for a new account using the same email address, and it's within 48 hours of the original signup, you'll get a rejection: "That email address is pending activation..." (not totally accurate, but it comes from WP core). Past 48 hours, the new signup will simply cause the old one to be deleted and will take its place. This means that, in theory, a user could register for arbitrary accounts using the same email address, as long as they (a) changed the address on the created accounts, and (b) they waited at least 48 hours between. We could build something to prevent this - say, don't allow a registration using an email address associated with any signup that has been successfully activated ever (not just < 48hr), but I don't know whether it's worth the effort.
Updated by Colin McDonald 13 days ago
Sounds good to me Boone about the initial pruning and running it every 6 months. Thanks too for looking into the email address reuse. I wanted to clarify this part: "Past 48 hours, the new signup will simply cause the old one to be deleted and will take its place."
So if I sign up with my GC email, then make my Gmail my account email, then sign up 72 hours later again with my GC email, I will now have two accounts, but there will only be one signup record in the system? Just curious about what is being deleted and if that messes with our desired paper trail at all.
It doesn't seem to me that we need to build something here, unless we start to see this being exploited.
Updated by Boone Gorges 12 days ago
So if I sign up with my GC email, then make my Gmail my account email, then sign up 72 hours later again with my GC email, I will now have two accounts, but there will only be one signup record in the system? Just curious about what is being deleted and if that messes with our desired paper trail at all.
Correct. WP deletes the old signup in this case: https://github.com/WordPress/wordpress-develop/blob/d673955e0e2a8d7522bc243646a899d7196a2522/src/wp-includes/ms-functions.php#L560 If we want to have truly permanent records of users' signup email address, we'll have to build that feature ourselves.
I'm going to wait a day or two to perform the query so that Ray or Jeremy has a chance to look at my reasoning in https://redmine.gc.cuny.edu/issues/23586#note-7 and make sure I'm not wrong about something.
Updated by Raymond Hoh 12 days ago
DELETE FROM wp_signups WHERE active = 0 AND registered < DATE_SUB(NOW(), INTERVAL 6 MONTH)
After seeing that wp user signup delete does a straight DB query and nothing else, your DB query looks good to me, Boone.
On the call, we also talked about saving the user's original signup email address into user meta. Do we still want to do that? If so, maybe do this for new users?
Updated by Boone Gorges 11 days ago
- File cac-save-initial-email-address-for-existing-users.php cac-save-initial-email-address-for-existing-users.php added
- Category name set to Registration
- Status changed from New to Resolved
- Target version set to Not tracked
Thanks, Ray!
Yes, let's save initial email addresses. I've added this in https://github.com/cuny-academic-commons/cac/commit/3ee74b74b8e4b4acb4e16ed7557d3176a38e1722. It's saved to the 'cac_initial_email_address' meta key in usermeta at the time of account activation. I've pushed this to production as a hotfix.
I also ran a tool to fill for existing users. See attached script. It goes through all Commons users, tries to find a row in wp_signups that matches the user_login, and if it finds it, it uses the user_email from that signup row to fill 'cac_initial_email_address'. It matched 54,297 users and missed 5,672. I don't have a good explanation for why a small percentage of signups are not in the system, but in any case, this is the best we can do.
Then, I ran the DELETE query, which removed all non-activated signups older than 6 months. This resulted in 4,971 rows being deleted.
I'll set a calendar reminder to do this every six months. If this gets annoying to maintain, we'll set up an automated routine to do it.
Thanks for the help thinking through this.