Feature #3231

Feature #3230: Scripts for quicker provisioning/updating of development environments

"Clean" versions of production data

Added by Boone Gorges almost 5 years ago. Updated about 3 years ago.

Status: Resolved
Priority name: High
Assignee:
Category name: Internal Tools and Workflow
Target version:
Start date: 2014-05-28
Due date:
% Done: 0%
Estimated time: 15.00 h

Description

Some version of the Commons database is necessary to provision a new instance of the site for development purposes. Using a raw dump of the entire Commons database(s) is impractical for a number of reasons:

- It's extremely large
- It's divided between two databases (WP and MediaWiki)
- It contains a great deal of non-public and potentially sensitive data

I'd like to write a tool that will create a cleaned-up version of the production database, which will be safe and fairly easy to pass around for people to provision or update their local environments. Here's a sketch of what I have in mind:

- Ignore the wiki for the time being. Let's get it set up with WP and then worry about what to do with MW. (If you need a copy of the MW db in the meantime, we can figure something out.)
- Off the top of my head, the following tasks need to be performed:
  * For all user accounts, reset passwords to something neutral (so it's easy to log in as a different user for testing)
  * For all user accounts, reset user_email to something fake (to avoid improper emails going out to actual addresses)
  * For all users, delete any xprofile data that is not set to be visible to the public (or maybe public + logged-in)
  * Delete all non-public groups, along with associated activity, forums, docs, files
  * Delete all blogs that are not set to be visible by the public (or maybe public + logged in)
  * For all blogs, delete all password-protected posts
  * For all blogs, delete all post drafts and post revisions (this is both for privacy reasons and to reduce the size of the database)
  * For all blogs, delete all non-published comments
  * Delete all private messages
- Obviously, since many of these checks will be WP-based, the "cleaner" will have to be a WordPress plugin of some sort. I'm thinking this is a good use for wp-cli.
- At the same time, we cannot perform any of these actions on the production database. So part of the cleaner will also have to be a tool that will export a raw version of the db and set up a parallel instance of WP (see #3230 for some ways forward with this). Much of this aspect could be manual for now.
- Most of this stuff is not specific to the Commons, so we should build it in a fairly abstracted way, with a view to making it available as a plugin.
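
To make a few of the cleanup tasks above concrete, here is a rough sketch in Python against a toy in-memory SQLite copy. It is not the proposed wp-cli tool and not how any real cleaner is implemented. The table and column names (wp_users, wp_posts, wp_comments) follow stock WordPress, but the helper names build_toy_copy and clean, the placeholder password value, and the example.invalid addresses are all assumptions made for illustration. A real cleaner would go through WordPress itself, so that passwords are hashed properly and related rows such as post meta are removed too. BuddyPress data (xprofile, groups, private messages) is left out entirely.

```python
import sqlite3


def build_toy_copy():
    """Create an in-memory stand-in for a few WordPress core tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE wp_users (
            ID INTEGER PRIMARY KEY, user_login TEXT,
            user_pass TEXT, user_email TEXT);
        CREATE TABLE wp_posts (
            ID INTEGER PRIMARY KEY, post_type TEXT,
            post_status TEXT, post_password TEXT);
        CREATE TABLE wp_comments (
            comment_ID INTEGER PRIMARY KEY, comment_approved TEXT);

        INSERT INTO wp_users VALUES
            (1, 'alice', 'hash-a', 'alice@real.example'),
            (2, 'bob',   'hash-b', 'bob@real.example');
        INSERT INTO wp_posts VALUES
            (1, 'post',     'publish', ''),
            (2, 'post',     'draft',   ''),
            (3, 'revision', 'inherit', ''),
            (4, 'post',     'publish', 'secret');
        INSERT INTO wp_comments VALUES (1, '1'), (2, '0'), (3, 'spam');
    """)
    return db


def clean(db, neutral_pass="PLACEHOLDER_HASH"):
    """Apply a subset of the cleanup tasks to the copy (never to production)."""
    # Reset every password to one neutral value and every email to a fake one.
    # neutral_pass is a hypothetical placeholder, not a valid WordPress hash.
    db.execute(
        "UPDATE wp_users SET user_pass = ?, "
        "user_email = 'user' || ID || '@example.invalid'",
        (neutral_pass,),
    )
    # Delete password-protected posts.
    db.execute("DELETE FROM wp_posts WHERE post_password <> ''")
    # Delete drafts and revisions (for privacy and to shrink the database).
    db.execute(
        "DELETE FROM wp_posts "
        "WHERE post_status = 'draft' OR post_type = 'revision'"
    )
    # Delete comments that are not approved (pending, spam, trash).
    db.execute("DELETE FROM wp_comments WHERE comment_approved <> '1'")
    db.commit()


if __name__ == "__main__":
    db = build_toy_copy()
    clean(db)
    for table in ("wp_users", "wp_posts", "wp_comments"):
        print(table, db.execute(f"SELECT * FROM {table}").fetchall())
```

On a multisite install like the Commons, each blog has its own posts and comments tables, so the post and comment steps would have to be repeated per blog.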

I'm soliciting feedback from the team on this, especially Ray. What items have I missed above, both in terms of privacy and in terms of paring down a huge database? Do you have any implementation ideas beyond (or in lieu of) what I've spelled out above?

(As a bit of background, we're in the process of bringing on one or two folks to work on the Commons in the fairly near future, so getting at least a semi-manual version of this cleaner up and running is somewhat of a priority.)

History

#1 Updated by Boone Gorges almost 5 years ago

A first pass at this functionality is in https://github.com/cuny-academic-commons/cac-database-cleaner. I'm going to follow up privately with Dan to make sure the generated database dump is working correctly, and I'll post back here with details.

#2 Updated by Boone Gorges about 3 years ago

  • Status changed from Assigned to Resolved

Dan and I have run through a couple sets of exports over the past year or so, and the cac-database-cleaner tool is working pretty well. I'm going to mark this ticket as resolved.
