Feature #21383

open

Feature #21380: Hosting migration

Offload media using S3-Uploads

Added by Boone Gorges 11 days ago. Updated 1 day ago.

Status:
New
Priority name:
Normal
Assignee:
Category name:
Server
Target version:
Start date:
2024-11-01
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

As part of our migration to Reclaim, we will be offloading our media files to Amazon S3. This is necessary for cost reasons, as well as for compatibility with load-balancing and other high-availability infrastructure at Reclaim.

Reclaim has requested that we use the following tool from Human Made: https://github.com/humanmade/S3-Uploads

Our first task is to gauge compatibility between this tool and the various parts of the Commons. As a starting place, here's a list of concerns:

1. We currently have a custom tool that uses a dynamically generated .htaccess file to protect files uploaded to a private site. See https://github.com/cuny-academic-commons/cac/blob/2.4.x/wp-content/mu-plugins/cac-file-protection.php. We need to determine whether this will continue to be compatible with S3-Uploads. My initial guess is that it won't, since S3-Uploads filters attachment URLs, so requests would presumably go to S3 rather than through our .htaccess rules. Relatedly, S3-Uploads allows uploaded files to be "private": https://github.com/humanmade/S3-Uploads?tab=readme-ov-file#private-uploads. I don't really understand what this does, so we'll have to research whether it accomplishes something similar and, if so, how we migrate to it.

2. We allow file uploads of several types that aren't related to the WP Media Library. On the primary site, this includes user avatars, group avatars, forum attachments, and buddypress-group-documents. On secondary sites, it might include various plugins that use a non-standard technique for accepting uploads (see eg Gravity Forms). We need to figure out what S3-Uploads means for all of these. It's possible that S3-Uploads won't interfere with them at all - ie, files will continue to be uploaded to and served from the web server. If so, we'll have to determine whether this is OK (in terms of performance, backups, cost, etc). The answer may differ depending on file type: I can imagine, for example, that it'd be OK to keep avatars on the web server, but that we'd be more motivated to move (potentially much larger) buddypress-group-documents to S3.

3. Reclaim has suggested that our team may want to roll out S3-Uploads integration before we do our final migration. There are a couple of reasons to like this idea: it reduces the number of moving parts on migration day, and it gives us plenty of lead time to upload existing files (1+ TB worth) to S3 well in advance of the launch date. Our team needs to decide whether this is feasible, and if so, when it will happen. Reclaim is serving as our vendor for AWS (ie we're paying Reclaim, and they're paying AWS), so we would need Reclaim to help us configure our bucket(s) in order for us to move forward with this.
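To get a rough feel for the lead time mentioned in point 3, here is a back-of-the-envelope sketch of how long a bulk upload might take. The throughput figures are hypothetical placeholders, not measurements from our servers or from Reclaim, and the calculation ignores per-request overhead, retries, and throttling, so real numbers could be noticeably worse.

```python
def estimated_transfer_hours(total_bytes: int, throughput_mbps: float) -> float:
    """Hours needed to move total_bytes at a sustained rate in megabits per second."""
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    bits = total_bytes * 8
    seconds = bits / (throughput_mbps * 1_000_000)
    return seconds / 3600


# Decimal terabyte; the ticket only says "1+ TB", so treat this as a lower bound.
ONE_TB = 10**12

if __name__ == "__main__":
    # Hypothetical sustained upload rates (assumptions, not benchmarks).
    for mbps in (100, 500, 1000):
        hours = estimated_transfer_hours(ONE_TB, mbps)
        print(f"{mbps} Mbit/s: about {hours:.1f} hours per TB")
```

Even under pessimistic assumptions this is days rather than weeks of raw transfer, which suggests that the scheduling question is more about testing and coordination with Reclaim than about upload time itself.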

Ray and Jeremy, I've never run a large site with S3-offloaded content, and I've definitely never run a migration of an existing site. Have you? It would be great to get your impression of the project, and your warnings about potential problems that I haven't discussed above.

Actions #1

Updated by Raymond Hoh 8 days ago

Ray and Jeremy, I've never run a large site with S3-offloaded content, and I've definitely never run a migration of an existing site. Have you?

I've worked on a multisite site that uses S3 Uploads. We maintain a fork that addresses a few issues. See
https://github.com/humanmade/S3-Uploads/compare/master...hwdsb:S3-Uploads:hwdsb-mods.

For the Commons, the relevant changes are:
- Fixes an issue with some older multisite URLs that use /blogs.dir/ in their uploads directory: https://github.com/WordPress/WordPress/blob/master/wp-includes/ms-default-constants.php#L31. The /blogs.dir/ uploads directory applies to sites that existed before WordPress MU was merged into WordPress Core. S3 Uploads does not take this into account. We probably have older sites that use /blogs.dir/ for their uploads directory, so this issue would likely affect us too. I forget the particulars of this issue, but I note it so we are aware.
- Incompatibility with Gravity Forms. As you mentioned in point 2, there is an issue here with some non-standard plugins. I didn't look too far into the actual issue with Gravity Forms. I'm doing a dirty bail fix in the s3-uploads fork.
- Image URL rewriting via filters was also needed for themes that use a custom header image or background image. These URLs are written into the DB as theme mods, and they were referencing the local URLs instead of the S3 URLs.
- Also, on this site, BuddyPress avatar uploads are still served locally.
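To make the /blogs.dir/ concern concrete, here is a minimal sketch of how legacy and current multisite upload paths could be normalized to a single key layout. This is not what the hwdsb fork does (the particulars are not described above); it only illustrates the two directory shapes. The `sites/<id>/...` key layout and the `upload_key` helper are assumptions made for illustration.

```python
import re
from typing import Optional

# Legacy (pre-3.5 multisite) layout: wp-content/blogs.dir/<site id>/files/<path>
LEGACY = re.compile(r"^wp-content/blogs\.dir/(?P<site>\d+)/files/(?P<rest>.+)$")
# Current multisite layout: wp-content/uploads/sites/<site id>/<path>
MODERN = re.compile(r"^wp-content/uploads/sites/(?P<site>\d+)/(?P<rest>.+)$")


def upload_key(relative_path: str) -> Optional[str]:
    """Map a site-relative upload path to a hypothetical 'sites/<id>/<path>' key.

    Returns None for paths that match neither layout (for example, the main
    site's uploads or non-upload files), so callers can handle them explicitly.
    """
    path = relative_path.lstrip("/")
    for pattern in (LEGACY, MODERN):
        match = pattern.match(path)
        if match:
            return f"sites/{match['site']}/{match['rest']}"
    return None


if __name__ == "__main__":
    print(upload_key("wp-content/blogs.dir/12/files/2010/03/a.jpg"))
    print(upload_key("wp-content/uploads/sites/1234/2024/01/foo.jpg"))
```

A check like this could also be run over a file listing to count how many of our sites still use the legacy directory before we commit to an approach.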

Reclaim is serving as our vendor for AWS (ie we're paying Reclaim, and they're paying AWS), so we would need Reclaim to help us configure our bucket(s) in order for us to move forward with this.

Do we have an estimate on the potential cost of using AWS? This could be in the thousands of dollars per year.

Actions #2

Updated by Boone Gorges 8 days ago

Ray, thanks so much for this!

- Fixes an issue with some older multisite URLs that use /blogs.dir/ in their uploads directory

Do you think Human Made would accept a PR for this? Seems like a general problem.

Do we have an estimate on the potential cost of using AWS? This could be in the thousands of dollars per year.

It's rolled into the top-line number that Reclaim is charging us. This is by design: I didn't want our team to be responsible for covering these variable costs, not to mention the overhead associated with configuring, maintaining, troubleshooting, etc.

Actions #3

Updated by Boone Gorges 1 day ago

Reclaim has asked what we'd like to use as the rewrite domain for S3-stored uploads. By default, S3 URLs are long and unwieldy, but we can rewrite them as something like files.commons.gc.cuny.edu/sites/1234/2024/01/foo.jpg. Are we OK with using files.commons.gc.cuny.edu for this purpose, or is there a better idea floating out there? We should decide this soon, because it'll require a DNS change at the Graduate Center, and I would like to be able to include this ask in our initial round of communication with GC IT.
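As an illustration of what the rewrite domain changes, here is a small sketch that swaps a bucket hostname for the proposed domain while leaving the path untouched. The bucket hostname `example-bucket.s3.amazonaws.com` is hypothetical, and `files.commons.gc.cuny.edu` is only the candidate under discussion. In practice the rewrite would be handled by S3-Uploads configuration and DNS, not by code like this.

```python
from urllib.parse import urlsplit, urlunsplit

# Candidate rewrite domain from the discussion above; not yet decided.
REWRITE_HOST = "files.commons.gc.cuny.edu"
# Hypothetical bucket hostname; the real bucket name would come from Reclaim.
BUCKET_HOST = "example-bucket.s3.amazonaws.com"


def rewrite_upload_url(url: str) -> str:
    """Replace the bucket hostname with the rewrite domain, keeping path and query.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != BUCKET_HOST:
        return url
    return urlunsplit(
        (parts.scheme, REWRITE_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(rewrite_upload_url(
        "https://example-bucket.s3.amazonaws.com/sites/1234/2024/01/foo.jpg"
    ))
```

Because only the hostname changes, the choice of domain can be made independently of how keys are laid out inside the bucket.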
