Project

General

Profile

Actions

Bug #10720

closed

GES email queue triggers many failed user lookups

Added by Boone Gorges over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority name:
Normal
Assignee:
-
Category name:
Email Notifications
Target version:
Start date:
2018-11-17
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

When BP sends emails, it sets the From and Reply-To field using the BP_Email_Recipient class. We (in RBE and in bp-custom) then override the Reply-To address in various cases. BP_Email_Recipient, in turn, does a get_user_by() lookup on these email addresses. The problem is that the emails we pass into these functions generally do not belong to actual Commons users - , etc. The way that get_user_by() works means that failed lookups are not cached; see https://core.trac.wordpress.org/browser/tags/4.9.8/src/wp-includes/class-wp-user.php#L214. So we get lots of duplicate database queries for the same users, which look like this:

 [1739] => Array
        (
            [0] => SELECT * FROM wp_users WHERE user_email = 'commons@gc.cuny.edu'
            [1] => 0.001478910446167
            [2] => do_action('wp_ajax_nopriv_wp_bpges_send_queue'), WP_Hook->do_action, WP_Hook->apply_filters, WP_Async_Request->maybe_handle, BPGES_Async_Request_Send_Queue->handle, BPGES_Async_Request_Send_Queue->handle_digest_queue, bpges_process_digest_for_user, bpges_generate_digest, ass_send_email, bp_send_email, bp_get_email, BP_Email->__construct, BP_Email->set_from, BP_Email_Recipient->__construct, BP_Email_Recipient->get_user, get_user_by, WP_User::get_data_by
        )

    [1740] => Array
        (
            [0] => SELECT * FROM wp_users WHERE user_email = 'cunycommons@gmail.com'
            [1] => 0.0023388862609863
            [2] => do_action('wp_ajax_nopriv_wp_bpges_send_queue'), WP_Hook->do_action, WP_Hook->apply_filters, WP_Async_Request->maybe_handle, BPGES_Async_Request_Send_Queue->handle, BPGES_Async_Request_Send_Queue->handle_digest_queue, bpges_process_digest_for_user, bpges_generate_digest, ass_send_email, bp_send_email, bp_get_email, BP_Email->__construct, BP_Email->set_reply_to, BP_Email_Recipient->__construct, BP_Email_Recipient->get_user, get_user_by, WP_User::get_data_by
        )

    [1741] => Array
        (
            [0] => SELECT * FROM wp_users WHERE user_email = 'cunyacademiccommons+bounce@gmail.com'
            [1] => 0.0011720657348633
            [2] => do_action('wp_ajax_nopriv_wp_bpges_send_queue'), WP_Hook->do_action, WP_Hook->apply_filters, WP_Async_Request->maybe_handle, BPGES_Async_Request_Send_Queue->handle, BPGES_Async_Request_Send_Queue->handle_digest_queue, bpges_process_digest_for_user, bpges_generate_digest, ass_send_email, bp_send_email, bp_get_email, BP_Email->__construct, do_action('bp_email'), WP_Hook->do_action, WP_Hook->apply_filters, BuddyPress->{closure}, BP_Email->set_reply_to, BP_Email_Recipient->__construct, BP_Email_Recipient->get_user, get_user_by, WP_User::get_data_by
        )

Normally, this is a fairly minor issue. But when GES sends its batches, these queries constitute 30-40% of the database i/o.

There are various ways this could be fixed.

1. In a sense, much of this would go away if WP properly cached failed user lookups. That said, it's hard to get a fix into WP (in general), and in addition, I'm not sure what the fix would be. We can't send a dummy value to update_user_caches() because it checks for $user->exists(). I guess you could just shove the failures directly into the useremails etc caches?
2. BP could be less aggressive about these user lookups. If I understand correctly, the get_user_by() calls in BP_Email_Recipient are meant to provide us with a sender Name, as well as a canonical email address. In our use case, these lookups are unneeded. At the very least, we ought to be able to filter them to prevent the lookups, since we know what they're going to be. Or, knowing that they're not going to match a WP user, we should be able to tell BP_Email_Recipient (via a filter?) not to ping the database. Or, BP's set_reply_to() and set_from() shouldn't use BP_Email_Recipient at all, since these are not recipients, and there will be many BP installs where the reply-to/from emails won't match actual users. Or, BP should have its own cache wrapped around some of this.
3. We could do our own magic to fix this. The only way I can think of is to wp_cache_add( $bad_email, 0, 'useremails' ) (which would be enforced whenever we begin sending an email, in case of a cache flush), but I have no idea whether this will cause other bugs.

Ray, I was hoping to get your thoughts before potentially opening upstream tickets.

Actions #1

Updated by Boone Gorges over 5 years ago

adding Ray as a watcher

Actions #2

Updated by Raymond Hoh over 5 years ago

Thanks again for another well-researched post, Boone.

I think the easiest way to fix this without hacking BuddyPress is to create our own version of get_user_by(). (This would be loaded early in /wp-content/mu-plugins.) Since get_user_by() is pluggable, we can write our own version to circumvent WordPress. Do a check to see if our email addresses are being looked up and bail from using WP's user lookup functions if any of our special email addresses are matched. This is probably the cleanest option.

Something like this:

function get_user_by( $field, $value ) {
        if ( 'email' === $field ) {
            switch ( $value ) {
                case 'commons@gc.cuny.edu':
                case 'cunycommons@gmail.com':
                case 'cunyacademiccommons+bounce@gmail.com':
                    return false;

                    break;
            }
        }

        // The rest of get_user_by() follows...
}

BuddyPress should stop hitting the DB now, but should still use our custom From and Reply-To email headers.

Method 3 is too complicated and not ideal.

For BuddyPress:

Or, BP's set_reply_to() and set_from() shouldn't use BP_Email_Recipient at all, since these are not recipients, and there will be many BP installs where the reply-to/from emails won't match actual users.

Agreed. We should at least disable user lookups by email for the Reply-To email header. The lookup might be useful for the From email header, but I can see how this also wouldn't be necessary for most sites. I think what you propose about replacing the usage of BP_Email_Recipient would work well for the From and Reply-To email headers.

Actions #3

Updated by Boone Gorges over 5 years ago

  • Status changed from New to Resolved
  • Target version changed from 1.15 to 1.14.1

Thanks for the feedback, Ray! I hadn't even thought of the pluggable get_user_by(). You're right that it's the best option (of several bad ones). I've implemented it in https://github.com/cuny-academic-commons/cac/commit/ffa8961d844c2050afa791dbd2aad4c242d7c4fd

It turned out we already had a get_user_by() function, for a different reason that no longer applies. See #842 (!). I've cleaned that up.

I've also opened a BuddyPress ticket to gather feedback on the issue more generally: https://buddypress.trac.wordpress.org/ticket/8003#ticket

Actions #4

Updated by Raymond Hoh over 5 years ago

It turned out we already had a get_user_by() function, for a different reason that no longer applies. See #842 (!).

(!), indeed.

I think we can also add support@cunycommons.zendesk.com to the list as well.

Actions #5

Updated by Boone Gorges over 5 years ago

Added in https://github.com/cuny-academic-commons/cac/commit/c68b96bd636800d76d68c0ab008f6b7119bba0aa. Though do we ever send email from this address? No harm in adding it :)

Actions

Also available in: Atom PDF