Project

General

Profile

Actions

Bug #18098

closed

Disable NextGen Gallery's cache tracker

Added by Raymond Hoh about 1 year ago. Updated 12 months ago.

Status:
Resolved
Priority name:
Normal
Assignee:
Category name:
Performance
Target version:
Start date:
2023-04-21
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

While looking into our database storage issue (#17761), I've identified NextGen Gallery as a probable offender of our storage woes. I started logging new additions as well as new updates to the DB to one of our sites using NextGen (site ID 13027) and found that the log was 99.5% full with a NextGen DB option called 'photocrati_cache_tracker'. (For Jeremy or Boone, you can view the log at /PROD_WWW/html/commons/www/wp-content/uploads/ray-db-debug.log). The other 0.5% was from the 'akismet_spam_count' option (15 times) and the 'post_count' option (1 time). Since April 21, 1pm UTC until now (roughly April 22, 2am UTC), update_option( 'photocrati_cache_tracker' ) was called 3266 times.

The other problem other than the update frequency is the size of the 'photocrati_cache_tracker' option. That option currently sits at 7.32MB:

mysql> select char_length(option_value)/1024/1024 from wp_13027_options where option_name = 'photocrati_cache_tracker';
+-------------------------------------+
| char_length(option_value)/1024/1024 |
+-------------------------------------+
|                          7.31606674 |
+-------------------------------------+
1 row in set (0.03 sec)

Since InnoDB stores every type of transaction (a good readthrough is this PDF), I think it is safe to assume the 'photocrati_cache_tracker' option is the main culprit behind most of our storage problems (other than StatPress!).

So the next thing we need to figure out is what the 'photocrati_cache_tracker' option does. A thread on wordpress.org ran into the same problem as us: https://wordpress.org/support/topic/what-does-photocrati_cache_tracker-do/. One of the NextGen devs chimed in and mentioned that this option tracks all NextGen transients so they can purge the transients in groups when running an object cache. My first thought is this shouldn't be necessary since WordPress deletes expired transients on its own, but I digress! The dev fortunately decided to let site administrators disable this tracker through the 'NGG_DISABLE_PHOTOCRATI_CACHE_TRACKER' constant: https://github.com/cuny-academic-commons/cac/blob/34a3a64d76ed2291289d25c780dcbd13cf472952/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#L35-L36

I think our next steps are to:
1. Set the following line somewhere: define( 'NGG_DISABLE_PHOTOCRATI_CACHE_TRACKER', true );
2. Wipe out the 'photocrati_cache_tracker' option for all NextGen Gallery sites for performance reasons
3. Probably think about disabling the cronjob for NextGen Gallery as well with the following line: define( 'NGG_CRON_ENABLED', false );. See https://github.com/cuny-academic-commons/cac/blob/34a3a64d76ed2291289d25c780dcbd13cf472952/wp-content/plugins/nextgen-gallery/nggallery.php#L434-L439


Related issues

Related to CUNY Academic Commons - Bug #18016: ml-slider get_plugins() call causes performance issuesHoldBoone Gorges2023-04-12

Actions
Actions #1

Updated by Matt Gold about 1 year ago

Thank you, Ray!! This is exciting progress. I would be curious to know how many CAC sites are running this plugin

Actions #2

Updated by Boone Gorges about 1 year ago

Ray, this is great sleuthing. When I saw the option name 'photocrati_cache_tracker' it jumped right out at me, as it used to come up very frequently when looking through the processlist tracker that Lihua previously set up. Specifically, I remember there sometimes being pending write transactions with the option_name 'photocrati_cache_tracker' and very long arrays for the option_value. So it's possible that the problem we're seeing here is linked to database issues that we've had for a long time.

Looking at some of the values of 'photocrati_cache_tracker', I'm pretty sure there's a bug in nextgen-gallery. The arrays all seem to be full of the very same value. So when I ran wp --url=ct101.commons.gc.cuny.edu option get photocrati_cache_tracker (which I don't recommend - the output is so large that it kinda locks up the shell) I get an enormous array whose members are all the same:

...
    142978 => '2951165530',
    142979 => '2951165530',
    142980 => '2951165530',
    142981 => '2951165530',
    142982 => '2951165530',
    142983 => '2951165530',
    142984 => '2951165530',
    142985 => '2951165530',
    142986 => '2951165530',
    142987 => '2951165530',
    142988 => '2951165530',
...

Here's a rough picture of what's happening:
- When running a persistent cache, the "transient manager" uses its _track_key() method to keep a record of "key groups". See https://github.com/cuny-academic-commons/cac/blob/8f0c3c5db8e11545cad6010a82cac5e690bbeccf/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#L148, https://github.com/cuny-academic-commons/cac/blob/8f0c3c5db8e11545cad6010a82cac5e690bbeccf/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#L118.
- It decides what the "group" is by parsing the key. So the key would be something like 2__2951165530, which turns into "group 2" with a value of "2951165530".
- But in all cases, the plugin simply appends the key, without checking to see whether it's already present. See https://github.com/cuny-academic-commons/cac/blob/8f0c3c5db8e11545cad6010a82cac5e690bbeccf/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#L130.
- The purpose of the _tracker, based on the linked wordpress.org forum as well as my reading of the code, is to assist in deletion/flushing. See eg https://github.com/cuny-academic-commons/cac/blob/8f0c3c5db8e11545cad6010a82cac5e690bbeccf/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#LL44C3-L44C11
- As such, I don't think there's any reason why a cache "group" should ever include more than one copy of a given "key". It just loops over them and deletes the corresponding transients.
- I think this means that there should be an in_array() check or something similar in the _track_key() method to avoid dupes.

I don't fully understand the purpose of this "transient manager" in the first place, so I'm very slightly wary of simply shutting off the "cache tracker" tool. Perhaps instead we could try something like this: patch the installation in the way I've suggested, see if it breaks anything obvious on our site, and if not, then we report the issue to the developers to see if they can have a deeper look? If it doesn't work for whatever reason, we can always fall back on disabling the cache tracker using that constant.

3. Probably think about disabling the cronjob for NextGen Gallery as well with the following line: define( 'NGG_CRON_ENABLED', false );. See https://github.com/cuny-academic-commons/cac/blob/34a3a64d76ed2291289d25c780dcbd13cf472952/wp-content/plugins/nextgen-gallery/nggallery.php#L434-L439

Preventing this cleanup would probably reduce stress on Cavalcade, but it seems to me that it might negatively affect cache performance if the plugin is performing important cleanup. (I don't know whether it is :-D ) Perhaps related, I have previously throttled NextGEN's trash-collection cron job; see https://github.com/cuny-academic-commons/cac/blob/34a3a64d76ed2291289d25c780dcbd13cf472952/wp-content/mu-plugins/cavalcade.php#L60

Actions #3

Updated by Raymond Hoh about 1 year ago

I would be curious to know how many CAC sites are running this plugin

As of right now, there are 364 sites running NextGen Gallery.


I don't fully understand the purpose of this "transient manager" in the first place, so I'm very slightly wary of simply shutting off the "cache tracker" tool.

They use it to temporarily cache some values. Here are the cache groups that NextGen uses:

$ wp option get ngg_transient_groups --url=ct101.commons.gc.cuny.edu
array (
  '__counter' => 6,
  'WordPress-Router' =>
  array (
    'id' => 2,
    'enabled' => true,
  ),
  'col_in_wp_13027_posts' =>
  array (
    'id' => 3,
    'enabled' => true,
  ),
  'col_in_wp_13027_ngg_gallery' =>
  array (
    'id' => 4,
    'enabled' => true,
  ),
  'col_in_wp_13027_ngg_pictures' =>
  array (
    'id' => 5,
    'enabled' => true,
  ),
  'col_in_wp_13027_ngg_album' =>
  array (
    'id' => 6,
    'enabled' => true,
  ),
)

So:

It decides what the "group" is by parsing the key. So the key would be something like 2__2951165530, which turns into "group 2" with a value of "2951165530".

Group 2 would be for "WordPress-Router" and the transient key would be "2951165530".

NextGen would take these transient keys and delete the transients.

I think this means that there should be an in_array() check or something similar in the _track_key() method to avoid dupes.

Perhaps instead we could try something like this: patch the installation in the way I've suggested, see if it breaks anything obvious on our site, and if not, then we report the issue to the developers to see if they can have a deeper look?

Great catch, Boone. Let's do this.

We'll probably also need to fix all NextGen sites so the dupe array values are unique. This script should fix it:

$site_ids = GET_SITE_IDS_FOR_NEXTGEN;
foreach ( $site_ids as $site_id ) {
    $tracker = get_blog_option( $site_id, 'photocrati_cache_tracker' );

    if ( empty( $tracker ) ) {
        continue;
    }

    $changed = false;

    foreach ( $tracker as $key => $value ) {
        if ( is_array( $value ) ) {
            $changed = true;
            $tracker[$key] = array_unique( $value );
        }
    }

    if ( $changed ) {
        update_blog_option( $site_id, 'photocrati_cache_tracker', $tracker );
    }
}
Actions #4

Updated by Boone Gorges about 1 year ago

  • Status changed from New to Hold
  • Target version set to 2.1.6

Awesome, Ray. I've run the following and confirmed that it's deduped:

$site_ids = GET_SITE_IDS_FOR_NEXTGEN;
foreach ( $site_ids as $site_id ) {
    $tracker = get_blog_option( $site_id, 'photocrati_cache_tracker' );

    if ( empty( $tracker ) ) {
        continue;
    }

    $changed = false;

    foreach ( $tracker as $key => $value ) {
        if ( is_array( $value ) ) {
            $changed = true;
            $tracker[$key] = array_unique( $value );
        }
    }

    if ( $changed ) {
        update_blog_option( $site_id, 'photocrati_cache_tracker', $tracker );
    }
}

I've added my fix to the repo and have deployed it to production https://github.com/cuny-academic-commons/cac/commit/a75eafb4b19308fe2a728804ac7aff3d9556cf2c

I'm going to put this ticket into the 2.1.6 release. This will give us a chance to review a few weeks from now whether (a) the cache tracker size is staying small, and (b) whether there have been reports of issues with nextgen-gallery. If we're good, I'll take a moment at that time to report our suggested fix to the nextgen-gallery devs, and we'll figure out next steps from there.

Actions #5

Updated by Matt Gold about 1 year ago

You guys are awesome

Actions #6

Updated by Raymond Hoh 12 months ago

Just to update this thread, I just checked my added_option / updated_option log and am pleased to note that Boone's fix for NextGen Gallery has basically stopped the 'photocrati_cache_tracker' option from continuously updating :)

The only entries I see in the log now are for more general updates ('akismet_spam_count', 'post_count', 'sticky_posts' and 'feedwordpress_diagnostics_log'). I think this ticket is pretty much resolved at this point. However, I'll leave it open for now so we can provide the NextGen team with Boone's fix and also so Boone can add a note in his wp-cli-cac tool to patch NextGen should the NextGen team lag in applying our fix.

Actions #7

Updated by Boone Gorges 12 months ago

  • Target version changed from 2.1.6 to 2.1.7

Thanks for circling back around, Ray!

I've posted a support thread describing our fix: https://wordpress.org/support/topic/performance-fix-for-photocrati_cache_tracker-transient-cache/

It looks like nextgen-gallery doesn't have an available update for tomorrow's release, so I can't do a real-world test of my wp-cli-cac exclusion. So I'm going to bump this to the next release for a proper test.

Actions #8

Updated by Boone Gorges 12 months ago

  • Status changed from Hold to Resolved
  • Target version changed from 2.1.7 to 2.1.6

Actually, I think I'll close this one and bump #18016 to the next milestone.

Actions #9

Updated by Boone Gorges 12 months ago

  • Related to Bug #18016: ml-slider get_plugins() call causes performance issues added
Actions #10

Updated by Boone Gorges 12 months ago

  • Status changed from Resolved to Hold
  • Target version changed from 2.1.6 to 2.1.7

I'm not sure why I connected this to #18016, which is a totally separate problem :-/

The nextgen-gallery support folks followed up: https://wordpress.org/support/topic/performance-fix-for-photocrati_cache_tracker-transient-cache/#post-16723914

This suggests that, as Ray had originally suggested, we can bypass the entire issue by setting the 'NGG_DISABLE_PHOTOCRATI_CACHE_TRACKER' flag. Ray, does this align with your understanding too?

I still think that the suggested fix should go into the plugin, but it would no longer affect us if we're not using that transient "tracker" feature.

Actions #11

Updated by Raymond Hoh 12 months ago

Ray, does this align with your understanding too?

Yes, this aligns with my understanding.

I think the NextGen team put in the 'NGG_DISABLE_PHOTOCRATI_CACHE_TRACKER' constant as a workaround for https://wordpress.org/support/topic/what-does-photocrati_cache_tracker-do/#post-15246611 , but your isset() solution is the actual fix for the problem.

Although NextGen tried to engineer a solution for clearing their own transient groups for those with an external object cache, they shouldn't really deal with clearing transients in the first place; let WordPress do their job.

Since WordPress already deletes expired transients, I would say that we should definitely do both point 1) and point 3) from what I outlined in the beginning of this ticket: set define( 'NGG_DISABLE_PHOTOCRATI_CACHE_TRACKER', true ) and define( 'NGG_CRON_ENABLED', false ). For the latter constant, the NextGen cron job just deletes their own, expired transients anyway (see https://github.com/cuny-academic-commons/cac/blob/34a3a64d76ed2291289d25c780dcbd13cf472952/wp-content/plugins/nextgen-gallery/non_pope/class.photocrati_transient_manager.php#L165 .) Lastly, we can write a script to remove all of NextGen's 'ngg_custom' cron jobs, which will help out with Cavalcade.

Actions #12

Updated by Boone Gorges 12 months ago

Thanks for confirming all this, Ray.

I've added both the constants to cac-env-config.php on the production site. This means I didn't need to go through version control, which seems appropriate given that the behavior is tied to the use of an environment-specific external cache.

Lastly, we can write a script to remove all of NextGen's 'ngg_custom' cron jobs, which will help out with Cavalcade.

When I run wp cavalcade jobs --hook=ngg_delete_expired_transients, I get 21 items. Any reason I can't just delete them from the Cavalcade jobs table using MySQL? (Maybe followed by a cache flush - I can't remember how/whether Cavalcade integrates with Memcached.) This would save me the 20 minutes to write a script that does it through WP :-D

Actions #13

Updated by Raymond Hoh 12 months ago

When I run wp cavalcade jobs --hook=ngg_delete_expired_transients, I get 21 items. Any reason I can't just delete them from the Cavalcade jobs table using MySQL? (Maybe followed by a cache flush - I can't remember how/whether Cavalcade integrates with Memcached.) This would save me the 20 minutes to write a script that does it through WP :-D

It's better to do it through WP just to be safe :)

Cavalcade does clear their own caches -- https://github.com/cuny-academic-commons/cac/blob/master/wp-content/mu-plugins/cavalcade/inc/connector/namespace.php#L269-L273 -- but there are definitely more than 21 sites that run NextGen, so if we use WP's cron API to remove the task, it will clean up the 'cron' DB option as well.

Anyway, I just ran a WP-CLI script that used wp_clear_scheduled_hook( 'ngg_delete_expired_transients' ) for all the site IDs with NextGen Gallery installed (via the wp_network_plugins table). And also deleted a few stragglers that were still listed in Cavalcade afterwards and we should be good now!

Going to mark this as resolved.

Actions #14

Updated by Boone Gorges 12 months ago

  • Status changed from Hold to Resolved

Thanks, Ray!

Actions

Also available in: Atom PDF