Project

General

Profile

Actions

Bug #9930

closed

wp_privacy_delete_old_export_files runs a bazillion times

Added by Boone Gorges over 6 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority name:
Normal
Assignee:
Category name:
-
Target version:
Start date:
2018-06-14
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

The new privacy features in WP 4.9.6 include a cron job that runs hourly, on each site, to clean up old export files. An hourly run for all 4000 sites on the Commons means that these cron jobs are running nearly constantly.

Realistically, these items almost never need to run. To my knowledge, not a single privacy export has been run on the whole site. I've opened a ticket to improve this behavior in WordPress. https://core.trac.wordpress.org/ticket/44370#ticket

In the absence of a more sophisticated fix upstream, we can safely make a few changes on the Commons:

1. Reschedule all wp_privacy_delete_old_export_files jobs on the Commons so that they run daily (and are spread out) rather than hourly. I'm currently running a script that does just this.
2. Prevent WordPress from scheduling daily privacy exports.

Regarding 2 - the easiest way to do this is to unhook `wp_schedule_delete_old_privacy_export_files()`, but I would like a more general system that we can use for any plugin that tries to schedule an event. I'm going to spend some time trying to build a system, likely by intercepting calls to pre_update_option_cron before Cavalcade kicks in.


Related issues

Related to CUNY Academic Commons - Feature #8987: Migrate away from wp-cronResolvedBoone Gorges2017-12-07

Actions
Related to CUNY Academic Commons - Bug #14299: Remove Site Health scheduled taskResolvedRaymond Hoh2021-04-09

Actions
Related to CUNY Academic Commons - Support #16673: Cloning requestResolvedBoone Gorges2022-08-28

Actions
Related to CUNY Academic Commons - Bug #16889: Broken link checker not workingResolvedBoone Gorges2022-09-24

Actions
Actions #1

Updated by Boone Gorges over 6 years ago

Actions #2

Updated by Boone Gorges over 6 years ago

  • Status changed from New to Resolved

In https://github.com/cuny-academic-commons/cac/commit/0fa94e53bedf9f590b000f713bb29001c5beb10a I introduced a system for intercepting new cron jobs, and I set it so that this specific job can only be scheduled as 'daily'.

I then cleared out all existing jobs manually, and will let WP reschedule them. This is much faster than letting the script do it :)

I think nothing more is needed on this ticket.

Actions #3

Updated by Raymond Hoh over 3 years ago

  • Related to Bug #14299: Remove Site Health scheduled task added
Actions #4

Updated by Raymond Hoh about 2 years ago

I'm taking a closer look at this ticket due to https://redmine.gc.cuny.edu/issues/16673#note-10.

I wanted to see which jobs in the Cavalcade queue were being scheduled the most:

mysql> select hook, count(*) from wp_cavalcade_jobs where status = "waiting" group by hook order by count(*) desc limit 20;
+------------------------------------+----------+
| hook                               | count(*) |
+------------------------------------+----------+
| wp_privacy_delete_old_export_files |     5489 |
| jetpack_clean_nonces               |     5333 |
| wp_scheduled_auto_draft_delete     |     5071 |
| jetpack_v2_heartbeat               |      575 |
| cac_registration_nudge             |      205 |
| et_core_page_resource_auto_clear   |      109 |
| akismet_scheduled_delete           |      104 |
| jetpack_sync_full_cron             |       99 |
| jp_sitemap_cron_hook               |       98 |
| gravityforms_cron                  |       89 |
| jetpack_sync_cron                  |       87 |
| wpseo_permalink_structure_check    |       78 |
| tribe_common_log_cleanup           |       58 |
| tribe_schedule_transient_purge     |       56 |
| tribe_daily_cron                   |       56 |
| wp_scheduled_delete                |       41 |
| delete_expired_transients          |       41 |
| ngg_delete_expired_transients      |       38 |
| nf_optin_cron                      |       37 |
| publish_future_post                |       36 |
+------------------------------------+----------+
20 rows in set (0.08 sec)

And sure enough, the wp_privacy_delete_old_export_files job is on top. We currently run this task on a weekly basis. Perhaps we can be more aggressive here. Maybe we should remove this job completely and schedule it as a single event only if a privacy export actually took place.

Also, maybe we should look at disabling Jetpack on sites that are not even using it.

What do you think, Boone?

Actions #5

Updated by Matt Gold about 2 years ago

To my knowledge, not a single privacy export has been run on the whole site.

It is crazy that WP was running this cron job across the site as a whole so frequently. But just on the point above, I wanted to note the CAC users do occasionally export data. I'd say I see 1-2 exports per semester.

Actions #6

Updated by Boone Gorges about 2 years ago

Thanks for the stats on this, Ray.

Looking at the raw number of pending jobs feels like only a rough approximation of what's causing the backlog. For example, the top three hooks on your list are forced to run only on a weekly schedule https://github.com/cuny-academic-commons/cac/blob/61f78ef5de6423f9214acc17bebfd4da686e9c07/wp-content/mu-plugins/cavalcade.php#L37 Putting those counts together, we have ~16,000 weekly jobs. There are 604,800 seconds in a week, which means that on average one of these jobs is scheduled to run roughly every 37 seconds. Given that the runs take only a few seconds and we have several workers, this shouldn't cause a backlog under normal conditions. If a maintenance window were to shut down the workers for 6 hours, this would create in the neighborhood of 550 or 600 jobs that need to be caught up. Is this enough to create days' worth of backlog, as we're currently seeing?

Here's a different way of identifying potential troublemakers: limiting to those that run every hour.

mysql> select hook, count(*) from wp_cavalcade_jobs where `interval` <= 3600 and status = 'waiting' group by hook order by count(*) desc limit 20;
+--------------------------------+----------+
| hook                           | count(*) |
+--------------------------------+----------+
| jetpack_sync_full_cron         |      164 |
| jetpack_sync_cron              |      146 |
| ngg_delete_expired_transients  |      127 |
| embedly_revalidate_account     |       97 |
| blc_cron_check_links           |       10 |
| take_feed_out                  |        9 |
| group_post_scheduler_cron      |        8 |
| wp_logging_prune_routine       |        8 |
| pull_feed_in                   |        7 |
| tag_groups_check_tag_migration |        3 |
| bpges_health_check             |        1 |
| useyourdrive_synchronize_cache |        1 |
| wpbdp_hourly_events            |        1 |
| bp_rbe_schedule                |        1 |
+--------------------------------+----------+
14 rows in set (0.04 sec)

The top three items here account for ~350 hourly jobs, which is about one every 10 seconds. My hypothesis is that these might be more likely to cause problems than the larger number of 'weekly' jobs. Do I have my logic right? I'll spend a little bit of time with a recent Cavalcade log (/weblog/lw2a/commons/cavalcade/) to see if I can corroborate this theory.

This is not to say it's not worth looking at the 'weekly' jobs, just that we should focus our energies. Your specific suggestion about wp_privacy_delete_old_export_files (removing the scheduled task and scheduling only when an export is performed) seems worth exploring. (See https://core.trac.wordpress.org/ticket/44370.) In contrast, from what I know of the hourly/twicehourly Jetpack sync hooks (see https://github.com/cuny-academic-commons/cac/blob/61f78ef5de6423f9214acc17bebfd4da686e9c07/wp-content/mu-plugins/cavalcade.php#L66), these are likely to be relatively long-running tasks that could gum up the works.

Since our specific problem seems to be related to backlogs, here's a general principle we might explore: When Cavalcade looks for pending tasks, if it finds two or more identical tasks that are both due to be run 'now' or in the past, then it should only run one of them, and the others should be deleted or marked complete. This way, you won't have a case where (for instance) large numbers of 'jetpack_sync_cron' are run on a given site within an hour or two, as the system catches up from downtime.

Actions #7

Updated by Boone Gorges about 2 years ago

Quick follow-up. I downloaded the most recent Cavalcade log, which covers 2022-08-28 03:21:15 through 2022-08-31 10:53:17, and took a look at run counts for the top items from both Ray's list and mine:

2039   wp_privacy_delete_old_export_files 
4192   jetpack_clean_nonces              
1870   wp_scheduled_auto_draft_delete   
12310  jetpack_v2_heartbeat            

21006  jetpack_sync_full_cron         
23338  jetpack_sync_cron             
11810  ngg_delete_expired_transients
10865  embedly_revalidate_account  
13405  blc_cron_check_links       

Based on this, it seems like we should probably be focusing on throttling items that run frequently, rather than those that have the most raw numbers in the jobs table.

This, of course, only mitigates the issue of what happens during backlog catch-ups. For that, I think we might explore the "launch only one job if there's duplicates" strategy, though I don't know how hard it will be.

Actions #8

Updated by Raymond Hoh about 2 years ago

Looking at the raw number of pending jobs feels like only a rough approximation of what's causing the backlog.

You're right, Boone.

I did mention above that we should look at disabling Jetpack on sites that are not using it. That would probably solve some of the problems. I'm quite sure that some sites leave plugins activated, but are not actually using them.

7668 sites on the Commons have Jetpack activated. Here's a SQL query that determines older sites with Jetpack enabled:

mysql> select b.* from wp_blogs as b left join wp_network_plugins as p on b.blog_id = p.site_id where p.plugin = 'jetpack/jetpack.php' order by b.last_updated asc limit 20;      
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
| blog_id | site_id | domain                                          | path | registered          | last_updated        | public | archived | mature | spam | deleted | lang_id |
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
|     191 |       1 | miachen.commons.gc.cuny.edu                     | /    | 2009-11-22 23:55:17 | 2009-11-23 04:55:17 |      0 |        0 |      0 |    0 |       0 |       0 |
|     959 |       1 | inequality.commons.gc.cuny.edu                  | /    | 2012-05-05 11:54:58 | 2012-05-16 12:49:09 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1119 |       1 | anthupdates.commons.gc.cuny.edu                 | /    | 2012-09-06 08:38:34 | 2012-10-02 16:11:28 |     -2 |        0 |      0 |    0 |       0 |       1 |
|     974 |       1 | testestest.commons.gc.cuny.edu                  | /    | 2012-05-24 13:12:19 | 2012-10-10 19:58:45 |      1 |        0 |      0 |    0 |       0 |       1 |
|     984 |       1 | keithmiyake.commons.gc.cuny.edu                 | /    | 2012-06-05 16:32:12 | 2012-12-17 22:31:27 |      1 |        0 |      0 |    0 |       0 |       0 |
|    1083 |       1 | dhdebates12.commons.gc.cuny.edu                 | /    | 2012-08-27 14:23:27 | 2012-12-17 23:03:35 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1140 |       1 | cunygraduatecentergeography.commons.gc.cuny.edu | /    | 2012-09-17 11:04:50 | 2013-01-03 20:24:37 |      1 |        0 |      0 |    0 |       1 |       1 |
|    1394 |       1 | thesocialpaper.commons.gc.cuny.edu              | /    | 2013-03-03 23:36:19 | 2013-03-05 18:56:55 |     -1 |        0 |      0 |    0 |       0 |       1 |
|    1398 |       1 | albertocordero.commons.gc.cuny.edu              | /    | 2013-03-07 16:18:24 | 2013-03-13 18:57:56 |      1 |        0 |      0 |    0 |       0 |       0 |
|    1427 |       1 | mendis.commons.gc.cuny.edu                      | /    | 2013-03-21 15:07:47 | 2013-03-21 19:29:57 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1428 |       1 | isphandbook.commons.gc.cuny.edu                 | /    | 2013-03-21 15:34:17 | 2013-03-21 19:43:13 |      1 |        0 |      0 |    0 |       0 |       1 |
|     964 |       1 | chesn64.commons.gc.cuny.edu                     | /    | 2012-05-06 22:22:44 | 2013-04-30 15:24:08 |     -3 |        0 |      0 |    0 |       0 |       1 |
|    1458 |       1 | m501.commons.gc.cuny.edu                        | /    | 2013-04-13 16:41:54 | 2013-05-18 02:13:54 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1323 |       1 | xphi.commons.gc.cuny.edu                        | /    | 2013-02-07 15:43:34 | 2013-06-09 14:18:18 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1506 |       1 | eshtestcac.commons.gc.cuny.edu                  | /    | 2013-05-24 10:09:26 | 2013-06-13 04:52:25 |     -2 |        0 |      0 |    0 |       1 |       1 |
|    1105 |       1 | selliesessions.commons.gc.cuny.edu              | /    | 2012-08-30 11:28:21 | 2013-06-17 20:28:10 |     -3 |        0 |      0 |    0 |       0 |       1 |
|    1482 |       1 | philnews.commons.gc.cuny.edu                    | /    | 2013-05-09 10:05:20 | 2013-07-10 17:32:02 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1522 |       1 | arctest1.commons.gc.cuny.edu                    | /    | 2013-06-21 11:46:35 | 2013-08-10 22:10:08 |     -2 |        0 |      0 |    0 |       0 |       1 |
|    1043 |       1 | preludenyc12.commons.gc.cuny.edu                | /    | 2012-07-16 18:08:59 | 2013-08-25 20:43:03 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1590 |       1 | brianmoen.commons.gc.cuny.edu                   | /    | 2013-09-02 18:17:34 | 2013-09-10 17:28:09 |      1 |        0 |      0 |    0 |       0 |       1 |
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
20 rows in set (0.09 sec)                                                                                                                                                        

After briefly looking at some of those sites, Jetpack could probably be deactivated there, but it does require analyzing each site, which would take time.

For some easy wins, of the other offenders, only 306 sites use Nextgen Gallery, 238 sites use Embedly and 138 sites use Broken Link Checker. Of the three, Broken Link Checker seems to be a good candidate to look at disabling as dormant sites have no need to check for broken links as often.

Here's a list of the top 20 oldest sites with Broken Link Checker enabled:

mysql> select b.* from wp_blogs as b left join wp_network_plugins as p on b.blog_id = p.site_id where p.plugin like 'broken-link%' order by b.last_updated asc limit 20;
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
| blog_id | site_id | domain                                          | path | registered          | last_updated        | public | archived | mature | spam | deleted | lang_id |
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
|     894 |       1 | filmconference.commons.gc.cuny.edu              | /    | 2012-01-31 13:32:27 | 2012-05-10 16:25:14 |      1 |        0 |      0 |    0 |       0 |       1 |
|     984 |       1 | keithmiyake.commons.gc.cuny.edu                 | /    | 2012-06-05 16:32:12 | 2012-12-17 22:31:27 |      1 |        0 |      0 |    0 |       0 |       0 |
|    1140 |       1 | cunygraduatecentergeography.commons.gc.cuny.edu | /    | 2012-09-17 11:04:50 | 2013-01-03 20:24:37 |      1 |        0 |      0 |    0 |       1 |       1 |
|    1480 |       1 | cunytrio.commons.gc.cuny.edu                    | /    | 2013-05-03 17:58:37 | 2013-05-03 21:58:40 |     -3 |        0 |      0 |    0 |       0 |       1 |
|    1491 |       1 | librarians.commons.gc.cuny.edu                  | /    | 2013-05-17 17:01:37 | 2013-05-26 06:03:07 |      1 |        0 |      0 |    0 |       1 |       1 |
|    1506 |       1 | eshtestcac.commons.gc.cuny.edu                  | /    | 2013-05-24 10:09:26 | 2013-06-13 04:52:25 |     -2 |        0 |      0 |    0 |       1 |       1 |
|    1512 |       1 | digrants.commons.gc.cuny.edu                    | /    | 2013-06-07 13:15:52 | 2013-06-25 23:07:38 |     -2 |        0 |      0 |    0 |       1 |       1 |
|     973 |       1 | zeteonq.commons.gc.cuny.edu                     | /    | 2012-05-23 17:16:42 | 2013-10-11 17:19:24 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1054 |       1 | williameaton.commons.gc.cuny.edu                | /    | 2012-08-09 15:39:40 | 2013-11-25 18:44:48 |      1 |        0 |      0 |    0 |       1 |       1 |
|    1115 |       1 | zetwork.commons.gc.cuny.edu                     | /    | 2012-09-02 22:01:30 | 2013-12-14 22:30:05 |     -2 |        0 |      0 |    0 |       0 |       1 |
|    1035 |       1 | internetresearchteam.commons.gc.cuny.edu        | /    | 2012-07-05 13:02:23 | 2014-02-26 20:56:29 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1800 |       1 | sicelidas.commons.gc.cuny.edu                   | /    | 2014-02-25 20:09:43 | 2014-03-12 23:50:42 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1822 |       1 | lkaltsite.commons.gc.cuny.edu                   | /    | 2014-03-11 16:27:06 | 2014-03-26 04:00:38 |      1 |        0 |      0 |    0 |       0 |       1 |
|    1801 |       1 | fsautman.commons.gc.cuny.edu                    | /    | 2014-02-25 20:10:58 | 2014-05-15 15:16:15 |     -2 |        0 |      0 |    0 |       0 |       1 |
|    1872 |       1 | bethanyholmstrom.commons.gc.cuny.edu            | /    | 2014-05-02 15:34:22 | 2014-09-11 11:36:39 |      1 |        0 |      0 |    0 |       1 |       1 |
|    1665 |       1 | anderswallace.commons.gc.cuny.edu               | /    | 2013-10-24 19:00:30 | 2014-10-23 21:34:11 |     -3 |        0 |      0 |    0 |       0 |       1 |
|    1825 |       1 | myblogsite.commons.gc.cuny.edu                  | /    | 2014-03-11 18:43:59 | 2015-03-18 02:53:42 |     -2 |        0 |      0 |    0 |       0 |       1 |
|    1908 |       1 | sciart.commons.gc.cuny.edu                      | /    | 2014-06-10 13:23:52 | 2015-03-27 20:39:51 |      1 |        0 |      0 |    0 |       0 |       1 |
|    2097 |       1 | acertdev.commons.gc.cuny.edu                    | /    | 2014-10-23 09:03:17 | 2016-03-14 21:19:15 |      1 |        0 |      0 |    0 |       0 |       1 |
|    2660 |       1 | slaldev.commons.gc.cuny.edu                     | /    | 2016-02-24 15:53:12 | 2016-04-20 17:28:44 |      0 |        0 |      0 |    0 |       0 |       1 |
+---------+---------+-------------------------------------------------+------+---------------------+---------------------+--------+----------+--------+------+---------+---------+
20 rows in set (0.01 sec)                                                                                                                                                    

About throttling jobs, the embedly_revalidate_account job can probably be switched from hourly to daily. I'm not sure what nextgen-gallery is doing with transients that requires it to purge them all the time.

When Cavalcade looks for pending tasks, if it finds two or more identical tasks that are both due to be run 'now' or in the past, then it should only run one of them, and the others should be deleted or marked complete. This way, you won't have a case where (for instance) large numbers of 'jetpack_sync_cron' are run on a given site within an hour or two, as the system catches up from downtime.

The alternative is to change how Cavalcade reschedules recurring jobs. Currently, Cavalcade uses the older nextrun value + the interval to reschedule tasks. See https://github.com/humanmade/Cavalcade-Runner/issues/51. (We've both commented in that thread before!) Perhaps we can try the approach by edubacco here or in this PR.

Actions #9

Updated by Boone Gorges about 2 years ago

Ray, thanks as always for the thoughts.

Messing with individual sites seems like a hopeless task. As you note, we can't do this in bulk - we would need to go through each individual site. Even in cases like broken-link-checker, where you're correct that it's pointless to allow the scan when the site is never updated, we'd need to worry about cases where the site did in fact become active again. I think that we should focus on global approaches.

The alternative is to change how Cavalcade reschedules recurring jobs. Currently, Cavalcade uses the older nextrun value + the interval to reschedule tasks. See https://github.com/humanmade/Cavalcade-Runner/issues/51. (We've both commented in that thread before!) Perhaps we can try the approach by edubacco here or in this PR.

I like the approach in https://github.com/humanmade/Cavalcade-Runner/issues/51, but the comment that closed it https://github.com/humanmade/Cavalcade-Runner/pull/57#issuecomment-694980100 shows a lack of understanding for the underlying issue, which is not linked to UTC. In any case, given the current situation at the GC, I think we need to come up with a fix that doesn't involve forking or modifying the runner (which we don't have access to). I don't see any available hooks in the job runner class https://github.com/humanmade/Cavalcade-Runner/blob/master/inc/class-job.php.

But there is a hook when fetching the next job: https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-runner.php#L257 In fact, we already have a callback for the get_next_job.job hook, which we use for on-the-fly environment configuration of Cavalcade:https://github.com/cuny-academic-commons/cac/blob/03f287df485e77ade6dd7552914e7b50ff893d5f/wp-config.php#L24 I wonder if we could do something like this in that callback:

\HM\Cavalcade\Runner\Runner::instance()->hooks->register( 'Runner.get_next_job.job', function( $job ) {
  // ...

  $query = "SELECT * FROM wp_cavalcade_jobs";
  $query .= ' WHERE nextrun < NOW() AND status = "waiting"' AND hook = :hook' AND site = :site AND args = :args';
  $query .= ' ORDER BY nextrun ASC';

  // Not sure how to get Cavalcade's $db object inside of a hook callback but we could figure something out
  $statement = $db->prepare( $query );
  $statement->bindValue( ':hook', $job->hook );
  $statement->bindValue( ':site', $job->site );
  $statement->bindValue( ':args', $job->args ); // This might need serialization or something
  $statement->execute();

  $job_instances = $statement->fetchAll();
  if ( count( $job_instances ) === 1 ) {
    return $job;
  }

  foreach ( $job_instances as $job_instance ) {
    // Only run the oldest one.
    if ( $job->id === $job_instance->id ) {
      continue;
    }

    $delete_query = "DELETE FROM wp_cavalcade_jobs WHERE id = :id";
    // etc
  }

  return $job;
} );

So basically this is the suggestion in the final paragraph of https://redmine.gc.cuny.edu/issues/9930#note-6 - not because it's ideal (would be better not to allow Cavalcade to schedule the dupes) but because it's something we can do with hooks. What do you think?

In the meantime, I'm going to do some more throttling:

1. 'embedly_revalidate_account' should only run once per week https://github.com/cuny-academic-commons/cac/commit/c4568c8aeb9ba5503d905fa6ed2e68e8fdf53f0a Deployed, and also updated all existing jobs to reflect the new interval/schedule.
2. 'blc_cron_check_links' should only run once per hour instead of every 10 minutes https://github.com/cuny-academic-commons/cac/commit/18178f6a4bdef3cfba21ef22f136f5b254b58950 Deployed, and also updated existing jobs.
3. I'm making the executive decision that NGG's transient-expiration is not very necessary, and bumping it down to once per week. https://github.com/cuny-academic-commons/cac/commit/8860c28220f628edfb31686bb1eabbdec2ccf719 Deployed and updated existing jobs.
4. In https://github.com/cuny-academic-commons/cac/commit/5553257e481153d066dfd9684e5515c57d6da06e we changed Jetpack's cron sync schedules to twice hourly instead of every 5 minutes. Can we bump this up even further? Maybe every 3 hours? This move alone would dramatically decrease scheduled tasks across the network.

Actions #10

Updated by Boone Gorges about 2 years ago

Looking over the linked GitHub thread, I think I was somewhat in error in my previous comment. After downtime, the problem is not that there are many instances of a given task in wp_cavalcade_jobs. It's that Cavalcade reschedules missed tasks in such a way that they can be run many times.

As such, we can implement a much easier maneuver, which tricks Cavalcade into only scheduling tasks in the future. See https://github.com/cuny-academic-commons/cac/commit/559bca8273f80b0ac495ab2632ebf76be2c7bcd3. Ray, do you think this captures the essence of the backlog issue? Can you imagine negative side-effects of refusing to reschedule tasks in the past?

Actions #11

Updated by Boone Gorges about 2 years ago

Actions #12

Updated by Boone Gorges about 2 years ago

Note - I'll be shipping the commit from comment 10 as part of the 2.0.7 release. See #16673.

Actions #13

Updated by Boone Gorges about 2 years ago

  • Related to Bug #16889: Broken link checker not working added
Actions #14

Updated by Raymond Hoh almost 2 years ago

As such, we can implement a much easier maneuver, which tricks Cavalcade into only scheduling tasks in the future. See https://github.com/cuny-academic-commons/cac/commit/559bca8273f80b0ac495ab2632ebf76be2c7bcd3.

After coming here from #17476, I believe this commit is the problem causing the double interval rescheduling issue.

There are two issues:

1. In that commit, we do the following check before setting the nextrun value again:

if ( strtotime( $job->nextrun ) < time() )

The problem with this conditional is Cavalcade can sometimes lag behind by a few minutes to work through the scheduled tasks queue.

For instance as of right now, Cavalcade is lagging behind by close to a minute so that is why most recurring tasks are getting the extra interval added for their next scheduled run. We should add a threshold before changing the nextrun time. Maybe a 5-minute or 10-minute threshold would be enough.

2. We shouldn't be adding an interval to the nextrun value here because the Cavalcade Runner will add the interval again:
https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-job.php#L73-L74, https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-job.php#L87-L89. This is the main cause of the double interval rescheduling.

What do you think of the following, Boone?

/*
 * Reschedule 'nextrun' to now if nextrun + 5mins is less than current time.
 *
 * Setting the 'nextrun' value to now will allow Cavalcade to add the task's
 * interval later in Job::mark_completed():
 * https://github.com/humanmade/Cavalcade-Runner/blob/0dfb42d505e9cd870a11366c49ee680d327c961a/inc/class-job.php#L73-L74
 */
if ( ( strtotime( $job->nextrun ) + 300 ) < time() ) {
    $interval = $job->interval;
    if ( $interval ) {
        $date = new DateTime( 'now', new DateTimeZone( 'UTC' ) );
        $job->nextrun = $date->format( 'Y-m-d H:i:s' );
    }
}
Actions #15

Updated by Boone Gorges almost 2 years ago

Good catch, Ray! My objective here was to make sure jobs didn't overlap, and I didn't test to see whether I'd overcorrected :-D Your suggested fix looks good to me - please go ahead and implement.

Actions #16

Updated by Boone Gorges almost 2 years ago

Hi Ray - During today's release, I noticed that the DateInterval line was commented-out in wp-config.php, but that the change had not been committed to the repo. Since it's been in production for a while, could you look to see whether event scheduling seems to be going as expected, and if so, commit that change?

Actions #17

Updated by Raymond Hoh almost 2 years ago

Since it's been in production for a while, could you look to see whether event scheduling seems to be going as expected, and if so, commit that change?

Hi Boone, yes, I was testing things out on production because I needed a larger number of jobs for testing.

Commenting out the interval line appears to have fixed the double interval problem. I have committed the fix in https://github.com/cuny-academic-commons/cac/commit/37778e71966b098e5b17ed39375dbf3ae9e61469. This is also deployed on production.

I might experiment with the time check a bit later on.

Actions

Also available in: Atom PDF