Bug #17476

closed

Broken Link Checker not sending notifications of broken links

Added by Syelle Graves about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority name: Normal
Assignee:
Category name: WordPress Plugins
Target version:
Start date: 2023-01-11
Due date:
% Done: 0%
Estimated time:
Deployment actions:

Description

Hi team,

Something is strange on our ILETC site: Broken Link Checker says that it has found no broken links. Even stranger, under Tools > Broken Links > All links > Status, nearly all of the links say “not checked”, and under Details > Link last checked, all of the unchecked links say “Never”.

I looked through Plugins > Broken Link Checker > Settings: “Check each link” is set to every 72 hours, and under Advanced, “Link monitor” is set to “Run continuously” and “Hourly”.

I also noticed at least one broken link myself, on this page: https://iletc.commons.gc.cuny.edu/materials-resources/heritage-telecollaboration/ht-modules/ht-spanish-modules/ht-module-urban-latinx-communities/
The link that breaks is https://peru.com/mundo-latino/us-news/nueva-york-conoce-huerta-urbana-latinos-brooklyn-noticia-380331-1194851/.
In addition, I don’t see the broken peru.com link at all in Tools > Broken Links. That could be a separate issue, or perhaps it is related to the expandable menu items on the pages?

We can manually run the broken link checker on all the links and see if that works, but is it possible to find out why the plugin is not checking them automatically? I will wait to run the checker manually so that you can see the dashboard the way I found it. I have also included a screenshot, just in case.

Thank you!


Files

broken link checker screenshot.png (149 KB), Syelle Graves, 2023-01-11 05:03 PM

Related issues

Related to CUNY Academic Commons - Bug #17655: Broken Link Checker improperly scanning unpublished posts (New, Boone Gorges, 2023-02-13)

Note #1

Updated by Boone Gorges about 1 year ago

Previously, we've had to intervene in the way that broken-link-checker runs its automated checks. See #16889, #9865.

For the moment, I'd recommend that you run the check manually. See if it works. If so, then we will only have one thing to investigate - the scheduled task. If it doesn't work when run manually (as I suspect it may not; we've had problems with this plugin before) then we may have to investigate elsewhere before looking into the scheduled task.

Note #2

Updated by Syelle Graves about 1 year ago

Thanks, Boone. Rechecking as a bulk action does not seem to work (unless it's just slow?). Rechecking each link one at a time does seem to work. I did this for about 10 links.

However, I still don't see the one broken link I found in the list at all (linked above). The list has 167 links, and I suspect that our site has substantially more.

I'm going to leave the rest of the links alone to see if the bulk action takes effect over the next couple of days (the ones I “re”checked by hand are mostly the very first in the list of 167). Also strange: when I click on “details” of one of the links that I manually “re”checked, it says it was last checked in September of 2020… So, I'm not sure what to test next.

Note #3

Updated by Boone Gorges about 1 year ago

I think that what's happening here might be similar to what was happening in #16889. See especially https://redmine.gc.cuny.edu/issues/16889#note-4. Namely, BLC imposes limits on the amount of system resources it uses when crawling links on your site. Because the Commons is a large and complex site, the plugin's calculations, which might make sense on a single WP site, don't work as well here.
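To make the idea of a resource limit concrete, here is a minimal sketch of a load-based gate, assuming the limit behaves like a load-average cap (checking is suspended while the server's load average exceeds a threshold). The function names and threshold values are hypothetical; this is neither the plugin's code nor the override applied on the Commons.

```python
import os
from typing import Optional


def should_run_checker(current_load: float, max_load: Optional[float]) -> bool:
    """Decide whether a link-checking pass may start (hypothetical model).

    max_load=None means load limiting is disabled; otherwise checking is
    suspended while the load average is above the threshold.
    """
    if max_load is None:
        return True
    return current_load <= max_load


def one_minute_load() -> float:
    # os.getloadavg() is only available on Unix-like systems.
    return os.getloadavg()[0]


if __name__ == "__main__":
    # A shared multisite server can run at a load that would be alarming
    # for a single small WordPress install, so a low cap blocks checking.
    print(should_run_checker(12.0, max_load=4.0))   # False
    print(should_run_checker(12.0, max_load=16.0))  # True
```

On a large multisite, the host's load reflects every site's traffic, which is one reason a threshold tuned for a single site can keep checking suspended most of the time.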

As a test, I've temporarily nudged up the resources allocated to the BLC, and we'll see if this allows it more space to do its work. (I think it might - https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links&filter_id=all already shows 419 links instead of the 167 you referenced in your message.) Please circle around in the next 24 hours and see if it's working in a way that you'd expect.

Note #4

Updated by Syelle Graves about 1 year ago

Thank you!

Today, it still shows the same 419 links you saw. It has now found four broken ones, so there is some improvement, but many links still show Status “not checked” and Link last checked “Never”.

Could even more resources be allocated to the BLC? Or is there something else we can do?

Note #5

Updated by Boone Gorges about 1 year ago

I'm guessing that 419 might, in fact, be the total number of external links on your site. Does that seem feasible to you?

As for the link checking itself, it doesn't appear that the resource limit is what's causing them not to be scanned. In addition, it looks like the scheduled task is indeed running (though perhaps Ray can confirm, as he has a better understanding of how to interpret what's happening in the task runner than I do). So I'm not sure exactly what's happening.

For the time being, could I suggest that you do this manually? At https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links&filter_id=all, check the boxes next to links that are labeled 'Not checked'. Then, from the Bulk Actions dropdown at the top/bottom of the page, choose Recheck. This will at least get you a baseline. In the future, our team may be able to find the time to devote to further debugging of the background task, but in the meantime this technique will let you verify the links.

Note #6

Updated by Syelle Graves about 1 year ago

Boone Gorges wrote in #note-5:

> I'm guessing that 419 might, in fact, be the total number of external links on your site. Does that seem feasible to you?

Hm. I can't think of a way to estimate this.

> For the time being, could I suggest that you do this manually? At https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links&filter_id=all, check the boxes next to links that are labeled 'Not checked'. Then, from the Bulk Actions dropdown at the top/bottom of the page, choose Recheck. This will at least get you a baseline.

I knew how to recheck the links manually, but I had left them unchecked so that you could see the status as it was.
After your last ticket update, I bulk-rechecked the links several times, but many dozens of links still remain unchecked (scroll down to the bottom of the list).

In the meantime, unfortunately, we also aren't getting any e-mail notifications of broken links, even though that setting is switched on. Incidentally, I'm having the same problem on another Commons site, https://syellegraves.commons.gc.cuny.edu/: I stumbled across many broken links there that I wasn't aware of, because no e-mail notifications are going out.

> In the future, our team may be able to find the time to devote to further debugging of the background task, but in the meantime this technique will let you verify the links.

Is there a way to get a rough estimate on the possible timing of getting this resolved, or, is there an alternative option? Automated broken link checking and notification functionality is critical for all of our Commons sites. I have been sharing site pages and then finding out that most links on certain pages are broken, which causes some confusion and embarrassment, because we have no other way to know when links break, which they do all the time.

Thank you!

Note #7

Updated by Raymond Hoh about 1 year ago

> In addition, it looks like the scheduled task is indeed running (though perhaps Ray can confirm, as he has a better understanding of how to interpret what's happening in the task runner than I do).

I just checked the Broken Link Checker scheduled tasks for the ILETC site. Here are the results:

mysql> select * from wp_cavalcade_jobs where site = 1185 and hook like 'blc_%';
+---------+------+-------------------------------+--------+---------------------+---------------------+----------+----------+---------+
| id      | site | hook                          | args   | start               | nextrun             | interval | schedule | status  |
+---------+------+-------------------------------+--------+---------------------+---------------------+----------+----------+---------+
| 1832971 | 1185 | blc_cron_email_notifications  | a:0:{} | 2022-08-30 19:30:26 | 2023-01-28 03:23:02 |    86400 | daily    | waiting |
| 1832972 | 1185 | blc_cron_database_maintenance | a:0:{} | 2022-08-30 19:30:26 | 2023-01-28 03:23:11 |    86400 | daily    | waiting |
| 1834520 | 1185 | blc_cron_check_links          | a:0:{} | 2022-08-30 20:24:42 | 2023-01-27 08:50:49 |     3600 | weekly   | waiting |
+---------+------+-------------------------------+--------+---------------------+---------------------+----------+----------+---------+

The nextrun column is in GMT. Disregard the schedule column; the interval column is the accurate one. So it looks like the "blc_cron_check_links" scheduled task runs hourly (3600 seconds) and the email notifications task runs daily (86400 seconds).

However, I just looked into the logs for the "blc_cron_check_links" scheduled task and it appears to run every two hours instead of every hour:

mysql> select * from wp_cavalcade_logs where job = 1834520 and timestamp > '2023-01-27 00:00:00';
+----------+---------+-----------+---------------------+---------+
| id       | job     | status    | timestamp           | content |
+----------+---------+-----------+---------------------+---------+
| 95857686 | 1834520 | completed | 2023-01-27 00:48:38 |         |
| 95864160 | 1834520 | completed | 2023-01-27 02:49:19 |         |
| 95870565 | 1834520 | completed | 2023-01-27 04:50:04 |         |
| 95877068 | 1834520 | completed | 2023-01-27 06:50:54 |         |
+----------+---------+-----------+---------------------+---------+

I also checked the Broken Link Checker email notifications task and it is running every two days instead of daily:

mysql> select * from wp_cavalcade_logs where job = 1832971;
+----------+---------+-----------+---------------------+---------+
| id       | job     | status    | timestamp           | content |
+----------+---------+-----------+---------------------+---------+
| 94835027 | 1832971 | completed | 2023-01-14 03:09:30 |         |
| 94987275 | 1832971 | completed | 2023-01-16 03:10:58 |         |
| 95152106 | 1832971 | completed | 2023-01-18 03:15:46 |         |
| 95315872 | 1832971 | completed | 2023-01-20 03:17:21 |         |
| 95471568 | 1832971 | completed | 2023-01-22 03:18:56 |         |
| 95631113 | 1832971 | completed | 2023-01-24 03:21:16 |         |
| 95787682 | 1832971 | completed | 2023-01-26 03:23:07 |         |
+----------+---------+-----------+---------------------+---------+
7 rows in set (0.02 sec)
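To put numbers on the doubling, one can compute the gap between consecutive completed runs and divide it by the job's `interval` value (3600 for `blc_cron_check_links`, 86400 for `blc_cron_email_notifications`). The Python sketch below does this for the two log excerpts above; the helper names are illustrative and not part of Cavalcade or WP-CLI.

```python
from datetime import datetime
from typing import List

FMT = "%Y-%m-%d %H:%M:%S"


def run_gaps(timestamps: List[str]) -> List[float]:
    """Seconds between consecutive runs; timestamps must be in chronological order."""
    times = [datetime.strptime(t, FMT) for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def gap_ratios(timestamps: List[str], interval_seconds: int) -> List[float]:
    """Each gap divided by the configured interval, rounded to one decimal place."""
    return [round(gap / interval_seconds, 1) for gap in run_gaps(timestamps)]


# Timestamps copied from the wp_cavalcade_logs output above.
check_links_runs = [
    "2023-01-27 00:48:38",
    "2023-01-27 02:49:19",
    "2023-01-27 04:50:04",
    "2023-01-27 06:50:54",
]
email_runs = [
    "2023-01-14 03:09:30",
    "2023-01-16 03:10:58",
    "2023-01-18 03:15:46",
    "2023-01-20 03:17:21",
    "2023-01-22 03:18:56",
    "2023-01-24 03:21:16",
    "2023-01-26 03:23:07",
]

print(gap_ratios(check_links_runs, 3600))  # [2.0, 2.0, 2.0]
print(gap_ratios(email_runs, 86400))       # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

A ratio of 2.0 for every gap means each job is running at twice its configured interval; the small extra seconds in each gap account for the gradual drift visible in the timestamps.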

To see whether this was specific to Broken Link Checker, I checked the logs for some other scheduled tasks, and I can confirm that those tasks were also recurring at double the interval. There appears to be a rescheduling bug somewhere in our scheduled task runner, Cavalcade. Boone, I remember we made some changes to the Cavalcade runner to deal with GMT. I can't find the appropriate ticket at the moment, but that's the first thing that popped into my head. I'll do some more debugging tomorrow with a fresh pair of eyes.

> Is there a way to get a rough estimate on the possible timing of getting this resolved, or, is there an alternative option?

Syelle, while we look into this problem, you could consider using an external tool to check for broken links. Here are some suggestions as outlined in this blog article: https://ninjareports.com/broken-link-checkers-for-seo/.

Note #8

Updated by Boone Gorges about 1 year ago

Thanks for looking into this, Ray. Here are some previous tickets related to broken-link-checker, but I don't think they explain the scheduling issue: #9930, #9865

In any case, even if BLC jobs are running half as frequently as they should, it doesn't seem like that fact would explain the behavior that Syelle is seeing. In Syelle's case, some expected events are never taking place at all.

> Is there a way to get a rough estimate on the possible timing of getting this resolved, or, is there an alternative option?

Syelle, I'll spend some time today determining whether this is a problem that we are capable of solving. From your report, the broken-link-checker plugin is broken in a number of distinct ways. It may well be the case that the Commons doesn't have the resources to devote to fixing it. I'll report back when I've collected some more information.

Note #9

Updated by Boone Gorges about 1 year ago

I've spent some time doing an initial investigation.

1. Syelle mentioned that a large number of links on https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links&filter_id=all remained unchecked. When I checked this list, no items were listed as unchecked. This makes me suspect that the bulk action merely adds the links to the queue, and the links are then checked in the minutes/hours that follow. The UI here is not super clear - it'd be nice if, after you ran the bulk action, there was an indication that the item was queued for a future check. But it does seem as if the plugin got around to checking them.

2. The temporary modification I'd put in place to increase resources for BLC was written in such a way that it didn't properly apply to https://syellegraves.commons.gc.cuny.edu/. As such, BLC was usually not running on that site, because it didn't think there was enough CPU available. I've adjusted my override, and BLC is no longer bailing early on that site.

3. I wanted to test on a site we know to be experiencing issues, so I used https://syellegraves.commons.gc.cuny.edu/ (hope that's okay, Syelle). I changed the settings at https://syellegraves.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings so that I would receive emails about failed links. Then, under 'Look for links in', I checked the 'Private' box. Finally, I created a test page (private, so no one else could see it) with a good link and a bad link: https://syellegraves.commons.gc.cuny.edu/wp-admin/post.php?post=1277&action=edit&classic-editor. I also enabled logging so that I could watch what BLC was doing. I found that (a) BLC immediately added the new page to what it calls its "synch" queue; (b) after a few seconds, BLC had reviewed the content of the page and identified the two links; (c) after about a minute, BLC had scanned both links and correctly identified one as bad and one as good.

4. After all of this, emails were not immediately sent out. But I saw that BLC sends emails on a cron job, probably so that it doesn't inundate admins with broken-link messages (a sort of "digest"). As Ray notes above, the blc_cron_email_notifications job is set to run once daily, though in fact it's been running every two days. I manually triggered the cron event, and I successfully received emails about the broken links. This made me wonder whether the emails on https://iletc.commons.gc.cuny.edu/ were in fact being sent, just with a delay of several days. Syelle, perhaps you can confirm this one way or another.
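The digest behaviour inferred in point 4 can be illustrated with a small model: checking a link only records the result, and a separate scheduled job turns everything pending into a single email. This is a hypothetical Python sketch (`PendingNotifications` and `run_digest_job` are invented names, and the five-link preview simply mirrors the notification text quoted later in this ticket); it is not BLC's implementation. Its point is that links can be detected and flagged correctly while no email ever goes out, if the scheduled job never actually runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingNotifications:
    """Hypothetical record of broken links detected but not yet emailed."""
    broken: List[str] = field(default_factory=list)

    def record_check(self, url: str, is_broken: bool) -> None:
        # Checking a link only records the result; nothing is emailed here.
        if is_broken:
            self.broken.append(url)


def run_digest_job(pending: PendingNotifications,
                   send: Callable[[str], None],
                   preview_limit: int = 5) -> bool:
    """Scheduled job: email one summary of all pending broken links, then clear them."""
    if not pending.broken:
        return False
    shown = pending.broken[:preview_limit]
    body = (f"{len(pending.broken)} new broken links detected. "
            f"First {len(shown)}:\n" + "\n".join(shown))
    send(body)
    pending.broken.clear()
    return True


if __name__ == "__main__":
    outbox: List[str] = []
    pending = PendingNotifications()
    pending.record_check("https://example.org/good", is_broken=False)
    pending.record_check("https://example.org/missing", is_broken=True)
    print(len(outbox))                      # 0: checks alone send nothing
    run_digest_job(pending, outbox.append)  # the scheduled job fires
    print(len(outbox))                      # 1
```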

So, to summarize, I think that most parts of the plugin are working properly (scanning posts for links, checking links to see if they're broken, emailing users about broken links) within the oddities of BLC in the Commons environment (the "resource" override I've now got in place across the network, the double-interval of the scheduled tasks).

The remaining item that I haven't been able to debug is the regularly-scheduled scanning of existing content for bad links (in contrast to scanning new or updated content, which happens more or less immediately). It appears that the plugin tries to do this roughly once every 72 hours, though this time frame may in fact be more than a week given the scheduling of the blc_cron_check_links task.

A few proposed next steps:

- Syelle, I would like to trigger the blc_cron_email_notifications job for iletc.commons.gc.cuny.edu. This should trigger an email notification about broken links, which, according to the current site configuration, will be sent to . (BLC has a "Notification e-mail address" setting at https://iletc.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings, but when it's empty, as it currently is, it falls back on the Settings > General email address.) Are you the administrator of this email address? If not, can we change the setting on https://iletc.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings to point to your inbox so that we can do a proper test?

- BLC has a setting that allows us to wipe out its log of detected links. This should cause BLC to see all content on the site as new, triggering a rescan of all content and a recheck of every link. It may be helpful to perform this reset on https://iletc.commons.gc.cuny.edu/ so that, over the course of a few days, we can properly monitor (a) whether links are being detected in posts and added to the link queue, (b) whether links are being checked from the queue, and (c) whether the admin is receiving emails about the links.
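The recipient logic described in the first step reduces to a simple fallback. A minimal sketch, with a hypothetical function name and example addresses (this is not BLC's code, and it assumes an empty string is treated the same as an unset field):

```python
from typing import Optional


def notification_address(blc_address: Optional[str], admin_email: str) -> str:
    """Pick the recipient for BLC notifications (hypothetical model).

    An empty or missing "Notification e-mail address" falls back to the
    Settings > General admin email.
    """
    return blc_address if blc_address else admin_email


print(notification_address("", "admin@example.org"))  # admin@example.org
```

Under this logic, leaving the BLC field empty means the digest goes to whatever address is in Settings > General, which is why it matters who reads that inbox for the test.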

Note #10

Updated by Raymond Hoh about 1 year ago

> Thanks for looking into this, Ray. Here are some previous tickets related to broken-link-checker, but I don't think they explain the scheduling issue: #9930, #9865

Thanks for referencing #9930. I believe this is the cause of the double interval scheduling issue. I'll list some thoughts in that ticket.

Note #11

Updated by Syelle Graves about 1 year ago

Boone Gorges wrote in #note-9:

> I've spent some time doing an initial investigation.

> 1. Syelle mentioned that a large number of links on https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links&filter_id=all remained unchecked. When I checked this list, no items were listed as unchecked. This makes me suspect that the bulk action merely adds the links to the queue, and the links are then checked in the minutes/hours that follow. The UI here is not super clear - it'd be nice if, after you ran the bulk action, there was an indication that the item was queued for a future check. But it does seem as if the plugin got around to checking them.

> 2. The temporary modification I'd put in place to increase resources for BLC was written in such a way that it didn't properly apply to https://syellegraves.commons.gc.cuny.edu/. As such, BLC was usually not running on that site, because it didn't think there was enough CPU available. I've adjusted my override, and BLC is no longer bailing early on that site.

> 3. I wanted to test on a site we know to be experiencing issues, so I used https://syellegraves.commons.gc.cuny.edu/ (hope that's okay, Syelle).

Yup, fine!

> I changed the settings at https://syellegraves.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings so that I would receive emails about failed links. Then, under 'Look for links in', I checked the 'Private' box. Finally, I created a test page (private, so no one else could see it) with a good link and a bad link: https://syellegraves.commons.gc.cuny.edu/wp-admin/post.php?post=1277&action=edit&classic-editor. I also enabled logging so that I could watch what BLC was doing. I found that (a) BLC immediately added the new page to what it calls its "synch" queue; (b) after a few seconds, BLC had reviewed the content of the page and identified the two links; (c) after about a minute, BLC had scanned both links and correctly identified one as bad and one as good.

> 4. After all of this, emails were not immediately sent out. But I saw that BLC sends emails on a cron job, probably so that it doesn't inundate admins with broken-link messages (a sort of "digest"). As Ray notes above, the blc_cron_email_notifications job is set to run once daily, though in fact it's been running every two days. I manually triggered the cron event, and I successfully received emails about the broken links. This made me wonder whether the emails on https://iletc.commons.gc.cuny.edu/ were in fact being sent, just with a delay of several days. Syelle, perhaps you can confirm this one way or another.

I did indeed receive one email notification of one broken link on syellegraves.commons for the first time ever, on 11/27/23 at 11:13 AM.

However, the email did not include the broken link you put on the test page you made. I believe that's because the setting that makes the checker scan privately published pages (Dashboard > Plugins > BLC > Settings > Look for links in > Post statuses) was not turned on? You could change that setting if you want to test it out, and I'll report any notifications that go out (or I could test that, if you wanted?). Apologies if you already did that and put the setting back.

> So, to summarize, I think that most parts of the plugin are working properly (scanning posts for links, checking links to see if they're broken, emailing users about broken links) within the oddities of BLC in the Commons environment (the "resource" override I've now got in place across the network, the double-interval of the scheduled tasks).

> The remaining item that I haven't been able to debug is the regularly-scheduled scanning of existing content for bad links (in contrast to scanning new or updated content, which happens more or less immediately). It appears that the plugin tries to do this roughly once every 72 hours, though this time frame may in fact be more than a week given the scheduling of the blc_cron_check_links task.

> A few proposed next steps:

> - Syelle, I would like to trigger the blc_cron_email_notifications job for iletc.commons.gc.cuny.edu. This should trigger an email notification about broken links, which, according to the current site configuration, will be sent to . (BLC has a "Notification e-mail address" setting at https://iletc.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings, but when it's empty, as it currently is, it falls back on the Settings > General email address.) Are you the administrator of this email address?

Yes, I'm in that email account every day.

As of today, we still have zero BLC email notifications for the iletc.commons site.

By the way, if you want a third site to test, https://cilc.commons.gc.cuny.edu/ has lots of broken links that we are actually holding off on updating, and zero email notifications (and I'm admin of that email, too).

> - BLC has a setting that allows us to wipe out its log of detected links. This should cause BLC to see all content on the site as new, triggering a rescan of all content and a recheck of every link. It may be helpful to perform this reset on https://iletc.commons.gc.cuny.edu/ so that, over the course of a few days, we can properly monitor (a) whether links are being detected in posts and added to the link queue, (b) whether links are being checked from the queue, and (c) whether the admin is receiving emails about the links.

That sounds fine to me.

Note #12

Updated by Syelle Graves about 1 year ago

Raymond Hoh wrote in #note-7:

> Is there a way to get a rough estimate on the possible timing of getting this resolved, or, is there an alternative option?

> Syelle, while we look into this problem, you could consider using an external tool to check for broken links. Here are some suggestions as outlined in this blog article: https://ninjareports.com/broken-link-checkers-for-seo/.

Thank you, Ray--we could try one of these in a pinch, and I appreciate the link.
I'll just add that the automated components of the BLC (checking regularly on a schedule without us telling it to and the e-mail notifications) are the most critical aspects of the functionality. All of us manage content and subscribe to new content in hundreds of places, and we have no way to check everything manually in lieu of being notified by e-mail when there is an issue. Would it make sense for the development team to look into alternative plugins? I see this one https://wordpress.org/plugins/broken-link-finder/, but it’s not up to the current version of WP, and it looks like the e-mail notifications are only available with the premium version. And it seems that there aren't other plugins that do the same thing…!

Note #13

Updated by Boone Gorges about 1 year ago

On Saturday I ran the "reset" tool on https://iletc.commons.gc.cuny.edu/, and it appears that the rescan has worked properly since then. See https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links.

Syelle, did you receive any email notifications at ?

Note #14

Updated by Syelle Graves about 1 year ago

Boone Gorges wrote in #note-13:

> On Saturday I ran the "reset" tool on https://iletc.commons.gc.cuny.edu/, and it appears that the rescan has worked properly since then. See https://iletc.commons.gc.cuny.edu/wp-admin/tools.php?page=view-broken-links.

> Syelle, did you receive any email notifications at ?

"All links" now shows 800! Great news! But no: No email notifications at ILETC@GC.

Note #15

Updated by Boone Gorges about 1 year ago

Thanks, Syelle. It sounds like everything is working properly except for email notifications.

It appears that the cron job for sending email notifications ran on 2023-01-30:

$ wp cavalcade log --job=1832971
+---------+------------------------------+---------------------+-----------+
| job     | hook                         | timestamp           | status    |
+---------+------------------------------+---------------------+-----------+
| 1832971 | blc_cron_email_notifications | 2023-01-14 03:09:30 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-16 03:10:58 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-18 03:15:46 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-20 03:17:21 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-22 03:18:56 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-24 03:21:16 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-26 03:23:07 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-28 03:24:44 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-30 03:26:31 | completed |
+---------+------------------------------+---------------------+-----------+

I'm afraid I don't have access to mail logs on the Commons server, which limits my ability to debug whether WP tried to send an email during this scheduled task. So we'll have to repeat this experiment, this time with some custom debug statements that I've just put in place, so that I can identify where in the chain the notification is being suppressed. The next scheduled blc_cron_email_notifications is at 2023-02-01 03:26:26, so I'll check tomorrow morning to see if anything's in my log file.

Note #16

Updated by Boone Gorges about 1 year ago

I checked the site this morning, and the initial scan for links in post content was only partly done. That means it hadn't yet checked whether any links were broken, so no email would have been sent, and there's nothing to see in my log. The next scheduled email run is 2023-02-03 03:32:57, so I'll make a note to check on Friday for more progress.

Note #17

Updated by Boone Gorges about 1 year ago

I checked today, and it looks like the scan is complete, but the email notification did not go out. According to my custom logging tool, the 'blc_cron_email_notifications' hook was never run. I triggered the hook manually, and it looks like the email then did go out (2023-02-07 12:54). Syelle, can you confirm whether you received it?

So I think the only real problem here is that the cron hook isn't running. The log says it is, though:

$ wp cavalcade jobs --site=1185 --hook=blc_cron_email_notifications
+---------+------+------------------------------+---------------------+---------------------+---------+
| id      | site | hook                         | start               | nextrun             | status  |
+---------+------+------------------------------+---------------------+---------------------+---------+
| 1832971 | 1185 | blc_cron_email_notifications | 2022-08-30 19:30:26 | 2023-02-09 03:37:40 | waiting |
+---------+------+------------------------------+---------------------+---------------------+---------+
$ wp cavalcade log --job=1832971
+---------+------------------------------+---------------------+-----------+
| job     | hook                         | timestamp           | status    |
+---------+------------------------------+---------------------+-----------+
| 1832971 | blc_cron_email_notifications | 2023-01-14 03:09:30 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-16 03:10:58 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-18 03:15:46 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-20 03:17:21 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-22 03:18:56 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-24 03:21:16 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-26 03:23:07 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-28 03:24:44 | completed |
| 1832971 | blc_cron_email_notifications | 2023-01-30 03:26:31 | completed |
| 1832971 | blc_cron_email_notifications | 2023-02-01 03:33:02 | completed |
| 1832971 | blc_cron_email_notifications | 2023-02-03 03:35:01 | completed |
| 1832971 | blc_cron_email_notifications | 2023-02-05 03:37:45 | completed |
| 1832971 | blc_cron_email_notifications | 2023-02-07 04:28:24 | completed |
+---------+------------------------------+---------------------+-----------+

And a sample entry from /weblog/lw2a/commons/cavalcade/cavalcade.log-20230206:

Feb  6 23:28:24 lw2a cavalcade: [1832971] Worker status: Array
Feb  6 23:28:24 lw2a cavalcade: (
Feb  6 23:28:24 lw2a cavalcade: [command] => wp cavalcade run 1832971 --url='iletc.commons.gc.cuny.edu/'
Feb  6 23:28:24 lw2a cavalcade: [pid] => 37825
Feb  6 23:28:24 lw2a cavalcade: [running] =>
Feb  6 23:28:24 lw2a cavalcade: [signaled] =>
Feb  6 23:28:24 lw2a cavalcade: [stopped] =>
Feb  6 23:28:24 lw2a cavalcade: [exitcode] => 0
Feb  6 23:28:24 lw2a cavalcade: [termsig] => 0
Feb  6 23:28:24 lw2a cavalcade: [stopsig] => 0
Feb  6 23:28:24 lw2a cavalcade: )
Feb  6 23:28:24 lw2a cavalcade: [1832971] Worker shutting down...
Feb  6 23:28:24 lw2a cavalcade: [1832971] Worker out:
Feb  6 23:28:24 lw2a cavalcade: [1832971] Worker err:
Feb  6 23:28:24 lw2a cavalcade: [1832971] Worker ret: 0

So it seems like Cavalcade is firing the event, but it's never reaching this callback (see my debug statements in production, which go to wp-content/uploads/bbg-debug.log): https://github.com/cuny-academic-commons/cac/blob/08c1e0e21d8d19857d99c002112b11ea31bdeb37/wp-content/plugins/broken-link-checker/core/core.php#L3925

Yet when I fired the event manually, the log entries did appear:

$ wp --url=iletc.commons.gc.cuny.edu cron event run blc_cron_email_notifications
Executed the cron event 'blc_cron_email_notifications' in 0.246s.
Success: Executed a total of 1 cron event.

So something seems to be happening such that wp cavalcade run 1832971 --url='iletc.commons.gc.cuny.edu/' is not actually running the event.

On further investigation, I think I've found it. The broken-link-checker plugin doesn't load its core libraries unless is_admin() || defined( 'DOING_CRON' ). See https://github.com/cuny-academic-commons/cac/blob/08c1e0e21d8d19857d99c002112b11ea31bdeb37/wp-content/plugins/broken-link-checker/core/init.php#L291. When running wp cavalcade run, Cavalcade sets DOING_CRON but by this point it's too late - broken-link-checker has already been bootstrapped, and its core libraries not loaded, because DOING_CRON was not defined yet. In contrast, wp cron event run must define DOING_CRON early enough that it's caught by broken-link-checker.

As a test, I'm going to do the following:
- In production, patch broken-link-checker to load the core library when WP_CLI is true
- Wipe the ILETC broken link database table again, to let it re-run its scan
- In a few days, let's check back to see whether the email is sent automatically.
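The patch in the first step can be modelled the same way. This is a hypothetical Python sketch of the guard's logic, not the actual change to init.php; it only assumes, as described above, that the patch adds a WP_CLI check alongside the existing is_admin() || defined( 'DOING_CRON' ) condition.

```python
# Hypothetical model of the broken-link-checker guard before and after the
# patch; NOT the plugin's real code. Inputs are booleans for is_admin(),
# defined('DOING_CRON') at include time, and whether WP_CLI is set.


def should_load_core(is_admin, doing_cron_defined, wp_cli):
    original = is_admin or doing_cron_defined
    patched = original or wp_cli  # the patch: also load under WP-CLI
    return original, patched


contexts = {
    "front-end web request": (False, False, False),
    "wp-admin request": (True, False, False),
    "wp cron event run": (False, True, True),
    "wp cavalcade run": (False, False, True),  # DOING_CRON not yet defined
}

for name, flags in contexts.items():
    original, patched = should_load_core(*flags)
    print(f"{name}: original={original}, patched={patched}")
```

Only the wp cavalcade run row changes (False to True). One trade-off of this approach: under the patch, the core libraries load for every WP-CLI command, not only for scheduled events, so unrelated commands presumably pay a little extra bootstrap cost.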

Actions #18

Updated by Syelle Graves about 1 year ago

Boone Gorges wrote in #note-17:

I checked today, and it looks like the scan is complete. But it looks like the email notification did not go out. According to my custom logging tool, the 'blc_cron_email_notifications' hook was never run. I triggered the hook manually, and it looks like the email then did go out. Syelle, can you confirm whether you received it? 2023-02-07 12:54.

Yes, we did, at 11:54 AM (you must be in a different time zone):
"Broken Link Checker has detected 63 new broken links on your site. Here's a list of the first 5 broken links:..."

Actions #19

Updated by Syelle Graves about 1 year ago

Strange update: One of my sites is now sending me e-mail alerts for broken links found in unpublished pages, even though I have the settings set not to check page drafts. I apologize for complaining about the opposite problem, but the system continues to regenerate these emails, so each time I receive the erroneous email, I think that there are new links I need to fix. (I have these links in an unpublished page because I need to store former links/link history there.) If this unexpected behavior is something you could look into at some point, I would really appreciate it, and perhaps this info will help with the whole puzzle?

Here is the link to the unpublished page that is sending me broken link checker e-mail notifications https://syellegraves.commons.gc.cuny.edu/wp-admin/post.php?post=7&action=edit&classic-editor

Actions #20

Updated by Boone Gorges about 1 year ago

  • Category name set to WordPress Plugins
  • Status changed from New to Staged for Production Release
  • Assignee set to Boone Gorges
  • Target version set to 2.1.1

From Syelle's report, and reports I've heard from other Commons users, it sounds like my fix for notification emails is working properly. I've reported it to the authors of broken-link-checker https://wordpress.org/support/topic/scheduled-tasks-dont-run-when-using-cavalcade-instead-of-wps-pseudo-cron/#new-topic-0, and I'll report back here when/if they reply. In the meantime, I've committed my change to our codebase as a hotfix https://github.com/cuny-academic-commons/cac/commit/20c8d2470b9c17e825e78ae180532820398e5981

and perhaps this info will help with the whole puzzle?

I'm afraid there's not a single puzzle, just a list of specific issues about the plugin. In any case, this ticket was initially opened because the plugin was not properly running its automated scans and notifications. Now it is, so I think we've addressed the central issue. Your newly reported issue does indeed seem like the opposite sort of problem, and it seems far less critical since it's not blocking the primary functionality of the plugin, so I'm going to put it into a new ticket: #17655

Actions #21

Updated by Boone Gorges about 1 year ago

  • Related to Bug #17655: Broken Link Checker improperly scanning unpublished posts added
Actions #22

Updated by Boone Gorges about 1 year ago

  • Subject changed from Broken Link Checker to Broken Link Checker not sending notifications of broken links
Actions #23

Updated by Syelle Graves about 1 year ago

Perfect!

Actions #24

Updated by Boone Gorges about 1 year ago

  • Status changed from Staged for Production Release to Resolved
Actions #25

Updated by Boone Gorges about 1 year ago

Quick follow-up: I received the following reply from the BLC team: https://wordpress.org/support/topic/scheduled-tasks-dont-run-when-using-cavalcade-instead-of-wps-pseudo-cron/#post-16472233

I shared your data with our BLC developers, but please note there will be no future updates for the old engine of BLC. Soon, in the upcoming days, we plan to release a new version here, with the new engine.

I guess we'll see what happens with the new version of the plugin.
