Bug #16889

closed

Broken link checker not working

Added by Raffi Khatchadourian 2 months ago. Updated 2 months ago.

Status: Resolved
Priority name: Normal
Assignee:
Category name: -
Target version:
Start date: 2022-09-24
Due date:
% Done: 0%
Estimated time:
Deployment actions:

Description

The broken link checker plug-in says I have no broken links on my site, but that's not true. I found one that was broken on a page. I reviewed the broken link checker tool and looked up the link text. Sure enough, it said it was a valid link. Then I rechecked it (pushed the button for this). Then, it said it was broken. The plug-in is set to check every 72 hours, but I don't think it has worked for a long time.


Related issues

Related to CUNY Academic Commons - Bug #9930: wp_privacy_delete_old_export_files runs a bazillion times (Resolved, Boone Gorges, 2018-06-14)
Related to CUNY Academic Commons - Bug #16937: Link checker checks transient links (Rejected, 2022-09-30)

Actions #1

Updated by Boone Gorges 2 months ago

  • Related to Bug #9930: wp_privacy_delete_old_export_files runs a bazillion times added
Actions #2

Updated by Boone Gorges 2 months ago

  • Assignee set to Boone Gorges
  • Target version set to 2.0.9

Hi Raffi - If I understand correctly, the broken link checker is working (i.e., it's able to identify broken links when asked) but it's just not running its scheduled background tasks.

I did some investigation and it appears that this may be due to some recent changes related to #9930. In that ticket, we modified the way that recurring tasks are rescheduled, to prevent backlogs. However, on investigation, there was a bug in the modification that caused non-recurring tasks to be rescheduled. This caused a large backlog to form, which was likely responsible for unrun tasks from the Broken Link Checker plugin.
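To illustrate the kind of bug described above, here is a minimal sketch in Python. It is not the code from the linked commit or from WordPress cron; the `Task` class and the `reschedule_buggy`, `reschedule_fixed`, and `run_due` functions are hypothetical names invented for this example. It only shows how rescheduling one-off tasks can make a queue grow instead of drain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    hook: str                       # hypothetical task name
    run_at: int                     # time at which the task becomes due
    interval: Optional[int] = None  # None means a one-off (non-recurring) task


def reschedule_buggy(task: Task, now: int) -> Optional[Task]:
    # Bug: every task is put back on the queue, including one-off tasks,
    # so one-off tasks never leave the queue and a backlog forms.
    return Task(task.hook, now + (task.interval or 0), task.interval)


def reschedule_fixed(task: Task, now: int) -> Optional[Task]:
    # Fix: only recurring tasks (those with an interval) are rescheduled.
    if task.interval is None:
        return None
    return Task(task.hook, now + task.interval, task.interval)


def run_due(queue: List[Task], now: int,
            reschedule: Callable[[Task, int], Optional[Task]]) -> List[Task]:
    """Run every due task and return the queue as it looks afterwards."""
    remaining = [t for t in queue if t.run_at > now]
    for task in queue:
        if task.run_at <= now:
            # (the task's real work would happen here)
            next_task = reschedule(task, now)
            if next_task is not None:
                remaining.append(next_task)
    return remaining


queue = [Task("one_off_cleanup", 0), Task("link_check", 0, interval=60)]
after_fixed = run_due(queue, 0, reschedule_fixed)
after_buggy = run_due(queue, 0, reschedule_buggy)
```

With the fixed function, only the recurring `link_check` task remains queued after the run; with the buggy one, the one-off task is queued again as well, which is how a backlog could accumulate over time.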

I fixed the issue in https://github.com/cuny-academic-commons/cac/commit/5742795abea6bc37dfdcc1828fce74f9a7c0c113 (shipped immediately to production) and the backlog is now being quickly caught up. Within the next 24 hours, tasks should be caught up and rescheduled properly. Could you please check within the next few days to see if you notice an improvement?

Actions #3

Updated by Raffi Khatchadourian 2 months ago

Hey Boone. Thanks for looking into this. I am now seeing the following on my dashboard:

Found 1 broken link
1038 URLs in the work queue
Detected 1038 unique URLs in 1402 links and still searching...

The one found link is the one I checked manually (not scheduled). I don't see the count going down.

Actions #4

Updated by Boone Gorges 2 months ago

  • Status changed from New to Reporter Feedback

Thanks, Raffi. This one is a tangle of unrelated issues - the scheduled-task problem needed to be solved to get at the next one.

After a bit of debugging and investigation, it appears that broken-link-checker was running up against some of its internally-imposed limits on the amount of system resources its background processes are allowed to consume. I started trying to modify these checks when I discovered that I'd already done so previously. See #9865. As a test, I tried increasing the server_load_limit directive. The checker began running again, and the number of URLs recorded first went up, peaking at around 2900. It's since been going down slowly, and it's at 2645 as I write this.
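As a rough illustration of how a load limit like this can stop background work, here is a minimal Python sketch. It is not the plugin's actual implementation; the function name `should_run_background_work` and its parameters are assumptions made for this example. The idea is simply that the worker skips its run whenever the measured server load is at or above the configured limit, so a limit set too low for a busy server means the queue is never processed.

```python
import os
from typing import Optional


def should_run_background_work(load_limit: float,
                               current_load: Optional[float] = None) -> bool:
    """Return True if background work may run under the given load limit.

    If current_load is not supplied, the 1-minute load average is read
    from the operating system.
    """
    if current_load is None:
        try:
            current_load = os.getloadavg()[0]
        except (AttributeError, OSError):
            # Load average is unavailable on this platform; in this sketch
            # we assume the work is allowed to run in that case.
            return True
    return current_load < load_limit


# Hypothetical numbers: a server whose load sits around 6.5.
blocked = should_run_background_work(load_limit=4.0, current_load=6.5)
allowed = should_run_background_work(load_limit=10.0, current_load=6.5)
```

In this sketch, raising the limit from 4.0 to 10.0 is what lets the worker start running again on the same server load.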

Could you please check back in another 24-48 hours to see if the reports look right to you? If so, and if we don't experience any other system problems, I'll make my test modifications permanent.

Actions #5

Updated by Raffi Khatchadourian 2 months ago

Boone,

This looks to be working now, thanks! But, it seems to be going into the linked web pages and checking for broken links. I think it is more desirable only to check the links from common sites. I can't fix the links on the sites I link to, so I don't know what to do with that information. Also, I think this is causing extra work for the CAC. What do you think?

I'll file a separate bug for this as it is somewhat unrelated. We can close out this bug. Thanks again!

Actions #6

Updated by Boone Gorges 2 months ago

  • Status changed from Reporter Feedback to Resolved

Yeah, that does seem strange. I think the near-term answer may be to disable 'Link Library' at https://khatchad.commons.gc.cuny.edu/wp-admin/options-general.php?page=link-checker-settings ('Look for links in' tab) but feel free to open a separate ticket and we can review.

Actions #8

Updated by Boone Gorges 2 months ago

  • Related to Bug #16937: Link checker checks transient links added