Bug #16889
closed
Broken link checker not working
Added by Raffi Khatchadourian about 2 years ago.
Updated about 2 years ago.
Description
The broken link checker plug-in says I have no broken links on my site, but that's not true. I found a broken link on one of my pages. When I looked up its link text in the broken link checker tool, it was listed as valid. Then I rechecked it manually (using the recheck button), and it was reported as broken. The plug-in is set to check every 72 hours, but I don't think the scheduled checks have worked for a long time.
- Related to Bug #9930: wp_privacy_delete_old_export_files runs a bazillion times added
- Assignee set to Boone Gorges
- Target version set to 2.0.9
Hi Raffi, if I understand correctly, the broken link checker is working (i.e., it's able to identify broken links), but it's just not running its scheduled background tasks.
I did some investigation, and it appears that this may be due to some recent changes related to #9930. In that ticket, we modified the way recurring tasks are rescheduled in order to prevent backlogs. However, that modification had a bug that caused non-recurring tasks to be rescheduled as well. This caused a large backlog to form, which was likely why tasks from the Broken Link Checker plugin weren't being run.
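For illustration only, here is a minimal Python sketch of the distinction described above: after a task runs, only recurring tasks go back into the queue, and one-off tasks are dropped. This is not the code from the commit linked below. The `Task` class, its `interval` field and `run_due_tasks` are hypothetical, and scheduling the next run relative to the current time (rather than replaying every missed slot) is an assumption about how a backlog can be avoided.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    next_run: float             # timestamp at which the task becomes due
    interval: Optional[float]   # seconds between runs; None means a one-off task
    action: Callable[[], None]


def run_due_tasks(queue: List[Task], now: float) -> List[Task]:
    """Run every due task and return the queue for the next pass (hypothetical)."""
    remaining: List[Task] = []
    for task in queue:
        if task.next_run > now:
            remaining.append(task)  # not due yet: keep it unchanged
            continue
        task.action()
        if task.interval is None:
            continue  # one-off task: must NOT be rescheduled
        # Recurring task: next run is relative to "now", so missed slots
        # are not replayed one by one (assumed backlog-avoidance strategy).
        task.next_run = now + task.interval
        remaining.append(task)
    return remaining
```

With a one-off task and a 60-second recurring task both due at `now=100`, both run once, but only the recurring task remains in the returned queue, now due at 160. The bug described above corresponds to skipping the `interval is None` check, so one-off tasks keep being re-added and the queue grows.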
I fixed the issue in https://github.com/cuny-academic-commons/cac/commit/5742795abea6bc37dfdcc1828fce74f9a7c0c113 (shipped immediately to production), and the backlog is now being worked through quickly. Within the next 24 hours, tasks should be caught up and rescheduled properly. Could you please check within the next few days to see if you notice an improvement?
Hey Boone. Thanks for looking into this. I am now seeing the following on my dashboard:
Found 1 broken link
1038 URLs in the work queue
Detected 1038 unique URLs in 1402 links and still searching...
The one broken link it found is the one I checked manually (not through a scheduled check). I don't see the count going down.
- Status changed from New to Reporter Feedback
Thanks, Raffi. This one is a tangle of unrelated issues; the scheduled-task problem had to be solved before we could get at the next one.
After a bit of debugging and investigation, it appears that broken-link-checker was running up against some of its internally imposed limits on the amount of system resources its background processes are allowed to consume. I started trying to modify these checks when I discovered that I had already done so previously (see #9865). As a test, I tried increasing the server_load_limit directive. The checker began running again, and the number of URLs recorded first went up, peaking at around 2900. It has since been going down slowly and is at 2645 as I write this.
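As a rough illustration of why raising server_load_limit let the checker resume, here is a minimal Python sketch of a load-limit gate. It assumes the limit is compared against the 1-minute system load average and that the checker is not blocked when the load cannot be read; both are assumptions, not a description of the plugin's actual code. The function names `current_load` and `may_run_checks` are hypothetical.

```python
import os
from typing import Callable, Optional, Tuple


def current_load(
    reader: Optional[Callable[[], Tuple[float, float, float]]] = None,
) -> Optional[float]:
    """Return the 1-minute load average, or None if it can't be read (hypothetical)."""
    try:
        # os.getloadavg() is Unix-only; resolving it inside the try keeps
        # this function safe to call on platforms that lack it.
        read = reader if reader is not None else os.getloadavg
        return read()[0]
    except (OSError, AttributeError):
        return None


def may_run_checks(server_load_limit: Optional[float], load: Optional[float]) -> bool:
    """Decide whether background link checks may run (assumed semantics)."""
    if server_load_limit is None or load is None:
        return True  # no limit configured, or load unknown: don't block
    return load < server_load_limit


# Usage: under this sketch, a load of 5.0 blocks checks with a limit of 4.0
# but is allowed once the limit is raised to 8.0.
blocked = may_run_checks(4.0, 5.0)
allowed = may_run_checks(8.0, 5.0)
```

On a busy shared server, a limit set too low means the gate rarely opens, so the work queue stalls even though the scheduling itself is fine.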
Could you please check back in another 24-48 hours to see if the reports look right to you? If so, and if we don't experience any other system problems, I'll make my test modifications permanent.
Boone,
This looks to be working now, thanks! However, it seems to be going into the linked web pages and checking them for broken links too. I think it would be better to check only the links on Commons sites. I can't fix the links on the sites I link to, so I don't know what to do with that information. Also, I think this is causing extra work for the CAC. What do you think?
I'll file a separate bug for this as it is somewhat unrelated. We can close out this bug. Thanks again!
- Status changed from Reporter Feedback to Resolved
- Related to Bug #16937: Link checker checks transient links added