Bug #6737

wp-rss-multi-importer cron job requires huge number of DB queries

Added by Boone Gorges about 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority name: Normal
Assignee:
Category name: WordPress Plugins
Target version:
Start date: 2016-11-15
Due date:
% Done: 0%
Estimated time:

Description

The wp-rss-multi-importer is responsible for - wait for it - importing RSS feed items. It also appears to contain a cleanup routine that runs every hour, and that cleanup routine requires a very large number of database queries. I've attached a sample log.

It doesn't appear that there are any outright loops involved, and separate cron jobs don't appear to process the same post over and over again. I think the issue is just that, on sites that import lots of feeds, there's a lot of cleanup to do. And deleting items in WordPress is not very streamlined.

This is likely not a new issue with the plugin, though it's possible that it's cropping up recently because of increased usage. The site where the problem seems to be most prevalent is gcenglish, so that's a good place to start debugging. If it appears, for example, that all feeds were added to that site within the last week, then it's reasonable to assume that it's part of our problem.

wp-rss-multi-importer.txt (615 KB) Boone Gorges, 2016-11-15 09:47 PM

Related issues

Related to CUNY Academic Commons - Bug #8930: Throttle Jetpack batch routines (Resolved, 2017-11-22)

Related to CUNY Academic Commons - Feature #8987: Migrate away from wp-cron (Resolved, 2017-12-07)

History

#1 Updated by Boone Gorges about 5 years ago

  • Status changed from New to Rejected
  • Target version set to Not tracked

I spent some time analyzing this. I don't see any out-and-out bugs. Most of the queries are a consequence of wp_delete_post(), which is pretty inefficient. It would be possible to reduce the number of required queries by introducing a tool that deletes items in bulk, but this would be a pretty large amount of work. Another mitigation would be to force the plugin to run its routine more frequently, so that it was processing fewer posts each time. Again, this would take a pretty large amount of work.
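To make the query-count point concrete, here is a rough sketch of per-item versus bulk deletion. It is a toy model only: the two-table schema, the `expired` column, and the function names are all assumptions for illustration, and it does not reproduce what wp_delete_post() or the plugin actually does. It only shows why statement count grows with the number of posts in one case and stays constant in the other.

```python
import sqlite3


def make_db(n_posts):
    """Build a toy in-memory database: n_posts expired posts, one meta row each.

    The schema is hypothetical and far simpler than WordPress's real tables.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, expired INTEGER)")
    db.execute("CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT)")
    db.executemany("INSERT INTO posts VALUES (?, 1)", [(i,) for i in range(n_posts)])
    db.executemany(
        "INSERT INTO postmeta VALUES (?, 'source_feed')",
        [(i,) for i in range(n_posts)],
    )
    return db


def count_rows(db):
    """Return the total number of rows left in both tables."""
    posts = db.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    meta = db.execute("SELECT COUNT(*) FROM postmeta").fetchone()[0]
    return posts + meta


def delete_one_by_one(db):
    """Delete expired posts individually; return the number of statements issued."""
    ids = [row[0] for row in db.execute("SELECT id FROM posts WHERE expired = 1")]
    statements = 1  # the SELECT above
    for post_id in ids:
        db.execute("DELETE FROM postmeta WHERE post_id = ?", (post_id,))
        db.execute("DELETE FROM posts WHERE id = ?", (post_id,))
        statements += 2
    return statements


def delete_in_bulk(db):
    """Delete every expired post and its meta; return the number of statements issued."""
    db.execute(
        "DELETE FROM postmeta WHERE post_id IN "
        "(SELECT id FROM posts WHERE expired = 1)"
    )
    db.execute("DELETE FROM posts WHERE expired = 1")
    return 2
```

With 500 expired posts, the one-by-one path issues 1001 statements in this model, while the bulk path issues 2. A real bulk tool would also have to handle whatever side effects per-item deletion triggers, which is part of why it would be a large amount of work.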

It looks like there might be some cases where the wp-rss-multi-importer deletion routine times out, because gcenglish has a fair number of items that ought to have been deleted by now, but have not been. I just ran a custom script to clean them out. This may help with memory overhead in the future.

All of the feeds on gcenglish were added months (or years) ago, so I'm fairly sure this is not the source of the problem described in #6731. So I think I've done all I can reasonably do here.

#2 Updated by Matt Gold about 5 years ago

Thanks, Boone. Should we consider reaching out to gcenglish to see whether they absolutely need this set-up?

#3 Updated by Boone Gorges about 5 years ago

We'd need to reach out to all users of the plugin. I'd have to query to see who those are.

We allow other RSS-fetching tools - PressForward and FeedWordPress come to mind. It's likely that they'll all face similar problems (we know, in fact, that PF does; see #6734). The one thing I'd say on this front is that it'd be easier to consider improving the plugins if we had fewer of them. That said, I don't think that the issue described here is serious enough to go through the trouble of migrating users away.

#4 Updated by Matt Gold about 5 years ago

Thanks, Boone. FYI, I have been thinking about the need for a more streamlined system overall, with fewer plugins. It may just be what we need to do to sustain a reliable platform. Certainly, that is the way that WordPress.com and many other multi-site platforms go. If you see ways to reduce plugins, I think we should pursue that. We will just have to plan a communication effort with our users to explain why we are doing it.

#5 Updated by Boone Gorges about 4 years ago

  • Related to Bug #8930: Throttle Jetpack batch routines added

#6 Updated by Boone Gorges over 3 years ago

#7 Updated by Boone Gorges over 3 years ago

  • Status changed from Rejected to Resolved
  • Target version changed from Not tracked to 1.13.4

During recent debugging, it became pretty clear that this plugin is a culprit in some recent resource issues.

I've looked at a couple other RSS aggregator plugins, and it's possible that we could migrate. Specifically, https://wordpress.org/plugins/wp-rss-aggregator/ is quite a bit smarter than wp-rss-multi-importer in terms of the way it processes imports. Namely, during each import run, it schedules separate cron jobs for each subscribed feed, which greatly decreases the likelihood of timeouts and other resource issues.
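As a rough illustration of why one-job-per-feed helps, the sketch below compares a single job that walks every feed under one shared time budget with a parent job that only enqueues one event per feed, each run with its own budget. Everything here is hypothetical: the `cost` and `budget` numbers, the feed dictionaries, and the function names are made up, and the queue merely stands in for scheduled cron events. It is not the code of either plugin.

```python
from collections import deque


def run_monolithic(feeds, budget):
    """One job processes every feed and all feeds share a single budget.

    Once the budget is exhausted, the remaining feeds are never reached,
    which models a timeout partway through the run.
    """
    done, spent = [], 0
    for feed in feeds:
        spent += feed["cost"]
        if spent > budget:
            break
        done.append(feed["url"])
    return done


def run_per_feed(feeds, budget):
    """The parent job only enqueues one event per feed.

    Each event is then executed separately with a fresh budget, so one slow
    feed cannot starve the feeds that come after it.
    """
    queue = deque(feeds)  # stands in for one scheduled event per feed
    done = []
    while queue:
        feed = queue.popleft()
        if feed["cost"] <= budget:
            done.append(feed["url"])
    return done


# Hypothetical feeds: each costs 4 units of work, and a run may spend 10.
feeds = [
    {"url": "feed-a", "cost": 4},
    {"url": "feed-b", "cost": 4},
    {"url": "feed-c", "cost": 4},
]
```

In this model `run_monolithic(feeds, 10)` finishes only `feed-a` and `feed-b`, while `run_per_feed(feeds, 10)` finishes all three. In WordPress the per-feed events would presumably be registered through something like `wp_schedule_single_event()`, though that mapping is an assumption and not taken from either plugin's source.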

The problem with migrating is that the data has to be moved over, and the front-end functionality has to be duplicated somehow. It appears that the only site actively using the plugin is gcenglish, and they use the front-end widget, but wp-rss-aggregator either doesn't have a widget or only has it as a paid add-on, and even then it would likely work quite differently. The amount of effort required to migrate gcenglish to the new plugin might therefore be fairly large. Another option, especially since wp-rss-multi-importer has been removed from the wordpress.org repo and thus is no longer receiving updates, is to fork the plugin and fix it so that it does one-cron-job-per-feed. I have a feeling that this will be easier in the short term. We can then hide wp-rss-multi-importer for new installations, and perhaps install wp-rss-aggregator if anyone requests the functionality in the future.

I'm going to reopen this ticket and move it into a milestone so I can take care of some of these tasks.

#8 Updated by Boone Gorges over 3 years ago

  • Status changed from Resolved to Deferred

#9 Updated by Boone Gorges over 3 years ago

  • Parent task deleted (#6731)

#10 Updated by Boone Gorges over 3 years ago

  • Status changed from Deferred to Assigned

#11 Updated by Boone Gorges over 3 years ago

  • Target version changed from 1.13.4 to 1.13.5

#12 Updated by Boone Gorges over 3 years ago

  • Target version changed from 1.13.5 to 1.13.6

#13 Updated by Boone Gorges over 3 years ago

  • Status changed from Assigned to Resolved

In https://github.com/cuny-academic-commons/cac/commit/b8db29c94e306a79da7b6a339988d8e637730877 I disabled the plugin for future activation.

In https://github.com/cuny-academic-commons/cac/commit/9a1b83b9a2469ceeaff937b3d5517528396e99b6 I split the cron job into per-feed events, which should lighten the load.
