Project

General

Profile

Actions

Feature #22630

open

Prevent SHOW TABLES queries

Added by Boone Gorges 6 days ago. Updated 5 days ago.

Status:
New
Priority name:
Normal
Assignee:
-
Category name:
-
Target version:
Start date:
2025-04-23
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

SHOW TABLES requires reference to INFORMATION_SCHEMA. On our very large database, this kind of query is extremely slow, to the point where it can crash our database. This has happened most recently with litespeed-cache #22618, but it has happened in the past with other plugins: Gravity Forms #15242, #21031; Editoria11y #20828.

A quick search of the plugins directory shows dozens more SHOW TABLES query, all of which could cause similar problems in the right combination of situations.

Unfortunately, I think the time has come when we need to introduce some sort of global solution. I see two types of approaches:

1. Pre-deployment mitigations

We could build sniffs into our CI pipeline that look for SHOW TABLES queries. Then, for each identified item, we either (a) somehow fix the plugin (using hooks if possible, forking the plugin if it's old, sending upstream patches, etc), or we (b) decide it's not a problem because the code path is never or is rarely accessed. This is pretty time-consuming, but it has the benefit of being very targeted.

2. Just-in-time "polyfill"

A more general solution is to write a 'query' filter callback that detects all instances of SHOW TABLES and swaps them out for less destructive queries. For example, $wpdb->get_var( "SHOW TABLES LIKE 'foo'" ); returns 'foo' if the table exists. We can run something like DESCRIBE 'foo' and return the table name if the result is not empty. get_col() queries would need a somewhat more complex treatment. Wildcard queries SHOW TABLES LIKE '%foo%' would likely have to be skipped, and we may discover other instances where special treatment or exclusions are necessary.

The second option is a fairly big hammer, but I wonder if it might be worth it. We could use a similar infrastructure in the future to prevent INFORMATION_SCHEMA queries or other types of problems.

Jeremy and Ray, do you have thoughts on whether it's worth pursuing this, and if one of these approaches (or something else) is preferable?

Actions #1

Updated by Raymond Hoh 6 days ago

I like the second approach for "show tables like 'foo'" queries. There are some $wpdb->get_results() that will need to be rewritten, but for the general $wpdb->get_var() query, we can definitely do this. +1!

Actions #2

Updated by Boone Gorges 6 days ago

Thanks so much, Ray.

On building this, I realized that the difference between get_var(), get_col(), etc is determined post-query by WPDB, and is not part of the SQL query results. So no extra logic is required there.

As such, I built the simplest possible drop-in I could muster: https://github.com/cuny-academic-commons/cac/commit/fbacadb3783313a076cbb24b58b5c9de33265422

After some back and forth with ChatGPT (and some manual testing) I determined that the scoped INFORMATION_SCHEMA query is just as performant as DESCRIBE. The performance problem comes with unscoped, unindexed SHOW TABLES queries.

I've deployed this to production and run a couple manual tests. Various permutations of $wpdb->get_var( "SHOW TABLES LIKE 'foo'") are all working as expected, and most importantly, are not causing the database to choke. Let's leave this in place a bit to see how things run.

Actions #3

Updated by Matt Gold 6 days ago

Thanks so much for your work on this. Fingers crossed.

Actions #4

Updated by Boone Gorges 5 days ago

The Cavalcade-Runner daemon also had a SHOW TABLES LIKE 'wp_blogs' query pretty much perpetually hanging in the mysql processlist. Because that application lives outside of WP, it wasn't picked up by my filter. It turns out that the Human Made team independently came across the same problem and put a mitigation in place in a release earlier this year: https://github.com/humanmade/Cavalcade-Runner/pull/77 I decided that we should update Cavalcade-Runner and also begin keeping track of our own modifications to the runner. So I started a repo at https://github.com/cuny-academic-commons/Cavalcade-Runner, with a branch 'cac-prod' which is now being run on the production site. This cac-prod branch includes an additional streamlining of the "SHOW TABLES" query, as well as Ray's fix related to the way that $options are passed to the Runner object, and also our max_workers customization bump. Let's keep track of changes there, and then ask Reclaim for a bump (just a git pull, followed by a service restart) when we need them deployed.

Actions

Also available in: Atom PDF