Feature #22353
openIntroduce static analysis to CI pipeline
Added by Boone Gorges 4 months ago. Updated 9 days ago.
0%
Description
Bugs like #22351 could be avoided if we had some basic static analysis tools. I've set up PHPStan successfully on projects not too dissimilar to the Commons. It takes a good deal of configuration to get it set up, and getting our custom codebase to pass even the most permissive scan level will probably take some work. At the same time, the benefits are quite real: we avoid the type of simple regression in #22351, and likely also identify other latent bugs.
Ray, Jeremy, have you used PHPStan or another tool for this? I'm happy to have a first crack at it, unless one of you has other thoughts.
Updated by Jeremy Felt 19 days ago
I've finally got some work up in the feature/phpstan-initial
branch. I'm going to brain dump here and then give Boone and Ray a chance to get caught up a bit / ask questions. :)
This branch adds a few related composer dependencies and setups up an initial phpstan configuration.
Categories¶
To make things a bit more manageable / contained / faster than molasses to run, I've split code up into various categories.
- mu-plugins: anything in the mu-plugins directory, excluding mercator and cavalcade.
- BP plugins: A managed list of plugins related to BuddyPress and bbPress.
- CAC plugins: A managed list of plugins explicitly prefixed as
cac-
. - Plugins One: A list of remaining plugin directories.
- Plugins Two: A second list of remaining plugin directories.
- Themes: The
bp-nelo
theme.
Configuration¶
A top-level phpstan configuration exists in developer/phpstan/config/phpstan.neon.dist
. This file uses the phpstan default bleedingEdge.neon
configuration, scans for errors at level 1, and has a variety of common scan files, ignore files, etc...
Each category has its own phpstan configuration (e.g. phpstan-plugins-cac.neon
) that allows us to specify a subset of directories to scan and ignore specific errors if needed.
Baseline¶
Even at level 1, there are thousands of "errors". Rather than ignore these going forward, we can create a baseline so that only new errors are highlighted. These baseline files are large PHP arrays that track found errors and are located in developer/phpstan/baseline/
.
Each category has its own baseline file (e.g. baseline-plugins-cac.php
).
Stubs¶
PHPStan uses stubs to help describe the expected structure of methods, classes, and other code in cases where you don't want to include the entire codebase. We're able to use the composer packages szepeviktor/phpstan-wordpress
and php-stubs/wp-cli-stubs
so that we don't need to load the entirety of WordPress or WP-CLI into memory each time.
In addition to those composer packages, we're also able to define our own stubs. PHPStan seems to have trouble analyzing the origin of various constants, so we have a file in developer/phpstan/stubs/phpstan-constants.php
that contains a long list which are likely fine to use, but that PHPStan could not verify.
Scripts¶
I've setup an assortment of composer commands that can be used to run scans or baseline creation per category or as a large batch.
composer phpstan
runs a PHPStan scan on each category.composer phpstan:plugins:cac
runs a PHPStan scan on CAC plugins.composer phpstan:baseline
creates a baseline scan for each category.composer phpstan:plugins:cac:baseline
creates a baseline scan for CAC plugins.
Tasks / Decisions:¶
- What code do we want to scan? * It seems obvious that mu-plugins, bp-nelo, and most cac- prefixed plugins fit here. Most of the bp- prefixed plugins also fit. We may be able to work out a configuration for other plugins and themes that only highlights the most severe errors that should not be shipped to production.
- When do we want it to scan? * This is easy enough to setup as a GitHub action, but scans take quite a bit, so the feedback loop is fairly slow. * I've thought through scanning only changed files a bit, but because a code change in one file can impact any file in the project, we probably need to scan a good chunk each time. There's some sort of balance of risk here—for example, mu-plugins vs the long list of 3rd party plugins.
- Are these categories correct? And how do we maintain them going forward? * I think we can script some directory list commands to highlight when the phpstan configuration is out of sync with the code base.
Updated by Boone Gorges 16 days ago
Awesome!! So much to digest here.
The current breakdown of content seems pretty good. A couple thoughts for (potentially) refining:
- The cac- prefix is a good but not perfect indicator of which plugins are "first-party". We also have a few other tools that are or were developed for sole use on CAC. social-paper comes to mind (though it's deprecated so I don't care about it). Another dumb one, but a good example, is bava-comments #5841. I guess we can discuss the extent to which it's really all that helpful to separate these out - I suppose the idea is that perhaps we would increase the scanner level for in-house-developed tools?
- Similarly, most of what's picked up in mu-plugins is homegrown, but some is not. See eg the cavalcade and mercator libraries.
Do you know how introducing PHPStan into our GitHub Actions workflow is going to affect billing? I assume that it uses machine-minutes, or whatever GH uses to charge us. Obviously this will quickly get out of hand, if we let it. Third-party plugins are generally updated only twice-monthly, which should help to keep these runs down, but it assumes that we are able to target based on files (or categories) that have changed. So, tentatively, I'd say that plugins-one and plugins-two would only need to run twice per month, when plugins are batch-updated. The first-party categories will potentially be updated more frequently, and ideally we could configure an Action to recognize that only certain categories need to be updated on a given change.
Regarding your section on stubs: In the past I've used PHPStan's symbol discovery as an alternative to stubs https://phpstan.org/user-guide/discovering-symbols Maybe this is what you mean about "loading the entirety into memory"?
Updated by Jeremy Felt 12 days ago
I guess we can discuss the extent to which it's really all that helpful to separate these out - I suppose the idea is that perhaps we would increase the scanner level for in-house-developed tools?
I think so, level and frequency. Even the existing categories can take many minutes each in my local environment—I just ran the entire suite in the background in 20 minutes. There's probably room to separate things further so that the scan most frequently run is on some gut-level categorization of actively maintained / interconnected / high impact.
In theory, any change could impact any piece of code. In reality, our code and plugins aren't often calling each others' functions or conflicting in any way that static analysis will identify.
Generating the baseline errors should help here because we can at least say: "let's not introduce any new level 4(?) errors in code that we actively maintain". 3rd party plugins and themes will likely continue to introduce level 1 errors, but hopefully at a rate where we can run the report every so often, generate new baselines as needed, and identify any types of level 1 errors that would cause us to hold an update.
Separately, we can parse the current baseline to identify those that we're okay ignoring in current code, okay ignoring in future code, or should fix in current code and not allow in future code.
Regarding your section on stubs: In the past I've used PHPStan's symbol discovery as an alternative to stubs
I tried a couple different approaches here and I think there's a combination of things being finicky, my learning curve, and trying to speed things up. I know I went down the symbol discovery approach, but I don't remember why it wasn't working for the constants. I'll poke around at that a bit more. :)
Do you know how introducing PHPStan into our GitHub Actions workflow is going to affect billing?
I haven't run an actual GitHub Action test, but I think 20 minutes may be a reasonable enough guess, which is ~$0.16 if over our included plan minutes. It looks like the current CAC usage is ~210 minutes for July so far, so there should be plenty of room in the 2 or 3000 minutes that we have, but don't hold me to that!
In addition to cost, I think that the time just makes using GitHub Actions painful. I know this impacts Boone's workflow more than anyone else, but I could see:
- Use GitHub actions on commits to code we actively manage (target ~2-3 minutes/run)
- And use a script locally when updating plugins and themes twice a month
- And have a scheduled GitHub action run every so often to capture all categories
And then another process to define is: what happens when an error pops up in those 3 cases? Triage/fix/etc...
Updated by Boone Gorges 9 days ago
Jeremy and I discussed this briefly the other day and decided that the system as he's currently built it is ready for a test run. He's going to do some work to integrate selective CI runs into the GitHub Action workflow - "selective" in the sense that a group will only be scanned if the files in that group are touched. Once it's running in GH, we can monitor for a while, for cost, for runtime, for utility, etc. Based on this, we can refine the way that, for example, the most mission-critical plugins on the primary site are tested.