Feature #22353: Introduce static analysis to CI pipeline - CUNY Academic Commons - CUNY Graduate Center - Project Tracking System

Actions

Copy link

Feature #22353

open

Introduce static analysis to CI pipeline

Added by Boone Gorges 12 months ago. Updated 18 days ago.

Status:

New

Priority name:

Normal

Assignee:

Boone Gorges

Category name:

Internal Tools and Workflow

Target version:

Future release

Start date:

2025-03-19

Due date:

% Done:

Estimated time:

Deployment actions:

Description

Bugs like #22351 could be avoided if we had some basic static analysis tools. I've set up PHPStan successfully on projects not too dissimilar to the Commons. It takes a good deal of configuration to get it set up, and getting our custom codebase to pass even the most permissive scan level will probably take some work. At the same time, the benefits are quite real: we avoid the type of simple regression in #22351, and likely also identify other latent bugs.

Ray, Jeremy, have you used PHPStan or another tool for this? I'm happy to have a first crack at it, unless one of you has other thoughts.

Actions

Copy link

Updated by Jeremy Felt 8 months ago

I've finally got some work up in the feature/phpstan-initial branch. I'm going to brain dump here and then give Boone and Ray a chance to get caught up a bit / ask questions. :)

This branch adds a few related composer dependencies and setups up an initial phpstan configuration.

Categories¶

To make things a bit more manageable / contained / faster than molasses to run, I've split code up into various categories.

mu-plugins: anything in the mu-plugins directory, excluding mercator and cavalcade.
BP plugins: A managed list of plugins related to BuddyPress and bbPress.
CAC plugins: A managed list of plugins explicitly prefixed as cac-.
Plugins One: A list of remaining plugin directories.
Plugins Two: A second list of remaining plugin directories.
Themes: The bp-nelo theme.

Configuration¶

A top-level phpstan configuration exists in developer/phpstan/config/phpstan.neon.dist. This file uses the phpstan default bleedingEdge.neon configuration, scans for errors at level 1, and has a variety of common scan files, ignore files, etc...

Each category has its own phpstan configuration (e.g. phpstan-plugins-cac.neon) that allows us to specify a subset of directories to scan and ignore specific errors if needed.

Baseline¶

Even at level 1, there are thousands of "errors". Rather than ignore these going forward, we can create a baseline so that only new errors are highlighted. These baseline files are large PHP arrays that track found errors and are located in developer/phpstan/baseline/.

Each category has its own baseline file (e.g. baseline-plugins-cac.php).

Stubs¶

PHPStan uses stubs to help describe the expected structure of methods, classes, and other code in cases where you don't want to include the entire codebase. We're able to use the composer packages szepeviktor/phpstan-wordpress and php-stubs/wp-cli-stubs so that we don't need to load the entirety of WordPress or WP-CLI into memory each time.

In addition to those composer packages, we're also able to define our own stubs. PHPStan seems to have trouble analyzing the origin of various constants, so we have a file in developer/phpstan/stubs/phpstan-constants.php that contains a long list which are likely fine to use, but that PHPStan could not verify.

Scripts¶

I've setup an assortment of composer commands that can be used to run scans or baseline creation per category or as a large batch.

composer phpstan runs a PHPStan scan on each category.
composer phpstan:plugins:cac runs a PHPStan scan on CAC plugins.
composer phpstan:baseline creates a baseline scan for each category.
composer phpstan:plugins:cac:baseline creates a baseline scan for CAC plugins.

Tasks / Decisions:¶

What code do we want to scan? * It seems obvious that mu-plugins, bp-nelo, and most cac- prefixed plugins fit here. Most of the bp- prefixed plugins also fit. We may be able to work out a configuration for other plugins and themes that only highlights the most severe errors that should not be shipped to production.
When do we want it to scan? * This is easy enough to setup as a GitHub action, but scans take quite a bit, so the feedback loop is fairly slow. * I've thought through scanning only changed files a bit, but because a code change in one file can impact any file in the project, we probably need to scan a good chunk each time. There's some sort of balance of risk here—for example, mu-plugins vs the long list of 3rd party plugins.
Are these categories correct? And how do we maintain them going forward? * I think we can script some directory list commands to highlight when the phpstan configuration is out of sync with the code base.

Actions

Copy link

Updated by Boone Gorges 8 months ago

Awesome!! So much to digest here.

The current breakdown of content seems pretty good. A couple thoughts for (potentially) refining:
- The cac- prefix is a good but not perfect indicator of which plugins are "first-party". We also have a few other tools that are or were developed for sole use on CAC. social-paper comes to mind (though it's deprecated so I don't care about it). Another dumb one, but a good example, is bava-comments #5841. I guess we can discuss the extent to which it's really all that helpful to separate these out - I suppose the idea is that perhaps we would increase the scanner level for in-house-developed tools?
- Similarly, most of what's picked up in mu-plugins is homegrown, but some is not. See eg the cavalcade and mercator libraries.

Do you know how introducing PHPStan into our GitHub Actions workflow is going to affect billing? I assume that it uses machine-minutes, or whatever GH uses to charge us. Obviously this will quickly get out of hand, if we let it. Third-party plugins are generally updated only twice-monthly, which should help to keep these runs down, but it assumes that we are able to target based on files (or categories) that have changed. So, tentatively, I'd say that plugins-one and plugins-two would only need to run twice per month, when plugins are batch-updated. The first-party categories will potentially be updated more frequently, and ideally we could configure an Action to recognize that only certain categories need to be updated on a given change.

Regarding your section on stubs: In the past I've used PHPStan's symbol discovery as an alternative to stubs https://phpstan.org/user-guide/discovering-symbols Maybe this is what you mean about "loading the entirety into memory"?

Actions

Copy link

Updated by Jeremy Felt 8 months ago

I guess we can discuss the extent to which it's really all that helpful to separate these out - I suppose the idea is that perhaps we would increase the scanner level for in-house-developed tools?

I think so, level and frequency. Even the existing categories can take many minutes each in my local environment—I just ran the entire suite in the background in 20 minutes. There's probably room to separate things further so that the scan most frequently run is on some gut-level categorization of actively maintained / interconnected / high impact.

In theory, any change could impact any piece of code. In reality, our code and plugins aren't often calling each others' functions or conflicting in any way that static analysis will identify.

Generating the baseline errors should help here because we can at least say: "let's not introduce any new level 4(?) errors in code that we actively maintain". 3rd party plugins and themes will likely continue to introduce level 1 errors, but hopefully at a rate where we can run the report every so often, generate new baselines as needed, and identify any types of level 1 errors that would cause us to hold an update.

Separately, we can parse the current baseline to identify those that we're okay ignoring in current code, okay ignoring in future code, or should fix in current code and not allow in future code.

Regarding your section on stubs: In the past I've used PHPStan's symbol discovery as an alternative to stubs

I tried a couple different approaches here and I think there's a combination of things being finicky, my learning curve, and trying to speed things up. I know I went down the symbol discovery approach, but I don't remember why it wasn't working for the constants. I'll poke around at that a bit more. :)

Do you know how introducing PHPStan into our GitHub Actions workflow is going to affect billing?

I haven't run an actual GitHub Action test, but I think 20 minutes may be a reasonable enough guess, which is ~$0.16 if over our included plan minutes. It looks like the current CAC usage is ~210 minutes for July so far, so there should be plenty of room in the 2 or 3000 minutes that we have, but don't hold me to that!

In addition to cost, I think that the time just makes using GitHub Actions painful. I know this impacts Boone's workflow more than anyone else, but I could see:

Use GitHub actions on commits to code we actively manage (target ~2-3 minutes/run)
And use a script locally when updating plugins and themes twice a month
And have a scheduled GitHub action run every so often to capture all categories

And then another process to define is: what happens when an error pops up in those 3 cases? Triage/fix/etc...

Actions

Copy link

Updated by Boone Gorges 8 months ago

Jeremy and I discussed this briefly the other day and decided that the system as he's currently built it is ready for a test run. He's going to do some work to integrate selective CI runs into the GitHub Action workflow - "selective" in the sense that a group will only be scanned if the files in that group are touched. Once it's running in GH, we can monitor for a while, for cost, for runtime, for utility, etc. Based on this, we can refine the way that, for example, the most mission-critical plugins on the primary site are tested.

Actions

Copy link

Updated by Jeremy Felt about 1 month ago

The perfect scenario: When any file changes, run static analysis on everything.

In theory, it's possible any change could impact any code across the ecosystem. In practice, code generally relies on a small subset of the system and most code in the repository isn't touched for years. It takes a long time to analyze the entire project and the likelihood of it mattering is low.

A good scenario: When a file changes, run static analysis on that file and its likely related files.

For example, we should be able to assume that a change made to one of our MU plugins does not have any impact on nor care about any change in a 3rd party theme that hasn't been updated in years.

But! The idea of "when a file changes" really only works if we're using pull requests to submit changes and waiting for workflows to finish before merging. If we're running this on pushes of commits to a branch, then it's hard for git and the workflow to assume a start commit when generating a diff.

The consolidation scenario: Run static analysis on a specific set of files (mu-plugins, bp-nelo, etc...) every time a change is pushed to any PHP file (or phpstan config).

This doesn't provide 100% coverage, but can catch changes that would likely have a large impact (i.e. network-wide and/or main site code). It also completes in ~8 minutes, which is faster than the current PHPCS and PHP lint workflows.

Here's an approach:

Run a python script (developer/scripts/analyze-paths.py) to analyze the most frequently changed files in the repository over the last 2 years. This is meant to be run manually to help audit the configuration groups.
Identify impactful configuration groups to help provide focus to any analysis and cut down on the amount of time it takes to analyze.
Add a phpstan configuration for each of those groups, located in their respective directories. This allows us to manage different baselines, analysis levels, error ignoring, etc... for each group.
Add a standard phpstan configuration at the top level of the project to provide all of the paths that should be included/excluded regardless of the run.
And then, from time to time (???), maybe quarterly, maybe a couple weeks before major releases? We can plan on running a more complete suite for the entire codebase to see if anything has changed against our baseline.

The groups I've defined so far are:

All mu-plugins (wp-content/mu-plugins)
BP Nelo (wp-content/themes/bp-nelo)
CAC CV Editor (wp-content/plugins/cac-cv-editor)
CAC BP Custom Includes (wp-content/plugins/cac-bp-custom-includes)

I don't think that's a final list, but it covers our most frequently touched code. For example, it probably makes sense to only check something like the CV Editor when working with it directly or as part of a wider run. And there may be a less-touched plugin that is critical enough to include each time.

Each of these groups has its own baseline file in developer/phpstan/baseline. These are errors that we're ignoring for now, but don't want to appear in future reports. If we'd like, we can prioritize the errors in these and actually fix them across the codebase, or leave them for future triage.

The branch for all of this is task/phpstan-static-analysis and you can see the latest action in progress at https://github.com/cuny-academic-commons/cac/actions/runs/21380412502

Actions

Copy link

Updated by Boone Gorges about 1 month ago

Thanks, Jeremy. This is great progress.

Can we add wp-content/plugins/bp-custom.php to the list of key groups? This acts like an mu-plugin.

I fixed a couple of the little things in the last branch so that I could see the workflow go green. It worked :-D

The Action is pretty fast - a little less than 4 minutes. How hard would it to be across the entire codebase? I assume it'll take hours (with higher costs?). Is this something we'd do manually? Maybe we have a specific file in the repo that needs to be "touched" every time we want to trigger the run?

Might be worth asking an AI agent to take a look at the exceptions you've listed in your baseline files and decide whether there are ways of categorizing and prioritizing (some combination of ease/safety to fix and potential severity) and perhaps even patching.

Actions

Copy link

Updated by Jeremy Felt about 1 month ago

The Action is pretty fast - a little less than 4 minutes.

It's fast per job, but we multiply that by 4 jobs right now. You can see the per run metrics here: https://github.com/cuny-academic-commons/cac/actions/metrics/usage?tab=jobs&filters=workflow_file_name%3Aphpstan.yml

I assume it'll take hours (with higher costs?). Is this something we'd do manually? Maybe we have a specific file in the repo that needs to be "touched" every time we want to trigger the run?

Yes. We can setup another workflow that fires only manually and runs the full suite. We'd initiate it through the actions interface on GitHub. With parallel jobs, it may not take a ton of actual time to run, but the costs will be somewhat higher. I think running this before releases is probably worth it.

Might be worth asking an AI agent

I resisted the urge yesterday, but my imaginary friend Claude is now "systematically fixing these issues". We'll see what that looks like. :)

Actions

Copy link

Updated by Jeremy Felt 18 days ago

This work is now merged in the 2.7.x branch.

Next steps:

Add a config for wp-content/plugins/bp-custom.php
Send an AI agent after the baseline errors to see how much cleanup we can confidently do

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

CUNY Academic Commons

Custom queries

Feature #22353

Introduce static analysis to CI pipeline

Updated by Jeremy Felt 8 months ago

Categories¶

Configuration¶

Baseline¶

Stubs¶

Scripts¶

Tasks / Decisions:¶

Updated by Boone Gorges 8 months ago

Updated by Jeremy Felt 8 months ago

Updated by Boone Gorges 8 months ago

Updated by Jeremy Felt about 1 month ago

Updated by Boone Gorges about 1 month ago

Updated by Jeremy Felt about 1 month ago

Updated by Jeremy Felt 18 days ago