netns needs bwrap; without it commands silently share the host network
namespace and parallel network tests collide on ports. Skip the check for
--list (it inspects configs, runs nothing), hard-fail on CI so a missing-
bubblewrap misconfig can't silently degrade, and locally just warn and fall
back to the shared namespace.
Let any command ride the build/check pool, not just wolfSSL builds:
build false skips configure/make/check (config is just prepare+run)
netns true runs each command under 'bwrap --unshare-net --cap-add
CAP_NET_ADMIN' (its own network namespace) so parallel network
tests can't collide on ports and can configure that namespace
shards fan a config out into N instances, each with $SHARD (1..N) and
$SHARDS=N in its env and its own build-<name>-<k> dir, so a
command can split its work N ways (the pool load-balances them)
Error out, rather than silently degrade, on two misconfigurations that
otherwise surface as confusing test failures: netns requested but bwrap
missing (commands would share the host namespace and collide on ports),
and config-name collisions after shard fan-out (two jobs would share a
build dir and race).
A ::warning::/::error:: emitted with no file= property is pinned by GitHub
to the .github directory, whose blob URL is a directory listing - so the
stale-"minutes" annotations rendered with a dead source link and a line
number that points at nothing.
Derive the workflow file path from GITHUB_WORKFLOW_REF (owner/repo/path@ref)
and pass it as file= so the annotations link to the real workflow that
embeds the config list. Falls back to the previous fileless form off-CI or
when the ref is unavailable.
The 10 GB, LRU-evicted, PR-scoped Actions cache was being thrashed - the
docker simulator buildx layers (~6 GiB), plus per-PR ccache and apt-archive
writes whose keys never hit - which kept evicting the shared ccache, while
the apt mirror timed out often enough to break PR CI. Move the heavy caches
to ghcr (free, separate pool) and make PR runs read-only against the Actions
cache.
apt dependencies from prebuilt ghcr .deb bundles
- ci-deps-image.yml resolves each package list under .github/ci-deps/ into
its .deb closure and publishes ghcr.io/<owner>/wolfssl-ci-debs:<tag> in
two tiers: <ver>-minimal (make-check family) and <ver>-full (interop
superset), for ubuntu-22.04 and 24.04.
- install-apt-deps gains a ghcr-debs-tag input: pull the bundle and install
offline (--no-download) so the apt mirror is never on the PR critical
path. Any failure (bundle missing/not public/incomplete) falls through to
the existing apt path, so it is always safe to set.
sim-test buildx layers to a shared ghcr registry cache
- the 7 docker simulator workflows switch from cache-to: type=gha to
ghcr.io/wolfssl/wolfssl-sim-cache:<scope>. cache-from reads on every run
(anonymous); cache-to writes only on the weekend cron and manual
workflow_dispatch. Per-distinct-image tags and de-duplicated writers keep
parallel matrix jobs from racing on one ref.
ccache: PRs read, the schedule writes
- ccache-setup gains read-only: PR runs restore the shared master-scoped
cache but never upload; schedule/push runs refresh it. Wired across
os-check (linux + macOS), pq-all, smoke-test and the 12 small make-check
workflows.
- parallel-make-check.py gains --build-only (compile every config, skip the
test phase) so weekday-morning seed crons warm the cache PR runs consume.
artifact retention capped at 7 days on the failure-log/result uploads that
previously defaulted to 90.
ONE-TIME SETUP: after their first publish, make the ghcr packages
wolfssl-ci-debs and wolfssl-sim-cache PUBLIC so anonymous pulls work from PR
(including fork) runs; until then everything falls back cleanly.
Addresses review feedback:
- The "minutes" header comment described the check backwards (the
estimate drifting from the measured time). Reword it to match the
code, which warns when the measured time lands more than +/-50% away
from the estimate.
- Centralize the GitHub workflow-command escaping in gh_escape() and
apply it to the ::group:: title in dump() and the ::error:: summary in
main(), not just warn(), so a config name or step carrying %, CR or LF
cannot corrupt those commands either.
A config name comes from JSON and is only checked for emptiness and a
'/', so it can carry %, CR or LF. Passed straight into the ::warning::
workflow command those would truncate the annotation or be parsed as a
second command, so escape them in the GitHub branch of warn() per
GitHub's documented command-data encoding (% first). Local output is
unchanged.
The "minutes" field is only a scheduling estimate; when it goes stale it
just packs the schedule a little worse, and there was no signal that a
value needed updating. Emit a non-fatal warning when a config that
explicitly sets "minutes" finishes more than 50% above or below it (a
GitHub ::warning:: annotation in CI, a plain line locally) and flag the
row in the step-summary table with the value to copy over.
Configs that omit "minutes" keep riding the 1.0 default and are left
alone. The warning never touches the exit status, so it cannot fail the
job.
- Reject the config names "aux" and "test": build-aux/ is autotools'
aux-script dir and build-test/ a legacy build dir, neither the
script's to wipe and rebuild over.
- Add type hints throughout.
- Reword the shard-partition comment (the LPT bound was unparseable)
and replace the zip-over-pool.map result pairing with a run_one()
helper so the pool returns complete result rows.
wolfSSL's configure enables make's jobserver by default
(AX_AM_JOBSERVER([yes]) -> AM_MAKEFLAGS += -j<nproc+1> in aminclude.am),
and automake passes that explicit -j to every recursive sub-make, where
it overrides the invoking make's job limit. The script's -j therefore
only ever scheduled the outermost recursion hop: --jobs was inert.
Measured on a 4-CPU host with 10 build-only configs oversaturating the
worker pool, the jobserver default is also the better policy: capping
sub-makes via --disable-jobserver and -j2 dropped CPU utilization from
96% to 89% and lengthened the wall time, because configs' serial
phases (configure, link) stopped being backfilled by other configs'
compile jobs. So make is now invoked with no -j at all - parallelism
within a config comes from the configure-default jobserver - and the
misleading knob is gone, including the macOS job's --jobs 3.
Third Copilot review round:
- Makefile.am: run the test-data stamp recipe body under set -e. A
failed symlink mid-loop previously did not fail the compound command
(only the last command's status counted), so a partially-populated
build tree could be stamped complete. Now any failed setup command
aborts the recipe and the stamp is not created.
- parallel-make-check.py: fail-fast sent SIGTERM only, so a test that
traps or ignores SIGTERM could keep the job alive until the workflow
timeout. abort_others() now polls the swept processes and SIGKILLs
whatever is still alive after a 10 s grace period, and the
post-registration race-window kill escalates the same way (bounded
wait, then SIGKILL). Verified with a config running
"trap '' TERM; sleep 300": the run completes in ~10 s with the
stubborn config reported as aborted and no surviving processes.
Two fixes from the second Copilot review round:
A process spawned between abort_others()' live_procs snapshot and its
registration escaped the kill sweep, leaving that build/check running
to completion after fail-fast had begun. Re-check stop_event right
after registering the process and SIGTERM its process group if the
abort already started: either the registration happened before the
sweep's snapshot (the sweep kills it) or it happened after stop_event
was set (the re-check sees it), so the window is closed.
Exceptions from callable steps (user_settings staging, private-dir
copies) used to escape the worker thread and crash the whole script
with no summary. They are now recorded as that config's step failure
with the exception written to its make-check.log, e.g. a bad
"user_settings" path reports FAIL (stage <path>) while the other
configs keep running; the fail-fast bookkeeping is shared with the
nonzero-exit path via record_failure().
Address the Copilot review:
- parallel-make-check.py: validate "configure" (list of strings) and
cflags/ldflags (strings) so a malformed entry fails the load instead
of exploding a string into per-character configure arguments; print
a single line for passing configs instead of dumping their full
make-check.log into the CI log (failure dumps unchanged; the logs
remain in build-<name>/ for the failure artifacts).
- Makefile.am: use rm -rf for the certs/input/quit setup and distclean
cleanup. A --private-dir run replaces the certs symlink with a
private directory copy that rm -f cannot remove (verified: make
distclean in a build dir with a privatized certs/ now succeeds and
removes it).
- psk.yml, disable-pk-algs.yml: normalize the single-dash tokens
(-disable-rsa, -disable-ecc, -disable-aescbc, -enable-cryptonly)
carried verbatim from the old matrices to the canonical double-dash
form. No coverage change: configure honors single-dash spellings
(verified -disable-rsa sets NO_RSA with no unrecognized-option
warning), so these were always in effect; both touched configs
re-validated end-to-end.
The --cc default stays "ccache gcc": ccache resolves the compiler
through its own masquerade symlinks (verified: no recursion and normal
cache hits with /usr/lib/ccache prepended to PATH), and the explicit
CC= also covers jobs that use ccache without the PATH masquerade.
Replace the one-runner-per-configuration matrices across the
make-check workflow family with a generic pooled runner,
.github/scripts/parallel-make-check.py. Each workflow keeps its
configuration list as JSON next to the invocation; one runner (or a
small fixed set of shards, balanced by measured per-config minutes)
builds every config in its own out-of-tree (VPATH) build directory off
a single checkout/autogen, on a pool of one-per-CPU worker threads,
longest first. Concurrent checks are isolated with bubblewrap network
namespaces, compilations are cached with ccache, the first failure
aborts the rest (fail-fast, with --no-fail-fast to run everything),
and per-config timings plus pool efficiency land in the step summary.
Failure logs upload as artifacts. smoke-test.yml is likewise reworked
into a single pooled job that runs its nine configs on one runner.
Converted workflows (runner jobs per full pass):
os-check.yml 101 -> 8 (92 Ubuntu configs -> 4 shards;
the macOS matrix, the user-settings jobs and
the standalone
macos-apple-native-cert-validation.yml fold
into one macOS runner; Windows unchanged)
pq-all.yml 21 -> 2 shards
disable-pk-algs.yml 15 -> 1
wolfCrypt-Wconversion.yml 11 -> 1
trackmemory.yml 7 -> 1
cryptocb-only.yml 8 -> 1 (incl. the two new SHA512 entries)
multi-compiler.yml 6 -> 1
smallStackSize.yml 6 -> 1
multi-arch.yml 6 -> 1
async.yml 5 -> 1
psk.yml 5 -> 1
no-malloc.yml 3 -> 1
wolfsm.yml 3 -> 1
opensslcoexist.yml 2 -> 1
Measured against current upstream passing runs (job execution time,
queue excluded): ~200 runner jobs / ~374 runner-minutes per full pass
become 23 jobs / ~168 runner-minutes, with more coverage than before.
multi-arch's old matrix combined an "include" list of four
architectures with an "opts" axis; GitHub's include-merge rules made
each arch entry overwrite the previous one, so only the armel
combinations actually ran. The pooled list restores the intended
aarch64/armhf/riscv64 coverage (23 combinations; riscv64 x sp-math is
omitted as invalid - configure rejects sp-math without SP, and
--enable-riscv-asm, unlike --enable-sp-asm, does not bring SP in).
Out-of-tree build fixes this depends on:
- Makefile.am: symlink the read-only test data (certs/, tests/ config
files, sniffer captures and helpers, examples/crypto_policies,
input, quit) into the build tree via a BUILT_SOURCES stamp, removed
again in distclean-local. ChangeToWolfRoot() and the script tests
resolve everything relative to the working directory, so out-of-tree
make check and make distcheck now pass.
- scripts/multi-msg-record.py: locate the client binary from the build
tree working directory rather than the script's source directory.
- configure.ac + wolfssl/include.am: run
support/gen-debug-trace-error-codes.sh from $srcdir; it reads the
error-code headers from the source tree and generates into the build
tree.
- tests/swdev: a WOLFBUILD variable points the sub-make at the build
tree for the configure-generated headers (wolfssl/options.h,
wolfssl/version.h); the in-tree-only guards are dropped.
Portions of PR #10649 are incorporated: the cross-platform
ccache-setup composite action, repository_owner gates on check-headers
and check-source-text, the docs-only paths-ignore on os-check, and the
libspdm timeout bumps.
CPPFLAGS replaces C_EXTRA_FLAGS with embedded single-quotes, which were
passed as literal characters through the shell variable and caused
configure's C compiler test to fail. Fix the report.json summary parser
to use the actual TLS-Anvil field names (TotalTests, FullyFailedTests,
etc.) and include category scores.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runs the TLS-Anvil combinatorial test suite nightly against wolfSSL in
all four roles: TLS 1.2/1.3 server and TLS 1.2/1.3 client. Results are
summarized in the job summary and uploaded as artifacts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>