Stress test for OpenComputer checkpoint/restore integrity. Creates checkpoints and restores from them at scale, verifying git object integrity and file-level SHA-256 on every restore.
Requires Node.js 20+ and an OpenComputer API key.
A customer reported corrupted snapshots — files had incorrect or missing content after checkpoint/restore operations. Root cause was race conditions in the QEMU backend where concurrent operations accessed qcow2 virtual disk files without synchronization. Specifically, tar was archiving qcow2 files while QEMU was modifying them (`tar: rootfs.qcow2: file changed as we read it`).
Four fixes were applied, all based on the same principle — never read a qcow2 file that another process can write:
- Hibernate archive: reflink-copy qcow2 to staging before archiving
- Destroy during archive: wait for archive goroutine before deleting files
- Migration upload: mutex + reflink staging during S3 upload
- Checkpoint cache delete: write-lock blocks removal while forks hold read locks
This repo validates the fix at the SDK level — 1,000 checkpoint restores against production.
Full incident report: docs/incident-report.pdf
Run against OpenComputer production: 5 independent rounds, 200 restores each, concurrency 5.
| Metric | Result |
|---|---|
| Total restores | 1,000 |
| Corruptions | 0 |
| Infra errors (timeouts) | 3 |
| Avg restore (fork from checkpoint) | ~130ms |
| Avg verify (git + SHA-256 over 5MB) | ~10s |
| Round | Restores | Corrupted | Infra errors | Setup | Avg create | Avg verify |
|---|---|---|---|---|---|---|
| 1 | 200 | 0 | 0 | 7.0s | 140ms | 10.3s |
| 2 | 200 | 0 | 0 | 7.8s | 130ms | 11.5s |
| 3 | 200 | 0 | 1 | 6.2s | 132ms | 10.8s |
| 4 | 200 | 0 | 2 | 7.7s | 149ms | 11.2s |
| 5 | 200 | 0 | 0 | 59.6s | 135ms | 8.9s |
| Total | 1,000 | 0 | 3 | | ~137ms | ~10.5s |
The 3 infra errors were Cloudflare 524 timeouts — transient network issues, not data integrity failures. Round 5's longer setup was a slow checkpoint readiness poll.
Raw results: results/2026-04-01_15-11-52/
Each round:
- Boot a sandbox (1 CPU / 4 GB), write a 5MB random marker file, commit it to a git repo
- Checkpoint the sandbox
- Restore from that checkpoint N times concurrently
- Verify every restore with three checks:
  - `git status`: segfaults on a corrupted filesystem or memory image
  - `git log`: verifies the object database matches the expected commit
  - SHA-256 of the marker file: detects bit rot or truncation
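The SHA-256 check needs nothing beyond Node's built-in `crypto` module; a minimal sketch (function names are illustrative, not necessarily the repo's `verify.ts`):

```typescript
import { createHash } from "node:crypto";

// Hash the marker file's bytes; any truncation or single bit flip in the
// restored file produces a completely different digest.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against the digest recorded when the checkpoint was created.
function verifyMarker(restoredContents: Buffer, expectedHex: string): boolean {
  return sha256Hex(restoredContents) === expectedHex;
}
```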
Git's content-addressed object store (SHA-1) makes it sensitive to even single bit flips — corruption typically surfaces as segfaults or hash errors rather than silent data loss.
Errors are classified as corruption (segfault, hash mismatch, git breakage) or infra (timeout, 502, rate limit). Only corruption counts as a test failure.
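A sketch of that classification (the patterns and the treat-unknown-as-corruption default are assumptions, not necessarily what `errors.ts` does):

```typescript
type FailureKind = "corruption" | "infra";

// Integrity damage: any of these fails the whole test run.
const CORRUPTION_PATTERNS = [/segmentation fault/i, /hash mismatch/i, /corrupt/i];

// Transient infrastructure noise: retryable, does not indict the data path.
const INFRA_PATTERNS = [/timeout/i, /\b502\b/, /\b524\b/, /rate limit/i];

function classifyFailure(message: string): FailureKind {
  // Check corruption first so a message matching both is counted strictly.
  if (CORRUPTION_PATTERNS.some((p) => p.test(message))) return "corruption";
  if (INFRA_PATTERNS.some((p) => p.test(message))) return "infra";
  // Err on the strict side: an unrecognized failure counts as corruption.
  return "corruption";
}
```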
cp .env.example .env
# add your OpenComputer API key
npm install
source .env
npm run test:smoke # 10 restores, ~30s
npm run test:full    # 1000 restores, ~40 min

Each run creates a timestamped directory under results/:
results/2026-04-01_14-30-00/
├── report.json # structured results
├── run.log # console output (ANSI stripped)
└── error-details.log # verbose error bodies (if any)
-n, --restores <num> Total restores across all rounds (default: 1000)
-r, --rounds <num> Independent checkpoint rounds (default: 5)
-c, --concurrency <num> Max simultaneous restores (default: 5)
--marker-size <mb> Marker file size in MB (default: 5)
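The `--concurrency` cap can be implemented with a small worker pool; a hypothetical sketch (`runLimited` is not the repo's actual helper):

```typescript
// Run tasks with at most `limit` in flight at once. Each worker pulls the
// next task index; because JS is single-threaded, the increment is race-free.
async function runLimited<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}
```

With the defaults above, each round would pass 200 restore thunks and a limit of 5.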
npx tsx src/stress-test.ts -n 500 -r 3 -c 8
npx tsx src/stress-test.ts -n 2000 -r 10 -c 5 --marker-size 10

src/
├── stress-test.ts # entry point
└── lib/
├── round.ts # one round: create checkpoint, restore N times
├── verify.ts # one restore: 3 integrity checks
├── errors.ts # classify corruption vs infra errors
├── report.ts # terminal output + JSON report
└── types.ts # shared types and utilities
docs/
└── incident-report.pdf # root cause analysis & remediation
results/
└── 2026-04-01_15-11-52/ # 1000-restore full run