Field Note · Blog

Reading Hatch Pet — 14 patterns from a well-designed Codex skill

Fourteen patterns from a small, complete Codex skill: identity locks, subagent boundaries, provenance, no-fallback gates, and deterministic packaging.


FIELD STUDY · NO. 02 · SKILL ANATOMY
License: Apache 2.0 · Subject: Hatch Pet (OpenAI Codex) · Length: 14 patterns
CompleteTech LLC · 2026 · 05 · 02


14 patterns in this study · 9 rows of animation states · 35-line contract spec · 22-minute end-to-end run

THESIS

The Hatch Pet skill is a small, complete, well-shaped example of how to build a skill that does one thing end-to-end inside an agentic environment. It is worth reading not because pets are interesting, but because the patterns transfer to anything else you’d want to build as a skill.

OpenAI Codex shipped a Plugins / Skills marketplace, and the Hatch Pet skill was published into it under Apache 2.0. There is currently very little content explaining what good skill design looks like, which leaves a window for "here is one to copy" content before the marketplace fills up.

Each pattern below is grounded in a specific source pointer; every claim has receipts. The blog quotes the source verbatim, the carousel pull-quotes it, and the X thread screenshots it. Below each pattern, the "lazy version" sketches what an unpolished skill would have done instead — contrast is the fastest way to see what good looks like.

The Fourteen.

Pattern 01

Composition over reimplementation.

The skill delegates image generation to a separate system skill ($imagegen) instead of calling the image API directly.

From the source · SKILL.md → Generation Delegation
Use $imagegen for all normal visual generation. … Do not call the Image API directly for the normal path. Let $imagegen choose its own built-in-first path and its own CLI fallback rules.

When $imagegen changes its built-in-first / fallback policy, Hatch Pet benefits without a code change. Composition gives you an upgrade path the lazy version doesn't have. This is the agent-skill version of "depend on interfaces, not implementations" — and the skill enforces it explicitly with a "Hard boundary" paragraph.
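At the job level, delegation is nothing more exotic than naming $imagegen as the generator. A minimal sketch with illustrative field names (the skill's real imagegen-jobs.json schema is not reproduced here):

import json

# Hypothetical row job: $imagegen is named as the generator; the skill
# never touches an image API directly.
job = {
    "id": "row-waving",
    "generator": "$imagegen",
    "prompt_file": "prompts/rows/waving.md",
    "grounding": ["decoded/canonical_base.png"],
    "status": "pending",
}

with open("imagegen-jobs.json", "w") as f:
    json.dump({"jobs": [job]}, f, indent=2)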

What Hatch Pet does

Delegates to $imagegen. Inherits the host’s image-gen policy automatically. Stays in its lane: pet-specific prompt planning, atlas geometry, QA, packaging.

The lazy version

Every skill in the marketplace re-implements its own image-gen client, each with subtly different fallback behavior and authentication assumptions. Multiply by N skills.

Pattern 02

Subagent parallelism, with a strict write boundary.

Nine row-strip jobs run in parallel across subagents. The parent retains exclusive ownership of all shared state.

From the source · SKILL.md → Subagent Row Generation
Subagent write boundary: do not let subagents edit imagegen-jobs.json, copy files into decoded/, run record_imagegen_result.py, run derive_running_left_from_running_right.py, run finalize_pet_run.py, or package the pet. This avoids manifest races and keeps provenance checks centralized.

Parallelism without a write boundary becomes a race. The parent-as-sole-writer / subagents-as-pure-producers pattern is the agentic version of the single-writer principle. It generalizes to anything with a manifest — code generation, doc generation, batch transformation. The skill ships a one-screen handoff template so authors don't reinvent the contract.
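The whole contract fits in a sketch. Assuming hypothetical shapes: subagents are pure producers that return a path plus a QA note, and the parent is the only process that ever writes the manifest.

import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def generate_row(row: str) -> dict:
    # Subagent role: produce a strip, return only a path + QA note.
    # No manifest reads or writes in here.
    out = f"rows/{row}.png"
    # ... delegated generation for `row` writes to `out` ...
    return {"row": row, "path": out, "qa_note": "frames centered, identity holds"}

rows = ["idle", "running-right", "waving"]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(generate_row, rows))

# Parent role: sole writer of shared state. No race is possible because
# nothing else ever opens the manifest for writing.
manifest = {r["row"]: {"path": r["path"], "qa_note": r["qa_note"]} for r in results}
Path("imagegen-jobs.json").write_text(json.dumps(manifest, indent=2))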

What Hatch Pet does

Subagents return only a path + a one-sentence QA note. Parent alone records, repairs, finalizes, packages.

The lazy version

Two subagents finish at the same instant, both read+write imagegen-jobs.json, last write wins, manifest is silently wrong.

Pattern 03

Deterministic vs. generative split, with a hard boundary.

Prompts and references live in markdown. Atlas geometry, validation, packaging, and explicitly gated derivations live in scripts. Scripts may not invent pet visuals.

From the source · SKILL.md → Generation Delegation
Hard boundary: do not create, draw, tile, warp, mirror, or synthesize pet visuals with local Python/Pillow scripts, SVG, canvas, HTML/CSS, or other code-native art as a substitute for $imagegen. … If those calls are too expensive, blocked, or unavailable, stop and explain the blocker instead of fabricating row strips locally.

The temptation when an agent can write code is to have it draw the missing assets. The skill draws an explicit line. Deterministic code may compose, validate, package, and, in one narrow case, derive running-left from running-right only after visual inspection confirms mirroring is appropriate. It may not fabricate new pet art as a substitute for $imagegen.
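The posture compresses to a few lines. A sketch under assumed names: when the delegated generator can't run, raise a blocker instead of falling back to code-drawn art.

class GenerationBlocked(RuntimeError):
    """Raised instead of fabricating pet visuals with local code."""

def generate_row_strip(row: str, imagegen_available: bool) -> str:
    if not imagegen_available:
        # Stop and explain the blocker; never substitute code-native art.
        raise GenerationBlocked(
            f"$imagegen unavailable for row '{row}'. Surfacing the blocker "
            "to the user instead of drawing a stand-in locally."
        )
    return f"rows/{row}.png"  # path produced by the delegated generation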

What Hatch Pet does

When generation can’t run, halt and surface the blocker. No code-native art as a stand-in. The only visual derivation path is the inspected running-right to running-left mirror gate.

The lazy version

On failure, stitch frames with small transforms, ship a sheet that looks fine in thumbnail, break at full size three weeks later.

Pattern 04

Identity locks across generative steps.

Every row attaches the canonical base as grounding plus an explicit identity-lock block. Visual drift is a blocker even when geometry QA passes.

From the source · prompts/rows/waving.md → Identity lock
Do not redesign the pet. Only change pose/action for the waving animation. Preserve the exact head shape, ear/horn/limb shape, face design, markings, palette, outline weight, body proportions, prop design, and overall silhouette from the canonical base pet. … A row that looks like a related but different pet is failed even if the deterministic geometry QA passes.

Generative rows of the same character drift. Hatch Pet’s solution — generate a canonical base first, then attach it as grounding to every subsequent row, plus an explicit list of what’s locked — is the simplest pattern that actually holds identity across nine separate generations.

Identity preserved across rows · canonical base → idle → waving
BASE · 1× Canonical Fathom base reference
IDLE · 6 FRAMES Six-frame idle row of Fathom
WAVING · 4 FRAMES Four-frame waving row of Fathom — identity preserved across pose change

Same head shape. Same gold belly scales. Same propeller hat on the same side. Pose changes, identity does not. The “passed geometry QA but visually drifted” failure mode would replace one of these dragons with a wyvern; the skill makes that a blocker.
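The mechanics are small enough to sketch. The job shape and helper below are hypothetical; the lock text is condensed from the row prompts.

IDENTITY_LOCK = (
    "Do not redesign the pet. Only change pose/action. Preserve the exact "
    "head shape, ear/horn/limb shape, face design, markings, palette, "
    "outline weight, body proportions, prop design, and overall silhouette "
    "from the canonical base pet."
)

def row_job(row: str, base_path: str, row_prompt: str) -> dict:
    return {
        "row": row,
        "grounding": [base_path],  # the same canonical base rides on every job
        "prompt": f"{row_prompt}\n\nIdentity lock: {IDENTITY_LOCK}",
    }

job = row_job("waving", "decoded/canonical_base.png", "Four-frame waving row.")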

What Hatch Pet does

Generate the base once. Attach it to every row job. Lock the identity dimensions in the prompt. Block on visual drift even when geometry passes.

The lazy version

Generate each row from the same text prompt and hope. By row 6 the dragon is now a wyvern. The pipeline reports success because the geometry is right.

Pattern 05

Layered references — spec, data, checks.

The references/ folder splits cleanly into three roles, each authoritative for its slice. Mixing them is the default failure mode.

From the source · references/ folder
codex-pet-contract.md · SPEC · What the host app expects: atlas dimensions, manifest shape, file locations. 35 lines.
animation-rows.md · DATA · Per-row state name, used columns, frame durations. 9 lines of table.
qa-rubric.md · CHECKS · Geometry, character consistency, sprite style, completeness, app fitness, repair policy.

Splitting by role lets each one be authoritative for its slice and lets the skill point at the right one for the right step. It also lets a contributor change the QA rubric without touching the spec, which means the spec stays load-bearing instead of getting eroded by drive-by edits.

What Hatch Pet does

Three files, three roles, no overlap. Each one is owned by exactly the layer that needs it.

The lazy version

One 800-line INSTRUCTIONS.md. Two contributors edit in different directions. Spec is silently contradictory by Q3.

Pattern 06

Repair the smallest failing scope.

Single bad frame → one row → full atlas regeneration only when identity is broadly broken. Targeted repair, not blanket re-roll.

From the source · qa-rubric.md → Repair Policy
Repair the smallest failing scope first:
1. Single bad frame.
2. One row.
3. Full atlas regeneration only when identity or layout is broadly broken.

Generative pipelines that “regenerate the whole thing on failure” are wasteful and destabilizing — a successful row gets re-rolled into a worse one; identity drifts across the regeneration; the user pays for re-generating work that was already correct. Targeted repair is harder to build (you need a manifest that knows what’s done) and very much worth it.
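Scope selection itself is cheap once the manifest tracks per-frame results. A sketch, assuming a hypothetical failure report keyed by row with bad frame indices:

def repair_scope(failures: dict[str, list[int]], total_rows: int = 9) -> str:
    failing = {row: frames for row, frames in failures.items() if frames}
    if not failing:
        return "nothing to repair"
    if len(failing) > total_rows // 2:
        return "regenerate full atlas"  # identity/layout broadly broken
    if len(failing) == 1:
        row, frames = next(iter(failing.items()))
        if len(frames) == 1:
            return f"repair frame {frames[0]} of row '{row}'"  # smallest scope first
        return f"regenerate row '{row}'"
    return "regenerate rows: " + ", ".join(sorted(failing))

print(repair_scope({"waving": [2], "idle": []}))  # -> repair frame 2 of row 'waving'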

What Hatch Pet does

Reopens only failed row jobs via queue_pet_repairs.py. Successful rows stay locked.

The lazy version

“Regeneration is one button.” Costs 10× as much, drifts identity, incentivizes the user to silence the QA check rather than wait through another full run.

Pattern 07

Visible progress plan — baked into the skill itself.

A four-step user-facing checklist runs alongside every job. UX prescription owned inside the skill, not left to the host app.

From the source · SKILL.md → Visible Progress Plan
For every pet run, keep a visible checklist so the user can see where the work is up to. Create the checklist before starting, keep one step active at a time, and update it as each step finishes. … Only mark a step complete when the real file, image, or decision exists.

A 22-minute job with no visible progress feels broken. Most skill authors leave this to the host and ship a “run, wait, get output” UX. The Hatch Pet author owned it inside the skill — same checklist for every run, same wording, no per-host variance. Users learn to read the four-step list once and recognize it across every Hatch Pet run.
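The checklist logic is tiny, which is part of why skipping it is inexcusable. A minimal sketch: the step names come from the skill, the artifact-gated completion rule is quoted policy, and the rendering is an assumption.

from pathlib import Path

STEPS = ["Getting ready", "Imagining the look", "Picturing the poses", "Hatching"]

def render(done: int, active: int) -> None:
    # One step active at a time; everything before it is complete.
    for i, step in enumerate(STEPS):
        mark = "x" if i < done else (">" if i == active else " ")
        print(f"[{mark}] {step}")

def step_complete(artifact: Path) -> bool:
    # Only mark a step complete when the real file, image, or decision exists.
    return artifact.exists()

render(done=2, active=2)  # two steps finished, "Picturing the poses" active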

What Hatch Pet does

Getting ready · Imagining the look · Picturing the poses · Hatching. One step active. Marked complete only against real artifacts.

The lazy version

Verbose log output to stdout. [INFO] step 17/32 complete. The user is in the kitchen and has no idea what’s left.

Pattern 08

Hard “no fallback” rules — where they matter.

Subagent unavailability requires explicit user direction. Mirror derivation requires a written justification flag. No silent degradation.

From the source · SKILL.md → Subagent Row Generation
No silent sequential fallback: if subagents cannot be used for row-strip visual generation, stop and ask for explicit user direction before continuing without them. Only an explicit user instruction such as “do not use subagents” or “run this sequentially” authorizes a normal sequential row-generation path.

Silent fallbacks are how agents accumulate invisible debt. The skill makes escalation rules first-class: if subagents are unavailable, it stops for explicit direction; if running-left might be derived from running-right, it requires visual inspection plus --confirm-appropriate-mirror and a written --decision-note. Any asymmetry sends the row back through grounded generation instead.
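The gate's command-line surface might look like this. The flag names come straight from the skill; the argparse wiring is an assumption, not the actual derive_running_left_from_running_right.py.

import argparse
import sys

parser = argparse.ArgumentParser(description="mirror-gate sketch")
parser.add_argument("--confirm-appropriate-mirror", action="store_true")
parser.add_argument("--decision-note", default="")
args = parser.parse_args()

if not (args.confirm_appropriate_mirror and args.decision_note.strip()):
    # No silent degradation: refuse until a human has inspected and justified.
    sys.exit(
        "Refusing to mirror: pass --confirm-appropriate-mirror after visual "
        "inspection, plus a written --decision-note. Asymmetric pets go back "
        "through grounded generation instead."
    )
print(f"mirror approved: {args.decision_note}")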

What Hatch Pet does

Stops, surfaces the constraint, requires an explicit instruction to proceed. Writes the justification into the record.

The lazy version

Silent fall-through with a warning in the log. Six months later you can’t tell which pets were corner-cut.

Pattern 09

Provenance is first-class.

Every visual output recorded with SHA-256, source path, and timestamp. Recording from anywhere except the original generation output is forbidden.

From the source · SKILL.md → Default Workflow
For the built-in path, record the selected source image from $CODEX_HOME/generated_images/.../ig_*.png. Do not record files from the run directory, tmp/, hand-made fixtures, deterministic row folders, or post-processed copies as visual job sources.

Generative pipelines without provenance can’t be debugged six months later. SHA-256 + source path + timestamp lets a user say “this is the exact image that became row 4 of my pet” and verify it. The forbid-recording-from-the-wrong-paths rule prevents the most common provenance-poisoning move (an author “fixes” a row by editing it locally and re-recording the local file as the source).

Provenance also covers human judgment. Mirror decisions are recorded with the run, so a derived row carries not just a file hash, but the reason mirroring was accepted for that specific pet.

pet_request.json · canonical_identity_reference · "sha256": "8160ca6600a75266c515b5350a6937808df984a08ea1404aa8bcb5ce981715b8"
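The recording rule reduces to a dozen lines. A sketch with assumed shapes: hash plus source path plus timestamp, with anything outside $imagegen's output tree rejected outright.

import hashlib
import os
import time
from pathlib import Path

ALLOWED_ROOT = (Path(os.environ["CODEX_HOME"]) / "generated_images").resolve()

def record_source(path: Path) -> dict:
    resolved = path.resolve()
    if not resolved.is_relative_to(ALLOWED_ROOT):  # Python 3.9+
        # Blocks the provenance-poisoning move: re-recording a local edit.
        raise ValueError(f"refusing to record {path}: not an original $imagegen output")
    return {
        "source": str(resolved),
        "sha256": hashlib.sha256(resolved.read_bytes()).hexdigest(),
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }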
What Hatch Pet does

Hash + source path + timestamp recorded for every output. Recording from anywhere but $imagegen's output paths is rejected.

The lazy version

Record the destination path only. After a manual fix, no one can tell whether row 4 came from generation or someone’s hand-edited copy.

Pattern 10

Parameterized chroma key, propagated into prompts.

The chroma key isn’t hardcoded. It lives in pet_request.json and gets stamped into every row prompt by name.

From the source · pet_request.json + waving.md
pet_request.json: "chroma_key": { "hex": "#00FF00", "name": "user-selected", "selection": "manual" }
waving.md → Layout requirements: Use a perfectly flat pure user-selected #00FF00 chroma-key background across the whole image.

Parameters that get propagated into generated content (rather than hardcoded into prompts) are where most pipelines silently rot. If the canonical chroma key is in one place and gets stamped into every prompt, you can change it once and have the whole pipeline track. If it’s hardcoded into 9 row prompts as a string literal, you have 9 places to update and one place to forget.
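Propagation is one substitution pass. A sketch, assuming the row prompts carry placeholders rather than literals (the skill's actual templating may differ):

import json
from pathlib import Path
from string import Template

request = json.loads(Path("pet_request.json").read_text())
chroma = request["chroma_key"]  # {"hex": "#00FF00", "name": "user-selected", ...}

# Hypothetical placeholder form of the layout line from waving.md.
template = Template(
    "Use a perfectly flat pure $name $hex chroma-key background "
    "across the whole image."
)
line = template.substitute(name=chroma["name"], hex=chroma["hex"])
# -> "Use a perfectly flat pure user-selected #00FF00 chroma-key background ..."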

What Hatch Pet does

Single source of truth in JSON. Prompts dynamically reference it. Switching keys is one edit.

The lazy version

Green is a literal in every prompt file. Switching to a magenta key for a green pet means hand-editing 9 prompts and missing one.

Pattern 11

Layout guides as invisible construction references.

Generated layout-guide images get attached to row jobs as inputs. Prompts explicitly forbid the model from rendering the guide.

From the source · waving.md → Layout requirements
Exactly 4 full-body frames, left to right, in one horizontal row. The attached layout guide shows the 4 frame boxes and inner safe area for this row. Follow its slot count, spacing, centering, and padding. Do not reproduce the layout guide itself: no visible boxes, guide lines, center marks, labels, guide colors, or guide background may appear in the output.

Generative models are bad at counting cells and respecting margins from text alone. Giving them a visible construction reference and instructing them not to render it is a clever way to enforce geometry the model can’t compute. Applies anywhere you want a generative model to follow strict spatial structure.

The construction reference (input) and the rendered output
GUIDE · INPUT ONLY Waving layout guide — four blank cells with blue safe-area boxes and dashed centerlines
RENDERED OUTPUT Waving row — four Fathom poses centered in their cells, no guide artifacts

Left: the layout guide attached as a layout-only input. Right: the resulting row. The guide’s blue safe-area boxes and dashed centerlines are nowhere in the output — the model used them as construction reference and discarded them in the render.
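Generating the guide itself is cheap deterministic code. A hypothetical Pillow sketch that reuses the contract's cell dimensions (the padding value and guide style are illustrative; the skill's actual guides may differ):

from PIL import Image, ImageDraw

COLS, CELL_W, CELL_H, PAD = 4, 192, 208, 18  # waving row: 4 frames; pad is illustrative

guide = Image.new("RGB", (COLS * CELL_W, CELL_H), "white")
draw = ImageDraw.Draw(guide)
for c in range(COLS):
    x0 = c * CELL_W
    draw.rectangle([x0, 0, x0 + CELL_W - 1, CELL_H - 1], outline="blue")  # frame box
    draw.rectangle(
        [x0 + PAD, PAD, x0 + CELL_W - 1 - PAD, CELL_H - 1 - PAD],
        outline="lightblue",  # inner safe area
    )
    draw.line([x0 + CELL_W // 2, 0, x0 + CELL_W // 2, CELL_H], fill="blue")  # centerline
guide.save("waving_layout_guide.png")  # attached as input, forbidden in the output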

What Hatch Pet does

Generates a per-row guide image. Attaches it as input. Forbids it appearing in the output. Model uses it as silent scaffolding.

The lazy version

Describe “8 frames evenly spaced with 18px safe margin” in text and hope. Get inconsistent counts and uneven spacing across rows.

Pattern 12

Repeatable prompt structure.

Every row prompt follows the same block skeleton: opening line → identity lock → frame count → style contract → action → state-specific → transparency → layout.

From the source · prompts/rows/*.md (any of them)
structural skeleton
1 · opening line
2 · identity lock
3 · frame count + identity restatement
4 · style contract
5 · animation action
6 · state-specific requirements
7 · transparency / artifact rules
8 · layout requirements

A prompt with sections that always appear in the same order is a prompt you can extend reliably. Adding a 10th row to this skill is a known-cost operation — copy a row, swap two blocks. Adding a 10th row to a free-form prompt pile is a refactor.
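A fixed skeleton is also trivially enforceable in code. A sketch with hypothetical block names mirroring the eight sections above:

SKELETON = [
    "opening_line", "identity_lock", "frame_count", "style_contract",
    "action", "state_specific", "transparency_rules", "layout_requirements",
]

def build_prompt(blocks: dict[str, str]) -> str:
    missing = [name for name in SKELETON if name not in blocks]
    if missing:
        raise ValueError(f"prompt is missing blocks: {missing}")
    # Same order every time; only the contents of two blocks vary per row.
    return "\n\n".join(blocks[name] for name in SKELETON)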

What Hatch Pet does

Same block skeleton across nine row prompts. Action and state-specific blocks swap; everything else holds.

The lazy version

Hand-written prose per row, different ordering, slightly different language for the same concepts, ad-hoc rules buried inline.

Pattern 13

Small contract, big production pipeline.

codex-pet-contract.md — what the host app expects — is 35 lines of markdown. The skill that produces conformant output is hundreds of lines plus a dozen scripts.

From the source · file sizes
codex-pet-contract.md ········· 35 lines
SKILL.md ····················· 320+ lines
scripts/ ····················· 15 files, ~1.5k lines

Small spec, big pipeline is the right shape — a tiny spec leaves room for many implementations to compete on quality, and big pipelines are where the actual quality work happens. The reverse (huge spec, tiny pipeline) means the spec is doing the implementation’s job and constrains anyone who wants to write a competing skill.

What Hatch Pet does

35-line spec. Anything that produces conformant output is a valid skill. Production quality lives in the pipeline.

The lazy version

600-line “you must implement these 47 endpoints” spec. Every skill is 80% boilerplate matching the spec and 20% actual work.

Pattern 14

Deterministic code mirrors the spec — mechanically.

compose_atlas.py opens with constants that exactly match codex-pet-contract.md. Drift can't creep in unnoticed.

From the source · scripts/compose_atlas.py
# scripts/compose_atlas.py

COLUMNS      = 8
ROWS         = 9
CELL_WIDTH   = 192
CELL_HEIGHT  = 208
ATLAS_WIDTH  = COLUMNS * CELL_WIDTH
ATLAS_HEIGHT = ROWS * CELL_HEIGHT

ROW_SPECS = [
    ("idle",          0, 6),
    ("running-right", 1, 8),
    ("running-left",  2, 8),
    ("waving",        3, 4),
    ("jumping",       4, 5),
    ("failed",        5, 8),
    ("waiting",       6, 6),
    ("running",       7, 6),
    ("review",        8, 6),
]
# ⤷ matches codex-pet-contract.md and animation-rows.md line-for-line.

Drift between spec and implementation is how skills decay. When the contract is mechanically copied into a single constants block, drift can't happen without somebody noticing. Bonus: the constants serve as machine-readable documentation — a reader who only opens the script learns the contract for free.
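You can go one step further and make the copy self-checking. A sketch of a guard test (the import path and test harness are assumptions):

from compose_atlas import (
    ATLAS_HEIGHT, ATLAS_WIDTH, CELL_HEIGHT, CELL_WIDTH, COLUMNS, ROWS, ROW_SPECS,
)

def test_atlas_matches_contract():
    assert (ATLAS_WIDTH, ATLAS_HEIGHT) == (COLUMNS * CELL_WIDTH, ROWS * CELL_HEIGHT)
    assert len(ROW_SPECS) == ROWS
    assert [row_index for _, row_index, _ in ROW_SPECS] == list(range(ROWS))
    assert all(frames <= COLUMNS for _, _, frames in ROW_SPECS)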

What Hatch Pet does

Spec values are constants at the top of one file. Reading the script teaches you the contract.

The lazy version

Geometry computed at runtime through three layers of indirection. Six months later nobody knows whether the actual atlas matches the spec.

CROSS-CUTTING OBSERVATIONS

Patterns that emerge from the patterns.

M · 01

Posture beats process.

“Stop and ask the user” appears more often than “fall back automatically.” The skill’s character is escalate loudly, never degrade silently. That’s a stance, not a checklist.

M · 02

Hard boundaries are first-class.

“Hard boundary,” “no silent sequential fallback,” “do not record from local fixtures.” The skill talks like a team that’s been burned by soft-boundary versions of these rules.

M · 03

User in the loop where it matters.

Pet naming, mirror approval, fallback authorization — anywhere the answer requires taste, context, or visual judgment, the skill stops and asks. Deterministic work runs autonomously only after those gates are satisfied.

M · 04

References are inputs, not vibes.

Reference images aren’t “inspiration.” They’re load-bearing — attached to every job, hashed, recorded, locked. The visual generation has grounding sources, not just text instructions.

M · 05

Small spec, big pipeline, separate concerns.

Spec is short. Data is in JSON. Checks are in their own file. Code is in scripts. Prompts are in markdown. Each file knows its job.

The skill is the lesson. The pets are just the demo.

Follow-up · where this gets interesting

From one pet to a cast.

The same pattern that keeps Fathom consistent across nine animation rows can scale beyond desktop companions. Treat the skill as a character-production pipeline: one style bible, one canonical reference per character, structured manifests, deterministic checks, and reusable outputs that another tool can animate, composite, or remix later.

Build a cartoon cast

Create a lineup of characters with locked palettes, silhouettes, props, proportions, and personality notes. Each character gets a canonical base plus reusable state rows.

Keep continuity episode to episode

The manifest becomes the continuity document: names, dimensions, reference hashes, allowed poses, style rules, and the details that should never drift.

Generate production-ready packs

Instead of one image, the output is a folder of assets: turnarounds, expression sheets, walk cycles, prop variants, transparent exports, and metadata.

Invite the audience into creation

Readers could vote on the next character, submit traits, remix a sidekick, or request a scene. The pipeline gives that engagement somewhere real to go.

The broader point: once a skill can preserve identity across generative steps, it stops being a novelty image generator and starts looking like an asset pipeline for stories, games, training content, comics, and animated explainers.

OpenCLAW tie-in · show orchestration

Make the whole show end to end.

The next layer is not just “generate more characters.” It is using OpenCLAW as the production control plane: one agent keeps the show bible, another directs the cast, specialist agents handle art, dialogue, audio, continuity, rendering, publishing, and audience feedback. Recent voice/audio APIs make that more realistic because dialogue, scratch reads, character voices, transcription, timing, and review notes can all move through the same orchestrated pipeline.

S · 01 Show bible and constraints

OpenCLAW starts with a short but strict series bible: audience, tone, episode length, visual style, character schema, voice rules, safety boundaries, and the approval points where a human must decide.

S · 02 Cast generation pipeline

The Hatch Pet pattern becomes a cast factory. Each character gets canonical art, state rows, expression sheets, props, palette locks, reference hashes, and reusable metadata.

S · 03 Director agent in charge

A director agent assigns roles, requests scenes, checks continuity, blocks shots, rejects drift, and routes work to writer, storyboard, voice, animation, and QA agents.

S · 04 Writers room to storyboard

Writer agents draft beats and scripts. A storyboard agent converts those beats into shot lists, background needs, character poses, camera notes, and asset requests.

S · 05 Voice, audio, and timing

Audio agents use speech-to-text for notes, text-to-speech for scratch and final reads, and realtime voice sessions for directed performance passes, retakes, captions, and timing maps.

S · 06 Render, review, publish, repeat

OpenCLAW coordinates renders, continuity checks, file packaging, platform-specific cuts, blog embeds, short-form clips, and the feedback loop that decides the next episode.

The practical thesis: OpenCLAW does not need to be the drawing model, voice model, editor, or renderer. Its value is orchestration. It can hold the show state, dispatch jobs, preserve memory, require approvals, track artifacts, and let a director agent move an entire cartoon production from idea to published episode.

CompleteTech LLC
Innovation at Every Integration
Field Study · No. 02
Reading Hatch Pet
2026 · 05 · 02
Subject of study: Hatch Pet · OpenAI Codex · Apache License 2.0
Set in Fraunces, IBM Plex Sans, JetBrains Mono.