Building a Multi-Agent AI Development System with Claude Code
I've spent the past several months building out what I'd call a production-grade AI-enabled software engineering workflow. Not just using Claude Code as a coding assistant, but designing an entire infrastructure layer around it: multi-agent orchestration, automated quality gates, session-level coordination, and a reproducible configuration system that deploys to a new machine in one command.
This post is a comprehensive walkthrough of what I've built, how it works, and what I've learned.
The Foundation: claude-dotfiles
Everything starts with a single GitHub repository: claude-dotfiles. This repo is the single source of truth for my entire Claude Code development workflow. It contains global instructions, enforcement hooks, slash commands, MCP server configs, plugin settings, and workflow documentation, all version-controlled and deployable to any machine.
What's in the repo
claude-dotfiles/
├── CLAUDE.md # Global instructions (~627 lines)
├── settings.json # Hooks, permissions, plugins (template)
├── mcp.json # MCP server configs (template)
├── .claudeignore # Template ignore file for repos
├── github-repo-protocols.md # Full repo lifecycle guide (16KB)
├── multi-agent-system.md # Multi-agent coordination
├── log-system.md # Session logging documentation
├── commands/ # 9 slash commands
│ ├── startup.md
│ ├── commit.md
│ ├── pr.md
│ ├── sync.md
│ ├── new-issue.md
│ ├── gs.md
│ ├── promote-rule.md
│ ├── dotsync.md
│ └── walkthrough.md
├── hooks/ # 4 Python enforcement hooks
│ ├── auto-approve-bash.py
│ ├── auto-approve-file-ops.py
│ ├── enforce-git-workflow.py
│ └── enforce-issue-workflow.py
├── log/ # Log analysis utilities
├── setup.sh # First-time setup script
└── sync-config.sh # Reverse-sync config changes
One-command deployment
The setup.sh script handles everything:
- Creates `~/.claude/` if it doesn't exist
- Backs up any existing `settings.json`
- Creates symlinks for files that don't need path templating (`CLAUDE.md`, `commands/`, `hooks/`, docs)
- Generates `settings.json` and `mcp.json` from templates using `sed "s|__HOME__|$HOME|g"` to expand real paths
- Optionally enables GitHub Repo Protocols (toggled by symlink presence)
- Adds `GITHUB_TOKEN` to `~/.zshrc` if missing
- Checks prerequisites (gh CLI, TypeScript LSP)
The key design decision here is the split between symlinked files and generated files. Files like CLAUDE.md and commands/ are identical across machines, so they're symlinked directly. But settings.json and mcp.json contain absolute paths (/Users/lem/code/...), so they use __HOME__ placeholders in the repo and get generated with real paths at setup time.
Reverse sync
When I modify settings locally (add a new permission, configure a new hook), I run /dotsync, which calls sync-config.sh. This script reads the live ~/.claude/settings.json, replaces the real home path back to __HOME__, compares to the repo version, writes if changed, and pushes to GitHub. On another machine, a git pull plus re-running setup.sh applies the updates.
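Conceptually, the two scripts are inverse string transforms over the same `__HOME__` convention. Here's a minimal Python sketch of that round trip; the real scripts are bash (setup.sh / sync-config.sh), and the paths below are illustrative:

```python
"""A minimal sketch of the __HOME__ round trip, assuming only the convention
described above. The real scripts are bash (setup.sh / sync-config.sh); the
paths here are illustrative."""
from pathlib import Path

HOME = str(Path.home())

def expand_template(template_text: str) -> str:
    """setup.sh direction: __HOME__ placeholders -> real absolute paths."""
    return template_text.replace("__HOME__", HOME)

def normalize_live_config(live_text: str) -> str:
    """sync-config.sh direction: real home path -> __HOME__, ready to diff
    against the repo copy and commit only if something actually changed."""
    return live_text.replace(HOME, "__HOME__")

if __name__ == "__main__":
    repo_copy = Path("claude-dotfiles/settings.json").read_text()   # template in the repo
    live_copy = Path(HOME, ".claude", "settings.json").read_text()  # generated at setup time
    print("drifted" if normalize_live_config(live_copy) != repo_copy else "in sync")
```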
Standardized .claudeignore
Every repo gets the same .claudeignore template. It excludes node_modules/, dist/, lock files, IDE configs, binary/media files, .env files, and Supabase temp directories. This keeps Claude's context window clean and focused on actual source code.
The Permission System
The settings.json file contains an exhaustive permission configuration with approximately 860 allow entries and 13 deny entries.
The deny list
The deny list is the safety layer. It blocks destructive operations at the tool level:
rm, rmdir # File deletion
git reset --hard # Destructive git resets
git push --force, git push -f # Force pushes
git clean # Working tree cleaning
git branch -D # Force-delete branches
docker rm, docker rmi # Container/image deletion
docker system prune # Docker cleanup
kubectl delete # Kubernetes resource deletion
DROP, TRUNCATE, DELETE FROM # Destructive SQL
The philosophy: AI agents should never be able to accidentally destroy work. These operations require explicit human confirmation.
The allow list
The allow list covers essentially every CLI tool I use: git, gh, npm/yarn/pnpm/bun, cargo/rustup, python/pip, docker, kubectl, terraform, supabase, wrangler, aws/gcloud/az, psql/mysql/redis-cli, ffmpeg, and hundreds more. The breadth eliminates permission prompts for legitimate development commands while the deny list catches the dangerous ones.
The bug that inspired the hooks
Claude Code has known bugs (GitHub issues #15921 and #13340) where the VSCode extension ignores settings.json permissions entirely, and piped commands bypass the allow list. I documented this in a bug report and built a workaround: Python hooks that re-implement the permission logic at the tool-use interception layer, where they're reliably enforced regardless of the client.
Enforcement Hooks
Four Python scripts intercept Claude Code's tool calls at different layers. They're the governance framework that ensures AI agents follow the same workflow rules as human developers.
enforce-issue-workflow.py (UserPromptSubmit)
Fires on every user prompt. It detects work-request verbs (update, fix, add, create, implement, build, change, modify, refactor, etc.) and filters out questions (starts with what/why/how, ends with ?, contains explain/describe/list).
When a work request is detected, it injects a workflow reminder into Claude's context:
STOP - Before making ANY code or file changes, you MUST:
1. CHECK: Does a GitHub issue exist for this work?
2. CREATE BRANCH: git checkout -b {issue-number}-{description}
3. IMPLEMENT: Make your changes
4. COMMIT & PR with issue-number prefix
This hook is toggled by the presence of ~/.claude/github-repo-protocols.md as a symlink. Remove the symlink and the hook becomes a no-op. This lets me disable the full issue-tracking workflow for quick experiments without modifying any code.
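To make the shape of this concrete, here's a condensed Python sketch of the detection logic. The verb list and question heuristics are abridged, and the hook's actual I/O contract with Claude Code is simplified to reading a prompt field from stdin and printing the reminder:

```python
#!/usr/bin/env python3
"""Condensed sketch of enforce-issue-workflow.py's detection logic."""
import json
import re
import sys
from pathlib import Path

WORK_VERBS = {"update", "fix", "add", "create", "implement", "build", "change", "modify", "refactor"}
PROTOCOLS_TOGGLE = Path.home() / ".claude" / "github-repo-protocols.md"

REMINDER = """STOP - Before making ANY code or file changes, you MUST:
1. CHECK: Does a GitHub issue exist for this work?
2. CREATE BRANCH: git checkout -b {issue-number}-{description}
3. IMPLEMENT: Make your changes
4. COMMIT & PR with issue-number prefix"""

def is_question(prompt: str) -> bool:
    p = prompt.strip().lower()
    first = p.split()[0] if p.split() else ""
    return p.endswith("?") or first in ("what", "why", "how") \
        or any(w in p for w in ("explain", "describe", "list"))

def is_work_request(prompt: str) -> bool:
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return bool(words & WORK_VERBS) and not is_question(prompt)

if __name__ == "__main__":
    if not PROTOCOLS_TOGGLE.is_symlink():          # toggle: no symlink, no-op
        sys.exit(0)
    prompt = json.load(sys.stdin).get("prompt", "")  # field name assumed for this sketch
    if is_work_request(prompt):
        print(REMINDER)                              # injected into Claude's context
```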
enforce-git-workflow.py (PreToolUse:Bash)
Intercepts every git commit and git push command. It enforces three rules:
- No commits on main: Must use a feature branch. The hook detects the current branch and blocks if it's `main` or `master`.
- Commit message format: Must match `^\d+:` (an issue-number prefix like `42: Fix the login bug`). Parses the message from `-m` or `--message` flags. Skips validation for heredoc-style messages (can't parse them reliably), merge commits, and amend-without-new-message.
- No pushes to main: Detects if the push target resolves to main (handles `HEAD`, explicit `main`, no refspec).
There's an allowlist (DIRECT_TO_MAIN_REPOS) that exempts the dotfiles repo itself, and an emergency bypass via ALLOW_MAIN_COMMIT=1 environment variable.
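An abridged sketch of the two simplest checks follows; the real hook also parses the intercepted command, resolves push targets, and handles heredocs, merges, and amends:

```python
#!/usr/bin/env python3
"""Abridged sketch of enforce-git-workflow.py's branch and message checks."""
import os
import re
import subprocess
import sys

DIRECT_TO_MAIN_REPOS = {"claude-dotfiles"}   # repos exempt from the rules
COMMIT_MSG_RE = re.compile(r"^\d+:")          # e.g. "42: Fix the login bug"

def current_branch() -> str:
    return subprocess.run(["git", "branch", "--show-current"],
                          capture_output=True, text=True).stdout.strip()

def repo_name() -> str:
    top = subprocess.run(["git", "rev-parse", "--show-toplevel"],
                         capture_output=True, text=True).stdout.strip()
    return os.path.basename(top)

def check_commit(message: str) -> str | None:
    """Return a block reason, or None to allow."""
    if os.environ.get("ALLOW_MAIN_COMMIT") == "1" or repo_name() in DIRECT_TO_MAIN_REPOS:
        return None
    if current_branch() in ("main", "master"):
        return "Commits on main are blocked - create a feature branch first."
    if not COMMIT_MSG_RE.match(message):
        return "Commit message must start with an issue-number prefix, e.g. '42: ...'"
    return None

if __name__ == "__main__":
    # Illustrative only: the message is really parsed from the intercepted
    # `git commit -m ...` command that the PreToolUse hook receives.
    reason = check_commit(sys.argv[1] if len(sys.argv) > 1 else "")
    if reason:
        print(reason, file=sys.stderr)
        sys.exit(2)   # nonzero exit signals "block" in this sketch
```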
auto-approve-bash.py (PreToolUse:Bash)
The workaround for the settings.json permission bug. This hook reads all Bash(pattern) entries from ~/.claude/settings.json at runtime. For each incoming Bash command, it checks the deny list first (higher priority), then the allow list. Pattern matching handles :* suffix (prefix match), * suffix (prefix match), and startswith fallback.
If a command matches a deny pattern, it returns permissionDecision: deny with a user-readable reason. If it matches an allow pattern, it returns permissionDecision: allow. If no match, it exits silently and falls through to Claude's default permission system.
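The matching logic is the interesting part, so here's a simplified sketch of it, assuming the permissions.allow / permissions.deny layout in settings.json and leaving out the hook's real JSON response format:

```python
#!/usr/bin/env python3
"""Sketch of auto-approve-bash.py's matching: deny wins, then allow,
otherwise fall through to the default permission flow."""
import json
import re
from pathlib import Path

SETTINGS = Path.home() / ".claude" / "settings.json"

def load_patterns(kind: str) -> list[str]:
    """Extract the inner pattern from entries like 'Bash(git push --force:*)'.
    Assumes the permissions.allow / permissions.deny layout."""
    perms = json.loads(SETTINGS.read_text()).get("permissions", {})
    patterns = []
    for entry in perms.get(kind, []):
        m = re.fullmatch(r"Bash\((.*)\)", entry)
        if m:
            patterns.append(m.group(1))
    return patterns

def matches(command: str, pattern: str) -> bool:
    if pattern.endswith(":*"):
        return command.startswith(pattern[:-2])
    if pattern.endswith("*"):
        return command.startswith(pattern[:-1])
    return command.startswith(pattern)

def decide(command: str) -> str | None:
    """'deny', 'allow', or None (defer to Claude's default permission system)."""
    if any(matches(command, p) for p in load_patterns("deny")):
        return "deny"
    if any(matches(command, p) for p in load_patterns("allow")):
        return "allow"
    return None

if __name__ == "__main__":
    print(decide("git push --force origin main"))   # deny
    print(decide("git status"))                      # allow, if allow-listed
```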
auto-approve-file-ops.py (PreToolUse:Read/Edit/Write)
Same rationale as the Bash hook, same bug. Loads Read(...), Edit(...), Write(...) path patterns from settings. Normalizes paths, handles ** glob as prefix match, falls back to fnmatch. Currently auto-approves all file operations within ~/code/**, ~/.claude/**, and /tmp/**.
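The path-matching helper is small enough to sketch in full (simplified, and the pattern list is illustrative):

```python
"""Sketch of the path matching in auto-approve-file-ops.py: normalize the path,
treat a trailing '**' glob as a prefix match, else fall back to fnmatch."""
import fnmatch
import os

def path_allowed(path: str, patterns: list[str]) -> bool:
    p = os.path.realpath(os.path.expanduser(path))
    for pat in patterns:
        pat = os.path.expanduser(pat)
        if pat.endswith("**") and p.startswith(pat[:-2].rstrip("/")):
            return True
        if fnmatch.fnmatch(p, pat):
            return True
    return False

# e.g. path_allowed("~/code/lem-photo/src/app.ts", ["~/code/**", "~/.claude/**", "/tmp/**"]) -> True
```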
Slash Commands
Nine custom slash commands standardize the most common development workflows. Each is a markdown file in commands/ with frontmatter specifying description, allowed-tools, and optional argument-hint. They use ! prefix syntax to execute shell commands inline at invocation time.
/startup (Session Initialization)
The most complex command at ~250 lines. It runs at the beginning of every Claude Code session and bootstraps the agent's context:
Section 0: Checks if the current directory is a git repo. If not, offers alternatives.
Section 1 (Session Logging): Pulls the latest logs from the centralized log repo. Derives agent identity from the working directory suffix (lem-work-2 -> agent-2). Checks for today's log file. If the log repo has uncommitted changes older than 60 minutes, auto-commits and pushes them.
Section 2 (Git Status): Current branch, ahead/behind main, working directory cleanliness.
Section 3 (Open PRs): Lists all open pull requests for the current repo.
Section 4 (Open Issues): Lists issues by status: all open, in-progress, in-review. This gives the agent a clear picture of what work is available.
Section 5 (Dependencies): Checks if package.json exists, whether lock files are present, and if dependencies might be out of date relative to main.
After running all sections, it presents the agent with a prioritized recommendation of what to work on next.
/commit
Stages changes, runs lint/build verification, and commits with the enforced {issue_number}: {description} format. Reports the commit hash on success.
/pr
The full PR workflow: pre-flight cleanliness check, lint/build/test verification, rebase on origin/main, push with -u, check for PR template files in two locations (repo root and .github/), create the PR with gh pr create, update the issue label from in-progress to in-review, and report the PR URL.
/sync
Fetch origin and rebase the current branch on main. Handles uncommitted changes (offers to stash, commit, or abort). Reports any conflicts. Reminds about --force-with-lease for already-pushed branches.
/new-issue
Creates GitHub issues with the proper label taxonomy. Gathers title, type, and description. Maps types to labels (feature -> enhancement, bug -> bug). Supports a special human-agent type for tasks requiring manual intervention (env vars, account setup, etc.), which auto-assigns to the repo owner.
/gs
Quick git status overview: branch name, remote tracking, working directory status, sync status relative to main, last 5 commits, and open PRs. Summarizes the state and recommends the next action.
/promote-rule
Analyzes the current repo's CLAUDE.md for rules that should be promoted to the global config. Checks for explicit <!-- CANDIDATE:GLOBAL --> markers, detects implicit candidates (rules that are repo-agnostic), reads 2-3 other CLAUDE.md files to identify patterns, checks against the global config for duplicates, and presents findings in a table with a recommendation.
/dotsync
Runs sync-config.sh --dry first for a preview of what will change, then runs the actual sync. Reports what was updated and whether the changes were committed and pushed.
/walkthrough
Activates step-by-step guided mode for complex tasks. Identifies the task, breaks it into discrete steps, and presents one step at a time with a progress counter (Step N/Total). Waits for the user to confirm completion before proceeding to the next step. Never skips ahead.
The /audit Skill
Beyond slash commands, I've built a comprehensive codebase audit system as a Claude Code skill. The /audit command is a 7-phase self-healing system that launches 8 parallel agents, auto-fixes what it can, and creates GitHub issues for what needs human review.
Phase 1: Pre-Flight
In fix mode (the default), it verifies the working directory is clean and creates a checkpoint branch (audit-checkpoint-YYYYMMDD-HHMMSS) for rollback. If there are uncommitted changes, it stops and offers options.
Phase 2: Discovery
Detects monorepo structure (checks for apps/, packages/, workspaces, pnpm-workspace.yaml), identifies the tech stack (TypeScript, React, package manager), reads existing CLAUDE.md and ESLint configs, and determines the audit scope.
Phase 3: Parallel Audit (8 Agents)
Eight Task agents launch simultaneously, each covering a different category:
| Agent | Focus | Auto-fixable Examples |
|---|---|---|
| Security | Hardcoded secrets, SQL injection, XSS, auth issues | Remove console.logs with sensitive data |
| Dependencies | npm vulnerabilities, outdated packages, unused deps | npm audit fix, npm uninstall unused |
| Code Quality | ESLint violations, unused vars, long methods, empty catches | eslint --fix, remove unused imports |
| Architecture | Circular deps, god objects, layering violations | Limited (mostly human review) |
| TypeScript/React | Excessive any, hooks violations, Fast Refresh issues | Add inferred types, add missing keys |
| Testing | Missing test files, tests without assertions | Generate test stubs |
| Documentation | Missing JSDoc, stale comments, README gaps | Generate JSDoc from types |
| Performance | N+1 queries, missing React.memo, large imports | Add memo wrappers |
Each agent reports structured JSON with severity, file/line, description, auto-fixability status, and fix confidence level.
Phase 4: Classify
All findings are collected, deduplicated, and split into two queues: auto_fix_queue (high-confidence fixable) and human_review_queue (everything else).
Phase 5: Fix Cycle
For each auto-fixable finding, a Task agent implements the fix. Then the verification suite runs (lint, type-check, tests). If all pass, the fix is committed as an atomic commit (audit: {category} - {brief title}). If any verification fails, the fix is reverted with git checkout -- . and moved to the human review queue.
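In rough Python, the classify-and-fix loop looks like this. The finding fields and the apply_fix/verify helpers are my own illustrative names; the real skill delegates fixes to Task agents and runs whatever lint/type-check/test commands the project defines:

```python
"""Simplified sketch of the classify-and-fix loop. Finding fields and the
apply_fix/verify helpers are illustrative; the real skill delegates fixes to
Task agents and uses the project's own verification commands."""
import subprocess

def verify() -> bool:
    """Run the verification suite; every command must pass (commands vary by project)."""
    for cmd in (["npm", "run", "lint"], ["npm", "run", "type-check"], ["npm", "test"]):
        if subprocess.run(cmd).returncode != 0:
            return False
    return True

def run_fix_cycle(findings: list[dict], apply_fix) -> list[dict]:
    auto_fix_queue = [f for f in findings if f.get("auto_fixable") and f.get("confidence") == "high"]
    human_review_queue = [f for f in findings if f not in auto_fix_queue]

    for finding in auto_fix_queue:
        apply_fix(finding)                                   # in reality: a Task agent edits files
        if verify():
            subprocess.run(["git", "add", "-A"])
            subprocess.run(["git", "commit", "-m",
                            f"audit: {finding['category']} - {finding['title']}"])
        else:
            subprocess.run(["git", "checkout", "--", "."])   # revert the failed fix
            human_review_queue.append(finding)               # escalate to Phase 7 issues

    return human_review_queue
```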
Phase 6: Summary
Displays results in a table by category and severity, with fix results (successfully fixed / failed verification / skipped for human review), a list of all commits made, and rollback instructions pointing to the checkpoint branch.
Phase 7: Issue Creation
Optionally creates GitHub issues for findings that need human review. Labels them with audit and needs-human-review. Groups findings by category into single issues. Checks for existing audit issues to avoid duplicates on re-runs.
Reference Files
The skill includes five reference documents that the audit agents consult:
- security-patterns.md: OWASP Top 10 mapping, regex patterns for detecting API keys (OpenAI `sk-`, GitHub `ghp_`, AWS `AKIA`), injection patterns, auth weaknesses, insecure configurations, severity guidelines (sketched below)
- architecture.md: Clean Architecture patterns, god object detection thresholds (>1000 lines, >20 methods), circular dependency detection via `madge`, feature-based vertical slice recommendations
- code-quality.md: Martin Fowler refactoring patterns, cyclomatic complexity thresholds (>10 per function), cognitive complexity (>15), nesting depth (>4 levels), naming consistency
- fix-patterns.md: Implementation guides for each fix type with when-to/when-not-to guidance, verification commands by project type
- output-template.md: GitHub issue templates with severity-based sections, collapsible details for medium/low findings, rollback instructions
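For a flavor of what the security patterns look like in practice, here's an illustrative sketch built from the key prefixes above. These are simplified placeholders, not the actual contents of the reference file:

```python
"""Illustrative sketch of key-prefix regexes like those security-patterns.md
describes. Patterns are simplified for the example."""
import re

SECRET_PATTERNS = {
    "openai_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "github_token":   re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_line(line: str) -> list[str]:
    """Return the names of any secret patterns found in a line of source."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(line)]

# e.g. scan_line('const key = "AKIAABCDEFGHIJKLMNOP"') -> ["aws_access_key"]
```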
MCP Servers
Seven MCP (Model Context Protocol) servers are configured globally, giving Claude Code direct access to external services:
| Server | Purpose | Details |
|---|---|---|
| GitHub | Issue/PR management | Uses GITHUB_TOKEN from environment |
| Filesystem | Local file access | Scoped to ~/code |
| Memory | Persistent context | Cross-session knowledge retention |
| Fetch | Web content retrieval | HTTP fetching for documentation, APIs |
| Supabase (lem-work) | Database access | Direct SQL, migrations, logs for lem-work project |
| Supabase (lem-photo) | Database access | Direct SQL, migrations, logs for lem-photo project |
| n8n | Workflow automation | Connected to cloud n8n instance for pipeline management |
The MCP config uses __HOME__ placeholders in the dotfiles repo and gets expanded to real paths by setup.sh. This means the same config template works on any machine without modification.
Having Supabase as an MCP server is particularly valuable for debugging. Instead of using browser automation to diagnose database issues (slow, unreliable), I can run SQL directly, check logs, and apply migrations from within Claude Code. The debugging priority I've established is: Supabase MCP first for data issues, server logs for API issues, browser automation only as a last resort for UI issues.
Plugins
Fourteen Claude Code plugins extend the base functionality. Grouped by function:
Code Quality
- code-review: Multi-agent PR review that examines code changes across multiple dimensions
- pr-review-toolkit: Comprehensive review agents including a silent-failure hunter, code simplifier, comment analyzer, test analyzer, and type design analyzer
- security-guidance: Real-time security checks on file edits as they happen
Development
- feature-dev: Guided feature development with codebase analysis and architecture-focused planning
- frontend-design: Production-grade UI generation that avoids generic AI aesthetics
- typescript-lsp: TypeScript language server integration for type-aware code intelligence
- serena: Semantic code analysis and understanding
Integration
- github: GitHub platform integration (issues, PRs, branches, releases)
- supabase: Supabase project management and database operations
- playwright: Headless browser automation for E2E testing and screenshots
- figma: Figma design tool integration for implementing designs from Figma files
- greptile: Deep codebase search and understanding
Workflow
- hookify: Create custom Claude Code hooks from conversation analysis
- explanatory-output-style: Educational/explanatory output mode that provides insights about implementation choices
All plugins are pre-authorized in the permissions system. If a plugin is installed, the user has already decided to grant access. No per-operation permission prompts.
Multi-Agent Orchestration
This is the core of the system. Four Claude Code agents work in parallel on the same codebase without file conflicts, git collisions, or duplicated work.
Architecture
Each project that supports multi-agent work uses four independent clones:
~/code/{repo}-repos/
├── {repo}-0/ # Clone 0 (agent-0)
├── {repo}-1/ # Clone 1 (agent-1)
├── {repo}-2/ # Clone 2 (agent-2)
└── {repo}-3/ # Clone 3 (agent-3)
Each clone is a full, independent git repository with its own .git/ directory, branches, stash, reflog, and node_modules/. There are no shared resources between clones, which means:
- Full isolation: Each agent has its own git state, its own branches, its own stash. No cross-agent interference.
- Standard git workflows: Every git command works exactly as documented. No special rules or workarounds.
- Independent fetches: Each clone fetches on its own schedule. No shared object store to reason about.
Each agent runs in its own tmux pane, visible simultaneously. I manage all four from my phone via Blink terminal (iOS) through Mosh + Tailscale, which survives WiFi drops and device sleep.
The evolution from worktrees to clones
The original architecture used a bare git repo plus four worktrees. The theory was compelling: shared git object store, single fetch updates all worktrees, zero duplicated data. After several weeks of running this across four repos with four agents each, the theory fell apart.
What broke:
- Branch locking was the killer issue. Worktrees can't have the same branch checked out in two places, which broke standard `git checkout -b` workflows. The enforce-git-workflow hook and issue-workflow hook both assumed standard branch creation. Agents would create a branch, but commits would end up on the wrong branch because the worktree was locked to its parking branch.
- Shared stashes were a liability. Worktrees share the reflog and stash through the bare repo. An agent stashing changes in one worktree could interfere with another agent's stash pop.
- The "never checkout main" rule was confusing. Local main couldn't exist in bare-repo worktrees, so every agent and every hook had to use `origin/main` everywhere. This constantly tripped up agents and broke workflow assumptions.
- Storage savings were negligible. The bare repos were 2-9MB each, so duplicating across four clones adds at most 36MB total. `node_modules/` (the real disk consumer) was already per-worktree anyway.
- Single fetch was rarely useful. Agents fetch at different times. The shared-fetch advantage only matters if all agents need the same new commits simultaneously, which almost never happens.
- The simpler mental model wins. A clone is a clone. Every developer and every AI agent understands it. Worktrees are a git power feature that adds cognitive overhead for agents that already have enough to track.
The migration was straightforward: about 10 minutes per project. Delete the bare repo, clone four times, copy over .env files and node_modules/. The slightly higher disk usage is worth the dramatically simpler mental model.
Agent identity
Agent number is derived automatically from the working directory name suffix:
AGENT_NUM=$(basename "$PWD" | grep -oE '[0-9]+$' || echo "0")
| Directory | Agent |
|---|---|
| lem-work-0 or lem-work | agent-0 |
| lem-work-1 | agent-1 |
| lem-work-2 | agent-2 |
| lem-work-3 | agent-3 |
No configuration needed. The agent knows who it is from where it's running.
Issue claiming protocol
Before any agent writes a single line of code, it must claim the issue through a multi-check protocol:
1. Check GitHub labels: skip if any `agent-*` or `in-progress` label exists.
   `gh issue view {number} --json labels --jq '[.labels[].name]'`
2. Check sibling clone branches: skip if any clone already has a branch starting with `{issue-number}-`.
   `for dir in $(find ~/code/{repo}-repos/ -maxdepth 1 -name "{repo}-[0-9]*" -type d); do echo "$(basename $dir): $(git -C $dir branch --show-current)"; done`
3. Label the issue immediately (before creating a branch):
   `gh issue edit {number} --add-label "in-progress" --add-label "agent-{N}"`
4. Verify the labels applied before proceeding.
Both checks must be clear. If there's a conflict (two agents try to claim simultaneously), the human supervisor resolves it. In practice, this hasn't been an issue because the label check and branch check together create a reliable mutex.
Git workflow
Standard git workflows apply. No special rules needed:
- Branch from main: `git checkout main && git pull && git checkout -b {issue}-{desc}`, or directly `git checkout -b {issue}-{desc} origin/main`. Both work.
- After PR merge: Return to main with `git checkout main && git pull && git branch -d {old-branch}`.
- Stashes are per-clone: Each agent has its own stash. No cross-agent interference.
- npm install is per-clone: Each has its own `node_modules/`.
- Env files are per-clone: `.env` and `.env.local` must be copied individually.
Session Logging System
The coordination layer that makes multi-agent work possible. Without it, agents would duplicate work, lose context between sessions, and have no awareness of what other agents are doing.
Architecture
A centralized git repo (lem-agent-logs) where every agent writes structured entries:
~/code/lem-agent-logs/
├── biotechstonks/
│ └── 20260203/
│ ├── agent-0.md
│ └── agent-1.md
├── darkly-suite/
│ └── 20260216/
│ ├── agent-0.md
│ ├── agent-1.md
│ ├── agent-2.md
│ └── agent-3.md
├── gmail-darkly/
│ └── 20260212/
│ ├── agent-0.md
│ ├── agent-1.md
│ └── agent-2.md
└── ... (14 projects tracked)
Each agent writes exclusively to its own file. This eliminates merge conflicts entirely. When multiple agents push to the log repo simultaneously, git pull --rebase resolves cleanly because the files never overlap.
Mandatory log triggers
The log must be updated immediately at each of these events:
- After every git commit: Issue number, branch, what changed, decisions made, gotchas encountered
- After creating a PR: Issue number, PR URL, mark #in-review
- After PR merge: Mark #completed, commit and push the log repo
- After closing an issue: Resolution summary, mark #completed, commit and push
- Before context compaction: Current WIP state, uncommitted changes, next step (this preserves context when Claude's conversation gets too long and earlier messages are compressed)
Cross-agent awareness
At session start (/startup), every agent reads other agents' logs for today's date directory. This builds a "Cross-Agent Awareness" section that documents what each sibling is working on:
## Cross-Agent Awareness
- **Agent-1** (today): Working on issue #84, created PR #87. In-review.
- **Agent-2** (today): Idle on agent-2 branch.
- **Agent-3** (today): Working on issue #95 (bundle completion).
The agent uses this to avoid claiming issues that are already in flight, even if the GitHub labels haven't been updated yet.
Log file format
Real example from a session:
# agent-0 — 20260217 — darkly-suite
> Continued from previous session (20260216)
## Session Start
- **Time**: 2026-02-17 afternoon ET
- **Branch**: `main` (clean, up to date)
- **Previous session context**: Completed massive monorepo buildout...
## Work Log
### Issue #92 — Fix mini-panel white background in dark mode
- **Branch**: `92-fix-mini-panel-dark-mode` (from `origin/main`)
- **PR**: #93 — https://github.com/lucasmccomb/darkly-suite/pull/93
- **Merged**: PR #93 → `4f124a1` on main #completed
**Wrong fix (commit 1)**: Added settings-container class to mini-panel.
This broke layout because .settings-container has height: 100% and display: flex...
**Correct fix (commit 2)**: Added mini-panel to the CSS re-inversion rules.
**Lesson**: When working with filter-based dark mode, prefer CSS-only fixes over DOM class changes.
Key elements: continuation notes linking to previous sessions, issue/branch/PR references with URLs, status tags (#completed, #in-progress, #blocked), decision capture ("wrong fix / correct fix"), and lesson sections that preserve institutional knowledge across sessions.
Per-Project Configuration Layering
Claude Code loads all CLAUDE.md files in the directory hierarchy from root to working directory. This creates a three-tier inheritance system:
- Global (`~/.claude/CLAUDE.md`): Universal rules that apply everywhere. No AI attribution in commits, PR template usage, git workflow, session logging, code standards, security, MCP tool permissions.
- Workspace (`~/code/CLAUDE.md`): Describes the multi-project directory layout, repo shorthand references, git aliases.
- Project (`{repo}/CLAUDE.md`): Project-specific build commands, tech stack, unique behaviors.
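A toy illustration of the lookup this implies (not Claude Code's actual loader): walk from the root down to the working directory and collect every CLAUDE.md along the way, with the global file layered in separately:

```python
"""Toy illustration of the root-to-cwd loading order: every CLAUDE.md from the
filesystem root down to the working directory applies, most-specific last.
This is not Claude Code's actual loader, just the lookup it implies."""
from pathlib import Path

def claude_md_chain(cwd: str) -> list[Path]:
    d = Path(cwd).resolve()
    dirs = [*reversed(d.parents), d]          # root -> ... -> cwd
    return [p / "CLAUDE.md" for p in dirs if (p / "CLAUDE.md").is_file()]

# For ~/code/lem-photo this typically yields ~/code/CLAUDE.md (workspace) and
# ~/code/lem-photo/CLAUDE.md (project); ~/.claude/CLAUDE.md is the separate global layer.
```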
How projects specialize
| Project | Tech Stack | Unique Rules |
|---|---|---|
| lem-photo | React + Express + Supabase + Cloudflare + SwiftUI | "Build locally, deploy pre-built" server model (Render's free tier can't compile TS). Never use --no-verify for server pushes. CI disabled, replaced by 8-step pre-push hook. Migrations run via Supabase MCP directly. |
| gmail-darkly | Chrome Extension + TypeScript + React + Stripe | Never verify in browser via automation (extension requires manual chrome://extensions refresh). Dev mode bypasses Stripe payment gate. InboxSDK integration rules. gd- CSS prefix. |
| sheets-darkly | Chrome Extension + TypeScript + React + Stripe | Custom DOM injection (no InboxSDK). Waffle grid canvas handling (double-inversion technique). sd- CSS prefix. |
| docs-darkly | Chrome Extension + TypeScript + React + Stripe | Kix canvas tile handling for Google Docs. dd- CSS prefix. |
| darkly-suite | pnpm monorepo, 9 packages, 4 extensions | Default to dev mode (production build enables Stripe paywall locally). Auto-rebuild decision flow. Conflict detection via data-darkly-active attribute. Prefix resolution uses three strategies (CSS loader, React context, config injection). |
| biotechstonks | React + Express + Supabase + n8n | 4-agent n8n AI pipeline. JSONB tags with GIN indexes. Entity lazy-creation pattern. Finnhub stock API integration with rate limiting. |
| totomail | Tauri 2 + React + Rust + SQLite | Tauri IPC patterns. OAuth tokens in OS keychain. Rust toolchain (cargo check). |
| human-of-habit | React + Supabase + Tailwind + Radix | Vanilla CSS imports (not CSS modules). Migrations require human-agent issues (older pattern). |
| nadaproof | React + Supabase + Vite | CI enabled and mirrors pre-push hooks exactly. Monorepo with shared package built first. Security headers in render.yaml. |
Project files that say "global instructions are in ~/.claude/CLAUDE.md" are documenting what Claude Code already handles automatically. It's a human-readable reminder that these files intentionally contain only project-specific additions.
Quality Gates
Quality is enforced at multiple levels, from individual keystrokes to deployment.
Pre-commit (every commit)
A global git hook runs on every commit across all repos:
- gitleaks: Scans for accidentally committed secrets (API keys, tokens, passwords)
- lint-staged: Runs ESLint with `--fix` and TypeScript type-checking on staged files only
Pre-push (every push)
The most sophisticated gate. lem-photo's pre-push hook is 113 lines and mirrors the full CI pipeline:
- Lint client workspace
- Lint server workspace
- Type-check client
- Type-check server
- Run client tests
- Run server tests
- Build client
- Build server + verify `server/dist/` is committed and up-to-date
A rapid mode (SKIP_CHECKS=1 git push) skips steps 1-6 but still builds both workspaces and verifies the server dist. The hook uses colored output and timing for each step.
Other projects adapt this pattern to their needs. nadaproof builds its shared package first, then runs the standard lint/type-check/test/build sequence. The darkly extensions run lint, type-check, tests, and webpack build.
CI/CD (GitHub Actions)
Where enabled, CI runs on PR events and push-to-main. The pipeline typically mirrors the pre-push hook exactly, ensuring that local and CI checks are identical. Some projects (lem-photo) have disabled CI to save GitHub Actions minutes and rely entirely on the pre-push hook.
Automated accessibility testing
All darkly extensions include WCAG AA contrast ratio tests that run programmatically across every theme preset and color pairing. The test computes relative luminance using the WCAG 2.1 formula and asserts >= 4.5:1 for body text and >= 3:1 for UI components. This catches accessibility regressions automatically across 78+ theme/color combinations.
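The tests themselves live in the extension packages; the underlying WCAG 2.1 math is simple enough to sketch here in Python:

```python
"""The WCAG 2.1 contrast math the theme tests rely on, sketched in Python
(the actual tests are part of the extension test suites)."""

def channel(c: int) -> float:
    c = c / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# White on black is the maximum possible contrast, 21:1.
assert abs(contrast_ratio((255, 255, 255), (0, 0, 0)) - 21.0) < 1e-9
# Body text must hit >= 4.5:1, UI components >= 3:1 for WCAG AA.
```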
Coverage thresholds
Test coverage is enforced at the framework level. lem-photo requires 60% coverage on the client and 70% on the server. Vitest fails the run if thresholds aren't met, which means the pre-push hook blocks the push.
Real Results
darkly-suite: Full monorepo in 2 days
Four agents working in parallel built an entire pnpm monorepo from scratch:
- 9 packages (shared core, 3 site packages, 4 extension packages, landing page)
- 4 Chrome extensions (Gmail, Sheets, Docs, bundle)
- Self-hosted payment system (Cloudflare Pages Functions + D1 + Stripe)
- 23 PRs merged across 2 days
- All tests passing, all extensions functional
The git log of the log repo shows the real-time coordination: four agents committing to the shared log repo within seconds of each other, each working on a different package, with git pull --rebase resolving cleanly because each agent writes to its own file.
Day 1 parallel streams:
- Agent-0: Scaffolded monorepo, created all 61 GitHub issues, coordinated merges
- Agent-1: Built site packages and individual extensions
- Agent-2: Built CSS prefix loader, webpack factory, bundle extension, CI
- Agent-3: Built landing page, payment APIs, D1 schema, marketing pages
gmail-darkly: 50 issues in a sprint
50 agent issues closed across 7 PRs, with 120+ tests passing. 18 PRs merged on day 1 alone. The extension shipped with self-hosted Stripe payments (no SDK, raw crypto.subtle HMAC-SHA256 for webhook verification), WCAG-validated themes, InboxSDK integration, and a full admin portal.
lem-photo: 2,267+ tests
The most tested project in the portfolio:
- ~1,388 client tests (Vitest + React Testing Library + MSW in strict mode)
- ~764 server tests (Vitest + Supertest)
- ~89 Playwright E2E tests with programmatic auth (two users: regular + admin)
- 26 Python pytest tests for the facial recognition microservice
- Coverage thresholds enforced (60% client, 70% server)
The Playwright setup authenticates via Supabase REST API directly (not through browser UI), serializes session state as JSON, and injects it into browser contexts via custom fixtures. This makes E2E tests deterministic and headless-friendly.
biotechstonks: AI pipeline processing 17 RSS feeds
A 4-agent n8n workflow running daily at noon:
- 4 Claude Opus instances processing different RSS feed categories in parallel (Reddit, News/PR Newswire, BioSpace, GlobeNewswire)
- 17 RSS feed tools across the 4 agents
- Structured output parsing (companies, topics, sectors, action classification)
- Deduplication via
source_urlunique constraint + upsert RPC - JSONB tags with GIN indexes for flexible querying
How This Scales
The system I've built for personal use maps directly to a team environment.
The consultant starter kit
The claude-dotfiles repo becomes a team template. Fork it, swap in client-specific values (GitHub org, Supabase project, n8n instance), run setup.sh, and every developer has identical enforcement hooks, quality gates, slash commands, and MCP connections from day one. No manual configuration of individual repos.
Boilerplate project templates
A library of pre-configured project templates, each with CI/CD pipelines, test frameworks, pre-push hooks, and CLAUDE.md project instructions already baked in. A new client engagement starts with picking the right template (React + Express, Chrome Extension, n8n pipeline, etc.), spinning up the repo, and the quality infrastructure is already in place.
Multi-consultant coordination
The session logging system scales naturally. If three consultants are working on the same client project, they each write to their own log file in the same date directory. At session start, they read each other's logs for awareness. The same coordination layer that prevents my four personal agents from stepping on each other works across a team.
This could be adapted to integrate with external tools. An MCP server configured to update Jira tasks or Confluence docs as work progresses would give non-technical stakeholders visibility into what's happening without requiring them to read git logs.
Shared infrastructure
For MCP tooling, shared remote servers (GitHub, Supabase, n8n) rather than local instances. Onboarding a new team member is just connecting to existing infrastructure rather than rebuilding it from scratch.
The enforcement hooks as governance
The hooks ensure that AI agents follow the same workflow rules as human developers. No special treatment, no shortcuts. Every commit references an issue. Every push passes the full verification suite. Every destructive operation is blocked. This is the kind of policy layer that enterprises need when adopting AI in their development workflows.
Closing Thoughts
The most interesting thing about this system isn't any individual component. It's how they compose. The dotfiles repo makes the system reproducible. The hooks make it self-enforcing. The logging makes it observable. The clones make it parallel. And the per-project CLAUDE.md files make it adaptable.
A solo developer with four AI agents and the right infrastructure can produce output that would typically require a small engineering team. But the productivity multiplier isn't just about speed. It's about maintaining quality at scale through automated enforcement.
The system works not because the AI models are perfect, but because the infrastructure around them is designed to catch mistakes, prevent conflicts, and preserve context. That's the real lesson: AI enablement isn't about the models. It's about the workflow you design around them.