Building a Multi-Agent AI Development System with Claude Code

I've spent the past several months building out what I'd call a production-grade AI-enabled software engineering workflow. Not just using Claude Code as a coding assistant, but designing an entire infrastructure layer around it: multi-agent orchestration, automated quality gates, session-level coordination, and a reproducible configuration system that deploys to a new machine in one command.

This post is a comprehensive walkthrough of what I've built, how it works, and what I've learned.


The Foundation: claude-dotfiles

Everything starts with a single GitHub repository: claude-dotfiles. This repo is the single source of truth for my entire Claude Code development workflow. It contains global instructions, enforcement hooks, slash commands, MCP server configs, plugin settings, and workflow documentation, all version-controlled and deployable to any machine.

What's in the repo

claude-dotfiles/
├── CLAUDE.md                    # Global instructions (~627 lines)
├── settings.json                # Hooks, permissions, plugins (template)
├── mcp.json                     # MCP server configs (template)
├── .claudeignore                # Template ignore file for repos
├── github-repo-protocols.md     # Full repo lifecycle guide (16KB)
├── multi-agent-system.md        # Multi-agent coordination
├── log-system.md                # Session logging documentation
├── commands/                    # 9 slash commands
│   ├── startup.md
│   ├── commit.md
│   ├── pr.md
│   ├── sync.md
│   ├── new-issue.md
│   ├── gs.md
│   ├── promote-rule.md
│   ├── dotsync.md
│   └── walkthrough.md
├── hooks/                       # 4 Python enforcement hooks
│   ├── auto-approve-bash.py
│   ├── auto-approve-file-ops.py
│   ├── enforce-git-workflow.py
│   └── enforce-issue-workflow.py
├── log/                         # Log analysis utilities
├── setup.sh                     # First-time setup script
└── sync-config.sh               # Reverse-sync config changes

One-command deployment

The setup.sh script handles everything:

  1. Creates ~/.claude/ if it doesn't exist
  2. Backs up any existing settings.json
  3. Creates symlinks for files that don't need path templating (CLAUDE.md, commands/, hooks/, docs)
  4. Generates settings.json and mcp.json from templates using sed "s|__HOME__|$HOME|g" to expand real paths
  5. Optionally enables GitHub Repo Protocols (toggled by symlink presence)
  6. Adds GITHUB_TOKEN to ~/.zshrc if missing
  7. Checks prerequisites (gh CLI, TypeScript LSP)

The key design decision here is the split between symlinked files and generated files. Files like CLAUDE.md and commands/ are identical across machines, so they're symlinked directly. But settings.json and mcp.json contain absolute paths (/Users/lem/code/...), so they use __HOME__ placeholders in the repo and get generated with real paths at setup time.
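
To make the generation step concrete, here's a minimal Python sketch of the same __HOME__ expansion that setup.sh performs with sed; the helper function is illustrative, not part of the repo:

from pathlib import Path

def render_template(template_path: Path, output_path: Path) -> None:
    # Expand the __HOME__ placeholder into the real home directory,
    # mirroring setup.sh's sed "s|__HOME__|$HOME|g".
    text = template_path.read_text()
    output_path.write_text(text.replace("__HOME__", str(Path.home())))

render_template(Path("settings.json"), Path.home() / ".claude" / "settings.json")
render_template(Path("mcp.json"), Path.home() / ".claude" / "mcp.json")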

Reverse sync

When I modify settings locally (add a new permission, configure a new hook), I run /dotsync, which calls sync-config.sh. This script reads the live ~/.claude/settings.json, replaces the real home path back to __HOME__, compares to the repo version, writes if changed, and pushes to GitHub. On another machine, a git pull plus re-running setup.sh applies the updates.

Standardized .claudeignore

Every repo gets the same .claudeignore template. It excludes node_modules/, dist/, lock files, IDE configs, binary/media files, .env files, and Supabase temp directories. This keeps Claude's context window clean and focused on actual source code.


The Permission System

The settings.json file contains an exhaustive permission configuration with approximately 860 allow entries and 13 deny entries.

The deny list

The deny list is the safety layer. It blocks destructive operations at the tool level:

rm, rmdir                          # File deletion
git reset --hard                   # Destructive git resets
git push --force, git push -f      # Force pushes
git clean                          # Working tree cleaning
git branch -D                      # Force-delete branches
docker rm, docker rmi              # Container/image deletion
docker system prune                # Docker cleanup
kubectl delete                     # Kubernetes resource deletion
DROP, TRUNCATE, DELETE FROM        # Destructive SQL

The philosophy: AI agents should never be able to accidentally destroy work. These operations require explicit human confirmation.

The allow list

The allow list covers essentially every CLI tool I use: git, gh, npm/yarn/pnpm/bun, cargo/rustup, python/pip, docker, kubectl, terraform, supabase, wrangler, aws/gcloud/az, psql/mysql/redis-cli, ffmpeg, and hundreds more. The breadth eliminates permission prompts for legitimate development commands while the deny list catches the dangerous ones.

The bug that inspired the hooks

Claude Code has known bugs (GitHub issues #15921 and #13340) where the VSCode extension ignores settings.json permissions entirely, and piped commands bypass the allow list. I documented this in a bug report and built a workaround: Python hooks that re-implement the permission logic at the tool-use interception layer, where they're reliably enforced regardless of the client.


Enforcement Hooks

Four Python scripts intercept Claude Code's tool calls at different layers. They're the governance framework that ensures AI agents follow the same workflow rules as human developers.

enforce-issue-workflow.py (UserPromptSubmit)

Fires on every user prompt. It detects work-request verbs (update, fix, add, create, implement, build, change, modify, refactor, etc.) and filters out questions (prompts that start with what/why/how, end with a question mark, or contain explain/describe/list).

When a work request is detected, it injects a workflow reminder into Claude's context:

STOP - Before making ANY code or file changes, you MUST:
1. CHECK: Does a GitHub issue exist for this work?
2. CREATE BRANCH: git checkout -b {issue-number}-{description}
3. IMPLEMENT: Make your changes
4. COMMIT & PR with issue-number prefix

This hook is toggled by the presence of ~/.claude/github-repo-protocols.md as a symlink. Remove the symlink and the hook becomes a no-op. This lets me disable the full issue-tracking workflow for quick experiments without modifying any code.
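
A condensed Python sketch of that detection logic follows. The input/output shapes (a JSON prompt field on stdin, stdout injected into context) and the word lists are assumptions for illustration, not the verbatim hook:

import json, re, sys
from pathlib import Path

WORK_VERBS = {"update", "fix", "add", "create", "implement", "build", "change", "modify", "refactor"}
QUESTION_WORDS = ("what", "why", "how")

def is_work_request(prompt: str) -> bool:
    p = prompt.strip().lower()
    if p.startswith(QUESTION_WORDS) or p.endswith("?"):
        return False  # questions never trigger the reminder
    if any(w in p for w in ("explain", "describe", "list")):
        return False
    return any(re.search(rf"\b{v}\b", p) for v in WORK_VERBS)

def main() -> None:
    # Toggle: if the protocols symlink is absent, the hook is a no-op.
    if not (Path.home() / ".claude" / "github-repo-protocols.md").is_symlink():
        return
    prompt = json.load(sys.stdin).get("prompt", "")
    if is_work_request(prompt):
        # Assumed behavior: stdout from a UserPromptSubmit hook is added to context.
        print("STOP - Before making ANY code or file changes: check for a GitHub issue, "
              "create a {issue-number}-{description} branch, implement, then commit and PR.")

main()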

enforce-git-workflow.py (PreToolUse:Bash)

Intercepts every git commit and git push command. It enforces three rules:

  1. No commits on main: Must use a feature branch. The hook detects the current branch and blocks if it's main or master.
  2. Commit message format: Must match ^\d+: (issue number prefix like 42: Fix the login bug). Parses the message from -m or --message flags. Skips validation for heredoc-style messages (can't parse them reliably), merge commits, and amend-without-new-message.
  3. No pushes to main: Detects if the push target resolves to main (handles HEAD, explicit main, no refspec).

There's an allowlist (DIRECT_TO_MAIN_REPOS) that exempts the dotfiles repo itself, and an emergency bypass via ALLOW_MAIN_COMMIT=1 environment variable.
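
In sketch form, the branch and commit-message checks look roughly like this (the stdin shape and the deny response mirror what the auto-approve hooks below emit; the push check is omitted and the message regex only handles simple double-quoted -m messages, matching the heredoc skip described above):

import json, os, re, subprocess, sys

COMMIT_MSG_RE = re.compile(r"^\d+: ")          # e.g. "42: Fix the login bug"
DIRECT_TO_MAIN_REPOS = {"claude-dotfiles"}     # repos exempt from the main-branch rule

def deny(reason: str) -> None:
    print(json.dumps({"hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": reason}}))
    sys.exit(0)

def main() -> None:
    command = json.load(sys.stdin).get("tool_input", {}).get("command", "")
    if not command.startswith("git commit"):
        return
    branch = subprocess.run(["git", "branch", "--show-current"],
                            capture_output=True, text=True).stdout.strip()
    repo = os.path.basename(os.getcwd())
    if branch in ("main", "master") and repo not in DIRECT_TO_MAIN_REPOS \
            and os.environ.get("ALLOW_MAIN_COMMIT") != "1":
        deny("Commits on main are blocked - create a feature branch first.")
    msg = re.search(r'(?:-m|--message)\s+"([^"]*)"', command)
    if msg and not COMMIT_MSG_RE.match(msg.group(1)):
        deny("Commit message must start with an issue number prefix like '42: ...'.")

main()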

auto-approve-bash.py (PreToolUse:Bash)

The workaround for the settings.json permission bug. This hook reads all Bash(pattern) entries from ~/.claude/settings.json at runtime. For each incoming Bash command, it checks the deny list first (higher priority), then the allow list. Pattern matching handles :* suffix (prefix match), * suffix (prefix match), and startswith fallback.

If a command matches a deny pattern, it returns permissionDecision: deny with a user-readable reason. If it matches an allow pattern, it returns permissionDecision: allow. If no match, it exits silently and falls through to Claude's default permission system.
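
A sketch of that matching logic, assuming settings.json exposes allow/deny arrays under a permissions key (the real file layout may differ):

import json, sys
from pathlib import Path

def load_patterns(kind: str) -> list[str]:
    settings = json.loads((Path.home() / ".claude" / "settings.json").read_text())
    entries = settings.get("permissions", {}).get(kind, [])
    # Keep only Bash(...) entries and strip the wrapper.
    return [e[5:-1] for e in entries if e.startswith("Bash(") and e.endswith(")")]

def matches(command: str, pattern: str) -> bool:
    if pattern.endswith(":*"):
        return command.startswith(pattern[:-2])   # "git push:*" -> prefix match
    if pattern.endswith("*"):
        return command.startswith(pattern[:-1])
    return command.startswith(pattern)            # startswith fallback

def decide(command: str) -> str | None:
    if any(matches(command, p) for p in load_patterns("deny")):
        return "deny"                             # deny always wins
    if any(matches(command, p) for p in load_patterns("allow")):
        return "allow"
    return None                                   # fall through to the default system

command = json.load(sys.stdin).get("tool_input", {}).get("command", "")
decision = decide(command)
if decision:
    print(json.dumps({"hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": decision,
        "permissionDecisionReason": f"Matched a {decision} pattern in settings.json"}}))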

auto-approve-file-ops.py (PreToolUse:Read/Edit/Write)

Same rationale as the Bash hook, same bug. Loads Read(...), Edit(...), Write(...) path patterns from settings. Normalizes paths, handles ** glob as prefix match, falls back to fnmatch. Currently auto-approves all file operations within ~/code/**, ~/.claude/**, and /tmp/**.


Slash Commands

Nine custom slash commands standardize the most common development workflows. Each is a markdown file in commands/ with frontmatter specifying description, allowed-tools, and optional argument-hint. They use ! prefix syntax to execute shell commands inline at invocation time.

/startup (Session Initialization)

The most complex command at ~250 lines. It runs at the beginning of every Claude Code session and bootstraps the agent's context:

Section 0: Checks if the current directory is a git repo. If not, offers alternatives.

Section 1 (Session Logging): Pulls the latest logs from the centralized log repo. Derives agent identity from the working directory suffix (lem-work-2 -> agent-2). Checks for today's log file. If the log repo has uncommitted changes older than 60 minutes, auto-commits and pushes them.

Section 2 (Git Status): Current branch, ahead/behind main, working directory cleanliness.

Section 3 (Open PRs): Lists all open pull requests for the current repo.

Section 4 (Open Issues): Lists issues by status: all open, in-progress, in-review. This gives the agent a clear picture of what work is available.

Section 5 (Dependencies): Checks if package.json exists, whether lock files are present, and if dependencies might be out of date relative to main.

After running all sections, it presents the agent with a prioritized recommendation of what to work on next.

/commit

Stages changes, runs lint/build verification, and commits with the enforced {issue_number}: {description} format. Reports the commit hash on success.

/pr

The full PR workflow: pre-flight cleanliness check, lint/build/test verification, rebase on origin/main, push with -u, check for PR template files in two locations (repo root and .github/), create the PR with gh pr create, update the issue label from in-progress to in-review, and report the PR URL.

/sync

Fetch origin and rebase the current branch on main. Handles uncommitted changes (offers to stash, commit, or abort). Reports any conflicts. Reminds about --force-with-lease for already-pushed branches.

/new-issue

Creates GitHub issues with the proper label taxonomy. Gathers title, type, and description. Maps types to labels (feature -> enhancement, bug -> bug). Supports a special human-agent type for tasks requiring manual intervention (env vars, account setup, etc.), which auto-assigns to the repo owner.

/gs

Quick git status overview: branch name, remote tracking, working directory status, sync status relative to main, last 5 commits, and open PRs. Summarizes the state and recommends the next action.

/promote-rule

Analyzes the current repo's CLAUDE.md for rules that should be promoted to the global config. Checks for explicit <!-- CANDIDATE:GLOBAL --> markers, detects implicit candidates (rules that are repo-agnostic), reads 2-3 other CLAUDE.md files to identify patterns, checks against the global config for duplicates, and presents findings in a table with a recommendation.

/dotsync

Runs sync-config.sh --dry first for a preview of what will change, then runs the actual sync. Reports what was updated and whether the changes were committed and pushed.

/walkthrough

Activates step-by-step guided mode for complex tasks. Identifies the task, breaks it into discrete steps, and presents one step at a time with a progress counter (Step N/Total). Waits for the user to confirm completion before proceeding to the next step. Never skips ahead.


The /audit Skill

Beyond slash commands, I've built a comprehensive codebase audit system as a Claude Code skill. The /audit command is a 7-phase self-healing system that launches 8 parallel agents, auto-fixes what it can, and creates GitHub issues for what needs human review.

Phase 1: Pre-Flight

In fix mode (the default), it verifies the working directory is clean and creates a checkpoint branch (audit-checkpoint-YYYYMMDD-HHMMSS) for rollback. If there are uncommitted changes, it stops and offers options.

Phase 2: Discovery

Detects monorepo structure (checks for apps/, packages/, workspaces, pnpm-workspace.yaml), identifies the tech stack (TypeScript, React, package manager), reads existing CLAUDE.md and ESLint configs, and determines the audit scope.

Phase 3: Parallel Audit (8 Agents)

Eight Task agents launch simultaneously, each covering a different category:

  • Security: hardcoded secrets, SQL injection, XSS, auth issues. Auto-fix examples: remove console.logs containing sensitive data.
  • Dependencies: npm vulnerabilities, outdated packages, unused deps. Auto-fix examples: npm audit fix, npm uninstall unused packages.
  • Code Quality: ESLint violations, unused vars, long methods, empty catches. Auto-fix examples: eslint --fix, remove unused imports.
  • Architecture: circular deps, god objects, layering violations. Auto-fix: limited (mostly human review).
  • TypeScript/React: excessive any, hooks violations, Fast Refresh issues. Auto-fix examples: add inferred types, add missing keys.
  • Testing: missing test files, tests without assertions. Auto-fix examples: generate test stubs.
  • Documentation: missing JSDoc, stale comments, README gaps. Auto-fix examples: generate JSDoc from types.
  • Performance: N+1 queries, missing React.memo, large imports. Auto-fix examples: add memo wrappers.

Each agent reports structured JSON with severity, file/line, description, auto-fixability status, and fix confidence level.
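
The exact schema isn't reproduced in this post, but a single finding looks roughly like this (all field names and values here are hypothetical):

finding = {
    "category": "security",
    "severity": "high",
    "file": "apps/web/src/api/client.ts",   # hypothetical path
    "line": 42,
    "title": "Hardcoded API key in source",
    "auto_fixable": True,
    "fix_confidence": "high",
}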

Phase 4: Classify

All findings are collected, deduplicated, and split into two queues: auto_fix_queue (high-confidence fixable) and human_review_queue (everything else).

Phase 5: Fix Cycle

For each auto-fixable finding, a Task agent implements the fix. Then the verification suite runs (lint, type-check, tests). If all pass, the fix is committed as an atomic commit (audit: {category} - {brief title}). If any verification fails, the fix is reverted with git checkout -- . and moved to the human review queue.
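
A minimal sketch of that loop, using a hypothetical apply_fix helper and generic npm verification commands (the real skill drives Task agents and project-specific verification):

import subprocess

VERIFY_STEPS = [["npm", "run", "lint"], ["npm", "run", "type-check"], ["npm", "test"]]

def verification_passes() -> bool:
    # Lint, type-check, and tests must all succeed for the fix to be kept.
    return all(subprocess.run(step).returncode == 0 for step in VERIFY_STEPS)

def process(finding: dict, apply_fix) -> str:
    apply_fix(finding)  # in the real skill, a Task agent implements the fix
    if verification_passes():
        subprocess.run(["git", "add", "-A"])
        subprocess.run(["git", "commit", "-m",
                        f"audit: {finding['category']} - {finding['title']}"])
        return "fixed"
    subprocess.run(["git", "checkout", "--", "."])  # revert and defer to human review
    return "needs-human-review"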

Phase 6: Summary

Displays results in a table by category and severity, with fix results (successfully fixed / failed verification / skipped for human review), a list of all commits made, and rollback instructions pointing to the checkpoint branch.

Phase 7: Issue Creation

Optionally creates GitHub issues for findings that need human review. Labels them with audit and needs-human-review. Groups findings by category into single issues. Checks for existing audit issues to avoid duplicates on re-runs.

Reference Files

The skill includes five reference documents that the audit agents consult:

  • security-patterns.md: OWASP Top 10 mapping, regex patterns for detecting API keys (OpenAI sk-, GitHub ghp_, AWS AKIA), injection patterns, auth weaknesses, insecure configurations, severity guidelines
  • architecture.md: Clean Architecture patterns, god object detection thresholds (>1000 lines, >20 methods), circular dependency detection via madge, feature-based vertical slice recommendations
  • code-quality.md: Martin Fowler refactoring patterns, cyclomatic complexity thresholds (>10/function), cognitive complexity (>15), nesting depth (>4 levels), naming consistency
  • fix-patterns.md: Implementation guides for each fix type with when-to/when-not-to guidance, verification commands by project type
  • output-template.md: GitHub issue templates with severity-based sections, collapsible details for medium/low findings, rollback instructions

MCP Servers

Seven MCP (Model Context Protocol) servers are configured globally, giving Claude Code direct access to external services:

  • GitHub: issue/PR management. Uses GITHUB_TOKEN from the environment.
  • Filesystem: local file access. Scoped to ~/code.
  • Memory: persistent context. Cross-session knowledge retention.
  • Fetch: web content retrieval. HTTP fetching for documentation and APIs.
  • Supabase (lem-work): database access. Direct SQL, migrations, and logs for the lem-work project.
  • Supabase (lem-photo): database access. Direct SQL, migrations, and logs for the lem-photo project.
  • n8n: workflow automation. Connected to a cloud n8n instance for pipeline management.

The MCP config uses __HOME__ placeholders in the dotfiles repo and gets expanded to real paths by setup.sh. This means the same config template works on any machine without modification.

Having Supabase as an MCP server is particularly valuable for debugging. Instead of using browser automation to diagnose database issues (slow, unreliable), I can run SQL directly, check logs, and apply migrations from within Claude Code. The debugging priority I've established is: Supabase MCP first for data issues, server logs for API issues, browser automation only as a last resort for UI issues.


Plugins

Fourteen Claude Code plugins extend the base functionality. Grouped by function:

Code Quality

  • code-review: Multi-agent PR review that examines code changes across multiple dimensions
  • pr-review-toolkit: Comprehensive review agents including a silent-failure hunter, code simplifier, comment analyzer, test analyzer, and type design analyzer
  • security-guidance: Real-time security checks on file edits as they happen

Development

  • feature-dev: Guided feature development with codebase analysis and architecture-focused planning
  • frontend-design: Production-grade UI generation that avoids generic AI aesthetics
  • typescript-lsp: TypeScript language server integration for type-aware code intelligence
  • serena: Semantic code analysis and understanding

Integration

  • github: GitHub platform integration (issues, PRs, branches, releases)
  • supabase: Supabase project management and database operations
  • playwright: Headless browser automation for E2E testing and screenshots
  • figma: Figma design tool integration for implementing designs from Figma files
  • greptile: Deep codebase search and understanding

Workflow

  • hookify: Create custom Claude Code hooks from conversation analysis
  • explanatory-output-style: Educational/explanatory output mode that provides insights about implementation choices

All plugins are pre-authorized in the permissions system. If a plugin is installed, the user has already decided to grant access. No per-operation permission prompts.


Multi-Agent Orchestration

This is the core of the system. Four Claude Code agents work in parallel on the same codebase without file conflicts, git collisions, or duplicated work.

Architecture

Each project that supports multi-agent work uses four independent clones:

~/code/{repo}-repos/
├── {repo}-0/           # Clone 0 (agent-0)
├── {repo}-1/           # Clone 1 (agent-1)
├── {repo}-2/           # Clone 2 (agent-2)
└── {repo}-3/           # Clone 3 (agent-3)

Each clone is a full, independent git repository with its own .git/ directory, branches, stash, reflog, and node_modules/. There are no shared resources between clones, which means:

  • Full isolation: Each agent has its own git state, its own branches, its own stash. No cross-agent interference.
  • Standard git workflows: Every git command works exactly as documented. No special rules or workarounds.
  • Independent fetches: Each clone fetches on its own schedule. No shared object store to reason about.

Each agent runs in its own tmux pane, visible simultaneously. I manage all four from my phone via Blink terminal (iOS) through Mosh + Tailscale, which survives WiFi drops and device sleep.

The evolution from worktrees to clones

The original architecture used a bare git repo plus four worktrees. The theory was compelling: shared git object store, single fetch updates all worktrees, zero duplicated data. After several weeks of running this across four repos with four agents each, the theory fell apart.

What broke:

  1. Branch locking was the killer issue. Worktrees can't have the same branch checked out in two places. This broke standard git checkout -b workflows. The enforce-git-workflow hook and issue-workflow hook both assumed standard branch creation. Agents would create a branch but commits would end up on the wrong branch because the worktree was locked to its parking branch.

  2. Shared stashes were a liability. Worktrees share the reflog and stash through the bare repo. An agent stashing changes in one worktree could interfere with another agent's stash pop.

  3. "Never checkout main" rule was confusing. Local main couldn't exist in bare repo worktrees. Every agent and every hook had to use origin/main everywhere. This constantly tripped up agents and broke workflow assumptions.

  4. Storage savings were negligible. The bare repos were 2-9MB each. Duplicating across four clones adds at most 36MB total. node_modules/ (the real disk consumer) was already per-worktree anyway.

  5. Single fetch was rarely useful. Agents fetch at different times. The shared-fetch advantage only matters if all agents need the same new commits simultaneously, which almost never happens.

  6. Simpler mental model wins. A clone is a clone. Every developer and every AI agent understands it. Worktrees are a git power feature that adds cognitive overhead for agents that already have enough to track.

The migration was straightforward: about 10 minutes per project. Delete the bare repo, clone four times, copy over .env files and node_modules/. The slightly higher disk usage is worth the dramatically simpler mental model.

Agent identity

Agent number is derived automatically from the working directory name suffix:

AGENT_NUM=$(basename "$PWD" | grep -oE '[0-9]+$' || echo "0")

  lem-work-0 (or lem-work)  ->  agent-0
  lem-work-1                ->  agent-1
  lem-work-2                ->  agent-2
  lem-work-3                ->  agent-3

No configuration needed. The agent knows who it is from where it's running.

Issue claiming protocol

Before any agent writes a single line of code, it must claim the issue through a multi-check protocol:

  1. Check GitHub labels: Skip if any agent-* or in-progress label exists

    gh issue view {number} --json labels --jq '[.labels[].name]'
  2. Check sibling clone branches: Skip if any clone already has a branch starting with {issue-number}-

    for dir in $(find ~/code/{repo}-repos/ -maxdepth 1 -name "{repo}-[0-9]*" -type d); do
      echo "$(basename $dir): $(git -C $dir branch --show-current)"
    done
  3. Label the issue immediately (before creating a branch):

    gh issue edit {number} --add-label "in-progress" --add-label "agent-{N}"
  4. Verify labels applied before proceeding

Both checks must come back clean. If there's a conflict (two agents try to claim simultaneously), the human supervisor resolves it. In practice, this hasn't been an issue because the label check and branch check together create a reliable mutex.
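
Taken together, the protocol behaves like a single claim function. Here's a Python sketch under the assumptions above (gh CLI available, clones under ~/code/{repo}-repos); it's illustrative, not the actual implementation:

import json, subprocess
from pathlib import Path

def claim(repo: str, issue: int, agent: int) -> bool:
    # Check 1: skip if any agent-* or in-progress label already exists.
    labels = json.loads(subprocess.run(
        ["gh", "issue", "view", str(issue), "--json", "labels", "--jq", "[.labels[].name]"],
        capture_output=True, text=True).stdout or "[]")
    if "in-progress" in labels or any(l.startswith("agent-") for l in labels):
        return False
    # Check 2: skip if any sibling clone already has a {issue}- branch checked out.
    for clone in Path.home().glob(f"code/{repo}-repos/{repo}-[0-9]*"):
        branch = subprocess.run(["git", "-C", str(clone), "branch", "--show-current"],
                                capture_output=True, text=True).stdout.strip()
        if branch.startswith(f"{issue}-"):
            return False
    # Claim: label the issue before creating a branch.
    subprocess.run(["gh", "issue", "edit", str(issue),
                    "--add-label", "in-progress", "--add-label", f"agent-{agent}"])
    return True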

Git workflow

Standard git workflows apply. No special rules needed:

  • Branch from main: git checkout main && git pull && git checkout -b {issue}-{desc} or directly git checkout -b {issue}-{desc} origin/main. Both work.
  • After PR merge: Return to main: git checkout main && git pull && git branch -d {old-branch}
  • Stashes are per-clone: Each agent has its own stash. No cross-agent interference.
  • npm install is per-clone: Each has its own node_modules/.
  • Env files are per-clone: .env and .env.local must be copied individually.

Session Logging System

The coordination layer that makes multi-agent work possible. Without it, agents would duplicate work, lose context between sessions, and have no awareness of what other agents are doing.

Architecture

A centralized git repo (lem-agent-logs) where every agent writes structured entries:

~/code/lem-agent-logs/
├── biotechstonks/
│   └── 20260203/
│       ├── agent-0.md
│       └── agent-1.md
├── darkly-suite/
│   └── 20260216/
│       ├── agent-0.md
│       ├── agent-1.md
│       ├── agent-2.md
│       └── agent-3.md
├── gmail-darkly/
│   └── 20260212/
│       ├── agent-0.md
│       ├── agent-1.md
│       └── agent-2.md
└── ... (14 projects tracked)

Each agent writes exclusively to its own file. This eliminates merge conflicts entirely. When multiple agents push to the log repo simultaneously, git pull --rebase resolves cleanly because the files never overlap.
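
A sketch of what a log update looks like as code, assuming the layout above (the entry format and commit message are simplified for illustration):

import subprocess
from datetime import date
from pathlib import Path

def append_log(project: str, agent: int, entry: str) -> None:
    repo = Path.home() / "code" / "lem-agent-logs"
    log_file = repo / project / date.today().strftime("%Y%m%d") / f"agent-{agent}.md"
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a") as f:
        f.write(entry + "\n")
    subprocess.run(["git", "-C", str(repo), "add", str(log_file)])
    subprocess.run(["git", "-C", str(repo), "commit", "-m", f"log: {project} agent-{agent}"])
    # Rebase resolves cleanly because each agent only ever touches its own file.
    subprocess.run(["git", "-C", str(repo), "pull", "--rebase"])
    subprocess.run(["git", "-C", str(repo), "push"])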

Mandatory log triggers

The log must be updated immediately at each of these events:

  1. After every git commit: Issue number, branch, what changed, decisions made, gotchas encountered
  2. After creating a PR: Issue number, PR URL, mark #in-review
  3. After PR merge: Mark #completed, commit and push the log repo
  4. After closing an issue: Resolution summary, mark #completed, commit and push
  5. Before context compaction: Current WIP state, uncommitted changes, next step (this preserves context when Claude's conversation gets too long and earlier messages are compressed)

Cross-agent awareness

At session start (/startup), every agent reads other agents' logs for today's date directory. This builds a "Cross-Agent Awareness" section that documents what each sibling is working on:

## Cross-Agent Awareness

- **Agent-1** (today): Working on issue #84, created PR #87. In-review.
- **Agent-2** (today): Idle on agent-2 branch.
- **Agent-3** (today): Working on issue #95 (bundle completion).

The agent uses this to avoid claiming issues that are already in flight, even if the GitHub labels haven't been updated yet.
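
A rough sketch of how that section could be assembled: read every sibling agent's log for today's date directory and summarize it. The real /startup step is driven by the command prompt rather than a standalone script, so the summarization here is purely illustrative:

from datetime import date
from pathlib import Path

def cross_agent_awareness(project: str, own_agent: int) -> str:
    day_dir = Path.home() / "code" / "lem-agent-logs" / project / date.today().strftime("%Y%m%d")
    lines = ["## Cross-Agent Awareness", ""]
    for log in sorted(day_dir.glob("agent-*.md")):
        if log.stem == f"agent-{own_agent}":
            continue  # skip our own log
        text = log.read_text().strip()
        summary = text.splitlines()[-1] if text else "No entries yet."
        lines.append(f"- **{log.stem.title()}** (today): {summary}")
    return "\n".join(lines)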

Log file format

Real example from a session:

# agent-0 — 20260217 — darkly-suite

> Continued from previous session (20260216)

## Session Start

- **Time**: 2026-02-17 afternoon ET
- **Branch**: `main` (clean, up to date)
- **Previous session context**: Completed massive monorepo buildout...

## Work Log

### Issue #92 — Fix mini-panel white background in dark mode

- **Branch**: `92-fix-mini-panel-dark-mode` (from `origin/main`)
- **PR**: #93 — https://github.com/lucasmccomb/darkly-suite/pull/93
- **Merged**: PR #93 → `4f124a1` on main #completed

**Wrong fix (commit 1)**: Added settings-container class to mini-panel.
This broke layout because .settings-container has height: 100% and display: flex...

**Correct fix (commit 2)**: Added mini-panel to the CSS re-inversion rules.

**Lesson**: When working with filter-based dark mode, prefer CSS-only fixes over DOM class changes.

Key elements: continuation notes linking to previous sessions, issue/branch/PR references with URLs, status tags (#completed, #in-progress, #blocked), decision capture ("wrong fix / correct fix"), and lesson sections that preserve institutional knowledge across sessions.


Per-Project Configuration Layering

Claude Code loads all CLAUDE.md files in the directory hierarchy from root to working directory. This creates a three-tier inheritance system:

  1. Global (~/.claude/CLAUDE.md): Universal rules that apply everywhere. No AI attribution in commits, PR template usage, git workflow, session logging, code standards, security, MCP tool permissions.

  2. Workspace (~/code/CLAUDE.md): Describes the multi-project directory layout, repo shorthand references, git aliases.

  3. Project ({repo}/CLAUDE.md): Project-specific build commands, tech stack, unique behaviors.

How projects specialize

  • lem-photo (React + Express + Supabase + Cloudflare + SwiftUI): "Build locally, deploy pre-built" server model (Render's free tier can't compile TS). Never use --no-verify for server pushes. CI disabled, replaced by an 8-step pre-push hook. Migrations run via Supabase MCP directly.
  • gmail-darkly (Chrome Extension + TypeScript + React + Stripe): Never verify in browser via automation (the extension requires a manual chrome://extensions refresh). Dev mode bypasses the Stripe payment gate. InboxSDK integration rules. gd- CSS prefix.
  • sheets-darkly (Chrome Extension + TypeScript + React + Stripe): Custom DOM injection (no InboxSDK). Waffle grid canvas handling (double-inversion technique). sd- CSS prefix.
  • docs-darkly (Chrome Extension + TypeScript + React + Stripe): Kix canvas tile handling for Google Docs. dd- CSS prefix.
  • darkly-suite (pnpm monorepo, 9 packages, 4 extensions): Default to dev mode (a production build enables the Stripe paywall locally). Auto-rebuild decision flow. Conflict detection via the data-darkly-active attribute. Prefix resolution uses three strategies (CSS loader, React context, config injection).
  • biotechstonks (React + Express + Supabase + n8n): 4-agent n8n AI pipeline. JSONB tags with GIN indexes. Entity lazy-creation pattern. Finnhub stock API integration with rate limiting.
  • totomail (Tauri 2 + React + Rust + SQLite): Tauri IPC patterns. OAuth tokens in the OS keychain. Rust toolchain (cargo check).
  • human-of-habit (React + Supabase + Tailwind + Radix): Vanilla CSS imports (not CSS modules). Migrations require human-agent issues (older pattern).
  • nadaproof (React + Supabase + Vite): CI enabled and mirrors the pre-push hooks exactly. Monorepo with the shared package built first. Security headers in render.yaml.

Project files that say "global instructions are in ~/.claude/CLAUDE.md" are documenting what Claude Code already handles automatically. It's a human-readable reminder that these files intentionally contain only project-specific additions.


Quality Gates

Quality is enforced at multiple levels, from individual keystrokes to deployment.

Pre-commit (every commit)

A global git hook runs on every commit across all repos:

  • gitleaks: Scans for accidentally committed secrets (API keys, tokens, passwords)
  • lint-staged: Runs ESLint with --fix and TypeScript type-checking on staged files only

Pre-push (every push)

The most sophisticated gate. lem-photo's pre-push hook is 113 lines and mirrors the full CI pipeline:

  1. Lint client workspace
  2. Lint server workspace
  3. Type-check client
  4. Type-check server
  5. Run client tests
  6. Run server tests
  7. Build client
  8. Build server + verify server/dist/ is committed and up-to-date

A rapid mode (SKIP_CHECKS=1 git push) skips steps 1-6 but still builds both workspaces and verifies the server dist. The hook uses colored output and timing for each step.

Other projects adapt this pattern to their needs. nadaproof builds its shared package first, then runs the standard lint/type-check/test/build sequence. The darkly extensions run lint, type-check, tests, and webpack build.

CI/CD (GitHub Actions)

Where enabled, CI runs on PR events and push-to-main. The pipeline typically mirrors the pre-push hook exactly, ensuring that local and CI checks are identical. Some projects (lem-photo) have disabled CI to save GitHub Actions minutes and rely entirely on the pre-push hook.

Automated accessibility testing

All darkly extensions include WCAG AA contrast ratio tests that run programmatically across every theme preset and color pairing. The test computes relative luminance using the WCAG 2.1 formula and asserts >= 4.5:1 for body text and >= 3:1 for UI components. This catches accessibility regressions automatically across 78+ theme/color combinations.
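
The underlying math is small enough to show in full. This is a standalone Python sketch of the WCAG 2.1 relative luminance and contrast ratio calculation that the tests assert against; the actual test suites live in the extensions themselves:

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    def channel(c: int) -> float:
        c = c / 255
        # sRGB linearization per WCAG 2.1
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Body text must hit 4.5:1, UI components 3:1.
assert contrast_ratio((255, 255, 255), (18, 18, 18)) >= 4.5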

Coverage thresholds

Test coverage is enforced at the framework level. lem-photo requires 60% coverage on the client and 70% on the server. Vitest fails the run if thresholds aren't met, which means the pre-push hook blocks the push.


Real Results

darkly-suite: Full monorepo in 2 days

Four agents working in parallel built an entire pnpm monorepo from scratch:

  • 9 packages (shared core, 3 site packages, 4 extension packages, landing page)
  • 4 Chrome extensions (Gmail, Sheets, Docs, bundle)
  • Self-hosted payment system (Cloudflare Pages Functions + D1 + Stripe)
  • 23 PRs merged across 2 days
  • All tests passing, all extensions functional

The git log of the log repo shows the real-time coordination: four agents committing to the shared log repo within seconds of each other, each working on a different package, with git pull --rebase resolving cleanly because each agent writes to its own file.

Day 1 parallel streams:

  • Agent-0: Scaffolded monorepo, created all 61 GitHub issues, coordinated merges
  • Agent-1: Built site packages and individual extensions
  • Agent-2: Built CSS prefix loader, webpack factory, bundle extension, CI
  • Agent-3: Built landing page, payment APIs, D1 schema, marketing pages

gmail-darkly: 50 issues in a sprint

50 agent issues closed across 7 PRs, with 120+ tests passing. 18 PRs merged on day 1 alone. The extension shipped with self-hosted Stripe payments (no SDK, raw crypto.subtle HMAC-SHA256 for webhook verification), WCAG-validated themes, InboxSDK integration, and a full admin portal.

lem-photo: 2,267+ tests

The most tested project in the portfolio:

  • ~1,388 client tests (Vitest + React Testing Library + MSW in strict mode)
  • ~764 server tests (Vitest + Supertest)
  • ~89 Playwright E2E tests with programmatic auth (two users: regular + admin)
  • 26 Python pytest tests for the facial recognition microservice
  • Coverage thresholds enforced (60% client, 70% server)

The Playwright setup authenticates via Supabase REST API directly (not through browser UI), serializes session state as JSON, and injects it into browser contexts via custom fixtures. This makes E2E tests deterministic and headless-friendly.

biotechstonks: AI pipeline processing 17 RSS feeds

A 4-agent n8n workflow running daily at noon:

  • 4 Claude Opus instances processing different RSS feed categories in parallel (Reddit, News/PR Newswire, BioSpace, GlobeNewswire)
  • 17 RSS feed tools across the 4 agents
  • Structured output parsing (companies, topics, sectors, action classification)
  • Deduplication via source_url unique constraint + upsert RPC
  • JSONB tags with GIN indexes for flexible querying

How This Scales

The system I've built for personal use maps directly to a team environment.

The consultant starter kit

The claude-dotfiles repo becomes a team template. Fork it, swap in client-specific values (GitHub org, Supabase project, n8n instance), run setup.sh, and every developer has identical enforcement hooks, quality gates, slash commands, and MCP connections from day one. No manual configuration of individual repos.

Boilerplate project templates

A library of pre-configured project templates, each with CI/CD pipelines, test frameworks, pre-push hooks, and CLAUDE.md project instructions already baked in. A new client engagement starts with picking the right template (React + Express, Chrome Extension, n8n pipeline, etc.), spinning up the repo, and the quality infrastructure is already in place.

Multi-consultant coordination

The session logging system scales naturally. If three consultants are working on the same client project, they each write to their own log file in the same date directory. At session start, they read each other's logs for awareness. The same coordination layer that prevents my four personal agents from stepping on each other works across a team.

This could be adapted to integrate with external tools. An MCP server configured to update Jira tasks or Confluence docs as work progresses would give non-technical stakeholders visibility into what's happening without requiring them to read git logs.

Shared infrastructure

MCP tooling points at shared remote servers (GitHub, Supabase, n8n) rather than local instances, so onboarding a new team member means connecting to existing infrastructure rather than rebuilding it from scratch.

The enforcement hooks as governance

The hooks ensure that AI agents follow the same workflow rules as human developers. No special treatment, no shortcuts. Every commit references an issue. Every push passes the full verification suite. Every destructive operation is blocked. This is the kind of policy layer that enterprises need when adopting AI in their development workflows.


Closing Thoughts

The most interesting thing about this system isn't any individual component. It's how they compose. The dotfiles repo makes the system reproducible. The hooks make it self-enforcing. The logging makes it observable. The clones make it parallel. And the per-project CLAUDE.md files make it adaptable.

A solo developer with four AI agents and the right infrastructure can produce output that would typically require a small engineering team. But the productivity multiplier isn't just about speed. It's about maintaining quality at scale through automated enforcement.

The system works not because the AI models are perfect, but because the infrastructure around them is designed to catch mistakes, prevent conflicts, and preserve context. That's the real lesson: AI enablement isn't about the models. It's about the workflow you design around them.