
# embedding-shapes: "One Human + One Agent = One Browser From Scratch in 20K LOC" (64 HN Points, 26 Comments)—Built HTML/CSS Browser in 3 Days (Zero Rust Dependencies) Proves One Human + One Agent Outperforms Hundreds of Autonomous Agents—Voice AI for Demos Follows Same Pattern: Human Defines Goal, Agent Executes Navigation

## Meta Description

embedding-shapes built an HTML/CSS browser from scratch in 20K LOC over 3 days using one human + one agent (Codex). Zero Rust dependencies, cross-platform (Windows/macOS/Linux). Proves one human directing one agent outperforms "hundreds of autonomous agents for weeks". Voice AI for demos uses the identical pattern: human asks question, agent executes navigation.

## Introduction: Human-Agent Collaboration vs Autonomous Agent Swarms

The anonymous author behind "embedding-shapes" (64 HN points, 26 comments, 6 hours ago on Hacker News) documented building a basic web browser from scratch in ~20,000 lines of Rust code over 3 days using "one human + one agent" collaboration with OpenAI Codex. The browser renders HTML/CSS (no JavaScript), runs on Windows, macOS, and Linux, and uses **zero third-party Rust dependencies**—only OS-provided libraries.

The project's core thesis: **"One human using one agent seems far more effective than one human using thousands of agents."**

Their key question: "If one person with one agent can produce equal or better results than 'hundreds of agents for weeks', then the answer to the question: 'Can we scale autonomous coding by throwing more agents at a problem?', probably has a more pessimistic answer than some expected."

This connects directly to **Voice AI for website demos**: the one-human-one-agent pattern (human defines requirements, agent executes implementation) mirrors Voice AI's workflow (user asks question, agent executes navigation). Both prove **directed agency outperforms autonomous agency** for constrained problems.
The core trade-off: **autonomous agents explore solution space broadly** (trying thousands of approaches in parallel), while **directed agents execute known paths efficiently** (the human provides high-level guidance, the agent handles low-level details). For **website demos**, autonomous exploration is counterproductive—there's one correct navigation path to the pricing page, not thousands of possible paths to explore. Voice AI uses directed agency: the user specifies a destination ("show me pricing"), and the agent finds the shortest navigation path.

This article explores:

1. **The one-human-one-agent workflow**: How embedding-shapes built a browser in 3 days with human oversight
2. **Why zero dependencies matter**: Compilation speed, security audits, deployment simplicity
3. **Directed agency vs autonomous swarms**: When human guidance beats parallel exploration
4. **Voice AI's identical pattern**: User questions direct agent navigation instead of autonomous website exploration
5. **The coordination overhead problem**: Why "hundreds of agents" create more coordination cost than execution value

## The One-Human-One-Agent Workflow: Building a Browser in 3 Days

### Day 1: Baseline Rendering (7,500 LOC)

embedding-shapes started with foundational requirements.

**Day 1 constraints:**

- Render "Hello World" and nested HTML tags
- Take screenshots (so the agent could verify rendering)
- Add HTML/CSS specifications (though the agent rarely consulted them)
- Implement regression/E2E tests (compare screenshots to baseline images)
- Add link clicking functionality
- Keep the codebase compilable at all times
- Split code across files (<1,000 lines per file)

**Day 1 result:** 7,500 lines of Rust rendering websites via X11 + cURL, with an empty Cargo.lock (zero dependencies).

**Human role:** Define requirements, specify architecture (X11 for Linux, no dependencies), verify screenshot output.

**Agent role (Codex):** Generate rendering code, implement HTML/CSS parsers, handle X11 system calls, write test scaffolding.
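The screenshot-based regression tests described in the Day 1 constraints could look roughly like the sketch below. This is a minimal illustration, not the project's actual harness; the `frames_match` helper, the tolerance value, and the raw-byte baseline format are all assumptions.

```rust
use std::fs;

/// Compare a freshly rendered frame against a stored baseline,
/// allowing a small per-channel tolerance so minor antialiasing
/// differences don't fail the test. (Hypothetical helper.)
fn frames_match(rendered: &[u8], baseline: &[u8], tolerance: u8) -> bool {
    rendered.len() == baseline.len()
        && rendered
            .iter()
            .zip(baseline)
            .all(|(a, b)| a.abs_diff(*b) <= tolerance)
}

fn main() {
    // In a real harness the rendered bytes would come from the
    // framebuffer after loading a test page; here both sides are faked
    // so the example runs standalone.
    let baseline = fs::read("tests/baselines/hello_world.raw")
        .unwrap_or_else(|_| vec![255u8; 12]); // demo fallback
    let rendered = vec![254u8; 12];

    assert!(frames_match(&rendered, &baseline, 2), "rendering drifted from baseline");
    println!("screenshot matches baseline");
}
```

A per-channel tolerance is a common design choice for screenshot tests: exact byte equality breaks whenever font rendering differs slightly between machines.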
The workflow: "Pick a website, share screenshot without JavaScript, ask Codex to replicate it following our instructions. Most of the time was the agent doing work by itself, and me checking in when it notifies me it was done."

### Day 2: Testing and Rendering Improvements

**Day 2 additions:**

- `--headless` flag (stop test windows from spawning during other work)
- Window resizing fixes
- Cross-browser compatibility fixes
- Performance optimizations
- Font/text rendering improvements
- More regression tests

**Human role:** Identify pain points (test windows interrupting work), specify new features (headless mode), validate rendering quality.

**Agent role:** Implement headless rendering, optimize layout algorithms, fix font rendering bugs, add test cases.

The pattern: the human experiences friction (test windows are annoying) → specifies a solution (headless mode) → the agent implements the fix.

### Day 3 (+ Day 4): Cross-Platform and Polish

**Day 3-4 additions:**

- Scrolling support ("this is a mother fucking browser, it has to be able to scroll")
- Debug logs (for demonstration videos)
- Back button (avoid restarting from scratch after wrong clicks)
- macOS support (window opening, tests passing)
- Windows support (same process, different platform)
- CI for all three platforms
- Release builds from CI

**Final stats:** ~20,150 lines of code across 72 files, ~70 hours elapsed time (first commit to last), zero third-party dependencies.

**Human role:** Define feature priority (scrolling more important than advanced CSS), validate cross-platform builds, approve the final release.

**Agent role:** Implement scrolling algorithms, add debug instrumentation, port X11 code to macOS/Windows equivalents, configure CI pipelines.

The escalation: Day 1 = single-platform basics. Day 2 = quality improvements. Days 3-4 = multi-platform deployment infrastructure.
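The Day 3 back button implies some form of navigation history. A minimal sketch of the idea, assuming a simple stack of visited URLs (the `History` type is hypothetical; the project's actual structure isn't shown in the post):

```rust
/// Minimal navigation history: enough to support a back button that
/// avoids restarting from scratch after a wrong click.
/// (Hypothetical sketch, not the project's real code.)
struct History {
    stack: Vec<String>, // visited URLs, current page on top
}

impl History {
    fn new(start: &str) -> Self {
        History { stack: vec![start.to_string()] }
    }

    /// Follow a link: the new URL becomes the current page.
    fn visit(&mut self, url: &str) {
        self.stack.push(url.to_string());
    }

    /// Back button: drop the current page unless we're at the start.
    fn back(&mut self) -> &str {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
        self.current()
    }

    fn current(&self) -> &str {
        self.stack.last().unwrap()
    }
}

fn main() {
    let mut h = History::new("https://example.com");
    h.visit("https://example.com/pricing");
    assert_eq!(h.back(), "https://example.com");
    println!("back button returns to {}", h.current());
}
```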
## Why Zero Dependencies Matter: Compilation Speed, Security, Deployment

### The Zero-Dependency Constraint

embedding-shapes' requirements explicitly forbade third-party Rust dependencies:

- **Allowed:** OS-provided libraries (X11, Cairo, Xft on Linux; Cocoa, CoreGraphics on macOS; Win32, Direct2D, DirectWrite on Windows)
- **Forbidden:** pulling in anything from crates.io
- **Result:** Cargo.lock is empty, and compilation finishes in seconds

**Why this matters for agent collaboration:**

1. **Fast iteration cycles**: Agent changes code → human runs `cargo build` → instant feedback. With dependencies (like Servo's 500+ crates), compilation takes minutes, slowing the human-agent feedback loop.
2. **Security audit simplicity**: 20K lines of project code vs 20K lines plus 2 million lines of dependencies. A human can audit the entire codebase in days; with dependencies, a full audit takes months.
3. **Deployment simplicity**: A single statically linked binary (~5MB) vs a binary plus a dependency tree. The browser runs on any Linux/macOS/Windows machine without installing a Rust toolchain or managing library versions.
4. **Agent focus**: The agent only maintains project code, not dependency compatibility. No "this crate broke our build because it upgraded tokio" debugging sessions.

### The Compilation Speed Advantage

From embedding-shapes' cloc output:

```
72 text files.
72 unique files.
 0 files ignored.

T=0.06 s (1172.5 files/s, 373824.0 lines/s)
```

cloc processes the entire 20K LOC codebase in **0.06 seconds** for line counting. Rust compilation (with the `--release` flag) likely finishes in **5-10 seconds**.
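For intuition on why the cloc pass above is so fast: counting lines is a trivial linear scan. A std-only sketch of the core operation (illustrative only; real cloc also classifies comment and blank lines per language):

```rust
/// cloc-style tally: non-empty lines in one file's source text.
/// (Illustrative sketch; real cloc does much more classification.)
fn count_nonempty(src: &str) -> usize {
    src.lines().filter(|line| !line.trim().is_empty()).count()
}

fn main() {
    // Scanning a 20K-line, 72-file tree this way takes milliseconds.
    // The slow step in an edit-compile loop is compilation, which an
    // empty Cargo.lock keeps down to seconds.
    let sample = "fn main() {\n\n    println!(\"hi\");\n}\n";
    println!("{} non-empty lines", count_nonempty(sample));
}
```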
Compare to typical Rust web projects:

- **Servo** (Firefox's experimental browser engine): 500+ dependencies, 10-15 minute clean builds
- **Rocket** (popular web framework): 200+ dependencies, 3-5 minute clean builds
- **Tokio-based apps**: 100+ dependencies, 2-3 minute clean builds

**For human-agent iteration:**

- **Zero deps:** agent writes code → human compiles (10s) → runs test (5s) → provides feedback (30s) = **45-second cycle**
- **Heavy deps:** agent writes code → human compiles (3m) → runs test (5s) → provides feedback (30s) = **3.5-minute cycle**

Over 3 days of development (~70 hours), embedding-shapes likely ran **1,000-2,000 build cycles**. At 45s/cycle vs 3.5m/cycle (a difference of ~165s per cycle), zero deps saved roughly **45-90 hours of pure compilation waiting**.

### The Security Audit Advantage

A zero-dependency browser codebase is **fully auditable by one human**:

- **20,150 lines of Rust:** ~10-15 days for a thorough security review (at ~1,500 LOC/day)
- **OS API surface:** X11, Win32, and Cocoa APIs are well documented, with vulnerabilities publicly tracked
- **Attack surface:** only networking (cURL/WinHTTP) and rendering (Cairo/Direct2D) touch external data

Compare to a dependency-heavy browser:

- **20K project code + 2M dependency code:** ~1,350 days for a full review (practically impossible)
- **Transitive dependencies:** dependencies have their own dependencies (the audit surface explodes combinatorially)
- **Supply chain risk:** malicious crate injection, abandoned maintainers, breaking API changes

**For Voice AI**, the zero-dependency philosophy translates to **zero third-party JavaScript execution**:

- Voice AI parses the DOM (read-only HTML) without executing JavaScript → no supply chain risk from website code
- No `eval()`, no `Function()` constructor, no dynamic script loading → attack surface limited to HTML parsing
- The browser's JavaScript sandbox remains intact → Voice AI sits outside the sandbox and only reads

embedding-shapes proved: **constrained tools force creative solutions.** Ban dependencies → must use OS APIs directly → deeper understanding of platform primitives. Ban JavaScript execution → must infer navigation from static HTML → clearer separation between data and behavior.

## Directed Agency vs Autonomous Swarms: The Coordination Overhead Problem

### embedding-shapes' Core Claim

From the article: "If one person with one agent can produce equal or better results than 'hundreds of agents for weeks', then the answer to the question: 'Can we scale autonomous coding by throwing more agents at a problem?', probably has a more pessimistic answer than some expected."

**The comparison:**

- **One human + one agent:** 20K LOC browser in 3 days (~70 hours)
- **Hundreds of agents:** hypothetical weeks-long autonomous coding experiments generating "millions of lines of source code"

**embedding-shapes' implicit critique:** autonomous agent swarms generate massive code volume but lack **architectural coherence**. The human provides coherence by making high-level decisions (use X11 on Linux, zero dependencies, <1,000 lines per file).

### Why Directed Agency Wins for Constrained Problems

**Directed agency workflow (embedding-shapes + Codex):**

1. Human: "Build an HTML renderer using X11 and Cairo"
2. Agent: generates X11 window setup, Cairo drawing context, HTML parser skeleton
3. Human: reviews code, approves the approach, specifies the next feature
4. Agent: implements the next feature within the established architecture
5. The loop repeats, with the human maintaining architectural consistency

**Autonomous agency workflow (hypothetical):**

1. Swarm: "Build a browser"
2. Agent 1: tries using Servo as a dependency
3. Agent 2: tries reimplementing the Blink rendering engine
4. Agent 3: tries using the Electron framework
5. Agent 4: tries using WebAssembly + Rust
6. Agents 5-100: explore other architectural approaches
7. Coordinator agent: must merge 100 conflicting architectures into a coherent codebase

**The coordination overhead:**

- **Directed:** human → agent communication (simple command-response)
- **Autonomous:** agent ↔ agent communication (quadratic coordination: 100 agents = 100 × 99 / 2 = 4,950 unique pairings)

**Brooks' Law applies:** "Adding manpower to a late software project makes it later." Replace "manpower" with "autonomous agents" and the same coordination overhead appears.

### When Autonomous Swarms Win vs Directed Agency

**Autonomous swarms excel at:**

1. **Exploration problems:** unknown solution space, need to try many approaches
   - Example: discovering new chemical compounds (try millions of combinations)
   - Example: evolving game-playing strategies (explore the strategy space)
2. **Embarrassingly parallel problems:** subtasks don't communicate
   - Example: rendering video frames (each frame is independent)
   - Example: testing across environments (each test is independent)
3. **Optimization problems:** many local optima, need diverse search
   - Example: hyperparameter tuning (try many configurations)
   - Example: protein folding (explore the conformation space)

**Directed agency excels at:**

1. **Constrained problems:** one correct answer, need to find the shortest path
   - Example: building a browser with a specific architecture (embedding-shapes)
   - Example: navigating a website to a specific page (Voice AI)
2. **Sequential problems:** later steps depend on earlier decisions
   - Example: multi-platform deployment (Linux first, then macOS/Windows)
   - Example: user onboarding flow (signup before payment setup)
3. **Coherence-critical problems:** all parts must fit together
   - Example: UI design (visual consistency across components)
   - Example: API design (endpoint naming conventions)

**For website demos:** navigation is a **constrained, sequential, coherence-critical problem**.
- **Constrained:** one optimal path to the pricing page (not thousands of valid paths)
- **Sequential:** must click "Products" before seeing product-specific pricing
- **Coherence-critical:** the user expects Voice AI to guide them through the standard navigation flow, not creative explorations

Autonomous swarms would waste effort exploring dead-end navigation paths. Directed agency (user specifies the goal, agent finds the path) matches the problem structure.

## Voice AI's Identical Pattern: User Questions Direct Agent Navigation

### The Parallel Workflow

**embedding-shapes + Codex:**

```
Human: "Add scrolling support"
Agent: Implements scroll event handlers, viewport calculations, repaint logic
Human: Verifies scrolling works, specifies next feature
```

**User + Voice AI:**

```
User: "Show me the pricing page"
Agent: Identifies navigation path (header → Pricing link), guides user
User: Verifies correct page loaded, asks next question
```

**Shared pattern:**

1. **Human provides a high-level goal** (add scrolling / show pricing)
2. **Agent decomposes it into low-level steps** (scroll events + viewport / click header → click Pricing)
3. **Human verifies the outcome** (can scroll / pricing page loaded)
4. **The loop continues** (next feature / next question)

**Key similarity:** the human maintains **goal coherence** across interactions. embedding-shapes ensured browser features fit together (scrolling + back button + cross-platform); the user ensures questions build toward a purchase decision (pricing → features → checkout).

### Why Voice AI Can't Use Autonomous Exploration

**Hypothetical autonomous Voice AI:**

```
[User says nothing; Voice AI explores the website independently]
Agent 1: Clicks "About Us"
Agent 2: Clicks "Careers"
Agent 3: Clicks "Blog"
Agent 4: Clicks "Contact"
Agent 5: Scrolls to footer
Agent 6: Reads privacy policy
... [continues for 100 agents]
[3 hours later]
User: "Show me pricing"
Voice AI: "I explored the entire website and discovered 47 pages.
Pricing is page #23. Here's my exploration report..."
```

**The problem:** the user doesn't care about website exploration. The user wants the **shortest path to the goal**, not a **comprehensive website map**.

**Actual Voice AI workflow:**

```
User: "Show me pricing"
Voice AI: [Parses DOM] [Finds
```
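The link-finding step in that workflow amounts to a static scan of the fetched HTML, with no script execution. A rough std-only sketch of the idea (the `find_link` helper is hypothetical; a real implementation would parse the DOM properly rather than scan strings):

```rust
/// Find the href of the first <a> tag whose anchor text contains the
/// requested keyword, scanning static HTML only: no JavaScript runs.
/// (Hypothetical sketch; a real implementation would build a DOM tree.)
fn find_link(html: &str, keyword: &str) -> Option<String> {
    let lower = html.to_lowercase();
    let needle = keyword.to_lowercase();
    let mut rest = lower.as_str();
    while let Some(start) = rest.find("<a ") {
        let tag = &rest[start..];
        // Extract href="..." and the anchor text between > and </a>.
        let href_start = tag.find("href=\"")? + 6;
        let href_end = href_start + tag[href_start..].find('"')?;
        let text_start = tag.find('>')? + 1;
        let text_end = text_start + tag[text_start..].find("</a>")?;
        if tag[text_start..text_end].contains(&needle) {
            return Some(tag[href_start..href_end].to_string());
        }
        rest = &rest[start + text_end..]; // keep scanning after this tag
    }
    None
}

fn main() {
    let html = r#"<nav><a href="/about">About</a> <a href="/pricing">Pricing</a></nav>"#;
    assert_eq!(find_link(html, "pricing").as_deref(), Some("/pricing"));
    println!("navigate to /pricing");
}
```

The design point mirrors the zero-dependency argument above: because the scan is read-only, the website's own JavaScript never executes, so the attack surface stays limited to HTML parsing.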