AutoDev — Autonomous CLI Development Studio

AutoDev reads a project description and reference manuals, then autonomously plans, implements, compiles, tests, and debugs complete software projects using a local LLM.

No cloud APIs. No subscriptions. Runs entirely on your machine with Ollama or vLLM.

How It Works

description.txt + manuals/ → LLM plans the project → writes code → compiles → tests → debugs → delivers
  1. You write a description.txt explaining what you want built
  2. You put reference documentation in a manuals/ folder
  3. You run autodev
  4. AutoDev reads everything, creates a development plan, and executes it step by step
  5. If something fails to compile or run, it debugs itself — analyzing errors, generating fixes, and retrying
  6. When done, you have a working project

You don't interact with it. You watch it work.

Quick Start

# 1. Make sure Ollama is running with a model loaded
ollama run gemma4:e4b

# 2. Set up your project folder
mkdir my-project && cd my-project
mkdir manuals

# 3. Write what you want
cat > description.txt << 'EOF'
Language: Python
Build a CLI tool that converts CSV files to JSON.
It should accept an input file and output file as arguments.
Handle errors gracefully if the input file doesn't exist.
EOF

# 4. Add any reference docs as plain text (API docs, specs, examples)
cp csv-format-spec.txt manuals/

# 5. Run AutoDev
autodev

Installation

# Clone the repository
git clone https://github.com/your-username/autodev.git
cd autodev

# Symlink the entry point into a directory on your PATH
ln -s "$(pwd)/autodev/autodev-cli" ~/.local/bin/autodev
# or
ln -s "$(pwd)/autodev/autodev-cli" ~/bin/autodev

# Alternatively, run directly
python -m autodev --workdir /path/to/project

Requirements

  • Python 3.10+
  • Ollama or vLLM running locally or on your network
  • No pip dependencies — uses only the Python standard library

Configuration

Edit autodev/config.py to set your LLM backend:

LLM_BACKEND = "ollama"                          # "ollama" or "vllm"
OLLAMA_URL  = "http://localhost:11434"           # your Ollama instance
MODEL_NAME  = "qwen2.5-coder:14b"               # any model Ollama serves

You can also override at runtime:

autodev --backend ollama --model gemma4:e4b
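
For orientation, the sketch below shows how a request to the configured Ollama instance can be built with only the Python standard library, in line with the no-dependency requirement. It is an illustration under stated assumptions, not the code in llm.py: the helper names build_generate_request and generate are hypothetical, and the sketch uses Ollama's non-streaming /api/generate endpoint, whereas AutoDev itself is described as streaming with retry.

```python
import json
import urllib.request

# Same values as the config.py excerpt above.
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "qwen2.5-coder:14b"


def build_generate_request(prompt, model=MODEL_NAME, base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming request to Ollama's /api/generate.

    Hypothetical helper for illustration only.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=120):
    """Send the request and return the model's text (needs a running Ollama)."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling generate("Write a hello world program in C") should return the completion as a string once the model is available on the configured host.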

Tested Models

All models were tested against the same task: plan, implement, compile, test, and debug a C "hello world" project with a Makefile. Tested on Ollama with GPU offload.

| Model | Size | Result | Speed | Notes |
|---|---|---|---|---|
| gemma4:e4b | ~12B | Pass | Fast | Clean run, no debug needed. Best balance of speed and quality. Recommended. |
| gemma3:27b | 27B | Pass | Slow | Works well but slow. Needed sandbox fixes during early testing. Good for complex projects. |
| gemma4:e2b | ~8B | Fail | Very fast | Plans OK, but setup created a directory that blocked the executable name. Could not self-correct: repeated the same failed approach 10 times. |
| gemma3:4b | 4B | Fail | Very fast | Steps 1-4 passed, but the debugger hallucinated a nonexistent hello.c file and could not reason about what files actually exist on disk. |
| qwen2.5-coder:7b | 7B | Fail | Fast | Classified "create main.c" as setup instead of implement, so the file was never generated. Debugger could not write a valid Makefile after 10 attempts. |

Takeaway: In these tests, models below ~12B parameters could plan and generate simple code, but they could not self-correct when things went wrong. They repeated failed approaches, hallucinated files, and produced broken build scripts. Models of roughly 12B parameters or more are recommended for autonomous development.

Project Structure

Your project folder needs:

my-project/
├── description.txt    # Required — what to build
└── manuals/           # Required — reference docs (use -nomanual to skip)
    ├── api-spec.md
    └── protocol.txt

AutoDev creates these files as it works:

my-project/
├── description.txt
├── manuals/
├── plan.json          # The development plan (human-readable)
├── worklog.json       # Every action logged with timestamps
├── dependency.txt     # External dependencies (compilers, libraries)
├── .autodev_state.json
└── ... your project files ...

Features

Autonomous Development Loop

Reads the description, understands the requirements, creates a structured plan, then executes it: setup → implement → compile → test → debug → finalize. No human input needed during execution.

Self-Debugging

When compilation or tests fail, AutoDev enters a debug loop:

  • Analyzes the error and source code
  • Diagnoses root cause (not just symptoms)
  • Generates a fix and applies it
  • Verifies the fix works
  • Rolls back automatically if the fix makes things worse
  • Tracks failed approaches to avoid applying the same fix twice (smaller models may still propose a failed approach again; see Tested Models)
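
The debug loop described in this list can be sketched as follows. This is a simplified illustration, not the code in debugger.py: the function name debug_loop and its callback parameters are hypothetical, the limit of 10 attempts mirrors the retry counts mentioned under Tested Models, and the sketch rolls back every fix that does not make the build pass rather than judging whether a fix made things "worse".

```python
def debug_loop(build, propose_fix, apply_fix, snapshot, restore, max_attempts=10):
    """Retry until build() succeeds, rolling back fixes that do not help.

    All callbacks are hypothetical stand-ins:
      build()                    -> (ok, error_text)
      propose_fix(error, failed) -> a fix description (any hashable value)
      apply_fix(fix)             -> applies the fix to the working tree
      snapshot() / restore(s)    -> save and restore the working tree
    """
    ok, error = build()
    if ok:
        return True

    failed = []  # fixes that were tried and did not make the build pass
    for _ in range(max_attempts):
        fix = propose_fix(error, failed)
        if fix in failed:
            continue  # never re-apply a fix that already failed
        state = snapshot()
        apply_fix(fix)
        ok, _new_error = build()
        if ok:
            return True
        restore(state)       # roll back; the original error still applies
        failed.append(fix)
    return False
```

Passing the list of failed fixes back into propose_fix is what lets the model be told which approaches not to repeat.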

Resumable Sessions

Every action is logged to worklog.json. If AutoDev is interrupted or fails:

# Just run it again — it picks up where it left off
autodev

It reads the worklog, loads the existing plan, and continues from the last incomplete step.
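
As a rough illustration of resuming, the sketch below finds the first step in plan.json that is not yet complete. The plan schema is an assumption made for this example (a top-level "steps" list whose entries carry a "status" field), and next_incomplete_step is a hypothetical helper; the actual format written by AutoDev may differ.

```python
import json
from pathlib import Path


def next_incomplete_step(workdir):
    """Return the first plan step not marked "done", or None.

    Assumes plan.json looks like {"steps": [{"id": 1, "status": "done"}, ...]},
    which is a guess at the schema, not the documented format.
    """
    plan_path = Path(workdir) / "plan.json"
    if not plan_path.exists():
        return None  # nothing to resume; a fresh plan is needed
    plan = json.loads(plan_path.read_text(encoding="utf-8"))
    for step in plan.get("steps", []):
        if step.get("status") != "done":
            return step
    return None  # every step is already complete
```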

Cycle & Hallucination Detection

Detects when the LLM is stuck in a loop (producing similar outputs repeatedly) and automatically clears stale context to break out.
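
One way to detect "similar outputs repeatedly" is to compare each new response against the last few with a string-similarity ratio. The sketch below uses difflib from the standard library; the class name CycleDetector, the window of 3, and the 0.9 threshold are illustrative assumptions rather than AutoDev's actual values or method.

```python
from collections import deque
from difflib import SequenceMatcher


class CycleDetector:
    """Flags a loop when `window` consecutive outputs are near-identical."""

    def __init__(self, window=3, threshold=0.9):
        # Keep the previous window-1 outputs; the current one completes the window.
        self.recent = deque(maxlen=max(1, window - 1))
        self.threshold = threshold

    def is_stuck(self, output):
        """Record `output` and report whether it closes a run of similar outputs."""
        full = len(self.recent) == self.recent.maxlen
        stuck = full and all(
            SequenceMatcher(None, previous, output).ratio() >= self.threshold
            for previous in self.recent
        )
        self.recent.append(output)
        return stuck

    def reset(self):
        """Forget history, e.g. after stale context has been cleared."""
        self.recent.clear()
```

When is_stuck returns True, the caller would clear the stale context and call reset before prompting the model again.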

Sandboxed Execution

  • All file operations are confined to the working directory
  • Shell commands are validated against a whitelist of safe tools (compilers, build tools, standard utilities)
  • sudo and system-level commands are blocked
  • Path traversal outside the working directory is prevented
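
The two checks in this list, path confinement and a command whitelist, can be sketched as below. This is a minimal illustration, not the code in sandbox.py: the function names are hypothetical, ALLOWED_COMMANDS is a small made-up subset, and rejecting every shell metacharacter is a deliberately blunt simplification that a real validator would refine.

```python
import shlex
from pathlib import Path

# Illustrative subset only; the real whitelist is not shown in this README.
ALLOWED_COMMANDS = {"gcc", "make", "python3", "ls", "cat"}
SHELL_METACHARACTERS = set(";&|`$<>")


def is_path_confined(workdir, candidate):
    """True if `candidate` resolves to a location inside `workdir`."""
    root = Path(workdir).resolve()
    target = (root / candidate).resolve()  # an absolute candidate replaces root
    return target.is_relative_to(root)     # Path.is_relative_to needs Python 3.9+


def is_command_allowed(command_line):
    """True if the command's program is whitelisted and no chaining is used."""
    if SHELL_METACHARACTERS & set(command_line):
        return False  # blocks "make && sudo ..." style chaining and redirection
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes and similar parse errors
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

Because sudo is simply absent from the whitelist, "sudo make install" is rejected by the same rule as any other unknown program.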

Language Agnostic

Not tied to a specific programming language: it should work with any language the model can write. Tested so far with C and Python projects, including Makefile-based builds. The LLM determines the appropriate build tools, compilers, and project structure.

Dependency Tracking

All external dependencies (compilers, libraries, tools) are recorded in dependency.txt so you know exactly what the project needs.

CLI Options

autodev [options]

Options:
  -nomanual              Skip reading manuals/ directory (for simple tasks)
  -web PORT              Start live web dashboard on PORT (e.g. -web 4500)
  --backend {ollama,vllm}  LLM backend (default: from config)
  --model MODEL          Model name (default: from config)
  --workdir DIR          Working directory (default: current directory)
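
The option list above maps naturally onto argparse from the standard library. The sketch below is an assumption about how such a parser could look, not the code in main.py; build_parser and resolve_backend are hypothetical names, and the config defaults are copied from the Configuration section.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options (single-dash names included)."""
    parser = argparse.ArgumentParser(prog="autodev")
    parser.add_argument("-nomanual", action="store_true",
                        help="skip reading the manuals/ directory")
    parser.add_argument("-web", type=int, metavar="PORT", default=None,
                        help="start the live web dashboard on PORT")
    parser.add_argument("--backend", choices=["ollama", "vllm"], default=None,
                        help="LLM backend (default: from config)")
    parser.add_argument("--model", default=None,
                        help="model name (default: from config)")
    parser.add_argument("--workdir", default=".",
                        help="working directory (default: current directory)")
    return parser


def resolve_backend(args, config_backend="ollama", config_model="qwen2.5-coder:14b"):
    """Command-line values win; otherwise fall back to the config defaults."""
    return (args.backend or config_backend, args.model or config_model)
```

For example, parsing ["-web", "4500", "--model", "gemma4:e4b"] leaves the backend unset, so resolve_backend falls back to the configured one while using the model given on the command line.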

Web Dashboard

Run autodev -web 4500 and open http://localhost:4500 in your browser.

The dashboard shows three panels:

  • Plan Progress — step-by-step checklist with ✓/✗/▸ status and completion counter
  • Project Files — clickable file tree with live content viewer
  • LLM Activity — real-time log of all actions and model thinking (newest first)

Updates are pushed live via Server-Sent Events — no page refresh needed.
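
For readers unfamiliar with Server-Sent Events, each update is a small text frame sent over a long-lived HTTP response with the content type text/event-stream. The sketch below formats one such frame; the helper name format_sse, the event name "step", and the payload fields are hypothetical examples, not the dashboard's actual message format.

```python
import json


def format_sse(event, data):
    """Encode one Server-Sent Events frame with a named event and JSON data.

    json.dumps emits a single line by default, so one "data:" field suffices.
    The blank line at the end terminates the frame.
    """
    payload = json.dumps(data)
    return f"event: {event}\ndata: {payload}\n\n"


if __name__ == "__main__":
    # Example frame a dashboard could receive when a plan step finishes.
    print(format_sse("step", {"id": 1, "status": "done"}), end="")
```

In the browser, an EventSource listening for the same event name would receive the JSON string in the event's data property.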

Incremental Updates

If you change description.txt and restart AutoDev, it detects the change and re-plans incrementally — telling the LLM what files already exist so it builds on previous work instead of starting over.
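
Detecting that description.txt changed can be done by comparing a content hash against one saved from the previous run. The sketch below shows that idea only; description_changed is a hypothetical helper, and storing the hash in a state file such as .autodev_state.json is an assumption, since this README does not document what that file contains.

```python
import hashlib
from pathlib import Path


def description_changed(workdir, previous_hash):
    """Return (changed, current_hash) for description.txt in `workdir`.

    `previous_hash` is the SHA-256 hex digest saved by an earlier run,
    or None if there was no earlier run.
    """
    content = (Path(workdir) / "description.txt").read_bytes()
    current_hash = hashlib.sha256(content).hexdigest()
    return current_hash != previous_hash, current_hash
```

When the first element is True, the planner would be re-run with a list of the files that already exist in the working directory.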

Architecture

autodev/
├── config.py       # LLM backend settings, timeouts, expert system prompt
├── llm.py          # Ollama + vLLM communication with streaming and retry
├── context.py      # Token-aware context window with relevance scoring
├── planner.py      # Reads description + manuals, creates development plan
├── executor.py     # Code generation, file writing, compilation
├── debugger.py     # Error analysis, fix generation, rollback
├── sandbox.py      # Whitelist-based command validation, path confinement
├── logger.py       # Action logging to console and persistent worklog
├── dependency.py   # Dependency tracking
├── resume.py       # State persistence and session resumption
├── main.py         # CLI orchestrator
└── autodev-cli     # Symlink-friendly entry point

How the Description Should Be Written

Be specific. Every sentence is treated as a requirement.

Good:

Language: C
Build a TCP echo server that listens on port 8080.
It should handle multiple clients using fork().
Include proper signal handling for SIGCHLD to avoid zombies.
Include a Makefile with 'all' and 'clean' targets.
The server should log connections to stderr.

Too vague:

Make a server program.

Limitations

  • Quality depends heavily on the model; in the tests above, models below ~12B parameters failed the task
  • No interactive mode — you can't guide it mid-run (by design)
  • Manual parsing is plain text only (no PDF extraction)
  • Token counting is estimated, not exact
  • The LLM may occasionally produce code that compiles but doesn't meet all requirements
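
To make the token-counting limitation concrete: without a tokenizer dependency, token counts have to be approximated from text length. The sketch below uses the common rule of thumb of about four characters per token and trims context by recency. Both function names are hypothetical, the ratio is an assumption rather than AutoDev's actual estimate, and context.py is described as using relevance scoring, which this sketch does not attempt.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate from character count (a heuristic, not exact)."""
    if not text:
        return 0
    return max(1, len(text) // chars_per_token)


def trim_to_budget(chunks, max_tokens):
    """Keep the most recent chunks whose estimated total fits in max_tokens."""
    kept, used = [], 0
    for chunk in reversed(chunks):      # walk from newest to oldest
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            break
        kept.append(chunk)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

Because the estimate can be off in either direction, leaving headroom below the model's real context limit is the safer choice.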