# AutoDev — Autonomous CLI Development Studio

AutoDev reads a project description and reference manuals, then autonomously plans, implements, compiles, tests, and debugs complete software projects using a local LLM.

No cloud APIs. No subscriptions. Runs entirely on your machine with [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm).

## How It Works

```
description.txt + manuals/ → LLM plans the project → writes code → compiles → tests → debugs → delivers
```

1. You write a `description.txt` explaining what you want built
2. You put reference documentation in a `manuals/` folder
3. You run `autodev`
4. AutoDev reads everything, creates a development plan, and executes it step by step
5. If something fails to compile or run, it debugs itself — analyzing errors, generating fixes, and retrying
6. When done, you have a working project

You don't interact with it. You watch it work.

## Quick Start

```bash
# 1. Make sure Ollama is running with a model loaded
#    (this opens an interactive session, so run it in a separate terminal)
ollama run gemma4:e4b

# 2. Set up your project folder
mkdir my-project && cd my-project
mkdir manuals

# 3. Write what you want
cat > description.txt << 'EOF'
Language: Python
Build a CLI tool that converts CSV files to JSON.
It should accept an input file and output file as arguments.
Handle errors gracefully if the input file doesn't exist.
EOF

# 4. Add any reference docs as plain text (API docs, specs, examples)
cp csv-format-spec.md manuals/

# 5. Run AutoDev
autodev
```

## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/autodev.git
cd autodev

# Symlink the entry point into a directory on your PATH
ln -s "$(pwd)/autodev/autodev-cli" ~/.local/bin/autodev
# or
ln -s "$(pwd)/autodev/autodev-cli" ~/bin/autodev

# Alternatively, run directly
python -m autodev --workdir /path/to/project
```

### Requirements

- Python 3.10+
- [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm) running locally or on your network
- No pip dependencies — uses only the Python standard library

## Configuration

Edit `autodev/config.py` to set your LLM backend:

```python
LLM_BACKEND = "ollama"                          # "ollama" or "vllm"
OLLAMA_URL  = "http://localhost:11434"           # your Ollama instance
MODEL_NAME  = "qwen2.5-coder:14b"               # any model Ollama serves
```

You can also override at runtime:

```bash
autodev --backend ollama --model gemma4:e4b
```
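
As a rough illustration of what these settings feed into, the sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint using only the standard library. The function names `build_generate_request` and `send` are hypothetical and are not taken from `llm.py`, which also handles streaming and retries.

```python
import json
import urllib.request

# Values mirror the config example above.
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "qwen2.5-coder:14b"


def build_generate_request(prompt, model=MODEL_NAME, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a server running.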

### Tested Models

All models were tested against the same task: plan, implement, compile, test, and debug a C "hello world" project with a Makefile. Tested on Ollama with GPU offload.

| Model | Size | Result | Speed | Notes |
|-------|------|--------|-------|-------|
| `gemma4:e4b` | ~12B | ✅ Pass | Fast | Clean run, no debug needed. Best balance of speed and quality. **Recommended.** |
| `gemma3:27b` | 27B | ✅ Pass | Slow | Works well but slow. Needed sandbox fixes during early testing. Good for complex projects. |
| `gemma4:e2b` | ~8B | ❌ Fail | Very fast | Plans OK, but setup created a directory that blocked the executable name. Could not self-correct — repeated the same failed approach 10 times. |
| `gemma3:4b` | 4B | ❌ Fail | Very fast | Steps 1–4 passed, but debugger hallucinated a nonexistent `hello.c` file and could not reason about what files actually exist on disk. |
| `qwen2.5-coder:7b` | 7B | ❌ Fail | Fast | Classified "create main.c" as setup instead of implement, so the file was never generated. Debugger could not write a valid Makefile after 10 attempts. |

**Takeaway:** In these tests, models below ~12B parameters could plan and generate simple code, but they could not self-correct when things went wrong. They repeated failed approaches, hallucinated files, and produced broken build scripts. **A model of ~12B parameters or larger is recommended for autonomous development.**

## Project Structure

Your project folder needs:

```
my-project/
├── description.txt    # Required — what to build
└── manuals/           # Required — reference docs (use -nomanual to skip)
    ├── api-spec.md
    └── protocol.txt
```
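
A pre-flight check for this layout could look like the sketch below. `missing_inputs` is a hypothetical helper, not AutoDev's own code; the `require_manuals` flag stands in for the effect of `-nomanual`.

```python
from pathlib import Path


def missing_inputs(workdir, require_manuals=True):
    """Return the names of required inputs that are absent from the project folder."""
    root = Path(workdir)
    missing = []
    if not (root / "description.txt").is_file():
        missing.append("description.txt")
    # manuals/ is only required when manuals are going to be read
    if require_manuals and not (root / "manuals").is_dir():
        missing.append("manuals/")
    return missing
```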

AutoDev creates these files as it works:

```
my-project/
├── description.txt
├── manuals/
├── plan.json          # The development plan (human-readable)
├── worklog.json       # Every action logged with timestamps
├── dependency.txt     # External dependencies (compilers, libraries)
├── .autodev_state.json
└── ... your project files ...
```

## Features

### Autonomous Development Loop
Reads the description, understands the requirements, creates a structured plan, then executes it: setup → implement → compile → test → debug → finalize. No human input needed during execution.
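
The control flow can be pictured roughly as below. This is a simplified sketch, not the code in `main.py`: the step dictionaries, their `name` key, and the `run_step` and `debug` callbacks are all assumptions made for illustration.

```python
def run_plan(steps, run_step, debug):
    """Run steps in order; on failure hand the step to `debug` and stop if it cannot recover."""
    results = []
    for step in steps:
        ok = bool(run_step(step)) or bool(debug(step))
        results.append((step["name"], ok))
        if not ok:
            break  # an unrecoverable step ends the run
    return results
```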

### Self-Debugging
When compilation or tests fail, AutoDev enters a debug loop:
- Analyzes the error and source code
- Diagnoses root cause (not just symptoms)
- Generates a fix and applies it
- Verifies the fix works
- Rolls back automatically if the fix makes things worse
- Tracks failed approaches so it can avoid retrying a fix that has already failed
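
A minimal sketch of such a loop follows. The callback names and the default of 10 attempts are assumptions, and for simplicity it rolls back after every failed check rather than only when a fix makes things worse.

```python
def debug_loop(error, propose_fix, apply_fix, run_checks, rollback, max_attempts=10):
    """Try fixes until the checks pass; return the attempt number that succeeded, or None."""
    failed_fixes = []
    for attempt in range(1, max_attempts + 1):
        fix = propose_fix(error, failed_fixes)
        if fix in failed_fixes:
            continue  # this exact fix already failed, so do not apply it again
        apply_fix(fix)
        ok, error = run_checks()
        if ok:
            return attempt
        rollback(fix)  # restore the previous state before trying something else
        failed_fixes.append(fix)
    return None
```

Passing `failed_fixes` back to `propose_fix` lets the prompt tell the model which approaches not to repeat.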

### Resumable Sessions
Every action is logged to `worklog.json`. If AutoDev is interrupted or fails:
```bash
# Just run it again — it picks up where it left off
autodev
```
It reads the worklog, loads the existing plan, and continues from the last incomplete step.
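
Finding the resume point might look like the sketch below. The schema is hypothetical: it assumes `plan.json` holds a `steps` list whose entries carry a `status` field set to `"done"` on completion, which the real file may not match.

```python
import json
from pathlib import Path


def load_plan(workdir):
    """Load plan.json from the working directory, or return None if no plan exists yet."""
    path = Path(workdir) / "plan.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def next_incomplete_step(plan):
    """Return the first step whose status is not 'done', or None if every step is done."""
    for step in plan.get("steps", []):
        if step.get("status") != "done":
            return step
    return None
```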

### Cycle & Hallucination Detection
Detects when the LLM is stuck in a loop (producing similar outputs repeatedly) and automatically clears stale context to break out.
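
One way to detect this kind of loop is to compare recent outputs for textual similarity. The sketch below uses `difflib` from the standard library; the window size and the 0.9 threshold are illustrative assumptions, not AutoDev's actual values.

```python
from difflib import SequenceMatcher


def is_stuck(outputs, window=3, threshold=0.9):
    """Return True if each of the last `window` outputs is a near-duplicate of the one before it."""
    recent = outputs[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    return all(
        SequenceMatcher(None, earlier, later).ratio() >= threshold
        for earlier, later in zip(recent, recent[1:])
    )
```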

### Sandboxed Execution
- All file operations are confined to the working directory
- Shell commands are validated against a whitelist of safe tools (compilers, build tools, standard utilities)
- `sudo` and system-level commands are blocked
- Path traversal outside the working directory is prevented
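
The two checks can be sketched as follows. The whitelist contents and function names are hypothetical, and this version is deliberately conservative: it rejects any command containing shell metacharacters, which `sandbox.py` may handle differently.

```python
import shlex
from pathlib import Path

ALLOWED_COMMANDS = {"gcc", "make", "python3", "ls", "cat"}  # hypothetical whitelist
SHELL_METACHARACTERS = set(";&|`$<>")


def is_command_allowed(command):
    """Accept a command only if it has no shell metacharacters and its executable is whitelisted."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # blocks chaining such as "make && sudo ..."
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS


def is_path_confined(workdir, candidate):
    """Return True if `candidate` resolves to the working directory or a path inside it."""
    root = Path(workdir).resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

Resolving both paths before comparing them is what defeats `..` segments and absolute paths.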

### Language Agnostic
Works with any programming language the LLM knows. Tested with C, Python, and Makefiles. The LLM determines the appropriate build tools, compilers, and project structure.

### Dependency Tracking
All external dependencies (compilers, libraries, tools) are recorded in `dependency.txt` so you know exactly what the project needs.

## CLI Options

```
autodev [options]

Options:
  -nomanual              Skip reading manuals/ directory (for simple tasks)
  -web PORT              Start live web dashboard on PORT (e.g. -web 4500)
  --backend {ollama,vllm}  LLM backend (default: from config)
  --model MODEL          Model name (default: from config)
  --workdir DIR          Working directory (default: current directory)
```
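
For reference, the options above map onto `argparse` roughly as in this sketch. `build_parser` is a hypothetical name, and the defaults are passed in as parameters here rather than imported from `config.py`.

```python
import argparse


def build_parser(default_backend="ollama", default_model="qwen2.5-coder:14b"):
    """Build a parser for the documented options, with defaults supplied by the caller."""
    parser = argparse.ArgumentParser(prog="autodev")
    parser.add_argument("-nomanual", action="store_true",
                        help="skip reading the manuals/ directory")
    parser.add_argument("-web", type=int, metavar="PORT", default=None,
                        help="start the live web dashboard on PORT")
    parser.add_argument("--backend", choices=["ollama", "vllm"], default=default_backend,
                        help="LLM backend")
    parser.add_argument("--model", default=default_model, help="model name")
    parser.add_argument("--workdir", default=".", help="working directory")
    return parser
```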

### Web Dashboard

Run `autodev -web 4500` and open `http://localhost:4500` in your browser.

The dashboard shows three panels:
- **Plan Progress** — step-by-step checklist with ✓/✗/▸ status and completion counter
- **Project Files** — clickable file tree with live content viewer
- **LLM Activity** — real-time log of all actions and model thinking (newest first)

Updates are pushed live via Server-Sent Events — no page refresh needed.
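
On the wire, a Server-Sent Events message is plain text: an optional `event:` line, one or more `data:` lines, and a blank line to end the message. The sketch below shows one way to encode an update; the event name and payload shape are assumptions, not the dashboard's actual protocol.

```python
import json

# Response headers an SSE endpoint typically sends.
SSE_HEADERS = {"Content-Type": "text/event-stream", "Cache-Control": "no-cache"}


def format_sse(event, data):
    """Encode one SSE message: an event name, a single JSON data line, and a blank line."""
    # json.dumps escapes newlines, so the payload always fits on one data line
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```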

### Incremental Updates

If you change `description.txt` and restart AutoDev, it detects the change and re-plans incrementally — telling the LLM what files already exist so it builds on previous work instead of starting over.
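
Change detection of this kind can be done by storing a hash of the description from the previous run. The following is a sketch under that assumption; how and where AutoDev actually records the previous description is not documented here.

```python
import hashlib


def description_digest(text):
    """Return a stable SHA-256 fingerprint of the description text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def description_changed(current_text, stored_digest):
    """Return True if a digest from a previous run exists and no longer matches."""
    if stored_digest is None:
        return False  # first run: there is nothing to compare against
    return description_digest(current_text) != stored_digest
```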

## Architecture

```
autodev/
├── config.py       # LLM backend settings, timeouts, expert system prompt
├── llm.py          # Ollama + vLLM communication with streaming and retry
├── context.py      # Token-aware context window with relevance scoring
├── planner.py      # Reads description + manuals, creates development plan
├── executor.py     # Code generation, file writing, compilation
├── debugger.py     # Error analysis, fix generation, rollback
├── sandbox.py      # Whitelist-based command validation, path confinement
├── logger.py       # Action logging to console and persistent worklog
├── dependency.py   # Dependency tracking
├── resume.py       # State persistence and session resumption
├── main.py         # CLI orchestrator
└── autodev-cli     # Symlink-friendly entry point
```

## How the Description Should Be Written

Be specific. Every sentence is treated as a requirement.

**Good:**
```
Language: C
Build a TCP echo server that listens on port 8080.
It should handle multiple clients using fork().
Include proper signal handling for SIGCHLD to avoid zombies.
Include a Makefile with 'all' and 'clean' targets.
The server should log connections to stderr.
```

**Too vague:**
```
Make a server program.
```

## Limitations

- Quality depends entirely on the model; larger models produce better results
- No interactive mode — you can't guide it mid-run (by design)
- Manual parsing is plain text only (no PDF extraction)
- Token counting is estimated, not exact
- The LLM may occasionally produce code that compiles but doesn't meet all requirements
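
To illustrate what estimated token counting means in practice, the sketch below uses the common rule of thumb of about four characters per token for English text. That ratio and both function names are assumptions, and `context.py` also scores chunks by relevance, which this example leaves out.

```python
def estimate_tokens(text, chars_per_token=4):
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // chars_per_token)  # ceiling division


def fit_to_budget(chunks, budget):
    """Keep the newest chunks whose combined estimate fits within `budget` tokens."""
    kept, used = [], 0
    for chunk in reversed(chunks):  # walk from newest to oldest
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Because the count is only an estimate, leaving some headroom below the model's real context limit is a sensible precaution.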