# AutoDev: Autonomous CLI Development Studio

AutoDev reads a project description and reference manuals, then autonomously plans, implements, compiles, tests, and debugs complete software projects using a local LLM.

No cloud APIs. No subscriptions. Runs entirely on your machine with [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm).

## How It Works

```
description.txt + manuals/ → LLM plans the project → writes code → compiles → tests → debugs → delivers
```

1. You write a `description.txt` explaining what you want built
2. You put reference documentation in a `manuals/` folder
3. You run `autodev`
4. AutoDev reads everything, creates a development plan, and executes it step by step
5. If something fails to compile or run, it debugs itself by analyzing errors, generating fixes, and retrying
6. When done, you have a working project

You don't interact with it. You watch it work.

## Quick Start

```bash
# 1. Make sure Ollama is running with a model loaded
#    (use a separate terminal: `ollama run` opens an interactive session)
ollama run gemma4:e4b

# 2. Set up your project folder
mkdir my-project && cd my-project
mkdir manuals

# 3. Write what you want
cat > description.txt << 'EOF'
Language: Python
Build a CLI tool that converts CSV files to JSON.
It should accept an input file and output file as arguments.
Handle errors gracefully if the input file doesn't exist.
EOF

# 4. Add any reference docs (API docs, specs, examples) as plain text; PDFs are not parsed
cp csv-format-spec.txt manuals/

# 5. Run AutoDev
autodev
```

## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/autodev.git
cd autodev

# Symlink the entry point into a directory on your PATH
ln -s "$(pwd)/autodev/autodev-cli" ~/.local/bin/autodev
# or
ln -s "$(pwd)/autodev/autodev-cli" ~/bin/autodev

# Alternatively, run directly
python -m autodev --workdir /path/to/project
```

### Requirements

- Python 3.10+
- [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm) running locally or on your network
- No pip dependencies: uses only the Python standard library

## Configuration

Edit `autodev/config.py` to set your LLM backend:

```python
LLM_BACKEND = "ollama"                  # "ollama" or "vllm"
OLLAMA_URL = "http://localhost:11434"   # your Ollama instance
MODEL_NAME = "qwen2.5-coder:14b"        # any model Ollama serves
```

You can also override these settings at runtime:

```bash
autodev --backend ollama --model gemma4:e4b
```

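As a rough illustration of how these settings might be combined, the sketch below builds (but does not send) a request for Ollama's `/api/generate` endpoint using only the standard library. The helper name `build_generate_request` is hypothetical and not taken from AutoDev's source; the real `llm.py` may structure its calls differently.

```python
import json
import urllib.request

# Values mirror the config.py example above.
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = "qwen2.5-coder:14b"


def build_generate_request(prompt: str, base_url: str = OLLAMA_URL,
                           model: str = MODEL_NAME) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("Write a hello world program in C.")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a running Ollama instance):
    # with urllib.request.urlopen(req, timeout=120) as resp:
    #     print(json.loads(resp.read())["response"])
```

Keeping request construction separate from sending makes the settings easy to check without a running model server.
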
### Tested Models

All models were tested against the same task: plan, implement, compile, test, and debug a C "hello world" project with a Makefile. Tested on Ollama with GPU offload.

| Model | Size | Result | Speed | Notes |
|-------|------|--------|-------|-------|
| `gemma4:e4b` | ~12B | ✅ Pass | Fast | Clean run, no debug needed. Best balance of speed and quality. **Recommended.** |
| `gemma3:27b` | 27B | ✅ Pass | Slow | Works well but slow. Needed sandbox fixes during early testing. Good for complex projects. |
| `gemma4:e2b` | ~8B | ❌ Fail | Very fast | Plans OK, but setup created a directory that blocked the executable name. Could not self-correct: repeated the same failed approach 10 times. |
| `gemma3:4b` | 4B | ❌ Fail | Very fast | Steps 1–4 passed, but debugger hallucinated a nonexistent `hello.c` file and could not reason about what files actually exist on disk. |
| `qwen2.5-coder:7b` | 7B | ❌ Fail | Fast | Classified "create main.c" as setup instead of implement, so the file was never generated. Debugger could not write a valid Makefile after 10 attempts. |

**Takeaway:** In these tests, models below ~12B parameters could plan and generate simple code, but they could not self-correct when things went wrong. They repeated failed approaches, hallucinated files, and produced broken build scripts. **Models of ~12B parameters or larger are recommended for autonomous development.**

## Project Structure

Your project folder needs:

```
my-project/
├── description.txt          # Required: what to build
└── manuals/                 # Required unless you pass -nomanual: reference docs
    ├── api-spec.md
    └── protocol.txt
```

AutoDev creates these files as it works:

```
my-project/
├── description.txt
├── manuals/
├── plan.json                # The development plan (human-readable)
├── worklog.json             # Every action logged with timestamps
├── dependency.txt           # External dependencies (compilers, libraries)
├── .autodev_state.json
└── ... your project files ...
```

## Features

### Autonomous Development Loop
Reads the description, extracts the requirements, creates a structured plan, then executes it: setup → implement → compile → test → debug → finalize. No human input is needed during execution.

### Self-Debugging
When compilation or tests fail, AutoDev enters a debug loop:
- Analyzes the error and source code
- Diagnoses the root cause (not just symptoms)
- Generates a fix and applies it
- Verifies the fix works
- Rolls back automatically if the fix makes things worse
- Tracks failed approaches to avoid repeating the same fix

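The debug loop above can be sketched roughly as follows. This is a minimal illustration, not AutoDev's actual `debugger.py`: the names `debug_loop`, `count_errors`, and `propose_fix` are hypothetical, the project is modeled as an in-memory dict, and "worse" is simplified to "does not reduce the error count".

```python
from typing import Callable, Dict, Optional, Set

Files = Dict[str, str]  # path -> file contents (in-memory stand-in for the project)


def debug_loop(
    files: Files,
    count_errors: Callable[[Files], int],
    propose_fix: Callable[[Files, Set[str]], Optional[Files]],
    max_attempts: int = 10,
) -> bool:
    """Try fixes until the build is clean; discard any fix that does not help."""
    tried: Set[str] = set()            # fingerprints of fixes already attempted
    errors = count_errors(files)
    for _ in range(max_attempts):
        if errors == 0:
            return True
        candidate = propose_fix(files, tried)
        if candidate is None:
            return False               # nothing new to try
        fingerprint = repr(sorted(candidate.items()))
        if fingerprint in tried:
            continue                   # skip a fix that was already attempted
        tried.add(fingerprint)
        new_errors = count_errors(candidate)
        if new_errors < errors:
            files.clear()
            files.update(candidate)    # keep the improvement
            errors = new_errors
        # otherwise `files` is left untouched, which acts as the rollback
    return errors == 0


if __name__ == "__main__":
    project = {"main.c": "int main( { return 0; }"}

    def count_errors(fs: Files) -> int:
        # Toy "compiler": a file with unbalanced parentheses counts as one error.
        return sum(src.count("(") != src.count(")") for src in fs.values())

    candidates = [
        {"main.c": "int main(( { return 0; }"},  # does not help, gets rolled back
        {"main.c": "int main() { return 0; }"},  # the real fix
    ]

    def propose_fix(fs: Files, tried: Set[str]) -> Optional[Files]:
        return candidates[len(tried)] if len(tried) < len(candidates) else None

    print(debug_loop(project, count_errors, propose_fix), project)
```

In a real run, `count_errors` would invoke the compiler or test suite and `propose_fix` would call the LLM with the error output and the list of failed approaches.
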
### Resumable Sessions
Every action is logged to `worklog.json`. If AutoDev is interrupted or fails:
```bash
# Just run it again; it picks up where it left off
autodev
```
It reads the worklog, loads the existing plan, and continues from the first incomplete step.

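A minimal sketch of the resume logic, assuming a hypothetical `plan.json` layout with a `steps` list whose entries carry a `status` field. The real schema is not documented in this README, so treat the field names and both function names as placeholders.

```python
import json
from pathlib import Path
from typing import Optional


def load_plan(path: Path) -> dict:
    """Read plan.json from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def next_incomplete_step(plan: dict) -> Optional[dict]:
    """Return the first step whose status is not "done", or None if all are done."""
    for step in plan.get("steps", []):
        if step.get("status") != "done":
            return step
    return None


if __name__ == "__main__":
    # Hypothetical plan layout, inlined here instead of loading a real plan.json.
    plan = {"steps": [
        {"id": 1, "type": "setup", "status": "done"},
        {"id": 2, "type": "implement", "status": "failed"},
        {"id": 3, "type": "compile", "status": "pending"},
    ]}
    print(next_incomplete_step(plan))
```

Treating anything other than `"done"` as incomplete means a failed or interrupted step is retried rather than skipped.
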
### Cycle & Hallucination Detection
Detects when the LLM is stuck in a loop (producing similar outputs repeatedly) and automatically clears stale context to break out.

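One way to detect "similar outputs repeatedly" is to compare each new output against a short window of recent ones. The sketch below uses `difflib.SequenceMatcher` from the standard library; the class name, window size, and thresholds are assumptions chosen for illustration and may not match how AutoDev measures similarity.

```python
from collections import deque
from difflib import SequenceMatcher


class CycleDetector:
    """Flag a loop when a new output closely matches several recent outputs."""

    def __init__(self, window: int = 5, threshold: float = 0.9, min_repeats: int = 2):
        self.recent = deque(maxlen=window)  # last few LLM outputs
        self.threshold = threshold          # similarity ratio counted as "the same"
        self.min_repeats = min_repeats      # similar outputs needed to call it a loop

    def is_stuck(self, output: str) -> bool:
        """Record `output` and report whether it resembles enough recent outputs."""
        similar = sum(
            SequenceMatcher(None, output, old).ratio() >= self.threshold
            for old in self.recent
        )
        self.recent.append(output)
        return similar >= self.min_repeats

    def reset(self) -> None:
        """Forget the history, for example after stale context has been cleared."""
        self.recent.clear()


if __name__ == "__main__":
    detector = CycleDetector()
    for attempt in ["gcc -o hello hello.c"] * 3:
        print(detector.is_stuck(attempt))
```

A caller would typically clear or trim the prompt context when `is_stuck` returns true, then call `reset()` so the next attempt is judged on its own.
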
### Sandboxed Execution
- All file operations are confined to the working directory
- Shell commands are validated against a whitelist of safe tools (compilers, build tools, standard utilities)
- `sudo` and system-level commands are blocked
- Path traversal outside the working directory is prevented

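The checks listed above might look something like the following sketch. The whitelist contents, blocked tokens, and function names are hypothetical (AutoDev's actual lists are not shown in this README), and this version is deliberately strict: it rejects any command containing shell operators rather than trying to parse them.

```python
import shlex
from pathlib import Path

# Hypothetical lists for illustration; AutoDev's real whitelist is not shown in this README.
ALLOWED_COMMANDS = {"gcc", "make", "python3", "ls", "cat"}
BLOCKED_TOKENS = {"sudo", "su"}
SHELL_OPERATORS = (";", "&", "|", "`", "$(", ">", "<")


def is_command_allowed(command: str) -> bool:
    """Allow a command only if it is a single whitelisted program with no blocked tokens."""
    if any(op in command for op in SHELL_OPERATORS):
        return False              # refuse chaining, pipes, substitution, redirection
    try:
        tokens = shlex.split(command)
    except ValueError:            # e.g. unbalanced quotes
        return False
    if not tokens or any(tok in BLOCKED_TOKENS for tok in tokens):
        return False
    return tokens[0] in ALLOWED_COMMANDS


def is_path_confined(workdir: Path, candidate: str) -> bool:
    """True if `candidate`, resolved against `workdir`, stays inside `workdir`."""
    root = workdir.resolve()
    target = (root / candidate).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    print(is_command_allowed("gcc -o hello hello.c"))       # whitelisted program
    print(is_command_allowed("sudo make install"))          # contains a blocked token
    print(is_path_confined(Path.cwd(), "../outside.txt"))   # escapes the working directory
```

Resolving both paths before comparing them is what defeats `..` segments and absolute paths.
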
### Language Agnostic
Works with any programming language the LLM knows. Tested with C, Python, and Makefiles. The LLM determines the appropriate build tools, compilers, and project structure.

### Dependency Tracking
All external dependencies (compilers, libraries, tools) are recorded in `dependency.txt` so you know exactly what the project needs.

## CLI Options

```
autodev [options]

Options:
  -nomanual                 Skip reading manuals/ directory (for simple tasks)
  -web PORT                 Start live web dashboard on PORT (e.g. -web 4500)
  --backend {ollama,vllm}   LLM backend (default: from config)
  --model MODEL             Model name (default: from config)
  --workdir DIR             Working directory (default: current directory)
```

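For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch of one possible parser, not a copy of AutoDev's `main.py`; the `DEFAULT_*` constants stand in for values that would come from `autodev/config.py`.

```python
import argparse

# Stand-ins for values that would normally come from autodev/config.py.
DEFAULT_BACKEND = "ollama"
DEFAULT_MODEL = "qwen2.5-coder:14b"


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; config values act as defaults."""
    parser = argparse.ArgumentParser(prog="autodev")
    parser.add_argument("-nomanual", action="store_true",
                        help="skip reading the manuals/ directory")
    parser.add_argument("-web", type=int, metavar="PORT", default=None,
                        help="start the live web dashboard on PORT")
    parser.add_argument("--backend", choices=["ollama", "vllm"],
                        default=DEFAULT_BACKEND, help="LLM backend")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="model name")
    parser.add_argument("--workdir", default=".", help="working directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-web", "4500", "--model", "gemma4:e4b"])
    print(args)
```

Using the config values as `argparse` defaults gives the "default: from config" behavior: a flag overrides the setting only when it is actually passed.
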
### Web Dashboard

Run `autodev -web 4500` and open `http://localhost:4500` in your browser.

The dashboard shows three panels:
- **Plan Progress**: step-by-step checklist with ✓/✗/▸ status and completion counter
- **Project Files**: clickable file tree with live content viewer
- **LLM Activity**: real-time log of all actions and model thinking (newest first)

Updates are pushed live via Server-Sent Events, so no page refresh is needed.

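For readers unfamiliar with Server-Sent Events, each update is a small text frame sent over a long-lived HTTP response with `Content-Type: text/event-stream`. The helper below shows the wire format; the function name, event name, and payload fields are made up for illustration and are not AutoDev's actual event schema.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events message: event name, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


if __name__ == "__main__":
    # Hypothetical "step" event carrying a plan-progress update.
    print(format_sse("step", {"id": 3, "status": "done"}), end="")
```

The trailing blank line is what tells the browser's `EventSource` that the message is complete.
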
### Incremental Updates

If you change `description.txt` and restart AutoDev, it detects the change and re-plans incrementally, telling the LLM which files already exist so it builds on previous work instead of starting over.

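Change detection of this kind is commonly done by storing a hash of the description and comparing it on the next run. The sketch below assumes that approach; how AutoDev actually detects changes is not stated in this README, and both function names are hypothetical.

```python
import hashlib
from typing import Optional


def fingerprint(description: str) -> str:
    """SHA-256 hex digest of the description text."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def needs_replan(description: str, stored_fingerprint: Optional[str]) -> bool:
    """True if a fingerprint was stored earlier and the description no longer matches it."""
    return stored_fingerprint is not None and fingerprint(description) != stored_fingerprint


if __name__ == "__main__":
    saved = fingerprint("Build a CLI tool that converts CSV files to JSON.\n")
    edited = "Build a CLI tool that converts CSV and TSV files to JSON.\n"
    print(needs_replan(edited, saved))
```

On a first run there is no stored fingerprint, so `needs_replan` returns false and a fresh plan is created instead.
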
## Architecture

```
autodev/
├── config.py        # LLM backend settings, timeouts, expert system prompt
├── llm.py           # Ollama + vLLM communication with streaming and retry
├── context.py       # Token-aware context window with relevance scoring
├── planner.py       # Reads description + manuals, creates development plan
├── executor.py      # Code generation, file writing, compilation
├── debugger.py      # Error analysis, fix generation, rollback
├── sandbox.py       # Whitelist-based command validation, path confinement
├── logger.py        # Action logging to console and persistent worklog
├── dependency.py    # Dependency tracking
├── resume.py        # State persistence and session resumption
├── main.py          # CLI orchestrator
└── autodev-cli      # Symlink-friendly entry point
```

## How the Description Should Be Written

Be specific. Every sentence is treated as a requirement.

**Good:**
```
Language: C
Build a TCP echo server that listens on port 8080.
It should handle multiple clients using fork().
Include proper signal handling for SIGCHLD to avoid zombies.
Include a Makefile with 'all' and 'clean' targets.
The server should log connections to stderr.
```

**Too vague:**
```
Make a server program.
```

## Limitations

- Output quality depends heavily on the model; larger models produce better results
- No interactive mode: you can't guide it mid-run (by design)
- Manuals are read as plain text only (no PDF extraction)
- Token counting is estimated, not exact
- The LLM may occasionally produce code that compiles but doesn't meet all requirements