# AutoDev: Autonomous CLI Development Studio

AutoDev reads a project description and reference manuals, then autonomously plans, implements, compiles, tests, and debugs complete software projects using a local LLM.

No cloud APIs. No subscriptions. Runs entirely on your machine with [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm).

## How It Works

```
description.txt + manuals/ → LLM plans the project → writes code → compiles → tests → debugs → delivers
```

1. You write a `description.txt` explaining what you want built
2. You put reference documentation in a `manuals/` folder
3. You run `autodev`
4. AutoDev reads everything, creates a development plan, and executes it step by step
5. If something fails to compile or run, it debugs itself by analyzing errors, generating fixes, and retrying
6. When done, you have a working project

You don't interact with it. You watch it work.

## Quick Start

```bash
# 1. Make sure Ollama is running with a model loaded
ollama run qwen2.5-coder:14b

# 2. Set up your project folder
mkdir my-project && cd my-project
mkdir manuals

# 3. Write what you want
cat > description.txt << 'EOF'
Language: Python
Build a CLI tool that converts CSV files to JSON.
It should accept an input file and output file as arguments.
Handle errors gracefully if the input file doesn't exist.
EOF

# 4. Add any reference docs as plain text (API docs, specs, examples)
cp csv-format-spec.md manuals/

# 5. Run AutoDev
autodev
```

## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/autodev.git
cd autodev

# Symlink to your PATH
ln -s "$(pwd)/autodev/autodev-cli" ~/.local/bin/autodev
# or
ln -s "$(pwd)/autodev/autodev-cli" ~/bin/autodev

# Alternatively, run directly
python -m autodev --workdir /path/to/project
```

### Requirements

- Python 3.10+
- [Ollama](https://ollama.com) or [vLLM](https://github.com/vllm-project/vllm) running locally or on your network
- No pip dependencies: uses only the Python standard library

## Configuration

Edit `autodev/config.py` to set your LLM backend:

```python
LLM_BACKEND = "ollama"                 # "ollama" or "vllm"
OLLAMA_URL = "http://localhost:11434"  # your Ollama instance
MODEL_NAME = "qwen2.5-coder:14b"       # any model Ollama serves
```

You can also override at runtime:

```bash
autodev --backend ollama --model gemma4:e4b
```

### Tested Models

All models were tested against the same task: plan, implement, compile, test, and debug a C "hello world" project with a Makefile. Tested on Ollama with GPU offload.

| Model | Size | Result | Speed | Notes |
|-------|------|--------|-------|-------|
| `gemma4:e4b` | ~12B | ✅ Pass | Fast | Clean run, no debug needed. Best balance of speed and quality. **Recommended.** |
| `gemma3:27b` | 27B | ✅ Pass | Slow | Works well but slow. Needed sandbox fixes during early testing. Good for complex projects. |
| `gemma4:e2b` | ~8B | ❌ Fail | Very fast | Plans OK, but setup created a directory that blocked the executable name. Could not self-correct: repeated the same failed approach 10 times. |
| `gemma3:4b` | 4B | ❌ Fail | Very fast | Steps 1–4 passed, but debugger hallucinated a nonexistent `hello.c` file and could not reason about what files actually exist on disk. |
| `qwen2.5-coder:7b` | 7B | ❌ Fail | Fast | Classified "create main.c" as setup instead of implement, so the file was never generated. Debugger could not write a valid Makefile after 10 attempts. |

**Takeaway:** In these tests, models below ~12B parameters could plan and generate simple code, but they could not self-correct when things went wrong. They repeated failed approaches, hallucinated files, and produced broken build scripts. **14B+ recommended for autonomous development.**

## Project Structure

Your project folder needs:

```
my-project/
├── description.txt     # Required: what to build
└── manuals/            # Required: reference docs (use -nomanual to skip)
    ├── api-spec.md
    └── protocol.txt
```

AutoDev creates these files as it works:

```
my-project/
├── description.txt
├── manuals/
├── plan.json           # The development plan (human-readable)
├── worklog.json        # Every action logged with timestamps
├── dependency.txt      # External dependencies (compilers, libraries)
├── .autodev_state.json
└── ... your project files ...
```

## Features

### Autonomous Development Loop

Reads the description, understands the requirements, creates a structured plan, then executes it: setup → implement → compile → test → debug → finalize. No human input needed during execution.

### Self-Debugging

When compilation or tests fail, AutoDev enters a debug loop:

- Analyzes the error and source code
- Diagnoses root cause (not just symptoms)
- Generates a fix and applies it
- Verifies the fix works
- Rolls back automatically if the fix makes things worse
- Tracks failed approaches to avoid repeating the same fix

### Resumable Sessions

Every action is logged to `worklog.json`. If AutoDev is interrupted or fails:

```bash
# Just run it again; it picks up where it left off
autodev
```

It reads the worklog, loads the existing plan, and continues from the last incomplete step.

### Cycle & Hallucination Detection

Detects when the LLM is stuck in a loop (producing similar outputs repeatedly) and automatically clears stale context to break out.
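
As a rough illustration of the loop detection described above, the sketch below flags a cycle when a new output closely resembles every output in a short window of recent ones. It is not AutoDev's actual code: the `CycleDetector` class, the window size, and the 0.9 similarity threshold are all assumptions chosen for the example.

```python
from collections import deque
from difflib import SequenceMatcher


class CycleDetector:
    """Hypothetical helper: flags a loop when recent outputs are near-duplicates."""

    def __init__(self, window: int = 3, threshold: float = 0.9) -> None:
        self.window = window        # assumed: number of recent outputs to compare
        self.threshold = threshold  # assumed: ratio at which two outputs count as similar
        self.recent: deque[str] = deque(maxlen=window)

    def is_stuck(self, output: str) -> bool:
        """Record an output; return True if it resembles every output in a full window."""
        stuck = len(self.recent) == self.window and all(
            SequenceMatcher(None, output, prev).ratio() >= self.threshold
            for prev in self.recent
        )
        self.recent.append(output)
        return stuck
```

A caller would check `is_stuck()` after each LLM response and, when it returns `True`, drop the stale context before the next request.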

### Sandboxed Execution

- All file operations are confined to the working directory
- Shell commands are validated against a whitelist of safe tools (compilers, build tools, standard utilities)
- `sudo` and system-level commands are blocked
- Path traversal outside the working directory is prevented

### Language Agnostic

Works with any programming language the LLM knows. Tested with C, Python, and Makefiles. The LLM determines the appropriate build tools, compilers, and project structure.

### Dependency Tracking

All external dependencies (compilers, libraries, tools) are recorded in `dependency.txt` so you know exactly what the project needs.

## CLI Options

```
autodev [options]

Options:
  -nomanual                 Skip reading manuals/ directory (for simple tasks)
  -web PORT                 Start live web dashboard on PORT (e.g. -web 4500)
  --backend {ollama,vllm}   LLM backend (default: from config)
  --model MODEL             Model name (default: from config)
  --workdir DIR             Working directory (default: current directory)
```

### Web Dashboard

Run `autodev -web 4500` and open `http://localhost:4500` in your browser. The dashboard shows three panels:

- **Plan Progress**: step-by-step checklist with ✓/✗/▸ status and completion counter
- **Project Files**: clickable file tree with live content viewer
- **LLM Activity**: real-time log of all actions and model thinking (newest first)

Updates are pushed live via Server-Sent Events, so no page refresh is needed.

### Incremental Updates

If you change `description.txt` and restart AutoDev, it detects the change and re-plans incrementally, telling the LLM what files already exist so it builds on previous work instead of starting over.
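
One plausible way to detect a changed description is to compare a content digest against the one saved from the previous run. The sketch below assumes exactly that; it is not taken from AutoDev's source, and the `needs_replan` function and the `description_sha256` state key are hypothetical names used only for illustration.

```python
import hashlib


def needs_replan(description: str, state: dict) -> bool:
    """Return True if the description differs from the digest stored in `state`.

    `state` stands in for whatever is persisted between runs; the
    "description_sha256" key is a hypothetical name, not AutoDev's schema.
    """
    digest = hashlib.sha256(description.encode("utf-8")).hexdigest()
    changed = state.get("description_sha256") != digest
    state["description_sha256"] = digest  # remember the digest for the next run
    return changed
```

On a first run the state is empty, so the function reports a change and a full plan is created; later runs re-plan only when the text actually differs.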

## Architecture

```
autodev/
├── config.py       # LLM backend settings, timeouts, expert system prompt
├── llm.py          # Ollama + vLLM communication with streaming and retry
├── context.py      # Token-aware context window with relevance scoring
├── planner.py      # Reads description + manuals, creates development plan
├── executor.py     # Code generation, file writing, compilation
├── debugger.py     # Error analysis, fix generation, rollback
├── sandbox.py      # Whitelist-based command validation, path confinement
├── logger.py       # Action logging to console and persistent worklog
├── dependency.py   # Dependency tracking
├── resume.py       # State persistence and session resumption
├── main.py         # CLI orchestrator
└── autodev-cli     # Symlink-friendly entry point
```

## How the Description Should Be Written

Be specific. Every sentence is treated as a requirement.

**Good:**

```
Language: C
Build a TCP echo server that listens on port 8080.
It should handle multiple clients using fork().
Include proper signal handling for SIGCHLD to avoid zombies.
Include a Makefile with 'all' and 'clean' targets.
The server should log connections to stderr.
```

**Too vague:**

```
Make a server program.
```

## Limitations

- Quality depends entirely on the LLM; larger models produce better results
- No interactive mode: you can't guide it mid-run (by design)
- Manual parsing is plain text only (no PDF extraction)
- Token counting is estimated, not exact
- The LLM may occasionally produce code that compiles but doesn't meet all requirements
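
To make the token-counting limitation concrete, here is a minimal sketch of a character-based estimate and how it might be used to fit text into a context budget. The four-characters-per-token ratio and both function names are assumptions for illustration, not AutoDev's actual heuristic.

```python
import math

CHARS_PER_TOKEN = 4  # assumed average; real tokenizers vary by model and language


def estimate_tokens(text: str) -> int:
    """Rough token count from character length; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in order until the estimated total would exceed the budget."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Because the estimate can undercount for code-heavy or non-English text, leaving some headroom below the model's real context limit is a sensible precaution.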