I've been experimenting with multi-agent AI systems to automate parts of the software development workflow — not just "AI writes code," but a proper pipeline where specialized agents handle planning, coding, reviewing, and testing like a real dev team would. After trying a few frameworks, I settled on CrewAI as the fastest way to get a working software development pipeline up and running. Here's exactly how to set it up.

What We're Building

By the end of this guide, you'll have a working pipeline that takes a plain-English feature request and runs it through four AI agents:

  1. Product Manager — turns your request into a clear technical spec
  2. Architect — designs the file structure and tech decisions
  3. Developer — writes the actual code
  4. Code Reviewer — catches bugs and produces the final output

Each agent passes its work to the next, like a real development workflow. You give it "build me a bookmark API" and get back reviewed, ready-to-run code.

Prerequisites

  • Python 3.10+
  • An API key from Anthropic (Claude) or OpenAI
  • A terminal and about 15 minutes

Step 1: Project Setup

mkdir crewai-dev-pipeline && cd crewai-dev-pipeline
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

Install CrewAI:

pip install crewai crewai-tools

Set your API key:

# For Claude (recommended)
export ANTHROPIC_API_KEY="your-key-here"

# Or for OpenAI
export OPENAI_API_KEY="your-key-here"
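Before going further, a quick sanity check (plain Python, no CrewAI needed) confirms the key is actually visible to your Python process; a missing export is the most common first-run failure:

```python
import os

# Confirm at least one provider key is visible to this Python process.
key = os.environ.get("ANTHROPIC_API_KEY") or os.environ.get("OPENAI_API_KEY")
if key:
    print(f"API key found ({len(key)} characters)")
else:
    print("No API key set. Export ANTHROPIC_API_KEY or OPENAI_API_KEY first.")
```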

Step 2: Define the Dev Team (Agents)

Each agent gets a role, goal, and backstory. The backstory isn't fluff — it anchors the LLM's behavior and noticeably improves output quality.

Create agents.py:

from crewai import Agent


def create_dev_team(model: str = "claude-sonnet-4-5-20250929"):
    """Assemble the software development agent team."""

    pm = Agent(
        role="Product Manager",
        goal="Transform vague feature requests into clear, actionable technical specs",
        backstory=(
            "You are a seasoned product manager with 10 years of experience "
            "shipping software. You write unambiguous requirements that "
            "developers can implement without guesswork. You always define "
            "API contracts, data models, edge cases, and acceptance criteria."
        ),
        llm=model,
        verbose=True,
    )

    architect = Agent(
        role="Software Architect",
        goal="Design clean, modular system architecture from product specs",
        backstory=(
            "You are a principal software architect. You favor simplicity "
            "over cleverness, pick proven technologies, and produce clear "
            "file structures with documented technical decisions. You always "
            "output a step-by-step implementation plan."
        ),
        llm=model,
        verbose=True,
    )

    developer = Agent(
        role="Senior Developer",
        goal="Write complete, production-quality code with no placeholders",
        backstory=(
            "You are a senior full-stack developer who writes clean, "
            "readable code. You follow best practices, handle errors "
            "properly, validate all inputs, and never leave TODO comments "
            "or incomplete implementations."
        ),
        llm=model,
        verbose=True,
    )

    reviewer = Agent(
        role="Code Reviewer",
        goal="Catch bugs, security issues, and produce the final corrected code",
        backstory=(
            "You are a staff engineer known for thorough code reviews. "
            "You check for security vulnerabilities, logic errors, missing "
            "error handling, and performance problems. When you find issues, "
            "you fix them — you don't just point them out."
        ),
        llm=model,
        verbose=True,
    )

    return pm, architect, developer, reviewer

Step 3: Define the Pipeline Tasks

Tasks are the actual work items. They chain together automatically — each task's output becomes context for the next agent.

Create tasks.py:

from crewai import Task


def create_pipeline_tasks(pm, architect, developer, reviewer):
    """Define the sequential software development tasks."""

    spec_task = Task(
        description=(
            "Analyze this feature request and write a detailed technical "
            "specification:\n\n{feature_request}\n\n"
            "Include: functional requirements, API endpoints and contracts, "
            "data models with field types, error scenarios and status codes, "
            "and acceptance criteria for each feature."
        ),
        expected_output=(
            "A structured technical spec in markdown with clear sections "
            "for requirements, API design, data models, and edge cases."
        ),
        agent=pm,
    )

    design_task = Task(
        description=(
            "Based on the technical spec, produce a system design. Include: "
            "technology stack with justifications, complete file and folder "
            "structure, key interfaces and data flow, dependency list, and "
            "an ordered implementation plan the developer should follow."
        ),
        expected_output=(
            "A technical design document with file tree, architecture "
            "decisions, and numbered implementation steps."
        ),
        agent=architect,
    )

    code_task = Task(
        description=(
            "Implement the complete solution following the technical design. "
            "Write every file with full implementations. Include proper "
            "error handling, input validation, type hints, and docstrings. "
            "No placeholders, no stubs, no TODOs."
        ),
        expected_output=(
            "Complete source code for every file, each clearly labeled "
            "with its file path, ready to run."
        ),
        agent=developer,
    )

    review_task = Task(
        description=(
            "Review all code from the developer. Check for:\n"
            "- Bugs and logic errors\n"
            "- Security vulnerabilities (injection, auth issues, etc.)\n"
            "- Missing error handling or input validation\n"
            "- Performance issues\n"
            "- Code style and readability\n\n"
            "Fix every issue you find and output the final corrected "
            "version of all source files."
        ),
        expected_output=(
            "A brief review summary listing what was found and fixed, "
            "followed by the complete final source code for all files."
        ),
        agent=reviewer,
    )

    return [spec_task, design_task, code_task, review_task]

The {feature_request} placeholder gets filled in at runtime when you kick off the crew.

Step 4: Wire It Up and Run

Create main.py:

from crewai import Crew, Process
from agents import create_dev_team
from tasks import create_pipeline_tasks


def run_dev_pipeline(feature_request: str, model: str = "claude-sonnet-4-5-20250929"):
    # Assemble the team
    pm, architect, developer, reviewer = create_dev_team(model)

    # Create the task chain
    tasks = create_pipeline_tasks(pm, architect, developer, reviewer)

    # Build the crew
    crew = Crew(
        agents=[pm, architect, developer, reviewer],
        tasks=tasks,
        process=Process.sequential,
        verbose=True,
    )

    # Run the pipeline
    result = crew.kickoff(inputs={"feature_request": feature_request})
    return result


if __name__ == "__main__":
    request = (
        "Build a REST API for a personal bookmarks manager using Python "
        "and FastAPI. Users should be able to create, read, update, and "
        "delete bookmarks. Each bookmark has a URL, title, description, "
        "and tags. Include tag-based filtering and full-text search."
    )

    result = run_dev_pipeline(request)

    # Save output
    with open("output.md", "w") as f:
        f.write(str(result))

    print("\nPipeline complete! Check output.md")

Run it:

python main.py

You'll see each agent working through its task in real time. The full pipeline usually takes 2–4 minutes depending on complexity.

Step 5: Give Agents Real Tools

Agents get significantly more useful when they can interact with the file system. CrewAI has built-in tools for this:

from crewai_tools import FileWriterTool, DirectoryReadTool

developer = Agent(
    role="Senior Developer",
    goal="Write production-quality code and save it to the project directory",
    backstory="...",
    tools=[FileWriterTool(), DirectoryReadTool()],
    llm=model,
    verbose=True,
)

Now the Developer agent will actually create files on disk instead of just outputting code as text. Other useful tools to consider:

  • CodeInterpreterTool — lets agents run and test their own code
  • SerperDevTool — web search for looking up documentation
  • GithubSearchTool — search GitHub repos for reference implementations

Step 6: Add a Review-Revise Loop

A sequential pipeline is a great start, but the real power comes when the Reviewer can send code back to the Developer for fixes. You can do this with CrewAI's built-in task delegation:

developer = Agent(
    role="Senior Developer",
    goal="Write and revise code based on review feedback",
    backstory="...",
    allow_delegation=False,
    llm=model,
)

reviewer = Agent(
    role="Code Reviewer",
    goal="Review code and delegate fixes back to the developer if needed",
    backstory="...",
    allow_delegation=True,  # Can send work back
    llm=model,
)

For more complex loops with explicit retry limits, consider migrating to LangGraph, which gives you conditional edges and state machines. But for most projects, CrewAI's delegation handles it well.

Project Structure

Your final project should look like this:

crewai-dev-pipeline/
├── agents.py          # Agent definitions
├── tasks.py           # Task definitions
├── main.py            # Entry point
├── output.md          # Pipeline output
├── requirements.txt   # Dependencies
└── venv/              # Virtual environment

And your requirements.txt:

crewai
crewai-tools

Tips That Saved Me Hours

Be explicit about output format. If you want the Developer to label each file with its path, say so. If you want the Reviewer to output corrected files (not just comments), say so. Agents follow instructions literally.

Watch your tokens. A full 4-agent run on a moderate feature can hit 50K–100K tokens. For cost optimization, you can use a smaller model for the PM and Reviewer (they need reasoning, not raw coding ability) and a stronger model for the Developer.

Log intermediate outputs. When the final code has a bug, you need to know if the spec was wrong, the design was off, or the coder made a mistake. I save each agent's output separately:

# After kickoff, each Task carries its result in .output
for i, task in enumerate(crew.tasks):
    role = task.agent.role.replace(" ", "_").lower()
    with open(f"step_{i}_{role}.md", "w") as f:
        f.write(str(task.output))

Start with a simple request. Don't throw "build me a full SaaS app" at it on day one. Start with something concrete and bounded like a single API endpoint or a CLI tool. Get a feel for how the agents hand off work before scaling up.

Always cap iterations. If you add feedback loops, set a max retry count (I use 3). Without it, agents can get stuck in an infinite cycle of the Reviewer finding issues and the Developer introducing new ones.
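CrewAI agents do have a `max_iter` setting that caps an agent's internal reasoning/tool loop, but for the reviewer-to-developer cycle it's simplest to own the counter yourself. A framework-agnostic sketch, where `generate` and `review` are hypothetical callables standing in for crew kickoffs:

```python
MAX_REVISIONS = 3


def review_loop(generate, review, max_revisions: int = MAX_REVISIONS):
    """Capped review-revise loop.

    `generate(feedback)` produces code; `review(code)` returns (ok, feedback).
    Both are hypothetical stand-ins for crew kickoff calls.
    """
    feedback = None
    for attempt in range(max_revisions):
        code = generate(feedback)
        ok, feedback = review(code)
        if ok:
            return code, attempt + 1  # accepted after this many attempts
    return code, max_revisions  # give up and keep the last version
```

Each attempt's feedback is threaded back into the next generation call; once `review` approves, the loop exits early instead of burning the full budget.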

Where to Go from Here

This setup is a solid foundation. Once it's running well, natural next steps include adding a QA/Testing agent that writes and runs tests against the Developer's code, a DevOps agent that generates Dockerfiles and CI configs, and human-in-the-loop checkpoints where the pipeline pauses for your approval before coding starts.

For more complex workflows with branching logic and parallel agents, look into LangGraph — it's more setup but gives you full control over the execution graph.

The code for this pipeline is straightforward enough that you can have it running in 15 minutes. Give it a try and see what your AI dev team builds.