# Agentic Workflows & MCP

*Using MLSys·im as the physics backend for LLM Agents.*
The ultimate vision for mlsysim is not just to educate humans, but to serve as the ground-truth physics engine for autonomous AI systems.
Large Language Models (like Claude 3.5 Sonnet, GPT-4o, or Gemini Pro) are excellent at writing code and structuring YAML, but they frequently hallucinate complex math. If you ask an LLM to calculate the Inter-Token Latency of a 70B model on 8x H100s with PagedAttention, it will confidently guess wrong.
By wrapping mlsysim in the Model Context Protocol (MCP), you give your agents the ability to dynamically design hardware clusters, run them through a dimensionally strict physics engine, and interpret the precise bottlenecks to iteratively improve the design.
## 1. Using MLSys·im with Claude Desktop (MCP)
We provide a production-ready MCP server that exposes the mlsysim engine to Claude Desktop.
### Setup

1. Ensure you have installed `mlsysim` and the `mcp` Python package:

   ```shell
   pip install mlsysim mcp
   ```

2. Open your Claude Desktop configuration file.
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`

3. Add the `mlsysim` server:

   ```json
   {
     "mcpServers": {
       "mlsysim": {
         "command": "python3",
         "args": ["-m", "mlsysim.examples.mcp_server"]
       }
     }
   }
   ```

4. Restart Claude Desktop. You will now see a hammer icon 🛠️ indicating the tools are available.
### What to ask Claude
You can now ask Claude questions that require deep hardware simulation:

> “I need to serve Llama-3 70B. Can you use your mlsysim tool to find out if it fits on a single H100? If it doesn’t, design a cluster that does, and tell me the annual TCO.”
Claude will automatically generate the required YAML schema, call the `evaluate_cluster_yaml` tool, see the Out-of-Memory (OOM) failure, correct its design to use 2 nodes, and return the final mathematical truth to you.
## 2. The Agentic “Predict-Compute-Reflect” Loop
If you are building your own multi-agent system (using LangChain, AutoGen, or raw Gemini APIs), mlsysim’s schema architecture is built specifically for you.
- **The Input:** Export our schema using `mlsysim schema --type plan`. Feed this JSON Schema directly into your LLM’s system prompt or tool definition; the LLM instantly knows how to structure the request.
- **The Execution:** Call `mlsysim eval your_file.yaml --output json` (or use the Python API).
- **The Feedback:** Because `mlsysim` outputs a strictly-typed, flat JSON dictionary, your agent can easily parse the results. If `f_status == "FAIL"`, the agent reads the `f_summary` (e.g., “OOM: Requires 140 GB but only has 80 GB”) and adjusts its design autonomously.
We have included a conceptual Python implementation of this loop in our repository at mlsysim/examples/gemini_design_loop.py.
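The loop is easy to sketch in plain Python. The snippet below is a minimal illustration, not the repository implementation: `evaluate` is a stub standing in for a real `mlsysim eval --output json` call, and the 80 GB-per-node and 140 GB figures are hypothetical numbers borrowed from the OOM example above. Only the `f_status`/`f_summary` field names come from the actual output format.

```python
import json

def evaluate(plan: dict) -> dict:
    """Stub for the real engine call (`mlsysim eval plan.yaml --output json`).

    Mimics the strictly-typed, flat JSON dictionary described above, with
    made-up numbers: ~140 GB of weights vs. a hypothetical 80 GB per node.
    """
    capacity_gb = plan["nodes"] * 80
    required_gb = 140
    if required_gb > capacity_gb:
        return {
            "f_status": "FAIL",
            "f_summary": f"OOM: Requires {required_gb} GB but only has {capacity_gb} GB",
        }
    return {"f_status": "PASS", "f_summary": "Fits"}

def design_loop(max_iters: int = 5) -> dict:
    plan = {"nodes": 1}                 # Predict: the agent's first guess
    for _ in range(max_iters):
        result = evaluate(plan)         # Compute: run the physics engine
        if result["f_status"] == "PASS":
            return plan
        # Reflect: parse the flat JSON feedback and adjust the design
        if result["f_summary"].startswith("OOM"):
            plan["nodes"] += 1
    raise RuntimeError("No feasible design found")

print(json.dumps(design_loop()))  # first plan that passes feasibility
```

In a real agent, the Reflect step would be the LLM reading `f_summary` and emitting a revised YAML plan rather than a hard-coded `nodes += 1`.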
## 3. Exposed MCP Tools
When running as an MCP server, mlsysim exposes the following tools to the connected agent:
| Tool | Description |
|---|---|
| `evaluate_cluster_yaml` | Evaluate a YAML cluster specification through the full 3-lens scorecard (Feasibility, Performance, Macro) |
| `list_hardware` | List all hardware in the Zoo with specs |
| `list_models` | List all models in the Zoo with parameter counts |
The agent can call these tools programmatically. The YAML schema can be exported with:

```shell
mlsysim schema --type plan
```

Feed this schema into your agent’s system prompt or tool definition so it knows how to structure valid requests.
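For agents built on generic function-calling APIs, the exported schema slots directly into a tool definition. A minimal sketch, assuming the export is a standard JSON Schema document: the `plan_schema` fragment below is a made-up stand-in, not mlsysim’s real plan schema, and the envelope field names (`name`, `description`, `parameters`) follow a generic function-calling shape that differs slightly per LLM provider.

```python
import json

# Hypothetical stand-in for the output of `mlsysim schema --type plan`.
# In practice you would load the CLI's exported JSON Schema instead.
plan_schema = {
    "type": "object",
    "properties": {
        "model": {"type": "string"},
        "nodes": {"type": "integer", "minimum": 1},
    },
    "required": ["model", "nodes"],
}

# Wrap the schema in a generic function-calling tool definition;
# adapt the envelope to whichever provider your agent framework uses.
tool_definition = {
    "name": "evaluate_cluster_yaml",
    "description": "Evaluate a cluster plan with the mlsysim physics engine.",
    "parameters": plan_schema,
}

print(json.dumps(tool_definition, indent=2))
```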
## 4. Troubleshooting
- **Claude doesn’t show the hammer icon:** Make sure you restarted Claude Desktop after editing the config. Check that `python3 -m mlsysim.examples.mcp_server` runs without errors in your terminal.
- **Agent gets OOM errors:** This is expected behavior; it means the model doesn’t fit on the specified hardware. The agent should read the error message and adjust (e.g., add nodes, reduce precision, or pick larger hardware).
- **Agent hallucinates hardware specs:** Remind the agent to use `list_hardware` to discover available hardware rather than inventing specs. The `llms.txt` file at the root of the docs site contains agent-specific guidance.
## 5. Why This Matters
The “academic simulator graveyard” is filled with tools that were too hard for humans to compile and too unstructured for machines to use.
By defining mlsysim through strict Pydantic schemas and standardizing the 22 ML Systems Walls, we have created an intermediate representation (IR) that both humans and AI agents can understand. In the near future, you will not manually calculate whether a new model architecture is viable; you will ask your Agentic Architect to run 10,000 simulations against the mlsysim physics engine while you sleep.