# Agentic Workflows & MCP

*Using MLSys·im as the physics backend for LLM Agents.*
The ultimate vision for mlsysim is not just to educate humans, but to serve as the ground-truth physics engine for autonomous AI systems.
Large Language Models (like Claude 3.5 Sonnet, GPT-4o, or Gemini Pro) are excellent at writing code and structuring YAML, but they frequently hallucinate complex math. If you ask an LLM to calculate the Inter-Token Latency of a 70B model on 8x H100s with PagedAttention, it will confidently guess wrong.
By wrapping mlsysim in the Model Context Protocol (MCP), you give your agents the ability to dynamically design hardware clusters, run them through a dimensionally strict physics engine, and interpret the precise bottlenecks to iteratively improve the design.
## 1. Using MLSys·im with Claude Desktop (MCP)
We provide a production-ready MCP server that exposes the mlsysim engine to Claude Desktop.
### Setup

1. Ensure you have installed `mlsysim` and the `mcp` Python package:

   ```shell
   pip install mlsysim mcp
   ```

2. Open your Claude Desktop configuration file.
   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`

3. Add the `mlsysim` server:

   ```json
   {
     "mcpServers": {
       "mlsysim": {
         "command": "python3",
         "args": ["-m", "mlsysim.examples.mcp_server"]
       }
     }
   }
   ```

4. Restart Claude Desktop. You will now see a hammer icon 🛠️ indicating the tools are available.
### What to ask Claude
You can now ask Claude questions that require deep hardware simulation:

> “I need to serve Llama-3 70B. Can you use your mlsysim tool to find out if it fits on a single H100? If it doesn’t, design a cluster that does, and tell me the annual TCO.”
Claude will automatically generate the required YAML schema, call the `evaluate_cluster_yaml` tool, see the Out-of-Memory (OOM) failure, correct its design to use 2 nodes, and return the final mathematical truth to you.
## 2. The Agentic “Predict-Compute-Reflect” Loop
If you are building your own multi-agent system (using LangChain, AutoGen, or raw Gemini APIs), mlsysim’s schema architecture is built specifically for you.
- **The Input:** Export our schema using `mlsysim schema --type plan`. Feed this JSON Schema directly into your LLM’s system prompt or tool definition; the LLM instantly knows how to structure the request.
- **The Execution:** Call `mlsysim eval your_file.yaml --output json` (or use the Python API).
- **The Feedback:** Because `mlsysim` outputs a strictly-typed, flat JSON dictionary, your agent can easily parse the results. If `f_status == "FAIL"`, the agent reads the `f_summary` (e.g., “OOM: Requires 140 GB but only has 80 GB”) and adjusts its design autonomously.
We have included a conceptual Python implementation of this loop in our repository at mlsysim/examples/gemini_design_loop.py.
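The loop is easy to sketch in plain Python. The snippet below is a minimal illustration, not the repository implementation: `evaluate` is a stub standing in for a real `mlsysim eval --output json` call, and the 80 GB-per-node and 140 GB figures are hypothetical numbers borrowed from the OOM example above. Only the `f_status`/`f_summary` field names come from the actual output format.

```python
import json

def evaluate(plan: dict) -> dict:
    """Stub for the real engine call (`mlsysim eval plan.yaml --output json`).

    Mimics the strictly-typed, flat JSON dictionary described above, with
    made-up numbers: ~140 GB of weights vs. a hypothetical 80 GB per node.
    """
    capacity_gb = plan["nodes"] * 80
    required_gb = 140
    if required_gb > capacity_gb:
        return {
            "f_status": "FAIL",
            "f_summary": f"OOM: Requires {required_gb} GB but only has {capacity_gb} GB",
        }
    return {"f_status": "PASS", "f_summary": "Fits"}

def design_loop(max_iters: int = 5) -> dict:
    plan = {"nodes": 1}                 # Predict: the agent's first guess
    for _ in range(max_iters):
        result = evaluate(plan)         # Compute: run the physics engine
        if result["f_status"] == "PASS":
            return plan
        # Reflect: parse the flat JSON feedback and adjust the design
        if result["f_summary"].startswith("OOM"):
            plan["nodes"] += 1
    raise RuntimeError("No feasible design found")

print(json.dumps(design_loop()))  # first plan that passes feasibility
```

In a real agent, the Reflect step would be the LLM reading `f_summary` and emitting a revised YAML plan rather than a hard-coded `nodes += 1`.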
## 3. Exposed MCP Tools
When running as an MCP server, mlsysim exposes the following tools to the connected agent:
| Tool | Description |
|---|---|
| `evaluate_cluster_yaml` | Evaluate a YAML cluster specification through the full 3-lens scorecard (Feasibility, Performance, Macro) |
| `list_hardware` | List all hardware in the Zoo with specs |
| `list_models` | List all models in the Zoo with parameter counts |
The agent can call these tools programmatically. The YAML schema can be exported with:

```shell
mlsysim schema --type plan
```

Feed this schema into your agent’s system prompt or tool definition so it knows how to structure valid requests.
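For agents built on generic function-calling APIs, the exported schema slots directly into a tool definition. A minimal sketch, assuming the export is a standard JSON Schema document: the `plan_schema` fragment below is a made-up stand-in, not mlsysim’s real plan schema, and the envelope field names (`name`, `description`, `parameters`) follow a generic function-calling shape that differs slightly per LLM provider.

```python
import json

# Hypothetical stand-in for the output of `mlsysim schema --type plan`.
# In practice you would load the CLI's exported JSON Schema instead.
plan_schema = {
    "type": "object",
    "properties": {
        "model": {"type": "string"},
        "nodes": {"type": "integer", "minimum": 1},
    },
    "required": ["model", "nodes"],
}

# Wrap the schema in a generic function-calling tool definition;
# adapt the envelope to whichever provider your agent framework uses.
tool_definition = {
    "name": "evaluate_cluster_yaml",
    "description": "Evaluate a cluster plan with the mlsysim physics engine.",
    "parameters": plan_schema,
}

print(json.dumps(tool_definition, indent=2))
```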
## 4. Troubleshooting
- **Claude doesn’t show the hammer icon:** Make sure you restarted Claude Desktop after editing the config. Check that `python3 -m mlsysim.examples.mcp_server` runs without errors in your terminal.
- **Agent gets OOM errors:** This is expected behavior; it means the model doesn’t fit on the specified hardware. The agent should read the error message and adjust (e.g., add nodes, reduce precision, or pick larger hardware).
- **Agent hallucinates hardware specs:** Remind the agent to use `list_hardware` to discover available hardware rather than inventing specs. The `llms.txt` file at the root of the docs site contains agent-specific guidance.
## 5. Why This Matters
The “academic simulator graveyard” is filled with tools that were too hard for humans to compile and too unstructured for machines to use.
By defining mlsysim through strict Pydantic schemas and standardizing the 22 ML Systems Walls, we have created an intermediate representation (IR) that both humans and AI agents can understand. In the near future, you will not manually calculate whether a new model architecture is viable; you will ask your Agentic Architect to run 10,000 simulations against the mlsysim physics engine while you sleep.