The MLSys·im Philosophy

First-Principles Analytical Modeling with Zero Hallucinations

MLSys·im was built to solve a specific problem in machine learning systems education and engineering: the gap between abstract intuition and cycle-accurate simulation.

When reasoning about ML infrastructure—whether sizing a serving fleet for LLaMA-3 or estimating the carbon footprint of a 10,000-GPU training run—engineers often rely on messy spreadsheets filled with hidden assumptions, unit-conversion errors, and “magic numbers.” Conversely, cycle-accurate simulators require weeks of setup, deep proprietary knowledge, and hours to run a single workload.

MLSys·im provides a third path: First-Principles Analytical Modeling.

To achieve textbook-grade rigor, the framework is built on four non-negotiable design principles.

1. No Hallucinations, No Magic Numbers

In an era of generative AI, it is easy to ask a language model to “estimate the latency of ResNet-50 on an A100.” The model will confidently output a number—often a hallucination based on an unverified blend of internet forum posts.

MLSys·im does not guess. It computes.

%%{init: {'theme': 'neutral'}}%%
flowchart LR
    A[<b>Primary Source</b><br/><i>Datasheet / Paper</i>] -->|provenance| B(<b>Registry</b><br/><i>e.g., Hardware.Cloud.H100</i>)
    B -->|Sourced Value<br/>+ SI Units| C{<b>Solver Engine</b>}
    C --> D[<b>Trusted Estimate</b><br/><i>No Magic Numbers</i>]

    style A fill:#f8fafc,stroke:#cbd5e1
    style B fill:#f1f5f9,stroke:#94a3b8
    style C fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style D fill:#ecfdf5,stroke:#10b981

Every number in the MLSys·im ecosystem is derived from explicit, vetted constants. If a solver needs the memory bandwidth of an H100 GPU, it does not use a hardcoded float such as 3000.0. It queries the Hardware registry, which returns a Sourced object containing exactly 3.35 TB/s, complete with a Provenance struct linking directly to the NVIDIA H100 SXM datasheet.

If a student or engineer asks, “Where did this number come from?”, the framework provides the exact URL, the academic paper, or the empirical methodology used to derive it. There are no “magic numbers” hidden in the source code.
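A minimal sketch of how a sourced constant might be represented is shown here. The class layout, field names, and the placeholder URL are assumptions made for illustration; they are not taken from the MLSys·im source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Where a constant came from (hypothetical fields)."""
    source: str  # human-readable citation, e.g. a datasheet title
    url: str     # link to the primary source (a placeholder in this sketch)


@dataclass(frozen=True)
class Sourced:
    """A value that cannot be created without a unit and a provenance."""
    value: float
    unit: str
    provenance: Provenance

    def explain(self) -> str:
        # Answer "where did this number come from?" in one line.
        return f"{self.value} {self.unit} (source: {self.provenance.source})"


# Hypothetical registry entry; the URL is a placeholder, not a real link.
H100_MEM_BW = Sourced(
    value=3.35,
    unit="TB/s",
    provenance=Provenance(
        source="NVIDIA H100 datasheet",
        url="https://example.com/h100-datasheet",
    ),
)

print(H100_MEM_BW.explain())
```

Making both classes frozen means a constant cannot be silently modified after it has been registered, which is one way to keep the audit trail intact.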

2. Dimensional Strictness (SI Units Everywhere)

The most common source of error in back-of-the-envelope systems analysis is unit mismatch (e.g., dividing gigabytes by gigabits per second without a factor of 8).

To prevent these errors, MLSys·im enforces strict dimensional analysis at runtime using the pint unit library.

* A workload does not require 140 memory; it requires 140 * ureg.GB.
* A network does not deliver 400 bandwidth; it delivers 400 * ureg.Gbps.
* If a user accidentally attempts to add a latency (ms) to a throughput (1/s), pint immediately raises a DimensionalityError.

By forcing every input, intermediate variable, and output to carry physical SI units, the framework ensures that every equation in the engine is dimensionally consistent.

3. Analytical Speed over Cycle-Accurate Simulation

MLSys·im is an analytical engine, not a discrete-event simulator.

We do not track individual packets across a network switch, nor do we simulate warp-scheduler occupancies inside a GPU streaming multiprocessor. Instead, we use closed-form equations (like the Roofline model, the \(\alpha\)-\(\beta\) communication model, and Erlang-C queueing theory) to establish the hard physical bounds of a system.
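Two of the closed-form models named above can be written in a few lines each. This is a sketch under stated assumptions: the function names are hypothetical, the \(\alpha\)-\(\beta\) model is written with an explicit bandwidth (so the per-byte cost \(\beta\) is its inverse), and the Erlang-C expression is the standard textbook formula for the probability that an arriving request must wait.

```python
import math


def alpha_beta_time_s(message_bytes: float, alpha_s: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Point-to-point transfer time: fixed latency plus size over bandwidth."""
    return alpha_s + message_bytes / bandwidth_bytes_per_s


def erlang_c_wait_probability(arrival_rate: float, service_rate: float,
                              servers: int) -> float:
    """Probability that an arriving request has to queue (M/M/c system)."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    if a >= servers:
        raise ValueError("unstable queue: offered load must be below server count")
    tail = (a ** servers / math.factorial(servers)) * (servers / (servers - a))
    head = sum(a ** k / math.factorial(k) for k in range(servers))
    return tail / (head + tail)


# Illustrative inputs only: a 1 GB message, 10 microseconds of latency, 50 GB/s.
print(alpha_beta_time_s(1e9, 1e-5, 5e10))

# With a single server, the wait probability equals the utilization (0.5 here).
print(erlang_c_wait_probability(arrival_rate=0.5, service_rate=1.0, servers=1))
```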

By relying on analytical math (\(Y = f(X)\)):

* Evaluation is instantaneous: A full-stack analysis of a 100,000-GPU cluster takes less than 0.3 seconds on a standard laptop.
* Bottlenecks are explicit: The math makes it trivial to calculate gradients (Sensitivity Analysis) or invert equations to solve for required hardware (Synthesis Analysis).
* Intuition is preserved: Analytical models expose why a system is slow (e.g., “Arithmetic Intensity < Ridge Point”), whereas detailed simulations only tell you that it is slow.
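To make the points above concrete, here is a small Roofline sketch. The function names and the workload numbers are hypothetical, and the peak compute figure is a round illustrative value rather than a datasheet number. It shows how the bottleneck follows from comparing arithmetic intensity with the ridge point, and how the same equation can be inverted to solve for required hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RooflineResult:
    latency_s: float
    bottleneck: str  # "compute" or "memory"


def roofline(flops: float, bytes_moved: float,
             peak_flops_per_s: float, mem_bw_bytes_per_s: float) -> RooflineResult:
    """Lower-bound latency: the slower of the compute and memory phases."""
    intensity = flops / bytes_moved                  # FLOP per byte
    ridge = peak_flops_per_s / mem_bw_bytes_per_s    # FLOP per byte
    t_compute = flops / peak_flops_per_s
    t_memory = bytes_moved / mem_bw_bytes_per_s
    bottleneck = "memory" if intensity < ridge else "compute"
    return RooflineResult(max(t_compute, t_memory), bottleneck)


def required_bandwidth(bytes_moved: float, target_latency_s: float) -> float:
    """Synthesis: invert t = bytes / bandwidth to find the bandwidth needed."""
    return bytes_moved / target_latency_s


# Illustrative workload: 2 TFLOP of work touching 100 GB of memory.
result = roofline(flops=2e12, bytes_moved=1e11,
                  peak_flops_per_s=1e15, mem_bw_bytes_per_s=3.35e12)
print(result.bottleneck, result.latency_s)

# Bandwidth needed to finish the same memory traffic in 10 ms.
print(required_bandwidth(1e11, 0.01))
```

Here the arithmetic intensity is 20 FLOP/byte, far below the ridge point of roughly 300 FLOP/byte, so the sketch reports a memory bottleneck and the reason for it.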

To bridge the gap between theoretical peaks and realized performance, MLSys·im introduces a single, explicit Efficiency Coefficient (\(\eta\)) into compute-bound solvers; Model FLOPs Utilization (MFU) is one such coefficient. This clearly separates the raw physics of the hardware from the software friction of the framework.
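One plausible way to apply such a coefficient is to derate the peak throughput before dividing, as in the sketch below. The function name and the example values are assumptions for illustration, not the framework's actual API.

```python
def compute_time_s(flops: float, peak_flops_per_s: float, eta: float) -> float:
    """Compute-bound time with an explicit efficiency coefficient eta."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must be in (0, 1]")
    # Hardware physics (peak) and software friction (eta) stay separate terms.
    return flops / (eta * peak_flops_per_s)


# Illustrative values: 1 PFLOP of work on a device with a 1 PFLOP/s peak.
ideal = compute_time_s(1e15, 1e15, eta=1.0)     # theoretical bound
realized = compute_time_s(1e15, 1e15, eta=0.5)  # assuming 50% utilization
print(ideal, realized)
```

Keeping \(\eta\) as a named argument, rather than folding it into the peak, means the assumption about software efficiency is visible at every call site.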

4. Separation of Demand and Supply

A recurring anti-pattern in systems modeling is tightly coupling a model to the hardware it runs on (e.g., writing a script specifically for “GPT-3 on A100”).

MLSys·im enforces a strict Demand vs. Supply abstraction:

%%{init: {'theme': 'neutral'}}%%
flowchart TB
    subgraph Demand["<b>Demand (Workloads)</b>"]
        direction TB
        W[<i>Transformer / CNN</i><br/>Parameters & FLOPs]
    end

    subgraph Supply["<b>Supply (Hardware & Systems)</b>"]
        direction TB
        H[<i>Compute & Memory</i><br/>TFLOP/s & Bandwidth]
    end

    Demand -->|Lower to graph| S{"<b>Solvers (Layer E)</b>"}
    Supply -->|Physical Constraints| S
    S --> R[<b>Analytical Prediction</b><br/><i>Latency, Bottleneck, Cost</i>]

    style Demand fill:#fef08a,stroke:#d97706
    style Supply fill:#ddd6fe,stroke:#7c3aed
    style S fill:#e0f2fe,stroke:#0284c7,stroke-width:2px

* Demand (Workloads): A TransformerWorkload only knows about its parameters, sequence length, and arithmetic intensity. It has no concept of what a GPU is.
* Supply (Hardware/Systems): A HardwareNode only knows about its peak TFLOP/s, memory hierarchy, and power draw. It has no concept of what a Transformer is.

The Solvers (Layer E) act as brokers between the two. They take the abstract Demand, project it onto the physical Supply, and apply closed-form models to predict the outcome.

This decoupling means you can define a workload once and instantly sweep it across every hardware device in the Silicon Zoo, from a 1-watt microcontroller to a 100-megawatt supercomputer.
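The decoupling can be sketched with two independent data classes and a solver function that is the only place where they meet. All class names, device names, and numbers below are hypothetical and chosen for illustration; they are not entries from the actual registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Demand: describes the work, with no reference to any hardware."""
    name: str
    flops: float
    bytes_moved: float


@dataclass(frozen=True)
class HardwareNode:
    """Supply: describes the device, with no reference to any model."""
    name: str
    peak_flops_per_s: float
    mem_bw_bytes_per_s: float


def solve_latency_s(w: Workload, hw: HardwareNode) -> float:
    """Solver: the only place where demand and supply are combined."""
    return max(w.flops / hw.peak_flops_per_s,
               w.bytes_moved / hw.mem_bw_bytes_per_s)


# Illustrative devices with made-up specifications.
zoo = [
    HardwareNode("tiny-mcu", peak_flops_per_s=1e9, mem_bw_bytes_per_s=1e8),
    HardwareNode("big-accelerator", peak_flops_per_s=1e15, mem_bw_bytes_per_s=3e12),
]

# Define the workload once, then sweep it across every device.
workload = Workload("toy-model", flops=2e9, bytes_moved=1e8)
results = {hw.name: solve_latency_s(workload, hw) for hw in zoo}

for name, latency in results.items():
    print(f"{name}: {latency:.6f} s")
```

Adding a new device or a new workload requires no change to the other side, because neither class imports or mentions the other.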

Conclusion

MLSys·im is the executable companion to the Machine Learning Systems textbook. It is designed to replace “gut feelings” and fragile spreadsheets with a composable, unit-safe, mathematically rigorous engineering tool.

When MLSys·im tells you a system will bottleneck on memory bandwidth, you can trust the math, check the units, and audit the datasheets.