The Differential Explainer

Automated ‘Why?’ analysis for hardware upgrades.

analysis · intermediate
Learn how to use the DifferentialExplainer to automatically compare two configurations and generate a written explanation of the performance delta.

The Question

When you run a simulation comparing an A100 to an H100, the output might say:

  • A100 Latency: 11.0 ms
  • H100 Latency: 8.0 ms

The speedup is roughly 1.4x. But the hardware sheet says the H100 has 3.2x more peak FLOP/s! How do we automatically explain this discrepancy to a user or a stakeholder without manually digging through the formulas?
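A back-of-the-envelope roofline check previews the answer: at batch size 1, decode latency is dominated by streaming weights from HBM, so it scales with bandwidth rather than FLOP/s. A minimal sketch, using illustrative peak specs and a hypothetical per-token workload (not the simulator's internal values):

```python
# Roofline sketch: latency is set by the slower of the two "roofs".
# Spec figures below are illustrative, not the simulator's internals.
def latency_ms(flops, bytes_moved, peak_flops, peak_bw):
    """Latency in ms, taken as the max of the compute and memory times."""
    compute_s = flops / peak_flops
    memory_s = bytes_moved / peak_bw
    return max(compute_s, memory_s) * 1e3

# Hypothetical batch-1 decode step: ~16 GFLOP, ~16 GB of weights read
flops, bytes_moved = 16e9, 16e9

a100 = latency_ms(flops, bytes_moved, peak_flops=312e12, peak_bw=2.0e12)
h100 = latency_ms(flops, bytes_moved, peak_flops=989e12, peak_bw=3.35e12)

print(f"A100: {a100:.1f} ms, H100: {h100:.1f} ms, speedup: {a100/h100:.2f}x")
# Both devices sit on the memory roof here, so the speedup is the
# bandwidth ratio (3.35/2.0 ≈ 1.68x), nowhere near the 3.2x FLOP/s ratio.
```

The exact milliseconds depend on the assumed workload, but the structural point holds: in the memory-bound regime, only the bandwidth ratio matters.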

Note: What You Will Learn
  • Compare two system evaluations automatically.
  • Generate a human-readable explanation of why a speedup did (or didn’t) match hardware specs.
  • Identify “Regime Shifts” where an upgrade fundamentally changes the bottleneck.

1. Setup

Import the necessary modules. We will use the standard Engine to get our baseline and proposed profiles, and the new DifferentialExplainer to compare them.

import mlsysim
from mlsysim.core.engine import Engine
from mlsysim.core.explainers import DifferentialExplainer

2. A Memory-Bound Upgrade (The Disappointment)

Let’s test the classic scenario: upgrading hardware for LLM inference at a low batch size.

model = mlsysim.Models.Language.Llama3_8B

# Get our two profiles
prof_a100 = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.A100, batch_size=1)
prof_h100 = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.H100, batch_size=1)

# Ask the explainer what happened
explanation = DifferentialExplainer.compare_performance(
    baseline=prof_a100,
    proposal=prof_h100
)

print(explanation)

Output:

📊 Differential Analysis: Proposal vs. Baseline
• Speedup: 1.39x
• Baseline Regime: Memory Bound
• Proposal Regime: Memory Bound

Analysis: The workload remained Memory Bound. The speedup is constrained strictly by the ratio of HBM bandwidth between the two configurations. Any additional compute capacity (FLOP/s) in the proposal was left unutilized.
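Under the hood, a comparison like this reduces to a small decision: did the regime change, and which spec ratio governs the delta? A sketch of that logic (assumed for illustration, not mlsysim’s actual implementation):

```python
# Assumed sketch of the comparison logic (not mlsysim's implementation):
# compare the two regimes, then name the spec ratio that governs the delta.
def explain(baseline_regime: str, proposal_regime: str, speedup: float) -> str:
    if baseline_regime == proposal_regime:
        limiter = ("HBM bandwidth" if baseline_regime == "memory"
                   else "peak FLOP/s")
        return (f"Speedup {speedup:.2f}x. The workload stayed "
                f"{baseline_regime} bound; the gain tracks the ratio of "
                f"{limiter} between the two configurations.")
    return (f"Speedup {speedup:.2f}x. Regime shift: {baseline_regime} bound "
            f"-> {proposal_regime} bound; the gain falls between the two "
            f"spec ratios.")

print(explain("memory", "memory", 1.39))
```

The value of the tool is exactly this mapping from raw numbers to the ratio that actually constrained them.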

3. A Regime Shift (The Breakthrough)

What happens if we increase the batch size significantly?

# At batch size 256, the A100 is struggling with compute, but the H100 has plenty
prof_a100_batch = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.A100, batch_size=256)
prof_h100_batch = Engine.solve(model=model, hardware=mlsysim.Hardware.Cloud.H100, batch_size=256)

explanation_batch = DifferentialExplainer.compare_performance(
    baseline=prof_a100_batch,
    proposal=prof_h100_batch
)

print(explanation_batch)

Output:

📊 Differential Analysis: Proposal vs. Baseline
• Speedup: 2.65x
• Baseline Regime: Compute Bound
• Proposal Regime: Memory Bound

Analysis: Regime shift detected. The baseline was Compute Bound, but the proposal's peak FLOP/s grew faster than its HBM bandwidth, leaving the workload Memory Bound on the new hardware. The speedup therefore lands between the HBM bandwidth ratio and the peak FLOP/s ratio of the two configurations.
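The breakthrough has a simple roofline explanation: at batch 256 the arithmetic intensity clears the A100’s ridge point but not the H100’s higher one, so the two devices sit on different roofs. A hand check with illustrative peak specs and a hypothetical intensity value (not the simulator’s numbers):

```python
# Roofline hand-check of the high-batch scenario (illustrative specs and a
# hypothetical arithmetic intensity; not the simulator's internal numbers).
peaks = {  # name: (peak FLOP/s, peak HBM bytes/s)
    "A100": (312e12, 2.0e12),
    "H100": (989e12, 3.35e12),
}
intensity = 256.0  # FLOP per byte moved at batch size 256 (assumed)

regimes = {}
for name, (flops, bw) in peaks.items():
    ridge = flops / bw  # intensity where the compute and memory roofs meet
    regimes[name] = "compute" if intensity > ridge else "memory"
    print(f"{name}: ridge {ridge:.0f} FLOP/B -> {regimes[name]} bound")

# A100 is held to its compute roof, H100 to its memory roof, so the
# speedup falls between the bandwidth ratio and the FLOP/s ratio.
t_a100 = intensity / peaks["A100"][0]  # seconds per byte, compute roof
t_h100 = 1.0 / peaks["H100"][1]        # seconds per byte, memory roof
print(f"speedup ~= {t_a100 / t_h100:.2f}x")
```

With these assumed numbers the hand check lands near the simulated 2.65x; the residual gap comes from effects a pure roofline model ignores.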

What You Learned

  • The Differential Explainer takes the cognitive load off the user by explicitly stating why an upgrade behaved the way it did.
  • It detects Regime Shifts, helping you realize when a hardware upgrade actually solved your bottleneck.
  • This tool is perfect for embedding into CI/CD pipelines (e.g., leaving a comment on a GitHub PR explaining why a new model architecture will slow down production).
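The CI/CD idea can be sketched as a small gate around the explainer’s text (a hypothetical helper, not part of mlsysim; in a real pipeline the printed body would be piped to a PR-comment step):

```python
# Hypothetical CI gate (assumed helper, not part of mlsysim): fail the
# build when a change regresses performance, and include the explainer's
# text so the PR comment states *why*.
def performance_gate(explanation_text: str, speedup: float,
                     min_speedup: float = 1.0) -> int:
    """Return a shell-style exit code: 0 = pass, 1 = regression."""
    verdict = "pass" if speedup >= min_speedup else "regression"
    print(f"### Performance report\n{explanation_text}\n"
          f"{verdict}: {speedup:.2f}x vs. {min_speedup:.2f}x gate")
    return 0 if speedup >= min_speedup else 1

# In a pipeline, `speedup` would be parsed from the explainer's result.
exit_code = performance_gate(
    "Speedup: 0.85x (Memory Bound -> Memory Bound)", 0.85)
```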