Provenance
Where MLSys·im numbers come from and how to audit them
Every Tier A number in MLSys·im (registry entries and public sourced scalars) carries structured lineage via the Provenance model. Citation management and explanatory prose live outside the package; MLSysIM stores only audit metadata.
Provenance answers “how do we know this number?” It is not a registry category. A number can have provenance whether it lives under Hardware, Systems, Infrastructure, Ops, Scenarios, or Literature. Literature.* is the registry for cited scalar anchors from papers/tables; those entries use the same provenance machinery as every other sourced value.
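To make the distinction concrete, here is a minimal sketch in plain Python. The Provenance dataclass below is a hypothetical stand-in (only its kind and ref field names mirror attributes used elsewhere on this page), and the dotted paths, values, and kind labels are placeholders rather than registry data. Grouping values by provenance kind cuts across namespaces: a paper-sourced value can sit under Ops just as easily as under Literature.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-in for the Provenance model; not the MLSys·im class.
@dataclass(frozen=True)
class Provenance:
    kind: str  # illustrative labels such as "datasheet" or "paper"
    ref: str   # placeholder pointer to the source

# Placeholder values keyed by dotted path; lineage is attached per value,
# independent of which namespace the value lives in.
VALUES: dict[str, tuple[float, Provenance]] = {
    "Hardware.Example.PeakFlops": (1.0e15, Provenance("datasheet", "vendor-spec")),
    "Literature.Example.Anchor": (0.4, Provenance("paper", "paper-a")),
    "Ops.Example.Threshold": (0.2, Provenance("paper", "paper-b")),
}

def paths_by_kind(values: dict[str, tuple[float, Provenance]]) -> dict[str, list[str]]:
    """Group dotted paths by provenance kind, ignoring registry category."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for path, (_value, prov) in values.items():
        grouped[prov.kind].append(path)
    return dict(grouped)

print(paths_by_kind(VALUES))
# {'datasheet': ['Hardware.Example.PeakFlops'], 'paper': ['Literature.Example.Anchor', 'Ops.Example.Threshold']}
```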
mlsysim.core.defaults
Constants were reorganized into semantic registries (Hardware, Models, Systems, Infrastructure, Ops, Scenarios, Literature) plus engine.calibration. See the Zoo overview and DATA_MODEL.md.
Where constants live
| Namespace | Examples |
|---|---|
| Hardware.* | Peak FLOPs, HBM bandwidth, TDP (datasheet truth) |
| Hardware.Tech.* | Technology-class latency, op energy, movement energy |
| Literature.* | MFU bands, Chinchilla ratio, communication and batch-size literature anchors |
| Infrastructure.Grids / Pricing / Capacity | Carbon, PUE, cloud $ anchors |
| Systems.Reliability / Nodes / Racks / Fabrics / Clusters | MTTF, server/rack profiles, fabrics, fleet tiers |
| Ops.Monitoring / TrainingRunOverheads | PSI thresholds, KS coefficient, goodput-loss profiles |
| Scenarios.* | Executable workload + system + constraint bundles |
| ReferenceStats.* | Non-executable sourced world statistics and case-study anchors |
| engine.calibration | Solver/engine default kwargs (not cited in appendix tables) |
Do not duplicate registry fields across namespaces (chip unit_cost lives on Hardware.Cloud.* only).
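One way to enforce this rule is a check that flags any field name defined under more than one namespace. The sketch below assumes a hypothetical flattened registry (dotted path mapped to a dict of fields); the entry names and numbers are placeholders, and this is not how MLSys·im stores or audits its registries.

```python
from collections import defaultdict

# Hypothetical flattened registry: dotted path -> {field: value}.
# Paths and numbers are placeholders, not MLSys·im data.
registry = {
    "Hardware.Cloud.ChipA": {"peak_flops": 1.0e15, "unit_cost": 30_000.0},
    "Infrastructure.Pricing.ChipA": {"hourly_usd": 2.0, "unit_cost": 30_000.0},
}

def duplicated_fields(reg: dict[str, dict[str, float]]) -> dict[str, list[str]]:
    """Map each field name to the namespaces defining it, when there is more than one."""
    owners: dict[str, set[str]] = defaultdict(set)
    for path, fields in reg.items():
        ns = path.rsplit(".", 1)[0]  # drop the entry name, keep e.g. "Hardware.Cloud"
        for name in fields:
            owners[name].add(ns)
    return {name: sorted(nss) for name, nss in owners.items() if len(nss) > 1}

print(duplicated_fields(registry))
# {'unit_cost': ['Hardware.Cloud', 'Infrastructure.Pricing']}
```

Repeating a field across entries of the same namespace (every chip in Hardware.Cloud having a unit_cost) is not flagged; only a second owning namespace is. A real check would likely also need an allowlist for generic fields, such as a name or description, that legitimately appear everywhere.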
Registry entries
Hardware and model zoo entries attach provenance via metadata.provenance:
```python
import mlsysim

hw = mlsysim.Hardware.Cloud.H100
prov = hw.metadata.provenance
print(prov.kind, prov.ref)
```

Literature scalars
Standalone scalar anchors use the Sourced type and sourced() factory:

```python
import mlsysim

mfu = mlsysim.Literature.Training.MfuHigh
print(mfu, mfu.provenance.ref)
```

Browse the full catalog in the Literature Zoo.
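The idea behind a sourced scalar can be sketched as a float that carries its lineage. This is a hypothetical toy, not MLSys·im's Sourced class: the float subclass, the keyword-only sourced(value, kind=..., ref=...) signature, and the 0.5 placeholder value are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins; the real Sourced type and sourced() may differ.
@dataclass(frozen=True)
class Provenance:
    kind: str
    ref: str

class Sourced(float):
    """Toy float that carries its provenance; arithmetic returns plain floats."""
    provenance: Provenance

    def __new__(cls, value: float, provenance: Provenance) -> "Sourced":
        obj = super().__new__(cls, value)
        obj.provenance = provenance  # float subclasses accept instance attributes
        return obj

def sourced(value: float, *, kind: str, ref: str) -> Sourced:
    """Hypothetical factory wrapping a number with its lineage."""
    return Sourced(value, Provenance(kind, ref))

mfu_high = sourced(0.5, kind="paper", ref="placeholder-ref")  # placeholder value
print(mfu_high, mfu_high.provenance.ref)  # 0.5 placeholder-ref
print(type(mfu_high * 2) is float)        # True: derived values drop lineage
```

Subclassing float keeps the value usable in ordinary arithmetic, and because float operations return plain floats, any derived quantity drops its provenance. That is arguably a sensible default, since a computed value is not itself a cited number, but the real type may make a different choice.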
Audit gates
Contributors should run the provenance audit before opening a PR:
```bash
python -m mlsysim.tools.audit_provenance --scope all --strict
```

| Gate | What it catches |
|---|---|
| Registry metadata | Hardware/model nodes without metadata.provenance |
| Sourced scalars | Literature.*, Ops.*, ReferenceStats.*, Infrastructure.Capacity.*, engine.calibration.* without lineage |
| Appendix lineage | Stale paths or missing provenance in assumption appendices |
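The registry-metadata gate amounts to a tree walk that reports leaves lacking provenance. The sketch below is a hypothetical illustration, not the implementation of mlsysim.tools.audit_provenance: the Node and Metadata classes, the nested-dict registry, and the exit-code convention are assumptions, and provenance is reduced to a string for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Metadata:
    provenance: Optional[str] = None  # stand-in for a Provenance record

@dataclass
class Node:
    metadata: Metadata = field(default_factory=Metadata)

# Hypothetical nested registry: namespaces are dicts, leaves are Nodes.
REGISTRY = {
    "Hardware": {"Cloud": {"ChipA": Node(Metadata("datasheet:vendor-spec")),
                           "ChipB": Node()}},
    "Models": {"Language": {"ModelA": Node(Metadata("paper:model-card"))}},
}

def missing_provenance(tree: dict, prefix: str = "") -> list[str]:
    """Return dotted paths of leaf nodes whose metadata.provenance is unset."""
    missing: list[str] = []
    for name, child in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            missing.extend(missing_provenance(child, path))  # descend into a namespace
        elif child.metadata.provenance is None:
            missing.append(path)
    return missing

def run_gate(tree: dict, strict: bool) -> int:
    """Print each gap; in strict mode any gap yields a nonzero exit code."""
    gaps = missing_provenance(tree)
    for path in gaps:
        print(f"missing provenance: {path}")
    return 1 if strict and gaps else 0

exit_code = run_gate(REGISTRY, strict=True)
# prints "missing provenance: Hardware.Cloud.ChipB"; exit_code is 1
```

A CI job could pass the returned code to sys.exit so that a single missing entry fails the build, which is the intent --strict expresses.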
Downstream lineage checks consume MLSysIM through public registry paths; the package audit stays focused on package registries and sourced scalars.
Full contributor rules: Contributing — Provenance.
Canonical Python paths
Use nested registry paths in Python (Hardware.Cloud.H100, Models.Language.Llama3_8B). The CLI accepts short names (mlsysim eval Llama3_8B H100) via an internal lookup; that convenience does not apply to Python code that uses import mlsysim.
See API Stability.
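A CLI short-name lookup can be sketched as an index from leaf names to full dotted paths. This is a hypothetical sketch, not MLSys·im's internal lookup; the registry below holds placeholder strings instead of real entries. It also suggests one plausible reason to prefer nested paths in code: a short name resolves only while it stays unique, whereas a dotted path is unambiguous by construction.

```python
# Hypothetical nested registry; leaves are placeholder strings, not real entries.
REGISTRY = {
    "Hardware": {"Cloud": {"H100": "hw-h100"}},
    "Models": {"Language": {"Llama3_8B": "model-llama3-8b"}},
}

def build_index(tree: dict, prefix: str = "") -> dict[str, list[str]]:
    """Map each leaf's short name to every dotted path that ends in it."""
    index: dict[str, list[str]] = {}
    for name, child in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            for short, paths in build_index(child, path).items():
                index.setdefault(short, []).extend(paths)
        else:
            index.setdefault(name, []).append(path)
    return index

def resolve(short: str, index: dict[str, list[str]]) -> str:
    """Return the unique full path for a short name; raise on zero or multiple matches."""
    paths = index.get(short, [])
    if len(paths) != 1:
        raise LookupError(f"{short!r} matches {paths or 'nothing'}")
    return paths[0]

index = build_index(REGISTRY)
print(resolve("Llama3_8B", index))  # Models.Language.Llama3_8B
print(resolve("H100", index))       # Hardware.Cloud.H100
```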