Provenance

Where MLSys·im numbers come from and how to audit them

Every Tier A number in MLSys·im — registry entries and public sourced scalars — carries structured lineage via the Provenance model. Citation managers and prose live outside the package; MLSysIM stores audit metadata only.

Provenance answers “how do we know this number?” It is not a registry category. A number can have provenance whether it lives under Hardware, Systems, Infrastructure, Ops, Scenarios, or Literature. Literature.* is the registry for cited scalar anchors from papers/tables; those entries use the same provenance machinery as every other sourced value.

Note: There is no mlsysim.core.defaults

Constants were reorganized into semantic registries (Hardware, Models, Systems, Infrastructure, Ops, Scenarios, Literature) plus engine.calibration. See the Zoo overview and DATA_MODEL.md.

Where constants live

| Namespace | Examples |
|---|---|
| Hardware.* | Peak FLOPs, HBM bandwidth, TDP (datasheet truth) |
| Hardware.Tech.* | Technology-class latency, op energy, movement energy |
| Literature.* | MFU bands, Chinchilla ratio, communication and batch-size literature anchors |
| Infrastructure.Grids / Pricing / Capacity | Carbon, PUE, cloud $ anchors |
| Systems.Reliability / Nodes / Racks / Fabrics / Clusters | MTTF, server/rack profiles, fabrics, fleet tiers |
| Ops.Monitoring / TrainingRunOverheads | PSI thresholds, KS coefficient, goodput-loss profiles |
| Scenarios.* | Executable workload + system + constraint bundles |
| ReferenceStats.* | Non-executable sourced world statistics and case-study anchors |
| engine.calibration | Solver/engine default kwargs; not cited in appendix tables |

Do not duplicate registry fields across namespaces: each field has exactly one home (for example, chip unit_cost lives on Hardware.Cloud.* only).

Registry entries

Hardware and model zoo entries attach provenance via metadata.provenance:

import mlsysim

hw = mlsysim.Hardware.Cloud.H100
prov = hw.metadata.provenance
print(prov.kind, prov.ref)
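
When checking many entries at once, it can help to read provenance defensively rather than assume every node has it. The helper below is a minimal sketch and not part of MLSysIM: `describe_provenance` is a hypothetical name, and it assumes only the `metadata.provenance` attribute with the `kind` and `ref` fields shown above. The `SimpleNamespace` objects stand in for real registry entries so the snippet runs without the package installed, and their values are placeholders.

```python
from types import SimpleNamespace


def describe_provenance(entry):
    """Return 'kind: ref' for an entry, or None if it has no provenance.

    Hypothetical helper: it relies only on entry.metadata.provenance
    exposing .kind and .ref, as in the example above.
    """
    metadata = getattr(entry, "metadata", None)
    prov = getattr(metadata, "provenance", None)
    if prov is None:
        return None
    return f"{prov.kind}: {prov.ref}"


# Stand-ins for registry entries; the kind/ref values are placeholders.
fake_entry = SimpleNamespace(
    metadata=SimpleNamespace(
        provenance=SimpleNamespace(kind="datasheet", ref="example-ref")
    )
)
bare_entry = SimpleNamespace(metadata=SimpleNamespace())  # no lineage attached

print(describe_provenance(fake_entry))
print(describe_provenance(bare_entry))
```

Passing a real entry such as `mlsysim.Hardware.Cloud.H100` should work the same way, provided it exposes the attributes used above.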

Literature scalars

Standalone scalar anchors use the Sourced type and sourced() factory:

import mlsysim

mfu = mlsysim.Literature.Training.MfuHigh
print(mfu, mfu.provenance.ref)
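
To make the pattern concrete, here is a minimal sketch of how a value-plus-lineage type and its factory could be structured. It is not MLSysIM's implementation: the `Provenance` and `Sourced` classes below are illustrative stand-ins, the `sourced(value, *, kind, ref)` signature is an assumption, and the number and reference are placeholders rather than a real literature anchor. Only the `provenance.ref` access mirrors the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """Illustrative stand-in; only kind and ref appear in the text above."""
    kind: str
    ref: str


@dataclass(frozen=True)
class Sourced:
    """A scalar bundled with its lineage (illustrative, not MLSysIM's class)."""
    value: float
    provenance: Provenance

    def __float__(self):
        return float(self.value)

    def __str__(self):
        return str(self.value)


def sourced(value, *, kind, ref):
    """Hypothetical factory: refuse to build a value without a reference."""
    if not ref:
        raise ValueError("a sourced scalar needs a non-empty ref")
    return Sourced(value=value, provenance=Provenance(kind=kind, ref=ref))


# Placeholder number and reference, not a real literature anchor.
example = sourced(0.5, kind="paper", ref="placeholder-citation-key")
print(example, example.provenance.ref)
```

The point of routing construction through a factory is that a scalar cannot enter the registry without lineage attached, which is the property the audit gates check for.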

Browse the full catalog in the Literature Zoo.

Audit gates

Before opening a PR, contributors should run the provenance audit:

python -m mlsysim.tools.audit_provenance --scope all --strict

| Gate | What it catches |
|---|---|
| Registry metadata | Hardware/model nodes without metadata.provenance |
| Sourced scalars | Literature.*, Ops.*, ReferenceStats.*, Infrastructure.Capacity.*, engine.calibration.* without lineage |
| Appendix lineage | Stale paths or missing provenance in assumption appendices |
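
One way to picture the Registry metadata gate is as a walk over nested namespaces that reports every entry lacking `metadata.provenance`. The sketch below illustrates that idea only; it is not the audit tool's implementation. The function name `find_missing_provenance`, the `ChipA`/`ChipB` entries, and the use of `SimpleNamespace` as a registry are all assumptions made so the snippet is self-contained.

```python
from types import SimpleNamespace


def find_missing_provenance(node, path):
    """Yield dotted paths of entries that lack metadata.provenance.

    Illustrative only: any object with a `metadata` attribute is treated
    as a leaf entry, and any other SimpleNamespace as a nested namespace.
    """
    if hasattr(node, "metadata"):
        if getattr(node.metadata, "provenance", None) is None:
            yield path
        return
    if isinstance(node, SimpleNamespace):
        for name, child in sorted(vars(node).items()):
            yield from find_missing_provenance(child, f"{path}.{name}")


prov = SimpleNamespace(kind="datasheet", ref="example-ref")  # placeholder
registry = SimpleNamespace(
    Cloud=SimpleNamespace(
        ChipA=SimpleNamespace(metadata=SimpleNamespace(provenance=prov)),
        ChipB=SimpleNamespace(metadata=SimpleNamespace()),  # no lineage
    )
)

missing = list(find_missing_provenance(registry, "Hardware"))
print(missing)
```

Reporting the full dotted path keeps each finding actionable, since that path is what a contributor edits to attach the missing lineage.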

Downstream lineage checks consume MLSysIM through public registry paths; the package audit stays focused on package registries and sourced scalars.

Full contributor rules: Contributing — Provenance.

Canonical Python paths

Use nested registry paths in Python (Hardware.Cloud.H100, Models.Language.Llama3_8B). The CLI accepts short names (mlsysim eval Llama3_8B H100) via an internal lookup, but that convenience is not available to Python code that imports mlsysim.
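
The difference between the two styles can be sketched with plain attribute access. The snippet below is an illustration under stated assumptions, not MLSysIM code: `resolve` is a hypothetical helper, and `pkg` is a stand-in namespace whose string values replace real registry entries. It shows that a full nested path resolves by ordinary `getattr` traversal, while a bare short name does not.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(root, dotted_path):
    """Follow a dotted path such as 'Hardware.Cloud.H100' via getattr."""
    return reduce(getattr, dotted_path.split("."), root)


# Stand-in for the package namespace; strings replace real entries.
pkg = SimpleNamespace(
    Hardware=SimpleNamespace(Cloud=SimpleNamespace(H100="h100-entry")),
    Models=SimpleNamespace(Language=SimpleNamespace(Llama3_8B="llama-entry")),
)

print(resolve(pkg, "Hardware.Cloud.H100"))

try:
    resolve(pkg, "H100")  # short name: no such attribute at the top level
except AttributeError as err:
    print("short name failed:", err)
```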

See API Stability.
