solvers.EfficiencyModel

solvers.EfficiencyModel()

Models the gap between peak and achieved FLOPS (Wall 3: Software Efficiency).

This model quantifies the software efficiency of a workload: the fraction of peak hardware FLOPS that the software stack actually converts into useful computation. It decomposes Model FLOPs Utilization (MFU) by workload type, accounting for kernel fusion efficiency, SM occupancy, and memory access patterns.
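As a rough illustration of the quantity being modeled, MFU can be read as the ratio of achieved to peak FLOPS. The sketch below is standalone arithmetic, not part of `solvers.EfficiencyModel`; the function name `model_flops_utilization` and the numbers are assumptions chosen only for illustration.

```python
def model_flops_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Return MFU as the ratio of achieved FLOPS to peak hardware FLOPS.

    Hypothetical helper for illustration; not an API of this library.
    """
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    if achieved_flops < 0:
        raise ValueError("achieved_flops must be non-negative")
    return achieved_flops / peak_flops


# Illustrative numbers only (assumed, not measured on any device).
peak = 1000e12      # 1000 TFLOPS peak
achieved = 400e12   # 400 TFLOPS sustained by the software stack
mfu = model_flops_utilization(achieved, peak)
print(f"MFU = {mfu:.2f}")
```

Under this reading, the gap the model describes is `1 - mfu`, the share of peak throughput lost in the software stack.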

Literature Source:

1. Chowdhery et al. (2022), “PaLM: Scaling Language Modeling with Pathways.” (First systematic MFU reporting for large Transformers.)
2. Dao et al. (2022), “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.” (FlashAttention MFU improvement.)
3. NVIDIA (2023), “Hopper Architecture Tuning Guide.” (SM Occupancy model.)

Methods

Name	Description
solve	Estimates achievable MFU and FLOPS for a given workload type.

solve

solvers.EfficiencyModel.solve(
    model,
    hardware,
    workload_type='ffn',
    use_flash_attention=False,
    precision='fp16',
    efficiency=0.5,
)

Estimates achievable MFU and FLOPS for a given workload type.

Parameters

Name	Type	Description	Default
model	Workload	The model architecture to simulate.	required
hardware	HardwareNode	The target hardware node.	required
workload_type	str	The dominant kernel type (‘attention’, ‘ffn’, ‘conv’).	`'ffn'`
use_flash_attention	bool	Whether FlashAttention is enabled (only applies to ‘attention’).	`False`
precision	str	Numerical precision (‘fp16’, ‘fp32’, ‘int8’, ‘int4’).	`'fp16'`
efficiency	float	Base compute efficiency factor (0.0 to 1.0).	`0.5`
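
The documented value ranges can be checked before calling `solve`. The following is a minimal sketch, assuming the parameter table lists every accepted value; `check_solve_args` is a hypothetical helper, not part of the library, and the library itself may accept more values or validate differently.

```python
# Values copied from the parameter table; the library may accept others.
VALID_WORKLOAD_TYPES = ("attention", "ffn", "conv")
VALID_PRECISIONS = ("fp16", "fp32", "int8", "int4")


def check_solve_args(workload_type="ffn", use_flash_attention=False,
                     precision="fp16", efficiency=0.5):
    """Hypothetical helper: validate keyword arguments against the documented ranges."""
    if workload_type not in VALID_WORKLOAD_TYPES:
        raise ValueError(f"workload_type must be one of {VALID_WORKLOAD_TYPES}")
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {VALID_PRECISIONS}")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0.0, 1.0]")
    # use_flash_attention is passed through unchanged; per the table it only
    # applies when workload_type is 'attention'.
    return {
        "workload_type": workload_type,
        "use_flash_attention": bool(use_flash_attention),
        "precision": precision,
        "efficiency": efficiency,
    }


kwargs = check_solve_args(workload_type="attention", use_flash_attention=True)
print(kwargs)
```

The validated keywords could then be forwarded as `solve(model, hardware, **kwargs)`, with `model` and `hardware` constructed as described elsewhere in the library documentation.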

Returns

Name	Type	Description
	Dict[str, Any]	MFU estimate, achievable FLOPS, and overhead breakdown.