solvers.EfficiencyModel

solvers.EfficiencyModel()

Models the gap between peak and achieved FLOPS (Wall 3: Software Efficiency).

This model quantifies the software efficiency of a workload — the fraction of peak hardware FLOPS that the software stack actually converts into useful computation. It decomposes Model FLOPs Utilization (MFU) by workload type, accounting for kernel fusion efficiency, SM occupancy, and memory access patterns.
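As a minimal sketch of the ratio described above, MFU can be computed as achieved FLOPS divided by peak hardware FLOPS. The function name `model_flops_utilization` and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not part of this library and are not measurements. The sketch does not attempt the per-workload decomposition (kernel fusion, SM occupancy, memory access patterns), since its exact form is not given here.

```python
def model_flops_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Return the fraction of peak hardware FLOPS that was actually achieved.

    Both arguments must use the same unit (for example FLOP/s).
    """
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    if achieved_flops < 0:
        raise ValueError("achieved_flops must be non-negative")
    return achieved_flops / peak_flops


# Illustrative values only: a device with a 1000 TFLOP/s peak
# that sustains 400 TFLOP/s of useful model computation.
mfu = model_flops_utilization(achieved_flops=400e12, peak_flops=1000e12)
print(f"MFU = {mfu:.2f}")
```

Read the other way round, an MFU estimate multiplied by the peak gives the achievable FLOPS, which is the pair of quantities `solve` is documented to report.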

Literature Source:
1. Chowdhery et al. (2022), “PaLM: Scaling Language Modeling with Pathways.” (First systematic MFU reporting for large Transformers.)
2. Dao et al. (2022), “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness.” (FlashAttention MFU improvement.)
3. NVIDIA (2023), “Hopper Architecture Tuning Guide.” (SM Occupancy model.)

Methods

Name Description
solve Estimates achievable MFU and FLOPS for a given workload type.

solve

solvers.EfficiencyModel.solve(
    model,
    hardware,
    workload_type='ffn',
    use_flash_attention=False,
    precision='fp16',
    efficiency=0.5,
)

Estimates achievable MFU and FLOPS for a given workload type.

Parameters

Name Type Description Default
model Workload The model architecture to simulate. required
hardware HardwareNode The target hardware node. required
workload_type str The dominant kernel type ('attention', 'ffn', 'conv'). 'ffn'
use_flash_attention bool Whether FlashAttention is enabled (only applies to 'attention'). False
precision str Numerical precision ('fp16', 'fp32', 'int8', 'int4'). 'fp16'
efficiency float Base compute efficiency factor (0.0 to 1.0). 0.5

Returns

Name Type Description
Dict[str, Any] MFU estimate, achievable FLOPS, and overhead breakdown.
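One way to use `solve` is to compare an attention workload with and without FlashAttention. The sketch below assumes an already constructed `EfficiencyModel` instance plus `Workload` and `HardwareNode` objects, because their import paths and constructors are not shown on this page. The helper name `compare_attention_paths` and the result labels `"baseline"` and `"flash"` are hypothetical. Only the keyword arguments documented above are passed, and the returned dictionaries are left untouched because their keys are not specified here.

```python
from typing import Any, Dict


def compare_attention_paths(
    solver: Any, model: Any, hardware: Any
) -> Dict[str, Dict[str, Any]]:
    """Call solver.solve() twice for an attention workload.

    solver   -- assumed to be an EfficiencyModel instance
    model    -- assumed to be a Workload
    hardware -- assumed to be a HardwareNode
    """
    results: Dict[str, Dict[str, Any]] = {}
    for use_flash in (False, True):
        label = "flash" if use_flash else "baseline"
        results[label] = solver.solve(
            model,
            hardware,
            workload_type="attention",  # the flag below only applies to attention
            use_flash_attention=use_flash,
            precision="fp16",
            efficiency=0.5,  # documented default, written out for clarity
        )
    return results
```

Keeping every other argument fixed means any difference between the two result dictionaries comes from the `use_flash_attention` flag alone.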