core.solver.CompressionModel

core.solver.CompressionModel()

Analyzes model compression trade-offs (Accuracy vs. Efficiency).

This solver models the ‘Compression Tax’ — the accuracy degradation that occurs when reducing model size via quantization or pruning, balanced against the gains in memory footprint and inference latency.

Literature Sources:

1. Han et al. (2015), “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.”
2. Gholami et al. (2021), “A Survey of Quantization Methods for Efficient Neural Network Inference.”
3. Blalock et al. (2020), “What is the State of Neural Network Pruning?”

Methods

| Name | Description |
|------|-------------|
| solve | Solves for compression gains and estimated accuracy impact. |

solve

core.solver.CompressionModel.solve(
    model,
    hardware,
    method='quantization',
    target_bitwidth=8,
    sparsity=0.0,
)

Solves for compression gains and estimated accuracy impact.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | Workload | The model to be compressed. | required |
| hardware | HardwareNode | The target execution hardware. | required |
| method | str | The compression method: `'quantization'`, `'pruning'`, or `'distillation'`. | `'quantization'` |
| target_bitwidth | int | Target numerical precision in bits (e.g., 8 for INT8, 4 for INT4). | `8` |
| sparsity | float | Target sparsity ratio (0.0 to 1.0) for pruning. | `0.0` |

Returns

| Name | Type | Description |
|------|------|-------------|
| | Dict[str, Any] | Compression metrics including memory savings, latency speedup, and estimated accuracy delta. |
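As a rough illustration of the trade-offs this solver models, the sketch below estimates memory savings, a latency-speedup bound, and an accuracy delta from `target_bitwidth` and `sparsity`. It assumes an FP32 baseline and memory-bound inference; the penalty heuristics and constants are illustrative assumptions, not the library’s actual formulas.

```python
from typing import Any, Dict


def estimate_compression(method: str = "quantization",
                         target_bitwidth: int = 8,
                         sparsity: float = 0.0,
                         baseline_bits: int = 32) -> Dict[str, Any]:
    """Back-of-envelope compression estimate (illustrative, not the solver's code)."""
    if method == "quantization":
        # Memory shrinks with the bitwidth ratio, e.g. FP32 -> INT8 is 4x.
        memory_savings = baseline_bits / target_bitwidth
        # Illustrative heuristic: accuracy loss grows as precision drops below 8 bits.
        accuracy_delta = -0.1 * max(0, 8 - target_bitwidth)
    elif method == "pruning":
        if not 0.0 <= sparsity < 1.0:
            raise ValueError("sparsity must be in [0.0, 1.0)")
        # Removing a fraction `sparsity` of weights shrinks memory by 1 / (1 - s).
        memory_savings = 1.0 / (1.0 - sparsity)
        # Illustrative penalty curve: mild at low sparsity, steep near 1.0.
        accuracy_delta = -2.0 * sparsity ** 2
    else:
        raise ValueError(f"unsupported method: {method}")
    # For memory-bound inference, speedup roughly tracks the memory reduction.
    latency_speedup = memory_savings
    return {"memory_savings": memory_savings,
            "latency_speedup": latency_speedup,
            "accuracy_delta": accuracy_delta}
```

For example, INT8 quantization of an FP32 model yields a 4x memory-savings estimate, while pruning at 50% sparsity yields 2x; the actual `solve` method additionally accounts for the `model` and `hardware` arguments.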