solvers.BatchingOptimizer
solvers.BatchingOptimizer()
Finds the maximum batch size that satisfies a P99 latency SLA.
Searches the continuous batching design space using an M/M/c queueing model to find the optimal balance between throughput and tail latency.
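For readers unfamiliar with the M/M/c model, the sketch below shows how a P99 queueing delay can be computed from the standard Erlang C formula. It is a minimal illustration, not the optimizer's actual implementation: the function names `erlang_c` and `wait_quantile` are hypothetical, and how the library maps batch size and replicas onto the service rate and server count is an assumption left out of this sketch.

```python
import math


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving request has to queue in an M/M/c system.

    Hypothetical helper for illustration; not part of the documented API.
    """
    offered_load = arrival_rate / service_rate      # a = lambda / mu
    utilization = offered_load / servers            # rho = a / c
    if utilization >= 1.0:
        return 1.0                                  # unstable: every request waits
    term = 1.0                                      # a^k / k!, starting at k = 0
    partial_sum = 0.0
    for k in range(servers):
        partial_sum += term
        term *= offered_load / (k + 1)
    # After the loop, term equals a^c / c!.
    tail = term / (1.0 - utilization)
    return tail / (partial_sum + tail)


def wait_quantile(arrival_rate: float, service_rate: float, servers: int,
                  q: float = 0.99) -> float:
    """q-quantile of the queueing delay, in the time unit of the rates."""
    if arrival_rate >= servers * service_rate:
        return math.inf                             # queue grows without bound
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    if p_wait <= 1.0 - q:
        return 0.0                                  # fewer than (1 - q) of requests wait
    # P(W > t) = p_wait * exp(-(c * mu - lambda) * t), solved for t.
    return math.log(p_wait / (1.0 - q)) / (servers * service_rate - arrival_rate)
```

With one server the Erlang C probability reduces to the utilization, which is a quick sanity check on the formula.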
Methods
| Name | Description |
|---|---|
| solve | Determines the maximum batch size that satisfies a P99 tail latency SLA. |
solve
solvers.BatchingOptimizer.solve(
model,
hardware,
seq_len,
sla_latency_ms,
arrival_rate_qps,
num_replicas=1,
precision='fp16',
efficiency=0.5,
max_search_batch=256,
)
Determines the maximum batch size that satisfies a P99 tail latency SLA.
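The role of `sla_latency_ms` and `max_search_batch` can be pictured as a bounded search over candidate batch sizes. The sketch below is an assumption about the general shape of such a search, not the library's code: `max_feasible_batch` and `toy_p99_ms` are hypothetical names, and the toy latency model is made up purely to exercise the loop.

```python
from typing import Callable, Optional


def max_feasible_batch(
    p99_latency_ms: Callable[[int], float],
    sla_latency_ms: float,
    max_search_batch: int = 256,
) -> Optional[int]:
    """Return the largest batch size in [1, max_search_batch] whose
    predicted P99 latency meets the SLA, or None if none qualifies.

    Hypothetical helper for illustration; not part of the documented API.
    """
    best = None
    for batch in range(1, max_search_batch + 1):
        if p99_latency_ms(batch) <= sla_latency_ms:
            best = batch                 # keep the largest feasible size seen so far
    return best


def toy_p99_ms(batch: int) -> float:
    """Made-up latency model: fixed overhead plus a per-request cost."""
    return 5.0 + 0.4 * batch


largest = max_feasible_batch(toy_p99_ms, sla_latency_ms=50.0)
```

A full scan is used instead of a binary search because P99 latency need not be monotonic in batch size: a larger batch lengthens each step but can also shorten the queue.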