solvers.BatchingOptimizer

solvers.BatchingOptimizer()

Finds the maximum batch size that satisfies a P99 latency SLA.

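Searches the continuous batching design space, using an M/M/c queueing model to estimate tail latency, and selects the largest batch size whose predicted P99 latency stays within the SLA. Larger batches generally raise throughput but also raise per-request latency, so the largest SLA-compliant batch size marks the balance point between throughput and tail latency.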
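The search can be illustrated with a minimal sketch. It is not the solver's actual implementation and rests on several assumptions. Each replica runs `b` concurrent batch slots, and each slot is treated as one server of an M/M/c queue, so c = num_replicas × b. A hypothetical linear per-request service-time model `base_ms + per_item_ms * b` stands in for whatever model/hardware latency estimate `solve` really uses. P99 latency is computed for the M/M/c response time (Erlang C waiting time plus exponential service) by bisection. The helper names `erlang_c`, `sojourn_tail`, `p99_latency`, and `max_batch_under_sla` are hypothetical.

```python
import math


def erlang_c(c: int, offered_load: float) -> float:
    """Erlang C: probability that an arrival must wait in an M/M/c queue.

    offered_load is a = lambda / mu and must satisfy a < c for stability.
    """
    # Erlang B via its numerically stable recursion, then convert B to C.
    b = 1.0
    for k in range(1, c + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / c
    return b / (1.0 - rho + rho * b)


def sojourn_tail(t: float, mu: float, theta: float, wait_prob: float) -> float:
    """P(response time > t) in M/M/c.

    mu: per-server service rate; theta = c*mu - lam: rate of the waiting time
    given that a wait occurs; wait_prob: Erlang C probability of waiting.
    """
    if abs(theta - mu) < 1e-12:
        # Sum of two Exp(mu) variables is Erlang-2.
        both = (1.0 + mu * t) * math.exp(-mu * t)
    else:
        # Survival function of Exp(theta) + Exp(mu) (hypoexponential).
        both = (theta * math.exp(-mu * t) - mu * math.exp(-theta * t)) / (theta - mu)
    # No wait: response time is just service; wait: waiting time plus service.
    return (1.0 - wait_prob) * math.exp(-mu * t) + wait_prob * both


def p99_latency(c: int, lam: float, mu: float, q: float = 0.99) -> float:
    """Smallest t with P(response time > t) <= 1 - q, found by bisection."""
    wait_prob = erlang_c(c, lam / mu)
    theta = c * mu - lam
    target = 1.0 - q
    lo, hi = 0.0, 1.0 / mu
    while sojourn_tail(hi, mu, theta, wait_prob) > target:
        hi *= 2.0  # grow the bracket until it contains the quantile
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sojourn_tail(mid, mu, theta, wait_prob) > target:
            lo = mid
        else:
            hi = mid
    return hi


def max_batch_under_sla(sla_ms, arrival_rate_qps, num_replicas=1,
                        base_ms=20.0, per_item_ms=0.5, max_search_batch=256):
    """Largest batch size whose modeled P99 latency meets the SLA, or None."""
    best = None
    for b in range(1, max_search_batch + 1):
        # Hypothetical latency model: per-request service time grows with batch size.
        service_s = (base_ms + per_item_ms * b) / 1000.0
        mu = 1.0 / service_s      # per-slot service rate (requests/s)
        c = num_replicas * b      # one M/M/c server per batch slot (assumption)
        if arrival_rate_qps >= c * mu:
            continue              # unstable: the queue grows without bound
        if p99_latency(c, arrival_rate_qps, mu) * 1000.0 <= sla_ms:
            best = b
    return best
```

For example, `max_batch_under_sla(200.0, 0.001)` returns 46 under these assumed defaults. At near-zero load there is essentially no queueing, so the P99 is about ln(100) ≈ 4.6 times the mean service time: 4.6 × 43 ms fits under 200 ms, while 4.6 × 43.5 ms does not. The sketch scans every batch size instead of binary-searching because feasibility need not be monotone in batch size. Under heavy load, small batches can be unstable or queue-bound while large ones are service-bound.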

Methods

Name Description
solve Determines the maximum batch size that satisfies a P99 tail latency SLA.

solve

solvers.BatchingOptimizer.solve(
    model,
    hardware,
    seq_len,
    sla_latency_ms,
    arrival_rate_qps,
    num_replicas=1,
    precision='fp16',
    efficiency=0.5,
    max_search_batch=256,
)

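Determines the maximum batch size, searching up to max_search_batch, whose P99 tail latency at the given arrival_rate_qps across num_replicas replicas stays within sla_latency_ms.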
