Pick a model and hardware. See the decode rate and what's bottlenecking it.
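A decode-rate estimate like this is often built on a simple roofline model. Each decode step must stream the model weights from memory and also perform the arithmetic for every token in the batch. The slower of those two costs sets the rate, and that cost is the bottleneck. The sketch below is a minimal illustration under stated assumptions, not this tool's actual method. It assumes every weight is read once per step and shared across the batch, and it assumes roughly 2 FLOPs per parameter per generated token. It ignores KV-cache reads, attention FLOPs, and real-world efficiency factors. The names `Hardware`, `Model`, and `decode_estimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    mem_bandwidth_gb_s: float  # peak memory bandwidth, GB/s
    peak_tflops: float  # peak dense compute, TFLOP/s


@dataclass
class Model:
    params_billion: float  # total parameters, in billions
    bytes_per_param: float  # e.g. 2.0 for FP16/BF16, about 0.5 for 4-bit


def decode_estimate(model: Model, hw: Hardware, batch: int = 1) -> tuple[float, str]:
    """Return (aggregate tokens/s, bottleneck) under a simplified roofline model.

    Assumption: weights are streamed once per decode step and shared by the
    whole batch; KV-cache traffic and attention FLOPs are ignored.
    """
    params = model.params_billion * 1e9
    # Memory cost: every weight is read once per step.
    bytes_per_step = params * model.bytes_per_param
    # Compute cost: about 2 FLOPs per parameter for each token in the batch.
    flops_per_step = 2.0 * params * batch
    t_memory = bytes_per_step / (hw.mem_bandwidth_gb_s * 1e9)
    t_compute = flops_per_step / (hw.peak_tflops * 1e12)
    # The step takes as long as the slower of the two costs.
    step_time = max(t_memory, t_compute)
    bottleneck = "memory bandwidth" if t_memory >= t_compute else "compute"
    return batch / step_time, bottleneck
```

For example, consider a hypothetical 8B-parameter model in BF16 on hardware with 1000 GB/s and 100 TFLOP/s. `decode_estimate(Model(8, 2.0), Hardware(1000, 100))` gives about 62.5 tokens/s, and the bottleneck is memory bandwidth. At `batch=200`, compute becomes the limit instead. Real measured rates will be lower than these idealized figures.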
Defaults work for most setups. Change these only if you need specific behavior.
Select one device to edit. The full system appears in the topology below.
Real-world throughput measurements and official model-card task scores, collected from vendor, community, and research sources. Throughput rows are used for configuration alignment; task-score rows are for comparison only.
| Model | Quantization / Mode | Runtime / Benchmark | Hardware / Task | Batch / Eval | Seq / Setting | Batch Rate / Score | Single Rate / Score | Source |
|---|---|---|---|---|---|---|---|---|