Pick a model and hardware. See the decode rate and what's bottlenecking it.
Defaults work for most setups. Change these only if you need specific behavior.
Total prompt + generated tokens per request.Pick a GPU or CPU per device. Use + Add device for multi-GPU setups.
Real-world token rates from vendor, community, and research sources. Use the filters to find a reference close to your configuration.
| Model | Quantization | Framework | Hardware | Batch Size | Sequence Length | Token Rate (Batch) | Token Rate (Single) | Source |
|---|