ML System Bottleneck Analyzer
Model Configuration
Model Preset
Custom
Llama 3 8B
Llama 3 70B
Mistral 7B
DeepSeek V3 (671B)
Large Model (400B+)
Very Large Model (1T+)
Quantization
Q4
INT8
FP16
BF16
FP32
Total Parameters (B)
Batch Size
Sequence Length
Hidden Size
Number of Layers
Number of Heads
Data Type
float32
bfloat16
float16
int8
q4
Parallelism Strategy
Pipeline Parallelism
Tensor Parallelism
Devices
Resource Utilization
System Analysis
(Token rates are approximations.)
Real-world benchmark results are listed below for reference.
Model:
Hardware:
Quantization:
Model
Quantization
Framework
Hardware
Batch Size
Sequence Length
Token Rate (Batch, tok/s)
Token Rate (Single, tok/s)
Source