ML System Bottleneck Analyzer

Model Configuration

Llama 3 8B @ Q4

Devices

Resource Utilization

System Analysis (Token rates are approximations)
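As a rough illustration of why token rates can only be approximated, the sketch below estimates an upper bound on single-stream decode speed by assuming the model is purely memory-bandwidth bound, so every weight is read once per generated token. The function name, the round 8e9 parameter count, the flat 4 bits per weight, and the 1000 GB/s device bandwidth are illustrative assumptions, not values taken from the analyzer.

```python
def estimate_decode_tokens_per_second(param_count, bits_per_weight, memory_bandwidth_gb_s):
    """Upper bound on decode rate if each token requires reading all weights once.

    Assumption: ignores KV-cache reads, compute time, and kernel overhead,
    so real rates will be lower.
    """
    model_bytes = param_count * bits_per_weight / 8
    bandwidth_bytes_per_s = memory_bandwidth_gb_s * 1e9
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical inputs: ~8e9 parameters at 4 bits each on a 1000 GB/s device.
rate = estimate_decode_tokens_per_second(8e9, 4, 1000.0)
print(f"{rate:.0f} tokens/s (theoretical ceiling)")
```

Real quantization formats usually store slightly more than 4 bits per weight, which lowers this ceiling further.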

System Topology (Connection diagram)

NVLink (300+ GB/s)
PCIe 5.0 (32-64 GB/s)
PCIe 4.0 (8-32 GB/s)
PCIe 3.0/DDR5 (<16 GB/s)
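To give a sense of what these link speeds mean in practice, the sketch below computes the ideal time to move a payload across a link of a given bandwidth. The function name and the example payload (one token's hidden state of 4096 values at 2 bytes each) are assumptions chosen for illustration; link latency and protocol overhead are ignored, so actual transfers will take longer.

```python
def transfer_time_seconds(payload_bytes, link_bandwidth_gb_s):
    """Ideal transfer time: payload size divided by link bandwidth.

    Assumption: no per-transfer latency or protocol overhead is modeled.
    """
    return payload_bytes / (link_bandwidth_gb_s * 1e9)


# Hypothetical payload: a 4096-element hidden state stored as 16-bit values.
payload = 4096 * 2
for name, bandwidth in [("PCIe 5.0 at 32 GB/s", 32.0), ("NVLink at 300 GB/s", 300.0)]:
    t = transfer_time_seconds(payload, bandwidth)
    print(f"{name}: {t * 1e6:.3f} microseconds")
```

For payloads this small, fixed latency rather than bandwidth tends to dominate, which is one reason estimates based on bandwidth alone are approximate.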

Real-world results are listed below for reference.

Model | Quantization | Framework | Hardware | Batch Size | Sequence Length | Token Rate (Batch) | Token Rate (Single) | Source