Learning-Deep-Learning

Compute-hardware requirements for training vs. inference

Comparison of training, cloud inference, and edge inference

Training improves the model, cloud inference serves users, and edge inference controls the physical world.

| Dimension | Training | Cloud Inference | Edge Inference |
|---|---|---|---|
| Objective | Model quality | Scale + cost | Real-time control |
| Mode | Offline | Online | Closed-loop |
| Batch size | Large | Medium | 1 |
| Latency requirement | None | Low | Hard real-time |
| Scaling | Scale-out | Scale-out | No scale-out |
| Bottleneck | Compute / HBM / comms | KV cache / scheduling / $ | Latency / jitter / power |
| Memory preference | HBM | HBM + KV-cache optimizations | SRAM-first |
| Cost model | $ per training run | $ per token | $ per latency + watt |
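The batch-size row drives the latency row: batching amortizes a forward pass over many requests, which raises throughput (what training and cloud inference optimize) but also raises per-request latency (what edge inference cannot tolerate). A back-of-envelope sketch, with purely illustrative timing numbers:

```python
# Illustrative only: how batch size trades throughput against latency.
# STEP_TIME_MS and BATCH_OVERHEAD_MS are hypothetical, not measurements.

STEP_TIME_MS = 20.0      # assumed time for one forward pass at batch size 1
BATCH_OVERHEAD_MS = 2.0  # assumed extra step time per additional request

def throughput_and_latency(batch_size: int) -> tuple[float, float]:
    """Return (requests/sec, per-request latency in ms) for a batch size."""
    step_ms = STEP_TIME_MS + BATCH_OVERHEAD_MS * (batch_size - 1)
    throughput = batch_size / (step_ms / 1000.0)  # requests per second
    return throughput, step_ms

for b in (1, 8, 64):
    tput, lat = throughput_and_latency(b)
    print(f"batch={b:3d}  throughput={tput:8.1f} req/s  latency={lat:6.1f} ms")
```

Under these assumptions, batch 64 serves far more requests per second than batch 1 but every request waits for the whole batch, which is why edge inference pins the batch size at 1.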

Groq

Groq was acquired by Nvidia in 12/2025 for 20 billion USD.

GPUs win throughput economics; Groq wins latency economics.

Groq’s LPU (language processing unit)

DRAM (HBM, DDR) vs SRAM

| Property | SRAM | DRAM |
|---|---|---|
| Structure | 6-transistor (6T) latch | Charge on a capacitor (1T1C) |
| Latency | Very low (~1 ns) | Higher (~50–100 ns) |
| Bandwidth | Very high (on-chip) | Medium (DDR) to high (HBM) |
| Needs refresh? | No | Yes |
| Density | Low | High |
| Power | Higher (static leakage) | Lower |
| Cost | Higher | Lower |
| Location | On-chip (cache) | Off-chip (DDR/LPDDR/HBM) |
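The bandwidth row is why an SRAM-first design like Groq's LPU targets low-latency decode: autoregressive decoding at batch 1 is memory-bandwidth-bound, so per-token time is roughly the bytes of weights read divided by memory bandwidth. A rough sketch with illustrative, ballpark bandwidth figures:

```python
# Rough lower bound on per-token decode time for a bandwidth-bound model.
# Model size and bandwidth figures are illustrative assumptions.

PARAMS = 7e9          # hypothetical 7B-parameter model
BYTES_PER_PARAM = 2   # FP16/BF16 weights

def ms_per_token(bandwidth_gb_s: float) -> float:
    """Per-token time in ms if every weight is read once per token."""
    bytes_read = PARAMS * BYTES_PER_PARAM
    return bytes_read / (bandwidth_gb_s * 1e9) * 1e3

print(f"HBM  @  3,000 GB/s: {ms_per_token(3_000):.2f} ms/token")
print(f"SRAM @ 80,000 GB/s: {ms_per_token(80_000):.2f} ms/token")
```

Under these assumed numbers, on-chip SRAM cuts the bandwidth-bound floor on per-token latency by more than an order of magnitude, at the cost of far less capacity per chip.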

Types of DRAM

```
Memory
 └── Volatile
      ├── SRAM (on-chip cache)
      └── DRAM
            ├── DDR / LPDDR
            └── HBM
```
| Type of DRAM | Power | Bandwidth | Capacity | Cost | Use Case |
|---|---|---|---|---|---|
| DDR | Med | Med | High | Low | PCs / servers |
| LPDDR | Low | Med | Med | Med | Mobile / edge |
| HBM | Med | High | Med | High | AI / GPU / HPC |
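HBM's bandwidth edge over DDR comes from bus width rather than per-pin speed: a DDR5 channel is 64 bits wide, while a single HBM3 stack exposes a 1024-bit interface. A quick sketch (per-pin rates are ballpark values for illustration):

```python
# Peak bandwidth = bus width (bits) x per-pin data rate (Gb/s) / 8 bits per byte.
# Per-pin rates below are ballpark figures, not a specific part's datasheet.

def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and pin rate."""
    return bus_width_bits * gbps_per_pin / 8

ddr5 = bandwidth_gb_s(64, 6.4)     # one DDR5-6400 channel, 64-bit bus
hbm3 = bandwidth_gb_s(1024, 6.4)   # one HBM3 stack, 1024-bit bus
print(f"DDR5 channel: {ddr5:6.1f} GB/s")  # ~51 GB/s
print(f"HBM3 stack  : {hbm3:6.1f} GB/s")  # ~819 GB/s
```

Same assumed pin rate, 16x the bus width, 16x the bandwidth; the wide bus requires stacking dies on an interposer next to the processor, which is what drives HBM's higher cost in the table above.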

DDR vs HBM