TechWatch

AI training vs. inferencing vs. memory bandwidth

Getting closer to very fast memory is the key.

Jon Peddie
AI

AI processors are categorized into two types: training and inference. Inference can occur in the cloud or on local client devices such as phones and robots, and specialized parts like AMD's Versal and Google's TPU are often preferred for it. Large language model workloads split the same way into a compute-intensive phase and a memory-bound one: in serving, the prefill stage processes the entire prompt in parallel and, like training, is compute-bound, while the decode stage generates one token at a time and is limited by memory bandwidth. Nvidia's Vera Rubin chip addresses this divide, and its partnership with Groq strengthens Nvidia's position in the market, potentially extending its CUDA dominance.
...
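
The compute-bound/memory-bound divide described above can be made concrete with a simple roofline-style estimate. The following sketch is a back-of-the-envelope illustration only; the model size, precision, and accelerator figures are round-number assumptions, not data from the article:

# Back-of-the-envelope roofline arithmetic: why LLM decode is limited by
# memory bandwidth while prefill (like training) is limited by compute.
# All figures below are illustrative assumptions, not numbers from the article.

PARAMS = 70e9          # assumed model size: 70B parameters
BYTES_PER_PARAM = 2    # FP16/BF16 weights
FLOPS_PER_PARAM = 2    # roughly one multiply-add per parameter per token

# Assumed accelerator, loosely H100-class, for illustration only
PEAK_FLOPS = 1.0e15    # ~1 PFLOP/s dense FP16
PEAK_BW = 3.35e12      # ~3.35 TB/s HBM bandwidth
RIDGE = PEAK_FLOPS / PEAK_BW   # FLOPs per byte needed to stay compute-bound

def arithmetic_intensity(tokens_per_weight_read: int) -> float:
    """FLOPs per byte of weight traffic when this many tokens share
    a single pass over the weights (batch size x prompt chunk)."""
    flops = FLOPS_PER_PARAM * PARAMS * tokens_per_weight_read
    bytes_moved = BYTES_PER_PARAM * PARAMS  # weights streamed from HBM once
    return flops / bytes_moved

for label, tokens in [("decode, batch=1", 1),
                      ("prefill, 2,048-token prompt", 2048)]:
    ai = arithmetic_intensity(tokens)
    verdict = "compute-bound" if ai >= RIDGE else "memory-bandwidth-bound"
    print(f"{label:30s} ~{ai:6.0f} FLOP/byte vs. ridge ~{RIDGE:.0f} -> {verdict}")

At batch size 1, every generated token forces a full pass over the weights in HBM, leaving decode roughly two orders of magnitude below the machine's ridge point; that gap is why getting compute closer to very fast memory matters so much for inference.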
