Path tracing has traditionally been too slow for real-time applications, which is a challenge for creators who need immediate visual feedback. In film production, rendering a single frame can take hours, and a full movie requires extensive computational resources: a thousand render nodes with 64,000 CPU cores might take 500 hours for a final render, and films go through many lower-quality iterations before that. Architecture and design professionals similarly need real-time path tracing for accurate client collaboration, and while partial ray tracing is effective, applications such as environmental impact studies demand the highest precision in lighting and materials for accurate design representation. Bolt has a new GPU design that cuts this down to real time.
What do we think? Bolt promises to create a major inflection point in the traditional GPU, ray-tracing, and path-tracing market with a revolutionary new, scalable processor design, one whose road map glides into supercomputer territory and employs elements of RISC-V. The company has strong backing and seems to understand all the nuances of building a product that can scale up and scale out. It is also developing a full ecosystem around its first product, the Zeus rendering AIB. It will probably take a few years before the studios and renderfarms adopt Bolt and build out new farms around it, but those farms will be smaller, use less power, and be faster.

Bolt Graphics’ new, scalable processor design
We’ve reported on Bolt Graphics a few times since it was formally founded in 2020. The company has been in a general stealth-like mode, which is typical for a start-up during its development period. Its promises and teasers, however, have been enticing and left us and others thinking, “Sounds great, if they can do it.” Turns out, they could and did. Prepare to be amazed.
Nvidia CEO Jensen Huang has said, “If you want to do extraordinary things, it shouldn’t be easy.” Real-time path tracing is not easy.
Historically, path tracing has been limited to applications that don’t require real-time rendering. However, this isn’t ideal for creators, who need to work in a view that accurately shows them the output of their work immediately.
In the film industry, rendering a single frame can take around four hours using 64 CPU cores. This estimate can vary greatly, with some scenes taking up to 50 hours to render. A typical film consists of 129,600 frames, requiring significant rendering resources.
To put this into perspective, 1,000 render nodes with 64,000 CPU cores would take around 500 hours to render a film. However, this is only for the final render, and films often go through hundreds of iterations, with non-final iterations rendered at lower quality. This is not ideal, as creators need to work with accurate and high-quality representations of their work.
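As a rough back-of-the-envelope check of those figures (using the article’s round numbers rather than measured data), the arithmetic works out like this:

```python
# Back-of-the-envelope check of the render-farm figures above,
# using the round estimates quoted in the text, not measured data.

frames = 90 * 60 * 24            # a 90-minute film at 24 fps = 129,600 frames
node_hours_per_frame = 4         # ~4 hours per frame on one 64-core node
render_nodes = 1_000             # 1,000 nodes x 64 cores = 64,000 CPU cores

total_node_hours = frames * node_hours_per_frame      # 518,400 node-hours
wall_clock_hours = total_node_hours / render_nodes    # ~518 hours, i.e., ~500

print(f"{frames:,} frames -> {total_node_hours:,} node-hours "
      f"-> ~{wall_clock_hours:.0f} hours on {render_nodes:,} nodes")
```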
“Attempting to render even a fraction of one city block in Big Hero 6 in an existing ray-traced renderer exceeded the 64GB physical memory limit of our systems at the time, and we estimated our need to be perhaps an order of magnitude greater,” states a Walt Disney Animation Studios research article published by ACM.
Another segment that needs highly accurate and fast rendering is digital twins. In architecture and interior design, real-time path tracing is essential for collaborating with clients. Designers need to be able to walk through projects with clients, make immediate changes, and see the updated results in real time.
Although current partial ray-tracing technologies do a very good job, clients and firms require the most precise lighting and material information to show their design decisions—this is especially true in the case of environmental impact studies.

Therefore, visualizing and interacting with designs in real time is crucial for making informed decisions.
In the area of HPC, engineers and researchers depend on accurate simulations to inform their research, design, and development of components and products.
Real-time path tracing is also transforming product photography and advertising, allowing companies to create realistic digital product representations before manufacturing. This enables immersive product exploration, accurate customization, and realistic and appealing product displays.
Pre-rendered game cutscenes and cinematic trailers have been used to show what a game or movie will look like before it is released. Because such scenes need to be particularly polished, game studios have typically opted to pre-render them rather than using real-time rendering.
Bolt approached these opportunities and challenges by redesigning the GPU. They call their masterpiece Zeus.

First, the AIB will have an air-cooled shroud; then, for higher-capacity, faster AIBs, liquid-cooled versions, such as LN2 or SFF designs, will be available.
The chip is a GPU-CPU hybrid. It has TMUs, a shading pipeline, and a display driver, and it is also good at vector compute workloads. The company isn’t disclosing process node, fab, or clock at this time. However, the cores are FP64, each splittable into 2× FP32 or 4× FP16 operations.
The AIBs come with 32GB, 64GB, or 128GB of on-board memory. In addition, two or four SODIMM slots (laptop memory sticks) add bandwidth and capacity. So the smaller card (the 1c26-032) has two slots for up to 128GB of expandable memory on top of its 32GB built in.
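A minimal sketch of how those capacities add up, assuming 64GB per SODIMM (inferred from the two-slot, 128GB figure for the 1c26-032); the four-slot configurations are our extrapolation, not Bolt-confirmed numbers:

```python
# Rough tally of the Zeus memory configurations described above.
# The 64GB-per-SODIMM figure is inferred from "two slots for up to 128GB";
# the four-slot configurations below are our own extrapolation.

def total_memory_gb(onboard_gb: int, sodimm_slots: int, gb_per_sodimm: int = 64) -> int:
    """On-board memory plus fully populated SODIMM slots."""
    return onboard_gb + sodimm_slots * gb_per_sodimm

print(total_memory_gb(32, 2))    # 1c26-032: 32GB + 2 x 64GB = 160GB max
print(total_memory_gb(64, 4))    # hypothetical four-slot card: 320GB max
print(total_memory_gb(128, 4))   # hypothetical four-slot card: 384GB max
```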
Bolt is proud of its memory configuration and thinks it will upend the industry. CEO and Founder Darwesh Singh points out, “The 6000 Quadro Blackwell is only going to be 64GB at almost $10K.”
The company is not announcing pricing yet but says it will be priced competitively with Nvidia.
But, can Bolt live up to the promise?
Based on their benchmarks, we’d have to say yes. The following chart shows a comparison between Zeus 4c (the data center version) and the Nvidia RTX 5090 in a real path-tracing workload. Adds Singh, “Scaling a renderer to 280 GPUs is quite painful, so we found the lowest resolution that hits 100 SPP per 20 bounces.”

For real-time path-tracing applications, the company says the ray-per-pixel budget improves dramatically.

The budget includes multiple bounces. For 4K at 120 fps path tracing, which Bolt says is not achievable on other GPUs, the Zeus 2c delivers 8 SPP with 5 bounces, plus denoising.
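To put that budget in concrete terms, here is our own rough arithmetic for the quoted 4K, 120 fps, 8 SPP, five-bounce figure (an illustration, not a Bolt-published number):

```python
# Rough ray-budget arithmetic for the quoted 4K 120 fps figure
# (8 samples per pixel, up to 5 bounces, before denoising).

width, height = 3840, 2160   # 4K UHD
fps = 120
spp = 8                      # samples per pixel (Zeus 2c figure)
bounces = 5

paths_per_second = width * height * spp * fps
# Each path may extend through up to `bounces` segments
ray_segments_per_second = paths_per_second * bounces

print(f"~{paths_per_second / 1e9:.1f} G paths/s, "
      f"up to ~{ray_segments_per_second / 1e9:.1f} G ray segments/s")
```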
Bolt has also developed a path-tracing application that exploits the hardware’s unique capabilities, and they call it Glowstick.

Bolt says Glowstick, their in-house renderer, matches the quality of production renderers (specifically, when compared against Blender’s Cycles, Autodesk’s Arnold, Nvidia’s Iray, Nvidia’s RTX renderer in Omniverse, and Chaos’ V-Ray).
Glowstick has some important and interesting features:
• Statistically accurate sampling and filtering
• Texture caching and tiled access
• Progressive rendering
• Unbiased Monte Carlo integration of ray samples
• Path-traced global illumination, occlusion, visibility, and ray traversal
• Physically accurate reflectance, refraction, transmission, emissivity, and caustics
• Energy conserving and preserving illumination models
• Physical area lights and emitters, including HDRI sources
• Camera depth of field for physically accurate focus effect
It is compatible with OpenUSD:
• Direct interchange with content creation apps
• Fully composable scene hierarchy, cameras, lighting
• Powerful workflows for animated sequences and collaborative editing
• Programmable asset pipeline
• Flexible geometric meshing extensible to SubDs, NURBS, and volumetrics
And it offers hardware texture mapping:
• Supports OpenImageIO standard
• Cached image buffers
• Tiled and mipmapped textures
• Procedural textures from Open Shading Language
• Direct support for USD Imaging
• OpenColorIO for color management
• Ptex compatibility
The company is also announcing a scanned texture library that it says is larger than Nvidia’s and AMD’s combined—5,000 scanned textures at up to 16K resolution. Bolt has secured licenses for those textures and claims they are not available in any other library.
Glowstick has a nice set of features and is free.
Glowstick has a programmable shading pipeline that integrates the MaterialX standard, developed by the Academy Software Foundation (ASWF), to facilitate seamless material exchange and look development within USD (Universal Scene Description) scenes.
MaterialX enables the definition of USD materials by connecting shaders and compute nodes. Glowstick applies these materials, including UsdPreviewSurface, UsdOpenPBRSurface, and UsdUVTexture, to scene elements.
To implement physically based rendering (PBR), Glowstick utilizes Open Shading Language (OSL) to create BSDFs (Bidirectional Scattering Distribution Functions). OSL shaders combine programmable BSDFs, procedural textures, and imaging operations.
Glowstick assembles these OSL shaders into a MaterialX graph, which is then parsed to generate the final OSL shaders. Those shaders are compiled to LLVM IR (intermediate representation), allowing Glowstick to execute them efficiently.
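That flow can be pictured with a toy sketch; the class and function below are hypothetical placeholders for illustration, not Bolt’s or MaterialX’s actual API:

```python
# Toy illustration of the staged pipeline described above:
# MaterialX node -> generated OSL-like source -> (in Glowstick) LLVM IR.
# All names and the emitted source are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class MaterialNode:
    name: str      # e.g., "UsdPreviewSurface"
    params: dict   # e.g., {"diffuseColor": (0.8, 0.1, 0.1)}

def emit_osl(node: MaterialNode) -> str:
    """Generate placeholder OSL-like source text for one material node."""
    args = ", ".join(f"{k} = {v}" for k, v in node.params.items())
    return f"shader {node.name}({args}) {{ /* BSDF closures go here */ }}"

# In the flow described above, this generated source would then be compiled
# to LLVM IR and executed by the renderer; that step is omitted here.
print(emit_osl(MaterialNode("UsdPreviewSurface", {"diffuseColor": (0.8, 0.1, 0.1)})))
```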
The API supports hardware execution and software emulation on multiple platforms and architectures.
Development tooling is available on Linux, and the company says it supports x86, RISC-V, and Arm; Windows is coming soon. Glowstick can also be run as a networked service.
It works with renderfarm management tools (e.g., AWS Thinkbox’s Deadline) and job description formats (OpenJD), and supports distributed rendering of animations. It also offers super-resolution outputs by parallelizing region rendering, and it integrates with DCC apps through Hydra render delegates.
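As a minimal sketch of what region parallelization for super-resolution output can look like (our own illustration, not Glowstick’s actual interface):

```python
# Minimal sketch of region (tile) parallelization for super-resolution
# output. This is our own illustration, not Glowstick's actual interface.

from itertools import product

def make_tiles(width: int, height: int, tile: int = 512):
    """Split a frame into (x, y, w, h) regions that can be rendered on
    separate nodes and stitched back together afterward."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y, x in product(range(0, height, tile), range(0, width, tile))
    ]

# An 8K frame split into 512x512 regions -> 15 x 9 = 135 independent jobs
tiles = make_tiles(7680, 4320)
print(len(tiles), tiles[0], tiles[-1])
```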
The company says it is partnering with tile, textile, fabric, and other consumer and commercial vendors to digitize textures at 4K, 8K, 16K, and higher resolutions. Bolt says the textures will be suitable for commercial use with the appropriate license. The initial library exceeds 5,000 textures, and the company anticipates growing it to hundreds of thousands of textures within about four years.
The company also sees the simulation market benefiting from a fast, highly accurate rendering capability.
Simulations are used everywhere, no exaggeration: designing, engineering, testing, and optimizing parts are all done on powerful computers.
• Mechanical design of buildings, oil rigs, ships, etc.
• HVAC computer-aided design and efficiency simulations required for LEED certifications
• Electromagnetic interference of CE/UL-certified devices (smartphones, pacemakers, antennae)
• Mechanical, EMI, thermal analysis of EV motors
• Airflow simulation for cars, planes, drones
• Protein folding and molecular dynamics simulations for drugs/pharma
• Radar cross-section for stealth aircraft, ships, submarines
• Petrochemical simulations for plastics and engineered materials
• Designing and optimizing photonic crystals, lenses, waveguides, lasers
• Photovoltaic cells, nuclear fusion and fission, geothermal, hydrogen power generation
Earth-scale problems require powerful computers running a variety of simulations such as weather prediction (short- and medium-term) and climate analysis (long-term).
Bolt says their Zeus will unlock simulations that were impossible before. It will enable larger simulation spaces, finer meshes, and more complete simulations, going from simulating a fraction of a photonics chip, lens, or PCB to the entire thing.
Another area where ray tracing and path tracing are used is EM simulation, one of the key industry HPC workloads Bolt has optimized Zeus for. EM simulation is key in designing radar detection/absorption systems, silicon photonics, CT scanners, and, of course, managing EMI in PCBs and electronics. Historically, EM simulation has been done on CPUs and requires a lot of compute. Bolt claims its EM simulation performance is 300x higher than legacy GPUs such as the B200.
Looking forward, Bolt sees a logical expansion of Zeus from an AIB into full systems.

A scalar RISC-V CPU core (RVA23) offers high single-thread performance alongside vector cores with FP64 ALUs. The vector units implement RVV 1.0 with slight modifications, and Bolt will add extensions for its other accelerators.
High-performance accelerators are often orders of magnitude faster than programmable cores. Bolt says its accelerator blocks are domain-specific with varying levels of abstraction: math functions, kernel modules, and application accelerators. The company also adds that its Zeus design dedicates less area to FP ALUs and more area to high-performance accelerators.
The Zeus architecture is designed to scale into next-generation hardware.

Bolt has an early access program (EAP). It offers pre-silicon porting and benchmarking access with priority support, plus cloud credits for test-driving Zeus and the Bolt software ecosystem. The company also offers to co-design with EAP customers.
There will also be developer kits to help customers migrate smoothly from the EAP and to qualify and validate PCIe cards, servers, firmware, drivers, and software.
From there, customers should be able to migrate smoothly from dev kits to production silicon. PCIe boards will be available through Bolt’s retail channels, and 2U servers will be available through server partners.