Intel on Intel beats Nvidia on Intel

Surprising results using Blender and oneAPI.

Jon Peddie

In recent weeks, Intel introduced a version of its Embree ray-tracing library tailored for Arc GPUs. Originally used to accelerate ray tracing on multicore CPUs, Embree has been integrated into widely used rendering applications such as V-Ray, MoonRay, and Maxon Cinema 4D, among others. The new release extends Embree, and the complete suite of tools in the oneAPI toolkit, to the Arc GPU lineup. Notably, in a comparison with the Core i9-12900K, the Arc A750 delivers up to 30× faster path tracing when using Embree 4.2.

In Embree 4.2, Intel added support for its discrete GPUs through oneAPI’s SYCL* implementation, and that GPU support has moved from beta to a production-ready state. The release uses SYCL, an open-standard, cross-platform abstraction layer, to enable GPU support and heterogeneous processing. As a result, says Intel, developers can write C++ code optimized for either CPUs or GPUs within the same application codebase.

That allows developers to choose between scalability and performance when tuning their Embree-based applications. The unified codebase, claims the company, streamlines development and reduces the need for ongoing code maintenance. Furthermore, says Intel, rendering experts can achieve real-time rendering acceleration by using hardware-accelerated ray tracing on supported GPUs through Embree.

Prove it.

The latest iteration of Blender, version 3.6 LTS, brings notable enhancements. It supports oneAPI, extends support to multiple Intel GPUs, introduces driver enhancements, and integrates two Intel rendering libraries, Intel Embree and Intel Open Image Denoise, to improve its rendering capabilities.

To showcase those advancements, Embree includes an example path tracer, a single-source renderer that runs on both CPU and GPU architectures. For Intel’s existing CPU-focused customers, Intel offers a benchmark (Figure 1) showing the Intel Arc A750 GPU surpassing the performance of an Intel Core i9-12900K CPU, with significantly faster rendering for a path-tracing workload.

Figure 1. Intel Embree 4.2 delivers significantly faster performance on an Intel Arc A750 GPU than on an Intel i9-12900K CPU for a path-tracing workload. (Source: Intel)

The models used in the benchmark were publicly available: the Austrian Imperial Crown, modeled by Martin Lubich, and other models from the Stanford 3D Scanning Repository.

Figure 2. Intel Embree delivers multiarchitecture performance and productivity on Intel CPUs and GPUs. (Source: Intel)

A benchmark comparing the Arc A750 against Nvidia’s RTX 3060, conducted using the ChameleonRT path tracer with Embree, Nvidia OptiX, and Vulkan, showed the Arc A750 with up to a 30% performance advantage over the RTX 3060 in the open-source path tracer when running on Embree. Notably, the GeForce AIB using Vulkan approaches the performance of the Arc GPU using Embree, but the Arc GPU wins in four out of nine test scenarios.

Figure 3. Embree 4.2 on an Intel Arc A750 GPU vs. Nvidia OptiX and Vulkan on an Nvidia GeForce RTX 3060 GPU. (Source: Intel)

The models used were from the Stanford 3D Scanning Repository, and the renderings were done using ChameleonRT v0.0.10. The public models were ported to SYCL with the same code optimized by Intel Embree 4.2.
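
The two headline figures in this comparison, an "up to 30%" advantage and wins in four of nine scenarios, are both simple summaries of per-scene results. The following is a minimal sketch of that arithmetic. The scene names and throughput values are hypothetical placeholders, not Intel's published numbers, and the function names are invented for illustration.

```python
from typing import Dict, Tuple


def percent_advantage(a: float, b: float) -> float:
    """Return how much faster a is than b, in percent (positive means a leads)."""
    return (a / b - 1.0) * 100.0


def summarize(results: Dict[str, Tuple[float, float]]) -> Tuple[float, int, int]:
    """Return (best percent advantage of the first device, its wins, scene count)."""
    advantages = [percent_advantage(a, b) for a, b in results.values()]
    wins = sum(1 for adv in advantages if adv > 0)
    return max(advantages), wins, len(advantages)


# HYPOTHETICAL throughput per scene as (device A, device B); higher is better.
# These values are placeholders for illustration only, not measured results.
placeholder = {
    "scene_a": (130.0, 100.0),
    "scene_b": (90.0, 100.0),
    "scene_c": (105.0, 100.0),
}

best, wins, total = summarize(placeholder)
print(f"up to {best:.0f}% faster, ahead in {wins} of {total} scenes")
```

An "up to" figure reports only the best scene, so it says nothing about the scenes where the other device leads; the win count fills in that part of the picture. By the same formula, the 30× GPU-over-CPU figure quoted earlier corresponds to a 2,900% advantage, so the two numbers are on very different scales.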

Although the RTX 3060 performed slightly better using Vulkan in other benchmarks, it is anticipated that these differences may diminish with future updates. An important feature of Embree 4.2 is its backward compatibility with prior versions of the API. Blender can now accelerate ray tracing using oneAPI across a range of Intel GPUs by combining the strengths of Embree and Open Image Denoise.

You can see a demo of the test here.

* To unlock full access to the hardware-accelerated ray tracing offered by Intel Arc GPUs or data center GPUs, both CPU code and CUDA code must be migrated to oneAPI’s SYCL implementation (Data Parallel C++ / DPC++). The migration produces a single source codebase that targets multiple architectures and can exploit their performance and efficiency advantages. Where the original code was designed exclusively for CPUs, developers might need to make certain design decisions to optimize performance, depending on the specific renderer design. However, if the code runs primarily on CPUs and there is no intention of using GPU offloading now or in the future, moving to SYCL is not required.

What do we think?
One test does not a winner make; nine do. There are a lot of variables in this: the latest driver, optimizations for the API, the amount and type of memory on the AIB, and library optimizations. Having said all that, Intel knows better than to play any tricks and has been squeaky clean about disclosing its methodology and results. The next test to be examined is an Arc A750 running ray tracing in some of the popular games.