GPU scheduling—don’t be late

DirectX 12 exposed a tighter CPU–GPU coupling; now it’s been implemented

Jon Peddie
Source: Daily Mail

GPU Scheduling—what is it, who needs it, who wants it?

The May 2020 update of Microsoft’s Windows 10 (version 2004) gave the DirectX API a new optional feature called Hardware-Accelerated GPU Scheduling—a feature designed to reduce latency caused by buffering between the CPU and GPU.

Microsoft says DirectX 12 can now offload most GPU scheduling to a dedicated GPU-based scheduling processor, given the right hardware and drivers. This feature, says the company, was created to prioritize GPU work and ensure a responsive user experience.

This is not a completely new concept or feature; GPU scheduling has been available in one form or another since the Windows Display Driver Model 1.0 (WDDM) was introduced in 2006. Before WDDM, applications could submit jobs to the GPU as they pleased. Jobs went into a global queue and were executed in strict “first to submit, first to execute” order. That rudimentary scheme worked when most GPU applications were full-screen games run one at a time.

The CPU used to do all the work of preparing and submitting commands to the GPU. Doing that one frame at a time was inefficient, so frame buffering was introduced, letting the CPU prepare and submit commands in batches. That increased overall performance (measured as framerate), but it also increased latency.
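The latency cost of that trade-off is easy to put numbers on. The sketch below is purely illustrative—the function name and frame times are assumptions, not anything from Microsoft’s documentation—but it shows why queuing several frames of commands keeps the GPU busy at the price of lag between input and display.

```python
# Illustrative sketch: buffering frames of GPU commands trades
# latency for throughput. All numbers are made up for the example.

def frame_latency_ms(frame_time_ms: float, frames_buffered: int) -> float:
    """Worst-case input-to-display latency when the CPU queues
    `frames_buffered` frames of commands ahead of the GPU."""
    # Each buffered frame adds one full frame time before the newest
    # commands reach the screen.
    return frame_time_ms * (1 + frames_buffered)

# At 60 fps (16.7 ms/frame), unbuffered submission shows new input
# after about one frame time...
unbuffered = frame_latency_ms(16.7, 0)   # ~16.7 ms
# ...while a three-frame queue keeps the GPU fed but adds ~50 ms of lag.
buffered = frame_latency_ms(16.7, 3)     # ~66.8 ms
print(unbuffered, buffered)
```

The higher framerate from batching is real, but so is the extra two to three frames of delay—exactly the latency the new scheduler aims to claw back.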

With the Windows 10 May 2020 update, Microsoft introduced a hardware-accelerated GPU scheduler as a user opt-in, off by default—the user has to turn it on. Microsoft said that “changing the scheduler is akin to rebuilding the foundation of a house while still living in it,” which is why it is an opt-in feature for now.

Microsoft added HW scheduling to the Advanced Graphics Settings page of Display Settings


The new GPU scheduler will be supported on recent GPUs that have the necessary hardware, combined with a WDDM v2.7 driver that exposes this support to Windows. If you don’t have a new driver installed, you won’t see the new setting.

Windows continues to control prioritization and decide which applications have priority among contexts. Windows can now offload frequently used tasks to the GPU scheduling processor, handling data management and context switching of various GPU engines. There still is a high-priority thread running on the CPU that prioritizes and schedules the jobs submitted by applications.

Hardware-accelerated GPU scheduling, however, is a big change for drivers. While some GPUs have the necessary hardware, the associated driver exposing this support will only be released once it has gone through a significant amount of testing with Microsoft’s Insider population.

Nvidia was first to implement and announced the new capability on 24 June, and AMD was right behind, announcing its support with a beta driver on 1 July. Hardware-accelerated GPU scheduling is supported on AMD’s 5600 and 5700 series AIBs and on Nvidia’s Pascal and Turing AIBs. The fact that AIBs predating Turing have support for this feature implies that it’s been planned for some time now.

Test results

We ran a simple benchmark, UL’s Time Spy, on an Nvidia RTX 2080 Super in an Alienware Area 51 with a 3.3 GHz Core i9-7900X and saw a definite improvement. We started with Windows 10 version 1903 and Nvidia’s 445.87 driver, then upgraded to Windows 10 version 2004 and Nvidia’s 451.48 driver.

General performance tests with older driver (445.87) and new GPU scheduling driver (451.48)


The overall score improved by 1.2%, and the CPU load decreased by 10.8%. That’s not a lot, but users get it for free. AMD and Nvidia had to put work into it, but that is spread over tens of millions of users, and it’s worth it to have happy users.

What do we think?

Eventually, GPU scheduling will be on by default. Microsoft says that through its experimentation platform and telemetry system it can run A/B experiments comparing systems running hardware-accelerated GPU scheduling against systems running the old GPU scheduler. The company monitors reliability telemetry such as kernel crashes (bluescreens), user-mode crashes, GPU hangs, and freezes/deadlocks, as well as a limited set of performance metrics. So yes, big brother is watching you. When Microsoft is happy with the results that come back, it will flip the switch to make GPU scheduling default to on.

Android has a somewhat similar, explicit capability in its developer options that helps visualize where an app might be running into rendering issues, such as performing more rendering work than necessary or executing long thread and GPU operations. The Profile GPU Rendering tool displays, as a scrolling histogram, a visual representation of how much time it takes to render the frames of a UI window relative to a benchmark of 16 ms per frame.
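What the histogram conveys can be sketched in a few lines. The 16 ms budget comes from targeting 60 fps (1000 ms ÷ 60 ≈ 16.7 ms per frame); the function name and sample frame times below are illustrative assumptions, not part of the Android tooling itself.

```python
# Sketch of what the Profile GPU Rendering histogram conveys: each
# frame's render time compared against the ~16 ms budget for 60 fps.

FRAME_BUDGET_MS = 1000 / 60  # ~16.7 ms; Android rounds this to 16 ms

def over_budget(frame_times_ms):
    """Return the frames that took longer than the budget—these are
    the bars that cross the threshold line in the histogram."""
    return [t for t in frame_times_ms if t > FRAME_BUDGET_MS]

samples = [8.2, 15.9, 31.4, 12.0, 22.5]  # ms per frame (illustrative)
print(over_budget(samples))  # flags the two slow, "janky" frames
```

Any frame that lands in the returned list missed its display deadline, which the user perceives as stutter.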

On less powerful GPUs, available fill-rate (the speed at which the GPU can fill the frame buffer) can be quite low. As the number of pixels required to draw a frame increases, the GPU may take longer to process new commands and ask the rest of the system to wait until it can catch up. The profiling tool helps identify when the GPU gets overwhelmed trying to draw pixels or is burdened by heavy overdraw.
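The fill-rate pressure described above is back-of-the-envelope arithmetic: pixels per frame times frames per second times how many times each pixel gets drawn. The sketch below uses assumed, illustrative inputs (resolution, refresh rate, overdraw factor), not any vendor’s specification.

```python
# Back-of-the-envelope sketch: how many pixels a GPU must fill per
# second for a given resolution, refresh rate, and overdraw factor.
# All inputs are illustrative assumptions, not measured figures.

def required_fill_rate(width, height, fps, overdraw=1.0):
    """Pixels per second the GPU must write to keep up with the display."""
    return width * height * fps * overdraw

# A 1080p UI at 60 fps with 2x overdraw (each pixel drawn twice on
# average) needs roughly 249 million pixels per second of fill rate.
need = required_fill_rate(1920, 1080, 60, overdraw=2.0)
print(f"{need / 1e6:.0f} Mpixels/s")
```

When that requirement exceeds what a low-end GPU can deliver, frames start missing their budget—precisely the situation the profiling tool makes visible.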