At Inspire this year, we talked about how developers will be able to run Llama 2 on Windows with DirectML and the ONNX Runtime, and we’ve been hard at work making that a reality.
We now have a sample showing our progress with Llama 2 7B!
See the Olive/examples/directml/llama_v2 sample in the microsoft/Olive repository on GitHub.
This sample relies on first running an optimization pass on the model with Olive, an optimization tool for ONNX models. Olive combines graph fusion optimizations from the ONNX Runtime with a model architecture optimized for DirectML to speed up inference times by up to 10X!
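Olive workflows are driven by a JSON configuration and can be launched from Python. Below is a minimal sketch of what kicking off such a pass looks like; the config file name is a placeholder, and the llama_v2 sample ships the real configuration and driver script.

```python
# Minimal sketch of launching an Olive optimization workflow from Python.
# The config file name is a placeholder; the llama_v2 sample provides the real
# workflow configuration (model conversion + ONNX Runtime graph fusions for DirectML).
import json

from olive.workflows import run as olive_run

with open("llama_v2_config.json") as f:  # placeholder config name
    olive_config = json.load(f)

olive_run(olive_config)  # executes the optimization passes defined in the config
```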
After this optimization pass, Llama 2 7B runs fast enough that you can have a conversation in real time on multiple vendors’ hardware!
We’ve also built a little UI to make it easy to see the optimized model in action.
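Under the hood, the optimized model is a regular ONNX model, so you can also load it directly with the ONNX Runtime’s DirectML execution provider. Here is a minimal sketch; the model path is a placeholder, and the actual input/output contract (token IDs, attention mask, KV cache) is defined by the sample’s scripts.

```python
# Minimal sketch: load an Olive-optimized ONNX model on the GPU through DirectML.
# The model path is a placeholder; see the sample for the real file layout and
# for how to feed token IDs, attention mask, and KV cache inputs during generation.
import onnxruntime as ort

session = ort.InferenceSession(
    "llama_v2_optimized.onnx",           # placeholder path to the optimized model
    providers=["DmlExecutionProvider"],  # run through DirectML on the GPU
)

# Inspect the inputs the optimized graph expects before wiring up generation.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)
```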
Thank you to our hardware partners who helped make this happen. For more on how Llama 2 lights up on our partners’ hardware with DirectML, see:
- AMD: [How-To] Running Optimized Llama2 with Microsoft DirectML on AMD Radeon Graphics
- Intel: Intel and Microsoft Collaborate to Optimize DirectML for Intel® Arc™ Graphics Solutions
- NVIDIA: New TensorRT-LLM Release For RTX-Powered PCs | NVIDIA Blog
We’re excited about this milestone, but this is only a first peek – stay tuned for future enhancements to support even larger models, fine-tuning, and lower-precision data types.
Getting Started
Requesting Llama 2 access
To run the Olive optimization pass in our sample, you first need to request access to the Llama 2 weights from Meta.
Drivers
We recommend upgrading to the latest drivers for the best performance.
- AMD has released optimized graphics drivers supporting AMD RDNA™ 3 devices including AMD Radeon™ RX 7900 Series graphics cards. Download Adrenalin Edition™ 23.11.1 or newer (https://www.amd.com/en/support)
- Intel has released optimized graphics drivers supporting Intel Arc A-Series graphics cards. Download the latest drivers from Intel
- NVIDIA: Users of NVIDIA GeForce RTX 20, 30, and 40 Series GPUs can see these improvements firsthand in GeForce Game Ready Driver 546.01
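Once your drivers are up to date, a quick way to confirm that the ONNX Runtime can reach your GPU through DirectML is to list the available execution providers; this assumes the onnxruntime-directml Python package is installed.

```python
# Quick check that the DirectML execution provider is available.
# Assumes the onnxruntime-directml package is installed (pip install onnxruntime-directml).
import onnxruntime as ort

print(ort.get_available_providers())  # should include "DmlExecutionProvider"
```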
Source: Announcing preview support for Llama 2 in DirectML - DirectX Developer Blog