The Ops Community ⚙️

Cover image for Highlights from Peter Desantis keynote at AWS reinvent 2024
Eyal Estrin
Eyal Estrin

Posted on • Originally published at eyal-estrin.Medium

Highlights from Peter Desantis keynote at AWS reinvent 2024

Many topics were shared during the keynote, and in this short blog post, we will review some of the highlights.

The technical aspects began with David Brown, VP of AWS Compute & Networking.

AWS Graviton

David shared how the Graviton processor evolved over the years.

If we use the Graviton2 processor as a baseline for performance comparison, the Graviton3 is capable of producing 60% more performance (than Graviton2) in real workload using NGINX, and the Graviton4 is capable of producing 40% more performance (than Graviton3) in real workload using NGINX.

Image description

Graviton processors are powering many of the popular AWS services:

Image description

Image description

AWS Nitro System

All new AWS compute services in the past couple of years are powered by the Nitro System, which offers better performance and hardware-enforced separation.

Image description

Image description

Image description

Image description

For more information:

https://docs.aws.amazon.com/whitepapers/latest/security-design-of-aws-nitro-system/the-components-of-the-nitro-system.html

AWS Trainium

Peter Desantis shared information about the AWS Trainium processors for generative AI workloads, and its architecture.

For more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-hardware/trainium.html

Image description

Image description

Systolic Array

A systolic array is a specialized architecture used in parallel processing, particularly effective for tasks like matrix multiplication and convolution operations in deep learning.

For more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/trainium_inferentia2_arch.html

Image description

Neuron Kernel Interface (NKI)

The Neuron Kernel Interface (NKI) is a programming interface introduced by AWS as part of the Neuron SDK, designed to optimize compute kernels specifically for AWS Trainium and Inferentia chips. It enables developers to create high-performance kernels that enhance the capabilities of deep learning models.

For more information: https://aws.amazon.com/about-aws/whats-new/2024/09/aws-neuron-nki-nxd-training-jax/

Image description

Announcement - Latency-optimized inference option for Amazon Bedrock (Available in Preview)

Latency-optimized inference for foundation models in Amazon Bedrock is now available in public preview, delivering faster response times and improved responsiveness for AI applications. Currently, these new inference options support Anthropic's Claude 3.5 Haiku model and Meta's Llama 3.1 405B and 70B models offering reduced latency compared to standard models without compromising accuracy.

For more information:

https://aws.amazon.com/about-aws/whats-new/2024/12/latency-optimized-inference-foundation-models-amazon-bedrock/

https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html

Image description

Image description

Image description

UltraCluster 2.0 and the 10p10u network

The last information discussed in the keynote was the UltraCluster and its underlying network which AWS internally calls 10p10u.

For more information: https://www.aboutamazon.com/news/aws/aws-infrastructure-generative-ai

Image description

Image description

The entire keynote video can be found at https://www.youtube.com/watch?v=vx36tyJ47ps

About the author

Eyal Estrin is a cloud and information security architect, an AWS Community Builder, and the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.

You can connect with him on social media (https://linktr.ee/eyalestrin).

Opinions are his own and not the views of his employer.

Top comments (1)

Collapse
 
marie_jones profile image
Marie Jones

Great post!
NKI sounds like a powerful tool for optimizing deep learning on AWS Trainium and Inferentia chips. Excited to see how it improves model performance.
Thanks for sharing!