
The Software Development Engineer II role, part of the EC2 UltraServer Availability team, focuses on designing and maintaining cloud-based repair and recovery workflows for NVIDIA GB200 and GB300 UltraServers. Key responsibilities include architecting scalable solutions on AWS services, automating diagnostic triage and hardware testing, and collaborating with hardware engineering and datacenter operations to keep AI/ML infrastructure highly available. The engineer leads end-to-end development in a fast-growing domain as part of a collaborative two-pizza team that values operational excellence and continuous improvement. The role is based in Seattle and carries hands-on ownership of the full software development lifecycle, from design through operations.
