Jobs
Locations
All locations
Jobs in Texas · Jobs in New York · Jobs in California · Jobs in Florida · Jobs in North Carolina · Jobs in Massachusetts
Categories
All categories
Healthcare & Nursing · Logistics & Warehouse · Engineering · IT · Hospitality & Catering · Sales
Skills · Companies · Career Guides · Blog · Salary
Recrutus

Curating the world's most innovative career opportunities. We bridge the gap between visionary talent and industry-leading companies.

Search roles by city, category, skill, or job type — explore verified US employers, salary benchmarks, and remote-friendly teams hiring nationwide.

Job seekers
Browse jobs · Companies hiring · Remote jobs · Jobs by location · Jobs by city · Jobs by category · Jobs by skill · Career guides · Career blog · Salary insights
Job types
Contractor jobs · Full-Time jobs · Intern jobs · Other jobs · Part-Time jobs · Per-Diem jobs · Temporary jobs
Top states
Jobs in Texas · Jobs in New York · Jobs in California · Jobs in Florida · Jobs in North Carolina · Jobs in Massachusetts · All states →
Top categories
Healthcare & Nursing jobs · Logistics & Warehouse jobs · Engineering jobs · IT jobs · Hospitality & Catering jobs · Sales jobs · Teaching jobs
Popular skills
CDL A jobs · Registered Nurse jobs · BLS jobs · Excel jobs
Featured employers
Company
About us · FAQ · Contact · Privacy policy · US privacy notice · Accessibility

Recrutus helps candidates discover roles that match their skills and helps teams reach qualified applicants faster. Browse by metro, discipline, or work style — from internships to senior leadership.

© 2026 Recrutus. All rights reserved.
Terms of service · Cookie policy · Acceptable use · DMCA policy · Employer terms · Candidate terms
Software Development Engineer, EC2 UltraServer Availability

Salary not disclosed • Full-Time • On-site

301 Union Street, Central Business District, Belltown, Seattle, King County, Washington, 98101, United States

About the Team

The EC2 UltraServer Availability team is a high-performing engineering organization responsible for maintaining high availability of NVIDIA-based ML infrastructure at scale. We manage end-to-end repair and recovery workflows for GB200 and GB300 UltraServers, from initial problem detection through completed recovery. Our team drives operational excellence by continuously improving problem detection, repair efficacy, and customer impact mitigation. We work closely with hardware engineering, data center operations, and EC2 service teams to ensure reliable, efficient recovery of critical ML compute capacity. This is a high-impact role leading a two-pizza team of talented engineers solving complex technical challenges in one of Amazon's fastest-growing infrastructure domains.

About the Role

As a Software Development Engineer II, you will design, build, and maintain cloud-based repair and recovery workflows for NVIDIA GB200/GB300 UltraServers. In this role, you will orchestrate repair and recovery operations from impairment detection through completed recovery. The work requires expertise in AWS services and system architecture, as well as cross-functional collaboration with Capacity Management, Hardware Engineering, and Datacenter Operations to manage AI/ML infrastructure.

This is a hands-on position in which you will own everything end to end: requirements gathering, design, design reviews, implementation, code reviews, incremental feature launches, operations, mentoring, and driving continuous improvement. You will work in environments where the technology strategy is defined but the solution design is not, building solutions that are stable, logical, testable, and efficient while making independent trade-off decisions.

Hiring Process

The interview process typically includes a technical deep-dive, system design discussions, and team fit assessments to ensure alignment with our engineering standards and culture.

Equal Opportunity

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit amazon.jobs for more information.

Work location

Work model: On-site

301 Union Street, Central Business District, Belltown, Seattle, King County, Washington, 98101, United States

Seattle, Washington

Key Responsibilities

  • Manage network partition configurations for multi-node GPU clusters and AI/ML training systems
  • Handle firmware validation and consistency checks across asset groups
  • Design and architect cross-functional cloud-based repair and recovery workflows for NVIDIA GB200/GB300 UltraServers
  • Develop and maintain automated diagnostic triage, hardware testing, and cable validation processes
  • Collaborate with hardware engineering and datacenter operations to resolve complex technical challenges
  • Build scalable infrastructure solutions using AWS native services to ensure high availability
  • Create observable systems with appropriate metrics and alarming for workflow monitoring
  • Execute and monitor UltraServer repair workflows while troubleshooting failures

Requirements

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture experience
  • Experience programming with at least one software programming language
  • 3+ years of full software development life cycle experience
  • Bachelor's degree in computer science or equivalent

Benefits & Perks

  • Health insurance including medical, dental, vision, prescription, Basic Life & AD&D
  • Supplemental life plans and option for additional coverage
  • Employee Assistance Program (EAP) and mental health support
  • Flexible Spending Accounts
  • Adoption and surrogacy reimbursement coverage
  • 401(k) matching
  • Paid time off
  • Parental leave

Company

Amazon

Industry

IT

Quick Overview

Experience

3+ yrs (Mid Level)

Education

Bachelor's degree in computer science or equivalent

Job Type

Full-Time

Skills Required

AWS, System Architecture, Software Development, Software Programming Language, Source Control Management, Continuous Deployments, Testing, Operational Excellence, Hardware Integration, Software Integration

Similar Job Opportunities

Software Development Engineer II, AWS DynamoDB

Amazon • Seattle, Washington

$144k-194k

VP, Solution Architect - Marketing Engineering

Synchrony • Alpharetta, Georgia

$135k-230k

Principal Data Science Engineer - AI/ML & Algorithms

TransMedics, Inc. • Andover, Massachusetts

$152k-216k

Skills, education and keywords

Skills: AWS, System Architecture, Software Development, Software Programming Language, Source Control Management, Continuous Deployments, Testing, Operational Excellence, Hardware Integration, Software Integration.

Education: Bachelor's degree in computer science or equivalent.

Frequently asked questions about Software Development Engineer, EC2 UltraServer Availability at Amazon

What does a Software Development Engineer, EC2 UltraServer Availability at Amazon do?
In this Software Development Engineer, EC2 UltraServer Availability role at Amazon, you will manage network partition configurations for multi-node GPU clusters and AI/ML training systems; handle firmware validation and consistency checks across asset groups; design and architect cross-functional cloud-based repair and recovery workflows for NVIDIA GB200/GB300 UltraServers; and develop and maintain automated diagnostic triage, hardware testing, and cable validation processes.
What are the requirements for this Software Development Engineer, EC2 UltraServer Availability role?
To qualify for the Software Development Engineer, EC2 UltraServer Availability position at Amazon, applicants should have: 3+ years of non-internship professional software development experience; 2+ years of non-internship design or architecture experience; experience programming with at least one software programming language; 3+ years of full software development life cycle experience; and a Bachelor's degree in computer science or equivalent.
Where is the Software Development Engineer, EC2 UltraServer Availability role at Amazon located?
Software Development Engineer, EC2 UltraServer Availability at Amazon is based at 301 Union Street, Central Business District, Belltown, Seattle, King County, Washington, 98101, United States. This is an on-site role.
Is this Software Development Engineer, EC2 UltraServer Availability job remote, hybrid, or on-site?
Amazon has listed this Software Development Engineer, EC2 UltraServer Availability role as on-site.
How much experience is required for this Software Development Engineer, EC2 UltraServer Availability role?
Software Development Engineer, EC2 UltraServer Availability at Amazon typically requires 3+ years of relevant experience at the mid level.
What skills do you need for the Software Development Engineer, EC2 UltraServer Availability role at Amazon?
Key skills for Software Development Engineer, EC2 UltraServer Availability at Amazon include AWS, System Architecture, Software Development, Software Programming Language, Source Control Management, Continuous Deployments, Testing, and Operational Excellence.
What education is required for Software Development Engineer, EC2 UltraServer Availability at Amazon?
Educational requirements for this role: Bachelor's degree in computer science or equivalent.
What category does the Software Development Engineer, EC2 UltraServer Availability role belong to?
Software Development Engineer, EC2 UltraServer Availability at Amazon is part of the IT job category on Recrutus.