Principal Technical Program Manager - Des Moines, IA

About the Team

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for Oracle Cloud Infrastructure's (OCI) rapidly expanding GPU infrastructure portfolio. As AI becomes embedded across our products and services, we help customers turn that promise into a better future for all. Our platform group ensures the reliability and scalability of large-scale GPU fleets, supporting distributed AI training and inference workloads across multi-region clusters.

True innovation starts when everyone is empowered to contribute. We are committed to growing a workforce that promotes opportunities for all, fostering a culture where structured, data-driven leaders can thrive in collaborative environments.

About the Role

As a Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams. You will own the operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, and operational handoff across multiple concurrent GPU operations programs.

This role is designed for a structured, data-driven program leader who values simplicity, scalability, and clear operational mechanisms. You will turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans. Your day-to-day involves strengthening dashboards, telemetry, documentation, onboarding, playbooks, and repeatable processes to improve how the organization scales. You will also drive the practical use of AI to enhance operations productivity, reduce manual toil, and accelerate triage.

The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter. You will serve as a primary escalation point between engineering and operations teams, resolving priority conflicts and accelerating issue resolution while translating complex situations into accurate narratives for senior stakeholders.

Hiring Process

Applications for this role will generally be accepted for at least three calendar days from the posting date or as long as the job remains posted. Candidates are evaluated based on their ability to lead complex, cross-functional initiatives with measurable outcomes and their experience in building cadences, governance mechanisms, and KPI reporting.

Equal Opportunity & Culture

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans' status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

We are committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, please let us know by emailing accommodation-request_mb@oracle.com or by calling 1-888-404-2494 in the United States.

We encourage employees to give back to their communities through our volunteer programs and offer competitive benefits that support our people with flexible medical, life insurance, and retirement options.

Work location

Work model: On-site

Meredith Corporation, 1716 Locust Street, Des Moines, Polk County, Iowa 50309, United States

Des Moines, Iowa

Key Responsibilities

Drive availability and reliability of large-scale GPU fleets by identifying systemic issues and leading cross-functional recovery efforts
Support operational readiness and performance of distributed AI training and inference workloads across multi-region GPU clusters
Own end-to-end execution of critical AI Infrastructure GPU Operations programs, ensuring alignment with business priorities and risk signals
Set and run weekly operating cadences and governance forums to ensure clear ownership, timelines, and committed actions
Manage deployment governance, change review, readiness tracking, and operational execution processes
Establish and scale structured incident management mechanisms to improve root cause analysis and durable fixes
Build and maintain business planning inputs, financial forecasts, and executive-level reporting for senior leadership

Requirements

5+ years of experience in technical program management, program operations, business operations, data analysis, or infrastructure operations
Advanced Excel skills including pivots, lookups, conditional logic, and data modeling
Working knowledge of PowerPoint, Jira, and Confluence

Nice to Have

Experience with cloud infrastructure, AI/ML infrastructure, GPU operations, data center deployment, capacity planning, or large-scale platform operations
Experience supporting large GPU fleets, distributed AI training or inference workloads, or performance-sensitive infrastructure environments
Experience with incident management, root cause analysis, corrective and preventive action tracking, Change Review Board processes, or high-volume change governance
Familiarity with observability, telemetry, RDMA, RoCE, InfiniBand, network fabric health, service health metrics, ticket/incident analytics, or operational dashboarding
Finance, business planning, workforce planning, or operational readiness experience in a technology organization
Track record of influencing senior business and technology leaders without relying on direct authority

Benefits & Perks

Medical, dental, and vision insurance including expert medical opinion
Short term disability and long term disability coverage
Life insurance and AD&D coverage
Supplemental life insurance for employee, spouse, and child
Health care and dependent care Flexible Spending Accounts

Frequently asked questions about Principal TPM -AI Infrastructure at Oracle

What does a Principal TPM -AI Infrastructure at Oracle do?

Day-to-day, the Principal TPM -AI Infrastructure at Oracle will drive availability and reliability of large-scale GPU fleets by identifying systemic issues and leading cross-functional recovery efforts; support operational readiness and performance of distributed AI training and inference workloads across multi-region GPU clusters; own end-to-end execution of critical AI Infrastructure GPU Operations programs, ensuring alignment with business priorities and risk signals; and set and run weekly operating cadences and governance forums to ensure clear ownership, timelines, and committed actions.

What are the requirements for this Principal TPM -AI Infrastructure role?

Oracle is looking for candidates who meet the following requirements: 5+ years of experience in technical program management, program operations, business operations, data analysis, or infrastructure operations; advanced Excel skills including pivots, lookups, conditional logic, and data modeling; and working knowledge of PowerPoint, Jira, and Confluence.

Where is the Principal TPM -AI Infrastructure role at Oracle located?

Principal TPM -AI Infrastructure at Oracle is based at Meredith Corporation, 1716 Locust Street, Des Moines, Polk County, Iowa 50309, United States. This is an on-site role.

Is this Principal TPM -AI Infrastructure job remote, hybrid, or on-site?

Oracle has listed this Principal TPM -AI Infrastructure role as on-site.

How much experience is required for this Principal TPM -AI Infrastructure role?

Principal TPM -AI Infrastructure at Oracle typically requires 5+ years of relevant experience at the lead level.

What skills do you need for the Principal TPM -AI Infrastructure role at Oracle?

Key skills for Principal TPM -AI Infrastructure at Oracle include data analysis; advanced Excel; PowerPoint; Jira; Confluence; RDMA; RoCE; and InfiniBand.

What category does the Principal TPM -AI Infrastructure role belong to?

Principal TPM -AI Infrastructure at Oracle is part of the IT job category on Recrutus.