Job Description
Introduction

Overview
In this role, you will be part of a team that develops and supports the Apptio Kubernetes Platform (AKP), where all Apptio applications are deployed. In a typical day you will interact with GitHub, Linux, Kubernetes, ArgoCD, Docker, Confluence, Jira, Slack, and AWS.

You Are
You are passionate about problem solving and reliability and have significant experience in SRE or an adjacent role. Your team can count on you to solve challenging problems across the entire Apptio portfolio. You collaborate with other SREs, developers, and support teams to help provide value to the broader organization. You take responsibility for fixing problems in an automated, code-first way and are happy to step outside your comfort zone to develop your skillset. You are a mentor to other engineers and able to assist management in key decision-making.

Us
The Platform and Site Reliability Engineering (PRE) team at Apptio is responsible for enhancing and maintaining our Kubernetes platform and driving the adoption of SRE best practices across our engineering teams. We are a distributed team working across three locations: the United States, Poland, and Australia.

Your Role and Responsibilities
- Manage deployments of Apptio services to AKP
- Streamline the deployment process
- Improve observability of the services within your purview by reviewing KPI dashboards and alerting
- Mentor junior to mid-level engineers
- Author and maintain documentation of deployment and monitoring processes
- Write and use runbooks to troubleshoot and triage production issues
- Detect issues and handle Tier 3 troubleshooting
- Drive online “swarm” collaboration sessions
- Collaborate with service developers
- Participate in on-call rotation
- Perform maintenance of the platform (patching, resets, upgrades, etc.)
- Operate independently and own end-to-end delivery of solutions
- Have significant input into the product roadmap and be able to articulate effectively the benefits of alternative technologies

Required Technical and Professional Expertise
- 5 years’ experience in an SRE or adjacent role
- Functional understanding of at least one programming language (preferably Golang) and source control
- Expertise with distributed application deployment and management via Kubernetes
- Expertise with container technologies (e.g., Kubernetes, Docker)
- Expertise with infrastructure-as-code (IaC) concepts (Terraform)
- Expertise with cloud provider services, preferably AWS
- Ability to work with RESTful systems and their APIs
- Familiarity with observability (e.g., Prometheus, OpenTelemetry)
- Demonstrated fluency in English

Preferred Technical and Professional Expertise
- 7 years’ experience in an SRE or adjacent role