
166, Cumberland Street, City of Rochester, Monroe County, New York, 14605, United States
ECLARO is partnering with a leading technology solutions provider in Rochester, NY to find a Site Reliability Engineering (SRE) Platform Engineer (Lead). This organization collaborates closely with customers to manage complex needs and achieve strategic business goals. The client is seeking a technical leader to drive reliability engineering strategy and execution across critical IT Business Solutions platforms.
This is a high-impact role designed to build and mature reliability engineering capabilities from the ground up. The successful candidate will serve as the technical lead for SRE practices, establishing monitoring standards and influencing tooling decisions while partnering across infrastructure, development, operations, and vendor teams. The position focuses on improving uptime, performance, and operational efficiency through software enhancements, observability, automation, and data-driven Root Cause Analysis (RCA).
As a Lead SRE Platform Engineer, you will define and mature SRE best practices across both cloud and on-prem environments. Your work will involve designing comprehensive monitoring strategies using tools like Dynatrace, Datadog, and Microsoft SCOM, while evolving a MELT (Metrics, Events, Logs, Traces) data strategy to enhance service reliability.
You will lead the critical transition of CI/CD pipelines from Azure DevOps to GitHub, enhancing pipeline automation with embedded security controls. A significant portion of your time will be dedicated to supporting reliability across diverse systems, including Azure cloud infrastructure, legacy VMware environments, and internal .NET/C# applications. You will also manage integrations with workforce platforms like Workday and ADP, as well as warehouse distribution systems like Blue Yonder.
Beyond technical implementation, you will drive strategic initiatives such as developing predictive reliability models using statistical techniques and identifying systemic risks across production systems. You will optimize incident management workflows within the BMC ecosystem, ensuring effective triage and escalation. This role requires active participation in off-hour escalation support and cross-functional collaboration to document procedures and guide future tooling decisions.
If you are up to the challenge and ready to take on this rewarding opportunity, please contact Jeanine Hastings directly to discuss your qualifications.
ECLARO values diversity and is an Equal Opportunity Employer. We do not discriminate based on Race, Color, Religion, Sex, Sexual Orientation, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status, in compliance with all applicable laws.
Skills: Site Reliability Engineering, Dynatrace, Datadog, Microsoft SCOM, Azure, Hyper-V, VMware, NetApp, Pure Storage, Azure Log Analytics.
Education: Bachelor's Degree in a related technical field (Preferred).
Work model: On-site
Strong programming experience in .NET / C#, Python, and SQL. Experience with MSSQL (primary) and Oracle (limited). Experience with GitHub. Agile / Scrum experience. Knowledge of Reliability-Centered Engineering and maintenance strategies. Experience with synthetic testing and proactive validation post-deployment. Bachelor's Degree in a related technical field.