CoreWeave is seeking a Production Engineer – Team Lead to join their team in Bellevue, WA. The role focuses on ensuring the stability and reliability of CoreWeave's cloud infrastructure while providing strategic direction and operational continuity.
About the Role
As the Production Engineer – Team Lead, you will be responsible for incident management, operational excellence, and team development. You will act as the Incident Commander during critical incidents, coordinate cross-functional teams, and lead root cause analysis efforts. Additionally, you will define and track Service Level Objectives (SLOs), drive improvements in system resilience, and mentor the Production Engineering Team.
About You
Required:
4+ years of experience in production engineering, cloud operations, site reliability engineering (SRE), or incident response roles.
Deep knowledge of cloud platforms (e.g., Kubernetes-based infrastructure, AWS, GCP).
Strong familiarity with incident management frameworks such as ITIL and SRE best practices.
Proficiency with monitoring and alerting tools (e.g., Prometheus, Grafana) and a strong understanding of observability principles.
Hands-on experience with automation, scripting, and configuration management tools (e.g., Python, Bash, Terraform).
Demonstrated ability to make critical decisions under pressure, guiding teams through high-stakes incident resolution.
Excellent communication skills, with the ability to translate complex technical issues for both technical and non-technical stakeholders.
Proven experience mentoring and coaching technical teams, driving a culture of growth and continuous improvement.
Applicants must have work authorization that does not require sponsorship from the company now or in the future.
Preferred:
Previous experience in an Incident Commander role, managing high-priority incidents and major service restorations.
Advanced knowledge of Kubernetes, containerization, and distributed systems.
Familiarity with change management processes, post-incident analysis techniques, and runbook automation.
Experience developing and managing self-healing infrastructure.
Benefits
Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption
CoreWeave
CoreWeave is the AI Hyperscaler™
Company Size: 501-1000 employees
Industry: Technology, Information and Internet