CoreWeave is seeking a Production Engineer I/II to join their team in Bellevue, WA. The role involves maintaining the reliability and stability of CoreWeave's cloud infrastructure while supporting incident response and operational improvements.
About the Role
As a Production Engineer, you will monitor system performance, troubleshoot issues, and take part in incident management: assisting with incident response, documenting incidents, and contributing to incident response playbooks. You will also collaborate with engineers to improve platform reliability and participate in knowledge-sharing activities.
About You
Required:
4 years of experience in cloud operations, site reliability engineering (SRE), or related technical roles.
Understanding of cloud platforms (e.g., Kubernetes, AWS, GCP) and basic knowledge of cloud infrastructure.
Familiarity with incident management practices and frameworks (e.g., ITIL, SRE best practices).
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana) or willingness to learn.
Basic experience with scripting or automation tools (e.g., Python, Bash, Terraform, Ansible).
Strong communication skills, with the ability to explain technical concepts clearly and concisely.
Preferred:
Exposure to Kubernetes, containerization, and distributed systems.
Familiarity with change management processes and post-incident analysis.
Experience with automated systems or self-healing infrastructure is a plus.
A desire to learn and grow in the areas of cloud operations, reliability engineering, and incident management.
Benefits
Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption
CoreWeave
CoreWeave is the AI Hyperscaler™
Company Size: 501-1000 employees
Technology, Information and Internet