DenverRecruiter Since 2001
the smart solution for Denver jobs

Customer Reliability Engineer

Company: NetApp
Location: Louisville
Posted on: January 26, 2023

Job Description:

Job Summary

As a Customer Reliability Engineer, you'll manage a portfolio of customer-facing cloud services (SaaS/IaaS), ensuring overall availability, performance, and security. You'll work in a highly collaborative environment with NetApp and Google/AWS/Microsoft teams from all over the world (RTP, Reykjavík, Bangalore, Sunnyvale, Redmond, and more). This position includes rotational on-call work as part of a global team due to the critical nature of the services we support.

You will be working in a hectic, fast-paced organization as an engineer on the Customer Reliability Engineering (CRE) team. This team is responsible for assisting NetApp Cloud Volumes Service (CVS) and Astra customers in resolving complex technical issues in production environments.

We are looking for a CRE with a deep understanding of complex distributed system platforms and cloud technologies, and the ability to explain them simply to customers and to SREs within a customer organization.

You will have the opportunity to work with your teammates and our customers to support many new, leading-edge technologies that solve real challenges. You will provide robust feedback and guidance to our Product and Engineering teams while being a voice for our customers. You want to make our customers successful while strengthening their relationship with NetApp. You can make a huge impact and have real ownership of the work you do.

Essential Responsibilities

  • Work with external customers and partners to help make them successful
  • Respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents and cross-platform issues spanning OS, networking, and databases in cloud-based SaaS/IaaS environments, following and implementing SRE best practices
  • Continuously monitor, analyze, and measure availability, latency, and overall system health using tools such as Prometheus, Stackdriver, Elasticsearch, Grafana, and SolarWinds, and develop steps to improve system and application performance, availability, and reliability
  • Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available
  • Keep up to date with security developments and proactively identify, diagnose, and solve complex security issues
  • Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
  • Apply automation to any tasks or parts of the system that are performed manually or would otherwise benefit from it
  • Use Atlassian Jira to track issues to resolution according to their priority

Job Requirements

  • Advanced knowledge of incident management processes and the ability to resolve issues within agreed organizational SLAs/SLOs
  • Advanced knowledge of Linux operating systems (Ubuntu, CentOS, etc.)
  • Advanced knowledge of container-based architecture (Kubernetes)
  • Advanced knowledge of tools and languages such as Ansible, Python, Bash, Go, PowerShell, and other scripting languages
  • Intermediate knowledge of algorithms, data structures, and databases (SQL/NoSQL)
  • Intermediate knowledge of networking concepts
  • Intermediate understanding of cloud environments such as GCP or AWS
  • Intermediate knowledge of site reliability engineering principles

Education

  • BS in computer science or equivalent, or 10+ years of professional experience

Keywords: NetApp, Denver, Customer Reliability Engineer, Engineering, Louisville, Colorado