Site Reliability Engineer

Total Experience: 5-10 Years

Mandatory Skills: Python, Cloudflare, AWS WAF, CloudFront

Job Description:

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Create sustainable systems and services through automation and uplifts to remove operational toil and manual processes
  • Participate in systems design reviews and partner with teams on action items from Root Cause Analysis sessions
  • Build well-defined service level objectives (SLOs), metrics, monitors, and logs as required
  • Collaborate closely with team members as well as cross-functional teams such as DevOps, Cloud, and development teams within Petco
  • When required, respond to incidents and concerns related to the production environments
  • Debug, troubleshoot, and solve for concerns with a proactive approach to problem-solving
  • 5-10+ years as a Software Engineer, DevOps Engineer, or Site Reliability Engineer
  • Coding experience with at least 1 high-level language such as Python, Go, or Java
  • Experience with supporting critical services in production in the cloud (AWS) and on-premises
  • Infrastructure as Code (IaC) tools such as Terraform
  • Monitoring tools such as New Relic, SumoLogic, DataDog, SevOne, Sentry
  • Proactive approach to spotting problems and areas for improvement, removing manual process and toil using code, and fixing performance concerns using code
  • Shift hours 4PM–1AM IST
  • 10+ years as a Software Engineer, DevOps Engineer, or Site Reliability Engineer
  • Backend software development experience using Python
  • Expertise with CDN and WAF technologies such as Cloudflare, AWS WAF, CloudFront
  • Experience with adding telemetry, distributed tracing, and performance debugging, and with building solutions in code to fix the issues found
  • Experience with building SLIs, SLOs, and error budgets