Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
Create sustainable systems and services through automation and uplifts to remove operational toil and manual processes
Participate in systems design reviews and partner with teams on action items from Root Cause Analysis sessions
Build well-defined service level objectives (SLOs), metrics, monitors, and logs as required
Collaborate closely with team members as well as cross-functional teams such as DevOps, Cloud, and development teams within Petco
When required, respond to incidents and concerns related to the production environments
Debug, troubleshoot, and solve for concerns with a proactive approach to problem solving
5-10+ years as a Software Engineer, DevOps Engineer, or Site Reliability Engineer
Coding experience with at least 1 high-level language such as Python, Go, or Java
Experience with supporting critical services in production in the cloud (AWS) and on-premises
Infrastructure as Code (IaC) tools such as Terraform
Monitoring tools such as New Relic, SumoLogic, DataDog, SevOne, Sentry
Proactive approach to spotting problems and areas for improvement, removing manual processes and toil using code, and fixing performance concerns using code
Shift hours 4PM–1AM IST
10+ years as a Software Engineer, DevOps Engineer, or Site Reliability Engineer
Backend software development experience using Python
Expertise with CDN and WAF technologies such as Cloudflare, AWS WAF, CloudFront
Experience with adding telemetry, distributed tracing, and performance debugging, and with building solutions in code to fix the issues found
Experience with building SLIs, SLOs, and error budgets