- Iterate on processes to improve our ability to ship fast while maintaining high-quality systems that we can depend on
- Partner with Engineering teams, sharing your insight on reliability best practices and building a strong collaborative relationship
- Maintain and support our infrastructure
- Maintain up-to-date documentation on deployments, processes, and standard operating procedures
- Maximize efficiency in a constantly evolving environment where the process is fluid and creative solutions are the norm
- Measure and improve service performance
- Perform post-mortems and in-depth root cause analysis to ensure we are always improving
- Deploy and maintain highly available services in AWS and GCP using Terraform, Kubernetes, Jenkins, Grafana, and Prometheus
- Prepare for and simulate disasters of all sorts. We’re mission critical for our customers and need to stay up, no matter what!
Skills and Experience
- 4+ years of experience in a DevOps or SRE position.
- 4+ years of experience configuring and troubleshooting Linux environments.
- An understanding of what it takes to design and run services at scale, and how to achieve that with infrastructure as code.
- Knowledge of large-scale production environments and on-premises environments
- Experience working with AWS, Terraform, and Kubernetes at an expert level.
- Terraform expertise in at least one cloud provider, with the ability to adapt to multiple cloud providers
- Experience maintaining and scaling unmanaged Kafka instances
- Linux shell expert
- Knowledge of maintaining mutable and immutable infrastructure
- Knowledge of at least one of the following: SaltStack, Chef, Puppet, Ansible, or another mainstream deployment orchestration tool.
- Jenkins CI experience
- Operational knowledge/experience in AWS and GCP
- Intermediate knowledge of Go and Python
- You have a deep curiosity, and a strong sense of purpose, urgency, and drive.
- You have a passion for infrastructure as code
- You’re a collaborative problem solver who is open to the opinions and ideas of others.
- You have a strong appreciation for Linux and its value
- You are a US citizen or green card holder living in the US and are permitted to work with AWS GovCloud (US) accounts
- Experience with and interest in security
- Experience in a scaling startup