CoreWeave is a modern cloud, providing enterprise scale with the flexibility of a start-up.
At CoreWeave, we are building the next-generation public cloud for accelerated workloads. Our stack consists of 8,000 bare metal servers with 45,000 GPUs in total, spread across more than 7 data centers. We layer Kubernetes on top to provide bare metal performance to users in the ML training, real-time inference and CGI markets. CoreWeave espouses small teams of high-performing, highly accountable engineers; the use of open source tools and libraries whenever possible; and a focus on delivering reliable, useful software to our customers.
What will you be working on?
The infrastructure team is the center of our business. We build and maintain the hardware and software that all our customers run on. The team has a broad mandate, allowing people with different skill sets to contribute while providing plenty of room to grow and learn.
- Kubernetes underpins our stack. Both containerized workloads and Virtual Machines are orchestrated by Kubernetes. Our clusters have several thousand bare-metal nodes in them. The team spends a lot of time on scaling, optimizing and troubleshooting the Kubernetes control plane.
- Automation. We strictly follow GitOps, and heavily leverage Helm and Argo CD to manage all components of our stack. Images for bare metal machines and VMs are built with Packer.
- Open Source contributions. As we face unique challenges due to our scale, we often make tweaks or fix bugs in the open source software we use. As a principle, we submit upstream PRs for all changes that will benefit the broader community. Our engineers contribute to projects like Calico, KubeVirt and Argo Workflows.
- Tooling. We have built many custom controllers to wrangle our large fleet of hardware.
- Observability. We leverage Prometheus and Loki for early problem detection. The infrastructure team maintains the observability stack, as well as the alerts and metrics used to detect problems and track regressions.
- Solution design. The CoreWeave infrastructure is unique in the breadth of hardware options we offer to clients, as well as its scale. The infrastructure team assists in solution design for large-scale clients to make sure their applications run seamlessly on our platform.
We’re looking for engineers with experience in building infrastructure at scale. We expect a candidate to have production experience in some, but not necessarily all, of the following areas.
- Kubernetes “the hard way” – We run fully managed control planes for our customers. Kubernetes clusters contain up to 1,000 nodes at a time. A candidate should have experience with Kubernetes internals such as the scheduler and scheduler policies, pod security policies, and admission controllers.
- Continuous Deployment via tools such as Ansible and Argo CD
- Linux – Bare metal, Ubuntu-based images booted over iSCSI and PXE. Understanding of CPU sets and NUMA topologies.
- Go Development – We contribute to open source projects such as Kubeflow and Argo, and write our own admission controllers, CLIs and scheduler plugins.
At CoreWeave we work hard, have fun and move fast! The company has entered a stage of hyper-growth that you will not want to miss out on! Today we are a small, growing team of intelligent, genuine people who value different perspectives and approaches to solving complex problems. At CoreWeave we support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that champions collaboration and prioritizes innovative solutions. As we get set to take off, the growth opportunities within the organization are limitless. You will be surrounded by some of the best talent in the industry. Come join us!
We offer a competitive salary and benefits, including:
- Medical, dental and vision insurance – 100% paid for the employee
- Life Insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our NJ office
- Weekly massages in NJ office
- A casual work environment
- Work culture focused on innovation
COVID-19 vaccine requirements for in-person work:
To protect the health and safety of our employees, we require any employee conducting in-person work to be fully vaccinated against COVID-19 by their start date. If you are unable to be vaccinated due to medical or protected religious reasons, please reach out to our HR team at firstname.lastname@example.org to submit an accommodations request.
CoreWeave is an equal opportunity employer, committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability or age.