At Snorkel AI, we’re working to make AI practical for everyone. We’re building Snorkel Flow, our enterprise AI application development platform, based on our cutting-edge research on programmatic labeling.
Snorkel Flow accelerates the creation and evaluation of training datasets and state-of-the-art machine learning models, letting customers build end-to-end AI applications in hours rather than months. We work across the stack on a wide variety of AI use cases, and we’re growing fast, so come join our amazing team of Snorkelers!
As a Site Reliability Engineer, you'll play a key role in making Snorkel Flow seamless to use for data scientists, engineers, and operators alike. Your responsibilities will include designing complex multi-node applications with Kubernetes, building out an expansive observability platform for machine learning applications, and designing data pipelines that can handle massive workloads. This role is distinct in that it requires a deeper understanding of the technologies involved in machine learning, along with exposure to state-of-the-art MLOps technologies and principles.
Responsibilities:
- Design, prototype, and refine infrastructure for operating Snorkel's machine learning pipeline at scale
- Enable engineers at Snorkel to succeed in the end-to-end software development process
- Automate infrastructure maintenance and management across our fleet of customer deployments
- Work directly with ML engineers and customers to enable successful deployments of Snorkel Flow
Qualifications:
- 1+ years of professional engineering experience in a relevant field, or equivalent experience through work programs
- Demonstrated self-motivation and willingness to dive into complicated DevOps challenges
- Exposure to machine learning fundamentals such as training data and models
- Familiarity with Kubernetes or other container orchestration tools in a production setting
- Ability to work in a fast-paced environment and strong technical communication skills