AI Research Engineer, GPU Performance - Runway ML
About the role
Build the systems that make training Runway's large-scale generative models faster, more memory-efficient, and more scalable. You'll work on GPU kernel optimization, custom CUDA operations, and distributed training infrastructure that directly accelerates the pace of Runway's foundation model research across thousands of GPUs.
What you'll do
- Design and implement custom CUDA kernels and GPU-level optimizations for Runway's large-scale model training workloads
- Improve training throughput, memory efficiency, and scaling characteristics across distributed GPU clusters
- Profile training pipelines for diffusion and transformer-based models to identify performance bottlenecks
- Collaborate with research scientists to translate algorithmic improvements into hardware-efficient implementations
Requirements
- Strong CUDA programming skills with deep understanding of GPU memory hierarchy and compute patterns
- Experience with distributed training frameworks (PyTorch, JAX) and large-scale ML training infrastructure
- Background in low-level systems optimization: profiling tools, kernel fusion, memory bandwidth optimization
- Familiarity with training large-scale deep learning models (transformers, diffusion models, or similar)
About Runway ML
Runway is a pioneering generative AI company building foundation models for video, image, and audio content creation. It created the Gen-1, Gen-2, and Gen-3 video generation models, which power a new era of AI-native filmmaking and visual storytelling. Runway is backed by Google, NVIDIA, Salesforce, and Felicis.