Research Engineer (GPU Performance)
About the role
Runway is hiring a Research Engineer specializing in GPU performance to optimize the training and inference infrastructure behind its large-scale video generation models. You will write custom kernels, tune distributed training, and work closely with model research teams to speed up experimental iteration and reduce compute costs.
What you'll do
- Write and optimize custom CUDA/Triton kernels for attention, normalization, and other compute-intensive operations.
- Profile and improve training throughput on large GPU clusters running video diffusion and world models.
- Implement and tune distributed training strategies (FSDP, tensor and pipeline parallelism, NCCL tuning).
- Benchmark and improve inference efficiency for production serving of video generation models.
- Partner with research teams to identify and resolve compute bottlenecks blocking faster experimentation.
Requirements
- Deep expertise in GPU programming (CUDA or Triton); experience writing custom operators for PyTorch.
- Strong understanding of distributed training frameworks (FSDP, DeepSpeed, Megatron).
- Experience profiling and optimizing large model training on multi-node GPU clusters.
- Solid C++ and Python skills; familiarity with PyTorch internals and autograd.
- Prior work on diffusion model or LLM training optimization is a plus.
About Runway
Runway is an AI company building general-purpose multimodal simulators of the world through AI video generation. Its models (Gen-4, Aleph, General World Models) are used by millions of creators and enterprises, and the company has been recognized on the CNBC Disruptor 50 and Forbes AI 50 lists.
AI Alerts shares third-party job opportunities for informational purposes only. We are not the employer and are not involved in the hiring process. Always verify the company and role through official channels before applying, and never pay to apply, train, onboard, process documents, or secure a job offer. Legitimate employers do not ask applicants for money.