Featherless AI — ML Engineer — Training Optimization

Posted 4d ago

About the role

Featherless AI is looking for an ML Engineer to bring training efficiency research into production. You will implement and scale training optimization techniques that reduce compute cost and improve model quality across the open-source models Featherless hosts, turning research insights into reliable engineering systems.

What you'll do

Implement training efficiency techniques at scale, including mixed precision, gradient accumulation, and activation checkpointing
Optimize distributed training across large GPU clusters using FSDP, DeepSpeed, or Megatron-LM
Profile and debug training instabilities, loss spikes, and memory bottlenecks
Build tooling for training monitoring, evaluation, and experiment reproducibility
Translate optimization research findings into production training workflows

Requirements

Strong ML engineering background with production training pipeline experience
Experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM)
Proficiency in Python and in either PyTorch or JAX
Track record of shipping optimized training pipelines at scale across large model families

About Featherless AI

Featherless AI is a serverless inference platform hosting 3,000+ open-source LLMs, letting developers call any model via a simple API without managing GPU infrastructure.

