Featherless AI — ML Engineer — Multilingual Data

Posted 4d ago

About the role

Featherless AI is hiring an ML Engineer to build data pipelines that improve multilingual coverage across thousands of hosted open-source models. You will source, curate, and process high-quality multilingual datasets that directly improve how well models on the Featherless platform serve speakers of diverse languages.

What you'll do

Design and maintain multilingual data collection, filtering, and curation pipelines at scale
Evaluate dataset quality across diverse language families, scripts, and writing systems
Implement deduplication, quality scoring, and language-specific normalization
Collaborate with AI Researchers on multilingual benchmark design and evaluation
Engage with open-source multilingual data communities (OPUS, CulturaX, etc.)

Requirements

Experience building multilingual NLP data pipelines
Familiarity with major open-source multilingual corpora (CC100, CulturaX, mC4, OPUS, etc.)
Proficiency in Python; experience with large-scale data processing (Spark, Apache Beam, or similar)
Knowledge of text normalization challenges across scripts and language families

About Featherless AI

Featherless AI is a serverless inference platform hosting 3,000+ open-source LLMs, letting developers call any model via a simple API without managing GPU infrastructure.


