Featherless AI — ML Engineer — Multilingual Data
About the role
Featherless AI is hiring an ML Engineer to build data pipelines that improve multilingual coverage across thousands of hosted open-source models. You will source, curate, and process high-quality multilingual datasets so that models on the Featherless platform better serve speakers of diverse languages.
What you'll do
- Design and maintain multilingual data collection, filtering, and curation pipelines at scale
- Evaluate dataset quality across diverse language families, scripts, and writing systems
- Implement deduplication, quality scoring, and language-specific normalization
- Collaborate with AI Researchers on multilingual benchmark design and evaluation
- Engage with open-source multilingual data communities (OPUS, CulturaX, etc.)
Requirements
- Experience building multilingual NLP data pipelines
- Familiarity with major open-source multilingual corpora (CC100, CulturaX, mC4, OPUS, etc.)
- Proficiency in Python; experience with large-scale data processing (Spark, Apache Beam, or similar)
- Knowledge of text normalization challenges across scripts and language families
About Featherless AI
Featherless AI is a serverless inference platform hosting 3,000+ open-source LLMs, letting developers call any model via a simple API without managing GPU infrastructure.