AI Research Engineer (Model Compression & Quantization)
About the role
Join Tether's AI research team to advance model compression and efficient deployment for multimodal AI systems, including LLMs and VLMs. You will apply and extend quantization, knowledge distillation, and pruning techniques to enable high-performance AI on resource-constrained edge devices such as smartphones. The role takes a hands-on, research-driven approach to building scalable, low-latency AI inference pipelines and publishing findings at top AI conferences.
What you'll do
- Apply low-bit quantization (QAT and PTQ) to reduce model size and inference latency for LLMs, VLMs, and multimodal architectures while preserving accuracy
- Leverage knowledge distillation to compress large teacher models into efficient student models for multimodal reasoning across text, image, and audio
- Implement and improve pruning techniques — including adaptive pruning schedules and feature-matching distillation — to remove redundant parameters without sacrificing performance
- Research and apply mixed-precision quantization and other advanced compression strategies; analyze accuracy–efficiency trade-offs empirically
- Build robust compression pipelines, establish performance and fidelity metrics, and resolve bottlenecks in production inference on edge hardware
- Author and publish technical papers at top-tier AI conferences (NeurIPS, ICML, ICLR, CVPR, ACL, AAAI)
Requirements
- PhD in NLP, Machine Learning, or a related field (a Computer Science degree at minimum), with a strong track record in AI R&D and publications at A* conferences
- Hands-on experience with model quantization including Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ)
- Research and practical experience with knowledge distillation and model pruning applied to LLMs and multimodal architectures
- Strong proficiency in PyTorch; solid understanding of transformer architectures, backpropagation, and fine-tuning techniques
- Familiarity with C++ is a plus, especially for low-level quantization kernels or inference optimizations
About Tether Operations Limited
Tether is a global fintech company behind USDT, the world's most widely used stablecoin, and operates divisions in AI, communications, energy, and education through its Tether Data, Tether Evo, Tether Power, and Tether Education subsidiaries.