AI Research Engineer (Model Compression & Quantization)

Posted 5d ago

About the role

Join Tether's AI research team to advance model compression and efficient deployment for multimodal AI systems, including LLMs and VLMs. You will apply and extend quantization, knowledge distillation, and pruning techniques to enable high-performance AI on resource-constrained edge devices such as smartphones. The role is hands-on and research-driven: you will build scalable, low-latency AI inference pipelines and publish your findings at top AI conferences.

What you'll do

Apply low-bit quantization (QAT and PTQ) to reduce model size and inference latency for LLMs, VLMs, and multimodal architectures while preserving accuracy
Leverage knowledge distillation to compress large teacher models into efficient student models for multimodal reasoning across text, image, and audio
Implement and improve pruning techniques, including adaptive pruning schedules and feature-matching distillation, to remove redundant parameters without sacrificing performance
Research and apply mixed-precision quantization and other advanced compression strategies; analyze accuracy–efficiency trade-offs empirically
Build robust compression pipelines, establish performance and fidelity metrics, and resolve bottlenecks in production inference on edge hardware
Author and publish technical papers at top-tier AI conferences (NeurIPS, ICML, ICLR, CVPR, ACL, AAAI)

Requirements

PhD in NLP, Machine Learning, or a related field (a CS degree at minimum), with a strong track record in AI R&D and publications at A* conferences
Hands-on experience with model quantization including Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ)
Research and practical experience with knowledge distillation and model pruning applied to LLMs and multimodal architectures
Strong proficiency in PyTorch; solid understanding of transformer architectures, backpropagation, and fine-tuning techniques
Familiarity with C++ is a plus, especially for low-level quantization kernels or inference optimizations

About Tether Operations Limited

Tether is a global fintech company behind USDT, the world's most widely used stablecoin, and operates divisions in AI, communications, energy, and education through its Tether Data, Tether Evo, Tether Power, and Tether Education subsidiaries.


