AI Research Engineer (Kernel & Inference Optimization)

Posted 5d ago

About the role

Join Tether's AI model team to drive innovation in model serving and inference architectures for advanced AI systems, ranging from resource-efficient edge models to complex multimodal architectures. You will write custom GPU compute kernels in Metal Shading Language (MSL), build high-throughput inference pipelines, and optimize inference both across large GPU clusters and on resource-constrained mobile devices.

What you'll do

Design and deploy model serving architectures achieving high throughput and low latency while minimizing memory footprint across edge and resource-constrained environments
Write custom GPU compute shaders from scratch using Metal Shading Language (MSL) for on-device AI inference on smartphones
Build, run, and monitor controlled inference tests tracking response latency, throughput, memory consumption, and error rates
Design distributed inference systems using Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism for massive GPU-cluster deployments
Identify and resolve computational bottlenecks including suboptimal batch processing, network delays, and high memory usage in production pipelines
Apply pruning, quantization, FlashAttention, KV caching, and speculative decoding to continuously improve inference performance

Requirements

PhD in NLP, Machine Learning, or a related field (a Computer Science degree at minimum), with publications at A* conferences and a strong track record in AI R&D
Hands-on knowledge of Metal Shading Language (MSL) is required, including writing custom compute shaders from scratch
Proven experience in low-level kernel optimizations and inference optimization on mobile/edge devices with measurable improvements in latency, throughput, and memory footprint
Deep expertise in modern model serving architectures, inference optimization techniques, and efficient memory management for resource-constrained deployment
Strong background in distributed inference, transformer and diffusion model architectures, and efficiency techniques (FlashAttention, KV caching, speculative decoding, quantization, pruning)

About Tether Operations Limited

Tether is a global fintech company behind USDT, the world's most widely used stablecoin, and operates divisions in AI, communications, energy, and education through its Tether Data, Tether Evo, Tether Power, and Tether Education subsidiaries.

