Dezhi Yu

Senior ML Engineer

I am a research-oriented machine learning systems engineer working on foundation model infrastructure, alignment, and evaluation. My work focuses on building efficient, reliable systems for large language models while studying the algorithms and data choices that make these models more useful, controllable, and cost-effective in real applications.

At TikTok, my recent work centers on Model-as-a-Service platforms and high-performance LLM inference. I develop serving infrastructure with vLLM and SGLang across model runtime integration, scheduling and continuous batching, KV-cache and memory management, distributed execution, observability, and reliability. This systems work is closely connected to my research on distributed disaggregated inference, preference optimization, instruction-tuning data selection, multimodal evaluation, and retrieval-augmented biomedical summarization.

My broader research spans reinforcement learning for robotics, healthcare sequence modeling, privacy-preserving machine learning, and motion planning. I am especially interested in model-system co-design: how model architecture, inference algorithms, data curation, hardware utilization, scheduling, and distributed runtimes interact. My goal is to advance frontier AI systems that are faster to experiment with, more rigorous to evaluate, and dependable enough to serve at scale.

Efficient Cross-GPU Communication for Disaggregated LLM Serving

CommBridge is a portable communication runtime for disaggregated LLM serving that decouples LLM communication primitives from RDMA backends, improving deployment portability across …

Dezhi Yu

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

A comparison of autoregressive, latent-variable, and adversarial models for generating Bach-style symbolic music.

Dezhi Yu

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

Empirical study of Direct Preference Optimization for chatbot fine-tuning.

Dezhi Yu

How Far Are Video Models from True Multimodal Reasoning?

Evaluating video models for true multimodal reasoning.

x.-zhang

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning

Reward-oriented data selection for task-specific LLM instruction tuning.

y.-wu

DK-RRT: Deep Koopman RRT for Collision-Aware Motion Planning of Space Manipulators in Dynamic Debris Environments

Deep Koopman RRT for collision-aware space manipulator planning.

q.-chen

Machine Learning Optimizes the Efficiency of Picking and Packing in Automated Warehouse Robot Systems

Machine learning for efficient picking and packing in automated warehouse robot systems.

Dezhi Yu

KVDirect: Distributed Disaggregated LLM Inference

Distributed disaggregated inference for efficient LLM serving.

s.-chen

Deep Adaptive Control with Frequency Modulation for Aerospace Robotic Manipulators in Dynamic Object Transportation

Deep adaptive control for aerospace robotic manipulators.

y.-zhang