Dezhi Yu

Senior ML Engineer

I am a research-oriented machine learning systems engineer working on foundation model infrastructure, alignment, and evaluation. My work focuses on building efficient, reliable systems for large language models while studying the algorithms and data choices that make these models more useful, controllable, and cost-effective in real applications.

At TikTok, my recent work centers on Model-as-a-Service platforms and high-performance LLM inference. I develop serving infrastructure with vLLM and SGLang across model runtime integration, scheduling and continuous batching, KV-cache and memory management, distributed execution, observability, and reliability. This systems work is closely connected to my research on distributed disaggregated inference, preference optimization, instruction-tuning data selection, multimodal evaluation, and retrieval-augmented biomedical summarization.

My broader research spans reinforcement learning for robotics, healthcare sequence modeling, privacy-preserving machine learning, and motion planning. I am especially interested in model-system co-design: how model architecture, inference algorithms, data curation, hardware utilization, scheduling, and distributed runtimes interact. My goal is to advance frontier AI systems that are faster to experiment with, more rigorous to evaluate, and dependable enough to serve at scale.

AdaMixup: A Dynamic Defense Framework for Membership Inference Attack Mitigation

Adaptive defense against membership inference attacks.

y.-chen

Research on Reinforcement Learning Based Warehouse Robot Navigation Algorithm in Complex Warehouse Layout

Reinforcement learning for warehouse robot navigation in complex layouts.

k.-li

Predicting 30-Day Hospital Readmission in Medicare Patients: Insights from an LSTM Deep Learning Model

LSTM modeling for 30-day hospital readmission prediction.

x.-li

Deep Reinforcement Learning-Based Obstacle Avoidance for Robot Movement in Warehouse Environments

Deep reinforcement learning for obstacle avoidance in warehouse robotics.

k.-li

LeetCode Cookbook

A cookbook of LeetCode solutions written in Go, with 100% test coverage and runtimes that beat 100% of submissions. It currently contains 520 solutions.

Dezhi Yu

Segment Tree Basics

An introduction to the segment tree data structure as used in ACM-ICPC, and its applications.

Dezhi Yu

Redis Multi-Data Center Two-Way Synchronization

An introduction to the distributed systems theory behind Redis synchronization, including conflict-free replicated data types (CRDTs).

Dezhi Yu

Redis Design Ideas and Usage Specifications

An introduction to common Redis usage patterns and design ideas.

Dezhi Yu

Detailed HTTP/2 header compression algorithm-HPACK

In HTTP/1.1 (see [RFC7230]), header fields are not compressed. As the number of requests within a web page grows to the point where tens to hundreds of requests are needed, …

Dezhi Yu

Golang Message Streaming Practice in Eleme

How to build a high-concurrency, high-performance message streaming system in Go.

Dezhi Yu