Me | Dezhi Yu

I am a research-oriented machine learning systems engineer working on foundation model infrastructure, closed-loop evaluation and optimization systems, and scalable AI platforms. My work focuses on building reliable Model-as-a-Service and Harness-as-a-Service platforms that connect data, training, inference, evaluation, and feedback loops into measurable, continuously improving AI products.

My recent work centers on Model-as-a-Service platforms and high-performance LLM inference. I develop serving infrastructure with vLLM and SGLang across model runtime integration, scheduling and continuous batching, KV-cache and memory management, distributed execution, observability, and reliability. This systems work is closely connected to my research on distributed disaggregated inference, preference optimization, instruction-tuning data selection, and multimodal evaluation.

My broader research centers on reinforcement learning infrastructure and optimization algorithms for scalable AI systems. I am interested in how policy optimization, reward modeling, preference learning, offline RL, simulation environments, distributed rollout systems, and automated evaluation harnesses can be engineered together to improve model behavior. My goal is to build frontier AI systems that learn efficiently from feedback, evaluate progress rigorously, and remain dependable when deployed at scale.


Efficient Cross-GPU Communication for Disaggregated LLM Serving

Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

How Far Are Video Models from True Multimodal Reasoning?

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning

DK-RRT: Deep Koopman RRT for Collision-Aware Motion Planning of Space Manipulators in Dynamic Debris Environments

Machine Learning Optimizes the Efficiency of Picking and Packing in Automated Warehouse Robot Systems

LaySummX at BioLaySumm: Retrieval-Augmented Fine-Tuning for Biomedical Lay Summarization Using Abstracts and Retrieved Full-Text Context

KVDirect: Distributed Disaggregated LLM Inference

Deep Adaptive Control with Frequency Modulation for Aerospace Robotic Manipulators in Dynamic Object Transportation