Summary

I am a research-oriented machine learning systems engineer working on foundation model infrastructure, alignment, and evaluation. My work focuses on building efficient, reliable systems for large language models while studying the algorithms and data choices that make these models more useful, controllable, and cost-effective in real applications.

At TikTok, my recent work centers on Model-as-a-Service platforms and high-performance LLM inference. I develop serving infrastructure with vLLM and SGLang across model runtime integration, scheduling and continuous batching, KV-cache and memory management, distributed execution, observability, and reliability. This systems work is closely connected to my research on distributed disaggregated inference, preference optimization, instruction-tuning data selection, multimodal evaluation, and retrieval-augmented biomedical summarization.

My broader research spans reinforcement learning for robotics, healthcare sequence modeling, privacy-preserving machine learning, and motion planning. I am especially interested in model-system co-design: how model architecture, inference algorithms, data curation, hardware utilization, scheduling, and distributed runtimes interact. My goal is to advance frontier AI systems that are faster to experiment with, easier to evaluate rigorously, and dependable enough to serve at scale.

Interests

  • Efficient LLM Inference and Serving Systems
  • Model-System Co-design for Foundation Models
  • Alignment, Preference Optimization, and Instruction Tuning
  • Multimodal Reasoning and Model Evaluation
  • Retrieval-Augmented and Biomedical NLP
  • Reinforcement Learning for Robotics

Education

  • Master of Information and Data Science, GPA 3.966/4.0

    University of California, Berkeley, CA, United States of America

  • Graduate Studies in Computer Science, GPA 4.0/4.0

    Stanford University, CA, United States of America

  • Master of Science in Software Engineering, GPA 3.95/4.0

    Carnegie Mellon University, CA, United States of America

  • Bachelor of Engineering in Computer Science, GPA 3.8/4.0

    Huzhou Normal University, Zhejiang, China

Selected Publications

Efficient Cross-GPU Communication for Disaggregated LLM Serving

Large Language Model (LLM) serving increasingly relies on disaggregated architectures to improve resource utilization and scalability. However, existing LLM systems often depend on hardware-specific communication libraries tightly coupled to particular RDMA transports, resulting in fragmented implementations, limited portability, and …

Academic Service

Program Committee & Reviewer

Program committee member and conference/workshop reviewer across machine learning, NLP, AI, data mining, and multimedia venues.

  • ICLR 2026
  • NeurIPS 2026
  • ACL ARR 2025-2026
  • AISTATS 2026
  • AAAI
  • IJCAI
  • KDD Workshops
  • NeurIPS 2024 and 2025 Workshops
  • IEEE SSCI 2025
  • IEEE MIPR 2025
  • KSEM 2025

Journal Reviewer

Journal reviewer for machine learning, imaging, multimedia systems, and internet technology venues.

  • IEEE Transactions on Image Processing (TIP)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • Journal of Medical Imaging (JMI)
  • Journal of Machine Learning for Biomedical Imaging
  • ACM Transactions on Internet Technology (TOIT)

Experience

Senior Machine Learning Engineer - LLMs

TikTok | Full-time

Feb 2023 - Present | Mountain View, California, United States

  • Lead the development of Model-as-a-Service infrastructure and high-performance LLM serving systems, integrating vLLM and SGLang across model runtimes, distributed execution, and production deployment workflows.
  • Design model-system co-optimizations for scheduling, continuous batching, KV-cache and memory management, observability, fault isolation, and reliability to improve the efficiency and operability of large-scale inference.
  • Co-authored KVDirect on distributed disaggregated LLM inference and developed CommBridge, a portable cross-GPU communication runtime spanning KV-cache transfer, MoE communication, and heterogeneous RDMA backends.
  • Work across inference algorithms, runtime design, networking, and hardware utilization to make foundation-model experimentation faster and production serving more dependable.

Quantitative Algo Developer - Smart Contract (DeFi 2.0)

Aperture Finance | Internship

May 2022 - Sep 2022 | Mountain View, California, United States

  • Served as technical owner for Solana quantitative-strategy projects and designed a pseudo-delta-neutral hedging strategy using leveraged liquidity-farming protocols.
  • Built a research pipeline that combined autoencoder-based risk-factor extraction, GAN-generated synthetic data, wavelet denoising, and Kalman filtering over five years of historical market data.
  • Evaluated individual and composite alpha factors through offline backtesting with Zipline and Backtrader, reducing estimated deployment risk for candidate strategies by 30%.
  • Implemented Rust smart contracts that converted model-generated signals into dynamic long-short portfolio adjustments; the deployed strategy generated $2M in net income in one month.

Staff Engineer - Cloud Infrastructure | Tech Lead

Binance | Full-time

May 2020 - May 2022 | Singapore City, Singapore

  • Led an eight-engineer team to design, build, and launch Themis, Binance’s first global strategy-distribution platform for traffic allocation and predicate-based decisioning.
  • Established the platform’s offline simulation, production-data feedback, and experimentation workflows, supporting up to 1,000 concurrent A/B experiments and 20,000 strategy releases per day.
  • Designed the online-offline evaluation loop used to improve resource-allocation decisions across products serving 200M users and approximately $1B in weekly transaction volume.
  • Owned technical direction and delivery across the platform, deployment tooling, administration workflows, and client integration.

Cloud Infrastructure Staff Engineer | Tech Lead

Alibaba Cloud | Full-time

Jan 2017 - May 2020 | Shanghai, China

  • Led a five-engineer team responsible for high-scale personalization and messaging infrastructure during two Tmall Double 11 campaigns, including the 2019 event, which processed RMB 268.4B in single-day GMV.
  • Designed a behavior-aware advertising and promotion-ranking system using purchase history, search activity, customer profiles, and engagement signals; the program was associated with 48% annual revenue growth and an 8% increase in purchase rate.
  • Improved online and offline recommendation pipelines for push, banner, and SMS campaigns by 25%; expanded user-profile features and raised XGBoost/logistic-regression model AUC from 0.59 to 0.77.
  • Architected the Taco distributed messaging ecosystem to process more than 1.1B notifications per day with a delivery rate above 97%, latency of 0.72-1.2 seconds, 750K peak connections, and 30K-50K QPS.

Infrastructure Team Lead

Shanghai Function Internet Finance Information Service Co., Ltd. | Full-time

May 2016 - Jan 2017 | Shanghai, China

  • Led the engineering team that built a Java-based infrastructure snapshot and caching system for read-heavy services, reducing average response time by 65% and improving robustness by 81% at petabyte-scale daily cache volume.
  • Directed the migration of 12 containerized backend services to Alibaba Cloud Kubernetes and redesigned the Redis/MySQL data layer, improving measured platform stability by 65%.
  • Designed an intelligent investor-project matching system combining recommendation, text summarization, and text-to-speech components, increasing successful project bids by 40%.

Infrastructure Team Lead

PING AN | Full-time

Nov 2015 - Apr 2016 | Shanghai, China

  • Designed the 1QB Wallet recommendation system to match customers with financial products using behavioral and search signals, improving recommendation accuracy by 30%.
  • Built a Go and Kafka event-driven transaction system for millisecond-scale promotional ordering by millions of users; the platform contributed RMB 10B in incremental GMV during its first year.
  • Re-architected distributed task processing with Celery, RabbitMQ, and MySQL, reducing queue latency by 20%, increasing throughput by 4x, and lowering memory consumption by 40%.
  • Established code review, static analysis, package deployment, backup distribution, and CI/CD workflows, improving engineering delivery efficiency by 5x.

Research Engineer

Quatanium Technology Co., Ltd. | Full-time

Jun 2014 - Nov 2015 | Shanghai, China

  • Designed and built the company’s first cluster observability platform, processing millions of logs per day through Logstash, Kibana, and Grafana with 99.999% annual availability.
  • Developed containerized CI/CD pipelines for AWS that automated build, packaging, and production deployment for an average of 200 integrations per day.
  • Built an automated error-detection and ticketing service for operations teams, reducing incident-response waiting time by 80%.

Contact