Deep Reinforcement Learning-Based Obstacle Avoidance for Robot Movement in Warehouse Environments

Sep 1, 2024

K. Li

J. Chen

Dezhi Yu

T. Dajun

X. Qiu

J. Lian

R. Ji

S. Zhang

Z. Wan

B. Sun

Abstract

This paper presents a deep reinforcement learning method for obstacle avoidance in warehouse robot movement, targeting robust navigation in dynamic and constrained environments.

Type

Conference paper

Publication

2024 IEEE 6th International Conference on Civil Aviation Safety and Information Technology (ICCASIT 2024)

Last updated on Jun 12, 2026

Deep Reinforcement Learning, Robotics, Obstacle Avoidance, Motion Planning

Authors

Dezhi Yu

Senior ML Engineer

I am a research-oriented machine learning systems engineer working on foundation model infrastructure, closed-loop evaluation and optimization systems, and scalable AI platforms. My work focuses on building reliable Model-as-a-Service and Harness-as-a-Service platforms that connect data, training, inference, evaluation, and feedback loops into measurable, continuously improving AI products.

My recent work centers on Model-as-a-Service platforms and high-performance LLM inference. I develop serving infrastructure with vLLM and SGLang across model runtime integration, scheduling and continuous batching, KV-cache and memory management, distributed execution, observability, and reliability. This systems work is closely connected to my research on distributed disaggregated inference, preference optimization, instruction-tuning data selection, and multimodal evaluation.

My broader research centers on reinforcement learning infrastructure and optimization algorithms for scalable AI systems. I am interested in how policy optimization, reward modeling, preference learning, offline RL, simulation environments, distributed rollout systems, and automated evaluation harnesses can be engineered together to improve model behavior. My goal is to build frontier AI systems that learn from feedback efficiently, evaluate progress rigorously, and remain dependable when deployed at scale.
