Alignment | Dezhi Yu

https://halfrost.me/tags/alignment/

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study
https://halfrost.me/publication/dpo-chatbot-finetuning/
Mon, 01 Jun 2026 00:00:00 +0000

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
https://halfrost.me/publication/rose-reward-oriented-data-selection/
Sat, 01 Nov 2025 00:00:00 +0000