Large Language Models

Direct Preference Optimization for Chatbot Fine-Tuning: An Empirical Study

Empirical study of Direct Preference Optimization for chatbot fine-tuning.

Dezhi Yu

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning

Reward-oriented data selection for task-specific LLM instruction tuning.

y.-wu