With the rapid development of generative artificial intelligence, diffusion model-based Text-to-Image (T2I) and Text-to-Video (T2V) technologies have made significant advances. Recent studies have introduced Direct Preference Optimization (DPO) into T2I tasks, substantially improving the alignment of generated images with human preferences. However, current T2V methods lack a complete pipeline and a dedicated loss function for aligning generated videos with human preferences via DPO. Moreover, the scarcity of paired video preference data hinders effective model training. Additionally, the SD v1.4 weights lack the capability to maintain spatiotemporal consistency during video generation, which may constrain the model’s flexibility and lead to lower-quality outputs. In response, we propose three solutions: 1) We integrate the DPO fine-tuning strategy into T2V tasks. By deriving a carefully structured loss function, we leverage human feedback to align video generation with human preferences. We refer to this new method as \textbf{HuViDPO}. 2) We construct a small-scale Human Preference Video Pair Dataset to meet the core requirement of the DPO fine-tuning strategy, addressing the current scarcity of pairwise video preference datasets. 3) We propose a DPO-based fine-tuning strategy that adapts SD v1.4 for short video generation, achieving clear visual improvements over baselines. Additionally, we verify its strong performance in T2V customization tasks.
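For reference, and not as the exact objective derived in this work, the standard DPO loss of Rafailov et al. can be written for a preferred/less-preferred video pair $(v^w, v^l)$ conditioned on a prompt $c$ (notation introduced here for illustration) as
\[
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(c,\, v^w,\, v^l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(v^w \mid c)}{\pi_{\mathrm{ref}}(v^w \mid c)} - \beta \log \frac{\pi_\theta(v^l \mid c)}{\pi_{\mathrm{ref}}(v^l \mid c)}\right)\right],
\]
where $\pi_\theta$ is the model being fine-tuned, $\pi_{\mathrm{ref}}$ is a frozen reference model, $\beta$ controls the deviation from the reference, and $\sigma$ is the sigmoid function. The loss derived for HuViDPO adapts this preference formulation to the video diffusion setting, so its exact form may differ from the sketch above.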