Finetune Stable Diffusion Models with DDPO via TRL | Pasteblog