Boosting Continuous Control with Consistency Policy
In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, May 2024
Due to its training stability and strong expressive power, the diffusion model has attracted considerable attention in offline reinforcement learning. However, it also brings several challenges: 1) the demand for a large number of diffusion steps makes diffusion-model-based methods time-inefficient and limits their application in real-time control; 2) how to achieve policy improvement with accurate guidance for a diffusion-model-based policy remains an open problem. Inspired by the consistency model, we propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL), which derives actions from noise in a single step. By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time inefficiency and inaccurate guidance when updating the diffusion-model-based policy with the learned Q-function. We demonstrate that CPQL can achieve policy improvement with accurate guidance for offline reinforcement learning, and can be seamlessly extended to online RL tasks. Experimental results indicate that CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, significantly improving inference speed by nearly 45 times compared to Diffusion-QL.
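To make the single-step idea concrete, the sketch below shows one plausible way a consistency-style policy could be parameterized and trained with Q-guidance. It is a minimal illustration assuming a PyTorch setup: the names (ConsistencyPolicy, policy_loss, q_net), the skip/out scalings, the noise schedule, and the weight alpha are all assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class ConsistencyPolicy(nn.Module):
    """Hypothetical one-step policy: maps noise (conditioned on the state
    and a noise level) directly to an action, so no iterative denoising
    loop is needed at inference time."""

    def __init__(self, state_dim, action_dim, hidden=256, sigma_max=80.0):
        super().__init__()
        self.action_dim = action_dim
        self.sigma_max = sigma_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, sigma):
        # Consistency function f(a_sigma, sigma | s). The skip/out scalings
        # enforce the boundary condition f(a, sigma -> 0) ~= a, following
        # the usual consistency-model parameterization.
        c_skip = 0.25 / (sigma ** 2 + 0.25)
        c_out = 0.5 * sigma / (sigma ** 2 + 0.25).sqrt()
        h = self.net(torch.cat([state, noisy_action, sigma], dim=-1))
        return (c_skip * noisy_action + c_out * h).clamp(-1.0, 1.0)

    def act(self, state):
        # Single-step sampling: start from noise at the largest noise level
        # and map it to an action with one forward pass.
        sigma = torch.full((state.shape[0], 1), self.sigma_max,
                           device=state.device)
        noise = torch.randn(state.shape[0], self.action_dim,
                            device=state.device) * self.sigma_max
        return self(state, noise, sigma)


def policy_loss(policy, q_net, state, action, alpha=1.0):
    """Illustrative objective: a reconstruction term that anchors the
    one-step policy to the dataset actions, plus a Q-value term that
    guides the policy toward high-value actions (in the spirit of
    Diffusion-QL). The schedule and alpha are assumptions."""
    sigma = torch.rand(action.shape[0], 1,
                       device=action.device) * policy.sigma_max
    noisy = action + sigma * torch.randn_like(action)
    recon = policy(state, noisy, sigma)
    bc_term = ((recon - action) ** 2).mean()          # stay close to data
    q_term = -q_net(state, policy.act(state)).mean()  # policy improvement
    return bc_term + alpha * q_term
```

Under these assumptions, inference is a single forward pass through the network, which is where a roughly 45-fold speedup over a multi-step denoising policy such as Diffusion-QL would come from.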