Offline RL is renowned for its sample efficiency and has attracted widespread attention in both academia and industry. However, despite the abundance of emerging algorithms, it remains unclear which methods excel at specific tasks or which are superior for most robotic applications.
I benchmarked classic and cutting-edge offline RL algorithms to objectively assess their performance. To this end, I provide the following clarifications:
Given the growing interest in Embodied AI, the testing centered on robotic locomotion and manipulation tasks using MuJoCo (Hopper, Walker2d, Humanoid, Swimmer, HalfCheetah, Ant) and MetaWorld.
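For reference, the six locomotion tasks can be instantiated through Gymnasium's MuJoCo suite. This is only a minimal sketch: the `-v4` version suffixes are assumptions and should be matched to your installed Gymnasium release.

```python
# The six MuJoCo locomotion tasks covered in this benchmark.
# The "-v4" suffixes are assumptions; match them to your installed Gymnasium release.
TASKS = ["Hopper-v4", "Walker2d-v4", "Humanoid-v4",
         "Swimmer-v4", "HalfCheetah-v4", "Ant-v4"]

try:
    import gymnasium as gym
    for task in TASKS:
        env = gym.make(task)
        # Print each task's observation and action dimensionality.
        print(task, env.observation_space.shape, env.action_space.shape)
        env.close()
except Exception as exc:  # gymnasium or the MuJoCo bindings may not be installed
    print(f"Skipping environment construction: {exc}")
```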
The evaluated algorithms range from classics such as DDPG, SAC, and TD3 to recent frontiers such as REDQ and DroQ.
Disclaimer: These results are provided solely for academic reference.
Preview
Performance on MuJoCo for Offline RL
MuJoCo: Hopper / MuJoCo: Walker2d / MuJoCo: Humanoid
MuJoCo: Swimmer / MuJoCo: HalfCheetah / MuJoCo: Ant
Note
The tag 'office' denotes the official source-code implementation, while 'self' denotes my own implementation from scratch (without using third-party APIs).
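The run labels below follow a fixed `year-venue-algorithm-source` naming scheme (e.g. `2018-ICML-SAC-office`). A small helper to split such a label into its fields (the helper name is my own, for illustration only):

```python
def parse_tag(tag: str) -> dict:
    """Split a run label like '2018-ICML-SAC-office' into its fields.

    'office' marks the official source-code implementation,
    'self' marks a from-scratch reimplementation.
    """
    year, venue, algo, source = tag.split("-")
    return {"year": int(year), "venue": venue, "algo": algo, "source": source}

print(parse_tag("2018-ICML-SAC-office"))
# → {'year': 2018, 'venue': 'ICML', 'algo': 'SAC', 'source': 'office'}
```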
The experimental code for this project is not publicly available here. If interested, please contact me or refer to the official implementation.
Video of Initial and Final Performance for MuJoCo-Hopper
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Hopper. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
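The initial/final comparison amounts to rolling out the same policy before and after training and averaging its return. A generic evaluation loop might look like the sketch below; `policy` is any callable mapping observations to actions, and `DummyEnv` is a stand-in I added so the snippet is self-contained:

```python
def evaluate(env, policy, episodes=10, seed=0):
    """Average undiscounted return of `policy` over several episodes."""
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / episodes


class DummyEnv:
    """Stand-in environment with the Gymnasium-style step API (illustration only)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}
    def step(self, action):
        self.t += 1
        # obs, reward, terminated, truncated, info
        return float(self.t), 1.0, False, self.t >= self.horizon, {}


print(evaluate(DummyEnv(), policy=lambda obs: 0))  # → 5.0 (reward 1.0 over 5 steps)
```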
2016-ICLR-DDPG-office: Initial / Final MuJoCo-Hopper
2016-ICLR-DDPG-self: Initial / Final MuJoCo-Hopper
2018-ICML-SAC-office: Initial / Final MuJoCo-Hopper
2018-ICML-SAC-self: Initial / Final MuJoCo-Hopper
2018-ICML-TD3-office: Initial / Final MuJoCo-Hopper
2018-ICML-TD3-self: Initial / Final MuJoCo-Hopper
2021-ICLR-REDQ-self: Initial / Final MuJoCo-Hopper
2022-ICLR-DroQ-self: Initial / Final MuJoCo-Hopper
Analysis of Performance for MuJoCo-Walker2d
Training Curve MuJoCo-Walker2d
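Training curves like this are usually smoothed before plotting so trends are readable across seeds. A common choice is an exponential moving average (a sketch of the general technique, not necessarily the exact smoothing used for these plots):

```python
def ema_smooth(values, alpha=0.9):
    """Exponential moving average, as commonly used to smooth RL training curves.

    Higher alpha gives a smoother (but more lagged) curve.
    """
    out, prev = [], values[0]
    for v in values:
        prev = alpha * prev + (1 - alpha) * v
        out.append(prev)
    return out

print(ema_smooth([0.0, 10.0, 10.0, 10.0], alpha=0.5))
# → [0.0, 5.0, 7.5, 8.75]
```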
Video of Initial and Final Performance for MuJoCo-Walker2d
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Walker2d. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
2016-ICLR-DDPG-office: Initial / Final MuJoCo-Walker2d
2016-ICLR-DDPG-self: Initial / Final MuJoCo-Walker2d
2018-ICML-SAC-office: Initial / Final MuJoCo-Walker2d
2018-ICML-SAC-self: Initial / Final MuJoCo-Walker2d
2018-ICML-TD3-office: Initial / Final MuJoCo-Walker2d
2018-ICML-TD3-self: Initial / Final MuJoCo-Walker2d
2021-ICLR-REDQ-self: Initial / Final MuJoCo-Walker2d
2022-ICLR-DroQ-self: Initial / Final MuJoCo-Walker2d
Analysis of Performance for MuJoCo-Humanoid
Training Curve MuJoCo-Humanoid
Video of Initial and Final Performance for MuJoCo-Humanoid
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Humanoid. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
2016-ICLR-DDPG-office: Initial / Final MuJoCo-Humanoid
2016-ICLR-DDPG-self: Initial / Final MuJoCo-Humanoid
2018-ICML-SAC-office: Initial / Final MuJoCo-Humanoid
2018-ICML-SAC-self: Initial / Final MuJoCo-Humanoid
2018-ICML-TD3-office: Initial / Final MuJoCo-Humanoid
2018-ICML-TD3-self: Initial / Final MuJoCo-Humanoid
2021-ICLR-REDQ-self: Initial / Final MuJoCo-Humanoid
2022-ICLR-DroQ-self: Initial / Final MuJoCo-Humanoid
Analysis of Performance for MuJoCo-Swimmer
Training Curve MuJoCo-Swimmer
Video of Initial and Final Performance for MuJoCo-Swimmer
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Swimmer. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
2016-ICLR-DDPG-office: Initial / Final MuJoCo-Swimmer
2016-ICLR-DDPG-self: Initial / Final MuJoCo-Swimmer
2018-ICML-SAC-office: Initial / Final MuJoCo-Swimmer
2018-ICML-SAC-self: Initial / Final MuJoCo-Swimmer
2018-ICML-TD3-office: Initial / Final MuJoCo-Swimmer
2018-ICML-TD3-self: Initial / Final MuJoCo-Swimmer
2021-ICLR-REDQ-self: Initial / Final MuJoCo-Swimmer
2022-ICLR-DroQ-self: Initial / Final MuJoCo-Swimmer
Analysis of Performance for MuJoCo-HalfCheetah
Training Curve MuJoCo-HalfCheetah
Video of Initial and Final Performance for MuJoCo-HalfCheetah
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-HalfCheetah. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
2016-ICLR-DDPG-office: Initial / Final MuJoCo-HalfCheetah
2016-ICLR-DDPG-self: Initial / Final MuJoCo-HalfCheetah
2018-ICML-SAC-office: Initial / Final MuJoCo-HalfCheetah
2018-ICML-SAC-self: Initial / Final MuJoCo-HalfCheetah
2018-ICML-TD3-office: Initial / Final MuJoCo-HalfCheetah
2018-ICML-TD3-self: Initial / Final MuJoCo-HalfCheetah
2021-ICLR-REDQ-self: Initial / Final MuJoCo-HalfCheetah
2022-ICLR-DroQ-self: Initial / Final MuJoCo-HalfCheetah
Analysis of Performance for MuJoCo-Ant
Training Curve MuJoCo-Ant
Video of Initial and Final Performance for MuJoCo-Ant
Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Ant. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:
2016-ICLR-DDPG-office: Initial / Final MuJoCo-Ant
2016-ICLR-DDPG-self: Initial / Final MuJoCo-Ant
2018-ICML-SAC-office: Initial / Final MuJoCo-Ant
2018-ICML-SAC-self: Initial / Final MuJoCo-Ant
2018-ICML-TD3-office: Initial / Final MuJoCo-Ant
2018-ICML-TD3-self: Initial / Final MuJoCo-Ant
2021-ICLR-REDQ-self: Initial / Final MuJoCo-Ant
2022-ICLR-DroQ-self: Initial / Final MuJoCo-Ant