Analysis of Performance in OfflineRL

Introduction

Preface

Offline RL is renowned for its sample efficiency and has attracted widespread attention in both academia and industry. However, despite the abundance of emerging algorithms, it remains unclear which methods excel at specific tasks or which are superior for most robotic applications.

I benchmarked classic and cutting-edge offline RL algorithms to objectively assess their performance. A few clarifications on the setup:

  1. Given the growing interest in Embodied AI, the testing centered on robotic locomotion and manipulation tasks using MuJoCo (Hopper, Walker2d, Humanoid, Swimmer, HalfCheetah, Ant) and MetaWorld.

  2. The evaluated algorithms range from classics to recent methods. Those appearing in the results below are DDPG, SAC, TD3, REDQ, and DroQ.

Warning

Disclaimer: These results are provided solely for academic reference. Unauthorized use for any other purpose is strictly prohibited.

 

Preview

Performance MuJoCo for Offline RL

Hopper-v5_online
MuJoCo: Hopper
Walker2d-v5_online
MuJoCo: Walker2d
Humanoid-v5_online
MuJoCo: Humanoid
Swimmer-v5_online
MuJoCo: Swimmer
HalfCheetah-v5_online
MuJoCo: HalfCheetah
Ant-v5_online
MuJoCo: Ant

Note

  1. The tag 'office' denotes the official source-code implementation, while 'self' denotes my own from-scratch implementation (without third-party APIs).

  2. The experimental code for this project is not publicly available here. If you are interested, please contact me or refer to the official implementations.
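
The video captions further down follow a year-venue-algorithm-tag pattern (for example `2018-ICML-SAC-office`). As a small illustration of that convention, the sketch below splits a label into its four fields. The names `RunLabel` and `parse_run_label` are hypothetical helpers introduced here for illustration; they are not part of the project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLabel:
    year: int
    venue: str
    algorithm: str
    implementation: str  # "office" (official code) or "self" (from scratch)


def parse_run_label(label: str) -> RunLabel:
    """Split a label such as '2018-ICML-SAC-office' into its four fields."""
    year, venue, algorithm, implementation = label.split("-")
    # Normalize case so 'Office' and 'office' are treated as the same tag.
    implementation = implementation.lower()
    if implementation not in {"office", "self"}:
        raise ValueError(f"unknown implementation tag: {implementation!r}")
    return RunLabel(int(year), venue, algorithm, implementation)


if __name__ == "__main__":
    print(parse_run_label("2018-ICML-SAC-office"))
```

Grouping captions by the `algorithm` and `implementation` fields is one way to pair each official run with its from-scratch counterpart.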

 

Video MuJoCo for Offline RL

Initial MuJoCo-Hopper
Final MuJoCo-Hopper
Initial MuJoCo-Walker2d
Final MuJoCo-Walker2d
Initial MuJoCo-Humanoid
Final MuJoCo-Humanoid
Initial MuJoCo-Swimmer
Final MuJoCo-Swimmer
Initial MuJoCo-HalfCheetah
Final MuJoCo-HalfCheetah
Initial MuJoCo-Ant
Final MuJoCo-Ant

 

Analysis of Performance

Analysis of Performance for MuJoCo-Hopper

Training Curve MuJoCo-Hopper

Hopper-v5_online
MuJoCo-Hopper

 

Video of Initial and End Performance for MuJoCo-Hopper

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Hopper. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-Hopper
2016-ICLR-DDPG-office Final MuJoCo-Hopper
2016-ICLR-DDPG-self Initial MuJoCo-Hopper
2016-ICLR-DDPG-self Final MuJoCo-Hopper
2018-ICML-SAC-office Initial MuJoCo-Hopper
2018-ICML-SAC-office Final MuJoCo-Hopper
2018-ICML-SAC-self Initial MuJoCo-Hopper
2018-ICML-SAC-self Final MuJoCo-Hopper
2018-ICML-TD3-office Initial MuJoCo-Hopper
2018-ICML-TD3-office Final MuJoCo-Hopper
2018-ICML-TD3-self Initial MuJoCo-Hopper
2018-ICML-TD3-self Final MuJoCo-Hopper
2021-ICLR-REDQ-self Initial MuJoCo-Hopper
2021-ICLR-REDQ-self Final MuJoCo-Hopper
2022-ICLR-DroQ-self Initial MuJoCo-Hopper
2022-ICLR-DroQ-self Final MuJoCo-Hopper
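
The initial-versus-final comparison above amounts to rolling out a policy before and after training and comparing episode returns. The sketch below shows one way to compute an average return with the Gymnasium-style `reset`/`step` interface. The exact evaluation protocol used for these results is not stated, so the episode count, the seeding, and the names `evaluate` and `ToyEnv` are assumptions made for illustration only. `ToyEnv` is a hypothetical stand-in so the example runs without MuJoCo installed.

```python
class ToyEnv:
    """Hypothetical stand-in with a Gymnasium-style reset/step signature.

    The reward equals the action, and every episode is truncated after
    `horizon` steps.
    """

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        # (observation, reward, terminated, truncated, info)
        return float(self.t), float(action), False, truncated, {}


def evaluate(env, policy, episodes=5, seed=0):
    """Average undiscounted return of `policy` over `episodes` rollouts."""
    returns = []
    for ep in range(episodes):
        obs, _info = env.reset(seed=seed + ep)
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _info = env.step(policy(obs))
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)


if __name__ == "__main__":
    initial_policy = lambda obs: 0.0  # stands in for an untrained policy
    final_policy = lambda obs: 1.0    # stands in for a trained policy
    print(evaluate(ToyEnv(), initial_policy))  # 0.0
    print(evaluate(ToyEnv(), final_policy))    # 10.0
```

With Gymnasium and MuJoCo installed, an environment created by `gymnasium.make("Hopper-v5")` could be passed in place of `ToyEnv()`, assuming the figure name Hopper-v5_online refers to that environment ID.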

 

Analysis of Performance for MuJoCo-Walker2d

Training Curve MuJoCo-Walker2d

Walker2d-v5_online
MuJoCo-Walker2d

 

Video of Initial and End Performance for MuJoCo-Walker2d

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Walker2d. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-Walker2d
2016-ICLR-DDPG-office Final MuJoCo-Walker2d
2016-ICLR-DDPG-self Initial MuJoCo-Walker2d
2016-ICLR-DDPG-self Final MuJoCo-Walker2d
2018-ICML-SAC-office Initial MuJoCo-Walker2d
2018-ICML-SAC-office Final MuJoCo-Walker2d
2018-ICML-SAC-self Initial MuJoCo-Walker2d
2018-ICML-SAC-self Final MuJoCo-Walker2d
2018-ICML-TD3-office Initial MuJoCo-Walker2d
2018-ICML-TD3-office Final MuJoCo-Walker2d
2018-ICML-TD3-self Initial MuJoCo-Walker2d
2018-ICML-TD3-self Final MuJoCo-Walker2d
2021-ICLR-REDQ-self Initial MuJoCo-Walker2d
2021-ICLR-REDQ-self Final MuJoCo-Walker2d
2022-ICLR-DroQ-self Initial MuJoCo-Walker2d
2022-ICLR-DroQ-self Final MuJoCo-Walker2d

 

Analysis of Performance for MuJoCo-Humanoid

Training Curve MuJoCo-Humanoid

Humanoid-v5_online
MuJoCo-Humanoid

 

Video of Initial and End Performance for MuJoCo-Humanoid

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Humanoid. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-Humanoid
2016-ICLR-DDPG-office Final MuJoCo-Humanoid
2016-ICLR-DDPG-self Initial MuJoCo-Humanoid
2016-ICLR-DDPG-self Final MuJoCo-Humanoid
2018-ICML-SAC-office Initial MuJoCo-Humanoid
2018-ICML-SAC-office Final MuJoCo-Humanoid
2018-ICML-SAC-self Initial MuJoCo-Humanoid
2018-ICML-SAC-self Final MuJoCo-Humanoid
2018-ICML-TD3-office Initial MuJoCo-Humanoid
2018-ICML-TD3-office Final MuJoCo-Humanoid
2018-ICML-TD3-self Initial MuJoCo-Humanoid
2018-ICML-TD3-self Final MuJoCo-Humanoid
2021-ICLR-REDQ-self Initial MuJoCo-Humanoid
2021-ICLR-REDQ-self Final MuJoCo-Humanoid
2022-ICLR-DroQ-self Initial MuJoCo-Humanoid
2022-ICLR-DroQ-self Final MuJoCo-Humanoid

 

Analysis of Performance for MuJoCo-Swimmer

Training Curve MuJoCo-Swimmer

Swimmer-v5_online
MuJoCo-Swimmer

 

Video of Initial and End Performance for MuJoCo-Swimmer

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Swimmer. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-Swimmer
2016-ICLR-DDPG-office Final MuJoCo-Swimmer
2016-ICLR-DDPG-self Initial MuJoCo-Swimmer
2016-ICLR-DDPG-self Final MuJoCo-Swimmer
2018-ICML-SAC-office Initial MuJoCo-Swimmer
2018-ICML-SAC-office Final MuJoCo-Swimmer
2018-ICML-SAC-self Initial MuJoCo-Swimmer
2018-ICML-SAC-self Final MuJoCo-Swimmer
2018-ICML-TD3-office Initial MuJoCo-Swimmer
2018-ICML-TD3-office Final MuJoCo-Swimmer
2018-ICML-TD3-self Initial MuJoCo-Swimmer
2018-ICML-TD3-self Final MuJoCo-Swimmer
2021-ICLR-REDQ-self Initial MuJoCo-Swimmer
2021-ICLR-REDQ-self Final MuJoCo-Swimmer
2022-ICLR-DroQ-self Initial MuJoCo-Swimmer
2022-ICLR-DroQ-self Final MuJoCo-Swimmer

 

Analysis of Performance for MuJoCo-HalfCheetah

Training Curve MuJoCo-HalfCheetah

HalfCheetah-v5_online
MuJoCo-HalfCheetah

 

Video of Initial and End Performance for MuJoCo-HalfCheetah

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-HalfCheetah. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-HalfCheetah
2016-ICLR-DDPG-office Final MuJoCo-HalfCheetah
2016-ICLR-DDPG-self Initial MuJoCo-HalfCheetah
2016-ICLR-DDPG-self Final MuJoCo-HalfCheetah
2018-ICML-SAC-office Initial MuJoCo-HalfCheetah
2018-ICML-SAC-office Final MuJoCo-HalfCheetah
2018-ICML-SAC-self Initial MuJoCo-HalfCheetah
2018-ICML-SAC-self Final MuJoCo-HalfCheetah
2018-ICML-TD3-office Initial MuJoCo-HalfCheetah
2018-ICML-TD3-office Final MuJoCo-HalfCheetah
2018-ICML-TD3-self Initial MuJoCo-HalfCheetah
2018-ICML-TD3-self Final MuJoCo-HalfCheetah
2021-ICLR-REDQ-self Initial MuJoCo-HalfCheetah
2021-ICLR-REDQ-self Final MuJoCo-HalfCheetah
2022-ICLR-DroQ-self Initial MuJoCo-HalfCheetah
2022-ICLR-DroQ-self Final MuJoCo-HalfCheetah

 

Analysis of Performance for MuJoCo-Ant

Training Curve MuJoCo-Ant

Ant-v5_online
MuJoCo-Ant

 

Video of Initial and End Performance for MuJoCo-Ant

Below is the performance of classic and state-of-the-art Offline RL algorithms on MuJoCo-Ant. The video on the left shows the initial policy, while the video on the right shows the performance after 1M samples of online training:

2016-ICLR-DDPG-office Initial MuJoCo-Ant
2016-ICLR-DDPG-office Final MuJoCo-Ant
2016-ICLR-DDPG-self Initial MuJoCo-Ant
2016-ICLR-DDPG-self Final MuJoCo-Ant
2018-ICML-SAC-office Initial MuJoCo-Ant
2018-ICML-SAC-office Final MuJoCo-Ant
2018-ICML-SAC-self Initial MuJoCo-Ant
2018-ICML-SAC-self Final MuJoCo-Ant
2018-ICML-TD3-office Initial MuJoCo-Ant
2018-ICML-TD3-office Final MuJoCo-Ant
2018-ICML-TD3-self Initial MuJoCo-Ant
2018-ICML-TD3-self Final MuJoCo-Ant
2021-ICLR-REDQ-self Initial MuJoCo-Ant
2021-ICLR-REDQ-self Final MuJoCo-Ant
2022-ICLR-DroQ-self Initial MuJoCo-Ant
2022-ICLR-DroQ-self Final MuJoCo-Ant