Periodic Reporting for period 4 - ResiBots (Robots with animal-like resilience)
Reporting period: 2019-11-01 to 2020-10-31
The objective of the ResiBots project is to introduce a powerful and general approach to failure recovery based on novel trial-and-error learning algorithms (reinforcement learning). The main challenge is to design learning algorithms that can learn in a few minutes (about a dozen trials), instead of the days and thousands of trials that traditional algorithms require. Our main insight is that the model of the intact robot can serve as a prior for a data-driven model of either the dynamics or the expected cumulative reward; these models can then be exploited to make reinforcement learning much more data-efficient.
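To illustrate the insight of using the intact-robot model as a prior, here is a minimal sketch of Gaussian-process regression whose prior mean is a simulator's prediction. It is not the project's implementation: the kernel, the `simulator` function, and all numbers are hypothetical. The sketch shows that the prediction falls back to the simulator where no real data exists and is corrected near observed trials.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars (assumed kernel choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict_mean(x_query, xs, ys, prior, noise=1e-6):
    """Posterior mean of a GP whose prior mean function is `prior`."""
    if not xs:
        return prior(x_query)  # no real data yet: trust the simulator
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    # The GP models only the gap between reality and the simulator.
    residuals = [y - prior(x) for x, y in zip(xs, ys)]
    alpha = solve(K, residuals)
    correction = sum(rbf(x_query, x) * a for x, a in zip(xs, alpha))
    return prior(x_query) + correction

def simulator(x):
    """Hypothetical reward predicted by the intact-robot model."""
    return 1.0 - x ** 2

# One real trial on the (hypothetically damaged) robot: behavior 0.0 scored 0.2.
near = gp_predict_mean(0.0, [0.0], [0.2], simulator)
far = gp_predict_mean(10.0, [0.0], [0.2], simulator)
```

Because the data-driven part only has to learn the difference between the simulator and reality, a handful of trials can be enough to correct the prediction where it matters.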
Our second achievement is the "Reset-Free Trial & Error" (RTE) algorithm, which extends the ideas introduced in IT&E and makes them usable in real-life scenarios: instead of using learning episodes that always start from the same state, RTE allows a mobile robot to "learn while doing" without any reset. Concretely, the robot takes the environment into account to choose control policies that are likely to help it achieve its task, while improving its predictions about the outcome of each possible policy. This algorithm was tested on a 6-legged walking robot, which was able to learn from its mistakes and reach target points in the environment in spite of a missing leg.
Both the IT&E and RTE algorithms critically rely on another new algorithm, called MAP-Elites (and its extension CVT-MAP-Elites). MAP-Elites is a novel kind of evolutionary algorithm that does not attempt to find the optimum of a function, but instead searches for a diverse set of high-performing solutions (e.g. 10,000 solutions that are all different but all high-performing). This algorithm opened many new research avenues for evolutionary computation and is part of a new class of algorithms called "illumination algorithms" or "quality diversity algorithms".
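The idea of searching for a diverse set of high-performing solutions can be sketched in a few lines. The version below is a simplified, illustrative MAP-Elites on a toy problem, not the project's code: the 2-D genome, the one-dimensional behavior descriptor, the number of bins, and the mutation settings are all assumptions made for the example.

```python
import random

def map_elites(fitness, descriptor, n_bins=10, iterations=2000, seed=0):
    """Minimal MAP-Elites over genomes in [0, 1]^2.

    `descriptor` must return a value in [0, 1]; the archive keeps the
    best genome found so far in each descriptor bin.
    """
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, genome)

    def try_insert(genome):
        b = min(int(descriptor(genome) * n_bins), n_bins - 1)
        f = fitness(genome)
        # Keep the newcomer only if its bin is empty or it beats the elite.
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, genome)

    for i in range(iterations):
        if i < 100 or not archive:
            # Bootstrap phase: random genomes.
            genome = [rng.random(), rng.random()]
        else:
            # Pick a random elite and mutate it (clamped to [0, 1]).
            _, parent = archive[rng.choice(sorted(archive))]
            genome = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                      for g in parent]
        try_insert(genome)
    return archive

# Toy problem (hypothetical): the descriptor is the first gene, and
# fitness rewards a second gene close to 0.5.
archive = map_elites(
    fitness=lambda g: -(g[1] - 0.5) ** 2,
    descriptor=lambda g: g[0],
)
```

The result is not a single optimum but one elite per descriptor bin, which is the kind of repertoire of alternative behaviors that a trial-and-error algorithm can then choose from.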
Our fourth achievement is the "Black-box Data-Efficient Robot Policy Search" (Black-DROPS) algorithm, a model-based reinforcement learning algorithm that is (1) highly flexible, which makes it easy to adapt to many problems and robots, and (2) highly parallelizable, which makes it possible to exploit multi-core computers. This algorithm was successfully tested on a robotic manipulator and on our 6-legged robot. Depending on the assumptions, it can usually learn policies by trial and error in fewer than 10 episodes.
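The general loop of model-based policy search (run an episode, fit a dynamics model, optimize the policy against the model with a black-box optimizer) can be sketched on a toy problem. This is not Black-DROPS itself: a one-parameter linear model and plain random search are swapped in for simplicity, and the scalar system, `TRUE_GAIN`, and all function names are hypothetical.

```python
import random

TRUE_GAIN = 0.5  # hidden from the learner; stands in for the real robot

def real_step(x, u):
    """True (unknown) dynamics of the toy system."""
    return x + TRUE_GAIN * u

def rollout(step, k, x0=1.0, horizon=20):
    """Run the policy u = -k * x; return (total reward, transitions)."""
    x, total, data = x0, 0.0, []
    for _ in range(horizon):
        u = -k * x
        x_next = step(x, u)
        data.append((x, u, x_next))
        total -= x_next ** 2  # reward: stay close to zero
        x = x_next
    return total, data

def fit_gain(data):
    """Least-squares estimate of b in the model x_next = x + b * u."""
    num = sum(u * (xn - x) for x, u, xn in data)
    den = sum(u * u for _, u, _ in data)
    return num / den

def model_based_search(episodes=3, candidates=200, seed=0):
    rng = random.Random(seed)
    k, data = 0.1, []  # initial policy parameter, empty dataset
    for _ in range(episodes):
        # 1. One episode on the real system with the current policy.
        _, new_data = rollout(real_step, k)
        data += new_data
        # 2. Fit the dynamics model to all data collected so far.
        b_hat = fit_gain(data)
        model_step = lambda x, u, b=b_hat: x + b * u
        # 3. Black-box search evaluated only in the learned model; each
        #    candidate is independent, so this step could run in parallel.
        pool = [k] + [rng.uniform(0.0, 4.0) for _ in range(candidates)]
        k = max(pool, key=lambda c: rollout(model_step, c)[0])
    return k, b_hat

k, b_hat = model_based_search()
```

Only step 1 touches the real system, which is why the number of real episodes can stay small while the optimizer evaluates many candidate policies in the model.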
All these algorithms have been implemented in C++11 within our generic, open-source framework called Limbo (https://github.com/resibots/limbo). Limbo implements fast Gaussian processes and state-of-the-art optimization algorithms.
The ResiBots project goes beyond the state of the art by proposing a new, general approach for damage recovery that does not require any diagnosis of the failure. To do so, the ResiBots project introduced several trial-and-error algorithms that allow damaged robots to discover compensatory behaviors in less than 2 minutes (only about a dozen trials), compared to hours or often days with traditional reinforcement learning algorithms. All of them leverage models based on Gaussian processes and a simulator of the intact robot as a prior.
Overall, the ResiBots project leverages data-efficient reinforcement learning to make robots more resilient.