Reinaldo Augusto da Costa Bianchi's Homepage

Heuristically Accelerated Reinforcement Learning

.the.project

Reinforcement Learning (RL) techniques have attracted a great deal of attention in the context of robotics, control and AI systems. The reasons most often cited are strong theoretical guarantees on convergence, ease of use, and model-free learning of adequate control strategies. They have also been applied successfully to a wide variety of control and planning problems.

However, one of the main problems with RL algorithms is that they typically learn very slowly, requiring a huge number of iterations to converge on a good solution. This problem becomes worse in tasks with high-dimensional or continuous state spaces and when the system is given sparse rewards. One reason for the slow learning is that most RL algorithms assume that neither an analytical model nor a sampling model of the problem is available a priori, even though in some cases there is domain knowledge that could be used to speed up the learning process: "Without an environment model or additional guidance from the programmer, the agent may literally have to keep falling off the edge of a cliff in order to learn that this is bad behavior" (Hasinoff, 2003).

As a way to add domain knowledge to help solve the RL problem, the Heuristically Accelerated Reinforcement Learning (HARL) algorithms were proposed in 2004 (Bianchi, 2004). These algorithms allow the use of heuristics to speed up well-known Reinforcement Learning algorithms, using a heuristic function that influences the choice of actions. Several HARL algorithms have been proposed:

Heuristically Accelerated Q-Learning (HAQL), which uses heuristics to speed up the well-known Q-Learning algorithm.
Heuristically Accelerated Q-Lambda (HAQ-Lambda).
Heuristically Accelerated SARSA-Lambda (HA-SARSA-Lambda).
Heuristically Accelerated MINIMAX-Q (HAMMQ), which extends the MINIMAX-Q algorithm.
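
The idea of a heuristic function influencing action choice can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: it assumes the heuristic value H(s, a) is added to Q(s, a), scaled by a weight xi, inside an epsilon-greedy rule, while the Q-Learning update itself is left unchanged. The names select_action, q_update, xi and epsilon are hypothetical and chosen only for this example.

```python
import random


def select_action(q_values, h_values, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over Q(s, a) + xi * H(s, a) for a single state.

    q_values and h_values are lists indexed by action. xi (assumed name)
    controls how strongly the heuristic biases the greedy choice.
    """
    n_actions = len(q_values)
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(n_actions)
    # Exploit: the heuristic only shifts the scores used for selection.
    scores = [q + xi * h for q, h in zip(q_values, h_values)]
    # Ties are broken in favour of the lowest action index.
    return max(range(n_actions), key=scores.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-Learning update; the heuristic does not appear here."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


if __name__ == "__main__":
    q_table = [[0.0, 0.0], [0.0, 0.0]]  # 2 states x 2 actions, untrained
    heuristic = [0.0, 1.0]              # domain knowledge favours action 1
    action = select_action(q_table[0], heuristic, xi=1.0, epsilon=0.0)
    q_update(q_table, state=0, action=action, reward=1.0, next_state=1)
    print(action, q_table[0])
```

In this sketch the heuristic affects only which action is tried, so a poor heuristic can slow exploration but does not alter the values the Q-Learning update estimates.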

This project investigates the use of HARL to speed up learning in several types of domains, including mobile robots acting in an unknown environment and teams of autonomous mobile robotic agents acting in concurrent multiagent environments such as the RoboCup 2D Simulator and FIRA MiroSot and SimuroSot.

.students

Three MSc students are currently working on HARL:

Luiz A. Celiberto Jr, working on applying HARL to RoboCup Simulation 2D agents.
Murilo Fernandes Martins, working on HARL applied to FIRA MiroSot physical soccer robots.

.project.publications

A list of all my publications can be found here.

Papers in Journals

BIANCHI, R. A. C.; RIBEIRO, Carlos Henrique Costa; COSTA, Anna Helena Reali. Accelerating autonomous learning by using heuristic selection of actions. Journal of Heuristics, Springer-Verlag, 2008. (draft here)

Papers in International Conferences

BIANCHI, R. A. C. ; ROS, R. ; MANTARAS, R. L. Improving Reinforcement Learning by using Case Based Heuristics. In: International Conference on Case-Based Reasoning, 2009, Seattle. Lecture Notes in Artificial Intelligence. Berlin : Springer, 2009. v. 5650. p. 75-89. (draft here)

BIANCHI, R. A. C. ; Ribeiro, Carlos H. C. ; COSTA, Anna Helena Reali. On the relation between Ant Colony Optimization and Heuristically Accelerated Reinforcement Learning. In: International Workshop on Hybrid Control of Autonomous Systems - IJCAI 2009 Workshop, 2009, Pasadena. Proceedings of the International Workshop on Hybrid Control of Autonomous Systems, 2009.
BIANCHI, R. A. C. ; MANTARAS, R. L. Should I trust my teammates? An experiment in Heuristic Multiagent Reinforcement Learning. In: IJCAI 2009 Workshop on Grand Challenges for Reasoning from Experiences, 2009, Pasadena. Proceedings of the IJCAI-09 Workshop on Grand Challenges for Reasoning from Experiences, 2009.

BIANCHI, R. A. C.; RIBEIRO, Carlos Henrique Costa ; COSTA, Anna Helena Reali. Heuristic Selection of Actions in Multiagent Reinforcement Learning. In: International Joint Conference on Artificial Intelligence - IJCAI, 2007, Hyderabad. Proceedings of IJCAI 2007, 2007. (abstract here).

CELIBERTO JÚNIOR, Luiz Antônio ; RIBEIRO, Carlos Henrique Costa ; COSTA, Anna Helena Reali ; BIANCHI, R. A. C. Heuristic Reinforcement Learning Applied to RoboCup Simulation Agents. In: RoboCup 2007: Robot Soccer World Cup XI, 2008, Atlanta. Lecture Notes in Artificial Intelligence. Berlin : Springer, 2008. v. 5001. p. 220-227. (draft here)
CELIBERTO JÚNIOR, Luiz Antônio ; Matsuura, Jackson Paul ; BIANCHI, R. A. C. Heuristic Q-Learning Soccer Players: A New Reinforcement Learning Approach to RoboCup Simulation. In: Portuguese Conference on Artificial Intelligence, 2007, Guimarães. Lecture Notes in Artificial Intelligence. Berlin : Springer, 2007. v. 4874. p. 520-529.
BIANCHI, R. A. C. ; RIBEIRO, C. H. C. ; COSTA, A. H. R. Heuristically Accelerated Q-Learning: a New Approach to Speed Up Reinforcement Learning. Lecture Notes in Artificial Intelligence. Springer Verlag, Berlin, Heidelberg, 2004, vol. 3171, p. 245-254. (draft version can be found here).

Papers in National Conferences (in Portuguese)
CELIBERTO JÚNIOR, Luiz Antônio ; MARTINS, Murilo Fernandes ; BIANCHI, R. A. C. ; Matsuura, Jackson Paul . Utilizando transferência de conhecimento para acelerar o aprendizado por reforço. In: Simpósio Brasileiro de Automação Inteligente, 2009, Brasília. Anais do IX Simpósio Brasileiro de Automação Inteligente, 2009.
MARTINS, Murilo Fernandes; BIANCHI, R. A. C. . Comparação de Desempenho de Algoritmos de Aprendizado por Reforço no Domínio do Futebol de Robôs . In: VIII Simpósio Brasileiro de Automação Inteligente, 2007, Florianópolis. Anais do VIII Simpósio Brasileiro de Automação Inteligente. Florianópolis : SBA, 2007.
CELIBERTO JÚNIOR, Luiz Antônio; BIANCHI, R. A. C. ; MATSUURA, Jackson P. Aprendizado por Reforço Acelerado por Heurísticas no Domínio do Futebol de Robôs Simulado . In: VIII Simpósio Brasileiro de Automação Inteligente, 2007, Florianópolis. Anais do VIII Simpósio Brasileiro de Automação Inteligente. Florianópolis : SBA, 2007.
BIANCHI, R. A. C. Uso de heurísticas para a aceleração do aprendizado por reforço. São Paulo, 2004. Tese (Doutorado) - Escola Politécnica, Universidade de São Paulo. (abstract here).
BIANCHI, R. A. C.; COSTA, Anna Helena Reali. The use of heuristics to speedup Reinforcement Learning. Boletim Interno, No. BT/PCS/0409. Escola Politécnica da USP, São Paulo, 2004.
CELIBERTO JÚNIOR, Luiz Antônio ; BIANCHI, R. A. C. . Extração de estruturas do ambiente para aceleração do aprendizado por reforço em uma aplicação de robótica móvel. In: Congresso Brasileiro de Automática, 2006, Salvador. Anais do XVI Congresso Brasileiro de Automática. Salvador : SBA, 2006. v. 1. p. 2387-2392.
CELIBERTO JÚNIOR, Luiz Antônio ; BIANCHI, R. A. C. . Aprendizado por Reforço Acelerado por Heurística para um Sistema Multi-Agentes. In: III Workshop de Teses e Dissertações em Inteligência Artificial, 2006, Ribeirão Preto. Anais do International Joint Conference X Ibero-American Conference on Artificial Intelligence, XVIII Brazilian Symposium on Artificial Intelligence and IX Brazilian Symposium on Neural Networks. Ribeirão Preto : USP, 2006.
BIANCHI, R. A. C. ; COSTA, Anna Helena Reali . Uso de Heurísticas para a Aceleração do Aprendizado por Reforço. In: V Concurso de Teses e Dissertações em Inteligência Artificial, 2006, Ribeirão Preto. Anais da International Joint Conference X Ibero-American Artificial Intelligence Conference, XVIII Brazilian Artificial Intelligence Symposium and IX Brazilian Neural Networks Symposium. Ribeirão Preto : USP, 2006.
BIANCHI, R. A. C. ; COSTA, Anna Helena Reali . Uso de Heurísticas para a Aceleração do Aprendizado por Reforço. In: XVIII Concurso de Teses e Dissertações - XXV Congresso da Sociedade Brasileira de Computação, 2005, São Leopoldo. Anais do XXV Congresso da Sociedade Brasileira de Computação - Concurso de Teses e Dissertações. Porto Alegre : Sociedade Brasileira de Computação, 2005. v. 1. p. 130-139.

created february 15th, 1994
last updated february 27th, 2007