Feature-Attending Recurrent Modules Facilitate Generalization Across Object-centric Tasks

University of Michigan, DeepMind

Abstract

To generalize across object-centric tasks, a reinforcement learning (RL) agent needs to exploit the structure that objects induce. Prior work has either hard-coded object-centric features, used complex object-centric generative models, or updated state using local spatial features. However, these approaches have had limited success in enabling general RL agents. Motivated by this, we introduce “Feature-Attending Recurrent Modules” (FARM), an architecture for learning state representations that relies on simple, broadly applicable inductive biases for representing diverse object-induced spatiotemporal regularities. FARM learns a state representation that is distributed across multiple modules that select their own spatiotemporal observation features to update with. We hypothesize that distributing state across modules enables an RL agent to flexibly recombine its experiences for generalization. We study task suites in both 2D and 3D environments and find that FARM better generalizes compared to competing architectures that leverage attention or multiple modules.
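
Below is a minimal sketch of the architecture described above, not the authors' implementation: the class names, tensor shapes, and the use of PyTorch are assumptions, and details such as spatial versus feature-wise attention and any module-to-module interaction may differ from the paper. Each recurrent module uses its own hidden state to attend over the observation's spatial feature vectors and updates an LSTM cell with the attended summary; the concatenation of module states forms the agent state.

    # Minimal sketch of the FARM idea (hypothetical names and shapes).
    import torch
    import torch.nn as nn

    class FeatureAttendingModule(nn.Module):
        """One recurrent module: query = f(own state); keys/values = observation features."""
        def __init__(self, feat_dim: int, state_dim: int):
            super().__init__()
            self.query = nn.Linear(state_dim, feat_dim)
            self.cell = nn.LSTMCell(feat_dim, state_dim)

        def forward(self, feats, state):
            # feats: [B, N, feat_dim] spatial feature vectors; state: (h, c).
            h, c = state
            q = self.query(h)                                      # [B, feat_dim]
            scores = torch.einsum("bnf,bf->bn", feats, q) / feats.shape[-1] ** 0.5
            attn = torch.softmax(scores, dim=-1)                   # [B, N]
            summary = torch.einsum("bn,bnf->bf", attn, feats)      # attended summary
            return self.cell(summary, (h, c))                      # updated (h, c)

    class FARMSketch(nn.Module):
        """Agent state distributed across n independent feature-attending modules."""
        def __init__(self, n_modules: int = 4, feat_dim: int = 32, state_dim: int = 64):
            super().__init__()
            self.mods = nn.ModuleList(
                FeatureAttendingModule(feat_dim, state_dim) for _ in range(n_modules)
            )
            self.state_dim = state_dim

        def init_state(self, batch_size):
            zeros = lambda: torch.zeros(batch_size, self.state_dim)
            return [(zeros(), zeros()) for _ in self.mods]

        def forward(self, feats, states):
            # Each module selects its own features and updates independently;
            # the concatenated hidden states are fed to the policy/value heads.
            new_states = [m(feats, s) for m, s in zip(self.mods, states)]
            agent_state = torch.cat([h for h, _ in new_states], dim=-1)
            return agent_state, new_states

    # Usage: one step on a batch of 2 observations with 7x7 = 49 spatial features.
    farm = FARMSketch()
    state = farm.init_state(batch_size=2)
    agent_state, state = farm(torch.randn(2, 49, 32), state)       # agent_state: [2, 4 * 64]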


Overview of FARM. (a) FARM learns an agent state representation that is distributed across n recurrent modules. (b) By distributing agent state across multiple modules, FARM is able to represent different object-centric task regularities, such as navigating around obstacles or picking up goal keys, across subsets of modules. We hypothesize that this enables a deep RL agent to flexibly recombine its experience for generalization.


We study three environments with different structural regularities induced by objects. In the Ballet environment, tasks share regularities induced by object motions; in the KeyBox environment, they share regularities induced by object configurations; and in the Place environment, tasks share regularities induced by 3D objects. The Ballet and KeyBox environments pose learning challenges for long-horizon memory and require generalizing to more objects. The KeyBox and Place environments pose learning challenges for obstacle navigation and require generalizing to a larger map. We provide videos of our agent performing these tasks in the supplementary material.

BibTeX


    @article{carvalho2021feature,
      title={Feature-Attending Recurrent Modules for Generalization in Reinforcement Learning},
      author={Carvalho, Wilka and Lampinen, Andrew and Nikiforou, Kyriacos and Hill, Felix and Shanahan, Murray},
      journal={arXiv preprint arXiv:2112.08369},
      year={2021}
    }