A Contextual Bandit Approach to General Robot Intelligence with Commonsense Reasoning

Matthew McNeill, Fordham University

Abstract

Artificially intelligent assistive agents are playing an increasing role in our workplaces and homes. In contrast to the currently predominant conversational agents, whose intelligence derives from dialogue trees and functionality modules, an autonomous domestic or workplace robot must carry out more complicated reasoning with more exhaustive knowledge of its surroundings. For example, a construction or military robot could be brought to a new location about which it knows very little, yet be expected to carry out important tasks with minimal supervision. Such a robot must make good decisions, learn from experience, respond to feedback, and rely on feedback only as much as is necessary. In this thesis, we narrow the focus of a robot assistant to a simple room-tidying task in a simulated domestic environment. Given an item, the robot must choose where to put it among many destinations, then receive feedback from a human operator if the operator chooses to leave it. We frame the problem as a contextual bandit, a reinforcement learning approach frequently used in recommendation systems. We evaluate accuracy and the number of episodes needed before a majority of decisions are correct, as the operator's feedback rate varies from every episode down to only 25% of episodes. We also evaluate learning from episodes in which the human leaves no feedback. Additionally, we maintain historical episode data and remove episodes without feedback once they are no longer useful. To improve early-episode performance, we incorporate a priori knowledge into action selection through commonsense reasoning with ConceptNet. We show that combining these methods in epsilon-greedy action selection can increase early-episode accuracy from approximately 10% to 40%, reduce the number of episodes before a majority of decisions are correct by 15%-28%, and increase the cumulative reward at this point by 80%.
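The epsilon-greedy selection with a commonsense prior described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name, the additive prior-weight blend, and the plain list of prior scores (standing in for ConceptNet relatedness between the item and each destination) are all our own assumptions.

```python
import random

def epsilon_greedy(q_values, prior_scores, epsilon=0.1, prior_weight=0.5):
    """Pick a destination index, exploring with probability epsilon.

    q_values: learned value estimate per candidate destination.
    prior_scores: a priori (commonsense) score per destination, e.g.
    derived from ConceptNet relatedness (hypothetical here).
    """
    if random.random() < epsilon:
        # Explore: choose a destination uniformly at random.
        return random.randrange(len(q_values))
    # Exploit: blend learned estimates with the a priori scores, so early
    # episodes (when q_values are uninformative) lean on the prior.
    combined = [q + prior_weight * p for q, p in zip(q_values, prior_scores)]
    return max(range(len(combined)), key=combined.__getitem__)
```

With `epsilon=0.0` the choice is deterministic, which makes the effect of the prior easy to inspect: a destination with a mediocre learned value can still win if its commonsense score is high enough.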
Additionally, we show that replacing epsilon-greedy action selection with LinUCB can achieve total accuracy of approximately 90%, even when feedback is left only 25% of the time.
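For context, the standard disjoint-arm LinUCB rule referenced above can be sketched as follows; the class and parameter names are generic assumptions for illustration, not taken from the thesis code.

```python
import numpy as np

class LinUCB:
    """Disjoint-arm LinUCB sketch: one ridge-regression model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm design matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # per-arm reward vector

    def select(self, x):
        """Choose the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                             # ridge-regression weights
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold in an observed reward; simply skip this call on episodes
        where the operator leaves no feedback."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

Unlike epsilon-greedy, LinUCB directs exploration toward arms whose value estimates are still uncertain, which is one plausible reason it tolerates sparse feedback well.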

Subject Area

Artificial intelligence|Robotics|Computer science

Recommended Citation

McNeill, Matthew, "A Contextual Bandit Approach to General Robot Intelligence with Commonsense Reasoning" (2019). ETD Collection for Fordham University. AAI22618432.
https://research.library.fordham.edu/dissertations/AAI22618432
