Octopus Arm Control with Sparse Bayesian Reinforcement Learning


The arm simulation is based on the biomechanical model described in:

Y. Yekutieli et al., "Dynamic Model of the Octopus Arm. I. Biomechanics of the Octopus Reaching Movement," Journal of Neurophysiology, vol. 94, no. 2, pp. 1443-1458, 2005.

The octopus arm problem is to learn to control a simulated octopus arm moving through a viscous fluid environment in order to achieve a given goal. By controlling the muscles in each individual compartment, the agent moves the tip of the arm close to the goal. The problem is challenging because of its high-dimensional continuous actions, non-linear dynamics, and large search space.
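
To make the action space concrete, here is a minimal sketch of what a single action looks like. The compartment count and the three-muscle grouping (dorsal, ventral, transverse, following the Yekutieli et al. model) are assumptions about the simulator configuration, not values taken from the paper.

    import numpy as np

    # Assumed simulator configuration; the actual benchmark may differ.
    N_COMPARTMENTS = 10                             # muscular compartments along the arm
    MUSCLES = ("dorsal", "ventral", "transverse")   # per the Yekutieli et al. model

    def random_action(rng):
        """One continuous activation in [0, 1] for every muscle in every
        compartment, giving a 30-dimensional real-valued action here."""
        return rng.uniform(0.0, 1.0, size=(N_COMPARTMENTS, len(MUSCLES)))

    rng = np.random.default_rng(0)
    action = random_action(rng)
    print(action.shape)   # (10, 3): high-dimensional continuous control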

Method:

Our recent approach uses a relevance vector machine (RVM) to sample continuous actions from the relevance vectors. As learning progresses, relevance vector sampling discovers continuous actions that maximize the estimated Q values (a minimal sketch of this action-selection step appears after the citation below). The videos below show the successfully learned curling action.

    Relevance Vector Sampling for Reinforcement Learning in Continuous Action Space,
    Minwoo Lee and Chuck Anderson,
    The 15th IEEE International Conference on Machine Learning and Applications (IEEE ICMLA'16), December 2016
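
The sketch below illustrates the idea under simplifying assumptions: the Q function is approximated as a weighted sum of Gaussian kernels centered at the relevance vectors (stored state-action pairs), and candidate actions are drawn by perturbing the action components of those relevance vectors. The function names, kernel choice, and noise scale are hypothetical, not the paper's exact formulation.

    import numpy as np

    def rbf(x, c, gamma=1.0):
        """Gaussian kernel between state-action rows x and a single center c."""
        return np.exp(-gamma * np.sum((x - c) ** 2, axis=-1))

    def q_estimate(state, actions, relevance_vectors, weights, gamma=1.0):
        """Sparse kernel Q estimate: a weighted sum of kernels centered at
        the relevance vectors (state-action pairs kept by the RVM)."""
        xs = np.hstack([np.tile(state, (len(actions), 1)), actions])
        K = np.stack([rbf(xs, rv, gamma) for rv in relevance_vectors], axis=1)
        return K @ weights

    def sample_action(state, relevance_vectors, weights, state_dim,
                      sigma=0.05, rng=None):
        """Perturb the action components of the relevance vectors and keep
        the candidate with the highest estimated Q value."""
        if rng is None:
            rng = np.random.default_rng()
        rv_actions = relevance_vectors[:, state_dim:]    # action parts of the RVs
        candidates = rv_actions + sigma * rng.standard_normal(rv_actions.shape)
        candidates = np.clip(candidates, 0.0, 1.0)       # keep activations in [0, 1]
        q = q_estimate(state, candidates, relevance_vectors, weights)
        return candidates[int(np.argmax(q))]

    # Tiny demo with random relevance vectors (2-D state, 3-D action).
    rng = np.random.default_rng(0)
    rvs = rng.uniform(0, 1, size=(5, 5))    # 5 relevance vectors in R^(2+3)
    w = rng.standard_normal(5)              # RVM weights
    a = sample_action(np.array([0.3, 0.7]), rvs, w, state_dim=2, rng=rng)
    print(a)                                # a 3-dimensional continuous action

As the relevance vectors are re-fit during learning, the candidate set tracks the regions of the action space where the estimated Q values are high, which is what lets the sampler converge on useful continuous actions.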


Single Goal Curling Action


After training for 10 episodes

After training for 35 episodes

After training for 70 episodes

After training for 100 episodes

After training for 150 episodes


Learned Policy


Alternating Goals on Each Episode: Touching the Object with Any Part of the Trunk

The learning curve below shows the sum of rewards per episode during training. To train the agent to handle changing goals, the goal alternates between a position above the arm and one below the arm on each episode, and the goal location is given to the agent as an input. The agent gradually learns a policy that can reach either goal. The triangles mark the episodes at which the learned trajectories were sampled for the animations below.
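
A minimal sketch of this training setup follows: the goal coordinates are appended to every observation, and the goal alternates each episode. The env and agent interfaces and the goal coordinates are hypothetical placeholders, not the actual benchmark API.

    import numpy as np

    GOAL_ABOVE = np.array([2.0,  1.0])   # assumed goal position above the arm
    GOAL_BELOW = np.array([2.0, -1.0])   # assumed goal position below the arm

    def run_episode(env, agent, goal):
        """One episode; the goal location is appended to every observation,
        so the same policy can serve both goal positions."""
        obs = env.reset()
        total_reward, done = 0.0, False
        while not done:
            state = np.concatenate([obs, goal])   # goal given as an input
            action = agent.act(state)
            obs, reward, done = env.step(action)
            agent.learn(state, action, reward)
            total_reward += reward
        return total_reward

    def train(env, agent, n_episodes=150):
        """Alternate the goal above/below the arm on each episode and
        record the sum of rewards per episode (the learning curve)."""
        returns = []
        for episode in range(n_episodes):
            goal = GOAL_ABOVE if episode % 2 == 0 else GOAL_BELOW
            returns.append(run_episode(env, agent, goal))
        return returns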


After training for 10 episodes

After training for 11 episodes


After training for 24 episodes

After training for 25 episodes


After training for 50 episodes

After training for 51 episodes


After training for 150 episodes

After training for 151 episodes


Learned Policy: Goal Above the Arm

Learned Policy: Goal Below the Arm
