Week 11 SOLUTION: Associative Learning, Estimating Rewards, Action Policies

  due date: Tue Apr 4 by 9:00 PM
  email to: mcb419@gmail.com
  subject: hw11
  email contents:
    1) jsbin.com link to your project code
    2) answer all the questions at the bottom of this page in the email
  

Introduction

This assignment combines elements of associative learning (associating pellet color with reward), estimating reward values using the delta rule, and implementing action policies based on estimated reward values. In this assignment, a single bot forages for RED, GREEN, and BLUE pellets. The different colors have different reward values. Your bot needs to learn the expected value of each color and use that information to implement an efficient foraging strategy. The objective is to collect as much energy as possible in a fixed time period (2000 ticks).

Pellets:
pellets - 10 each of red, green, and blue; randomly distributed; can be detected at a distance;
pellet values - pellet colors are randomly assigned to 3 categories: Best, Neutral, Worst
-- Best: 90% of pellets return a reward of +4, 10% return a reward of -4
-- Neutral: 50% return +4, 50% return -4
-- Worst: 10% return +4, 90% return -4
Bot sensory inputs:
bot.sns.left/right = a 1-d array [snsR, snsG, snsB] returning the sensed intensity for each pellet color (Braitenberg-style);
bot.sns.collision = true when the bot hits a boundary; false otherwise
bot.sns.deltaEnergy = energy gained on previous time step
bot.sns.lastColorConsumed = a string ("red", "green", "blue") indicating the color of the last pellet consumed
Bot motor output:
bot.mtr.left/right = motor velocity (Braitenberg-style);
Controllers:
seekRed - seeks red pellets, ignores other colors
seekGreen - seeks green pellets, ignores other colors
seekBlue - seeks blue pellets, ignores other colors
seekAll - seeks all pellets by using sum of R,G,B sensors
seekUser - this is the controller that you will develop
 



Instructions

First, run the provided controllers and understand how they work. Next, using the seekAll controller for testing, modify the bot.prototype.updateEstimates method to update the bot's estimatedValue array as the bot consumes pellets. This array has three elements holding the estimated values of red, green, and blue, respectively. The values are displayed automatically to the right of the canvas.
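One way to implement updateEstimates is the standard delta rule: nudge the current estimate toward the observed reward by a fraction alpha of the prediction error. The sketch below is a minimal, self-contained version; the field names (sns.lastColorConsumed, sns.deltaEnergy, estimatedValue) follow the descriptions above, and the 0-2 color indexing is an assumption.

```javascript
// Map pellet-color strings to estimatedValue indices (assumed ordering).
var COLOR_INDEX = { red: 0, green: 1, blue: 2 };

// Delta rule: est <- est + alpha * (reward - est).
// Mutates estimatedValue in place and returns it.
function deltaRuleUpdate(estimatedValue, color, reward, alpha) {
  var i = COLOR_INDEX[color];
  var error = reward - estimatedValue[i]; // prediction error
  estimatedValue[i] += alpha * error;
  return estimatedValue;
}
```

Inside updateEstimates you would call this with bot.sns.lastColorConsumed and bot.sns.deltaEnergy whenever a pellet has just been consumed.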

Once your estimates are being computed correctly, write your own controller code in bot.prototype.seekUser that uses this information to optimize foraging performance. You should be able to reliably achieve scores over 200.

Questions:

(provide answers in the body of your email)
  1. Given the reward probabilities specified above, what is the expected reward for each color category?
    BEST: 3.2, NEUTRAL: 0.0, WORST: -3.2
    [3.2 = 0.9 * (4.0) + 0.1 * (-4.0)]
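The arithmetic above can be expressed as a one-line helper (a sketch, not part of the provided code):

```javascript
// Expected reward for a category where a pellet returns +4 with
// probability pPlus and -4 otherwise.
function expectedReward(pPlus) {
  return pPlus * 4 + (1 - pPlus) * (-4);
}
```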
  2. What learning rate did you use for your delta rule? How did you select this value?
    I picked 0.2, which gave reliable convergence to the target values listed above within 2000 ticks when using seekAll.
  3. Briefly describe the controller strategy that you implemented. How did you use the estimated reward values to control the bot behavior?
    My controller finds the color with the max expected value and pursues that color with a tropotaxis strategy.
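The strategy described above can be sketched as two small pieces: an argmax over the estimated values, and crossed sensor-to-motor connections (tropotaxis) for the chosen color. The helper names and the forward-bias constant below are hypothetical, not part of the provided code.

```javascript
// Return the index (0=red, 1=green, 2=blue) of the highest estimate.
function bestColorIndex(estimatedValue) {
  var best = 0;
  for (var i = 1; i < estimatedValue.length; i++) {
    if (estimatedValue[i] > estimatedValue[best]) best = i;
  }
  return best;
}

// Tropotaxis toward one color: each sensor excites the opposite motor,
// so the bot turns toward the side with the stronger signal.
// snsLeft/snsRight are [snsR, snsG, snsB] arrays as described above.
function seekColorMotors(snsLeft, snsRight, colorIndex) {
  var base = 2; // hypothetical constant forward bias
  return {
    left: base + snsRight[colorIndex],
    right: base + snsLeft[colorIndex]
  };
}
```

In seekUser, bestColorIndex(bot.estimatedValue) would pick the target each tick, and seekColorMotors would set bot.mtr.left/right from bot.sns.left/right.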
  4. How well does your controller perform relative to the best single-color controller (e.g. seekCOLOR)? What do you think accounts for the difference in performance?
    My seekUser controller achieved a mean fitness of around 240, whereas the best single-color controller achieved a mean fitness of around 260. The difference is likely due to the initial time required to learn the relative pellet values.

Results Table

Controller | Fitness: mean (std dev)