The following videos show evolved waypoint-seeking behavior in individuals and teams:
The waypoint seeking scenarios were used to determine whether robots that had already been trained in an obstacle avoidance behavior could have their training modified to achieve additional tasks. In these scenarios, robots were encouraged to drive through all waypoints of the opposite color. Two different fitness scales were tested. In the individual waypoint seeking scenario, robots received points for every waypoint they crossed. In the team waypoint seeking scenario, robots received points for every waypoint one of their team members crossed. Scenario specifics are shown below each of the following videos.
Individual Waypoint Seeking Scenario: Robots were tasked with exploring the map to find waypoints.
Setup: The environment was divided into three sections. One section contained only red robots and red waypoints. A second section contained only green robots and green waypoints. The third section created an open area between the other two.
Fitness Evaluation: Robots were encouraged to find all waypoints of the opposite color as quickly as possible. Every robot that drove sufficiently close to the appropriate waypoint was awarded an increase in fitness that was proportional to the amount of time left in that trial. When a robot discovered all five waypoints, its accumulated fitness value was doubled and the robot was removed from the map.
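To make this reward rule concrete, here is a minimal Python sketch of how the individual fitness update could work. It is an illustrative reconstruction, not the original code: the `Robot` and `Waypoint` classes, the `WAYPOINT_RADIUS` threshold standing in for "sufficiently close," and the `alive` flag used for removal are all assumptions.

```python
import math
from dataclasses import dataclass, field

WAYPOINT_RADIUS = 0.5  # assumed "sufficiently close" threshold
NUM_WAYPOINTS = 5      # opposite-color waypoints available to each robot

@dataclass(frozen=True)
class Waypoint:
    color: str
    position: tuple  # (x, y)

@dataclass
class Robot:
    color: str
    position: tuple  # (x, y)
    fitness: float = 0.0
    alive: bool = True
    visited: set = field(default_factory=set)

def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def update_individual_fitness(robot, waypoints, time_left):
    """Apply the individual reward rule for one robot at one time step."""
    for wp in waypoints:
        # Only waypoints of the opposite color count, and each only once.
        if wp.color == robot.color or wp in robot.visited:
            continue
        if distance(robot.position, wp.position) <= WAYPOINT_RADIUS:
            robot.visited.add(wp)
            # The reward is proportional to the time remaining in the
            # trial, so earlier discoveries earn more fitness.
            robot.fitness += time_left
    if len(robot.visited) == NUM_WAYPOINTS:
        # Discovering all five waypoints doubles the accumulated fitness
        # and removes the robot from the map.
        robot.fitness *= 2
        robot.alive = False
```

Scaling the reward by the remaining time is what creates pressure to find waypoints quickly rather than merely eventually.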
Quantitative Evaluation: The fitness plot is shown in the background of the video. Since the robots had already been trained to explore their map, they did a reasonably good job of finding waypoints in the very first generation. Performance then dropped steadily for the first fifteen generations. One possible explanation for this decrease is that individuals, which had evolved independently during the obstacle avoidance task, were suddenly allowed to share genetic material. The different network structures that formed in the separate populations were not necessarily compatible when combined, and could have produced a number of useless robots. This degradation seemed to stop after fifteen generations. The robots then improved for a while, stabilized, and improved further until the session ended. The red line shows the best performing robot across all populations, the green line shows the worst performing robot, and the blue line shows the average.
Qualitative Evaluation: A side effect of the waypoint positioning algorithm was that all waypoints tended to appear in the centers of the corridors. Consequently, robots learned to move toward the center of a corridor when approaching a waypoint, but to move to the left side when approaching other robots, allowing safe passage. Most robots evolved to eliminate all waypoints quickly.
Team Waypoint Seeking Scenario: Robots were divided into red and green teams. When one robot on a team finds a waypoint, all robots on the team are rewarded. Robots do not directly gain an advantage over their teammates when they succeed, but they do directly suffer losses when they crash. This creates a situation where robots are conflicted between the selfless desire to accumulate points for their team and the selfish desire to be safe and have others accumulate points for them.
Setup: The environment was divided into three sections, and objects were placed in the same way as in the Individual Waypoint Seeking scenario.
Fitness Evaluation: Robot teams were encouraged to find all waypoints of the opposite color as quickly as possible. If any robot on a team discovered a waypoint, every living member of that team was rewarded with an increase in fitness proportional to the amount of time left in that trial, while every member (living or dead) of the opposing team received a penalty equal to half that amount.
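Under the same assumptions as the earlier sketch (and reusing its `Robot`, `Waypoint`, `distance`, and `WAYPOINT_RADIUS`), the team reward rule might look like the following. The `teams` dictionary and the shared `claimed` set are illustrative inventions, not the original data structures.

```python
def update_team_fitness(teams, waypoints, claimed, time_left):
    """Apply the team reward rule at one time step.

    `teams` maps a color ("red" or "green") to its list of robots;
    `claimed` is the set of waypoints already discovered by anyone.
    """
    for color, robots in teams.items():
        opposing = teams["green" if color == "red" else "red"]
        for robot in robots:
            if not robot.alive:
                continue
            for wp in waypoints:
                # Teams seek waypoints of the opposite color; each
                # waypoint can be claimed only once.
                if wp.color == color or wp in claimed:
                    continue
                if distance(robot.position, wp.position) <= WAYPOINT_RADIUS:
                    claimed.add(wp)
                    # Every living member of the discoverer's team gains
                    # fitness proportional to the time left in the trial.
                    for mate in robots:
                        if mate.alive:
                            mate.fitness += time_left
                    # Every member of the opposing team, living or dead,
                    # loses half that amount.
                    for opponent in opposing:
                        opponent.fitness -= time_left / 2
```

Because the reward is shared but the cost of crashing is borne individually, this rule directly encodes the selfless-versus-selfish tension described above.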
Quantitative Evaluation: The fitness plot is shown in the video. Red lines show the fitness of red robots, green lines show the fitness of green robots, and blue lines show the average. As in the previous scenario, average performance dropped early in the trial. In this case, the drop was much more significant, which may have been because the scenario created selection pressure for robots that moved slowly and safely. Although this behavior was beneficial to the individual, it was harmful to the team's ability to accumulate points. The green team was the first to recover from the drop and dominated the field for about 45 generations. The red team then gained control and dominated for the remainder of the run. Following the initial drop, the combined fitness of the two teams increased steadily through the experiment and eventually exceeded the initial values.
Qualitative Evaluation: As in the previous experiment, the early stages of this game were quite chaotic. After a hundred or so generations, however, the robots appeared to develop a genuine strategy. As mentioned, robots received a stiff penalty for dying, so most moved carefully when near other robots. Robots also seemed to hover around their own waypoints, with the possible dual goal of protecting their own lives while blocking their waypoints from the opposing team. When robots saw an opening toward an opponent's waypoint, they would race at full speed to claim it.