MIT Department of Electrical Engineering & Computer Science

Neuro-Dynamic Programming

Dimitri P. Bertsekas
MIT, EECS and LIDS

Monday, April 1, 1996
4:00 PM (3:45 refreshments)
Edgerton Hall, Room 34-101
EECS Colloquium

Abstract

Deep Blue, a chess program, recently defeated the world champion, Garry Kasparov, in the first game of a six-game match. TD-Gammon, a backgammon program, was given the game rules and, after training through self-play for several months, almost beat a world champion in a long match. Both programs are based on the principle of evaluating positions by means of a scoring function and selecting the move that leads to the position with the best score. Hardware speed was a key factor in the success of the chess program, whereas algorithmic sophistication in automatically constructing the scoring function was the key factor in the success of the backgammon program.
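
The move-selection principle described above can be illustrated with a minimal sketch. It is not the code of either program: the names select_move, legal_moves, apply_move, and score are hypothetical, and the toy "game" (integer positions, a score that prefers positions near 10) is an assumption made only so the example runs.

```python
from typing import Callable, Iterable, TypeVar

Position = TypeVar("Position")
Move = TypeVar("Move")


def select_move(
    position: Position,
    legal_moves: Callable[[Position], Iterable[Move]],
    apply_move: Callable[[Position, Move], Position],
    score: Callable[[Position], float],
) -> Move:
    """Return the move whose resulting position has the best (highest) score."""
    moves = list(legal_moves(position))
    if not moves:
        raise ValueError("no legal moves from this position")
    # One-step lookahead: score each successor position and keep the best.
    return max(moves, key=lambda m: score(apply_move(position, m)))


# Toy illustration (hypothetical): a position is an integer, a move adds
# 1, 2, or 3 to it, and the scoring function prefers positions close to 10.
def toy_legal_moves(position: int) -> list[int]:
    return [1, 2, 3]


def toy_apply_move(position: int, move: int) -> int:
    return position + move


def toy_score(position: int) -> float:
    return -abs(10 - position)


best = select_move(8, toy_legal_moves, toy_apply_move, toy_score)
print(best)  # prints 2, since 8 + 2 = 10 has the best score
```

In this sketch all of the decision quality comes from the scoring function; the selection rule itself is a plain maximization over successor positions.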

Neuro-dynamic programming provides an algorithmic and conceptual foundation for the type of decision making used in these programs. A scoring function is used to choose controls in complex dynamic systems that arise in a broad variety of applications in communication, control, engineering design, and operations research. The appropriate scoring function is "learned" (iteratively improved) using neural network-like approximation schemes and simulation/evaluation of the system's performance. Ideas of this type have long been part of the methodology of reinforcement learning. However, they were clarified and streamlined only recently by exploiting a strong connection with the dynamic programming methodology. The talk will give an overview of recent developments and the contexts in which they can be applied.
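
As a rough illustration of "learning" a scoring function from simulation, the sketch below applies temporal-difference learning, TD(0), with a linear approximation architecture to a small random-walk system. The abstract does not say which methods the talk covers, so the choice of TD(0), the random-walk system, the one-hot features, and all names and parameter values here are assumptions made for the example.

```python
import random

N_STATES = 5  # non-terminal states 1..5; states 0 and 6 are terminal


def features(state: int) -> list[float]:
    """One-hot features (an assumption; any feature map could be used)."""
    return [1.0 if i == state - 1 else 0.0 for i in range(N_STATES)]


def value(weights: list[float], state: int) -> float:
    """Approximate score of a state: linear in the features, 0 at terminals."""
    if state == 0 or state == N_STATES + 1:
        return 0.0
    return sum(w * x for w, x in zip(weights, features(state)))


def train(episodes: int = 5000, step: float = 0.02, seed: int = 0) -> list[float]:
    """Improve the weights iteratively from simulated trajectories."""
    rng = random.Random(seed)
    weights = [0.0] * N_STATES
    for _ in range(episodes):
        state = (N_STATES + 1) // 2  # start each simulation in the middle
        while 0 < state < N_STATES + 1:
            next_state = state + rng.choice((-1, 1))
            # Reward 1 only when the walk exits on the right.
            reward = 1.0 if next_state == N_STATES + 1 else 0.0
            # Temporal difference: one-step target minus current estimate.
            td_error = reward + value(weights, next_state) - value(weights, state)
            x = features(state)
            weights = [w + step * td_error * xi for w, xi in zip(weights, x)]
            state = next_state
    return weights


weights = train()
for s in range(1, N_STATES + 1):
    # The exact probability of exiting on the right from state s is s / 6.
    print(s, round(value(weights, s), 2), round(s / (N_STATES + 1), 2))
```

The learned scores can be compared against the exact values s / 6 for this toy system; in realistic problems no such exact values are available, which is why simulation-based approximation is used in the first place.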


URL of this page: http://www-eecs.mit.edu/AY95-96/events/35.html
Created: Mar 18, 1996  | Modified: Jun 25, 1997
This announcement is from the MIT EECS 1995-96 archive.