MIT Department of Electrical Engineering & Computer Science
Neuro-Dynamic Programming
Dimitri P. Bertsekas
MIT, EECS and LIDS
Monday, April 1, 1996
4:00 PM (3:45 refreshments)
Edgerton Hall, Room 34-101
EECS Colloquium
Abstract
Deep Blue, a chess program, recently defeated the world
champion, Garry Kasparov, in the first game of a six-game match.
TD-Gammon, a backgammon program, was given the game rules, and after
training through self-play for several months almost beat a world
champion in a long match. These programs are based on the
principle of evaluating positions by means of a scoring function and
of selecting a move that leads to the position with the best score.
Hardware speed was a key factor in the success of the chess program,
but algorithmic sophistication in automatically constructing the
scoring function was the key factor in the success of the backgammon
program.
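The move-selection principle described above can be sketched in a few lines. This is only an illustrative one-step lookahead, not the actual logic of either program; the names select_move, legal_moves, apply_move, and score are hypothetical, and the toy "game" at the end is invented for the example.

```python
def select_move(state, legal_moves, apply_move, score):
    """Return the legal move whose resulting position has the best score.

    legal_moves(state) -> iterable of moves (hypothetical interface)
    apply_move(state, move) -> successor position
    score(position) -> number, higher is better
    """
    moves = list(legal_moves(state))
    if not moves:
        raise ValueError("no legal moves in this position")
    # One-step lookahead: score every successor position, keep the best.
    return max(moves, key=lambda m: score(apply_move(state, m)))


# Toy example (assumed, not from the talk): a position is an integer,
# a move adds to it, and the score prefers positions close to 10.
best = select_move(
    7,
    legal_moves=lambda s: [1, 2, 3, 4],
    apply_move=lambda s, m: s + m,
    score=lambda s: -abs(10 - s),
)
print(best)  # 3, since 7 + 3 = 10 has the highest score
```

In a chess program the same loop would typically sit at the leaves of a deeper search, while the quality of score is what the backgammon example emphasizes.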
Neuro-dynamic programming provides an algorithmic and conceptual
foundation for the type of decision making used in these programs. A
scoring function is used to choose controls in complex dynamic
systems that arise in a broad variety of applications in
communication, control, engineering design, and operations research.
The appropriate scoring function is "learned" (iteratively improved)
using neural network-like approximation schemes and
simulation/evaluation of the system's performance. Ideas of this type
have been part of the methodology of reinforcement learning for a long
time. However, these ideas were clarified and streamlined only
recently by exploiting a strong connection with the dynamic
programming methodology. The talk will overview recent developments
and the contexts in which they can be applied.
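As a rough illustration of "learning" a scoring function from simulation, the sketch below applies temporal-difference learning, TD(0), with a linear approximator to a small random walk. It is a minimal example under stated assumptions, not the method presented in the talk: the five-state walk, the two features, the step size, and the function names are all chosen here for illustration.

```python
import random


def features(state, n_states):
    """Assumed feature vector: a bias term and the normalized position."""
    return [1.0, state / (n_states + 1)]


def value(weights, state, n_states):
    """Linear scoring function: inner product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state, n_states)))


def train_td0(n_states=5, episodes=5000, step=0.02, seed=0):
    """Simulate a symmetric random walk on states 1..n_states.

    Stepping off the right end pays reward 1, off the left end pays 0,
    so the true value of state s is s / (n_states + 1).
    """
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start each episode in the middle
        while 1 <= state <= n_states:
            nxt = state + rng.choice((-1, 1))
            terminal = nxt < 1 or nxt > n_states
            reward = 1.0 if nxt == n_states + 1 else 0.0
            # Target: observed reward plus the current estimate of the
            # successor (terminal positions are worth 0 by definition).
            target = reward + (0.0 if terminal else value(weights, nxt, n_states))
            error = target - value(weights, state, n_states)
            phi = features(state, n_states)
            weights = [w + step * error * f for w, f in zip(weights, phi)]
            state = nxt
    return weights


weights = train_td0()
print([round(value(weights, s, 5), 2) for s in range(1, 6)])
```

The learned values should increase from left to right and sit near 0.5 for the middle state; a neural network would replace the two hand-picked features in a larger problem.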
URL of this page:
http://www-eecs.mit.edu/AY95-96/events/35.html
Created: Mar 18, 1996 | Modified: Jun 25, 1997
This announcement is from the MIT EECS 1995-96 archive.