MACHINE PREDICTS HUMAN BEHAVIOR IN VIDEO
Most of us can predict what will happen just after we see two people meet: a handshake, a punch, a hug, or a kiss. We’ve honed this ability through decades of experience in dealing with people. Our ‘intuition’ is thoroughly trained.
A machine, no matter how competently programmed, has trouble evaluating such complex information.
If computers could predict human action reliably, though, they would open up a host of possibilities. We might wear devices that suggest responses to different situations. Emergency response systems might predict breakdowns or security breaches. Robots might better understand how to move and act among humans.
In June, MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) announced a huge breakthrough in the field. Researchers there developed an algorithm for what they call ‘predictive vision’. It can predict human behavior much more accurately than anything that came before.
The system was trained with YouTube videos and TV shows, including The Office and Desperate Housewives. It can predict when two characters will shake hands, hug, kiss, or ‘high five’. It also predicts what objects will appear in a video five seconds later.
Previous approaches to ‘predictive vision’ have followed one of two patterns. One is to examine the pixels in an image. From this data, the machine tries to construct a future image, pixel by pixel. MIT’s lead researcher in this project calls this process “difficult for a professional painter, much less an algorithm”.
The second approach is for humans to label images for the computers in advance. This is practical only on a very small scale.
MIT’s CSAIL team instead offered the machine “visual representations”. These were freeze-frame alternate versions of how a scene might appear. “Rather than saying that one pixel is blue, the next one is red… visual representations reveal information about the larger picture, such as a certain collection of pixels that represents a human face”, the lead researcher said.
CSAIL uses ‘neural networks’, which scan massive amounts of data and find patterns in it on their own.
CSAIL trained its algorithm with more than 600 hours of unlabeled video. Afterward, the team tested it on new video featuring objects and human action.
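To see why unlabeled video can serve as training data, consider the rough sketch below. The frame that appears a few moments later acts as the "answer" for the frame that came before, so no human labeling is needed. Everything here is an assumption made for illustration: the function names, the two-number "representations", and the nearest-neighbour lookup, which is only a toy stand-in for a trained neural network. It is not CSAIL's actual method.

```python
import math


def make_training_pairs(representations, gap):
    """Pair each frame's representation with the one `gap` frames later.

    No human labels are needed: the later frame acts as the target.
    """
    return [
        (representations[i], representations[i + gap])
        for i in range(len(representations) - gap)
    ]


def predict_future(pairs, current):
    """Return the future representation paired with the closest known frame.

    Nearest-neighbour lookup is a toy stand-in for a trained neural network.
    """
    _, future = min(pairs, key=lambda pair: math.dist(pair[0], current))
    return future


# Hypothetical two-number "representations" for six consecutive frames.
video = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 2.0), (5.0, 3.0)]

pairs = make_training_pairs(video, gap=2)
prediction = predict_future(pairs, (1.1, 0.1))
print(prediction)
```

A real system would learn a far richer mapping from hundreds of hours of footage, but the source of supervision is the same: the video's own future.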
Though CSAIL’s algorithm was not as accurate as humans at predicting human behavior, it is a huge advance over what came before. It may soon outperform humans. If it does, its impact on our lives could be revolutionary.
(Editor’s note: machine learning is a branch of artificial intelligence. The enclosed image is the cast of ‘The Big Bang Theory’.)