
July/August 2011
 


Science
by Pat Murphy & Paul Doherty

PATTERN RECOGNITION, RANDOMNESS, AND ROSHAMBO

Pat is sixty rounds into a contest with a computer. Right now, Pat has 20 wins, the computer has 20 wins, and the two have tied 20 times. If there were a watching crowd, they'd be on the edge of their seats.

The game is roshambo, also known as rock-paper-scissors. Yes, this is the classic schoolyard hand game: you hold out rock (a fist) or paper (an open hand) or scissors (index and middle finger extended). Rock breaks scissors; scissors cut paper; paper wraps rock.

Pat's competition in roshambo, following in the wake of the recent Jeopardy! win by a computer named Watson, has us thinking about that staple of science fiction: the sentient computer. More specifically, we've been thinking about computers, language, pattern recognition, and randomness.

HAL AND HIS PALS

As a reader of science fiction, you can probably think of a few sentient computers without breaking a sweat. Way back in 1946, Murray Leinster wrote about one in a short story titled "A Logic Named Joe." Since then, there have been many others, including Mike in Heinlein's The Moon Is a Harsh Mistress, HAL 9000 in 2001: A Space Odyssey, and the ship's computer in Star Trek. Today, fictional artificial intelligences (be they evil, misguided, or benign) are common.

Generally, all these fictional entities can meet the Turing test, proposed as a test of machine intelligence by Alan Turing back in 1950. A machine passes the test if a human judge can have a conversation with a machine and a human—and can't tell which is human. Basically, the machine has to converse in natural language, a complicated task that takes humans years to learn.

Computers aren't quite there yet. But researchers in computational linguistics, a field that combines computer science and linguistics, are bringing them a lot closer. Consider, for example, Watson.

Four years ago, researchers at IBM set out to make a "question answering machine" that understands questions posed in natural language and answers them. The result of their work is Watson, the computer that bested two Jeopardy! champions at their own game.

The team at IBM doesn't say that their creation actually thinks. Their goal was to build a system that can find answers in "unstructured data," a category that includes texts of all kinds. Reference books, novels, encyclopedias, plays, blogs, textbooks, restaurant menus, dictionaries, fanzines, technical papers, Web pages—they're all unstructured data.

For the past four years, IBM researchers have been shoveling mountains of unstructured data into the Watson system. According to the folks at IBM, Watson can hold the equivalent of one million books' worth of information. (Yes, we know that's not a rigorous measure. This is not a rigorous column.)

Computers have never been good at finding answers in unstructured data. Sure, they can search the data for key words and refer you to all the places where those words are in close proximity. But as any user of Google knows, the results of a key word search are often nowhere near an actual answer.

Watson's abilities in Jeopardy! reveal a facility with natural language that's pretty amazing. After all, Jeopardy! answers not only require encyclopedic knowledge—they often involve puns and wordplay. (Consider the answer: "Alex Comfort's romantic how-to guide for Johnny Rotten & Sid Vicious's band." The question was: "What is The Joy of Sex Pistols?")

Ask Watson a Jeopardy! answer in search of a question and more than 100 algorithms come into play simultaneously. An algorithm is a step-by-step formula for solving a particular problem. A recipe for a cake can be considered a kind of algorithm. The recipe, like other algorithms, has a series of unambiguous steps and a clear end point.

All Watson's algorithms work with the information in the computer's database, generating answers (in the form of a question since Watson is playing Jeopardy!). Then another set of algorithms comes along to rank these answers, determining which one is most likely to be right. Finally, one more set of algorithms determines whether the top ranking answer is ranked high enough to risk a bet. If the answer with the best ranking isn't rated high enough, Watson won't push that buzzer and risk losing money.
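
The three stages just described (generate candidate answers, rank them, decide whether to buzz) can be sketched in a few lines. This is a toy illustration only, not IBM's code: the candidate answers, the confidence scores, and the 0.5 threshold are all made up for the example, and the function names are ours.

```python
from typing import Optional

# A candidate is (answer text, confidence between 0 and 1).
# The scores here are hypothetical; in a real system many
# algorithms would contribute to them.
Candidate = tuple[str, float]


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidate answers from most to least confident."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def decide_to_buzz(candidates: list[Candidate], threshold: float = 0.5) -> Optional[str]:
    """Return the top answer if its confidence clears the threshold, else None."""
    if not candidates:
        return None
    best_answer, best_score = rank(candidates)[0]
    # Staying silent (None) is the safe choice when confidence is low.
    return best_answer if best_score >= threshold else None
```

Raising the threshold makes the player more cautious: it buzzes less often but is wrong less often when it does.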

In its competition with Jeopardy! champions Ken Jennings and Brad Rutter, Watson emerged victorious, earning $77,147, versus $24,000 for Jennings and $21,600 for Rutter. Does that make Watson intelligent? Hard to say. Depends on your definition of intelligence, but however you look at it, Watson is a major step toward the creation of a system that can actually provide answers to questions—and maybe chat with the questioner about how much conviction the system has that the answer is correct.

PATTERNS OF DECEPTION

The team that created Watson focused on dealing with natural language and finding answers in unstructured information. To put it in high-minded terms, they were searching for truth.

Another group of researchers comes at natural language from a different angle. Their efforts involve analyzing unstructured information (mostly in the form of email) and looking for lies.

Computational linguistics research into deception got a boost in 2003 from the Enron scandal. If you don't remember the story, here's a Cliff's Notes version. Enron, an energy company run by a bunch of guys who thought they were really smart, used accounting tricks and loopholes to hide billions in debt from failed deals and projects. The result was stockholder losses of 11 billion dollars, the largest bankruptcy ever up to that point, and the biggest audit failure. The government case against Enron involved a collection of more than five million email messages. In 2003, the Federal Energy Regulatory Commission made that email database available.

For computational linguists, this was a bonanza. Before Enron, no large database of email was available. After all, nobody wanted researchers snooping in their private stuff. Working with the Enron emails, researchers found it was possible to figure out a great deal about what a group was up to by analyzing patterns of emailing and word usage—even if you never actually read the mail itself. When people lie, they are thinking about both the real events and the lying version. The language they use reveals the tension between those different versions and doesn't follow the writer's normal patterns.

Research into the Enron database has led to the development of software that sorts through email, looking for possible lies. At their most basic, these e-discovery programs search for relevant words in association with each other. But that's just the beginning. More sophisticated programs seek other patterns. They look at who usually communicates with whom, how often, and by what channels.

Having found those patterns, the software looks for breaks in those patterns, so-called "digital anomalies" that may indicate that something funny is going on. A change in communication mode can be a red flag—if two people who always communicate on email decide to meet in person, maybe they have something to hide. When someone who always writes in a breezy style (yes, the software can identify that) shifts to very formal language, maybe something fishy is going on. When a sensitive document is edited an unusual number of times by an unusual number of people, maybe something is up.
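
To give a feel for what "looking for breaks in a pattern" might mean in practice, here is a deliberately simple sketch. It assumes you already have a weekly count of messages between two people and flags any week that strays far from their usual rate. Real e-discovery software is certainly more elaborate; the function name, the sample data, and the cutoff of two standard deviations are our assumptions, not anything drawn from an actual product.

```python
from statistics import mean, pstdev


def flag_anomalies(weekly_counts, threshold=2.0):
    """Return indexes of weeks whose message count is far from the pair's average."""
    avg = mean(weekly_counts)
    spread = pstdev(weekly_counts)
    if spread == 0:
        return []  # perfectly steady pattern, nothing stands out
    # Flag weeks more than `threshold` standard deviations from the average.
    return [i for i, n in enumerate(weekly_counts) if abs(n - avg) / spread > threshold]


# Hypothetical data: two colleagues who email about ten times a week,
# then go silent in week 8.
counts = [10, 11, 9, 10, 10, 11, 9, 10, 0, 10]
print(flag_anomalies(counts))
```

A flag like this says only that something changed. Whether the change is fishy or the pair simply went on vacation is a question for a human.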

By observing patterns, the software pinpoints possible areas of deception. The resulting insights can bear a startling resemblance to those of Sherlock Holmes, Columbo, House, or Monk (depending on your reading and viewing habits).

NO ONE KNOWS WHAT'S NEXT

That brings us to an ability that Pat and Paul think is just as much a human characteristic as natural language (and a lot more important to winning at roshambo). That trait is pattern recognition.

People and many other animals are really good at finding patterns. It's a survival trait. Recognizing patterns helps you figure out what is likely to happen next—and that can be very useful. If a rat eats a food that makes him sick, he avoids anything that tastes like that food in the future. If you notice that one road home from work is always jammed at rush hour, you'll avoid that route. You look for patterns and use the patterns you find in your decision-making. And that brings us back to roshambo.

Winning at roshambo is all about figuring out what your opponent is going to do next. Suppose you are playing a match where the victor is whoever wins 3 out of 5. Before you throw the first hand-sign, you think about what your opponent is likely to throw. Is this someone who's going to throw a rock? Or is this more of a paper kind of guy? (Just for reference, Douglas Walker of the World RPS Society has been known to offer a guide to Rock-Paper-Scissors personality types: Muhammad Ali, rock; Mohandas Gandhi, paper; Leonardo da Vinci, scissors.)

Once the match is underway, you look for patterns in your opponent's play. If you see his pattern, you can figure out what he will throw next. Does he favor rock over scissors? Does he tend to throw a sign that would beat what you threw on the previous turn? At the same time you must be aware of your own patterns—and try to be unpredictable.

Now let's consider Pat's opponent at Rock-Paper-Scissors. The computer player is located at the New York Times website (http://www.nytimes.com/interactive/science/rock-paper-scissors.html). You can choose to face a novice computer, who has no memory of previous games. Or you can choose to battle a veteran computer who has memories of 200,000 games. Pat, being an overachiever from way back, chose to battle the veteran.

Roshambo theorists (yes, they're out there) note that it is advantageous to recognize and exploit the nonrandom behavior of an opponent. The veteran computer knows human patterns from those 200,000 past games. So the path to triumph seems simple: Be random.
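
One simple way a program might exploit nonrandom play is to keep track of what an opponent tends to throw after each of their own moves, then play whatever beats the most likely next throw. The sketch below assumes that approach. It is not the method used by the New York Times program, whose inner workings we don't know, and the names BEATS and FrequencyPredictor are invented here.

```python
from collections import Counter, defaultdict

# BEATS[x] is the move that defeats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class FrequencyPredictor:
    """Guess the opponent's next throw from what has followed their last throw."""

    def __init__(self):
        # Maps a previous throw to a tally of the throws that came right after it.
        self.followups = defaultdict(Counter)
        self.last = None

    def observe(self, throw):
        """Record one throw by the opponent."""
        if self.last is not None:
            self.followups[self.last][throw] += 1
        self.last = throw

    def counter_move(self):
        """Return the move that beats the predicted throw, or None if there is no data."""
        seen = self.followups.get(self.last)
        if not seen:
            return None
        predicted = seen.most_common(1)[0][0]
        return BEATS[predicted]
```

Feed it an opponent who alternates rock and paper and it catches on quickly. Feed it throws that really are random and the tallies even out, leaving it nothing to exploit.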

It turns out that's not as easy as you might think.

HEADS OR TAILS

There are several things mathematicians look for in a random sequence of numbers. The most important is the lack of an obvious pattern. In addition, mathematicians often impose the requirement that the numbers must appear an equal number of times over a very long sequence.

People are terrible at creating a series of numbers without a pattern. They tend to balance the numbers evenly over far too short a stretch.

To see what we mean, try this. Using just zero and one, write down a sequence of 128 numbers, trying to be random. Now make a second sequence of zeros and ones by flipping a coin 128 times. A head is one and a tail is zero.

Compare the sequences. Chances are that long strings of the same digit are more frequent in the sequence generated by flipping a coin. People usually won't write down strings of the same digit long enough to match the random pattern created by the coin. A computer predicting what will come next in your sequence can gain an edge by betting that, after three ones in a row, a human is more likely to write a zero than random chance would dictate.
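
If you don't feel like flipping a coin 128 times, a few lines of code can do the comparison for you. This sketch measures the longest streak of identical digits in a sequence and averages that over many simulated runs of 128 fair flips. The function names and the number of trials are our choices, and Python's built-in generator is pseudorandom rather than truly random, which is good enough for this purpose. In simulation the longest streak tends to land around seven, longer than most people are willing to write by hand.

```python
import random


def longest_run(bits):
    """Length of the longest streak of identical consecutive values."""
    best = current = 0
    previous = None
    for b in bits:
        # Extend the streak if this value matches the last one, else start over.
        current = current + 1 if b == previous else 1
        best = max(best, current)
        previous = b
    return best


def average_longest_run(n_flips=128, trials=2000, seed=0):
    """Average longest streak over many simulated sequences of fair coin flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = [rng.randint(0, 1) for _ in range(n_flips)]
        total += longest_run(flips)
    return total / trials
```

To test yourself, type your own 128 digits into a list, pass it to longest_run, and compare the result with average_longest_run().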

To help people understand randomness, Paul does the following demonstration whenever he is working with a large audience. Let's say there are 256 people. He has them all stand up, and then asks each person to take out a coin and flip it. Anyone who gets a tail has to sit down. Then everyone who is still standing flips the coin again.

After one flip, 128 people are left standing, on average. After two flips, 64 people are standing. After five flips, 8 people remain standing.
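
Those numbers are averages: each flip cuts the standing group in half, so after k flips you expect 256 divided by 2 to the power k. A real audience will wobble around those figures. The sketch below computes the expected counts and also runs the demonstration once with simulated coins. The function names are ours, and the simulated coin is Python's pseudorandom generator.

```python
import random


def expected_standing(people, flips):
    """Expected number still standing: each flip halves the group on average."""
    return people / 2 ** flips


def simulate_standing(people=256, flips=5, seed=None):
    """Run the demonstration once; return the count standing after each flip."""
    rng = random.Random(seed)
    counts = []
    standing = people
    for _ in range(flips):
        # Each standing person flips; only those who get heads stay up.
        standing = sum(1 for _ in range(standing) if rng.random() < 0.5)
        counts.append(standing)
    return counts


print([expected_standing(256, k) for k in range(1, 6)])
print(simulate_standing())
```

Run it a few times and the second line will differ from the first by a handful of people each round, just as a live audience does.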

At this point, Paul asks the audience what will happen on the next flip. Some will say that since these 8 coins have come up heads five times in a row then "by the law of randomness" they are due for a tail.

Not so. An important property of a random number sequence is that the next number in a sequence does not depend on the numbers that came before it. Each flip of the coin or roll of the dice is independent of the ones that went before. The coins do NOT remember the past and are equally likely to come up heads or tails on EVERY flip.
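
Skeptics can check this with a simulation. The sketch below flips a simulated fair coin many times, picks out every flip that comes right after at least five heads in a row, and reports what fraction of those flips came up heads. If coins were ever "due" for a tail, that fraction would fall below one half. The function name and the number of flips are our choices, and the coin is Python's pseudorandom generator.

```python
import random


def heads_rate_after_streak(streak=5, n_flips=200_000, seed=0):
    """Fraction of heads among flips that follow at least `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0          # current count of consecutive heads
    followups = 0    # flips that came right after a long enough streak
    heads_after = 0  # how many of those flips were heads
    for _ in range(n_flips):
        flip = rng.random() < 0.5  # True means heads
        if run >= streak:
            followups += 1
            heads_after += flip
        run = run + 1 if flip else 0
    return heads_after / followups


print(heads_rate_after_streak())
```

The result hovers close to 0.5, which is what independence predicts.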

One way to create a random number sequence is with a true random number generator based on random physical events. Historically, coin flips or dice rolls have been used. More recently, radioactive decay, atmospheric radio noise, videos of lava lamps, and the time in microseconds between mouse moves by a computer operator have been used to create true random sequences of numbers.

Paul recommends using random numbers to play against the computer with no pattern at all. You can generate a random pattern of rock, paper, or scissors by rolling a six-sided die. If you roll 1 or 4, you throw rock. A roll of 2 or 5 means paper. And 3 or 6 means scissors. There is also a free online random number generator service (http://www.random.org/integers/). It allows you to specify the range of integers you want. Set the range to 1 through 3 and it will produce a string of numbers, as long as you want, that you can read as rock, paper, and scissors moves.
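
The die-rolling scheme translates directly into code. This sketch maps a roll of 1 through 6 to a move exactly as described above, and it can also roll the die for you. One caveat: the rolls come from Python's secrets module, which draws on the operating system's randomness source. That is not the same thing as radioactive decay or lava lamps, but it is very hard to predict. The names MOVES, move_from_die, and random_moves are ours.

```python
import secrets

MOVES = ("rock", "paper", "scissors")


def move_from_die(roll):
    """Map a six-sided die roll to a move: 1 or 4 rock, 2 or 5 paper, 3 or 6 scissors."""
    if not isinstance(roll, int) or not 1 <= roll <= 6:
        raise ValueError("roll must be an integer from 1 to 6")
    return MOVES[(roll - 1) % 3]


def random_moves(count):
    """Roll a simulated die `count` times and translate each roll into a move."""
    return [move_from_die(secrets.randbelow(6) + 1) for _ in range(count)]


print(random_moves(10))
```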

PAT VERSUS THE COMPUTER

So what happened to Pat's tied game? It timed out while Pat was writing this article. In that game, Pat was playing just as she would against a human opponent—trying to throw something her opponent would not expect.

For the next game, Pat tried the random approach, using numbers from an online random number generator. After 60 rounds, the score was 17 wins by the random number, 21 ties, and 21 wins by the computer.

Then Pat tried to predict what the computer would throw by thinking about what most people would throw in a particular situation—and throwing something different. At sixty rounds, the result was not significantly different: 18 wins by Pat, 20 ties, and 21 wins by the computer. In a subsequent try, Pat did better. Her best was 25 wins by Pat, 19 ties, and 16 wins by the computer. But she couldn't maintain that—the computer beat her the next time around.
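
How far from chance are those scores? Here is a rough yardstick, assuming every round is an independent one-in-three shot at a win. That is an assumption about purely random play, not a description of how the Times program works. Under it, 60 rounds yield 20 wins on average, give or take about 3.7, so Pat's best score of 25 sits less than one and a half standard deviations above the average. The sketch below computes those figures along with the exact chance of scoring at least that well by luck alone. The function names are ours.

```python
from math import comb, sqrt


def win_stats(rounds=60, p_win=1/3):
    """Mean and standard deviation of wins if each round is an independent 1-in-3 shot."""
    mean = rounds * p_win
    sd = sqrt(rounds * p_win * (1 - p_win))
    return mean, sd


def prob_at_least(wins, rounds=60, p_win=1/3):
    """Exact binomial probability of getting `wins` or more wins by chance alone."""
    return sum(
        comb(rounds, k) * p_win**k * (1 - p_win) ** (rounds - k)
        for k in range(wins, rounds + 1)
    )


mean, sd = win_stats()
print(mean, sd, prob_at_least(25))
```

A score that luck alone produces fairly often is not much evidence of having outwitted the machine.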

So there wasn't a significant difference in score when Pat tried to predict what the computer would throw. But Pat says there was a significant difference in her attitude. Throwing the random numbers was boring. On the other hand, trying to predict the computer was far from dull. In fact, it was kind of like doing mental gymnastics. At times, Pat swore she was in the groove—she knew just what that computer thought she was going to do—and she did something else and won. Such a glorious feeling. And then the run would end.

And when it ended, she would think about pigeons.

Not just any pigeons. She thought about the pigeons used in psychological experiments by noted psychologist B. F. Skinner. Skinner set an automatic food dispenser to drop food pellets to a group of pigeons. The food dispenser was on a completely random schedule. No one, not even the experimenter, knew when the pigeons would be fed.

And here's what happened. Whatever a pigeon was doing just before the food arrived became associated with the food. In an effort to entice more food from the dispenser, the bird did more of whatever it was doing when the pellet came bouncing down the chute. A bird that was bobbing its head started bobbing its head even more. A bird that was pecking pecked more.

Skinner dubbed this "superstitious behavior," a term that should give pause to anyone with a "lucky" shirt or hat. When a run of good throws ended, Pat was aware that she might be acting like a pigeon in a Skinner box. Animals (including people) are so good at finding patterns that sometimes we find patterns that aren't really there.

AND THE MORAL IS

In the end, this column has had the desired effect. Pat has stopped playing Rock-Paper-Scissors with the computer, a pastime that could easily have become an obsession. Why so obsessive? Because searching for and finding patterns is great fun. Not only is it a tool for survival, it's a fundamental part of trying to figure out how the world works.

Seeking and recognizing patterns is fun, and so is messing with and breaking out of patterns, looking beyond the expected and finding something new.

To anyone concerned about world domination by our computer overlords, we say, "Just chill." Thanks to the efforts of human researchers in computational linguistics, we anticipate that computers will get better and better at working with unstructured data, with using natural language, with finding and exploiting patterns. But no matter how good they get at spotting patterns, we think that people will always have them beat when it comes to breaking out of patterns and going beyond them. No matter how good Watson gets at coming up with answers, people will still be better at coming up with questions that stretch the boundaries.

For us meat puppets, the future is in creativity and imagination, in thinking outside the box. Even when computers can think, they can't think out of the box. They are the box.

--------

The Exploratorium is San Francisco's museum of science, art, and human perception—where science and science fiction meet. Paul Doherty works there. Pat Murphy used to work there, but now she works at Klutz (www.klutz.com), a publisher of how-to books for kids. Pat's latest novel is The Wild Girls; her latest nonfiction title is The Klutz Guide to the Galaxy, which comes with a sundial and a telescope that you can put together. To learn more about Pat Murphy's writing, visit her website at www.brazenhussies.net/murphy. For more on Paul Doherty's work and his latest adventures, visit www.exo.net/~pauld.

===THE END===


Copyright © 1998–2014 Fantasy & Science Fiction All Rights Reserved Worldwide
