Thursday, July 31, 2008

Sources of Power

Interesting points from Sources of Power: How People Make Decisions, by Gary Klein.

The author avoids advocating any single approach to decision making; instead, he studies how people actually make decisions in various circumstances. A working knowledge of decision making in the field is at least one takeaway from the book. Beyond that, the author's analysis is based entirely on rigorous research of real people in real circumstances, and he simply presents his results, leaving it to the reader to draw any strong inferences or lessons from them.

I've only read the first half of the book so far. The book begins by dispelling the common belief that most decision making is based on comparative analysis. In reality, most decisions, including important ones (the examples cited come from the military, firefighters, ICU nurses, etc.), are Recognition-Primed Decisions (RPD). These decisions are essentially sequential in nature: the decision maker considers a list of options one at a time and picks the first one that seems workable. In fact, most such decision makers believe that they are not actually making a decision at all, while outside observers get the impression that a "decision" (in the comparative-analysis sense) was made.

According to the author's survey of a few hundred important decisions taken by individuals in serious and/or time-pressured circumstances, about 80% fall into the RPD category. There are several scenarios where RPD is not the primary decision process. RPD is based on recognition, for which subject expertise is an essential requirement, so novices making decisions tend to rely more on comparative analysis than on RPD. Similarly, when a decision needs to be justified, e.g., to a higher authority, comparative-analysis-based decisions are more common. The book has several other interesting statistics on how the contribution of different decision-making approaches varies by scenario.

Once the prevalence of RPD is established, the book delves into the details of how RPD works and what its components are. Probably the most significant model the book introduces is that of simulation. Simulation is useful in two scenarios:
  1. In the scenario-recognition phase, where the available data is matched with past experience to decide which scenario we are in. Simulation helps by arranging and explaining a sequence of events such that the end result is the current scenario. This knowledge of the past, or what brought us here, is often critical in deciding where we are headed in the future. The author also introduces the notion of "expectancies": once a simulation has been built to explain our current position, the decision maker can generate a set of expectancies and watch for them in the future. If certain expectancies do not materialize, or certain unexpected things keep happening, that is a good reason to worry that the simulation (and hence our perception of the current scenario and the future) may be incorrect. The author amply illustrates this point with the examples of the Libyan passenger airliner whose pilots failed to verify their expectancies, and the firefighter who sensed that something was not right and pulled his team out of a house just before the entire first floor crashed into the basement (he had not been told that the house had a basement).
  2. Simulation is also fundamental in predicting and analyzing the effect of a decision. The ability to run useful and realistic simulations can be very valuable in making important decisions. Future simulation, like other aspects of RPD (such as scenario recognition), requires a good deal of experience. In many cases, though, creative ideas can play an important role: coming up with a creative insight into a problem and being able to simulate its effect can be very valuable in solving difficult problems. The author gives the examples of the newborn baby with a blocked throat and the firefighter who chose to make a rescue by removing a car's roof.
The second half of the chapter on simulation is very interesting. Here the author makes some summary observations drawn from different experiments. One of the topics covered is expertise: who is an expert, and how do experts use and leverage simulation? Here are some points:
  • Experts constantly improve themselves by running simulations, whether mental or aided by paper, computers, etc.
  • Pre-mortem (crystal ball). The author defines a pre-mortem as the process of assuming that something has failed, even before actually doing it, and then trying to figure out how the failure might have happened. In the author's experience, a pre-mortem often generates new and interesting insight into the problem. It's also called the crystal-ball approach, since a powerful analogy is looking into the future through a crystal ball and finding that something has failed.
  • Post-mortem. The author uses this concept in a more general sense: doing a post-mortem of a task after completing it but before the results are known, and then again after the results are out.
  • An important trait of experts is the guts to doubt the data. This is highlighted by numerous examples throughout the book. The data available for making decisions may not always be accurate or timely. In such scenarios, the decision maker has to leverage past experience not only for decision making but also for weeding out inconsistencies in the data, or doubting the data outright. In some ways, this is a higher form of expertise. There are several examples of this, e.g., the firefighter doubting his read of the fire and pulling out his entire team before the floor collapsed into the basement, or the naval ship's crew failing to doubt the report that a passenger airliner was emitting military frequencies, which led them to believe it was a military aircraft.
The Vincennes shootdown is one of the classic examples of decision making going wrong in a complex scenario. The incident is a combination of multiple smaller failures and provides valuable insight into decision making. For one, half of the ship's crew had already made up their minds that the object they were tracking was really a hostile military aircraft, and they were looking for reasons to justify that belief. This shows how individuals often make gut decisions subconsciously and then attempt to justify them by contorting the evidence. Secondly, the crew did not doubt inconsistent and unlikely data; most importantly, the military frequency attributed to the passenger airliner was due to an error in placing the instrument, and it was not double-checked. Thirdly, it shows that under severe time pressure, better decision-support systems, e.g., better visualization and analysis tools, are needed in order to avoid making wrong decisions.

Wednesday, July 09, 2008

Moments of genius

Dixie [sitting in the front passenger seat of SS's car, talking to Rai in the back seat; the seat next to Rai is empty]: You know, many other people exist in higher dimensions, like the 11th dimension. For example, someone could be sitting right next to you at this very moment. [Looking at the empty seat next to Rai] Hello hi.

Dixie: Maybe you will run so fast, that you will walk on water.

Dixie: Let's discuss some new topics, like keeping quiet.

Dixie: You want to get high... eventually.

Dixie: You are wasting my _valuable_ time.

Pande: If you fly perpendicular to time, you'll find Floyd somewhere.

Dixie: We have three options, and the fourth option is ...

SS: Beauty lies in the eye of the upholder.

Dixie: What should I eat? Should I eat a heart attack, or blood pressure?
SS: Eat liver disease.

Dixie: Use your muscles... dead.

Rai: Dixie equals Airport Blvd equals Dixie.

Rai: Where is whose car?

Saturday, July 05, 2008

NLP based search

I am getting increasingly interested in Natural Language Processing (NLP) these days. NLP can enable better human-computer interfaces, more powerful search engines, etc. One of the search startups in this area that I have been following is www.powerset.com, which was recently acquired by Microsoft. A good source to learn about Powerset, including a rough technical overview, is http://www.slate.com/id/2193837/.

Powerset's NLP technology breaks a sentence into smaller entities (nouns, verbs, adjectives, etc.) and establishes relationships between them, e.g., "eiffel tower was built in 1889" gets recorded as "eiffel tower" (noun), "built" (verb), "1889" (noun). Each such relationship (called a "fact") is recorded and comprises a single quantum of information derived from the web page. A search query is translated into a similar but incomplete fact, e.g., "when was eiffel tower constructed?" would become "eiffel tower" (noun), "constructed" (verb), and "when/year/time/date" (noun/adjective). The search algorithm then matches the "factualized" query to the closest resembling fact and fills in the missing details (the year 1889 in this case).
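The idea above can be sketched as simple (subject, relation, object) triples. To be clear, this is only a toy illustration of the concept, not Powerset's actual technology: the facts, the synonym table, and the matching function are all invented here, and a real system would extract triples with a full parser and normalize relations with a proper lexicon rather than a hand-written dictionary.

```python
# Facts extracted from content, as (subject, relation, object) triples.
facts = [
    ("eiffel tower", "build", "1889"),
    ("statue of liberty", "dedicate", "1886"),
]

# Tiny hand-written synonym table so "constructed" matches "built".
# A real system would normalize verbs with a lexical resource instead.
RELATION_SYNONYMS = {
    "built": "build",
    "build": "build",
    "constructed": "build",
    "construct": "build",
    "dedicated": "dedicate",
}

def answer(subject, relation):
    """Match an incomplete query triple against the stored facts
    and fill in the missing object slot."""
    rel = RELATION_SYNONYMS.get(relation, relation)
    for s, r, o in facts:
        if s == subject and r == rel:
            return o
    return None

print(answer("eiffel tower", "constructed"))  # -> 1889
```

So the query "when was eiffel tower constructed?" becomes the incomplete triple ("eiffel tower", "constructed", ?), and the match fills in "1889" even though the source sentence used the word "built".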

The cool thing about converting both content and queries to facts is that the search engine can identify and return relationships not explicitly stated in the content, unlike keyword-based search. However, the answers to most popular queries are explicitly stated in a single sentence somewhere on the web, so NLP seems less useful for searching popular content, where Google search already does a pretty good job.

The real promise of NLP-based search, however, seems to be the "long tail" of search: queries that are not explicitly answered on any single web page. As the web continues to grow and many different kinds of content come online (blogs, books, emails, etc.), the long tail will continue to increase its share of the total search volume. Most of us have experienced that unpopular searches often are not answered explicitly on any single web page; instead they require the user to scan multiple pages before finding what they want. Keyword-based search cannot make things better here, since the keywords may either be spread out across web pages or simply be absent (e.g., "dog" and "tommy" can be related if Tommy is the name of a dog, a fact that keyword-based search cannot discover). This is where NLP can really make a difference: it can identify facts from across web pages and save users the valuable time spent scanning different pages trying to forge an answer to their query.
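The "tommy"/"dog" point can be illustrated with the same toy triple representation: a fact from one page ("Tommy is a dog") joined with a fact from another ("dogs live 10-13 years") answers a query that neither page answers alone. The pages, facts, and one-hop join below are all invented for illustration; they are not how any real engine works.

```python
# Facts extracted from two different (hypothetical) pages.
page1_facts = [("tommy", "is_a", "dog")]            # e.g., a personal blog
page2_facts = [("dog", "lifespan", "10-13 years")]  # e.g., an encyclopedia

all_facts = page1_facts + page2_facts

def lookup(subject, relation, facts):
    """Return all objects matching a (subject, relation, ?) pattern."""
    return [o for s, r, o in facts if s == subject and r == relation]

def answer(subject, relation):
    # Try a direct match first.
    direct = lookup(subject, relation, all_facts)
    if direct:
        return direct[0]
    # Otherwise follow one "is_a" hop: tommy -> dog -> lifespan.
    for category in lookup(subject, "is_a", all_facts):
        hop = lookup(category, relation, all_facts)
        if hop:
            return hop[0]
    return None

print(answer("tommy", "lifespan"))  # -> 10-13 years
```

A keyword search for "tommy lifespan" would find neither page, since neither page contains both words; the join over extracted facts is what bridges them.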

So, very roughly speaking, if you can find the answer to your query with one Google search and after scanning 1-2 of the returned web pages, then NLP will not make things any better for you. If it takes more than one search and visits to 5+ search results to answer a query, and if the query and its potential response can be formed into a fact, then NLP might be useful.

Another analogy for the applicability of NLP may be the information density of a web page. NLP will be more useful for finding content in web pages with low information density. By converting text to facts, NLP in a way performs a "semantic compression" of the contents. These "NLP-compressed" facts, owing to their increased information density, are better suited to answering user queries. Low-information-density web pages may be those with lower PageRank on Google; other examples of low-density content might be casual chat sessions, email threads, etc.

Unfortunately, in my experience Powerset doesn't do a good job of identifying complex facts. It does a decent job with obvious or simple facts but, based on some examples I tried, not so well with complex ones. For example, if you search for "who was the author of the godfather", you get the answer "Mario Puzo". But Google also gives you the same answer fairly easily when you search for "Godfather author" or "Godfather writer". If you query "how many years did Mario Puzo take to write the godfather", however, Powerset doesn't seem to offer any useful results.

Also, I wonder if their algorithm can really connect information across different websites, or even across different paragraphs in the same web page (the latter should be there, I think).

I'd conclude that for NLP search to be really useful, it should target the long tail of searches: searches which individually are an insignificant part of the total search volume but together comprise a major chunk. Powerset's NLP search doesn't seem to be there yet, and quite likely neither are other existing NLP-based search engines.