The Numerati, disambiguation, and Watson.
Since the release of Stephen Baker’s book The Numerati, society at large has come to see the ‘Numerati’ as an increasingly important and integrated part of society rather than the inexplicable oddity presented in that book, that oft and easily invoked harbinger of ‘potential risk’. What was once a “cautionary story” is now business as usual, fully integrated across sectors and in even more roles than Baker predicted. Perhaps, in a shift of perspective, the focus today is on the back-end hardware. Where Baker’s early writings (which eventually became his book “The Numerati”) were about how these ideas could play out, which arenas to watch, and how the changes would look to an end user, today he works and writes a great deal on IBM’s Watson, the question-answering computer built with a dream of natural language processing.
Over the last twenty years in computing it has become an increasingly trivial task to search out a string of characters (words). Where Watson advances is in the realm of disambiguation: in parsing the connections between words, and within words, where one character string bears multiple ‘semiotic meanings’. Take “record”: a mark of merit signaling excellence or achievement; to preserve; a vinyl disc that captures sounds in grooves; colloquial twentieth-century terminology for a musical album; the documentation of one’s history; the documentation of life history, as in the fossil record. Several of my examples may appear to be “the same”, but they differ enough that a tool such as Watson requires some ability to seek out and parse each potential meaning, and to correlate it with the data requested in a question or query.
The big change coming down the pipeline is a convergence on the need for more than “deep computing” of the kind embodied by Deep Blue, IBM’s old chess-master-slayer. “Deep Disambiguation” is a departure, or perhaps an evolutionary divergence, from deep computing tools. Disambiguation is a term anyone who has used Wikipedia will be intimately familiar with; it is the means by which a user may separate the many uses of a word, phrase, or concept, and specify which use they want information on. This is, as far as my reading suggests, essentially what Watson does. Watson has been fed millions of pages from the internet, Wikipedia, and other encyclopedias, and the programming team then attempted to give Watson the cues and keyword sensitivity needed to disambiguate any given set of terms. Broadly speaking, this is the “way” a human would approach a Jeopardy clue: first taking the word cues, then considering the multiple meanings of the terms, and lastly considering double meanings by cross-referencing the several meanings of the various words in the clue.
We may not think of it as such today, but one day we will look back on the early days of “Watson” and think: this was a pivotal point in the speeding up of history. There is a word we will remember as marking the ramp-up towards true semantic natural language in computing. The word is disambiguation. It essentially means taking an unordered list of meanings, adding order, and creating a means of determining context from the cues, clues, and conditional clauses: making a term with various meanings less ambiguous, specifying which of the contexts for any given concept is being referred to, and providing specific answers based on the inputs and cues embedded in a given query.
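To make the idea concrete, here is a minimal sketch of disambiguation by word overlap, using the “record” example from earlier. It is emphatically not how Watson works; it is a toy in the spirit of the classic gloss-overlap approach, and the sense names, the one-line glosses, the stopword list, and the function names are all my own assumptions for illustration. Each candidate meaning gets a short description, and the meaning whose description shares the most words with the query wins.

```python
import re

# Hypothetical, hand-written sense inventory for the word "record".
# The glosses loosely paraphrase the meanings listed in the essay.
SENSES = {
    "achievement": "best performance or achievement ever in a sport or contest",
    "preserve": "to preserve or capture sound or events for later",
    "vinyl disc": "vinyl disc that stores sound music in grooves played on a turntable",
    "fossil record": "documentation of life history preserved in fossil rock layers",
}

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "or", "to", "for", "that", "is", "and"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(query, senses=SENSES):
    """Score each sense by how many words its gloss shares with the query.

    Returns the best-scoring sense name and the full score table.
    Ties go to whichever sense appears first in the inventory.
    """
    query_words = tokens(query)
    scores = {name: len(query_words & tokens(gloss)) for name, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for clue in (
        "The DJ dropped the needle on a vinyl record of jazz music",
        "She set a world record in the sport of sprinting",
        "Layers of rock hold a record of ancient life",
    ):
        sense, _ = disambiguate(clue)
        print(f"{clue!r} -> {sense}")
```

A real system would need far richer evidence than shared words (syntax, source reliability, confidence estimates), which is exactly the gap between string search and the kind of parsing described above.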
Insert here the clichéd internet-era meme “joke” that Ken Jennings appended to his written ‘Final Jeopardy’ response on the last day of the competition, following the familiar formula “I for one welcome our new ______ overlords” (where the blank is filled with whatever neat, interesting, or exciting idea or thing is being looked at, often cursorily, as the line is frequently appended to particularly cute cats, giant snails, and various other “non-sentient” things).
In a recent essay for the Harvard Business Review, Baker writes:
And yet I would argue that the blooper was healthy for IBM in the long run because it bolstered the case that Watson is a fallible, sub-human machine—and therefore an unthreatening addition to the workplace.
We can choose today to laugh at Watson for how seemingly foolish some of its answers were (Toronto?????). But what will it look like when we realize that Watson is the tool, the word processor with spell checking, not the author of immeasurable creativity? Watson in the wild will actually have co-workers, people working in concert with it, giving additional cues, perhaps inspired by Watson’s initial suggestions. In a post-Jeopardy world, rather than attempting to “stump” Watson, people will be constantly feeding it new data, providing feedback, and giving hints or suggestions, adding to the reciprocity of information sharing. Imagine a scene right out of House M.D., with Gregory House standing at his characteristic whiteboard, marker in hand, writing SYMPTOMS on the board and hoping his team will give up some of the differential diagnoses they have stored in their minds over decades of reading medical journals and textbooks, and of lived experience. Unfortunately, in real life, doctors are segregated by discipline, and such ‘teams’ of cross-discipline diagnosticians are neither common nor accessible on a scale that might make an impact on treatment and diagnostic outcomes (Fox, if you see [and choose to act on] the obvious R’ilsaw-awesome product placement potential with IBM and Watson here, don’t forget me ;)…
Now, in addition to his crew, put Watson in the room. What if his team could have every page of every one of those books and journals open at the same time (or very nearly simultaneously), along with a connection to their counterparts at hundreds of other hospitals around the globe? Suddenly the potential for a differential diagnosis machine becomes more ‘fleshed out’.
Much of the “fear” surrounding any new announcement of progress towards AI revolves around the question of “robot domination”. I am not making the case that “robot domination” is an impossibility, merely that when thinking about Watson, and the tools that derive from it, this quote from Baker (again from his Harvard Business Review article) is important to consider:
Watson’s offspring will be zealous research assistants, not managers.
Where Watson will evolve from novelty test to tool of progress lies in the uses into which it is already being integrated: energy and efficiency (making choices based on long-understood and highly precise data on the most likely spot for the greatest resource recovery, or the most efficient use of a piece of land), medical uses (diagnosis, patient recovery tracking, symptom tracking, pandemic predictions), and customer service roles. Yes, many of the roles Watson could fill without breaking a sweat would replace low-wage, unskilled jobs held by real people. But those people weren’t “destined” for a customer service role. Just as the “weather savants” in Baker’s writing on Watson were usurped by the invention and spread of the barometer, they will move on: some savants used the new tool to deepen their understanding of weather phenomena, and others moved into entirely unrelated roles. Likewise, the “scribe” diminished and nearly disappeared from society upon the wide adoption of movable-type printing presses, but the mentality of the scribe persists, and people to this day take on such roles. Baker’s version of the weather story runs as follows:
Imagine an Italian town in the 17th century. Perhaps there’s one man who has a special sense for the weather. Let’s call him Luigi. Using his magnificent brain, he picks up on signals—changes in the wind, certain odors, perhaps the flight paths of birds or noises coming from the barn. And he spreads word through the town that rain will be coming in two days, or that a cold front might freeze the crops. Luigi is a valuable member of society.
Along comes a traveling vendor who carries a new instrument invented in 1643 by Evangelista Torricelli. It’s a barometer, and it predicts the weather about as well as Luigi. It’s certainly not as smart as him, if it can be called smart at all. It has no sense of self, is deaf to the animals in the barn, blind to the flight patterns of birds. Yet it comes up with valuable information.
In a world with barometers, Luigi and similar weather savants must find other work for their fabulous minds. Perhaps using the new tool, they can deepen their analysis of weather patterns, keep careful records and then draw conclusions about optimal farming techniques. They might become consultants. Maybe some of them drop out of the weather business altogether. The new tool creates both displacement and economic opportunity. It forces people to reconsider how they use their heads.
There is no “universal law” saying that no one may be a scribe, or a monk translating or copying texts. And if, by chance, someone found their deepest fulfillment in taking on “customer service roles”, there must just as surely be no proscriptive regulations preventing a person from holding such a role; merely the common sense that what may be done easily with a tool is not a particularly stimulating role for a person.
Computers today are like mice at birth: they have brains, and programming, and hard-wiring (to eat, to drink milk, etc.), but they are blind, with limited sensory information beyond scent and perhaps the associated sense of taste. Newborn mice possess a tiny fraction of the information that makes adult mice a challenge even for the nimble, fast, cunning felines that may desire to eat them. When we consider what a difference it makes to have full access to senses such as sight, hearing, and touch, we may start to tally how much of the information that humans interpret from moment to moment is “encoded” not at the purely textual, phonemic, syntactic, or even semiotic level. Some things are more subtle, and there is currently no disambiguation tool available for them. A “raised eyebrow” can mean a great many things, even in two contexts that are not obviously different, and telling them apart means reading a complex subtext of individuality, personal history, opinion, mood, and relationship to the interlocutor, mixed with physical observations; and so we have something that a computer cannot even SEE, not to mention interpret. Another example would be “security” (theatre) computing, which various airports already use to analyze facial movements and eye patterns in attempts to find “outliers”; but such systems cannot smell the person. Are they sweating profusely? Do they smell of alcohol?
One of the most interesting of Baker’s recent blog posts relates to the increasing ubiquity of sensory apparatus in the many spaces we humans inhabit. The post is about baseball, and how new sensors, combined with a radically increasing number and variety of statistical measurements, are beginning to alter the game, who is selected by big teams, and the way baseball is measured. For a long time one of the most famous amateur sabermetricians was the evolutionary biologist Stephen Jay Gould; tomorrow, a child of Watson may take on this role.
In the beginning, there was the hit. Then the strike out, the RBI, the batting average, the run scored, the win, the loss. This was the first original generation of baseball data. It was the universe occupied by dead-ball era players, like Ty Cobb and Honus Wagner. Sometime early on, the first Numerati of the sport started to crunch some numbers. If each pitcher were to go a full nine-inning game, how many runs would he let in? That led to the earned-run average, or ERA.
The second generation, as Michael Lewis described in Moneyball, came about in the ’90s. Number-crunchers started to develop new enhanced statistics, which took on a life of their own at Baseball Prospectus. They brought in loads of new variables, and analyzed correlations. They could calculate, for example, the AEqR, “the number of equivalent runs scored by a team, adjusted for their opponents’ pitching and defense.”
And thus, suggests Baker, the Numerati assert their dominance over another realm of human culture. The important change, and the thing to watch, is the rise of cheaper, more accurate, more robust, and more reliable sensor technologies:
now comes the sensor revolution, which will bring to baseball (and the rest of our lives) mountains of new statistics. These ones, as Ira Boudway writes at Bloomberg, will measure players not by the traditional route–results–but instead by monitoring and measuring their behavior. The new monitoring, already in place at San Francisco’s AT&T Park, is called Fieldf/x. Boudway writes:
Fieldf/x is a motion-capture system created by Chicago-based Sportvision. It uses four cameras perched high above the field to track players and the ball and log their movements, gathering more than 2.5 million records per game. That means you could find out whether Ichiro Suzuki truly gets the best jump on fly balls hit into the right-field gap, or if Derek Jeter really deserved that Gold Glove last year.
It’s with systems like this that the Numerati establish their hegemony over businesses, including baseball. The reason is that the statistics are so rich and varied that only experts with advanced computer skills can analyze them. Of course, eventually, they build and sell the software to widen the markets to the rest of us. But their systems come to define the game.
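The first-generation statistics in the passages quoted above are simple arithmetic over box-score counts, which is part of why they needed no Numerati to compute. As a small sketch (the function names and the sample numbers are hypothetical, chosen only for illustration), the earned-run average answers exactly the question Baker poses, scaling a pitcher’s earned runs to a full nine-inning game, and the batting average is hits divided by at-bats:

```python
def earned_run_average(earned_runs, innings_pitched):
    """Earned runs a pitcher would allow over a full nine-inning game."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


def batting_average(hits, at_bats):
    """Fraction of at-bats that ended in a hit."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


if __name__ == "__main__":
    # Hypothetical pitcher: 20 earned runs over 60 innings.
    print(earned_run_average(20, 60))
    # Hypothetical batter: 30 hits in 100 at-bats.
    print(batting_average(30, 100))
```

The contrast with a system logging millions of records per game is the point: two counts per statistic in the first generation, versus a data volume that only software can digest.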
Just because mass-produced desktop computers have long had severely limited sensory organs (mice, rudimentary joysticks, and keyboards, despite touch technology having existed for more than thirty years) does not mean that we are not about to jump into a world of sensing computation. The Microsoft Kinect has just brought motion tracking, facial recognition, gesture recognition, and visual senses to a consumer device costing around one hundred and fifty dollars, with a version with open drivers for easy personal computer use on the way. Temperature, pressure, and other weather data sensors are dropping in price every day, and this has an impact not only on “weather nerds”. Imagine if all of the homes in Canada could detect their internal climate and automatically adjust: lowering temperatures when no one is in a room, redirecting heat from an unused room to an occupied one (or vice versa on warm days), opening blinds when the sun hits the one-dollar photo-receptive sensor on a sun-facing wall, or closing them in the sweltering summer.
And then we reach the point of “things that are not about saving [changing] the world” (the grandiose claims/aims of IBM), but rather about culture, and what people do in the time they set aside for themselves: video games with personal role-playing and participation (like Wii bowling, as compared to bowling on the original Nintendo Entertainment System); an infinity of data about a person’s favorite sport, athlete, or team; multi-spectral tools and peta-pixel resolution images of artwork to allow examination of minutiae and hidden elements; chemical composition analysis tools for the perfection of gastronomic sciences (at home).
Now we just have to answer the question: is there a natural devaluation of “digital” activities? Is it “necessarily” lower in a hierarchy of “culture” to bowl (for example) in a digital realm? Even if the digital equivalent requires exactly the same physical motions (while also giving differently-abled players opportunities to participate)? Even if it allows, and even requires, communal play in the same room (as computer-sensor systems such as Kinect do)? Is there an intrinsic hierarchy, or does it lie in the control a single person has over the ability to “opt in” or “opt out”? It is only “harmful” if we disallow real-life engagement. If digital environments are seen as tools, I think it is difficult to argue that one or the other is “better” or “worse”. The important thing to struggle towards is a future where people are not tossed aside for not participating in the digital realm, where there is space for both digital and physical spaces, and where digital augmentation of bodies is not simply accepted. Here I am thinking of Violet in M. T. Anderson’s Feed, and her representation of the “helpless” person forced into a world which has taken on “digi-mods” with no consideration, or course corrections, along the way. There was nothing intrinsic to “the feed” which demanded such a dystopic, forced mindlessness. It came from the ideas of the author, ideas which speak to the fear of being left without a choice. Like Violet, I suppose I fear a world which reduces bodily autonomy, and casts aside those who do not seek to use themselves as test subjects for digital “alterations”.
Baker’s recent posts on his blog have been fascinating, two in particular: the one about baseball, mentioned above, and the post titled “Watson and the Barometer”, which speaks to how Watson ought to be considered, and to some of the ideas which may need rethinking in a post-Watson reality.
IBM Centennial Film: They Were There – People who changed the way the world works
Building Watson – A Brief Overview of the DeepQA Project
See how a DeepQA system similar to Watson could help transform how doctors and hospitals care for patients.
A system with capabilities similar to Watson could help transform how the finance industry does business.
Watch as IBM experts explain how an IBM DeepQA system similar to Watson could help transform customer service.
The Next Grand Challenge
IBM and its history of scientific breakthroughs can be credited to a commitment to research and Grand Challenges. Find out how these challenges help push science in ways that weren’t thought possible before.
The IBM Jeopardy! Challenge is more than a game. Jeopardy! makes great demands on its players, and even greater demands on a computer system. Learn about the unique hurdles Jeopardy! presents that Watson must overcome in order to achieve the scientific goals of the project.
A System Designed for Answers
A computer system that can understand natural language and deliver a single, precise answer to a question requires the right combination of hardware and software. Watch to find out how Watson integrates both into a unique solution.
Countdown to Jeopardy!
Watson will soon face the two greatest Jeopardy! champions in history: Brad Rutter and Ken Jennings. Watch to find out more about the competitors in this historic challenge.
The Face of Watson
Watson consists of 90 servers – not the most interesting thing to look at on the Jeopardy! stage. See how IBM worked to create a representation of this computing system for the viewing audience – from its stage presence to its voice.
Watson after Jeopardy!
Watson was optimized to tackle a specific challenge: competing against the world’s best Jeopardy! contestants. Beyond Jeopardy!, the IBM team is working to deploy this technology across industries such as healthcare, finance and customer service.
Final Jeopardy! and the Future of Watson
Watch the post-game video to see the reactions of the contestants and assessments of the match by the IBM team.
Stephen Baker’s “Final Jeopardy” blog: www.finaljeopardy.net/
Also related to this project are objects linked on the resource collection blog I have shared online, accessible at: