The Buzz on Google Ngram Viewer

'Tis the season for conference presentations. A time when people are compelled to make grand statements and mobilize snappy visuals to back them up. In this short post I'm hoping to spark some conversation about one such resource: the Google Ngram Viewer.

For the uninitiated, the Ngram Viewer works like this: through a relatively simple user interface, you plug in one or more terms. With the click of a button, a graph pops up that tracks how frequently those terms appear in a wide range of books published since 1800. C'mon, try it -- everyone's doing it. I mean, who doesn't crave quick answers to the question of 'zombies' versus 'vampires'?

But like sugar and caffeine -- two of my addictions -- the buzz wears off quickly, often leaving me more disoriented than before I imbibed.

In all seriousness, this is a tool that invites as many questions as it answers, especially when tracking concepts across different languages and cultures (although you can search in a range of other modern languages). I won't even get into the bigger issues of sampling and statistical modelling, but I welcome comments on these aspects as well.

Take an example from a recent workshop I attended on "Endangerment and its Consequences." It was exciting to be in a room with scholars from around the world, working in different cultural contexts across several centuries. However, this raised the inevitable question of terminology. In the final wrap-up, the Ngram Viewer provided a provocative means of reinforcing our shared sense that "endangerment" was a timely topic and that the two days of attention had been worthwhile:

But, of course, this graphic could not tell us what the word means or even how it has come to assume such currency. And it made it easy to forget that the ideas expressed by the English word "endangerment" might be expressed differently in other languages (or even within English itself).

You get the point. For me, the initial buzz of "wow!" quickly gave way to a lull of "so what?" But a week later, I'm coming around to realizing that instruments like the Ngram Viewer present problems of knowledge as worthy of inquiry as concepts like "endangerment." This, I imagine, is an issue that those of you in the digital humanities are particularly well-situated to consider.

So, what I really want to know is: Have you used the Ngram Viewer in conjunction with your scholarly activities, teaching included? How? What do the graphs tell you? What are the risks?


I have lots of thoughts on this. I'll limit myself to pointing out someone with even more.

A friend, Ben Schmidt, has written on n-grams as part of his interest in the digital humanities ("Using tools from the 1990s to answer questions from the 1960s about 19th century America"). There are a lot of relevant posts; the top four are probably (in reverse-chronological order) here, here, here, and here.

Interestingly (especially given that last, oldest post), Ben's now employed by the Culturomics folks as, if I'm not mistaken, the first humanist on staff. This is announced on his blog, and he's on their website now, too. His take on the matter's worth exploring.

I've seen n-grams used in talks, but mainly as introduction-fodder (i.e., in unexciting ways separate from analysis). Anecdotally, I know of professors who've responded skeptically only to be discovered with printed-out three-grams stacked up on their desks a week later.

I did a post on this back in January with the example of "air police" vs. "international control of atomic energy". Basically, it just says that a well-crafted ngram search (e.g., one using uncommon terms specific to a definable historical milieu), combined with some insight from historical research, can indeed produce some interesting and enlightening results.

Also, check out Etienne Benson's thoughtful comments on the uses of the Ngram Viewer:
