[Second post of the series “Things that I probably will not develop in a proper paper, but I find interesting enough to write here”. The first is on the XX century decrease of turnover rate in popular culture]
In the last couple of years, part of my research has been dedicated to exploring the emotional content of published books, using the material present in the Google Books Ngram Corpus. Our analysis produced some interesting results. While analyses like ours need to be carefully weighed and possibly reproduced with various samples (but this should always happen…), I think that tools like the Google Books Corpus represent an extraordinary opportunity, as my goal is to study human culture in a scientific/quantitative framework.
Keeping this in mind, there are a few reasons to be cautious (see for example here), mainly due to the fact that we do not know which books are inside the Google Books Corpus. It is well known, for example, that the share of scientific and technical literature greatly increases in the XX century sample, generating potential distortions (on the other hand, the share of scientific and technical literature did increase in reality in the XX century). In one of my first posts, I analysed how different normalisations seem to create different biases in the trends: the frequencies of the same set of random words (which are supposed to be stable through time) decrease when normalised by the total count of words in the sample (as Google does in the Ngram Viewer) and increase when normalised by the count of “the” in the sample (assuming the word “the” is a good representative of “real” writing and “real” sentences).
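To make the two normalisations concrete, here is a minimal sketch in Python. It is only an illustration: the function name, the toy sentence and the target words are invented for the example, and it works on a plain list of tokens rather than on the actual Ngram count files.

```python
from collections import Counter

def normalised_frequencies(tokens, target_words):
    """Frequency of target_words under the two normalisations discussed above."""
    counts = Counter(tokens)
    target_count = sum(counts[w] for w in target_words)
    total = len(tokens)        # total count of words in the sample
    the_count = counts["the"]  # count of "the" in the sample
    by_total = target_count / total if total else float("nan")
    by_the = target_count / the_count if the_count else float("nan")
    return by_total, by_the

# Toy "yearly sample", invented for illustration only.
sample = "the cat sat on the mat and the dog saw the cat".split()
by_total, by_the = normalised_frequencies(sample, {"cat", "dog"})
print(by_total, by_the)  # 0.25 0.75
```

Running the same function on one sample per year gives two time series for the same words. If the share of text with few occurrences of “the” (tables, formulas, references) changes over time, the two series can drift in opposite directions.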
As a consequence, I have lately been trying to back up and extend results from Google Ngram with a less-distant reading analysis, that is, to repeat the same automatic analysis, but on specific books for which we know authors, time of publication, etc. An interesting side-result of the analysis I am working on, one that keeps appearing practically everywhere, is that books tend to become more “positive” with authors’ age. I calculated the ratio of the number of words associated with negative emotions to the number associated with positive emotions (using LIWC), so that higher values represent a preponderance of negative emotions over positive ones, and vice versa. The “King of Horror” Stephen King (see the plot below), for example, does in fact seem to get milder with time (the “outlier” in the bottom-right of the plot is “The Colorado Kid”, indeed considered “a true diversion from King’s normal horror fare”).
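The ratio itself is simple to compute once each word has been assigned to a category. Below is a minimal sketch, assuming two tiny hypothetical word lists in place of the LIWC dictionaries (which are commercial and far larger); the tokenisation and the example sentences are also invented for illustration.

```python
import re

# Hypothetical mini word lists, standing in for the LIWC categories.
NEGATIVE = {"fear", "dead", "dark", "cry"}
POSITIVE = {"love", "happy", "smile", "hope"}

def neg_pos_ratio(text):
    """Ratio of negative to positive emotion words; None if no positive words."""
    words = re.findall(r"[a-z']+", text.lower())
    neg = sum(w in NEGATIVE for w in words)
    pos = sum(w in POSITIVE for w in words)
    if pos == 0:
        return None  # ratio undefined without positive words
    return neg / pos

# Two invented snippets: a "darker" one and a "milder" one.
early = "Fear filled the dark house; no hope was left."
late = "They smile, full of love and hope, with no fear."
print(neg_pos_ratio(early), neg_pos_ratio(late))
```

Higher values mean more negative than positive emotion words, so in this toy case the first snippet scores higher than the second. Note that simple word counting ignores negation (“no hope”, “no fear”), which is one of the reasons why results of this kind should be checked on large samples.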
In a quasi-random sample of contemporary best-seller authors (which includes 354 books, with authors like Terry Pratchett, Dean Koontz, Michael Crichton, etc.), I find the same strongly significant correlation between authors’ age and the ratio of negative to positive emotions (see the plot below, p<.001). The same analysis on another sample of 200 books from Project Gutenberg (mainly XIX century best-sellers, including the likes of Charles Dickens or Robert Louis Stevenson) shows an analogous trend (significant, but weaker, with Spearman’s rho=-.17 and p<.05).
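For readers who want to try this on their own data, here is a minimal sketch of Spearman’s rho using only the Python standard library: it is the Pearson correlation computed on ranks, with tied values sharing their average rank. The ages and ratios below are invented for illustration and are not taken from the samples above. The sketch does not compute a p-value; a library routine such as `scipy.stats.spearmanr` returns both the coefficient and the p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented data: author's age at publication and negative/positive ratio.
ages = [30, 35, 41, 48, 55, 62]
ratios = [1.9, 1.7, 1.8, 1.2, 1.1, 0.9]
rho = spearman_rho(ages, ratios)
print(round(rho, 3))
```

A negative rho means that the ratio tends to go down as age goes up, that is, that later books lean towards positive emotion words.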
This result is quite well known. James Pennebaker (the developer of LIWC) reported a similar study, where the same effect was found using written or spoken text samples from more than 3000 subjects participating in various disclosure studies (i.e. “the common feature of all studies was that the investigators were studying individuals who were disclosing emotional events or experiences in their lives”). In the same paper, Pennebaker and colleagues also analysed a sample from 10 published authors, somewhat similar to my Gutenberg sample, but they did not find significant trends.
While quite incomplete (I would need a bigger sample; I should compare different ways to extract the emotional content; what happens in other languages? etc.), the results are quite interesting to me. First, they tell us that we get happier (or, well, that we use more positive language…) with age, which goes against the stereotype of grumpy grandpas and screaming-with-pasta-rolling-pin-in-the-hand grandmas (this is the Italian version, which is, in any case, better than the lonely/sad “seniors” of contemporary mainstream western culture). Incidentally, they resonate with the hugely publicised finding that well-being follows a U-shaped trend through life, with the lowest point in the 40s, and an increase after that (I cannot really say much about this. Just for balance, here is a partly skeptical view).
Second, the majority of anthropologists tend to think that general regularities in human behaviour (i) do not exist (as local “cultures” will mainly act towards differentiation) or (ii) when they do, they are very abstract and hence not informative (say, all humans need to eat). If we can predict that, with age, the balance between negative and positive emotion words changes, and that it changes in a specific direction, this seems quite specific and informative to me.