I found, thanks to Twitter-induced serendipity (others call it procrastination), the lyrics of the songs included in the annual Billboard Top-100 from 1965 to 2015 (i.e., allowing for a few missing ones, ~5,000 songs). You can find on GitHub, together with the raw data, some clarifications on how the data were collected, their limitations, etc., plus a pointer to a nice analysis already done.
It may not come as a surprise to regular readers of this blog that, to get an idea of the dataset, I checked the emotional trends in the lyrics. And – guess what? – I found almost exactly the same pattern that we already found in English-language fiction (but with some hints that the same happens in other western languages too). Pop hits, in the last 50 years, have become less “emotional”, and this decrease is driven entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words – on the contrary – has increased (in English fiction, negative words remain more or less stable, see the actual paper here, and a blog post here).
Now, this is all well and good. The trend in pop song lyrics is plausible, and it resonates, for example, with a previous analysis. The trend in fiction seems quite robust too (see some comments here), but the question is: what is going on? In English fiction, the trend seems to start two centuries ago and, in the paper linked above, we discuss how some possible long-term dynamics (including urbanisation, or a “regression to the mean” with respect to emotionally exuberant Romantic writers) could explain it. But can these explanations also work for a recent trend like the one found in pop song lyrics?
Given the relatively small size of the Billboard dataset (I hope I will then move to fiction, etc.), I started to look into the data to understand better what the trends mean. First, these trends are produced by “counting” words that are associated with positive and negative emotions. One can ask which words are actually causing the trend, and what their relative contribution is. The plot below (inspired by the paper linked above and by subsequent work of the same group) shows the 10 words that contribute most to the overall emotional trend. The contribution of a word is calculated as its total abundance – how many times “x” is found – multiplied by the slope of its linear trend, so that an abundant word which is more or less stable will not have much effect (I feel like there are more elegant, or just correct, ways to do it, but nothing came promptly to mind. This works for now, and I am happy to have suggestions).
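To make the calculation concrete, here is a minimal sketch of the contribution score described above. It is not the code behind the plots: the function names, the input layout (raw yearly counts per word plus total words per year), and the choice of fitting the slope on relative frequencies are all assumptions, and the numbers are toy data.

```python
import numpy as np

def word_contributions(counts, totals, years):
    """Contribution of each word = total abundance * slope of its linear trend.

    counts: dict mapping word -> raw counts per year (hypothetical layout)
    totals: total number of words in the lyrics, per year
    years:  the years, in the same order
    Assumption: the slope is fitted on relative frequency (count / total).
    """
    years = np.asarray(years, dtype=float)
    totals = np.asarray(totals, dtype=float)
    x = years - years.mean()  # centring leaves the slope unchanged, helps numerically
    contributions = {}
    for word, yearly in counts.items():
        yearly = np.asarray(yearly, dtype=float)
        slope = np.polyfit(x, yearly / totals, 1)[0]  # change in frequency per year
        contributions[word] = yearly.sum() * slope
    return contributions

def top_contributors(contributions, n=10):
    """Words ranked by the absolute size of their contribution."""
    return sorted(contributions, key=lambda w: abs(contributions[w]), reverse=True)[:n]

# Toy data, invented for illustration only
years = [1965, 1966, 1967, 1968]
totals = [1000, 1000, 1000, 1000]
counts = {
    "love": [40, 30, 20, 10],  # abundant and declining
    "like": [5, 10, 15, 20],   # less abundant, increasing
    "well": [10, 10, 10, 10],  # stable, so almost no contribution
}

contributions = word_contributions(counts, totals, years)
ranking = top_contributors(contributions, n=2)
print(ranking)
```

The sign of the score gives the direction of the trend, and a stable word ends up near zero however abundant it is, which is the behaviour described above.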
“Love” (or the lack thereof) is the main culprit of the emotional decline of pop music in the last fifty years. The term “love” is very abundant. In the 70s, there was around 1 “love” every 70 words in the lyrics (to compare, the same figure for “the”, which is the most common word in English, is 1 every 20 to 25). Apparently, “love” suffered a sharp decline in usage starting at the beginning of the 90s, and then stabilised, or even increased, in the 2000s, but at a frequency reduced by approximately half (see the plot below).
The second most contributing word is “like”, even though its contribution is in the opposite direction, that is, its frequency increases over time. All the other words have a smaller contribution. Two things to notice: the majority of these words are associated with positive emotions, and, “like” aside, these words are decreasing. One double exception is “shit” (yes, no typo), to which I will come back soon.
A reasonable question is: what would happen without “love”? Or without “like”, etc.? In other words, do high-frequency words influence the results in datasets that are relatively small as well as biased towards certain words? (I would not expect “love” to be the main culprit of the emotional decline in English fiction, but who knows?)
As the plot above shows, they do, but only up to a certain point. Here, I took out, one at a time, the ten most contributing words, and recalculated the general trend. “Love” does have a big role (see the point at “1” above), but, in general, the trend remains negative, especially after also taking out “like”, the second most contributing word.
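The exclusion exercise can be sketched along these lines. Again, this is only an illustration under assumptions: I read “one at a time” as cumulative (first “love”, then “love” and “like”, and so on), the yearly totals used as denominator are left unchanged when a word is removed, and both the function names and the numbers are made up.

```python
import numpy as np

def trend_slope(counts, totals, years, excluded=()):
    """Slope of the overall emotion frequency, ignoring the excluded words.

    Assumption: the denominator (total words per year) is not adjusted
    when a word is excluded.
    """
    years = np.asarray(years, dtype=float)
    totals = np.asarray(totals, dtype=float)
    emotional = np.zeros_like(totals)
    for word, yearly in counts.items():
        if word not in excluded:
            emotional += np.asarray(yearly, dtype=float)
    x = years - years.mean()  # centring leaves the slope unchanged
    return np.polyfit(x, emotional / totals, 1)[0]

def slopes_after_removal(counts, totals, years, ranked_words):
    """Slope with nothing removed, then with the top 1, top 2, ... words removed."""
    slopes = [trend_slope(counts, totals, years)]
    for k in range(1, len(ranked_words) + 1):
        slopes.append(trend_slope(counts, totals, years, excluded=set(ranked_words[:k])))
    return slopes

# Toy data, invented for illustration only
years = [1965, 1966, 1967, 1968]
totals = [1000, 1000, 1000, 1000]
counts = {
    "love": [40, 30, 20, 10],
    "like": [5, 6, 7, 8],
    "well": [20, 16, 12, 8],
}

slopes = slopes_after_removal(counts, totals, years, ["love", "like"])
print(slopes)
```

In this toy example the slope becomes much less steep once “love” is removed, but it stays negative, and it gets slightly more negative again when the increasing word “like” is also taken out.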
One can do the same exercise separately for the positive and the negative emotion trends. For positive emotions, unsurprisingly, things are pretty similar (as explained above, the “general” trend is mainly caused by positive emotions).
Besides “love” and “like”, the trend is apparently due to a consistent decline in the frequency of relatively common words associated with positive emotions (“well”, “sweet”, “good”, etc.) and it seems quite robust when excluding high-contributing words.
Compare now with what happens for the increase in negative emotions.
First, here the trend appears to be driven mainly by recently introduced, especially slang-ish and vulgar, terms. In fact, my first thought was that my calculation of relative contribution was skewed towards the slope part, giving less importance to the abundance part. This may be the case, but not necessarily. These terms do have sharp increases – practically none of them was present before the 90s – but also, surprisingly, they are relatively common. “Shit” totals 898 occurrences, which makes it more frequent than the second high-contributing word (“lonely” has 798 occurrences). The order of magnitude is similar for “fuck” (634 occurrences) and “alone” (expected to be high-frequency, but in reality totalling 986 occurrences). Second, the trend for negative emotions is less robust with respect to the exclusion of high-contributing words.
I have quite a few thoughts about all the above (e.g. positive emotion-related terms are, in general, terms with higher frequency than negative emotion-related terms: does this have something to do with their decline? What is the role of high-frequency terms, like “love”, and recently introduced terms, like “shit”, in other datasets?), but I will keep them for when I have more results, so here is a video of The Doobie Brothers.