Text Mining the GOP Announcement Speeches
Ohio Gov. John Kasich and former Virginia Gov. Jim Gilmore will launch their runs for the White House on Tuesday and in early August, respectively, which will likely put a plug in the months-long trickle of announcements from GOP presidential hopefuls. And while I don’t know what Kasich or Gilmore will say, I can’t help but wonder if their speeches will be deja vu all over again. While watching and reading the other 14 announcement speeches, I got an eerie feeling that I was hearing the same speech repeated.
And as it turns out, I was.
Through a combination of reading and text mining (applying statistical methods to text, rather than numbers, to find patterns and insights), I found that, despite their differences, almost every announcement speech has followed the same general recipe. The exception was Donald Trump’s long, meandering, and at times bizarre speech.
Recipe for a Presidential Announcement
The past 14 major Republican presidential announcement speeches (not counting Carly Fiorina’s because she announced in a short video) have had a few key components: a sympathetic or inspiring segment of the candidate’s or candidate’s family’s story; a number of policy prescriptions (often mixed in with highlights from their political or private careers); sharp criticism of the Obama administration and/or Hillary Clinton; and aspirational points about the future, often centering on restoring or strengthening the American Dream.
The recipe is somewhat customizable – candidates tackle these themes in different orders and put emphasis on their greatest strengths. For instance, former Florida Gov. Jeb Bush only briefly highlighted his teen study-abroad experiences in Mexico – where he met his wife, Columba – while biography was a greater focus for Florida Sen. Marco Rubio, who climbed the socioeconomic ladder from somewhat humble origins to the Senate floor. In general, though, the speeches stuck to the main ingredients.
Simple text-mining tools revealed the aspirational nature of these speeches and highlighted some of their similarities. Excluding conjunctions, prepositions and a number of other primarily structural/organizational words, the top 10 most common words across all 14 speeches were: will, people, one, America, President, can, country, know, make, and American. I also calculated that there were around 1,500 words that two or more candidates shared – a significant overlap considering these speeches typically ran around 2,000 to 4,000 total words. But there was significant variation in how often candidates used these shared words. Every Republican invoked Obama in some way (by naming him, talking about Obamacare, making a jab at the Obama-Clinton foreign policy, etc.), but Rubio, former Pennsylvania Sen. Rick Santorum and Ben Carson mentioned him only once, while Trump used his name 11 times.
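For readers who want to try this kind of count themselves, here is a minimal Python sketch of the two calculations described above: the most common words after dropping structural words, and the set of words shared by two or more speakers. It is not the code behind the numbers in this article. The stop-word list is a short hypothetical stand-in, the function names are made up for illustration, and the three "speeches" are invented placeholder sentences rather than quotes.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "we", "is", "that", "our", "i"}

def tokenize(text):
    """Lowercase the text and return its words, minus the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def top_words(speeches, n=10):
    """Return the n most common words across all speeches combined."""
    counts = Counter()
    for text in speeches.values():
        counts.update(tokenize(text))
    return counts.most_common(n)

def shared_words(speeches, min_speakers=2):
    """Return the words used by at least min_speakers different speakers."""
    speakers_per_word = Counter()
    for text in speeches.values():
        # set() so that each speaker counts once per word.
        speakers_per_word.update(set(tokenize(text)))
    return {w for w, k in speakers_per_word.items() if k >= min_speakers}

# Invented placeholder sentences, not quotes from any actual speech.
speeches = {
    "candidate_a": "We will restore the American dream for the people of this country.",
    "candidate_b": "The people know that America can lead, and we will lead.",
    "candidate_c": "I make deals. I will make great deals for this country.",
}

print(top_words(speeches, n=3))
print(sorted(shared_words(speeches)))
```

Counting each speaker once per word in shared_words is what separates "how many candidates used this word" from "how often was it said," which is the distinction the Obama example turns on.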
More advanced text-mining tools really showcased the similarity between these speeches. I used a variety of methods, but the results from “fuzzy c-means clustering” demonstrate these similarities most effectively. Basically, this method takes a set of things (in this case, speeches) and divides them into two, three, four, five or some other number of subsets, or “clusters.” Here, it compares how frequently candidates use the same words in their speeches in an effort to put the speeches into different clusters.
For instance, if two candidates often repeated the word “freedom,” they would be more likely to be put in the same cluster. But if a third candidate never used the word, he or she would be more likely to fall into another cluster. Methods like these sometimes gloss over the nuances and details within text, but they often provide a good bird’s-eye view of the texts. The word “fuzzy” simply means that a speech can partially lie in one cluster while mostly lying in another. Here are my results:
This table shows how well each speech fits into each of the three clusters. The key feature is that Trump dominates cluster three and the other candidates fall more or less evenly into clusters one and two. This means that Trump’s address was different enough from the other speeches to essentially get its own cluster. The rest of the candidates were evenly divided between the other two clusters – meaning that no other candidate or group of candidates differed sufficiently from the rest of the pack to even begin to carve out his own cluster. In other words, Trump used a different vocabulary with different frequencies than the rest of the field – who tended to use similar words at similar rates.
(Note: I used three clusters because it follows a good rule of thumb for picking the number of clusters, but this same result – Trump being singled out and the rest of the field grouped together – appeared when I used other numbers of clusters.)
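To make the clustering idea above more concrete, here is a small Python sketch of the standard fuzzy c-means procedure. It should be read as an illustration under stated assumptions, not as the analysis that produced the results above: the article does not specify the software or settings used, so the fuzzifier m=2, the iteration count, the random starting point and the function name are all assumptions. The five rows are invented word-rate vectors for three hypothetical words, with the last row deliberately built as an outlier.

```python
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Return (centers, memberships). memberships[i][j] says how strongly
    point i belongs to cluster j; each row sums to 1."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from random memberships, normalized so each row sums to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([v / total for v in row])

    for _ in range(n_iter):
        # Each center is the membership-weighted mean of all points.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # A point's membership in a cluster shrinks as its distance
        # to that cluster's center grows.
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships[i] = [v / total for v in inv]
    return centers, memberships

# Invented rates for three hypothetical words: ("america", "dream", "deals").
# The first four rows are similar to one another; the fifth is an outlier.
labels = ["speech_a", "speech_b", "speech_c", "speech_d", "speech_e"]
points = [
    [0.30, 0.28, 0.01],
    [0.31, 0.26, 0.02],
    [0.28, 0.30, 0.00],
    [0.29, 0.27, 0.02],
    [0.05, 0.02, 0.60],
]

centers, memberships = fuzzy_c_means(points, n_clusters=3)
for label, row in zip(labels, memberships):
    print(label, [round(v, 2) for v in row])
```

Each printed row plays the role of one line of a membership table: the numbers are degrees of belonging rather than a single hard assignment, which is what "fuzzy" refers to.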
These similarities also showed up when I compared individual pairs of speeches. For example, Bush and Texas Sen. Ted Cruz, despite their ideological differences, used similar words.
This is a comparison cloud – it’s similar to the traditional word cloud, but instead of displaying the most frequent words in one text, it highlights the differences between two documents. In this case, Cruz used the orange words more often than Bush, and the size of each orange word is proportional to how much more often Cruz used that word. For example, Cruz used the word “imagine” frequently and Bush did not, so the word shows up as large and orange. Bush, on the other hand, used “can” slightly more than Cruz so it is teal and slightly larger than the other words.
The key takeaway is that aside from “imagine,” all of these words are pretty small. That means that despite the gulf between Bush’s more moderate/establishment style and Cruz’s brand of Tea Party conservatism, they used similar words at similar rates.
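The quantity behind a comparison like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool used to draw the cloud: it computes, for each word, the gap between its share of one text and its share of the other, which is what the size and color of each word are described as encoding. The function names and the two sample texts are invented for the example.

```python
import re
from collections import Counter

def relative_frequencies(text):
    """Map each word to its share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on an empty text
    return {w: c / total for w, c in counts.items()}

def usage_gaps(text_a, text_b):
    """Map each word to (share in text_a) minus (share in text_b).
    Positive values mean text_a used the word relatively more often."""
    freq_a = relative_frequencies(text_a)
    freq_b = relative_frequencies(text_b)
    all_words = set(freq_a) | set(freq_b)
    return {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in all_words}

# Invented placeholder texts, not excerpts from any actual speech.
text_a = "imagine a president imagine a country we can imagine"
text_b = "we can do this we can lead a country"

gaps = usage_gaps(text_a, text_b)
# Largest gaps first: these would be the biggest words in the cloud.
ranked = sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)
for word, gap in ranked[:3]:
    side = "text_a" if gap > 0 else "text_b"
    print(f"{word}: {gap:+.3f} (used more by {side})")
```

Words that both texts use at about the same rate get a gap near zero, so a cloud full of small words is the visual signature of two similar vocabularies.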
Trump Broke the Pattern
Anyone who watched Trump’s speech should not be surprised that it differed from the rest. Rather than following the recipe, Trump hopped between disparate policy points without a clear structure. He also covered a number of topics that seem odd in that context. He spent a significant amount of time talking about his personal wealth, taking potshots at the other candidates for their supposed inability to “make deals” or “negotiate” and sharing his personal musings about ISIS, China, Japan and Mexico. The speech was also as long as it was bombastic – totaling over 6,000 words when most of his competitors stayed around 2,000 to 4,000 words.
It’s easy to draw a line between some of Trump’s departures from the norm and the results of the c-means algorithm. If Trump spends hundreds of words on a topic that the others feel no need to cover – say, bragging about his personal wealth and business exploits – then a smaller share of his speech goes to topics the other candidates tend to address, like the American Dream. And if Trump is the only candidate who dwells on a particular theme (e.g. a hypothetical conflict between President Trump and Ford Motors), that would also move his speech away from the others and toward its own cluster.