What AI-Powered Social Content Analysis Gets You: A Look at the Social Security Conversation
By Craig Johnson, Nicolai Haddal, Alan Rosenblatt, Ph.D., and Linda Benesch
In our last post, we discussed how machine learning and more advanced big data techniques are becoming accessible to teams with few people and limited resources. In this edition we would like to show what this technology can actually do for you, with a real-life example.
We look at Social Security in this case study because it sits at an extreme of difficulty for language analysis, despite the clear lines of division on the issue. Libertarian and Republican groups oppose the program, while Democrats and progressives work to protect and expand it. But those differences are not always immediately apparent in how people talk about the program.
What sets Social Security apart is that, as the “third rail” of politics, very few prominent advocacy groups or lawmakers are openly against it. Instead, they talk about “saving” Social Security when they really mean they want to cut, privatize, or end the program. Sen. Marco Rubio’s (R-FL) bill is even more pernicious: he wraps a cut to Social Security in a federal paid-leave program, which makes most of the conversation around it seem positive.
This represents one of the biggest problems for traditional language analysis tools: most of the time, the evaluation algorithm will rate the entire conversation as positive despite there being a clear for/against split going on. That’s where new advances in AI and machine-learning technology come in.
Now we can train a computer model using language on a specific topic from opposite poles of the conversation and then analyze the rest of the conversation data to sort it accordingly. We use language that we know is “for” or “against” Social Security, in this example, to train the computer to recognize which tweets are about protecting and saving Social Security and which comments are about ending it, even if they use positive-sentiment words. And then we use the model to analyze all of the other tweets about Social Security to sort them into the right position.
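This poles approach can be sketched in a few lines of scikit-learn. To be clear, this is a minimal illustration, not our production pipeline: the seed tweets, the TF-IDF/logistic-regression choice, and the `position_score` helper are all assumptions made for the example.

```python
# Minimal sketch of training on two language "poles" and scoring new tweets.
# Seed tweets and model choices are illustrative, not the production system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled pole language: 1 = pro-Social Security, 0 = anti.
seed_tweets = [
    "Republicans want to sunset Social Security. Protect our earned benefits!",
    "Expand Social Security, don't cut it. Seniors earned these benefits.",
    "Social Security will be insolvent by 2034. Privatize it now.",
    "The trust fund is running dry; don't count on ever seeing your benefits.",
]
seed_labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(seed_tweets, seed_labels)

def position_score(tweet: str) -> float:
    """Probability that a tweet sits at the pro-Social Security pole."""
    return model.predict_proba([tweet])[0][1]
```

In practice the seed sets are much larger, but the mechanics are the same: the model learns which phrasing marks each pole, then every remaining tweet gets a position score between 0 and 1.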
Because groups often share language and attempt to coordinate on messaging, we typically have two strong language poles within any issue conversation to compare to the overall conversation to help us better understand who is winning the conversation battle (share of voice and strength of argument). Not only can this analysis tell us about share of voice but, most consequentially, it can tell us about the stark differences between conversations about our issue.
This is critical to know if we want to talk to people online in their own vernacular and message frame. As we know from cognitive science, the more you talk like a person talks, look like a person looks, and act like a person acts, the more effective your persuasion messaging will be.
The better we can home in on how people are communicating about an issue, the better we will be able to communicate with them in a way that connects. And the better we can detect shifts in national conversation, the more rapidly and effectively we are able to adjust our own messaging to achieve our strategic legislative and electoral goals.
Case Study: Social Security
Here are some examples of the Social Security Twitter data and how visualizing it helps us understand what is happening in the national conversation. This data is just a snapshot of tweets posted June 1-6 and 28-30. It includes 83,000 records.
Figure 1. Reach and Frequency of Pro- and Anti-Social Security Tweets (June 1-6 and 28-30, 2022)
One of the key things we learned from this exercise is that while a vast majority of tweets are positive about Social Security, anti-Social Security posts had a greater reach. In particular, looking at the raw data, media coverage of the Social Security Trustees Report during June sparked a large increase in conversations undermining trust in the longevity of the program.
Figure 2. Anti-Social Security Conversation Geographic Distribution
Part of the reason we believe anti-Social Security messages reach more people is that the news organizations reporting on the issue generally have large-reach Twitter accounts. But the city-location data (even with its limited availability) also shows that the anti-Social Security conversation is happening largely among large-reach accounts in DC and NY. Those accounts talk at people; they do not indicate a swell of potential votes or voices. Meanwhile, the pro-Social Security conversation includes many smaller-reach accounts that are more geographically diverse, representing many, many voters and voices.
Figure 3. Pro-Social Security Conversation Geographic Distribution
Knowing how the conversation is tilted and where it is tilted is essential for any advocacy organization. By understanding the complexities of an issue conversation, groups can avoid triggering their audiences with poorly worded messages.
Understanding both who is winning the reach battle and who is winning the frequency battle gives us a deeper understanding of share of voice. At first glance, you might expect that since pro-Social Security tweets outnumber anti-Social Security tweets, the pro side would control the conversation. Instead, we see that, while producing fewer tweets, the anti- group narrowly reaches more people, offsetting the pro-Social Security frequency-based share of voice. Mapping the pro- and anti- tweets geographically further clarifies the narrative landscape. While the conversation may look even, or slightly tilted toward the anti- corner because of its larger reach (which would be a very concerning development), that reach is less impactful than one might expect because it is offset by the larger frequency of messages from the pro-Social Security corner. The larger volume of pro-Social Security tweets, widely distributed across the country, indicates that the loud reach of the national organizations in DC and NYC is not changing the conversation in the rest of the country.
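The two share-of-voice measures can be computed directly from the tweet records. A minimal sketch, assuming each record carries a stance label and the author's follower count as a reach proxy (both field names are hypothetical):

```python
# Frequency- vs reach-based share of voice per stance.
# Record fields ("stance", "followers") are illustrative assumptions.
from collections import defaultdict

def share_of_voice(tweets):
    """Return each stance's share of tweet volume and of audience reach."""
    freq = defaultdict(int)
    reach = defaultdict(int)
    for t in tweets:
        freq[t["stance"]] += 1                # each tweet counts once
        reach[t["stance"]] += t["followers"]  # reach proxied by follower count
    total_f = sum(freq.values())
    total_r = sum(reach.values())
    return {s: {"frequency": freq[s] / total_f, "reach": reach[s] / total_r}
            for s in freq}

# Toy records mirroring the pattern in the data: more pro tweets,
# but the anti side carried by a few very large accounts.
example = ([{"stance": "pro", "followers": 100}] * 3
           + [{"stance": "anti", "followers": 5000}])
sov = share_of_voice(example)
```

With these toy numbers, the pro side wins frequency share while the anti side wins reach share, which is exactly the split the case study describes.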
This is a concrete illustration of something that those who work on Social Security have long known— anti-Social Security sentiment is a Wall Street, media, and Republican driven narrative, not a grassroots narrative.
Now that we have the results out of the way, there are some major caveats to these conclusions, which reveal the limitations of AI and machine-learning tools. Some of these problems apply to all subjects, while others are more limited in nature. We used one of the most complicated topics, Social Security, as our test, and we feel comfortable that it is, indeed, an extreme case.
But first, a few wins. Our model training using known pro- vs. anti-Social Security tweets was satisfactorily strong. Here are a few examples from the data set that the model got right:
Text: RT @RockvilleMom14: @NelsonforWI Republicans, led by multi millionaire Senator Rick Scott, want to “sunset” Social Security, Medicare, Medicaid and the Affordable Care Act. It’s point 6 in his plan for the doubters. We all know people who rely on these programs! Vote BLUE and say NO to Republicans!
Label: Pro Social Security
Position Score: 0.999874353
Text: @bnweaver81 @davidsirota The Republicans want to end social security, ACA, Medacaide and Medicare. They want to privatize everything. Nothing for the people and all profit and gains.
Label: Pro Social Security
Position Score: 0.999870181
Text: Americans and especially senior citizens are extremely worried about losing their Social Security and Medicare if Republicans are elected in the midterm elections because that's exactly what Republicans will do and they will also raise taxes on the middle class #VoteBlueIn2022
Label: Pro Social Security
Position Score: 0.999866962
What makes this exciting for us is that our method of training the model was extremely useful in getting past the natural tendency of sentiment measures to equate positive words with good and negative words with bad. These tweets contain many negative words: “extremely worried,” “losing,” “end social security,” and “sunset” were all negative phrases mentioned alongside Social Security. Traditional methods for measuring sentiment would categorize these tweets as negative, when in fact they positively support Social Security.
Here are some successfully coded anti-Social Security examples:
Text: RT @DroppedmicAgain: Your money for your retirement will run out... but there's plenty of money for a million other things that shouldn't be in the budget in the first place. ; The Social Security trust fund most Americans rely on for their retirement will be able to continue to pay out benefits on a timely basis until 2034, one year later than the Treasury Dept. estimated last year, according to an updated government report. https://t.co/DBFt98ctac
Label: Anti Social Security
Position Score: 0.998942792
Text: RT @astoldbyneekz: QT @CNBCnow: Ok so stop taking it out my fucking check. ; The Social Security trust fund most Americans rely on for their retirement will be able to continue to pay out benefits on a timely basis until 2034, one year later than the Treasury Dept. estimated last year, according to an updated government report. https://t.co/DBFt98ctac
Label: Anti Social Security
Position Score: 0.998942792
Text: QT @JohnArnoldFndtn: Private sector, perhaps even Enron in the day, e.g., use (low) interest rates to stuff DB contributions and get tax deferral. That's more than just fed-gvt/SSA talking points. I nominate @jasonfurman for the job of US or world interest rate czar. https://t.co/q1pu8C3HPa ; Social Security Trust Fund is long treasury bonds, so they assume 4.7% interest rate in 2030 to make their forecast look better. Meanwhile the Treasury pays interest, so they assume rates will be 3%. Government accounting < Enron accounting. h/t @jasonfurman (recommended follow)
Label: Anti Social Security
Position Score: 0.998961091
Text: QT @CNBCnow: Good news -- your 62 year-old grandfather won't see his benefits cut by 20% until he turns 75! #HugYourGrampa ; The Social Security trust fund most Americans rely on for their retirement will be able to continue to pay out benefits on a timely basis until 2034, one year later than the Treasury Dept. estimated last year, according to an updated government report. https://t.co/DBFt98ctac
Label: Anti Social Security
Position Score: 0.998961091
Text: RT @KatiePavlich: QT @CNBCnow: The White House is spinning a one year extension of this catastrophe as an example of Biden’s “successful” economic agenda. ; The Social Security trust fund most Americans rely on for their retirement will be able to continue to pay out benefits on a timely basis until 2034, one year later than the Treasury Dept. estimated last year, according to an updated government report. https://t.co/DBFt98ctac
Label: Anti Social Security
Position Score: 0.998237848
The general trend in the negative tweets is that they are largely driven by the news cycle: in the case of the tweets we captured, the release of the Social Security Trustees Report. Tweets that are anti-Social Security don’t directly attack the program’s workings or its beneficiaries. Instead, they undermine trust in Social Security. Activists have long argued that this is why proposals to implement some form of privatization receive so much attention.
The $16 trillion retirement industry salivates when it looks at the Social Security program as a way to make even more money. The less people trust Social Security, the more easily they can be convinced to invest with high-fee financial advisors. Undermining trust in the program and promoting privatization is the industry’s clear end goal, and we think that goal is reflected in the anti-Social Security tweets.
Limitations
Now, on to the limitations. One of the biggest challenges we are actively working to solve is that, particularly with Social Security, not everyone talks about the program in a political sense. For millions of Americans, Social Security is their income, sometimes their only one. As such, they talk about it in budgetary terms: how a missed check or some other mishap affects their lives.
We ran a zero-shot analysis on the tweets and found that, indeed, the Social Security conversation includes a lot of noise. Zero-shot classification uses a general-purpose language model, without any task-specific training, to categorize text into arbitrary categories you define in advance. It then produces a ranked-choice analysis of the categories it thinks each tweet is most about, so Label 1 represents its best pick, Label 2 its second best, and so on.
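The ranked-choice view comes from sorting the per-label scores the zero-shot model returns for each tweet. Here is a minimal sketch of that sorting step; the label names and the example scores are hypothetical stand-ins for real model output:

```python
# Turn a zero-shot model's {label: score} output into the ranked-choice
# "Label 1, Label 2, ..." view. Labels and scores below are hypothetical.
def rank_labels(scores):
    """Return [(rank, label, score), ...] sorted best-first."""
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [(f"Label {i + 1}", name, score)
            for i, (name, score) in enumerate(ranked)]

# Hypothetical scores for one tweet about a missed benefit check.
example = {"politics": 0.22, "human interest": 0.48,
           "personal finance": 0.25, "scam": 0.05}
ranking = rank_labels(example)
```

For this hypothetical tweet, human interest would be Label 1 and politics would fall to Label 3, which is the kind of signal that lets us separate political conversation from the budgetary noise described above.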
Figure 4. Zero-Shot Analysis of Social Security Tweets
And finally the confidence score for each category by ranked choice.
Figure 5. Confidence Score for Each Category by Ranked Choice
As the top of the analysis shows, many of these tweets are likely not about Social Security politics at all. In the first label, politics represents only roughly half of the conversation, and the model is relatively confident about that share. The second label backs this up: human interest produces a similar result, which makes sense, since the human story is so interconnected with this program that much of the conversation about it could reasonably be classified as human interest.
We also see in the data a fair number of scam tweets trying to steal Social Security numbers. One improvement we plan to make in the future is to limit the analysis to tweets in the zero-shot first-label politics cohort, and perhaps refine it even further.
N-Gram Analysis
We dig deeper into the tweets about Social Security to identify key phrases that are being used in the conversation. Sorting through the data, we pulled the top thirty 2-, 3- and 4-word N-grams, or combinations of words, occurring among the tweets to see if any particularly compelling phrases are driving the conversation. What immediately pops out from this analysis is that these themes are among the most common:
People are very worried about Social Security ending, being taken away, or being cut.
Some people are talking about expanding Social Security, while others want to “reform” it (“reforming” Social Security is the language used by Republicans who want to end or privatize the program).
Scammers are actively trying to get people to give them their Social Security numbers.
The N-gram results, backed up by the ratio of political to human-interest tweets from the zero-shot classifier discussed above, identify a significant spam/scam portion of the conversation: tweets trying to harvest Social Security numbers. This issue clearly needs to be addressed by Twitter, as it represents a significant portion of the online conversation about Social Security and is designed to snare people looking for information about their own Social Security accounts.
Scam tweets aside, we can see that N-gram analysis is excellent for identifying common messaging themes within a social media conversation. We can also use it to identify themes that may be less prominent at the moment but are worth preparing for in case they gain traction.
Working with N-grams, as we have learned, is somewhat of an art form. Knowing which “stop” or exclusion words to use can either clean the data of irrelevant tweets (like the scams) or eliminate important, meaningful tweets, depending on which words are excluded. An experienced human touch is necessary to decide which words to exclude and which to include, and there is a bit of trial and error, along with finesse, in working through iterations of the analysis until the themes really pop. N-gram analysis is a crucial piece of the puzzle for understanding the context of the conversations happening on Twitter and other online platforms, because this is where most of the clues originate that we then explore using the zero-shot and poles analyses.
Conclusions
No matter the issue, the better we understand what people are saying about it, who is saying it, and whose comments are drawing the most attention and engagement, the better we can develop counter messaging AND counter distribution strategy. This is a core capacity we should expect from our digital organizing programs. Not only does it provide high value intelligence, but it also provides key insights for targeting and is a rich resource for building your audience on social media and beyond; online and offline.
For Social Security, we found that despite the loud and far-reaching anti-Social Security message, it originates from a small number of advocacy organizations and media outlets. By contrast, though producing a smaller reach, the pro-Social Security conversation is happening among a much more expansive population spread across the country. And not only is there more participation on the pro-Social Security side of the conversation, but much of that conversation treats Social Security as a primary income, making the loss of Social Security a family budget issue, not a political issue.
How can this analysis, applied to your issue, help you improve your advocacy, political, and fundraising campaigns? We think it will be pivotal.