Detecting and monitoring concerns against HPV vaccination on social media using large language models

Rai, Sunny; Kornides, Melanie; Morgan, Jennifer; Kumar, Aman; Cappella, Joseph; Guntuku, Sharath Chandra

doi:10.1038/s41598-024-64703-3

Article, open access
Published: 21 June 2024

Scientific Reports, volume 14, article number 14362 (2024)

Sunny Rai^1,4,
Melanie Kornides^2,4,
Jennifer Morgan²,
Aman Kumar¹,
Joseph Cappella^3,4 &
…
Sharath Chandra Guntuku^1,4

Abstract

Health risks due to preventable infections such as human papillomavirus (HPV) are exacerbated by persistent vaccine hesitancy. Due to limited sample sizes and the time needed to roll them out, traditional methodologies like surveys and interviews offer restricted insights into quickly evolving vaccine concerns. Social media platforms can serve as fertile ground for monitoring vaccine-related conversations and detecting emerging concerns in a scalable and dynamic manner. Using state-of-the-art large language models, we propose a minimally supervised end-to-end approach to identify concerns against HPV vaccination from social media posts. We detect and characterize the concerns against HPV vaccination pre- and post-2020 to understand the evolution of HPV vaccine discourse. Upon analyzing 653 K HPV-related post-2020 tweets, adverse effects, personal anecdotes, and vaccine mandates emerged as the dominant themes. Compared to pre-2020, there is a shift towards personal anecdotes of vaccine injury with a growing call for parental consent and transparency. The proposed approach provides an end-to-end system: given a collection of tweets, it returns a list of prevalent concerns, providing critical insights for crafting targeted interventions, debunking messages, and informing public health campaigns.

Introduction

One of the key national aims outlined in the Healthy People 2030 Framework is to “Increase the proportion of adolescents who get recommended doses of the HPV vaccine—IID‑08”¹. Motivation to vaccinate, which encompasses intersecting constructs of intention, willingness, acceptability, and hesitancy, and the social environment (e.g. social norms about vaccination, provider recommendations, vaccine myths, and misinformation about vaccines) collectively contribute to vaccine uptake². Notably, parental reluctance to vaccinate their children, known as parental vaccine hesitancy, significantly correlates with lower HPV vaccination rates among adolescents³. Research conducted on a national US sample revealed a substantial prevalence (23%) of parental hesitancy toward HPV vaccination, which exerts a stronger influence on receiving the vaccine than barriers like cost or accessibility⁴. The reasons for vaccine hesitancy evolve with exposure to vaccine-related discourse⁵. Designing effective public health messaging and policies to mitigate vaccine hesitancy necessitates ongoing monitoring of emerging concerns impeding vaccine confidence⁶.

Prior works have primarily relied on interviews and surveys to uncover concerns against vaccines⁷; however, there is growing interest in examining social media data due to its extensive reach⁸. Nearly all parents of adolescents engage with social media, with 68% using it for health information⁹. Manual annotation of social media posts permits examination of only a limited number of posts and yields a static snapshot, whereas concerns evolve, and tracking them dynamically remains an unsolved research challenge. Social media is an easily accessible platform for learning about people’s concerns and beliefs regarding vaccinations¹⁰. Vaccine misinformation (and, more broadly, concerns against vaccination) is a frequently studied research problem; forty-three percent of health-related misinformation studies on social media are vaccine-related¹¹. Moreover, social media content discussing vaccine harms strongly influences vaccination behavior¹². Individuals exposed to vaccination harm-related stories were 44% more likely to refuse vaccination and 13% more likely to delay vaccination than unexposed individuals¹³.

Digital public health surveillance via social listening is a promising avenue for identifying and addressing health concerns at the earliest stages¹⁴. WHO built a social listening platform, EARS, to track emerging concerns related to COVID-19 from social media posts using a semi-supervised machine learning technique¹⁵. Sentiment¹⁶ and stances¹⁷ against HPV vaccination on social media platforms, including Twitter and Reddit¹⁸, are monitored to estimate people’s opinions. Topical analyses and the diffusion of HPV-related content on social media can reveal underlying concerns and prevent the cascading effect of misinformation^19,20. However, social listening platforms need repeated manual supervision to be effective²¹. This paper aims to overcome the need for periodic human supervision and proposes a minimally supervised approach to identify emerging concerns against HPV vaccination from public tweets. Our approach relies on semantic information encoded in language model embeddings and thus requires minimal labeled data. The proposed method is an end-to-end system: given a set of tweets, it generates a set of articulated concerns directly applicable to subsequent tasks, such as crafting debunking messages or designing public health initiatives. Moreover, the existing public concerns collected from HPV-related online communication can be easily compared and contrasted with concerns identified from the latest ongoing dialogue on HPV vaccination. This will facilitate dynamic adjustments to debunking messages or enhancements to ongoing health campaigns based on the evolving landscape of concerns.

Methods

Data

Using the Twitter API, we collected 653 K HPV-related tweets posted from Jan 2020 to June 2022 using the search keywords {hpv, gardasil, papilloma}. The tweets were processed to mask URLs and user mentions in their content. The proposed approach comprises three phases: (a) characterizing HPV discourse, (b) identifying topics associated with concerns against HPV vaccines, and (c) leveraging GPT-4 for contextual topic labeling. These phases are discussed in detail below.
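
The masking step can be illustrated with a minimal sketch. The source does not state how masking was implemented, so the regular expressions and the function name `mask_tweet` below are assumptions; the placeholder tokens `<URL>` and `<USER>` follow the example tweets quoted later in the article.

```python
import re

# Hypothetical patterns: the article does not specify the masking rules.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# "@" followed by word characters, but not when it is part of an e-mail address
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def mask_tweet(text: str) -> str:
    """Replace URLs and user mentions with placeholder tokens."""
    text = URL_PATTERN.sub("<URL>", text)
    text = MENTION_PATTERN.sub("<USER>", text)
    return text


masked = mask_tweet("Thanks @someone, see https://example.org/hpv for details")
print(masked)
```

Masking URLs before mentions avoids rewriting an "@" that happens to appear inside a link.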

Characterizing HPV discourse

The first step identifies latent themes in HPV discourse using topic modeling. Latent Dirichlet allocation (LDA)²² is an unsupervised clustering algorithm for identifying latent topics in large quantities of text. The algorithm assumes that each word occurrence can be attributed to one or more topics generated from the corpus. Words are assigned to topics based on their co-occurrence with other words across the corpus of HPV-related tweets, and the assignment is repeated until all words are designated to a set of topics alongside other semantically similar words. These topics represent semantically coherent clusters of words in which words are weighted by their likelihood of occurring within each topic. We first removed the 100 most frequent words in the dataset. Using the DLATK library’s interface for the MALLET implementation of LDA²³, we generated 50, 100, and 150 topics, with an alpha of 5. We computed coherence scores²⁴ for all sets of topics (see Supplementary Information). Two human experts independently analyzed the quality of the word clusters. The number of topics was set to 100 after evaluating the quality and granularity of topics.
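
Two of the steps above, dropping the most frequent words and scoring a topic's coherence, can be sketched in plain Python. This is an illustration under assumptions rather than the authors' pipeline: LDA itself is not shown, the function names are hypothetical, and the coherence formula is the document co-occurrence measure commonly associated with the cited coherence work²⁴ (the article does not spell out which variant was computed).

```python
import math
from collections import Counter


def remove_top_frequent(docs, n_top):
    """Drop the n_top most frequent tokens (by corpus count) from every document."""
    counts = Counter(tok for doc in docs for tok in doc)
    frequent = {tok for tok, _ in counts.most_common(n_top)}
    return [[tok for tok in doc if tok not in frequent] for doc in docs]


def topic_coherence(topic_words, docs):
    """Sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs of top words.

    D(...) counts documents containing all the given words. topic_words should be
    ordered from most to least probable, and each must occur in some document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            joint = doc_freq(topic_words[m], topic_words[l])
            score += math.log((joint + 1) / doc_freq(topic_words[l]))
    return score


# Toy tokenized corpus, for illustration only
docs = [
    ["the", "hpv", "vaccine", "side", "effects"],
    ["the", "hpv", "vaccine", "mandate", "school"],
    ["the", "side", "effects", "adverse"],
    ["the", "school", "mandate", "parents"],
]
filtered = remove_top_frequent(docs, n_top=1)  # removes "the"
coherent = topic_coherence(["side", "effects"], filtered)
incoherent = topic_coherence(["side", "mandate"], filtered)
```

Words that tend to appear in the same tweets ("side", "effects") score higher than words that never co-occur ("side", "mandate"), which is the property used to compare topic sets of different sizes.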

Topics associated with “concern” specific HPV discourse

To recognize topics related to concerns against HPV vaccines, one alternative is to manually analyze the topics (clusters of words) and related tweets. However, this is a slow and labor-intensive process that needs to be repeated every few months²¹. Automation will enable periodic and fine-grained analysis of vaccine hesitancy at scale. In our approach, we first build a classifier to automatically label tweets expressing concern against HPV vaccination and then perform a regression analysis on the topic distributions (from Sect. "Characterizing HPV discourse") of the labeled tweets to identify topics correlated with concerns.

We used an existing hand-labeled collection of 3876 tweets⁸ posted between December 15, 2019, and March 31, 2020, to train a supervised classifier for detecting concerns related to HPV vaccination. Of the 3876 tweets, 24% were hand-coded as expressing concern against HPV vaccines or vaccination. Here, concern refers to any reason (misinformation-driven or legitimate) causing reluctance against HPV vaccines. The prediction performance of a classifier hinges on access to abundant labeled data within the specific domain. However, annotating large amounts of data is both time-consuming and cost-intensive. We mitigate these challenges by using contextual embeddings from language models. Compared to static word embeddings such as word2vec, contextual embeddings are adept at disambiguating polysemous words and recognizing semantic cues, such as expressions of fear and conversation style, that are prevalent in concern-expressing tweets. Using embeddings with pre-encoded semantic meanings as features helps in performing few-shot learning, i.e. training a good-quality classifier with a small set of labeled data. The shared semantic meaning in embeddings ensures that the model learns to distinguish the semantic style and tone of tweets expressing concerns from others even when topical content varies. We used pre-trained RoBERTa (a robustly optimized BERT pretraining approach) embeddings²⁵ to transform words in tweets into numeric features. Twenty percent of the 3876 tweets were randomly sampled to create a test set, and the rest were used for training a logistic regression model. The trained model achieved an AUC of 0.958 and an F1 score of 0.88 on the test data. We applied the trained model to the 653 K tweets (from Sect. "Data") to identify tweets expressing concerns.
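
The train/evaluate loop described above can be sketched without external libraries. This is a toy under stated assumptions, not the authors' implementation: the 4-dimensional Gaussian vectors stand in for RoBERTa tweet embeddings, the gradient-descent settings are arbitrary, and all function names are hypothetical. Only the 80/20 split, the roughly 24% positive rate, the logistic regression model, and the AUC and F1 metrics come from the text.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy stand-in for tweet embeddings: 4-dimensional vectors, about 24% positives
rng = random.Random(0)
data = []
for _ in range(500):
    label = 1 if rng.random() < 0.24 else 0
    center = 1.5 if label else -1.5
    data.append(([rng.gauss(center, 1.0) for _ in range(4)], label))

rng.shuffle(data)
split = int(0.8 * len(data))  # 80% train, 20% test
train, test = data[:split], data[split:]
X_train, y_train = [x for x, _ in train], [t for _, t in train]
X_test, y_test = [x for x, _ in test], [t for _, t in test]

w, b = train_logreg(X_train, y_train)
test_scores = predict_proba(X_test, w, b)
test_auc = auc(y_test, test_scores)
test_f1 = f1_score(y_test, [int(s >= 0.5) for s in test_scores])
print(f"AUC={test_auc:.3f} F1={test_f1:.3f}")
```

Because the classifier only sees fixed embedding vectors, retraining it on a new labeled sample is cheap, which is what makes the small labeled set workable.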

Upon identifying tweets expressing concern, we performed regression analysis on the tweets’ topic distributions (from Sect. "Characterizing HPV discourse") to find topics correlated with “concern”. All topics with odds ratios > 1 that were significant at the 95% confidence level (p < 0.05) were considered for further analysis. In parallel, we extracted 50 LDA topics from the training dataset and performed regression to identify topics correlated with concerns. The number of topics was set to 50 after analyzing the topics’ quality.
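
The selection rule (odds ratio above 1 and statistically significant) can be illustrated with a simplified case. The sketch below treats a topic as simply present or absent in a tweet, for which the logistic-regression odds ratio reduces to the familiar 2×2 table estimate with a Wald confidence interval. The study regresses on continuous topic distributions, so this is a stand-in, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% confidence interval for a 2x2 table.

    a: concern tweets with the topic      b: concern tweets without it
    c: non-concern tweets with the topic  d: non-concern tweets without it
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
odds_ratio, lower, upper = odds_ratio_ci(a=300, b=700, c=200, d=800)
# Keep the topic only if the whole interval lies above 1
keep_topic = odds_ratio > 1 and lower > 1
print(f"OR={odds_ratio:.2f}, 95% CI=({lower:.2f}, {upper:.2f}), keep={keep_topic}")
```

A topic whose interval straddles 1 would be dropped even if its point estimate exceeds 1.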

Leveraging GPT-4 for contextual topic labeling

A topic is a set of related words and requires additional analysis to derive an understandable and relevant theme, e.g. “doctors, jab, decision, am, injured, daughter” → adverse side-effects of vaccine/vaccine injury. Labeling topics can also help in connecting overlapping concerns (e.g. mistrust against pharmaceutical companies and adverse side effects). The labeling task demands an expert aware of the terminologies in HPV-related discourse and their implicit connotations.

A growing amount of literature supports the utility of language models as expert annotators with performance at par with humans²⁶. We used a state-of-the-art language model, GPT-4 chat, and performed prompt engineering (See Table 1 for prompt and model parameters) to summarize the cluster of words in topics.
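
To make the labeling step concrete, the sketch below assembles a request for a single topic. The prompt wording and the function name `build_topic_prompt` are hypothetical: the prompts and model parameters actually used are those given in Table 1, which is not reproduced in this text, and no model call is shown.

```python
def build_topic_prompt(topic_words):
    """Assemble a labeling request for one topic (hypothetical wording)."""
    words = ", ".join(topic_words)
    return (
        "The following words form one topic extracted from tweets about HPV "
        f"vaccination: {words}. "
        "Reply with a short label describing the concern this topic expresses."
    )


prompt = build_topic_prompt(["doctors", "jab", "decision", "am", "injured", "daughter"])
print(prompt)
```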

Table 1 Prompts used to label “Topic” and “Themes” using GPT 4 Chat.

Results

Performance evaluation

Almost half of the 653 K tweets were predicted as expressing a concern. To evaluate the quality of the predictions, three human experts hand-coded a random sample of 3 K tweets from the newly collected dataset. On a subset of 106 tweets, the kappa statistic for inter-rater agreement was 0.64 for (rater-1, rater-2), 0.6 for (rater-1, rater-3), and 0.6 for (rater-2, rater-3). Every tweet was hand-coded by two coders, and discrepancies in labels were resolved by consensus. In total, 12.7% of the 3 K tweets were hand-coded as expressing concern.
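
For readers unfamiliar with the agreement statistic, a minimal sketch of pairwise kappa follows. It assumes the reported values are Cohen's kappa computed per rater pair, which the text implies but does not state; the labels are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels (1 = concern, 0 = not concern)
rater_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Here the raters agree on 8 of 10 tweets, but because "not concern" dominates, much of that agreement is expected by chance and kappa is noticeably lower than 0.8.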

We obtained a recall of 71.4% and a precision of 31.2%. Because the objective of our study is to identify new and evolving concerns against HPV vaccination, we optimized our model for high recall, which led to relatively poor precision. The model accurately captured the tone of worry in tweets; however, it also mislabeled tweets expressing concern about HPV disease itself as concern against the vaccine. In total, 28.6% of concern-expressing tweets were incorrectly predicted as “not concern”, whereas 22.9% of non-concern tweets were incorrectly predicted as “concern”. Samples of tweets incorrectly predicted as “concern” (false positives) and as “not concern” (false negatives) are provided in Table 2.
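
The reported figures can be cross-checked with a short calculation. Assuming precision and recall were measured on the hand-coded sample in which 12.7% of tweets express concern, the precision follows from the recall and the false-positive rate:

```python
def precision_from_rates(prevalence, recall, false_positive_rate):
    """Precision implied by class prevalence, recall, and false-positive rate."""
    true_pos = prevalence * recall
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


precision = precision_from_rates(
    prevalence=0.127,           # share of hand-coded tweets expressing concern
    recall=0.714,               # share of concern tweets that were detected
    false_positive_rate=0.229,  # share of non-concern tweets flagged as concern
)
print(round(precision, 3))  # 0.312
```

The result matches the reported 31.2% and shows where the low precision comes from: non-concern tweets outnumber concern tweets by roughly seven to one, so even a 22.9% false-positive rate produces more false alarms than true detections.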

Table 2 Misclassified tweets: false positives and false negatives.

Topics associated with concern in HPV discourse

Forty-six topics were significantly correlated with concern-expressing HPV discourse in the newly collected tweets (see Table 3). Lawsuits against Gardasil (“lawsuit, merck, filed, behalf, #gardasil”, OR = 3.36), personal experiences (“women, she, her, pay, geico”, OR = 2.05; “where, especially, doctors, jab, decision”, OR = 2.02), adverse side effects (“effects, side, vax, adverse, look”, OR = 1.73; “down, someone, off, bad, hope”, OR = 1.27) and vaccine mandates for adolescents (“becuase, does would, those, remember”, OR = 1.37; “kids, them, vaccinated, covid, children”, OR = 1.35) are the dominant themes behind vaccine hesitancy. Topics related to sexual health and STIs (“hiv, herpes, too, aids, having”, OR = 1.14) and to the effectiveness of HPV vaccination relative to innate immunity in overcoming HPV infection (“immune, system, symptoms, body, its”, OR = 1.09) are also debated. Additionally, some topics (“free, insurance, poor, cells, areas”, OR = 1.05) discuss the availability of affordable vaccines for the poor and uninsured.

Table 3 Top 10 topics significantly correlated (p < 0.05) with concern in the new tweets dataset.

Evaluating quality of topic labeling using GPT-4

Two human experts evaluated the quality of the labels assigned by GPT-4 chat. The experts were asked to rate the correctness of each label on a 3-level Likert scale, i.e. appropriate, somewhat appropriate, and not appropriate (see Table S3 for annotation guidelines). Out of 46 topics (OR > 1 and p < 0.05), only ten (21.7%) topic labels generated using prompt-1 were marked as “not appropriate” by either of the annotators, whereas all clustered theme labels generated using prompt-2 were marked as “appropriate” (i.e., all topics fall under this theme) or “somewhat appropriate” (i.e., not all topics fall under this theme but the majority do). These labels were also generated for the training tweets dataset⁸; three (17.6%) out of 17 topic labels (OR > 1 and p < 0.05) generated using prompt-1 were marked as “not appropriate” by either of the annotators. In contrast, all theme labels were marked as “appropriate” or “somewhat appropriate.” It is worth noting that GPT-4 chat tends to over-generalize when clustering topics and labeling themes, e.g. topics related to vaccine mandates were assigned the theme “Personal Experiences and Opinions,” which is not completely incorrect but fails to capture the precise concern.

Pre and post-2020: evolution of concerns against HPV vaccination

The themes in tweets pre-2020 are (a) Adverse Effects and Controversies (“#vaccineinjury, after, #study):, case, et”, OR = 1.45; “after, gardasil, dr, expert, harper”, OR = 1.43), (b) Vaccine Efficacy and Disease Prevention (“may, lesions, disease, it's, merck's”, OR = 1.21), (c) Vaccine Mandates and Parental Concerns (“mandate, school, parents, #hpvvax, vaccines”, OR = 1.18), and (d) Personal Experiences and Opinions (“they, his, was, my, he, there”, OR = 1.14) (See Table 4).

Table 4 Top 10 topics significantly correlated (p < 0.05) with concern in the training dataset i.e., tweets posted pre 2020⁸.

In tweets post-2020, the themes “Adverse Effects and Controversies” (“lawsuit, merck, filed, behalf, #gardasil”, OR = 3.36) and “Vaccine Mandates and Parental Concerns” (“because, does, would, those, remember”, OR = 1.32) remained the top concerns. Personal experiences centered on parental consent and vaccine mandates are discussed extensively. We also observe previously unseen themes, i.e. HPV and Women's Health (“girls, india, gates, bill, africa, foundation”, OR = 1.3), HPV Vaccine Development and Market (“big, money, way, keep, where, daughters”, OR = 1.22; “market, data, pdsb, china, top”, OR = 1.02), Sexual Health and STIs (“hiv, herpes, too, aids, having”, OR = 1.14; “hiv, herpes, list, syphilis, gonorrhea”, OR = 1.09), and body immunity (“immune, system, symptoms, body, its”, OR = 1.09) (see Table S2).
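
Once themes are labeled for two periods, comparing them is a set operation. The sketch below uses the theme names mentioned in this section, lowercased for matching; treating the labels as exact strings is an assumption, since in practice labels generated in separate runs may need manual or fuzzy alignment.

```python
# Theme labels transcribed from the text above (lowercased for matching)
pre_2020 = {
    "adverse effects and controversies",
    "vaccine efficacy and disease prevention",
    "vaccine mandates and parental concerns",
    "personal experiences and opinions",
}
post_2020 = {
    "adverse effects and controversies",
    "vaccine mandates and parental concerns",
    "personal experiences and opinions",
    "hpv and women's health",
    "hpv vaccine development and market",
    "sexual health and stis",
    "body immunity",
}

emerging = sorted(post_2020 - pre_2020)    # themes seen only in the later period
persistent = sorted(post_2020 & pre_2020)  # themes present in both periods
print("Emerging:", emerging)
print("Persistent:", persistent)
```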

Overall, the discussion on HPV vaccination has become more personal in the past few years, with more individuals questioning the HPV vaccine mandates for school children. Consequently, we also see more tweets sharing personal experiences and increasing parents’ reluctance towards the HPV vaccine for their children.

Discussion

Disproving misinformation on social media is a challenge, often intensified by the lack of explicit information countering the false claims, which in turn strengthens individuals' beliefs. HPV vaccine misinformation is particularly concerning because posts of this nature tend to receive more "likes" than pro-vaccine content²⁰, amplifying their visibility among wider audiences. Amid the COVID-19 pandemic, vaccination-related concerns surged due to rapidly changing guidelines. Healthcare professionals and policymakers encounter a significant challenge in compiling and comprehending the diverse array of vaccine-related concerns causing hesitancy. People's tendency to seek information online regarding potential adverse effects, and often to find search results affirming their fears, has further complicated the vaccination landscape.

The unfiltered access to first-hand public opinions on social media presents an opportunity to learn and address concerns that might otherwise go unnoticed. In the past, topical analyses coupled with human supervision were used for collating reasons behind vaccine hesitancy^27,28. The proposed pipeline revealed vaccine safety, vaccine effectiveness, and mistrust due to vaccine mandates as major concerns in pre-2020 tweets; this is aligned with prior findings examining social media posts from a similar time frame¹⁶. Increasing negative anecdotal reports influence parents’ decision to vaccinate their children⁷. The reasons behind vaccine hesitancy also evolve with the socio-political environment (e.g. increased mistrust against government and healthcare institutions during COVID-19) and health policies (e.g. mandating vaccines). Our approach can be easily adapted to detect multilingual concerns at the desired regional level, such as country, state, or county, with minimal human supervision.

There are several limitations to our approach that need to be addressed. First, our study was limited to tweets in English and did not cover the concerns of non-English speakers. Second, we limited our analysis to textual posts; however, multimodal content such as memes on X and on other social media platforms such as TikTok and YouTube is also of interest²⁹. Third, not all expressions of vaccine hesitancy manifest as overtly negative or explicit concerns. Our model struggles to detect obscure concerns against vaccination^30,31. Below are a few examples that were misclassified:

“RT <USER>: I thought <EMOJI> aged out of the HPV vaccines but you can them until <EMOJI> 45! Expensive, though.”

Here, the writer does not have a negative stance against the HPV vaccine; however, the cost of the vaccine is the concern.

“<USER> HPV isn't even that dangerous, and it doesn't show up in males. Even if that WERE true, there is absolutely no way Vinny could've known he had it because there's no male HPV test, from what I understand.”

Here, the writer has projected HPV as a low-risk illness and does not have an explicit negative stance against vaccination or tests.

Tweets can also be sarcastic, conveying a negative message in superficially positive language.

“Thanks <USER> - your dangerous and ineffective HPV vaccine is doing a great job! <EMOJI> <URL>”

“RT <USER>: How did that HPV vaccine go again? How many young girls were paralyzed by this perfectly safe and vitally necessary vaccine? <EMOJI>”

Other tweets plainly state facts or news that carry a negative undertone.

“RT <USER>: More than 25% of parents in 2019 refused the HPV vaccine for their child, up from 5% in 2008, showing "disinformation <EMOJI>”

To conclude, automating the detection and monitoring of vaccination-related concerns is a complex task that could immensely benefit from advancing natural language processing. LLMs could help examine obscure motivations³² behind vaccine hesitancy. More research is needed to expand digital social listening to multimodal content.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

1. Healthy People. Increase the proportion of adolescents who get recommended doses of the HPV vaccine—Data - Healthy People 2030 | health.gov. https://health.gov/healthypeople/objectives-and-data/browse-objectives/vaccination/increase-proportion-adolescents-who-get-recommended-doses-hpv-vaccine-iid-08/data (2023).
2. Brewer, N. T., Chapman, G. B., Rothman, A. J., Leask, J. & Kempe, A. Increasing vaccination: Putting psychological science into action. Psychol. Sci. Public Interest J. Am. Psychol. Soc. 18, 149–207 (2017).
3. Bianco, A., Mascaro, V., Zucco, R. & Pavia, M. Parent perspectives on childhood vaccination: How to deal with vaccine hesitancy and refusal?. Vaccine 37, 984–990 (2019).
4. Sonawane, K. et al. Factors associated with parental human papillomavirus vaccination intentions among adolescents from socioeconomically advantaged versus deprived households: A nationwide, cross-sectional survey. Lancet Reg. Health Am. 31, 100694 (2024).
5. Sonawane, K. et al. Trends in human papillomavirus vaccine safety concerns and adverse event reporting in the United States. JAMA Netw. Open 4, e2124502 (2021).
6. Chan, M.-P.S., Jones, C. R., Hall Jamieson, K. & Albarracín, D. Debunking: A meta-analysis of the psychological efficacy of messages countering misinformation. Psychol. Sci. 28, 1531–1546 (2017).
7. Beavis, A. L. et al. Exploring HPV vaccine hesitant parents’ perspectives on decision-making and motivators for vaccination. Vaccine X 12, 100231 (2022).
8. Kornides, M. L. et al. Exploring content of misinformation about HPV vaccine on twitter. J. Behav. Med. 46, 1–14 (2022).
9. Bryan, M. A., Evans, Y., Morishita, C., Midamba, N. & Moreno, M. Parental perceptions of the internet and social media as a source of pediatric health information. Acad. Pediatr. 20, 31–38 (2020).
10. Sundstrom, B. et al. Correcting HPV vaccination misinformation online: Evaluating the HPV vaccination NOW social media campaign. Vaccines Basel 9, 352 (2021).
11. Suarez-Lledo, V. & Alvarez-Galvez, J. Prevalence of health misinformation on social media: Systematic review. J. Med. Internet Res. 23, e17187 (2021).
12. Dubé, E. et al. Vaccine hesitancy: An overview. Hum. Vaccines Immunother. 9, 1763–1773 (2013).
13. Margolis, M. A., Brewer, N. T., Shah, P. D., Calo, W. A. & Gilkey, M. B. Stories about HPV vaccine in social media, traditional media, and conversations. Prev. Med. 118, 251–256 (2019).
14. Ishizumi, A. et al. Beyond misinformation: Developing a public health prevention framework for managing information ecosystems. Lancet Public Health https://doi.org/10.1016/S2468-2667(24)00031-8 (2024).
15. White, B. K. et al. Using machine learning technology (early artificial intelligence–supported response with social listening platform) to enhance digital social understanding for the COVID-19 infodemic: Development and implementation study. JMIR Infodemiol. 3, e47317 (2023).
16. Boucher, J.-C. et al. HPV vaccine narratives on Twitter during the COVID-19 pandemic: A social network, thematic, and sentiment analysis. BMC Public Health 23, 694 (2023).
17. Skeppstedt, M., Kerren, A. & Stede, M. Automatic detection of stance towards vaccination in online discussion forums. In Proc. of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) (eds. Jonnagaddala, J., Dai, H.-J. & Chang, Y.-C.) 1–8 (Association for Computational Linguistics, 2017).
18. Du, J. et al. Using machine learning–based approaches for the detection and classification of human papillomavirus vaccine misinformation: Infodemiology study of Reddit discussions. J. Med. Internet Res. 23, e26478 (2021).
19. Chin, J. et al. Tracking the human papillomavirus vaccine risk misinformation: An explorative study to examine how the misinformation has spread in user-generated content. Proc. Int. Symp. Hum. Factors Ergon. Health Care 9, 312–316 (2020).
20. Massey, P. M. et al. Dimensions of misinformation about the HPV vaccine on instagram: Content and network analysis of social media characteristics. J. Med. Internet Res. 22, e21451 (2020).
21. Boatman, D. et al. Using social listening for digital public health surveillance of human papillomavirus vaccine misinformation online: Exploratory study. JMIR Infodemiol. 4, e54000 (2024).
22. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).
23. Schwartz, H. A. et al. Dlatk: Differential language analysis toolkit. In Proc. of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (eds. Specia, L., Post, M. & Paul, M.) 55–60 (Association for Computational Linguistics, Copenhagen, 2017) https://doi.org/10.18653/v1/D17-2010.
24. Mimno, D., Wallach, H. M., Talley, E., Leenders, M. & Mccallum, A. Optimizing semantic coherence in topic models. In Proc. of the Conference on Empirical Methods in Natural Language Processing, 262–272 (Association for Computational Linguistics, 2011).
25. Liu, Y. et al. RoBERTa: A robustly optimized BERT pretraining approach (2019).
26. Alizadeh, M. et al. Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks. Preprint at https://doi.org/10.48550/arXiv.2307.02179 (2023).
27. Jiang, S., Wang, P., Liu, P. L., Ngien, A. & Wu, X. Social media communication about HPV vaccine in China: A study using topic modeling and survey. Health Commun. https://doi.org/10.1080/10410236.2021.1983338 (2023).
28. Surian, D. et al. Characterizing twitter discussions about HPV vaccines using topic modeling and community detection. J. Med. Internet Res. 18, e6045 (2016).
29. Van Poucke, M. COVID-19 vaccine hesitancy and shaming on TikTok: A multimodal appraisal analysis. Multimodality Soc. 3, 97–129 (2023).
30. Fasce, A. et al. A taxonomy of anti-vaccination arguments from a systematic literature review and text modelling. Nat. Hum. Behav. 7, 1462–1480 (2023).
31. Du, J. et al. Leveraging deep learning to understand health beliefs about the human papillomavirus vaccine from social media. NPJ Digit. Med. 2, 1–4 (2019).
32. Rai, S. et al. A cross-cultural analysis of social norms in bollywood and hollywood movies. Preprint at https://doi.org/10.48550/arXiv.2402.11333 (2024).

Funding

This project was partly supported by Penn Global and the Indian Research Engagement Fund, National Institutes of Health, NIH-NIMHD:R01MD018340 awarded to Dr. Guntuku and NIH-NCI:R37CA259210 awarded to Dr. Kornides.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
Sunny Rai, Aman Kumar & Sharath Chandra Guntuku
School of Nursing, University of Pennsylvania, Philadelphia, PA, 19104, USA
Melanie Kornides & Jennifer Morgan
Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, 19104, USA
Joseph Cappella
Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA, 19104, USA
Sunny Rai, Melanie Kornides, Joseph Cappella & Sharath Chandra Guntuku

Contributions

S.R., M.K., J.M., J.C. and S.C.G. conceived and designed the research; S.R., A.K. and S.C.G. performed research; S.R., A.K. and S.C.G. contributed new reagents/analytic tools and analyzed the data; S.R., J.M. and S.C.G. wrote the paper.

Corresponding author

Correspondence to Sunny Rai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rai, S., Kornides, M., Morgan, J. et al. Detecting and monitoring concerns against HPV vaccination on social media using large language models. Sci Rep 14, 14362 (2024). https://doi.org/10.1038/s41598-024-64703-3

Received: 17 December 2023
Accepted: 12 June 2024
Published: 21 June 2024
DOI: https://doi.org/10.1038/s41598-024-64703-3
Springer Nature Limited
