R&D Seminar Series: Interdisciplinary Studies Using Natural Language Data

Virtual Seminar/Symposium

Zoom Link

Abstract:

This talk will introduce three studies that have used natural language data from different domains.

The first study investigates how privacy as an ethical concept exists in two languages: Mandarin Chinese and American English. Using a mixed-methods approach that combines two computational linguistic techniques, structural topic modeling (STM) and semantic network analysis (SNA), this study examines a 10-year corpus of news and social media posts. It reveals variation in how privacy is understood in the two languages. In particular, the English-language data emphasizes the institution, whereas the Chinese-language data emphasizes the individual. This study contributes to comparative privacy research by offering a way to operationalize such work through natural language. In addition, natural language proves to be an effective means of revealing the conceptual complexities of an abstract information ethics concept.

The second study examines clinical notes retrieved from the Electronic Medical Record Search Engine (EMERSE) to identify reported symptoms and investigate patient-provider communication processes in alpha-gal syndrome (AGS). Findings from this study can serve as a basis for future automation of rare disease analysis; moreover, this study provides a basic understanding of the granularity of information that an electronic health record (EHR) may provide for rare disease identification.

The third study analyzes about one year’s content from 1600 Daily, the official White House email-style newsletter. In doing so, we identify the central frames the Trump White House relied on leading up to the 2020 election and the media sources used to legitimize these claims. Relying on named entity recognition, frequency counts, structural topic modeling, and qualitative content analysis, this study reveals the important role electoral communication plays in framing current events and the extent to which email is an essential node in the right-wing media ecosystem.

Biosketch:

Yuanye Ma is a doctoral candidate at the University of North Carolina (UNC) at Chapel Hill, School of Information and Library Science (SILS). Her research interests include privacy, surveillance, natural language processing, and information ethics. Specifically, she is interested in working with natural language data to answer real-world questions. One of her core research topics lies at the intersection of information ethics and natural language processing, where she seeks to answer questions about how different languages express information ethics concepts; how conceptual changes are manifested via changes in language; and whether languages themselves have an impact on how their speakers conceptualize and understand information ethics concepts and issues.
