CSE Proposal Defense: Hao Yan

Oct 11, 2018
11 a.m. to 1 p.m.
Jolley Hall, Room 309

"Measuring Partisanship, Ideology, and Locality in Political Corpora with Machine Learning"

Hao Yan
Adviser: Sanmay Das

Political discourse is a fundamental aspect of government across the world, especially in democratic institutions. However, because political communication often takes complex linguistic forms, inferring ideology or partisanship from text is an important but difficult methodological task, both in studying political interactions between people and in understanding the role of political institutions in new and traditional media. In this dissertation, we develop and apply machine learning techniques to design new text-based measures that are important to political analysis.
We start with the task of classifying partisanship based on text data, and ask whether measures of partisanship generalize across domains. We find that cross-domain learning performance, with or without state-of-the-art domain adaptation techniques, is poor when the Congressional Record is compared to media and crowd-sourced estimates, even though the algorithms perform very well in within-dataset cross-validation tests. That is, it is very difficult to generalize from one domain to another.
Next, we investigate measures of ideology, or the intensity of partisanship. Our results show that predicting party affiliation is easy. However, the resulting within-party scores provide very little information about legislators' within-party ideology, as measured by the canonical benchmarks. Legislators use communication strategically across different platforms, so estimating ideology from text on a particular platform is a subtle and difficult task.
Third, we propose to develop new measures of "local vs. national concern" based on a comparison of social media text and interactions between local elected officials (mayors) and national elected officials (members of Congress). Politics at the state level in the United States Congress has recently become more "nationalized." We plan to investigate whether this effect extends to the local level, and what its implications are. Doing so will require innovation in measurement techniques and novel research on estimating distances between chunks of text.