Nov 11, 2016
Lopata Hall, Room 101
Privacy, Information and Generalization in Adaptive Data Analysis
Computer Science and Engineering
Consider an agency holding a large database of sensitive personal information – medical records, census survey answers, web search records, or genetic data, for example. The agency would like to discover and publicly release global characteristics of the data (e.g., to inform policy or business decisions) while protecting the privacy of individuals' records.
I will begin by discussing what makes this problem difficult, and exhibit some of the nontrivial issues that plague simple attempts at anonymization and aggregation. Motivated by this, I will present differential privacy, a rigorous definition of privacy in statistical databases that has received significant attention.
In the latter part of the talk, I will explain how differential privacy is connected to a seemingly different problem: "adaptive data analysis", the practice by which insights gathered from data are used to inform further analysis of the same data set. This is increasingly common in scientific research, in which data sets are shared and re-used across multiple studies. Classical statistical theory assumes that the analysis to be run is selected independently of the data. This assumption breaks down when data are re-used; the resulting dependencies can significantly bias the analyses' outcomes. I'll show how limiting the information revealed about a data set during analysis allows one to control such bias, and why differentially private analyses provide a particularly attractive tool for limiting information.
Adam Smith is a professor of Computer Science and Engineering at Penn State. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He received his Ph.D. from MIT in 2004 and has held visiting positions at the Weizmann Institute of Science, UCLA, Boston University and Harvard. In 2009, he received a Presidential Early Career Award for Scientists and Engineers (PECASE). In 2016, he received the Theory of Cryptography Test of Time award, jointly with C. Dwork, F. McSherry and K. Nissim.