Apr 3, 2017
Jolley Hall, Room 309
Stochastic Generative Models for Complex Networks
Daniel Larremore
Omidyar Postdoctoral Fellow
Santa Fe Institute
Understanding real-world network datasets requires tools to identify patterns and methods to quantify whether those patterns are noteworthy and meaningful. Stochastic generative models satisfy both of these needs in mathematically principled ways by specifying the parameters of a stochastic data-generating process that results in an ensemble of networks, each with an associated probability of having been generated by the process. When we engineer the probabilities to be uniform over the ensemble, we can treat the generative model as a null model to measure whether an empirically observed network property is normal or surprising. When we instead use parameters to bias the probabilities toward particular types of networks (for example, those with community structures, groups, or clusters), we can infer the parameters that best explain empirical data, thereby detecting communities, extracting hierarchies, or identifying correlations in the process. The properties of the generative model affect not just the types of structures that can be identified, but also the efficiency with which ensemble parameters can be inferred from real data and how rapidly the ensemble can be sampled. Therefore, careful mathematical choices about generative models can drastically improve our ability to make useful predictions and can also enable us to rigorously analyze the performance of statistical inference of network structure. I will introduce stochastic generative models in the context of two applied problems: the evolution of malaria parasite virulence genes and the movement of scholars in the academic labor market. In the process, these investigations will reveal provable limits to the detection of community structures in complex networks that apply beyond the framework of stochastic generative models to network science more broadly.
Daniel Larremore is an Omidyar Fellow at the Santa Fe Institute. His research develops statistical and inferential methods for analyzing large-scale network data and uses those methods to solve applied problems in diverse domains, including public health and academic labor markets. Prior to joining the Santa Fe Institute, he was a postdoctoral fellow at the Harvard T.H. Chan School of Public Health from 2012 to 2015. He obtained his Ph.D. in Applied Mathematics from the University of Colorado at Boulder in 2012 and holds an undergraduate degree from Washington University in St. Louis.