Colloquia Series: Dennis Goldfarb

Jan 25
11:00 a.m.
Lopata Hall, Room 101

Novel Data Analysis and Acquisition Methods Improve Protein Sequencing by Mass Spectrometry


One of the biggest surprises from the Human Genome Project was the small number of genes encoded in our DNA. Our genome's limited repertoire of ~20,000 genes does not, on its own, account for the complexity of life, so scientists have turned to the study of the proteome, an organism's full complement of proteins, to help explain this missing molecular diversity. However, the primary technique for studying the proteome, mass spectrometry, generates vast quantities of complex data that cannot yet be fully interpreted.

The goal of mass spectrometry in proteomics is to determine the abundance and amino acid sequence of every protein in a biological sample. To accomplish this, proteins are cleaved into subsequences called peptides, which are ionized and injected into a mass spectrometer. The peptides are then fragmented into smaller ions, and a mass analyzer records a mass spectrum: the mass-to-charge ratios (m/z) and abundances of the fragments. A sequencing algorithm must then determine which peptides generated the observed fragmentation patterns.
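To make the fragmentation step concrete, here is a minimal sketch of how the m/z values of a peptide's fragments can be predicted from monoisotopic residue masses. It assumes unmodified residues, cleavage only at peptide bonds, and the b-ion (N-terminal) and y-ion (C-terminal) series typical of collision-induced dissociation; the function names are hypothetical and chosen for illustration, not taken from any particular sequencing tool.

```python
"""Sketch: m/z of b- and y-ion fragments for an unmodified peptide."""

# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O (Da)
PROTON = 1.007276   # mass of a proton (Da)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of the intact peptide (residues + one water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def to_mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


def fragment_ions(seq: str, charge: int = 1):
    """Return (b_ions, y_ions) m/z lists, one pair per peptide bond.

    b_ions[i] covers seq[:i+1] (N-terminal piece); y_ions[i] covers seq[i+1:]
    (the complementary C-terminal piece).
    """
    b_ions, y_ions = [], []
    for cut in range(1, len(seq)):
        prefix = sum(RESIDUE_MASS[aa] for aa in seq[:cut])          # b: residues only
        suffix = sum(RESIDUE_MASS[aa] for aa in seq[cut:]) + WATER  # y: residues + H2O
        b_ions.append(to_mz(prefix, charge))
        y_ions.append(to_mz(suffix, charge))
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    print("precursor [M+2H]2+:", round(to_mz(peptide_mass("PEPTIDE"), 2), 4))
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{len(b) + 1 - i} = {yi:.4f}")
```

For singly charged fragments, each complementary b/y pair sums to the peptide's neutral mass plus two protons, and consecutive ions within a series differ by exactly one residue mass; this ladder structure is what lets an algorithm read a sequence off the spectrum.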

In this talk I will describe projects at several stages of the computational proteomics pipeline. First, while most sequencing algorithms attempt to identify a single peptide per mass spectrum, multiple peptides are often fragmented together, producing chimeric mass spectra. I will describe a method that deconvolves chimeric mass spectra into their individual peptide components by examining the isotopic distributions of their fragments. This method increases the number of peptide-spectrum matches by 15-30%. Second, I will describe how analyzing data in real time can be used to optimize the mass spectrometer's data acquisition strategy.
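One way to picture the deconvolution step is as a small linear-unmixing problem: each candidate peptide contributes a fragment isotope envelope, overlapping envelopes add up in the observed spectrum, and each candidate's contribution is estimated by nonnegative least squares. The sketch below is a generic stand-in under stated assumptions, not the speaker's method: it assumes the theoretical envelopes and their positions on a shared grid of isotope peaks (spaced roughly 1.003/z in m/z for charge z) are already known, and the function names and example numbers are hypothetical.

```python
"""Sketch: unmix overlapping fragment isotope envelopes by nonnegative least squares."""


def build_design(envelopes, offsets, n_peaks):
    """Place each candidate's isotope envelope at its offset on a shared peak grid.

    envelopes[j] holds relative isotope-peak intensities for candidate j
    (assumed precomputed); offsets[j] is the grid index of its first peak.
    """
    columns = []
    for env, off in zip(envelopes, offsets):
        col = [0.0] * n_peaks
        for k, p in enumerate(env):
            if 0 <= off + k < n_peaks:
                col[off + k] = p
        columns.append(col)
    return columns


def nnls_coordinate_descent(columns, observed, iterations=200):
    """Minimize ||sum_j x[j] * columns[j] - observed||^2 subject to x >= 0."""
    x = [0.0] * len(columns)
    residual = list(observed)  # observed minus current model; model is zero at start
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(iterations):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            # Exact minimizer along coordinate j, then clip to stay nonnegative.
            step = sum(c * r for c, r in zip(col, residual)) / norms[j]
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


if __name__ == "__main__":
    # Hypothetical envelopes for two co-fragmented peptides whose fragment
    # envelopes overlap, the second shifted by one isotope peak.
    cols = build_design([[0.6, 0.3, 0.1], [0.5, 0.35, 0.15]], [0, 1], n_peaks=4)
    observed = [60.0, 50.0, 24.0, 6.0]
    print(nnls_coordinate_descent(cols, observed))  # approximately [100.0, 40.0]
```

Because the objective is convex, cyclic coordinate descent converges to the nonnegative least-squares solution; with only a few overlapping candidates per region a plain loop suffices, though a library routine such as SciPy's `scipy.optimize.nnls` would be the usual choice in practice.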


Dennis Goldfarb is a Postdoctoral Research Associate at the University of North Carolina at Chapel Hill. He received his PhD in Computer Science from UNC Chapel Hill in 2018 and his bachelor's degree in Mathematics and Computer Science from Rensselaer Polytechnic Institute in 2010. His research interests are in computational biology, proteomics, and mass spectrometry.