Lopata Hall, Room 101
"Deep Learning Based Cell Distribution Overlap Analysis for Fluorescence Microscopy Embryonic Cell Images"
Adviser: Mark Anastasio
Fluorescence microscopy is commonly employed to study cell activity during human gastrulation. Quantitative overlap analysis of different cell distributions in fluorescence microscopy images of embryonic cells is a useful way to study the underlying cell movement behaviors during the differentiation of human embryonic stem cells (hESC). Previous colocalization-based overlap analyses focused on the overlap of individual cells, which may not elucidate the overlap of the cell distributions. In addition, these methods analyzed the overlap in the intensity space, which makes the results vulnerable to variance in intensity and cell shape in the fluorescence images. In this study, we proposed a novel method for analyzing the overlap of cell distributions. We first employed a neural-network-based mapping to convert an intensity cell image into a cell count density map (density map). We then analyzed the overlap of cell distributions using these estimated density maps. In this way, we can avoid the instability caused by intensity and shape variance. Preliminary experimental results showed that our method can provide reasonable analysis results for real fluorescence microscopy embryonic cell data provided by the Lilianna Solnica-Krezel Lab.
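The abstract does not state which overlap measure is applied to the estimated density maps. As a rough illustration only, the sketch below assumes one common choice, histogram intersection of the normalized maps, and takes the density maps as already estimated (the neural network step is not shown). The function names `normalize` and `distribution_overlap` are hypothetical.

```python
def normalize(density_map):
    """Scale a 2-D density map (list of rows) so its entries sum to 1."""
    total = sum(sum(row) for row in density_map)
    if total <= 0:
        raise ValueError("density map must contain a positive total count")
    return [[value / total for value in row] for row in density_map]


def distribution_overlap(map_a, map_b):
    """Histogram intersection of two density maps (an assumed measure).

    Returns a value in [0, 1]: 0 when the two cell distributions occupy
    disjoint regions, 1 when their normalized maps are identical.
    """
    if len(map_a) != len(map_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(map_a, map_b)
    ):
        raise ValueError("density maps must have the same shape")
    norm_a = normalize(map_a)
    norm_b = normalize(map_b)
    return sum(
        min(a, b)
        for row_a, row_b in zip(norm_a, norm_b)
        for a, b in zip(row_a, row_b)
    )


# Toy 2x2 density maps (made-up values, not real data).
cells_a = [[2.0, 0.0], [0.0, 0.0]]
cells_b = [[0.0, 0.0], [0.0, 2.0]]
cells_c = [[1.0, 1.0], [0.0, 0.0]]

print(distribution_overlap(cells_a, cells_b))  # disjoint distributions
print(distribution_overlap(cells_a, cells_c))  # partially shared region
```

Because each map is normalized by its total count, the measure compares where cells are distributed and not how bright or how large they appear, which is the motivation the abstract gives for working with density maps.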
"Precision-Recall versus Accuracy and the Role of Large Data Sets"
Adviser: Brendan Juba
Practitioners of data mining and machine learning have long observed that class imbalance in a data set has a negative impact on the quality of classifiers trained on that data. Numerous techniques for coping with such imbalances have been proposed, but nearly all lack any theoretical grounding. By contrast, the standard theoretical analysis of machine learning admits no dependence on the imbalance of classes at all. The basic theorems of statistical learning establish the number of examples needed to estimate the accuracy of a classifier as a function of its complexity (VC-dimension) and the confidence desired; the class imbalance does not enter these formulas anywhere. In this work, we consider measuring classifier performance in terms of precision and recall, measures that are widely suggested as more appropriate for the classification of imbalanced data. We observe that whenever the precision is moderately large, the worse of the precision and recall is within a small constant factor of the accuracy weighted by the class imbalance. A corollary of this observation is that the only cure for class imbalance is a larger number of examples, a finding we also illustrate empirically. We further observe that many applications actually need high precision, and hence these class-imbalance-dependent measures are indeed more relevant than the accuracy.
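To make the contrast between accuracy and precision/recall concrete, here is a small sketch that computes the standard metrics from confusion-matrix counts. The function name `classifier_metrics`, the `scaled_error` field (error rate divided by the positive-class base rate), and the counts used are illustrative assumptions; the scaling is one simple way to picture "accuracy weighted by the class imbalance" and is not claimed to be the exact bound from the talk.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the rare (positive) class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one example is required")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    base_rate = (tp + fn) / total  # fraction of positive examples
    # Illustrative assumption: error rate rescaled by the base rate.
    scaled_error = (1.0 - accuracy) / base_rate if base_rate else float("inf")
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "base_rate": base_rate,
        "scaled_error": scaled_error,
    }


# Made-up counts: 10 positives among 1000 examples (1% base rate).
metrics = classifier_metrics(tp=5, fp=5, fn=5, tn=985)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the classifier is 99% accurate, yet both precision and recall are only 50%. The 1% error rate is as large as the entire positive class, so an error that looks negligible in terms of accuracy is severe once it is measured against the rare class.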