Doctoral Student Seminar: Jinghan Yang and Hai Le

Apr 5
12:30 PM
1:30 PM
Lopata 101
Jinghan Yang
Title: Protecting Geolocation Privacy of Photo Albums

Abstract: As people increasingly share personal information, including their photos and photo albums, on social media, concerns about personal privacy have grown. We consider the specific issue of location privacy as potentially revealed by posting photo albums, which enable accurate geolocation with the help of deep learning methods even in the absence of geotags. We formalize this problem as minimizing the number of photos that must be removed from an album to cause an incorrect geolocation prediction, and study it algorithmically. While we show that this problem is NP-Hard for several variants, we exhibit effective solution techniques based on integer programming, as well as an important tractable special case. Our experiments on real photo albums demonstrate that our approaches are indeed highly effective at preserving geolocation privacy while removing only a small fraction of photos.
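As a toy illustration of the removal problem (not the authors' method), the sketch below assumes a simplified geolocation model that sums per-photo location scores over the album and predicts the highest-scoring location; it then brute-forces the smallest set of photos whose removal flips the prediction. The score format and both function names are hypothetical; the real problem is NP-Hard, which is why the talk turns to integer programming rather than enumeration.

```python
from itertools import combinations

def predict(scores_per_photo, kept):
    # Toy geolocation model (assumption): each photo contributes an additive
    # score to every candidate location; the album's predicted location is
    # the one with the highest total over the kept photos.
    totals = {}
    for i in kept:
        for loc, s in scores_per_photo[i].items():
            totals[loc] = totals.get(loc, 0.0) + s
    return max(totals, key=totals.get)

def min_removal(scores_per_photo, true_loc):
    # Brute-force search, smallest removal sets first: return the first
    # subset of photos whose removal makes the model mispredict true_loc.
    n = len(scores_per_photo)
    photos = list(range(n))
    for k in range(n):
        for removed in combinations(photos, k):
            kept = [i for i in photos if i not in removed]
            if not kept:
                continue
            if predict(scores_per_photo, kept) != true_loc:
                return set(removed)
    return None  # no removal short of the whole album changes the prediction
```

Enumeration over subsets is exponential in the album size, so this only serves to make the problem statement concrete on small examples; the integer-programming formulations in the talk are what scale to real albums.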

Hai Le
Title: Conditional Sparse L_p-norm Regression With Optimal Probability

Abstract: We consider the following conditional linear regression problem: the task is to identify both (i) a k-DNF condition c and (ii) a linear rule f such that the probability of c is (approximately) at least some given bound µ, while the L_p loss of f at predicting the target z on the distribution conditioned on c is minimized. Thus, the task is to identify a portion of the distribution on which a linear rule can provide a good fit. Algorithms for this task are useful in cases where portions of the distribution are not modeled well by simple, learnable rules, but on other portions such rules perform well. The prior state-of-the-art for such algorithms could only guarantee finding a condition of probability O(µ/n^k) when a condition of probability µ exists, and achieved an O(n^k)-approximation to the target loss. Here, we give efficient algorithms for solving this task with a condition c that nearly matches the probability of the ideal condition, while also improving the approximation to the target loss to an Õ(n^{k/2}) factor. We also give an algorithm for finding a k-DNF reference class for prediction at a given query point, which obtains a sparse regression fit whose loss is within an O(n^k) factor of optimal among all sparse regression parameters and sufficiently large k-DNF reference classes containing the query point.
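To make the problem statement concrete, here is a brute-force toy version (not the talk's efficient algorithm): it searches over single-conjunction conditions of up to k literals over Boolean features (a degenerate one-term k-DNF), keeps those whose empirical probability is at least µ, and fits ordinary least squares on each conditioned sample, returning the condition with the smallest mean squared (L_2) loss. The data layout and function names are assumptions made for the sketch.

```python
from itertools import combinations, product

def fit_line(xs, ys):
    # Ordinary least squares for a single real feature: z ~ a*x + b.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def conditional_regression(bools, xs, zs, mu, k=1):
    # Enumerate one-term k-DNF conditions: conjunctions of up to k literals
    # over the Boolean features.  Among conditions with empirical probability
    # >= mu, return (loss, feature indices, literal signs, (a, b)) for the
    # condition whose conditioned least-squares fit has the smallest loss.
    n = len(xs)
    d = len(bools[0])
    best = None
    for size in range(1, k + 1):
        for idx in combinations(range(d), size):
            for signs in product([False, True], repeat=size):
                sel = [i for i in range(n)
                       if all(bools[i][j] == s for j, s in zip(idx, signs))]
                if len(sel) / n < mu:
                    continue  # condition too improbable
                a, b = fit_line([xs[i] for i in sel], [zs[i] for i in sel])
                loss = sum((a * xs[i] + b - zs[i]) ** 2 for i in sel) / len(sel)
                if best is None or loss < best[0]:
                    best = (loss, idx, signs, (a, b))
    return best
```

Enumerating all k-DNFs and conditioning naively is exponential and gives no guarantee relative to the ideal condition; the point of the talk is achieving near-optimal condition probability and an Õ(n^{k/2}) loss approximation efficiently.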