Lopata Hall, Room 101
"Efficient Nonmyopic Batch Active Search"
Adviser: Roman Garnett
Active search is a learning paradigm for actively identifying as many members of a given class as possible. Important applications include drug discovery, fraud detection, and product recommendation. All existing work focuses on sequential policies, i.e., selecting one point to query at a time. However, in many real applications, it is possible to evaluate multiple points simultaneously. In this paper we investigate batch active search; to our knowledge, this is the first such study in the literature. We first derive the Bayesian optimal policy for batch active search and prove a lower bound on the performance gap between sequential and batch optimal policies. We then propose novel batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can further speed up the computation by up to nearly 50 times. We conduct thorough experiments on three application domains (a citation network, materials science, and drug discovery), testing all proposed policies (14 total) for a wide range of batch sizes. Results show that the empirical gap matches our theoretical bound, that nonmyopic policies usually beat myopic ones significantly, and that diversity is an important consideration for batch policy design.
"Modeling Gene Networks using TFA Inference"
Adviser: Michael Brent
A single cell, whether yeast or human, has no consciousness, yet it can respond to changes in its environment, for example by building tools to harvest nutrients it encounters while halting production of tools for nutrients that are no longer available. This is accomplished through complex networks of signals looping in and out of genes in the cell’s DNA, which encodes instructions for building these tools as well as the signal carriers. To model these networks, it makes sense to include the activity of the signal carriers, particularly transcription factors (TFs), which directly activate or repress the copying of instructions from genes, but current technology cannot measure this activity efficiently. As such, computational inference of transcription factor activity (TFA) from cell state information that can be measured efficiently remains an open problem. This talk will briefly review previous work, the common obstacles, and the current trajectory of TFA inference.