# CSE Doctoral Student Seminar: Zihao Deng and Wei Tang

*Nov 9, 2018*

*12:30 p.m. to 2 p.m.*

*Lopata Hall, Room 101*

**"Learnability of Low-Energy Neural Networks"**

**Zihao Deng**

Adviser: Brendan Juba

**"Bandit Learning with Biased Human Feedback"**

**Wei Tang**

Adviser: Chien-Ju Ho

We study a multi-armed bandit problem with biased human feedback. In our setting, each arm is associated with an unknown reward distribution. When an arm is played, a user receives a realized reward drawn from that arm's distribution. She then provides feedback: a biased report of the realized reward that depends on both the realized reward and the arm's feedback history. The principal observes only the biased feedback, not the realized rewards. The goal is to design a strategy for sequentially choosing arms that maximizes the total reward users receive while having access only to the biased user feedback. We explore two natural feedback models. When user feedback is biased only by the arm's average feedback (i.e., the ratio of positive feedback), we show that the evolution of the average feedback over time is mathematically equivalent to users performing online gradient descent on some latent function with a decreasing step size. Using this connection, we show that under some mild conditions it is possible to design algorithms whose regret (i.e., the difference between the algorithm's performance and the optimal performance of always choosing the best arm) is sublinear in the number of rounds. However, in a second model, where user feedback is biased by both the average feedback and the number of feedback reports, we show that no bandit algorithm can achieve sublinear regret. Our results demonstrate the importance of understanding human behavior when applying bandit approaches to systems with humans in the loop.
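The link between a running average and gradient descent with a decreasing step size can be illustrated with a standard identity. This is only an illustration, not the latent function from the talk, whose form the abstract does not give. If r_n is the ratio of positive feedback after n binary reports x_1, ..., x_n, then r_n = r_{n-1} - (1/n)(r_{n-1} - x_n), which is exactly one gradient step with step size 1/n on the squared loss ½(r - x_n)². The minimal Python sketch below assumes binary feedback in {0, 1}; the function names are hypothetical and chosen for this example.

```python
def average_feedback_via_gradient_steps(feedback):
    """Track the ratio of positive feedback using gradient-descent updates.

    Illustrative assumption: each report x_n in {0, 1} defines the loss
    0.5 * (r - x_n) ** 2, whose derivative with respect to r is (r - x_n).
    Taking one step of size eta_n = 1/n per report reproduces the exact
    running mean of the reports seen so far.
    """
    r = 0.0
    history = []
    for n, x in enumerate(feedback, start=1):
        eta = 1.0 / n        # decreasing step size
        grad = r - x         # gradient of 0.5 * (r - x) ** 2 at r
        r = r - eta * grad   # one online gradient-descent step
        history.append(r)
    return history


def prefix_means(feedback):
    """Reference: directly compute the mean of the first n reports."""
    return [sum(feedback[:n]) / n for n in range(1, len(feedback) + 1)]


if __name__ == "__main__":
    reports = [1, 0, 1, 1, 0, 1]  # hypothetical feedback sequence
    for step, (gd, mean) in enumerate(
        zip(average_feedback_via_gradient_steps(reports), prefix_means(reports)),
        start=1,
    ):
        print(f"n={step}: gradient iterate={gd:.4f}, running mean={mean:.4f}")
```

The two columns agree at every step. This is the sense in which a decreasing-step gradient process can describe how an arm's average feedback evolves. In the talk's setting the feedback itself is biased by this average, so the relevant latent function differs from the plain squared loss used here.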