Epsilon-greedy strategy for nonparametric bandits

Date: -
Location: MDS 220
Speaker(s) / Presenter(s): Sakshi Arya


Abstract: Contextual bandit algorithms are popular for sequential decision-making in many practical applications, ranging from online advertisement recommendation to mobile health. The goal in such problems is to maximize cumulative reward over time across a set of choices/arms while taking covariate (or contextual) information into account. Epsilon-greedy is a popular heuristic for the multi-armed bandit problem; however, it has received comparatively little theoretical study in the presence of contextual information. We study the epsilon-greedy strategy in nonparametric bandits, i.e., when no parametric form is assumed for the reward functions. In this work, we assume that the relationship between the covariates and expected rewards can be modeled as arbitrary linear functions of the contexts' images in a specified reproducing kernel Hilbert space (RKHS). We propose a kernelized epsilon-greedy algorithm and establish its convergence rates for estimation and cumulative regret, which are closely tied to the intrinsic dimensionality of the RKHS. We show that these rates closely match the optimal rates for linear contextual bandits when restricted to a finite-dimensional RKHS. Lastly, we illustrate our results through simulation studies and real-data analysis.
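To make the idea concrete, the sketch below illustrates one plausible reading of a kernelized epsilon-greedy scheme: each arm's expected reward is estimated by kernel ridge regression in an RKHS (an RBF kernel here), and a decaying exploration rate trades off random exploration against greedy selection. The class name, the RBF bandwidth, the regularization constant, and the 1/sqrt(t) exploration schedule are all illustrative assumptions, not the algorithm or rates analyzed in the talk.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d)

class KernelEpsilonGreedy:
    """Hypothetical sketch: one kernel ridge regressor per arm,
    with epsilon-greedy arm selection and a decaying exploration rate."""

    def __init__(self, n_arms, reg=1.0, gamma=1.0, seed=0):
        self.n_arms = n_arms
        self.reg = reg          # ridge regularization (assumed constant)
        self.gamma = gamma      # RBF bandwidth (assumed constant)
        self.rng = np.random.default_rng(seed)
        self.X = [[] for _ in range(n_arms)]   # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]   # rewards observed per arm

    def _predict(self, a, x):
        # Kernel ridge regression estimate of arm a's reward at context x.
        if not self.X[a]:
            return 0.0
        Xa, ya = np.array(self.X[a]), np.array(self.y[a])
        K = rbf_kernel(Xa, Xa, self.gamma)
        alpha = np.linalg.solve(K + self.reg * np.eye(len(ya)), ya)
        k = rbf_kernel(Xa, x[None, :], self.gamma).ravel()
        return float(k @ alpha)

    def select(self, x, t):
        # Illustrative schedule: explore with probability ~ 1/sqrt(t).
        eps = min(1.0, 1.0 / np.sqrt(t + 1))
        if self.rng.random() < eps:
            return int(self.rng.integers(self.n_arms))
        preds = [self._predict(a, x) for a in range(self.n_arms)]
        return int(np.argmax(preds))

    def update(self, a, x, r):
        self.X[a].append(x)
        self.y[a].append(r)
```

As a usage example, one could run this policy on synthetic data where arm 0 pays sin(x) and arm 1 pays cos(x): after a few hundred rounds the per-arm regression estimates separate the arms at contexts where the gap is large (e.g. near x = 0).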
