

Research Achievements

Here you can find information about the UNIST Graduate School of Artificial Intelligence and its research achievements.

SDM Lab’s (Prof. Gi-Soo Kim) collaborative work to be published at NeurIPS 2021

  • 2021
  • 01.01 - 12.31
A paper from the Statistical Decision Making (SDM) Lab has been accepted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), one of the top three conferences in artificial intelligence and machine learning.
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm, while the rewards of the other arms remain missing. The dependence of the arm choice on past context and reward pairs further complicates the regret analysis. We propose a novel multi-armed contextual bandit algorithm called Doubly Robust (DR) Thompson Sampling, which applies the doubly robust estimator from the missing-data literature to Thompson Sampling with contexts (LinTS).
The proposed algorithm enjoys an improved regret bound compared to LinTS. It is also the first regret bound for LinTS that is expressed in terms of the minimum eigenvalue of the covariance matrix of the contexts rather than the dimension.
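
To illustrate the doubly robust idea described above, the following Python sketch builds a pseudo-reward for every arm from a single observed reward. It is a minimal illustration under stated assumptions, not the algorithm from the paper: the function name `dr_pseudo_rewards`, its arguments, and the toy numbers are hypothetical, and the formula is the standard doubly robust estimator from the missing-data literature (a model-based imputation plus an inverse-probability-weighted correction for the observed arm).

```python
import numpy as np


def dr_pseudo_rewards(contexts, chosen, reward, probs, beta_hat):
    """Return one doubly robust pseudo-reward per arm (illustrative sketch).

    contexts : (N, d) array, one context vector per arm
    chosen   : index of the arm that was actually pulled
    reward   : observed reward of the chosen arm
    probs    : (N,) array of arm-selection probabilities (assumed known)
    beta_hat : (d,) array, any working estimate of the linear parameter
    """
    # Imputation step: predict the (missing) reward of every arm.
    imputed = contexts @ beta_hat
    pseudo = imputed.copy()
    # Correction step: only the chosen arm has an observed reward, so its
    # residual is re-weighted by the inverse of its selection probability.
    pseudo[chosen] += (reward - imputed[chosen]) / probs[chosen]
    return pseudo


# Toy example (hypothetical numbers): two arms, two-dimensional contexts.
contexts = np.eye(2)
probs = np.array([0.5, 0.5])
beta_hat = np.array([0.5, 0.2])   # deliberately inaccurate estimate
true_means = np.array([1.0, 0.3])

# Average the pseudo-rewards over the random arm choice (noise-free rewards).
expected = sum(
    probs[a] * dr_pseudo_rewards(contexts, a, true_means[a], probs, beta_hat)
    for a in range(2)
)
print(expected)
```

Averaged over the random arm choice, each pseudo-reward in this toy example equals the arm's true mean reward even though `beta_hat` is inaccurate, because the selection probabilities are correct. This is what lets every arm, not only the chosen one, contribute to estimation.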
“Doubly Robust Thompson Sampling with Linear Payoffs” by Wonyoung Kim, Gi-Soo Kim, and Myunghee Cho Paik.