Efficient Counterfactual Learning from Bandit Feedback

Yusuke Narita
Shota Yasui
Kohei Yata

Document Type

Discussion Paper

Publication Date

12-1-2018

CFDP Number

2155

CFDP Pages

Journal of Economic Literature (JEL) Code(s)

C1, C5, C9

Abstract

What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
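
To make the setting in the abstract concrete, the following is a minimal sketch of a textbook inverse-propensity-weighted (IPW) estimate of a counterfactual policy's expected reward from logged bandit data. It is not the estimator proposed in the paper, only an illustration of the kind of standard offline estimator the abstract refers to. The function names, the uniform logging policy, and the simulated reward probabilities are all hypothetical assumptions made for this example.

```python
import random


def ips_value(logs, target_policy):
    """Inverse-propensity-weighted estimate of a target policy's expected reward.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave to `action`.
    target_policy: function (context, action) -> probability of `action`
          under the counterfactual policy being evaluated.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def simulate_logs(n, seed=0):
    """Simulated log data (assumption: two contexts, two actions)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        # Hypothetical logging policy: pick either action with probability 0.5.
        action = rng.choice([0, 1])
        # Hypothetical reward model: Bernoulli with mean 0.8 when the action
        # matches the context, and mean 0.2 otherwise.
        mean = 0.8 if action == context else 0.2
        reward = 1.0 if rng.random() < mean else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def match_context(context, action):
    """Counterfactual policy: always choose the action equal to the context."""
    return 1.0 if action == context else 0.0


logs = simulate_logs(20000)
estimate = ips_value(logs, match_context)
print(f"IPW estimate of the counterfactual policy value: {estimate:.3f}")
```

Under the simulated reward model the counterfactual policy's true expected reward is 0.8, so the printed estimate should be close to that value. The variance of estimates like this one is what the abstract's efficiency question concerns.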

Recommended Citation

Narita, Yusuke; Yasui, Shota; and Yata, Kohei, "Efficient Counterfactual Learning from Bandit Feedback" (2018). Cowles Foundation Discussion Papers. 110.
https://elischolar.library.yale.edu/cowles-discussion-paper-series/110

Included in

Economics Commons
