
Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation

Overview

Open Bandit Pipeline (OBP) is an open-source Python library for bandit algorithms and off-policy evaluation (OPE). The toolkit comes with the Open Bandit Dataset, a large-scale logged bandit feedback dataset collected on the fashion e-commerce platform ZOZOTOWN. The purpose of the open data and library is to enable easy, realistic, and reproducible evaluation of bandit algorithms and OPE. OBP provides implementations of dataset preprocessing, bandit policy interfaces, and a variety of OPE estimators.
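As a concrete starting point, the following is a minimal sketch of the standard workflow: load (a sample of) the Open Bandit Dataset, run an evaluation policy offline, and estimate its policy value with an OPE estimator. Class and method names follow the obp package's public API at the time of writing; the choices of Bernoulli Thompson Sampling and the Replay Method are arbitrary illustrations.

from obp.dataset import OpenBanditDataset
from obp.ope import OffPolicyEvaluation, ReplayMethod
from obp.policy import BernoulliTS

# (1) data loading and preprocessing
# (by default, this loads the small sample data bundled with the package)
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# (2) offline simulation of an evaluation policy (Bernoulli Thompson Sampling)
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_rounds=bandit_feedback["n_rounds"],
)

# (3) off-policy evaluation: estimate the policy value of the evaluation policy
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[ReplayMethod()],
)
estimated_policy_value = ope.estimate_policy_values(action_dist=action_dist)
print(estimated_policy_value)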

Our open data and pipeline facilitate the evaluation and comparison of methods in the following research areas.

  • Bandit Algorithms: Our data include the action choice probabilities of the behavior policies (i.e., the true propensity scores). This enables the evaluation of new online bandit algorithms, including contextual and combinatorial ones, in a large-scale real-world setting.

  • Off-Policy Evaluation: Our pipeline includes implementations of the behavior policies used to collect the datasets. Moreover, the open data contain logged bandit feedback generated by multiple behavior policies, which enables the evaluation of OPE estimators against ground-truth values for the performance of evaluation policies (a sketch of this comparison follows this list).
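For instance, because the full dataset contains logs collected by both a uniform random policy and Bernoulli Thompson Sampling, the on-policy value of one policy can serve as ground truth for an OPE estimate computed from the other policy's log. A minimal sketch, assuming obp's calc_on_policy_policy_value_estimate helper and using a placeholder for the OPE estimate:

from obp.dataset import OpenBanditDataset

# ground-truth policy value of Bernoulli TS, estimated on-policy from the
# log that Bernoulli TS itself collected
ground_truth = OpenBanditDataset.calc_on_policy_policy_value_estimate(
    behavior_policy="bts",
    campaign="all",
)

# an OPE estimate of the same policy obtained from the "random" log
# (e.g., the output of the quickstart sketch above); placeholder value here
estimated_policy_value = 0.004

# relative estimation error of the OPE estimator
print(abs(estimated_policy_value - ground_truth) / ground_truth)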

This website contains example analyses that demonstrate the usage of the library, including examples of evaluating counterfactual bandit algorithms and of evaluating OPE itself. The reference page provides full documentation of the toolkit's current modules and functions.

Algorithms and OPE Estimators Supported

Bandit Algorithms

  • Online

    • Context-free

      • Random

      • Epsilon Greedy

      • Bernoulli Thompson Sampling

    • Contextual (Linear)

      • Linear Epsilon Greedy

      • Linear Thompson Sampling [10]

      • Linear Upper Confidence Bound [11]

    • Contextual (Logistic)

      • Logistic Epsilon Greedy

      • Logistic Thompson Sampling [12]

      • Logistic Upper Confidence Bound [13]

  • Offline (Off-Policy Learning) [4]

    • Inverse Probability Weighting
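The sketch below illustrates the offline learning interface via inverse probability weighting, assuming obp's IPWLearner, which wraps an arbitrary scikit-learn classifier as its base model; logistic regression is an arbitrary choice here.

from sklearn.linear_model import LogisticRegression
from obp.dataset import OpenBanditDataset
from obp.policy import IPWLearner

dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# off-policy learning via inverse probability weighting: observed rewards
# are weighted by the inverse of the behavior policy's propensity scores
ipw_learner = IPWLearner(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    base_classifier=LogisticRegression(max_iter=1000, random_state=12345),
)
ipw_learner.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
    position=bandit_feedback["position"],
)

# the learned policy returns an action distribution for given contexts
action_dist = ipw_learner.predict(context=bandit_feedback["context"])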

OPE Estimators

  • Replay Method (RM) [14]

  • Direct Method (DM) [15]

  • Inverse Probability Weighting (IPW) [2] [3]

  • Self-Normalized Inverse Probability Weighting (SNIPW) [16]

  • Doubly Robust (DR) [4]

  • Switch Estimators [8]

  • Doubly Robust with Optimistic Shrinkage (DRos) [9]

  • More Robust Doubly Robust (MRDR) [1]

  • Double Machine Learning (DML) [17]
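Several of these estimators can be compared in a single run. The sketch below, again following the obp API at the time of writing and using an arbitrary random forest as the reward model, fits the regression model required by the model-dependent estimators (DM, DR) and then estimates the policy value of an evaluation policy with IPW, DM, and DR side by side.

from sklearn.ensemble import RandomForestClassifier
from obp.dataset import OpenBanditDataset
from obp.ope import (
    DirectMethod,
    DoublyRobust,
    InverseProbabilityWeighting,
    OffPolicyEvaluation,
    RegressionModel,
)
from obp.policy import BernoulliTS

dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_rounds=bandit_feedback["n_rounds"],
)

# DM and DR additionally require a model of the expected reward;
# the base machine learning model is an arbitrary choice here
regression_model = RegressionModel(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    base_model=RandomForestClassifier(n_estimators=100, random_state=12345),
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    position=bandit_feedback["position"],
)

# compare several estimators in a single OffPolicyEvaluation run
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting(), DirectMethod(), DoublyRobust()],
)
print(ope.estimate_policy_values(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
))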

Citation

If you use our dataset and pipeline in your work, please cite our paper below.

@article{saito2020large,
    title={Large-scale Open Dataset, Pipeline, and Benchmark for Bandit Algorithms},
    author={Saito, Yuta and Aihara, Shunsuke and Matsutani, Megumi and Narita, Yusuke},
    journal={arXiv preprint arXiv:2008.07146},
    year={2020}
}

Google Group

If you are interested in the Open Bandit Project, you can follow its updates at the Google Group: https://groups.google.com/g/open-bandit-project

Contact

For any questions about the paper, data, or pipeline, feel free to contact: saito@hanjuku-kaso.com
