moe.bandit.bla package¶
Submodules¶
moe.bandit.bla.bla module¶
Classes (Python) to compute the Bandit BLA (Bayesian Learning Automaton) arm allocation and choose the arm to pull next.
See moe.bandit.bandit_interface for further details on bandits.
- class moe.bandit.bla.bla.BLA(historical_info, subtype='BLA')[source]¶
Bases: moe.bandit.bandit_interface.BanditInterface
Implementation of the constructor of BLA (Bayesian Learning Automaton) and method allocate_arms.
A class to encapsulate the computation of bandit BLA. The algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART.
See moe.bandit.bandit_interface docs for further details.
- allocate_arms()[source]¶
Compute the allocation to each arm given historical_info, running bandit subtype endpoint.
Computes the allocation to each arm based on the given subtype and historical info.
Works with k-armed bandits (k >= 1).
The algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART. The original algorithm handles k = 2; we extend it naturally to handle k >= 1.
This method will pull the optimal arm (best BLA payoff).
See moe.bandit.bla.bla.BLA.get_bla_payoff() for details on how to compute the BLA payoff.
In case of a tie, the method splits the allocation evenly among the optimal arms. For example, if we have three arms (arm1, arm2, and arm3) with expected BLA payoffs 0.5, 0.5, and 0.1 respectively, we split the allocation between the optimal arms arm1 and arm2:
{arm1: 0.5, arm2: 0.5, arm3: 0.0}
Returns: the dictionary of (arm, allocation) key-value pairs
Return type: a dictionary of (str, float64) pairs
Raises: ValueError when sample_arms are empty.
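The tie-splitting behavior described above can be sketched as follows. This is a minimal illustration, not MOE's actual implementation; the helper name and the payoff values are hypothetical (in the real method, payoffs are sampled per arm via get_bla_payoff).

```python
def allocate_among_winners(bla_payoffs):
    """Split the allocation equally among the arms with the highest payoff.

    bla_payoffs: dict mapping arm name -> sampled BLA payoff (float).
    Returns a dict mapping arm name -> allocation, summing to 1.0.
    """
    best_payoff = max(bla_payoffs.values())
    winners = [name for name, payoff in bla_payoffs.items() if payoff == best_payoff]
    share = 1.0 / len(winners)
    # Optimal arms split the allocation; all other arms get 0.0.
    return {name: (share if name in winners else 0.0) for name in bla_payoffs}

print(allocate_among_winners({'arm1': 0.5, 'arm2': 0.5, 'arm3': 0.1}))
# {'arm1': 0.5, 'arm2': 0.5, 'arm3': 0.0}
```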
- get_bla_payoff(sampled_arm)[source]¶
Compute the BLA payoff using the BLA subtype formula.
BLA payoff is computed as follows:
\[r_j = Sample(Beta(\alpha_j, \beta_j))\]
where \(\alpha_j\) is the number of arm j wins + 1 (sampled_arm.win + 1) and \(\beta_j\) is the number of arm j losses + 1 (sampled_arm.total - sampled_arm.win + 1).
In other words, BLA payoff is computed by sampling from a beta distribution \(Beta(\alpha, \beta)\) with \(\alpha = number\_wins + 1\) and \(\beta = number\_losses + 1 = number\_total - number\_wins + 1\).
Note that for an unsampled arm, \(Beta(1, 1)\) is a uniform distribution. Learn more about beta distribution at http://en.wikipedia.org/wiki/Beta_distribution.
Parameters: sampled_arm (moe.bandit.data_containers.SampleArm) – a sampled arm
Returns: bla payoff
Return type: float64
Raises: ValueError when sampled_arm is empty.
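The payoff computation above reduces to a single beta-distribution draw, which can be sketched with the standard library. This is an illustrative sketch assuming win/total counts as plain integers, not MOE's SampleArm container or its actual method.

```python
import random

def bla_payoff(wins, total):
    """Sample a BLA payoff r_j ~ Beta(wins + 1, losses + 1).

    wins: number of wins for arm j; total: number of pulls of arm j.
    """
    losses = total - wins
    # Prior of +1 on each parameter: an unsampled arm (0 wins, 0 pulls)
    # draws from Beta(1, 1), i.e. Uniform(0, 1).
    return random.betavariate(wins + 1, losses + 1)

# An arm with many wins tends to sample high payoffs, but every draw
# still lies in [0, 1].
payoff = bla_payoff(wins=8, total=10)
assert 0.0 <= payoff <= 1.0
```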
- get_winning_arm_names(arms_sampled)[source]¶
Compute the set of winning arm names based on the given arms_sampled.
Throws an exception when arms_sampled is empty.
Parameters: arms_sampled (dictionary of (str, SampleArm()) pairs) – a dictionary of arm name to moe.bandit.data_containers.SampleArm
Returns: set of names of the winning arms
Return type: frozenset(str)
Raises: ValueError when arms_sampled are empty.
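Putting the pieces together, the winning-arm selection can be sketched as: sample a BLA payoff per arm and return the arms achieving the maximum as a frozenset. This is a hypothetical sketch assuming arms are given as (wins, total) tuples rather than MOE's SampleArm objects.

```python
import random

def winning_arm_names(arms_sampled):
    """Return the frozenset of arm names with the highest sampled BLA payoff.

    arms_sampled: dict mapping arm name -> (wins, total) tuple.
    Raises ValueError when arms_sampled is empty.
    """
    if not arms_sampled:
        raise ValueError('arms_sampled is empty!')
    # Draw r_j ~ Beta(wins + 1, losses + 1) for each arm j.
    payoffs = {name: random.betavariate(wins + 1, total - wins + 1)
               for name, (wins, total) in arms_sampled.items()}
    best_payoff = max(payoffs.values())
    return frozenset(name for name, payoff in payoffs.items()
                     if payoff == best_payoff)
```

Because the payoffs are continuous draws, exact ties are rare in practice, so the returned set usually contains a single arm.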
Module contents¶
Bandit directory containing multi-armed bandit implementations of BLA policies in Python.
Files in this package
- moe.bandit.bla.bla: BLA object for allocating bandit arms and choosing the winning arm based on BLA policy.