moe.bandit.bla package¶

Submodules¶

moe.bandit.bla.bla module¶

Classes (Python) to compute the Bandit BLA (Bayesian Learning Automaton) arm allocation and choose the arm to pull next.

See moe.bandit.bandit_interface for further details on bandits.

class moe.bandit.bla.bla.BLA(historical_info, subtype='BLA')[source]¶

Bases: moe.bandit.bandit_interface.BanditInterface

Implements the BLA (Bayesian Learning Automaton) constructor and the allocate_arms method.

A class to encapsulate the computation of bandit BLA. The algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART.

See moe.bandit.bandit_interface docs for further details.

allocate_arms()[source]¶

Compute the allocation to each arm given historical_info, running bandit subtype endpoint.

Computes the allocation to each arm based on the given subtype and historical info.

Works with k-armed bandits (k >= 1).

The algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART. The original algorithm handles k = 2. We extended the algorithm naturally to handle k >= 1.

This method will pull the optimal arm (best BLA payoff).

See moe.bandit.bla.bla.BLA.get_bla_payoff() for details on how to compute the BLA payoff.

In case of a tie, the method will split the allocation evenly among the optimal arms. For example, if we have three arms (arm1, arm2, and arm3) with expected BLA payoffs 0.5, 0.5, and 0.1 respectively, we split the allocation between the optimal arms arm1 and arm2:

{arm1: 0.5, arm2: 0.5, arm3: 0.0}

Returns:	the dictionary of (arm, allocation) key-value pairs
Return type:	a dictionary of (str, float64) pairs
Raise:	ValueError when `sample_arms` is empty.
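
The tie-splitting rule described above can be sketched in a few lines. This is a minimal illustration, not the MOE implementation: the function name `allocate_from_payoffs` and the plain `dict` of payoffs are assumptions made for the example, and an even split among tied arms is assumed.

```python
def allocate_from_payoffs(payoffs):
    """Split the allocation evenly among the arms with the highest payoff.

    `payoffs` maps arm name -> BLA payoff. This helper is hypothetical and
    only illustrates the tie-splitting rule; it is not part of moe.bandit.
    """
    if not payoffs:
        raise ValueError("payoffs is empty")

    best = max(payoffs.values())
    # Every arm whose payoff equals the maximum counts as a winner.
    winners = frozenset(name for name, payoff in payoffs.items() if payoff == best)
    share = 1.0 / len(winners)

    # Winners share the allocation equally; all other arms get 0.0.
    return {name: (share if name in winners else 0.0) for name in payoffs}


allocation = allocate_from_payoffs({"arm1": 0.5, "arm2": 0.5, "arm3": 0.1})
print(allocation)
```

With the three-arm example from the text, arm1 and arm2 each receive half of the allocation and arm3 receives none.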

get_bla_payoff(sampled_arm)[source]¶

Compute the BLA payoff using the BLA subtype formula.

BLA payoff is computed as follows:

\[r_j = Sample(Beta(\alpha_j, \beta_j))\]

where \(\alpha_j\) is the number of arm j wins + 1 (sampled_arm.win + 1) and \(\beta_j\) is the number of arm j losses + 1 (sampled_arm.total - sampled_arm.win + 1).

In other words, BLA payoff is computed by sampling from a beta distribution \(Beta(\alpha, \beta)\) with \(\alpha = number\_wins + 1\) and \(\beta = number\_losses + 1 = number\_total - number\_wins + 1\).

Note that for an unsampled arm, \(Beta(1, 1)\) is a uniform distribution. Learn more about beta distribution at http://en.wikipedia.org/wiki/Beta_distribution.

Parameters:	sampled_arm (`moe.bandit.data_containers.SampleArm`) – a sampled arm
Returns:	bla payoff
Return type:	float64
Raise:	ValueError when `sampled_arm` is empty.
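
As a rough illustration of the formula above, the payoff draw can be reproduced with the standard library's `random.betavariate`. The function name `bla_payoff` and its `win`/`total` integer arguments are assumptions for this sketch; the real method takes a `SampleArm` object and may sample differently.

```python
import random


def bla_payoff(win, total, rng=random):
    """Draw one sample from Beta(win + 1, total - win + 1).

    Hypothetical stand-in for get_bla_payoff: `win` and `total` play the
    role of sampled_arm.win and sampled_arm.total.
    """
    if win < 0 or total < win:
        raise ValueError("need 0 <= win <= total")

    alpha = win + 1          # number of wins + 1
    beta = total - win + 1   # number of losses + 1
    return rng.betavariate(alpha, beta)


rng = random.Random(0)  # seeded only so the example is repeatable
print(bla_payoff(8, 10, rng))  # arm with 8 wins out of 10 pulls
print(bla_payoff(0, 0, rng))   # unsampled arm: uniform draw on [0, 1]
```

Because the payoff is a random draw rather than a point estimate, an arm with few pulls has a wide distribution and can still occasionally produce the largest payoff, which is how the policy keeps exploring.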

get_winning_arm_names(arms_sampled)[source]¶

Compute the set of winning arm names based on the given arms_sampled.

Throws an exception when arms_sampled is empty.

Parameters:	arms_sampled (dictionary of (str, SampleArm()) pairs) – a dictionary of arm name to `moe.bandit.data_containers.SampleArm`
Returns:	set of names of the winning arms
Return type:	frozenset(str)
Raise:	ValueError when `arms_sampled` is empty.
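
One way to picture the selection step is to draw a Beta payoff for each arm and keep every arm that attains the maximum. The sketch below assumes that reading; the name `winning_arm_names` and the `(win, total)` tuples standing in for `SampleArm` objects are illustrative choices, not the library's data model.

```python
import random


def winning_arm_names(arms_sampled, rng=random):
    """Return the frozenset of arm names whose sampled payoff is maximal.

    `arms_sampled` maps arm name -> (win, total). The tuple layout is an
    assumption of this sketch, used in place of SampleArm objects.
    """
    if not arms_sampled:
        raise ValueError("arms_sampled is empty")

    # One Beta(win + 1, losses + 1) draw per arm.
    payoffs = {
        name: rng.betavariate(win + 1, total - win + 1)
        for name, (win, total) in arms_sampled.items()
    }
    best = max(payoffs.values())
    return frozenset(name for name, payoff in payoffs.items() if payoff == best)


rng = random.Random(0)  # seeded only so the example is repeatable
arms = {"arm1": (90, 100), "arm2": (10, 100), "arm3": (0, 0)}
print(winning_arm_names(arms, rng))
```

Since the payoffs are continuous random draws, exact ties are unlikely in this sketch, so the returned set will usually contain a single name.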

Module contents¶

Bandit directory containing multi-armed bandit implementations of BLA policies in Python.

Files in this package

moe.bandit.bla.bla: BLA object for allocating bandit arms and choosing the winning arm based on BLA policy.