moe.bandit.ucb package

Submodules

moe.bandit.ucb.ucb1 module

Classes (Python) to compute the Bandit UCB1 arm allocation and choose the arm to pull next.

See moe.bandit.ucb.ucb_interface.UCBInterface for further details on this bandit.

class moe.bandit.ucb.ucb1.UCB1(historical_info)[source]

Bases: moe.bandit.ucb.ucb_interface.UCBInterface

Implementation of UCB1.

A class to encapsulate the computation of bandit UCB1. See moe.bandit.ucb.ucb_interface.UCBInterface.allocate_arms() for more details on how UCB allocates arms.

See superclass moe.bandit.ucb.ucb_interface.UCBInterface for further details.

get_ucb_payoff(sampled_arm, number_sampled)[source]

Compute the expected upper confidence bound payoff using the UCB1 formula.

The expected upper confidence bound payoff (expected UCB payoff) is computed as follows:

\[r_j = \mu + \sqrt{\frac{2 \ln n}{n_j}}\]

where \(\mu\) is the average payoff obtained from arm j (the given sampled_arm), \(n_j\) is the number of times arm j has been pulled (sampled_arm.total), and n is the overall number of pulls so far (number_sampled), computed by summing total over all sampled arms.

Parameters:
sampled_arm (moe.bandit.data_containers.SampleArm) – a sampled arm
number_sampled (int) – the overall number of pulls so far
Returns:

ucb payoff

Return type:

float64

Raise:

ValueError when sampled_arm is empty.
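The UCB1 formula above can be sketched in plain Python. This is a minimal stand-in, not the MOE implementation: the real method takes a moe.bandit.data_containers.SampleArm, while here an arm is reduced to win/pull counts.

```python
import math

def ucb1_payoff(wins, pulls, total_pulls):
    """UCB1 payoff: average payoff mu plus exploration bonus sqrt(2 ln n / n_j)."""
    if pulls == 0:
        raise ValueError("sampled_arm is empty (zero pulls)")
    average_payoff = wins / pulls  # mu
    exploration = math.sqrt(2.0 * math.log(total_pulls) / pulls)
    return average_payoff + exploration

# An arm pulled 4 times with 3 wins, out of 10 pulls across all arms
payoff = ucb1_payoff(3, 4, 10)
```

The exploration term grows with the total pull count n but shrinks as this arm's own count \(n_j\) rises, which is what drives under-explored arms to the top of the ranking.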

moe.bandit.ucb.ucb1_tuned module

Classes (Python) to compute the Bandit UCB1-tuned arm allocation and choose the arm to pull next.

See moe.bandit.ucb.ucb_interface.UCBInterface for further details on this bandit.

class moe.bandit.ucb.ucb1_tuned.UCB1Tuned(historical_info)[source]

Bases: moe.bandit.ucb.ucb_interface.UCBInterface

Implementation of UCB1-tuned.

A class to encapsulate the computation of bandit UCB1-tuned. See moe.bandit.ucb.ucb_interface.UCBInterface.allocate_arms() for more details on how UCB allocates arms.

See superclass moe.bandit.ucb.ucb_interface.UCBInterface for further details.

get_ucb_payoff(sampled_arm, number_sampled)[source]

Compute the expected upper confidence bound payoff using the UCB1-tuned formula.

The upper confidence bound for the variance of machine j \(v_j(n_j)\) is computed as follows:

\[v_j(n_j) = \sigma^2 + \sqrt{\frac{2 \ln n}{n_j}}\]

where \(\sigma^2\) is the sample variance of arm j (the given sampled_arm), \(n_j\) is the number of times arm j has been pulled (sampled_arm.total), and n is the overall number of pulls so far (number_sampled), computed by summing total over all sampled arms.

The expected upper confidence bound payoff (expected UCB payoff) is computed as follows:

\[r_j = \mu + \sqrt{\frac{\ln n}{n_j} \min \{1/4, v_j(n_j)\}}\]

where \(\mu\) is the average payoff obtained from arm j and 1/4 is an upper bound on the variance of a Bernoulli random variable.

Parameters:
sampled_arm (moe.bandit.data_containers.SampleArm) – a sampled arm
number_sampled (int) – the overall number of pulls so far
Returns:

ucb payoff

Return type:

float64

Raise:

ValueError when sampled_arm is empty.
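A similar sketch for the UCB1-tuned payoff, assuming the caller already knows the arm's average payoff and sample variance (the real method derives these from a SampleArm):

```python
import math

def ucb1_tuned_payoff(average_payoff, sample_variance, pulls, total_pulls):
    """UCB1-tuned payoff: mu + sqrt((ln n / n_j) * min(1/4, v_j(n_j)))."""
    if pulls == 0:
        raise ValueError("sampled_arm is empty (zero pulls)")
    # Upper confidence bound on this arm's variance: sigma^2 + sqrt(2 ln n / n_j)
    variance_bound = sample_variance + math.sqrt(2.0 * math.log(total_pulls) / pulls)
    # 1/4 is an upper bound on the variance of a Bernoulli random variable
    return average_payoff + math.sqrt(
        (math.log(total_pulls) / pulls) * min(0.25, variance_bound)
    )
```

Capping the variance estimate at 1/4 keeps the exploration bonus no larger than UCB1's for Bernoulli payoffs, while low-variance arms get a tighter (smaller) bonus.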

moe.bandit.ucb.ucb_interface module

Classes (Python) to compute the Bandit UCB (Upper Confidence Bound) arm allocation and choose the arm to pull next.

See moe.bandit.bandit_interface for further details on bandits.

class moe.bandit.ucb.ucb_interface.UCBInterface(historical_info, subtype=None)[source]

Bases: moe.bandit.bandit_interface.BanditInterface

Implementation of the constructor of UCB (Upper Confidence Bound) and the allocate_arms method. The get_ucb_payoff method is implemented in subclasses.

A class to encapsulate the computation of bandit UCB. The Algorithm: http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf

To inherit from this class, a subclass needs to implement get_ucb_payoff (see moe.bandit.ucb.ucb1.UCB1.get_ucb_payoff() for an example); everything else is already implemented.

See moe.bandit.bandit_interface docs for further details.

allocate_arms()[source]

Compute the allocation to each arm based on the given bandit subtype and historical_info.

Works with k-armed bandits (k >= 1).

The Algorithm: http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf

If there is at least one unsampled arm, this method will choose to pull an unsampled arm (chosen at random if there are multiple). If all arms have been pulled at least once, this method will pull the optimal arm (the arm with the best expected upper confidence bound payoff).

See moe.bandit.ucb.ucb_interface.UCBInterface.get_ucb_payoff() for details on how to compute the expected upper confidence bound payoff (expected UCB payoff).

In case of a tie, the method splits the allocation among the optimal arms. For example, given three arms (arm1, arm2, and arm3) with expected UCB payoffs 0.5, 0.5, and 0.1 respectively, the allocation is split between the optimal arms arm1 and arm2:

{arm1: 0.5, arm2: 0.5, arm3: 0.0}

Returns: the dictionary of (arm, allocation) key-value pairs
Return type: a dictionary of (str, float64) pairs
Raise: ValueError when sample_arms is empty.
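The allocation rule above (split equally among tied optimal arms) can be sketched as follows, assuming the per-arm expected UCB payoffs have already been computed. allocate_from_payoffs is a hypothetical helper for illustration, not part of the MOE API.

```python
def allocate_from_payoffs(payoffs):
    """Split the allocation equally among the arms tied for the best payoff.

    payoffs: dict mapping arm name (str) to its expected UCB payoff (float).
    """
    if not payoffs:
        raise ValueError("sample_arms is empty")
    best_payoff = max(payoffs.values())
    winners = {arm for arm, payoff in payoffs.items() if payoff == best_payoff}
    share = 1.0 / len(winners)
    return {arm: (share if arm in winners else 0.0) for arm in payoffs}

allocations = allocate_from_payoffs({'arm1': 0.5, 'arm2': 0.5, 'arm3': 0.1})
# {'arm1': 0.5, 'arm2': 0.5, 'arm3': 0.0}
```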
get_ucb_payoff(sampled_arm, number_sampled)[source]

Compute the expected upper confidence bound payoff using the UCB subtype formula.

See definition in subclasses for details.

Parameters:
sampled_arm (moe.bandit.data_containers.SampleArm) – a sampled arm
number_sampled (int) – the overall number of pulls so far
Returns:

ucb payoff

Return type:

float64

Raise:

ValueError when sampled_arm is empty.

static get_unsampled_arm_names(arms_sampled)[source]

Compute the set of unsampled arm names based on the given arms_sampled.

Throws an exception when arms_sampled is empty.

Parameters: arms_sampled (dictionary of (str, SampleArm) pairs) – a dictionary of arm name to moe.bandit.data_containers.SampleArm
Returns: set of names of the unsampled arms
Return type: frozenset(str)
Raise: ValueError when arms_sampled is empty.
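A minimal sketch of the unsampled-arm computation, using plain dicts with a total pull count in place of moe.bandit.data_containers.SampleArm:

```python
def get_unsampled_arm_names(arms_sampled):
    """Return the names of arms that have never been pulled (total == 0)."""
    if not arms_sampled:
        raise ValueError("arms_sampled is empty")
    return frozenset(
        name for name, arm in arms_sampled.items() if arm['total'] == 0
    )

unsampled = get_unsampled_arm_names({'arm1': {'total': 3}, 'arm2': {'total': 0}})
# frozenset({'arm2'})
```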
get_winning_arm_names(arms_sampled)[source]

Compute the set of winning arm names based on the given arms_sampled.

Throws an exception when arms_sampled is empty.

Parameters: arms_sampled (dictionary of (str, SampleArm) pairs) – a dictionary of arm name to moe.bandit.data_containers.SampleArm
Returns: set of names of the winning arms
Return type: frozenset(str)
Raise: ValueError when arms_sampled is empty.

Module contents

Bandit directory containing multi-armed bandit implementations of UCB policies in Python.

Files in this package