moe.bandit.epsilon package¶
Submodules¶
moe.bandit.epsilon.epsilon_first module¶
Classes (Python) to compute the Bandit Epsilon-First arm allocation and choosing the arm to pull next.
See moe.bandit.epsilon.epsilon_interface.EpsilonInterface for further details on bandit.
- class moe.bandit.epsilon.epsilon_first.EpsilonFirst(historical_info, epsilon=0.05, total_samples=100)[source]¶
Bases: moe.bandit.epsilon.epsilon_interface.EpsilonInterface
Implementation of EpsilonFirst.
A class to encapsulate the computation of bandit epsilon first.
total_samples is the total number of samples (number to sample + number sampled) number sampled is calculated by summing up total from each arm sampled. total_samples is T from Multi-Armed Bandits.
See superclass moe.bandit.epsilon.epsilon_interface.EpsilonInterface for further details.
- allocate_arms()[source]¶
Compute the allocation to each arm given historical_info, running bandit subtype endpoint with hyperparameters in hyperparameter_info.
Computes the allocation to each arm based on the given subtype, historical info, and hyperparameter info.
Works with k-armed bandits (k >= 1).
The Algorithm: http://en.wikipedia.org/wiki/Multi-armed_bandit#Approximate_solutions
This method starts with a pure exploration phase, followed by a pure exploitation phase. If we have a total of T trials, the first \(\epsilon\) T trials, we only explore. After that, we only exploit (t = \(\epsilon\) T, \(\epsilon\) T + 1, ..., T).
This method will pull a random arm in the exploration phase. Then this method will pull the optimal arm (best expected return) in the exploitation phase.
In case of a tie in the exploitation phase, the method will split the allocation among the optimal arms.
For example, if we have three arms, two arms (arm1 and arm2) with an average payoff of 0.5 ({win:10, lose:10, total:20}) and a new arm (arm3, average payoff is 0 and total is 0).
Let the epsilon \(\epsilon\) be 0.1.
The allocation depends on which phase we are in:
Case 1: T = 50
Recall that T = number to sample + number sampled. number sampled \(= 20 + 20 + 0 = 40\). So we are on trial #41. We explore the first \(\epsilon T = 0.1 * 50 = 5\) trials and thus we are in the exploitation phase. We split the allocation between the optimal arms arm1 and arm2.
{arm1: 0.5, arm2: 0.5, arm3: 0.0}
Case 2: T = 500
We explore the first \(\epsilon T = 0.1 * 500 = 50\) trials. Since we are on trail #41, we are in the exploration phase. We choose arms randomly:
{arm1: 0.33, arm2: 0.33, arm3: 0.33}
Returns: the dictionary of (arm, allocation) key-value pairs Return type: a dictionary of (str, float64) pairs Raise: ValueError when sample_arms are empty.
moe.bandit.epsilon.epsilon_greedy module¶
Classes (Python) to compute the Bandit Epsilon-Greedy arm allocation and choosing the arm to pull next.
See moe.bandit.epsilon.epsilon_interface.EpsilonInterface for further details on this bandit.
- class moe.bandit.epsilon.epsilon_greedy.EpsilonGreedy(historical_info, epsilon=0.05)[source]¶
Bases: moe.bandit.epsilon.epsilon_interface.EpsilonInterface
Implementation of EpsilonGreedy.
A class to encapsulate the computation of bandit epsilon greedy.
See superclass moe.bandit.epsilon.epsilon_interface.EpsilonInterface for further details.
- allocate_arms()[source]¶
Compute the allocation to each arm given historical_info, running bandit subtype endpoint with hyperparameter epsilon.
Computes the allocation to each arm based on the given subtype, historical info, and hyperparameter epsilon.
Works with k-armed bandits (k >= 1).
The Algorithm: http://en.wikipedia.org/wiki/Multi-armed_bandit#Approximate_solutions
This method will pull the optimal arm (best expected return) with probability \(1-\epsilon\), with probability \(\epsilon\) a random arm will be pulled.
In case of a tie, the method will split the probability \(1-\epsilon\) among the optimal arms and with probability \(\epsilon\), a random arm will be pulled. For example, if we have three arms, two arms (arm1 and arm2) with an average payoff of 0.5 and a new arm (arm3, average payoff is 0). Let the epsilon \(\epsilon\) be 0.12. The allocation will be as follows: arm1: 0.48, arm2: 0.48, arm3: 0.04.
The calculation is as follows:
arm1 and arm2 both get the same allocation:
\[\frac{1-\epsilon}{2} + \frac{\epsilon}{3} = \frac{1-0.12}{2} + \frac{0.12}{3} = 0.44 + 0.04 = 0.48\]arm3 gets the allocation:
\[\frac{\epsilon}{3} = \frac{0.12}{3} = 0.04\]Returns: the dictionary of (arm, allocation) key-value pairs Return type: a dictionary of (str, float64) pairs Raise: ValueError when sample_arms are empty.
moe.bandit.epsilon.epsilon_interface module¶
Classes (Python) to compute the Bandit Epsilon arm allocation and choosing the arm to pull next.
See moe.bandit.bandit_interface for further details on bandit.
- class moe.bandit.epsilon.epsilon_interface.EpsilonInterface(historical_info, subtype=None, epsilon=0.05)[source]¶
Bases: moe.bandit.bandit_interface.BanditInterface
Implementation of the constructor and common methods of Epsilon. Abstract method allocate_arms implemented in subclass.
A class to encapsulate the computation of bandit epsilon. Epsilon is the sole hyperparameter in this class. Subclasses may contain other hyperparameters.
See moe.bandit.bandit_interface docs for further details.
- static get_winning_arm_names(arms_sampled)[source]¶
Compute the set of winning arm names based on the given arms_sampled..
Throws an exception when arms_sampled is empty. Implementers of this interface will never override this method.
Parameters: arms_sampled (dictionary of (str, SampleArm()) pairs) – a dictionary of arm name to moe.bandit.data_containers.SampleArm Returns: of set of names of the winning arms Return type: frozenset(str) Raise: ValueError when arms_sampled are empty.
Module contents¶
Bandit directory containing multi-armed bandit implementations of epsilon policies in python.
Files in this package
- moe.bandit.epsilon.epsilon_first: EpsilonFirst object for allocating bandit arms and choosing the winning arm based on epsilon-first policy.
- moe.bandit.epsilon.epsilon_greedy: EpsilonGreedy object for allocating bandit arms and choosing the winning arm based on epsilon-greedy policy.
- moe.bandit.epsilon.epsilon_interface: a base EpsilonInterface object for all bandit epsilon subtypes.