moe.bandit package¶

Subpackages¶

Submodules¶

moe.bandit.bandit_interface module¶

Interface for bandit functions; supports allocating arms and choosing an arm.

class moe.bandit.bandit_interface.BanditInterface[source]¶

Bases: object

Interface for a bandit algorithm.

Abstract class that enables bandit functions; it supports allocating arms and choosing an arm. Implementers of this interface should never override the method choose_arm.

Implementers of this ABC are required to manage their own hyperparameters.

allocate_arms()[source]¶

Compute the allocation to each arm given historical_info, running the bandit subtype endpoint with hyperparameters in hyperparameter_info.

Computes the allocation to each arm based on the given subtype, historical info, and hyperparameter info.

Returns:	the dictionary of (arm, allocation) key-value pairs
Return type:	a dictionary of (str, float64) pairs

static choose_arm(arms_to_allocations)[source]¶

Choose the arm based on allocation information given in arms_to_allocations.

Throws an exception when arms_to_allocations is empty. Implementers of this interface should never override this method.

Parameters:	arms_to_allocations (a dictionary of (str, float64) pairs) – the dictionary of (arm, allocation) key-value pairs
Returns:	name of the chosen arm
Return type:	str
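To illustrate how an arm name can be drawn from an allocation dictionary, here is a minimal sketch. It assumes each allocation is treated as a sampling weight; the function name choose_arm_sketch, the rng parameter, and the use of ValueError are illustrative assumptions, not the library's implementation.

```python
import random


def choose_arm_sketch(arms_to_allocations, rng=random):
    """Pick one arm name, treating each allocation as a sampling weight."""
    if not arms_to_allocations:
        # The documentation only says an exception is thrown on empty input;
        # ValueError is an assumption made for this sketch.
        raise ValueError('arms_to_allocations is empty')
    names = list(arms_to_allocations)
    weights = [arms_to_allocations[name] for name in names]
    # random.choices draws according to the given (relative) weights.
    return rng.choices(names, weights=weights, k=1)[0]


# An arm holding the entire allocation is always the one chosen.
allocations = {'arm1': 0.0, 'arm2': 1.0, 'arm3': 0.0}
chosen = choose_arm_sketch(allocations)
```

With mixed allocations such as {'arm1': 0.3, 'arm2': 0.7}, repeated calls would return arm2 roughly 70% of the time under this weighting assumption.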

moe.bandit.constant module¶

Some default configuration parameters for bandit components.

moe.bandit.constant.MAX_BERNOULLI_RANDOM_VARIABLE_VARIANCE = 0.25¶: Used in moe.bandit.ucb.ucb1_tuned.UCB1Tuned.get_ucb_payoff(). 0.25 is the maximum value the variance of a Bernoulli random variable can take: the variance is \(p(1-p)\), where p is the probability of success, and it is maximized at p = 0.5, giving \(p(1-p) = 0.5(1-0.5) = 0.25\). See http://en.wikipedia.org/wiki/Bernoulli_distribution for more details.
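The bound can be checked numerically. The following sketch evaluates \(p(1-p)\) on an evenly spaced grid of success probabilities; the helper name bernoulli_variance and the grid resolution are illustrative choices.

```python
def bernoulli_variance(p):
    """Variance of a Bernoulli random variable with success probability p."""
    return p * (1.0 - p)


# Evaluate the variance at p = 0.000, 0.001, ..., 1.000.
grid = [i / 1000.0 for i in range(1001)]
best_p = max(grid, key=bernoulli_variance)
max_variance = bernoulli_variance(best_p)
```

On this grid the maximum is attained at p = 0.5, where the variance equals 0.25.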

moe.bandit.data_containers module¶

Data containers convenient for/used to interact with bandit members.

class moe.bandit.data_containers.BernoulliArm(win=0.0, loss=0.0, total=0, variance=None)[source]¶

Bases: moe.bandit.data_containers.SampleArm

A Bernoulli arm (name, win, loss, total, variance) sampled from the objective function we are modeling/optimizing.

A Bernoulli arm has payoff 1 for a success and 0 for a failure. See more details on Bernoulli distribution at http://en.wikipedia.org/wiki/Bernoulli_distribution

See superclass SampleArm for more details.

validate()[source]¶

Check that this BernoulliArm is a valid Bernoulli arm and that it passes basic validity checks: all values are finite.

A Bernoulli arm has payoff 1 for a success and 0 for a failure. See more details on Bernoulli distribution at http://en.wikipedia.org/wiki/Bernoulli_distribution

Raises ValueError:
	if any member data is non-finite or out of range or the arm is not a valid Bernoulli arm

class moe.bandit.data_containers.HistoricalData(sample_arms=None, validate=True)[source]¶

Bases: object

A data container for storing the historical data from an entire experiment in a layout convenient for this library.

Users will likely find it most convenient to store experiment historical data of arms in tuples of (win, loss, total, variance); for example, these could be the columns of a database row, part of an ORM, etc. The SampleArm class (above) provides a convenient representation of this input format, but users are not required to use it.

Variables:	_arms_sampled – (dict) mapping of arm names to already-sampled arms

append_sample_arms(sample_arms, validate=True)[source]¶

Append the contents of sample_arms to the data members of this class.

This method first validates the arms and then updates the historical data. The result of combining two valid arms is always a valid arm.

Parameters:	sample_arms (a dictionary of (arm name, SampleArm) key-value pairs) – the already-sampled arms: wins, losses, and totals
	validate (boolean) – whether to sanity-check the input sample_arms
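The append-and-combine behavior can be pictured with a small sketch. It assumes that combining two arms means summing their wins, losses, and totals field by field; the Arm tuple, combine_arms, and append_arms are hypothetical stand-ins rather than the library's classes, and variance is left out for brevity.

```python
from collections import namedtuple

# Hypothetical stand-in for SampleArm (variance omitted).
Arm = namedtuple('Arm', ['win', 'loss', 'total'])


def combine_arms(first, second):
    """Combine two arms by summing each field (an assumption of this sketch)."""
    return Arm(first.win + second.win,
               first.loss + second.loss,
               first.total + second.total)


def append_arms(history, new_arms):
    """Merge new_arms into history, combining arms that share a name."""
    for name, arm in new_arms.items():
        if name in history:
            history[name] = combine_arms(history[name], arm)
        else:
            history[name] = arm
    return history


history = {'arm1': Arm(win=2.0, loss=1.0, total=3)}
append_arms(history, {'arm1': Arm(1.0, 0.0, 1), 'arm2': Arm(0.0, 1.0, 1)})
```

Under this assumption, sums of non-negative values stay non-negative, which is consistent with the statement that combining two valid arms yields a valid arm.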

arms_sampled[source]¶: Return the arms_sampled, a dictionary of (arm name, SampleArm) key-value pairs.

json_payload()[source]¶: Construct a JSON-serializable dictionary of the historical data that the MOE REST interface recognizes.

num_arms[source]¶: Return the number of sampled arms.

static validate_sample_arms(sample_arms)[source]¶

Check that sample_arms passes basic validity checks: all values are finite.

Parameters:	sample_arms (a dictionary of (arm name, SampleArm) key-value pairs) – already-sampled arms: names, wins, losses, and totals
Returns:	True if inputs are valid
Return type:	boolean

class moe.bandit.data_containers.SampleArm(win=0.0, loss=0.0, total=0, variance=None)[source]¶

Bases: object

An arm (name, win, loss, total, variance) sampled from the objective function we are modeling/optimizing.

This class is a representation of a “Sample Arm,” which is defined by the four data members listed here. SampleArm is a convenient way of communicating data to the rest of the bandit library (via the HistoricalData container); it also provides a convenient grouping for interactive introspection.

Users are not required to use SampleArm; iterables with the same data layout will suffice.

Variables:	win – (float64 >= 0.0) The amount won from playing this arm
	loss – (float64 >= 0.0) The amount lost from playing this arm
	total – (int >= 0) The number of times we have played this arm
	variance – (float >= 0.0) The variance of this arm; None if there is no variance

json_payload()[source]¶: Convert the sample_arm into a dict to be consumed by json for a REST request.

loss[source]¶: Return the amount lost, always greater than or equal to zero.

total[source]¶: Return the total number of tries, always a non-negative integer.

validate()[source]¶

Check this SampleArm passes basic validity checks: all values are finite.

Raises ValueError:
	if any member data is non-finite or out of range
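A minimal sketch of such checks is shown below, using the ranges listed under Variables for this class. The function validate_arm_sketch and its error messages are illustrative assumptions, not the library's code.

```python
import math


def validate_arm_sketch(win, loss, total, variance=None):
    """Raise ValueError if any value is non-finite or out of range."""
    values = {'win': win, 'loss': loss, 'total': total}
    if variance is not None:
        # A variance of None means "no variance" and is skipped.
        values['variance'] = variance
    for name, value in values.items():
        if not math.isfinite(value):
            raise ValueError('{0} is non-finite: {1}'.format(name, value))
        if value < 0:
            raise ValueError('{0} is negative: {1}'.format(name, value))
    if total != int(total):
        raise ValueError('total is not an integer: {0}'.format(total))


# A well-formed arm passes silently.
validate_arm_sketch(win=2.0, loss=1.0, total=3)
```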

variance[source]¶: Return the variance of sampled tries, always greater than or equal to zero; None if there is no variance.

win[source]¶: Return the amount won, always greater than or equal to zero.

moe.bandit.linkers module¶

Links between the implementations of bandit algorithms.

class moe.bandit.linkers.BanditMethod¶

Bases: tuple

BanditMethod(subtype, bandit_class)

bandit_class¶: Alias for field number 1

subtype¶: Alias for field number 0
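The "Alias for field number" descriptions are the ones collections.namedtuple generates for its fields. The sketch below assumes BanditMethod is defined that way; ExampleBandit and the subtype string are hypothetical placeholders.

```python
from collections import namedtuple

# Assumed definition, inferred from the field aliases documented above.
BanditMethod = namedtuple('BanditMethod', ['subtype', 'bandit_class'])


class ExampleBandit(object):
    """Hypothetical placeholder for a bandit implementation class."""


method = BanditMethod(subtype='example_subtype', bandit_class=ExampleBandit)

# Fields are reachable both by name and by position.
subtype_by_name = method.subtype
subtype_by_index = method[0]
```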

moe.bandit.utils module¶

Utilities for bandit.

moe.bandit.utils.get_equal_arm_allocations(arms_sampled, winning_arm_names=None)[source]¶

Split allocations equally among the given winning_arm_names. If no winning_arm_names given, split allocations among arms_sampled.

Throws an exception when arms_sampled is empty.

Parameters:	arms_sampled (dictionary of (str, SampleArm()) pairs) – a dictionary of arm name to moe.bandit.data_containers.SampleArm
	winning_arm_names (frozenset(str)) – a set of names of the winning arms
Returns:	the dictionary of (arm, allocation) key-value pairs
Return type:	a dictionary of (str, float64) pairs
Raises ValueError:
	if arms_sampled is empty
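As a rough illustration of an equal split, the sketch below gives each winning arm a share of 1/n and every other arm 0.0. The zero allocation for non-winning arms and the fallback for an empty winner set are assumptions, and equal_allocations_sketch is a hypothetical name.

```python
def equal_allocations_sketch(arms_sampled, winning_arm_names=None):
    """Split the allocation equally among the winning arms."""
    if not arms_sampled:
        raise ValueError('arms_sampled is empty')
    if not winning_arm_names:
        # None or empty: fall back to all sampled arms
        # (treating an empty set this way is an assumption).
        winning_arm_names = frozenset(arms_sampled)
    share = 1.0 / len(winning_arm_names)
    return {name: (share if name in winning_arm_names else 0.0)
            for name in arms_sampled}


# Only the keys matter for this sketch, so the values are left as None.
arms = {'arm1': None, 'arm2': None, 'arm3': None, 'arm4': None}
all_equal = equal_allocations_sketch(arms)
two_winners = equal_allocations_sketch(arms, frozenset(['arm1', 'arm2']))
```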

moe.bandit.utils.get_winning_arm_names_from_payoff_arm_name_list(payoff_arm_name_list)[source]¶

Compute the set of winning arm names based on the given payoff_arm_name_list.

Throws an exception when payoff_arm_name_list is empty.

Parameters:	payoff_arm_name_list (list of (float64, str) tuples) – a list of (payoff, arm name) tuples
Returns:	the set of names of the winning arms
Return type:	frozenset(str)
Raises ValueError:
	if payoff_arm_name_list is empty
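One way to read "winning" is "having the highest payoff, with ties included". The sketch below follows that reading, which is an assumption; winning_arm_names_sketch is a hypothetical name.

```python
def winning_arm_names_sketch(payoff_arm_name_list):
    """Return the names of all arms that share the highest payoff."""
    if not payoff_arm_name_list:
        raise ValueError('payoff_arm_name_list is empty')
    best_payoff = max(payoff for payoff, _ in payoff_arm_name_list)
    # Keep every arm tied at the best payoff, not just the first one found.
    return frozenset(name for payoff, name in payoff_arm_name_list
                     if payoff == best_payoff)


winners = winning_arm_names_sketch([(0.5, 'arm1'), (0.9, 'arm2'), (0.9, 'arm3')])
```

Returning a frozenset matches the documented return type and lets the result be passed on as a winning_arm_names argument.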

Module contents¶

Bandit directory containing multi-armed bandit implementations in Python.

Files in this package

moe.bandit.constant: some default configuration values for bandit components
moe.bandit.data_containers: SampleArm and HistoricalData containers for passing data to the bandit library
moe.bandit.linkers: linkers connecting bandit components.
moe.bandit.bandit_interface: an interface into different bandit policies

Bandit packages

moe.bandit.epsilon: Epsilon bandit policies
moe.bandit.ucb: UCB bandit policies
moe.bandit.bla: BLA bandit policies

A set of abstract base classes (ABCs) defining an interface for interacting with bandits. These consist of composable functions and classes to allocate bandit arms and choose an arm.