Welcome to MOE’s documentation!


What is MOE?

MOE (Metric Optimization Engine) is an efficient way to optimize a system’s parameters when evaluating those parameters is time-consuming or expensive.

Here are some examples of when you could use MOE:

  • Optimizing a system’s click-through rate (CTR). MOE is useful when evaluating CTR requires running an A/B test on real user traffic, and getting statistically significant results requires running this test for a substantial amount of time (hours, days, or even weeks).
  • Optimizing tunable parameters of a machine-learning prediction method. MOE is useful if calculating the prediction error for one choice of the parameters takes a long time, which might happen because the prediction method is complex and takes a long time to train, or because the data used to evaluate the error is huge.
  • Optimizing the design of an engineering system (an airplane, the traffic network in a city, a combustion engine, a hospital). MOE is useful if evaluating a design requires running a complex physics-based numerical simulation on a supercomputer.
  • Optimizing the parameters of a real-world experiment (a chemistry, biology, or physics experiment, a drug trial). MOE is useful when every experiment needs to be physically created in a lab, or very few experiments can be run in parallel.

MOE is ideal for problems in which the objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximizes (or minimizes) the objective function, while evaluating the objective function as few times as possible.

Internally, MOE uses Bayesian global optimization, which performs optimization using Bayesian statistics and optimal learning.

Optimal learning is the study of efficient methods for collecting information, particularly when doing so is time-consuming or expensive. It was developed and popularized from its roots in decision theory by Prof. Peter Frazier (Cornell, Operations Research and Information Engineering) and Prof. Warren Powell (Princeton, Operations Research and Financial Engineering). For more information about the mathematics of optimal learning, and more real-world applications like heart surgery, drug discovery, and materials science, see these intro slides to optimal learning.


To illustrate how MOE works, suppose we wish to maximize the click-through rate (CTR) on a website we manage, by varying some real-valued parameter vector \(\vec{x}\) that governs how site content is presented to the user. Evaluating the CTR for a new set of parameters requires running an A/B test over a period of several days. We write this problem mathematically as

\[\underset{\vec{x}}{\mathrm{argmax}} \ \text{CTR} (\vec{x}).\]

We want to find the best set of parameters \(\vec{x}\) while evaluating the underlying function (CTR) as few times as possible. See Objective Functions for more examples of objective functions and the best ways to combine metrics.

MOE builds the following loop, in which it takes the results from those A/B tests that have been run so far, processes them through its internal engine, and then determines at which parameter vector \(\vec{x}\) it would be most valuable to next observe the CTR. MOE runs an A/B test at this new parameter vector, and then repeats the loop.

This choice of the most valuable point trades off two desires: evaluating points where we have a lot of uncertainty about the CTR (this is called exploration), and evaluating points where we think the CTR is large (this is called exploitation).
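One common way to quantify this trade-off is the expected improvement (EI) criterion, which MOE lists among its internal steps. The formula below is the standard closed form for a single candidate point in a maximization problem. It is a sketch rather than a description of MOE's exact computation. The symbols are assumptions introduced here: \(\mu(\vec{x})\) and \(\sigma(\vec{x}) > 0\) are the model's posterior mean and standard deviation of the CTR at \(\vec{x}\), \(f^{*}\) is the best CTR observed so far, and \(\Phi\) and \(\phi\) are the standard normal CDF and PDF.

\[\mathrm{EI}(\vec{x}) = \mathbb{E}\left[\max\left(\text{CTR}(\vec{x}) - f^{*},\ 0\right)\right] = \left(\mu(\vec{x}) - f^{*}\right) \Phi(z) + \sigma(\vec{x})\, \phi(z), \qquad z = \frac{\mu(\vec{x}) - f^{*}}{\sigma(\vec{x})}.\]

The first term is large where the predicted CTR already exceeds the best observed value (exploitation). The second term grows with the uncertainty \(\sigma(\vec{x})\) (exploration).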

By continuing to optimize over many iterations, MOE quickly finds approximate optima, or points with large CTR. As the world changes over time, MOE can surf these shifting optima as they move, staying at the peak of the potentially changing objective function in parameter space as time advances.

[Figure: the MOE loop]

For more examples of how MOE can be used, see Examples.

Video and slidedeck introduction to MOE:

MOE does this internally by:

  1. Building a Gaussian Process (GP) with the historical data

  2. Optimizing the hyperparameters of the Gaussian Process (model selection)

  3. Finding the point(s) of highest Expected Improvement (EI)

  4. Returning the point(s) to sample, then repeating
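The steps above can be sketched in a few dozen lines of plain Python for a one-dimensional problem. This is an illustrative toy, not MOE's implementation. The squared-exponential kernel, its fixed hyperparameters, the maximization convention, and every function name below are assumptions made for the example. Step 2 (hyperparameter optimization) is skipped by hard-coding the kernel parameters, and the EI search is a simple scan over a candidate grid.

```python
import math


def sq_exp_kernel(a, b, length_scale=0.3, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, rhs):
    """Forward substitution: solve lower * y = rhs."""
    y = []
    for i, row in enumerate(lower):
        y.append((rhs[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_lower_transposed(lower, rhs):
    """Back substitution: solve lower^T * x = rhs."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x


def gp_posterior(xs, ys, noise_vars, x_new):
    """Step 1: posterior mean and variance of a zero-mean GP at x_new."""
    n = len(xs)
    cov = [
        [sq_exp_kernel(xs[i], xs[j]) + (noise_vars[i] if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(cov)
    k_star = [sq_exp_kernel(x, x_new) for x in xs]
    alpha = solve_lower_transposed(lower, solve_lower(lower, ys))
    v = solve_lower(lower, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def expected_improvement(mean, var, best):
    """Step 3: closed-form EI at a single point, for maximization."""
    sigma = math.sqrt(var)
    if sigma == 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + sigma * pdf


def next_point(xs, ys, noise_vars, candidates):
    """Step 4: return the candidate with the highest EI."""
    best = max(ys)
    return max(
        candidates,
        key=lambda x: expected_improvement(*gp_posterior(xs, ys, noise_vars, x), best=best),
    )


# Two noisy historical observations, then ask where to sample next on a grid.
grid = [i / 20.0 for i in range(21)]
suggestion = next_point([0.0, 1.0], [0.1, 0.2], [0.01, 0.01], grid)
```

In a real loop, the suggested point would be evaluated (for example by an A/B test), appended to the historical data, and the procedure repeated.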

Externally you can use MOE through:

You can be up and optimizing in a matter of minutes.

Quick Install

Install in Docker:

This is the recommended way to run the MOE REST server. All dependencies are installed and the build is done automatically, in an isolated container.

Docker (http://docs.docker.io/) is a container-based virtualization framework. Unlike traditional virtualization, Docker is fast, lightweight, and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from any other, and nothing gets shared.

$ docker pull yelpmoe/latest # You can also pull specific versions like yelpmoe/v0.1.0
$ docker run -p 6543:6543 yelpmoe/latest

If you are on OSX, or want a build based on the current master branch, you may need to build this manually.

$ git clone https://github.com/Yelp/MOE.git
$ cd MOE
$ docker build -t moe_container .
$ docker run -p 6543:6543 moe_container

The webserver and REST interface are now running on port 6543 from within the container, at http://localhost:6543

Build from source (Linux and OSX 10.8 and 10.9 supported)

Full Install

Quick Start

REST/web server and interactive demo

To get the REST server running locally, from the directory where MOE is installed:

$ pserve --reload development.ini # MOE server is now running at http://localhost:6543

You can access the server from a browser or from the command line:

$ curl -X POST -H "Content-Type: application/json" -d '{"domain_info": {"dim": 1}, "points_to_evaluate": [[0.1], [0.5], [0.9]], "gp_historical_info": {"points_sampled": [{"value_var": 0.01, "value": 0.1, "point": [0.0]}, {"value_var": 0.01, "value": 0.2, "point": [1.0]}]}}' http://localhost:6543/gp/ei

gp_ei endpoint documentation: moe.views.rest.gp_ei
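The same request can be sent from a script. The sketch below builds the JSON body from the curl example using only the Python 3 standard library. The helper names and the `GP_EI_URL` constant are hypothetical. In particular, the `/gp/ei` path is an assumption inferred from the gp_ei endpoint name, so check it against the endpoint documentation for your MOE version.

```python
import json
from urllib.request import Request, urlopen

# Assumed URL: a local MOE server on port 6543, with gp_ei served at /gp/ei.
GP_EI_URL = "http://localhost:6543/gp/ei"


def build_gp_ei_payload(points_sampled, points_to_evaluate, dim=1):
    """Assemble the JSON body used in the curl example.

    points_sampled is an iterable of (point, value, value_var) triples.
    """
    return {
        "domain_info": {"dim": dim},
        "points_to_evaluate": [list(p) for p in points_to_evaluate],
        "gp_historical_info": {
            "points_sampled": [
                {"point": list(point), "value": value, "value_var": value_var}
                for point, value, value_var in points_sampled
            ]
        },
    }


def post_json(url, payload, timeout=10.0):
    """POST a JSON payload and return the decoded JSON response."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying data makes urllib issue a POST request.
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_gp_ei_payload(
    points_sampled=[([0.0], 0.1, 0.01), ([1.0], 0.2, 0.01)],
    points_to_evaluate=[[0.1], [0.5], [0.9]],
)
```

With the server running, `post_json(GP_EI_URL, payload)` sends the request and returns the server's reply as a Python dictionary.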

From ipython

$ ipython
> from moe.easy_interface.experiment import Experiment
> from moe.easy_interface.simple_endpoint import gp_next_points
> exp = Experiment([[0, 2], [0, 4]])
> exp.historical_data.append_sample_points([[[0, 0], 1.0, 0.01]])
> next_point_to_sample = gp_next_points(exp)
> print(next_point_to_sample)

easy_interface documentation: moe.easy_interface package

Within Python

See moe_examples.next_point_via_simple_endpoint or Examples for more examples.

import math
import random

from moe.easy_interface.experiment import Experiment
from moe.easy_interface.simple_endpoint import gp_next_points
from moe.optimal_learning.python.data_containers import SamplePoint

# Note: this function can be anything: the output of a batch job, the results of an A/B experiment, the value of a physical experiment, etc.
def function_to_minimize(x):
    """Calculate an arbitrary 2-d function with some noise with minimum near [1, 2.6]."""
    return math.sin(x[0]) * math.cos(x[1]) + math.cos(x[0] + x[1]) + random.uniform(-0.02, 0.02)

if __name__ == '__main__':
    exp = Experiment([[0, 2], [0, 4]])  # 2D experiment, we build a tensor product domain
    # Bootstrap with some known or already sampled point(s)
    exp.historical_data.append_sample_points([
        SamplePoint([0, 0], function_to_minimize([0, 0]), 0.05),  # Iterables of the form [point, f_val, f_var] are also allowed
    ])

    # Sample 20 points
    for i in range(20):
        # Use MOE to determine what is the point with highest Expected Improvement to use next
        next_point_to_sample = gp_next_points(exp)[0]  # By default we only ask for one point
        # Sample the point from our objective function, we can replace this with any function
        value_of_next_point = function_to_minimize(next_point_to_sample)

        print("Sampled f({0:s}) = {1:.18E}".format(str(next_point_to_sample), value_of_next_point))

        # Add the information about the point to the experiment historical data to inform the GP
        exp.historical_data.append_sample_points([SamplePoint(next_point_to_sample, value_of_next_point, 0.01)])  # We can add some noise


MOE is licensed under the Apache License, Version 2.0

Source Documentation

Python Files

C++ Files
