gpp_expected_improvement_gpu¶
Contents:
gpp_expected_improvement_gpu.hpp¶
All GPU related functions are declared here, and any other C++ functions who wish to call GPU functions should only call functions here.
DefinesOL_CUDA_ERROR_THROW(X)
Macro that checks error message (CudaError object) returned by CUDA functions, and throws OptimalLearningCudaException if there is error.
namespace optimal_learning
Macro to allow restrict as a keyword for C++ compilation and CUDA/nvcc compilation. See related entry in gpp_common.hpp for more details.
FunctionsOL_NONNULL_POINTERS void SetupExpectedImprovementState(const CudaExpectedImprovementEvaluator & ei_evaluator, double const *restrict points_to_sample, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, int max_num_threads, bool configure_for_gradients, UniformRandomGenerator * uniform_rng, std::vector< typename CudaExpectedImprovementEvaluator::StateType > * state_vector)Set up vector of CudaExpectedImprovementEvaluator::StateType.
This is a utility function just for reducing code duplication.
Throws InvalidValueException if max_num_threads != 1. Multiple threads (=> multiple GPUs) is not yet supported. TODO(GH-398): remove this requirement/comments when we support computation on multiple GPUs
- Parameters:
ei_evaluator: evaluator object associated w/the state objects being constructed points_to_sample[dim][num_to_sample]: initial points to load into state (must be a valid point for the problem); i.e., points at which to evaluate EI and/or its gradient points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrently experiments num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the p in q,p-EI) max_num_threads: maximum number of threads for use by OpenMP (generally should be <= # cores) configure_for_gradients: true if these state objects will be used to compute gradients, false otherwise state_vector[arbitrary]: vector of state objects, arbitrary size (usually 0) uniform_rng[1]: UniformRandomGenerator object used to seed the GPU PRNG(s) - Outputs:
uniform_rng[1]: UniformRandomGenerator object will have its state changed due to random draws state_vector[max_num_threads]: vector of states containing max_num_threads properly initialized state objects template < typename DomainType >OL_NONNULL_POINTERS void CudaComputeOptimalPointsToSampleViaMultistartGradientDescent(const GaussianProcess & gaussian_process, const GradientDescentParameters & optimizer_parameters, const DomainType & domain, const ThreadSchedule & thread_schedule, double const *restrict start_point_set, double const *restrict points_being_sampled, int num_multistarts, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, int which_gpu, UniformRandomGenerator * uniform_rng, bool *restrict found_flag, double *restrict best_next_point)This function is the same as ComputeOptimalPointsToSampleViaMultistartGradientDescent in gpp_math.hpp except that it is specifically used for GPU EI evaluators. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
gaussian_process: GaussianProcess object (holds points_sampled, values, noise_variance, derived quantities) that describes the underlying GP optimizer_parameters: GradientDescentParameters object that describes the parameters controlling EI optimization (e.g., number of iterations, tolerances, learning rate) domain: object specifying the domain to optimize over (see gpp_domain.hpp) thread_schedule: struct instructing OpenMP on how to schedule threads; i.e., (suggestions in parens) max_num_threads (num cpu cores), schedule type (omp_sched_dynamic), chunk_size (0). start_point_set[dim][num_to_sample][num_multistarts]: set of initial guesses for MGD (one block of num_to_sample points per multistart) points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrent experiments num_multistarts: number of points in set of initial guesses num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the “p” in q,p-EI) best_so_far: value of the best sample so far (must be min(points_sampled_value)) max_int_steps: maximum number of MC iterations which_gpu: the device ID of GPU used for computation uniform_rng[1]: UniformRandomGenerator object used to seed the GPU PRNG(s) - Outputs:
uniform_rng[1]: UniformRandomGenerator object will have its state changed due to random draws found_flag[1]: true if best_next_point corresponds to a nonzero EI best_next_point[dim][num_to_sample]: points yielding the best EI according to MGD template < typename DomainType >void CudaComputeOptimalPointsToSampleWithRandomStarts(const GaussianProcess & gaussian_process, const GradientDescentParameters & optimizer_parameters, const DomainType & domain, const ThreadSchedule & thread_schedule, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_generator, double *restrict best_next_point)This function is the same as ComputeOptimalPointsToSampleWithRandomStarts in gpp_math.hpp except that it is specifically used for GPU EI evaluators. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
gaussian_process: GaussianProcess object (holds points_sampled, values, noise_variance, derived quantities) that describes the underlying GP optimizer_parameters: GradientDescentParameters object that describes the parameters controlling EI optimization (e.g., number of iterations, tolerances, learning rate) domain: object specifying the domain to optimize over (see gpp_domain.hpp) thread_schedule: struct instructing OpenMP on how to schedule threads; i.e., (suggestions in parens) max_num_threads (num cpu cores), schedule type (omp_sched_dynamic), chunk_size (0). points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrent experiments num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the “p” in q,p-EI) best_so_far: value of the best sample so far (must be min(points_sampled_value)) max_int_steps: maximum number of MC iterations which_gpu: the device ID of GPU used for computation uniform_generator[1]: UniformRandomGenerator object used to seed the GPU PRNG(s) - Outputs:
found_flag[1]: true if best_next_point corresponds to a nonzero EI uniform_generator[1]: UniformRandomGenerator object will have its state changed due to random draws best_next_point[dim][num_to_sample]: points yielding the best EI according to MGD template < typename DomainType >void CudaComputeOptimalPointsToSampleViaLatinHypercubeSearch(const GaussianProcess & gaussian_process, const DomainType & domain, const ThreadSchedule & thread_schedule, double const *restrict points_being_sampled, int num_multistarts, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_generator, double *restrict best_next_point)This function is the same as ComputeOptimalPointsToSampleViaLatinHypercubeSearch in gpp_math.hpp except that it is specifically used for GPU EI evaluators. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
gaussian_process: GaussianProcess object (holds points_sampled, values, noise_variance, derived quantities) that describes the underlying GP domain: object specifying the domain to optimize over (see gpp_domain.hpp) thread_schedule: struct instructing OpenMP on how to schedule threads; i.e., (suggestions in parens) max_num_threads (num cpu cores), schedule type (omp_sched_static), chunk_size (0). points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrent experiments num_multistarts: number of random points to check num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the “p” in q,p-EI) best_so_far: value of the best sample so far (must be min(points_sampled_value)) max_int_steps: maximum number of MC iterations which_gpu: device ID of the GPU used for computation uniform_generator[1]: a UniformRandomGenerator object providing the random engine for uniform random numbers - Outputs:
- found_flag[1]: true if best_next_point corresponds to a nonzero EI :uniform_generator[1]: UniformRandomGenerator object will have its state changed due to random draws :best_next_point[dim][num_to_sample]: points yielding the best EI according to dumb search
class CudaDeleter
A deleter for std::unique_ptr that are created with memory returned by cudaMalloc(); e.g., the member of CudaDevicePointer.
For a description of deleters, see: http://en.cppreference.com/w/cpp/memory/unique_ptr http://en.cppreference.com/w/cpp/memory/unique_ptr/~unique_ptr
The STL-provided default deleter: http://en.cppreference.com/w/cpp/memory/default_delete
Public Functionsvoid operator()(void * device_ptr)Free the memory pointed to by device_ptr. Called as part of the dtor for std::unique_ptr and MUST NOT throw exceptions. Wraps cudaFree().
- Parameters:
device_ptr: device pointer to memory previously allocated by cudaMalloc(). class CudaDevicePointer
This struct is a smart pointer that wraps std::unique_ptr. It provides a simple interface for device memory (i.e., on a GPU) ownership. It automatically handles cudaMalloc() and cudaFree() calls and error checks; the result is stored in a std::unique_ptr.
Public FunctionsPrivate MembersCudaDevicePointer(int num_values_in)Construct a CudaDevicePointer (via cudaMalloc()) that owns a block of num_values * sizeof(ValueType) bytes on the GPU device.
If allocation fails, device_ptr is nullptr and num_values is 0.
- Parameters:
num_values: number of values allocated at the device memory address held in this object CudaDevicePointer(CudaDevicePointer && other)int num_values()ValueType * device_ptr()OL_DISALLOW_DEFAULT_AND_COPY_AND_ASSIGN(CudaDevicePointer)int num_values_
number of values to allocate on gpu, so the memory size is num_values * sizeof(ValueType)
std::unique_ptr< ValueType, CudaDeleter > device_ptr_
pointer to the memory location on gpu
class CudaExpectedImprovementEvaluator
This class has the same functionality as ExpectedImprovementEvaluator (see gpp_math.hpp), except that computations are performed on GPU.
Public TypePublic Functionstypedef CudaExpectedImprovementState StateType
Private MembersCudaExpectedImprovementEvaluator(const GaussianProcess & gaussian_process_in, int num_mc_in, double best_so_far, int devID_in)Constructor that also specify which gpu you want to use (for multi-gpu system)
~CudaExpectedImprovementEvaluator()int dim()int num_mc()const GaussianProcess * gaussian_process()double ComputeObjectiveFunction(StateType * ei_state)Wrapper for ComputeExpectedImprovement(); see that function for details.
void ComputeGradObjectiveFunction(StateType * ei_state, double *restrict grad_ei)Wrapper for ComputeGradExpectedImprovement(); see that function for details.
double ComputeExpectedImprovement(StateType * ei_state)This function has the same functionality as ComputeExpectedImprovement (see gpp_math.hpp) in class ExpectedImprovementEvaluator.
- Parameters:
ei_state[1]: properly configured state object - Outputs:
ei_state[1]: state with temporary storage modified; uniform_rng modified - Returns:
- the expected improvement from sampling points_to_sample with points_being_sampled concurrent experiments
void ComputeGradExpectedImprovement(StateType * ei_state, double *restrict grad_ei)This function has the same functionality as ComputeGradExpectedImprovement (see gpp_math.hpp) in class ExpectedImprovementEvaluator.
- Parameters:
ei_state[1]: properly configured state object - Outputs:
ei_state[1]: state with temporary storage modified; uniform_rng modified grad_ei[dim][num_to_sample]: gradient of EI void SetupGPU(int devID)Call CUDA API function to activate a GPU. Refer to: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g418c299b069c4803bfb7cab4943da383
- Parameters:
devID: device ID of the GPU need to be activated OL_DISALLOW_DEFAULT_AND_COPY_AND_ASSIGN(CudaExpectedImprovementEvaluator)const int dim_
spatial dimension (e.g., entries per point of points_sampled)
int num_mc_
number of mc iterations
double best_so_far_
best (minimum) objective function value (in points_sampled_value)
const GaussianProcess * gaussian_process_
pointer to gaussian process used in EI computations
class CudaExpectedImprovementState
This has the same functionality as ExpectedImprovementState (see gpp_math.hpp) except that it is for GPU computing
Public TypePublic Functionstypedef CudaExpectedImprovementEvaluator EvaluatorType
Public MembersCudaExpectedImprovementState(const EvaluatorType & ei_evaluator, double const *restrict points_to_sample, double const *restrict points_being_sampled, int num_to_sample_in, int num_being_sampled_in, bool configure_for_gradients, UniformRandomGenerator * uniform_rng_in)This struct has same functionality as ExpectedImprovementState in gpp_math.hpp, except that it is specifically for GPU EI evaluator. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
ei_evaluator: expected improvement evaluator object that specifies the parameters & GP for EI evaluation points_to_sample[dim][num_to_sample]: points at which to evaluate EI and/or its gradient to check their value in future experiments (i.e., test points for GP predictions) points_being_sampled[dim][num_being_sampled]: points being sampled in concurrent experiments num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled in concurrent experiments (i.e., the “p” in q,p-EI) configure_for_gradients: true if this object will be used to compute gradients, false otherwise uniform_rng[1]: pointer to a properly initialized* UniformRandomGenerator object Note
* The UniformRandomGenerator object must already be seeded. If multithreaded computation is used for EI, then every state object must have a different UniformRandomGenerator (different seeds, not just different objects).
CudaExpectedImprovementState(const EvaluatorType & ei_evaluator, double const *restrict points_to_sample, double const *restrict points_being_sampled, int num_to_sample_in, int num_being_sampled_in, bool configure_for_gradients, UniformRandomGenerator * uniform_rng_in, bool configure_for_test)CudaExpectedImprovementState(CudaExpectedImprovementState && other)int GetProblemSize()void GetCurrentPoint(double *restrict points_to_sample)Get the points_to_sample: potential future samples whose EI (and/or gradients) are being evaluated
- Outputs:
points_to_sample[dim][num_to_sample]: potential future samples whose EI (and/or gradients) are being evaluated void SetCurrentPoint(const EvaluatorType & ei_evaluator, double const *restrict points_to_sample)Change the potential samples whose EI (and/or gradient) are being evaluated. Update the state’s derived quantities to be consistent with the new points.
- Parameters:
ei_evaluator: expected improvement evaluator object that specifies the parameters & GP for EI evaluation points_to_sample[dim][num_to_sample]: potential future samples whose EI (and/or gradients) are being evaluated void SetupState(const EvaluatorType & ei_evaluator, double const *restrict points_to_sample)Configures this state object with new points_to_sample, the location of the potential samples whose EI is to be evaluated. Ensures all state variables & temporaries are properly sized. Properly sets all dependent state variables (e.g., GaussianProcess’s state) for EI evaluation.
Warning
This object’s state is INVALIDATED if the ei_evaluator (including the GaussianProcess it depends on) used in SetupState is mutated! SetupState() should be called again in such a situation.
- Parameters:
ei_evaluator: expected improvement evaluator object that specifies the parameters & GP for EI evaluation points_to_sample[dim][num_to_sample]: potential future samples whose EI (and/or gradients) are being evaluated OL_DISALLOW_DEFAULT_AND_COPY_AND_ASSIGN(CudaExpectedImprovementState)Public Static Functionsconst int dim
spatial dimension (e.g., entries per point of points_sampled)
const int num_to_sample
number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI)
const int num_being_sampled
number of points being sampled concurrently (i.e., the “p” in q,p-EI)
const int num_derivatives
number of derivative terms desired (usually 0 for no derivatives or num_to_sample)
const int num_union
number of points in union_of_points: num_to_sample + num_being_sampled
std::vector< double > union_of_points
points currently being sampled; this is the union of the points represented by “q” and “p” in q,p-EI points_to_sample is stored first in memory, immediately followed by points_being_sampled
GaussianProcess::StateType points_to_sample_state
gaussian process state
UniformRandomGenerator * uniform_rng
random number generator
std::vector< double > to_sample_mean
the mean of the GP evaluated at union_of_points
std::vector< double > grad_mu
the gradient of the GP mean evaluated at union_of_points, wrt union_of_points[0:num_to_sample]
std::vector< double > cholesky_to_sample_var
the cholesky (LL^T) factorization of the GP variance evaluated at union_of_points
std::vector< double > grad_chol_decomp
the gradient of the cholesky (LL^T) factorization of the GP variance evaluated at union_of_points wrt union_of_points[0:num_to_sample]
bool configure_for_test
CudaDevicePointer < double > gpu_mu
structs wrapping GPU device pointers used to store GP quantities temp device space for input data for GPU computations; GPU should not modify these
CudaDevicePointer < double > gpu_grad_mu
CudaDevicePointer < double > gpu_chol_var
CudaDevicePointer < double > gpu_grad_chol_var
CudaDevicePointer < double > gpu_ei_storage
temp device space for intermediate results returned by GPU computations
CudaDevicePointer < double > gpu_grad_ei_storage
CudaDevicePointer < double > gpu_random_number_ei
device storage for random numbers used by the GPU computing of ei & grad_ei only used for testing
CudaDevicePointer < double > gpu_random_number_grad_ei
std::vector< double > random_number_ei
host storage for random numbers used by the GPU in computing ei & grad_ei only used for testing
std::vector< double > random_number_grad_ei
std::vector< double > BuildUnionOfPoints(double const *restrict points_to_sample, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, int dim)Create a vector with the union of points_to_sample and points_being_sampled (the latter is appended to the former).
Note the l-value return. Assigning the return to a std::vector<double> or passing it as an argument to the ctor will result in copy-elision or move semantics; no copying/performance loss.
- Parameters::
points_to_sample[dim][num_to_sample]: points at which to evaluate EI and/or its gradient to check their value in future experiments (i.e., test points for GP predictions) points_being_sampled[dim][num_being_sampled]: points being sampled in concurrent experiments num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled in concurrent experiments (i.e., the “p” in q,p-EI) dim: the number of spatial dimensions of each point array - Returns:
- std::vector<double> with the union of the input arrays: points_being_sampled is appended to points_to_sample
int GetVectorSize(int num_mc_itr, int num_threads, int num_blocks, int num_points)A simple utility function to calculate how many random numbers will be generated by GPU computation of EI/gradEI given number of MC simulations. (user set num_mc_itr is not necessarily equal to the actual num_mc_itr used in GPU computation, because actual num_mc_itr has to be multiple of (num_threads * num_blocks)
- Parameters::
num_mc_itr: number of MC simulations num_threads: number of threads per block in GPU computation num_blocks: number of blocks in GPU computation num_points: number of points interested (aka q+p) - Returns:
- int: number of random numbers generated in GPU computation
class OptimalLearningCudaException
Exception to handle runtime errors returned by CUDA API functions. This class subclasses OptimalLearningException in gpp_exception.hpp/cpp, and basically has the same functionality as its superclass, except the constructor is different.
Public FunctionsPublic Static AttributesOptimalLearningCudaException(const CudaError & error)Constructs a OptimalLearningCudaException with struct CudaError.
- Parameters:
error: C struct that contains error message returned by CUDA API functions OL_DISALLOW_DEFAULT_AND_ASSIGN(OptimalLearningCudaException)constexpr char const * kName
String name of this exception ofr logging.
gpp_expected_improvement_gpu.cpp¶
This file contains implementations of GPU related functions. They are actually C++ wrappers for CUDA C functions defined in gpu/gpp_cuda_math.cu.
namespace optimal_learning
Macro to allow restrict as a keyword for C++ compilation and CUDA/nvcc compilation. See related entry in gpp_common.hpp for more details.
Functionsvoid CudaEvaluateEIAtPointList(const GaussianProcess & gaussian_process, const ThreadSchedule & thread_schedule, double const *restrict initial_guesses, double const *restrict points_being_sampled, int num_multistarts, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_rng, double *restrict function_values, double *restrict best_next_point)This function is same as EvaluateEIAtPointList in gpp_math.cpp, except that it is specifically used for GPU functions. Refer to gpp_math.cpp for detailed documentation.
This function is the same as EvaluateEIAtPointList in gpp_math.hpp except that it is specifically used for GPU EI evaluators. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
gaussian_process: GaussianProcess object (holds points_sampled, values, noise_variance, derived quantities) that describes the underlying GP thread_schedule: struct instructing OpenMP on how to schedule threads; i.e., (suggestions in parens) max_num_threads (num cpu cores), schedule type (omp_sched_static), chunk_size (0). initial_guesses[dim][num_to_sample][num_multistarts]: list of points at which to compute EI points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrent experiments num_multistarts: number of points to check num_to_sample: number of potential future samples; gradients are evaluated wrt these points (i.e., the “q” in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the “p” in q,p-EI) best_so_far: value of the best sample so far (must be min(points_sampled_value)) max_int_steps: maximum number of MC iterations which_gpu: the device ID of GPU used for computation uniform_rng[1]: UniformRandomGenerator object used to seed the GPU PRNG(s) - Outputs:
found_flag[1]: true if best_next_point corresponds to a nonzero EI uniform_rng[1]: UniformRandomGenerator object will have its state changed due to random draws function_values[num_multistarts]: EI evaluated at each point of initial_guesses, in the same order as initial_guesses; never dereferenced if nullptr best_next_point[dim][num_to_sample]: points yielding the best EI according to dumb search template < typename DomainType >void CudaComputeOptimalPointsToSample(const GaussianProcess & gaussian_process, const GradientDescentParameters & optimizer_parameters, const DomainType & domain, const ThreadSchedule & thread_schedule, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, bool lhc_search_only, int num_lhc_samples, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_generator, double *restrict best_points_to_sample)This function is same as ComputeOptimalPointsToSample in gpp_math.cpp, except that it is specifically used for GPU functions. Refer to gpp_math.cpp for detailed documentation.
This function is the same as ComputeOptimalPointsToSample in gpp_math.hpp except that it is specifically used for GPU EI evaluators. Refer to gpp_math.hpp for detailed documentation.
- Parameters:
gaussian_process: GaussianProcess object (holds points_sampled, values, noise_variance, derived quantities) that describes the underlying GP optimizer_parameters: GradientDescentParameters object that describes the parameters controlling EI optimization (e.g., number of iterations, tolerances, learning rate) domain: object specifying the domain to optimize over (see gpp_domain.hpp) thread_schedule: struct instructing OpenMP on how to schedule threads; i.e., (suggestions in parens) max_num_threads (num cpu cores), schedule type (omp_sched_dynamic), chunk_size (0). points_being_sampled[dim][num_being_sampled]: points that are being sampled in concurrent experiments num_to_sample: how many simultaneous experiments you would like to run (i.e., the q in q,p-EI) num_being_sampled: number of points being sampled concurrently (i.e., the p in q,p-EI) best_so_far: value of the best sample so far (must be min(points_sampled_value)) max_int_steps: maximum number of MC iterations lhc_search_only: whether to ONLY use latin hypercube search (and skip gradient descent EI opt) num_lhc_samples: number of samples to draw if/when doing latin hypercube search which_gpu: device ID of GPU used for computation uniform_generator[1]: UniformRandomGenerator object used to seed the GPU PRNG(s) - Outputs:
found_flag[1]: true if best_points_to_sample corresponds to a nonzero EI if sampled simultaneously uniform_generator[1]: UniformRandomGenerator object will have its state changed due to random draws best_points_to_sample[num_to_sample*dim]: point yielding the best EI according to MGD template void CudaComputeOptimalPointsToSample(const GaussianProcess & gaussian_process, const GradientDescentParameters & optimizer_parameters, const TensorProductDomain & domain, const ThreadSchedule & thread_schedule, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, bool lhc_search_only, int num_lhc_samples, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_generator, double *restrict best_points_to_sample)template void CudaComputeOptimalPointsToSample(const GaussianProcess & gaussian_process, const GradientDescentParameters & optimizer_parameters, const SimplexIntersectTensorProductDomain & domain, const ThreadSchedule & thread_schedule, double const *restrict points_being_sampled, int num_to_sample, int num_being_sampled, double best_so_far, int max_int_steps, bool lhc_search_only, int num_lhc_samples, int which_gpu, bool *restrict found_flag, UniformRandomGenerator * uniform_generator, double *restrict best_points_to_sample)