gpp_cuda_math

Contents:

gpp_cuda_math.hpp

This file contains declarations of GPU functions (host code) that are called from C++ code. These include functions for computing Expected Improvement, the gradient of Expected Improvement, and GPU utilities (memory allocation, device setup, etc.).

namespace optimal_learning

Macro to allow restrict as a keyword for C++ compilation and CUDA/nvcc compilation. See related entry in gpp_common.hpp for more details.

Variables

const unsigned int kEINumBlocks

Number of blocks assigned for computing Expected Improvement on GPU.

const unsigned int kEINumThreads

Number of threads per block assigned for computing Expected Improvement on GPU.

const unsigned int kGradEINumBlocks

Number of blocks assigned for computing Gradient of Expected Improvement on GPU.

const unsigned int kGradEINumThreads

Number of threads per block assigned for computing Gradient of Expected Improvement on GPU.

const CudaError kCudaSuccess

CudaError struct encoding a successful CUDA operation.

class CudaError

This C struct contains error information that is used by the exception handling in gpp_expected_improvement_gpu.hpp/cpp. The file/line and function fields are empty strings if the error code is cudaSuccess (i.e., no error).

Public Members

cudaError_t err

error code returned by CUDA API functions (an enum type)

char const * file_and_line_info

file and line information for the call site that returned the error

char const * func_info

name of the function that returned the error

gpp_cuda_math.cu

This file contains the implementations of all GPU functions. It holds both device code (executed on the GPU) and host code (executed on the CPU); both are compiled by NVCC, NVIDIA's CUDA compiler.

Defines

OL_CUDA_STRINGIFY_EXPANSION_INNER(x)

Macro to stringify the expansion of a macro. For example, say we are on line 53:

  • #__LINE__ --> "__LINE__"
  • OL_CUDA_STRINGIFY_EXPANSION(__LINE__) --> "53"

OL_CUDA_STRINGIFY_EXPANSION_INNER is not meant to be used directly; it exists because #x must appear inside a second macro for the argument to be expanded before stringification.

This is a standard trick; see bottom of: http://gcc.gnu.org/onlinedocs/cpp/Stringification.html

OL_CUDA_STRINGIFY_EXPANSION(x)

OL_CUDA_STRINGIFY_FILE_AND_LINE

Macro to stringify and format the current file and line number. For example, if the macro is invoked from line 893 of file gpp_foo.cpp, this macro produces the compile-time string-constant: (gpp_foo.cpp: 893)

OL_CUDA_ERROR_RETURN(X)

Macro that checks the error code (of type cudaError_t) returned by a CUDA API call. If an error occurred, the macro builds a CudaError struct containing the error code, the name of the function in which the error occurred, and the file and line information, and then returns from the enclosing function.

namespace optimal_learning

Macro to allow restrict as a keyword for C++ compilation and CUDA/nvcc compilation. See related entry in gpp_common.hpp for more details.

Functions

CudaError CudaGetEI(double const *restrict mu, double const *restrict chol_var, int num_union, int num_mc, double best, uint64_t base_seed, bool configure_for_test, double *restrict gpu_mu, double *restrict gpu_chol_var, double *restrict random_number_ei, double *restrict gpu_random_number_ei, double *restrict gpu_ei_storage, double *restrict ei_val)

Computes Expected Improvement via Monte Carlo on the GPU. This function is only meant to be called by CudaExpectedImprovementEvaluator::ComputeExpectedImprovement(...) in gpp_expected_improvement_gpu.hpp/cpp.

Parameters:
mu[num_union]: mean of the GP evaluated at the points of interest
chol_var[num_union][num_union]: Cholesky factorization of the GP variance evaluated at the points of interest
num_union: number of points of interest
num_mc: number of Monte Carlo iterations
best: best function evaluation obtained so far
base_seed: base seed for the GPU’s RNG; will be offset by GPU thread index (see curand)
configure_for_test: whether to record random_number_ei
Outputs:
gpu_mu[num_union]: device pointer to memory storing mu on the GPU
gpu_chol_var[num_union][num_union]: device pointer to memory storing chol_var on the GPU
random_number_ei[num_union][num_iteration][num_threads][num_blocks]: random numbers used in computing EI, recorded for testing purposes only
gpu_random_number_ei[num_union][num_iteration][num_threads][num_blocks]: device pointer to memory storing the random numbers used in computing EI, for testing purposes only
gpu_ei_storage[num_threads][num_blocks]: device pointer to memory storing per-thread EI values on the GPU
ei_val[1]: pointer to the computed Expected Improvement value
Returns:
CudaError state containing the error information, file name, line, and name of the function in which the error occurred

CudaError CudaGetGradEI(double const *restrict mu, double const *restrict grad_mu, double const *restrict chol_var, double const *restrict grad_chol_var, int num_union, int num_to_sample, int dim, int num_mc, double best, uint64_t base_seed, bool configure_for_test, double *restrict gpu_mu, double *restrict gpu_grad_mu, double *restrict gpu_chol_var, double *restrict gpu_grad_chol_var, double *restrict random_number_grad_ei, double *restrict gpu_random_number_grad_ei, double *restrict gpu_grad_ei_storage, double *restrict grad_ei)

Computes the gradient of Expected Improvement via Monte Carlo on the GPU. This function is only meant to be called by CudaExpectedImprovementEvaluator::ComputeGradExpectedImprovement(...) in gpp_expected_improvement_gpu.hpp/cpp.

Parameters:
mu[num_union]: mean of the GP evaluated at the points of interest
grad_mu[dim][num_to_sample]: gradient of the GP mean evaluated at the points of interest
chol_var[num_union][num_union]: Cholesky factorization of the GP variance evaluated at the points of interest
grad_chol_var[dim][num_union][num_union][num_to_sample]: gradient of the Cholesky factorization of the GP variance evaluated at the points of interest
num_union: number of points in the union (aka q+p)
num_to_sample: number of points to sample (aka q)
dim: dimension of the point space
num_mc: number of Monte Carlo iterations
best: best function evaluation obtained so far
base_seed: base seed for the GPU’s RNG; will be offset by GPU thread index (see curand)
configure_for_test: whether to record random_number_grad_ei
Outputs:
gpu_mu[num_union]: device pointer to memory storing mu on the GPU
gpu_grad_mu[dim][num_to_sample]: device pointer to memory storing grad_mu on the GPU
gpu_chol_var[num_union][num_union]: device pointer to memory storing chol_var on the GPU
gpu_grad_chol_var[dim][num_union][num_union][num_to_sample]: device pointer to memory storing grad_chol_var on the GPU
random_number_grad_ei[num_union][num_threads][num_blocks]: random numbers used in computing grad EI, recorded for testing purposes only
gpu_random_number_grad_ei[num_union][num_threads][num_blocks]: device pointer to memory storing the random numbers used in computing grad EI, for testing purposes only
gpu_grad_ei_storage[dim][num_to_sample][num_threads][num_blocks]: device pointer to memory storing per-thread gradient EI values on the GPU
grad_ei[dim][num_to_sample]: pointer to the gradient of Expected Improvement
Returns:
CudaError state containing the error information, file name, line, and name of the function in which the error occurred

CudaError CudaMallocDeviceMemory(size_t size, void **restrict address_of_ptr_to_gpu_memory)

Allocate GPU device memory for storing an array; analogous to malloc() in C. Thin wrapper around cudaMalloc() that handles errors. See: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html

Do not dereference the resulting device pointer outside the GPU device, and do not use it at all if the error code (return_value.err) is not cudaSuccess.

Parameters:
size: number of bytes to allocate
address_of_ptr_to_gpu_memory: address of the pointer to the allocated device memory on the GPU
Returns:
CudaError state containing the error information, file name, line, and name of the function in which the error occurred

CudaError CudaFreeDeviceMemory(void *restrict ptr_to_gpu_memory)

Free GPU device memory on the GPU; analogous to free() in C. Thin wrapper around cudaFree() that handles errors. See: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html

Parameters:
ptr_to_gpu_memory: pointer to GPU memory to free; MUST have been returned by a previous call to cudaMalloc()
Returns:
CudaError state containing the error information, file name, line, and name of the function in which the error occurred

CudaError CudaSetDevice(int devID)

Sets up the GPU device; all subsequent GPU function calls will execute on the device activated by this function.

Parameters:
devID: ID of the GPU device to set up
Returns:
CudaError state containing the error information, file name, line, and name of the function in which the error occurred