How to use templates in OpenCL

OpenCL itself (at least currently) does not support templates. In many scenarios, such as porting CUDA code, this causes trouble.
Here we introduce a workaround. The main idea is to exploit the fact that OpenCL kernels are compiled at runtime, and to use macro definitions to get behavior similar to C++ templates.

Solution

Get the type name dynamically

First, we need a way to obtain the name of a type as a string.

Method 1

template<typename T>
struct TypeParseTraits {
     static const char *name;
};
template<typename T>
const char * TypeParseTraits<T>::name = "Unsupported Type!";

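// Members of an explicit specialization are defined without "template<>".
// The macro ends without a semicolon; the invocation supplies it.
#define REGISTER_PARSE_TYPE(T) template<> struct TypeParseTraits<T> {\
     static const char* name; \
  }; \
  const char * TypeParseTraits<T>::name = #T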

REGISTER_PARSE_TYPE(int);
REGISTER_PARSE_TYPE(float);
REGISTER_PARSE_TYPE(double);

For registered types, we can get the corresponding string through TypeParseTraits<T>::name. For unregistered types, the default "Unsupported Type!" is returned. (See Boost.Compute for an example.)
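
As a rough illustration of how the registration macro is used, the following self-contained sketch registers two types and looks up their names. The helper registeredName is hypothetical (it is not part of the original code) and only wraps the lookup so that the result can be compared as a std::string.

```cpp
#include <string>

// Primary template: fallback name for types that were never registered.
template <typename T>
struct TypeParseTraits {
    static const char* name;
};
template <typename T>
const char* TypeParseTraits<T>::name = "Unsupported Type!";

// Registers X by specializing the traits class; #X stringifies the type token.
// The member of an explicit specialization is defined without "template<>".
#define REGISTER_PARSE_TYPE(X)                  \
    template <> struct TypeParseTraits<X> {     \
        static const char* name;                \
    };                                          \
    const char* TypeParseTraits<X>::name = #X

REGISTER_PARSE_TYPE(int);
REGISTER_PARSE_TYPE(float);

// Hypothetical helper (an assumption for this sketch): returns the name as a std::string.
template <typename T>
std::string registeredName() {
    return TypeParseTraits<T>::name;
}
```

With these definitions, registeredName<int>() yields "int", while a type that was never registered, such as long, falls back to "Unsupported Type!". Note that the macro defines a non-inline variable, so in a real project it should be expanded in a single source file rather than in a header.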

Method 2

template<typename T>
struct CLTypes {
     static const char * getName() { return "Unsupported"; }
};

template<>
struct CLTypes<int> {
     static const char * getName() { return "int"; }
};

template<>
struct CLTypes<float> {
     static const char * getName() { return "float"; }
};

template<>
struct CLTypes<double> {
     static const char * getName() { return "double"; }
};

Converting a type to a string with typeid

We could also use the built-in typeid operator provided by C++.

typeid(T).name()

But the C++ standard does not guarantee what string it returns (for example, typeid(int).name() is not guaranteed to return the string "int" that we expect), so relying on it is a portability risk.
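
The portability problem can be checked with a small sketch like the one below. The helpers typeidName and typeidMatches are hypothetical names introduced only for this illustration. GCC and Clang typically return a mangled name (such as "i" for int), whereas other compilers may return a readable one, so the result of typeidMatches<int>("int") depends on the toolchain.

```cpp
#include <string>
#include <typeinfo>

// Returns whatever the implementation reports for T.
// The spelling is implementation-defined and may be a mangled name.
template <typename T>
std::string typeidName() {
    return typeid(T).name();
}

// True only if the implementation happens to spell the type
// exactly as the OpenCL compiler expects it.
template <typename T>
bool typeidMatches(const std::string& expected) {
    return typeidName<T>() == expected;
}
```

Because the outcome varies between compilers, an explicit registration scheme such as the traits classes above is the safer choice for generating OpenCL type names.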

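With the type name available as a string, we can substitute the "template" type in the OpenCL kernel source through a macro: the kernel is written against a placeholder type T, and the host passes -D T=<type name> as a build option when it compiles the kernel.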
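
A minimal sketch of the host-side half of this substitution follows. It only assembles the build option string; passing it to the OpenCL compiler is left out so that the example runs without an OpenCL installation. The helper buildOptions and the kernel kScaleSrc are hypothetical and serve purely as illustration.

```cpp
#include <sstream>
#include <string>

// Minimal stand-in for the type-name traits described above.
template <typename T>
struct CLTypes {
    static const char* getName() { return "Unsupported"; }
};
template <> struct CLTypes<float> {
    static const char* getName() { return "float"; }
};
template <> struct CLTypes<double> {
    static const char* getName() { return "double"; }
};

// Example kernel written against the placeholder type T. The OpenCL
// preprocessor replaces T once "-D T=..." is passed at build time.
static const char* kScaleSrc =
    "kernel void scale(global T* data, T factor) {"
    "    data[get_global_id(0)] *= factor;"
    "}";

// Hypothetical helper: builds the option string for one instantiation.
template <typename T>
std::string buildOptions() {
    std::ostringstream options;
    options << "-D T=" << CLTypes<T>::getName();
    return options.str();
}
```

Here buildOptions<float>() produces "-D T=float", which is the string that would be handed to the program's build call together with the kernel source.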

Ensure that each kernel source code will only be compiled once

Method 1

To ensure that each kernel source is compiled only once, we use std::call_once from C++11.

static std::once_flag compiled;
std::call_once(compiled, []() {
         std::ostringstream options;
         options << "-D T=" << TypeParseTraits<T>::name;
         prg = cl::Program(addVectors_src, false);
         prg.build(options.str().c_str());
         std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
         kl = cl::Kernel(prg, "addVectors_kl");
     });

Method 2

Another way to achieve this is to rely on the one-time initialization of a function-local static variable, initialized with an immediately invoked lambda.

static bool compiled = []() {
         std::ostringstream options;
         options << "-D T=" << TypeParseTraits<T>::name;
         prg = cl::Program(addVectors_src, false);
         prg.build(options.str().c_str());
         std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
         kl = cl::Kernel(prg, "addVectors_kl");
         return true;
     }(); // the trailing () invokes the lambda; without it nothing is compiled

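Here the two methods are equivalent: since C++11, the initialization of a function-local static variable is also thread-safe and runs exactly once. In other cases call_once may be more flexible, for example when the flag has to be shared between several functions or stored as a class member.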
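
One point worth making explicit is that a static variable inside a function template exists once per instantiation, so the kernel is built once for each type T rather than once overall. The sketch below illustrates this without OpenCL: compileLog and ensureCompiled are hypothetical stand-ins, and the "compilation" is simulated by appending the type name to a log. It uses the static-variable approach; std::call_once with a static std::once_flag would behave the same way.

```cpp
#include <string>
#include <vector>

// Records every simulated "compilation" so the effect can be observed.
// (Hypothetical stand-in for building a cl::Program.)
inline std::vector<std::string>& compileLog() {
    static std::vector<std::string> log;
    return log;
}

template <typename T>
void ensureCompiled(const std::string& typeName) {
    // One static per instantiation: ensureCompiled<int> and
    // ensureCompiled<double> each own a separate variable.
    static const bool compiled = [&typeName]() {
        compileLog().push_back(typeName);  // runs only on the first call
        return true;
    }();  // the trailing () invokes the lambda immediately
    (void)compiled;  // silence unused-variable warnings
}
```

Calling ensureCompiled<int>("int") three times and ensureCompiled<double>("double") once leaves exactly two entries in the log, one per type.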

Complete code

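 #include <iostream>
 #include <iterator>
 #include <mutex>
 #include <sstream>
 #include <vector>
 #include <CL/cl.hpp> // assumption: the OpenCL 1.x C++ bindings, which provide cl::make_kernel

 using std::begin; using std::cout; using std::end; using std::endl; using std::vector;
 using cl::Buffer; using cl::EnqueueArgs;

 // Turns the kernel source written inline below into a string literal.
 #define STRINGFY(src) #src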

 template<typename T>
 struct TypeParseTraits {
     // constexpr is required to initialize a static data member in-class.
     static constexpr const char* name = "Unsupported";
 };

 #define REGISTER_PARSE_TYPE(X) template<> struct TypeParseTraits<X> { \
     static constexpr const char* name = #X; \
     };

 REGISTER_PARSE_TYPE(int);
 REGISTER_PARSE_TYPE(float);
 REGISTER_PARSE_TYPE(double);

 template<typename T>
 void addVectors(vector<T> &out, vector<T> &in1, vector<T> &in2, size_t n) {
     static std::once_flag compiled;
     static cl::Program prg;
     static cl::Kernel kl;

     static const char * addVectors_src = STRINGFY(
         kernel
         void addVectors_kl(global const T * const a, global const T * const b, global T * restrict const c) {
         unsigned idx = get_global_id(0);
         c[idx] = a[idx] + b[idx];
         });

     std::call_once(compiled, []() {
         std::ostringstream options;
         options << "-D T=" << TypeParseTraits<T>::name;
         prg = cl::Program(addVectors_src, false);
         prg.build(options.str().c_str());
         std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
         kl = cl::Kernel(prg, "addVectors_kl");
     });

     Buffer a(begin(in1), end(in1), true, false);
     Buffer b(begin(in2), end(in2), true, false);
     Buffer c(CL_MEM_READ_WRITE, n * sizeof(T));

     auto addVectors_kl = cl::make_kernel<Buffer, Buffer, Buffer>(kl);

     addVectors_kl(EnqueueArgs(n), a, b, c);

     cl::copy(c, begin(out), end(out));
 }

Testing

 void pseudo_template_test() {
     const int n = 10;
     vector<int> iv1(n, 1);
     vector<int> iv2(n, 2);
     vector<int> iv3(n);

     vector<double> dv1(n, 1.0);
     vector<double> dv2(n, 2.0);
     vector<double> dv3(n);

     addVectors(iv3, iv1, iv2, iv1.size());
     addVectors(dv3, dv1, dv2, dv1.size());

     for (int i = 0; i < n; i++) {
         cout << iv3[i] << ":\t" << iv1[i] + iv2[i] << endl;
     }
     cout << endl;

     for (int i = 0; i < n; i++) {
         cout << dv3[i] << ":\t" << dv1[i] + dv2[i] << endl;
     }
     cout << endl;
 }
