OpenCL C itself (at least currently) does not support templates. In many scenarios, such as porting CUDA code, this causes trouble.
Here we introduce a workaround. The main idea is to exploit the fact that OpenCL kernels are compiled at runtime, and to emulate C++ templates through macro definitions passed to the kernel compiler.
Solution
Get the type name dynamically
First of all, we need a way to obtain the name of a type as a string.
Method 1
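// Primary template: the fallback for any type that has not been registered.
template<typename T>
struct TypeParseTraits {
    static const char *name;
};
template<typename T>
const char *TypeParseTraits<T>::name = "Unsupported Type!";
// Specialize the traits for type T and store its spelling as a string.
// A member of an explicit specialization is defined without "template<>".
#define REGISTER_PARSE_TYPE(T) template<> struct TypeParseTraits<T> { \
    static const char *name; \
}; \
const char *TypeParseTraits<T>::name = #T
REGISTER_PARSE_TYPE(int);
REGISTER_PARSE_TYPE(float);
REGISTER_PARSE_TYPE(double);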
For registered types, TypeParseTraits<T>::name yields the corresponding string. For unregistered types, the default "Unsupported Type!" is returned. (See Boost.Compute for an example of this approach.)
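Because the fallback string would later be passed to the kernel compiler as if it were a type name, it can help to reject unregistered types before building. The following is a minimal sketch under that assumption; the helper name registeredNameOrThrow is hypothetical and not part of the original code, and the macro defines the specialized member without "template<>".

```cpp
#include <cstring>
#include <stdexcept>
#include <string>

// Primary template: fallback for unregistered types.
template<typename T>
struct TypeParseTraits {
    static const char* name;
};
template<typename T>
const char* TypeParseTraits<T>::name = "Unsupported Type!";

// Register a type by specializing the traits and storing its spelling.
#define REGISTER_PARSE_TYPE(X) \
    template<> struct TypeParseTraits<X> { static const char* name; }; \
    const char* TypeParseTraits<X>::name = #X

REGISTER_PARSE_TYPE(int);
REGISTER_PARSE_TYPE(float);

// Hypothetical helper: returns the registered name, or throws if T
// still maps to the fallback string.
template<typename T>
std::string registeredNameOrThrow() {
    const char* name = TypeParseTraits<T>::name;
    if (std::strcmp(name, "Unsupported Type!") == 0) {
        throw std::invalid_argument("type is not registered for OpenCL kernels");
    }
    return name;
}
```

Calling registeredNameOrThrow<float>() returns the string "float", while a type that was never registered (long in this sketch) raises std::invalid_argument instead of producing a broken build option.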
Method 2
// Primary template: the fallback for unsupported types.
template<typename T>
struct CLTypes {
    static const char * getName() { return "Unsupported"; }
};
// One explicit specialization per supported type.
template<>
struct CLTypes<int> {
    static const char * getName() { return "int"; }
};
template<>
struct CLTypes<float> {
    static const char * getName() { return "float"; }
};
template<>
struct CLTypes<double> {
    static const char * getName() { return "double"; }
};
String type conversion
Alternatively, we could use the built-in typeid operator provided by C++:
typeid(T).name()
However, the C++ standard does not specify the string it returns (for example, typeid(int).name() is not guaranteed to return the "int" we expect), so relying on it is a potential portability problem.
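To make the portability concern concrete, here is a small sketch that puts the two lookups side by side. The helper names rttiName and registeredName are hypothetical. GCC and Clang typically report a mangled name such as "i" for int, while MSVC typically reports "int"; only the explicitly registered name is reliable as OpenCL source text.

```cpp
#include <string>
#include <typeinfo>

// Explicit name table in the style of Method 2.
template<typename T>
struct CLTypes {
    static const char* getName() { return "Unsupported"; }
};
template<>
struct CLTypes<int> {
    static const char* getName() { return "int"; }
};

// Whatever the implementation chooses to report for T;
// the spelling is not specified by the C++ standard.
template<typename T>
std::string rttiName() {
    return typeid(T).name();
}

// The spelling we registered ourselves, which is what the
// OpenCL compiler needs to see.
template<typename T>
std::string registeredName() {
    return CLTypes<T>::getName();
}
```

Printing rttiName<int>() next to registeredName<int>() on a given toolchain shows whether the two happen to agree there.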
Once the type name is available as a string, we can replace the "template" type in the OpenCL kernel source through a macro defined at kernel build time.
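The substitution can be sketched as follows: the kernel is written once against a placeholder T, and the host builds a "-D T=<name>" option string for the OpenCL compiler. The kernel scale_kl and the helper makeBuildOptions are hypothetical names used only for illustration.

```cpp
#include <sstream>
#include <string>

// Explicit name table in the style of Method 2.
template<typename T>
struct CLTypes {
    static const char* getName() { return "Unsupported"; }
};
template<>
struct CLTypes<int> {
    static const char* getName() { return "int"; }
};
template<>
struct CLTypes<float> {
    static const char* getName() { return "float"; }
};

// Kernel source written once against the placeholder type T.
// The OpenCL preprocessor replaces T when the program is built.
const char* const kScaleSrc =
    "kernel void scale_kl(global T* data, const T factor) {\n"
    "    data[get_global_id(0)] *= factor;\n"
    "}\n";

// Build the option string that defines T for one instantiation.
template<typename T>
std::string makeBuildOptions() {
    std::ostringstream options;
    options << "-D T=" << CLTypes<T>::getName();
    return options.str();
}
```

With these options, building kScaleSrc for float compiles the kernel as if every T had been written as float; building it again with makeBuildOptions<int>() produces the int variant from the same source.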
Ensure that each kernel source is compiled only once
Method 1
To ensure that each kernel source is compiled only once for each type T, we use the std::call_once function from C++11.
static std::once_flag compiled;
std::call_once(compiled, []() {
    std::ostringstream options;
    options << "-D T=" << TypeParseTraits<T>::name;
    prg = cl::Program(addVectors_src, false);
    prg.build(options.str().c_str());
    std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
    kl = cl::Kernel(prg, "addVectors_kl");
});
Method 2
Another way to achieve this is to use a regular static variable, initialized by a lambda that is invoked immediately.
static const bool compiled = []() {
    std::ostringstream options;
    options << "-D T=" << TypeParseTraits<T>::name;
    prg = cl::Program(addVectors_src, false);
    prg.build(options.str().c_str());
    std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
    kl = cl::Kernel(prg, "addVectors_kl");
    return true;
}();  // the trailing () calls the lambda; without it the kernel is never built
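Here, the two methods behave the same: since C++11, the initialization of a function-local static variable is thread-safe and runs exactly once, just like call_once. In other cases call_once may be more flexible, for example when the flag has to be a class member or be shared between several functions.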
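The once-per-type behaviour of both methods can be checked without any OpenCL code. In the sketch below, a counter stands in for the expensive prg.build(...) call; the names g_buildCount, ensureCompiledCallOnce and ensureCompiledStatic are hypothetical. Because the flag and the static variable live inside a function template, every instantiation gets its own copy.

```cpp
#include <atomic>
#include <mutex>

// Stand-in for the expensive kernel build; counts how often it runs.
std::atomic<int> g_buildCount{0};

// Method 1: one std::once_flag per instantiation of T.
template<typename T>
void ensureCompiledCallOnce() {
    static std::once_flag compiled;
    std::call_once(compiled, []() { ++g_buildCount; });
}

// Method 2: a function-local static initialized by an
// immediately invoked lambda, again one per instantiation of T.
template<typename T>
void ensureCompiledStatic() {
    static const bool compiled = []() {
        ++g_buildCount;
        return true;
    }();
    (void)compiled;  // only the side effect of the initializer matters
}
```

Calling ensureCompiledCallOnce<int>() twice and ensureCompiledCallOnce<float>() once increments the counter twice, once per type; the static-variable version behaves the same way. On some toolchains std::call_once requires linking with thread support (for example -pthread with GCC).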
Complete code
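#include <CL/cl.hpp>  // assumption: the OpenCL 1.x C++ bindings, which provide cl::make_kernel
#include <iostream>
#include <iterator>
#include <mutex>
#include <sstream>
#include <vector>
#define STRINGFY(src) #src
// Fallback for types that have not been registered.
template<typename T>
struct TypeParseTraits {
    static constexpr const char* name = "Unsupported";
};
// constexpr allows the in-class initializer for a static pointer member.
#define REGISTER_PARSE_TYPE(X) template<> struct TypeParseTraits<X> { \
    static constexpr const char* name = #X; \
};
REGISTER_PARSE_TYPE(int);
REGISTER_PARSE_TYPE(float);
REGISTER_PARSE_TYPE(double);
template<typename T>
void addVectors(std::vector<T> &out, std::vector<T> &in1, std::vector<T> &in2, size_t n) {
    // Function-local statics exist once per instantiation, so each T gets its own program and kernel.
    static std::once_flag compiled;
    static cl::Program prg;
    static cl::Kernel kl;
    // T stays a placeholder in the kernel text; it is defined at build time via -D T=...
    static const char * addVectors_src = STRINGFY(
        kernel
        void addVectors_kl(global const T * const a, global const T * const b, global T * restrict const c) {
            unsigned idx = get_global_id(0);
            c[idx] = a[idx] + b[idx];
        });
    // Build the program and create the kernel only on the first call for this T.
    std::call_once(compiled, []() {
        std::ostringstream options;
        options << "-D T=" << TypeParseTraits<T>::name;
        prg = cl::Program(addVectors_src, false);
        prg.build(options.str().c_str());
        std::cout << "vector addition kernel compiled for type: " << TypeParseTraits<T>::name << std::endl;
        kl = cl::Kernel(prg, "addVectors_kl");
    });
    // Copy the inputs into read-only device buffers and allocate the output buffer.
    cl::Buffer a(std::begin(in1), std::end(in1), true, false);
    cl::Buffer b(std::begin(in2), std::end(in2), true, false);
    cl::Buffer c(CL_MEM_READ_WRITE, n * sizeof(T));
    // Launch one work-item per element, then copy the result back to the host.
    auto addVectors_kl = cl::make_kernel<cl::Buffer, cl::Buffer, cl::Buffer>(kl);
    addVectors_kl(cl::EnqueueArgs(n), a, b, c);
    cl::copy(c, std::begin(out), std::end(out));
}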
Testing
void pseudo_template_test() {
    const int n = 10;
    std::vector<int> iv1(n, 1);
    std::vector<int> iv2(n, 2);
    std::vector<int> iv3(n);
    std::vector<double> dv1(n, 1.0);
    std::vector<double> dv2(n, 2.0);
    std::vector<double> dv3(n);
    // Each call instantiates addVectors for a different T and builds its own kernel.
    addVectors(iv3, iv1, iv2, iv1.size());
    addVectors(dv3, dv1, dv2, dv1.size());
    // Print the device result next to the value computed on the host.
    for (int i = 0; i < n; i++) {
        std::cout << iv3[i] << ":\t" << iv1[i] + iv2[i] << std::endl;
    }
    std::cout << std::endl;
    for (int i = 0; i < n; i++) {
        std::cout << dv3[i] << ":\t" << dv1[i] + dv2[i] << std::endl;
    }
    std::cout << std::endl;
}