Adding new functions
The library is organized into tiers. Each tier consists of functions that rely on one or more functions from the previous tier.
Tier1 is the lowest, primarily composed of basic operations in pure OpenCL code.
Tier2 relies on one or more calls to Tier1 functions. Tier3 relies on one or more calls to Tier2 functions, and so forth.
Tier0 is a special tier containing functions that span multiple tiers and are used by all others for managing specific operations such as output management.
Create a new function
Adding a new function to the library involves several steps. First, define it in a header file (.hpp) and instantiate it in a source file (.cpp), located respectively in the include and src directories of the clic folder.
Function definition
The definition holds the signature of the function. In the correct tier's header file, add the function's signature.
// clic/include/tier2.hpp
auto my_operation_func(const Device::Pointer &device, const Array::Pointer &src, Array::Pointer &dst, float param1, int param2) -> Array::Pointer;
The function must follow several rules to be properly integrated into the library.
First, the function name should be suffixed with _func and use the trailing return syntax.
The first parameter should always be a const Device::Pointer& device, followed by all the inputs and outputs required by the function.
Inputs must be passed as const Array::Pointer& and named src. The output must be an Array::Pointer& and named dst.
If multiple inputs or outputs are needed, an index is added to the name, e.g., src0, src1, dst0, dst1.
The function should return a type corresponding to the dst parameter, usually Array::Pointer.
Finally, any other parameters required by the function are added as native types.
We do not use default values for parameters, as this code is not intended to be called directly by the user.
Default values should instead be set in the documentation block of the function so that they are propagated to the front-end layers.
The function definition must also come with a Doxygen documentation block. This documentation block should be placed just before the function definition and should respect the following format. This is important as the block will be used in the code autogeneration process to create the front-end layers of the library.
// clic/include/tier2.hpp
/**
 * @name my_operation
 * @brief This function does something.
 * The brief can span several lines if needed.
 *
 * @param device Device to perform the operation on. [const Device::Pointer &]
 * @param src The input array. [const Array::Pointer &]
 * @param dst The output array. [Array::Pointer ( = None )]
 * @param param1 The first parameter. [float ( = 0 )]
 * @param param2 The second parameter. [int ( = 1 )]
 * @return Array::Pointer
 *
 * @note 'category1', 'category2'
 * @see https://reference_to_the_function_documentation_or_other_link_1
 * @see https://reference_to_the_function_documentation_or_other_link_2
 */
auto my_operation_func(const Device::Pointer &device, const Array::Pointer &src, Array::Pointer &dst, float param1, int param2) -> Array::Pointer;
@name is the name of the operation minus the _func suffix.
@brief is a short description of the function.
We then add a @param tag for each parameter of the function.
The @return tag is used to specify the return type of the function.
The @param tag should be specified as follows: @param {name_of_the_parameter} Description of the parameter [{type_of_the_parameter} ( = {default_value} )].
The @note and @see tags are optional.
The @note tag carries additional information for later stages, mainly the categories used to sort the functions in menus, inferences, or other similar operations.
The @see tag is used to add links and references to the documentation. Multiple links can be added.
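Because the autogeneration process reads these tags, it can help to see how one @param line breaks down into its parts. The sketch below is not the parser used by the library: ParamDoc and parse_param_line are hypothetical names, and the regular expression only assumes the layout described above.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper type, not part of CLIc: the pieces of one @param line.
struct ParamDoc
{
  std::string                name;
  std::string                description;
  std::string                type;
  std::optional<std::string> default_value; // empty when no "( = ... )" is given
};

// Parse "@param {name} {description} [{type} ( = {default} )]".
// Returns std::nullopt when the line does not follow that layout.
inline auto
parse_param_line(const std::string & line) -> std::optional<ParamDoc>
{
  static const std::regex pattern(
    R"(^\s*\*?\s*@param\s+(\w+)\s+(.*?)\s*\[\s*([^\]\(]+?)\s*(?:\(\s*=\s*([^\)]*?)\s*\))?\s*\]\s*$)");
  std::smatch match;
  if (!std::regex_match(line, match, pattern))
  {
    return std::nullopt;
  }
  ParamDoc doc{ match[1].str(), match[2].str(), match[3].str(), std::nullopt };
  if (match[4].matched)
  {
    doc.default_value = match[4].str();
  }
  return doc;
}
```

Applied to the param1 line of the example block, this yields the name param1, the type float, and the default value 0. A line that is wrapped by an auto-formatter no longer matches, which illustrates why short lines are recommended.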
Note
The order of the tags impacts the documentation appearance. We advise keeping the same order as in the example: @name, @brief, @param, @return, @note, and finally @see.
Warning
A function that is undocumented or incorrectly documented will either fail to be added to the library or have undefined behavior when used through the library API.
Warning
Auto-format tools will add line breaks to the documentation block. This can be a problem for the autogeneration process. Until this is better integrated, we advise keeping the documentation line length to a minimum. This is particularly important for @param, @see, and @note tags.
Function instantiation
Once the function is defined and documented, we can proceed to the actual function code, which goes in a .cpp file.
Start by creating a new source file in the correct tier directory with the name of the operation you are implementing.
Here, as the function name is my_operation_func, the file should be named my_operation.cpp.
If it is a tier2 function, the file should be located in the clic/src/tier2 directory.
The file must include the needed headers and the namespace of the tier as shown below.
We encourage users to start from an existing function file and adapt it to their needs.
// clic/src/tier2/my_operation.cpp
#include "tier0.hpp"
#include "tier1.hpp"
#include "tier2.hpp"
#include "utils.hpp"

namespace cle::tier2
{

auto my_operation_func(const Device::Pointer &device, const Array::Pointer &src, Array::Pointer &dst, float param1, int param2) -> Array::Pointer
{
  // Implementation of the function
  return dst;
}

} // namespace cle::tier2
The first step in the function implementation is managing the return value. In CLIc, if not provided by the user, the functions are responsible for managing the output array creation and allocation. We can rely on a set of tier0 functions which will create and allocate the output array dst. These functions test the existence of a dst array, and if not provided, will allocate one. The most common case is to use the tier0::create_like() function. This function utilizes the information from src (size, dimension, etc.) to create an array of the same size as src. Optionally, we can specify a dType parameter if the function is supposed to return an array of a specific type. The default behavior is to propagate the src data type to the dst array.
// clic/src/tier2/my_operation.cpp
auto my_operation_func(const Device::Pointer &device, const Array::Pointer &src, Array::Pointer &dst, float param1, int param2) -> Array::Pointer
{
  // Allocate dst from src if the caller did not provide it
  tier0::create_like(src, dst, dType::FLOAT);
  // Implementation of the function
  return dst;
}
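To make the allocate-if-missing behavior concrete, here is a minimal sketch using stand-in types. MockArray, DataType, and create_like_sketch are hypothetical and only mimic the behavior described above; the real tier0::create_like() operates on Array objects that live on a device.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for illustration only; the real Array and dType
// types of the library carry much more state.
enum class DataType
{
  Float,
  UInt32
};

struct MockArray
{
  using Pointer = std::shared_ptr<MockArray>;
  std::size_t width;
  std::size_t height;
  std::size_t depth;
  DataType    dtype;
};

// Allocate dst with the shape of src when the caller did not provide one.
// An already existing dst is left untouched.
inline auto
create_like_sketch(const MockArray::Pointer & src, MockArray::Pointer & dst, DataType dtype) -> void
{
  assert(src != nullptr);
  if (dst != nullptr)
  {
    return;
  }
  dst = std::make_shared<MockArray>(MockArray{ src->width, src->height, src->depth, dtype });
}
```

With this shape of helper, a caller can pass a null dst and let the function allocate the output, or pass a preallocated dst to control its type and reuse memory.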
Note
Several output creation functions already exist. See tier0.hpp for more information.
Note
By current convention, label Arrays use the dType::LABEL type and binary Arrays use the dType::BINARY type.
The rest of the code should correspond to the algorithm of the function. It is highly advised to rely on pre-existing functions from previous tiers to avoid code duplication and ensure the consistency of the library. We recommend examining other functions to see how they are implemented and using them as a template for your own function, especially for similar operations.
Call lower-tier functions
Once the shell of the function is implemented, with the return variable managed, we can proceed to implement the function itself. We simply use C++ code to implement the function and rely on already existing functions from previous tiers to perform the operations.
An easy example is the difference_of_gaussian_func in tier2, which relies on the gaussian_blur_func and add_images_weighted_func functions from tier1 to perform the operation.
// clic/src/tier2.cpp
auto difference_of_gaussian_func(const Device::Pointer& device,
                                 const Array::Pointer& src,
                                 Array::Pointer dst,
                                 float sigma1_x,
                                 float sigma1_y,
                                 float sigma1_z,
                                 float sigma2_x,
                                 float sigma2_y,
                                 float sigma2_z) -> Array::Pointer
{
  tier0::create_like(src, dst, dType::FLOAT);
  auto gauss1 = tier1::gaussian_blur_func(device, src, nullptr, sigma1_x, sigma1_y, sigma1_z);
  auto gauss2 = tier1::gaussian_blur_func(device, src, nullptr, sigma2_x, sigma2_y, sigma2_z);
  return tier1::add_images_weighted_func(device, gauss1, gauss2, dst, 1, -1);
}
The gaussian_blur_func computes two temporary Arrays, gauss1 and gauss2, on the device.
The add_images_weighted_func then computes the difference between the two Gaussians and stores the result in dst, as well as returning it.
Here, relying only on pre-existing functions is enough to implement a more advanced function in a few lines of code, without the need to write more complex OpenCL code.
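As a mental model, the final call can be read as a weighted sum with weights 1 and -1. The CPU sketch below assumes that add_images_weighted computes src0 * factor0 + src1 * factor1 per pixel, which is consistent with the description above. weighted_sum is a hypothetical helper working on plain vectors, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference of a per-pixel weighted sum (assumed semantics):
// dst[i] = src0[i] * factor0 + src1[i] * factor1
inline auto
weighted_sum(const std::vector<float> & src0, const std::vector<float> & src1, float factor0, float factor1)
  -> std::vector<float>
{
  assert(src0.size() == src1.size());
  std::vector<float> dst(src0.size());
  for (std::size_t i = 0; i < src0.size(); ++i)
  {
    dst[i] = src0[i] * factor0 + src1[i] * factor1;
  }
  return dst;
}
```

Calling weighted_sum(gauss1, gauss2, 1, -1) therefore gives gauss1 - gauss2, which is the difference of Gaussians.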
A more advanced implementation is the extend_labeling_via_voronoi_func, also in tier2, which also relies on pre-existing functions but calls them in a loop.
// clic/src/tier2.cpp
auto extend_labeling_via_voronoi_func(const Device::Pointer& device,
                                      const Array::Pointer& src,
                                      Array::Pointer dst) -> Array::Pointer
{
  tier0::create_like(src, dst, dType::UINT32);
  auto flip = Array::create(dst);
  auto flop = Array::create(dst);
  tier1::copy_func(device, src, flip);
  auto flag = Array::create(1, 1, 1, 1, dType::INT32, mType::BUFFER, device);
  flag->fill(0);
  int flag_value = 1;
  int iteration_count = 0;
  while (flag_value > 0)
  {
    if (iteration_count % 2 == 0)
    {
      tier1::onlyzero_overwrite_maximum_box_func(device, flip, flag, flop);
    }
    else
    {
      tier1::onlyzero_overwrite_maximum_box_func(device, flop, flag, flip);
    }
    flag->readTo(&flag_value);
    flag->fill(0);
    iteration_count++;
  }
  if (iteration_count % 2 == 0)
  {
    flip->copyTo(dst);
  }
  else
  {
    flop->copyTo(dst);
  }
  return dst;
}
This function is a good example of how to create temporary Arrays in a memory-efficient way.
The flip and flop Arrays are created using the Array::create() function, which creates an Array of the same size and type as the dst Array.
We then alternate the Arrays depending on the iteration count, hence the Arrays' names flip and flop.
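The same two-buffer pattern can be tried on the CPU. The sketch below is a simplified 1D analogue under stated assumptions: grow_labels_once and extend_labels_1d are hypothetical names, labels are assumed non-negative, and the buffers are swapped with std::swap rather than selected by iteration parity as in the function above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass: every zero pixel takes the maximum of its direct neighbours.
// Returns true when at least one pixel changed, which is the role played
// by the flag Array in the GPU version.
inline auto
grow_labels_once(const std::vector<int> & in, std::vector<int> & out) -> bool
{
  assert(in.size() == out.size());
  bool changed = false;
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    int value = in[i];
    if (value == 0)
    {
      if (i > 0)
      {
        value = std::max(value, in[i - 1]);
      }
      if (i + 1 < in.size())
      {
        value = std::max(value, in[i + 1]);
      }
      changed = changed || (value != 0);
    }
    out[i] = value;
  }
  return changed;
}

// Repeat the pass until nothing changes, reusing the same two buffers.
inline auto
extend_labels_1d(const std::vector<int> & labels) -> std::vector<int>
{
  std::vector<int> flip = labels;
  std::vector<int> flop(labels.size());
  while (grow_labels_once(flip, flop))
  {
    std::swap(flip, flop); // flip always holds the most recent result
  }
  return flip;
}
```

Only two temporary buffers are allocated regardless of how many iterations are needed, which is the point of the flip/flop naming.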
Call an OpenCL kernel file
In the previous examples, we haven't directly called a GPU kernel, yet we've managed to fully accelerate a difference of Gaussians operation on the GPU.
This is mainly because we relied on blocks of the algorithm already implemented on the GPU, such as gaussian_blur_func and add_images_weighted_func from tier1.
If we inspect their implementation, we can see that they don't contain algorithmic code but rather calls for GPU kernel execution.
Indeed, the lower in the tiers we go, the more we rely on GPU kernels to perform the operations.
Conversely, the higher in the tiers we go, the more we rely on pre-existing functions to perform the operations.
In this section, we will see how to call a GPU kernel directly from a function.
This requires that the kernel already exists and is compatible with the CLIJ convention.
More on this can be found in the CLIJ kernel repository.
Kernels in the CLIJ repository are automatically stringified and stored in a header file that can be included in the library.
// clic/src/tier1.cpp
// Include the kernel header file containing the kernel code
#include "cle_add_images_weighted.h"

auto add_images_weighted_func(const Device::Pointer& device,
                              const Array::Pointer& src0,
                              const Array::Pointer& src1,
                              Array::Pointer dst,
                              float factor0,
                              float factor1) -> Array::Pointer
{
  tier0::create_like(src0, dst, dType::FLOAT);
  const KernelInfo kernel = {"add_images_weighted", kernel::add_images_weighted};
  const ParameterList params = {{"src0", src0}, {"src1", src1}, {"dst", dst}, {"scalar0", factor0}, {"scalar1", factor1}};
  const RangeArray range = {dst->width(), dst->height(), dst->depth()};
  execute(device, kernel, params, range);
  return dst;
}
We maintain the same structure as in the previous examples with the function signature, parameters, and return value management.
The rest of the function code is dedicated to preparing the GPU code and running the execute function.
We rely on Just-In-Time (JIT) compilation.
This means that the kernels are compiled and run at runtime.
This is a very powerful feature as it allows writing GPU code in a flexible way, adapted to your data size and time requirements, but it requires a bit of preparation for execution.
It also adds compilation time to the process, which can be noticeable for the first execution of a kernel but is drastically reduced for subsequent calls thanks to a caching system.
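The effect of such a cache can be illustrated with a small sketch. This is not the library's caching system, whose internals are not described here: KernelCache and CompiledKernel are hypothetical, and the compile step is injected as a callback so that the example stays independent of OpenCL.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled program handle.
struct CompiledKernel
{
  std::string name;
};

// Compile each (name, source) pair once and reuse the result afterwards.
class KernelCache
{
public:
  using Compiler = std::function<CompiledKernel(const std::string & name, const std::string & source)>;

  explicit KernelCache(Compiler compiler)
    : compiler_(std::move(compiler))
  {}

  auto
  get(const std::string & name, const std::string & source) -> const CompiledKernel &
  {
    const std::string key = name + '\n' + source;
    auto              it = cache_.find(key);
    if (it == cache_.end())
    {
      ++compile_count_; // the slow path, taken only on the first request
      it = cache_.emplace(key, compiler_(name, source)).first;
    }
    return it->second;
  }

  auto
  compile_count() const -> int
  {
    return compile_count_;
  }

private:
  Compiler                                        compiler_;
  std::unordered_map<std::string, CompiledKernel> cache_;
  int                                             compile_count_ = 0;
};
```

Requesting the same kernel twice triggers a single compilation, which matches the observation that only the first execution pays the compilation cost.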
The first thing to ensure is that the kernel code we will call is available in the CLIJ kernel repository and respects the CLIJ convention. If this is the case, we will be able to include the kernel as a header file in the clic library. This header contains a stringified version of the kernel code, which is passed to the execute function as a KernelInfo structure holding the name of the kernel and the code of the kernel. By default, the KernelInfo should match the pattern { "kernel_name", kernel::kernel_name }.
// clic/src/tier1.cpp
#include "cle_add_images_weighted.h"

auto add_images_weighted_func(const Device::Pointer& device,
                              const Array::Pointer& src0,
                              const Array::Pointer& src1,
                              Array::Pointer dst,
                              float factor0,
                              float factor1) -> Array::Pointer
{
  tier0::create_like(src0, dst, dType::FLOAT);
  const KernelInfo kernel = {"add_images_weighted", kernel::add_images_weighted};
  const ParameterList params = {
    {"src0", src0}, {"src1", src1}, {"dst", dst}, {"scalar0", factor0}, {"scalar1", factor1}
  };
  const RangeArray range = {dst->width(), dst->height(), dst->depth()};
  execute(device, kernel, params, range);
  return dst;
}
The next step is to prepare the parameters for the kernel.
The parameters are passed as a ParameterList, a list of entries each defined by a tag and a value.
Here, the tag is the parameter name defined in the kernel code, and the value is an Array::Pointer or a native type.
The order of the parameters is important and should match the order of the parameters in the kernel code.
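The tag/value structure and the ordering constraint can be sketched with standard containers. The types below are assumptions made for illustration: ArrayHandle, ParamValue, ParamListSketch, and matches_kernel_order are hypothetical and do not reflect how ParameterList is declared in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for an Array::Pointer.
struct ArrayHandle
{
  int id;
};

// A value is either an array handle or a native scalar.
using ParamValue = std::variant<const ArrayHandle *, float, int>;
// A parameter list is an ordered sequence of (tag, value) pairs.
using ParamListSketch = std::vector<std::pair<std::string, ParamValue>>;

// Check that the tags appear in the same order as the kernel arguments.
inline auto
matches_kernel_order(const ParamListSketch & params, const std::vector<std::string> & kernel_args) -> bool
{
  if (params.size() != kernel_args.size())
  {
    return false;
  }
  for (std::size_t i = 0; i < params.size(); ++i)
  {
    if (params[i].first != kernel_args[i])
    {
      return false;
    }
  }
  return true;
}
```

A check of this kind, comparing the list against the argument names of the kernel signature, catches swapped entries such as src1 placed before src0.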
// clic/src/tier1.cpp
#include "cle_add_images_weighted.h"

auto add_images_weighted_func(const Device::Pointer& device,
                              const Array::Pointer& src0,
                              const Array::Pointer& src1,
                              Array::Pointer dst,
                              float factor0,
                              float factor1) -> Array::Pointer
{
  tier0::create_like(src0, dst, dType::FLOAT);
  const KernelInfo kernel = {"add_images_weighted", kernel::add_images_weighted};
  const ParameterList params = {
    {"src0", src0}, {"src1", src1}, {"dst", dst}, {"scalar0", factor0}, {"scalar1", factor1}
  };
  const RangeArray range = {dst->width(), dst->height(), dst->depth()};
  execute(device, kernel, params, range);
  return dst;
}
The last step is to prepare the range of the kernel execution. For that, we need to define a range of processing. Here, the range is the computational dimension of the kernel.
By default, it is the dimension of the output memory, but it can be changed and must be optimized for the computation.
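To see how the choice of range changes the amount of scheduled work, consider the following sketch. Shape, RangeSketch, and the helper functions are hypothetical. row_range models an imagined kernel in which each work-item processes a whole row along x, which is only one possible reason to deviate from the default.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: a 3D shape and a 3D execution range.
struct Shape
{
  std::size_t width;
  std::size_t height;
  std::size_t depth;
};

using RangeSketch = std::array<std::size_t, 3>;

// Default choice: one work-item per output pixel.
inline auto
full_range(const Shape & dst) -> RangeSketch
{
  return { dst.width, dst.height, dst.depth };
}

// Alternative for an imagined kernel that loops over x internally:
// one work-item per (y, z) row.
inline auto
row_range(const Shape & dst) -> RangeSketch
{
  return { 1, dst.height, dst.depth };
}

// Total number of work-items launched for a given range.
inline auto
work_item_count(const RangeSketch & range) -> std::size_t
{
  return range[0] * range[1] * range[2];
}
```

For a 10 x 5 x 3 output, the default range launches 150 work-items while the row-based range launches 15, each doing more work. Which is faster depends on the kernel and the device.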
Once the KernelInfo, ParameterList, and RangeArray are prepared, we can call the execute function.
This function will take care of the kernel compilation and execution.
The output of the computation should be stored as one of the parameters of the ParameterList.
In the majority of cases, the output will be the dst Array.
// clic/src/tier1.cpp
#include "cle_add_images_weighted.h"

auto add_images_weighted_func(const Device::Pointer& device,
                              const Array::Pointer& src0,
                              const Array::Pointer& src1,
                              Array::Pointer dst,
                              float factor0,
                              float factor1) -> Array::Pointer
{
  tier0::create_like(src0, dst, dType::FLOAT);
  const KernelInfo kernel = {"add_images_weighted", kernel::add_images_weighted};
  const ParameterList params = {
    {"src0", src0}, {"src1", src1}, {"dst", dst}, {"scalar0", factor0}, {"scalar1", factor1}
  };
  const RangeArray range = {dst->width(), dst->height(), dst->depth()};
  execute(device, kernel, params, range);
  return dst;
}
Note
The RangeArray has a strong impact on the performance of the kernel.
Add Function Tests
The final step is to add tests for the function.
The tests are located in the tests directory at the root of the repository.
They are organized in the same way as the library, in tiers.
The tests for the function should be added in the correct tier folder.
Tests are written in C++ and use the Google Test framework.
Their objective is to ensure that both the kernel and the functions work correctly in the library and that the output is as expected.
The test file should be located in the appropriate tier and named test_{function_name}.cpp.
It should include the gtest/gtest.h header and the cle.hpp header.
We recommend copying an existing test file and adapting it to the new function.
After adding a test, it may be necessary to reconfigure and rebuild the library for CMake to incorporate the new tests.
Tests can be executed using the ctest command.
Additionally, the CI/CD pipeline runs tests on each pull request.
Note
To run a specific test, use the ctest -C Debug -R {test_name} command.