Arm Neon is an advanced single instruction, multiple data (SIMD) architecture extension for the Arm Cortex-A and Arm Cortex-R series of processors. It accelerates workloads that are common on mobile devices, such as multimedia encoding and decoding, user interfaces, 2D/3D graphics, and gaming.
It essentially works on 128-bit Neon SIMD registers, so a single instruction can process several values at once: four 32-bit floating-point numbers (variable type float32x4_t), eight 16-bit integers (short, variable type int16x8_t), or sixteen 8-bit integers (variable type int8x16_t).
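The lane counts above follow directly from dividing 128 bits by the element width. The sketch below checks that arithmetic at compile time and adds two groups of four floats with one Neon instruction. The helper names lanes_per_q_register and add4 are hypothetical, introduced only for illustration, and the scalar fallback is an assumption added so that the snippet also compiles on machines without Neon.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: how many elements of type T fit in one 128-bit register.
template <typename T>
constexpr std::size_t lanes_per_q_register() {
    return 16 / sizeof(T);  // 128 bits = 16 bytes
}

static_assert(lanes_per_q_register<float>() == 4, "float32x4_t has 4 lanes");
static_assert(lanes_per_q_register<std::int16_t>() == 8, "int16x8_t has 8 lanes");
static_assert(lanes_per_q_register<std::int8_t>() == 16, "int8x16_t has 16 lanes");

// Hypothetical helper: out[k] = a[k] + b[k] for k = 0..3.
void add4(const float* a, const float* b, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__ARM_NEON)
    float32x4_t va = vld1q_f32(a);      // load 4 floats from a
    float32x4_t vb = vld1q_f32(b);      // load 4 floats from b
    vst1q_f32(out, vaddq_f32(va, vb));  // add all 4 lanes, store the result
#else
    // Assumed scalar fallback for targets without Neon.
    for (int k = 0; k < 4; ++k) {
        out[k] = a[k] + b[k];
    }
#endif
}
```

Calling add4 on {1, 2, 3, 4} and {10, 20, 30, 40} fills out with the four lane-wise sums, whichever branch is compiled.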
Prerequisites
- CMake 3.13+
- VSCode 1.62+
- NDK r23b+
- ADB
Please refer to this article for VSCode and NDK configuration.
Include header files
First, include the Neon header file in your source file.
#include <arm_neon.h>
Add Neon code
The following is a simple example that fills a float array with a constant value using Neon intrinsics, storing four elements per iteration.
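void arm_neon_test()
{
    const int sz = 1000000;
    const float val = 1.0f;
    float *arr = new float[sz];

    // Broadcast val into all four lanes of a 128-bit register.
    float32x4_t _v = vdupq_n_f32(val);

    int i = 0;
    // Vector loop: store 4 floats per iteration. The bound i + 4 <= sz
    // keeps the last store inside the array when sz is not a multiple of 4.
    for (; i + 4 <= sz; i += 4)
    {
        vst1q_f32(arr + i, _v);
    }
    // Scalar tail: handle the remaining 0 to 3 elements.
    for (; i < sz; i++)
    {
        arr[i] = val;
    }

    delete[] arr;
}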
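The same pattern can be wrapped in a reusable function that works for any length, including lengths that are not a multiple of four. This is only a sketch: fill_f32 is a hypothetical name, not part of the Neon API, and the __ARM_NEON guard with a plain scalar path is an assumption added so that the code also builds on non-Arm hosts.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: set dst[0..n-1] to val.
void fill_f32(float* dst, std::size_t n, float val) {
    assert(dst != nullptr || n == 0);
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Broadcast val into all four lanes, then store 4 floats at a time.
    const float32x4_t v = vdupq_n_f32(val);
    for (; i + 4 <= n; i += 4) {
        vst1q_f32(dst + i, v);
    }
#endif
    // Scalar tail (or the whole range when Neon is unavailable).
    for (; i < n; ++i) {
        dst[i] = val;
    }
}
```

Writing the loop bound as i + 4 <= n rather than i < n means the vector store never touches memory past dst[n - 1]; the scalar loop then covers whatever is left.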
Build & Run
The NDK supports Neon natively, so there is no need to link any additional libraries. Compile and run directly, referring to the previous script.