How to vectorize code in C++?

I would like to know how to vectorize code in C++, because the material I found on the internet is rather superficial about it.

By vectorizing I mean not just using vectors, but performing a whole sequence of steps in a single step. That is, computing d = (c+e)/2; at once instead of repeating the step for each position of the matrix: d[i][j] = (c[i][j]+e[i][j])/2;

For example, how would I vectorize the following program?

#include <iostream>

using namespace std;
int main(){

    int d[4][4],c[4][4],e[4][4];

    for(int i=0;i<4;i++){
        for(int j=0;j<4;j++){
            c[i][j] =i+j;
            e[i][j] = 4*i;
        }
    }
    for(int i=0;i<4;i++){
        for(int j=0;j<4;j++){
            d[i][j] = (c[i][j]+e[i][j])/2;
            if(d[i][j]<3){
                d[i][j]=3;
            } 
        }  
    }
    for(int i=0;i<4;i++){
        for(int j=0;j<4;j++){
           cout << d[i][j] << " ";
        }
        cout << endl;
    }


    return 0;
}

When I compile with -O2 -ftree-vectorize -fopt-info-vec-optimized to see how many loops are being vectorized, the compiler reports "vectorized loop" only once, i.e. only one loop has been vectorized. If I use -fopt-info-vec-all instead of -fopt-info-vec-optimized, it reports that many parts of the program have not been vectorized.

Author: Lacobus, 2017-09-21

1 answer

The problem is that the if statement inside the second loop prevents the compiler from vectorizing that loop:

for(int i=0;i<4;i++){
    for(int j=0;j<4;j++){
        d[i][j] = (c[i][j]+e[i][j])/2;
        if(d[i][j]<3){
            d[i][j]=3;
        }
    }
}

One solution is to replace the if statement with the ternary conditional operator, for example:

for(int i=0;i<4;i++){
    for(int j=0;j<4;j++){
        d[i][j] = (c[i][j]+e[i][j])/2;
        d[i][j] = ( d[i][j] < 3 ) ? 3 : d[i][j];
    }
}
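
The same idea can be packaged as a function so that it is easy to test in isolation. The following is only a sketch: the names Matrix, fill_inputs and average_with_floor are hypothetical (they are not part of the original program), and std::max is used here in place of the ternary operator as an equivalent way of writing the selection. Whether a particular GCC version vectorizes it should still be confirmed with -fopt-info-vec-optimized.

```cpp
#include <algorithm>  // std::max
#include <array>

constexpr int N = 4;
using Matrix = std::array<std::array<int, N>, N>;

// Hypothetical helper: fills c and e exactly as the question does,
// c[i][j] = i + j and e[i][j] = 4 * i.
void fill_inputs(Matrix& c, Matrix& e) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            c[i][j] = i + j;
            e[i][j] = 4 * i;
        }
    }
}

// Hypothetical helper: element-wise average of c and e, with every
// result below floor_value replaced by floor_value.
Matrix average_with_floor(const Matrix& c, const Matrix& e, int floor_value) {
    Matrix d{};
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            const int avg = (c[i][j] + e[i][j]) / 2;
            // std::max selects between two values and stores once,
            // instead of storing conditionally inside an if.
            d[i][j] = std::max(avg, floor_value);
        }
    }
    return d;
}
```

Calling fill_inputs(c, e) followed by average_with_floor(c, e, 3) yields the same matrix d as the loops in the question.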

Build test with GCC:

$ g++ -v -O2 -ftree-vectorize -fopt-info-vec-optimized vect.cpp -o vect

Output:

[...]

Analyzing loop at vect.cpp:21

Analyzing loop at vect.cpp:14

vect.cpp:14: note: vect_recog_divmod_pattern: detected: 
vect.cpp:14: note: pattern recognized: patt_3 = patt_4 >> 1;

Analyzing loop at vect.cpp:15

vect.cpp:15: note: vect_recog_divmod_pattern: detected: 
vect.cpp:15: note: pattern recognized: patt_77 = patt_1 >> 1;


Vectorizing loop at vect.cpp:15

vect.cpp:15: note: LOOP VECTORIZED.
Analyzing loop at vect.cpp:8

Analyzing loop at vect.cpp:9


Vectorizing loop at vect.cpp:9

vect.cpp:9: note: LOOP VECTORIZED.
vect.cpp:4: note: vectorized 2 loops in function.

[...]

References:

https://locklessinc.com/articles/vectorize/

https://gcc.gnu.org/projects/tree-ssa/vectorization.html

EDIT:

This answer applies only to version 4.8 of GCC.

Version 7.0 is already capable of vectorizing such loops, without the need to replace the if statements with ternary operators, through the optimization option -fsplit-loops.
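
To check this on your own compiler, one option is to isolate the original if version of the loop in a function of its own, so that the vectorization report refers to that loop alone. This is only a sketch: the function name average_with_if is hypothetical, and nothing here guarantees what a given GCC version will report.

```cpp
// Hypothetical helper: the second loop from the question, with the
// plain if statement left untouched, isolated for inspection.
void average_with_if(const int (&c)[4][4], const int (&e)[4][4],
                     int (&d)[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            d[i][j] = (c[i][j] + e[i][j]) / 2;
            if (d[i][j] < 3) {
                d[i][j] = 3;
            }
        }
    }
}
```

Compiling this file with g++ -c -O2 -ftree-vectorize -fopt-info-vec-optimized, once with and once without -fsplit-loops, shows whether the loop is reported as vectorized by the GCC version in use.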

Reference: https://clearlinux.org/blogs/gcc-7-importance-cutting-edge-compiler

Author: Lacobus, 2017-09-22 14:11:39