How to decrease runtime using OpenMP

I'd like to know what I can do to decrease runtime using OpenMP threads. I wrote a program that sums the values of a vector of size 2²⁷, but the measured times are practically the same as I increase the number of threads. Shouldn't the runtime drop with every thread added?

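#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void){
    /* 2^27 elements; an integer shift avoids the double returned by pow() */
    long int N = 1L << 27;
    long int *vet = malloc(sizeof(long int)*N);
    long int i;
    /* omp_get_num_threads() always returns 1 outside a parallel region,
       so query the maximum team size instead */
    int nThreads = omp_get_max_threads();
    long int soma = 0;

    if(vet == NULL){
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    /* Serial initialization: vet[i] = i */
    for(i=0; i<N; i++) vet[i] = i;

    /* Parallel sum: each thread accumulates a private copy of soma,
       and the copies are added together at the end of the loop */
#pragma omp parallel for reduction(+:soma)
    for(i=0; i<N; i++){
        soma += vet[i];
    }

    (void)nThreads; /* kept from the original; not used further */
    printf("Resultado: %ld\n", soma);
    free(vet);
    return 0;
}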


Author: Renato Sousa, 2020-05-02