Testing the hypothesis that RNG output is uniformly distributed: the meaning of the experimental chi-square value

I have an RNG based on the linear congruential method. I need to test the hypothesis that the sample values produced by the generator are uniformly distributed (equally probable). I wrote this function:

def xi_check(num_array):
    intervals = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for i in range(len(num_array)):
        intervals[num_array[i]//10] += 1
    xi_2_theor = [0, 3.33, 5.9, 8.34, 11.4, 16.9, 100] # chi-square quantiles
    xi_2_inter = [1, 0.95, 0.75, 0.50, 0.25, 0.05, 0]  # significance level
    xi_2_exp = 0
    for i in range(len(intervals)):
        xi_2_exp = xi_2_exp + ((float(intervals[i]) / float(len(num_array))
        - 0.1) ** 2) / 0.1
    for i in range(6):
        if xi_2_theor[i] <= xi_2_exp <= xi_2_theor[i+1]:
            return f"{xi_2_inter[i]} - {xi_2_inter[i+1]}", xi_2_exp
    # no interval matched, so report an error only after checking all of them
    return "Error", xi_2_exp

If I get an experimental chi-square value between 11.4 and 16.9 in the output, what does that mean? Can I conclude that the RNG produces equally probable values?

In other words, I can't figure out how to interpret the significance level. How do I get the confidence level from it? Is it 1 - (significance level)?

Author: MaxU, 2019-03-19

1 answer

The test you are planning is generally called "testing a sample for agreement with a theoretical distribution law" (a goodness-of-fit test). It can be done with many different methods: Kolmogorov-Smirnov, Cramér-von Mises, Gini and others, and also with Pearson's chi-square goodness-of-fit test. The idea is that you compute a certain statistic and compare it with the value that statistic would have if your sample followed the theoretical law. For the chi-square test, the statistic is chi2_obs = SUM over the intervals of (N_i - E_i)**2 / E_i, where N_i is the observed count in interval i and E_i is the expected count (note that this is somewhat different from the formula you use in your program).
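As a rough illustration of the statistic above, here is a minimal sketch in Python. The function names `chi_square_statistic` and `question_statistic`, the ten equal-width intervals over 0..99, and the sample data are assumptions made for the example; they are not taken from the question's generator. The sketch also shows how the sum from the question relates to the statistic: it works with relative frequencies instead of counts, so it comes out N times smaller.

```python
def chi_square_statistic(values, bins=10, upper=100):
    """Pearson statistic SUM((N_i - E_i)**2 / E_i) for integers in [0, upper).

    Assumes upper is divisible by bins, so all intervals have equal width.
    """
    width = upper // bins
    counts = [0] * bins
    for v in values:
        counts[v // width] += 1
    expected = len(values) / bins  # E_i is identical for every interval under uniformity
    stat = sum((n - expected) ** 2 / expected for n in counts)
    return stat, counts


def question_statistic(values, bins=10, upper=100):
    """The sum used in the question: relative frequencies instead of counts."""
    _, counts = chi_square_statistic(values, bins, upper)
    n = len(values)
    return sum((c / n - 1 / bins) ** 2 / (1 / bins) for c in counts)


# Hypothetical sample: 100 values whose interval counts are listed below
bin_counts = [8, 12, 9, 11, 10, 10, 7, 13, 10, 10]
sample = [10 * b for b, c in enumerate(bin_counts) for _ in range(c)]

stat, counts = chi_square_statistic(sample)
scaled = question_statistic(sample) * len(sample)  # multiply by N to recover the statistic
print(stat, scaled)
```

In other words, the value returned by the question's function would have to be multiplied by the sample size before it can be compared with tabulated chi-square quantiles.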

Then chi2_obs is compared with the critical value chi2_crit taken from a table for the chosen significance level alpha and the number of degrees of freedom df. If the computed value is less than chi2_crit(alpha, df), then at significance level alpha (that is, confidence level 1 - alpha) there is no reason to reject the hypothesis that your sample follows the theoretical (in this case, uniform) distribution.
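The comparison described above might look like this in code. This is only a sketch: `CRITICAL_DF9` reuses the table from the question (these values match chi-square quantiles for 9 degrees of freedom, that is, 10 intervals minus 1), and the function name `uniformity_decision` is made up for the example.

```python
# Upper-tail critical values for df = 9, copied from the table in the question:
# significance level alpha -> chi-square value exceeded with probability alpha.
CRITICAL_DF9 = {0.95: 3.33, 0.75: 5.9, 0.50: 8.34, 0.25: 11.4, 0.05: 16.9}


def uniformity_decision(chi2_obs, alpha=0.05, table=CRITICAL_DF9):
    """Return True if the uniformity hypothesis is NOT rejected at level alpha."""
    critical = table[alpha]
    return chi2_obs < critical


print(uniformity_decision(12.0))              # below 16.9: not rejected at alpha = 0.05
print(uniformity_decision(12.0, alpha=0.25))  # above 11.4: rejected at alpha = 0.25
print(uniformity_decision(20.0))              # above 16.9: rejected at alpha = 0.05
```

So a statistic between 11.4 and 16.9 means uniformity is not rejected at the 5% level but would be rejected at the 25% level. Not rejecting the hypothesis is not a proof that the generator is uniform; it only means this test found no evidence against it.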

Note 1. Having a distribution table is no longer necessary: almost all tools, from Python to Excel, have functions that compute these values themselves. For the same reason, there is no need to overcomplicate things with your own Monte Carlo simulation.
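For instance, in Python the SciPy package can supply both the critical value and the whole test. This is a sketch that assumes SciPy (a third-party dependency) is installed; the interval counts are made-up example data.

```python
from scipy import stats

observed = [8, 12, 9, 11, 10, 10, 7, 13, 10, 10]  # hypothetical interval counts, N = 100
df = len(observed) - 1                             # 10 intervals -> 9 degrees of freedom
alpha = 0.05

# Critical value that replaces the table lookup
critical = stats.chi2.ppf(1 - alpha, df)

# Pearson chi-square test; expected frequencies default to equal counts per interval
result = stats.chisquare(observed)

print(f"critical value = {critical:.3f}")
print(f"statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
print("reject uniformity" if result.pvalue < alpha else "do not reject uniformity")
```

Comparing the p-value with alpha is equivalent to comparing the statistic with the critical value, so either form of the decision can be used.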

Note 2. There are also special tests aimed specifically at agreement with the uniform distribution, for example Sherman's, Neyman-Barton, etc.

Something like that.

 2
Author: passant, 2019-03-19 15:24:24