Test For Randomness in R: How to Check Dataset Randomness
Assume that a and b are symbols denoting the two kinds of items or numbers that make up a sequence. The hypotheses to be tested are:
H0: The symbols occur in random order.
H1: The symbols occur in a set pattern.
Suppose a sample of size n contains n1 symbols of type a and n2 symbols of type b, so that n1 + n2 = n. A run is an unbroken stretch of identical symbols. Let r1 be the number of runs of a and r2 the number of runs of b, so that the total number of runs is r = r1 + r2.
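The counts defined above can be obtained in base R with rle(), which collapses a vector into its runs. This is only a small sketch: the sequence s is made up for illustration and is not taken from the dataset used later.

```r
# Illustrative sequence of two symbols (made up for this example)
s <- c("a", "a", "b", "a", "b", "b", "b", "a")

# rle() returns one entry per run: its value and its length
runs <- rle(s)

n1 <- sum(s == "a")             # number of a symbols
n2 <- sum(s == "b")             # number of b symbols
r1 <- sum(runs$values == "a")   # number of runs of a
r2 <- sum(runs$values == "b")   # number of runs of b
r  <- length(runs$values)       # total number of runs, r1 + r2

print(c(n1 = n1, n2 = n2, r1 = r1, r2 = r2, r = r))
```

Here the sequence splits into aa, b, a, bbb, a, giving n1 = 4, n2 = 4, r1 = 3, r2 = 2 and r = 5.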
To decide on H0, the observed value of r is compared with the lower and upper critical numbers of runs from tables.
If the observed number of runs lies between these critical values, H0 is not rejected; if it falls outside them, H0 is rejected.
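The tabulated critical values come from the exact distribution of r under H0. As a rough sketch, that distribution can be computed in base R from the standard combinatorial formula for the number of runs, which is not given in this post. The helper name runs_pmf is hypothetical, and the values n1 = n2 = 4 are chosen only for illustration.

```r
# Hypothetical helper: exact probability of observing exactly r runs under H0,
# given n1 symbols of one kind and n2 of the other.
runs_pmf <- function(r, n1, n2) {
  total <- choose(n1 + n2, n1)   # number of distinct arrangements
  k <- r %/% 2
  if (r %% 2 == 0) {
    # even r = 2k: k runs of each symbol
    2 * choose(n1 - 1, k - 1) * choose(n2 - 1, k - 1) / total
  } else {
    # odd r = 2k + 1: one symbol has k + 1 runs, the other k
    (choose(n1 - 1, k) * choose(n2 - 1, k - 1) +
       choose(n1 - 1, k - 1) * choose(n2 - 1, k)) / total
  }
}

n1 <- 4
n2 <- 4
r_values <- 2:(n1 + n2)
pmf <- sapply(r_values, runs_pmf, n1 = n1, n2 = n2)
lower_tail <- cumsum(pmf)        # P(R <= r)

print(data.frame(r = r_values,
                 prob = round(pmf, 4),
                 lower_tail = round(lower_tail, 4)))
```

Reading off the tails of this table is what a critical-value lookup does. With n1 = n2 = 4, even the smallest possible count, r = 2, has probability 2/70 (about 0.029). A sample this small therefore cannot reach a 2.5% lower tail.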
In short, the runs test lets us check whether the order of the values in a dataset is consistent with randomness. Let’s see how to run it in R.
Test For Randomness
Several R libraries provide a runs test; two of them are covered here.
Approach 1: snpar library
Let’s make use of the runs.test() function from the snpar library.
runs.test(x, exact = FALSE, alternative = c("two.sided", "less", "greater"))
Load the library
library(snpar)
Create a dataset for testing
data <- c(10, 6, 18, 5, 10, 12, 12, 18, 15, 18)
Execute the runs test in R
runs.test(data)
Approximate runs rest
data: data
Runs = 4, p-value = 0.2061
alternative hypothesis: two.sided
The p-value of the runs test is 0.2061. Since the p-value is greater than 0.05, we cannot reject the null hypothesis. In other words, there is not sufficient evidence to conclude that the observed data were generated in a non-random manner.
Approach 2: randtests library
The randtests library also provides a runs.test() function, and its syntax is almost the same as in approach 1.
Let’s load the library first,
library(randtests)
Let’s make use of the same dataset.
randtests::runs.test(data)
Runs Test
data: data
statistic = -0.76376, runs = 4, n1 = 4, n2 = 4, n = 8, p-value = 0.445
alternative hypothesis: nonrandomness
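The p-value differs from approach 1 (0.445 versus 0.2061). The output reports n = 8 rather than 10, which means the two observations equal to the median (12) were dropped before the runs were counted. Even so, it points to the same inference.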
Since the p-value of the test is 0.445, which is greater than 0.05, we again fail to reject the null hypothesis: there is not sufficient evidence to say that the data were generated in a non-random manner.
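To see where the two p-values may come from, the calculation can be sketched in base R. The sketch assumes the large-sample normal approximation without a continuity correction, and a split of the data at the median. The helper runs_approx is hypothetical and is not part of snpar or randtests. Counting values equal to the median as "above" matches the approach 1 figure, while dropping them matches approach 2. This is an observation about the numbers, not a statement of how either library is implemented.

```r
# Hypothetical helper (not from snpar or randtests): two-sided runs test
# via the normal approximation, without continuity correction.
# `above` is a logical vector marking which observations fall in the upper group.
runs_approx <- function(above) {
  n1 <- sum(above)
  n2 <- sum(!above)
  n  <- n1 + n2
  r  <- length(rle(above)$lengths)                 # observed number of runs
  mu    <- 2 * n1 * n2 / n + 1                     # expected runs under H0
  sigma <- sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1)))
  z <- (r - mu) / sigma
  c(runs = r, n1 = n1, n2 = n2, z = z, p.value = 2 * pnorm(-abs(z)))
}

data <- c(10, 6, 18, 5, 10, 12, 12, 18, 15, 18)
m <- median(data)

# Assumption 1: values equal to the median count as "above"
keep_ties <- runs_approx(data >= m)

# Assumption 2: values equal to the median are removed first
drop_ties <- runs_approx(data[data != m] > m)

print(round(keep_ties, 4))
print(round(drop_ties, 4))
```

Under the first assumption the p-value is about 0.206, and under the second it is about 0.445 with z close to -0.764. Both splits give 4 runs, so the gap between the two approaches comes from the group sizes (6 and 4 versus 4 and 4) rather than from the run count.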