Test For Randomness in R-How to check Dataset Randomness

Test For Randomness in R, How to check dataset randomness?

Assume that a and b are symbols indicating the kind of items or numbers that make up a sequence and the test hypothesis is that

H0:-The symbols occur in random order

H1:- The symbols occur in a set pattern.

Suppose the sample size is n contains n1 symbols of a and n2 symbols of b, that is n1+n2=n. Same way r1 is the number of runs in a and r2 be the number of runs in b and the total of r1+r2=r.

To decide H0, the value of r is compared with the critical number of runs from tables.

tidyverse in r – Complete Tutorial » Unknown Techniques »

If the observed number of runs in a sample lies in between these critical values, H0 is not rejected, and if outside these critical values, H0 is rejected.

So basically run test allows us to determine the randomness of the dataset. Let’s see how to execute the same in R.

Test For Randomness

Different libraries are available

Approach 1: snpar library

Let’s make use of runs.test() function from the snpar library.

Naive Bayes Classification in R » Prediction Model »

Syntax:-

runs.test(x, exact = FALSE, alternative = c(“two.sided”, “less”, “greater”))

Load the library

library(snpar)

create a dataset for testing

data <- c(10, 6, 18, 5, 10, 12, 12, 18, 15, 18)

Execute run test in R

runs.test(data)
          Approximate runs rest
data:  data
Runs = 4, p-value = 0.2061
alternative hypothesis: two.sided

The p-value of the run test is 0.2061. Since the p-value is greater than 0.05we cannot reject the null hypothesis. It indicates that sufficient evidence observed data was formed in a random manner.

Deep Neural Network in R » Keras & Tensor Flow

Approach 2: randtests library

runs.test() function from the randtests library, function, and syntax almost similar to approach 1.

Let’s load the library first,

library(randtests)

Let’s make use of the same dataset.

randtests ::runs.test(data)
          Runs Test
data:  data
statistic = -0.76376, runs = 4, n1 = 4, n2 = 4, n = 8, p-value = 0.445
alternative hypothesis: nonrandomness

The p-value is slightly different from approach 1, however, it’s pointing to the same inference.

Since the p-value of the test is 0.445 that is greater than 0.05, indicating that sufficient evidence to say that the data was formed in a random manner.

LSTM Network in R » Recurrent Neural network »

You may also like...

1 Response

  1. Excellent post however , I was wondering if you could write a litte more on this subject?
    I’d be very grateful if you could elaborate a little bit further.
    Thank you!

Leave a Reply

Your email address will not be published. Required fields are marked *

3 + 1 =