Create groups based on the lowest and highest values in R

To create groups based on the lowest and highest values in R, that is, to divide an input vector into n buckets, use the ntile() function from the dplyr package.

The basic syntax used by this function is as follows.

ntile(x, n)

where:

x: Input vector

n: Number of buckets

Note: If the length of x is not a multiple of n, the bucket sizes differ by at most one element.
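
To make the note about bucket sizes concrete, here is a minimal sketch that counts how many elements land in each bucket. The input 1:11 and the names values, buckets, and bucket_sizes are illustrative assumptions, not part of the original examples.

```r
library(dplyr)

# Hypothetical input: 11 values split into 5 buckets (11 is not a multiple of 5)
values <- 1:11
buckets <- ntile(values, 5)

# Count how many elements landed in each bucket
bucket_sizes <- table(buckets)
bucket_sizes

# The largest and smallest bucket should differ by at most one element
max(bucket_sizes) - min(bucket_sizes)
```

Because 11 cannot be split evenly into 5 groups, at least one bucket has to hold an extra element, but no bucket should be more than one element larger than any other.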


The following examples show how to use this function in practice.

Example 1: Use ntile() with a Vector

The ntile() function can be used to divide a vector of 11 elements into 5 groups using the following code.

library(dplyr)

Let’s create a vector

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
x
[1] 10 13 14 26 27 18 11 12 15 20 13

Now divide the vector into five buckets.

ntile(x, 5)
[1] 1 2 3 5 5 4 1 1 3 4 2

We can see from the result that each element of the original vector has been assigned to one of five buckets.

Bucket 1 contains the smallest values, while bucket 5 contains the largest values.

For instance:

The three lowest values, 10, 11, and 12, are assigned to bucket 1.

The two highest values, 26 and 27, are assigned to bucket 5.
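
One way to check these assignments is to group the original values by their bucket number. The sketch below uses base R's split() for this; the name values_by_bucket is an illustrative assumption.

```r
library(dplyr)

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)

# Group the original values by the bucket number that ntile() assigned
values_by_bucket <- split(x, ntile(x, 5))
values_by_bucket
```

In the resulting list, the element named 1 should contain 10, 11, and 12, and the element named 5 should contain 26 and 27, which matches the assignments described above.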

Example 2: Use ntile() with a Data Frame

Consider the following R data frame, which displays the points scored by different basketball players:

Let’s create a data frame

df <- data.frame(player=LETTERS[1:9],
                 points=c(102, 109, 57, 122, 824, 528, 125, 159, 195))

Now we can view the data frame

df
  player points
1      A    102
2      B    109
3      C     57
4      D    122
5      E    824
6      F    528
7      G    125
8      H    159
9      I    195

The following code demonstrates how to add a new column to the data frame using the ntile() function that places each player into one of three buckets based on their total number of points.

Add a new column that groups players according to their point totals.

df$bucket <- ntile(df$points, 3)

Let’s view the updated data frame

df
  player points bucket
1      A    102      1
2      B    109      1
3      C     57      1
4      D    122      2
5      E    824      3
6      F    528      3
7      G    125      2
8      H    159      2
9      I    195      3

Each player is given a value between 1 and 3 in the new bucket column.

Players who have the fewest points are assigned a value of 1, while those who have the most points are assigned a value of 3.
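
The same bucketing can also be written as a dplyr pipeline. The sketch below is one possible way to do it, not the only one: it rebuilds the data frame from this example, adds the bucket column with mutate(), and then summarises each bucket. The names df_bucketed, bucket_summary, n_players, min_points, and max_points are illustrative assumptions.

```r
library(dplyr)

# Same data as in the example above
df <- data.frame(player = LETTERS[1:9],
                 points = c(102, 109, 57, 122, 824, 528, 125, 159, 195))

# Add the bucket column inside a pipeline instead of using df$bucket <- ...
df_bucketed <- df %>%
  mutate(bucket = ntile(points, 3))

# Summarise each bucket: how many players it holds and its point range
# (n_players, min_points, and max_points are illustrative names)
bucket_summary <- df_bucketed %>%
  group_by(bucket) %>%
  summarise(n_players = n(),
            min_points = min(points),
            max_points = max(points))

bucket_summary
```

With this data, each bucket should hold three players, with point ranges of 57 to 109, 122 to 159, and 195 to 824.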
