Create Groups Based on the Lowest and Highest Values in R
To divide an input vector into n buckets, ranked from the lowest values to the highest, use the ntile() function from the dplyr package in R.
The basic syntax used by this function is as follows.
ntile(x, n)
where:
x: Input vector
n: Number of buckets
Note: If the length of x is not an exact multiple of n, the bucket sizes differ by at most one.
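To make the note concrete, the sketch below works out the expected bucket sizes by hand for 11 values and 5 buckets. It is plain base R arithmetic, not dplyr's own code, and the helper name bucket_sizes is hypothetical. It assumes any leftover elements go to the lowest-numbered buckets, which agrees with the output of Example 1.

```r
# Hypothetical helper: expected bucket sizes when `len` values are split into `n` buckets.
# Assumption: leftover elements are handed to the lowest-numbered buckets.
bucket_sizes <- function(len, n) {
  base_size <- len %/% n   # every bucket gets at least this many values
  leftover  <- len %% n    # this many buckets receive one extra value
  base_size + as.integer(seq_len(n) <= leftover)
}

sizes <- bucket_sizes(11, 5)
sizes
```

Here 11 values do not divide evenly by 5, so one bucket holds three values and the other four hold two each.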
The practical application of this function is demonstrated in the examples that follow.
Example 1: Use ntile() with a Vector
The ntile() function can be used to divide a vector of 11 elements into 5 groups using the following code.
library(dplyr)
Let’s create a vector
x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
x
[1] 10 13 14 26 27 18 11 12 15 20 13
Now divide the vector into five buckets.
ntile(x, 5)
[1] 1 2 3 5 5 4 1 1 3 4 2
We can see from the result that each element of the original vector has been assigned to one of five buckets.
Bucket 1 contains the smallest values, while bucket 5 contains the largest values.
For instance:
The three lowest values, 10, 11, and 12, are assigned to bucket 1.
The two highest values, 26 and 27, are assigned to bucket 5.
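One way to check this assignment is to group the original values by their bucket number. The sketch below uses base R's split() and table() together with ntile(); the variable names groups and bucket_counts are only illustrative.

```r
library(dplyr)

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)

# Group the original values by the bucket number ntile() assigned to each one
groups <- split(x, ntile(x, 5))
groups

# Count how many values landed in each bucket
bucket_counts <- table(ntile(x, 5))
bucket_counts
```

With this vector, bucket 1 lists 10, 11, and 12 and bucket 5 lists 26 and 27, in agreement with the result shown earlier.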
Example 2: Use ntile() with a Data Frame
Consider the following R data frame, which displays the points scored by different basketball players:
Let’s create a data frame
df <- data.frame(player=LETTERS[1:9], points=c(102, 109, 57, 122, 824, 528, 125, 159, 195))
Now we can view the data frame
df
  player points
1      A    102
2      B    109
3      C     57
4      D    122
5      E    824
6      F    528
7      G    125
8      H    159
9      I    195
The following code demonstrates how to add a new column to the data frame using the ntile() function that places each player into one of three buckets based on their total number of points.
Add a new column that assigns each player to a bucket according to their point total.
df$bucket <- ntile(df$points, 3)
Let’s view the updated data frame
df
  player points bucket
1      A    102      1
2      B    109      1
3      C     57      1
4      D    122      2
5      E    824      3
6      F    528      3
7      G    125      2
8      H    159      2
9      I    195      3
Each player is given a value between 1 and 3 in the new bucket column.
Players who have the fewest points are assigned a value of 1, while those who have the most points are assigned a value of 3.
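The same column can also be added inside a dplyr pipeline with mutate(), which makes it easy to summarise each bucket afterwards. The following is a minimal sketch of that approach; the name bucket_summary and the summary columns are chosen here for illustration only.

```r
library(dplyr)

df <- data.frame(player = LETTERS[1:9],
                 points = c(102, 109, 57, 122, 824, 528, 125, 159, 195))

# Assign buckets with mutate(), then report the size and point range of each bucket
bucket_summary <- df %>%
  mutate(bucket = ntile(points, 3)) %>%
  group_by(bucket) %>%
  summarise(players = n(),
            min_points = min(points),
            max_points = max(points))

bucket_summary
```

Each of the three buckets contains three players, and the point ranges do not overlap: 57 to 109, 122 to 159, and 195 to 824.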