Sort Data in R With Examples
R is a powerful tool for data analysis and manipulation. Sorting data in R is a common task that involves arranging data in ascending or descending order based on one or more columns.
This can be especially useful in exploratory data analysis, as it allows you to quickly identify patterns and outliers in your data.
In this article, we will discuss various ways to sort data in R, including sorting by a single column, sorting by multiple columns, sorting by a specific order, and sorting based on a custom function.
We will demonstrate these methods using two built-in R datasets, mtcars and iris.
Sorting Data in R by a Single Column
Sorting data by a single column is the most basic type of sorting in R. To sort a dataset by a single column, we can use the order() function. This function does not return the sorted values themselves; it returns a vector of row indices that put the column in ascending order.
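The difference between the indices returned by order() and the sorted values is easiest to see on a small vector. The following is a minimal sketch; the vector x and the name idx are made up for illustration.

```r
# A small illustrative vector (not from any dataset)
x <- c(30, 10, 20)

# order() gives the positions of the elements in ascending order
idx <- order(x)
print(idx)      # 2 3 1

# Indexing with those positions yields the sorted values
print(x[idx])   # 10 20 30

# sort() returns the sorted values directly
print(sort(x))  # 10 20 30
```

Because order() returns positions rather than values, the same index vector can be used to rearrange every column of a data frame at once, which is what the data frame examples rely on.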
Let’s illustrate this with the mtcars dataset, which contains information about various car models:
data(mtcars)
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The data() function is used to load the dataset into the R environment. The head() function displays the first 6 rows of the dataset.
To sort the mtcars dataset by the mpg column in ascending order, we can use the following code:
sorted_mpg <- mtcars[order(mtcars$mpg), ]
head(sorted_mpg)
mpg cyl disp hp drat wt qsec vs am gear carb
Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
The order() function takes the column we want to sort by as its first argument and returns the row indices in sorted order. We then use this vector of row indices inside the square brackets to select and rearrange the rows of the mtcars data frame.
The output shows the first 6 rows of the sorted mtcars dataset, with the lowest mpg values at the top.
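To reverse the direction of a single-column sort, one option is the decreasing argument of order(). The sketch below assumes the same mtcars data; the name sorted_mpg_desc is chosen here for illustration.

```r
data(mtcars)

# decreasing = TRUE puts the highest mpg values first
sorted_mpg_desc <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]

# Show only a few columns to keep the output short
head(sorted_mpg_desc[, c("mpg", "cyl", "hp")])
```

The first row of this result is Toyota Corolla, which has the highest mpg in the dataset (33.9).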
Sorting Data in R by Multiple Columns
Sorting data by multiple columns can be useful if we want to arrange data according to more than one criterion.
To sort a dataset by multiple columns, we pass several columns to the order() function. Rows are sorted by the first column, and each additional column is used only to break ties in the columns before it.
For example, let’s sort the mtcars dataset by decreasing mpg values and then by decreasing hp values:
sorted_mpg_hp <- mtcars[order(-mtcars$mpg, -mtcars$hp), ]
head(sorted_mpg_hp)
mpg cyl disp hp drat wt qsec vs am gear carb
Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
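Here, we put a - sign before each column to sort in descending order; this works because both columns are numeric. The mtcars dataset is now sorted first by mpg and then by hp, so the highest mpg values come first and, among cars with the same mpg, those with higher hp come first. For example, Lotus Europa and Honda Civic both have an mpg of 30.4, and Lotus Europa (hp 113) is listed before Honda Civic (hp 52).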
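Sort directions can also differ between columns. The sketch below sorts by ascending cyl and then descending mpg in two ways: with the - sign, and with a per-column decreasing vector, which order() accepts when method = "radix" is set (this assumes a reasonably recent version of R). The variable names are chosen here for illustration.

```r
data(mtcars)

# Ascending cyl, then descending mpg within each cyl group (numeric columns only)
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# The same sort keys expressed with one direction per column; this form
# also works for character columns, where the unary minus is not defined
idx <- order(mtcars$cyl, mtcars$mpg,
             decreasing = c(FALSE, TRUE), method = "radix")
by_cyl_mpg_radix <- mtcars[idx, ]

head(by_cyl_mpg[, c("mpg", "cyl", "hp")])
```

The radix form is the safer choice when one of the sort columns holds text, since negating a character vector raises an error.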
Sorting Data in R by a Specific Order
Sometimes, we may want to sort data in a specific order that is not alphabetical or numerical. In such cases, we can use a factor variable to specify the desired order. When we sort a data frame by a factor variable, R sorts the data according to the order of levels in the factor.
To illustrate this, let’s use the iris dataset, which contains measurements of various iris flower species:
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
To sort the iris dataset by the Species column in the order setosa, versicolor, virginica, we set the levels of the Species column explicitly with factor(). In iris, Species is already a factor whose levels are in this order, so the call simply makes the ordering explicit:
iris$Species <- factor(iris$Species, levels=c("setosa", "versicolor", "virginica"))
sorted_species <- iris[order(iris$Species), ]
head(sorted_species)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
The factor() call sets the levels of the Species column in the desired order.
Because order() sorts a factor by its level order rather than alphabetically, sorting the iris data frame by the Species column arranges the rows in the desired order.
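The effect of the level order is easier to see with an ordering that is not alphabetical. The following sketch puts virginica first; the level order and the names iris_custom and sorted_custom are chosen here purely for illustration.

```r
data(iris)

# Work on a copy so the built-in dataset is left unchanged
iris_custom <- iris

# Define a non-alphabetical level order for Species
iris_custom$Species <- factor(iris_custom$Species,
                              levels = c("virginica", "setosa", "versicolor"))

# order() follows the level order, so virginica rows come first
sorted_custom <- iris_custom[order(iris_custom$Species), ]

# Species in the order they appear after sorting
print(unique(as.character(sorted_custom$Species)))
```

Only the level order changes here; the values stored in the Species column stay the same.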
Sorting Data in R Based on a Custom Function
In some cases, we may want to sort data based on a custom function that does not rely on a standard ordering criterion such as alphabetical or numerical order.
For example, we may want to sort a dataset of people based on their age, but we may also want to prioritize people who have a higher income or a certain job title.
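To sort data by a custom criterion, we compute a derived sort key from the data and pass that key to the order() function. Note that order() takes the vectors to sort by, not a comparison function, so the custom logic lives in the code that builds the key.
Wrapping this in a small helper function that takes a data frame and returns it with its rows reordered keeps the logic reusable.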
Let’s illustrate this with the mtcars dataset, where we want to sort by the ratio of mpg to wt values:
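sort_by_ratio <- function(df) {
  # Miles per gallon per unit of weight for each car
  ratio <- df$mpg / df$wt
  # Row indices that put the ratio in ascending order
  sorted_indices <- order(ratio)
  # Reorder the rows and return the sorted data frame
  return(df[sorted_indices, ])
}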
sorted_ratio <- sort_by_ratio(mtcars)
head(sorted_ratio)
mpg cyl disp hp drat wt qsec vs am gear carb
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Here, we define a function sort_by_ratio() that takes a data frame as its argument and computes the ratio of mpg to wt for each row. We then call order() on the ratio vector to get the row indices in ascending order of the ratio, use these indices to rearrange the rows, and return the sorted data frame. The cars with the lowest mpg per unit of weight appear at the top.
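The same idea can be generalized so that the key computation is passed in as a function. The helper below, sort_by_key(), is a hypothetical name introduced for this sketch and is not part of base R; it assumes the key function returns one value per row.

```r
data(mtcars)

# Hypothetical helper: sort a data frame by a key computed from it.
# key_fn must return a vector with one element per row of df.
sort_by_key <- function(df, key_fn, decreasing = FALSE) {
  key <- key_fn(df)
  df[order(key, decreasing = decreasing), , drop = FALSE]
}

# Highest mpg-to-wt ratio first
best_ratio <- sort_by_key(mtcars, function(d) d$mpg / d$wt, decreasing = TRUE)
head(best_ratio[, c("mpg", "wt")], 3)
```

Passing the key as a function keeps the sorting code separate from the sorting criterion, so another criterion only needs a different key function.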
Summary
Sorting data in R is a common task that involves arranging data in ascending or descending order based on one or more columns.
We can sort a dataset in various ways, such as sorting by a single column, sorting by multiple columns, sorting by a specific order, and sorting based on a custom function.
In this article, we demonstrated these sorting methods using the built-in mtcars and iris datasets.
By using these methods, we can quickly identify patterns and outliers in our data and make informed decisions based on our findings.