How to split vector and data frame in R
Splitting data into groups according to factor levels can be done with R’s split() function.
split() is a built-in R function. It takes a vector or data frame together with a grouping factor and returns the data divided into one group per factor level.
The syntax for this function is as follows:
split(x, f, drop = FALSE, ...)
# default method
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
where:
x: The vector or data frame to be divided into groups
f: A factor (or a list of factors) that defines the groups
drop: Whether groups that contain no data should be dropped (FALSE by default)
The split() function returns a list with one element per group, each holding the values that belong to that group. The unsplit() function reverses split().
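As a quick illustration of that round trip, here is a minimal sketch using a small made-up vector. The names values, g, parts and restored are illustrative only and are not used elsewhere in this article.

```r
# Illustrative data (hypothetical, not part of the examples in this article)
values <- c(10, 20, 30, 40)
g <- c("a", "b", "a", "b")

# split() returns a named list with one element per group
parts <- split(values, g)
parts$a   # 10 30
parts$b   # 20 40

# unsplit() puts the values back into their original positions
restored <- unsplit(parts, g)
identical(restored, values)   # TRUE
```

Note that unsplit() needs the same grouping vector that was passed to split(), since that is what tells it where each value belongs.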
The examples below demonstrate how to divide vectors and data frames into groups using split().
Example 1: Split a Vector Into Groups with split().
The code below demonstrates how to divide a vector of data values into groups using a vector of factor levels.
Let’s create a vector of data values for illustration:
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
Now we can define a vector of groupings:
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')
Now we are ready to split the vector of data values into groups:
split(x = data, f = groups)
$A
[1] 5 6 8

$B
[1] 2

$C
[1]  1  2 18 19
The vector has been split into three groups.
It’s worth noting that indexing can also be used to retrieve certain groups.
Split the data values vector into groups and show only the second group:
split(x = data, f = groups)[2]
$B
[1] 2
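The different ways of indexing the result can be sketched as follows. Single brackets keep the list wrapper (which is why the output above still shows $B), while double brackets or $ return the group's values directly. The variable name result is an assumption made for this illustration.

```r
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

# Store the list returned by split() so it can be indexed several ways
result <- split(x = data, f = groups)

result[2]       # single brackets: a list of length 1 containing group B
result[["B"]]   # double brackets: the numeric vector for group B
result$C        # $ with the group name: the numeric vector for group C
```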
Example 2: Split a Data Frame Into Groups with split().
Let’s imagine we have the following R data frame.
Create a data frame, df, for illustration:
df <- data.frame(Product = c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition = c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score = c(303, 128, 341, 319, 54, 74),
                 Quality = c(38, 27, 224, 228, 32, 41))
Let’s view the data frame
df
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32
6       Z         F    74      41
To divide the data frame into groups based on the ‘Product’ variable, we can use the following code:
split(df, f = df$Product)
$X
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32

$Z
  Product Condition Score Quality
6       Z         F    74      41
As a result, there are three groups. The first has only rows where ‘Product’ equals X, the second has only rows where ‘Product’ equals Y, and the third has only rows where ‘Product’ equals Z.
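Because the result is an ordinary list of data frames, each group can be inspected or summarised on its own. The following is a minimal sketch of that idea rather than part of the original example; the name by_product and the choice of sapply() are assumptions made for illustration.

```r
df <- data.frame(Product = c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition = c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score = c(303, 128, 341, 319, 54, 74),
                 Quality = c(38, 27, 224, 228, 32, 41))

# One data frame per Product value
by_product <- split(df, f = df$Product)

# Number of rows in each group: X = 2, Y = 3, Z = 1
sapply(by_product, nrow)

# Mean Score in each group: X = 215.5, Y = 238, Z = 74
sapply(by_product, function(g) mean(g$Score))
```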
It’s worth mentioning that data can also be divided into groups using more than one factor variable.
For example, the following code shows how to divide the data frame into groups based on both the ‘Product’ and ‘Condition’ variables.
split(df, f = list(df$Product, df$Condition))
$X.F
[1] Product Condition Score Quality
<0 rows> (or 0-length row.names)

$Y.F
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228

$Z.F
  Product Condition Score Quality
6       Z         F    74      41

$X.T
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y.T
  Product Condition Score Quality
5       Y         T    54      32

$Z.T
[1] Product Condition Score Quality
<0 rows> (or 0-length row.names)
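The output above includes empty groups (X.F and Z.T) because no rows have those combinations. The drop argument from the syntax shown earlier can be used to leave them out. A minimal sketch, where the name non_empty is an assumption made for illustration:

```r
df <- data.frame(Product = c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition = c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score = c(303, 128, 341, 319, 54, 74),
                 Quality = c(38, 27, 224, 228, 32, 41))

# drop = TRUE discards Product/Condition combinations that have no rows
non_empty <- split(df, f = list(df$Product, df$Condition), drop = TRUE)

names(non_empty)    # only the combinations that occur in df
length(non_empty)   # 4 groups instead of 6
```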
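Use the unsplit() function to restore the original data frame from the list that split() returns. Its syntax is unsplit(value, f, drop = FALSE), where value is the list of groups and f is the same grouping factor that was passed to split().
unsplit(split(df, f = df$Product), f = df$Product)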
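Note that unsplit() expects the list produced by split() as its first argument, not the original data frame. A minimal sketch of the full round trip follows; the names by_product and restored are assumptions made for illustration.

```r
df <- data.frame(Product = c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition = c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score = c(303, 128, 341, 319, 54, 74),
                 Quality = c(38, 27, 224, 228, 32, 41))

# Split into a list of data frames, one per Product value
by_product <- split(df, f = df$Product)

# Reassemble the rows using the same grouping factor
restored <- unsplit(by_product, f = df$Product)
restored

# Check that the values came back in their original order
all(restored$Score == df$Score)   # TRUE
```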
Conclusion
Use the split() function in R to split a vector or data frame into groups. Use the unsplit() function to reassemble the original vector or data frame from those groups.