How to split a vector and a data frame in R

In R, you can split a vector or a data frame into groups based on factor levels with the split() function.

The split() function is part of base R. It takes a vector or data frame and divides it into groups defined by a grouping factor.

The syntax for this function is as follows:

split(x, f, drop = FALSE, ...)

# Default method, which also accepts sep and lex.order:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

where:

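x: The vector or data frame to be divided into groups.

f: A factor (or a vector that as.factor() can convert to one) that defines the groups, or a list of such factors, in which case their interaction defines the groups.

drop: Whether levels that do not occur in the data should be dropped. The default, FALSE, keeps them as empty groups.

sep, lex.order: Passed to interaction() when f is a list; sep is the string that joins the level names in each group name.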
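Because f is converted with as.factor(), the order of the groups follows the factor levels, and drop decides whether unused levels show up as empty groups. A minimal sketch with hypothetical values x and g (they are not part of the examples below):

```r
# Hypothetical data values and group labels for illustration
x <- c(10, 20, 30, 40, 50)
g <- c("low", "high", "low", "high", "low")

# A character vector is converted with as.factor(), so the groups
# appear in sorted order of the labels: "high" before "low"
names(split(x, g))

# Supplying a factor instead lets you choose the order of the groups
g_ordered <- factor(g, levels = c("low", "high"))
split(x, g_ordered)
# $low
# [1] 10 30 50
#
# $high
# [1] 20 40

# An unused level ("medium") yields an empty group unless drop = TRUE
g_extra <- factor(g, levels = c("low", "medium", "high"))
names(split(x, g_extra))               # "low" "medium" "high"
names(split(x, g_extra, drop = TRUE))  # "low" "high"
```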

The split() function returns a list with one element per group: a vector of values when x is a vector, or a data frame of rows when x is a data frame. The unsplit() function reverses split().
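A round trip on a small vector with interleaved groups shows both points. The values in v and g are hypothetical; the sketch checks that split() returns a list and that unsplit(), given the same grouping vector, puts every value back in its original position.

```r
# Hypothetical values with interleaved group labels
v <- c(5, 6, 8, 2)
g <- c("A", "B", "A", "B")

pieces <- split(v, g)
is.list(pieces)   # TRUE
pieces$A          # 5 8
pieces$B          # 6 2

# unsplit() needs the same grouping vector to restore the original order
restored <- unsplit(pieces, g)
identical(restored, v)   # TRUE
```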

The examples below demonstrate how to divide vectors and data frames into groups with split().

Example 1: Split a Vector Into Groups with split()

The code below demonstrates how to divide a vector of data values into groups using a vector of factor levels.

First, create a vector of data values for illustration:

data <- c(5, 6, 8, 2, 1, 2, 18, 19)

Next, define a vector of group labels, one for each data value:

groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

Now split the vector of data values into groups:

split(x = data, f = groups)
$A
[1] 5 6 8

$B
[1] 2

$C
[1]  1  2 18 19

The vector is split into three groups: A, B, and C.
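Because the result is an ordinary list, it combines naturally with sapply() or lengths() to summarise each group. A short sketch that reuses the data and groups vectors from Example 1:

```r
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

# Sum of the values in each group
group_sums <- sapply(split(x = data, f = groups), sum)
group_sums
#  A  B  C
# 19  2 40

# Number of values in each group
lengths(split(x = data, f = groups))
# A B C
# 3 1 4
```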

Indexing can also be used to retrieve individual groups. For example, the following code splits the data values into groups and shows only the second group:

split(x = data, f = groups)[2]
$B
[1] 2
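Single brackets keep the list structure, which is why that result still carries the $B label. To get the values themselves, use double brackets or select a group by name. A brief sketch with the same example vectors:

```r
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')
result <- split(x = data, f = groups)

result[2]       # a list containing only the second group, $B
result[[2]]     # the second group's values as a plain vector: 2
result[["C"]]   # a group selected by name: 1 2 18 19
result$C        # the same as result[["C"]]
```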

Example 2: Split a Data Frame Into Groups with split()

Suppose we have the following data frame in R:

df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

View the data frame:

df
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32
6       Z         F    74      41

To divide the data frame into groups based on the Product variable, use the following code:

split(df, f = df$Product)
$X
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32

$Z
  Product Condition Score Quality
6       Z         F    74      41

The result is a list of three data frames: the first contains only the rows where Product equals X, the second only the rows where Product equals Y, and the third only the rows where Product equals Z.
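Each element of the list is a regular data frame, so you can apply a function to every group. A minimal sketch that reuses the df data frame from this example and computes the number of rows and the mean Score for each Product:

```r
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

by_product <- split(df, f = df$Product)

# Number of rows in each group: X = 2, Y = 3, Z = 1
sapply(by_product, nrow)

# Mean Score in each group: X = 215.5, Y = 238, Z = 74
mean_score <- sapply(by_product, function(d) mean(d$Score))
mean_score
```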

Data can also be split by more than one factor variable. For example, the following code splits the data frame into groups based on both the Product and Condition variables:

split(df, f = list(df$Product, df$Condition))
$X.F
[1] Product   Condition Score     Quality 
<0 rows> (or 0-length row.names)

$Y.F
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228

$Z.F
  Product Condition Score Quality
6       Z         F    74      41

$X.T
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y.T
  Product Condition Score Quality
5       Y         T    54      32

$Z.T
[1] Product   Condition Score     Quality 
<0 rows> (or 0-length row.names)
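Two details of this output are worth noting. Because drop defaults to FALSE, every combination of levels appears, including X.F and Z.T, which have no rows. The group names join the two levels with the sep string, "." by default. A sketch that sets drop = TRUE to omit the empty groups and sep = "_" to change the separator:

```r
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

# drop = TRUE removes level combinations that have no rows
by_both <- split(df, f = list(df$Product, df$Condition), drop = TRUE)
names(by_both)
# [1] "Y.F" "Z.F" "X.T" "Y.T"

# sep controls the string placed between the levels in each group name
by_both_sep <- split(df, f = list(df$Product, df$Condition),
                     drop = TRUE, sep = "_")
names(by_both_sep)
# [1] "Y_F" "Z_F" "X_T" "Y_T"
```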

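Use the unsplit() function to restore the original data frame from the list returned by split(). Its syntax is unsplit(value, f, drop = FALSE), where value is the list of pieces and f is the same grouping factor that was passed to split():

unsplit(split(df, f = df$Product), f = df$Product)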
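unsplit() takes the list returned by split() together with the same grouping vector. Because df is already sorted by Product, splitting by Condition, whose T and F rows are interleaved, makes it easier to see that the rows return to their original positions. A minimal sketch:

```r
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

# Rows 1, 2, 5 have Condition 'T' and rows 3, 4, 6 have 'F'
pieces <- split(df, f = df$Condition)

# Reassemble using the same grouping vector; rows go back to their
# original positions and keep their original row names
restored <- unsplit(pieces, f = df$Condition)
restored
```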

Conclusion

Use the split() function in R to divide a vector or data frame into groups, and use unsplit() to reassemble the original vector or data frame from those groups.
