Filtering Data in R: 10 Tips with the tidyverse Package

This tutorial describes how to filter, or extract, data frame rows based on certain criteria.

Filtering is one step in transforming raw data into a format that’s easier to work with for analysis.

In this tutorial, you will learn to use the filter() function from dplyr, one of the core tidyverse packages.

The main idea is to showcase different ways of filtering rows from a data set.

Filtering data is one of the most common tasks in data analysis. When you want to remove or extract part of the data, use the filter() function from the tidyverse.
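As a minimal sketch of the idea, the snippet below filters a small hand-made tibble. The animals table and its values are invented for illustration and are not part of any package; filter() itself comes from dplyr.

```r
library(dplyr)

# Hypothetical toy data, invented for this illustration
animals <- tibble(
  name = c("Cat", "Horse", "Bat"),
  sleep_total = c(12.5, 2.9, 19.7)
)

# filter() keeps the rows for which the condition is TRUE
long_sleepers <- animals %>%
  filter(sleep_total > 10)

long_sleepers$name
```

Rows that do not satisfy the condition are simply left out of the result; the original animals table is not modified.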

Load Library

library(tidyverse)
str(msleep)
tibble[,11] [83 x 11] (S3: tbl_df/tbl/data.frame)
  $ name        : chr [1:83] "Cheetah" "Owl monkey" "Mountain beaver" "Greater short-tailed shrew" …
  $ genus       : chr [1:83] "Acinonyx" "Aotus" "Aplodontia" "Blarina" …
  $ vore        : chr [1:83] "carni" "omni" "herbi" "omni" …
  $ order       : chr [1:83] "Carnivora" "Primates" "Rodentia" "Soricomorpha" …
  $ conservation: chr [1:83] "lc" NA "nt" "lc" …
  $ sleep_total : num [1:83] 12.1 17 14.4 14.9 4 14.4 8.7 7 10.1 3 …
  $ sleep_rem   : num [1:83] NA 1.8 2.4 2.3 0.7 2.2 1.4 NA 2.9 NA …
  $ sleep_cycle : num [1:83] NA NA NA 0.133 0.667 …
  $ awake       : num [1:83] 11.9 7 9.6 9.1 20 9.6 15.3 17 13.9 21 …
  $ brainwt     : num [1:83] NA 0.0155 NA 0.00029 0.423 NA NA NA 0.07 0.0982 …
  $ bodywt      : num [1:83] 50 0.48 1.35 0.019 600 …

We’ll use the msleep data set, which ships with ggplot2 and is available once the tidyverse is loaded, to demonstrate different types of filtering.

It’s often estimated that as much as 75% of a data scientist’s time is spent cleaning and reshaping data. To be an effective data scientist, you need to learn the fastest ways of handling these tasks.

Example 1

# Keep animals that sleep more than 15 hours a day
data1 <- msleep %>%
  select(name, sleep_total) %>%
  filter(sleep_total > 15)

Output:-

      name                           sleep_total                                    
  1 Owl monkey                            17  
  2 Long-nosed armadillo                  17.4
  3 North American Opossum                18  
  4 Big brown bat                         19.7
  5 Thick-tailed opposum                  19.4
  6 Little brown bat                      19.9
  7 Tiger                                 15.8
  8 Giant armadillo                       18.1
  9 Arctic ground squirrel                16.6
 10 Golden-mantled ground squirrel        15.9
 11 Eastern american chipmunk             15.8
 12 Tenrec                                15.6

Example 2

# Negate the condition with ! to keep animals that sleep 15 hours or less
data2 <- msleep %>%
  select(name, sleep_total) %>%
  filter(!(sleep_total > 15))

Output (first 10 rows):-

     name                       sleep_total                                
  1 Cheetah                           12.1
  2 Mountain beaver                   14.4
  3 Greater short-tailed shrew        14.9
  4 Cow                                4  
  5 Three-toed sloth                  14.4
  6 Northern fur seal                  8.7
  7 Vesper mouse                       7  
  8 Dog                               10.1
  9 Roe deer                           3  
 10 Goat                               5.3
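The negated condition can also be written with the opposite comparison operator. The sketch below, which assumes msleep is taken directly from ggplot2, checks that both forms return the same rows and that the two complementary filters together account for every row where sleep_total is not missing.

```r
library(dplyr)

# Assumption: msleep is read from the ggplot2 package
msleep <- ggplot2::msleep

over_15    <- msleep %>% filter(sleep_total > 15)
negated    <- msleep %>% filter(!(sleep_total > 15))
at_most_15 <- msleep %>% filter(sleep_total <= 15)

# Both ways of writing "not more than 15" select the same rows
same_rows <- identical(negated, at_most_15)

# A condition and its negation split the non-missing rows between them
covers_all <- nrow(over_15) + nrow(negated) == sum(!is.na(msleep$sleep_total))
```

Wrapping the condition in parentheses after ! is optional in R, but it makes the intent easier to read.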

Example 3

# Conditions separated by a comma are combined with AND
data3 <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates", bodywt > 15)

Output:-

    name       order    bodywt sleep_total                     
 1 Human      Primates   62           8  
 2 Chimpanzee Primates   52.2         9.7
 3 Baboon     Primates   25.2         9.4
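Separating conditions with a comma has the same effect as joining them with the & operator. A short sketch of that equivalence, assuming msleep is read from ggplot2:

```r
library(dplyr)

# Assumption: msleep is read from the ggplot2 package
msleep <- ggplot2::msleep

# Comma-separated conditions: every condition must hold
with_comma <- msleep %>%
  filter(order == "Primates", bodywt > 15)

# The same filter written with an explicit logical AND
with_and <- msleep %>%
  filter(order == "Primates" & bodywt > 15)

same_result <- identical(with_comma, with_and)
```

The comma form tends to read better when there are many conditions, while & is needed when AND has to be mixed with | inside a single expression.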

Example 4

# Use | (OR) to keep rows that match at least one of the conditions
data4 <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates" | bodywt > 15)

Output (first 10 rows):-

    name              order           bodywt sleep_total                                    
  1 Cheetah           Carnivora        50           12.1
  2 Owl monkey        Primates          0.48        17  
  3 Cow               Artiodactyla    600            4  
  4 Northern fur seal Carnivora        20.5          8.7
  5 Goat              Artiodactyla     33.5          5.3
  6 Grivet            Primates          4.75        10  
  7 Asian elephant    Proboscidea    2547            3.9
  8 Horse             Perissodactyla  521            2.9
  9 Donkey            Perissodactyla  187            3.1
 10 Patas monkey      Primates         10           10.9

Example 5

# Chain == comparisons with | to match any of the listed names
data5 <- msleep %>%
  select(name, sleep_total) %>%
  filter(name == "Cow" |
           name == "Dog" |
           name == "Goat")

Output:-

   name  sleep_total          
 1 Cow           4  
 2 Dog          10.1
 3 Goat          5.3

Example 6

# %in% is a shorter way to match any value in a vector
data6 <- msleep %>%
  select(name, sleep_total) %>%
  filter(name %in% c("Cow", "Dog", "Goat"))

Output:-

   name  sleep_total          
 1 Cow           4  
 2 Dog          10.1
 3 Goat          5.3
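Examples 5 and 6 return the same rows. The sketch below checks that, and also shows how to negate %in% to keep everything except the listed names. The targets vector is a helper name introduced here for illustration, and msleep is assumed to come from ggplot2.

```r
library(dplyr)

# Assumption: msleep is read from the ggplot2 package
msleep <- ggplot2::msleep

# Illustrative helper vector holding the names to match
targets <- c("Cow", "Dog", "Goat")

with_or <- msleep %>%
  filter(name == "Cow" | name == "Dog" | name == "Goat")

with_in <- msleep %>%
  filter(name %in% targets)

# Negate %in% to keep every animal that is NOT in the vector
others <- msleep %>%
  filter(!name %in% targets)

same_result <- identical(with_or, with_in)
```

Because %in% never returns NA, the matched rows and the negated rows always add up to the full data set.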

Example 7

# between() keeps values in the closed range from 16 to 18
data7 <- msleep %>%
  select(name, sleep_total) %>%
  filter(between(sleep_total, 16, 18))

Output:-

   name                   sleep_total                           
 1 Owl monkey                    17  
 2 Long-nosed armadillo          17.4
 3 North American Opossum        18  
 4 Arctic ground squirrel        16.6
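between() includes both end points, which is why the animal with exactly 18 hours of sleep appears in the output. A small sketch of that behaviour follows; the hours vector is made up for illustration, and msleep is assumed to come from ggplot2.

```r
library(dplyr)

# Assumption: msleep is read from the ggplot2 package
msleep <- ggplot2::msleep

# Both boundaries are included in the range
hours <- c(15.9, 16, 18, 18.1)   # illustrative values
in_range <- between(hours, 16, 18)   # FALSE TRUE TRUE FALSE

# between(x, left, right) is shorthand for x >= left & x <= right
with_between <- msleep %>%
  filter(between(sleep_total, 16, 18))

with_operators <- msleep %>%
  filter(sleep_total >= 16 & sleep_total <= 18)

same_result <- identical(with_between, with_operators)
```

If one of the end points should be excluded, write the comparison out with < or > instead of using between().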

Example 8

# near() keeps values that are less than tol (here 0.5) away from 17
data8 <- msleep %>%
  select(name, sleep_total) %>%
  filter(near(sleep_total, 17, tol = 0.5))

Output:-

   name                   sleep_total                           
 1 Owl monkey                    17  
 2 Long-nosed armadillo          17.4
 3 Arctic ground squirrel        16.6
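near() is mainly useful because exact equality on floating-point numbers can fail in surprising ways. The sketch below illustrates this with made-up values; it relies on near() comparing the absolute difference against tol with a strict less-than, so a value exactly tol away does not match.

```r
library(dplyr)

# Floating-point arithmetic makes exact comparison unreliable
exact_equal  <- sqrt(2)^2 == 2        # FALSE
approx_equal <- near(sqrt(2)^2, 2)    # TRUE, within the default tolerance

# With an explicit tolerance, only values strictly less than tol away match
hours <- c(16.6, 17.4, 17.5, 18)      # illustrative values
close_to_17 <- near(hours, 17, tol = 0.5)   # TRUE TRUE FALSE FALSE
```

When the end points of the range need to be included, between() from Example 7 may be the better fit.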

Example 9

# is.na() keeps rows where conservation is missing
data9 <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(is.na(conservation))

Output (first 10 rows):-

   name                        conservation sleep_total                                         
  1 "Owl monkey"                NA                  17  
  2 "Three-toed sloth"          NA                  14.4
  3 "Vesper mouse"              NA                   7  
  4 "African giant pouched rat" NA                   8.3
  5 "Western american chipmunk" NA                  14.9
  6 "Galago"                    NA                   9.8
  7 "Human"                     NA                   8  
  8 "Macaque"                   NA                  10.1
  9 "Vole "                     NA                  12.8
 10 "Little brown bat"          NA                  19.9

Example 10

# !is.na() keeps rows where conservation is not missing
data10 <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(!is.na(conservation))

Output (first 10 rows):-

    name                       conservation sleep_total                                        
  1 Cheetah                    lc                  12.1
  2 Mountain beaver            nt                  14.4
  3 Greater short-tailed shrew lc                  14.9
  4 Cow                        domesticated         4  
  5 Northern fur seal          vu                   8.7
  6 Dog                        domesticated        10.1
  7 Roe deer                   lc                   3  
  8 Goat                       lc                   5.3
  9 Guinea pig                 domesticated         9.4
 10 Grivet                     lc                  10 
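Missing values also matter when filtering on other conditions: filter() drops any row where the condition evaluates to NA, not only rows where it is FALSE. The sketch below shows this on a small invented tibble; the status table and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with one missing conservation status
status <- tibble(
  name = c("A", "B", "C"),
  conservation = c("lc", NA, "vu")
)

# NA != "lc" evaluates to NA, so row B is dropped along with row A
not_lc <- status %>%
  filter(conservation != "lc")

# Add is.na() explicitly to keep the rows with a missing status
not_lc_or_missing <- status %>%
  filter(conservation != "lc" | is.na(conservation))

not_lc$name              # "C"
not_lc_or_missing$name   # "B" "C"
```

Checking for missing values with is.na() before or alongside other conditions avoids silently losing rows.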
