Import CSV Files into R: Step-by-Step Guide

A CSV file stores its contents in a tabular layout of rows and columns. Within each row, a delimiter (usually a comma) separates the column values.
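As a small illustration of that layout, the sketch below writes a three-row table to a temporary file and reads it back, once with a comma delimiter and once with a semicolon. The column names and values are made up for this example and are not the tutorial's data.csv.

```r
# Write a tiny comma-separated file to a temporary location (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Product,Score",
             "A,10",
             "B,20",
             "C,30"), csv_path)

# Show the raw text: one line per row, values separated by the delimiter
readLines(csv_path)

# read.csv() assumes a comma delimiter by default
comma_df <- read.csv(csv_path)

# The same data with a semicolon delimiter needs sep = ";"
semi_path <- tempfile(fileext = ".csv")
writeLines(c("Product;Score", "A;10", "B;20", "C;30"), semi_path)
semi_df <- read.csv(semi_path, sep = ";")

# Both calls yield a data frame with 3 rows and 2 columns
dim(comma_df)
dim(semi_df)
```

If a file uses a different delimiter, such as a tab, the same sep argument applies.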


CSV files can be imported into the working environment and edited using either built-in functions or functions from external packages.

Assume we have a data.csv CSV file saved in the following location:

D:\RStudio\Binning\data.csv

This CSV file can be imported into R in one of three ways:

1. Use read.csv from base R (the slowest method, but it works fine for smaller datasets)

To load a .csv file into the current script and work with it, use the read.csv() function in base R.


The result is returned as a data frame, with rows numbered by integers starting at 1.

data1 <- read.csv("D:\\RStudio\\Binning\\data.csv", header=TRUE, stringsAsFactors=FALSE)

2. Use the readr package’s read_csv function (2-3x faster than read.csv)

The R package “readr” is used to read large flat files into the workspace quickly and efficiently.

library(readr)
data2 <- read_csv("D:\\RStudio\\Binning\\data.csv")

3. Use the data.table package’s fread (2-3 times faster than read_csv)

library(data.table)
data3 <- fread("D:\\RStudio\\Binning\\data.csv")
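The relative speeds quoted in the list are rough guides and depend on the file, the machine, and the package versions. One way to check them is to time each reader on the same file. The sketch below does that on a generated 200,000-row file; the file contents, the row count, and the variable names are assumptions made for illustration, and a single system.time() run is only a coarse measurement.

```r
# Generate a hypothetical 200,000-row file so the timings are measurable
set.seed(1)
n <- 200000
big <- data.frame(
  id    = seq_len(n),
  group = sample(c("A", "B", "C"), n, replace = TRUE),
  value = rnorm(n)
)
path <- tempfile(fileext = ".csv")
write.csv(big, path, row.names = FALSE)

# Time each reader once; elapsed seconds will differ between machines
t_base  <- system.time(d1 <- read.csv(path))[["elapsed"]]
t_readr <- system.time(d2 <- suppressMessages(readr::read_csv(path)))[["elapsed"]]
t_fread <- system.time(d3 <- data.table::fread(path))[["elapsed"]]

# Collect the three timings in one named vector for comparison
timings <- c(read.csv = t_base, read_csv = t_readr, fread = t_fread)
timings
```

For a more reliable comparison, repeat each call several times and compare the medians rather than a single run.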

This tutorial demonstrates how to import a CSV file into R using each of these approaches.

Approach 1: read.csv

If your CSV file is small enough, you may simply use Base R’s read.csv function to import it.


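To prevent R from converting character or categorical variables into factors, set stringsAsFactors=FALSE when using this technique. Since R 4.0.0 this is already the default, so the argument only changes behavior on older versions.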

The following code imports the CSV file from the location given earlier using read.csv:

data1 <- read.csv("D:\\RStudio\\Binning\\data.csv", header=TRUE, stringsAsFactors=FALSE)
head(data1)
  Product      WHC_SLP      DHC_VOL      DHC_GLS
1       A NotPreferred NotPreferred    Preferred
2       A    Preferred    Preferred NotPreferred
3       A NotPreferred    Preferred    Preferred
4       A    Preferred NotPreferred NotPreferred
5       A NoPreference    Preferred NotPreferred
6       B NoPreference NotPreferred    Preferred

Let’s view the structure of the data

str(data1)
'data.frame':      11 obs. of  4 variables:
 $ Product: chr  "A" "A" "A" "A" ...
 $ WHC_SLP: chr  "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr  "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr  "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
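To see what stringsAsFactors changes, the sketch below reads the same small file twice, once with each setting. The file is a hypothetical two-column excerpt written to a temporary location, not the tutorial's data.csv.

```r
# Hypothetical two-column file used only for this illustration
path <- tempfile(fileext = ".csv")
writeLines(c("Product,WHC_SLP",
             "A,Preferred",
             "A,NotPreferred",
             "B,NoPreference"), path)

# Same file, read with text columns kept as character vectors...
as_text <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

# ...and with text columns converted to factors
as_factors <- read.csv(path, header = TRUE, stringsAsFactors = TRUE)

class(as_text$WHC_SLP)      # character
class(as_factors$WHC_SLP)   # factor
levels(as_factors$WHC_SLP)  # the distinct values become the factor levels
```

Keeping text as character is usually the safer starting point, since a column can still be converted later with factor() once the desired levels are known.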

Approach 2: read_csv

You can use the read_csv function from the readr package if you’re working with larger files.


library(readr)

Now we can import the data set

data2 <- read_csv("D:\\RStudio\\Binning\\data.csv")
head(data2)
  Product WHC_SLP      DHC_VOL      DHC_GLS    
  <chr>   <chr>        <chr>        <chr>      
1 A       NotPreferred NotPreferred Preferred  
2 A       Preferred    Preferred    NotPreferred
3 A       NotPreferred Preferred    Preferred  
4 A       Preferred    NotPreferred NotPreferred
5 A       NoPreference Preferred    NotPreferred
6 B       NoPreference NotPreferred Preferred

Let’s view the structure of the data

str(data2)
spec_tbl_df [11 x 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Product: chr [1:11] "A" "A" "A" "A" ...
 $ WHC_SLP: chr [1:11] "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr [1:11] "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr [1:11] "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
 - attr(*, "spec")=
  .. cols(
  ..   Product = col_character(),
  ..   WHC_SLP = col_character(),
  ..   DHC_VOL = col_character(),
  ..   DHC_GLS = col_character()
  .. )
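The "spec" attribute in the output records the column types that read_csv guessed. Those types can also be stated up front through the col_types argument instead of being guessed. The sketch below assumes a small hypothetical file with an extra numeric Score column that is not part of the tutorial's data.csv.

```r
library(readr)

# Hypothetical file written to a temporary location for this illustration
path <- tempfile(fileext = ".csv")
writeLines(c("Product,WHC_SLP,Score",
             "A,Preferred,1",
             "B,NotPreferred,2"), path)

# Declare each column's type explicitly rather than letting readr guess
typed <- read_csv(
  path,
  col_types = cols(
    Product = col_character(),
    WHC_SLP = col_factor(levels = c("Preferred", "NotPreferred", "NoPreference")),
    Score   = col_integer()
  )
)

str(typed)
```

Declaring the types makes the import reproducible even if a later version of the file contains values that would change the guess.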

Approach 3: fread

If your CSV file is exceptionally large, the fread function from the data.table package is the fastest way to import it into R.


Load the data.table package and import the data

library(data.table)
data3 <- fread("D:\\RStudio\\Binning\\data.csv")
head(data3)
   Product      WHC_SLP      DHC_VOL      DHC_GLS
1:       A NotPreferred NotPreferred    Preferred
2:       A    Preferred    Preferred NotPreferred
3:       A NotPreferred    Preferred    Preferred
4:       A    Preferred NotPreferred NotPreferred
5:       A NoPreference    Preferred NotPreferred
6:       B NoPreference NotPreferred    Preferred

Now let’s view the structure of data3

str(data3)
Classes ‘data.table’ and 'data.frame':       11 obs. of  4 variables:
 $ Product: chr  "A" "A" "A" "A" ...
 $ WHC_SLP: chr  "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr  "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr  "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
 - attr(*, ".internal.selfref")=<externalptr>
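With very large files it often helps to read only part of the data. fread supports this through its select and nrows arguments, and its data.table argument controls the class of the result. The sketch below uses a small hypothetical file written to a temporary location rather than the tutorial's data.csv.

```r
library(data.table)

# Hypothetical three-column file used only for this illustration
path <- tempfile(fileext = ".csv")
writeLines(c("Product,WHC_SLP,DHC_VOL",
             "A,Preferred,NotPreferred",
             "A,NotPreferred,Preferred",
             "B,NoPreference,Preferred"), path)

# Read only two of the columns and only the first two data rows
subset_dt <- fread(path, select = c("Product", "DHC_VOL"), nrows = 2)
subset_dt

# data.table = FALSE returns a plain data.frame instead of a data.table
plain_df <- fread(path, data.table = FALSE)
class(plain_df)
```

Selecting columns at read time avoids loading data that would be discarded anyway.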

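We used double backslashes (\\) in the file path in each example because R treats a single backslash inside a string as the start of an escape sequence. A path written with single backslashes, for instance one beginning with "C:\U", fails with the following common error: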

Error: '\U' used without hex digits in character string starting ""C:\U"
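Doubled backslashes are not the only way around this error. The sketch below shows three equivalent ways to write the tutorial's path; the raw string form assumes R 4.0.0 or later. Nothing is read from disk here, so the file does not need to exist.

```r
# 1. Doubled backslashes, as used throughout this tutorial
p1 <- "D:\\RStudio\\Binning\\data.csv"

# 2. Forward slashes, which R also accepts on Windows
p2 <- "D:/RStudio/Binning/data.csv"

# 3. A raw string (R 4.0.0 or later), where backslashes are taken literally
p3 <- r"(D:\RStudio\Binning\data.csv)"

# p1 and p3 are the same string
identical(p1, p3)

# p2 differs from p1 only in the separator character
identical(gsub("\\", "/", p1, fixed = TRUE), p2)

# cat() prints the string as R stores it, with single backslashes
cat(p1, "\n")
```

Any of the three forms can be passed to read.csv, read_csv, or fread.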
