Import CSV Files into R Step-by-Step Guide
A CSV file stores its contents in a tabular layout of rows and columns. A delimiter string (a comma, by default) separates the column values in each row.
CSV files can be imported into the R working environment with built-in functions as well as with functions from external packages.
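As a small illustration of how the delimiter determines the columns, the sketch below writes a tiny semicolon-delimited file to a temporary location and reads it back. The file contents and the names tmp and demo are made up for this example; only read.csv() and its sep argument come from base R.

```r
# Write a tiny semicolon-delimited file to a temporary location (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Product;Score", "A;1", "B;2"), tmp)

# read.csv() assumes a comma by default, so the delimiter is passed explicitly via sep
demo <- read.csv(tmp, sep = ";")

print(demo)
print(ncol(demo))  # two columns once the delimiter is matched

# Clean up the temporary file
unlink(tmp)
```

If sep does not match the file's delimiter, each row is read as a single column, which is usually the first sign that the wrong delimiter was assumed.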
Assume we have a data.csv CSV file saved in the following location:
D:\RStudio\Binning\data.csv
This CSV file can be imported into R in one of three ways:
1. Use read.csv from base R (the slowest method, but it works fine for smaller datasets)
To load a .csv file into the current script and work with it, use the read.csv() function in base R.
The result is returned as a data frame, with rows numbered by integers starting at 1.
data1 <- read.csv("D:\\RStudio\\Binning\\data.csv", header=TRUE, stringsAsFactors=FALSE)
2. Use the readr package's read_csv function (2-3x faster than read.csv)
The readr package is designed to read large flat files into the workspace quickly and efficiently.
library(readr)
data2 <- read_csv("D:\\RStudio\\Binning\\data.csv")
3. Use the data.table package's fread function (2-3x faster than read_csv)
library(data.table)
data3 <- fread("D:\\RStudio\\Binning\\data.csv")
This tutorial demonstrates how to import a CSV file into R using each of these approaches.
Approach 1: read.csv
If your CSV file is small enough, you can simply use base R's read.csv function to import it.
To prevent R from converting character or categorical variables into factors, set stringsAsFactors=FALSE when using this function. Since R 4.0.0 this is already the default, but setting it explicitly keeps the behavior the same on older versions.
The following code imports the CSV file from the location above using read.csv:
data1 <- read.csv("D:\\RStudio\\Binning\\data.csv", header=TRUE, stringsAsFactors=FALSE)
head(data1)
  Product      WHC_SLP      DHC_VOL      DHC_GLS
1       A NotPreferred NotPreferred    Preferred
2       A    Preferred    Preferred NotPreferred
3       A NotPreferred    Preferred    Preferred
4       A    Preferred NotPreferred NotPreferred
5       A NoPreference    Preferred NotPreferred
6       B NoPreference NotPreferred    Preferred
Let's view the structure of the data
str(data1)
'data.frame': 11 obs. of 4 variables:
 $ Product: chr "A" "A" "A" "A" ...
 $ WHC_SLP: chr "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
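To see what the stringsAsFactors setting changes, the following minimal sketch reads the same small file twice, once with each value. The two-row file and the names as_chr and as_fct are invented for illustration; the column names only mimic those in data.csv.

```r
# Create a small temporary CSV with made-up rows that mimic two columns of data.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Product,WHC_SLP", "A,Preferred", "B,NotPreferred"), tmp)

# Read the same file twice, changing only stringsAsFactors
as_chr <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
as_fct <- read.csv(tmp, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$WHC_SLP)   # "character"
class(as_fct$WHC_SLP)   # "factor"
levels(as_fct$WHC_SLP)  # the distinct values become factor levels

# Clean up the temporary file
unlink(tmp)
```

Character columns are usually easier to clean and recode; they can be converted later with factor() once the set of categories is settled.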
Approach 2: read_csv
If you're working with larger files, you can use the read_csv function from the readr package.
library(readr)
Now we can import the data set
data2 <- read_csv("D:\\RStudio\\Binning\\data.csv")
head(data2)
  Product WHC_SLP      DHC_VOL      DHC_GLS
  <chr>   <chr>        <chr>        <chr>
1 A       NotPreferred NotPreferred Preferred
2 A       Preferred    Preferred    NotPreferred
3 A       NotPreferred Preferred    Preferred
4 A       Preferred    NotPreferred NotPreferred
5 A       NoPreference Preferred    NotPreferred
6 B       NoPreference NotPreferred Preferred
Let’s view the structure of the data
str(data2)
spec_tbl_df [11 x 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Product: chr [1:11] "A" "A" "A" "A" ...
 $ WHC_SLP: chr [1:11] "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr [1:11] "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr [1:11] "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
 - attr(*, "spec")=
  .. cols(
  ..   Product = col_character(),
  ..   WHC_SLP = col_character(),
  ..   DHC_VOL = col_character(),
  ..   DHC_GLS = col_character()
  .. )
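The "spec" attribute in the output above records the column types that read_csv guessed. As a hedged sketch, the same kind of specification can be supplied up front through the col_types argument instead of relying on guessing. The temporary file, its two rows, the name data_typed, and the chosen factor levels are assumptions made for this example; the code is skipped if readr is not installed.

```r
# Run only when readr is available
if (requireNamespace("readr", quietly = TRUE)) {
  # Made-up file that mimics two columns of data.csv
  tmp <- tempfile(fileext = ".csv")
  writeLines(c("Product,WHC_SLP", "A,Preferred", "B,NotPreferred"), tmp)

  # Declare the column types explicitly rather than letting readr guess them
  data_typed <- readr::read_csv(
    tmp,
    col_types = readr::cols(
      Product = readr::col_character(),
      WHC_SLP = readr::col_factor(
        levels = c("NotPreferred", "NoPreference", "Preferred")
      )
    )
  )

  str(data_typed)
}
```

Supplying col_types also suppresses the column-specification message that read_csv otherwise prints, and values that do not fit the declared type are reported as parsing problems instead of silently changing the column type.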
Approach 3: fread
If your CSV file is exceptionally large, the fread function from the data.table package is the fastest way to import it into R.
Load the data.table package and import the data set
library(data.table)
data3 <- fread("D:\\RStudio\\Binning\\data.csv")
head(data3)
   Product      WHC_SLP      DHC_VOL      DHC_GLS
1:       A NotPreferred NotPreferred    Preferred
2:       A    Preferred    Preferred NotPreferred
3:       A NotPreferred    Preferred    Preferred
4:       A    Preferred NotPreferred NotPreferred
5:       A NoPreference    Preferred NotPreferred
6:       B NoPreference NotPreferred    Preferred
Now let's view the structure of data3
str(data3)
Classes ‘data.table’ and 'data.frame': 11 obs. of 4 variables:
 $ Product: chr "A" "A" "A" "A" ...
 $ WHC_SLP: chr "NotPreferred" "Preferred" "NotPreferred" "Preferred" ...
 $ DHC_VOL: chr "NotPreferred" "Preferred" "Preferred" "NotPreferred" ...
 $ DHC_GLS: chr "Preferred" "NotPreferred" "Preferred" "NotPreferred" ...
 - attr(*, ".internal.selfref")=<externalptr>
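The relative speeds quoted for the three approaches depend on the file and the machine, so it can be worth measuring them yourself. The sketch below is one rough way to do that with system.time() on a synthetic file; the 100,000-row data set, its column names, and the timings vector are all assumptions for illustration, and readr and data.table are only timed if they are installed.

```r
# Build a synthetic CSV (made-up data, 100,000 rows) in a temporary file
set.seed(1)
n <- 1e5
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    id    = seq_len(n),
    value = rnorm(n),
    group = sample(c("A", "B"), n, replace = TRUE)
  ),
  tmp,
  row.names = FALSE
)

# Time base R first, then each optional package if it is available
timings <- c(read.csv = system.time(read.csv(tmp))[["elapsed"]])
if (requireNamespace("readr", quietly = TRUE)) {
  timings["read_csv"] <-
    system.time(suppressMessages(readr::read_csv(tmp)))[["elapsed"]]
}
if (requireNamespace("data.table", quietly = TRUE)) {
  timings["fread"] <- system.time(data.table::fread(tmp))[["elapsed"]]
}

print(timings)  # elapsed seconds per reader; values vary between runs and machines
```

A single run is noisy, so treat the numbers as a rough guide and repeat the measurement on a file similar to your real data before choosing a reader.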
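In an R string, a single backslash starts an escape sequence, so each example above uses double backslashes (\\) in the file path. Writing a Windows path with single backslashes leads to the following common error: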
Error: '\U' used without hex digits in character string starting ""C:\U"
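Doubling the backslashes is not the only way around this error. The sketch below shows a few equivalent spellings of the example path; the variable names are made up, and the raw string syntax r"(...)" assumes R 4.0.0 or later.

```r
# Equivalent ways to write the same Windows path without triggering escape errors
p_escaped <- "D:\\RStudio\\Binning\\data.csv"   # doubled backslashes, as used above
p_forward <- "D:/RStudio/Binning/data.csv"      # forward slashes are accepted on Windows
p_raw     <- r"(D:\RStudio\Binning\data.csv)"   # raw string, needs R >= 4.0.0
p_built   <- file.path("D:", "RStudio", "Binning", "data.csv")  # joins parts with "/"

identical(p_escaped, p_raw)    # TRUE: both hold single backslashes internally
identical(p_forward, p_built)  # TRUE

# Converting the backslash form to the forward-slash form
identical(gsub("\\", "/", p_escaped, fixed = TRUE), p_forward)  # TRUE
```

Any of these strings can be passed to read.csv, read_csv, or fread in place of the double-backslash path.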