stringr in R: 10 Data Manipulation Tips and Tricks

In this tutorial we discuss useful functions and regular expressions in the stringr package for data manipulation in R.

stringr in R

The stringr package offers a wide variety of functions, but we are going to cover only the ones that matter most in day-to-day data analysis.

library(stringr)

1. String Length

statement<-c("R", "is powerful", "tool", "for data", "analysis")

To find the length of each string (the number of characters, spaces included), you can use str_length.

statement
"R"           "is powerful" "tool"        "for data"    "analysis"
str_length(statement)
1 11  4  8  8
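
As a quick check of what str_length counts, here is a minimal sketch on a small made-up vector x (an assumption for illustration, not the tutorial's data). It suggests that characters, including spaces, are counted and that missing values stay NA. If a word count is what you need, one possible approach is to count runs of non-whitespace with str_count.

```r
library(stringr)

# Hypothetical example vector, for illustration only
x <- c("R", "is powerful", NA)

# Number of characters per element; spaces count, NA stays NA
n_chars <- str_length(x)
n_chars
# 1 11 NA

# Number of words per element, counted as runs of non-whitespace
n_words <- str_count(x, "\\S+")
n_words
# 1  2 NA
```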

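2. Concatenate

To join strings, use str_c. The collapse argument joins the elements of a vector into a single string, while sep is placed between the arguments being combined.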


str_c(statement,collapse=" ")

"R is powerful tool for data analysis"

str_c("test",1:10, sep="-")
[1] "test-1"  "test-2"  "test-3"  "test-4"  "test-5"  "test-6"  "test-7"  "test-8"  "test-9"
[10] "test-10"
str_c("test",1:10, sep=",")
[1] "test,1"  "test,2"  "test,3"  "test,4"  "test,5"  "test,6"  "test,7"  "test,8"  "test,9"
[10] "test,10"
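
The two arguments can also be combined. The sketch below, on a shortened 1:3 sequence chosen only to keep the output small, shows sep being applied first and collapse then joining the result into one string.

```r
library(stringr)

# sep goes between the arguments being pasted together
labels <- str_c("test", 1:3, sep = "-")
labels
# "test-1" "test-2" "test-3"

# collapse then joins the resulting vector into a single string
one_string <- str_c("test", 1:3, sep = "-", collapse = ", ")
one_string
# "test-1, test-2, test-3"
```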

3. NA Replace

Now we will see how to handle missing values.

str_c(c("My Name", NA, "Jhon"),".")
"My Name." NA         "Jhon."

As you can see, the missing value is not concatenated; it stays NA. We can overcome this with str_replace_na().

Replace NA with "." or any other string:

str_replace_na(c("My Name", NA, "Jhon"),".")
"My Name" "."       "Jhon"

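The two steps can be chained so that the concatenation no longer produces NA. This is a minimal sketch; the placeholder "(missing)" is an arbitrary choice made for this example.

```r
library(stringr)

# Same vector as above; "(missing)" is an arbitrary placeholder for this sketch
x <- c("My Name", NA, "Jhon")

# Replace NA first, then concatenate
filled <- str_replace_na(x, "(missing)")
result <- str_c(filled, ".")
result
# "My Name."   "(missing)." "Jhon."
```
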
4. String Extraction

To extract a substring, use str_sub with a start and an end position.

str_sub(statement,1,5)
"R"     "is po" "tool"  "for d" "analy"

The first 5 characters of each string are extracted; strings shorter than 5 characters are returned unchanged.


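You can also replace part of a string by assigning to str_sub. Here everything from the 4th character to the end of each string is replaced with "Wow"; "R" has fewer than 4 characters, so "Wow" is simply appended.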

str_sub(statement, 4,-1)<-"Wow"
statement
"RWow"   "is Wow" "tooWow" "forWow" "anaWow"

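Positions in str_sub can also be negative, in which case they count from the end of the string. The sketch below uses a separate made-up vector (words, patched are names assumed for this example) so that the assignment form does not overwrite data you still need.

```r
library(stringr)

# Hypothetical vector for illustration
words <- c("tool", "analysis")

# Negative positions count from the end: the last three characters
last3 <- str_sub(words, -3, -1)
last3
# "ool" "sis"

# Assign on a copy so the original vector is left untouched
patched <- words
str_sub(patched, 1, 1) <- "X"
patched
# "Xool"     "Xnalysis"
```
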
5. Split

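To split strings on a pattern, use str_split. It returns a list with one character vector per input string. Note that statement now holds the values modified in the previous step.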

str_split(statement,pattern=" ")
[[1]]
[1] "RWow"
[[2]]
[1] "is"  "Wow"
[[3]]
[1] "tooWow"
[[4]]
[1] "forWow"
[[5]]
[1] "anaWow"
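
If a list is inconvenient, str_split has two arguments that may help: simplify = TRUE returns a character matrix, and n caps the number of pieces. A minimal sketch on a made-up vector:

```r
library(stringr)

# Hypothetical vector for illustration
x <- c("is Wow", "for data analysis")

# simplify = TRUE returns a character matrix, padded with "" where needed
m <- str_split(x, " ", simplify = TRUE)
dim(m)
# 2 3

# n limits the number of pieces; the last piece keeps the remainder
two <- str_split("for data analysis", " ", n = 2)
two[[1]]
# "for"           "data analysis"
```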

6. Subset

To keep only the elements that match a particular pattern, use str_subset.


str_subset(colors(),pattern="green")
[1] "darkgreen"         "darkolivegreen"    "darkolivegreen1"   "darkolivegreen2" 
 [5] "darkolivegreen3"   "darkolivegreen4"   "darkseagreen"      "darkseagreen1"   
 [9] "darkseagreen2"     "darkseagreen3"     "darkseagreen4"     "forestgreen"     
[13] "green"             "green1"            "green2"            "green3"          
[17] "green4"            "greenyellow"       "lawngreen"         "lightgreen"      
[21] "lightseagreen"     "limegreen"         "mediumseagreen"    "mediumspringgreen"
[25] "palegreen"         "palegreen1"        "palegreen2"        "palegreen3"      
[29] "palegreen4"        "seagreen"          "seagreen1"         "seagreen2"       
[33] "seagreen3"         "seagreen4"         "springgreen"       "springgreen1"    
[37] "springgreen2"      "springgreen3"      "springgreen4"      "yellowgreen"

To keep only the colors that start with "orange" or end with "red", use the anchors ^ and $.

str_subset(colors(),pattern="^orange|red$")
 [1] "darkred"         "indianred"       "mediumvioletred" "orange"          "orange1"      
 [6] "orange2"         "orange3"         "orange4"         "orangered"       "orangered1"    
[11] "orangered2"      "orangered3"      "orangered4"      "palevioletred"   "red"           
[16] "violetred"

^ anchors the match at the start of the string and $ anchors it at the end.
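
To see the anchors at work on something shorter than colors(), here is a sketch on a small made-up vector. It also shows that, as far as the result goes, str_subset behaves like indexing with the logical vector returned by str_detect.

```r
library(stringr)

# Small hypothetical vector instead of colors(), to keep the output short
x <- c("orange", "orangered", "darkred", "redwood", "green")

# Starts with "orange" OR ends with "red"
hits <- str_subset(x, "^orange|red$")
hits
# "orange"    "orangered" "darkred"

# str_detect gives the logical vector behind the same filter
flags <- str_detect(x, "^orange|red$")
identical(x[flags], hits)
# TRUE
```

"redwood" is dropped because it starts with "red" rather than ending with it.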

To extract characters or numbers from a string, use str_extract.

list<-c("Hai1", "my 10", "Name 20")
str_extract(list,pattern="[a-z]")

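"a" "m" "a"

[a-z] matches a single lowercase letter, so only the first one in each string is returned. To extract the whole run of lowercase letters, add +. The capital H and N are skipped because the character class covers lowercase letters only.

str_extract(list,pattern="[a-z]+")

"ai"  "my"  "ame"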
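
The same idea extends to digits and to mixed-case words. The sketch below reuses the three strings from above under the name x (chosen here to avoid masking the base function list) and converts the extracted digits to integers.

```r
library(stringr)

x <- c("Hai1", "my 10", "Name 20")

# One or more digits; as.integer converts the matched text to numbers
nums <- as.integer(str_extract(x, "[0-9]+"))
nums
# 1 10 20

# Adding A-Z to the character class keeps the leading capitals
words <- str_extract(x, "[A-Za-z]+")
words
# "Hai"  "my"   "Name"
```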

7. HTML View

To see the matches highlighted in an HTML view, use str_view.


str_view(statement,"a.")

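This highlights where the pattern a. (an "a" followed by any single character) matches. Older stringr releases highlight only the first match in each string.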

8. Count

str_count counts the number of matches of a pattern in each string. Here it counts the characters a and e.

str_count(statement,"[ae]")
0 0 0 0 2
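
A slightly broader sketch, on a made-up vector, counts a single letter and then any lowercase vowel. The vector and variable names are assumptions for illustration.

```r
library(stringr)

# Hypothetical vector for illustration
x <- c("banana", "is Wow", "anaWow")

# Count every occurrence of the letter a
n_a <- str_count(x, "a")
n_a
# 3 0 2

# Count characters from a set, here the lowercase vowels
n_vowels <- str_count(x, "[aeiou]")
n_vowels
# 3 2 3
```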

9. Location

str_locate(statement,"[ae]")
     start end
[1,]    NA  NA
[2,]    NA  NA
[3,]    NA  NA
[4,]    NA  NA
[5,]     1   1

str_locate returns the start and end position of the first match in each string, and NA where there is no match.
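
When every match is needed rather than just the first, str_locate_all returns one matrix per input string. A minimal sketch on the single string "anaWow", with the positions fed back into str_sub:

```r
library(stringr)

x <- "anaWow"

# One matrix per input string, one row per match
all_pos <- str_locate_all(x, "[ae]")[[1]]
all_pos
#      start end
# [1,]     1   1
# [2,]     3   3

# The positions can be passed back to str_sub to pull out the matches
matched <- str_sub(x, all_pos[, "start"], all_pos[, "end"])
matched
# "a" "a"
```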


10. Lower/Upper case

To convert to lower case:

str_to_lower(statement)

"rwow"   "is wow" "toowow" "forwow" "anawow"

To convert to upper case:

str_to_upper(statement)

"RWOW"   "IS WOW" "TOOWOW" "FORWOW" "ANAWOW"

For title case, the first letter of each word is converted to upper case and the rest to lower case.


str_to_title(statement)

"Rwow"   "Is Wow" "Toowow" "Forwow" "Anawow"
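
A common use of the case functions is to normalise text before comparing it. The sketch below uses made-up survey answers (an assumption for illustration) and shows two options: lower-casing the data first, or leaving the data alone and ignoring case in the pattern via regex(ignore_case = TRUE).

```r
library(stringr)

# Hypothetical survey answers with inconsistent capitalisation
answers <- c("Yes", "YES", "yes", "No")

# Option 1: normalise first, then compare or tabulate
clean <- str_to_lower(answers)
n_yes <- sum(clean == "yes")
n_yes
# 3

# Option 2: keep the data as is and ignore case in the pattern
flags <- str_detect(answers, regex("^yes$", ignore_case = TRUE))
flags
# TRUE  TRUE  TRUE FALSE
```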

Run ?stringr and go to the index to see all stringr functions.

Conclusion

Many other functions are available in the stringr package. If you feel we missed an important function, please mention it in the comment box.


finnstats