Social Network Analysis in R

by finnstats

Social Network Analysis in R, Social Network Analysis (SNA) is the process of exploring the social structure by using graph theory.

It is mainly used for measuring and analyzing the structural properties of the network.

It helps to measure social network relationships (Facebook, Twitter likes comments following etc..), Email connectivity, flows between groups, organizations, and other connected entities.

In this training tutorial, we will be using a small network that indicates a connection between the two columns’ information.

Look at the example data provided below in that the first and second columns are connected with social networks.

Please make commonly used words in social network analysis.

A network is represented as a graph, which shows links (if any) between each vertex (or node) and its neighbors.

Edge: – A-line indicating a link between vertices.

Component: – A group of vertices that are mutually reachable by following edges on the graph.

Path: -The edges followed from one vertex to another are called a path.

Tidyverse in R complete tutorial

Load Library

library(igraph)

Getting Data

data <- read.csv("D:/RStudio/SocialNetworkAnalysis/socialnetworkdata.csv", header=T)

The same dataset you can avail from gitHub click here

y <- data.frame(data$first, data$second)

Create network

net <- graph.data.frame(y, directed=T)
V(net)

+ 52/52 vertices, named, from 4c03f50:
 [1] AA AB AF DD CD BA CB CC BC ED AE CA EB BF BB AC DC BD DB CF DF BE EA CE EE EF FF FD GB GC GD AD KA KF LC DA EC FA FB DE FC FE GA GE KB KC KD KE LB LA LD LE

E(net)

290/290 edges from 4c03f50 (vertex names):
  [1] AA->DD AB->DD AF->BA DD->DA CD->EC DD->CE CD->FA CD->CC BA->AF CB->CA CC->CA CD->CA BC->CA DD->DA ED->AD AE->AC AB->BA CD->EC CA->CC

 [20] EB->CC BF->CE BB->CD AC->AE CC->FB DC->BB BD->CF DB->DA DD->DA DB->DD BC->AF CF->DE DF->BF CB->CA BE->CA EA->CA CB->CA CB->CA CC->CA

 [39] CD->CA BC->CA BF->CA CE->CA AC->AD BD->BE AE->DF CB->DF AC->DF AA->DD AA->DD AA->DD CD-
....................................................................................
....................................................................................
[153] CA->CC CD->CC CA->CC BB->CC DF->CE CA->CE AE->GA DC->ED BB->ED CD->ED BF->ED DF->ED CC->KB AD->EA EF->EA CF->KC EE->BB CC->BB BC->BB

[172] CD->KD AE->CF DF->AC DF->AC ED->AC KA->CA BB->CA CB->CA CC->CA EB->CA BE->CC BE->CD CB->CD CB->KE ED->AB AB->KF BA->AF CC->AF CA->AF
+ ... omitted several edges

We got 52 vertices and 290 edges. Let’s assign the labels.

V(net)$label <- V(net)$name
V(net)$degree <- degree(net)

Histogram of node degree

hist(V(net)$degree,
     col = 'green',
     main = 'Histogram of Node Degree',
     ylab = 'Frequency',
     xlab = 'Degree of Vertices')

Around 35 nodes with less than 10 degree and some nodes with high degree (60 to 70 connections) also.

It indicates that many nodes with few connections and few nodes with many connections.

Proportion test in R

Network diagram

set.seed(222)
plot(net,
     vertex.color = 'green',
     vertext.size = 2,
     edge.arrow.size = 0.1,
     vertex.label.cex = 0.8)

Highlighting degrees & layouts

plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout=layout.fruchterman.reingold)

CA has the highest degree followed by others.

One sample analysis in R

plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout=layout.graphopt)

plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout=layout.kamada.kawai)

You can try out with different layouts and you can use for best one suited to your requirements.

Market Basket Analysis in R

Hub and Authorities

Basically, the hub has many outgoing links, and Authorities have many incoming links.

hs <- hub_score(net)$vector
as <- authority.score(net)$vector
set.seed(123)
plot(net,
     vertex.size=hs*30,
     main = 'Hubs',
     vertex.color = rainbow(52),
     edge.arrow.size=0.1,
     layout = layout.kamada.kawai)

You can see the highest outgoing links from CC and CA.

Gradient Boosting in R

set.seed(123)
plot(net,
     vertex.size=as*30,
     main = 'Authorities',
     vertex.color = rainbow(52),
     edge.arrow.size=0.1,
     layout = layout.kamada.kawai)

It is the exactly the same layout with maximum incoming from CA followed by CC.

Decision Trees in R

Community Detection

To detect densely connected nodes

net <- graph.data.frame(y, directed = F)
cnet <- cluster_edge_betweenness(net)
plot(cnet,
     net,
     vertex.size = 10,
     vertex.label.cex = 0.8)