1 Introduction to this workshop/tutorial

This workshop and tutorial provide an overview of R packages for network analysis. This online tutorial is also designed for self-study, with example code and self-contained data. The packages discussed include:

  • Statnet suite (Krivitsky et al., n.d.) including:
    • network (Butts 2008, 2021) – storage and manipulation of network data
    • sna (Butts 2020) – descriptive statistics and graphics for exploratory network analysis
  • igraph (Csardi and Nepusz 2006)
  • tidygraph (Pedersen 2020) and ggraph (Pedersen 2021)
  • graph (Gentleman et al. 2020) and Rgraphviz (Hansen et al. 2021)

and other more specialized packages that provide tools for e.g. particular SNA techniques or visualization, but rely on one of the above for network data storage and manipulation.

1.1 Prerequisites

This workshop assumes basic familiarity with R, experience with network concepts, terminology and data, and familiarity with the general framework for statistical modeling and inference. While previous experience with ERGMs is not required, some of the topics covered here may be difficult to understand without a strong background in linear and generalized linear models in statistics.

1.2 Software installation

Minimally, you will need to install the latest version of R (available here) and the packages listed below. The workshops are conducted using the free version of RStudio (available here).

The packages required for the workshop can be installed with the following expression:

install.packages(c("network", "sna", "igraph", "tidygraph", "ggraph", 
                   "intergraph", "remotes"))

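Package remotes (Hester et al. 2021) is needed to install the remaining two packages, graph and Rgraphviz, which are distributed through Bioconductor.

remotes::install_bioc("graph")
remotes::install_bioc("Rgraphviz")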

More information about installing other packages from the Statnet suite can be found on the statnet workshop wiki. In particular, you can install the whole Statnet suite (not required for this tutorial) with:

install.packages('statnet')
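
Before moving on, it can help to check which of the required packages are already installed. The following is a minimal sketch in base R; the variable names required, installed and missing_pkgs are illustrative and not part of the workshop materials.

```r
# Packages installed from CRAN in the step above
required <- c("network", "sna", "igraph", "tidygraph", "ggraph",
              "intergraph", "remotes")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; it loads the namespace but does not attach the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
# (an empty character vector if all are present)
missing_pkgs <- required[!installed]
missing_pkgs
```

If missing_pkgs is not empty, it can be passed directly to install.packages().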

1.3 Necessary data files

  • Classroom data is a network within a school class of 26 9-year-olds, taken from a larger study by Dolata (2014). The name generator question was “With whom do you like to play with?”. The data is available in the following files:
    • classroom-adjacency.csv with adjacency matrix
    • classroom-edges.csv with an edgelist with edge attribute: liking – numeric, on the scale 1-5 the extent to which ego likes the alter. This attribute has been randomly generated for illustrative purposes.
    • classroom-nodes.csv with node attributes: female – logical, gender (TRUE for girls); isei08_m, isei08_f – numeric, social status score of, respectively, mother and father
  • Several other datasets contained in the file introToSNAinR.Rdata.

Download all the files as a ZIP file intro-sna-data.zip.

The code from this tutorial is available as a script too.

1.4 Working Directory

Before we go further, make sure R’s Working Directory (WD) is set to the folder where you extracted the data files from the ZIP archive for the workshop. If you have not set the working directory yet, do so now in one of the following ways:

  1. (Recommended) Create an RStudio Project dedicated to the workshop and unpack the data files there.

  2. Use the RStudio “Files” tab to navigate to the directory with the workshop files, then click “More” and “Set As Working Directory”.

  3. You can use setwd() to change the working directory as well, like so:

    setwd("path/to/folder/with/workshop/files")

Verify that the WD is set correctly by:

  1. Looking at the top of the Console window in RStudio, or

  2. Calling getwd():

    getwd() # Check what directory you're in
    [1] "/home/mbojan/Teaching/workshop-intro-sna-tools"
    list.files() # Check what's in the working directory
     [1] "bibliography.bib"               "captab.html"                   
     [3] "captab.Rmd"                     "classroom-adjacency.csv"       
     [5] "classroom-edges.csv"            "classroom-nodes.csv"           
     [7] "edgeList.csv"                   "intro_tutorial.html"           
     [9] "intro_tutorial.R"               "intro_tutorial.Rmd"            
    [11] "intro-sna-data.zip"             "introToSNAinR.Rdata"           
    [13] "Makefile"                       "practicals_files"              
    [15] "practicals-solved_files"        "practicals.Rmd"                
    [17] "publish.R"                      "README.md"                     
    [19] "relationalData.csv"             "rstudio-wd.png"                
    [21] "vertexAttributes.csv"           "workshop-intro-sna-tools.Rproj"
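
A quick way to confirm that the working directory contains the data files is to test for them explicitly. This is a small sketch in base R; the file names are those listed in section 1.3, while the variable names needed and present are illustrative.

```r
# Data files used in this tutorial (names taken from section 1.3)
needed <- c("classroom-adjacency.csv", "classroom-edges.csv",
            "classroom-nodes.csv", "introToSNAinR.Rdata")

# TRUE for every file found in the current working directory
present <- file.exists(needed)
names(present) <- needed
present

# Report any files that could not be found
if (!all(present)) {
  message("Missing from ", getwd(), ": ",
          paste(needed[!present], collapse = ", "))
}
```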

1.5 Mitigating function name conflicts

Some of the packages we are going to demonstrate provide functions that share names with functions in other packages. For example, get.vertex.attribute() is defined in both network and igraph. Hence, if we attach both packages with library(), it matters which one is attached last: its version of the function masks the other and is the one used when we write get.vertex.attribute().

In particular, note the following function name clashes:

  • Between igraph and network:

     [1] "%c%"                    "%s%"                    "add.edges"             
     [4] "add.vertices"           "delete.edges"           "delete.vertices"       
     [7] "get.edge.attribute"     "get.edges"              "get.vertex.attribute"  
    [10] "is.bipartite"           "is.directed"            "list.edge.attributes"  
    [13] "list.vertex.attributes" "set.edge.attribute"     "set.vertex.attribute"  
  • Between igraph and sna:

     [1] "betweenness"  "bonpow"       "closeness"    "components"   "degree"      
     [6] "dyad.census"  "evcent"       "hierarchy"    "is.connected" "neighborhood"
    [11] "triad.census"
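
Lists like these can be produced by comparing the names exported by two packages. The following is a minimal sketch, assuming both packages are installed; shared_exports() is a hypothetical helper written for this illustration. Because it compares exported names rather than the objects on the search path, and because package versions differ, its result may not match the lists above exactly.

```r
# Hypothetical helper: names exported by both packages, sorted alphabetically
shared_exports <- function(pkg_a, pkg_b) {
  sort(intersect(getNamespaceExports(pkg_a), getNamespaceExports(pkg_b)))
}

# Only run the comparison if both packages are available
if (requireNamespace("igraph", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  print(shared_exports("igraph", "network"))
}
```

Note that the helper only loads the package namespaces; nothing is attached, so the comparison itself does not create any new conflicts.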

The following strategies help keep such conflicts as painless as possible:

  1. Load the packages but do not attach them and always use ::
  2. Load and attach the packages and use :: for disambiguation.
  3. Load the packages and selectively attach and detach them in order to always have only one of them attached.

In this tutorial we had to deal with these conflicts as well. We have opted for strategy (3) because:

  • Code blocks can illustrate working with a particular package without worrying about conflicts.
  • The code examples are free of :: namespace directives and hence cleaner to read.

The disadvantage is that

  • You will see frequent calls to library() and detach() at the beginning and end of the subsections to make sure only one intended package is attached at a given time.
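
The difference between loading and attaching, on which the strategies above rely, can be illustrated with tools, a package that ships with R. This is only a sketch of the mechanics, chosen because it runs without any extra installation; the tutorial applies the same library() and detach() pattern to network, sna and igraph.

```r
library(tools)                                     # load and attach
attached_before <- "package:tools" %in% search()   # TRUE
ext_bare <- file_ext("classroom-edges.csv")        # bare name works: "csv"

detach(package:tools)                              # remove from the search path
attached_after <- "package:tools" %in% search()    # FALSE

# The namespace stays loaded, so :: still works without attaching
ext_qualified <- tools::file_ext("classroom-edges.csv")   # "csv"
```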

2 Importing Relational Data

Network data is usually stored as

  • Adjacency matrices
  • Edge lists
  • Edge and vertex data frames
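
To make the three formats concrete, here is a toy three-node directed network expressed in each of them using base R only. The node names A, B, C and the female values are made up for illustration.

```r
# Adjacency matrix: rows are senders, columns are receivers
ids <- c("A", "B", "C")
adj <- matrix(0, nrow = 3, ncol = 3, dimnames = list(ids, ids))
adj["A", "B"] <- 1
adj["B", "C"] <- 1
adj["C", "A"] <- 1

# Edge list: one row per tie, recovered from the non-zero cells
idx <- which(adj != 0, arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# Vertex data frame: one row per node, attributes in further columns
vertices <- data.frame(name = ids,
                       female = c(TRUE, FALSE, TRUE),
                       stringsAsFactors = FALSE)

adj
edges
vertices
```

The adjacency matrix needs a cell for every pair of nodes, whereas the edge list only stores the ties that exist, which is why edge lists are more compact for sparse networks.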

2.1 Network

library(network)

'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
library(sna)
Loading required package: statnet.common

Attaching package: 'statnet.common'
The following objects are masked from 'package:base':

    attr, order
sna: Tools for Social Network Analysis
Version 2.6 created on 2020-10-5.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Read an adjacency matrix (read.csv() returns it as a data frame). By default R does not permit column names that start with a digit, so it prefixes them with X; numbers are fine as row names.

relations <- read.csv("classroom-adjacency.csv", header=TRUE, row.names=1, stringsAsFactors=FALSE)
relations[1:10,1:10] # look at a subgraph using bracket notation
     X1003 X1006 X1009 X1012 X1015 X1018 X1021 X1024 X1027 X1030
1003     0     0     0     0     0     1     0     0     0     0
1006     0     0     1     0     0     0     0     0     0     0
1009     0     1     0     0     0     0     0     0     0     0
1012     0     0     0     0     1     0     1     0     0     0
1015     0     0     0     1     0     0     1     0     0     0
1018     0     0     0     0     0     0     0     0     0     0
1021     0     1     1     1     0     0     0     0     0     0
1024     0     0     0     0     0     0     0     0     0     0
1027     0     1     1     0     0     0     0     0     0     0
1030     0     0     1     0     0     0     0     0     0     0
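
The X prefix comes from the check.names argument of read.csv(), which is TRUE by default. As an alternative to renaming the columns afterwards, the names can be kept as they appear in the file. A minimal sketch on a made-up two-node CSV supplied as text, so that it runs without the data files:

```r
# A tiny adjacency matrix in CSV form; the first header field is empty
csv_text <- ",1003,1006\n1003,0,1\n1006,1,0"

# Default: column names are made syntactically valid, hence the X prefix
with_prefix <- read.csv(text = csv_text, row.names = 1)
colnames(with_prefix)

# check.names = FALSE keeps the column names exactly as they are in the file
as_in_file <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
colnames(as_in_file)
```

The tutorial itself keeps the default and fixes the column names after converting to a matrix; either approach leads to matching row and column names.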

We might want to store it as a matrix. Most routines accept either format, but some require one or the other, depending on how they were written. The isSymmetric() function from base R is one example that requires a matrix rather than a data frame.

relations <- as.matrix(relations) # convert to matrix format
isSymmetric(relations)
[1] FALSE

To make the row and column names identical, we can overwrite the column names:

colnames(relations) <- rownames(relations)

Read in some vertex attribute data. It is fine to leave it as a data frame; in fact, converting it to a matrix would create problems, because all elements of a matrix must be of the same type, whereas a data frame can hold columns of different types.

nodeInfo <- read.csv("classroom-nodes.csv",header=TRUE,stringsAsFactors=FALSE)
head(nodeInfo)
  name female isei08_m isei08_f
1 1003  FALSE       NA    25.71
2 1006   TRUE    14.64    33.76
3 1009   TRUE    28.48    37.22
4 1012   TRUE    26.64    25.23
5 1015   TRUE    21.24       NA
6 1018  FALSE    23.47    24.45
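
The problem with converting mixed-type data to a matrix is easy to see on a small example. The sketch below uses the first two rows of the node data shown above (leaving out isei08_m); the object names nodes and as_mat are illustrative.

```r
# Two rows of the node data: a character, a logical and a numeric column
nodes <- data.frame(name = c("1003", "1006"),
                    female = c(FALSE, TRUE),
                    isei08_f = c(25.71, 33.76),
                    stringsAsFactors = FALSE)

# Each column of a data frame keeps its own type
col_classes <- sapply(nodes, class)
col_classes

# A matrix can hold only one type, so everything is coerced to character
as_mat <- as.matrix(nodes)
typeof(as_mat)
as_mat
```

After the conversion, numeric operations on the status scores would no longer work without converting the values back.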

We could also convert it to a network object. This would be useful for (1) storing all data in the same file, (2) a more compact format for large, sparse matrices, or (3) using the data in later analyses where the routines require network objects (e.g. ERGMs).

nrelations <- network(relations)
summary(nrelations) # Get an overall summary
Network attributes:
  vertices = 26
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 88 
   missing edges = 0 
   non-missing edges = 88 
 density = 0.1353846 

Vertex attributes:
  vertex.names:
   character valued attribute
   26 valid vertex names

No edge attributes

Network edgelist matrix:
      [,1] [,2]
 [1,]   20    1
 [2,]   24    1
 [3,]    3    2
 [4,]    7    2
 [5,]    9    2
 [6,]   12    2
 [7,]   14    2
 [8,]   21    2
 [9,]   26    2
[10,]    2    3
[11,]    7    3
[12,]    9    3
[13,]   10    3
[14,]   19    3
[15,]   25    3
[16,]   26    3
[17,]    5    4
[18,]    7    4
[19,]    4    5
[20,]    1    6
[21,]   13    6
[22,]   14    6
[23,]   15    6
[24,]   16    6
[25,]   17    6
[26,]   18    6
[27,]   20    6
[28,]   24    6
[29,]    4    7
[30,]    5    7
[31,]   22    9
[32,]   25    9
[33,]   12   11
[34,]   14   13
[35,]   18   13
[36,]    2   14
[37,]    6   14
[38,]   12   14
[39,]   13   14
[40,]   15   14
[41,]   16   14
[42,]   17   14
[43,]   20   14
[44,]   21   14
[45,]   16   15
[46,]   17   15
[47,]    6   16
[48,]    9   16
[49,]   17   16
[50,]   18   16
[51,]    1   17
[52,]    6   17
[53,]    9   17
[54,]   11   17
[55,]   14   17
[56,]   16   17
[57,]   18   17
[58,]   20   17
[59,]   23   17
[60,]   24   17
[61,]   16   18
[62,]   17   18
[63,]    2   19
[64,]    3   19
[65,]    7   19
[66,]    9   19
[67,]   25   19
[68,]   26   19
[69,]    6   20
[70,]   22   21
[71,]   23   21
[72,]   21   22
[73,]   23   22
[74,]   21   23
[75,]   22   23
[76,]    1   24
[77,]    6   24
[78,]   18   24
[79,]    4   25
[80,]   22   25
[81,]    2   26
[82,]    3   26
[83,]    5   26
[84,]    7   26
[85,]    9   26
[86,]   10   26
[87,]   19   26
[88,]   25   26
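
The density reported in the summary can be reproduced by hand. For a directed network without loops, density is the number of edges divided by the number of ordered pairs of distinct vertices, n(n-1). A minimal check using the counts from the summary above:

```r
n_vertices <- 26   # "vertices = 26" in the summary
n_edges <- 88      # "total edges = 88" in the summary

# Possible directed ties among 26 vertices, excluding loops: 26 * 25 = 650
dens <- n_edges / (n_vertices * (n_vertices - 1))
dens   # 88 / 650, approximately 0.1354
```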

Here the row and column names have been carried through because they were attached to the matrix. We can look at them using the vertex attribute functions of network and the shorthand %v%:

list.vertex.attributes(nrelations)
[1] "na"           "vertex.names"
nrelations%v%"vertex.names"
 [1] "1003" "1006" "1009" "1012" "1015" "1018" "1021" "1024" "1027" "1030"
[11] "1033" "1036" "1039" "1042" "1045" "1048" "1051" "1054" "1057" "1060"
[21] "1063" "1066" "1069" "1072" "1075" "1078"

If we wanted to set the names back to the original numbers, we could use these methods as well:

nrelations%v%"vertex.names" <- nodeInfo$name
nrelations%v%"vertex.names"
 [1] 1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045
[16] 1048 1051 1054 1057 1060 1063 1066 1069 1072 1075 1078

2.1.1 Now with edgelists

Reading in an edgelist and converting it to a network object is also straightforward. Edgelists are useful because they are a smaller, more concise data structure for larger, sparser networks that we typically deal with in social network analysis.

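In recent releases of network, network() automatically reads any additional columns of an edgelist data frame and stores them as edge attributes named after the columns (here liking). If you’re using an older version, you might need to pass two more arguments to network(): ignore.eval=FALSE and names.eval with the name under which to store the values (e.g. names.eval="Weight").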

edgelist <- read.csv("classroom-edges.csv", header=TRUE, stringsAsFactors=FALSE)
head(edgelist)
  from   to liking
1 1003 1018      3
2 1003 1051      3
3 1003 1072      4
4 1006 1009      5
5 1006 1042      1
6 1006 1057      4
edgeNet<-network(edgelist,matrix.type="edgelist")
edgeNet
 Network attributes:
  vertices = 25 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 88 
    missing edges= 0 
    non-missing edges= 88 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names: 
    liking 

Converting back to an adjacency matrix is simple:

edgeNet[,] ##what's missing?
     1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
1006    0    0    1    0    0    0    0    0    0    0    0    0    1    0    0
1009    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
1012    0    0    0    0    1    0    1    0    0    0    0    0    0    0    0
1015    0    0    0    1    0    0    1    0    0    0    0    0    0    0    0
1018    0    0    0    0    0    0    0    0    0    0    0    0    1    0    1
1021    0    1    1    1    0    0    0    0    0    0    0    0    0    0    0
1027    0    1    1    0    0    0    0    0    0    0    0    0    0    0    1
1030    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
1033    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1036    0    1    0    0    0    0    0    0    0    1    0    0    1    0    0
1039    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1042    0    1    0    0    0    1    0    0    0    0    0    1    0    0    0
1045    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1048    0    0    0    0    0    1    0    0    0    0    0    0    1    1    0
1051    0    0    0    0    0    1    0    0    0    0    0    0    1    1    1
1054    0    0    0    0    0    1    0    0    0    0    0    1    0    0    1
1057    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
1060    1    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1063    0    1    0    0    0    0    0    0    0    0    0    0    1    0    0
1066    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
1069    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1072    1    0    0    0    0    1    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    1    0    0    0    0    0    0    0
1078    0    1    1    0    0    0    0    0    0    0    0    0    0    0    0
     1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003    1    0    0    0    0    0    0    1    0    0
1006    0    0    1    0    0    0    0    0    0    1
1009    0    0    1    0    0    0    0    0    0    1
1012    0    0    0    0    0    0    0    0    1    0
1015    0    0    0    0    0    0    0    0    0    1
1018    1    0    0    1    0    0    0    1    0    0
1021    0    0    1    0    0    0    0    0    0    1
1027    1    0    1    0    0    0    0    0    0    1
1030    0    0    0    0    0    0    0    0    0    1
1033    1    0    0    0    0    0    0    0    0    0
1036    0    0    0    0    0    0    0    0    0    0
1039    0    0    0    0    0    0    0    0    0    0
1042    1    0    0    0    0    0    0    0    0    0
1045    0    0    0    0    0    0    0    0    0    0
1048    1    1    0    0    0    0    0    0    0    0
1051    0    1    0    0    0    0    0    0    0    0
1054    1    0    0    0    0    0    0    1    0    0
1057    0    0    0    0    0    0    0    0    0    1
1060    1    0    0    0    0    0    0    0    0    0
1063    0    0    0    0    0    1    1    0    0    0
1066    0    0    0    0    1    0    1    0    1    0
1069    1    0    0    0    1    1    0    0    0    0
1072    1    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    0    0    1
1078    0    0    1    0    0    0    0    0    0    0

In network, edges and edge weights are stored separately. This can be confusing, but there are reasons for it: (1) you might want several types of weights associated with a given edge, or (2) you might want a weight recorded where there isn’t an edge at all.

To see a particular edge attribute, use the shorthand %e%; to get the full adjacency matrix with weights, use the function as.sociomatrix.sna(). Note that network() simply named the weights after the column name from the CSV file.

list.edge.attributes(edgeNet)
[1] "liking" "na"    
edgeNet %e% "liking"
 [1] 3 3 4 5 1 4 1 2 3 5 5 4 3 2 1 2 1 4 3 5 2 2 4 3 4 4 4 3 2 5 3 4 4 3 1 4 2 2
[39] 4 3 2 3 2 1 2 1 1 4 5 2 2 4 3 4 1 1 3 3 2 1 4 4 4 2 5 4 5 3 3 5 4 3 1 5 4 3
[77] 2 3 3 3 2 5 3 1 5 1 1 4
as.sociomatrix.sna(edgeNet, "liking")
     1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003    0    0    0    0    0    3    0    0    0    0    0    0    0    0    0
1006    0    0    5    0    0    0    0    0    0    0    0    0    1    0    0
1009    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0
1012    0    0    0    0    5    0    4    0    0    0    0    0    0    0    0
1015    0    0    0    2    0    0    1    0    0    0    0    0    0    0    0
1018    0    0    0    0    0    0    0    0    0    0    0    0    1    0    4
1021    0    2    4    3    0    0    0    0    0    0    0    0    0    0    0
1027    0    4    3    0    0    0    0    0    0    0    0    0    0    0    2
1030    0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
1033    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1036    0    4    0    0    0    0    0    0    0    2    0    0    2    0    0
1039    0    0    0    0    0    4    0    0    0    0    0    0    3    0    0
1042    0    2    0    0    0    3    0    0    0    0    0    2    0    0    0
1045    0    0    0    0    0    2    0    0    0    0    0    0    1    0    0
1048    0    0    0    0    0    1    0    0    0    0    0    0    4    5    0
1051    0    0    0    0    0    4    0    0    0    0    0    0    3    4    1
1054    0    0    0    0    0    3    0    0    0    0    0    3    0    0    2
1057    0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
1060    2    0    0    0    0    5    0    0    0    0    0    0    4    0    0
1063    0    3    0    0    0    0    0    0    0    0    0    0    3    0    0
1066    0    0    0    0    0    0    0    3    0    0    0    0    0    0    0
1069    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1072    3    0    0    0    0    3    0    0    0    0    0    0    0    0    0
1075    0    0    5    0    0    0    0    3    0    0    0    0    0    0    0
1078    0    1    1    0    0    0    0    0    0    0    0    0    0    0    0
     1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003    3    0    0    0    0    0    0    4    0    0
1006    0    0    4    0    0    0    0    0    0    1
1009    0    0    3    0    0    0    0    0    0    5
1012    0    0    0    0    0    0    0    0    3    0
1015    0    0    0    0    0    0    0    0    0    2
1018    3    0    0    5    0    0    0    2    0    0
1021    0    0    4    0    0    0    0    0    0    4
1027    5    0    3    0    0    0    0    0    0    4
1030    0    0    0    0    0    0    0    0    0    3
1033    1    0    0    0    0    0    0    0    0    0
1036    0    0    0    0    0    0    0    0    0    0
1039    0    0    0    0    0    0    0    0    0    0
1042    1    0    0    0    0    0    0    0    0    0
1045    0    0    0    0    0    0    0    0    0    0
1048    2    2    0    0    0    0    0    0    0    0
1051    0    1    0    0    0    0    0    0    0    0
1054    1    0    0    0    0    0    0    4    0    0
1057    0    0    0    0    0    0    0    0    0    4
1060    5    0    0    0    0    0    0    0    0    0
1063    0    0    0    0    0    5    4    0    0    0
1066    0    0    0    0    1    0    5    0    4    0
1069    3    0    0    0    2    3    0    0    0    0
1072    2    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    0    0    5
1078    0    0    4    0    0    0    0    0    0    0
# Detaching the packages
detach(package:sna)
detach(package:network)
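
To make the idea of a valued sociomatrix concrete, the following base R sketch builds one by hand from the first four rows of the edge list shown earlier. It illustrates the data structure only and is not how as.sociomatrix.sna() is implemented; the object names edges, ids and valued are illustrative.

```r
# First four rows of classroom-edges.csv, with IDs stored as character
edges <- data.frame(from = c("1003", "1003", "1003", "1006"),
                    to = c("1018", "1051", "1072", "1009"),
                    liking = c(3, 3, 4, 5),
                    stringsAsFactors = FALSE)

# All node IDs that appear in the edge list
ids <- sort(unique(c(edges$from, edges$to)))

# Start from an all-zero matrix with IDs as row and column names
valued <- matrix(0, nrow = length(ids), ncol = length(ids),
                 dimnames = list(ids, ids))

# A two-column character matrix indexes cells by row name and column name
valued[cbind(edges$from, edges$to)] <- edges$liking
valued
```

Only nodes that appear in at least one edge end up in ids, which is the same reason the isolate is missing from edgeNet above.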

2.2 Igraph

library(igraph)

Attaching package: 'igraph'
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union

Small igraph objects can be created using make_graph(). You can create a network from data using one of the functions from the table below. The table points to functions for:

  • creating igraph objects from other R objects
  • transforming igraph objects into other R objects

Object            Object -> Igraph              Igraph -> Object
Adjacency matrix  graph_from_adjacency_matrix   as_adjacency_matrix
Edge list         graph_from_edgelist           as_edgelist
Data frames       graph_from_data_frame         as_data_frame

2.2.1 Simple graphs with make_graph()

Function make_graph() can quickly create small networks. Relational information can be supplied in two ways:

  1. As a vector with an even number of node IDs. Pairs of adjacent IDs are interpreted as edges:

    make_graph( c(1,2, 2,3, 3,4), directed=FALSE)
    IGRAPH 179053b U--- 4 3 -- 
    + edges from 179053b:
    [1] 1--2 2--3 3--4
  2. Using a symbolic formula in which

    • -- is an undirected tie (any number of dashes can be used, e.g. A - B)
    • --+ is a directed tie (+ is the arrow’s head)
    • : refers to node sets (e.g. A -- B:C creates ties A -- B and A -- C)
    • A network is either directed or undirected; it is not possible to mix directed and undirected ties.
    • Between any two given nodes there can be many relations.
    g1 <- make_graph(~ A - B, B - C:D:E)
    g2 <- make_graph(~ A --+ B, B +-- C, A --+ D:E, B --+ A)
    g2
    IGRAPH b6fd761 DN-- 5 5 -- 
    + attr: name (v/c)
    + edges from b6fd761 (vertex names):
    [1] A->B A->D A->E B->A C->B

The print-out of g2 exemplifies how igraph summarizes igraph objects:

  • First line includes
    • Class of the object (IGRAPH)
    • An ID of the object, not of particular interest (see also ?igraph::graph_id)
    • A set of four slots for letter codes indicating, in order:
      • U or D if the network is Undirected or Directed
      • N if the nodes have names
      • W if the network is weighted
      • B if the network is bipartite
    • Number of nodes
    • Number of edges
  • Starting with + attr: a list of present attributes, each of the form name_of_the_attribute (x/y), where
    • x informs about the type of the attribute: v for vertex, e for edge or g for graph attribute
    • y informs about the mode of the attribute: n for numeric, c for character, l for logical, or x for extended (e.g. lists)
  • Starting with + edges: a list of (some of) the edges of the network

2.2.2 Networks from adjacency matrices

Igraph objects can be created from adjacency matrices with graph_from_adjacency_matrix():

graph_from_adjacency_matrix(relations, mode="directed")
IGRAPH 1c96e37 DN-- 26 88 -- 
+ attr: name (v/c)
+ edges from 1c96e37 (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges

Important arguments:

  • mode – how to interpret the matrix
    • "directed", "undirected": directed/undirected network
    • "max", "min", "plus": undirected network in which the number of \(i\)-\(j\) relations is determined from both entries of the matrix, e.g., max( m[i,j], m[j,i] ) for "max"
    • "lower", "upper": whether to read only the lower/upper triangle of the matrix
  • weighted – if TRUE non-zero values of the matrix are stored in edge attribute weight
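
What the symmetrizing modes compute can be mimicked in base R on a small asymmetric matrix. This is a sketch of the arithmetic described above, not a call into igraph; the matrix m and the object names are made up for illustration.

```r
ids <- c("A", "B", "C")
m <- matrix(c(0, 2, 0,
              1, 0, 0,
              0, 3, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Combine each cell with its mirror image m[j, i]
sym_max  <- pmax(m, t(m))   # as in mode = "max":  max(m[i,j], m[j,i])
sym_min  <- pmin(m, t(m))   # as in mode = "min":  min(m[i,j], m[j,i])
sym_plus <- m + t(m)        # as in mode = "plus": m[i,j] + m[j,i]

# For the pair A-B, m["A","B"] is 2 and m["B","A"] is 1
c(max = sym_max["A", "B"], min = sym_min["A", "B"], plus = sym_plus["A", "B"])
```

For the pair B-C, where only one direction is non-zero, the "min" rule gives 0, so no tie between B and C would be created under that rule.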

2.2.3 Networks from edge lists

Function graph_from_edgelist() expects a two-column matrix:

edgelist_matrix <- as.matrix(edgelist[,1:2])
head(edgelist_matrix)
     from   to
[1,] 1003 1018
[2,] 1003 1051
[3,] 1003 1072
[4,] 1006 1009
[5,] 1006 1042
[6,] 1006 1057

Now create the object:

graph_from_edgelist(edgelist_matrix, directed=TRUE)
IGRAPH 141ddbf D--- 1078 88 -- 
+ edges from 141ddbf:
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
[49] 1048->1045 1048->1051 1048->1054 1051->1018 1051->1042 1051->1045
+ ... omitted several edges

Note the number of nodes! If the edgelist matrix is numeric, the function treats the values as vertex indices starting from 1, so the result contains vertices 1 through 1078, most of which are isolates. In this case we have to convert the matrix to character mode before passing it to graph_from_edgelist():

edgelist_matrix_ch <- as.character(edgelist_matrix)
dim(edgelist_matrix_ch) <- dim(edgelist_matrix)
graph_from_edgelist(edgelist_matrix_ch, directed=TRUE)
IGRAPH 16cdee7 DN-- 25 88 -- 
+ attr: name (v/c)
+ edges from 16cdee7 (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges
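
The effect of the ID type can be inspected in base R before any graph is built. The sketch below uses three rows of the edge list; storage.mode<- is shown as an alternative way of converting a matrix to character while keeping its dimensions, and the object names el and el_ch are illustrative.

```r
# Three rows of the edge list as a numeric two-column matrix
el <- cbind(from = c(1003, 1003, 1006),
            to   = c(1018, 1051, 1009))

# If IDs are read as vertex indices, the largest ID sets the vertex count
largest_id <- max(el)
largest_id

# Number of distinct IDs actually present
n_distinct <- length(unique(as.vector(el)))
n_distinct

# Convert to character in place; dimensions and column names are kept
el_ch <- el
storage.mode(el_ch) <- "character"
el_ch
```

With character IDs, each distinct value is treated as a vertex name, so the number of vertices equals the number of distinct IDs.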

This also shows the disadvantage of solely relying on edgelist representation as we are missing one boy who is an isolate.
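
Isolates can be identified by comparing the node list with the IDs that occur in the edge list. A minimal sketch in base R on a small subset of the classroom data (five nodes and three edges taken from the listings above); with the full data, the same comparison would use the name column of the node file and the from and to columns of the edge file.

```r
# Subset of the node IDs, including 1024, which has no ties
nodes <- data.frame(name = c("1003", "1006", "1009", "1018", "1024"),
                    stringsAsFactors = FALSE)

# Subset of the edges among these nodes
edges <- data.frame(from = c("1003", "1006", "1009"),
                    to   = c("1018", "1009", "1006"),
                    stringsAsFactors = FALSE)

# Nodes that never appear as sender or receiver
isolates <- setdiff(nodes$name, union(edges$from, edges$to))
isolates
```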

2.2.4 Networks from data frames

Igraph objects can be created from data frames with data on edges and, optionally, on vertices with graph_from_data_frame():

classroom_kids <- read.csv("classroom-nodes.csv", header=TRUE, colClasses=c(name = "character"))
head(classroom_kids)
  name female isei08_m isei08_f
1 1003  FALSE       NA    25.71
2 1006   TRUE    14.64    33.76
3 1009   TRUE    28.48    37.22
4 1012   TRUE    26.64    25.23
5 1015   TRUE    21.24       NA
6 1018  FALSE    23.47    24.45
classroom_play <- read.csv("classroom-edges.csv", header=TRUE, colClasses = c(from="character", to="character"))
head(classroom_play)
  from   to liking
1 1003 1018      3
2 1003 1051      3
3 1003 1072      4
4 1006 1009      5
5 1006 1042      1
6 1006 1057      4
classroom <- graph_from_data_frame(classroom_play, vertices=classroom_kids,
                                   directed=TRUE)
classroom
IGRAPH a0b410a DN-- 26 88 -- 
+ attr: name (v/c), female (v/l), isei08_m (v/n), isei08_f (v/n),
| liking (e/n)
+ edges from a0b410a (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
+ ... omitted several edges
  • The first two columns of classroom_play are vertex IDs; additional columns are interpreted as edge attributes.
  • The first column of classroom_kids is the vertex ID; additional columns are interpreted as vertex attributes.
  • All vertex IDs present in the edge data frame (classroom_play) must be present in the node data frame (classroom_kids).
detach(package:igraph)

2.3 Tidygraph

Package tidygraph uses igraph internally to store network data but provides a “tidy” interface for data manipulation: network data are presented as two interconnected data frames, (1) nodes and (2) edges. This is very similar to the data structure accepted by igraph::graph_from_data_frame() demonstrated above.

library(tidygraph)

Attaching package: 'tidygraph'
The following object is masked from 'package:stats':

    filter

Objects can be created with:

  1. tbl_graph() from two data frames, similarly to igraph::graph_from_data_frame()

    tg_classroom <- tbl_graph(nodes = classroom_kids, edges = classroom_play, 
                              directed = TRUE)
    tg_classroom
    # A tbl_graph: 26 nodes and 88 edges
    #
    # A directed simple graph with 2 components
    #
    # Node Data: 26 x 4 (active)
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1003  FALSE      NA       25.7
    2 1006  TRUE       14.6     33.8
    3 1009  TRUE       28.5     37.2
    4 1012  TRUE       26.6     25.2
    5 1015  TRUE       21.2     NA  
    6 1018  FALSE      23.5     24.4
    # … with 20 more rows
    #
    # Edge Data: 88 x 3
       from    to liking
      <int> <int>  <int>
    1     1     6      3
    2     1    17      3
    3     1    24      4
    # … with 85 more rows
  2. as_tbl_graph(), which accepts a variety of objects: adjacency matrices, igraph, network, ggraph and some more (see the documentation)

    # From igraph object created earlier
    tg_classroom2 <- as_tbl_graph(classroom)
    tg_classroom2
    # A tbl_graph: 26 nodes and 88 edges
    #
    # A directed simple graph with 2 components
    #
    # Node Data: 26 x 4 (active)
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1003  FALSE      NA       25.7
    2 1006  TRUE       14.6     33.8
    3 1009  TRUE       28.5     37.2
    4 1012  TRUE       26.6     25.2
    5 1015  TRUE       21.2     NA  
    6 1018  FALSE      23.5     24.4
    # … with 20 more rows
    #
    # Edge Data: 88 x 3
       from    to liking
      <int> <int>  <int>
    1     1     6      3
    2     1    17      3
    3     1    24      4
    # … with 85 more rows
    # From network object created earlier
    tg_net <- as_tbl_graph(edgeNet)
    tg_net
    # A tbl_graph: 25 nodes and 88 edges
    #
    # A directed simple graph with 1 component
    #
    # Node Data: 25 x 1 (active)
      na   
      <lgl>
    1 FALSE
    2 FALSE
    3 FALSE
    4 FALSE
    5 FALSE
    6 FALSE
    # … with 19 more rows
    #
    # Edge Data: 88 x 4
       from    to liking na   
      <int> <int>  <int> <lgl>
    1     1     6      3 FALSE
    2     1    16      3 FALSE
    3     1    23      4 FALSE
    # … with 85 more rows
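
In the printouts above, the from and to columns of the edge data hold row positions in the node data rather than the original IDs. The relationship between the two can be sketched in base R with match(). This illustrates the indexing only and is not tidygraph's implementation; the data are the first six nodes and two edges from the listings above.

```r
# First six nodes of the classroom data, in their original order
nodes <- data.frame(name = c("1003", "1006", "1009", "1012", "1015", "1018"),
                    stringsAsFactors = FALSE)

# Two edges given by node name
edges <- data.frame(from = c("1003", "1006"),
                    to   = c("1018", "1009"),
                    liking = c(3, 5),
                    stringsAsFactors = FALSE)

# Replace names by the row position of each node in the node data
edges_idx <- data.frame(from = match(edges$from, nodes$name),
                        to   = match(edges$to, nodes$name),
                        liking = edges$liking)
edges_idx
```

The first edge, 1003 to 1018, becomes 1 to 6, which agrees with the first row of the edge data printed for tg_classroom.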

2.3.1 Working with attributes

In tidygraph you can use dplyr (Wickham et al. 2021) verbs such as mutate(), select() etc. once you activate() either the nodes or edges data frame. Here are some examples.

Calculate the social status of a kid’s family as the minimum of the social statuses of the mother and the father:

tg_classroom %>%
  activate(nodes) %>%
  mutate(
    status = pmin(isei08_m, isei08_f, na.rm=TRUE)
  )
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Node Data: 26 x 5 (active)
  name  female isei08_m isei08_f status
  <chr> <lgl>     <dbl>    <dbl>  <dbl>
1 1003  FALSE      NA       25.7   25.7
2 1006  TRUE       14.6     33.8   14.6
3 1009  TRUE       28.5     37.2   28.5
4 1012  TRUE       26.6     25.2   25.2
5 1015  TRUE       21.2     NA     21.2
6 1018  FALSE      23.5     24.4   23.5
# … with 20 more rows
#
# Edge Data: 88 x 3
   from    to liking
  <int> <int>  <int>
1     1     6      3
2     1    17      3
3     1    24      4
# … with 85 more rows
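
The role of na.rm=TRUE in pmin() is easiest to see outside the pipeline. The sketch below uses the six status values printed above as plain vectors; the names status and status_strict are illustrative.

```r
# Status scores of mother and father for the first six kids
isei08_m <- c(NA, 14.64, 28.48, 26.64, 21.24, 23.47)
isei08_f <- c(25.71, 33.76, 37.22, 25.23, NA, 24.45)

# Element-wise minimum; with na.rm = TRUE a missing value is ignored
# as long as the other parent's score is available
status <- pmin(isei08_m, isei08_f, na.rm = TRUE)
status

# Without na.rm = TRUE, any missing score makes the result missing
status_strict <- pmin(isei08_m, isei08_f)
status_strict
```

The first and fifth kids each have one missing parental score, so they receive a status only when na.rm=TRUE is used.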

As in dplyr, you can use the pipe operator %>% to chain multiple data transformations. Here we add the node attribute status first, then the edge attribute like5:

tg_classroom %>%
  activate(nodes) %>%
  mutate(
    status = pmin(isei08_m, isei08_f, na.rm=TRUE)
  ) %>%
  activate(edges) %>%
  mutate(
    like5 = liking == 5  # TRUE if liking is 5
  )
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Edge Data: 88 x 4 (active)
   from    to liking like5
  <int> <int>  <int> <lgl>
1     1     6      3 FALSE
2     1    17      3 FALSE
3     1    24      4 FALSE
4     2     3      5 TRUE 
5     2    14      1 FALSE
6     2    19      4 FALSE
# … with 82 more rows
#
# Node Data: 26 x 5
  name  female isei08_m isei08_f status
  <chr> <lgl>     <dbl>    <dbl>  <dbl>
1 1003  FALSE      NA       25.7   25.7
2 1006  TRUE       14.6     33.8   14.6
3 1009  TRUE       28.5     37.2   28.5
# … with 23 more rows

You can refer to node attributes with .N() when computing on edges data frame and refer to edge attribute