1 Introduction to this workshop/tutorial

This workshop and tutorial provide an overview of R packages for network analysis. This online tutorial is also designed for self-study, with example code and self-contained data. We cover the following packages:

  • Statnet suite (Krivitsky et al. 2003-2020) including:
    • network (Butts 2008, 2021) – storage and manipulation of network data
    • sna (Butts 2020) – descriptive statistics and graphics for exploratory network analysis
  • igraph (Csardi and Nepusz 2006)
  • tidygraph (Pedersen 2020) and ggraph (Pedersen 2021)
  • graph (Gentleman et al. 2020) and Rgraphviz (Hansen et al. 2021)

and other more specialized packages that provide tools for e.g. particular SNA techniques or visualization, but rely on one of the above for network data storage and manipulation.

1.1 Prerequisites

This workshop assumes basic familiarity with R, experience with network concepts, terminology and data, and familiarity with the general framework for statistical modeling and inference. While previous experience with ERGMs is not required, some of the topics covered here may be difficult to understand without a strong background in linear and generalized linear models in statistics.

1.2 Software installation

Minimally, you will need to install the latest version of R (available here) and the packages listed below. The workshops are conducted using the free version of RStudio (available here).

The packages required for the workshop can be installed with the following expression:

install.packages(c("network", "sna", "igraph", "tidygraph", "ggraph", 
                   "intergraph", "remotes"))

Package remotes (Hester et al. 2021) is needed to install the remaining two packages.

remotes::install_bioc("graph")

More information about installing other packages from the Statnet suite can be found on the Statnet website. In particular, you can install the whole Statnet suite (this is not required for this tutorial) with:

install.packages('statnet')

1.3 Necessary data files

  • Classroom data is a network within a school class of 26 9-year-olds, coming from a larger study by Dolata (2014). The name generator question was “With whom do you like to play with?”. The data are available in the following files:
    • classroom-adjacency.csv with the adjacency matrix
    • classroom-edges.csv with an edge list and one edge attribute: liking – numeric, the extent to which ego likes alter on a scale of 1-5. This attribute has been randomly generated for illustrative purposes.
    • classroom-nodes.csv with node attributes: female – logical, gender (TRUE for girls); isei08_m, isei08_f – numeric, social status score of, respectively, mother and father
  • Several other datasets contained in the file introToSNAinR.Rdata.

Download all the files as a ZIP file intro-sna-data.zip.

The code from this tutorial is available as a script too.

1.4 Working Directory

Before we go further, make sure R’s Working Directory (WD) is set to the folder where you extracted the data files from the ZIP archive for the workshop. If you’ve not set the working directory, you must do so now by one of:

  1. (Recommended) Create an RStudio Project dedicated to the workshop and unpack the data files there.

  2. Use the RStudio “Files” tab to navigate to the directory with the workshop files, then click “More” and “Set As Working Directory”.

  3. You can use setwd() to change the working directory as well, like so:

    setwd("path/to/folder/with/workshop/files")

Verify that the WD is set correctly by either:

  1. Looking at the top of the Console window in RStudio, or

  2. Calling getwd():

    getwd() # Check what directory you're in
    [1] "/home/mbojan/Teaching/workshop-intro-sna-tools"
    list.files() # Check what's in the working directory
     [1] "bibliography.bib"               "captab.html"                   
     [3] "captab.Rmd"                     "classroom-adjacency.csv"       
     [5] "classroom-edges.csv"            "classroom-nodes.csv"           
     [7] "common"                         "edgeList.csv"                  
     [9] "index.html"                     "intro_tutorial.html"           
    [11] "intro_tutorial.R"               "intro_tutorial.Rmd"            
    [13] "intro-sna-data.zip"             "introToSNAinR.Rdata"           
    [15] "Makefile"                       "practicals_files"              
    [17] "practicals-solved_files"        "practicals-solved.html"        
    [19] "practicals.html"                "practicals.Rmd"                
    [21] "README.md"                      "relationalData.csv"            
    [23] "rstudio-wd.png"                 "vertexAttributes.csv"          
    [25] "workshop-intro-sna-tools.Rproj"
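Before moving on, it can help to confirm that the data files are visible from the working directory. The following is a minimal sketch in base R; the file names are those listed in the section on necessary data files, while `required_files` and `missing_files` are names introduced here only for illustration.

```r
# File names as listed in the "Necessary data files" section
required_files <- c(
  "classroom-adjacency.csv",
  "classroom-edges.csv",
  "classroom-nodes.csv",
  "introToSNAinR.Rdata"
)

# Keep only the files that cannot be found in the working directory
missing_files <- required_files[!file.exists(required_files)]

if (length(missing_files) > 0) {
  warning("Not found in ", getwd(), ": ",
          paste(missing_files, collapse = ", "))
} else {
  message("All data files found in ", getwd())
}
```

If a warning is raised, the working directory most likely points somewhere other than the folder with the extracted ZIP archive.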

1.5 Mitigating function name conflicts

Some of the packages we are going to demonstrate provide functions with the same names as functions in other packages. One example is get.vertex.attribute(), which is defined in both network and igraph. Hence, if we load both packages with library(), it matters which package is loaded last, as its version of the function will be used when we write get.vertex.attribute.

In particular, note the following function name clashes:

  • Between igraph and network:

     [1] "%c%"                    "%s%"                    "add.edges"             
     [4] "add.vertices"           "delete.edges"           "delete.vertices"       
     [7] "get.edge.attribute"     "get.edges"              "get.vertex.attribute"  
    [10] "is.bipartite"           "is.directed"            "list.edge.attributes"  
    [13] "list.vertex.attributes" "set.edge.attribute"     "set.vertex.attribute"  
  • Between igraph and sna:

     [1] "betweenness"  "bonpow"       "closeness"    "components"   "degree"      
     [6] "dyad.census"  "evcent"       "hierarchy"    "is.connected" "neighborhood"
    [11] "triad.census"
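Lists like the ones above can be produced by comparing the names exported by two packages. The sketch below is one possible way of doing so and not necessarily how the lists in this tutorial were generated; `name_clashes()` is a hypothetical helper defined here. The exact result may differ from the lists above depending on the installed package versions.

```r
# Hypothetical helper: names exported by both packages.
# getNamespaceExports() loads a namespace without attaching it,
# so no masking takes place while we compare.
name_clashes <- function(pkg1, pkg2) {
  sort(intersect(getNamespaceExports(pkg1), getNamespaceExports(pkg2)))
}

# Run the comparison only if both packages are installed
if (requireNamespace("igraph", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  print(name_clashes("igraph", "network"))
}
```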

There are the following strategies to make sure possible conflicts are as painless as possible:

  1. Load the packages but do not attach them and always use ::
  2. Load and attach the packages and use :: for disambiguation.
  3. Load the packages and selectively attach and detach them in order to always have only one of them attached.

In this tutorial we had to deal with these conflicts as well. We have opted for strategy (3) because:

  • Code blocks illustrate working with a particular package without worrying about the conflicts.
  • The code examples are free of :: namespace directives and hence easier to read.

The disadvantage is that

  • You will see frequent calls to library() and detach() at the beginning and end of the subsections to make sure only one intended package is attached at a given time.
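For comparison, strategy (1) could look like the sketch below: neither package is attached and every call is qualified with ::. It assumes that packages sna and igraph are installed; the three-node matrix `adj` is toy data made up for this illustration.

```r
if (requireNamespace("sna", quietly = TRUE) &&
    requireNamespace("igraph", quietly = TRUE)) {
  # Toy directed cycle a -> b -> c -> a
  ids <- c("a", "b", "c")
  adj <- matrix(c(0, 1, 0,
                  0, 0, 1,
                  1, 0, 0),
                nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "directed")

  # Same function name, two implementations, no ambiguity
  print(sna::degree(adj))   # sna's version, applied to the matrix
  print(igraph::degree(g))  # igraph's version, applied to the igraph object
}
```

The price of this approach is visible in the code: every call carries a package prefix, which is what strategy (3) avoids.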

2 Importing Relational Data

Network data is usually stored as

  • Adjacency matrices
  • Edge lists
  • Edge and vertex data frames
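To make the three formats concrete, the sketch below stores one toy network (three made-up nodes a, b, c) as an adjacency matrix and then derives the edge list and the edge and vertex data frames from it, using base R only.

```r
ids <- c("a", "b", "c")

# 1. Adjacency matrix: cell [i, j] is 1 if there is a tie from i to j
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# 2. Edge list: one row per tie, i.e. per non-zero cell of the matrix
idx <- which(adj != 0, arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# 3. Edge and vertex data frames: the edge list plus a table of nodes,
#    each of which can carry additional attribute columns
nodes <- data.frame(name = ids, stringsAsFactors = FALSE)

adj
edges
nodes
```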

2.1 Network

library(network)

'network' 1.18.1 (2023-01-24), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
library(sna)
Loading required package: statnet.common

Attaching package: 'statnet.common'
The following objects are masked from 'package:base':

    attr, order
sna: Tools for Social Network Analysis
Version 2.7-1 created on 2023-01-24.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Read an adjacency matrix (R stores it as a data frame by default). By default, read.csv() also prefixes column names that start with a digit with X, because such names are not syntactically valid in R; row names are left unchanged.

relations <- read.csv("classroom-adjacency.csv",header=T,row.names=1,stringsAsFactors=FALSE)
relations[1:10,1:10] #look at a subgraph using bracket notation
     X1003 X1006 X1009 X1012 X1015 X1018 X1021 X1024 X1027 X1030
1003     0     0     0     0     0     1     0     0     0     0
1006     0     0     1     0     0     0     0     0     0     0
1009     0     1     0     0     0     0     0     0     0     0
1012     0     0     0     0     1     0     1     0     0     0
1015     0     0     0     1     0     0     1     0     0     0
1018     0     0     0     0     0     0     0     0     0     0
1021     0     1     1     1     0     0     0     0     0     0
1024     0     0     0     0     0     0     0     0     0     0
1027     0     1     1     0     0     0     0     0     0     0
1030     0     0     1     0     0     0     0     0     0     0
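The X prefix comes from the check.names argument of read.csv(). As a small sketch, the example below reads a made-up two-node CSV from a string instead of a file and compares the default behaviour with check.names = FALSE.

```r
# Toy CSV content standing in for an adjacency matrix file
csv_text <- "id,1003,1006\n1003,0,1\n1006,1,0"

# Default: column names are made syntactically valid
default_names <- read.csv(text = csv_text, row.names = 1)
colnames(default_names)

# check.names = FALSE keeps the column names as they are in the file
kept_names <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
colnames(kept_names)
```

With check.names = FALSE the row and column names already agree, so no renaming is needed afterwards.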

We might want to store it as a matrix. Most routines will accept either data format. However, depending on how a function was written, it might require one or the other. The isSymmetric() function from base R is one example that requires a matrix rather than a data frame.

relations <- as.matrix(relations) # convert to matrix format
isSymmetric(relations)
[1] FALSE

To make the row and column names identical, we can overwrite the column names:

colnames(relations) <- rownames(relations)

Read in some vertex attribute data. It is okay to leave it as a data frame; in fact, converting it to a matrix would create problems, because all cells of a matrix must be of the same type (e.g. all strings or all numbers), whereas each column of a data frame can be of a different type.

nodeInfo <- read.csv("classroom-nodes.csv",header=TRUE,stringsAsFactors=FALSE)
head(nodeInfo)
  name female isei08_m isei08_f
1 1003  FALSE       NA    25.71
2 1006   TRUE    14.64    33.76
3 1009   TRUE    28.48    37.22
4 1012   TRUE    26.64    25.23
5 1015   TRUE    21.24       NA
6 1018  FALSE    23.47    24.45
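The problem with converting mixed-type data to a matrix can be seen in a small sketch. The data frame `df` below is toy data that mimics the first rows of the node file.

```r
# Toy data frame with a character, a logical and a numeric column
df <- data.frame(name = c("1003", "1006"),
                 female = c(FALSE, TRUE),
                 isei08_f = c(25.71, 33.76),
                 stringsAsFactors = FALSE)
sapply(df, class)  # each column keeps its own type

# A matrix can hold a single type, so everything is coerced to character
m <- as.matrix(df)
typeof(m)
m
```

After the conversion the numbers and logical values are strings, so they can no longer be used in computations without converting them back.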

We could also convert it to a network object. This would be useful for (1) storing all data in the same file, (2) a more compact format for large, sparse matrices, or (3) using the data in later analyses where the routines require network objects (e.g. ERGM)

nrelations <- network(relations)
summary(nrelations) # Get an overall summary
Network attributes:
  vertices = 26
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
 total edges = 88 
   missing edges = 0 
   non-missing edges = 88 
 density = 0.1353846 

Vertex attributes:
  vertex.names:
   character valued attribute
   26 valid vertex names

No edge attributes

Network edgelist matrix:
      [,1] [,2]
 [1,]   20    1
 [2,]   24    1
 [3,]    3    2
 [4,]    7    2
 [5,]    9    2
 [6,]   12    2
 [7,]   14    2
 [8,]   21    2
 [9,]   26    2
[10,]    2    3
[11,]    7    3
[12,]    9    3
[13,]   10    3
[14,]   19    3
[15,]   25    3
[16,]   26    3
[17,]    5    4
[18,]    7    4
[19,]    4    5
[20,]    1    6
[21,]   13    6
[22,]   14    6
[23,]   15    6
[24,]   16    6
[25,]   17    6
[26,]   18    6
[27,]   20    6
[28,]   24    6
[29,]    4    7
[30,]    5    7
[31,]   22    9
[32,]   25    9
[33,]   12   11
[34,]   14   13
[35,]   18   13
[36,]    2   14
[37,]    6   14
[38,]   12   14
[39,]   13   14
[40,]   15   14
[41,]   16   14
[42,]   17   14
[43,]   20   14
[44,]   21   14
[45,]   16   15
[46,]   17   15
[47,]    6   16
[48,]    9   16
[49,]   17   16
[50,]   18   16
[51,]    1   17
[52,]    6   17
[53,]    9   17
[54,]   11   17
[55,]   14   17
[56,]   16   17
[57,]   18   17
[58,]   20   17
[59,]   23   17
[60,]   24   17
[61,]   16   18
[62,]   17   18
[63,]    2   19
[64,]    3   19
[65,]    7   19
[66,]    9   19
[67,]   25   19
[68,]   26   19
[69,]    6   20
[70,]   22   21
[71,]   23   21
[72,]   21   22
[73,]   23   22
[74,]   21   23
[75,]   22   23
[76,]    1   24
[77,]    6   24
[78,]   18   24
[79,]    4   25
[80,]   22   25
[81,]    2   26
[82,]    3   26
[83,]    5   26
[84,]    7   26
[85,]    9   26
[86,]   10   26
[87,]   19   26
[88,]   25   26

Here the row and column names have been carried through because they were attached to the matrix. We can look at them by using the network variable methods and the shorthand %v%:

list.vertex.attributes(nrelations)
[1] "na"           "vertex.names"
nrelations%v%"vertex.names"
 [1] "1003" "1006" "1009" "1012" "1015" "1018" "1021" "1024" "1027" "1030"
[11] "1033" "1036" "1039" "1042" "1045" "1048" "1051" "1054" "1057" "1060"
[21] "1063" "1066" "1069" "1072" "1075" "1078"

If we wanted to set the names back to the original numbers, we could use these methods as well:

nrelations%v%"vertex.names" <- nodeInfo$name
nrelations%v%"vertex.names"
 [1] 1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045
[16] 1048 1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
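The same %v% shorthand can be used to attach other node attributes. The sketch below assumes that package network is installed and uses a made-up three-node network with toy node data; with the tutorial objects, the analogous step would be to assign a column of nodeInfo to nrelations, after checking that the rows are in the same order as the vertices.

```r
library(network)

# Toy adjacency matrix with hypothetical node IDs
ids <- c("n1", "n2", "n3")
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Toy node data, in the same order as the rows of adj
nodes <- data.frame(name = ids, female = c(TRUE, FALSE, TRUE),
                    stringsAsFactors = FALSE)

net <- network(adj, directed = TRUE)

# Values are assigned by position, so the order must match the vertices
stopifnot(all(net %v% "vertex.names" == nodes$name))
net %v% "female" <- nodes$female

list.vertex.attributes(net)
net %v% "female"

detach(package:network)
```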

2.1.1 Now with edgelists

Reading in an edgelist and converting it to a network object is also straightforward. Edgelists are useful because they are a smaller, more concise data structure for larger, sparser networks that we typically deal with in social network analysis.

In the newest release of statnet, network() automatically reads any additional columns of the edge list and stores them as edge attributes named after the columns (here liking). If you’re using an older version of statnet, you might need to pass two more arguments to the network command: ignore.eval=FALSE and names.eval, e.g. names.eval="liking".

edgelist<-read.csv("classroom-edges.csv",header=T,stringsAsFactors = F)
head(edgelist)
  from   to liking
1 1003 1018      3
2 1003 1051      3
3 1003 1072      4
4 1006 1009      5
5 1006 1042      1
6 1006 1057      4
edgeNet<-network(edgelist,matrix.type="edgelist")
edgeNet
 Network attributes:
  vertices = 25 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 88 
    missing edges= 0 
    non-missing edges= 88 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names: 
    liking 
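The ignore.eval and names.eval arguments can be sketched on a toy edge list supplied as a numeric matrix, where the third column holds the edge values. The example assumes that package network is installed; the matrix `el` is made up for this illustration.

```r
# Toy edge list as a numeric matrix: sender, receiver, edge value
el <- cbind(from = c(1, 2, 3), to = c(2, 3, 1), liking = c(5, 2, 4))

# ignore.eval = FALSE asks network() to keep the edge values,
# names.eval gives the name of the resulting edge attribute
net <- network::network(el, matrix.type = "edgelist", directed = TRUE,
                        ignore.eval = FALSE, names.eval = "liking")

network::list.edge.attributes(net)

# Adjacency matrix filled with edge values instead of 0/1
as.matrix(net, matrix.type = "adjacency", attrname = "liking")
```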

Converting back to an adjacency matrix is simple:

edgeNet[,] ##what's missing?
     1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
1006    0    0    1    0    0    0    0    0    0    0    0    0    1    0    0
1009    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
1012    0    0    0    0    1    0    1    0    0    0    0    0    0    0    0
1015    0    0    0    1    0    0    1    0    0    0    0    0    0    0    0
1018    0    0    0    0    0    0    0    0    0    0    0    0    1    0    1
1021    0    1    1    1    0    0    0    0    0    0    0    0    0    0    0
1027    0    1    1    0    0    0    0    0    0    0    0    0    0    0    1
1030    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
1033    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1036    0    1    0    0    0    0    0    0    0    1    0    0    1    0    0
1039    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1042    0    1    0    0    0    1    0    0    0    0    0    1    0    0    0
1045    0    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1048    0    0    0    0    0    1    0    0    0    0    0    0    1    1    0
1051    0    0    0    0    0    1    0    0    0    0    0    0    1    1    1
1054    0    0    0    0    0    1    0    0    0    0    0    1    0    0    1
1057    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
1060    1    0    0    0    0    1    0    0    0    0    0    0    1    0    0
1063    0    1    0    0    0    0    0    0    0    0    0    0    1    0    0
1066    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
1069    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1072    1    0    0    0    0    1    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    1    0    0    0    0    0    0    0
1078    0    1    1    0    0    0    0    0    0    0    0    0    0    0    0
     1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003    1    0    0    0    0    0    0    1    0    0
1006    0    0    1    0    0    0    0    0    0    1
1009    0    0    1    0    0    0    0    0    0    1
1012    0    0    0    0    0    0    0    0    1    0
1015    0    0    0    0    0    0    0    0    0    1
1018    1    0    0    1    0    0    0    1    0    0
1021    0    0    1    0    0    0    0    0    0    1
1027    1    0    1    0    0    0    0    0    0    1
1030    0    0    0    0    0    0    0    0    0    1
1033    1    0    0    0    0    0    0    0    0    0
1036    0    0    0    0    0    0    0    0    0    0
1039    0    0    0    0    0    0    0    0    0    0
1042    1    0    0    0    0    0    0    0    0    0
1045    0    0    0    0    0    0    0    0    0    0
1048    1    1    0    0    0    0    0    0    0    0
1051    0    1    0    0    0    0    0    0    0    0
1054    1    0    0    0    0    0    0    1    0    0
1057    0    0    0    0    0    0    0    0    0    1
1060    1    0    0    0    0    0    0    0    0    0
1063    0    0    0    0    0    1    1    0    0    0
1066    0    0    0    0    1    0    1    0    1    0
1069    1    0    0    0    1    1    0    0    0    0
1072    1    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    0    0    1
1078    0    0    1    0    0    0    0    0    0    0

In network, edges and edge weights are stored separately. This may be confusing, but it is done for a number of reasons: (1) you might want multiple types of weights associated with a given edge, or (2) you might want a weight associated where there isn’t an edge at all.

To see a particular weight, use the edge attribute shorthand %e%. To get the full network with weights, use the command as.sociomatrix.sna. Note that the network command simply named the weights after the column name from the CSV file.

list.edge.attributes(edgeNet)
[1] "liking" "na"    
edgeNet %e% "liking"
 [1] 3 3 4 5 1 4 1 2 3 5 5 4 3 2 1 2 1 4 3 5 2 2 4 3 4 4 4 3 2 5 3 4 4 3 1 4 2 2
[39] 4 3 2 3 2 1 2 1 1 4 5 2 2 4 3 4 1 1 3 3 2 1 4 4 4 2 5 4 5 3 3 5 4 3 1 5 4 3
[77] 2 3 3 3 2 5 3 1 5 1 1 4
as.sociomatrix.sna(edgeNet, "liking")
     1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003    0    0    0    0    0    3    0    0    0    0    0    0    0    0    0
1006    0    0    5    0    0    0    0    0    0    0    0    0    1    0    0
1009    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0
1012    0    0    0    0    5    0    4    0    0    0    0    0    0    0    0
1015    0    0    0    2    0    0    1    0    0    0    0    0    0    0    0
1018    0    0    0    0    0    0    0    0    0    0    0    0    1    0    4
1021    0    2    4    3    0    0    0    0    0    0    0    0    0    0    0
1027    0    4    3    0    0    0    0    0    0    0    0    0    0    0    2
1030    0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
1033    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1036    0    4    0    0    0    0    0    0    0    2    0    0    2    0    0
1039    0    0    0    0    0    4    0    0    0    0    0    0    3    0    0
1042    0    2    0    0    0    3    0    0    0    0    0    2    0    0    0
1045    0    0    0    0    0    2    0    0    0    0    0    0    1    0    0
1048    0    0    0    0    0    1    0    0    0    0    0    0    4    5    0
1051    0    0    0    0    0    4    0    0    0    0    0    0    3    4    1
1054    0    0    0    0    0    3    0    0    0    0    0    3    0    0    2
1057    0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
1060    2    0    0    0    0    5    0    0    0    0    0    0    4    0    0
1063    0    3    0    0    0    0    0    0    0    0    0    0    3    0    0
1066    0    0    0    0    0    0    0    3    0    0    0    0    0    0    0
1069    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1072    3    0    0    0    0    3    0    0    0    0    0    0    0    0    0
1075    0    0    5    0    0    0    0    3    0    0    0    0    0    0    0
1078    0    1    1    0    0    0    0    0    0    0    0    0    0    0    0
     1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003    3    0    0    0    0    0    0    4    0    0
1006    0    0    4    0    0    0    0    0    0    1
1009    0    0    3    0    0    0    0    0    0    5
1012    0    0    0    0    0    0    0    0    3    0
1015    0    0    0    0    0    0    0    0    0    2
1018    3    0    0    5    0    0    0    2    0    0
1021    0    0    4    0    0    0    0    0    0    4
1027    5    0    3    0    0    0    0    0    0    4
1030    0    0    0    0    0    0    0    0    0    3
1033    1    0    0    0    0    0    0    0    0    0
1036    0    0    0    0    0    0    0    0    0    0
1039    0    0    0    0    0    0    0    0    0    0
1042    1    0    0    0    0    0    0    0    0    0
1045    0    0    0    0    0    0    0    0    0    0
1048    2    2    0    0    0    0    0    0    0    0
1051    0    1    0    0    0    0    0    0    0    0
1054    1    0    0    0    0    0    0    4    0    0
1057    0    0    0    0    0    0    0    0    0    4
1060    5    0    0    0    0    0    0    0    0    0
1063    0    0    0    0    0    5    4    0    0    0
1066    0    0    0    0    1    0    5    0    4    0
1069    3    0    0    0    2    3    0    0    0    0
1072    2    0    0    0    0    0    0    0    0    0
1075    0    0    1    0    0    0    0    0    0    5
1078    0    0    4    0    0    0    0    0    0    0
# Detaching the packages
detach(package:sna)
detach(package:network)

2.2 Igraph

library(igraph)

Attaching package: 'igraph'
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union

Small Igraph objects can be created using make_graph(). You can create networks from data using one of the functions from the table below. The table points to functions for:

  • creating igraph objects from other R objects
  • transforming igraph objects into other R objects

Object            Object -> Igraph             Igraph -> Object
Adjacency matrix  graph_from_adjacency_matrix  as_adjacency_matrix
Edge list         graph_from_edgelist          as_edgelist
Data frames       graph_from_data_frame        as_data_frame

2.2.1 Simple graphs with make_graph()

Function make_graph() can quickly create small networks. Relational information can be supplied in two ways:

  1. As a vector with an even number of node IDs. Pairs of adjacent IDs are interpreted as edges:

    make_graph( c(1,2, 2,3, 3,4), directed=FALSE)
    IGRAPH 186ea35 U--- 4 3 -- 
    + edges from 186ea35:
    [1] 1--2 2--3 3--4
  2. Using a symbolic formula in which

    • -- is an undirected tie
    • --+ is a directed tie (+ is the arrow’s head)
    • : refers to node sets (e.g. A -- B:C creates ties A -- B and A -- C)
    • A network is either directed or undirected; it is not possible to mix directed and undirected ties.
    • Between two given nodes there can be many relations.
    g1 <- make_graph(~ A - B, B - C:D:E)
    g2 <- make_graph(~ A --+ B, B +-- C, A --+ D:E, B --+ A)
    g2
    IGRAPH 944cf90 DN-- 5 5 -- 
    + attr: name (v/c)
    + edges from 944cf90 (vertex names):
    [1] A->B A->D A->E B->A C->B

The print-out of g2 exemplifies how igraph summarizes igraph objects:

  • First line includes
    • Class of the object (IGRAPH)
    • An ID of the object, not of particular interest (see also ?igraph::graph_id)
    • A set of four slots for letter codes indicating, in order:
      • U or D if the network is Undirected or Directed
      • N if the nodes have names
      • W if the network is weighted
      • B if the network is bipartite
    • Number of nodes
    • Number of edges
  • Starting with + attr: list of present attributes, each of the form nameoftheattribute (x/y) where
    • x informs about the type of an attribute: vertex, edge or graph attribute
    • y informs about the mode of an attribute: numeric, character, logical, or extended (e.g. lists)
  • Starting with + edges: a list of (some of) the edges of the network
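The four letter codes in the first line of the print-out correspond to properties that can also be queried directly. The sketch below assumes that package igraph is installed and uses a small made-up graph; `flags` is a name introduced here for illustration.

```r
# Small directed graph with named nodes, no weights, not bipartite
g <- igraph::make_graph(~ A --+ B, B --+ C)

flags <- c(
  directed  = igraph::is_directed(g),   # D (or U if FALSE)
  named     = igraph::is_named(g),      # N
  weighted  = igraph::is_weighted(g),   # W
  bipartite = igraph::is_bipartite(g)   # B
)
flags
```

For this graph the print-out would start with the codes DN--, matching the two TRUE values among the flags.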

2.2.2 Networks from adjacency matrices

Igraph objects can be created from adjacency matrices with graph_from_adjacency_matrix():

graph_from_adjacency_matrix(relations, mode="directed")
IGRAPH 699e75e DN-- 26 88 -- 
+ attr: name (v/c)
+ edges from 699e75e (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges

Important arguments:

  • mode – how to interpret the matrix
    • "directed", "undirected": directed/undirected network
    • "max", "min", "plus": determine the number of \(i\)-\(j\) relations that will be created, e.g., max( m[i,j], m[j,i] ) for "max".
    • "lower", "upper": whether to read only lower/upper triangle of the matrix
  • weighted – if TRUE non-zero values of the matrix are stored in edge attribute weight
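As a sketch of the weighted argument, the example below builds a graph from a made-up valued matrix `m` and inspects the resulting edge attribute. It assumes that package igraph is installed.

```r
ids <- c("a", "b", "c")

# Toy valued adjacency matrix: non-zero cells are tie strengths
m <- matrix(c(0, 3, 0,
              0, 0, 5,
              2, 0, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# weighted = TRUE: one edge per non-zero cell, value stored as 'weight'
g_w <- igraph::graph_from_adjacency_matrix(m, mode = "directed",
                                           weighted = TRUE)
igraph::as_data_frame(g_w, what = "edges")
```

Without weighted = TRUE the values would instead be read as numbers of parallel edges, so a cell equal to 3 would produce three ties between the same pair of nodes.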

2.2.3 Networks from edge lists

Function graph_from_edgelist() expects a two-column matrix:

edgelist_matrix <- as.matrix(edgelist[,1:2])
head(edgelist_matrix)
     from   to
[1,] 1003 1018
[2,] 1003 1051
[3,] 1003 1072
[4,] 1006 1009
[5,] 1006 1042
[6,] 1006 1057

Now create the object:

graph_from_edgelist(edgelist_matrix, directed=TRUE)
IGRAPH f7db3a9 D--- 1078 88 -- 
+ edges from f7db3a9:
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
[49] 1048->1045 1048->1051 1048->1054 1051->1018 1051->1042 1051->1045
+ ... omitted several edges

Note the number of nodes! If the edge list matrix contains numbers, the function interprets them as node indices starting from 1, so the largest ID (here 1078) determines the number of nodes and the result contains a lot of isolates. In this case we have to convert the matrix to character mode before passing it to graph_from_edgelist():

edgelist_matrix_ch <- as.character(edgelist_matrix)
dim(edgelist_matrix_ch) <- dim(edgelist_matrix)
graph_from_edgelist(edgelist_matrix_ch, directed=TRUE)
IGRAPH 989e45f DN-- 25 88 -- 
+ attr: name (v/c)
+ edges from 989e45f (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges
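The difference between numeric and character IDs is easy to reproduce on a tiny example. The sketch below assumes that package igraph is installed; the two-row edge list `el` is made up.

```r
# Toy edge list with numeric IDs that do not start from 1
el <- cbind(c(101, 102), c(102, 103))

# Numeric IDs are treated as node indices: nodes 1..103 are created
g_num <- igraph::graph_from_edgelist(el, directed = TRUE)
igraph::vcount(g_num)

# Character IDs are treated as node names: only 3 nodes are created
el_chr <- matrix(as.character(el), ncol = 2)
g_chr <- igraph::graph_from_edgelist(el_chr, directed = TRUE)
igraph::vcount(g_chr)
```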

This also shows the disadvantage of solely relying on edgelist representation as we are missing one boy who is an isolate.
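Nodes that are absent from an edge list can be identified by comparing it with the list of all nodes. The following base R sketch uses toy data frames `nodes` and `edges`; with the tutorial data, the same comparison could be made between nodeInfo$name and the from and to columns of edgelist.

```r
# Toy data: four nodes, but only three of them appear in any edge
nodes <- data.frame(name = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
edges <- data.frame(from = c("a", "b"), to = c("b", "c"),
                    stringsAsFactors = FALSE)

# Nodes that neither send nor receive a tie are isolates
isolates <- setdiff(nodes$name, union(edges$from, edges$to))
isolates
```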

2.2.4 Networks from data frames

Igraph objects can be created from data frames with data on edges and, optionally, on vertices with graph_from_data_frame():

classroom_kids <- read.csv("classroom-nodes.csv", header=TRUE, colClasses=c(name = "character"))
head(classroom_kids)
  name female isei08_m isei08_f
1 1003  FALSE       NA    25.71
2 1006   TRUE    14.64    33.76
3 1009   TRUE    28.48    37.22
4 1012   TRUE    26.64    25.23
5 1015   TRUE    21.24       NA
6 1018  FALSE    23.47    24.45
classroom_play <- read.csv("classroom-edges.csv", header=TRUE, colClasses = c(from="character", to="character"))
head(classroom_play)
  from   to liking
1 1003 1018      3
2 1003 1051      3
3 1003 1072      4
4 1006 1009      5
5 1006 1042      1
6 1006 1057      4
classroom <- graph_from_data_frame(classroom_play, vertices=classroom_kids,
                                   directed=TRUE)
classroom
IGRAPH 9b31847 DN-- 26 88 -- 
+ attr: name (v/c), female (v/l), isei08_m (v/n), isei08_f (v/n),
| liking (e/n)
+ edges from 9b31847 (vertex names):
 [1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
 [7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
+ ... omitted several edges
  • The first two columns of classroom_play are vertex IDs; additional columns are interpreted as edge attributes.
  • The first column of classroom_kids is the vertex ID; additional columns are interpreted as vertex attributes.
  • All vertex IDs present in the edge data frame (classroom_play) must be present in the node data frame (classroom_kids).
detach(package:igraph)
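The rules for graph_from_data_frame() listed above can be checked on toy data. The sketch below assumes that package igraph is installed and calls it with :: so that it does not need to be attached; all data frames are made up for this illustration.

```r
edges <- data.frame(from = c("a", "b"), to = c("b", "c"),
                    stringsAsFactors = FALSE)

# Node "d" is not in any edge: it is kept as an isolate
nodes_ok <- data.frame(name = c("a", "b", "c", "d"),
                       stringsAsFactors = FALSE)
g_ok <- igraph::graph_from_data_frame(edges, vertices = nodes_ok,
                                      directed = TRUE)
igraph::vcount(g_ok)

# Node "c" appears in the edges but not in the node data: an error
nodes_bad <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)
res <- tryCatch(
  igraph::graph_from_data_frame(edges, vertices = nodes_bad),
  error = function(e) conditionMessage(e)
)
res
```

Supplying the node data frame is therefore also a way of keeping isolates that an edge list alone would lose.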

2.3 Tidygraph

Package tidygraph uses igraph internally to store network data but provides a “tidy” interface for data manipulation – network data are interfaced as two interconnected data frames: (1) nodes and (2) edges. This is very similar to the data structure accepted by igraph::graph_from_data_frame() demonstrated above.

library(tidygraph)

Attaching package: 'tidygraph'
The following object is masked from 'package:stats':

    filter

Objects can be created with:

  1. tbl_graph() from two data frames, similarly to igraph::graph_from_data_frame()

    tg_classroom <- tbl_graph(nodes = classroom_kids, edges = classroom_play, 
                              directed = TRUE)
    tg_classroom
    # A tbl_graph: 26 nodes and 88 edges
    #
    # A directed simple graph with 2 components
    #
    # A tibble: 26 × 4
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1003  FALSE      NA       25.7
    2 1006  TRUE       14.6     33.8
    3 1009  TRUE       28.5     37.2
    4 1012  TRUE       26.6     25.2
    5 1015  TRUE       21.2     NA  
    6 1018  FALSE      23.5     24.4
    # ℹ 20 more rows
    #
    # A tibble: 88 × 3
       from    to liking
      <int> <int>  <int>
    1     1     6      3
    2     1    17      3
    3     1    24      4
    # ℹ 85 more rows
  2. as_tbl_graph() which accepts a variety of objects: adjacency matrices, igraph, network, ggraph and some more (cf. the documentation)

    # From igraph object created earlier
    tg_classroom2 <- as_tbl_graph(classroom)
    tg_classroom2
    # A tbl_graph: 26 nodes and 88 edges
    #
    # A directed simple graph with 2 components
    #
    # A tibble: 26 × 4
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1003  FALSE      NA       25.7
    2 1006  TRUE       14.6     33.8
    3 1009  TRUE       28.5     37.2
    4 1012  TRUE       26.6     25.2
    5 1015  TRUE       21.2     NA  
    6 1018  FALSE      23.5     24.4
    # ℹ 20 more rows
    #
    # A tibble: 88 × 3
       from    to liking
      <int> <int>  <int>
    1     1     6      3
    2     1    17      3
    3     1    24      4
    # ℹ 85 more rows
    # From network object created earlier
    tg_net <- as_tbl_graph(edgeNet)
    tg_net
    # A tbl_graph: 25 nodes and 88 edges
    #
    # A directed simple graph with 1 component
    #
    # A tibble: 25 × 1
      na   
      <lgl>
    1 FALSE
    2 FALSE
    3 FALSE
    4 FALSE
    5 FALSE
    6 FALSE
    # ℹ 19 more rows
    #
    # A tibble: 88 × 4
       from    to liking na   
      <int> <int>  <int> <lgl>
    1     1     6      3 FALSE
    2     1    16      3 FALSE
    3     1    23      4 FALSE
    # ℹ 85 more rows

2.3.1 Working with attributes

In tidygraph you can use dplyr (Wickham et al. 2021) verbs such as mutate(), select() etc. once you activate() either the nodes or edges data frame. Here are some examples.

Calculate the social status of a kid’s family as the minimum of the mother’s and father’s status scores:

tg_classroom %>%
  activate(nodes) %>%
  mutate(
    status = pmin(isei08_m, isei08_f, na.rm=TRUE)
  )
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# A tibble: 26 × 5
  name  female isei08_m isei08_f status
  <chr> <lgl>     <dbl>    <dbl>  <dbl>
1 1003  FALSE      NA       25.7   25.7
2 1006  TRUE       14.6     33.8   14.6
3 1009  TRUE       28.5     37.2   28.5
4 1012  TRUE       26.6     25.2   25.2
5 1015  TRUE       21.2     NA     21.2
6 1018  FALSE      23.5     24.4   23.5
# ℹ 20 more rows
#
# A tibble: 88 × 3
   from    to liking
  <int> <int>  <int>
1     1     6      3
2     1    17      3
3     1    24      4
# ℹ 85 more rows

As in dplyr, you can use the pipe operator %>% to chain multiple data transformations. Here we add a node attribute first and then an edge attribute like5:

tg_classroom %>%
  activate(nodes) %>%
  mutate(
    status = pmin(isei08_m, isei08_f, na.rm=TRUE)
  ) %>%
  activate(edges) %>%
  mutate(
    like5 = liking == 5  # TRUE if liking is 5
  )
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# A tibble: 88 × 4
   from    to liking like5
  <int> <int>  <int> <lgl>
1     1     6      3 FALSE
2     1    17      3 FALSE
3     1    24      4 FALSE
4     2     3      5 TRUE 
5     2    14      1 FALSE
6     2    19      4 FALSE
# ℹ 82 more rows
#
# A tibble: 26 × 5
  name  female isei08_m isei08_f status
  <chr> <lgl>     <dbl>    <dbl>  <dbl>
1 1003  FALSE      NA       25.7   25.7
2 1006  TRUE       14.6     33.8   14.6
3 1009  TRUE       28.5     37.2   28.5
# ℹ 23 more rows

You can refer to node attributes with .N() when computing on the edges data frame, and to edge attributes with .E() when computing on the nodes. Function .N() returns the node data frame, so its columns can be indexed with the from and to columns of the edges. For example, to add an edge attribute which is TRUE if the genders of ego and alter match and FALSE otherwise, we can use .N() in the following manner:

tg_classroom %>%
  activate(edges) %>%
  mutate(
    # Add edge attribute which is TRUE if gender of ego and alter match
    sex_match = .N()$female[from] == .N()$female[to]
  )
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# A tibble: 88 × 4
   from    to liking sex_match
  <int> <int>  <int> <lgl>    
1     1     6      3 TRUE     
2     1    17      3 TRUE     
3     1    24      4 TRUE     
4     2     3      5 TRUE     
5     2    14      1 FALSE    
6     2    19      4 TRUE     
# ℹ 82 more rows
#
# A tibble: 26 × 4
  name  female isei08_m isei08_f
  <chr> <lgl>     <dbl>    <dbl>
1 1003  FALSE      NA       25.7
2 1006  TRUE       14.6     33.8
3 1009  TRUE       28.5     37.2
# ℹ 23 more rows
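
The lookup performed by .N()$female[from] can be sketched with plain data frames. This is only an illustration of the indexing idea, assuming (as in the printed output) that from and to hold row numbers of the node table; the toy tables nodes and edges are hypothetical:

```r
# Hypothetical toy data: a node table and an edge table of row indices
nodes <- data.frame(name = c("a", "b", "c"),
                    female = c(TRUE, FALSE, TRUE))
edges <- data.frame(from = c(1, 1, 2),
                    to = c(3, 2, 3))

# Look up the attribute of the sender and of the receiver of every edge
edges$sex_match <- nodes$female[edges$from] == nodes$female[edges$to]
edges
```

Only the first edge (a to c) connects two nodes with the same value of female.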

Use filter() to select subgraphs:

  • Select the subgraph of girls and relations between them:

    tg_classroom %>%
      activate(nodes) %>%
      filter(female)
    # A tbl_graph: 13 nodes and 41 edges
    #
    # A directed simple graph with 1 component
    #
    # A tibble: 13 × 4
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1006  TRUE       14.6     33.8
    2 1009  TRUE       28.5     37.2
    3 1012  TRUE       26.6     25.2
    4 1015  TRUE       21.2     NA  
    5 1021  TRUE       21.2     25.2
    6 1027  TRUE       NA       26.0
    # ℹ 7 more rows
    #
    # A tibble: 41 × 3
       from    to liking
      <int> <int>  <int>
    1     1     2      5
    2     1     8      4
    3     1    13      1
    # ℹ 38 more rows
  • Select a subgraph of relations for which liking is at least 3

    tg_classroom %>%
      activate(edges) %>%
      filter(liking >= 3)
    # A tbl_graph: 26 nodes and 56 edges
    #
    # A directed simple graph with 3 components
    #
    # A tibble: 56 × 3
       from    to liking
      <int> <int>  <int>
    1     1     6      3
    2     1    17      3
    3     1    24      4
    4     2     3      5
    5     2    19      4
    6     3    19      3
    # ℹ 50 more rows
    #
    # A tibble: 26 × 4
      name  female isei08_m isei08_f
      <chr> <lgl>     <dbl>    <dbl>
    1 1003  FALSE      NA       25.7
    2 1006  TRUE       14.6     33.8
    3 1009  TRUE       28.5     37.2
    # ℹ 23 more rows
detach(package:tidygraph)

2.4 graph

library(graph)
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'
The following object is masked from 'package:statnet.common':

    order
The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Package graph is implemented using the S4 class system (see e.g. the part “Object oriented programming” of Wickham (2019), especially chapter 15 on the S4 system). The two main classes of objects are:

  • graphAM – networks internally stored as adjacency matrices
  • graphNEL – networks internally stored as adjacency lists (imprecisely called “edge lists” in the documentation). An adjacency list is an R list with one element per node, each element being a vector of the nodes adjacent to that node.

Objects can be created with functions with the above names. From adjacency matrices:

gr1 <- graphAM(relations, edgemode = "directed")
gr1
A graphAM graph with directed edges
Number of Nodes = 26 
Number of Edges = 88 

To demonstrate graphNEL we have to create an adjacency list first. We can create it from adjacency matrix relations like so:

adjlist <- apply(relations, 1, function(r) rownames(relations)[which(r == 1)])
head(adjlist) # initial elements of the adj. list
$`1003`
[1] "1018" "1051" "1072"

$`1006`
[1] "1009" "1042" "1057" "1078"

$`1009`
[1] "1006" "1057" "1078"

$`1012`
[1] "1015" "1021" "1075"

$`1015`
[1] "1012" "1021" "1078"

$`1018`
[1] "1042" "1048" "1051" "1060" "1072"

… and now the object:

gr2 <- graphNEL(
  nodes = classroom_kids$name, # names of the nodes
  edgeL = adjlist, # adjacency list of node names
  edgemode = "directed"
)
gr2
A graphNEL graph with directed edges
Number of Nodes = 26 
Number of Edges = 88 
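
One caveat about building the adjacency list with apply(): when every row yields the same number of neighbors, apply() simplifies its result to a vector or matrix instead of a list. That does not happen with the classroom data, but a variant based on lapply() always returns a list. The sketch below assumes a square 0/1 matrix with matching row and column names; the helper adj_list_from_matrix() and the toy matrix m are hypothetical names introduced for illustration:

```r
# Hypothetical helper: build a named adjacency list from a 0/1 matrix
adj_list_from_matrix <- function(m) {
  stopifnot(nrow(m) == ncol(m),
            !is.null(rownames(m)), !is.null(colnames(m)))
  out <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ] == 1])
  setNames(out, rownames(m))
}

# Toy directed network in which every node has exactly one out-neighbor
m <- matrix(c(0, 1, 0,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

al <- adj_list_from_matrix(m)   # always a list
simplified <- apply(m, 1, function(r) colnames(m)[which(r == 1)])

al
simplified                      # a character vector, not a list
```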

Both types of objects, graphAM and graphNEL, can store edge and node attributes. There are separate functions edgeData() and nodeData() for setting and accessing edge and node attributes. For example, to add the female attribute we need to:

# Set the default value, say FALSE
nodeDataDefaults(gr2, attr="female") <- FALSE
# Assign the values
nodeData(gr2, n = classroom_kids$name, attr="female") <- classroom_kids$female

Working with edge attributes looks similar, but uses function edgeData() like so:

edgeDataDefaults(gr2, attr = "liking") <- as.numeric(NA)
# Assign the values; node names passed to from/to must be character
edgeData(gr2, 
         from = as.character(classroom_play$from), 
         to = as.character(classroom_play$to), 
         attr = "liking") <- classroom_play$liking
detach(package:graph)

3 Converting objects

Use intergraph (Bojanowski 2015) to convert data objects between igraph and network representations.

# igraph -> network
classroom_network <- intergraph::asNetwork(classroom)

# network -> igraph
classroom_igraph <- intergraph::asIgraph(classroom_network)

classroom_network
 Network attributes:
  vertices = 26 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 88 
    missing edges= 0 
    non-missing edges= 88 

 Vertex attribute names: 
    female isei08_f isei08_m vertex.names 

 Edge attribute names: 
    liking 
classroom_igraph
IGRAPH daec9b4 D--- 26 88 -- 
+ attr: female (v/l), isei08_f (v/n), isei08_m (v/n), na (v/l),
| vertex.names (v/c), liking (e/n), na (e/l)
+ edges from daec9b4:
 [1]  1-> 6  1->17  1->24  2-> 3  2->14  2->19  2->26  3-> 2  3->19  3->26
[11]  4-> 5  4-> 7  4->25  5-> 4  5-> 7  5->26  6->14  6->16  6->17  6->20
[21]  6->24  7-> 2  7-> 3  7-> 4  7->19  7->26  9-> 2  9-> 3  9->16  9->17
[31]  9->19  9->26 10-> 3 10->26 11->17 12-> 2 12->11 12->14 13-> 6 13->14
[41] 14-> 2 14-> 6 14->13 14->17 15-> 6 15->14 16-> 6 16->14 16->15 16->17
[51] 16->18 17-> 6 17->14 17->15 17->16 17->18 18-> 6 18->13 18->16 18->17
[61] 18->24 19-> 3 19->26 20-> 1 20-> 6 20->14 20->17 21-> 2 21->14 21->22
+ ... omitted several edges

All the attributes are copied properly.

Use igraph::as_graphnel() and igraph::graph_from_graphnel() for igraph <-> graph conversions.

4 Capabilities of objects in different packages

Package / Class            network   igraph   graphNEL [1]   graphAM [1]
Bipartite                  yes       yes      no             no
Hypergraphs [2]            yes       no       no             no
Vertex attributes          yes       yes      yes            yes
Edge attributes            yes       yes      yes            yes
Graph attributes           yes       yes      yes            yes
Attributes can be lists    yes       yes      no             no
Multigraphs [3]            no        yes      no             no

[1] Package 'graph'.
[2] Networks with edges connecting sets of vertices.
[3] Networks with multiple edges in the same dyad.

5 Visualization

5.1 Network

library(network)

'network' 1.18.1 (2023-01-24), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
library(sna)
sna: Tools for Social Network Analysis
Version 2.7-1 created on 2023-01-24.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

We can plot matrices using the gplot routine from the sna package:

gplot(relations) # Requires sna

Or network objects using the plot command. This automatically takes into account the network-level attribute recording whether the network is directed. gplot came first; the network package, with its more specialized data structures, was written years later, but gplot is preserved for its ability to work directly with matrices.

plot(nrelations, displaylabels=TRUE) # Plot with names

plot(nrelations, displaylabels=TRUE, mode="circle") # A less useful layout...

More layout options are included in the sna package for gplot. Here’s one that selects a node and tries to arrange the other nodes around it as a bullseye.

gplot(relations,mode="target")

Let’s color the nodes in gender-stereotypic colors, and increase the size of the nodes

nodeColors<-ifelse(nodeInfo$female,"hotpink","dodgerblue")
plot(nrelations,displaylabels=T,vertex.col=nodeColors,vertex.cex=3)

The same works for networks created from edge lists, and it is simple to also display the edge weights:

plot(edgeNet,displaylabels=T) ##what's missing?

plot(edgeNet,displaylabels=T,edge.lwd=5*edgeNet%e%"liking") # line width from the liking edge attribute

We can now look at slightly more complicated data in the supplied dataset. Plot the contiguity among nations in 1993 (from the Correlates of War (CoW) project):

load("introToSNAinR.Rdata")
gplot(contig_1993) # The default visualization

gplot(contig_1993, usearrows=FALSE) # Turn off arrows manually

Here’s an example of directed data: militarized interstate disputes (MIDs) for 1993, with added labels

gplot(mids_1993,label.cex=0.5,label.col="blue",displaylabels=TRUE)

All those isolates can get in the way. We can suppress them using displayisolates

gplot(mids_1993,label.cex=0.5,label.col="blue",displaylabels=TRUE,displayisolates=FALSE)

When a layout is generated, the results can be saved for later reuse. Here we use a spring-embedded algorithm on the global contiguity plot to place the nodes and then plot the edges from the militarized interstate dispute network. The correspondence is only approximate, but shorter edges generally represent attacks between contiguous or neighbouring countries, and longer edges attacks between countries farther apart.

coords <- gplot(contig_1993,gmode="graph",label=colnames(contig_1993[,]),label.cex=0.5,label.col="blue") # Capture the magic of the moment

head(coords) # Show the vertex coordinates
            x        y
[1,] 23.54235 11.50500
[2,] 25.65262 20.06014
[3,] 32.49855 11.43293
[4,] 34.05418  6.65080
[5,] 38.10269 11.25545
[6,] 35.63348 16.57894

Saved (or a priori) layouts can be used via the coord argument

gplot(mids_1993,gmode="graph",label=colnames(contig_1993[,]),label.cex=0.5,label.col="blue",coord=coords)

When the default settings are insufficient, interactive mode allows for tweaking. This is a bit clunky, but can be very useful for getting a specific image exactly right. We haven’t run it here, but you can play around with it later.

coords <- gplot(contig_1993, interactive=TRUE) # Modify and save
gplot(contig_1993,coord=coords,displaylabels=TRUE,gmode="graph",label.cex=0.5,label.col="blue") # Should reproduce the modified layout
# Detaching the packages
detach(package:sna)
detach(package:network)

5.2 Igraph

library("igraph")

Attaching package: 'igraph'
The following objects are masked from 'package:BiocGenerics':

    normalize, path, union
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union

Network visualization is performed using the plot() function.

With default settings it looks like this:

plot(g1)
plot(g2)

Example plots using the classroom data:

plot(
  classroom, 
  layout=layout_with_fr,
  vertex.color="white",
  vertex.size=15,
  edge.arrow.size=0.5,
  vertex.label.color="black",
  vertex.label.family="sans",
  vertex.label=ifelse(V(classroom)$female, "F", "M") 
)


plot(
  classroom, 
  layout=layout_with_fr,
  vertex.label=NA,
  vertex.size=scales::rescale(degree(classroom, mode="in"), c(5, 25)),
  edge.arrow.size=0.5,
  vertex.color=ifelse(V(classroom)$female, "pink", "lightskyblue") 
)
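
In the second plot, vertex sizes are in-degrees mapped linearly onto the range from 5 to 25. A minimal base-R stand-in for what scales::rescale() does in that call is sketched below; the function name rescale_linear() and the toy in-degree vector are hypothetical, and the real function has more options:

```r
# Hypothetical stand-in for scales::rescale(): map x linearly onto [to[1], to[2]]
rescale_linear <- function(x, to = c(0, 1)) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

# Toy in-degrees: the smallest maps to 5, the largest to 25
indeg <- c(0, 2, 5, 10)
sizes <- rescale_linear(indeg, to = c(5, 25))
sizes
```

The sketch divides by the range of x, so it is not defined when all values are equal.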

detach(package:igraph)

5.3 A note on layouts

Notable layouts in sna:

  • target

Notable layouts in igraph:

  • sugiyama

Package graphlayouts:

  • stress
  • focus

6 Description

6.1 Network

library(network)

'network' 1.18.1 (2023-01-24), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
library(sna)
sna: Tools for Social Network Analysis
Version 2.7-1 created on 2023-01-24.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

The network package has many routines to describe the data. The dyad count gives the number of possible edges: \(n(n-1)\) for a directed graph and \(\frac{n(n-1)}{2}\) for an undirected graph. The edge count gives the number of edges actually present, and the size gives the number of nodes.

network.dyadcount(nrelations) # How many dyads?
[1] 650
network.edgecount(nrelations) # How many edges are present?
[1] 88
network.size(nrelations) # How large is the network?
[1] 26
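
The dyad count reported above can be checked by hand from the network size. A short sketch using only the value n = 26 from the output:

```r
n <- 26                               # network size reported above

dyads_directed   <- n * (n - 1)       # ordered pairs of distinct nodes
dyads_undirected <- n * (n - 1) / 2   # unordered pairs, same as choose(n, 2)

dyads_directed
dyads_undirected
```

The directed count agrees with network.dyadcount(nrelations), which is consistent with nrelations being a directed network.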

Going back to the Correlates of War data, we can look at our centrality measures. Freeman degree is also called total degree and is the sum of indegree and outdegree. A single function, degree(), computes all three; its cmode argument selects which kind of degree is calculated, with Freeman degree as the default.

degree(mids_1993) # Default: total degree
  [1]  5  1  0  0  6  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  0  0  0  0  0
 [26]  0  0  0  0  0  0  0  0  1  0  3  0  2  2  0  4  0  0  0  1  0  0  1  1  0
 [51]  0  0  0  2  0  0  2  1  3 14  2  1  1  2  0  1  1  9  0  0  0  0  0  3  1
 [76]  2  0  0  0  0  0  0  0  0  0  0  1  0  0  0  2  1  0  0  0  0  1  1  0  1
[101]  0  0  1  1  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[126]  0  0  0  0  0  0  0  1  3  5  8  2  1  1  0  2  1  0  2  0  0  0  0  6  1
[151]  1  1  1  1  4  0  1  4  1  1  1  0  1  0  0  0  0  0  0  0  0  1  0  0  0
[176]  0  0  0  1  0  0  1  0  0  0  0
ideg <- degree(mids_1993, cmode="indegree") # Indegree for MIDs
odeg <- degree(mids_1993, cmode="outdegree") # Outdegree for MIDs
all(degree(mids_1993) == ideg+odeg) # In + out = total?
[1] TRUE
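
For a binary adjacency matrix the three kinds of degree are simply row and column sums. A base-R sketch on a hypothetical toy matrix A, where A[i, j] = 1 means that i sends a tie to j:

```r
# Toy directed adjacency matrix (rows = senders, columns = receivers)
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE)

outdeg <- rowSums(A)        # ties sent by each node
indeg  <- colSums(A)        # ties received by each node
total  <- indeg + outdeg    # Freeman (total) degree

rbind(indeg, outdeg, total)
```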

Once centrality scores are computed, we can handle them using standard R methods. Here, the dotted line marks where outgoing attacks equal incoming attacks (y=x); countries above this line are net aggressors, and countries below are net defenders.

plot(ideg, 
     odeg, 
     type="n", 
     xlab="Incoming MIDs", 
     ylab="Outgoing MIDs") # Plot ideg by odeg

abline(0, 1, lty=3)

text(jitter(ideg), 
     jitter(odeg), 
     network.vertex.names(contig_1993), 
     cex=0.75, 
     col=2)

Plot simple histograms of the degree distributions. These can be quite useful to get a sense of how skewed the network is.

hist(ideg, 
     xlab="Indegree",
     main="Indegree Distribution", 
     prob=TRUE)

hist(odeg, 
     xlab="Outdegree", 
     main="Outdegree Distribution", 
     prob=TRUE)

Centrality scores can also be used with other sna routines, e.g., gplot(). Here we use rgb() to shade each node by how much of an aggressor (red) or defender (blue) it is relative to the other countries.

gplot(mids_1993, 
      vertex.cex=(ideg+odeg)^0.5, 
      vertex.sides=50,
      label.cex=0.4,
      vertex.col=rgb(odeg/max(odeg),0,ideg/max(ideg)),
      displaylabels=TRUE,
      displayisolates=FALSE)
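
To see how the vertex.col expression turns degrees into colors, here is a small sketch with hypothetical degree vectors for three nodes: a pure aggressor, a pure defender, and a node that is at the maximum on both:

```r
# Hypothetical out- and in-degrees for three nodes
odeg_toy <- c(2, 0, 2)
ideg_toy <- c(0, 3, 3)

# Red channel = relative outdegree, blue channel = relative indegree
cols <- rgb(odeg_toy / max(odeg_toy), 0, ideg_toy / max(ideg_toy))
cols
```

The three nodes come out pure red, pure blue and magenta respectively; nodes with both degrees equal to zero would be black.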

Betweenness and closeness are also popular measures

bet <- betweenness(contig_1993, 
                   gmode="graph") # Geographic betweenness

bet
  [1] 4997.6666667    0.0000000 1780.9333333 2063.4119048  445.9444444
  [6] 2237.2984127  550.4444444   29.2618687    1.4553030    0.2916667
 [11]    2.2191919    2.2191919    2.3441919    0.2916667  387.7916667
 [16]  640.0412698    1.0079365  262.8190476  197.6650794    0.4444444
 [21]  102.2246032   36.8031746    0.0000000 1359.0634921 1206.5426768
 [26]   44.5261544    2.2166667    0.0000000  242.5166667  760.7691198
 [31]    5.3095238    0.0000000    0.5833333    5.8928571    0.0000000
 [36]   65.6519270    0.0000000    8.7941824    1.0000000    0.0000000
 [41] 1558.2179920    0.0000000    0.0000000  123.6114706  244.8535495
 [46]    0.0000000    0.0000000 1216.0722311  135.5883869  287.1149376
 [51]  149.1209944    3.8980482   19.8155403 1046.8181481    0.0000000
 [56]   11.4371615   11.9421356    1.3251984  109.5508382  157.8191065
 [61]    0.0000000   12.4300337  526.2041909    9.9764069  132.2971043
 [66]    0.0000000  137.1758265 7944.2773019    3.3530897   10.4082027
 [71]    9.9320123  639.1920120    3.9538070    4.1910347   24.4580713
 [76]  215.9500257    1.0678571    4.4209469   86.0305490  683.1869157
 [81]    0.0000000    0.0000000    0.0000000    0.0000000    0.0000000
 [86]    0.0000000 1071.2878616  464.8800926   61.9749147  231.9178436
 [91] 1337.5873379  132.3104055  315.4354711  133.8140313    4.4364799
 [96]    0.0000000   54.8897054    2.0193708  376.7564917  496.6051739
[101]  165.3015620  141.0963159  561.2867358   73.2393103 1264.8267262
[106]  266.5677184  240.3355133 1481.3784317    0.0000000    0.3333333
[111]  916.4855143    7.9373749   44.9283484    9.5413404  254.5378746
[116]  495.7198422  620.0837129   22.4303647    0.0000000  192.0392450
[121]  103.3992854    0.0000000   21.3470314    0.0000000    0.0000000
[126]   49.1977850    0.0000000  228.1977850   29.7663917 1586.8067567
[131]  113.6367319 3396.9524336 2561.6616178 1385.6792878 4295.1559857
[136]  152.0162757  931.6201568   20.0901876    0.0000000    2.0239130
[141]   62.2773441  782.4652751  108.3209576    0.0000000    3.2329359
[146]    3.2329359    0.0000000 1171.7832050  121.9368695  242.2907025
[151]    6.4333836    6.5762408    2.2023810  360.4879556 2805.8710749
[156]    0.0000000    0.0000000    0.0000000    0.0000000  344.9912103
[161] 1217.3329177    0.0000000 1098.0060354    0.0000000  238.4518666
[166]    0.0000000    0.0000000    0.0000000   18.6739827    0.7500000
[171]  101.0817479  293.4544540  125.5886906    0.0000000    0.0000000
[176]  527.4145766  972.9699512    0.0000000  356.5000000    0.0000000
[181]    0.0000000  179.0000000    0.0000000    0.0000000    0.0000000
[186]    0.0000000

Closeness can be a bit tricky for disconnected graphs, but there are alternative definitions that fix some problems - see the help file for a discussion.

gplot(contig_1993, 
      vertex.cex=sqrt(bet)/25, 
      gmode="graph") # Use w/gplot

clo <- closeness(contig_1993) # Geographic closeness
clo # A large world after all?
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[186] 0
closeness(contig_1993,cmode="suminvundir") 
  [1] 0.2933419 0.2186251 0.2515980 0.2615079 0.2263751 0.2389878 0.2335823
  [8] 0.1879666 0.1906693 0.1843630 0.1906693 0.1906693 0.1933720 0.1843630
 [15] 0.2079823 0.2450215 0.2098887 0.2169157 0.2227715 0.1890634 0.2102346
 [22] 0.2030274 0.1798585 0.2300544 0.2318562 0.1956243 0.1650359 0.1753540
 [29] 0.1879666 0.2059846 0.1677386 0.1614323 0.1560269 0.1668377 0.1587296
 [36] 0.2696933 0.2283376 0.2669906 0.2660897 0.2579816 0.3151094 0.2524024
 [43] 0.2184277 0.2819562 0.2747447 0.2287881 0.2089489 0.3321321 0.3088889
 [50] 0.2905148 0.2698734 0.2568104 0.2662698 0.3195903 0.2252660 0.2704912
 [57] 0.2635736 0.2438438 0.2689790 0.2696096 0.2074174 0.2551051 0.3184234
 [64] 0.2847297 0.2816860 0.2368876 0.2830373 0.3847555 0.2901502 0.2955556
 [71] 0.2928529 0.3213320 0.2775247 0.2775676 0.3015122 0.3166281 0.2795066
 [78] 0.2955556 0.2901502 0.3186186 0.0000000 0.1664139 0.2078757 0.1722698
 [85] 0.2078757 0.1664139 0.2574732 0.2130108 0.2329043 0.2401308 0.2904397
 [92] 0.2136736 0.2125604 0.2360317 0.1766585 0.1688464 0.2116208 0.2062154
 [99] 0.2553754 0.2564565 0.2218396 0.2577499 0.2858451 0.2302800 0.2702724
[106] 0.2540562 0.2578829 0.2575623 0.2168888 0.2195915 0.2580051 0.2421943
[113] 0.2638288 0.2544595 0.2290188 0.2222053 0.2396301 0.1955836 0.2041872
[120] 0.1892055 0.1984279 0.1491523 0.1924305 0.1738151 0.1747160 0.2030611
[127] 0.1616358 0.2057638 0.2347254 0.3060768 0.2867074 0.3400837 0.3156306
[134] 0.3292664 0.3650965 0.2974775 0.3229665 0.2809459 0.2728378 0.2579344
[141] 0.2868018 0.3105405 0.2715315 0.2577864 0.2664736 0.2664736 0.2601673
[148] 0.2866924 0.2848005 0.2976963 0.2533719 0.2542728 0.2411712 0.3125997
[155] 0.3420592 0.2798906 0.2476190 0.2852960 0.2852960 0.2956564 0.2888739
[162] 0.2476190 0.2957400 0.2138846 0.2636551 0.2102810 0.2102810 0.2476190
[169] 0.2353260 0.2115637 0.2548263 0.2618533 0.2398305 0.1887194 0.2061583
[176] 0.2618533 0.2465873 0.1882690 0.1945753 0.0000000 0.1339852 0.1593136
[183] 0.0000000 0.0000000 0.1882690 0.0000000
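
Why the default closeness is zero everywhere while the "suminvundir" variant is not can be sketched in base R. The sketch assumes the definitions described in the sna help page for closeness(): the classic score is (n-1) divided by the sum of geodesic distances, and the alternative is the sum of inverse distances divided by (n-1). The toy distance matrix d is hypothetical:

```r
# Geodesic distances in a toy undirected graph: nodes 1 and 2 are tied,
# node 3 is an isolate, so distances to and from it are infinite
d <- matrix(c(0,   1,   Inf,
              1,   0,   Inf,
              Inf, Inf, 0),
            nrow = 3, byrow = TRUE)
n <- nrow(d)

# Classic closeness: a single unreachable node makes the sum infinite
classic <- (n - 1) / rowSums(d)

# Sum of inverse distances: unreachable nodes contribute 1/Inf = 0
diag(d) <- Inf                       # exclude the distance of a node to itself
suminv <- rowSums(1 / d) / (n - 1)

classic
suminv
```

In the toy graph every node gets a classic score of 0 because of the single isolate, while the inverse-distance version is 0 only for the isolate itself.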

From centrality to centralization. centralization() takes a centrality function as its second argument; additional arguments, such as cmode here, are passed on to that function.

centralization(mids_1993, degree, cmode="indegree") # Do MIDs concentrate?
[1] 0.05773557
centralization(contig_1993, evcent) # Eigenvector centralization
[1] 0.3376634

Elementary graph-level indices are pretty useful. Density is the number of edges divided by the number of possible edges: \(\frac{E}{n(n-1)}\) for a directed network and \(\frac{2E}{n(n-1)}\) for an undirected one.

gden(mids_1993) # Density
[1] 0.002034292
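
The density value can be reproduced by hand. In this sketch n = 186 is the length of the degree vector printed earlier, and the edge count is derived from the dyad census of mids_1993 reported further below (3 mutual dyads contribute two edges each, 64 asymmetric dyads one each):

```r
n <- 186            # number of countries in mids_1993
E <- 2 * 3 + 64     # number of directed edges

dens_directed <- E / (n * (n - 1))
dens_directed

# For comparison: an undirected graph with the same n and E has half as many
# possible edges, hence twice the density
dens_undirected <- E / (n * (n - 1) / 2)
dens_undirected
```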

The MAN distribution is quite useful; it lists the number of Mutual, Asymmetric, and Null dyads in a given graph:

dyad.census(mids_1993)
     Mut Asym  Null
[1,]   3   64 17138
dyad.census(contig_1993)
     Mut Asym  Null
[1,] 534    0 16671

Reciprocity is calculated from the numbers in the dyad census. The default routine defines reciprocity as \(\frac{M+N}{M+A+N}\). This is often not what we first think of as reciprocity: null dyads are included in the definition, which makes the MIDs network seem quite reciprocal. Edgewise reciprocity, defined as \(\frac{2M}{2M+A}\), is interpreted as the probability that a tie sent is also received. Under this definition the MIDs network has very low reciprocity.

grecip(mids_1993) # Dyadic reciprocity
      Mut 
0.9962802 
grecip(mids_1993, measure="edgewise") # Edgewise reciprocity
       Mut 
0.08571429 
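
Both values can be recomputed from the dyad census of mids_1993 shown above. Note that the edgewise measure counts edges rather than dyads, so each mutual dyad enters twice:

```r
# Dyad census of mids_1993 as printed above
M <- 3        # mutual dyads
A <- 64       # asymmetric dyads
N <- 17138    # null dyads

dyadic   <- (M + N) / (M + A + N)     # share of dyads that are symmetric
edgewise <- 2 * M / (2 * M + A)       # share of edges that are reciprocated

dyadic
edgewise
```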

Transitivity is the proportion of paths i -> j -> k where the i -> k edge is also present.

gtrans(mids_1993) # Transitivity
[1] 0.02409639
# Detaching the packages
detach(package:sna)
detach(package:network)
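
The verbal definition of transitivity given above can be sketched with matrix algebra in base R. gtrans() offers several measures, so this only illustrates the counting idea on a hypothetical toy matrix A and is not a reimplementation of the sna routine:

```r
# Toy directed graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE)

two_paths <- A %*% A          # two_paths[i, k] = number of j with i -> j -> k
diag(two_paths) <- 0          # drop paths that return to their start (i = k)

n_paths  <- sum(two_paths)        # all two-paths between distinct i and k
n_closed <- sum(two_paths * A)    # two-paths for which the edge i -> k exists

trans <- n_closed / n_paths
trans
```

Of the three two-paths in the toy graph only 1 -> 2 -> 3 is closed by the edge 1 -> 3, giving a transitivity of one third.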

6.2 Igraph

library("igraph")

Attaching package: 'igraph'
The following objects are masked from 'package:BiocGenerics':

    normalize, path, union
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union
summary(g1)
IGRAPH 3d2eb20 UN-- 5 4 -- 
+ attr: name (v/c)
ecount(g1)      # number of edges
[1] 4
vcount(g1)      # number of vertices
[1] 5
is.directed(g1) # is the network directed?
[1] FALSE

Graph density and reciprocity

Density = proportion of possible edges that actually exist

edge_density(classroom)
[1] 0.1353846

Reciprocity = proportion of mutual connections

g <- make_graph(c(1,2, 2,3, 3,2), n = 3)
reciprocity(g)
[1] 0.6666667
reciprocity(g, mode="ratio")
[1] 0.5
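
The two reciprocity values for the toy graph can be traced by hand. The sketch below rebuilds the same three edges as an adjacency matrix in base R; it follows our reading of the two modes (the default counts reciprocated edges, "ratio" counts mutual versus asymmetric pairs):

```r
# Adjacency matrix of the toy graph with edges 1->2, 2->3, 3->2
A <- matrix(0, nrow = 3, ncol = 3)
A[cbind(c(1, 2, 3), c(2, 3, 2))] <- 1

n_edges       <- sum(A)
n_recip_edges <- sum(A * t(A))            # edges whose reverse is also present
n_mutual      <- n_recip_edges / 2        # mutually connected pairs
n_asym        <- n_edges - n_recip_edges  # pairs connected in one direction only

recip_default <- n_recip_edges / n_edges
recip_ratio   <- n_mutual / (n_mutual + n_asym)

recip_default
recip_ratio
```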

Vertex degrees

Calculating in-/out-/total degrees

degree(classroom)
1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045 1048 
   5   11   10    5    4   14    7    0    8    2    2    3    4   13    4    9 
1051 1054 1057 1060 1063 1066 1069 1072 1075 1078 
  15    7    8    5    6    6    5    6    6   11 
degree(classroom, mode="in")
1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045 1048 
   2    7    7    2    1    9    2    0    2    0    1    0    2    9    2    4 
1051 1054 1057 1060 1063 1066 1069 1072 1075 1078 
  10    2    6    1    2    2    2    3    2    8 
degree(classroom, mode="out")
1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045 1048 
   3    4    3    3    3    5    5    0    6    2    1    3    2    4    2    5 
1051 1054 1057 1060 1063 1066 1069 1072 1075 1078 
   5    5    2    4    4    4    3    3    4    3 

Degree distribution

Fraction of nodes with given degree

degree_distribution(classroom)
 [1] 0.03846154 0.00000000 0.07692308 0.03846154 0.11538462 0.15384615
 [7] 0.15384615 0.07692308 0.07692308 0.03846154 0.03846154 0.07692308
[13] 0.00000000 0.03846154 0.03846154 0.03846154
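
The same fractions can be obtained from the vector of total degrees printed earlier. In this base-R sketch the first element corresponds to degree 0, the second to degree 1, and so on up to the maximum degree:

```r
# Total degrees of the 26 kids, copied from the degree(classroom) output
deg <- c(5, 11, 10, 5, 4, 14, 7, 0, 8, 2, 2, 3, 4, 13, 4, 9,
         15, 7, 8, 5, 6, 6, 5, 6, 6, 11)

# Count nodes with degree 0, 1, ..., max(deg) and convert to fractions
dd <- tabulate(deg + 1, nbins = max(deg) + 1) / length(deg)
round(dd, 8)
```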

Centrality

betweenness(classroom)
       1003        1006        1009        1012        1015        1018 
  0.5000000 102.0119048  12.1761905   7.6666667   0.3333333  74.4820346 
       1021        1024        1027        1030        1033        1036 
 10.7833333   0.0000000  27.3978355   0.0000000   1.8333333   0.0000000 
       1039        1042        1045        1048        1051        1054 
  1.2500000 128.2556277   0.0000000  15.6358225  76.8609307   9.5370130 
       1057        1060        1063        1066        1069        1072 
  0.0000000   9.2386364   6.1580087   6.2500000   3.9751082  12.7613636 
       1075        1078 
 16.3833333  17.5095238 
closeness(classroom)
      1003       1006       1009       1012       1015       1018       1021 
0.03225806 0.03448276 0.02564103 0.01851852 0.01694915 0.04166667 0.02083333 
      1024       1027       1030       1033       1036       1039       1042 
       NaN 0.03703704 0.01923077 0.02564103 0.02941176 0.03333333 0.04347826 
      1045       1048       1051       1054       1057       1060       1063 
0.03448276 0.04000000 0.04000000 0.03571429 0.02000000 0.04000000 0.02564103 
      1066       1069       1072       1075       1078 
0.02325581 0.02439024 0.03225806 0.02564103 0.02564103 

See also package netrankr (Schoch 2017) and this blogpost for more centrality indices.

detach(package:igraph)

6.3 graph

Package graph is rather thin with respect to analysis. For the most part it relies on a separate package, RBGL (Carey, Long, and Gentleman 2021), available from the Bioconductor repository.

7 Practicals

Challenge yourself with the set of practicals we have prepared.

Appendix

Session info
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21)
 os       Ubuntu 22.04.2 LTS
 system   x86_64, linux-gnu
 ui       X11
 language en
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Warsaw
 date     2023-06-27
 pandoc   3.1.2 @ /usr/bin/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package        * version date (UTC) lib source
 BiocGenerics   * 0.46.0  2023-04-25 [1] Bioconductor
 bslib            0.5.0   2023-06-09 [1] CRAN (R 4.3.0)
 cachem           1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 coda             0.19-4  2020-09-30 [1] CRAN (R 4.3.0)
 colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 digest           0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
 dplyr            1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
 evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
 fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
 farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggforce          0.4.1   2022-10-04 [1] CRAN (R 4.3.0)
 ggplot2          3.4.2   2023-04-03 [1] CRAN (R 4.3.0)
 ggraph           2.1.0   2022-10-09 [1] CRAN (R 4.3.0)
 ggrepel          0.9.3   2023-02-03 [1] CRAN (R 4.3.0)
 glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 graph            1.78.0  2023-06-27 [1] bioc_git2r (@9df68e8)
 graphlayouts     1.0.0   2023-05-01 [1] CRAN (R 4.3.0)
 gridExtra        2.3     2017-09-09 [1] CRAN (R 4.3.0)
 gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
 highr            0.10    2022-12-22 [1] CRAN (R 4.3.0)
 htmltools        0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
 igraph           1.5.0   2023-06-16 [1] CRAN (R 4.3.0)
 intergraph       2.0-2   2016-12-05 [1] CRAN (R 4.3.0)
 isnar            1.0-0   2023-06-27 [1] Github (mbojan/isnar@5617770)
 jquerylib        0.1.4   2021-04-26 [1] CRAN (R 4.3.0)
 jsonlite         1.8.5   2023-06-05 [1] CRAN (R 4.3.0)
 knitr          * 1.43    2023-05-25 [1] CRAN (R 4.3.0)
 lattice          0.21-8  2023-04-05 [4] CRAN (R 4.3.0)
 lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 MASS             7.3-60  2023-05-04 [4] CRAN (R 4.3.0)
 munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 network          1.18.1  2023-01-24 [1] CRAN (R 4.3.0)
 pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 png              0.1-8   2022-11-29 [1] CRAN (R 4.3.0)
 polyclip         1.10-4  2022-10-20 [1] CRAN (R 4.3.0)
 purrr            1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
 R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp             1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
 rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown        2.22    2023-06-01 [1] CRAN (R 4.3.0)
 sass             0.4.6   2023-05-03 [1] CRAN (R 4.3.0)
 scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 sna              2.7-1   2023-01-24 [1] CRAN (R 4.3.0)
 statnet.common * 4.9.0   2023-05-24 [1] CRAN (R 4.3.0)
 tibble           3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidygraph        1.2.3   2023-02-01 [1] CRAN (R 4.3.0)
 tidyr            1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tweenr           2.0.2   2022-09-06 [1] CRAN (R 4.3.0)
 utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
 vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
 viridis          0.6.3   2023-05-03 [1] CRAN (R 4.3.0)
 viridisLite      0.4.2   2023-05-02 [1] CRAN (R 4.3.0)
 withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
 xfun             0.39    2023-04-20 [1] CRAN (R 4.3.0)
 yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)

 [1] /home/mbojan/R/library/4.3
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

References

Bojanowski, Michal. 2015. Intergraph: Coercion Routines for Network Data Objects. http://mbojan.github.io/intergraph.
Butts, Carter T. 2008. “Network: A Package for Managing Relational Data in R.” Journal of Statistical Software 24 (2). https://www.jstatsoft.org/v24/i02/paper.
———. 2020. Sna: Tools for Social Network Analysis. https://CRAN.R-project.org/package=sna.
———. 2021. Network: Classes for Relational Data. The Statnet Project (https://statnet.org). https://CRAN.R-project.org/package=network.
Carey, Vince, Li Long, and R. Gentleman. 2021. RBGL: An Interface to the BOOST Graph Library. http://www.bioconductor.org.
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. https://igraph.org.
Dolata, Roman, ed. 2014. Czy Szkoła Ma Znaczenie? Zróżnicowanie Wyników Nauczania Po Pierwszym Etapie Edukacyjnym Oraz Jego Pozaszkolne i Szkolne Uwarunkowania. Vol. 1. Warsaw: Instytut Badań Edukacyjnych.
Gentleman, R., Elizabeth Whalen, W. Huber, and S. Falcon. 2020. Graph: A Package to Handle Graph Data Structures.
Hansen, Kasper Daniel, Jeff Gentry, Li Long, Robert Gentleman, Seth Falcon, Florian Hahne, and Deepayan Sarkar. 2021. Rgraphviz: Provides Plotting Capabilities for R Graph Objects.
Hester, Jim, Gábor Csárdi, Hadley Wickham, Winston Chang, Martin Morgan, and Dan Tenenbaum. 2021. Remotes: R Package Installation from Remote Repositories, Including ’GitHub’. https://CRAN.R-project.org/package=remotes.
Krivitsky, Pavel N., Mark S. Handcock, David R. Hunter, Carter T. Butts, Chad Klumb, Steven M. Goodreau, and Martina Morris. 2003-2020. Statnet: Software Tools for the Statistical Modeling of Network Data. Statnet Development Team. http://statnet.org.
Pedersen, Thomas Lin. 2020. Tidygraph: A Tidy API for Graph Manipulation. https://CRAN.R-project.org/package=tidygraph.
———. 2021. Ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. https://CRAN.R-project.org/package=ggraph.
Schoch, David. 2017. Netrankr: An R Package to Analyze Partial Rankings in Networks.
Wickham, Hadley. 2019. Advanced R. CRC Press. https://adv-r.hadley.nz.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.