1 Introduction to this workshop/tutorial

This workshop and tutorial provide an overview of R packages for network analysis. The online tutorial is also designed for self-study, with example code and self-contained data. The packages covered are:

  • Statnet suite (Krivitsky et al. 2003-2020) including:
    • network (Butts 2008, 2021) – storage and manipulation of network data
    • sna (Butts 2020) – descriptive statistics and graphics for exploratory network analysis
  • igraph (Csardi and Nepusz 2006)
  • tidygraph (Pedersen 2020) and ggraph (Pedersen 2021)
  • graph (Gentleman et al. 2020) and Rgraphviz (Hansen et al. 2021)

and other more specialized packages that provide tools for e.g. particular SNA techniques or visualization, but rely on one of the above for network data storage and manipulation.

1.1 Prerequisites

This workshop assumes basic familiarity with R, experience with network concepts, terminology and data, and familiarity with the general framework for statistical modeling and inference. While previous experience with ERGMs is not required, some of the topics covered here may be difficult to understand without a strong background in linear and generalized linear models in statistics.

1.2 Software installation

Minimally, you will need to install the latest version of R (available here) and the packages listed below. The workshops are conducted using the free version of RStudio (available here).

The packages required for the workshop can be installed with the following expression:

install.packages(c("network", "sna", "igraph", "tidygraph", "ggraph", 
                   "intergraph", "remotes"))

Package remotes (Hester et al. 2021) is needed to install the remaining two packages.

remotes::install_bioc("graph")
remotes::install_bioc("Rgraphviz")

More information about installing other packages from the Statnet suite can be found on the Statnet website. In particular, you can install the whole Statnet suite (not required for this tutorial) with:

install.packages('statnet')
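To confirm that the installation worked, you can check which of the packages named in this section R is able to find. The following is a minimal sketch using only base R; the variable names `required`, `installed` and `missing_pkgs` are illustrative and not part of the workshop materials.

```r
# Packages used in this tutorial (from CRAN and Bioconductor)
required <- c("network", "sna", "igraph", "tidygraph", "ggraph",
              "intergraph", "graph", "Rgraphviz")

# requireNamespace() returns TRUE if a package can be loaded;
# it does not attach the package to the search path
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Names of packages that still need installing (empty if all are present)
missing_pkgs <- required[!installed]
print(missing_pkgs)
```

An empty result (`character(0)`) means all packages are available.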

1.3 Necessary data files

  • Classroom data is a network of 26 nine-year-olds in one school class, taken from a larger study by Dolata (2014). The name generator question was “With whom do you like to play with?”. The data is available in the following files:
    • classroom-adjacency.csv with adjacency matrix
    • classroom-edges.csv with an edgelist with edge attribute: liking – numeric, the extent to which ego likes alter on a scale of 1-5. This attribute has been randomly generated for illustrative purposes.
    • classroom-nodes.csv with node attributes: female – logical, gender (TRUE for girls); isei08_m, isei08_f – numeric, social status score of, respectively, mother and father
  • Several other datasets contained in the file introToSNAinR.Rdata.

Download all the files as a ZIP file intro-sna-data.zip.

The code from this tutorial is available as a script too.

1.4 Working Directory

Before we go further, make sure R’s Working Directory (WD) is set to the folder where you extracted the workshop data files from the ZIP archive. If you have not set the working directory yet, do so now in one of the following ways:

  1. (Recommended) Create an RStudio Project dedicated to the workshop and unpack the data files there.

  2. Use the RStudio “Files” tab to navigate to the directory with the workshop files, then click “More” and “Set As Working Directory”.

  3. You can use setwd() to change the working directory as well, like so:

    setwd("path/to/folder/with/workshop/files")

Verify that the WD is set correctly by:

  1. Looking at the top of the Console window in RStudio, or

  2. Using getwd():

    getwd() # Check what directory you're in
    [1] "/home/mbojan/Teaching/workshop-intro-sna-tools"
    list.files() # Check what's in the working directory
     [1] "bibliography.bib"               "captab.html"                   
     [3] "captab.Rmd"                     "classroom-adjacency.csv"       
     [5] "classroom-edges.csv"            "classroom-nodes.csv"           
     [7] "common"                         "edgeList.csv"                  
     [9] "index.html"                     "intro_tutorial.html"           
    [11] "intro_tutorial.R"               "intro_tutorial.Rmd"            
    [13] "intro-sna-data.zip"             "introToSNAinR.Rdata"           
    [15] "Makefile"                       "practicals_files"              
    [17] "practicals-solved_files"        "practicals-solved.html"        
    [19] "practicals.html"                "practicals.Rmd"                
    [21] "README.md"                      "relationalData.csv"            
    [23] "rstudio-wd.png"                 "vertexAttributes.csv"          
    [25] "workshop-intro-sna-tools.Rproj"
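The same check can also be done programmatically. The sketch below compares the data files named in section 1.3 with the contents of the working directory; `expected_files` and `missing_files` are illustrative names, not part of the workshop script.

```r
# Data files listed in section 1.3
expected_files <- c("classroom-adjacency.csv", "classroom-edges.csv",
                    "classroom-nodes.csv", "introToSNAinR.Rdata")

# Files that are expected but absent from the working directory
missing_files <- setdiff(expected_files, list.files())

if (length(missing_files) > 0) {
  message("Not found in ", getwd(), ": ",
          paste(missing_files, collapse = ", "))
} else {
  message("All data files found.")
}
```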

1.5 Mitigating function name conflicts

Some of the packages we are going to demonstrate provide functions with the same names as functions in other packages. One example is get.vertex.attribute(), which is defined in both network and igraph. Hence, if we load both packages with library(), it matters which package is loaded last, because its version of the function is the one used when we write get.vertex.attribute.

In particular, note the following function name clashes:

  • Between igraph and network:

     [1] "%c%"                    "%s%"                    "add.edges"             
     [4] "add.vertices"           "delete.edges"           "delete.vertices"       
     [7] "get.edge.attribute"     "get.edges"              "get.vertex.attribute"  
    [10] "is.bipartite"           "is.directed"            "list.edge.attributes"  
    [13] "list.vertex.attributes" "set.edge.attribute"     "set.vertex.attribute"  
  • Between igraph and sna:

     [1] "betweenness"  "bonpow"       "closeness"    "components"   "degree"      
     [6] "dyad.census"  "evcent"       "hierarchy"    "is.connected" "neighborhood"
    [11] "triad.census"
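Lists like these can be obtained by intersecting the names exported by two packages. The sketch below shows one way to do this with base R's `getNamespaceExports()`. It is an assumption about how such lists can be produced, not necessarily how the lists above were generated, and the result depends on the installed package versions. The helper `name_clashes()` is hypothetical.

```r
# Hypothetical helper: names exported by both packages, sorted alphabetically
name_clashes <- function(pkg1, pkg2) {
  sort(intersect(getNamespaceExports(pkg1), getNamespaceExports(pkg2)))
}

# Run only if both packages are installed
if (requireNamespace("igraph", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  print(name_clashes("igraph", "network"))
}
```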

The following strategies help keep such conflicts as painless as possible:

  1. Load the packages but do not attach them and always use ::
  2. Load and attach the packages and use :: for disambiguation.
  3. Load the packages and selectively attach and detach them in order to always have only one of them attached.

In this tutorial we had to deal with these conflicts as well. We have opted for strategy (3) because:

  • Code blocks illustrate working with one particular package without worrying about conflicts.
  • The code examples are free of :: namespace qualifiers and hence easier to read.

The disadvantage is that

  • You will see frequent calls to library() and detach() at the beginning and end of the subsections to make sure only one intended package is attached at a given time.
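To make the strategies concrete, the sketch below applies them to `get.vertex.attribute()`. It assumes that the network package is installed and is skipped otherwise. The toy object `net` and its `age` attribute are invented for illustration and are not part of the workshop data.

```r
# Assumes the network package is installed; skipped otherwise
if (requireNamespace("network", quietly = TRUE)) {
  # Toy network with 3 vertices and a made-up vertex attribute "age"
  net <- network::network.initialize(3)
  net <- network::set.vertex.attribute(net, "age", c(9, 9, 10))

  # Strategies 1 and 2: qualify the call with :: so that it is unambiguous,
  # whether or not network and igraph are attached
  ages <- network::get.vertex.attribute(net, "age")

  # Strategy 3: attach one package, use unqualified names, then detach it.
  # detach() fails if another attached package (e.g. sna) still requires network.
  library(network)
  ages_again <- get.vertex.attribute(net, "age")
  detach("package:network")
}
```

With strategy (3) the unqualified call resolves to the network version because network is the only one of the clashing packages attached at that point.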

2 Importing Relational Data

Network data is usually stored as:

  • Adjacency matrices
  • Edge lists
  • Edge and vertex data frames
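To make these formats concrete, the sketch below represents one small, made-up directed network in each of the three ways, using base R only. The vertex names and the `female` values are invented for illustration and are unrelated to the classroom data.

```r
# Toy directed network: A -> B, A -> C, B -> C
vertices <- c("A", "B", "C")

# 1. Adjacency matrix: rows are senders, columns are receivers
adj <- matrix(0, nrow = 3, ncol = 3, dimnames = list(vertices, vertices))
adj["A", "B"] <- 1
adj["A", "C"] <- 1
adj["B", "C"] <- 1

# 2. Edge list: one row per tie, derived from the non-zero matrix cells
idx <- which(adj == 1, arr.ind = TRUE)
edgelist <- cbind(from = vertices[idx[, "row"]], to = vertices[idx[, "col"]])

# 3. Edge and vertex data frames: ties and vertex attributes in separate tables
edges_df <- as.data.frame(edgelist)
nodes_df <- data.frame(name = vertices, female = c(TRUE, FALSE, TRUE))

print(adj)
print(edges_df)
print(nodes_df)
```

The adjacency matrix grows with the square of the number of vertices, while the edge list has one row per tie, which is why large sparse networks are usually stored as edge lists.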

2.1 Network

library(network)

'network' 1.18.1 (2023-01-24), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
library(sna)
Loading required package: statnet.common

Attaching package: &#