This workshop and tutorial provide an overview of R packages for network analysis. This online tutorial is also designed for self-study, with example code and self-contained data.
It covers the main packages for storing and manipulating network data, as well as other more specialized packages that provide tools for, e.g., particular SNA techniques or visualization, but rely on one of the former for network data storage and manipulation.
This workshop assumes basic familiarity with R, experience with network concepts, terminology and data, and familiarity with the general framework for statistical modeling and inference. While previous experience with ERGMs is not required, some of the topics covered here may be difficult to understand without a strong background in linear and generalized linear models in statistics.
Minimally, you will need to install the latest version of R (available here) and the packages listed below. The workshops are conducted using the free version of RStudio (available here).
The packages required for the workshop can be installed with the following expression:
Package remotes (Hester et al. 2021) is needed to install the remaining two packages.
More information about installing other packages from the Statnet suite can be found on the Statnet workshop wiki. In particular, you can install (but do not have to for this tutorial) the whole Statnet suite with install.packages("statnet").
classroom-adjacency.csv with adjacency matrix
classroom-edges.csv with an edgelist with edge attribute: liking – numeric, on the scale 1-5 the extent to which ego likes the alter. This attribute has been randomly generated for illustrative purposes.
classroom-nodes.csv with node attributes: female – logical, gender (TRUE for girls); isei08_m, isei08_f – numeric, social status score of, respectively, mother and father
introToSNAinR.Rdata
Download all the files as a ZIP file intro-sna-data.zip.
The code from this tutorial is available as a script too.
Before we go further, make sure R’s Working Directory (WD) is set to the folder where you extracted the data files from the ZIP archive for the workshop. If you’ve not set the working directory, you must do so now by one of:
(Recommended) Create an RStudio Project dedicated to the workshop and unpack the data files there.
Use RStudio “Files” tab to navigate to the directory with the workshop files, then click “More” and “Set As Working Directory”:
You can use setwd() to change the working directory as well, like so:
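A minimal sketch of that call. A temporary directory stands in for the workshop folder here; the path and the variable names workshop_dir and old_wd are placeholders, not the real location of your files.

```r
# Stand-in for the folder where you unpacked the ZIP archive
# (hypothetical; replace it with your own path).
workshop_dir <- file.path(tempdir(), "workshop-intro-sna-tools")
dir.create(workshop_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the previous one,
# which makes it easy to switch back later with setwd(old_wd).
old_wd <- setwd(workshop_dir)
```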
Verify that the WD is set correctly by:
Looking at the top of the Console window in RStudio, or
Calling getwd():
[1] "/home/mbojan/Teaching/workshop-intro-sna-tools"
[1] "bibliography.bib" "captab.html"
[3] "captab.Rmd" "classroom-adjacency.csv"
[5] "classroom-edges.csv" "classroom-nodes.csv"
[7] "edgeList.csv" "intro_tutorial.html"
[9] "intro_tutorial.R" "intro_tutorial.Rmd"
[11] "intro-sna-data.zip" "introToSNAinR.Rdata"
[13] "Makefile" "practicals_files"
[15] "practicals-solved_files" "practicals.Rmd"
[17] "publish.R" "README.md"
[19] "relationalData.csv" "rstudio-wd.png"
[21] "vertexAttributes.csv" "workshop-intro-sna-tools.Rproj"
Some packages we are going to demonstrate provide functions with the same names as functions in other packages. One example is the function get.vertex.attribute(), which is defined in both network and igraph. Hence, if we load both packages with library(), it matters which package is loaded last, as its version of the function will be used when we write get.vertex.attribute.
In particular, note the following function name clashes:
Between igraph and network:
[1] "%c%" "%s%" "add.edges"
[4] "add.vertices" "delete.edges" "delete.vertices"
[7] "get.edge.attribute" "get.edges" "get.vertex.attribute"
[10] "is.bipartite" "is.directed" "list.edge.attributes"
[13] "list.vertex.attributes" "set.edge.attribute" "set.vertex.attribute"
Between igraph and sna:
[1] "betweenness" "bonpow" "closeness" "components" "degree"
[6] "dyad.census" "evcent" "hierarchy" "is.connected" "neighborhood"
[11] "triad.census"
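Lists like the two above can be produced by intersecting the names exported by each package. The helper below is a sketch, and name_clashes is a hypothetical name, not a function from any of the packages discussed. The exact result depends on the installed package versions, so it may differ from the lists shown here.

```r
# Names exported by both packages: these are the candidates for masking.
# (name_clashes is a made-up helper for illustration.)
name_clashes <- function(pkg1, pkg2) {
  sort(intersect(getNamespaceExports(pkg1), getNamespaceExports(pkg2)))
}

# Works for any pair of installed packages, e.g. two that ship with R:
name_clashes("stats", "graphics")

# The comparison from the text, run only if both packages are installed:
if (requireNamespace("igraph", quietly = TRUE) &&
    requireNamespace("network", quietly = TRUE)) {
  name_clashes("igraph", "network")
}
```

Whichever package is attached last, a call can always be pinned to one package with the :: operator, for example igraph::degree() or sna::degree().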
There are the following strategies to make sure possible conflicts are as painless as possible:
::
::
for disambiguation.
In this tutorial we had to deal with these conflicts as well. We have opted for strategy (3) because:
the code needs no :: namespace directives and is hence cleaner to read.
The disadvantage is that we need to call library() and detach() at the beginning and end of the subsections to make sure only one intended package is attached at a given time.
Network data is usually stored as
'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
Loading required package: statnet.common
Attaching package: 'statnet.common'
The following objects are masked from 'package:base':
attr, order
sna: Tools for Social Network Analysis
Version 2.6 created on 2020-10-5.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
For citation information, type citation("sna").
Type help(package="sna") to get started.
Read an adjacency matrix (read.csv() returns it as a data frame). By default R also does not permit column names that start with a digit and prefixes them with X, although such names are fine as row names.
relations <- read.csv("classroom-adjacency.csv", header=TRUE, row.names=1, stringsAsFactors=FALSE)
relations[1:10, 1:10] # look at a subgraph using bracket notation
X1003 X1006 X1009 X1012 X1015 X1018 X1021 X1024 X1027 X1030
1003 0 0 0 0 0 1 0 0 0 0
1006 0 0 1 0 0 0 0 0 0 0
1009 0 1 0 0 0 0 0 0 0 0
1012 0 0 0 0 1 0 1 0 0 0
1015 0 0 0 1 0 0 1 0 0 0
1018 0 0 0 0 0 0 0 0 0 0
1021 0 1 1 1 0 0 0 0 0 0
1024 0 0 0 0 0 0 0 0 0 0
1027 0 1 1 0 0 0 0 0 0 0
1030 0 0 1 0 0 0 0 0 0 0
We might want to store it as a matrix. Most routines will accept either data format. However, depending on how a function was written, it might require one or the other. The isSymmetric() function from base R is one example that requires a matrix rather than a data frame.
[1] FALSE
To make the row and column names identical, we can overwrite the column names with the row names:
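The steps described so far can be sketched on a made-up 3x3 adjacency matrix, so that the example runs without the workshop files. The data and the object names toy and toy_m are invented for illustration.

```r
# Toy adjacency matrix in CSV form (made up for illustration).
csv_lines <- c(
  ",1003,1006,1009",
  "1003,0,1,0",
  "1006,0,0,1",
  "1009,1,0,0"
)
toy <- read.csv(text = csv_lines, header = TRUE, row.names = 1)

class(toy)      # a data frame
colnames(toy)   # "X1003" "X1006" "X1009": R prefixed the numeric names with X
rownames(toy)   # "1003" "1006" "1009"

toy_m <- as.matrix(toy)              # convert the data frame to a matrix
colnames(toy_m) <- rownames(toy_m)   # make column names match the row names
isSymmetric(toy_m)                   # FALSE: the toy ties are not reciprocated
```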
Read in some vertex attribute data. It is fine to leave it as a data frame. In fact, converting it to a matrix would create problems: a matrix can hold values of only one type (e.g. all strings or all numbers), whereas a data frame can hold columns of different types.
name female isei08_m isei08_f
1 1003 FALSE NA 25.71
2 1006 TRUE 14.64 33.76
3 1009 TRUE 28.48 37.22
4 1012 TRUE 26.64 25.23
5 1015 TRUE 21.24 NA
6 1018 FALSE 23.47 24.45
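To see why the node attributes are better left as a data frame, here is a small sketch with a made-up node table in the style of the output above. The object names toy_nodes and toy_nodes_m are invented for illustration.

```r
# Toy node table with mixed column types (values made up for illustration).
toy_nodes <- data.frame(
  name     = c("1003", "1006", "1009"),
  female   = c(FALSE, TRUE, TRUE),
  isei08_m = c(NA, 14.64, 28.48),
  stringsAsFactors = FALSE
)

sapply(toy_nodes, class)   # each column keeps its own type

# Converting to a matrix forces a single type: everything becomes character.
toy_nodes_m <- as.matrix(toy_nodes)
typeof(toy_nodes_m)        # "character"
```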
We could also convert it to a network object. This would be useful for (1) storing all data in the same file, (2) a more compact format for large, sparse matrices, or (3) using the data in later analyses where the routines require network objects (e.g. ERGM)
Network attributes:
vertices = 26
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges = 88
missing edges = 0
non-missing edges = 88
density = 0.1353846
Vertex attributes:
vertex.names:
character valued attribute
26 valid vertex names
No edge attributes
Network edgelist matrix:
[,1] [,2]
[1,] 20 1
[2,] 24 1
[3,] 3 2
[4,] 7 2
[5,] 9 2
[6,] 12 2
[7,] 14 2
[8,] 21 2
[9,] 26 2
[10,] 2 3
[11,] 7 3
[12,] 9 3
[13,] 10 3
[14,] 19 3
[15,] 25 3
[16,] 26 3
[17,] 5 4
[18,] 7 4
[19,] 4 5
[20,] 1 6
[21,] 13 6
[22,] 14 6
[23,] 15 6
[24,] 16 6
[25,] 17 6
[26,] 18 6
[27,] 20 6
[28,] 24 6
[29,] 4 7
[30,] 5 7
[31,] 22 9
[32,] 25 9
[33,] 12 11
[34,] 14 13
[35,] 18 13
[36,] 2 14
[37,] 6 14
[38,] 12 14
[39,] 13 14
[40,] 15 14
[41,] 16 14
[42,] 17 14
[43,] 20 14
[44,] 21 14
[45,] 16 15
[46,] 17 15
[47,] 6 16
[48,] 9 16
[49,] 17 16
[50,] 18 16
[51,] 1 17
[52,] 6 17
[53,] 9 17
[54,] 11 17
[55,] 14 17
[56,] 16 17
[57,] 18 17
[58,] 20 17
[59,] 23 17
[60,] 24 17
[61,] 16 18
[62,] 17 18
[63,] 2 19
[64,] 3 19
[65,] 7 19
[66,] 9 19
[67,] 25 19
[68,] 26 19
[69,] 6 20
[70,] 22 21
[71,] 23 21
[72,] 21 22
[73,] 23 22
[74,] 21 23
[75,] 22 23
[76,] 1 24
[77,] 6 24
[78,] 18 24
[79,] 4 25
[80,] 22 25
[81,] 2 26
[82,] 3 26
[83,] 5 26
[84,] 7 26
[85,] 9 26
[86,] 10 26
[87,] 19 26
[88,] 25 26
Here the row and column names have been carried through because they were attached to the matrix. We can look at them by using the network variable methods and the shorthand %v%:
[1] "na" "vertex.names"
[1] "1003" "1006" "1009" "1012" "1015" "1018" "1021" "1024" "1027" "1030"
[11] "1033" "1036" "1039" "1042" "1045" "1048" "1051" "1054" "1057" "1060"
[21] "1063" "1066" "1069" "1072" "1075" "1078"
If we wanted to set the names back to the original numbers, we could use these methods as well:
[1] 1003 1006 1009 1012 1015 1018 1021 1024 1027 1030 1033 1036 1039 1042 1045
[16] 1048 1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
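The conversion and the attribute methods above can be tried on a made-up matrix. This is a sketch that assumes the network package is installed (it is skipped otherwise); the matrix and the name toy_net are invented for illustration.

```r
if (requireNamespace("network", quietly = TRUE)) {
  library(network)

  # Toy directed adjacency matrix with row and column names (made up).
  m <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("1003", "1006", "1009"),
                              c("1003", "1006", "1009")))

  toy_net <- network(m, directed = TRUE)   # adjacency matrix -> network object

  list.vertex.attributes(toy_net)          # names of the vertex attributes
  toy_net %v% "vertex.names"               # names carried over from the matrix

  # Overwrite the character names with their numeric versions.
  toy_net %v% "vertex.names" <- as.numeric(toy_net %v% "vertex.names")
  toy_net %v% "vertex.names"
}
```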
Reading in an edgelist and converting it to a network object is also straightforward. Edgelists are useful because they are a smaller, more concise data structure for larger, sparser networks that we typically deal with in social network analysis.
In the newest release of statnet it will automatically read the weight data and store it as an edge attribute named after the CSV column (here liking). If you’re using an older version of statnet, you might need to add two more arguments to the network command: ignore.eval=FALSE and names.eval="Weight" (the name under which the edge values are stored).
from to liking
1 1003 1018 3
2 1003 1051 3
3 1003 1072 4
4 1006 1009 5
5 1006 1042 1
6 1006 1057 4
Network attributes:
vertices = 25
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 88
missing edges= 0
non-missing edges= 88
Vertex attribute names:
vertex.names
Edge attribute names:
liking
Converting back to an adjacency matrix is simple:
1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1006 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
1009 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1012 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
1015 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
1018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
1021 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
1027 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1
1030 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1033 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1036 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0
1039 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
1042 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
1045 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
1048 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0
1051 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1
1054 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1
1057 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1060 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
1063 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
1066 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1069 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1072 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1075 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
1078 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003 1 0 0 0 0 0 0 1 0 0
1006 0 0 1 0 0 0 0 0 0 1
1009 0 0 1 0 0 0 0 0 0 1
1012 0 0 0 0 0 0 0 0 1 0
1015 0 0 0 0 0 0 0 0 0 1
1018 1 0 0 1 0 0 0 1 0 0
1021 0 0 1 0 0 0 0 0 0 1
1027 1 0 1 0 0 0 0 0 0 1
1030 0 0 0 0 0 0 0 0 0 1
1033 1 0 0 0 0 0 0 0 0 0
1036 0 0 0 0 0 0 0 0 0 0
1039 0 0 0 0 0 0 0 0 0 0
1042 1 0 0 0 0 0 0 0 0 0
1045 0 0 0 0 0 0 0 0 0 0
1048 1 1 0 0 0 0 0 0 0 0
1051 0 1 0 0 0 0 0 0 0 0
1054 1 0 0 0 0 0 0 1 0 0
1057 0 0 0 0 0 0 0 0 0 1
1060 1 0 0 0 0 0 0 0 0 0
1063 0 0 0 0 0 1 1 0 0 0
1066 0 0 0 0 1 0 1 0 1 0
1069 1 0 0 0 1 1 0 0 0 0
1072 1 0 0 0 0 0 0 0 0 0
1075 0 0 1 0 0 0 0 0 0 1
1078 0 0 1 0 0 0 0 0 0 0
In network, edges and edge weights are considered separate. This is confusing, but done for a number of reasons: (1) you might want multiple types of weights associated with a given edge, or (2) you might want a weight associated where there isn’t an edge at all.
To see a particular weight, use the edge attribute shorthand %e%, and to get the full network with weights, use the command as.sociomatrix.sna. Note that the network command just named the weights after the column name from the csv file.
[1] "liking" "na"
[1] 3 3 4 5 1 4 1 2 3 5 5 4 3 2 1 2 1 4 3 5 2 2 4 3 4 4 4 3 2 5 3 4 4 3 1 4 2 2
[39] 4 3 2 3 2 1 2 1 1 4 5 2 2 4 3 4 1 1 3 3 2 1 4 4 4 2 5 4 5 3 3 5 4 3 1 5 4 3
[77] 2 3 3 3 2 5 3 1 5 1 1 4
1003 1006 1009 1012 1015 1018 1021 1027 1030 1033 1036 1039 1042 1045 1048
1003 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
1006 0 0 5 0 0 0 0 0 0 0 0 0 1 0 0
1009 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
1012 0 0 0 0 5 0 4 0 0 0 0 0 0 0 0
1015 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0
1018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 4
1021 0 2 4 3 0 0 0 0 0 0 0 0 0 0 0
1027 0 4 3 0 0 0 0 0 0 0 0 0 0 0 2
1030 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0
1033 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1036 0 4 0 0 0 0 0 0 0 2 0 0 2 0 0
1039 0 0 0 0 0 4 0 0 0 0 0 0 3 0 0
1042 0 2 0 0 0 3 0 0 0 0 0 2 0 0 0
1045 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0
1048 0 0 0 0 0 1 0 0 0 0 0 0 4 5 0
1051 0 0 0 0 0 4 0 0 0 0 0 0 3 4 1
1054 0 0 0 0 0 3 0 0 0 0 0 3 0 0 2
1057 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0
1060 2 0 0 0 0 5 0 0 0 0 0 0 4 0 0
1063 0 3 0 0 0 0 0 0 0 0 0 0 3 0 0
1066 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0
1069 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1072 3 0 0 0 0 3 0 0 0 0 0 0 0 0 0
1075 0 0 5 0 0 0 0 3 0 0 0 0 0 0 0
1078 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1051 1054 1057 1060 1063 1066 1069 1072 1075 1078
1003 3 0 0 0 0 0 0 4 0 0
1006 0 0 4 0 0 0 0 0 0 1
1009 0 0 3 0 0 0 0 0 0 5
1012 0 0 0 0 0 0 0 0 3 0
1015 0 0 0 0 0 0 0 0 0 2
1018 3 0 0 5 0 0 0 2 0 0
1021 0 0 4 0 0 0 0 0 0 4
1027 5 0 3 0 0 0 0 0 0 4
1030 0 0 0 0 0 0 0 0 0 3
1033 1 0 0 0 0 0 0 0 0 0
1036 0 0 0 0 0 0 0 0 0 0
1039 0 0 0 0 0 0 0 0 0 0
1042 1 0 0 0 0 0 0 0 0 0
1045 0 0 0 0 0 0 0 0 0 0
1048 2 2 0 0 0 0 0 0 0 0
1051 0 1 0 0 0 0 0 0 0 0
1054 1 0 0 0 0 0 0 4 0 0
1057 0 0 0 0 0 0 0 0 0 4
1060 5 0 0 0 0 0 0 0 0 0
1063 0 0 0 0 0 5 4 0 0 0
1066 0 0 0 0 1 0 5 0 4 0
1069 3 0 0 0 2 3 0 0 0 0
1072 2 0 0 0 0 0 0 0 0 0
1075 0 0 1 0 0 0 0 0 0 5
1078 0 0 4 0 0 0 0 0 0 0
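The separation between ties and their values can be seen on a made-up edgelist. This is a sketch under two assumptions: a recent version of the network package is installed (one that accepts a data frame in as.network()), and as.sociomatrix() from network is used in place of as.sociomatrix.sna() so that the sna package is not needed. The data and the names toy_edges and toy_net are invented.

```r
if (requireNamespace("network", quietly = TRUE)) {
  library(network)

  # Toy edgelist with one edge attribute (values made up for illustration).
  toy_edges <- data.frame(
    from   = c("1003", "1003", "1006"),
    to     = c("1006", "1009", "1009"),
    liking = c(3, 4, 5),
    stringsAsFactors = FALSE
  )

  # The first two columns identify the edges; extra columns become edge attributes.
  toy_net <- as.network(toy_edges, directed = TRUE)

  list.edge.attributes(toy_net)                  # includes "liking"
  toy_net %e% "liking"                           # one value per edge

  as.sociomatrix(toy_net)                        # ties only: cells are 0 or 1
  as.sociomatrix(toy_net, attrname = "liking")   # cells hold the liking values
}
```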
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
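Small igraph objects can be created using make_graph(). You can create networks from data using one of the functions from the table below. The table points to functions for creating igraph objects from other objects (Object -> Igraph) and for converting igraph objects to other objects (Igraph -> Object):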
Object | Object -> Igraph | Igraph -> Object |
---|---|---|
Adjacency matrix | graph_from_adjacency_matrix | as_adjacency_matrix |
Edge list | graph_from_edgelist | as_edgelist |
Data frames | graph_from_data_frame | as_data_frame |
make_graph()
Function make_graph() can quickly create small networks. Relational information can be supplied in two ways:
As a vector with an even number of node IDs. Pairs of adjacent IDs are interpreted as edges:
IGRAPH 179053b U--- 4 3 --
+ edges from 179053b:
[1] 1--2 2--3 3--4
Using a symbolic formula in which
-- is an undirected tie
--+ is a directed tie (+ is the arrow’s head)
: refers to node sets (e.g. A -- B:C creates ties A -- B and A -- C)
IGRAPH b6fd761 DN-- 5 5 --
+ attr: name (v/c)
+ edges from b6fd761 (vertex names):
[1] A->B A->D A->E B->A C->B
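Both ways of calling make_graph() can be tried with the short sketch below. It assumes igraph is installed (it is skipped otherwise). The graphs toy_g1 and toy_g2 are made up and are not the g2 object printed above.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  library(igraph)

  # 1. Vector of node IDs: the pairs (1,2), (2,3), (3,4) become edges.
  toy_g1 <- make_graph(c(1, 2, 2, 3, 3, 4), directed = FALSE)

  # 2. Symbolic formula: A sends ties to B and C, and C sends a tie to A.
  toy_g2 <- make_graph(~ A --+ B:C, C --+ A)

  toy_g1
  toy_g2
  as_edgelist(toy_g2)   # edges as a two-column matrix of node names
}
```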
The print-out of g2 exemplifies how igraph summarizes igraph objects:
the header starts with IGRAPH
followed by a short graph ID (see ?igraph::graph_id)
U or D if the network is Undirected or Directed
N if the nodes have names
W if the network is weighted
B if the network is bipartite
list of present attributes, each of the form nameoftheattribute (x/y) where
x informs about the type of an attribute: v for vertex, e for edge or g for graph attribute
y informs about the mode of an attribute: n for numeric, c for character, l for logical, or x for extended (e.g. lists)
Igraph objects can be created from adjacency matrices with graph_from_adjacency_matrix():
IGRAPH 1c96e37 DN-- 26 88 --
+ attr: name (v/c)
+ edges from 1c96e37 (vertex names):
[1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
[7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges
Important arguments:
mode – how to interpret the matrix:
"directed", "undirected": directed/undirected network
"max", "min", "plus": determine the number of \(i\)-\(j\) relations that will be created, e.g., max( m[i,j], m[j,i] ).
"lower", "upper": whether to read only lower/upper triangle of the matrix
weighted – if TRUE non-zero values of the matrix are stored in edge attribute weight
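The effect of mode and weighted can be checked on a small made-up valued matrix. This is a sketch that assumes igraph is installed (it is skipped otherwise); the matrix and the names g_dir and g_max are invented for illustration.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  library(igraph)

  # Toy valued matrix (made up): a -> b has value 2, b -> a has value 5,
  # and b -> c has value 1.
  m <- matrix(c(0, 2, 0,
                5, 0, 1,
                0, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

  # Directed graph: every non-zero cell becomes an edge with a weight attribute.
  g_dir <- graph_from_adjacency_matrix(m, mode = "directed", weighted = TRUE)
  E(g_dir)$weight

  # Undirected graph: each pair i, j gets the weight max(m[i, j], m[j, i]).
  g_max <- graph_from_adjacency_matrix(m, mode = "max", weighted = TRUE)
  E(g_max)$weight
}
```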
Function graph_from_edgelist() expects a two-column matrix:
from to
[1,] 1003 1018
[2,] 1003 1051
[3,] 1003 1072
[4,] 1006 1009
[5,] 1006 1042
[6,] 1006 1057
Now create the object:
IGRAPH 141ddbf D--- 1078 88 --
+ edges from 141ddbf:
[1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
[7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
[49] 1048->1045 1048->1051 1048->1054 1051->1018 1051->1042 1051->1045
+ ... omitted several edges
Note the number of nodes! If the edgelist matrix contains integers, the function assumes that node IDs start from 1, and thus the result contains a lot of isolates. In this case we have to convert the matrix to character mode before passing it to graph_from_edgelist():
edgelist_matrix_ch <- as.character(edgelist_matrix)
dim(edgelist_matrix_ch) <- dim(edgelist_matrix)
graph_from_edgelist(edgelist_matrix_ch, directed=TRUE)
IGRAPH 16cdee7 DN-- 25 88 --
+ attr: name (v/c)
+ edges from 16cdee7 (vertex names):
[1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
[7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
[43] 1042->1039 1042->1051 1045->1018 1045->1042 1048->1018 1048->1042
+ ... omitted several edges
This also shows the disadvantage of solely relying on edgelist representation as we are missing one boy who is an isolate.
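The difference between numeric and character IDs can be reproduced with a two-row edgelist that needs no data files. This is a sketch that assumes igraph is installed (it is skipped otherwise); the objects el_num, el_chr and g_chr are made up for illustration.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  library(igraph)

  # Toy numeric edgelist (made up for illustration).
  el_num <- matrix(c(1003, 1018,
                     1003, 1051), ncol = 2, byrow = TRUE)

  # Numeric IDs are read as vertex positions 1, 2, ..., so the graph gets
  # as many vertices as the largest ID in the matrix.
  vcount(graph_from_edgelist(el_num, directed = TRUE))

  # Character IDs are read as vertex names instead.
  el_chr <- matrix(as.character(el_num), ncol = 2)
  g_chr <- graph_from_edgelist(el_chr, directed = TRUE)
  vcount(g_chr)
  V(g_chr)$name
}
```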
Igraph objects can be created from data frames with data on edges and, optionally, on vertices with graph_from_data_frame():
classroom_kids <- read.csv("classroom-nodes.csv", header=TRUE, colClasses=c(name = "character"))
head(classroom_kids)
name female isei08_m isei08_f
1 1003 FALSE NA 25.71
2 1006 TRUE 14.64 33.76
3 1009 TRUE 28.48 37.22
4 1012 TRUE 26.64 25.23
5 1015 TRUE 21.24 NA
6 1018 FALSE 23.47 24.45
classroom_play <- read.csv("classroom-edges.csv", header=TRUE, colClasses = c(from="character", to="character"))
head(classroom_play)
from to liking
1 1003 1018 3
2 1003 1051 3
3 1003 1072 4
4 1006 1009 5
5 1006 1042 1
6 1006 1057 4
classroom <- graph_from_data_frame(classroom_play, vertices=classroom_kids,
directed=TRUE)
classroom
IGRAPH a0b410a DN-- 26 88 --
+ attr: name (v/c), female (v/l), isei08_m (v/n), isei08_f (v/n),
| liking (e/n)
+ edges from a0b410a (vertex names):
[1] 1003->1018 1003->1051 1003->1072 1006->1009 1006->1042 1006->1057
[7] 1006->1078 1009->1006 1009->1057 1009->1078 1012->1015 1012->1021
[13] 1012->1075 1015->1012 1015->1021 1015->1078 1018->1042 1018->1048
[19] 1018->1051 1018->1060 1018->1072 1021->1006 1021->1009 1021->1012
[25] 1021->1057 1021->1078 1027->1006 1027->1009 1027->1048 1027->1051
[31] 1027->1057 1027->1078 1030->1009 1030->1078 1033->1051 1036->1006
[37] 1036->1033 1036->1042 1039->1018 1039->1042 1042->1006 1042->1018
+ ... omitted several edges
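The same call can be tried without the CSV files on two made-up data frames. This is a sketch that assumes igraph is installed (it is skipped otherwise); toy_kids, toy_play and toy_classroom are invented names, and node "1024" is deliberately left without ties.

```r
if (requireNamespace("igraph", quietly = TRUE)) {
  library(igraph)

  # Toy node and edge tables (values made up for illustration).
  toy_kids <- data.frame(
    name   = c("1003", "1006", "1009", "1024"),
    female = c(FALSE, TRUE, TRUE, FALSE),
    stringsAsFactors = FALSE
  )
  toy_play <- data.frame(
    from   = c("1003", "1006"),
    to     = c("1006", "1009"),
    liking = c(3, 5),
    stringsAsFactors = FALSE
  )

  toy_classroom <- graph_from_data_frame(toy_play, vertices = toy_kids,
                                         directed = TRUE)

  vcount(toy_classroom)              # 4: node "1024" is kept as an isolate
  vertex_attr_names(toy_classroom)   # "name" "female"
  edge_attr_names(toy_classroom)     # "liking"
}
```

Supplying the node table is what keeps isolates in the network, which an edgelist alone cannot represent.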
The first two columns of the edge data frame (classroom_play) are vertex IDs, additional columns are interpreted as edge attributes.
The first column of the node data frame (classroom_kids) is vertex ID, additional columns are interpreted as vertex attributes.
All vertex IDs used in the edge data frame (classroom_play) must be present in the node data frame (classroom_kids).
Package tidygraph uses igraph internally to store network data but provides a “tidy” interface for data manipulation: network data are interfaced as two interconnected data frames, (1) nodes and (2) edges. This is very similar to the data structure accepted by igraph::graph_from_data_frame() demonstrated above.
Attaching package: 'tidygraph'
The following object is masked from 'package:stats':
filter
Objects can be created with:
tbl_graph() from two data frames, similarly to igraph::graph_from_data_frame()
tg_classroom <- tbl_graph(nodes = classroom_kids, edges = classroom_play,
directed = TRUE)
tg_classroom
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Node Data: 26 x 4 (active)
name female isei08_m isei08_f
<chr> <lgl> <dbl> <dbl>
1 1003 FALSE NA 25.7
2 1006 TRUE 14.6 33.8
3 1009 TRUE 28.5 37.2
4 1012 TRUE 26.6 25.2
5 1015 TRUE 21.2 NA
6 1018 FALSE 23.5 24.4
# … with 20 more rows
#
# Edge Data: 88 x 3
from to liking
<int> <int> <int>
1 1 6 3
2 1 17 3
3 1 24 4
# … with 85 more rows
as_tbl_graph() which accepts a variety of objects: adjacency matrices, igraph, network, ggraph and some more (cf. the documentation)
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Node Data: 26 x 4 (active)
name female isei08_m isei08_f
<chr> <lgl> <dbl> <dbl>
1 1003 FALSE NA 25.7
2 1006 TRUE 14.6 33.8
3 1009 TRUE 28.5 37.2
4 1012 TRUE 26.6 25.2
5 1015 TRUE 21.2 NA
6 1018 FALSE 23.5 24.4
# … with 20 more rows
#
# Edge Data: 88 x 3
from to liking
<int> <int> <int>
1 1 6 3
2 1 17 3
3 1 24 4
# … with 85 more rows
# A tbl_graph: 25 nodes and 88 edges
#
# A directed simple graph with 1 component
#
# Node Data: 25 x 1 (active)
na
<lgl>
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
# … with 19 more rows
#
# Edge Data: 88 x 4
from to liking na
<int> <int> <int> <lgl>
1 1 6 3 FALSE
2 1 16 3 FALSE
3 1 23 4 FALSE
# … with 85 more rows
In tidygraph you can use dplyr (Wickham et al. 2021) verbs such as mutate(), select() etc. once you activate() either the nodes or edges data frame. Here are some examples.
Calculate the social status of a kid’s family as the minimum of the social status scores of the mother and father:
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Node Data: 26 x 5 (active)
name female isei08_m isei08_f status
<chr> <lgl> <dbl> <dbl> <dbl>
1 1003 FALSE NA 25.7 25.7
2 1006 TRUE 14.6 33.8 14.6
3 1009 TRUE 28.5 37.2 28.5
4 1012 TRUE 26.6 25.2 25.2
5 1015 TRUE 21.2 NA 21.2
6 1018 FALSE 23.5 24.4 23.5
# … with 20 more rows
#
# Edge Data: 88 x 3
from to liking
<int> <int> <int>
1 1 6 3
2 1 17 3
3 1 24 4
# … with 85 more rows
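A computation of this kind can be sketched on made-up data as follows. The sketch assumes tidygraph is installed (it is skipped otherwise); the data and the names toy_kids, toy_play and toy_tg are invented for illustration.

```r
if (requireNamespace("tidygraph", quietly = TRUE)) {
  library(tidygraph)

  # Toy node and edge tables (values made up for illustration).
  toy_kids <- data.frame(
    name     = c("1003", "1006", "1009"),
    isei08_m = c(NA, 14.64, 28.48),
    isei08_f = c(25.71, 33.76, 37.22),
    stringsAsFactors = FALSE
  )
  toy_play <- data.frame(
    from = c("1003", "1006"),
    to   = c("1006", "1009"),
    stringsAsFactors = FALSE
  )
  toy_tg <- tbl_graph(nodes = toy_kids, edges = toy_play, directed = TRUE)

  # Activate the node table, then add a column to it. pmin() with
  # na.rm = TRUE falls back on the non-missing parent when one score is NA.
  toy_tg <- mutate(
    activate(toy_tg, nodes),
    status = pmin(isei08_m, isei08_f, na.rm = TRUE)
  )

  as_tibble(toy_tg)   # the active (node) table, now with a status column
}
```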
Similarly to dplyr, you can use the pipe operator %>% to chain multiple data transformations. Here we add a node attribute first, then the edge attribute like5 second:
tg_classroom %>%
activate(nodes) %>%
mutate(
status = pmin(isei08_m, isei08_f, na.rm=TRUE)
) %>%
activate(edges) %>%
mutate(
like5 = liking == 5 # TRUE if liking is 5
)
# A tbl_graph: 26 nodes and 88 edges
#
# A directed simple graph with 2 components
#
# Edge Data: 88 x 4 (active)
from to liking like5
<int> <int> <int> <lgl>
1 1 6 3 FALSE
2 1 17 3 FALSE
3 1 24 4 FALSE
4 2 3 5 TRUE
5 2 14 1 FALSE
6 2 19 4 FALSE
# … with 82 more rows
#
# Node Data: 26 x 5
name female isei08_m isei08_f status
<chr> <lgl> <dbl> <dbl> <dbl>
1 1003 FALSE NA 25.7 25.7
2 1006 TRUE 14.6 33.8 14.6
3 1009 TRUE 28.5 37.2 28.5
# … with 23 more rows
You can refer to node attributes with .N()
when computing on edges data frame and refer to edge attribute