Last updated: 2024-06-21
This tutorial is a joint product of the Statnet Development Team:
Pavel N. Krivitsky (University of New South Wales)
Martina Morris (University of Washington)
Mark S. Handcock (University of California, Los Angeles)
Carter T. Butts (University of California, Irvine)
David R. Hunter (Penn State University)
Steven M. Goodreau (University of Washington)
Chad Klumb (University of Washington)
Skye Bender de-Moll (Oakland, CA)
Michał Bojanowski (Kozminski University, Poland)
The temporal modeling software demonstrated in this tutorial is
authored by Pavel Krivitsky (tergm
) and Skye Bender de-Moll
(networkdynamic
, tsna
and
ndtv
).
All Statnet packages are open-source, written for the R computing environment, and published on CRAN. The source repositories are hosted on GitHub. Our website is statnet.org
Need help? For general questions and comments, please email the Statnet users group at statnet_help@uw.edu. You’ll need to join the listserv if you’re not already a member. You can do that here: Statnet_help listserve.
Found a bug in our software? Please let us know by filing an issue in the appropriate package GitHub repository, with a reproducible example.
Want to request new functionality? We welcome suggestions – you can make a request by filing an issue on the appropriate package GitHub repository. The chances that this functionality will be developed are substantially improved if the requests are accompanied by some proposed code (we are happy to review pull requests).
For all other issues, please email us at contact@statnet.org.
This workshop and tutorial provide an overview of statistical modeling of temporal network data using Statnet software. The tutorial is also designed for self-study, with example code and self-contained data.
The Statnet packages we will be demonstrating are:
networkDynamic
– storage and manipulation of temporal network datatsna
–
descriptive statistics and graphics for exploratory network
analysis`ndtv
–
utilities for plotting temporal networks (including network movies)tergm
–
statistical tools for estimating TERGMs, model assessment, and dynamic
network simulation.If you are transitioning from the tergm 3.x
package
series, please note the user interface for model specification has
changed as of tergm 4.0
. At this time, the newest version
in tergm 4.x
series is backwards compatible, so if you have
scripts based on the 3.x
syntax, they will continue to
work. However, we will be deprecating some of the functionality in the
future.
This workshop assumes
basic familiarity with R. A good starting place is this introduction
experience with network concepts, terminology and data. A good starting place is this online text, and our intro workshops and tutorials
experience using Exponential family Random Graph Models (ERGMs). The Statnet ERGM workshop is ideal preparation.
Minimally, you will need to install the latest version of R (available here) and the Statnet packages listed above to run the code presented here.
The workshops are conducted using the free version of
Rstudio
(available
here).
To install the packages for this tutorial, open an R session and type:
# statnet packages
install.packages('tergm')
install.packages('tsna')
install.packages('ndtv')
# other packages to enhance graphical output
install.packages('htmlwidgets')
install.packages('latticeExtra')
The two non-Statnet packages here (htmlwidgets
and
latticeExtra
) provide some nice graphing utilities, and
demonstrate one of the many benefits of working in the R software
ecosystem—you’re not restricted to the features in any one package.
Throughout this tutorial we will be exploiting this benefit, using
functions from base R and other packages seamlessly with our Statnet
software.
Make sure the packages are attached:
Check package versions
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] latticeExtra_0.6-30 lattice_0.22-6 htmlwidgets_1.6.4
[4] ndtv_0.13.3 sna_2.7-2 statnet.common_4.9.0
[7] animation_2.7 tsna_0.3.5 tergm_4.2.0
[10] networkDynamic_0.11.4 ergm_4.6.0 network_1.18.2
[13] knitr_1.47
loaded via a namespace (and not attached):
[1] sass_0.4.9 utf8_1.2.4 generics_0.1.3
[4] robustbase_0.99-2 jpeg_0.1-10 digest_0.6.35
[7] magrittr_2.0.3 RColorBrewer_1.1-3 evaluate_0.24.0
[10] lpSolveAPI_5.5.2.0-17.11 grid_4.4.0 networkLite_1.0.5
[13] fastmap_1.2.0 jsonlite_1.8.8 Matrix_1.7-0
[16] rle_0.9.2 purrr_1.0.2 fansi_1.0.6
[19] jquerylib_0.1.4 Rdpack_2.6 cli_3.6.2
[22] rlang_1.1.4 rbibutils_2.2.16 cachem_1.1.0
[25] yaml_2.3.8 tools_4.4.0 parallel_4.4.0
[28] deldir_2.0-4 memoise_2.0.1 coda_0.19-4.1
[31] dplyr_1.1.4 interp_1.1-6 base64_2.0.1
[34] png_0.1-8 vctrs_0.6.5 R6_2.5.1
[37] lifecycle_1.0.4 MASS_7.3-60.2 trust_0.1-8
[40] pkgconfig_2.0.3 pillar_1.9.0 bslib_0.7.0
[43] Rcpp_1.0.12 ergm.multi_0.2.1 glue_1.7.0
[46] DEoptimR_1.1-3 xfun_0.44 tibble_3.2.1
[49] tidyselect_1.2.1 rstudioapi_0.16.0 htmltools_0.5.8.1
[52] nlme_3.1-164 rmarkdown_2.27 compiler_4.4.0
[55] askpass_1.2.0 openssl_2.2.0
Set seed for simulations – this is not necessary, but it ensures that we all get the same results (if we execute the same commands in the same order).
Moving from static to dynamic networks leads to many different types of potential dynamics:
All of this requires new data storage mechanisms, new methods for descriptive statistics, new tools for visualization, and a new framework for modeling. As a result, there are 4 software packages in Statnet that perform these new tasks. Comparing to the Statnet packages used for static network analysis:
Function | Static Networks | Dynamic Networks |
---|---|---|
Data Storage | network | networkDynamic |
Descriptive Stats | sna | tsna |
Visualization | network (plot.network function) | ndtv |
Statistical Modeling | ergm | tergm |
Temporal network information can be collected in different ways, leading to several different types of data:
Each type typically requires different methods for analysis. TERGMs are models for discrete time dynamics, so they can be used for the first two, and both are covered in this tutorial.
There are three general modeling frameworks for temporal network data analysis in the social networks field.
Stochastic Actor Oriented Models (SAOM)–implemented in the
RSiena
package written by Tom A.B. Snijders and colleagues.
SAOMs allow one to model the coevolution of nodal attributes and tie
dynamics. They are formulated as continuous time models, but typically
are estimated on discrete time data.
Relational Event Models–implemented in the Statnet package
relevent
written by Carter Butts and colleagues. These are
continuous time models for node and tie dynamics, so they can assume
dyad independence, and exploit the methodology of logistic
regression.
Temporal Exponential-family Random Graph Models (TERGMS)–This is a large family of models that extend ERGMs to the analysis of dynamic networks. TERGMs represent tie dynamics only, and are typically formulated in discrete time.
This workshop will focus on TERGMs. We’ll start with a quick overview
of the utilities for data storage, descriptive stats and visualization
of temporal network data in the Statnet packages before moving on to the
statistical modeling with the package tergm
.
All temporal network models can be thought of as a form of “agent-based” model, in the sense that they can be used for stochastic simulation of a population of “agents” interacting over time. What makes the statistical temporal network modeling frameworks distinctive is its relationship to observed data:
Principled estimation of the network model parameters from observed data that allows for hypothesis testing and inference at the model building stage; and
Reproduction of the full joint distribution of observed network configurations that are represented by the terms in the model.
These are both powerful advantages over other agent-based modeling frameworks. If your goal is to capture and reproduce (simulate) empirically observed contact patterns in a network, the statistical network modeling frameworks provide that functionality, with a strong scientific foundation.
As with any statistical modeling, a proper statistical network analysis begins with exploratory data analysis (EDA). EDA tools provide insight into the overall structure and dynamics of a network that is changing over time. This is the foundation needed for specifying appropriate statistical models to summarize patterns and test hypotheses.
Temporal network EDA is a major topic in its own right. We only provide a brief intro to the Statnet EDA toolkit here.
A more detailed tutorial on EDA tools can be found in our workshop Temporal network tools in statnet: networkDynamic, ndtv and tsna
networkDynamic
For storing and working with temporal network data.
Temporal network data in Statnet can be stored in several ways,
including R
lists and series, but the
networkDynamic
class objects have been specifically
designed to provide access to a wide range of features and functions
unique to network data analysis. All networksDynamic
objects also have the class network
. They can thus be
edited or queried using standard functions from the network
package, as well as the functions designed for dynamic networks in the
package networkDynamic
.
Let us consider the 3 panels of the Sampson monastery data:
[1] "samplk1" "samplk2" "samplk3" "tab"
To create a dynamic network object from the panels, we need to combine them into a list:
Neither start or onsets specified, assuming start=0
Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
Argument base.net not specified, using first element of network.list instead
Created net.obs.period to describe network
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 3
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
The networkDynamic
function is able to convert a wide
array of temporal network data types into the
networkDynamic
objects. For more information, see
?networkDynamic
.
As with network
objects you can get a quick summary of a
networkDynamic
object by typing the object name:
NetworkDynamic properties:
distinct change times: 4
maximal time range: 0 until 3
Includes optional net.obs.period attribute:
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 3
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period: (not shown)
total edges= 88
missing edges= 0
non-missing edges= 88
Vertex attribute names:
active cloisterville group vertex.names
Edge attribute names:
active
By default, the starting time is set to 0, so the first network will
have this time stamp. You can change that behavior by specifying a
different value, for example:
samp.dyn <- networkDynamic(network.list = samp.list, start=1)
.
It is natural to think of the Sampson network panels as a series of
observations, starting with the first. So we will reset from the default
value to 1.
Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
Argument base.net not specified, using first element of network.list instead
Created net.obs.period to describe network
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 1 until 4
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
NetworkDynamic properties:
distinct change times: 4
maximal time range: 1 until 4
Includes optional net.obs.period attribute:
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 1 until 4
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period: (not shown)
total edges= 88
missing edges= 0
non-missing edges= 88
Vertex attribute names:
active cloisterville group vertex.names
Edge attribute names:
active
The summary indicates 4 timepoints (1 through 4), even though the data only have 3 panels. This is an artifact of the discrete time representation. If you look at the final network, you’ll see it is empty:
Network attributes:
vertices = 0
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period:
Length Class Mode
observations 1 -none- list
mode 1 -none- character
time.increment 1 -none- numeric
time.unit 1 -none- character
total edges = 0
missing edges = 0
non-missing edges = 0
density = NaN
No vertex attributes
No edge attributes
Network adjacency matrix:
Empty Graph
Compare that to the network at time 1:
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period:
Length Class Mode
observations 1 -none- list
mode 1 -none- character
time.increment 1 -none- numeric
time.unit 1 -none- character
total edges = 55
missing edges = 0
non-missing edges = 55
density = 0.1797386
Vertex attributes:
active:
mixed class attribute
18 values
cloisterville:
logical valued attribute
attribute summary:
Mode FALSE TRUE
logical 12 6
group:
character valued attribute
attribute summary:
Loyal Outcasts Turks Waverers
5 3 7 3
vertex.names:
character valued attribute
18 valid vertex names
Edge attributes:
active:
mixed class attribute
55 values
Network edgelist matrix:
[,1] [,2]
[1,] 3 2
[2,] 15 2
[3,] 5 4
[4,] 1 5
[5,] 4 5
[6,] 13 7
[7,] 11 8
[8,] 8 9
[9,] 9 16
[10,] 17 18
[11,] 2 1
[12,] 3 1
[13,] 6 1
[14,] 8 1
[15,] 12 1
[16,] 14 1
[17,] 15 1
[18,] 16 1
[19,] 18 1
[20,] 7 2
[21,] 8 2
[22,] 12 2
[23,] 16 2
[24,] 18 2
[25,] 1 3
[26,] 17 3
[27,] 6 4
[28,] 10 4
[29,] 9 5
[30,] 11 5
[31,] 13 5
[32,] 4 6
[33,] 2 7
[34,] 16 7
[35,] 18 7
[36,] 7 8
[37,] 9 8
[38,] 10 8
[39,] 6 9
[40,] 4 10
[41,] 5 11
[42,] 14 11
[43,] 14 12
[44,] 5 13
[45,] 17 13
[46,] 1 14
[47,] 2 14
[48,] 10 14
[49,] 11 14
[50,] 12 14
[51,] 15 14
[52,] 14 15
[53,] 7 16
[54,] 3 17
[55,] 13 18
The discrete time representation means that specifying
at=1
is equivalent to specifying the semi-open interval
[1-2)
with onset = 1, terminus = 2
.
# Equivalent extracts
nw <- network.extract(samp.dyn, at = 1) # first network
nw2 <- network.extract(samp.dyn, onset = 1, terminus=2) # same network
(NB: If you drill down into the components of these two networkDynamic objects, you’ll see that there is one small difference between them, in the graph attribute list element “net.obs.period$observations”, which reflects how the extract was specified.)
The networkDynamic
package contains many utilities for
modifying, extracting and querying networkDynamic
objects.
For example, the network.extract
function above extracts
one or more networks from the series, while the
network.collapse
function extracts one or more networks and
collapses them into a single network object.
You can get a good overview of the wide range of functionality in this package by reading the package vignette:
As one small example, we’ll just make a quick plot of the three
network panels, using the network.extract
function, and the
time-collapsed network (which shows all ties ever present).
par(mfrow = c(2,2), oma=c(1,1,1,1), mar=c(4,1,1,1))
plot(network.extract(samp.dyn, at = 1), main = "Time 1",
displaylabels = T, label.cex = 0.6, vertex.cex = 2, pad = 0.5)
plot(network.extract(samp.dyn, at = 2), main = "Time2",
displaylabels = T, label.cex = 0.6, vertex.cex = 2, pad = 0.5)
plot(network.extract(samp.dyn, at = 3), main = "Time3",
displaylabels = T, label.cex = 0.6, vertex.cex = 2, pad = 0.5)
plot(samp.dyn, main = "Collapsed",
displaylabels = T, label.cex = 0.6, vertex.cex = 2, pad = 0.5)
Not the most helpful display if you’re seeking some insight into the temporal changes in the friendships among the monks at the Sampson Monastery.
The next packages we’ll review are designed to facilitate summarizing the temporal changes in ways that help to reveal interesting patterns. This kind of exploratory analysis is an important first step in the statistical modeling workflow.
tsna
For descriptive temporal network analysis
As the name suggests, this package generalizes the sna
package functionality for temporal network data. It provides utilities
for calculating a range of cross-sectional and temporal statistics.
We’ll take a brief look at these here.
For a detailed introduction to the tsna
package, see the
vignette.
vignette("tsna_vignette")
The tSnaStats
function calculates traditional
cross-sectional social network analysis metrics at multiple time points,
as a time-series object. If you’re familiar with the metrics from the
package sna
, you’ll recognize these – they’re just
calculated now at each timepoint.
Time Series:
Start = 1
End = 4
Frequency = 1
John Bosco Gregory Basil Peter Bonaventure Berthold Mark Victor Ambrose
1 12 10 5 6 8 4 7 7 5
2 11 12 4 8 10 5 6 5 4
3 7 9 7 7 9 5 8 5 7
4 NA NA NA NA NA NA NA NA NA
Romauld Louis Winfrid Amand Hugh Boniface Albert Elias Simplicius
1 4 5 4 5 10 4 5 4 5
2 4 5 7 5 6 6 5 5 6
3 4 5 9 5 5 5 4 5 6
4 NA NA NA NA NA NA NA NA NA
The tErgmStats
function calculates cross-sectional ergm
term statistics at multiple time points, as a time-series object. Again,
if you’re familiar with the terms available in the Statnet package
ergm
, you’ll recognize these.
Time Series:
Start = 1
End = 4
Frequency = 1
edges triangle
1 55 31
2 57 56
3 56 62
4 0 0
The tPath
, tReach
and
plotPaths
functions calculate temporal paths through
networks, the temporal versions of geodesics.
Temporal paths are unique to dynamic networks, and they are some of
the most interesting properties to explore. Similar to a path in a
static network, they trace out the sequence and distances of nodes in a
networkDynamic
object reachable from an initial node, but
now they only follow paths constrained by edge timing (and direction, if
the network is directed). A temporal path can only traverse active nodes
and edges, so the onset times of successive elements must be greater
than or equal than those of the previous.
Temporal paths determine the diffusion potential across a network.
The “forward reachable set” (FRS) is a metric that captures which vertices can be reached from a specific vertex, and the timing of that reachability. Below, we compute and plot the FRS from vertex 1 (John Bosco) in the Sampson networks.
[1] "John Bosco"
$tdist
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
$previous
[1] 0 3 1 5 1 4 2 7 6 4 5 14 5 1 14 7 3 13
$gsteps
[1] 0 2 1 2 1 3 3 4 4 3 2 2 2 1 2 4 2 3
$start
[1] 1
$end
[1] Inf
$direction
[1] "fwd"
$type
[1] "earliest.arrive"
attr(,"class")
[1] "tPath" "list"
par(mfrow=c(1,2), mar=c(1,1,1,1))
coords <- plot(tp, main="Forward Reachable Set from v1", cex.main=.8) #plots, and saves the coords
plotPaths(samp.dyn, tp,
coord = coords, pad=2,
main = "Overlaid on collapsed Network",
label.cex=.7, label.col="blue", cex.main=.8)
Bosco has a directed path to every other node at time 1, so he is able to reach every other vertex in the first spell.
Bosco’s FRS still traces the ties that create the shortest path (or “geodesic”) to the destination node.
A forward tpath
also traces the earliest arriving path
by default, so in this case, tp$tdist = 0
for all vertices,
because they can all be reached in the first spell. If only one edge can
be traversed per time step in your context, this can be specified using
the graph.step.time
argument to tPath
.
tp$gsteps
is simply the geodesic (shortest path) from
Bonaventure to each other vertex in the first network, and plotting
tp
shows those geodesics.
Note that one can also construct a “backwards reachable set” for Bosco – the set of all vertices that can reach him. Though here, given the completely connected component in time 1, that would look about the same.
In this case, the first network alone determines the temporal path, which is a bit of a boring example. Things get much more interesting when there are multiple components in a graph that change composition over time.
So let’s take a look at a dynamic network that is not
completely connected at the first time slice, to see how the FRS
captures more complex temporal patterns. The example here comes from the
help page for the plotPaths
function.
library(ndtv)
data(moodyContactSim)
# What are the basic properties of this network?
print(moodyContactSim)
NetworkDynamic properties:
distinct change times: 35
maximal time range: 40 until 795
Includes optional net.obs.period attribute:
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 1000
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
Network attributes:
vertices = 16
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period: (not shown)
total edges= 18
missing edges= 0
non-missing edges= 18
Vertex attribute names:
vertex.names
Edge attribute names:
active
# NOTE: very sparse network with few changes
# What does the collapsed network look like?
plot(moodyContactSim)
# NOTE: the network is completely connected over time, but that does not mean that everyone is reachable by everyone else.
# Let's look at the FRS for vertices 1 and 10
v1path<-tPath(moodyContactSim,v=1)
v10path<-tPath(moodyContactSim,v=10)
# And plot these paths, separately, and overlaid on the collapsed network
par(mfrow = c(1,3))
plot(v1path, main = "FRS from v1")
plot(v10path, main = "FRS from v10")
plotPaths(moodyContactSim,list(v10path,v1path),
main = "FRS overlaid on collapsed network")
# if ndtv package is installed, along with Graphviz system library,
# nice hierarchical trees can be drawn
plot(v10path,
coord=network.layout.animate.Graphviz(
as.network(v10path),
layout.par = list(gv.engine='dot')
),
jitter=FALSE
)
A more extended example that demonstrates some of the challenges in designing useful graphical displays for these FRS paths is provided in Appendix C.
Forward and backward reachable sets are key metrics for understanding
diffusion processes over networks. The tsna
package
vignette demonstrates many more interesting features of temporal paths
that you can explore.
The edgeDuration
and vertexDuration
functions summarize node and tie dynamics.
Durations are another feature unique to temporal networks, representing the length of time that vertices and edges are active. The tsna package contains several functions for reporting on active durations. Most of these functions have options for varying the units of aggregation or analysis. For example, the edge measures can be adjusted from focusing on an ‘edge’ (aggregating across all of the activity spells associated with that edge) or a ‘spell’ (a single interval of contiguous time during which an edge is activated).
1 2 3
48 15 30
# Note bimodality of spell durations -- what does this suggest about the dynamics?
# a table of edge active durations
table(edgeDuration(samp.dyn, mode = 'duration', subject = 'edges'))
1 2 3
38 20 30
# comparing the two: 5 of the dyads have the pattern 1,0,1. so these
# show up as 10 spells of length 1, or 2 edges of active duration 2.
For a list of all the functions available in the tsna
package, type:
ndtv
For visualizing dynamic networks
The Network Dynamic Temporal Visualization (ndtv
)
package provides tools for visualizing changes in network structure and
attributes over time.
ndtv
has its own workshop and tutorial, with materials
online at our Workshops
page
The package includes an example network data set named
short.stergm.sim
which is the output of a toy model based
on Padgett’s Florentine Family Business Ties dataset simulated using the
tergm
package. (We’ll use the tergm
package to
reconstruct this simulated dynamic network in a later section of this
workshop below.)
To get a quick visual summary of how the structure changes over time, we can render the dynamic network as an animation; here we will use an interactive HTML5 animation in a web browser.
This should bring up a browser window displaying the animation. In addition to playing and pausing the movie, it is possible to click on vertices and edges to show their ids, double-click to highlight neighbors, and zoom into the network at any point in time.
Finally, we can render the interactive movie from an Rmarkdown document to an html document like this one.
render.d3movie(short.stergm.sim,
plot.par=list(displaylabels=T),
output.mode = 'htmlWidget') # using htmlwidgets package here
slice parameters:
start:0
end:25
interval:1
aggregate.dur:1
rule:latest
Another interesting visualization tool is the “proximity timeline”. In this view vertices are positioned vertically (y-axis) by their geodesic distance proximity. This means that changes in network structure deflect the vertices’ lines into new positions, attempting to keep closely-tied vertices as neighbors. The edges are not shown at all in this view.
proximity.timeline(short.stergm.sim,default.dist = 6,
mode = 'sammon',labels.at = 17,vertex.cex = 4)
Notice how the bundles of vertex lines diverge at time=20, reflecting the split of the large component into two smaller components.
Ok, so that’s a whirlwind tour through the descriptive tools
available in Statnet for exploring temporal network data. We strongly
encourage exploring your data before you start to model it with
tergm
.
tergm
Temporal ERGMs (TERGMs) are an extension of ERGMs for modeling dynamic networks in discrete time. Where an ERGM posits one model for the prevalence of ties in a single cross-sectional single network, TERGMs instead typically posit two or more models for the tie dynamics in a network over time. It’s worth noting that each of these models is itself an ERGM, but an ERGM that operates on a specific set of nodes and links.
Model specifications in the tergm
package use the
standard R
formula syntax (nw ~ term1 + term2 …), with
terms that represent network statistics, just like we do for ERGMs. But
it also uses some new functions called “Term operators” that determine
how the network statistics (the terms in the model) are calculated.
As with all ERGMs, the object on the LHS of the formula is a network.
Depending on the type of data you have, this will either be a dynamic
network (for panel data) or a simple network (for cross-sectional
estimation with additional information on tie duration). Dynamic
networks can be passed as an R
list object (the
samp.list
object we created above), as a
networkDynamic
object (the samp.dyn
object we
created above), or as a NetSeries
object.
NetSeries
objects are specifically optimized for TERGM
calculations. To construct a NetSeries
object from the
Sampson data:
So which type of network object should you use?
For tergm
model fitting, it doesn’t really matter.
All objects are transformed to NetSeries
behind the scenes
for computational efficiency.
For initial descriptives and EDA, the networkDynamic
format is needed to access the functionality of the tsna
and ndtv
packages.
Both of these objects can also be constructed from a standard R list object, and a list may be more useful for working with base R or other packages.
Bottom line: use what makes the most sense in each context.
Term names, arguments and syntax carry over from the
ergm
package, and refer to the same configuration (e.g.,
edges
counts the ties between two nodes, and
degree(1)
counts the nodes with degree equal to 1).
There are a number of ways to get the help on the terms available in
ergm
:
# full documentation
?ergmTerm
# helpful summaries and tables of terms
vignette('ergm-term-crossRef')
For tergm
there are some new temporally-specific terms
available for monitoring. We won’t be using these here, but you can get
more information by typing:
So, much will look familiar to you from ergm
. That said,
there are some important differences.
The way the terms count configurations is different, and depends on context. This context is set by the term operators.
As a result, the values of the network statistics, and the
interpretation of the coefficients will be different than in a simple
ergm
.
The biggest difference you’ll notice in specifying a TERGM is that
this typically involves more than one formula. The formulas are
specified using the new temporal “operators” available since version 4.0
of the tergm
package.
Just a warning here: there is a lot of conceptual material condensed into a brief introduction.
The best way to unpack it and make sure you understand it is to work through the examples in Appendix B.
There are five temporal operators: Form()
,
Diss()
, Persist()
, Cross()
and
Change()
. Each operator tracks a specific type of tie
dynamic. We describe them below for the case of a two panel network
dataset (t and t+1).
Form()
tracks the formation of ties between time
steps, so the terms in this model represent the factors that influence
tie formation. For example, Form(~edges)
is a model for
homogeneous rates of tie formation, and the estimated coefficient would
represent the log-odds of a tie forming in each timestep. The
Form()
operator constructs a NetSeries
object
that is the union of the t
and t+1
networks, so the edges
term will count all ties that are
present in either network.
Persist()
tracks the persistence of ties between
time steps, so the terms in this model represent the factors that
influence tie persistence. For example, Persist(~edges)
is
a model for homogeneous rates of tie persistence, and the estimated
coefficient would represent the log-odds of a tie persisting (i.e., not
dissolving) in each time step. The Persist()
operator
constructs a NetSeries
object that is the
intersection of the t
and t+1
networks, so the edges
term will count all ties that are
present in both networks.
Diss()
tracks the dissolution of ties between time
steps, so the terms in this model represent the factors that influence
tie dissolution. It is the opposite of Persist()
and only
one of these can be used in specifying a model. For example,
Diss(~edges)
is a model for homogeneous tie dissolution
rates, and the estimated coefficient will be equal to the
Persist(~edges)
coefficient multiplied by
-1
.
Change()
tracks the total number of dyads that
change status between time steps, both formations and dissolutions. So
it represents the rate of “churn” or turnover in dyad status. The
Change(~edges)
model would specify a homogeneous rate of
change, and the estimated coefficient would represent the log-odds of a
dyad changing status – either forming a tie or dissolving a tie – in
each time step. The Change()
operator constructs a
NetSeries
object that includes only the ties that are
present in one of the networks t
and t+1
, but
not both. This operator does not distinguish between formation and
dissolution changes, so the terms in this operator, and their
coefficients, represent effects on both dynamics.
Cross()
tracks the average cross-sectional structure
of the network over time, and in this sense it is not a model for
tie-dynamics. This operator is analogous to a cross-sectional
ergm
– but it operates on a set of networks (typically, a
time-series). It summarizes the presence of the structural feature
across these networks. For example, Cross(~edges)
is a
model for homogeneous tie density, and the estimated coefficient would
represent the log-odds of a tie existing in the network at an arbitrary
time point. The Cross()
operator constructs a
NetSeries
object from the t+1
network, so the
edges
term will count all ties that are present in that
network.
Generalizing to a longer time series just pools the tie counts from
the operators applied to each pair of time-points (or, in the case of
the Cross()
operator, pools all of the time points after
t).
You can mix these operators in different ways, but no more than 2 operators can include the same model term, otherwise the model will be over-specified.
While the operators can be mixed in different ways, there are two natural pairings that are used most frequently:
Form() + Diss()
(a “separable” model)
Cross() + Change()
(a “joint” model)
So we focus on these here.
TERGMs are a very general family of models. Sub-classes within this family are defined by which tie changes are tracked and counted (the “operators” you use), and the model specification for each type of change (the model terms in the operators).
We will focus on two important classes of TERGMs in this tutorial:
the joint model and the separable model. These allow us to introduce all
of the operator terms implemented in the tergm
packages,
and provide a good intuitive foundation for temporal model
specification.
The joint and separable models both specify a pair of equations, but they differ in how they represent the tie-change dynamics:
Joint models specify a process for tie dynamics using
operators that are dependent within time steps; the
Cross() + Change()
model is the classic example. Here,
every dyad is represented in both operators: the Cross()
operator evaluates cross-sectional network statistics over the whole
network, and the Change()
operator evaluates the changes in
dyad status over the whole network. The terms in the two operators are
correlated because the same dyads appear in both sets of terms, so this
model must be jointly evaluated at each step of estimation or simulation
(hence the name).
Separable models specify a process for tie dynamics
using operators that are independent within time-step; the
Form() + Diss()
(or Persist()
) model is the
classic example. Here, the dyads can be partitioned into two mutually
exclusive sets at each timepoint: the empty dyads, which are subject to
formation, and the tied dyads, which are subject to dissolution. A
single dyad change will affect either the Form()
model
statistics or the Diss()/Persist()
model statistics at each
timestep, but not both. So the terms in the two operators are
uncorrelated, and can be separately evaluated at each step for
estimation or simulation.
Note that while these separable models posit that the 2 sets of dyads are independent within timestep, there are two types of dependence still operating:
There is still Markov-dependence between timesteps: the formation
of a tie at time t
can influence the dissolution of an
adjacent tie at t+1
.
And within each operator set, there can still be dyad-dependence within step if dyad-dependent terms are specified in the model.
This distinction also has important implications for the type of data that can be used to estimate a TERGM.
tergm
syntaxIn the ergm
package, the model is specified using
standard R formula notation, with the network on the
LHS and the ergm-terms
on the RHS:
> ergm(my.network ~
edges + nodefactor('age') + gwesp(0, fixed=T)) #do not run this!
For a call to tergm
, you pass
the formula, which will typically include more than one Operator (each with its own set of terms), and
the estimation method, which will depend on the type of data you have.
For example, for a model specified with Form()
and
Diss()
operators:
> tergm(my.dynamic.network ~ #do not run this!
Form(~ edges + gwesp(0, fixed=T)) +
Diss(~ edges + nodefactor('age')),
estimate = 'insert method'
)
The two current options for the estimate
argument
are
CMLE
(conditional maximum likelihood estimation) for
fitting a model to network panel data, and
EGMME
(equilibrium generalized method of moments
estimation), for fitting a model to a single cross-sectional network
with duration information.
There are many other features of a call to tergm
that
will be important; we illustrate them in one or more examples below.
We’ll work through three examples here, first with network panel data, next with cross-sectional data plus duration information, and finally (briefly) with an approximation used in the cross sectional case when durations are very long.
For network panel data
The canonical form of data for modeling dynamic network processes consists of observations of a complete network at two or more points in time on the same node set. Many classic temporal network studies were of this type, which this remains a gold standard for temporal network data collection, so it is a good example to start with.
Let us return to the Sampson monastery data, and consider models for the way this network changes across the 3 observed time points. We will begin with some very basic models, to ensure that we understand how the temporal term operators work.
What do you think would happen if you specified a TERGM with just
a Form()
operator? Would that work? What type of networks
would it produce as at equilibrium?
How about if you specified just a Diss()
operator?
The best way to answer these questions is to try it and see what happens. For that, we’ve developed a series of exercises you can explore in Appendix B.
Here, we will explore the two canonical specifications for separable and joint models.
This is a separable model. We’ll use the Persist()
operator rather than the Diss()
here for several reasons;
one is that the sign of the coefficient indicates the same impact as in
the Form()
operator on whether you expect a tie to be
present.
This is a directed network, and from the descriptive statistics we saw that the number of triangles increased over time. So what kinds of predictors would it be natural to consider?
Should this be in the formation model, the persistence model, or both?
There are many types of directed triads, but to start we might consider the role of cyclical vs. transitive triangles in the network (see examples below). Cyclical triads cumulate up to create non-hierarchical looping structures in the global network. Hierarchical structures are instead generated by the cumulation of transitive triads.
# cyclical
mc <- matrix( c(0,1,0, 0,0,1, 1,0,0), nrow=3, ncol=3, byrow=TRUE)
gc <- network(mc)
# transitive
mt <- matrix( c(0,1,1, 0,0,0, 0,1,0), nrow=3, ncol=3, byrow=TRUE)
gt <- network(mt)
eqtri <- matrix(c(0, 0.25,
0.25, -0.25,
-0.25, -0.25),
nrow = 3, byrow=T)
par(mfrow = c(1,2))
plot.network(gc, coord = eqtri, jitter=F,
edge.lwd = 1, edge.col = "darkgrey",
vertex.cex = 5, vertex.col="lightblue",
arrowhead.cex = 2,
label = c("i", "j", "k"), label.pos=6,
main = "Cyclical Triad")
plot(gt, coord = eqtri, jitter=F,
edge.lwd = 1, edge.col = "darkgrey",
vertex.cex = 5, vertex.col="gold",
arrowhead.cex = 2,
label = c("i", "j", "k"),
main = "Transitive Triad")
The robust way to model this in ERGMs is with the terms
cyclicalties
(the number of ties \(i{\rightarrow}j\) that have at least one
two-path from \(j{\rightarrow}i\)) and
transitiveties
(the number with at least one two-path from
\(i{\rightarrow}j\)).
Why are these model terms described as “robust”?
Triangle terms like these are dyad-dependent terms – which means they specify that presence or absence of one tie influences the status of another. Terms like this induce feedback effects that ripple across a network in complex non-intuitive ways, and they can easily lead to model degeneracy if their impact is not constrained. Model degeneracy typically means that the model you have specified would not have produced the network you observed (unless the network you observed has a degenerate structure). But triad effects are important in networks, and one way we constrain their effects is to change the way the configurations are counted. That is what these two terms do: they count whether there are any two-paths between two nodes, so multiple paths count the same as a single two-path. One result is that a positive coefficient on the term will not put more probability on a multiple two-path than on a single two-path, so it is less likely to induce a degenerate impact.
Should these terms be in the formation model, the persistence model, or both?
In this case, we will specify the same terms in both the formation and persistence model. This is not required in separable models; we could instead, for instance, specify a simple homogeneous Bernoulli model for persistence (all ties are equally likely to persist at each time step) if we thought that was appropriate. Having the same terms also does not mean we have the “same model” for formation and persistence: the coefficient estimates can still differ in both sign and/or magnitude.
What are your hypotheses going in? Do you think
cyclical
ortransitive
ties will influence tie formation and persistence?
In the call to tergm, we specify CMLE
(since we have a
network panel), and the time slices we wish to model (we’ll use all of
them here):
samp.fit.fp <- tergm(samp.list ~
Form(~edges+mutual+cyclicalties+transitiveties) +
Persist(~edges+mutual+cyclicalties+transitiveties),
estimate = "CMLE",
times = c(1:3)
)
summary(samp.fit.fp)
Call:
tergm(formula = samp.list ~ Form(~edges + mutual + cyclicalties +
transitiveties) + Persist(~edges + mutual + cyclicalties +
transitiveties), estimate = "CMLE", times = c(1:3))
Monte Carlo Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Form(1)~edges -3.4482 0.3592 0 -9.600 <1e-04 ***
Form(1)~mutual 2.0315 0.4042 0 5.026 <1e-04 ***
Form(1)~cyclicalties -0.1407 0.2068 0 -0.680 0.4963
Form(1)~transitiveties 0.3830 0.2517 0 1.522 0.1280
Persist(1)~edges 0.1974 0.2908 0 0.679 0.4974
Persist(1)~mutual 0.7592 0.5192 0 1.462 0.1437
Persist(1)~cyclicalties -0.1821 0.2533 0 -0.719 0.4722
Persist(1)~transitiveties 0.5152 0.2764 0 1.864 0.0624 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 848.4 on 612 degrees of freedom
Residual Deviance: 378.7 on 604 degrees of freedom
AIC: 394.7 BIC: 430 (Smaller is better. MC Std. Err. = 2.116)
First: what do you notice about the p-values?
Not much is significant here. Keep in mind that this is a small sparse network, with only two changes observed, so the amount of information available for testing these effects is minimal (especially for the persistence model – why?).
But let’s take a look at the direction of the coefficients, and see if they indicate support for our hypotheses, … or not.
All else equal, a relationship is much more likely to form if it will close a mutual pair; the conditional log-odds are increased by 2.03 which translates to an increase in the conditional odds of 7.63 (note that the change statistic can equal only \(0\) or \(1\) in this case). This is one of the few significant effects in the model.
The reciprocity effect on persistence is also positive, but less strong and not statistically significant; the point estimate is an increase of 0.76 in the conditional log-odds of persistence, a bit more than double when transformed to the odds scale.
The coefficients for transitive triads (hierarchical) are positive and borderline significant for both formation and persistence. The coefficients for cyclical triads (egalitarian) are negative for both models, but not significant.
Keep in mind that this is a small sparse network, so the amount of information available for testing these effects is minimal (especially for the dissolution model). Overall, the results suggest that hierarchy plays a role in tie formation and persistence in this monastery, and there is a weak but consistent anti-egalitarianism dynamic.
For comparison we will use the same terms in these two operators, so we can think about the difference in interpretation.
What are your hypotheses going in this time? Which type of configuration would you expect to be more common in the cross-section? Which type would be most likely to change?
samp.fit.cc <- tergm(samp.list ~
Cross(~edges+mutual+cyclicalties+transitiveties) +
Change(~edges+mutual+cyclicalties+transitiveties),
estimate = "CMLE",
times = c(1:3)
)
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 1.3790.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.3640.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0971.
Convergence test p-value: 0.9029. Not converged with 99% confidence; increasing sample size.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0292.
Convergence test p-value: 0.8359. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1077.
Estimating equations are not within tolerance region.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0448.
Convergence test p-value: 0.1894. Not converged with 99% confidence; increasing sample size.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0212.
Convergence test p-value: 0.0867. Not converged with 99% confidence; increasing sample size.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0249.
Convergence test p-value: 0.0001. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.
Call:
tergm(formula = samp.list ~ Cross(~edges + mutual + cyclicalties +
transitiveties) + Change(~edges + mutual + cyclicalties +
transitiveties), estimate = "CMLE", times = c(1:3))
Monte Carlo Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Cross(1)~edges -1.89178 0.23889 0 -7.919 <1e-04 ***
Cross(1)~mutual 1.69950 0.37916 0 4.482 <1e-04 ***
Cross(1)~cyclicalties -0.04893 0.17626 0 -0.278 0.7813
Cross(1)~transitiveties 0.46440 0.20782 0 2.235 0.0254 *
Change(1)~edges -1.64877 0.22747 0 -7.248 <1e-04 ***
Change(1)~mutual -0.33010 0.50613 0 -0.652 0.5143
Change(1)~cyclicalties -0.09868 0.17409 0 -0.567 0.5708
Change(1)~transitiveties 0.45930 0.23738 0 1.935 0.0530 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 848.4 on 612 degrees of freedom
Residual Deviance: 379.4 on 604 degrees of freedom
AIC: 395.4 BIC: 430.7 (Smaller is better. MC Std. Err. = 1.387)
Again: what do you notice about the p-values?
Not much is significant, again. This time, both operators are working with the same number of observations (why?).
But let’s take a look at the direction of the coefficients, and see if they indicate support for our hypotheses, … or not.
All else equal, a relationship is much more likely to be present in the network if it is part of a mutual pair; the conditional log-odds are increased by 1.7 which translates to an increase in the conditional odds of 5.47. This is again one of the few significant effects in the model. It’s not surprising given what we saw in the first model – mutual ties are both more likely to form, and to persist.
The reciprocity effect on change is negative: the rate of change for ties in a mutual pair is 0.76 on the log-odds scale, or 5.47 on the odds scale.
Can you explain what those odds mean in terms that make sense to a kid in primary school?
The coefficient for transitive triads (hierarchical) in the cross
section is positive and significant, again what we would expect from the
first model. But note that the coefficient for change is also positive
and borderline significant. So these ties turn over a lot, though we
can’t say if that’s because they’re more likely to form, or to dissolve;
that is the limitation of using the Change()
operator.
The coefficients for cyclical triads (egalitarian) are very weakly negative in the cross-section. This suggests that, whatever the formation and dissolution dynamics, the number of cyclical triads is roughly equal to what you would expect by chance. In contrast to the transitive triads, the ties in cyclical triads are a bit less likely to turn over, but again, this is not a significant effect.
So, which model should you use? Discuss :-)
For cross sectional network data with tie duration information
Now, imagine a different scenario in which we have observed a single cross-sectional network, with retrospective information collected on the duration of ties. This kind of data can be collected in a simple one-time survey, so it is much easier to obtain, and in many contexts it may be the only type of network data that is feasible to collect.
If we assume that the process is in equilibrium, then we can rely on the fact that, at equilibrium,
\[prevalence = incidence \times
duration\] Our cross-sectional network allows us to observe the
\(prevalence\) of ties, and the
retrospective information collected allow us to estimate the \(duration\), so we can use
tergm
to solve for the \(incidence\) in the formation model that
would be consistent with this prevalence and duration.
The duration information required here is obtained by asking respondents to report the duration of relationships. In the simple case of a time-constant persistence hazard, the mean duration is the estimate we need to calculate the persistence edge coefficient. This estimate can come from the same cross-sectional study, or, if necessary, from a completely different data source.
We will demonstrate the latter by creating a toy dataset, based on
Padgett’s cross-sectional flobusiness
network, using a
fictional mean tie duration estimate of 10 time steps, and assuming a
homogeneous persistence process (that is, every existing tie has the
same probability of dissolving, at each step). With real data we could
model heterogeneous durations as we did above, but in this simple case
will suffice to demonstrate the basic modeling tasks.
We begin with a model that sets the density (edges
) and
includes a term to test for possible triad formation bias
(gwesp
):
Form(~ edges + gwesp(0,fixed=T))
Note that with the decay parameter alpha
set to \(0\) the gwesp
term only counts
the first triad closed by a tie so the coefficient will represent the
log-odds of forming a tie given that it creates at least one
“shared-parter” triad.
For our persistence model, we will assume a simple homogenous
process, which corresponds to a model with only an edges
term in it. The coefficient on this term represents the conditional
log-odds of tie persistence at each step, and the inverse is the mean
log duration.
In this case, however, we already know the mean duration, which means we know the rate of persistence/dissolution, so we don’t need to estimate it.
In TERGM notation we specify this as:
Persist(~offset(edges))
A term with a known coefficient in statistical modeling is called an “offset”. The coefficient on the persistence edges term is derived below.
theta.persist
.A persistence model is applied to the set of ties that exist at each step, as reflected in the conditional present in both the numerator and denominator:
\[ \ln{\frac{P(Y_{ij,t+1} = 1 \mid Y_{ij,t} = 1)}{P(Y_{ij,t+1} = 0\mid Y_{ij,t} = 1)}} = \theta'\delta(g(y))_{ij} \]
The numerator represents the case when a tie persists to the next step, and the denominator represents the case when it dissolves.
The one term in our \(\delta(g(y))_{ij}\) vector is the change statistic for the network edge count, which equals one for all \(i,j\).
Let the probability of persistence from one time step to the next for actor pair \(i,j\) be \(p_{ij}\), and the probability of dissolution be \(q_{ij} = 1-p_{ij}\). Our persistence model is homogenous, so we further define \(p_{ij} = p\) for all \(i,j\). Then:
\(\ln{(\frac{p_{ij}}{1-p_{ij}})} = \theta ' \delta(g(y))_{ij}\)
\(\ln{(\frac{p}{1-p})} = \theta\)
\(\ln{(\frac{1-q}{q})} = \theta\)
\(\ln{(\frac{1}{q}-1)} = \theta\)
This is a discrete memoryless process, so the durations have a geometric distribution; and for the geometric distribution, the mean duration is equal to the reciprocal of the dissolution probability. Symbolizing mean relational duration as \(d\), we have \(d = \frac{1}{q}\), and thus:
So, for our persistence model, theta.persist
= \(\ln{(10-1)}\) = \(\ln{9}\) (\(\approx 2.2\)):
In short, we can calculate the persistence coefficient using a (rather simple) closed form solution in this case. And in general, we can do this for all dyadic-dependent terms in a persistence model.
This is the next difference in the EGMME specification.
We specify this with a one-sided formula listing the statistics that
will be used as equilibrium targets.
This can be the same formula as used in one of the operators, or a
totally different formula.
Typically, the target statistic values are drawn from the observed
cross-sectional network that is the focus of analysis. In this case, the
target statistics will be equal to the values of the terms returned by a
call to the summary
function.
For tergms
, as for ergms
, however, the
target values do not have to come from a specific observed network –
they can be derived as counterfactuals, or from multiple data sets, as
long as they are internally consistent (for example, the number of ties
for a nodefactor level cannot exceed the total number of ties in the
network). In this case, we would pass the values of the target
statistics to the model with the argument target.stats
. If
that argument appears in the model specification, it overrides the
values in the observed network data.
Here, we will use the formation model to define the statistics, and take the values from our observed cross-sectional network.
targets = ~edges + gwesp(0, fixed=T)
Note: If one is specifying
targets = ~ <Form() formula>
, then the persistence
coefficient(s) should be specified with an offset, and vice versa.
We put this all together in the call to tergm
, adding in
one additional control argument to monitor model convergence: plotting
the progress of the coefficient estimates and the simulated sufficient
statistics in real time. When one is using a GUI tool like RStudio, it
helps to output this plotting to a separate window, which we do below
with the function X11
(and then turn off that window again
with dev.off()
):
data(florentine)
X11()
set.seed(1)
egmme.fit <- tergm(
flobusiness ~
Form(~ edges + gwesp(0, fixed=T)) +
Persist(~ offset(edges)),
targets = ~ edges + gwesp(0, fixed=T),
offset.coef = theta.persist,
estimate = "EGMME",
control = control.tergm(SA.plot.progress=TRUE)
)
dev.off()
In the X11 window, you should see the MCMC traceplots showing the progress of the coefficient estimates (the f. plots) and the corresponding values of the target statistics.
The real-time feedback suggests that the fitting went well, but let’s
double-check by running the mcmc.diagnostics
:
These look pretty good. The serial correlation evident in the
traceplots could be reduced by modifying some of tergm
’s
control parameters. You can experiment with increasing the
mcmc.interval
to see how that works. But the level of
serial correlation here is not enough to cause concern.
So, we’ll move on to explore the fit object to see what we have:
Call:
tergm(formula = flobusiness ~ Form(~edges + gwesp(0, fixed = T)) +
Persist(~offset(edges)), offset.coef = theta.persist, estimate = "EGMME",
control = control.tergm(SA.plot.progress = TRUE), targets = ~edges +
gwesp(0, fixed = T))
Gradient Descent Equilibrium Generalized Method of Moments Results Coefficients:
Form~edges Form~gwesp.fixed.0 offset(Persist~edges)
-6.474 2.298 2.197
[1] "newnetwork" "newnetworks" "init" "covar"
[5] "mc.se" "eta" "opt.history" "sample"
[9] "network" "network" "coef" "targets"
[13] "target.stats" "estimate" "sample.obs" "control"
[17] "reference" "constraints" "etamap" "offset"
[21] "mle.lik" "MPLE_is_MLE" "ergm_version" "call"
[25] "formula" "estimate.desc" "tergm_version"
Call:
tergm(formula = flobusiness ~ Form(~edges + gwesp(0, fixed = T)) +
Persist(~offset(edges)), offset.coef = theta.persist, estimate = "EGMME",
control = control.tergm(SA.plot.progress = TRUE), targets = ~edges +
gwesp(0, fixed = T))
Gradient Descent Equilibrium Generalized Method of Moments Results Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Form~edges -6.474 2.366 0 -2.737 0.00621 **
Form~gwesp.fixed.0 2.298 2.248 0 1.022 0.30661
offset(Persist~edges) 2.197 0.000 0 Inf < 1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The following terms are fixed by offset and are not estimated:
offset(Persist~edges)
We have now obtained estimates for the coefficients of a formation model that, conditional on the given dissolution model (and its offsets), yields simulated targets over time that match (in expectation) those observed in the cross-sectional flobusiness network.
From this fitted model we can now simulate new dynamic networks (that is, entire network panels over time) that have the desired cross-sectional structure and mean relational duration. This is a very useful tool – essentially an agent-based simulation that is guaranteed to reproduce, on average, the full joint distribution of the cross-sectional network statistics and partnership durations in the model that we observed in our data.
We can use ndtv
to visualize the simulated network time
series (we’ll just look at the 1st 25 steps):
wealthsize <- log(get.vertex.attribute(flobusiness, "wealth")) * 2/3
# Set the ndtv animation parameter
slice.par = list(start = 0,
end = 25,
interval = 1,
aggregate.dur = 1,
rule = "any")
# Animate
compute.animation(egmme.sim, slice.par = slice.par)
# Set some ndtv rendering parameters
render.par = list(tween.frames = 5,
show.time = T,
show.stats = "~edges+gwesp(0,fixed = T)")
plot.par = list(edge.col = "darkgray",
displaylabels = T,
label.cex = .8,
label.col = "blue",
vertex.cex = wealthsize)
# Show the movie!
render.d3movie(egmme.sim,
render.par = render.par,
plot.par = plot.par,
output.mode = 'htmlWidget')
How well do the cross-sectional networks within our simulated dynamic network fit the probability distribution implied by our model?
We can check by comparing the summary statistics for our observed
network to those for our cross-sectional networks. This is easy, since
by default, the simulate command for a tergm object not only simulates a
dynamic network, but calculates the statistics from the model for each
time point. These are stored in
attribute(simulated.networkDynamic)$stats
.
cbind(model = summary(flobusiness ~ edges + gwesp(0, fixed = T)),
obs = round(colMeans(attributes(egmme.sim)$stats)), 1)
model obs
edges 15 15 1
gwesp.fixed.0 12 12 1
And we can also easily look at a time series and histogram for each statistic:
Of course, one is always free to look under the hood to probe the
model more deeply. For example, consider running this code on your own,
identifying what it is plotting and what you can learn about your model
as a result:
plot(as.matrix(attributes(egmme.sim)$stats))
.
Returning to the elements directly included in the model, we should also check to make sure that our mean duration is what we specified (10 time steps).
This time we’ll use a different approach, using
as.data.frame
, which, when applied to an object of class
networkDynamic
, generates a timed edgelist. From this we
can directly calculate the mean. We’ll show that this produces the same
result as the tsna
edgeDuration function.
# create dataFrame for direct estimation
egmme.sim.df <- as.data.frame(egmme.sim)
names(egmme.sim.df)
[1] "onset" "terminus" "tail"
[4] "head" "onset.censored" "terminus.censored"
[7] "duration" "edge.id"
onset terminus tail head onset.censored terminus.censored duration edge.id
1 0 13 3 5 TRUE FALSE 13 1
cbind(directEst = mean(egmme.sim.df$duration),
ndEst = mean(edgeDuration(egmme.sim,
mode = 'duration',
subject = 'spells')))
directEst ndEst
[1,] 9.896138 9.896138
Although right-censoring is present for some edges in our simulation, with a mean duration of 10 time steps and a simulation 1000 time steps long, the effect of censoring on our observed mean duration should be small, and we see above that it is.
An edge dissolution approximation (EDA) approach for ties with long durations
We can show (Carnegie et al. 2015, Klumb et al. under review) that a good set of starting values for the estimation of the formation model is as follows:
Fit the terms in the formation model using a static ERGM on the cross-sectional network; then
Subtract the values of either the dissolution coefficients themselves (Carnegie et al. 2015) or a slightly adjusted version of them (Klumb et al. under review, see below for adjustment details) from the estimated coefficients for the corresponding terms in the cross-sectional ERGM. The latter is a better fit for sparse networks, and the former for dense ones. (For terms present in the dissolution model but not the formation model, one subtracts them from the implicit 0 coefficient).
The result is a vector of parameter values that are a reasonable
place to start the MCMC chain for the estimation of the formation model.
This is in fact what the tergm
estimation code does by
default for this type of model.
In fact … when mean relational duration is very long, this approximation is so good that it may not be necessary to run a TERGM estimation at all.
This turns out to be quite useful, since models with long mean duration are precisely the ones that are the slowest and most difficult to fit using EGMME. That’s because, with long durations, very few ties dissolve at each MCMC step (toggle), so the fitting algorithm gets very little information to update the estimates, and needs many (many) more steps to accumulate that information.
Again, the equation from epidemiology sheds light on why this works :
\[ incidence \approx prevalence \div duration \]
The edge offset for the dissolution/persistence model gives us log-odds that map onto duration. Fitting the cross-sectional ergm gives us log-odds for prevalence. To estimate incidence, we subtract the duration from prevalence, rather than divide, since we are working on a log scale.
It bears repeating that the terms in your dissolution model must be a subset of the terms in your formation model for you to be able to take advantage of this approximation.
To illustrate, let us reconsider Example 2, but with a mean
relational duration of 100 time steps. And this time, we’ll fit both the
Persist()
and Diss()
versions of the model to
demonstrate.
First, we treat the formation process as if it were a stand-alone
cross-sectional model, and estimate it using a standard cross-sectional
ERGM. (We fit this same formation model in an earlier call to
tergm
, so will snip the output here):
Call:
ergm(formula = flobusiness ~ edges + gwesp(0, fixed = T))
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.3595 0.6382 0 -5.264 < 1e-04 ***
gwesp.fixed.0 1.5893 0.5960 0 2.667 0.00766 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 166.36 on 120 degrees of freedom
Residual Deviance: 77.79 on 118 degrees of freedom
AIC: 81.79 BIC: 87.37 (Smaller is better. MC Std. Err. = 0.1775)
edges gwesp.fixed.0
-3.359524 1.589315
Next, we subtract the log of the durations from the corresponding
terms (adjusted if necessary for sparse networks) in the formation
model. Specifically, theta.persist
= \(\log{D-1}\) for mean relational duration;
for dense networks, we subtract \(\log{D-1}\) from the formation
edges
coefficient in the formation model, while for sparse
networks, we subtract \(\log{D}\). Our
network density is about 0.125, so we will use the sparse
correction.
In this example, the dissolution model contains only an edges term, so this coefficient should be subtracted from the starting value for the edges term in the formation model.
edges gwesp.fixed.0
-7.964694 1.589315
How well does this approximation do in capturing our desired dynamic
network properties? First, we can simulate from it – here is where
we switch back to a tergm
simulation, with a
tergm
formula:
## Form + Persist
tergm.sim.fp <- simulate(
flobusiness ~
Form(~ edges + gwesp(0,fixed=T)) +
Persist(~ edges),
monitor = "all",
coef = c(theta.form, theta.persist.100),
time.slices = 10000,
dynamic = TRUE)
## Form + Diss
tergm.sim.fd <- simulate(
flobusiness ~
Form(~ edges + gwesp(0,fixed=T)) +
Diss(~ edges),
monitor = "all",
coef = c(theta.form, theta.diss.100),
time.slices = 10000,
dynamic = TRUE)
Then compare the results for cross-sectional network structure.
## Compare
cbind(Observed = summary(flobusiness ~ edges + gwesp(0,fixed = T)),
Persist = round(colMeans(attributes(tergm.sim.fp)$stats)[1:2], 1),
Diss = round(colMeans(attributes(tergm.sim.fd)$stats)[1:2], 1)
)
Observed Persist Diss
edges 15 15.8 17.5
gwesp.fixed.0 12 12.8 14.5
And recovery of the tie duration
cbind(Observed = 100,
Persist = round(mean(edgeDuration(tergm.sim.fd,
mode="duration",
subject="spells")), 1),
Diss = round(mean(edgeDuration(tergm.sim.fp,
mode="duration",
subject="spells")), 1)
)
Observed Persist Diss
[1,] 100 99.6 100.7
One of the most important extensions to tergm
functionality is the ability to estimate models based on egocentrically
sampled network data. This makes it possible to start with data on a
single, cross-sectional, egocentrically sampled network, plus tie
duration, estimate a tergm
, and use the fitted model to
simulate complete dynamic networks over time. For more information,
see:
The ergm.ego
package
and workshop
materials. We teach this workshop at most of the INSNA annual
meetings (Sunbelt, EUSN, NASN).
The EpiModel
package,
which uses this functionality to model diffusion on dynamic networks.
See the EpiModel website and the
weeklong intensive
workshop).
All of the packages reviewed in this tutorial have additional functionality, which you can learn about and explore through the use of R’s many help features.
If you begin to use them in depth, you will likely have further questions. If so, we encourage you to join the Statnet users’ group (https://mailman13.u.washington.edu/mailman/listinfo/statnet_help), where you can then post your questions (and possibly answer others). You may also encounter bugs; please use the same place to report them.
Krivitsky, P.N., Handcock, M.S,(2014). A separable model for dynamic networks JRSS Series B-Statistical Methodology, 76 (1):29-46; 10.1111/rssb.12014 JAN 2014
Krivitsky, P. N., Handcock, M. S., & Morris, M. (2011). Adjusting for network size and composition effects in exponential-family random graph models. Statistical Methodology, 8(4), 319-339. doi:10.1016/j.stamet.2011.01.005
Snijders, T. A. B., van de Bunt, G. G., & Steglich, C. E. G. (2010). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32(1), 44-60. doi:http://dx.doi.org/10.1016/j.socnet.2009.02.004
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38(1), 155-200. doi:10.1111/j.1467-9531.2008.00203.
Hanneke, S., Fu, W., & Xing, E. P. (2010) Discrete temporal models of social networks. Electron. J. Statist. 4 , 585–605. http://projecteuclid.org/euclid.ejs/1276694116.
Almquist, Z. W., & Butts, C. T. (2014). Logistic Network Regression for Scalable Analysis of Networks with Joint Edge/Vertex Dynamics. Sociological Methodology, 44(1), 273-321. doi:doi:10.1177/0081175013520159
Krivitsky, P. N., & Morris, M. (2017). Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. Annals of Applied Statistics, 11(1), 427-455. doi:10.1214/16-aoas1010
Carnegie, N. B., Krivitsky, P. N., Hunter, D. R., & Goodreau, S. M. (2015). An approximation method for improving dynamic network model fitting. Journal of Computational and Graphical Statistics, 24(2), 502-519. doi:10.1080/10618600.2014.903087.
Klumb, C., Morris, M., Goodreau, S. M., & Jenness, S. M. (2023). Improving and Extending STERGM Approximations Based on Cross-Sectional Data and Tie Durations. Journal of Computational and Graphical Statistics, 33, 166 - 180. doi:10.1080/10618600.2023.2233593.
The separable model was introduced in Krivitsky and Handcock (2010). “Separable” here means that formation is assumed to be independent of dissolution within time step, and Markov dependent between steps. This allows the factors that influence formation to be different than those that influence dissolution.
Allowing the formation model to be different than the dissolution model is useful in many contexts. Take friendships. It seems intuitive that the factors influencing who becomes friends with whom, out of the set of people who aren’t already) are likely to be different than the factors affecting who stops liking whom, out of the set of people currently in relationships). Any reasonable model of formation would need to include a variety of homophily parameters (mixing on age, for example). Friendship dissolution may or may not. (Conditional on being friends, does a difference in age affect the probability that your friendship ends? Perhaps, but probably not as fundamentally or as strongly as for formation).
In thinking about how STERGMs work, and the difference between the prevalence of ties in the cross section, and the incidence of ties in a temporal data set, it can be helpful to think of the approximation commonly used in epidemiology to consider basic disease dynamics:
\[ prevalence = incidence \times duration \]
To translate this to dynamic networks: the expression above says that the number of ties present in the cross-section (prevalence) is a function of the rate at which they form (incidence) and the rate at which they dissolve (1/duration). The faster ties form, the more ties will be present in the cross-section, and the more slowly they form, the fewer will be present. The slower ties dissolve, the longer the duration, the more ties will be present in the cross-section, and the reverse (faster/shorter/fewer) again holds. The cross-sectional prevalence of ties is determined by both processes.
This same concept extends to other network configurations, for example triangles—the faster that triangles are formed and the slower that they’re dissolved, the more that will be present in the cross-section.
ERGMs are models for the prevalence of ties; they do not represent the implicit rates of incidence and duration, they simply model the net result.
STERGMs are models that explicitly represent the incidence and duration processes.
The two equations governing a STERGM are commonly called the formation equation and the dissolution (or persistence) equation, respectively:
These parallel the traditional cross-sectional ERGM. We have simply:
added a time index to the tie values
added in a new conditional. In the formation equation, the expression is conditional on the tie not existing at the prior time step, and in the dissolution equation it is conditional on the tie existing; and
defined two \(\theta\) and \(g\) vectors instead of just one. Now there are \(\theta^+\) and \(g^+(y)_{ij}\), reflecting the coefficients and statistics in the formation model, and \(\theta^-\) and \(g^-(y)_{ij}\) reflecting those in the dissolution model.
Note that for the dissolution model, the \(P(Y_{ij,t+1} = 1)\) is in the numerator and the \(P(Y_{ij,t+1} = 0)\) is in the denominator. This makes the mathematics parallel to that of the formation model; however, it also means the “dissolution” model is actually a “persistence” model. The probability of dissolution is simply 1 - persistence, so the model can be interpreted in both ways.
When a model is separable, estimation at each step of the MCMC chain proceeds by sending off all empty dyads to be considered for formation (conditioning on the ties that exist), and sending off all the paired dyads to be considered for dissolution (conditioning on the empty dyads), and these two processes proceed independently during that step.
Visually, we can sum this up as:
To understand STERGMs formally, we first review the ERGM framework for cross-sectional or static networks. Let \(\mathbb{Y}\subseteq \{1,\dotsc,n\}^2\) be the set of potential relations (dyads) among \(n\) nodes, ordered for directed networks and unordered for undirected. We can represent a network \(\mathbf{y}\) as a set of ties, with the set of possible sets of ties, \(\mathcal{Y}\subseteq 2^\mathbb{Y}\), being the sample space: \(\mathbf{y} \in \mathcal{Y}\). Let \(\mathbf{y}_{ij}\) be 1 if \((i,j)\in\mathbf{y}\) — a relation of interest exists from \(i\) to \(j\) — and 0 otherwise.
The network also has an associated covariate array \(\mathbf{X}\) containing attributes of the nodes, the dyads, or both. An exponential-family random graph model (ERGM) represents the pmf of \(\mathbf{Y}\) as a function of a \(p\)-vector of network statistics \(g(\mathbf{Y},\mathbf{X})\), with parameters \(\theta \in \mathbb{R}^p\), as follows:
where the normalizing constant \(k(\theta, \mathbf{X}, \mathcal{Y}) = \sum\limits_{\mathbf{y}' \in \mathcal{Y}}\exp\left\{\theta\cdot g(\mathbf{y}', \mathbf{X})\right\}\) is a summation over the space of possible networks on \(n\) nodes, \(\mathcal{Y}\). Where \(\mathcal{Y}\) and \(\mathbf{X}\) are held constant, as in a typical cross-sectional model, they may be suppressed in the notation. Here, on the other hand, the dependence on \(\mathcal{Y}\) and \(\mathbf{X}\) is made explicit.
In modeling the transition from a network \(\mathbf{Y}^{t}\) at time \(t\) to a network \(\mathbf{Y}^{t+1}\) at time \(t+1\), the separable temporal ERGM assumes that the formation and dissolution of ties occur independently from each other within each time step, with each half of the process modeled as an ERGM. For two networks (sets of ties) \(\mathbf{y},\mathbf{y}'\in \mathcal{Y}\), let \(\mathbf{y} \supseteq \mathbf{y}'\) if any tie present in \(\mathbf{y}'\) is also present in \(\mathbf{y}\). Define \(\mathcal{Y}^+(\mathbf{y}) = \{\mathbf{y}'\in\mathcal{Y}:\mathbf{y}'\supseteq\mathbf{y}\}\), the networks that can be constructed by forming ties in \(\mathbf{y}\); and \(\mathcal{Y}^-(\mathbf{y}) = \{\mathbf{y}'\in\mathcal{Y}:\mathbf{y}'\subseteq\mathbf{y}\}\), the networks that can be constructed dissolving ties in \(\mathbf{y}\).
Given \(\mathbf{y}^{t}\), a formation network \(\mathbf{Y}^+\) is generated from an ERGM controlled by a \(p\)-vector of formation parameters \(\theta^+\) and formation statistics \(g^+(\mathbf{y}^+, \mathbf{X})\), conditional on only adding ties:
A dissolution network \(\mathbf{Y}^-\) is simultaneously generated from an ERGM controlled by a (possibly different) \(q\)-vector of dissolution parameters \(\theta^-\) and corresponding statistics \(g^-(\mathbf{y}^-, \mathbf{X})\), conditional on only dissolving ties from \(\mathbf{y}^{t}\), as follows:
The cross-sectional network at time \(t+1\) is then constructed by applying the changes in \(\mathbf{Y}^+\) and \(\mathbf{Y}^-\) to \(\mathbf{y}^{t}\), as follows: \(\mathbf{Y}^{t+1} = \mathbf{Y}^{t}\cup (\mathbf{Y}^+ - \mathbf{Y}^{t}) \, - \, ( \mathbf{Y}^{t} - \mathbf{Y}^-).\) which simplifies to either: \(\mathbf{Y}^{t+1} = \mathbf{Y}^+ - (\mathbf{Y}^{t} - \mathbf{Y}^-)\) \(\mathbf{Y}^{t+1} = \mathbf{Y}^-\cup (\mathbf{Y}^+ - \mathbf{Y}^{t})\)
STERGMs assume that the formation and dissolution processes are independent of each other within the same time step.
This does not mean that they will be independent across time; and in fact, for any dyadic dependent model, they will not be. To see this point, consider a romantic relationship example with:
Form(~ edges + degree(2:10))
Diss(~ edges)
with increasingly negative parameters on the degree terms.
The negative parameters on higher order degree terms means that there
is some underlying tendency for relational formation to occur, governed
by the coefficient on the edges
term, but it is reduced
with each pre-existing tie that the two actors involved already have. In
other words, there is a norm against being in multiple simultaneous
romantic relationships.
However, dissolution in this model is homogeneous—all existing relationships have the same underlying dissolution probability at every time step. (The latter assumption is probably unrealistic; in practice, if someone just acquired a second partner, their first is likely to be at increased risk of dissoving their relation. We ignore this now for simplicity).
While formation and dissolution are independent within a time step, there is dependence between time steps.
Imagine that Chris and Pat are in a relationship at time
t
, and each of them has only this one partner. During the
time between t
and t+1
, whether they break up
does not depend on when either of them acquires a new partner, and vice
versa.
If they do not break up during this time, then during
the next time period between t+1
and t+2
, the
probability that either one forms a new partnership will be lower,
because it is influenced by their existing partnership.
If they do break up during this time, then during the next time
period between t+1
and t+2
, the probability
that either forms a new partnership will be higher, as neither is
currently involved in a partnership.
If we had modeled dissolution as a function of degree, then a similar logic would induce dependence in the hazard of dissolution, as a function of partnerships formed in the prior step.
The simple implication is that formation and dissolution can be dependent in this framework, but that dependence occurs across time steps, not within.
Note that the length of the time step here is abritrary, and left to
the user to define.
At the limit, a STERGM can approximate a continuous-time model by
letting the time interval become very small—the only issue is
computational limitations. The choice of interval length will therefore
depend on context, and on whether the assumption of within-step
independence makes sense in that context. In general, a larger time
interval increases the impact of this assumption, while a smaller
interval reduces the impact and makes the assumption easier to justify.
With a time step of 1 month, if I start a second relationship today, the
earliest I can break up with my first partner as a direct result of my
new partnership is one month later. Shortening the time step to a day
means I can break up a day later, rather than a month later, which is
probably more reasonable. The tradeoff is that a shorter time interval
means longer computation time for both model estimation and
simulation.
The basic intuition for the term operators used in TERGMs is relatively clear – formation of ties (“form”) and dissolution of ties (“diss” or “persist”), cross-sectional equilibrium structure (“cross”) and overall churn (“change”). But, as is often the case with ERGMs, understanding how a specific term in a model influences the structure and dynamics of the network can require a bit more effort. In the TERGM context, the term operators add level of complexity, as they determine which dyads are included in the calculations and which ties are counted by a model term. An “edges” term in an ERGM always has the same meaning: the count of ties in the network. But the meaning of an “edges” term in a TERGM depends on the term operator.
The best way to understand how model terms work in each of the term operators is to start with the simplest possible examples: an “edges only” model using a single term operator, cycling through to each operator to see how this term changes meaning, and what implication that has for the predicted dynamic network. That is the approach we take below.
We work with 2 panels of the Sampson data to:
Fit and simulate from the single Operator(~edges) models, to see what they produce in equilibrium.
Verify that the estimated coefficient on the edges term = logit(p(component change))
The probability of tie component changes in these models is easy to calculate by hand, so it is easy to verify the relationship to the estimated coefficient.
Component | Change probability from t1 to t2 |
---|---|
form | (ties formed) / (empty dyads t1) |
diss | (ties dissolved) / (ties t1) |
cross | density of network at t2 |
change | (changed dyads) / (all dyads) |
library(tergm)
#library(tsna)
library(dplyr)
sessioninfo::package_info(pkgs = c("network",
"ergm", "tergm"),
dependencies = FALSE)
package * version date (UTC) lib source
ergm * 4.6.0 2023-12-18 [1] CRAN (R 4.4.0)
network * 1.18.2 2023-12-05 [1] CRAN (R 4.4.0)
tergm * 4.2.0 2023-05-30 [1] CRAN (R 4.4.0)
[1] C:/Users/Martina Morris/AppData/Local/R/win-library/4.4
[2] C:/Program Files/R/R-4.4.0/library
Combined 1 networks on '.TimeID'/'.Time':
1: n = 18, directed = TRUE, bipartite = FALSE, loops = FALSE
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
ergm:
Length Class Mode
constraints 2 formula call
.PrevNet:
Combined 1 networks on '.TimeID'/'.Time':
1: n = 18, directed = TRUE, bipartite = FALSE, loops = FALSE
Network attributes:
vertices = 18
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
ergm:
Length Class Mode
constraints 2 -none- call
total edges = 55
missing edges = 0
non-missing edges = 55
density = 0.1797386
Vertex attributes:
.Time:
integer valued attribute
18 values
.TimeID:
integer valued attribute
18 values
cloisterville:
logical valued attribute
attribute summary:
Mode FALSE TRUE
logical 12 6
group:
character valued attribute
attribute summary:
Loyal Outcasts Turks Waverers
5 3 7 3
vertex.names:
character valued attribute
18 valid vertex names
No edge attributes
Network edgelist matrix:
[,1] [,2]
[1,] 1 3
[2,] 1 5
[3,] 1 14
[4,] 2 1
[5,] 2 7
[6,] 2 14
[7,] 3 1
[8,] 3 2
[9,] 3 17
[10,] 4 5
[11,] 4 6
[12,] 4 10
[13,] 5 4
[14,] 5 11
[15,] 5 13
[16,] 6 1
[17,] 6 4
[18,] 6 9
[19,] 7 2
[20,] 7 8
[21,] 7 16
[22,] 8 1
[23,] 8 2
[24,] 8 9
[25,] 9 5
[26,] 9 8
[27,] 9 16
[28,] 10 4
[29,] 10 8
[30,] 10 14
[31,] 11 5
[32,] 11 8
[33,] 11 14
[34,] 12 1
[35,] 12 2
[36,] 12 14
[37,] 13 5
[38,] 13 7
[39,] 13 18
[40,] 14 1
[41,] 14 11
[42,] 14 12
[43,] 14 15
[44,] 15 1
[45,] 15 2
[46,] 15 14
[47,] 16 1
[48,] 16 2
[49,] 16 7
[50,] 17 3
[51,] 17 13
[52,] 17 18
[53,] 18 1
[54,] 18 2
[55,] 18 7
total edges= 57
missing edges= 0
non-missing edges= 57
Vertex attribute names:
.Time .TimeID cloisterville group vertex.names
No edge attributes
Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
Argument base.net not specified, using first element of network.list instead
Created net.obs.period to describe network
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 1 until 3
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
Cross sectional stats and form/diss changes from tsna
samp.stats <- cbind(edges = tsna::tErgmStats(samp.dyn, "~edges"),
form = tsna::tEdgeFormation(samp.dyn),
diss = tsna::tEdgeDissolution(samp.dyn))
samp.stats
Time Series:
Start = 1
End = 3
Frequency = 1
edges form diss
1 55 55 0
2 57 22 20
3 0 0 57
dyads = network.dyadcount(samplk1)
type <- c("form", "diss", "cross", "change")
prob.form = samp.stats[2,"form"]/(dyads-samp.stats[1,"edges"])
prob.diss = samp.stats[2,"diss"]/network.edgecount(samplk1)
prob.cross = network.edgecount(samplk2)/dyads
prob.change = network.edgecount(xor(samplk1, samplk2))/dyads
def.form = "Fraction of empty dyads at t1 that formed a tie by t2"
def.diss = "Fraction of ties at t1 that dissolved by t2"
def.cross = "Density at t2 (probability of a tie)"
def.change = "Fraction of all dyads at t1 that changed state by t2"
data.frame(Component = type,
Definition = c(def.form, def.diss, def.cross, def.change),
Probability = c(prob.form, prob.diss, prob.cross, prob.change)) %>%
kableExtra::kable(caption = "Observed probabilities of tie change components in Sampson data",
digits=3) %>%
kableExtra::kable_styling(full_width=F,
bootstrap_options = c("striped"))
Component | Definition | Probability |
---|---|---|
form | Fraction of empty dyads at t1 that formed a tie by t2 | 0.088 |
diss | Fraction of ties at t1 that dissolved by t2 | 0.364 |
cross | Density at t2 (probability of a tie) | 0.186 |
change | Fraction of all dyads at t1 that changed state by t2 | 0.137 |
This model sets the tie formation rates based on the observed probability of tie formation from t1 to t2. But the other free parameter is not set, so the effective tie dissolution coef is 0, which means ties dissolve with probability 0.5.
We saw from the descriptives above that the observed probability of a tie dissolving from t1 to t2 is 0.36). So, what equilibrium density will be generated by this model? Higher? Lower? Or the same as the observed density in the Sampson data?
Call:
tergm(formula = samp.series ~ Form(~edges), estimate = "CMLE")
Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Form(1)~edges -2.3427 0.2232 0 -10.5 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 424.2 on 306 degrees of freedom
Residual Deviance: 225.4 on 305 degrees of freedom
AIC: 227.4 BIC: 231.1 (Smaller is better. MC Std. Err. = 0)
The net result is a lower predicted equilibrium tie count, compared
to t2 (57). This suggests that the observed Sampson data are not
consistent with an implicit model tie dissolution probability of 50%. To
reproduce the observed density with this formation rate, one needs a
lower dissolution rate than 50%. And to get that, you would
need to add a diss
operator to the TERGM specification.
form.sim <- simulate(form.fit, nsim = 1,
time.slices = 1000,
nw.start = "first",
monitor = ~edges,
stats = T,
output = "stats")
plot(as.numeric(form.sim$stats), type="l",
ylim = c(0, max(form.sim$stats)+10),
main = "Cross-sectional edgecount in simulated network timeseries",
sub = "Model: Form(~edges)",
ylab="edges")
abline(h=target.edges, col="red")
text(x=5, y=(min(form.sim$stats)), labels="Simulated", adj=0, cex=0.7)
text(x=5, y=(target.edges+10), labels="Observed", adj=0, cex=0.7, col="red")
obs sim
edges 57 45.4
This model sets the tie dissolution rates based on the observed probability of tie formation from t1 to t2. But the other free parameter is not set, so the effective tie formation coef is 0, which means ties form with probability 0.5.
We saw from the descriptives above that the observed probability of a tie forming from t1 to t2 is 0.09). So, what equilibrium density do you think will be generated by this model? Higher? Lower? Or the same as the observed density in the Sampson data?
Call:
tergm(formula = samp.series ~ Diss(~edges), estimate = "CMLE")
Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Diss(1)~edges -0.5596 0.2803 0 -1.996 0.0459 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 424.2 on 306 degrees of freedom
Residual Deviance: 420.1 on 305 degrees of freedom
AIC: 422.1 BIC: 425.8 (Smaller is better. MC Std. Err. = 0)
The simulated networks have a much higher predicted equilibrium tie
count, compared to t2 (57). This suggests that the implicit model
formation rate of 50% is too high. Here you would need to add a
form
operator to the TERGM model specification to control
the formation rate, in order to reproduce the observed density.
diss.sim <- simulate(diss.fit, nsim = 1,
time.slices = 1000,
nw.start = "first",
monitor = ~edges,
stats = T,
output = "stats")
plot(as.numeric(diss.sim$stats), type="l",
main = "Cross-sectional edgecount in simulated network timeseries",
sub = "Model: Diss(~edges)",
ylab="edges",
ylim = c(0, 200))
abline(h=target.edges, col="red")
text(x=5, y=(min(diss.sim$stats)), labels="Simulated", adj=0, cex=0.7)
text(x=5, y=(target.edges+10), labels="Observed", adj=0, cex=0.7, col="red")
obs sim
edges 57 177.4
This model sets the target cross-sectional tie count to the t2 observed value. It is the one operator that does not explicitly condition on the state of the dyad at t1. With only two time steps of data, this is equivalent to fitting an ergm on the t2 network.
So, what equilibrium density do you think will be generated by this model? Higher? Lower? Or the same as observed at t2?
Call:
tergm(formula = samp.series ~ Cross(~edges), estimate = "CMLE")
Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Cross(1)~edges -1.4744 0.1468 0 -10.04 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 424.2 on 306 degrees of freedom
Residual Deviance: 294.2 on 305 degrees of freedom
AIC: 296.2 BIC: 300 (Smaller is better. MC Std. Err. = 0)
Here, the equilibrium tie count stochastically reproduces the t2 value and the observed density is preserved.
The simulation effectively operates like a cross-sectional ergm:
drawing a network from the distribution specified by the model at each
time step, with no information used from the time step before. The way
the changes are distributed between “formations” and “dissolutions” is
controlled by the coefficient on the edges
term: as in an
ergm, it represents the log-odds of a tie (here in the t2 network), so
simulations from this model will reproduce the density observed at
t2.
cross.sim <- simulate(cross.fit, nsim = 1,
time.slices = 1000,
nw.start = "first",
monitor = ~edges,
stats = T,
output = "stats")
plot(as.numeric(cross.sim$stats), type="l",
ylim = c(0, (max(cross.sim$stats)+10)),
main = "Cross-sectional edgecount in simulated network timeseries",
sub = "Model: Cross(~edges)",
ylab="edges")
abline(h=target.edges, col="red")
text(x=5, y=(max(cross.sim$stats)+10), labels="Simulated", adj=0, cex=0.7)
text(x=5, y=max(cross.sim$stats), labels="Observed", adj=0, cex=0.7, col="red")
obs sim
edges 57 56.8
This model sets the dyad change rates based on the observed probability of a dyad changing state from t1 to t2. Dyads are selected at random and changed with this probability – there is no bias towards either formation or dissolution.
What equilibrium density do you think this model will generate? Higher? Lower? Same as observed?
Call:
tergm(formula = samp.series ~ Change(~edges), estimate = "CMLE")
Conditional Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
Change(1)~edges -1.8383 0.1661 0 -11.07 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 424.2 on 306 degrees of freedom
Residual Deviance: 244.8 on 305 degrees of freedom
AIC: 246.8 BIC: 250.5 (Smaller is better. MC Std. Err. = 0)
The equilibrium density for this model is 0.5, well above the observed density at t2 (0.19).
The changes are distributed at random in this model.
If the starting network has density less than 0.5, more of the dyads selected for toggle will be empty, so there are more potential formations and the network will become more dense over time.
If the starting density is greater than 0.5, more of the dyads selected will be tied, so there are more potential dissolutions and the network will become more sparse over time.
In both cases, the density moves towards 50%, and is stable once it gets there.
change.sim <- simulate(change.fit, nsim = 1,
time.slices = 1000,
nw.start = "first",
monitor = ~edges,
stats = T,
output = "stats")
plot(as.numeric(change.sim$stats), type="l",
ylim = c(0, 200), # so the target line is visible
main = "Cross-sectional edgecount in simulated network timeseries",
sub = "Model: Change(~edges)",
ylab="edges")
abline(h=target.edges, col="red")
text(x=5, y=200, labels="Simulated", adj=0, size=4)
text(x=5, y=(target.edges+10), labels="Observed", adj=0, col="red")
obs sim
edges 57 153.1
Here, we verify that the estimated exp(coefficient)
values are equal to the odds of each tie component as calculated by
hand.
As shown in the table below, they are equal, which means we can
interpret the coefficients in terms of the probabilities of each change
component. The coefficients are on the log-odds scale, so to obtain the
corresponding probability, we transform using
p(tie change component) = 1/(1+exp(coef)))
.
As with ergm
coefficients, these tergm
coefficients can be interpreted in terms of the conditional probability
at the dyad level (conditioning on the state of the rest of the
network).
In the dynamic context, this is the probability of a dyad changing state, if that change satisfies the definition of this component operator: a formation, dissolution, cross-sectional tie or a change in state.
odds.form = prob.form / (1-prob.form)
odds.diss = prob.diss / (1-prob.diss)
odds.cross = prob.cross / (1-prob.cross)
odds.change = prob.change / (1-prob.change)
ecoef.form = exp(coef(form.fit))
ecoef.diss = exp(coef(diss.fit))
ecoef.cross = exp(coef(cross.fit))
ecoef.change = exp(coef(change.fit))
data.frame(
exp.coef = c(ecoef.form, ecoef.diss, ecoef.cross, ecoef.change),
handcalc.odds = c(odds.form, odds.diss, odds.cross, odds.change),
handcalc.prob = c(prob.form, prob.diss, prob.cross, prob.change)) %>%
kableExtra::kable(
caption = "Estimated exp(coef) vs. hand calculated odds and probabilities",
digits=3) %>%
kableExtra::kable_styling(bootstrap_options = c("striped"))
exp.coef | handcalc.odds | handcalc.prob | |
---|---|---|---|
Form(1)~edges | 0.096 | 0.096 | 0.088 |
Diss(1)~edges | 0.571 | 0.571 | 0.364 |
Cross(1)~edges | 0.229 | 0.229 | 0.186 |
Change(1)~edges | 0.159 | 0.159 | 0.137 |
coef <- c(NA_real_,
coef(form.fit),
coef(diss.fit),
coef(cross.fit),
coef(change.fit))
mon.edges <- c(form.mon[1],
form.mon[2],
diss.mon[2],
cross.mon[2],
change.mon[2])
names(coef)[1] <- "obs"
data.frame(coef=round(coef,1), edges=round(mon.edges)) %>%
kableExtra::kable(caption = "Compare coefs and obs to equilibrium edgecounts",
digits=3) %>%
kableExtra::kable_styling(bootstrap_options = c("striped"))
coef | edges | |
---|---|---|
obs | NA | 57 |
Form(1)~edges | -2.3 | 45 |
Diss(1)~edges | -0.6 | 177 |
Cross(1)~edges | -1.5 | 57 |
Change(1)~edges | -1.8 | 153 |
It’s always a good idea to conduct some EDA on your temporal network before you jump into the specifying and testing statistical models for the data.
The goal of temporal network EDA and graphics is to give some insight into the structure and dynamics of your network and to highlight properties of interest. Both finding and displaying these properties well is as much of an art as a science. And if it’s your first time doing this, it can be somewhat overwhelming.
As noted in the main body of this tutorial, there is a separate workshop focused on temporal network EDA using the Statnet tools. So this appendix is just intended to give a sense of the kinds of issues temporal network EDA presents, and the implications this has for workflow and presentations.
The example here uses one of the datasets included in the
ndtv
package: short.stergm.sim
We explored it
in the ndtv section of the workshop. It is
a model-based simulation of the Padgett’s Florentine business network
over 25 time steps. Since we explored the proximity timeline and network
movie in the workshop above, we’ll focus on other properties and
graphical displays here.
This simulated network provides a good platform for examining some of the challenges you can face when trying to design graphics for dynamic networks.
The first thing to do is to look at the network summary and the time-collapsed plot.
NetworkDynamic properties:
distinct change times: 25
maximal time range: 0 until 25
Includes optional net.obs.period attribute:
Network observation period info:
Number of observation spells: 1
Maximal time range observed: 0 until 25
Temporal mode: discrete
Time unit: step
Suggested time increment: 1
Network attributes:
vertices = 16
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
net.obs.period: (not shown)
total edges= 32
missing edges= 0
non-missing edges= 32
Vertex attribute names:
active priorates totalties vertex.names wealth
Edge attribute names:
active
# Let's pull the vertex names for convenience
vnames <- get.vertex.attribute(short.stergm.sim, "vertex.names")
# What does the time collapsed network look like? We save the coordinates here to use later
set.seed(16)
coords <- plot(short.stergm.sim,
label = vnames,
label.cex = 0.7,
main = "Collapsed time network")
This time, let’s start by looking at the Forward Reachable Sets (FRS)
from all vertices. We can display this as a matrix of the temporal
distances calculated by the tPath
function.
tp <- matrix(rep(NA, 16*16), nrow=16)
for(i in 1:16) {
tp[i,] <- tPath(short.stergm.sim, v=i, direction = 'fwd')$tdist
}
print(tp)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 0 Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf
[2,] Inf 0 Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf
[3,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[4,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[5,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[6,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[7,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[8,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[9,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[10,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[11,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[12,] Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 0 Inf
[13,] Inf Inf 13 13 13 13 13 13 13 13 13 Inf 0
[14,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[15,] 21 Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf
[16,] Inf Inf 0 0 0 0 0 0 0 0 0 Inf 13
[,14] [,15] [,16]
[1,] Inf 21 Inf
[2,] Inf Inf Inf
[3,] 0 Inf 0
[4,] 0 Inf 0
[5,] 0 Inf 0
[6,] 0 Inf 0
[7,] 0 Inf 0
[8,] 0 Inf 0
[9,] 0 Inf 0
[10,] 0 Inf 0
[11,] 0 Inf 0
[12,] Inf Inf Inf
[13,] 13 Inf 21
[14,] 0 Inf 0
[15,] Inf 0 Inf
[16,] 0 Inf 0
This is a type of adjacency matrix, but the elements represent the time slice that a column vertex first enters the forward reachable set of the row vertex. This does not mean that a tie is formed between these two vertices (though it may have been), but instead that a tie is formed that creates a time-ordered path from the row vertex, through perhaps several intervening nodes and links, to the column vertex.
Inf
means that there is no reachable path between the
row and column verticesTake a minute to read and interpret the patterns.
The FRS have size, structure and timing information that determine the kinds of graphical displays will (or will not) work well for capturing and communicating key properties. What are some properties of these paths that might be helpful to summarize?
[1] 1 0 11 11 11 11 11 11 11 11 11 0 11 11 1 11
This shows the distribution of all FRS sizes. Note there are only 3 values: 0 (for nodes that are not able to reach any other node), 1 and 11. Think about what this suggests for the overall structure of the network.
[1] 0 21 13
Only three time slices show any changes to the FRS. Note that this doesn’t mean ties aren’t forming and dissolving during the other time slices, just that those changes don’t influence the FRS of any node.
# we'll use a common set of coordinates to facilitate change identification
par(mfrow=c(1,3), mar = c(1,1,1,1), oma = c(0,0,2,0))
plot(network.extract(short.stergm.sim, at=0), coord=coords, pad=1,
label = vnames,
label.cex = 0.7,
main = "Time = 1")
plot(network.extract(short.stergm.sim, at=13), coord=coords, pad=1,
label = vnames,
label.cex = 0.7,
main = "Time = 13")
plot(network.extract(short.stergm.sim, at=21), coord=coords, pad=1,
label = vnames,
label.cex = 0.7,
main = "Time = 21")
mtext("Network at the key time slices when FRS changes occur", outer=T, line=1, cex=0.9)
frs.min <- apply(ifelse(is.infinite(tp) | col(tp)==row(tp), NA, tp), 1, min, na.rm=T)
frs.max <- apply(ifelse(is.infinite(tp) | col(tp)==row(tp), NA, tp), 1, max, na.rm=T)
cbind(frs.min, frs.max)
frs.min frs.max
[1,] 21 21
[2,] Inf -Inf
[3,] 0 13
[4,] 0 13
[5,] 0 13
[6,] 0 13
[7,] 0 13
[8,] 0 13
[9,] 0 13
[10,] 0 13
[11,] 0 13
[12,] Inf -Inf
[13,] 13 21
[14,] 0 13
[15,] 21 21
[16,] 0 13
And again, there are just 3 patterns observed in the start and end times for the dynamic evolution of each FRS: [21,21] which means a single change at time=21 created this FRS, [0,13] which means the FRS is in place at the start of the observation period and the last change to this FRS is at time 13, and [13,21] which means an FRS that is established for the first time at time=13, and has a last change at time=21.
Based on this EDA, it seems like there are a limited number of patterns that show up in FRS – the sizes only take a few values, the change points are few, and the emergence and evolution occur at a small number of time slices. One way to communicate this would be to show examples of each pattern, and how these capture different parts of the network over time.
v1name <- vnames[1]
v3name <- vnames[3]
v13name <- vnames[13]
v1tp <- tPath(short.stergm.sim, v=1)
v3tp <- tPath(short.stergm.sim, v=3)
v13tp <- tPath(short.stergm.sim, v=13)
par(mfrow=c(1,3), mar = c(1,1,1,1), oma = c(0,0,2,0))
plot(v1tp, label = vnames, label.cex=0.7, edge.label.col="black", main = v1name, pad=3)
plot(v3tp, label = vnames, label.cex=0.7, edge.label.col="black", main = v3name, pad=3)
plot(v13tp, label = vnames, label.cex=0.7, edge.label.col="black", main = v13name, pad=3)
mtext("The 3 unique FRS configurations", outer=T, line=1, cex=0.9)
Is this the best way to summarize the FRS info?
par(mfrow=c(1,2), mar = c(1,1,1,1), oma = c(0,0,2,0))
plotPaths(short.stergm.sim, list(v1tp, v3tp, v13tp),
main = "Default plot")
plotPaths(short.stergm.sim, list(v1tp, v3tp, v13tp), pad=2,
coord=coords,
path.col=hcl.colors(3, "Dynamic", alpha=0.7),
label.cex=0.7,
label.col="grey",
edge.label.cex=0.8,
arrowhead.cex=3,
main = "Modified specifications", cex.main=0.8)
mtext("Time collapsed network with 3 FRS overlaid", outer=T, line=1, cex=0.9)
my.cols <- hcl.colors(3, "Dynamic", alpha=0.7)
par(mfrow=c(1,3), mar = c(1,1,1,1), oma = c(0,0,2,0))
plotPaths(short.stergm.sim, v3tp,
coord=coords, pad=1,
path.col=my.cols[1],
label.cex=0.8,
label.col="grey",
edge.label.cex=0.9,
edge.label.col="black",
arrowhead.cex=3,
main = paste(v3name, "(v3)"))
plotPaths(short.stergm.sim, v13tp,
coord=coords, pad=1,
path.col=my.cols[2],
label.cex=0.8,
label.col="grey",
edge.label.cex=0.9,
edge.label.col="black",
arrowhead.cex=3,
main = paste(v13name, "(v13)"))
plotPaths(short.stergm.sim, v1tp,
coord=coords, pad=1,
path.col=my.cols[3],
label.cex=0.8,
label.col="grey",
edge.label.cex=0.9,
edge.label.col="black",
arrowhead.cex=3,
main = paste(v1name, "(v1)"))
mtext("Separate FRS overlays in multiple panel plot", outer=T, line=1, cex=0.9)
As you can see, there are lots of ways to display this temporal information. That gives you a great deal of control over the display, but it also means there are LOTS of choices that you will need to make.
We’ll stop here, and we hope the examples have given you some ideas about how to proceed on your own projects, and some template code to play around with.