Last updated: June 19, 2015 [//]: (Last updated 06-19-2015)

This tutorial is a joint product of the Statnet Development Team:

Mark S. Handcock (University of California, Los Angeles)
Carter T. Butts (University of California, Irvine)
David R. Hunter (Penn State University)
Steven M. Goodreau (University of Washington)
Skye Bender de-Moll (Oakland)
Pavel N. Krivitsky (University of Wollongong)
Martina Morris (University of Washington) For general questions and comments, please refer to statnet users group and mailing list
http://statnet.csde.washington.edu/statnet_users_group.shtml

## 1. Getting Started

Open an R session, and set your working directory to the location where you would like to save this work.

To install all of the packages in the statnet suite:

install.packages('statnet')
library(statnet)

Or, to only install the specific statnet packages needed for this tutorial:

install.packages('tergm') # will install the network package
install.packages('sna')

After the first time, to update the packages one can either repeat the commands above, or use:

update.packages('name.of.package')

For this tutorial, we will need one additional package (coda), which is recommended (but not required) by ergm:

install.packages('coda')

Make sure the packages are attached:

library(statnet)

or

library(tergm)
library(sna)
library(coda)

Check package version

# latest versions:  tergm 3.3.0, ergm 3.4.0, network 1.12.0, networkDynamic 0.7.1 (as of 6/22/2015)
sessionInfo()

Set seed for simulations – this is not necessary, but it ensures that we all get the same results (if we execute the same commands in the same order).

set.seed(0)

## 2. A quick review of static ERGMs

Exponential-family random graph models (ERGMs) represent a general class of models based in exponential-family theory for specifying the probability distribution underlying a set of random graphs or networks. Within this framework, one can—among other tasks—obtain maximum-likehood estimates for the parameters of a specified model for a given data set; simulate additional networks with the underlying probability distribution implied by that model; test individual models for goodness-of-fit, and perform various types of model comparison.

The basic expression for the ERGM class can be written as:

$$P(Y=y)=\frac{\exp(\theta'g(y))}{k(\theta)}$$

where:

• Y is the random variable for the state of the network (with realization y)

• $$g(y)$$ is the vector of model statistics for network y,

• $$\theta$$ is the vector of coefficients for those statistics

• $$k(\theta)$$ represents the quantity in the numerator summed over all possible networks (typically constrained to be all networks with the same node set as y).

This can be re-expressed in terms of the conditional log-odds of a single actor pair:

$$\operatorname{logit}{P(Y_{ij}=1 \mid y^{c}_{ij}) = \ln{\frac{P(Y_{ij}=1 \mid y^{c}_{ij})}{P(Y_{ij}=0 \mid y^{c}_{ij})}} = \theta'\delta(g(y))_{ij}}$$

where:

• $$Y_{ij}$$ is the random variable for the state of the actor pair $$i,j$$ (with realization $$y_{ij}$$)

• $$y^{c}_{ij}$$ signifies the complement of $$y_{ij}$$, i.e. all dyads in the network other than $$y_{ij}$$.

• $$\delta(g(y))_{ij}$$ equals $$g(y^{+}_{ij})-g(y^{-}_{ij})$$, where

• $$y^{+}_{ij}$$ is defined as $$y^{c}_{ij}$$ along with $$y_{ij}$$ set to 1

• $$y^{-}_{ij}$$ is defined as $$y^{c}_{ij}$$ along with $$y_{ij}$$ set to 0.

• That is, $$\delta(g(y))_{ij}$$ equals the value of $$g(y)$$ when $$y_{ij}=1$$ minus the value of $$g(y)$$ when $$y_{ij}=0$$, but all other dyads are as in $$g(y)$$. This emphasizes the log-odds of an individual tie conditional on all others.

• We call $$g(y)$$ the statistics of the model, and $$\delta(g(y))_{ij}$$ the change statistics for actor pair $$y_{ij}$$.

Fitting an ERGM usually begins with obtaining data:

data("florentine")
ls()
plot(flobusiness)

To refresh our memories on ERGM syntax, let us fit a cross-sectional example. Just by looking at the plot of flobusiness, we might guess that there are more triangles than expected by chance for a network of this size and density, and thus that there is some sort of explicit triangle closure effect going on. One useful way to model this effect in ERGMs that has been explored in the literature is with a gwesp statistic. When the alpha parameter of this statistic equals 0, then the statistic counts the number of edges in the network that are in at least one triangle. You can see this by summarizing this network and then counting them for yourself in the visualization:

summary(flobusiness~edges+gwesp(0, fixed=T))
        edges gwesp.fixed.0
15            12 
fit1 <- ergm(flobusiness~edges+gwesp(0,fixed=T))
Starting maximum likelihood estimation via MCMLE:
Iteration 1 of at most 20:
The log-likelihood improved by 0.2804
Step length converged once. Increasing MCMC sample size.
Iteration 2 of at most 20:
The log-likelihood improved by 0.003476
Step length converged twice. Stopping.

This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(fit1)

==========================
Summary of model fit
==========================

Formula:   flobusiness ~ edges + gwesp(0, fixed = T)

Iterations:  2 out of 20

Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges          -3.3792     0.6010      0 < 1e-04 ***
gwesp.fixed.0   1.5797     0.5674      0 0.00625 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null Deviance: 166.4  on 120  degrees of freedom
Residual Deviance:  78.1  on 118  degrees of freedom

AIC: 82.1    BIC: 87.67    (Smaller is better.) 

How to interpret the coefficients? Conditional on the rest of the network, a given edge’s log-odds of existing depends on how much its presence changes the number of edges in at least one triangle in the network.

• If the answer is 0 (i.e. the two incident nodes have no relational partners in common), the log-odds are $$-3.354$$.
• If the answer is 1 then the log-odds are $$-3.354 + 1.551 = -1.803$$.
• Is the answer is 2 then the log-odds are $$-3.354 + 1.551{\times}2 = -0.252$$.
• The corresponding probabilities are $$3.4\%$$, $$14.1\%$$, and $$43.7\%$$.
• How can adding one edge change the number of edges in at least one triangle by 2?

With the estimation in place, we can simulate a new network from the given model:

sim1 <- simulate(fit1,nsim=1,
control=control.simulate.ergm(MCMC.burnin=1000))
plot(sim1)

## 3. An Introduction to STERGMs (non-technical)

• Separable Temporal ERGMs (STERGMs) are an extension of ERGMs for modeling dynamic networks in discrete time, introduced in Krivitsky and Handcock (2010).

• The cross-sectional ERGM entails a single network, and a single model on that network. STERGMs, in contrast, posit two models: one ERGM underlying relational formation, and a second one underlying relational dissolution. Specifying a STERGM thus entails writing two ERGM formulas instead of one.

• STERGMs also requires dynamic data, of course; such data can come in many forms, and we will cover a few examples today.

• This approach is not simply a methodological development, but a theoretical one as well, and one which matches common sense for many social processes. Think of romantic relations. It seems intuitive that the statistical model underlying relational formation (i.e. affecting who becomes partners with whom, out of the set of people who aren’t already) is likely to be different than the model underlying relational dissolution (i.e. affecting who breaks up with whom, out of the set of people currently in relationships). Any reasonable model of the former would need to include a variety of homophily parameters (mixing on age, for example). The latter may or may not. (Conditional on being in a relationship, does your difference in age affect your probability of breaking up? Perhaps, but probably not as fundametally or as strongly as for formation).

• In thinking about how STERGMs work, and how they relate to different data forms, it can be helpful to think of the approximation commonly used in epidemiology to consider basic disease dynamics:

$$prevalence = incidence x duration$$

• To translate this to dynamic networks: first think of a Bernoulli model where all ties are equally likely. This expression indicates that the number of ties present in the cross-section (prevalence) is a function of the rate at which they form (incidence) and the rate at which they dissolve (1/duration). The faster ties form, the more ties that are present in the cross-section. And the slower ties dissolve, the more ties that are present in the cross-section.

• This same concept extends to more elaborate models beyond just Bernoulli. Imagine a model with triangles in it—the faster that triangles are formed and the slower that they’re dissolved, the more that will be present in the cross-section.

• Using the notation from the ERGM review above: the two equations governing a STERGM are commonly called the formation equation and the dissolution (or persistence) equation, respectively:

$$\ln{\frac{P(Y_{ij,t+1}=1 \mid y^{c}_{ij}, Y_{ij,t}=0)}{P(Y_{ij,t+1}=0\mid y^{c}_{ij}, Y_{ij,t}=0)}} = \theta^+ \delta(g^+(y))_{ij}$$

$$\ln{\frac{P(Y_{ij,t+1}=1 \mid y^{c}_{ij}, Y_{ij,t}=1)}{P(Y_{ij,t+1}=0\mid y^{c}_{ij}, Y_{ij,t}=1)}} = \theta^- \delta(g^-(y))_{ij}$$

• These largely parallel the one ERGM equation. We have simply:

• added a time index to the tie values

• added in a new conditional. In the formation equation, the expression is conditional on the tie not existing at the prior time step, and in the dissolution equation it is conditional on the tie existing; and

• defined two $$\theta$$ and $$g$$ vectors instead of just one. Now there are $$\theta^+$$ and $$g^+(y)_{ij}$$, reflecting the coefficients and statistics in the formation model, and $$\theta^-$$ and $$g^-(y)_{ij}$$ reflecting those in the dissolution model.

• Notice that for the dissolution model, the $$P(Y_{ij,t+1}=1)$$ is in the numerator and the $$P(Y_{ij,t+1}=0)$$ is in the denominator. This is needed to make the mathematics parallel to that of the formation model; however, it is also what causes the model to be an expression of relational persistence rather than of dissolution. Of course, the probability of dissolution is merely 1 - persistence, so the model captures both phenomena, and conceptually we often think about relational dissolution in parallel with formation.

• To understand the implications of the two $$\theta$$ and $$g$$ vectors, imagine a model for romantic relationships that captures propensities for:

• people who are currently in one relationship to not form a new relationship as often as single people do; and

• those few people who are in two ongoing relationships to dissolve one of them more often than others do.

• Both the formation and dissolution (persistence) equations would contain the ergm term “concurrent”, which counts the number of nodes in the network of degree 2 or more.

• In both models, the coefficient for the concurrent term would be negative; new ties are relatively unlikely to form if they will generate a new node of degree 2 or more. And they are relatively unlikely to persist if they maintain the current number of nodes of degree 2 or more, when they could bring it down by dissolving.

• Finally, remember that the s in stergm stands for separable; this refers to the fact that the two models assume that relational formation and dissolution occur independently of one another within a time step. We explore how this works and its implications in the examples.

## 4. An Introduction to STERGMs (more formal)

To understand STERGMs formally, we first review the ERGM framework for cross-sectional or static networks again, this time with more complete notation. Let $$\mathbb{Y}\subseteq \{1,\dotsc,n\}^2$$ be the set of potential relations (dyads) among $$n$$ nodes, ordered for directed networks and unordered for undirected. We can represent a network $$\mathbf{y}$$ as a set of ties, with the set of possible sets of ties, $$\mathcal{Y}\subseteq 2^\mathbb{Y}$$, being the sample space: $$\mathbf{y} \in \mathcal{Y}$$. Let $$\mathbf{y}_{ij}$$ be 1 if $$(i,j)\in\mathbf{y}$$ — a relation of interest exists from $$i$$ to $$j$$ — and 0 otherwise.

The network also has an associated covariate array $$\mathbf{X}$$ containing attributes of the nodes, the dyads, or both. An exponential-family random graph model (ERGM) represents the pmf of $$\mathbf{Y}$$ as a function of a $$p$$-vector of network statistics $$g(\mathbf{Y},\mathbf{X})$$, with parameters $$\theta \in \mathbb{R}^p$$, as follows:

$$\mbox{Pr}\left(\mathbf{Y} = \mathbf{y} \mid \mathbf{X}\right) = \frac{\exp\left\{\theta\cdot g(\mathbf{y}, \mathbf{X})\right\} } {k(\theta, \mathbf{X}, \mathcal{Y})},$$

where the normalizing constant $$k(\theta, \mathbf{X}, \mathcal{Y}) = \sum\limits_{\mathbf{y}' \in \mathcal{Y}}\exp\left\{\theta\cdot g(\mathbf{y}', \mathbf{X})\right\}$$ is a summation over the space of possible networks on $$n$$ nodes, $$\mathcal{Y}$$. Where $$\mathcal{Y}$$ and $$\mathbf{X}$$ are held constant, as in a typical cross-sectional model, they may be suppressed in the notation. Here, on the other hand, the dependence on $$\mathcal{Y}$$ and $$\mathbf{X}$$ is made explicit.

In modeling the transition from a network $$\mathbf{Y}^{t}$$ at time $$t$$ to a network $$\mathbf{Y}^{t+1}$$ at time $$t+1$$, the separable temporal ERGM assumes that the formation and dissolution of ties occur independently from each other within each time step, with each half of the process modeled as an ERGM. For two networks (sets of ties) $$\mathbf{y},\mathbf{y}'\in \mathcal{Y}$$, let $$\mathbf{y} \supseteq \mathbf{y}'$$ if any tie present in $$\mathbf{y}'$$ is also present in $$\mathbf{y}$$. Define $$\mathcal{Y}^+(\mathbf{y})=\{\mathbf{y}'\in\mathcal{Y}:\mathbf{y}'\supseteq\mathbf{y}\}$$, the networks that can be constructed by forming ties in $$\mathbf{y}$$; and $$\mathcal{Y}^-(\mathbf{y})=\{\mathbf{y}'\in\mathcal{Y}:\mathbf{y}'\subseteq\mathbf{y}\}$$, the networks that can be constructed dissolving ties in $$\mathbf{y}$$.

Given $$\mathbf{y}^{t}$$, a formation network $$\mathbf{Y}^+$$ is generated from an ERGM controlled by a $$p$$-vector of formation parameters $$\theta^+$$ and formation statistics $$g^+(\mathbf{y}^+, \mathbf{X})$$, conditional on only adding ties:

$$\Pr\left(\mathbf{Y}^+ = \mathbf{y}^+\mid \mathbf{Y}^{t};\theta^+\right) = \frac{\exp\left\{\theta^+\cdot g^+(\mathbf{y}^+, \mathbf{X})\right\} } {k\left(\theta^+, \mathbf{X}, \mathcal{Y}^+(\mathbf{Y}^{t})\right)}, \quad \mathbf{y}^+ \in \mathcal{Y}^+(\mathbf{y}^{t})$$.

A dissolution network $$\mathbf{Y}^-$$ is simultaneously generated from an ERGM controlled by a (possibly different) $$q$$-vector of dissolution parameters $$\theta^-$$ and corresponding statistics $$g^-(\mathbf{y}^-, \mathbf{X})$$, conditional on only dissolving ties from $$\mathbf{y}^{t}$$, as follows:

$$\Pr\left(\mathbf{Y}^- = \mathbf{y}^-\mid \mathbf{Y}^{t};\theta^-\right) = \frac{\exp\left\{\theta^-\cdot g^-(\mathbf{y}^-, \mathbf{X})\right\} } {k\left(\theta^-, \mathbf{X}, \mathcal{Y}^-(\mathbf{Y}^{t})\right)}, \quad \mathbf{y}^- \in \mathcal{Y}^-(\mathbf{y}^{t}).$$

The cross-sectional network at time $$t+1$$ is then constructed by applying the changes in $$\mathbf{Y}^+$$ and $$\mathbf{Y}^-$$ to $$\mathbf{y}^{t}$$, as follows: $$\mathbf{Y}^{t+1} = \mathbf{Y}^{t}\cup (\mathbf{Y}^+ - \mathbf{Y}^{t}) \, - \, ( \mathbf{Y}^{t} - \mathbf{Y}^-).$$ which simplifies to either: $$\mathbf{Y}^{t+1} = \mathbf{Y}^+ - (\mathbf{Y}^{t} - \mathbf{Y}^-)$$ $$\mathbf{Y}^{t+1} = \mathbf{Y}^-\cup (\mathbf{Y}^+ - \mathbf{Y}^{t})$$

Visually, we can sum this up as:

## 5. Notes on model specification and syntax

Within statnet, an ERGM involves one network and one set of network statistics, so these are specified together using R’s formula notation:

my.network $$\sim$$ my.vector.of.model.terms

For a call to stergm, there is still one network (or a list of networks), but two formulas. These are now passed as three separate arguments: the network(s) (argument nw), the formation formula (argument formation), and the dissolution formula (argument dissolution). The latter two both take the form of a one-sided formula. E.g.:

        stergm(my.network,
formation= ~edges+kstar(2),
dissolution= ~edges+triangle
)

There are other features of a call to either ergm or stergm that will be important; we list these features here, and illustrate them in one or more examples during the workshop.

• To fix the coefficient for a particular network statistic, one uses offset notation.
For instance, to fix a dissolution model with only an edges term with parameter value 4.2, the dissolution formula would be:
dissolution = ~offset(edges)

and the corresponding argument for passing the parameter value would be:

offset.coef.diss = 4.2

• In parallel with ergm, any information used to specify the nature of the fitting algorithm is passed by specifying a vector called control.stergm to the control argument. For example:
control=control.stergm(MCMC.burnin=10000)

For a list of options, type ?control.stergm

• Another argument that the user must supply is estimate, which controls the estimation method. Unlike with cross-sectional ERGMs, there is not necessarily an obvious default here, as different scenarios are best fit with different approaches. The most important for the new user to recognize are EGMME (equilibrium generalized method of moments estimation) and CMLE (conditional maximum likelihood estimation). A good rule of thumb is that when fitting to two (or more) networks (i.e. with panel data), one should use estimate="CMLE"; and when fitting to a single cross-section with some duration information, use estimate="EGMME".

## 6. Example 1: Multiple network cross-sections

One common form of data for modeling dynamic network processes consists of observations of network structure at two or more points in time on the same node set. Many classic network studies were of this type, and data of this form continue to be collected and analyzed.

Let us consider the famous Sampson monastery data:

data(samplk)
ls()
[1] "fit1"        "flobusiness" "flomarriage" "samplk1"     "samplk2"
[6] "samplk3"     "sim1"       

To pass them into stergm, we need to combine them into a list:

samp <- list()
samp[[1]] <- samplk1
samp[[2]] <- samplk2
samp[[3]] <- samplk3

Now we must decide on a model to fit to them. Plotting one network:

plot(samplk1)