Last updated: 2024-06-23
This tutorial is a joint product of the Statnet Development Team:
Pavel N. Krivitsky (University of New South Wales)
Martina Morris (University of Washington)
Mark S. Handcock (University of California, Los Angeles)
Carter T. Butts (University of California, Irvine)
David R. Hunter (Penn State University)
Steven M. Goodreau (University of Washington)
Chad Klumb (University of Washington)
Skye Bender de-Moll (Oakland, CA)
Michał Bojanowski (Kozminski University, Poland)
The specific network modeling software demonstrated in this tutorial
is authored by Pavel Krivitsky (ergm.count
,
ergm.rank
and latentnet
).
All Statnet packages are open-source, written for the R computing environment, and published on CRAN. The source repositories are hosted on GitHub. Our website is statnet.org
Need help? For general questions and comments, please email the Statnet users group at statnet_help@uw.edu. You’ll need to join the listserv if you’re not already a member. You can do that here: Statnet_help listserve.
Found a bug in our software? Please let us know by filing an issue in the appropriate package GitHub repository, with a reproducible example.
Want to request new functionality? We welcome suggestions – you can make a request by filing an issue on the appropriate package GitHub repository. The chances that this functionality will be developed are substantially improved if the requests are accompanied by some proposed code (we are happy to review pull requests).
For all other issues, please email us at contact@statnet.org.
If you have not already done so, please make sure that you have a
reasonably new version of R, preferably the latest (4.4.0) (R Core Team (2013)). Then, download and install
the latest versions of the Statnet (Handcock et
al. (2008),Goodreau et al. (2008))
packages, in particular ergm
version 4.6.0 (Hunter et al. (2008),Handcock et al. (2013),Krivitsky et al. (2023)),
ergm.count
version 4.1.2 (Krivitsky
(2013)), ergm.rank
version 4.1.1,
latentnet
version 2.11.0 (Krivitsky
and Handcock (2013)), and their dependencies. You can accomplish
this by typing:
install.packages("ergm.count")
install.packages("ergm.rank")
install.packages("latentnet")
update.packages()
and then
## Warning: package 'ergm.count' was built under R version 4.4.1
## Warning: package 'ergm.rank' was built under R version 4.4.1
network
and edge attributesnetwork
(Butts
(2008),Butts, Handcock, and Hunter (March
15, 2013)) objects have three types of attributes:
The number of attributes at each level is unlimited, and they may be
of any data type; thus, one could define e.g., a network in which every
edge contained a network whose edges contained sill other networks, if
one wished. This flexibility allows for network
objects to
be extended in many ways (as is done e.g. in the
networkDynamic
package, where edges carry complex timing
information). Usually, however, we encounter networks whose attributes
are numeric (or occasionally categorical). As we will see, there are a
number of shortcuts to make life easier in these cases.
Here, we will be especially interested in valued graphs, where edges carry some sort of meaningful value. These are typically stored as edge attributes. Note that an edge attribute is a property of an edge and not a dyad; as such, it is only defined for edges that exist in the network. Thus, in a matter of speaking, to set an edge value, one first has to create an edge and then set its attribute. (In some cases, one needs to associate a value with every dyad, whether or not an edge is present. Typically, these are encoded as network attributes.)
As with network and vertex attributes, edge attributes that have been
set can be listed with list.edge.attributes
. Every network
has at least one edge attribute: "na"
, which, if set to
TRUE
, marks an edge as missing. The ergm
package, in particular, is frequently able to account for edgewise
missingness, and draws this information from the na
attribute.
There are several ways to create valued networks for use with
ergm
. Here, we will demonstrate two of the most
straightforward approaches.
The first dataset that we’ll be using is the (in)famous Sampson’s
monks. Dataset samplk
in package ergm
contains
three (binary) networks: samplk1
, samplk2
, and
samplk3
, containing the Monks’ top-tree friendship
nominations at each of the three survey time points. We are going to
construct a valued network that pools these nominations.
Method 1: From a sociomatrix In many cases, a valued sociomatrix is available (or can easily be constructed). In this case, we’ll build one from the Sampson data.
data(samplk)
ls()
## [1] "est" "mod" "newc.fit1" "newc.fit15"
## [5] "newcomb" "samplk.ct.d2G3" "samplk.d2G3" "samplk.d2G3r"
## [9] "samplk.ecol" "samplk.nm.e" "samplk.nm.l" "samplk.tot"
## [13] "samplk.tot.el" "samplk.tot.ergm" "samplk.tot.m" "samplk.tot.nm"
## [17] "samplk.tot.nm.nz" "samplk.tot2" "samplk1" "samplk2"
## [21] "samplk3" "sim.binom3" "sim.binom3.m1" "sim.binom3.p1"
## [25] "sim.du3" "sim.geo0" "sim.geom" "sim.pois"
## [29] "sim.trgeo.m1" "sim.trgeo.p1" "true" "valmat"
## [33] "y" "Z.K.ref" "Z.ref" "zach"
## [37] "zach.d2G2S" "zach.ecol" "zach.lead" "zach.obs"
## [41] "zach.pois" "zach.sim" "zach.vcol"
as.matrix(samplk1)[1:5, 1:5]
## John Bosco Gregory Basil Peter Bonaventure
## John Bosco 0 0 1 0 1
## Gregory 1 0 0 0 0
## Basil 1 1 0 0 0
## Peter 0 0 0 0 1
## Bonaventure 0 0 0 1 0
# Create a sociomatrix totaling the nominations.
samplk.tot.m <- as.matrix(samplk1) + as.matrix(samplk2) + as.matrix(samplk3)
samplk.tot.m[1:5, 1:5]
## John Bosco Gregory Basil Peter Bonaventure
## John Bosco 0 1 2 0 2
## Gregory 3 0 0 0 0
## Basil 3 1 0 0 0
## Peter 0 0 0 0 3
## Bonaventure 1 0 0 3 0
# Create a network where the number of nominations becomes an attribute of an
# edge.
samplk.tot <- as.network(samplk.tot.m, directed = TRUE, matrix.type = "a", ignore.eval = FALSE,
names.eval = "nominations" # Important!
)
# Add vertex attributes. (Note that names were already imported!)
samplk.tot %v% "group" <- samplk1 %v% "group" # Groups identified by Sampson
samplk.tot %v% "group"
## [1] "Turks" "Turks" "Outcasts" "Loyal" "Loyal" "Loyal"
## [7] "Turks" "Waverers" "Loyal" "Waverers" "Loyal" "Turks"
## [13] "Waverers" "Turks" "Turks" "Turks" "Outcasts" "Outcasts"
# We can view the attribute as a sociomatrix.
as.matrix(samplk.tot, attrname = "nominations")[1:5, 1:5]
## John Bosco Gregory Basil Peter Bonaventure
## John Bosco 0 1 2 0 2
## Gregory 3 0 0 0 0
## Basil 3 1 0 0 0
## Peter 0 0 0 0 3
## Bonaventure 1 0 0 3 0
# Also, note that samplk.tot now has an edge if i nominated j *at least once*.
as.matrix(samplk.tot)[1:5, 1:5]
## John Bosco Gregory Basil Peter Bonaventure
## John Bosco 0 1 1 0 1
## Gregory 1 0 0 0 0
## Basil 1 1 0 0 0
## Peter 0 0 0 0 1
## Bonaventure 1 0 0 1 0
Method 2: Form an edgelist Sociomatrices are simple to work with, but not very convenient for large, sparse networks. In the latter case, edgelists are often preferred. For our present case, suppose that instead of a sociomatrix we have an edgelist with values:
samplk.tot.el <- as.matrix(samplk.tot, attrname = "nominations", matrix.type = "edgelist")
samplk.tot.el[1:5, ]
## [,1] [,2] [,3]
## [1,] 2 1 3
## [2,] 3 1 3
## [3,] 5 1 1
## [4,] 6 1 2
## [5,] 7 1 1
# and an initial empty network.
samplk.tot2 <- samplk1 # Copy samplk1
samplk.tot2[, ] <- 0 # Empty it out
samplk.tot2 #We could also have used network.initialize(18)
## Network attributes:
## vertices = 18
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 0
## missing edges= 0
## non-missing edges= 0
##
## Vertex attribute names:
## cloisterville group vertex.names
##
## No edge attributes
samplk.tot2[samplk.tot.el[, 1:2], names.eval = "nominations", add.edges = TRUE] <- samplk.tot.el[,
3]
as.matrix(samplk.tot2, attrname = "nominations")[1:5, 1:5]
## John Bosco Gregory Basil Peter Bonaventure
## John Bosco 0 1 2 0 2
## Gregory 3 0 0 0 0
## Basil 3 1 0 0 0
## Peter 0 0 0 0 3
## Bonaventure 1 0 0 3 0
In general, the construction
net[i,j, names.eval="attrname", add.edges=TRUE] <- value
can be used to modify individual edge values for attribute
"attrname"
. This way, we can also add more than one edge
attribute to a network. Note that network objects can support an almost
unlimited number of vertex, edge, or network attributes, and that these
attributes can contain any data type. (Not all data types are compatible
with all interface methods; see ?network
and related
documentation for more information.)
The other dataset we’ll be using is almost as (in)famous Zachary’s Karate Club dataset. We will be employing here a collapsed multiplex network that counts the number of social contexts in which each pair of individuals associated with the Karate Club in question interacted. A total of 8 contexts were considered, but as the contexts themselves were determined by the network process, this limit itself can be argued to be endogenous.
Over the course of the study, the club split into two factions, one
led by the instructor (“Mr. Hi”) and the other led by the Club President
(“John A.”). Zachary also recorded the faction alignment of every
regular attendee in the club. This dataset is included in the
ergm.count
package, as zach
.
The network
’s plot
method for
network
s can be used to plot a sociogram of a network. When
plotting a valued network, we it is often useful to color the ties
depending on their value. Function gray
can be used to
generate a gradient of colors, with gray(0)
generating
black and gray(1)
generating white. This can then be passed
to the edge.col
argument of plot.network
.
Sampson’s Monks For the monks, let’s pass value data using a matrix.
par(mar = rep(0, 4))
samplk.ecol <- matrix(gray(1 - (as.matrix(samplk.tot, attrname = "nominations")/3)),
nrow = network.size(samplk.tot))
plot(samplk.tot, edge.col = samplk.ecol, usecurve = TRUE, edge.curve = 1e-04, displaylabels = TRUE,
vertex.col = as.factor(samplk.tot %v% "group"))
Edge color can also be passed as a vector of colors corresponding to
edges. It’s more efficient, but the ordering in the vector must
correspond to network
object’s internal ordering, so it
should be used with care. Note that we can also vary line width and/or
transparency in the same manner:
par(mar = rep(0, 4))
valmat <- as.matrix(samplk.tot, attrname = "nominations") #Pull the edge values
samplk.ecol <- matrix(rgb(0, 0, 0, valmat/3), nrow = network.size(samplk.tot))
plot(samplk.tot, edge.col = samplk.ecol, usecurve = TRUE, edge.curve = 1e-04, displaylabels = TRUE,
vertex.col = as.factor(samplk.tot %v% "group"), edge.lwd = valmat^2)
plot.network
has may display options that can be used to
customize one’s data display; see ?plot.network
for
more.
Zachary’s Karate Club In the following plot, we plot those strongly aligned with Mr. Hi as red, those with John A. with purple, those neutral as green, and those weakly aligned with colors in between.
ergm.count
Many of the functions in package ergm
, including
ergm
, simulate
, and summary
, have
been extended to handle networks with valued relations. They switch into
this “valued” mode when passed the response
argument,
specifying the name of the edge attribute to use as the response
variable. For example, a new valued term sum
evaluates the
sum of the values of all of the relations: \(\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j}\).
So,
## Error: ERGM term 'sum' function 'InitErgmTerm.sum' not found.
produces an error (because no such term has been implemented for binary mode), while
gives the summary statistics. We will introduce more statistics shortly. First, we need to introduce the notion of valued ERGMs.
For a more in-depth discussion of the following, see (Krivitsky (2012)).
Valued ERGMs differ from standard ERGMs in two related ways. First, the support of a valued ERGM (unlike its unvalued counterpart) is over a set of valued graphs; this is a substantial difference from the unvalued case, as valued graph support sets (even for fixed \(N\)) are often infinite (or even uncountable). Secondly, in defining a valued ERGM one must specify the reference measure (or distribution) with respect to which the model is defined. (In the unvalued case, there is a generic way to do this, which we employ tacitly – that is no longer the case for general valued ERGMs.) We discuss some of these issues further below.
Notationally, a valued ERGM (for discrete variables) looks like this: \[\text{Pr}_{h,\boldsymbol{g}}(\boldsymbol{Y}=\boldsymbol{y};\boldsymbol{\theta})=\frac{h(\boldsymbol{y})\exp\mathchoice{\left({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})}\right)}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}}{\kappa_{h,\boldsymbol{g}}(\boldsymbol{\theta})},\ {\boldsymbol{y}\in\mathcal{Y}},\] where \(\mathcal{Y}\) is the support. The normalizing constant is defined by \[\kappa_{h,\boldsymbol{g}}(\boldsymbol{\theta})=\sum_{\boldsymbol{y}\in\mathcal{Y}}h(\boldsymbol{y})\exp\mathchoice{\left({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})}\right)}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}{({\boldsymbol{\theta}{}}^\top{\boldsymbol{g}(\boldsymbol{y})})}.\] The similarity with ERGMs in the unvalued case is evident, notwithstanding the above caveats.
New concept: a reference distribution With binary
ERGMs, we only concern ourselves with presence and absence of ties among
actors — who is connected with whom? If we want to model values as well,
we need to think about who is connected with whom and how
strong or intense these connections are. In particular, we need to think
about how the values for connections we measure are distributed. The
reference distribution (a reference measure, for the
mathematically inclined) specifies the model for the data
before we add any ERGM terms, and is the first step in modeling
these values. The reference distribution is specified using a one-sided
formula as a reference
argument to an ergm
or
simulate
call. Running
will list the choices implemented in the various packages, and are given as a one-sided formula.
Conceptually, it has two ingredients: the sample space and the baseline distribution (\(h(\boldsymbol{y})\)). An ERGM that “borrows” these from a distribution \(X\) for which we have a name is called an \(X\)-reference ERGM.
The sample space For binary ERGMs, the sample space (or support) \(\mathcal{Y}\) — the set of possible networks that can occur — is usually some subset of \(2^\mathbb{Y}\), the set of all possible ways in which relationships among the actors may occur.
For the sample space of valued ERGMs, we need to define \(\mathbb{S}\), the set of possible values each relationship may take. For example, for count data, that’s \(\mathbb{S}=\{0,1,\dotsc, s \}\) if the maximum count is \( s \) and \(\{0,1,\dotsc\}\) if there is no a priori upper bound. Having specified that, \(\mathcal{Y}\) is defined as some subset of \(\mathbb{S}^\mathbb{Y}\): the set of possible ways to assign to each relationship a value.
As with binary ERGMs, other constraints like degree distribution may be imposed on \(\mathcal{Y}\).
\(h(\boldsymbol{y})\): The baseline distribution What difference does it make?
Suppose that we have a sample space with \(\mathbb{S}=\{0,1,2,3\}\) (e.g., number of monk–monk nominations) and let’s have one ERGM term: the sum of values of all relations: \(\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j}\): \[\text{Pr}_{h,\boldsymbol{g}}(\boldsymbol{Y}=\boldsymbol{y};\boldsymbol{\theta})\propto h(\boldsymbol{y})\exp\mathchoice{\left(\boldsymbol{\theta}{} \sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j}\right)}{(\boldsymbol{\theta}{} \sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}{(\boldsymbol{\theta}{} \sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}{(\boldsymbol{\theta}{} \sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}.\] There are two values for \(h(\boldsymbol{y})\) that might be familiar:
What do they look like? Let’s simulate!
y <- network.initialize(2, directed = FALSE) # A network with one dyad!
## Discrete Uniform reference 0 coefficient: discrete uniform
sim.du3 <- simulate(y ~ sum, coef = 0, reference = ~DiscUnif(0, 3), response = "w",
output = "stats", nsim = 1000)
# Negative coefficient: truncated geometric skewed to the right
sim.trgeo.m1 <- simulate(y ~ sum, coef = -1, reference = ~DiscUnif(0, 3), response = "w",
output = "stats", nsim = 1000)
# Positive coefficient: truncated geometric skewed to the left
sim.trgeo.p1 <- simulate(y ~ sum, coef = +1, reference = ~DiscUnif(0, 3), response = "w",
output = "stats", nsim = 1000)
# Plot them:
par(mfrow = c(1, 3))
hist(sim.du3, breaks = diff(range(sim.du3)) * 4)
hist(sim.trgeo.m1, breaks = diff(range(sim.trgeo.m1)) * 4)
hist(sim.trgeo.p1, breaks = diff(range(sim.trgeo.p1)) * 4)
## Binomial reference 0 coefficient: Binomial(3,1/2)
sim.binom3 <- simulate(y ~ sum, coef = 0, reference = ~Binomial(3), response = "w",
output = "stats", nsim = 1000)
# -1 coefficient: Binomial(3, exp(-1)/(1+exp(-1)))
sim.binom3.m1 <- simulate(y ~ sum, coef = -1, reference = ~Binomial(3), response = "w",
output = "stats", nsim = 1000)
# +1 coefficient: Binomial(3, exp(1)/(1+exp(1)))
sim.binom3.p1 <- simulate(y ~ sum, coef = +1, reference = ~Binomial(3), response = "w",
output = "stats", nsim = 1000)
# Plot them:
par(mfrow = c(1, 3))
hist(sim.binom3, breaks = diff(range(sim.binom3)) * 4)
hist(sim.binom3.m1, breaks = diff(range(sim.binom3.m1)) * 4)
hist(sim.binom3.p1, breaks = diff(range(sim.binom3.p1)) * 4)
Now, suppose that we don’t have an a priori upper bound on the counts — \(\mathbb{S}=\{0,1,\dotsc\}\) — then there are two familiar reference distributions:
sim.geom <- simulate(y ~ sum, coef = log(2/3), reference = ~Geometric, response = "w",
output = "stats", nsim = 1000)
mean(sim.geom)
## [1] 1.978
sim.pois <- simulate(y ~ sum, coef = log(2), reference = ~Poisson, response = "w",
output = "stats", nsim = 1000)
mean(sim.pois)
## [1] 1.979
Similar means. But, what do they look like?
par(mfrow = c(1, 2))
hist(sim.geom, breaks = diff(range(sim.geom)) * 4)
hist(sim.pois, breaks = diff(range(sim.pois)) * 4)
Where did log(2)
and log(2/3)
come from?
Later.
Warning: Parameter space constrints What happens if we simulate from a geometric-reference ERGM with all coefficients set to 0?
par(mfrow = c(1, 1))
sim.geo0 <- simulate(y ~ sum, coef = 0, reference = ~Geometric, response = "w", output = "stats",
nsim = 100, control = control.simulate(MCMC.burnin = 0, MCMC.interval = 1))
mean(sim.geo0)
## [1] 7558.59
Why does it do that? Because \[
\text{Pr}_{h,\boldsymbol{g}}(\boldsymbol{Y}=\boldsymbol{y};\boldsymbol{\theta})=\frac{\exp\mathchoice{\left(\boldsymbol{\theta}\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j}\right)}{(\boldsymbol{\theta}\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}{(\boldsymbol{\theta}\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}{(\boldsymbol{\theta}\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j})}}{\kappa_{h,\boldsymbol{g}}(\boldsymbol{\theta})}
\] for \(\boldsymbol{\theta}\ge
0\), is not a valid distribution, because \(\kappa_{h,\boldsymbol{g}}(\boldsymbol{\theta})=\infty\).
Using reference=~Geometric
can be dangerous for this
reason. This issue only arises with ERGMs that have an infinite sample
space.
Valued ERGM terms require a separate implementation in the
ergm
framework, and not all binary network features
generalize in a unique and obvious way. Therefore, the collection of
binary terms is separate from that of valued terms. For example, here’s
what happens when we try to use triangles
in a valued
ERGM:
## Error: ERGM term 'triangles' function 'InitWtErgmTerm.triangles' not found.
You can obtain the list of valued terms visible to ergm
with
or with ?ergmTerm
and looking for those tagged
“(valued)
” or “(val)
”.
However, there are some important special cases for which the terms do translate directly.
Many of the familiar ERGM effects can be modeled using the very same terms in the valued case, but applied a little differently.
Any dyad-independent binary ERGM statistic can be expressed as \(\boldsymbol{g}_{\text{k}}=\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{x}_{k,i,j}\boldsymbol{y}_{i,j}\) for some covariate matrix \(\boldsymbol{x}_k\). If \(\boldsymbol{y}_{i,j}\) is allowed to have values other than \(0\) and \(1\), then simply using such a term in a Poisson-reference ERGM creates the familiar log-linear effect. Similarly, in a Binomial-reference ERGM, such terms produce an effect on log-odds of a success.
The good news is that almost every dyad-independent ergm
term has been reimplemented to allow this. It is invoked by specifying
“form="sum"
” argument for one of the terms inherited from
binary ERGMs, though this not required, as it’s the default. Also, note
that for valued ERGMs, the “intercept” term is sum
, not
edges
.
has the complete list across all the loaded packages.
Example: Sampson’s Monks For example, we can fit the equivalent of logistic regression on the probability of nomination, with every ordered pair of monks observed 3 times. We will look at differential homophily on group. That is, \(\boldsymbol{Y}\!_{i,j}{\stackrel{\mathrm{ind.}}{\sim}}\, \text{Binomial}(3,\boldsymbol{\pi}_{i,j})\) where \[ \begin{align*} \text{logit}(\boldsymbol{\pi}_{i,j}) & = \boldsymbol{\beta}_1 + \boldsymbol{\beta}_2 \mathbb{I}\left(\text{$i$ and $j$ are both in the Loyal Opposition}\right) \\ & + \boldsymbol{\beta}_3 \mathbb{I}\left(\text{$i$ and $j$ are both Outcasts}\right) + \boldsymbol{\beta}_4 \mathbb{I}\left(\text{$i$ and $j$ are both Young Turks}\right) \\ & + \boldsymbol{\beta}_5 \mathbb{I}\left(\text{$i$ and $j$ are both Waverers}\right) \end{align*} \]
samplk.tot.nm <- ergm(samplk.tot ~ sum + nodematch("group", diff = TRUE, form = "sum"),
response = "nominations", reference = ~Binomial(3))
mcmc.diagnostics(samplk.tot.nm)
Note that it looks like it’s fitting the model twice. This is because the first run is using an approximation technique called constrastive divergence to find a good starting value for the MLE fit.
summary(samplk.tot.nm)
## Call:
## ergm(formula = samplk.tot ~ sum + nodematch("group", diff = TRUE,
## form = "sum"), response = "nominations", reference = ~Binomial(3))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## sum -2.3209 0.1353 0 -17.159 <1e-04 ***
## nodematch.sum.group.Loyal 2.3106 0.2897 0 7.975 <1e-04 ***
## nodematch.sum.group.Outcasts 3.5385 0.6053 0 5.846 <1e-04 ***
## nodematch.sum.group.Turks 2.1960 0.2206 0 9.955 <1e-04 ***
## nodematch.sum.group.Waverers 1.0911 0.5745 0 1.899 0.0576 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 0.0 on 306 degrees of freedom
## Residual Deviance: -568.4 on 301 degrees of freedom
##
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
##
## AIC: -558.4 BIC: -539.7 (Smaller is better. MC Std. Err. = 5.183)
Based on this, we can say that the odds of a monk nominating another monk not in the same group during a given time step are \(\exp\mathchoice{\left(\boldsymbol{\beta}_1\right)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}=\exp\mathchoice{\left(-2.3209\right)}{(-2.3209)}{(-2.3209)}{(-2.3209)}=0.0982\), that the odds of a Loyal Opposition monk nominating another Loyal Opposition monk are \(\exp\mathchoice{\left(\boldsymbol{\beta}_2\right)}{(\boldsymbol{\beta}_2)}{(\boldsymbol{\beta}_2)}{(\boldsymbol{\beta}_2)}=\exp\mathchoice{\left(2.3106\right)}{(2.3106)}{(2.3106)}{(2.3106)}=10.0802\) times higher, etc..
Example: Zachary’s Karate Club We will use a Poisson log-linear model for the number of contexts in which each pair of individuals interacted, as a function of whether this individual is a faction leader (Mr. Hi or John A.) That is, \(\boldsymbol{Y}\!_{i,j}{\stackrel{\mathrm{ind.}}{\sim}}\text{Poisson}(\boldsymbol{\mu}_{i,j})\) where \[ \log(\boldsymbol{\mu}_{i,j})=\boldsymbol{\beta}_1 + \boldsymbol{\beta}_2 (\mathbb{I}\left(\text{$i$ is a faction leader}\right) + \mathbb{I}\left(\text{$j$ is a faction leader}\right)) \]
We will do this by constructing a dummy variable, a vertex attribute
"leader"
:
# Vertex attr. 'leader' is TRUE for Hi and John, FALSE for others.
zach %v% "leader" <- zach %v% "role" != "Member"
zach.lead <- ergm(zach ~ sum + nodefactor("leader"), response = "contexts", reference = ~Poisson)
mcmc.diagnostics(zach.lead)
NB: We could also write
“nodefactor(~role!="Member")
” to get the same result.
summary(zach.lead)
## Call:
## ergm(formula = zach ~ sum + nodefactor("leader"), response = "contexts",
## reference = ~Poisson)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## sum -1.22668 0.08827 0 -13.90 <1e-04 ***
## nodefactor.sum.leader.TRUE 1.45294 0.12549 0 11.58 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 0.0 on 561 degrees of freedom
## Residual Deviance: -351.9 on 559 degrees of freedom
##
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
##
## AIC: -347.9 BIC: -339.2 (Smaller is better. MC Std. Err. = 2.579)
Based on this, we can say that the expected number of contexts of interaction between two non-leaders is \(\exp\mathchoice{\left(\boldsymbol{\beta}_1\right)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}=\exp\mathchoice{\left(-1.2267\right)}{(-1.2267)}{(-1.2267)}{(-1.2267)}=0.2933\), that the expected number of contexts of interaction between a leader and a non-leader is \(\exp\mathchoice{\left(\boldsymbol{\beta}_2\right)}{(\boldsymbol{\beta}_2)}{(\boldsymbol{\beta}_2)}{(\boldsymbol{\beta}_2)}=\exp\mathchoice{\left(1.4529\right)}{(1.4529)}{(1.4529)}{(1.4529)}=4.2757\) times higher, and that the expected number of contexts of interaction between the two leaders is \(\exp\mathchoice{\left(2\boldsymbol{\beta}_2\right)}{(2\boldsymbol{\beta}_2)}{(2\boldsymbol{\beta}_2)}{(2\boldsymbol{\beta}_2)}=\exp\mathchoice{\left(2\cdot 1.4529\right)}{(2\cdot 1.4529)}{(2\cdot 1.4529)}{(2\cdot 1.4529)}=18.2814\) times higher than that between two non-leaders. (Because the leaders were hostile to each other, this may not be a very good prediction!)
It is often the case that in networks of counts, the network is sparse, yet if two actors do interact, their interaction count is relatively high. This amounts to zero-inflation.
We can model this using the binary-ERGM-based terms with the term
nonzero
(\(\boldsymbol{g}_{\text{k}}=\sum_{{{(i,j)}\in\mathbb{Y}}}\mathbb{I}\left(\boldsymbol{y}_{i,j}\ne
0\right)\)) and GLM-style terms with argument
form="nonzero"
: \(\boldsymbol{g}_{\text{k}}=\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{x}_{k,i,j}\mathbb{I}\left(\boldsymbol{y}_{i,j}\ne
0\right)\). For example,
samplk.tot.nm.nz <- ergm(samplk.tot ~ sum + nonzero + nodematch("group", diff = TRUE,
form = "sum"), response = "nominations", reference = ~Binomial(3))
mcmc.diagnostics(samplk.tot.nm.nz)
summary(samplk.tot.nm.nz)
## Call:
## ergm(formula = samplk.tot ~ sum + nonzero + nodematch("group",
## diff = TRUE, form = "sum"), response = "nominations", reference = ~Binomial(3))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## sum -0.3333 0.2025 0 -1.646 0.0998 .
## nonzero -2.9829 0.3271 0 -9.120 <1e-04 ***
## nodematch.sum.group.Loyal 1.2210 0.2184 0 5.591 <1e-04 ***
## nodematch.sum.group.Outcasts 2.0216 0.5019 0 4.028 <1e-04 ***
## nodematch.sum.group.Turks 1.1920 0.1717 0 6.944 <1e-04 ***
## nodematch.sum.group.Waverers 0.5844 0.4369 0 1.338 0.1810
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 0.0 on 306 degrees of freedom
## Residual Deviance: -647.5 on 300 degrees of freedom
##
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
##
## AIC: -635.5 BIC: -613.2 (Smaller is better. MC Std. Err. = 2.867)
fits a zero-modified Binomial model, with a coefficient on the number of non-zero relations \(-2.9829\) is negative and highly significant, indicating that there is an excess of zeros in the data relative to the binomial distribution, and given the rest of the model.
The following terms compute the number of dyads \({(i,j)}\) whose values \(\boldsymbol{y}_{i,j}\) fulfil their
respective conditions: atleast(threshold=0)
,
atmost(threshold=0)
,
equalto(value=0, tolerance=0)
,
greaterthan(threshold=0)
,
ininterval(lower=-Inf, upper=+Inf, open=c(TRUE,TRUE))
, and
smallerthan(threshold=0)
.
An operator (ergm
that takes other terms as
arguments) B(formula, form)
“wraps” binary terms for valued
ERGMs. This term is used in a valued ERGM, but formula
is a
binary ERGM formula. How formula
is treated
depends on form
, which can be one of the following:
"sum"
: emulates the GLM-style behavior described above,
for any dyad-independent terms in formula
."nonzero"
: emulates the zero-modification behavior
described above; formula
does not have to be
dyad-independent.ergm
term
(just one), with the following properties:
formula
on it. formula
does not have to be
dyad-independent.In general, when a valued term is available, it will typically be
faster than a binary term wrapped by B
. Similarly, while
B(~..., "nonzero")
is the same as
B(~..., ~nonzero)
, the former will typically be faster.
For example,
summary(samplk.tot~sum + nodematch("group",form="sum")
+ nonzero + nodematch("group",form="nonzero")
+ nonzero + nodematch("group",form="nonzero"),
response="nominations")
## sum nodematch.sum.group nonzero
## 168 106 88
## nodematch.nonzero.group nonzero nodematch.nonzero.group
## 51 88 51
summary(samplk.tot~B(~edges + nodematch("group"), form="sum")
+ B(~edges + nodematch("group"), form="nonzero")
+ B(~edges + nodematch("group"), form=~nonzero),
response="nominations")
## B(sum)~edges B(sum)~nodematch.group
## 168 106
## B(nonzero)~edges B(nonzero)~nodematch.group
## 88 51
## B(nonzero)~edges B(nonzero)~nodematch.group
## 88 51
ERGMs for polytomous data, whether nominal or ordinal is one of the oldest examples of valued ERGMs (Robins, Pattison, and Wasserman 1999).
For simplicity will use the Sampson’s Monks example, but this
approach can be viewed as a generalization of multinomial logistic
regression. We use DiscUnif
, or discrete uniform, included
in ergm
itself.
The example assumes assumes that the edge values are independent of
one another and take ordinal values that have the same interpretation
for each dyad. (This is not, in general, true: see
ergm.rank
section.)
We will use the B
operator to construct new statistics
consisting of the number of edges with value \(k\) or higher, where \(k\) is 1, 2, or 3.
summary(samplk.tot ~ B( ~ edges, ~ atleast(1) ) + B( ~ edges, ~ atleast(2) )
+ B( ~ edges, ~ atleast(3) ), response = "nominations")
## B(atleast(1))~edges B(atleast(2))~edges B(atleast(3))~edges
## 88 50 30
Since there are \(18\times17\), or
306, possible edges, the summary statistics above tell us that the
valued network we have constructed has 30 edges with value 3, 20 with
value 2, 38 with value 1, and the remaining 218 with value 0. The ERGM
with these statistics has independent edges, where the probabilities an
edge takes the values 0, 1, 2, or 3 are given by \(1/D\), \(\exp(\theta_{1})/D\), \(\exp(\theta_{1}+\theta_{2})/D\), and \(\exp(\theta_{1}+\theta_{2}+\theta_{3})/D\),
respectively, where \[
D = 1 + \exp(\theta_{1}) + \exp(\theta_{1}+\theta_{2}) +
\exp(\theta_{1}+\theta_{2}+\theta_{3}).
\] We may verify that ergm
’s stochastic fitting
algorithm obtains MLEs very close to the exact values:
mod <- ergm(samplk.tot ~ B(~edges, ~atleast(1)) + B(~edges, ~atleast(2)) + B(~edges,
~atleast(3)), response = "nominations", reference = ~DiscUnif(0, 3), control = snctrl(seed = 123))
coef(mod)
## B(atleast(1))~edges B(atleast(2))~edges B(atleast(3))~edges
## -1.7390838 -0.6537426 0.3943245
true <- c(EdgeVal0 = 218, EdgeVal1 = 38, EdgeVal2 = 20, EdgeVal3 = 30)
est <- c(1, exp(cumsum(coef(mod))), use.names = FALSE)
rbind(True_Proportions = true/sum(true), Estimated_Proportions = est/sum(est))
## EdgeVal0 EdgeVal1 EdgeVal2 EdgeVal3
## True_Proportions 0.7124183 0.1241830 0.06535948 0.09803922
## Estimated_Proportions 0.7129665 0.1252549 0.06514451 0.09663417
In categorical data analysis, this is referred to as an adjacent-category logit model.
This example could have used the equalto
terms in place
of all the atleast
terms above. Then, the estimated
proportions would have been proportional to 1, \(\exp(\theta_{1})\), \(\exp(\theta_{2})\), and \(\exp(\theta_{3})\) instead of 1, \(\exp(\theta_{1})\), \(\exp(\theta_{1}+\theta_{2})\), and \(\exp(\theta_{1}+\theta_{2}+\theta_{3})\).
Such a model does not assume ordinality of the edge values, so it could
be used for a multinomial logit model in which the edges take
categorical non-ordered values. This would be a baseline-category
logit model.
Similarly, even if we may use Poisson as a starting distribution, the
counts might be overdispersed or underdispersed relative to it. Bear in
mind that overdispersed counts can sometimes be the result of unmodeled
heterogeneity - so, in some cases, it may be more effective to ask
whether there are omitted drivers of tie value differences that should
be added to the model, than to seek a uniform solution. However,
ergm
currently offers two ways to adjust the baseline tie
value distribution:
Conway–Maxwell–Poisson (Shmueli et al. 2005)
CMP
term to a Poisson- or
geometric-reference ERGM.Conway–Maxwell-binomial (Kadane 2016)
CMB(trials, coupled)
term to a
Binomial- or discrete-uniform-reference ERGM.coupled=TRUE
implying \(\boldsymbol{\theta}[\text{CMP}1]=\boldsymbol{\theta}[\text{CMP}2]\).Fractional moments (Krivitsky 2012)
sum(pow=1/2)
term to a
Poisson-reference ERGM.sum
the first two moments of \(\sqrt{\boldsymbol{y}_{i,j}}\).Mutuality ergm
binary
mutuality
statistic has the form \(\boldsymbol{g}_\leftrightarrow=\sum_{{{(i,j)}\in\mathbb{Y}}}\boldsymbol{y}_{i,j}\boldsymbol{y}_{j,i}\).
It turns out that directly plugging counts into that statistic is a bad
idea. mutuality(form)
is a valued ERGM term, permitting the
following generalizations:
"geometric"
: \(\sum_{{{(i,j)}\in\mathbb{Y}}}\sqrt{\boldsymbol{y}_{i,j}\boldsymbol{y}_{j,i}}\)
— can be viewed as uncentered covariance of variance-stabilized
counts"min"
: \(\sum_{{{(i,j)}\in\mathbb{Y}}}\min{\boldsymbol{y}_{i,j},\boldsymbol{y}_{j,i}}\)
— easiest to interpret"nabsdiff"
: \(\sum_{{{(i,j)}\in\mathbb{Y}}}-\lvert\boldsymbol{y}_{i,j}-\boldsymbol{y}_{j,i}\rvert\)The figure below visualizes their effects.
Individual heterogeneity Different actors may have different overall propensities to interact. This has been modeled using random effects (as in the \(p_2\) model) and using degeneracy-prone terms like \(k\)-star counts.
ergm
implements a number of statistics to model this
effect, but the one that seems to work best so far for valued data seems
to be \[\boldsymbol{g}_{\text{actor
cov.}}(\boldsymbol{y})=\sum_{i\in N}\frac{1}{n-2}\sum_{j,k\in
\mathbb{Y}_{i}\land
j<k}(\sqrt{\boldsymbol{y}_{i,j}}-\overline{\sqrt{\boldsymbol{y}}})(\sqrt{\boldsymbol{y}_{i,k}}-\overline{\sqrt{\boldsymbol{y}}}),\]
essentially a measure of covariance between the \(\sqrt{\boldsymbol{y}_{i,j}}\)s incident on
the same actor. The term nodecovar
implements it.
Triadic closure Finally, to generalize the notion of
triadic closure, ergm
implements very flexible
transitiveweights(twopath, combine, affect)
and similar
cyclicalweights
statistics. The transitive weight statistic
has the following general form: \[\boldsymbol{g}_{\text{$\boldsymbol{v}$}}(\boldsymbol{y})=\sum_{{{(i,j)}\in\mathbb{Y}}}v_{\text{affect}}\left(\boldsymbol{y}_{i,j},v_{\text{combine}}\left(v_{\text{2-path}}(\boldsymbol{y}_{i,k},\boldsymbol{y}_{k,j})_{k\in
N\backslash\{i,j\}}\right)\right),\] and can be “customized” by
varying the three functions
\(v_{\text{2-path}}\) Given \(\boldsymbol{y}_{i,k}\) and \(\boldsymbol{y}_{k,j}\), what is the strength of the two-path they form?
"min"
the minimum of their values — conservative"geomean"
their geometric mean — more able to detect
effects, but more likely to cause “degeneracy”\(v_{\text{combine}}\) Given the strengths of the two-paths \(\boldsymbol{y}_{i\to k\to j}\) for all \(k\ne i,j\), what is the combined strength of these two-paths between \(i\) and \(j\)?
"max"
the strength of the strongest path —
conservative; analogous to transitiveties
"sum"
the sum of path strength — more able to detect
effects, but more likely to cause “degeneracy;” analogous to
triangles
\(v_{\text{affect}}\) Given the combined strength of the two-paths between \(i\) and \(j\), how should they affect \(\boldsymbol{Y}\!_{i,j}\)?
"min"
conservative"geomean"
more able to detect effects, but more likely
to cause “degeneracy”These effects are analogous to mutuality.
Sampson’s Monks Suppose that we want to fit a model with a zero-modified Binomial baseline, mutuality, transitive (hierarchical) triads, and cyclical (antihierarchical) triads, to this dataset:
samplk.tot.ergm <- ergm(samplk.tot ~ sum + nonzero + mutual("min") + transitiveweights("min",
"max", "min") + cyclicalweights("min", "max", "min"), reference = ~Binomial(3),
response = "nominations")
mcmc.diagnostics(samplk.tot.ergm)
summary(samplk.tot.ergm)
## Call:
## ergm(formula = samplk.tot ~ sum + nonzero + mutual("min") + transitiveweights("min",
## "max", "min") + cyclicalweights("min", "max", "min"), response = "nominations",
## reference = ~Binomial(3))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## sum -0.04016 0.18859 0 -0.213 0.8314
## nonzero -3.47663 0.31766 0 -10.945 <1e-04 ***
## mutual.min 1.37951 0.23805 0 5.795 <1e-04 ***
## transitiveweights.min.max.min 0.21747 0.16153 0 1.346 0.1782
## cyclicalweights.min.max.min -0.25038 0.13642 0 -1.835 0.0664 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 0.0 on 306 degrees of freedom
## Residual Deviance: -609.3 on 301 degrees of freedom
##
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
##
## AIC: -599.3 BIC: -580.7 (Smaller is better. MC Std. Err. = 2.44)
What does it tell us? The negative coefficient on
nonzero
suggests zero-inflation, there is strong evidence
of mutuality, and the positive coefficient on transitive weights and
negative on the cyclical weights suggests hierarchy, but they are not
significant.
Zachary’s Karate Club Now, let’s try using Poisson to model the Zachary Karate Club data: a zero-modified Poisson, with potentially different levels of activity for the faction leaders, heterogeneity in actor activity level overall, and an effect of difference in faction membership, a model that looks like this:
summary(zach ~ sum + nonzero + nodefactor("leader") + absdiffcat("faction.id") +
nodecovar(center = TRUE, transform = "sqrt"), response = "contexts")
## sum nonzero
## 231.00000 78.00000
## nodefactor.sum.leader.TRUE absdiff.sum.faction.id.1
## 90.00000 74.00000
## absdiff.sum.faction.id.2 absdiff.sum.faction.id.3
## 12.00000 11.00000
## absdiff.sum.faction.id.4 nodecovar
## 10.00000 16.17248
Now, for the fit and the diagnostics:
zach.pois <- ergm(zach ~ sum + nonzero + nodefactor("leader") + absdiffcat("faction.id") +
nodecovar(center = TRUE, transform = "sqrt"), response = "contexts", reference = ~Poisson,
verbose = TRUE)
mcmc.diagnostics(zach.pois)
summary(zach.pois)
## Call:
## ergm(formula = zach ~ sum + nonzero + nodefactor("leader") +
## absdiffcat("faction.id") + nodecovar(center = TRUE, transform = "sqrt"),
## response = "contexts", reference = ~Poisson, verbose = TRUE)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## sum 0.99180 0.08510 0 11.655 < 1e-04 ***
## nonzero -3.85599 0.25607 0 -15.058 < 1e-04 ***
## nodefactor.sum.leader.TRUE 0.20669 0.06440 0 3.210 0.001329 **
## absdiff.sum.faction.id.1 -0.09957 0.07976 0 -1.248 0.211903
## absdiff.sum.faction.id.2 -0.66393 0.19077 0 -3.480 0.000501 ***
## absdiff.sum.faction.id.3 -0.87951 0.22037 0 -3.991 < 1e-04 ***
## absdiff.sum.faction.id.4 -1.15342 0.23514 0 -4.905 < 1e-04 ***
## nodecovar 1.80162 0.21820 0 8.257 < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 0.0 on 561 degrees of freedom
## Residual Deviance: -837.5 on 553 degrees of freedom
##
## Note that the null model likelihood and deviance are defined to be 0.
## This means that all likelihood-based inference (LRT, Analysis of
## Deviance, AIC, BIC, etc.) is only valid between models with the same
## reference distribution and constraints.
##
## AIC: -821.5 BIC: -786.9 (Smaller is better. MC Std. Err. = 0.8369)
What does it tell us? The negative coefficient on
nonzero
suggests zero-inflation, the faction leaders
clearly have more activity than others, and the more ideologically
separated two individuals are, the less they interact. Over and above
that, there is some additional heterogeneity in how active individuals
are: if \(i\) has a lot of interaction
with \(j\), it is likely that \(i\) has more with \(j'\). Could this mean a form of
preferential attachment?
We can try seeing whether there is some friend of a friend effect above and beyond that. This can be done by fitting a model with transitivity and seeing whether the coefficient is significant, or we can perform a simulation test. In the following
simulate
unpacks the zach.pois
ERGM fit,
extracting the formula, the coefficient, and the rest of the
information.nsim
says how many networks to generate.output="stats"
says that we only want to see the
simulated statistics, not the networks.monitor=~transitiveweights("geomean","sum","geomean")
says that in addition to the statistics used in the fit, we want
simulate
to keep track of the transitive weights
statistic.We do not need to worry about degeneracy in this case, because we are not actually using that statistic in the model, only “monitoring” it.
# Simulate from model fit:
zach.sim <- simulate(zach.pois, monitor = ~transitiveweights("geomean", "sum", "geomean"),
nsim = 1000, output = "stats")
# What have we simulated?
colnames(zach.sim)
## [1] "sum"
## [2] "nonzero"
## [3] "nodefactor.sum.leader.TRUE"
## [4] "absdiff.sum.faction.id.1"
## [5] "absdiff.sum.faction.id.2"
## [6] "absdiff.sum.faction.id.3"
## [7] "absdiff.sum.faction.id.4"
## [8] "nodecovar"
## [9] "transitiveweights.geomean.sum.geomean"
# How high is the transitiveweights statistic in the observed network?
zach.obs <- summary(zach ~ transitiveweights("geomean", "sum", "geomean"), response = "contexts")
zach.obs
## transitiveweights.geomean.sum.geomean
## 288.9793
Let’s plot the density of the simulated values of transitive weights statistic:
par(mar = c(5, 4, 4, 2) + 0.1)
# 9th col. = transitiveweights
plot(density(zach.sim[, 9]))
abline(v = zach.obs)
# Where does the observed value lie in the simulated? This is a p-value for
# the Monte-Carlo test:
min(mean(zach.sim[, 9] > zach.obs), mean(zach.sim[, 9] < zach.obs)) * 2
## [1] 0.67
Looks like individual heterogeneity and faction alignment account for appearance of triadic effects. (Notably, the factions themselves may be endogenous, if social influence is a factor. Untangling selection from influence is hard enough when dynamic network data are available. We cannot do it here.)
ergm.rank
Above, we gave an example of polytomous data, encompassing the case of ordinal data where levels can be compared across dyads. Such data could arise if, for instance, we elicited an individual’s mental model for relationships among members of a group (a cognitive social structure), and asked them to rate perceived relationship strengths on a 5-point scale. While we would have no way to know what a rating of, say “2” corresponds to, we would know that “3” is higher than “2,” and that this is true for any pair of individuals. Now, however, imagine that we were to ask each member of the group to use such a scale to report on their relationships with others in the group. We cannot compare Sally’s rating of “2” for a relationship with Bob’s rating of “2” (perhaps a “2” constitutes a very strong tie from Sally’s point of view, while Bob thinks of “2” as very weak), and thus we cannot legitimately use the above approach for data analysis. We instead need a more thorougly general scheme for handling data that is ordinal within sources (and that cannot be compared across sources).
The ergm.rank
package provides tools for such analyses -
in particular, ergm.rank
focuses on the important case in
which each member of the network reports their perceived tie strength
with other members of the network, and ratings can only be compared
ordinally within ego. How this is done is described below. To
load the package, we invoke it in the usual fashion:
Note that the implementations so far are very slow, so we will only do a short example.
Suppose that we reprsent ranking (or ordinal rating) of \(j\) by \(i\) by the value of \(\boldsymbol{y}_{i,j}\). What reference can we use for ranks?
Let’s search:
To get help for this reference, use
More generally, we can see what reference distributions are visible
to ergm
:
For example, if ties are allowed, perhaps
DiscUnif(0,n-2)
may be more appropriate.
For details, see Krivitsky and Butts (2017). Note that in this setting, it is not meaningful to
The only thing we are allowed to do is to ask if \(i\) has ranked \(j\) over \(k\). This ensures that our results are invariant to independent order-preserving transformations of egos’ reports (which is what is needed to satisfy the constraints of our data).
Normal ERGM statistics do not in general satisfy these properties. Thus, ordinal relational data call for their own sufficient statistics. These will depend only on the ternary operator \[ \begin{equation*} \boldsymbol{y}_{i:\,j\succ k}\equiv\begin{cases} 1 & \text{if $j\stackrel{i}{\succ}k$ i.e., $i$ ranks $j$ above $k$;} \\ 0 & \text{otherwise.} \end{cases} \end{equation*} \] To get a live list of terms with this property, try:
We may interpret them using the promotion statistic \[\boldsymbol{\Delta}_{i,j}^\nearrow\boldsymbol{g}(\boldsymbol{y})\equiv \boldsymbol{g}(\boldsymbol{y}^{i:\,j\rightleftarrows j^+})-\boldsymbol{g}(\boldsymbol{y}).\]
Let \({N}^{k\ne}\) be the set of possible \(k\)-tuples of actor indices where no actors are repeated. Then,
rank.deference
: Deference
(aversion): Measures the amount of “deference” in the
network: configurations where an ego \(i\) ranks an alter \(j\) over another alter \(k\), but \(j\), in turn, ranks \(k\) over \(i\): \[
\boldsymbol{g}_{\text{D}}(\boldsymbol{y}) = \sum_{(i,j,l)\in {N}^{3\ne}}
\boldsymbol{y}_{l:\,j\succ i}\boldsymbol{y}_{i:\,l\succ j} \]
\[
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{D}}(\boldsymbol{y}) = 2(
\boldsymbol{y}_{j^+:\,i\succ j}+\boldsymbol{y}_{j:\,j^+\succ i} - 1 ).
\] A lower-than-chance value of this statistic and/or a negative
coefficient implies a form of mutuality in the network.
rank.edgecov(x, attrname)
: Dyadic
covariates: Models the effect of a dyadic covariate on the
propensity of an ego \(i\) to rank
alter \(j\) highly: \[
\boldsymbol{g}_{\text{A}}(\boldsymbol{y};\boldsymbol{x}) =
\sum_{(i,j,k)\in {N}^{3\ne}} \boldsymbol{y}_{i:\,j\succ
k}(\boldsymbol{x}_j-\boldsymbol{x}_k).\] \[
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{A}}(\boldsymbol{y};\boldsymbol{x})=
2(\boldsymbol{x}_{j}-\boldsymbol{x}_{j^+}),\] See the
?rank.edgecov
ERGM term documentation for
arguments.
rank.inconsistency(x, attrname, weights, wtname, wtcenter)
:
(Weighted) Inconsistency: Measures the amount of
disagreement between rankings of the focus network and a fixed covariate
network x
, by counting the number of pairwise comparisons
for which the two networks disagree. x
can be a
network
with an edge attribute attrname
containing the ranks or a matrix of appropriate dimension containing the
ranks. If x
is not given, it defaults to the LHS network,
and if attrname
is not given, it defaults to the
response
edge attribute. \[\boldsymbol{g}_{\text{I}}(\boldsymbol{y};\boldsymbol{y}')
= \sum_{(i,j,k)\in{N}^{3\ne}_s} \left[ \boldsymbol{y}_{i:\,j\succ k}(
1-\boldsymbol{y}'_{i:\,j\succ k} ) +
\left(1-\boldsymbol{y}_{i:\,j\succ k}\right)
\boldsymbol{y}'_{i:\,j\succ k} \right],\] with promotion
statistic being simply \[
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{I}}(\boldsymbol{y};\boldsymbol{y}')
= 2(\boldsymbol{y}'_{i:\,j^+\succ j}-\boldsymbol{y}'_{i:\,j\succ
j^+}).\] Optionally, the count can be weighted by the
weights
argument, which can be either a 3D \(n\times n\times n\)-array whose \((i,j,k)\)th element gives the weight for
the comparison by \(i\) of \(j\) and \(k\) or a function taking three arguments,
\(i\), \(j\), and \(k\), and returning the weight of this
comparison. If wtcenter=TRUE
, the calculated weights will
be centered around their mean. wtname
can be used to label
this term.
rank.nodeicov(attrname, transform, transformname)
:
Attractiveness/Popularity covariates: Models the
effect of a nodal covariate on the propensity of an actor to be ranked
highly by the others. \[
\boldsymbol{g}_{\text{A}}(\boldsymbol{y};\boldsymbol{x}) =
\sum_{(i,j,k)\in {N}^{3\ne}} \boldsymbol{y}_{i:\,j\succ
k}(\boldsymbol{x}_j-\boldsymbol{x}_k).\] \[
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{A}}(\boldsymbol{y};\boldsymbol{x})=
2(\boldsymbol{x}_{j}-\boldsymbol{x}_{j^+}), \] See the
?nodeicov
ERGM term documentation for arguments.
rank.nonconformity(to, par)
:
Nonconformity: Measures the amount of
``nonconformity’’ in the network: configurations where an ego \(i\) ranks an alter \(j\) over another alter \(k\), but ego \(l\) ranks \(k\) over \(j\).
This statistic has an argument to
, which controls to
whom an ego may conform:
"all"
(the default) Nonconformity
to all egos is counted: \[
\boldsymbol{g}_{\text{GNC}}(\boldsymbol{y}) = \sum_{(i,j,k,l)\in
{N}^{4\ne}}\boldsymbol{y}_{l:\,j\succ
k}\left(1-\boldsymbol{y}_{i:\,j\succ k}\right) \] \[
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{GNC}}(\boldsymbol{y}) =
2\sum_{l\in {N}\backslash\left\{i,j,j^+\right\}}(
\boldsymbol{y}_{l:\,j^+\succ j}-\boldsymbol{y}_{l:\,j\succ j^+} ).
\] A lower-than-chance value of this statistic and/or a negative
coefficient implies a degree of consensus in the network.
"localAND"
(Local
nonconformity) Nonconformity of \(i\) to ego \(l\) regarding the relative ranking of \(j\) and \(k\) is only counted if \(i\) ranks \(l\) over both \(j\) and \(k\): \[\boldsymbol{g}_{\text{LNC}}(\boldsymbol{y}) =
\sum_{(i,j,k,l)\in {N}^{4\ne}} \boldsymbol{y}_{i:\,l\succ j}
\boldsymbol{y}_{i:\,l\succ k} \boldsymbol{y}_{l:\,j\succ k}
(1-\boldsymbol{y}_{i:\,j\succ k})\] \[
\begin{align*}
\Delta_{i,j}^\nearrow\boldsymbol{g}_{\text{LNC}}(\boldsymbol{y})=\sum_{k\in
{N}\backslash\left\{i,j,j^+\right\}}(& \boldsymbol{y}_{i:\,k\succ
j^+}\boldsymbol{y}_{k:\,j^+\succ j}-\boldsymbol{y}_{i:\,k\succ
j^+}\boldsymbol{y}_{k:\,j\succ j^+}\\
\vphantom{\sum_{k\in
{N}\backslash\left\{i,j,j^+\right\}}}&+\boldsymbol{y}_{k:\,i\succ
j^+}\boldsymbol{y}_{k:\,j^+\succ j}-\boldsymbol{y}_{k:\,i\succ
j}\boldsymbol{y}_{k:\,j\succ j^+}\\
\vphantom{\sum_{k\in
{N}\backslash\left\{i,j,j^+\right\}}}&+\boldsymbol{y}_{j:\,k\succ
j^+}\boldsymbol{y}_{i:\,j^+\succ k}-\boldsymbol{y}_{j^+:\,k\succ
j}\boldsymbol{y}_{i:\,j\succ k}).
\end{align*}
\] A lower-than-chance value of this statistic and/or a negative
coefficient implies a form of hierarchical transitivity in the
network.
Consider Newcomb’s famous fraternity data:
data(newcomb)
as.matrix(newcomb[[1]], attrname = "rank")
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## 1 0 7 12 11 10 4 13 14 15 16 3 9 1 5 8 6 2
## 2 8 0 16 1 11 12 2 14 10 13 15 6 7 9 5 3 4
## 3 13 10 0 7 8 11 9 15 6 5 2 1 16 12 4 14 3
## 4 13 1 15 0 14 4 3 16 12 7 6 9 8 11 10 5 2
## 5 14 10 11 7 0 16 12 4 5 6 2 3 13 15 8 9 1
## 6 7 13 11 3 15 0 10 2 4 16 14 5 1 12 9 8 6
## 7 15 4 11 3 16 8 0 6 9 10 5 2 14 12 13 7 1
## 8 9 8 16 7 10 1 14 0 11 3 2 5 4 15 12 13 6
## 9 6 16 8 14 13 11 4 15 0 7 1 2 9 5 12 10 3
## 10 2 16 9 14 11 4 3 10 7 0 15 8 12 13 1 6 5
## 11 12 7 4 8 6 14 9 16 3 13 0 2 10 15 11 5 1
## 12 15 11 2 6 5 14 7 13 10 4 3 0 16 8 9 12 1
## 13 1 15 16 7 4 2 12 14 13 8 6 11 0 10 3 9 5
## 14 14 5 8 6 13 9 2 16 1 3 12 7 15 0 4 11 10
## 15 16 9 4 8 1 13 11 12 6 2 3 5 10 15 0 14 7
## 16 8 11 15 3 13 16 14 12 1 9 2 6 10 7 5 0 4
## 17 9 15 10 2 4 11 5 12 3 7 8 1 6 16 14 13 0
as.matrix(newcomb[[1]], attrname = "descrank")
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## 1 0 10 5 6 7 13 4 3 2 1 14 8 16 12 9 11 15
## 2 9 0 1 16 6 5 15 3 7 4 2 11 10 8 12 14 13
## 3 4 7 0 10 9 6 8 2 11 12 15 16 1 5 13 3 14
## 4 4 16 2 0 3 13 14 1 5 10 11 8 9 6 7 12 15
## 5 3 7 6 10 0 1 5 13 12 11 15 14 4 2 9 8 16
## 6 10 4 6 14 2 0 7 15 13 1 3 12 16 5 8 9 11
## 7 2 13 6 14 1 9 0 11 8 7 12 15 3 5 4 10 16
## 8 8 9 1 10 7 16 3 0 6 14 15 12 13 2 5 4 11
## 9 11 1 9 3 4 6 13 2 0 10 16 15 8 12 5 7 14
## 10 15 1 8 3 6 13 14 7 10 0 2 9 5 4 16 11 12
## 11 5 10 13 9 11 3 8 1 14 4 0 15 7 2 6 12 16
## 12 2 6 15 11 12 3 10 4 7 13 14 0 1 9 8 5 16
## 13 16 2 1 10 13 15 5 3 4 9 11 6 0 7 14 8 12
## 14 3 12 9 11 4 8 15 1 16 14 5 10 2 0 13 6 7
## 15 1 8 13 9 16 4 6 5 11 15 14 12 7 2 0 3 10
## 16 9 6 2 14 4 1 3 5 16 8 15 11 7 10 12 0 13
## 17 8 2 7 15 13 6 12 5 14 10 9 16 11 1 3 4 0
Let’s fit a model for the two types of nonconformity and deference at the first time point:
newc.fit1 <- ergm(newcomb[[1]] ~ rank.nonconformity + rank.nonconformity("localAND") +
rank.deference, response = "descrank", reference = ~CompleteOrder, control = control.ergm(MCMC.burnin = 4096,
MCMC.interval = 32, CD.conv.min.pval = 0.05), eval.loglik = FALSE)
summary(newc.fit1)
## Call:
## ergm(formula = newcomb[[1]] ~ rank.nonconformity + rank.nonconformity("localAND") +
## rank.deference, response = "descrank", reference = ~CompleteOrder,
## eval.loglik = FALSE, control = control.ergm(MCMC.burnin = 4096,
## MCMC.interval = 32, CD.conv.min.pval = 0.05))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## nonconformity -0.004154 0.003397 0 -1.223 0.221
## nonconformity.localAND -0.009445 0.009729 0 -0.971 0.332
## deference -0.154823 0.038553 0 -4.016 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check diagnostics:
newc.fit15 <- ergm(newcomb[[15]] ~ rank.nonconformity + rank.nonconformity("localAND") +
rank.deference, response = "descrank", reference = ~CompleteOrder, control = control.ergm(MCMC.burnin = 4096,
MCMC.interval = 32, CD.conv.min.pval = 0.05), eval.loglik = FALSE)
summary(newc.fit15)
## Call:
## ergm(formula = newcomb[[15]] ~ rank.nonconformity + rank.nonconformity("localAND") +
## rank.deference, response = "descrank", reference = ~CompleteOrder,
## eval.loglik = FALSE, control = control.ergm(MCMC.burnin = 4096,
## MCMC.interval = 32, CD.conv.min.pval = 0.05))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## nonconformity 0.001204 0.002724 0 0.442 0.659
## nonconformity.localAND -0.044258 0.008321 0 -5.319 <1e-04 ***
## deference -0.338163 0.075933 0 -4.453 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check diagnostics:
NA
) edges are handled automatically for valued
ERGMs (as they are for regular ERGMs).ergm
has an argument eval.loglik
, which is
TRUE
by default. For valued ERGMs, it’s quite a bit less
efficient than for binary, at least for now. So, unless you need the
AICs or BICs to compare models, and especially if your networks are not
small, pass eval.loglik=FALSE
.latentnet
latentnet
, as the name suggests, is designed to fit
latent space models, but it can fit other dyad-independent network
models as well. Let
\(\boldsymbol{Y}\) be the random network being modeled;
\(\boldsymbol{y}\) be the observed network;
\(\boldsymbol{x}\) is a \( p \times n \times n \) array of dyadic covariates, with
\(\boldsymbol{x}_{\cdot,i,j}\) being a \( p \)-vector of covariates for dyad \({(i,j)}\);
\(\boldsymbol{\beta}\) be the \( p \)-vector of covariate coefficients;
\(\boldsymbol{Z}\) be the \( n \times d \) array of latent positions, with
\(\boldsymbol{Z}_i\) being the \( d \)-vector position of actor \(i\);
\(\boldsymbol{\delta}\) be the \( n \)-vector of sender effects, with
\(\boldsymbol{\delta}_i\) being the sender effect of \(\boldsymbol{\delta}\); and
\(\boldsymbol{\gamma}\) being a \( n \)-vector of receiver effects, with
\(\boldsymbol{\gamma}_i\) being the receiver effect of \(\boldsymbol{\gamma}\).
For brevity, let \(\boldsymbol{\theta}=(\boldsymbol{\beta},\boldsymbol{Z},\boldsymbol{\delta},\boldsymbol{\gamma})\)
and let \(\boldsymbol{\theta}_{i,j}=(\boldsymbol{\beta},\boldsymbol{Z}_i,\boldsymbol{Z}_j,\boldsymbol{\delta}_i,\boldsymbol{\gamma}_j)\).
Generally, a latent space model that can be fit by
latentnet
has the following form: \[
\begin{align}
\Pr(\boldsymbol{Y}=\boldsymbol{y}|\boldsymbol{\theta},\boldsymbol{x})
& =
\prod_{{{(i,j)}\in\mathbb{Y}}}\Pr(\boldsymbol{Y}\!_{i,j}=\boldsymbol{y}_{i,j}|\boldsymbol{\theta}_{i,j},\boldsymbol{x}_{\cdot,i,j})\label{eq:cond-ind},
\\
\Pr(\boldsymbol{Y}\!_{i,j}=\boldsymbol{y}_{i,j}|\boldsymbol{\theta}_{i,j},\boldsymbol{x}_{\cdot,i,j})
& =
f(\boldsymbol{y}_{i,j}|\boldsymbol{\mu}_{i,j})\label{eq:cond-on-dist},
\\
\boldsymbol{\mu}_{i,j}& =
g^{-1}(\boldsymbol{\eta}_{i,j})\label{eq:glm-link}, \\
\boldsymbol{\eta}_{i,j}& = \boldsymbol{x}_{\cdot,i,j}^\top
\boldsymbol{\beta}+
\text{d}(\boldsymbol{Z}_i,\boldsymbol{Z}_j)+\boldsymbol{\delta}_i+\boldsymbol{\gamma}_j\label{eq:linear-predictor},
\end{align}
\] for some latent position effect \(\text{d}(\cdot,\cdot)\). Effects supported
by latentnet
are Euclidean, having \(\text{d}(\boldsymbol{Z}_i,\boldsymbol{Z}_j)=-\lvert
\boldsymbol{Z}_i -\boldsymbol{Z}_j\rvert\) and bilinear, having
\(\text{d}(\boldsymbol{Z}_i,\boldsymbol{Z}_j)=
\boldsymbol{Z}_i^\top\boldsymbol{Z}_j\). Latent space positions
\(\boldsymbol{Z}\) can further be
modeled as a Gaussian mixture, and \(\boldsymbol{\delta}\) and \(\boldsymbol{\gamma}\) are likewise modeled
as random effects.
In words, the values of individual relations are assumed to be independent \(\eqref{eq:cond-ind}\), and each potential relation has a value modeled by a GLM, having some distribution with density \(f\), parametrized by its expected value \(\eqref{eq:cond-on-dist}\), which is, in turn, a function, via a GLM link function \(g\), of the linear predictor \(\boldsymbol{\eta}_{i,j}\) \(\eqref{eq:glm-link}\), which is a linear function combining all the parameters \(\eqref{eq:linear-predictor}\).
A latent space model for a binary network has \[ \begin{align*} f(\boldsymbol{y}_{i,j}|\boldsymbol{\mu}_{i,j})&=(\boldsymbol{\mu}_{i,j})^{\boldsymbol{y}_{i,j}}(1-\boldsymbol{\mu}_{i,j})^{1-\boldsymbol{y}_{i,j}}\\ g(\boldsymbol{\mu}_{i,j})&=\text{logit}(\boldsymbol{\mu}_{i,j}), \end{align*} \] a Bernoulli model with a logit link: a logistic regression model.
This is easily extended to any GLM. In particular,
Poisson log-linear model: \[ \begin{align*} f(\boldsymbol{y}_{i,j}|\boldsymbol{\mu}_{i,j})&=\exp\mathchoice{\left(-\boldsymbol{\mu}_{i,j}\right)}{(-\boldsymbol{\mu}_{i,j})}{(-\boldsymbol{\mu}_{i,j})}{(-\boldsymbol{\mu}_{i,j})} \boldsymbol{\mu}_{i,j}^{\boldsymbol{y}_{i,j}}/\boldsymbol{y}_{i,j}!\\ g(\boldsymbol{\mu}_{i,j})&=\log(\boldsymbol{\mu}_{i,j}) \end{align*} \]
Binomial logit with \(m\) trials: \[ \begin{align*} f(\boldsymbol{y}_{i,j}|\boldsymbol{\mu}_{i,j})&=\binom{m}{\boldsymbol{y}_{i,j}}(\boldsymbol{\mu}_{i,j}/m)^{\boldsymbol{y}_{i,j}}(1-\boldsymbol{\mu}_{i,j}/m)^{m-\boldsymbol{y}_{i,j}}\\ g(\boldsymbol{\mu}_{i,j})&=\text{logit}(\boldsymbol{\mu}_{i,j}) \end{align*} \]
Normal linear with variance \(\sigma^2\): \[ \begin{align*} f(\boldsymbol{y}_{i,j}|\boldsymbol{\mu}_{i,j})&=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\mathchoice{\left(-\frac{(\boldsymbol{y}_{i,j}-\boldsymbol{\mu}_{i,j})^2}{2\sigma^2}\right)}{(-\frac{(\boldsymbol{y}_{i,j}-\boldsymbol{\mu}_{i,j})^2}{2\sigma^2})}{(-\frac{(\boldsymbol{y}_{i,j}-\boldsymbol{\mu}_{i,j})^2}{2\sigma^2})}{(-\frac{(\boldsymbol{y}_{i,j}-\boldsymbol{\mu}_{i,j})^2}{2\sigma^2})}\\ g(\boldsymbol{\mu}_{i,j})&=\boldsymbol{\mu}_{i,j}\\ \sigma^2&\sim \sigma^2_0 \text{Inv}\chi^2(\nu) \end{align*} \] for prior dispersion parameters \(\sigma^2_0\) (magnitude of the variance) and \(\nu\) (certainty of the prior (higher = more certain)).
These are currently supported by latentnet
.
latentnet
The workhorse function ergmm
(for Exponential Random
Graph Mixed Model) has a syntax very similar to
ergm
: the model is specified as
network ~ term1 + term2 + ...
and latentnet
automatically imports all of the dyad-independent terms
implemented in ergm
(and those in any loaded addon
packages). (latentnet
does have some special-purpose terms,
mainly because it can fit networks with self-loops, while
ergm
cannot.) For a full list of
latentnet
-specific terms implemented, see
? terms.ergmm
. Note that ergmm
, like
glm
and unlike ergm
, sets the first covariate
\(\boldsymbol{x}_{1,i,j}\equiv 1\) for
an intercept term, the (binary) ERGM equivalent of an edge count
term.
For example,
sets \(\boldsymbol{x}_{1,i,j}\equiv
1\), \(\boldsymbol{x}_{2,i,j}\equiv
\mathbb{I}\left(i\in\text{Loyal} \land j\in\text{Loyal}\right)\),
\(\boldsymbol{x}_{3,i,j}\equiv
\mathbb{I}\left(i\in\text{Outcasts} \land
j\in\text{Outcasts}\right)\), etc., to produce a model of the
form \[\text{logit}(\Pr(\boldsymbol{Y}\!_{i,j}=1))=\boldsymbol{\beta}_1
+ \boldsymbol{\beta}_2 \mathbb{I}\left(\text{$i$ and $j$ are both Loyal
Opposition}\right) + \boldsymbol{\beta}_3 \mathbb{I}\left(\text{$i$ and
$j$ are both Outcasts}\right)+\dotsb,\] equivalent to a
ergm
specification of
## Warning in mple.existence(pl): The MPLE does not exist!
In fact, the produce the same coefficients and standard errors:
summary(samplk.nm.l, point.est = "mle", se = TRUE)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: samplk.tot ~ nodematch("group", diff = TRUE)
## Attribute: edges
## Model: Bernoulli
## Covariate coefficients MLE:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.66208 0.17932 -9.2689 < 2.2e-16 ***
## nodematch.group.Loyal 2.06755 0.49040 4.2161 2.486e-05 ***
## nodematch.group.Outcasts 17.05930 900.30202 0.0189 0.98488
## nodematch.group.Turks 2.57837 0.38577 6.6836 2.331e-11 ***
## nodematch.group.Waverers 1.66208 0.83596 1.9882 0.04678 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(samplk.nm.e)
## Call:
## ergm(formula = samplk.tot ~ edges + nodematch("group", diff = TRUE))
##
## Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -1.6621 0.1793 0 -9.269 <1e-04 ***
## nodematch.group.Loyal 2.0675 0.4904 0 4.216 <1e-04 ***
## nodematch.group.Outcasts 18.3450 1038.5246 0 0.018 0.9859
## nodematch.group.Turks 2.5784 0.3858 0 6.684 <1e-04 ***
## nodematch.group.Waverers 1.6621 0.8360 0 1.988 0.0468 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 424.2 on 306 degrees of freedom
## Residual Deviance: 289.1 on 301 degrees of freedom
##
## AIC: 299.1 BIC: 317.7 (Smaller is better. MC Std. Err. = 0)
latentnet
fits latent space models using terms
euclidean
and bilinear
, which take two main
arguments: d
for dimension and G
for number of
clusters to use. So, for a classic example of the Sampson’s monks fit,
i.e., \[\text{logit}(\Pr(\boldsymbol{Y}\!_{i,j}=1))=\boldsymbol{\beta}_1
+ \lvert \boldsymbol{Z}_i -\boldsymbol{Z}_j \rvert,\] with \(\boldsymbol{Z}_i\) modeled as 3 spherical
Gaussian clusters with 2 dimensions,
Random actor-specific effects can also be used, with terms
rsender
, receiver
, and
rsociality
, respectively: a receiver effects model of the
form \[\text{logit}(\Pr(\boldsymbol{Y}\!_{i,j}=1))=\boldsymbol{\beta}_1
+ \lvert \boldsymbol{Z}_i -\boldsymbol{Z}_j \rvert +
\boldsymbol{\gamma}_j,\] can be fit with
samplk.d2G3r <- ergmm(samplk.tot ~ euclidean(d = 2, G = 3) + rreceiver, verbose = TRUE)
mcmc.diagnostics(samplk.d2G3r)
The results can be visualized using
par(mfrow = c(1, 2))
# Extract a clustering
Z.K.ref <- summary(samplk.d2G3, point.est = "pmean")$pmean$Z.K
# Plot one model, saving positions, using Z.K.ref to set reference clustering.
Z.ref <- plot(samplk.d2G3, pie = TRUE, Z.K.ref = Z.K.ref)
# Plot the other model, using Z.ref and Z.K.ref to ensure similar orientation
# and coloring.
plot(samplk.d2G3r, rand.eff = "receiver", pie = TRUE, Z.ref = Z.ref, Z.K.ref = Z.K.ref)
All ergmm
terms also take additional arguments,
specifying prior distributions. Their defaults are generally sensible,
so we will not discuss them here.
ergmm
Fitting non-binary models using ergmm
requires three
additional arguments:
response
An edge attribute in the
network whose values are to be used as the response.
family
A string specifying the family
and the link to be used.
fam.par
A named list of parameters
required by some families.
The families listed above are specified using parameter values listed in the following table.
Also, see
for more information.
Family | Link | family= |
fam.par=list(...) |
---|---|---|---|
Bernoulli | logit | "Bernoulli" |
|
binomial | logit | "binomial" |
trials = \(m\) |
Poisson | log | "Poisson" |
|
Normal | linear | "normal" |
prior.var \(=\sigma^2_0\),
prior.var.df \(=\nu\) |
We have our network of nomination counts, with a maximum of three “successes”, so we might model it using a Binomial distribution, i.e., \[\Pr(\boldsymbol{Y}\!_{i,j}=\boldsymbol{y}_{i,j}|\boldsymbol{\eta}_{i,j})=\binom{3}{\boldsymbol{y}_{i,j}}(\text{logit}^{-1}(\boldsymbol{\eta}_{i,j}))^{\boldsymbol{y}_{i,j}}(\text{logit}^{-1}(\boldsymbol{\eta}_{i,j}))^{3-\boldsymbol{y}_{i,j}},\] with \(\boldsymbol{\eta}_{i,j}\) being the same as in the binary case.
# Bernoulli logit fit (recall) samplk.d2G3 <-
# ergmm(samplk.tot~euclidean(d=2,G=3)) Binomial(trials=3) logit fit
samplk.ct.d2G3 <- ergmm(samplk.tot ~ euclidean(d = 2, G = 3), response = "nominations",
family = "binomial", fam.par = list(trials = 3), verbose = TRUE)
We can plot the fit
# Plot them side-by-side, using edge.col argument:
par(mfrow = c(1, 2))
plot(samplk.d2G3, pie = TRUE, Z.ref = Z.ref, Z.K.ref = Z.K.ref)
plot(samplk.ct.d2G3, pie = TRUE, Z.ref = Z.ref, Z.K.ref = Z.K.ref, edge.col = samplk.ecol)
Now, consider a latent cluster random effects model for the number of contexts of interactions, which takes into account that faction leaders are likely to have greater propensity to interact than non-leaders: \(\boldsymbol{Y}\!_{i,j}{\stackrel{\mathrm{ind.}}{\sim}}\text{Poisson}(\boldsymbol{\mu}_{i,j})\) with \[\log\boldsymbol{\mu}_{i,j}=\boldsymbol{\eta}_{i,j}=\boldsymbol{\beta}_1 + \boldsymbol{\beta}_2 (\mathbb{I}\left(\text{$i$ is a faction leader}\right) + \mathbb{I}\left(\text{$j$ is a faction leader}\right) ) + \lvert \boldsymbol{Z}_i-\boldsymbol{Z}_j \rvert + \boldsymbol{\delta}_i + \boldsymbol{\delta}_j.\]
Now, let’s fit the model:
zach.d2G2S <- ergmm(zach ~ nodefactor("leader") + euclidean(d = 2, G = 2) + rsociality,
response = "contexts", family = "Poisson", verbose = TRUE)
mcmc.diagnostics(zach.d2G2S)
summary(zach.d2G2S)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: zach ~ nodefactor("leader") + euclidean(d = 2, G = 2) + rsociality
## Attribute: contexts
## Model: Poisson
## MCMC sample of size 4000, draws are 10 iterations apart, after burnin of 10000 iterations.
## Covariate coefficients posterior means:
## Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
## (Intercept) 0.877460 -0.033149 1.7813 0.0625 .
## nodefactor.leader.TRUE 2.019891 0.550986 3.4796 0.0110 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Sociality effect variance: 0.82465.
## Overall BIC: 792.1717
## Likelihood BIC: 442.2485
## Latent space/clustering BIC: 270.0214
## Sociality effect BIC: 79.90185
##
## Covariate coefficients MKL:
## Estimate
## (Intercept) 0.509221
## nodefactor.leader.TRUE 2.009996
par(mar = c(5, 4, 4, 2) + 0.1)
plot(zach.d2G2S, rand.eff = "sociality", edge.col = zach.ecol, labels = TRUE)
It is worth noting here that even though we are fitting a valued network model, the term names and parameters that we need to use are still those for the binary ERGM variants of these terms.
Another useful application of nodecov
and others is when
an undirected network of counts is a product of collapsing an
affiliation network of actors to events to produce a network of actors
only, counting the number of shared events for each pair of actors. If
there is no particular pattern to the interaction, then the expected
number of shared events between \(i\)
and \(j\) would be proportional to the
total number of events of \(i\) and to
the number of events of \(j\). If \(a_i\) is the total number of events
associated with actor \(i\), then using
a vertex attribute equaling to \(\log(a_i)\) in a socialitycov
can be a good baseline model: \[\log
\boldsymbol{\mu}_{i,j}=\boldsymbol{\beta}_1 + \boldsymbol{\beta}_2
(\log(a_i)+\log(a_j)) \Leftrightarrow
\boldsymbol{\mu}_{i,j}=\exp\mathchoice{\left(\boldsymbol{\beta}_1\right)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}{(\boldsymbol{\beta}_1)}a_i^{\boldsymbol{\beta}_2}a_j^{\boldsymbol{\beta}_2},\]
which produces proportionality when \(\boldsymbol{\beta}_2\approx 1\). For an
application of this, see Krivitsky et al.
(2009).
latentnet
to fit heterogeneous trial
counts.