Chapter 8 Variation partitioning
Variation partitioning is a type of analysis that combines RDA and partial RDA to divide the variation of a response variable among two, three or four explanatory data sets. For example, you might want to partition the variation in a community matrix between a set of abiotic environmental variables and a set of biotic variables. You could also partition this community variation between small-scale and large-scale variables, to test the effect of spatial scale on your community.
The results of variation partitioning analyses are traditionally represented by a Venn diagram, in which the percentage of variance explained by each explanatory data set is reported. In a case where we are partitioning the variation among two explanatory matrices, the result could be represented as follows:
Here,
- Fraction \([a + b + c]\) is the variance explained by \(X1\) and \(X2\) together, calculated using an RDA of \(Y\) by \(X1 + X2\).
- Fraction \([d]\) is the variance left unexplained by \(X1\) and \(X2\) together, obtained from the same RDA as above.
- Fraction \([a]\) is the variance explained by \(X1\) only, calculated using a partial RDA of \(Y\) by \(X1 | X2\) (controlling for \(X2\)).
- Fraction \([c]\) is the variance explained by \(X2\) only, calculated using a partial RDA of \(Y\) by \(X2 | X1\) (controlling for \(X1\)).
- Fraction \([b]\) is calculated by subtraction, i.e. \([b] = [a + b] + [b + c] - [a + b + c]\), where \([a + b]\) and \([b + c]\) come from the RDAs of \(Y\) by \(X1\) alone and by \(X2\) alone. Because \([b]\) is not the result of an RDA, it cannot be tested for significance. It can also be negative, which indicates that the response matrix is better explained by the combination of \(X1\) and \(X2\) than by either matrix on its own.
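The list above can be condensed into four identities. This is only a summary sketch: it assumes that every bracketed term stands for a proportion of the total variation in \(Y\) (an \(R^2\) or adjusted \(R^2\)), so that the four individual fractions sum to 1.

\[
\begin{aligned}
[a] &= [a + b + c] - [b + c] \\
[c] &= [a + b + c] - [a + b] \\
[b] &= [a + b] + [b + c] - [a + b + c] \\
[d] &= 1 - [a + b + c]
\end{aligned}
\]

Adding the four lines gives \([a] + [b] + [c] + [d] = 1\), which is a quick check that a partition has been computed consistently.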
8.1 Variation partitioning in R
To demonstrate how variation partitioning works in R, we will partition the variation of fish species composition between chemical and topographic variables. The varpart() function from vegan makes this easy for us.
# Partition the variation in fish community composition
spe.part.all <- varpart(spe.hel, env.chem, env.topo)
spe.part.all$part  # access results!
## No. of explanatory tables: 2
## Total variation (SS): 14.07
## Variance: 0.50251
## No. of observations: 29
##
## Partition table:
##                      Df R.squared Adj.R.squared Testable
## [a+c] = X1            7   0.60579       0.47439     TRUE
## [b+c] = X2            3   0.41526       0.34509     TRUE
## [a+b+c] = X1+X2      10   0.73414       0.58644     TRUE
## Individual fractions
## [a] = X1|X2           7                 0.24135     TRUE
## [b] = X2|X1           3                 0.11205     TRUE
## [c]                   0                 0.23304    FALSE
## [d] = Residuals                         0.41356    FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
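The individual fractions in this table can be recovered by hand from the three adjusted \(R^2\) values in its upper half. The following is a minimal sketch in base R that copies those values from the output above; the variable names are illustrative and are not part of vegan.

```r
# Adjusted R-squared values copied from the partition table above
adj.X1   <- 0.47439  # RDA of Y by X1 (chemistry)
adj.X2   <- 0.34509  # RDA of Y by X2 (topography)
adj.both <- 0.58644  # RDA of Y by X1 + X2

# Individual fractions obtained by subtraction
frac.X1.only <- adj.both - adj.X2           # X1 | X2 (chemistry alone)
frac.X2.only <- adj.both - adj.X1           # X2 | X1 (topography alone)
frac.shared  <- adj.X1 + adj.X2 - adj.both  # overlap between X1 and X2
frac.resid   <- 1 - adj.both                # unexplained variation

round(c(X1.only = frac.X1.only, X2.only = frac.X2.only,
        shared = frac.shared, residuals = frac.resid), 5)
```

The rounded values should match the "Individual fractions" rows of the table.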
You can then visualise the results with the plot() function.
# plot the variation partitioning Venn diagram
plot(spe.part.all,
     Xnames = c("Chem", "Topo"),  # name the partitions
     bg = c("seagreen3", "mediumpurple"), alpha = 80,  # colour the circles
     digits = 2,  # only show 2 digits
     cex = 1.5)
Based on the adjusted \(R^2\) values, the chemical variables alone explain 24.1% of the variation in fish species composition, the topographic variables alone explain 11.2%, and the two groups of variables jointly explain 23.3%. Note that the varpart() output labels the shared fraction [c] and the topography-only fraction [b], whereas the notation introduced at the start of this chapter uses \([b]\) for the shared fraction and \([c]\) for \(X2\) only. The text below keeps the chapter's notation.
Be careful when reporting results of variation partitioning! The shared fraction \([b]\) does not represent an interaction effect of the two explanatory matrices. Think of it as an overlap between \(X1\) and \(X2\). It represents the shared fraction of variation explained when the two are included in the model, meaning it is the portion of variation that cannot be attributed to \(X1\) or \(X2\) separately. In other words, the variation partitioning cannot disentangle the effects of chemistry and topography on 23.3% of the variation in the fish community composition.
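To see why the shared fraction reflects overlap rather than interaction, here is a small simulated sketch in base R. It is not part of the fish analysis: it assumes a single response variable (in which case RDA reduces to multiple linear regression, so lm() can stand in for rda()), and all object names are made up for the illustration. The response is built from purely additive effects of two correlated predictors, with no interaction term.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + 0.6 * rnorm(n)  # x2 is strongly correlated with x1
y  <- x1 + x2 + rnorm(n)         # additive effects only, no x1:x2 term

# Helper (illustrative) to extract the adjusted R-squared of a linear model
adjR2 <- function(model) summary(model)$adj.r.squared

r.x1   <- adjR2(lm(y ~ x1))
r.x2   <- adjR2(lm(y ~ x2))
r.both <- adjR2(lm(y ~ x1 + x2))

# Same subtraction rules as in variation partitioning
fractions <- c(x1.only = r.both - r.x2,
               shared  = r.x1 + r.x2 - r.both,
               x2.only = r.both - r.x1,
               resid   = 1 - r.both)
round(fractions, 3)
```

Even though the data contain no interaction, the shared fraction is large, because x1 and x2 carry much of the same information about y.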
8.2 Significance testing
The output from the varpart() function reports the adjusted \(R^2\) for each fraction, but you will notice that the table does not include any test of statistical significance. However, the Testable column identifies the fractions that can be tested for significance using the function anova.cca(), just like we did with the RDA!
X1 [a+b]: Chemistry without controlling for topography
# [a+b] Chemistry without controlling for topography
anova.cca(rda(spe.hel, env.chem))
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe.hel, Y = env.chem)
##          Df Variance      F Pr(>F)
## Model     7  0.30442 4.6102  0.001 ***
## Residual 21  0.19809
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
X2 [b+c]: Topography without controlling for chemistry
# [b+c] Topography without controlling for chemistry
anova.cca(rda(spe.hel, env.topo))
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe.hel, Y = env.topo)
##          Df Variance     F Pr(>F)
## Model     3  0.20867 5.918  0.001 ***
## Residual 25  0.29384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
X1 | X2 [a]: Chemistry alone
# [a] Chemistry alone
anova.cca(rda(spe.hel, env.chem, env.topo))
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe.hel, Y = env.chem, Z = env.topo)
##          Df Variance      F Pr(>F)
## Model     7  0.16024 3.0842  0.001 ***
## Residual 18  0.13360
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Recognize this? It’s a partial RDA!
X2 | X1 [c]: Topography alone
# [c] Topography alone
anova.cca(rda(spe.hel, env.topo, env.chem))
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(X = spe.hel, Y = env.topo, Z = env.chem)
##          Df Variance      F Pr(>F)
## Model     3 0.064495 2.8965  0.001 ***
## Residual 18 0.133599
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All of the testable fractions in the variation partitioning are statistically significant!
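The partial RDAs above use the matrix interface, rda(Y, X, Z). vegan also accepts a formula interface in which the variables to control for are wrapped in Condition(). The sketch below illustrates that form on the mite data shipped with vegan, since the fish objects are created elsewhere in the workshop. The choice of Shrub and Topo as conditioning variables is an assumption made purely for illustration, as are the object names.

```r
library(vegan)

# Oribatid mite data distributed with vegan
data("mite")
data("mite.env")

# Hellinger-transform the abundances (assumed preprocessing step)
mite.hel <- decostand(mite, method = "hellinger")

# Partial RDA: effect of substrate variables, controlling for Shrub and Topo
mod.partial <- rda(mite.hel ~ SubsDens + WatrCont + Condition(Shrub + Topo),
                   data = mite.env)

# Permutation test of the constrained (substrate) part of the model
set.seed(42)  # permutation p-values vary slightly between runs without a seed
test.partial <- anova.cca(mod.partial, permutations = 999)
test.partial
```

The formula form can be easier to read when the conditioning variables sit in the same data frame as the variables of interest, and it handles factors such as Shrub and Topo directly.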
8.3 Challenge 3
Partition the variation in the mite species data according to substrate variables (SubsDens, WatrCont) and significant spatial variables.
- What proportion of the variation is explained by substrate variables? By space?
- Which individual fractions are significant?
- Plot your results!
Load the spatial variables:
data("mite.pcnm")
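The solution uses objects named mite.spe.hel and mite.env, which are not created in this section. If they are not already in your workspace, the following setup sketch should recreate them. It assumes that mite.spe.hel is simply the Hellinger-transformed mite abundance table that ships with vegan.

```r
library(vegan)

# Oribatid mite data sets distributed with vegan
data("mite")       # species abundances (sites x species)
data("mite.env")   # environmental variables, including SubsDens and WatrCont
data("mite.pcnm")  # spatial (PCNM) variables

# Assumption: mite.spe.hel holds Hellinger-transformed abundances
mite.spe.hel <- decostand(mite, method = "hellinger")

# Quick check that the tables describe the same sites
dim(mite.spe.hel)
dim(mite.env)
dim(mite.pcnm)
```

All three tables should have the same number of rows, one per sampling site.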
Recall some useful functions:
ordiR2step()
varpart()
anova.cca(rda())
plot()
8.3.1 Challenge 3: Solution
Step 1: Forward selection of significant spatial variables.
There are a lot of spatial variables in this dataset (22!). We should select the most important ones, to avoid overloading the model.
# Step 1: Forward selection!
# Write full RDA model with all variables
full.spat <- rda(mite.spe.hel ~ ., data = mite.pcnm)

# Forward selection of spatial variables
spat.sel <- ordiR2step(rda(mite.spe.hel ~ 1, data = mite.pcnm),
    scope = formula(full.spat), R2scope = RsquareAdj(full.spat)$adj.r.squared,
    direction = "forward", trace = FALSE)
spat.sel$call
## rda(formula = mite.spe.hel ~ V2 + V3 + V8 + V1 + V6 + V4 + V9 +
## V16 + V7 + V20, data = mite.pcnm)
Step 2: Group variables of interest.
# Step 2: Group variables of interest.
# Subset environmental data to retain only substrate
# variables
mite.subs <- subset(mite.env, select = c(SubsDens, WatrCont))

# Subset to keep only selected spatial variables
# (a faster way to access the selected variables!)
mite.spat <- subset(mite.pcnm, select = names(spat.sel$terminfo$ordered))
Step 3: Partition the variation in species abundances.
# Step 3: Partition the variation in species abundances.
mite.part <- varpart(mite.spe.hel, mite.subs, mite.spat)
mite.part$part$indfract  # access results!
##                 Df R.squared Adj.R.squared Testable
## [a] = X1|X2      2        NA    0.05901929     TRUE
## [b] = X2|X1     10        NA    0.19415929     TRUE
## [c]              0        NA    0.24765221    FALSE
## [d] = Residuals NA        NA    0.49916921    FALSE
- What proportion of the variation is explained by substrate variables? 5.9% by substrate alone (adjusted \(R^2\))
- What proportion of the variation is explained by spatial variables? 19.4% by space alone (adjusted \(R^2\))
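The percentages above are the fractions explained by each set alone. If you also want the total explained by each set when the other is ignored, you can add the shared fraction back in. A minimal sketch in base R, copying the adjusted \(R^2\) values from the output above (the variable names are illustrative):

```r
# Adjusted R-squared fractions copied from mite.part$part$indfract above
subs.only  <- 0.05901929  # substrate | space
space.only <- 0.19415929  # space | substrate
shared     <- 0.24765221  # overlap between substrate and space
residuals  <- 0.49916921  # unexplained

subs.total  <- subs.only + shared               # substrate, ignoring space
space.total <- space.only + shared              # space, ignoring substrate
explained   <- subs.only + space.only + shared  # both sets together

round(c(subs.total = subs.total, space.total = space.total,
        explained = explained), 3)
```

The explained and residual fractions should sum to 1, which is a quick check that the values were copied correctly.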
Step 4: Which individual fractions are significant?
[a]: Substrate only
# Step 4: Significance testing [a]: Substrate only
anova.cca(rda(mite.spe.hel, mite.subs, mite.spat))
...
## Model: rda(X = mite.spe.hel, Y = mite.subs, Z = mite.spat)
##          Df Variance      F Pr(>F)
## Model     2 0.025602 4.4879  0.001 ***
## Residual 57 0.162583
...
[c]: Space only
# [c]: Space only
anova.cca(rda(mite.spe.hel, mite.spat, mite.subs))
...
## Model: rda(X = mite.spe.hel, Y = mite.spat, Z = mite.subs)
##          Df Variance      F Pr(>F)
## Model    10  0.10286 3.6061  0.001 ***
## Residual 57  0.16258
...
Step 5: Plot the variation partitioning results.
# Step 5: Plot
plot(mite.part,
     digits = 2,
     Xnames = c("Subs", "Space"),  # label the fractions
     cex = 1.5,
     bg = c("seagreen3", "mediumpurple"),  # add colour!
     alpha = 80)  # adjust transparency
So, what can we say about the effects of substrate and space on mite species abundances?
Hint: Why is the model showing such an important effect of space?
Space explains a lot of the variation in species abundances here: 19.4% (p = 0.001) of the variation is explained by space alone, and 24.8% is jointly explained by space and substrate. Substrate only explains ~6% (p = 0.001) of the variation in community composition across sites on its own! Also note that half of the variation is not explained by the variables we included in the model (look at the residuals!), so the model could be improved.
This large effect of space could be a sign that some spatial ecological process is important here (like dispersal, for example). However, it could also be telling us that we are missing an important environmental variable in our model, which itself varies in space!