Chapter 14 Correspondence Analysis
One of the key assumptions made in PCA is that species are related to each other linearly and that they respond linearly to ecological gradients. This is not necessarily the case with a lot of ecological data (e.g. many species have unimodal species distributions). Using PCA on data with unimodal species distributions or a lot of zero values may lead to a phenomenon called the “horseshoe effect”, and can occur with long ecological gradients. As such, a CA or Correspondence Analysis may be a better option for this type of data, see Legendre and Legendre (2012) for further information. As CA preserves Chi2 distances (while PCA preserves Euclidean distances), this technique is indeed better suited to ordinate datasets containing unimodal species distributions, and has, for a long time, been one of the favourite tools for the analysis of species presence–absence or abundance data. In CA, the raw data are first transformed into a matrix Q of cell-by-cell contributions to the Pearson Chi2 statistic, and the resulting table is submitted to a singular value decomposition to compute its eigenvalues and eigenvectors. The result is an ordination, where it is the Chi2 distance that is preserved among sites instead of the Euclidean distance in PCA. The Chi2 distance is not influenced by the double zeros. Therefore, CA is a method adapted to the analysis of species abundance data without pre-transformation. Contrary to PCA, CA can also be applied to analyze both quantitative and binary data (such as species abundances or absence/presence). As in PCA, the Kaiser-Guttman criterion can be applied to determine the significant axes of a CA, and ordination axes can be extracted to be used in multiples regressions.
Run a CA on species data:
# Run the CA using the cca() function (NB: cca() is used
# for both CA and CCA)
<- cca(spe[-8, ])
spe.ca
# Identify the significant axes
<- spe.ca$CA$eig
ev
> mean(ev)] ev[ev
## CA1 CA2 CA3 CA4 CA5
## 0.60099264 0.14437089 0.10729384 0.08337321 0.05157826
= length(ev)
n barplot(ev, main = "Eigenvalues", col = "grey", las = 2)
abline(h = mean(ev), col = "red")
legend("topright", "Average eigenvalue", lwd = 1, col = 2, bty = "n")
From this barplot, you can see that once you reach C6, the proportion of variance explained falls below the average proportion explained by the other components. If you take another look at the CA summary, you will notice that by the time you reach CA5, the cumulative proportion of variance explained by the principal components is 84.63%.
summary(spe.h.pca) #overall results
##
## Call:
## rda(X = spe.hel)
##
## Partitioning of variance:
## Inertia Proportion
## Total 0.5023 1
## Unconstrained 0.5023 1
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 0.2491 0.06592 0.04615 0.03723 0.02125 0.01662 0.01477
## Proportion Explained 0.4959 0.13122 0.09188 0.07412 0.04230 0.03309 0.02940
## Cumulative Proportion 0.4959 0.62715 0.71903 0.79315 0.83544 0.86853 0.89794
## PC8 PC9 PC10 PC11 PC12 PC13
## Eigenvalue 0.01297 0.01054 0.006666 0.00504 0.004258 0.002767
## Proportion Explained 0.02582 0.02098 0.013269 0.01003 0.008477 0.005507
## Cumulative Proportion 0.92376 0.94474 0.958011 0.96804 0.976521 0.982029
## PC14 PC15 PC16 PC17 PC18 PC19
## Eigenvalue 0.002612 0.001505 0.001387 0.001037 0.0007815 0.0004749
## Proportion Explained 0.005200 0.002996 0.002761 0.002065 0.0015557 0.0009454
## Cumulative Proportion 0.987229 0.990225 0.992986 0.995051 0.9966069 0.9975523
## PC20 PC21 PC22 PC23 PC24
## Eigenvalue 0.0004263 0.0002812 0.0002188 0.0001382 0.0000876
## Proportion Explained 0.0008487 0.0005598 0.0004356 0.0002752 0.0001744
## Cumulative Proportion 0.9984010 0.9989608 0.9993965 0.9996717 0.9998460
## PC25 PC26 PC27
## Eigenvalue 5.336e-05 1.476e-05 9.216e-06
## Proportion Explained 1.062e-04 2.938e-05 1.835e-05
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 1.953663
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## CHA 0.17113 0.08669 -0.060772 0.2536941 -0.027774 0.0129709
## TRU 0.64097 0.02193 -0.232895 -0.1429053 -0.059150 0.0013299
## VAI 0.51106 0.19774 0.165053 0.0203364 0.105582 0.1226508
## LOC 0.38002 0.22219 0.235145 -0.0344577 0.126570 0.0570437
## OMB 0.16679 0.06494 -0.087248 0.2444247 0.016349 0.0537362
## BLA 0.07644 0.14707 -0.041152 0.2304754 -0.104984 -0.0424943
## HOT -0.18392 0.05238 -0.042963 0.0222676 0.071349 0.0299974
## TOX -0.14601 0.17844 -0.029561 0.0622957 -0.002615 -0.0839282
## VAN -0.11487 0.18186 0.125113 -0.0190457 -0.197543 0.0253735
## CHE -0.09792 -0.08873 0.281887 0.1094447 0.027418 -0.0153611
## BAR -0.19770 0.21144 -0.072035 0.0956702 0.005739 -0.0309983
## SPI -0.17618 0.16158 -0.048527 0.0387172 0.031417 -0.0535635
## GOU -0.23096 0.12248 0.062575 -0.0025267 -0.144007 0.1420702
## BRO -0.15129 0.14242 0.028378 -0.1206878 -0.096488 0.0729418
## PER -0.15699 0.19145 0.034700 -0.0969487 -0.041512 -0.0483958
## BOU -0.22734 0.13581 -0.074011 -0.0111431 0.080027 -0.0019656
## PSO -0.22670 0.08348 -0.067212 0.0198179 0.063237 0.0144138
## ROT -0.19113 0.03578 -0.007672 -0.0724430 -0.070206 0.0686039
## CAR -0.18609 0.13050 -0.063693 0.0005777 0.039554 -0.0284575
## TAN -0.19138 0.17582 0.094398 -0.0867317 0.010912 -0.0851924
## BCO -0.20055 0.08332 -0.074787 -0.0504875 0.073890 0.0249842
## PCH -0.14626 0.05268 -0.072012 -0.0432572 0.050318 0.0178776
## GRE -0.29970 -0.01158 -0.069047 -0.0118488 0.029937 0.1407698
## GAR -0.35085 -0.09353 0.198664 0.0178669 0.023796 -0.0971362
## BBO -0.24167 0.03598 -0.079528 -0.0339049 0.096690 0.0620979
## ABL -0.42269 -0.22879 0.007158 0.1128353 0.006759 0.1248913
## ANG -0.20521 0.11557 -0.072060 -0.0159902 0.072030 -0.0003801
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 0.370801 -0.49113 -1.018411 -0.58387 -0.491672 -0.622062
## 2 0.507019 -0.05688 -0.175969 -0.42418 0.407790 0.160725
## 3 0.464652 0.02937 -0.067360 -0.49566 0.324339 0.313053
## 4 0.299434 0.19037 0.241315 -0.54538 0.009838 0.197974
## 5 -0.003672 0.13483 0.515723 -0.53640 -0.796183 -0.208554
## 6 0.212943 0.16142 0.538108 -0.44366 -0.138553 -0.066196
## 7 0.440596 -0.01853 0.174782 -0.31336 0.171713 0.131172
## 8 0.032182 -0.53492 -0.354289 -0.07870 -0.125287 -0.632592
## 9 0.040265 -0.33590 0.937036 0.06715 0.610763 -0.828040
## 10 0.299210 0.06061 0.564573 -0.12200 -0.091261 0.461350
## 11 0.470590 -0.10233 -0.100020 0.31166 0.343997 0.217288
## 12 0.479878 -0.05742 -0.117576 0.30965 0.375923 0.191067
## 13 0.486974 0.03698 -0.429138 0.55813 0.050055 0.110758
## 14 0.373684 0.16616 -0.209667 0.63367 -0.181029 0.289613
## 15 0.277984 0.25815 0.075162 0.61382 -0.490132 0.026548
## 16 0.076037 0.48507 0.102428 0.32702 -0.598708 -0.551888
## 17 -0.056099 0.44027 -0.008316 0.42040 -0.063619 -0.422902
## 18 -0.138360 0.40056 0.006109 0.39558 -0.005006 -0.202673
## 19 -0.273219 0.33436 0.144602 0.08318 0.176567 0.023573
## 20 -0.383501 0.21441 0.030917 -0.03750 0.161033 -0.009409
## 21 -0.414203 0.22676 -0.110283 -0.12803 0.156309 0.088863
## 22 -0.448647 0.16664 -0.159017 -0.12942 0.103049 -0.008044
## 23 -0.244257 -1.03990 0.345313 0.42836 0.062931 -0.378721
## 24 -0.361605 -0.78826 -0.007626 0.30578 0.222002 0.528274
## 25 -0.328522 -0.56914 0.211636 0.02271 -1.012465 0.829481
## 26 -0.446605 0.02551 -0.126306 -0.13538 0.213013 0.226330
## 27 -0.449471 0.11911 -0.170655 -0.13637 0.182680 0.072495
## 28 -0.451333 0.11548 -0.200522 -0.14906 0.242752 0.055270
## 29 -0.360249 0.26693 -0.299234 -0.01005 0.120624 0.101802
## 30 -0.472505 0.16141 -0.333315 -0.20807 0.058535 -0.094556
# summary(spe.h.pca, diplay=NULL)# only axis eigenvalues
# and contribution
CA results are presented in a similar manner as PCA results. You can see here that CA1 explains 51.50% of the variation in species abundances, while CA2 explain 12.37% of the variation.
par(mfrow = c(1, 2))
#### scaling 1
plot(spe.ca, scaling = 1, type = "none", main = "CA - biplot scaling 1",
xlab = c("CA1 (%)", round((spe.ca$CA$eig[1]/sum(spe.ca$CA$eig)) *
100, 2)), ylab = c("CA2 (%)", round((spe.ca$CA$eig[2]/sum(spe.ca$CA$eig)) *
100, 2)))
points(scores(spe.ca, display = "sites", choices = c(1, 2), scaling = 1),
pch = 21, col = "black", bg = "steelblue", cex = 1.2)
text(scores(spe.ca, display = "species", choices = c(1), scaling = 1),
scores(spe.ca, display = "species", choices = c(2), scaling = 1),
labels = rownames(scores(spe.ca, display = "species", scaling = 1)),
col = "red", cex = 0.8)
#### scaling 2
plot(spe.ca, scaling = 1, type = "none", main = "CA - biplot scaling 2",
xlab = c("CA1 (%)", round((spe.ca$CA$eig[1]/sum(spe.ca$CA$eig)) *
100, 2)), ylab = c("CA2 (%)", round((spe.ca$CA$eig[2]/sum(spe.ca$CA$eig)) *
100, 2)), ylim = c(-2, 3))
points(scores(spe.ca, display = "sites", choices = c(1, 2), scaling = 2),
pch = 21, col = "black", bg = "steelblue", cex = 1.2)
text(scores(spe.ca, display = "species", choices = c(1), scaling = 2),
scores(spe.ca, display = "species", choices = c(2), scaling = 2),
labels = rownames(scores(spe.ca, display = "species", scaling = 2)),
col = "red", cex = 0.8)
These biplots show that a group of sites located in the left part with similar fish community characterized by numerous species such as GAR, TAN, PER, ROT, PSO and CAR; in the upper right corner, an other site cluster characterized by the species LOC, VAI and TRU is identified; the last site cluster in the lower right corner of the biplot is characterized by the species BLA, CHA and OMB.
Challenge 4 Run a CA of the “mite” species abundance data. What are the significant axes of variation? Which groups of sites can you identify? Which species are related to each group of sites?
Challenge 4 - Solution
# CA on mite species
<- mite
mite.spe <- cca(mite.spe)
mite.spe.ca
# What are the significant axes ?
<- mite.spe.ca$CA$eig
ev > mean(ev)] ev[ev
## CA1 CA2 CA3 CA4 CA5 CA6 CA7
## 0.52511362 0.22727580 0.17401743 0.12661241 0.08621687 0.07484890 0.06694738
## CA8
## 0.05061316
= length(ev)
n barplot(ev, main = "Eigenvalues", col = "grey", las = 2)
abline(h = mean(ev), col = "red")
legend("topright", "Average eigenvalue", lwd = 1, col = 2, bty = "n")
# Output summary/results
summary(mite.spe.ca, display = NULL)
##
## Call:
## cca(X = mite.spe)
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 1.696 1
## Unconstrained 1.696 1
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7
## Eigenvalue 0.5251 0.2273 0.1740 0.12661 0.08622 0.07485 0.06695
## Proportion Explained 0.3096 0.1340 0.1026 0.07465 0.05083 0.04413 0.03947
## Cumulative Proportion 0.3096 0.4436 0.5462 0.62088 0.67171 0.71584 0.75532
## CA8 CA9 CA10 CA11 CA12 CA13 CA14
## Eigenvalue 0.05061 0.04373 0.03323 0.03184 0.02970 0.02830 0.02652
## Proportion Explained 0.02984 0.02578 0.01959 0.01877 0.01751 0.01668 0.01564
## Cumulative Proportion 0.78516 0.81094 0.83054 0.84931 0.86682 0.88350 0.89914
## CA15 CA16 CA17 CA18 CA19 CA20
## Eigenvalue 0.02224 0.01978 0.016598 0.015868 0.014777 0.012515
## Proportion Explained 0.01311 0.01166 0.009787 0.009356 0.008713 0.007379
## Cumulative Proportion 0.91226 0.92392 0.933705 0.943061 0.951774 0.959153
## CA21 CA22 CA23 CA24 CA25 CA26
## Eigenvalue 0.010378 0.009262 0.007844 0.006765 0.006606 0.005757
## Proportion Explained 0.006119 0.005461 0.004625 0.003989 0.003895 0.003394
## Cumulative Proportion 0.965272 0.970733 0.975358 0.979347 0.983242 0.986636
## CA27 CA28 CA29 CA30 CA31 CA32
## Eigenvalue 0.005236 0.004512 0.003406 0.002828 0.002486 0.001867
## Proportion Explained 0.003087 0.002661 0.002008 0.001667 0.001466 0.001101
## Cumulative Proportion 0.989723 0.992384 0.994392 0.996059 0.997525 0.998625
## CA33 CA34
## Eigenvalue 0.0012956 0.0010357
## Proportion Explained 0.0007639 0.0006106
## Cumulative Proportion 0.9993894 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
# Plot the biplot
plot(mite.spe.ca, scaling = 1, type = "none", xlab = c("PC1 (%)",
round((mite.spe.ca$CA$eig[1]/sum(mite.spe.ca$CA$eig)) * 100,
2)), ylab = c("PC2 (%)", round((mite.spe.ca$CA$eig[2]/sum(mite.spe.ca$CA$eig)) *
100, 2)))
points(scores(mite.spe.ca, display = "sites", choices = c(1,
2), scaling = 1), pch = 21, col = "black", bg = "steelblue",
cex = 1.2)
text(scores(mite.spe.ca, display = "species", choices = c(1),
scaling = 1), scores(mite.spe.ca, display = "species", choices = c(2),
scaling = 1), labels = rownames(scores(mite.spe.ca, display = "species",
scaling = 1)), col = "red", cex = 0.8)