class: center, middle, inverse, title-slide # Workshop 9: Multivariate analyses ## QCBS R Workshop Series ### Québec Centre for Biodiversity Science --- class: inverse, center, middle # About this workshop [](https://github.com/QCBSRworkshops/workshop09) [](https://wiki.qcbs.ca/r_workshop9) [](https://qcbsrworkshops.github.io/workshop09/workshop09-en/workshop09-en.html) [](https://qcbsrworkshops.github.io/workshop09/workshop09-en/workshop09-en.pdf) [](https://qcbsrworkshops.github.io/workshop09/workshop09-en/workshop09-en.R) --- # Required packages * [ape](https://cran.r-project.org/package=ape) * [gclus](https://cran.r-project.org/package=gclus) * [vegan](https://cran.r-project.org/package=vegan) <br> ```R install.packages(c('ape', 'gclus', 'vegan')) ``` --- # Learning objectives ##### Use R to perform an unconstrained ordination #### Use R to create dendrogram --- class: inverse, center, middle # 1. Introduction ## What is ordination? --- # One Dimension What if we are interested in this response for different species of algae involved in the algal bloom density? .center[] --- # Two Dimensions .center[] --- # Three Dimensions .center[ ] --- # 4,5,6, or more Dimensions .center[] --- # Ordination in reduced space .center[] --- # Ordination in reduced space .center[] - Matrix algebra is complex and hard to understand - A global understanding is enough in order to use ordination methods adequately --- # Methods for scientific research -- - **Questions / Hypothesis** -- - **Experimental design** -- - **Data Collection** -- - **Transformation / Distance** -- - **Analysis** -- - **Redaction** -- - **Communication** --- class: inverse, center, middle # 2. Exploring data --- # Doubs River Fish Dataset .pull-left[ Verneaux (1973) dataset: - characterization of fish communities - 27 different species - 30 different sites - 11 environmental variables ] .pull.right[  ] --- # Doubs River Fish Dataset Load the Doubs River species data (Doubs.Spe.csv) ```r spe <- read.csv("data/doubsspe.csv", row.names = 1) spe <- spe[-8,] # remove site with no data ``` Load the Doubs River environmental data (Doubs.Env.csv) ```r env <- read.csv("data/doubsenv.csv", row.names = 1) env <- env[-8,] # remove site with no data ``` .alert[Proceed with caution, only execute once] --- # Expore Doubs Dataset Explore the content of the fish community dataset ```r names(spe) # Names of objects dim(spe) # dimensions str(spe) # structure of objects summary(spe) # summary statistics head(spe) # first 6 rows ``` ``` # CHA TRU VAI LOC OMB BLA HOT TOX VAN CHE BAR SPI GOU BRO PER BOU PSO ROT CAR # 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 2 0 5 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 3 0 5 5 5 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 # 4 0 4 5 5 0 0 0 0 0 1 0 0 1 2 2 0 0 0 0 # 5 0 2 3 2 0 0 0 0 5 2 0 0 2 4 4 0 0 2 0 # 6 0 3 4 5 0 0 0 0 1 2 0 0 1 1 1 0 0 0 0 # TAN BCO PCH GRE GAR BBO ABL ANG # 1 0 0 0 0 0 0 0 0 # 2 0 0 0 0 0 0 0 0 # 3 0 0 0 0 0 0 0 0 # 4 1 0 0 0 0 0 0 0 # 5 3 0 0 0 5 0 0 0 # 6 2 0 0 0 1 0 0 0 ``` --- # Species Frequencies Take a look at the distribution of species frequencies ```r ab <- table(unlist(spe)) barplot(ab, las = 1, col = grey(5:0/5), xlab = "Abundance class", ylab = "Frequency") ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-5-1.png" width="576" style="display: block; margin: auto;" /> .alert[Note the proportion of 0s] --- # Species Frequencies How many zeros? ```r sum(spe == 0) # [1] 408 ``` What proportion of zeros? ```r sum(spe == 0)/(nrow(spe)*ncol(spe)) # [1] 0.5210728 ``` --- # Total Species Richness Visualize how many species are present at each site: ```r site.pre <- rowSums(spe > 0) barplot(site.pre, main = "Species richness", xlab = "Sites", ylab = "Number of species", col = "grey ", las = 1) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-8-1.png" width="720" style="display: block; margin: auto;" /> --- # Understand your data! .center[...to choose the appropiate transformation and distance] - Are there many zeros? - What do they mean? .alert[A measured 0 (e.g 0mg/L, 0°C) is not the same than a 0 representing an absence observations] --- # Before transforming your community data... .alert[Important considerations:] -- - relative abundances/counts/presence-absence? -- - asymmetrical distributions? -- - many rare species? -- - overabundance of dominant species? -- - double Zero problem? --- # Transforming community data .center[ ] --- # Transforming your community data ## Examples Transforming counts into presence - absence ```r library(vegan) spec.pa <- decostand(spe, method = "pa") ``` Reducing the weight of rare species ```r spec.hel <- decostand(spe, method = "hellinger") spec.chi <- decostand(spe, method = "chi.square") ``` Reducing the weight of very abundant species ```r spe.pa <- decostand(spe,method = "log") ``` --- # Doubs Environmental Data ```r names(env) # Names of objects dim(env) # dimensions str(env) # structure of objects summary(env) # summary statistics head(env) # first 6 rows ``` ```r head(env) # first 6 rows # das alt pen deb pH dur pho nit amm oxy dbo # 1 0.3 934 48.0 0.84 7.9 45 0.01 0.20 0.00 12.2 2.7 # 2 2.2 932 3.0 1.00 8.0 40 0.02 0.20 0.10 10.3 1.9 # 3 10.2 914 3.7 1.80 8.3 52 0.05 0.22 0.05 10.5 3.5 # 4 18.5 854 3.2 2.53 8.0 72 0.10 0.21 0.00 11.0 1.3 # 5 21.5 849 2.3 2.64 8.1 84 0.38 0.52 0.20 8.0 6.2 # 6 32.4 846 3.2 2.86 7.9 60 0.20 0.15 0.00 10.2 5.3 ``` Explore colinearity by visualizing correlations between variables ```r pairs(env, main = "Bivariate Plots of the Environmental Data") ``` --- # Doubs Environmental Data  --- # Standardization Standardizing environmental variables is crucial as you cannot compare the effects of variables with different units ```r ## ?decostand env.z <- decostand(env, method = "standardize") ``` This centers and scales the variables to make your downstream analysis more appropriate ```r apply(env.z, 2, mean) # das alt pen deb pH # -7.959539e-17 -4.795165e-17 2.494600e-17 -7.323225e-17 -1.730430e-15 # dur pho nit amm oxy # -2.028505e-16 4.445790e-17 2.875893e-17 2.754434e-17 -4.038167e-16 # dbo # 9.829975e-17 apply(env.z, 2, sd) # das alt pen deb pH dur pho nit amm oxy dbo # 1 1 1 1 1 1 1 1 1 1 1 ``` --- class: inverse, center, middle # 3. Similarity / Dissimilarity --- # Association measures Matrix algebra is at the heart of all ordinations .center[] - Exploring various measures of distance between objects provides some understanding of the engine under the hood --- # Breaking out of 1D .pull-left[ - As you have seen, ecological datasets can sometimes be very large matrices - Ordinations compute the relationships between species or between sites - We can simplify these relationships using methods of dissimilarity ] .pull-right[    ] --- # Similarity / Dissimilarity - Useful to understand your dataset - Appropriate measure required by some types of ordinations .center[ Similarity: S = 1 - D Distance: D = 1-S]  --- # Community distance measures .pull-left[ - Euclidean - Manhattan - Chord ] .pull-right[ - Hellinger - Chi-square - Bray-Curtis ] -- <br/> <br/> <br/> .alert[Each of these will be useful in different situations] --- # Comparing Doubs Sites The `vegdist()` function contains all common distances ```r ?vegdist ``` How different is the community composition across the 30 sites of the Doubs River? ```r spe.db.pa <- vegdist(spe, method = "bray") ``` --- # Comparing Doubs Sites .center[] --- # Comparing Doubs Sites .center[] --- # Visualization of distance matrices .center[] --- # Challenge #1 ![:cube]() <br/> Discuss with your neighbor: <br/> .center[**How can we tell how similar objects are when we have multivariate data?**] <br/> - Make a list of all your suggestions --- # And what about ordination? With ordination methods, we order our objects (site) according to their similarity - The more the sites are similar, the closer they are in the ordination space (smaller distances) - In Ecology, we usually calculate the similarity between sites according to their species composition or their environmental conditions. --- # Schematic analysis of multivariate analysis .center[ ] --- # Clustering - To highlight structures in the data by partitioning either objects or the descriptors - Results are represented as dendrograms (trees) - Not a statistical method .center[ ] --- # Overview of 3 hierarchical methods <br> - Single linkage agglomerative clustering <br> - Complete linkage, agglomerative clustering <br> - Ward's minimum variance clustering <br> - Elements of lower are nested in higher ranking clusters - (e.g. species, genus, family, order) --- # Hierarchical methods A distance matrix is first sorted in increasing distance order  --- # Single linkage clustering .pull-left[  -- ] .pull-right[ - The two closest objects merge - The next two closest objects/clusters merge - and so on  ] --- # Complete linkage clustering .pull-left[  ] .pull-right[ - The two closest objects merge - The next two objects/cluster will agglomerate when linked to the furthest element of the group  ] --- # Comparison Create a distance matrix from Hellinger transformed Doubs river data and compute the single linkage clustering ```r spe.dhe1 <- vegdist(spec.hel, method = "euclidean") spe.dhe1.single <- hclust(spe.dhe1, method = "single") plot(spe.dhe1.single) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto;" /> --- # Comparison  .pull-left[ **Single linkage:** Chains of objects occur (e.g. 19,29,30,26) ] .pull-right[ **Complete linkage:** Contrasted groups are formed of objects occur ] --- # Ward's minimum variance method - Uses the criterion of least squares to cluster objects into groups - At each step, the pair of clusters merging is the one leading to the minimum increase in total within-group sum of squares --- # Ward's method Compute the Ward's minimum variance clustering and plot the dendrogram by using the square root of the distances: ```r spe.dhel.ward <- hclust(spe.dhe1, method = "ward.D2") spe.dhel.ward$height <- sqrt(spe.dhel.ward$height) plot(spe.dhel.ward, hang = -1) # hang = -1 aligns objects at the same level ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-20-1.png" width="648" style="display: block; margin: auto;" /> --- # Ward's method <img src="workshop09-en_files/figure-html/unnamed-chunk-21-1.png" width="648" style="display: block; margin: auto;" /> Clusters generated using this method tend to be more spherical and to contain similar number of objects --- # How to choose the right method? - Depends on the objective - highlights gradients? contrasts? - If more than on method seems appropriate, compare dendrograms - Again: this is **not** an statistical method But! is possible to: - determine the optimal number of interpretable clusters - compute clustering statistics - combine clustering to ordination to distinguish groups of sites --- class: inverse, center, middle # 4. Unconstrained ordination --- # Definitions -- - **Variance:** measure of a variable `\(y_j\)` dispersion from its mean -- - **Co-variance:** measure of co-dispersion of variables `\(y_j\)` et `\(y_i\)` from their means -- - **Correlation:** measure of the link strength between 2 variables: `\(r_{ij} = (d_{ij} / d_j x d_k)\)` -- - **Eigenvalues:** Proportio of variance (dispersion) represented by one ordination axe. -- - **Orthogonality:** right angle between 2 axis or 2 arrows which means that these 2 are independent = non correlated. -- - **Score:** position of a dot on an axis. All the scores of a dot give its coordinates in the multidimensional space. They can be used as new variable for other analyses (e.g. linear combination of measured variables). -- - **Dispersion** (inertia): Measure of the total variability of the scatter plot (descriptors) in the multidimensional space with regards to its center of gravity. --- # Unconstrained ordination - Asses relationships **within** a set of variables (species or environmental variables, not **between** sets, i.e. constrained analysis) - Find key components of variation between samples, sites, species, etc... ç - Reduce the number of dimensions in multivariate data without substantial loss of information - Create new variables for use in subsequent analysis (such as regression) --- # 4.1. Principal Component Analysis (PCA) .center[ ] - Preserves, in 2D, the maximum amount of variation in the data - The resulting, synthetic variables are orthogonal (and therefore uncorrelated) --- # PCA - What you need - A set of variables that are response variables (e.g. community composition) OR explanatory variables (e.g. environmental variables) **NOT BOTH!** .pull-left[ - Samples that are measured for the same set of variables - Generally a dataset that is longer than it is wide is preferred ] .pull-right[  ] --- # PCA - Walkthrough <br/><br/> |Site|Species 1| Species 2| |---|------|------| |A|7|3| |B|4|3| |C|12|10| |D|23|11| |E|13|13| |F|15|16| |G|18|14| .alert[ A simplified example ] --- # PCA - Walkthrough .center[ ] .small[ .alert[In 2D, we would plot the sites like this... Notice the dispersion in the scatterplot]] --- # PCA - Walkthrough .center[ ] .small[ .alert[Our first component is essentially drawn trough the maximum amount of observed variation... or the best fit line through the points]] --- # PCA - Walkthrough .center[ ] .small[ .alert[A second principal component is then added perpendicular (90 degrees in 2D) to the first axis]] --- # PCA - Walkthrough .center[ ] .small[ The final plot then is the two PC axes rotated where the axes are now principal components as opposed to species] --- # PCA - Multidimensional case <br/><br/> - **PC1** --> axis that maximizes the variance of the points that are projected perpendicularly onto the axis. - **PC2** --> must be perpendicular to PC1, but the direction is again the one in which variance is maximized when points are perpendicularly projected - **PC3** --> and so on: perpendicular to the first two axes <br/> .alert[When there are more than two dimensions, PCA produces a new spaces in which all PCA axes are orthogonal (i.e. non-correlated) and where the PCA axes are ordered according to the percent of variance of the original data they explain] --- # PCA - Let's try it on Fish Species! - For both PCA and RDA, we will be using the `rda()` function in the vegan package - Run a PCA on the Hellinger-transformed fish data and extract the results ```r spe.h.pca <- rda(spec.hel) summary(spe.h.pca) # # Call: # rda(X = spec.hel) # # Partitioning of variance: # Inertia Proportion # Total 0.5025 1 # Unconstrained 0.5025 1 # # Eigenvalues, and their contribution to the variance # # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Eigenvalue 0.2580 0.06424 0.04632 0.03850 0.02197 0.01675 0.01472 # Proportion Explained 0.5133 0.12784 0.09218 0.07662 0.04371 0.03334 0.02930 # Cumulative Proportion 0.5133 0.64118 0.73337 0.80999 0.85370 0.88704 0.91634 # PC8 PC9 PC10 PC11 PC12 PC13 # Eigenvalue 0.01156 0.006936 0.006019 0.004412 0.002982 0.002713 # Proportion Explained 0.02300 0.013803 0.011978 0.008781 0.005935 0.005399 # Cumulative Proportion 0.93934 0.953144 0.965123 0.973903 0.979838 0.985237 # PC14 PC15 PC16 PC17 PC18 PC19 # Eigenvalue 0.001835 0.001455 0.001118 0.0008309 0.0005415 0.0004755 # Proportion Explained 0.003651 0.002895 0.002225 0.0016535 0.0010776 0.0009463 # Cumulative Proportion 0.988888 0.991783 0.994008 0.9956612 0.9967389 0.9976852 # PC20 PC21 PC22 PC23 PC24 # Eigenvalue 0.0003680 0.0002765 0.0002253 0.0001429 7.618e-05 # Proportion Explained 0.0007324 0.0005503 0.0004483 0.0002845 1.516e-04 # Cumulative Proportion 0.9984176 0.9989678 0.9994161 0.9997006 9.999e-01 # PC25 PC26 PC27 # Eigenvalue 4.99e-05 1.526e-05 9.118e-06 # Proportion Explained 9.93e-05 3.036e-05 1.814e-05 # Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 # # Scaling 2 for species and site scores # * Species are scaled proportional to eigenvalues # * Sites are unscaled: weighted dispersion equal on all dimensions # * General scaling constant of scores: 1.93676 # # # Species scores # # PC1 PC2 PC3 PC4 PC5 PC6 # CHA 0.17336 0.08295 -0.064963 0.2539861 -0.0285801 0.019057 # TRU 0.64860 0.01162 -0.261994 -0.1606020 -0.0745819 -0.088616 # VAI 0.51810 0.14773 0.165304 0.0241017 0.1012928 0.104748 # LOC 0.38606 0.16615 0.242995 -0.0275216 0.1258011 0.048299 # OMB 0.16893 0.06274 -0.096143 0.2426514 0.0140574 0.062117 # BLA 0.07786 0.14644 -0.031402 0.2339394 -0.1032338 -0.040810 # HOT -0.18491 0.04901 -0.045107 0.0199377 0.0687305 0.009650 # TOX -0.14644 0.17834 -0.010937 0.0649955 -0.0006229 -0.106955 # VAN -0.11436 0.15673 0.142223 -0.0127266 -0.1989404 0.013897 # CHE -0.09682 -0.15449 0.242943 0.1124210 0.0233830 -0.039996 # BAR -0.19826 0.21211 -0.053980 0.0969899 0.0067098 -0.035442 # SPI -0.17689 0.16250 -0.033112 0.0397113 0.0323159 -0.072908 # GOU -0.23138 0.09782 0.064144 -0.0013887 -0.1503303 0.130575 # BRO -0.15129 0.12804 0.040303 -0.1203826 -0.1006077 0.066242 # PER -0.15719 0.18144 0.057029 -0.0940032 -0.0412984 -0.060409 # BOU -0.22853 0.13870 -0.062197 -0.0125024 0.0798647 -0.006907 # PSO -0.22790 0.08231 -0.065797 0.0172143 0.0611434 -0.001407 # ROT -0.19221 0.03090 -0.006264 -0.0739133 -0.0731548 0.074581 # CAR -0.18699 0.13388 -0.050804 0.0001803 0.0403961 -0.031005 # TAN -0.19169 0.15719 0.114415 -0.0818330 0.0142624 -0.072024 # BCO -0.20174 0.08807 -0.067086 -0.0529106 0.0737228 0.037312 # PCH -0.14717 0.05829 -0.067311 -0.0458414 0.0501013 0.031605 # GRE -0.30155 -0.01785 -0.084333 -0.0181797 0.0226500 0.126639 # GAR -0.35245 -0.14076 0.168014 0.0185946 0.0213462 -0.129788 # BBO -0.24317 0.03679 -0.082731 -0.0384489 0.0939828 0.063369 # ABL -0.42536 -0.26155 -0.054190 0.1021959 -0.0078085 0.044540 # ANG -0.20631 0.11889 -0.062079 -0.0175733 0.0718743 -0.001956 # # # Site scores (weighted sums of species scores) # # PC1 PC2 PC3 PC4 PC5 PC6 # 1 0.367401 -0.39935 -1.08857 -0.63304 -0.512027 -0.858378 # 2 0.503582 -0.05683 -0.19259 -0.43441 0.389533 0.069451 # 3 0.461709 0.02262 -0.06522 -0.49798 0.309425 0.270577 # 4 0.298336 0.15130 0.26748 -0.53196 0.003088 0.184821 # 5 -0.002222 0.07631 0.54769 -0.50936 -0.780261 -0.169353 # 6 0.212816 0.08345 0.55091 -0.42210 -0.139518 -0.104278 # 7 0.438055 -0.06114 0.15590 -0.31150 0.158686 0.036565 # 9 0.040794 -0.44269 0.89022 0.09609 0.641193 -0.646943 # 10 0.298011 -0.01094 0.56837 -0.10013 -0.088124 0.515072 # 11 0.467609 -0.12622 -0.15505 0.29459 0.325464 0.200912 # 12 0.476845 -0.07691 -0.16329 0.29384 0.360112 0.194576 # 13 0.483620 0.06649 -0.44723 0.53734 0.048587 0.182565 # 14 0.371728 0.16555 -0.21939 0.62130 -0.183604 0.364847 # 15 0.277048 0.23525 0.08928 0.61773 -0.475769 0.124107 # 16 0.077024 0.47455 0.17116 0.34361 -0.570434 -0.572740 # 17 -0.053860 0.42290 0.02810 0.42376 -0.059203 -0.586419 # 18 -0.135418 0.37780 0.03233 0.39706 -0.007199 -0.347064 # 19 -0.269281 0.30751 0.18022 0.09354 0.178657 -0.016299 # 20 -0.378830 0.19764 0.04939 -0.03438 0.157660 -0.056696 # 21 -0.409369 0.22888 -0.08401 -0.12823 0.152787 0.096105 # 22 -0.443679 0.17698 -0.13708 -0.13152 0.103294 0.030004 # 23 -0.242292 -1.11711 0.15254 0.40512 0.045573 -0.576778 # 24 -0.358333 -0.83372 -0.17314 0.27200 0.181192 0.347231 # 25 -0.325288 -0.61983 0.10487 0.01059 -1.034438 0.750325 # 26 -0.441703 0.02111 -0.13742 -0.14346 0.200775 0.244356 # 27 -0.444529 0.12735 -0.15915 -0.14112 0.179240 0.123487 # 28 -0.446407 0.12774 -0.18830 -0.15467 0.239617 0.117101 # 29 -0.355788 0.28044 -0.28006 -0.02003 0.110181 0.079568 # 30 -0.467578 0.20086 -0.29797 -0.21269 0.065512 0.003276 ``` --- # Function `rda()` - RDA is in 2 steps - multiple regressions - PCA on regressed values - If we give only one table to the function `rda()` it does directly a PCA without doing regression .center[ .alert[ rda(Y~X)  RDA rda(Y) or rda(X)  PCA ]] --- #PCA - Interpretation of Output .center[ ] - Total variance explained by the descriptors (here the fish species) - In PCA, not that the "Total" and "Unconstrained" portion of the explained variance is identical --- # PCA - Interpretation of Output <br/> .center[  ] - List the eigenvalues associated to each Principal Component (in this output there are 27 PCs, as this is the number of dimensions in the data) <br/> .alert[ An eigenvalue is the value of the change in the length of a vector, and for our purposes is the amount of variation explained by Principal Component ] <br/> .center[0.258 + 0.064 + ... = 0.5025 Total explained variance] --- # PCA - Interpretation of Output   - List of the proportion of variance explained by each Principal Component (as well as cumulative) <br/> .center[51.3% of 0.5025 is 0.258] --- # PCA - Interpretation of Output  - There are two ways to represent an ordination in 2D, here the output is informing us that it used the default scaling, which is type 2... <br/><br/> -- More on this later! --- # PCA - Interpretation of Output  - Species refers to your descriptors (i.e. the columns in your dataset), which here are the Fish species - Scores refer to the position of every species in the PC. Essentially they are the coordinates of each species along the principal component --- # PCA - Interpretation of Output  - Site refers to the rows in your dataset, which here are the different sites along the Doubs river (but it can be points in time, etc) --- # Accessing Parts of the Output The output is very dense, but you can access specific information if needed. For example, you can access the eigenvalues associated contribution to variance : ```r summary(spe.h.pca, display = NULL) # # Call: # rda(X = spec.hel) # # Partitioning of variance: # Inertia Proportion # Total 0.5025 1 # Unconstrained 0.5025 1 # # Eigenvalues, and their contribution to the variance # # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Eigenvalue 0.2580 0.06424 0.04632 0.03850 0.02197 0.01675 0.01472 # Proportion Explained 0.5133 0.12784 0.09218 0.07662 0.04371 0.03334 0.02930 # Cumulative Proportion 0.5133 0.64118 0.73337 0.80999 0.85370 0.88704 0.91634 # PC8 PC9 PC10 PC11 PC12 PC13 # Eigenvalue 0.01156 0.006936 0.006019 0.004412 0.002982 0.002713 # Proportion Explained 0.02300 0.013803 0.011978 0.008781 0.005935 0.005399 # Cumulative Proportion 0.93934 0.953144 0.965123 0.973903 0.979838 0.985237 # PC14 PC15 PC16 PC17 PC18 PC19 # Eigenvalue 0.001835 0.001455 0.001118 0.0008309 0.0005415 0.0004755 # Proportion Explained 0.003651 0.002895 0.002225 0.0016535 0.0010776 0.0009463 # Cumulative Proportion 0.988888 0.991783 0.994008 0.9956612 0.9967389 0.9976852 # PC20 PC21 PC22 PC23 PC24 # Eigenvalue 0.0003680 0.0002765 0.0002253 0.0001429 7.618e-05 # Proportion Explained 0.0007324 0.0005503 0.0004483 0.0002845 1.516e-04 # Cumulative Proportion 0.9984176 0.9989678 0.9994161 0.9997006 9.999e-01 # PC25 PC26 PC27 # Eigenvalue 4.99e-05 1.526e-05 9.118e-06 # Proportion Explained 9.93e-05 3.036e-05 1.814e-05 # Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 # # Scaling 2 for species and site scores # * Species are scaled proportional to eigenvalues # * Sites are unscaled: weighted dispersion equal on all dimensions # * General scaling constant of scores: ``` --- # Accessing Parts of the Output You can calculate the eigenvalues from scratch ```r eigen(cov(spec.hel)) # eigen() decomposition # $values # [1] 2.579605e-01 6.424089e-02 4.632294e-02 3.850244e-02 2.196526e-02 # [6] 1.675463e-02 1.472430e-02 1.155759e-02 6.936149e-03 6.019271e-03 # [11] 4.412388e-03 2.982309e-03 2.713021e-03 1.834874e-03 1.454670e-03 # [16] 1.117858e-03 8.308832e-04 5.415301e-04 4.755244e-04 3.680458e-04 # [21] 2.765106e-04 2.252760e-04 1.429425e-04 7.618319e-05 4.989831e-05 # [26] 1.525627e-05 9.117507e-06 # # $vectors # [,1] [,2] [,3] [,4] [,5] # [1,] -0.12492725 -0.11979088 -0.11047444 0.4737644443 0.070581708 # [2,] -0.46740781 -0.01678206 -0.44554311 -0.2995735541 0.184188349 # [3,] -0.37336215 -0.21333150 0.28111355 0.0449572376 -0.250153773 # [4,] -0.27821421 -0.23994030 0.41323337 -0.0513364598 -0.310679865 # [5,] -0.12173642 -0.09059800 -0.16349912 0.4526216196 -0.034716207 # [6,] -0.05610722 -0.21147318 -0.05340233 0.4363710457 0.254947303 # [7,] 0.13325245 -0.07077305 -0.07670861 0.0371901204 -0.169737613 # [8,] 0.10553143 -0.25754282 -0.01860002 0.1212372044 0.001538404 # [9,] 0.08240964 -0.22633305 0.24186207 -0.0237391101 0.491305413 # [10,] 0.06977391 0.22309480 0.41314626 0.2097008258 -0.057746857 # [11,] 0.14287815 -0.30629980 -0.09179824 0.1809168920 -0.016570564 # [12,] 0.12747277 -0.23466960 -0.05630920 0.0740741884 -0.079807600 # [13,] 0.16674041 -0.14126479 0.10908251 -0.0025902725 0.371257345 # [14,] 0.10902847 -0.18489900 0.06853886 -0.2245517214 0.248461973 # [15,] 0.11327432 -0.26201530 0.09698255 -0.1753458131 0.101990914 # [16,] 0.16468619 -0.20028850 -0.10577151 -0.0233209939 -0.197234780 # [17,] 0.16423411 -0.11885539 -0.11189266 0.0321101179 -0.151000389 # [18,] 0.13851267 -0.04462111 -0.01065276 -0.1378716381 0.180663942 # [19,] 0.13475615 -0.19333722 -0.08639631 0.0003363296 -0.099762622 # [20,] 0.13814201 -0.22699492 0.19457272 -0.1526445449 -0.035222643 # [21,] 0.14538492 -0.12717573 -0.11408552 -0.0986949356 -0.182066526 # [22,] 0.10605502 -0.08418031 -0.11446859 -0.0855088113 -0.123730627 # [23,] 0.21730821 0.02578243 -0.14341490 -0.0339108533 -0.055936589 # [24,] 0.25399136 0.20327524 0.28572212 0.0346847713 -0.052716826 # [25,] 0.17523797 -0.05313205 -0.14069192 -0.0717194255 -0.232100922 # [26,] 0.30653184 0.37769472 -0.09215443 0.1906277303 0.019284009 # [27,] 0.14867527 -0.17168899 -0.10557045 -0.0327798312 -0.177501675 # [,6] [,7] [,8] [,9] [,10] # [1,] -0.053887689 -0.24810037 -0.012835717 -0.10905228 -0.095263427 # [2,] 0.250578050 0.04342150 0.045242245 -0.01657237 -0.233552880 # [3,] -0.296193043 0.14902952 -0.284313668 -0.21706810 -0.017643581 # [4,] -0.136573518 0.19249109 0.035593297 0.23883835 0.107345114 # [5,] -0.175647314 -0.35879328 -0.125731280 -0.23247198 -0.081584893 # [6,] 0.115397447 0.02602402 -0.008985920 0.46782610 0.253931194 # [7,] -0.027287149 0.17191656 -0.056903256 0.12241555 -0.267914356 # [8,] 0.302432690 0.31491218 -0.121242398 -0.08140984 -0.001023017 # [9,] -0.039296150 0.27420594 0.382252543 -0.34522935 0.075797190 # [10,] 0.113094938 -0.15366272 0.316740555 -0.19574474 -0.422805581 # [11,] 0.100219643 0.05701771 0.110743898 0.11071563 0.013570308 # [12,] 0.206160040 0.20915112 -0.127314825 -0.20541064 -0.018588210 # [13,] -0.369222609 0.06892519 0.050259104 0.21861324 -0.286513706 # [14,] -0.187310981 -0.21267218 -0.459306683 0.10824041 0.047613459 # [15,] 0.170817866 -0.15781651 -0.341981765 -0.13144198 -0.427792205 # [16,] 0.019529799 0.02492789 -0.005265647 -0.19601787 0.034928730 # [17,] 0.003979634 0.12082789 0.073067830 0.16278240 -0.215420671 # [18,] -0.210892020 -0.16084279 -0.049480403 -0.26577752 0.348175747 # [19,] 0.087671511 0.03926802 0.065931818 -0.10755312 0.076887825 # [20,] 0.203659075 -0.36689854 0.033797632 0.19059986 -0.118195939 # [21,] -0.105506691 -0.13126545 0.154555558 -0.06167506 0.189787850 # [22,] -0.089369346 -0.18747260 0.189322337 -0.11164462 0.114566376 # [23,] -0.358094688 0.03849187 0.078583251 0.14414986 -0.155532257 # [24,] 0.366999468 -0.16992091 -0.186036644 0.09141482 0.234279411 # [25,] -0.179186418 -0.05488683 0.175277903 0.16731333 -0.082295798 # [26,] -0.125944557 0.36767940 -0.361679629 -0.10571257 -0.048657458 # [27,] 0.005531863 -0.01735104 0.033932091 -0.17997124 0.055928515 # [,11] [,12] [,13] [,14] [,15] # [1,] 0.064193899 9.272200e-02 -0.16614224 -1.007045e-01 0.52078261 # [2,] 0.006090464 -1.767707e-01 -0.01500439 -2.138553e-01 0.01505748 # [3,] 0.198674624 1.187412e-01 0.22387562 1.401732e-01 -0.03706442 # [4,] -0.181337621 -1.793234e-01 -0.12099866 -3.195848e-01 0.11696265 # [5,] 0.109901333 -1.909575e-01 0.01986808 -1.990958e-01 -0.21073314 # [6,] -0.027035432 1.424063e-01 0.29676760 1.727283e-02 -0.33039629 # [7,] 0.329498373 -1.902815e-01 0.03643482 -3.420115e-02 -0.03199007 # [8,] 0.074565567 2.624298e-01 -0.33066385 2.194720e-01 0.18081461 # [9,] 0.162759574 -1.177892e-02 0.22408341 -3.976988e-01 0.07703691 # [10,] -0.273052803 6.843350e-02 -0.16901293 3.681540e-02 -0.14394441 # [11,] -0.301652467 -3.531787e-01 -0.04554730 2.941090e-02 0.19171054 # [12,] 0.013541172 -9.662652e-02 -0.18461360 5.282229e-02 -0.23627348 # [13,] 0.023653254 -1.620604e-01 -0.08583841 3.682820e-01 -0.12706188 # [14,] -0.342951550 1.478914e-01 -0.29079127 -2.863970e-01 0.00525299 # [15,] 0.019534596 1.804957e-01 0.33595365 1.170536e-02 -0.03963800 # [16,] -0.058319853 -2.819689e-01 -0.01082354 -4.162952e-05 -0.23904975 # [17,] 0.155396405 2.942242e-01 -0.02065819 -2.919909e-01 0.05106059 # [18,] 0.247446631 -1.217100e-01 -0.15658543 1.691032e-01 0.12555708 # [19,] -0.109497442 2.568847e-01 -0.06684604 1.299928e-02 0.05793150 # [20,] 0.054654572 -2.904371e-01 0.21585911 2.487916e-02 0.25823489 # [21,] -0.092448479 9.914451e-05 0.17856405 1.587817e-01 0.15049262 # [22,] -0.232342259 3.477822e-01 0.18661740 -1.375730e-01 -0.18295055 # [23,] 0.205833044 2.307896e-02 -0.29488518 -2.919401e-01 -0.15684106 # [24,] 0.385104579 -7.652965e-02 -0.06032102 -2.649237e-01 -0.07243513 # [25,] 0.178519580 1.478969e-01 0.18711090 2.255094e-02 0.22092863 # [26,] -0.289091422 -1.567421e-01 0.34575306 -1.653173e-01 0.25023307 # [27,] -0.115604287 -1.656960e-01 0.02369596 -1.003356e-01 -0.21515621 # [,16] [,17] [,18] [,19] [,20] # [1,] -0.22297857 -0.029695770 0.0369983843 -0.04070760 -0.073488990 # [2,] 0.11758248 -0.094655191 -0.3039813363 0.11286530 0.299642984 # [3,] 0.16117496 -0.495586427 -0.1044359243 0.12856875 0.013599789 # [4,] -0.16061501 0.426591745 -0.0700878681 0.04243889 0.181772967 # [5,] 0.11416978 0.234783573 -0.2024171014 -0.11756635 -0.011973397 # [6,] -0.02236301 -0.025369853 0.0193321574 0.12426941 0.322316778 # [7,] 0.32588153 0.255676104 0.0836105212 -0.01276921 -0.083323060 # [8,] 0.05278065 0.094429985 -0.4016015744 0.08989816 0.048646388 # [9,] 0.06586493 -0.015362879 -0.0149894684 -0.09279079 -0.102439703 # [10,] 0.21439494 -0.076520520 -0.0438275294 0.10630753 0.384704993 # [11,] 0.40582706 -0.226264310 0.2531720192 0.35500305 -0.198035086 # [12,] -0.22343643 -0.136365379 0.2231584172 -0.31860481 0.285984792 # [13,] -0.05557784 0.044039568 -0.3686499771 -0.15179616 -0.072002029 # [14,] 0.30779187 -0.117211955 -0.0130182734 -0.21646165 0.029597144 # [15,] -0.10526778 0.346563681 0.2121059495 0.30923016 -0.017471685 # [16,] -0.03403174 0.017495879 -0.1220422599 0.06909829 -0.078513734 # [17,] 0.08609131 -0.140926709 0.1227082448 -0.15559985 0.164321594 # [18,] 0.06232076 0.125052490 0.2153262828 0.24582813 0.527750155 # [19,] -0.13539555 0.053580142 -0.1421409249 0.07438860 -0.089636202 # [20,] -0.33914754 -0.320857989 -0.1424350464 -0.08650200 0.115010361 # [21,] 0.15124184 0.128069950 -0.3301085643 0.07436707 0.129946113 # [22,] -0.01113887 -0.009198274 -0.1837530613 0.12666570 -0.081687057 # [23,] -0.31488220 -0.214773321 -0.0321193373 0.49225288 0.015647490 # [24,] 0.18249725 -0.072631142 -0.2987937434 0.07741704 -0.083649110 # [25,] 0.24864024 0.021664187 0.0298929458 -0.29184758 0.250656528 # [26,] -0.10772295 -0.033545836 -0.1814804392 0.04637292 0.214002704 # [27,] -0.11681150 -0.098699646 -0.0002682813 -0.23907156 0.002993534 # [,21] [,22] [,23] [,24] [,25] # [1,] -0.23699102 -0.006673203 0.2150504253 0.16746060 -0.154668929 # [2,] -0.15419656 0.046253218 -0.0067084047 0.09885408 -0.078608286 # [3,] -0.05256293 0.019359564 0.0157935753 0.07403182 -0.007679928 # [4,] -0.11922265 0.034171553 -0.0921754116 -0.02641738 -0.041708919 # [5,] 0.15675411 -0.016382824 -0.2392621523 -0.21441061 0.120130707 # [6,] 0.10304062 0.029520263 0.1445333785 0.12969633 -0.041582122 # [7,] 0.11447334 -0.382187902 0.0345617270 0.52628189 0.130188947 # [8,] 0.35742023 -0.010269888 -0.2170804342 -0.18712889 0.025799589 # [9,] 0.11857691 -0.067729604 0.0696803012 -0.03292835 -0.054141786 # [10,] 0.06410052 0.010662229 0.1013814451 0.12164894 -0.020966944 # [11,] -0.16664663 0.032456581 -0.2018014752 -0.12893277 -0.040193655 # [12,] -0.30440951 -0.426137953 0.0004305712 -0.11427417 -0.222875681 # [13,] -0.38180475 0.127858213 -0.1022442675 0.01866138 0.003199344 # [14,] 0.16015665 -0.092748003 0.1521319540 0.09192202 -0.045663839 # [15,] -0.10920996 0.100235497 0.0558832865 -0.14393925 -0.102532789 # [16,] 0.15076000 0.359021371 0.3335644539 -0.03027825 -0.182409390 # [17,] -0.22199225 0.255910958 0.0211511138 -0.22348748 0.570074992 # [18,] -0.05903632 0.130256109 -0.1931041416 0.13744937 0.157243132 # [19,] -0.09080438 0.206395852 0.0414079719 0.56114128 0.012920746 # [20,] 0.28579768 -0.143344064 -0.1585208609 0.07093030 0.162651785 # [21,] -0.14384975 -0.336903092 0.5426784348 -0.22794946 0.185841056 # [22,] -0.21430560 -0.290323080 -0.4742989109 0.09398279 -0.059637704 # [23,] 0.15837319 -0.135723150 0.0876146386 -0.09325476 -0.146564489 # [24,] -0.36618777 0.089487942 0.0042739214 -0.00175145 -0.159327548 # [25,] 0.11492131 0.195702984 -0.1191717904 -0.08069023 -0.594202836 # [26,] -0.01372967 0.005518255 -0.0851270189 0.04835800 0.010916689 # [27,] 0.02170163 0.299274679 -0.0384912363 0.19527646 0.138899872 # [,26] [,27] # [1,] 0.286444574 0.160115454 # [2,] -0.004346981 0.020253239 # [3,] -0.018510820 0.008667706 # [4,] 0.012547754 0.008152367 # [5,] -0.280152618 -0.176124136 # [6,] 0.050969707 0.059394910 # [7,] 0.173278980 0.050786372 # [8,] 0.094611255 0.166874398 # [9,] -0.034886498 -0.013455934 # [10,] 0.010993922 0.027774588 # [11,] -0.133708827 0.003576285 # [12,] -0.096209440 -0.136306223 # [13,] 0.030967117 -0.001104217 # [14,] 0.030111639 -0.022329185 # [15,] -0.067849606 0.126779482 # [16,] 0.506057486 -0.347584407 # [17,] 0.124873480 -0.177187525 # [18,] 0.102384832 -0.028288166 # [19,] -0.502490829 -0.367340373 # [20,] 0.018992525 -0.141581060 # [21,] -0.158654849 0.124060624 # [22,] 0.393638849 -0.003437816 # [23,] -0.159732210 0.104801206 # [24,] -0.033666811 0.066412841 # [25,] -0.111096800 0.012335431 # [26,] 0.010694156 -0.038642171 # [27,] -0.090405565 0.729577313 ``` --- # Accessing Parts of the Output You may wish to extract the scores (either from species or sites) for use in subsequent analysis or for plotting - Access the species scores along the 1st and 2nd PC: ```r spe.scores <- scores(spe.h.pca, display = "species", choices = c(1,2)) ``` - Access the site scores along the 1st and 2nd PC: ```r site.scores <- scores(spe.h.pca, display = "sites", choices = c(1,2)) ``` --- # Selecting Significant PCs <br> - The strength of PCA is that we can condense the variance contained in a huge dataset into a set of synthetic variables that is manageable - In our case, there are still 27 Principal Components, but only the first few account for any significant amount of variance, while the rest can simply be discarded as noise... -- .alert[How do we manage this?] --- # Kaiser - Guttman criterion Selects principal components which capture more variance than the average of all PCs - Extract the eigenvalues associated to the PCs ```r ev <- spe.h.pca$CA$eig ``` - Select all eigenvalues above average ```r ev[ev>mean(ev)] # PC1 PC2 PC3 PC4 PC5 # 0.25796049 0.06424089 0.04632294 0.03850244 0.02196526 ``` --- # Kaiser - Guttman criterion (visualization) ```r n <- length(ev) barplot(ev, main = "Eigenvalues", col = "grey", las = 2) abline(h = mean(ev), col = "red3", lwd = 2) legend("topright", "Average eigenvalue", lwd = 2, col = "red3" , bty = "n") ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-29-1.png" width="720" style="display: block; margin: auto;" /> --- # PCA - environmental variables We can also run PCAs on standardized environmental variables, to compare sites for example, or how variables are correlated... - Run a PCA on the standardized environmental variables and extract the results ```r env.pca <- rda(env.z) summary(env.pca, scaling = 2) # default # # Call: # rda(X = env.z) # # Partitioning of variance: # Inertia Proportion # Total 11 1 # Unconstrained 11 1 # # Eigenvalues, and their contribution to the variance # # Importance of components: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 # Eigenvalue 6.0980 2.1671 1.03760 0.70351 0.35185 0.31913 0.16455 # Proportion Explained 0.5544 0.1970 0.09433 0.06396 0.03199 0.02901 0.01496 # Cumulative Proportion 0.5544 0.7514 0.84570 0.90966 0.94164 0.97066 0.98561 # PC8 PC9 PC10 PC11 # Eigenvalue 0.11171 0.023109 0.017361 0.0060618 # Proportion Explained 0.01016 0.002101 0.001578 0.0005511 # Cumulative Proportion 0.99577 0.997871 0.999449 1.0000000 # # Scaling 2 for species and site scores # * Species are scaled proportional to eigenvalues # * Sites are unscaled: weighted dispersion equal on all dimensions # * General scaling constant of scores: 4.189264 # # # Species scores # # PC1 PC2 PC3 PC4 PC5 PC6 # das 1.08432 0.5148 -0.257430 -0.16170 0.21140 -0.09500 # alt -1.04356 -0.5946 0.179904 0.12274 0.12527 0.14024 # pen -0.57520 -0.5104 -0.554958 -0.80205 0.02798 0.20064 # deb 0.95767 0.6412 -0.306547 -0.19434 0.18417 0.03031 # pH -0.05863 0.4820 1.034452 -0.51376 0.14431 0.05791 # dur 0.90722 0.6182 -0.022833 0.15761 -0.27763 0.50792 # pho 1.04604 -0.6092 0.187347 -0.11866 -0.15094 0.04919 # nit 1.14317 -0.1290 0.012045 -0.18470 -0.21343 -0.34870 # amm 0.99541 -0.6989 0.186019 -0.08271 -0.19250 -0.04935 # oxy -1.00895 0.4578 -0.009183 -0.23449 -0.50559 -0.05661 # dbo 0.98991 -0.6835 0.119635 0.03647 0.08580 0.21975 # # # Site scores (weighted sums of species scores) # # PC1 PC2 PC3 PC4 PC5 PC6 # 1 -1.41239 -1.47577 -1.74581 -2.95537 0.23122 0.49150 # 2 -1.04170 -0.81766 0.34078 0.54374 0.92518 -1.77040 # 3 -0.94878 -0.48825 1.36061 -0.21762 1.05157 -0.69842 # 4 -0.88068 -0.29459 0.21011 0.66428 -0.23902 -0.06353 # 5 -0.42586 -0.66501 0.77630 0.78778 0.63144 1.17725 # 6 -0.77727 -0.74517 -0.06763 0.90844 0.46895 -0.32998 # 7 -0.78154 -0.09447 0.39332 0.23074 -0.45171 1.17422 # 9 -0.28731 -0.47351 0.29470 1.13214 0.69989 1.05202 # 10 -0.49324 -0.44889 -1.31855 0.78932 -0.38490 0.41677 # 11 -0.28012 0.43092 0.12222 -0.11792 -1.07089 0.46021 # 12 -0.44851 0.33198 -0.53100 0.60347 -0.96624 0.11902 # 13 -0.38853 0.68557 0.10459 0.08106 -1.10784 0.84740 # 14 -0.25000 0.74161 0.88640 -0.46709 -0.96780 0.74880 # 15 -0.31334 0.93929 1.93010 -1.27074 0.06309 0.14747 # 16 -0.14333 0.31109 -0.21270 0.24369 -0.61836 -0.52781 # 17 0.08992 0.29897 -0.18640 0.23392 -0.73322 -0.44217 # 18 0.05684 0.34974 -0.22088 0.14163 -0.76214 -0.60351 # 19 0.04508 0.40785 0.12274 -0.20084 -0.49815 -0.87652 # 20 0.16121 0.36121 -0.28792 -0.05336 -0.79530 -1.36030 # 21 0.16001 0.32539 -0.74769 0.41012 0.17144 -0.90680 # 22 0.14172 0.53543 -0.08102 -0.07015 0.58783 -0.24777 # 23 1.37609 -1.19053 0.74781 -0.35057 -0.22819 0.75871 # 24 0.98255 -0.51442 0.01132 0.40988 1.01286 0.84623 # 25 2.18629 -2.04865 0.35038 -0.29562 -1.26072 -0.38758 # 26 0.88331 -0.11824 -0.64837 0.33902 0.86015 -0.14587 # 27 0.63976 0.39427 -0.15988 -0.30084 1.09735 -0.66720 # 28 0.75826 0.80550 0.51025 -0.96862 0.41900 -0.74380 # 29 0.65317 1.09395 -1.68223 0.37783 0.43878 0.65258 # 30 0.73840 1.36241 -0.27154 -0.62832 1.42571 0.87947 ``` --- # PCA - environmental variables - Extract the eigenvalues associated to the PCs: ```r ev <- env.pca$CA$eig ``` - Select all the eigenvalues above average ```r ev[ev>mean(ev)] # PC1 PC2 PC3 # 6.097995 2.167126 1.037603 ``` --- # PCA - environmental variables - Plot the eigenvalues above average ```r n <- length(ev) barplot(ev, main = "Eigenvalues", col = "grey", las = 2) abline(h = mean(ev), col = "red3", lwd = 2) legend("topright", "Average eigenvalue", lwd = 2, col = "red3" , bty = "n") ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-33-1.png" width="576" style="display: block; margin: auto;" /> --- # PCA - Visualization The abundance of information produced by PCA is easier to understand and interpret using biplots to visualize patterns - We can produce a quick biplot of the PCA using the function `plot()` in base R ```r plot(spe.h.pca) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-34-1.png" width="360" style="display: block; margin: auto;" /> --- # PCA basic biplot with plot() .center[ ] .comment[ `plot()` is quick but its hard to interpret the angles between species ] --- # PCA basic `biplot()` - Using the function `biplot()` from base R, arrows are plotted to show the directionality and angle of the descriptors in the ordination .alert[ .small[ - Descriptors at 180 degrees of each other are negatively correlated - Descriptors at 90 degrees of each other have zero correlation - Descriptors at 0 degrees of each other are positively correlated ]] ```r biplot(spe.h.pca) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-35-1.png" width="324" style="display: block; margin: auto;" /> --- # PCA scaling types  .pull_left[ .small[Type 2 scaling (DEFAULT): distances among objects are not approximations of Euclidean distances; angles between descriptor (species) vectors reflect their correlations.] .alert[ .small[**Best for interpreting relationships among descriptors (species)!**]]] .pull_right[ .small[Type 1 scaling: attempts to preserve the Euclidean distance (in multidimensional space) among objects (sites): the angles among descriptor (species) vector are meaningless.] .alert[ .small[**Best for interpreting relationships among objects (sites)!**]]] --- # Advanced "biplotting" - By extracting specific parts of the PCA output, we can build more detailed and aesthetic plots: ```r plot(spe.h.pca, scaling = 1, type = "none", xlab = c("PC1 (%)", round(spe.h.pca$CA$eig[1]/sum(spe.h.pca$CAeig)*100,2)), ylab = c("PC2 (%)", round(spe.h.pca$CA$eig[2]/sum(spe.h.pca$CA$eig)*100,2))) points(scores(spe.h.pca, display = "sites", choices = c(1,2), scaling = 1), pch=21, col = "black", bg = "steelblue" , cex = 1.2) text(scores(spe.h.pca, display = "species", choices = 1, scaling = 1), scores(spe.h.pca, display = "species", choices = 2, scaling = 1), labels = rownames(scores(spe.h.pca, display = "species", scaling = 1)), col = "red", cex = 0.8) spe.cs <- scores(spe.h.pca, choices = 1:2, scaling = 1 , display = "sp") arrows(0, 0, spe.cs[,1], spe.cs[,2], length = 0) ``` use the `arrows()` function in `graphics` to add vectors --- # Advanced "biplotting" <img src="workshop09-en_files/figure-html/unnamed-chunk-37-1.png" width="504" style="display: block; margin: auto;" /> --- # EASTER EGG: ggvegan - A set of tools for producing biplots using ggplot2 ```r install.packages("devtools") require("devtools") install_github("ggvegan", "gavinsimpson") require("ggvegan") autoplot() ``` --- # EASTER EGG: rgl and vegan 3d Interactive 3D biplots using rgl ```r require(rgl) require(vegan3d) ordirgl(spe.h.pca) ```  --- # Challenge # 3 ![:cube]() Using everything you have learned to execute a PCA on the mite species abundance data ```r data(mite) ``` - What are the significant axes? - Which groups of sites can you identify? - Which groups of species are related to these groups of sites? --- # Solution #3 - Compute PCA on the Hellinger-transformed species data ```r mite.spe.hel <- decostand(mite, method = "hellinger") mite.spe.h.pca <- rda(mite.spe.hel) ``` - Check significant axes using the Guttman-Kaiser criterion ```r ev <- mite.spe.h.pca$CA$eig ev[ev>mean(ev)] n <- length(ev) barplot(ev, main = "Eigenvalues", col = "grey", las = 2) abline(h = mean(ev), col = "red3", lwd = 2) legend("topright", "Average eigenvalue", lwd = 2, col = "red3", bty = "n") ``` --- # Solution #3 <img src="workshop09-en_files/figure-html/unnamed-chunk-43-1.png" width="720" style="display: block; margin: auto;" /> --- # Solution #3 ```r biplot(mite.spe.h.pca, col = c("red3", "grey15")) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-44-1.png" width="504" style="display: block; margin: auto;" /> --- # Warnings - PCA is a linear method and thus relies on a few assumptions -- - multinormal distribution of the data (only if you wish to make inferences) -- - not too many zeros -- - the gradient of interest is causing the majority of the variance in the dataset .alert[ .small[ Violation of these can cause a horseshoe shape in your biplots, where opposite ends of the horseshoe are close together but in reality represent opposite ends of a gradient] ] --- # Warnings - We can avoid some of these problems in PCA by choosing appropriate transformations for your community composition data or environmental data - In some cases, such as studies that cover very large environmental gradients, it may be appropriate to use other types of unconstrained ordinations --- # 4.1. Correspondance Analysis (CA) ## Euclidean vs Chi<sup>2</sup> distances - PCA preserves **euclidean distances** between objects, and thus postulates **linear relationships** between species, and between species and environmental gradients. - ... but in **some cases, species instead present unimodal responses** to environmental gradients --- # Principles of CA - In such cases, CA should be preferred compared to PCA as it preserves **Chi2 distances between sites**... and thus better represents uni modal relationships --- # How to run a CA? - CA is implemented in the `vegan` package using the function `cca()`: ```r spe.ca <- cca(spe[-8,]) # only take columns which rowsums are > than 0. ``` - CA on fish species abundances --- # CA: R output - CA results are presented in the same way as PCA results and can be called using: ```r summary(spe.ca) # # Call: # cca(X = spe[-8, ]) # # Partitioning of scaled Chi-square: # Inertia Proportion # Total 1.128 1 # Unconstrained 1.128 1 # # Eigenvalues, and their contribution to the scaled Chi-square # # Importance of components: # CA1 CA2 CA3 CA4 CA5 CA6 CA7 # Eigenvalue 0.6062 0.1423 0.10251 0.07319 0.04912 0.03909 0.03341 # Proportion Explained 0.5374 0.1262 0.09087 0.06488 0.04354 0.03465 0.02962 # Cumulative Proportion 0.5374 0.6635 0.75437 0.81925 0.86279 0.89745 0.92706 # CA8 CA9 CA10 CA11 CA12 CA13 # Eigenvalue 0.01709 0.01302 0.010765 0.008141 0.007533 0.005820 # Proportion Explained 0.01515 0.01154 0.009543 0.007217 0.006678 0.005159 # Cumulative Proportion 0.94221 0.95375 0.963294 0.970511 0.977188 0.982347 # CA14 CA15 CA16 CA17 CA18 CA19 # Eigenvalue 0.004765 0.004444 0.003262 0.002268 0.001663 0.001376 # Proportion Explained 0.004224 0.003939 0.002892 0.002011 0.001474 0.001219 # Cumulative Proportion 0.986571 0.990510 0.993402 0.995413 0.996887 0.998106 # CA20 CA21 CA22 CA23 CA24 # Eigenvalue 0.0010054 0.0004515 0.0002976 0.0001528 0.0001455 # Proportion Explained 0.0008913 0.0004003 0.0002638 0.0001355 0.0001290 # Cumulative Proportion 0.9989973 0.9993976 0.9996614 0.9997969 0.9999258 # CA25 CA26 # Eigenvalue 5.671e-05 2.694e-05 # Proportion Explained 5.027e-05 2.388e-05 # Cumulative Proportion 1.000e+00 1.000e+00 # # Scaling 2 for species and site scores # * Species are scaled proportional to eigenvalues # * Sites are unscaled: weighted dispersion equal on all dimensions # # # Species scores # # CA1 CA2 CA3 CA4 CA5 CA6 # CHA 1.49168 1.385259 -0.433809 0.071078 -0.24616 0.05168 # TRU 1.65415 -0.516649 -0.440526 -0.159605 0.21034 -0.49677 # VAI 1.29689 -0.287854 0.081677 0.039670 -0.04216 0.33505 # LOC 1.01655 -0.318047 0.345002 0.064349 0.07146 0.25276 # OMB 1.54572 1.331279 -1.009750 0.022603 -0.36894 0.38992 # BLA 0.99242 1.488077 0.395856 -0.002786 0.03087 -0.42919 # HOT -0.54197 0.037905 -0.007131 0.189457 0.34781 0.12443 # TOX -0.17940 0.440408 0.666475 -0.157195 0.59702 -0.09090 # VAN 0.01649 0.115139 0.632750 0.027180 -0.28562 -0.25295 # CHE -0.01016 -0.003287 0.087052 0.360355 -0.09299 0.04720 # BAR -0.32772 0.280531 0.087903 -0.134099 0.20162 0.02136 # SPI -0.37673 0.235208 0.296247 -0.251569 0.46512 0.10939 # GOU -0.31588 0.025703 0.089443 0.122259 -0.11758 -0.02693 # BRO -0.25636 -0.209762 0.076551 -0.128801 -0.37914 -0.08203 # PER -0.28362 -0.136097 0.304122 -0.263493 -0.25323 -0.04680 # BOU -0.59482 0.018329 -0.133101 -0.259428 0.15559 0.13556 # PSO -0.57898 0.049761 -0.180659 -0.080819 0.16908 0.02711 # ROT -0.61086 -0.155369 -0.062639 -0.139514 -0.40987 -0.16073 # CAR -0.57134 0.067623 -0.114447 -0.368663 0.11556 0.03287 # TAN -0.38717 -0.126068 0.088482 -0.142953 -0.22493 -0.01394 # BCO -0.69346 -0.064062 -0.326351 -0.337945 -0.02336 0.07199 # PCH -0.72297 -0.058647 -0.458970 -0.536212 -0.09385 0.05842 # GRE -0.68487 -0.073615 -0.386296 0.125350 0.02339 0.01566 # GAR -0.48338 -0.092189 0.075212 0.244705 -0.12215 -0.09479 # BBO -0.70069 -0.077202 -0.373887 -0.141038 0.02316 0.06521 # ABL -0.62469 -0.051667 -0.268265 0.765808 0.17809 -0.07415 # ANG -0.62735 -0.004040 -0.227844 -0.315380 0.09477 0.11331 # # # Site scores (weighted averages of species scores) # # CA1 CA2 CA3 CA4 CA5 CA6 # 1 2.72880 -3.63047 -4.297575 -2.180558 4.28251 -12.70718 # 2 2.26939 -2.74567 -0.683635 -0.508114 1.86199 -0.82146 # 3 2.01894 -2.55716 0.004459 -0.347298 1.04225 0.59661 # 4 1.28757 -1.97139 0.649261 -0.366438 -0.69066 0.85344 # 5 0.08260 -0.91883 1.609565 -0.009607 -3.62098 -1.54952 # 6 1.03028 -1.62206 1.049551 0.285131 -0.94420 0.82432 # 7 1.91231 -2.28949 0.346864 0.059722 1.09657 -0.13682 # 10 1.24958 -1.35061 1.947791 1.125968 -0.79586 2.58731 # 11 2.12864 -0.19558 -1.779393 0.246236 -0.32251 1.37538 # 12 2.16190 0.16511 -1.786342 0.125825 -0.17453 1.13375 # 13 2.29162 1.87949 -2.161318 -0.191687 -0.59319 0.15032 # 14 1.85861 2.19432 -1.472445 0.084282 -1.00562 0.53827 # 15 1.33811 1.87052 0.341352 0.449315 -0.93504 -0.62573 # 16 0.70115 1.57032 2.113072 -0.238157 0.62925 -1.83993 # 17 0.29051 0.86531 1.368926 -0.167800 1.64807 0.17059 # 18 0.05651 0.68479 0.965786 -0.005014 1.08092 0.60339 # 19 -0.19970 0.05475 1.064683 0.458458 0.92614 0.60861 # 20 -0.57263 -0.02434 0.359285 0.408580 0.48971 0.14790 # 21 -0.66495 -0.07911 -0.028650 -0.080409 0.05745 0.03087 # 22 -0.71075 -0.04983 -0.163794 -0.180754 -0.05923 -0.05610 # 23 -0.71881 -0.34926 -0.912791 7.297949 0.71786 -1.25267 # 24 -0.83779 -0.26250 -1.470892 4.899937 1.26149 -0.41743 # 25 -0.68011 -0.33390 -0.180281 3.764841 -1.79032 -1.86951 # 26 -0.75038 -0.23767 -0.702769 0.601903 -0.13132 0.06362 # 27 -0.74233 -0.14481 -0.559979 -0.081360 -0.19911 0.05064 # 28 -0.76945 -0.14800 -0.716897 -0.406389 -0.11339 0.18430 # 29 -0.59922 0.12766 -0.437857 -0.646721 0.08057 0.07932 # 30 -0.79734 -0.08325 -0.436074 -1.168606 -0.21617 0.01955 ``` --- # CA: Interpretation of results .pull-left2[  ] .pull-right2[ 26 CA axes identified % CA1 = 51.50% % CA2 = 12.37% ] --- # CA: biplots .center[ ] .small[ The group of sites on the left is characterized by the species *GAR*, *TAN*, *PER*, *ROT*, *PSO*, and *CAR* The group of sites in the upper right corner is characterized by the species *LOC*, *VAI* and *TRU* The group of sites in the lower right corner is characterized by the species *BLA*, *CHA*, and *OMB* ] --- # Challenge #4 ![:cube]() Using everything you have learned to execute a CA on the mite species abundance data: ```r mite.spe <- mite ``` - What are the significant axes? - Which groups of sites can you identify? - Which groups of species are related to these groups of sites? --- # Solution #4 - Compute CA: ```r mite.spe.ca <- cca(mite.spe) ``` - Check significant axes using the Guttman-Kaiser criterion ```r ev <- mite.spe.ca$CA$eig ev[ev > mean(ev)] n <- length(ev) barplot(ev, main = "Eigenvalues", col = "grey", las = 2) abline(h = mean(ev), col = "red3", lwd = 2) legend("topright", "Average eigenvalue", lwd = 2, col = red3, bty = "n") ``` --- # Solution #4 <img src="workshop09-en_files/figure-html/unnamed-chunk-50-1.png" width="720" style="display: block; margin: auto;" /> --- # 4.3. Principal Coordinates Analysis .center[] - In PCA, we preserve the maximum amount of variation in the data - In PCoA, we preserve as best we can in 2D (Euclidean) distances between each object in multidimensional space .alert[ PCoA can be especially useful when the dataset is wider than it is long (Typical problem in Genetics)] --- # PCoA - Let's try it on Fish species! - For computing PCoA, we can use the `cmdscale()` or the `pcoa()`functions from the **stats** and **ape** packages: ```r ?cmdscale library(ape) ?pcoa ``` - Run a PCoA on the Hellinger-distances of the fish dataset and extract the results ```r spe.h.pcoa <- pcoa(dist(spec.hel)) summary(spe.h.pcoa) # Length Class Mode # correction 2 -none- character # note 1 -none- character # values 5 data.frame list # vectors 783 -none- numeric # trace 1 -none- numeric ``` --- # PCoA - Interpretation of Output .center[] - Eigenvalues - Relative eigenvalues - Broken stick model: to evaluate which axes are significant - Cumulative eigenvalues: cumulative value of the relative eigenvalues - Cumulative broken stick: cumulative value of the broken stick model --- # PCoA - Interpretation of Output  - Vectors: the eigenvectors associated to each eigenvalue contain the coordinates, in Euclidean space, of each site. .alert[ These are the most useful for subsequent analysis as they capture the distance among objects very well] --- # PCoA biplot with `biplot.pcoa()` We can display, in 2D, the distances between sites using the `biplot.pcoa()` function, as well as the species associated to each site ```r biplot.pcoa(spe.h.pcoa, spec.hel) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-54-1.png" width="576" style="display: block; margin: auto;" /> --- # PCoA and non-metric distances - PCoA can also be used to capture information contained in non-metric distances, such as the popular Bray-Curtis distance. Let's give it a try: ```r spe.bray.pcoa <- pcoa(spe.db.pa) # spe.bray.pcoa ``` - Examine the output and notice the negative eigenvalues. This is because non-metric distances cannot be represented in Euclidean space without corrections (*see* Legendre & Legendre 2012 for more details on this): ```r spe.bray.pcoa <- pcoa(spe.db.pa, correction = "cailliez") # spe.bray.pcoa ``` --- # PCoA and non-metric distances - Now let's visualize this using a biplot without the species (more common approach for PCoA) ```r biplot.pcoa(spe.bray.pcoa) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-57-1.png" width="432" style="display: block; margin: auto;" /> --- # Challenge #5 ![:cube]() Execute a PCoA on the Hellinger-transformed mite species abundance data - What are the significant axes? - Which groups of sites can you identify? - Which groups of species are related to these groups of sites - How do the PCoA results compare with the PCA results? --- # Solution #5 - Hellinger transform the species data ```r mite.spe.hel <- decostand(mite.spe, method = "hellinger") ``` - Compute PCoA ```r mite.spe.h.pcoa <- pcoa(dist(mite.spe.hel)) ``` --- # Solution #5 - Build a biplot to visualize the data: ```r biplot.pcoa(mite.spe.h.pcoa, mite.spe.hel) ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-60-1.png" width="432" style="display: block; margin: auto;" /> --- # Non-metric Multidimensional scaling (NMDS) - In PCA, CA and PCoA, objects are ordinated in a few number of dimensions (i.e. axis) generally > 2 - Consequently, 2D-biplots can fail to represent all the variation of the dataset - In some cases, the objective is however to represent the data in a specified small number of dimensions - Then, how do you plot the ordination space to represent all the variation in the data? --- # Principles of NMDS - NMDS - is the non-metric counterpart of PCoA - uses an iterative optimization algorithm to find the best representation of distances in reduced space - increasingly popular - In NMDS, users can thus specify; - the number of dimensions - the distance measure --- # How to run a NMDS - nMDS is implemented in the `vegan` package using function `metaMDS()` where - *distance* specifies the distance measure to use - *k* specifies the number of dimensions ```r spe.nmds <- metaMDS(spe, distance = 'bray', k = 2) # Run 0 stress 0.07478058 # Run 1 stress 0.07376348 # ... New best solution # ... Procrustes: rmse 0.01949622 max resid 0.09458706 # Run 2 stress 0.1111213 # Run 3 stress 0.07376327 # ... New best solution # ... Procrustes: rmse 0.0008543472 max resid 0.00409852 # ... Similar to previous best # Run 4 stress 0.120377 # Run 5 stress 0.07477805 # Run 6 stress 0.08844006 # Run 7 stress 0.0742951 # Run 8 stress 0.1127541 # Run 9 stress 0.120818 # Run 10 stress 0.1210417 # Run 11 stress 0.1112381 # Run 12 stress 0.08930153 # Run 13 stress 0.1104349 # Run 14 stress 0.1171116 # Run 15 stress 0.1203766 # Run 16 stress 0.07376316 # ... New best solution # ... Procrustes: rmse 0.0001443268 max resid 0.0006330577 # ... Similar to previous best # Run 17 stress 0.0869643 # Run 18 stress 0.1211906 # Run 19 stress 0.1219327 # Run 20 stress 0.07383756 # ... Procrustes: rmse 0.003909611 max resid 0.01441007 # *** Solution reached ``` --- # NMDS: goodness-of-fit - NMDS applies an iterative procedure that tries to position the objects in the requested number of dimensions in such a way as to minimize a stress function (scaled from 0 to 1) which measure the goodness-of-fit of the distance adjustment in the reduced-space configuration. - Consequently, the lower the stress value, the better the representation of objects in the ordination-space is. --- # NMDS: goodness-of-fit - The Shepard diagram and stress values can be obtained from: ```r spe.nmds$stress # [1] 0.07376316 stressplot(spe.nmds, main = "Shepard plot") ``` <img src="workshop09-en_files/figure-html/unnamed-chunk-62-1.png" width="360" style="display: block; margin: auto;" /> --- # NMDS on fish abundances - Run the NMDS and assess the goodness of fit ```r spe.nmds <- metaMDS(spe, distance = 'bray', k = 2) spe.nmds$stress stressplot(spe.nmds, main = "Shepard plot") ``` --- # NMDS on fish abundances .pull-left[  ] .pull-right[ - The Shepard plot identifies a strong correlation between observed dissimilarity and ordination distance (R2 > 0.95) highlighting a high goodness of fit of the NMDS ] --- # NMDS on fish abundances - Construct the biplot ```r plot(spe.nmds, type = "none", main = paste("NMDS/Bray - Stress =", round(spe.nmds$stress, 3)), xlab = c("NMDS1"), ylab = "NMDS2") points(scores(spe.nmds, display = "sites", choiches = c(1,2), pch = 21, col = "black", g = "steelblue", cex = 1.2)) text(scores(spe.nmds, display = "species", choices = c(1)), scores(spe.nmds, display = "species", choices = c(2)), labels = rownames(scores(spe.nmds, display = "species")), col = "red", cex = 0.8) ``` --- # NMDS on fish abundances .pull-left[ The biplot of the NMDS shows a group of closed sites characterized by the species BLA, TRU, VAI, LOC, CHA and OMB, while the other species form a cluster of sites in the upper right part of the graph. Four sites in the lower part of the graph are strongly different from the others ] .pull-right[  ] --- # Challenge #6 ![:cube]() <br> - Run the NMDS of the mite species abundances in 2 dimensions based on a Bray-Curtis distance. - Assess the goodness-of-fit of the ordination and interpret the biplot --- # Solution #6 .pull-left[  ] .pull-right[ The correlation between observed dissimilarity and ordination distance (R2 > 0.91) and the stress value relatively low, showing together a good accuracy of the NMDS ordination ] --- # Solution #6 .pull-left[  ] .pull-right[ No cluster of sites can be precisely defined from the NMDS biplot showing that most of the species occurred in most of the sites, i.e. a few sites shelter specific communities ] --- # Conclusion .alert[Many ordination techniques exist, but their specificity should guide your choices on which methods to use] | | Distance preserved | Variables | Maximum number of axis | |---|---------|--------------|------| |PCA| Euclidean | Quantitative data, linear relationships | p | |CA| Chi2 | Non-negative, quantitative homogeneous data, binary data | p-1 | |PCoA| User defined | Quantitative, semi-quantitative, mixed data| p-1| |NMDS| User defined | Quantitative, semi-quantitative, mixed data| User defined| --- # Prime time 4 quiz time .alert[What does PCA stand for?] -- Principal Component Analysis -- .alert[Which one is the best way to visualize the *distances* between the community composition of many sites?] -- Principal Coordinate Analysis (PCoA) -- .alert[What does an eigenvalue represent in PCA?] -- The proportion of variance explained by a principal component --- # Prime time 4 quiz time Spot what is sketchy  -- .alert[ - Data non centered, Yikes! ] --- # Prime time 4 quiz time Spot what is sketchy  -- .alert[ - 2 first PCs explain 100% of the variation! ] --- # Live Long and Ordinate .center[] --- class: inverse, center, bottom # Thank you for attending this workshop! 