Mining species-level biodiversity information from published literature

by Gabriel Muñoz

Project Details

  • Language : English
  • Material required : R and RStudio
  • Taught at : R Symposium 2019
  • Contributed by : Gabriel Muñoz

A vast amount of information about life on our planet, the product of centuries of research, is stored in the published literature available on the web. Most of this literature is now accessible as articles, theses, or reports, which are stored and shared as PDF files. However, manually scanning this entire corpus to find and extract species-level biodiversity data from individual works can be a daunting task. In this workshop, you will learn computational tools and more automated ways to search a large corpus of literature for biodiversity information of interest. First, we will briefly review common and less common sources of biodiversity literature. Second, we will explore techniques for programmatic literature search and keyword selection. Third, we will learn to use specific tools to mine particular biodiversity observations from a collection of PDF articles. Finally, we will briefly review common global biodiversity data aggregators.

Workshop material