Text & Data Mining

Library Resources

USF Libraries negotiate access to electronic resources with many vendors and publishers. Many of these resources can be discovered through a simple web search; for example, a Google Scholar search will return scholarly articles that the Libraries have negotiated access to.

The following table is a guide to databases and collections that permit text and data mining (TDM) and/or offer an API that researchers can use to extract data. Users who plan to use any of the linked resources should read that resource's terms and conditions in full before starting their project. Please send any questions to your subject librarian or the copyright librarian.

Tool | Additional Link | Limits/Requirements | Prohibited

ProQuest TDM Studio

Terms of Use

Limits/Requirements:
  • Secure Jupyter notebook environment
  • Export of models restricted to certain publishers
Prohibited:
  • Commercial uses

Cambridge University Press

Terms

Limits/Requirements:
  • Link to the underlying content on the site
  • Delete locally stored copies after the project ends
  • Contact openresearch@cambridge.org before any large-scale download
Prohibited:
  • Commercial uses

Elsevier Developer Portal

TDM policy

Limits/Requirements:
  • Obtain an API key at the developer portal
Prohibited:
  • Commercial uses
  • Use of robots or automated downloading programs
  • Use in “combination with an artificial intelligence tool (including to train an algorithm, test, process, analyze, generate output and/or develop any form of artificial intelligence tool) save for instances where such artificial intelligence tool is used locally only in a self-hosted environment and does not share any data it processes with unauthorised third parties”

Sage

TDM policy

Limits/Requirements (all times in the "America/Los_Angeles" time zone):
  • 1 request every 6 seconds, Monday to Friday, from midnight to noon
  • 1 request every 2 seconds, Monday to Friday from noon to midnight, and all day Saturday and Sunday
Prohibited:
  • Commercial uses
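Because Sage's permitted rate depends on the day and hour in Los Angeles, a harvesting script has to choose its delay from the clock rather than use one fixed pause. The Python sketch below shows one way to do that. It assumes Python 3.9+ (for the standard-library `zoneinfo` module) and an available time-zone database; the function names `now_in_los_angeles` and `sage_min_delay` are hypothetical helpers, not part of any Sage API.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Time zone named in Sage's TDM policy
LA_TZ_NAME = "America/Los_Angeles"


def now_in_los_angeles() -> datetime:
    """Return the current time in the America/Los_Angeles time zone.

    Hypothetical helper; needs time-zone data from the system
    or from the tzdata package.
    """
    return datetime.now(ZoneInfo(LA_TZ_NAME))


def sage_min_delay(la_time: datetime) -> float:
    """Minimum seconds between requests under the rates listed above.

    `la_time` must already be expressed in Los Angeles local time.
    Monday (weekday 0) to Friday (weekday 4), midnight to noon:
    1 request every 6 seconds. Weekday afternoons and evenings, and
    all day on weekends: 1 request every 2 seconds.
    """
    is_weekday = la_time.weekday() < 5
    is_morning = la_time.hour < 12
    return 6.0 if (is_weekday and is_morning) else 2.0
```

In a download loop, one would call `time.sleep(sage_min_delay(now_in_los_angeles()))` before each request, re-reading the clock every time so that a long run crossing noon or a weekend picks up the new rate.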

Springer Nature API

TDM policy

Limits/Requirements:
  • Must request an API key
  • Limit of 1 request per second
  • Projects needing higher bandwidth are limited to 150 requests per minute via the API
Prohibited:
  • Commercial uses
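Springer Nature's limits translate into a minimum gap between calls: 1 request per second is a 1-second gap, and 150 requests per minute works out to 60 / 150 = 0.4 seconds. The following is a generic Python sketch of such a throttle, not Springer Nature code. The class name `MinIntervalThrottle` is hypothetical, and spacing requests evenly is a conservative reading of a per-minute limit.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Enforce a minimum gap, in seconds, between successive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Sleep only as long as needed to keep the gap, then record this request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Standard limit: 1 request per second.
standard = MinIntervalThrottle(min_interval=1.0)
# Higher-bandwidth projects: 150 requests per minute, spaced evenly (60 / 150 = 0.4 s).
high_bandwidth = MinIntervalThrottle(min_interval=60 / 150)
```

Call `wait()` immediately before each API request. The `clock` and `sleep` parameters are injectable so the throttle can be tested with a fake clock instead of real pauses.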

University of Chicago Press Journals Division

Terms and conditions

Limits/Requirements:
  • Approval from the Press required before any automated downloading
Prohibited:
  • Commercial uses
  • Creating a product for use by third parties that substitutes for a subscription

Wiley

TDM policy

Limits/Requirements:
  • Limit of 3 articles per second
  • Up to 60 requests per 10 minutes, which means building in a 10-second delay between requests
  • Outputs of TDM must be communicated to third parties as ‘non-commercial research by authorized uses’
  • Cite the original content by DOI for any extracts or quoted material
Prohibited:
  • Commercial uses
  • Systematic or substantive extraction for the purpose of creating a product or service for use by third parties
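Wiley's policy states two rate limits at once. The per-10-minute limit reduces to simple arithmetic: 600 seconds / 60 requests = 10 seconds per request, so a flat 10-second delay satisfies it. The Python sketch below instead enforces every stated limit over rolling time windows. It is a hypothetical helper (`SlidingWindowLimiter` is not a Wiley tool), and it assumes each article download counts as one request for the 3-per-second limit.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class SlidingWindowLimiter:
    """Keep request counts under several (limit, window_seconds) rules at once.

    Each rule means: at most `limit` requests in any rolling window of
    `window_seconds`. Call acquire() immediately before each request.
    """

    def __init__(
        self,
        rules: List[Tuple[int, float]],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rules = rules
        self._clock = clock
        self._sleep = sleep
        self._history: Deque[float] = deque()  # past request times, oldest first
        self._max_window = max(window for _, window in rules)

    def _required_wait(self, now: float) -> float:
        """Seconds to wait before one more request would break no rule."""
        # Forget requests older than the longest window; no rule counts them.
        while self._history and now - self._history[0] >= self._max_window:
            self._history.popleft()
        wait = 0.0
        for limit, window in self.rules:
            recent = [t for t in self._history if now - t < window]
            if len(recent) >= limit:
                # Wait until enough old requests leave this window to admit one more.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        now = self._clock()
        wait = self._required_wait(now)
        while wait > 0:
            self._sleep(wait)
            now = self._clock()
            wait = self._required_wait(now)
        self._history.append(now)


# The two Wiley limits listed above, treating each article as one request.
wiley_limiter = SlidingWindowLimiter([(3, 1.0), (60, 600.0)])
```

This limiter permits short bursts (up to 3 requests within one second) followed by a long pause, whereas the flat 10-second delay the policy describes spreads requests evenly. Where the terms are read as requiring that fixed delay, the simpler approach is the safer one.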

JAMA Network API

Terms and conditions

Limits/Requirements:
  • Must include the proprietary notice
Prohibited:
  • Creating derivative works
  • Systematically reproducing or redistributing content to third parties

JSTOR Constellate

Constellate terms

Limits/Requirements:
  • Secure Jupyter notebook environment
Prohibited:
  • Commercial uses
  • Creating a competing product
  • Systematically printing or distributing content for any purpose other than Text and Data Analytics

IEEE API

Terms

Limits/Requirements:
  • Must obtain a token to use the API
Prohibited:
  • Commercial uses
  • Accessing the Licensed Products using a robot, spider, crawler, screen scraping, or similar technological device
  • Using TDM to compete with or replicate existing products
  • Making the results of any TDM output available on an externally facing server or website
  • Permitting a third party to harvest any TDM output