Researchers who use shared datasets, create their own datasets, enter data into a database, or construct databases may be working under many different agreements and handling data from many sources. Keeping track of the agreements, terms of use, licenses, and sources is an important part of ensuring that a new database or dataset can be successfully used by others.
In any data project, there are likely to be two components. The first is the data collected, assembled, or generated. Think of it as the raw content in the system. The second component is the data system in which the data is stored and managed.
We usually do not think of data content separate from the system in which it is stored, but the distinction is important in terms of intellectual property rights. The question is what, if anything, is protected by copyright. Data that is factual has no copyright protection under U.S. law; it is not possible to copyright facts. In many cases, the data in a data management system as well as the metadata describing that data will be factual, and hence not protected by copyright.
However, data content that consists of original creative expression is protected under U.S. law. For example, a project may compile copyrighted photographs into a database that analyzes components of the photographs. The photographs are part of the project’s “data,” and each is copyrighted by its photographer unless copyright ownership has been transferred.
A database can have a thin layer of copyright protection separate from the data contained within it. Deciding what data needs to be included in a database, how to organize the data, and how to relate different data elements are all creative decisions that may receive copyright protection.
Because of the different copyright status of databases and data content, different mechanisms are required to manage each. Copyright can govern the use of databases and some data content (that which is itself original), but contract law, trademarks, and other mechanisms are required to regulate factual data.
Clearly communicating the terms of use of a database or dataset is important for facilitating the reuse of data. The Open Data Commons group has developed legally binding tools to govern the use of datasets. Using a combination of copyright and contractual standards, they have created three standard licenses that can be used in conjunction with data projects: the Public Domain Dedication and License (PDDL), the Attribution License (ODC-By), and the Open Database License (ODC-ODbL).
Creative Commons also has a library of standardized licenses, some of which apply to data and databases. The ODC-By license, for example, is the equivalent of a Creative Commons Attribution license (CC BY). However, Creative Commons licenses can be applied only to copyright-protected material, and only by the copyright owner. The ODC-By license, by contrast, also applies to works not protected by copyright (such as factual data).
The two CC licenses that are of greatest relevance to data management are CC0 (a public domain dedication) and CC BY (Attribution).
There is no single right answer as to which license to assign to a database or its content. However, anything other than an ODC PDDL or CC0 license may cause serious problems for subsequent scientists and other users because of attribution stacking. Attribution stacking happens when two or more licensed datasets, A and B, are combined into a new dataset, C. The new dataset C must be licensed under the most restrictive terms of the component datasets, A and B, and must provide full attribution to both. Any further use of dataset C should attribute datasets C, A, and B.
It may be possible to extract and use data from a dataset in a research project and still maintain information as to the source of that data. It becomes more difficult to provide source information when creating a dataset derived from hundreds of sources, some of which may also be compilations of other sources.
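To make attribution stacking concrete, the following is a minimal sketch of how the list of required attributions grows as datasets are combined. The `Dataset` class and the `required_attributions` function are hypothetical names introduced only for illustration; they are not part of any licensing tool, and the sketch models attribution only, not license compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model: a dataset and the datasets it was derived from."""
    name: str
    sources: tuple["Dataset", ...] = ()


def required_attributions(dataset: Dataset) -> list[str]:
    """Return the dataset's own name followed by every upstream source, each once."""
    seen: set[str] = set()
    ordered: list[str] = []

    def visit(node: Dataset) -> None:
        # Skip sources already credited through another branch.
        if node.name in seen:
            return
        seen.add(node.name)
        ordered.append(node.name)
        for source in node.sources:
            visit(source)

    visit(dataset)
    return ordered


# Datasets A and B are combined into C; C and E are later combined into D.
a = Dataset("A")
b = Dataset("B")
c = Dataset("C", sources=(a, b))
d = Dataset("D", sources=(c, Dataset("E")))

print(required_attributions(c))  # ['C', 'A', 'B']
print(required_attributions(d))  # ['D', 'C', 'A', 'B', 'E']
```

Each new combination inherits every attribution of its components, so the list lengthens with each level of reuse. With hundreds of nested sources, this record becomes hard to maintain by hand.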
While providing attribution to the sources used in the creation of a new dataset is expected of researchers, using the most open licenses for data will make it easier for researchers as data and datasets are combined and reused in new and innovative ways.
The ownership of works produced by USF faculty, students, and non-academic staff is governed by the USF 0-300 Inventions and Works Policy, USF12.003 Inventions and Works Regulation, and especially the 0-105 Copyrighted Materials - Use and General Principles Policy. Who owns a particular work will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether the work was conducted “pursuant to the USF employee’s position description or specific professional assignment or ...commission”; and whether the creation of the work required “appreciable USF System support.”
Information on this page was adapted from: Cornell University (n.d.). Introduction to intellectual property rights in data management. Research Data Management Services Group. Web page: https://data.research.cornell.edu/content/intellectual-property. Licensed under CC BY 4.0.