Introduction to Intellectual Property Rights in Data Management
Data versus Database
In any data project, there are likely to be two components. The first is the data collected, assembled, or generated. Think of it as the raw content in the system. The second component is the data system in which the data is stored and managed.
We usually do not think of data content as separate from the system in which it is stored, but the distinction is important in terms of intellectual property rights. The question is what, if anything, is protected by copyright. Data that is factual has no copyright protection under U.S. law; it is not possible to copyright facts. In many cases, the data in a data management system, as well as the metadata describing that data, will be factual and hence not protected by copyright.
However, data content that consists of original creative expression is protected under U.S. law. For example, a project may compile copyrighted photographs into a database that analyzes components of the photographs. The photographs are part of the project’s “data,” and each is copyrighted by its author unless copyright ownership has been transferred.
A database can have a thin layer of copyright protection separate from the data contained within it. Deciding what data needs to be included in a database, how to organize the data, and how to relate different data elements are all creative decisions that may receive copyright protection.
Because of the different copyright status of databases and data content, different mechanisms are required to manage each. Copyright can govern the use of databases and some data content (that which is itself original), but contract law, trademarks, and other mechanisms are required to regulate factual data.
The Open Data Commons Open Database License (ODbL) stipulates that any subsequent use of the database must provide attribution, that an unrestricted version of the new product must always be accessible, and that any new products made using ODbL material must be distributed under the same terms. It is the most restrictive of the ODC licenses.
Creative Commons also has a library of standardized licenses, some of which apply to data and databases. The ODC-By license, for example, is the equivalent of a Creative Commons Attribution license (CC BY). However, Creative Commons licenses can be applied only to copyright-protected material, and only by the copyright owner, whereas the ODC-By license applies to works not protected by copyright (such as factual data).
The two CC licenses that are of greatest relevance to data management are:
When an owner wishes to waive her copyright and/or database rights, she can use the CC0 mark. It effectively places the database and data into the public domain. It is the functional equivalent of an ODC PDDL license.
The Public Domain Mark (PDM) is used to mark works that are in the public domain and for which there are no known copyright or database restrictions. For example, factual data in a database can be flagged with the PDM to make it clear that it is free to use.
Selecting a data license
There is no single right answer as to which license to assign to a database or its content. However, anything other than the ODC PDDL or CC0 may cause serious problems for subsequent scientists and other users because of attribution stacking. Attribution stacking happens when two or more licensed datasets, A and B, are combined into a new dataset, C. The new dataset C must be licensed under the most restrictive terms of the component datasets, A and B, and must provide full attribution to both. Any further use of dataset C should attribute datasets C, A, and B.
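The stacking effect described above can be sketched in a few lines of Python. This is only an illustration, not legal guidance: the `Dataset` class, the `combine` and `credits` helpers, and especially the linear `RESTRICTIVENESS` ranking are assumptions made for the example. Real license compatibility is not a simple ordering and should be checked against the license texts.

```python
from dataclasses import dataclass

# Hypothetical ranking for illustration only: 0 = public domain dedication,
# higher numbers = more conditions placed on reuse.
RESTRICTIVENESS = {"PDDL": 0, "CC0": 0, "ODC-By": 1, "CC BY": 1, "ODbL": 2}


@dataclass(frozen=True)
class Dataset:
    name: str
    license: str
    sources: tuple = ()  # the datasets this one was built from


def combine(name, *parts):
    """Build a new dataset that inherits the most restrictive license of its parts."""
    strictest = max(parts, key=lambda p: RESTRICTIVENESS[p.license])
    return Dataset(name, strictest.license, parts)


def credits(dataset):
    """List every dataset in the chain whose license requires attribution."""
    found = []

    def walk(d):
        if RESTRICTIVENESS[d.license] > 0 and d.name not in found:
            found.append(d.name)
        for source in d.sources:
            walk(source)

    walk(dataset)
    return found


a = Dataset("A", "ODC-By")
b = Dataset("B", "ODbL")
c = combine("C", a, b)
print(c.license, credits(c))  # ODbL ['C', 'A', 'B']

# Reusing C in yet another dataset makes the attribution list grow again.
d = combine("D", c, Dataset("E", "CC0"))
print(d.license, credits(d))  # ODbL ['D', 'C', 'A', 'B']

# Combining only public domain datasets leaves nothing to stack.
p = combine("P", Dataset("X", "CC0"), Dataset("Y", "PDDL"))
print(p.license, credits(p))  # CC0 []
```

In this sketch the attribution list grows by one entry at every level of reuse unless every component is dedicated to the public domain, which is the practical argument for the most open licenses.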
It may be possible to extract and use data from a data set in a research project and still maintain information as to the source of that data. It becomes more difficult to provide source information when creating a data set derived from hundreds of sources, some of which may also be compilations of other sources.
Researchers are expected to attribute the sources used in creating a new dataset, but applying the most open licenses to data will make that easier as data and datasets are combined and reused in new and innovative ways.
Data Ownership at the University of South Florida
The ownership of works produced by USF faculty, students, and non-academic staff is governed by the USF 0-300 Inventions and Works Policy, the USF12.003 Inventions and Works Regulation, and especially the 0-105 Copyrighted Materials - Use and General Principles Policy. Who owns a particular work will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether the work was conducted “pursuant to the USF employee’s position description or specific professional assignment or ...commission”; and whether the creation of the work required “appreciable USF System support.”
University of South Florida's official policy regarding copyright as promulgated by the Office of the General Counsel in 1996 and amended in 2013.
Information on this page was adapted from: Cornell University (n.d.). Introduction to intellectual property rights in data management. Research Data Management Services Group. Web page: https://data.research.cornell.edu/content/intellectual-property. Licensed under CC BY 4.0.