The general function of a social science data archive is to make machine-readable data available to scientists.
Archives seek to work closely with data producers to obtain complete data and documentation. These files are then deposited in the archive, processed, documented, stored and finally distributed to the scientific community for further analysis. The goal is to make it as easy as possible for secondary analysts to work with data they obtain from archives.
Many social science data archives, data collection organizations, and grant agencies have produced primers and guides which describe both the responsibilities of data producers and the role that data archives play in preparing data and making it accessible to the research community. Some examples include:
- Principles and Good Practice for Preserving Data (Working Paper No. 003: International Household Survey Network). http://www.surveynetwork.org/home/sites/default/files/resources/IHSN-WP003.pdf
- Dissemination of Microdata Files: Principles, Procedures and Practices (Working Paper No. 005: International Household Survey Network). http://www.surveynetwork.org/home/sites/default/files/resources/IHSN-WP005.pdf
- Managing and Sharing Data. UK Data Archive. http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
- Preparing Data for Sharing: Guide to Social Science Data Archiving. Data Archiving and Networked Services (DANS). http://www.dans.knaw.nl/en/content/categorieen/publicaties/dans-data-guide-8
- Guide to Social Science Data Preparation and Archiving. ICPSR. http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/index.html
- Digital Research Data Sharing and Management. Task Force on Data Policies Committee on Strategy and Budget National Science Board, National Science Foundation. http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
Data policies are norms regulating the management and publication of research data. They range from recommendations to enforced requirements. Their scope and content vary considerably across countries and across disciplines within a single country.
It is in the best interest of science policy makers, research funders, and research organizations to stimulate secondary use of research data by supporting open access to it. One key tool is guiding data producers to plan and implement data management so that it supports a long and vital life-cycle of the data. New data should be collected and archived so that re-use is not prevented.
Science publishers want to publish high quality research, and many of them have introduced data policies. Access to the data underlying research findings helps to control the quality of research by allowing validation and/or correction of previous results through re-analysis, and helps to counteract misconduct.
Many international organizations have also promoted data sharing through recommendations and have set up guidelines for access to research data. The OECD report Principles and Guidelines for Access to Research Data from Public Funding (2007) discusses aims and tools for data sharing in detail, and gives a comprehensive definition of ‘research data.’
According to this report, research data are “…factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings. A research data set constitutes a systematic, partial representation of the subject being investigated.”
The definition is also accompanied by a statement of what research data is not: “This term does not cover the following: laboratory notebooks, preliminary analyses, and drafts of scientific papers, plans for future research, peer reviews, or personal communications with colleagues or physical objects” (OECD 2007).
In 2012-2013, IFDO ran an expert survey on data policies to gain an overview of the situation in different countries. The main focus was on the data policies of research funders and on the social sciences and humanities (SSH).
DATA SHARING: Open scientific knowledge cannot rely on publications with closed data, as science is fundamentally based on openness and transparency. Whenever possible, research data should be collected and made available in a way that allows its use by those who did not collect it.
Countries and national scientific communities differ in their research infrastructures and in their data sharing cultures. What all governments have in common is the need to support cost-effective research of high scientific quality.
Well-planned data management and data sharing go hand in hand with such aims. Providing open data maximizes its use and helps to control the quality of research. International cooperation can help governments and national scientific communities to develop appropriate data sharing infrastructures and practices cost-effectively. For experts working in the social sciences, communities such as IASSIST provide good opportunities for newcomers to get to know people and practices in this area.
Usually the best way to share data is to publish it with the help of established data archives and data services. Good data sharing practices guarantee proper data documentation and dissemination of data to secondary users with sufficient guidance.
Well-developed standards of data documentation, data citation, and researchers’ résumés allow primary users and other data producers to record their merits and their ownership of data. Secondary users of data are expected to credit the original data producers by citing their data. The best way to enable this is to publish data according to well-developed standards.
Technological development and demands for efficiency in public investment have boosted open data initiatives in recent years. International organizations, national science policy makers, research funding agencies, and publishers of research results have introduced data policies favorable to data sharing.
IFDO points to key resources for keeping up with data sharing, and monitors national data policies, especially those related to the social sciences and humanities.
- UK/JISC: Comparative Study of International Approaches to Enabling the Sharing of Research Data (2008)
- OECD Principles and Guidelines for Access to Research Data from Public Funding (2007)
- Riding the wave. How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data, A submission to the European Commission, October 2010.
- NSF: Dissemination and Sharing of Research Results
- ICPSR at 50: Facilitating Research and Data Sharing
- Data sharing and the Council of European Social Science Data Archives (CESSDA)
Data preservation, or more specifically, digital data preservation, refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.
This broad definition of data preservation refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change. Long-term preservation can be defined as the ability to provide continued access to digital materials, or at least to the information contained in them, indefinitely.
A sustainable preservation program should address organizational issues, technological concerns and the digital curation/data management process.
Organizational infrastructure comprises the policies, procedures, practices and people that form the basic elements of data preservation requirements. This includes the purpose, scope and objectives of the organization; the legal and regulatory framework; and all aspects of funding and resource planning.
Technological concerns deal with the IT architecture of the organization: the requisite equipment, software, hardware and skills, a secure environment, and an up-to-date media monitoring and refreshment strategy.
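One common building block of media monitoring is a periodic fixity check, in which stored files are compared against checksums recorded at deposit time. The following is a minimal sketch of that idea in Python, not a description of any particular archive's procedure. The function names and the manifest layout (a mapping from relative file path to SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root.

    The manifest layout (relative path -> digest) is a hypothetical choice.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged
```

In such a scheme the manifest would be built once at ingest and stored separately from the data. Files reported by `find_damaged` would then be candidates for restoration from another copy during media refreshment.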
Data curation refers to the active management of data through its life cycle of interest and usefulness to a designated community. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time. As such, curation includes all processes in the organization that involve data management: pre-ingest initiatives; ingest functions; archival storage and preservation; and dissemination of, and access to, data for the designated community.
- DPC Digital Preservation Handbook
- IHSN: Principles and Good Practice for Preserving Data. IHSN Working Paper No 003. December 2009.
- CCSDS: Reference Model for an Open Archival Information System (OAIS).
- Audit and Certification of Trustworthy Digital Repositories. Magenta Book. Issue 1. September 2011.