The Institution for Social and Policy Studies (ISPS) was established in 1968 by the Yale Corporation as an interdisciplinary center at the university to facilitate research in the social sciences and public policy. ISPS is an independent academic unit within the university, with affiliates from across the social sciences. ISPS supports research by providing a vigorous intellectual environment as well as research funds. The ISPS Data Archive is a digital repository meant to capture and preserve the research output of scholars affiliated with ISPS. It strives to serve as a model for sharing and preserving research data by implementing the ideals of scientific reproducibility and transparency. The Archive provides free and public access to research materials and accepts content for distribution under a Creative Commons license. The Archive is managed by a full-time professional with knowledge of the specific research domain, and graduate student research assistants handle most of the data curation.

The Archive was built to create an open-access digital collection of social science experimental data, metadata, code, and associated files produced by ISPS researchers. Deposits are organized by study and include much of the research output from each study: data, metadata, statistical code, codebooks, research materials, and description files.

ISPS Pipeline
ISPS data curation and code review workflow


Upon deposit, a safe copy is created and placed in a dark archive. A public copy of the files is then created and enters processing. Processing includes generating study-level and file-level metadata, confirming that all variables and values are labeled, standardizing missing values, creating and augmenting documentation, assigning persistent links, and assessing and minimizing disclosure risk through techniques such as recoding, masking, or removing variables. The review of code files (statistical and other programming scripts) includes verifying that the code executes and that the published scientific results can be reproduced with the given code and data. The data and code review processes include an assessment of the quality of the documentation and contextual information necessary for long-term usability (for example, a codebook, a readme file, or commented code). Where these are found lacking or insufficient, the Archive works with researchers on remedial actions. All file formats are normalized: software-specific data files are migrated to flat file formats such as ASCII, text, or comma-delimited files, and code written in licensed statistical software such as SPSS is rewritten in open-source statistical languages such as R. All files are assigned a unique identifier (handle), and file sets have citation information. After the process is complete, materials are stored and made publicly available via the ISPS Data Archive Web portal. ISPS is currently developing, in partnership with Innovations for Poverty Action and Colectica, a workflow tool to structure the curation and review process and to generate high-quality data packages that are repository-agnostic (i.e., they can be ingested into any repository). For ISPS, the software will be deployed on Yale infrastructure in partnership with IT and the Yale University Library, and additional access to these materials will be provided through the Library’s Digital Collections portal.
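To make a few of the processing steps above concrete, the following is a minimal Python sketch of three of them: checking that every variable has a label, standardizing missing values, and removing a variable that poses a disclosure risk before writing a comma-delimited file. It is an illustration only and not the Archive's actual tooling. The function names, the set of missing-value codes, the sample data, and the codebook are all hypothetical.

```python
import csv
import io

# Hypothetical sentinel codes treated as missing. Real studies define their
# own codes, and a curator would confirm them against the codebook first.
MISSING_CODES = {"", ".", "-99", "N/A"}
STANDARD_MISSING = "NA"


def unlabeled_variables(columns, codebook):
    """Return the columns that have no non-empty label in the codebook."""
    return [c for c in columns if not codebook.get(c, "").strip()]


def standardize_missing(rows, missing_codes=MISSING_CODES):
    """Replace every sentinel missing code with one standard marker."""
    return [
        {k: (STANDARD_MISSING if v.strip() in missing_codes else v)
         for k, v in row.items()}
        for row in rows
    ]


def drop_variables(rows, to_remove):
    """Remove variables flagged as disclosure risks (e.g. direct identifiers)."""
    return [{k: v for k, v in row.items() if k not in to_remove} for row in rows]


def to_csv(rows):
    """Serialize rows to comma-delimited text, a software-neutral flat format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample deposit: a tiny data file and an incomplete codebook.
raw = "id,name,age,turnout\n1,Ann,34,1\n2,Bob,-99,0\n3,Cy,,.\n"
codebook = {"id": "Respondent ID", "age": "Age in years", "turnout": ""}

rows = list(csv.DictReader(io.StringIO(raw)))
columns = list(rows[0].keys())

# Variables the curator would need to raise with the depositing researcher.
missing_labels = unlabeled_variables(columns, codebook)

# Public copy: missing values standardized, direct identifier removed.
public_rows = drop_variables(standardize_missing(rows), {"name"})
public_csv = to_csv(public_rows)
```

In this sketch, `missing_labels` lists the variables that would be sent back to the researcher for documentation, and `public_csv` holds the normalized public copy. Deciding which variables to recode, mask, or remove is a curatorial judgment that a script like this can support but not replace.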

The ISPS Data Archive supports the sharing of quality data by ensuring that deposits meet certain accessibility and usability standards. Meeting these standards ultimately contributes to more reproducible science.

