The Cornell Institute for Social and Economic Research (CISER), founded in 1981, is home to one of the oldest university-based social science data archives in the United States. CISER houses an extensive collection of public and restricted numeric data files in the social sciences, with particular emphasis on studies that match the interests of Cornell researchers: demography, economics and labor, political and social behavior, family life, and health. CISER’s mission is to anticipate and support the evolving computational and data needs of Cornell social scientists and economists throughout the entire research process and data life cycle. Data archive functions include making data available to the broadest audience permissible via green/yellow/red light access levels; providing a secure research computing environment that facilitates data access and use; and offering data consulting support from staff experienced in using social science data, including CISER’s significant depth of expertise in restricted data access management, so that researchers get the most benefit from the data archive and research computing facilities.

CISER also offers a Data Curation and Reproduction of Results Service, known as R-squared or R2, through which researchers with papers ready to submit for publication can send their data and code to CISER prior to submission for appraisal, curation, and replication. The aim is to ensure that published results are replicable and that data and code are well documented, reusable, packaged, and preserved in a trustworthy data repository, such as the CISER Data Archive, for current and future generations of researchers.

Figure: Data Curation and Reproduction of Results (R2) service workflow at CISER

The service requires that clients deposit the following: a) a copy of the article with the sections that reference figures or tables derived from their data highlighted in yellow; b) error-free code and notes on the sequence in which the code should be run; c) clean data in which variables and values are labeled; and d) other supplementary materials. The files should be packaged so that, if everything is downloaded together, the programs run without modification (or with only minor changes to file paths). Clients who have sufficient time before submission are given instructions on how to package the materials following the Teaching Integrity in Empirical Research (TIER) protocol.

Upon deposit, the paper is inspected to ensure that the researchers have highlighted the sections that reference figures or tables derived from their data, because these are the targets for reproduction. Source data, analytic data, and documentation are reviewed to ensure that variables and their values are labeled and that the research materials are complete, accurate, and well documented. Disclosure risks are assessed; if any are found, avoidance techniques are suggested to the researchers, since applying them could affect the results. Code is committed to GitHub so that versions can be tracked, as changes to the code are likely. The code is then executed to confirm that it runs, and its output is compared with the submitted paper. If the results do not match, the findings are discussed with the researchers and appropriate action is taken. Supplementary files, such as a README file and codebooks, are also examined if provided.
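The comparison of program output with the figures reported in the paper can be sketched in a few lines of Python. This is only an illustration, not CISER's actual procedure: the function name `compare_results`, the dictionary layout, and the rounding tolerance are all assumptions introduced here.

```python
import math


def compare_results(reported, reproduced, abs_tol=0.0005):
    """Compare values reported in a paper with values produced by rerunning the code.

    `reported` and `reproduced` map a label (for example "Table 2, beta_age")
    to a number. Both the layout and the default tolerance are hypothetical;
    the tolerance assumes the paper rounds to three decimal places.
    """
    rows = []
    for label, expected in reported.items():
        if label not in reproduced:
            # The rerun did not produce this statistic at all.
            rows.append((label, expected, None, "missing"))
            continue
        actual = reproduced[label]
        ok = math.isclose(expected, actual, rel_tol=0.0, abs_tol=abs_tol)
        rows.append((label, expected, actual, "match" if ok else "mismatch"))
    return rows


if __name__ == "__main__":
    # Made-up values, for illustration only.
    reported = {"Table 2, beta_age": 0.123, "Table 2, beta_income": 1.50}
    reproduced = {"Table 2, beta_age": 0.1234, "Table 2, beta_income": 1.62}
    for row in compare_results(reported, reproduced):
        print(row)
```

Any row flagged as "mismatch" or "missing" would be the kind of finding that is taken back to the researchers for discussion.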

Once the study has been replicated, the paper, code, data, documentation, and other supplementary materials are packaged for sharing following the TIER protocol. The integrity of the package is verified by running multiple scripts that determine the number of records, variables, and bytes, as well as file names, checksums, and other file-level information. CISER’s instance of the Comprehensive Extensible Data Documentation and Access Repository (CED2AR) is used to generate and edit DDI codebook metadata. CISER plans to incorporate the W3C’s Data Quality Vocabulary (DQV) and PROV metadata into the service in the future. The packaged materials are then stored either at the journal-specified location or at the CISER Data Archive, and can be accessed through the CISER Data Archive Catalog or through a custom website developed by CISER at the request of the author(s). Each study replicated and hosted by the CISER Data Archive has a study number and a persistent URL, and will be provided an Open Data Certificate badge, a CISER Certified Reproducible Research badge, and active monitoring and listing of study and data citations.
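As a rough illustration of the file-level integrity checks described above, the following Python sketch builds a manifest of file names, byte counts, and checksums for a package directory, and adds record and variable counts for CSV files. It is a minimal sketch under stated assumptions, not CISER's own scripts: the function names, the choice of SHA-256, and the restriction to CSV files with a header row are all hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_csv(path):
    """Return (records, variables) for a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        records = sum(1 for _ in reader)
    return records, len(header)


def build_manifest(package_dir):
    """List every file in the package with its name, size, and checksum."""
    root = Path(package_dir)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entry = {
            "name": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
        # Record and variable counts are only computed for CSV files here;
        # other data formats would need their own readers.
        if path.suffix.lower() == ".csv":
            entry["records"], entry["variables"] = describe_csv(path)
        manifest.append(entry)
    return manifest
```

Running `build_manifest` on the package before and after transfer and comparing the two lists is one simple way to confirm that nothing changed in transit.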

