The P-RECS’19 workshop will be held as a full-day meeting at ACM HPDC 2019 in Phoenix, Arizona, USA, on June 24th, 2019. This year, HPDC runs under the ACM Federated Computing Research Conference (FCRC), a large federated event that brings together 11 affiliated conferences and provides excellent opportunities for interdisciplinary networking and learning.
The P-RECS workshop focuses heavily on practical, actionable aspects of reproducibility in broad areas of computational science and data exploration, with special emphasis on issues in which community collaboration can be essential for adopting novel methodologies, techniques and frameworks aimed at addressing some of the challenges we face today. The workshop brings together researchers and experts to share experiences and advance the state of the art in the reproducible evaluation of computer systems, featuring contributed papers and invited talks.
Topics
We invite submissions on topics including, but not limited to:
- Experiment dependency management.
- Software citation and persistence.
- Data versioning and preservation.
- Provenance of data-intensive experiments.
- Tools and techniques for incorporating provenance into publications.
- Automated experiment execution and validation.
- Experiment portability for code, performance, and related metrics.
- Experiment discoverability for re-use.
- Cost-benefit analysis frameworks for reproducibility.
- Usability and adaptability of reproducibility frameworks into already-established domain-specific tools.
- Long-term artifact archiving for future reproducibility.
- Frameworks for sociological constructs to incentivize paradigm shifts.
- Policies around publication of articles/software.
- Blinding and selecting artifacts for review while maintaining history.
- Reproducibility-aware computational infrastructure.
Program
09:00-09:15 | Welcome
09:15-10:15 | Keynote (Dr. Carl Kesselman)
10:15-10:45 | Coffee break
10:45-12:15 | Paper Presentations 1
12:15-13:30 | Lunch (hosted by HPDC/FCRC)
13:30-15:00 | Paper Presentations 2
15:00-15:30 | Coffee break
15:30-16:30 | Open discussion
16:30-17:00 | Closing remarks
Keynote Address
Carl Kesselman (University of Southern California)
Title: Making Lightning Strike Twice: Achieving reproducibility and impact in a data-driven scientific environment.
Abstract: A cornerstone of the scientific method is the ability of one scientist to reproduce the results of another. This requires that investigators take explicit steps, such as ensuring that protocols are well defined and that reagents and cell lines are characterized and validated. A critical aspect of this process is describing what data has been collected and how it is analyzed. While science has always been driven by the collection, analysis, and sharing of data, technology advances have shifted data processing from the role of a final analysis step to a core and integral part of the scientific method. However, with the increased complexity of computational methods and the sheer volume of data, achieving reproducibility of a data-driven scientific investigation becomes correspondingly more difficult. In my talk, I will describe the properties that data in a scientific investigation should have to promote reproducibility. Specifically, reproducibility requires that data be Findable, Accessible, Interoperable, and Reusable (FAIR). I will describe methods and tools that can help promote reproducibility in data-driven scientific research, illustrated with examples from FaceBase, an NIH-funded consortium that is generating data associated with craniofacial development and malformation.
Bio: Dr. Carl Kesselman specializes in grid computing technologies, a term he and Professor Ian Foster introduced in their book The Grid: Blueprint for a New Computing Infrastructure. He and Foster are winners of the British Computer Society’s Lovelace Medal for their grid work. He is an Institute Fellow at the University of Southern California’s Information Sciences Institute and a professor in the Epstein Department of Industrial and Systems Engineering at USC.
Papers Session 1
- Dylan Chapp, Danny Rorabaugh, Duncan Brown, Ewa Deelman, Karan Vahi, Von Welch, Michela Taufer. Applicability study of the PRIMAD model to LIGO gravitational wave search workflows.
- David Stockton, Astrid Prinz, Fidel Santamaria. Provenance and reproducibility in the automation of a standard computational neuroscience pipeline.
- Von Welch, Ewa Deelman, Victoria Stodden, Michela Taufer. Initial Thoughts on Cybersecurity And Reproducibility.
Papers Session 2
- Kyle Chard, Niall Gaffney, Matthew Jones, Kacper Kowalik, Bertram Ludascher, Jarek Nabrzyski, Victoria Stodden, Matthew Turk, Craig Willis. Implementing Computational Reproducibility in the Whole Tale Environment.
- Matthew S. Krafczyk, August Shi, Adhithya Bhaskar, Darko Marinov, Victoria Stodden. Continuous Integration Strategies in the Scientific Software Context.
- Andrea David, Mariette Souppe, Ivo Jimenez, Katia Obraczka, Sam Mansfield, Kerry Veenstra, Carlos Maltzahn. Reproducible Computer Network Experiments: A Case Study Using Popper.
Submission
Submit (single-blind) via EasyChair. We solicit two categories of submissions:
- Position papers. This category is for papers whose goal is to propose solutions (or scope the work that needs to be done) to address some of the issues outlined above. We hope that a research agenda comes out of this and that we can create a community that meets yearly to report on our status in addressing these problems.
- Experience papers. This category consists of papers reporting on the authors’ experience in automating one or more experimentation pipelines. The committee will look for answers to questions such as: What worked? What aspects of experiment automation and validation are hard in your domain? What can be done to improve the tooling for your domain? As part of the submission, authors need to provide a URL to the automation service they use (e.g., TravisCI, GitLabCI, CircleCI, Jenkins, etc.) so reviewers can verify that there are one or more automated pipelines associated with the submission; a sketch of what such a pipeline might look like is shown below.
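To make the expectation concrete, here is a minimal sketch of a self-validating pipeline entry point that a CI service such as those named above could execute on every commit. All stage scripts, file paths, and the tolerance value are hypothetical illustrations, not requirements of the workshop; the point is simply that the pipeline runs end to end and fails the build when results stop being reproducible.

```python
#!/usr/bin/env python3
"""Hypothetical experiment pipeline entry point, invoked by a CI service.
Stage names, paths, and tolerances below are illustrative only."""
import json
import subprocess
import sys

# Each stage is a script kept under version control alongside the paper.
STAGES = ["scripts/setup.sh", "scripts/run_experiment.sh", "scripts/analyze.sh"]

def main() -> int:
    # Execute each stage in order; abort on the first failure.
    for stage in STAGES:
        print(f"==> running {stage}")
        result = subprocess.run(["bash", stage])
        if result.returncode != 0:
            print(f"stage {stage} failed", file=sys.stderr)
            return result.returncode

    # Validation: compare freshly produced metrics against the values
    # reported in the paper (within a 5% tolerance), so the CI build
    # fails whenever the published results can no longer be reproduced.
    with open("results/metrics.json") as f:
        got = json.load(f)
    with open("expected/metrics.json") as f:
        want = json.load(f)
    for key, expected in want.items():
        if abs(got[key] - expected) > 0.05 * abs(expected):
            print(f"validation failed for {key}: got {got[key]}, "
                  f"expected {expected}", file=sys.stderr)
            return 1
    print("all stages and validations passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```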
Format
Authors are invited to submit manuscripts in English not exceeding 5 pages of content. The 5-page limit includes figures, tables, and appendices, but does not include references, for which there is no page limit. Submissions must use the ACM Master Template (please use the sigconf format with default options).
Proceedings
The proceedings will be archived in both the ACM Digital Library and IEEE Xplore through SIGHPC. In addition, pre-print versions of the accepted articles will be published on this website (as allowed by ACM’s publishing policy).
Tools
These tools can be used to automate your experiments (not an exhaustive list): CK, CWL, Popper, ReproZip, Sciunit, Sumatra.
Important Dates
- Submissions due: April 15, 2019 (AoE) (extended from April 9)
- Acceptance notification: April 30, 2019
- Camera-ready paper submission: May 9, 2019
- Workshop: June 24, 2019
Organizers
- Ivo Jimenez, UC Santa Cruz
- Carlos Maltzahn, UC Santa Cruz
- Jay Lofstead, Sandia National Laboratories
- Fernando Chirigati, New York University
Program Committee
- Jay Billings, Oak Ridge National Laboratory
- Ronald Boisvert, NIST
- Bruce R. Childers, University of Pittsburgh
- Neil Chue Hong, Software Sustainability Institute, EPCC, University of Edinburgh
- Robert Clay, Sandia National Labs
- Michael Crusoe, Common Workflow Language
- Dmitry Duplyakin, University of Utah
- Torsten Hoefler, ETH Zurich
- Fatma Imamoglu, University of California, Berkeley
- Daniel S. Katz, University of Illinois at Urbana-Champaign
- Arnaud Legrand, CNRS / Inria / University of Grenoble
- Tanu Malik, DePaul University
- Robert Ricci, University of Utah
- Victoria Stodden, University of Illinois at Urbana-Champaign
- Violet Syrotiuk, Arizona State University
- Michela Taufer, University of Tennessee Knoxville
Contact
Please address workshop questions to ivo@cs.ucsc.edu.