This workshop will focus on practical, actionable aspects of reproducibility in broad areas of computational science and data exploration, with special emphasis on issues where community collaboration is essential for adopting novel methodologies, techniques, and frameworks that address current reproducibility challenges. The workshop will bring together researchers and experts to share experiences and advance the state of the art in the reproducible evaluation of computer systems, featuring contributed papers and invited talks.
Topics
We welcome submissions on topics including, but not limited to:
- Experiment dependency management.
- Software citation and persistence.
- Data versioning and preservation.
- Provenance of data-intensive experiments.
- Tools and techniques for incorporating provenance into publications.
- Automated experiment execution and validation.
- Experiment portability for code, performance, and related metrics.
- Experiment discoverability for re-use.
- Cost-benefit analysis frameworks for reproducibility.
- Usability of reproducibility frameworks and their adaptability to already-established domain-specific tools.
- Long-term artifact archiving for future reproducibility.
- Frameworks for sociological constructs to incentivize paradigm shifts.
- Policies around publication of articles/software.
- Blinding and selecting artifacts for review while maintaining history.
- Reproducibility-aware computational infrastructure.
Submission
Submit (single-blind) via EasyChair. We invite two categories of submissions:
- Position papers. This category is for papers whose goal is to propose solutions (or scope the work that needs to be done) to address some of the issues outlined above. We hope that a research agenda emerges from these papers and that we can build a community that meets yearly to report on progress in addressing these problems.
- Experience papers. This category consists of papers reporting on the authors' experience automating one or more experimentation pipelines. The committee will look for answers to questions such as: What worked? What aspects of experiment automation and validation are hard in your domain? What can be done to improve the tooling for your domain? As part of the submission, authors must provide a URL to the automation service they use (e.g., TravisCI, GitLabCI, CircleCI, Jenkins, etc.) so reviewers can verify that one or more automated pipelines are associated with the submission (see the sketch below for an example of such a pipeline).
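For illustration, the sketch below shows one way a continuous-integration service could drive an automated experiment pipeline: run the experiment, then validate the measured result against a published value and fail the build otherwise. The script name, workload command, output file, and tolerance are hypothetical assumptions, not a required structure.

```python
# run_experiment.py -- illustrative sketch of an automated experiment pipeline.
# A CI service (e.g., TravisCI or GitLabCI) would invoke this script on every
# commit; a non-zero exit code marks the pipeline (and the validation) as failed.
# The workload script, output file, and expected values below are hypothetical.

import json
import subprocess
import sys


def run_workload():
    """Execute the experiment; the workload writes its results to results.json."""
    subprocess.run(["python", "workload.py", "--out", "results.json"], check=True)


def validate(path="results.json", expected_throughput=1000.0, tolerance=0.05):
    """Check that the measured result stays within a tolerance of the published value."""
    with open(path) as f:
        results = json.load(f)
    measured = results["throughput"]
    if abs(measured - expected_throughput) / expected_throughput > tolerance:
        print(f"validation failed: measured {measured}, expected ~{expected_throughput}")
        return False
    print("validation passed")
    return True


if __name__ == "__main__":
    run_workload()
    sys.exit(0 if validate() else 1)
```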
Format
Authors are invited to submit manuscripts in English not exceeding 5 pages of content. The 5-page limit includes figures, tables, and appendices, but does not include references, for which there is no page limit. Submissions must use the ACM Master Template (please use the sigconf format with default options).
Proceedings
The proceedings will be archived in both the ACM Digital Library and IEEE Xplore through SIGHPC.
Tools
These tools can be used to automate your experiments (not an exhaustive list): CWL, Popper, ReproZip, Sciunit, Sumatra.
Accepted Papers
- Abdulqawi Saif, Alexandre Merlin, Lucas Nussbaum, and Ye-Qiong Song. MonEx: An Integrated Experiment Monitoring Framework Standing on Off-The-Shelf Components.
- Quan Pham, Tanu Malik, Dai Hai Ton That, and Andrew Youngdahl. Improving Reproducibility of Distributed Computational Experiments.
- Victoria Stodden, Matthew S. Krafczyk, and Adhithya Bhaskar. Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility.
- Michael A. Sevilla and Carlos Maltzahn. Popper Pitfalls: Experiences Following a Reproducibility Convention.
- David Wilkinson, Luís Oliveira, Daniel Mossé, and Bruce Childers. Software Provenance: Track the Reality Not the Virtual Machine.
- Luís Oliveira, David Wilkinson, Daniel Mossé, and Bruce Childers. Supporting Long-term Reproducible Software Execution.
Program
Schedule
- 07:30-08:45 - Breakfast
- 08:45-09:00 - Welcome
- 09:00-10:00 - Keynote (Dr. Fatma Deniz)
- 10:00-10:30 - Coffee break
- 10:30-10:45 - Lightning talks
- 10:45-12:00 - Paper Presentations 1
- 12:00-13:00 - Lunch (hosted by HPDC)
- 13:00-14:30 - Paper Presentations 2
- 14:30-15:00 - Coffee break
- 15:00-16:00 - Panel (Moderator: Victoria Stodden)
- 16:00-17:00 - Poster session
Paper Session 1 (chair: Jay Lofstead)
- Abdulqawi Saif, Alexandre Merlin, Lucas Nussbaum, and Ye-Qiong Song. MonEx: An Integrated Experiment Monitoring Framework Standing on Off-The-Shelf Components.
- Luís Oliveira, David Wilkinson, Daniel Mossé, and Bruce Childers. Supporting Long-term Reproducible Software Execution.
- Quan Pham, Tanu Malik, Dai Hai Ton That, and Andrew Youngdahl. Improving Reproducibility of Distributed Computational Experiments.
Paper Session 2 (chair: Carlos Maltzahn)
- Jay Jay Billings. Applying Distributed Ledgers to Manage Workflow Provenance.
- David Wilkinson, Luís Oliveira, Daniel Mossé, and Bruce Childers. Software Provenance: Track the Reality Not the Virtual Machine.
- Victoria Stodden, Matthew S. Krafczyk, and Adhithya Bhaskar. Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility.
- Michael A. Sevilla and Carlos Maltzahn. Popper Pitfalls: Experiences Following a Reproducibility Convention.
Keynote Address
Abstract: Establishing generalizable findings from systematic empirical observations is at the core of the modern scientific method. Reproducible research practices help scientists arrive at results that foster new theories and technological advances and, importantly, enable the scientific community to corroborate published results and theories. In my talk, I will start by introducing reproducibility and what it means to adopt reproducible research practices. I will then present technical tools and data science practices that can incorporate reproducible research into a scientist's everyday workflow. I will discuss how we can use these tools and practices to conduct large-scale data exploration and computational science by introducing case studies from data-intensive sciences. I will finish my talk with my own account of reproducible research, presenting a web-based replication engine (https://boldpredictions.gallantlab.org) that uses data derived from naturalistic neuroimaging experiments to simulate a variety of language experiments.

Bio: Dr. Fatma Deniz is a Moore-Sloan Data Science Fellow at the Berkeley Institute for Data Science, a postdoctoral fellow in Dr. Jack Gallant's laboratory (Helen Wills Neuroscience Institute) at the University of California, Berkeley, and a visiting scientist at the Technical University Berlin. She is interested in how sensory information is encoded in the brain and uses machine-learning approaches to fit computational models to large-scale brain data. Her current work focuses on cross-modal language representation in the human brain. She did her PhD in Dr. John-Dylan Haynes's laboratory at the Bernstein Center for Computational Neuroscience and the Technical University Berlin, where she studied functional connectivity changes during conscious perception in humans. She received her bachelor's and master's degrees in Computer Science from the Technical University Munich. During her master's work, Dr. Deniz worked with Dr. Christof Koch at Caltech, where she studied visual saliency and automated text detection. As an advocate of reproducible research practices, she is the editor of the book "The Practice of Reproducible Research". In addition, she works on improving Internet security applications using knowledge gained from cognitive neuroscience and Mooney images (mooneyauth.org). Her work is at the intersection of computer science, human cognition, and neuroscience. She is a passionate coder and baker, and loves playing the cello.
Registration
To register for the workshop, go to the HPDC registration page.
Important Dates
- Submissions due: April 9, 2018 (AoE) (extended from April 2)
- Acceptance notification: April 30, 2018
- Camera-ready paper submission: May 8, 2018
- Workshop: June 11, 2018
Organizers
- Ivo Jimenez, UC Santa Cruz
- Carlos Maltzahn, UC Santa Cruz
- Jay Lofstead, Sandia National Laboratories
Program Committee
- Divyashri Bhat, UMass Amherst
- Michael R. Crusoe, Project Lead, Common Workflow Language project
- Anja Feldmann, TU Berlin
- Todd Gamblin, LLNL
- Mike Heroux, Sandia National Laboratories
- Torsten Hoefler, ETH Zürich
- Neil Chue Hong, Software Sustainability Institute / University of Edinburgh, UK
- Dan Katz, NCSA, UIUC
- Kate Keahey, Argonne National Lab / ChameleonCloud
- Ignacio Laguna, LLNL
- Arnaud Legrand, CNRS/Inria/University of Grenoble
- Reed Milewicz, Sandia National Laboratories
- Robert Ricci, University of Utah / CloudLab
- Victoria Stodden, UIUC
- Violet R. Syrotiuk, ASU
- Michela Taufer, University of Delaware
- Michael Zink, UMass Amherst
Contact
Please address workshop questions to ivo@cs.ucsc.edu.