Note: The exact practices of, and tools for, open science are in constant development. As such, this is a living document that is updated often so that it becomes more refined over time. Because of this, some parts of this document may be incomplete.

Underlying philosophy

The main philosophy of this toolkit is to encourage reproducible and open scientific practices by automating difficult tasks and providing a (strongly) opinionated view on which tools and processes to use when doing open science. The goal here is to reduce the burden on researchers to actually do open and reproducible science, in particular for creating abstracts, slides, posters, and manuscripts. Currently, the target users are biomedical, medical, and health researchers, though this toolkit could easily be used by other disciplines.

Tools and services to use

The range of tools and services available for doing open science is broad and diverse. This has many advantages, especially as this is a time for open science to grow and evolve. However, it is also a huge barrier for researchers and scientists who are just starting out: there are too many choices and very little guidance on what to use. Given the choices available, this toolkit takes a strong stance on which tools and services to use. Below is a brief overview of the tools used in the larger steps involved in creating scientific output, followed by more detailed explanations.

  • File management will be done using Git, with the project stored on either GitHub or GitLab. This keeps track of changes to the files and code, focuses the project into a single folder structure, and emphasizes a final “product” from the coding and writing. It also makes it easier to disseminate and publish later on.
  • Bibliography management … This is still in development.
  • Coding and analyses will all be done in R. This is obvious for now, since this is an R package! However, I would like this toolkit to eventually be more language agnostic… but baby steps.
  • Writing will be done in R Markdown files. This provides the ability to insert R code chunks so that figures, tables, and results can be regenerated easily and reproducibly. This covers not just manuscripts, but also slides and posters (more on this below).
  • Dissemination of the whole project will be done using Zenodo to create a DOI once the manuscript has been submitted, or the poster or slides have been presented. Manuscripts will be submitted to bioRxiv while posters and slides will be submitted to figshare.
  • All activities will be done in RStudio.
  • Reproducibility (more advanced users and currently optional) … This is still in development.

Managing files and tracking changes

All projects, files, coding, writing, and other activities will be done in RStudio. RStudio is an exceptional environment to work with R (and other languages) and has many features that make it simpler to follow open scientific workflows.

Version control is vital to an open scientific workflow. Since Git is well established in the open source, software development, and R package development worlds, it is the natural choice for version control. RStudio also has an excellent and fairly easy-to-use interface to Git.

The coding and analysis phase

Analyses should follow a style similar to that used when creating R packages (read this excellent book on R package creation for more details).

  • As much as possible, R code should be written as functions in the R/ folder.
  • Data, results output, and other output should be saved in data/.
  • Use the devtools and usethis workflows and commands to load and use the data and R code (e.g. load_all()).
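
The layout above could look roughly like the following. This is only a sketch: the project name, the file name summarise.R, the function summarise_outcome(), and the use of the built-in mtcars dataset are all assumptions made for illustration, and the file is loaded with source() so the example runs without devtools installed.

```r
# Hypothetical project skeleton in a temporary directory (all names are illustrative).
project <- file.path(tempdir(), "example-project")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "data"), showWarnings = FALSE)

# Analysis code lives in R/ as a function rather than as a loose script.
writeLines(
  c(
    "summarise_outcome <- function(data, column) {",
    "  values <- data[[column]]",
    "  data.frame(",
    "    variable = column,",
    "    mean = mean(values, na.rm = TRUE),",
    "    sd = stats::sd(values, na.rm = TRUE)",
    "  )",
    "}"
  ),
  file.path(project, "R", "summarise.R")
)

# In a real package-style project, devtools::load_all() would load everything
# in R/. Here the file is sourced directly so the sketch has no dependencies.
source(file.path(project, "R", "summarise.R"))

# Results are written to data/ so they can be reused in the writing phase.
results <- summarise_outcome(mtcars, "mpg")
saveRDS(results, file.path(project, "data", "results.rds"))
```

Keeping the logic in a function means the same code can be reused from a manuscript, slides, or a poster without copying and pasting.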

The writing phase

All writing should be done using R Markdown. Big-picture output such as figures and tables should be created as single-purpose functions (e.g. table_one() or figure_two()) and inserted into a single code chunk (especially important for figures). Inserting citations is fairly simple, with the bibliography file preferably saved in the same folder as the manuscript or other scientific output (in the doc/ folder). Exploratory analyses should be done in an R Markdown file in the doc/ folder as well.
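
As a rough illustration of the single-purpose functions mentioned above, the sketch below defines a table_one() and a figure_two(). The function bodies and the built-in mtcars dataset are stand-ins chosen for the example, not part of this toolkit; in a real project these functions would live in the R/ folder and use the project's own data.

```r
# Stand-in data: the built-in `mtcars` dataset, used only for illustration.
table_one <- function(data = mtcars) {
  # One row per cylinder group, with mean fuel efficiency and group size.
  summary <- aggregate(mpg ~ cyl, data = data, FUN = mean)
  summary$n <- as.vector(table(data$cyl)[as.character(summary$cyl)])
  names(summary) <- c("Cylinders", "Mean mpg", "n")
  summary
}

figure_two <- function(data = mtcars) {
  # A single figure per function, so one code chunk produces one figure.
  plot(
    data$wt, data$mpg,
    xlab = "Weight (1000 lbs)", ylab = "Miles per gallon"
  )
  invisible(NULL)
}

# In the R Markdown file, one chunk would contain only the call table_one(),
# and a separate chunk would contain only the call figure_two().
```

Because each chunk holds a single function call, the manuscript stays readable and the table or figure can be regenerated by re-knitting the document.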

General collaboration should be done through GitHub Pull Requests or GitLab Merge Requests. If the team's technical skill with Git is low, designate a team member with a bit more skill in Git (or who is willing and able to put time into learning) as the “Git liaison”.

Slides should preferably be created using the options available in R Markdown. The best way to create posters is still in development.

Dissemination

Once the manuscript, slides, or poster have been finalized, several actions should be taken:

  1. Increase the version number of the project (more will be added to this).
  2. The GitHub/GitLab project should be made public (any sensitive data in the project should be excluded) and should be uploaded to Zenodo to obtain a DOI.
  3. The manuscript should be submitted to bioRxiv and the slides or posters should be uploaded to figshare (an embargo period can be added if presenting at a conference).
  4. Give your presentation or submit your manuscript to a journal!

For number 1 above, increasing the version number of the project can be used to track bigger-picture milestones in the project. For small abstracts this isn’t really necessary, since they are likely part of a bigger scientific output.
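
One possible way to increase the version number is sketched below. It assumes the project keeps a "major.minor.patch" version in a package-style DESCRIPTION file, which this document does not state; the bump_version() helper and the example DESCRIPTION contents are hypothetical. In a package-style project, usethis::use_version() offers a similar, interactive version bump.

```r
# Hypothetical helper: bump one component of a "major.minor.patch" version string.
bump_version <- function(version, which = c("patch", "minor", "major")) {
  which <- match.arg(which)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  position <- match(which, c("major", "minor", "patch"))
  parts[position] <- parts[position] + 1L
  # Components to the right of the bumped one reset to zero.
  if (position < 3) parts[(position + 1):3] <- 0L
  paste(parts, collapse = ".")
}

# Illustrative DESCRIPTION file in a temporary directory (contents are made up).
description_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(cbind(Package = "exampleproject", Version = "0.1.0"), description_path)

# Read the file, bump the version for a milestone such as a submitted
# manuscript, and write the file back.
fields <- read.dcf(description_path)
fields[, "Version"] <- bump_version(fields[, "Version"], "minor")
write.dcf(fields, description_path)
```

Here a "minor" bump is used for a milestone; which component maps to which kind of milestone is a convention the team would need to agree on.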