<p>People are busy, especially managers and leaders. Results of data analyses are sometimes presented in oral form, but often the first cut is presented via email.</p>
<p>It is therefore often useful to break down the results of an analysis into different levels of granularity / detail.</p>
<h2>Hierarchy of Information: Research Paper</h2>
<ul>
<li>Title / Author List
<ul>
<li>Describes what the paper is about</li>
<li>Hopefully interesting</li>
<li>No detail</li>
</ul></li>
<li>Abstract
<ul>
<li>Motivation of the problem</li>
<li>Bottom Line Results</li>
</ul></li>
<li>Body / Results
<ul>
<li>Methods</li>
<li>More detailed results</li>
<li>Sensitivity Analysis</li>
<li>Implication of Results</li>
</ul></li>
<li>Supplementary Materials / Gory Details
<ul>
<li>Details on what was done</li>
</ul></li>
<li>Code / Data / Really Gory Details
<ul>
<li>For reproducibility</li>
</ul></li>
</ul>
<h2>Hierarchy of Information: Email Presentation</h2>
<ul>
<li>Subject Line / Subject Info
<ul>
<li>At a minimum: include one</li>
<li>Can you summarize findings in one sentence?</li>
</ul></li>
<li>Email Body
<ul>
<li>A brief description of the problem / context: recall what was proposed and executed; summarize findings / results. (Total of 1-2 paragraphs)</li>
<li>If action needs to be taken as a result of this presentation, suggest some options and make them as concrete as possible</li>
<li>If questions need to be addressed, try to make them yes / no</li>
</ul></li>
<li>Attachment(s)
<ul>
<li>R Markdown file</li>
<li>knitr report</li>
<li>Stay Concise: Don't spit out pages of code</li>
</ul></li>
<li>Links to Supplementary Materials
<ul>
<li>Code / Software / Data</li>
<li>GitHub Repository / Project Website</li>
</ul></li>
</ul>
<h2>DO: Start with Good Science</h2>
<ul>
<li>Remember: Garbage in, garbage out</li>
<li>Find a coherent, focused question. This helps solve many problems</li>
<li>Working with good collaborators reinforces good practices</li>
<li>Something that's interesting to you will hopefully motivate good habits</li>
</ul>
<h2>DON'T: Do Things By Hand</h2>
<ul>
<li>Editing spreadsheets of data to "clean it up"
<ul>
<li>Removing outliers</li>
<li>QA / QC</li>
<li>Validating</li>
</ul></li>
<li>Editing tables or figures (e.g., rounding, formatting)</li>
<li>Downloading data from a website</li>
<li>Moving data around your computer, splitting, or reformatting files.</li>
</ul>
<p>Things done by hand need to be precisely documented (this is harder than it sounds!)</p>
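<p>As a rough illustration of the alternative, the sketch below scripts two of the steps listed above (removing outliers and rounding for a table) instead of editing a spreadsheet. The data frame, the column names, and the 1.5 &times; IQR rule are all assumptions chosen for the example, not a recommended cleaning policy.</p>

```r
# Hypothetical raw data; in practice this would be read from a file.
raw <- data.frame(
  id    = 1:6,
  value = c(10.2, 9.8, 10.5, 250.0, 10.1, NA)
)

# State the cleaning rule in code: drop missing values and anything
# further than 1.5 * IQR outside the quartiles (an assumed rule).
q     <- unname(quantile(raw$value, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
keep  <- !is.na(raw$value) &
  raw$value >= q[1] - fence &
  raw$value <= q[2] + fence

clean <- raw[keep, ]

# Record what was removed so the step documents itself.
message(sum(!keep), " of ", nrow(raw), " rows removed by the cleaning rule")

# Formatting for a report is also done in code, never by retyping numbers.
report <- transform(clean, value = round(value, 1))
```

<p>Because the rule lives in the script, rerunning it on the same raw file gives the same cleaned table, and anyone reading the code can see exactly which rows were dropped and why.</p>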
<h2>DON'T: Point and Click</h2>
<ul>
<li>Many data processing / statistical analysis packages have graphical user interfaces (GUIs)</li>
<li>GUIs are convenient / intuitive but the actions you take with a GUI can be difficult for others to reproduce</li>
<li>Some GUIs produce a log file or script which includes equivalent commands; these can be saved for later examination</li>
<li>In general, be careful with data analysis software that is highly interactive; ease of use can sometimes lead to non-reproducible analyses.</li>
<li>Other interactive software, such as text editors, is usually fine.</li>
</ul>
<h2>DO: Teach a Computer</h2>
<p>If something needs to be done as part of your analysis / investigation, try to teach your computer to do it (even if you only need to do it once).</p>
<p>In order to give your computer instructions, you need to write down exactly what you mean to do and how it should be done. Teaching a computer almost guarantees reproducibility.</p>
<p>For example, by hand you can:</p>
<pre><code>1. Go to the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/
2. Download the Bike Sharing Dataset</code></pre>
<p>Or you can teach your computer to do it using R, for example with a call to <code>download.file()</code>. In that case:</p>
<ul>
<li>The full URL to the dataset file is specified</li>
<li>The name of the file saved to your local computer is specified</li>
<li>The directory to which the file was saved is specified ("ProjectData")</li>
<li>Code can always be executed in R (as long as the link is available)</li>
</ul>
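<p>A minimal sketch of such a scripted download follows. The helper name <code>fetch_dataset</code> is hypothetical, and the exact archive URL is an assumption that should be checked against the repository before use; only the "ProjectData" directory comes from the notes above.</p>

```r
# Assumed location of the Bike Sharing Dataset archive; verify before use.
bike_url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip"

# Hypothetical helper: download `url` into `dest_dir`, keeping the remote
# file name, and return the local path.
fetch_dataset <- function(url, dest_dir = "ProjectData") {
  # Create the target directory if it does not exist yet.
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  dest_file <- file.path(dest_dir, basename(url))
  # mode = "wb" keeps binary files such as .zip archives intact on Windows.
  download.file(url, destfile = dest_file, mode = "wb")
  # Return the path invisibly so later steps can reuse it.
  invisible(dest_file)
}

# Needs network access, so the call is left commented out here:
# fetch_dataset(bike_url)
```

<p>The URL, the local file name, and the destination directory are all written down in one place, which is exactly what the manual two-step version leaves undocumented.</p>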
<h2>DO: Use Some Version Control</h2>
<p>Version control helps you slow down by adding changes in small chunks (don't just do one massive commit). It lets you track / tag snapshots so that you can revert to older versions of the project. Services like GitHub / Bitbucket / SourceForge make it easy to publish results.</p>
<h2>DO: Keep Track of Your Software Environment</h2>
<p>If you work on a complex project involving many tools / datasets, the software and computing environment can be critical for reproducing your analysis.</p>
<p><strong>Computer Architecture</strong>: CPU (Intel, AMD, ARM), CPU Architecture, GPUs</p>
<p><strong>Operating System</strong>: Windows, Mac OS, Linux / Unix</p>
<p><strong>Software Toolchain</strong>: Compilers, interpreters, command shell, programming language (C, Perl, Python, etc.), database backends, data analysis software</p>
<p><strong>Supporting software / infrastructure</strong>: Libraries, R packages, dependencies</p>
<p><strong>External dependencies</strong>: Websites, data repositories, remote databases, software repositories</p>
<p><strong>Version Numbers:</strong> Ideally, for everything (if available)</p>
<p>In R, the <code>sessionInfo()</code> function reports much of this information about the software environment.</p>
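<p>One way to record the environment alongside an analysis is sketched below using base R's <code>sessionInfo()</code>. The output file name <code>session-info.txt</code> is an arbitrary choice for the example.</p>

```r
# Capture the session description as an object so it can be inspected or saved.
info <- sessionInfo()

# Print the full report: R version, platform, locale, attached packages.
print(info)

# Individual pieces are available as list elements.
r_version <- info$R.version$version.string
platform  <- info$platform

# Save the report next to the analysis output so the environment is
# archived with the results (file name is a hypothetical choice).
writeLines(capture.output(print(info)), "session-info.txt")
```

<p>Note that this covers R, the operating system, and loaded packages; external dependencies such as websites or remote databases still have to be documented separately.</p>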