Data science workflow with Jupyter and Overleaf v2, Part I: setup

Jupyter is a tool for working with data and text together. Text in a notebook is written in Markdown, which Jupyter can then convert to HTML and LaTeX for output. If you use Pandoc, you can extend the range of outputs to other formats (see the Pandoc website for more options).

Overleaf is a web app for writing documents in LaTeX. Overleaf recently released version 2 of the web app; it is still in beta, but a good feature of the new version is its integration with GitHub. Together, Jupyter, GitHub and Overleaf give you a way to write in Jupyter, share the notebook on GitHub, where you can maintain the code and show its outputs (using mybinder.org), and share a final draft of the work as a PDF document on which you can collaborate with others. If you share your ideas as preprints in PDF form, this lets you integrate the writing seamlessly with your data analysis.

My workflow shows that it is possible to create a repo from Overleaf, work on it in Jupyter, and then push a LaTeX document from Jupyter back to Overleaf.

Basically,
Overleaf -> github -> Jupyter notebook -> latex -> Overleaf

Here is an example. The first time you set up the workflow, do the following:

  1. Start a document in Overleaf: log into Overleaf v2 and create a new project.
  2. Link the project to GitHub.
  3. On your PC, create a new folder, cd into it, and run: git init; git remote add origin {URL of the github repo}
  4. In the same folder, start a Jupyter notebook and name it main.ipynb (the name is not required; it is my shortcut, and you can call the file anything).
  5. Write your content and run your analyses in Jupyter.
  6. Create a custom tplx template file to modify the LaTeX output of the notebook.
  7. When you are done with your analyses, convert the Jupyter notebook to a LaTeX file.
  8. Push the converted LaTeX file to the GitHub repo (the conversion in step 7 must come first).
  9. Finally, pull the contents of the GitHub repo into Overleaf.

After that, all you need to do is write or edit the document in Jupyter, convert it to LaTeX, push it to GitHub, and pull it into Overleaf.
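The repeat cycle of converting and pushing can be scripted from a notebook cell. What follows is only a minimal sketch, not part of the workflow as I originally set it up. The notebook name (main.ipynb) and template name (temp) are the ones used in this post. The function names cycle_commands and run_cycle, the default commit message, and pushing the current branch with HEAD to a remote called origin are all assumptions for illustration.

```python
import subprocess
from pathlib import Path


def cycle_commands(notebook="main.ipynb", template="temp", message="Update draft"):
    """Return the commands for one convert-and-push cycle, in order."""
    # nbconvert names the LaTeX output after the notebook: main.ipynb -> main.tex
    tex_file = str(Path(notebook).with_suffix(".tex"))
    return [
        # Convert the notebook to LaTeX with the custom template.
        ["jupyter", "nbconvert", "--to", "latex", "--template", template, notebook],
        # Stage and commit the generated .tex file.
        ["git", "add", tex_file],
        ["git", "commit", "-m", message],
        # Push the current branch to the remote named "origin" (assumed name).
        ["git", "push", "origin", "HEAD"],
    ]


def run_cycle(**kwargs):
    """Run each command in turn, stopping at the first one that fails."""
    for command in cycle_commands(**kwargs):
        subprocess.run(command, check=True)
```

Calling run_cycle() from the project folder runs the four commands; note that git commit exits with an error when nothing has changed, which stops the cycle. The sketch stages only the .tex file, so a bibliography file or figure outputs would need to be added as well. The pull into Overleaf still happens from the Overleaf side.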

Obviously, this process sets up a one-way push from the Jupyter notebook to Overleaf. At this time it is not possible to set up the reverse transformation, from LaTeX back to the notebook format. Even so, the notebook format integrates code and text nicely, and this workflow lets you harness the power of Overleaf on top of it.

Requirements #

For this to work, you will need a working copy of each of the following:

  1. Jupyter Notebook
  2. Pandoc
  3. A LaTeX distribution for your platform, if you also want to see locally how the rendered PDF will appear
  4. Git
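A quick way to check these requirements is to look for the command-line programs on your PATH. The sketch below uses only the Python standard library. The executable names are assumptions: jupyter, pandoc and git are the usual command names, and pdflatex stands in for "a LaTeX distribution", which may ship a different engine on your system.

```python
import shutil

# Usual command names for the required tools (assumed; adjust for your system).
REQUIRED = ("jupyter", "pandoc", "git")
# Only needed if you want to render the PDF locally; pdflatex is an assumption.
OPTIONAL = ("pdflatex",)


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=REQUIRED):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

An empty list from missing_tools() means the three required commands were found; missing_tools(OPTIONAL) does the same check for the local LaTeX install.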

A few other things #

In an academic article, we typically also need to do the following:

Writing tables #

For tables, you can use Markdown tables in Jupyter. For the outputs of analyses, if you use R with the knitr package, you can write code in R that outputs neat Markdown tables, which will be rendered correctly in LaTeX. If you use Python, you can use pytablewriter.
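To show what such a Markdown table looks like, here is a small dependency-free sketch that builds a pipe table from a header row and data rows. It is not pytablewriter's API, only an illustration of the format; the function name markdown_table is my own for this example.

```python
def markdown_table(headers, rows):
    """Build a Markdown pipe table from column headers and rows of values."""

    def line(cells):
        # Join the cells of one row as "| a | b | c |".
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    separator = line("---" for _ in headers)
    return "\n".join([line(headers), separator] + [line(row) for row in rows])


# In a notebook, the result can be shown as formatted text, for example with
# IPython.display.Markdown(markdown_table(...)), or pasted into a Markdown cell.
```

For real analyses a library handles alignment, number formatting and escaping better than this sketch does, so treat it only as a picture of the target format.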

Inserting Citations #

Work with a BibTeX file (a file with the extension .bib). Most reference managers will export a BibTeX file; I find JabRef very useful for managing BibTeX files, and you can use Zotero as well. Google Scholar also provides BibTeX export. Here are the steps:

For creating the template file:

%%writefile temp.tplx
((*- extends 'article.tplx' -*))

((* block author *))
\author{Arindam Basu}
((* endblock author *))

((* block title *))
\title{My first data analysis example}
((* endblock title *))

((* block bibliography *))
\bibliographystyle{unsrt}
\bibliography{refs}
((* endblock bibliography *))

… and then for conversion to latex

jupyter nbconvert --to latex --template temp main.ipynb
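The template points the bibliography at refs.bib, so every key cited in the notebook has to exist in that file. The sketch below checks this before converting. It assumes citations are written in Markdown cells as tags of the form <cite data-cite="key">...</cite>, which is the convention nbconvert's LaTeX export turns into \cite{key}; if you cite some other way, the pattern needs changing. The function names are hypothetical, and the BibTeX parsing is a rough regular expression, not a full parser.

```python
import json
import re

# Citation tags in Markdown cells (assumed convention): <cite data-cite="key">
CITE_TAG = re.compile(r'<cite\s+data-cite="([^"]+)"')
# Start of a BibTeX entry such as "@article{key," (rough match, not a parser).
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(notebook):
    """Collect citation keys from the Markdown cells of a loaded notebook."""
    keys = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # ipynb files store source as a list of lines
            source = "".join(source)
        keys.update(CITE_TAG.findall(source))
    return keys


def bib_keys(bib_text):
    """Collect the entry keys defined in the text of a .bib file."""
    return set(BIB_ENTRY.findall(bib_text))


def missing_keys(notebook, bib_text):
    """Return cited keys that have no entry in the .bib text, sorted."""
    return sorted(cited_keys(notebook) - bib_keys(bib_text))


# Usage, assuming main.ipynb and refs.bib are in the current folder:
#   with open("main.ipynb") as f:
#       nb = json.load(f)
#   with open("refs.bib") as f:
#       print(missing_keys(nb, f.read()))
```

An empty list means every cited key was found in the .bib file.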

Next Steps #

First, Jupyter is versatile: it is simple, and it weaves code and text together seamlessly using Markdown and at least three languages (R, Julia, and Python), and you can switch between them. Second, with GitHub as the intermediate channel, you can share the notebooks via mybinder.org to reach a wider audience, and also produce a tidy PDF version that you can publish as a preprint. You can also collaborate with others on the LaTeX version as the final production version of the tables and graphs, while keeping the notebook as the production version of the analysis. While notebooks are useful in their own ways, a text-based PDF version with a LaTeX backend is often easier to read. In Part II of this series, I will explore doing data analysis in Jupyter and writing about it in Overleaf.

 