Over the last few years, I have noticed that it has become increasingly popular to dislike Jupyter Notebooks, with many people arguing that you should switch from Jupyter to scripts (here, here, here, here, etc.).
Indeed, there are some disadvantages to using Jupyter Notebooks, but that does not mean you should ignore the trove of advantages that could help you become a more efficient Data Scientist!
Jupyter Notebooks can complement your workflow
As with most tools, it is a matter of using the tool for its intended purpose. Sure, there are hacky ways to use Jupyter Notebooks in production, but their true potential shines in other areas.
I would argue that you should not switch to scripts, but rather use Jupyter Notebooks for what they are: a tool to use alongside scripts.
Using Jupyter Notebooks alongside scripts allows for synergy
In this article, I will go through several reasons why Jupyter Notebooks are not as bad as some might think, focusing on their advantages, use cases, and potential role in your tech stack.
1. They are great for exploration
Jupyter Notebooks are an amazing tool for exploration. They allow you to quickly and, most importantly, interactively go through your data, create visualizations, and calculate results, all in one place.
Tools like pandas-profiling make Jupyter Notebooks even more useful for exploration, as they quickly create an overview of what can be found in the data.
Thanks to the interactive nature of notebooks, you can then continue delving into the data and create visualizations of your own to complement the profile.
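To give a feel for what such a profile contains, here is a minimal, standard-library-only sketch that computes a few of the per-column statistics a profiling report shows: present and missing values, distinct values, and the mean of numeric columns. The helper `quick_profile` and the sample rows are hypothetical; this is not the pandas-profiling API, which produces a far richer report.

```python
from statistics import mean


def quick_profile(rows):
    """Summarize a list of dict rows column by column (hypothetical helper).

    For each column: how many values are present, how many are missing
    (None or absent key), how many distinct values occur, and the mean
    when every present value is numeric. Values must be hashable.
    """
    columns = sorted({key for row in rows for key in row})
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Only compute a mean for purely numeric columns (booleans excluded).
        if present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        ):
            summary["mean"] = mean(present)
        profile[col] = summary
    return profile


rows = [
    {"age": 31, "city": "Utrecht"},
    {"age": 45, "city": "Amsterdam"},
    {"age": None, "city": "Utrecht"},
]
for col, summary in quick_profile(rows).items():
    print(col, summary)
```

In a notebook you would run something like this once, scan the summary, and then decide which columns deserve a closer look or a plot of their own.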
2. They can be used for developing code
One of the main concerns I hear regarding Jupyter is that it encourages bad coding practices. Indeed, if you are not careful, Jupyter Notebooks can lead to a polluted global namespace, difficult version control, and reproducibility issues when cells are not run in a linear, top-to-bottom order.
If they are used inappropriately, these are major concerns. However, that simply means that you should be more careful when using Jupyter Notebooks and understand how you can improve upon your workflow.
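One concrete way to spot the non-linear execution problem: an .ipynb file is plain JSON, and every code cell stores the `execution_count` it was last run with. The sketch below (the function name `out_of_order_cells` is hypothetical) flags code cells whose counts do not increase from top to bottom, as well as cells that were never run. It is only a heuristic; the reliable check is still to restart the kernel and run all cells in order.

```python
import json


def out_of_order_cells(notebook):
    """Return indices of code cells that break top-to-bottom execution order.

    `notebook` is a parsed .ipynb (nbformat v4) dict. A code cell is flagged
    if it was never run (execution_count is None) or if its execution_count
    is not larger than that of the last in-order code cell above it.
    """
    flagged = []
    last_count = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no execution_count
        count = cell.get("execution_count")
        if count is None or count <= last_count:
            flagged.append(index)
        else:
            last_count = count
    return flagged


# A tiny notebook where the import cell at the bottom was run first.
# The cell sources are only strings here; load_data is a made-up name.
nb = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Analysis"},
    {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": "df = load_data()"},
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": "import pandas as pd"}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
""")
print(out_of_order_cells(nb))  # [2]
```

For a real file you would load it with `json.load` on the opened .ipynb instead of the inline string.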
Fortunately, one practice typically prevents most of these issues: packaging your code.
Use functions and classes where appropriate, and run your cells in a linear fashion
If you package your code in a .py file, or even in a pip-installable package, you can import that code in your Jupyter Notebook and continue developing it there.
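As an illustration, the helpers you keep redefining in cells could move into a module like the one below. The file name `cleaning.py` and both functions are hypothetical examples; once the file sits next to your notebook (or is installed as a package), a cell can simply run `from cleaning import normalize_column_names`.

```python
# cleaning.py: a hypothetical module holding reusable helpers that a
# notebook imports instead of defining them inline in cells.
import re


def normalize_column_names(names):
    """Lower-case names and replace runs of non-alphanumeric characters with '_'."""
    return [
        re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")
        for name in names
    ]


def drop_incomplete(rows, required):
    """Keep only the dict rows that have a non-None value for every required key."""
    return [row for row in rows if all(row.get(key) is not None for key in required)]


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(normalize_column_names(["First Name", " Age (years) "]))
```

Keeping logic in functions like these also limits how many variables end up in the notebook's global namespace.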
# Load the autoreload extension
%load_ext autoreload
# Reload all modules before executing each cell
%autoreload 2
%autoreload 2 reloads all modules automatically before your code is executed. Put these lines at the beginning of your notebook, and every time you execute a cell, the functions and classes you imported reflect the latest version of your .py files, without restarting the kernel.
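Conceptually, reloading means re-executing a module's source so that later calls see your edits. The standard-library sketch below shows that mechanism with `importlib.reload` on a throwaway module (`greeting_module` is a hypothetical name, written to a temporary directory only so the example is self-contained). It is not how autoreload is implemented: according to the IPython documentation, autoreload additionally upgrades functions and classes that were imported with `from module import name`, which a plain `importlib.reload` does not do.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so the edited source is always recompiled on reload.
previous_setting = sys.dont_write_bytecode
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as workdir:
    module_path = Path(workdir) / "greeting_module.py"  # hypothetical throwaway module
    module_path.write_text("def greet():\n    return 'hello'\n")
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    try:
        import greeting_module

        first = greeting_module.greet()

        # Simulate editing the .py file in your editor...
        module_path.write_text("def greet():\n    return 'hello, world'\n")
        # ...then re-execute the module's source, which autoreload does for you.
        importlib.reload(greeting_module)
        second = greeting_module.greet()
    finally:
        sys.path.remove(workdir)
        sys.dont_write_bytecode = previous_setting

print(first, "->", second)
```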
Moreover, with JupyterLab, notebooks are becoming more and more similar to proper IDEs.
3. They are the perfect communication tool
To me, Jupyter Notebooks are ideal for communication purposes such as tutorials, presentations, and explaining algorithms. They let you hide code, show visualizations, embed videos, interactively demonstrate results, and much more!
For example, whenever I create a package, I make sure to include a set of Jupyter Notebooks with several tutorials on how to use it. These notebooks are easy to share with colleagues, especially if you have taken the time to turn them into clear, readable documents.
Personally, I am a fan of Voila and Fastpages. Voila renders live versions of Jupyter Notebooks, including their interactive widgets, while Fastpages is a blogging platform with support for Jupyter Notebooks that can also be used for documentation.
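Tutorial notebooks like these can even be scaffolded from a script. Because an .ipynb file is just JSON following the nbformat v4 schema, the sketch below builds one with the standard library; the function name, the package name `my_package`, and the tutorial steps are hypothetical. For real projects, the `nbformat` library (`nbformat.v4.new_notebook`, `new_markdown_cell`, `new_code_cell`) is the more robust choice, since it can also validate the result with `nbformat.validate`.

```python
import json
import uuid


def make_tutorial_notebook(title, steps):
    """Build a minimal nbformat v4.5 notebook: a title, then a markdown/code pair per step.

    `steps` is a list of (explanation, code) tuples. Outputs are left empty;
    run the notebook in Jupyter to fill them in.
    """
    def cell_id():
        # nbformat 4.5 requires a short id of letters, digits, '-' or '_'.
        return uuid.uuid4().hex[:8]

    cells = [{"cell_type": "markdown", "id": cell_id(), "metadata": {}, "source": f"# {title}"}]
    for explanation, code in steps:
        cells.append({"cell_type": "markdown", "id": cell_id(), "metadata": {}, "source": explanation})
        cells.append({
            "cell_type": "code",
            "id": cell_id(),
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": code,
        })
    return {
        "cells": cells,
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


notebook = make_tutorial_notebook(
    "Getting started with my_package",  # hypothetical package name
    [
        ("Import the package.", "import my_package"),
        ("Run the main entry point.", "my_package.run()"),
    ],
)
text = json.dumps(notebook, indent=1)
# To save it: Path("01_getting_started.ipynb").write_text(text), with pathlib.Path imported.
```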
4. They have a low barrier to entry
Jupyter Notebooks have an extremely low barrier to entry. Their interactive nature, combined with a simple interface, allows users to start programming quickly. They make exploring and visualizing your data a breeze.
This is especially helpful for people who are not primarily programmers but still need to perform analyses, such as scientists in other fields.
As expected, there are a bunch of extensions that can lower this barrier even further!
To start, there is JupyterHub, a tool that spawns and manages multiple instances of the single-user Jupyter Notebook server. In practice, this means you can offer a notebook server to every student in a class. I used it myself during my studies and had a blast with it!
Staying in the academic setting, nbgrader allows teachers to easily assign and grade Jupyter Notebooks.
5. Plugins, extensions, and tools can supercharge Jupyter
As you might have noticed from the paragraphs above, there is a huge number of extensions out there that can improve your Jupyter Notebook workflow.
These extensions allow you to compensate for many of the disadvantages of Jupyter Notebooks and even turn them into strengths!
Below is an overview of the extensions I mentioned before, along with a few others that I believe could be of great benefit to you.
Widgets & Visualizations
Widget and visualization extensions make it a breeze to quickly visualize and explore your data. pandas-profiling and D-Tale are great tools for creating a nice overview of your data, whereas ipywidgets and ipyvolume (interactive 3D plotting) extend those capabilities even further.
Development
For development, tools like papermill, nbdev, and nbdime make notebooks more reproducible and add richer features. papermill parameterizes and executes notebooks, nbdev lets you develop Python packages, including their documentation and unit tests, directly in Jupyter Notebooks, and nbdime provides notebook-aware diffing and merging for version control.
Although Jupyter Notebooks are not meant for creating production-grade code, these extensions certainly bring them closer to it.
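To make the papermill idea concrete: papermill runs a notebook with new parameter values by injecting a cell right after the cell tagged `parameters` (from Python, roughly `papermill.execute_notebook("input.ipynb", "output.ipynb", parameters={...})`). The sketch below reproduces only the injection step on a notebook dict, using a hypothetical `inject_parameters` helper, and then runs the toy cells with `exec` to show that the injected values override the defaults. It does not start a kernel, record outputs, or handle values whose `repr` is not valid Python.

```python
import copy


def inject_parameters(notebook, params):
    """Return a copy of `notebook` with an extra cell that assigns `params`.

    Mimics the idea behind papermill's parameter injection: the new cell is
    tagged "injected-parameters" and placed right after the first cell tagged
    "parameters" (or at the top if no such cell exists), so its assignments
    override the defaults. Values are written with repr(), so this only works
    for simple literals such as strings, numbers, lists, and dicts.
    """
    nb = copy.deepcopy(notebook)  # leave the template untouched
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": "\n".join(f"{name} = {value!r}" for name, value in params.items()),
    }
    position = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            position = index + 1
            break
    nb["cells"].insert(position, new_cell)
    return nb


template = {
    "cells": [
        {"cell_type": "code", "execution_count": None, "metadata": {"tags": ["parameters"]},
         "outputs": [], "source": "region = 'all'\nyear = 2019"},
        {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [],
         "source": "summary = f'{region}-{year}'"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

run = inject_parameters(template, {"region": "north", "year": 2021})

# Toy "execution": run the cells in order in one namespace (trusted code only).
namespace = {}
for cell in run["cells"]:
    exec(cell["source"], namespace)
print(namespace["summary"])  # north-2021
```

Because the template is deep-copied, the same notebook can be rendered once per parameter set, which is the pattern that makes parameterized reports reproducible.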
Publishing
We can publish Jupyter Notebooks with tools like Voila and Fastpages, which render live versions of them. This makes it possible to communicate our results while allowing colleagues to experiment with them.
We can even go one step further by converting Jupyter Notebooks into formats that are better suited to a particular use case. Take nbconvert, for example, which converts Jupyter Notebooks to PDF, HTML, and even LaTeX!
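For a sense of what such a conversion involves, the sketch below turns a notebook dict into a plain Python script, copying code cells and turning markdown cells into comments. The helper `notebook_to_script` is hypothetical and only a rough stand-in for `jupyter nbconvert --to script notebook.ipynb`, not nbconvert's actual exporter; for PDF, HTML, or LaTeX output you would use nbconvert itself (for example `--to html`).

```python
def notebook_to_script(notebook):
    """Turn a parsed .ipynb dict into Python source text.

    Code cells are copied as-is; markdown cells become comment lines;
    other cell types are skipped.
    """
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows source as a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join(f"# {line}".rstrip() for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


nb = {"cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Results\n", "Summary of the run."]},
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
     "source": "total = 1 + 2\nprint(total)"},
]}
script = notebook_to_script(nb)
print(script)
```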