Imagine having a fully provisioned data science environment with all desired libraries and even an IDE with useful extensions, without having to install any packages packages on your system, just by simply placing a single file in a directory and running a single command. What if, by sharing two files, two people can have identical setups, regardless of operating system. Imagine you could add, remove, and upgrade libraries without saying a little prayer not to get the dreaded message that a compatible set of dependencies could not be found. The Nix package manager provides all this, and more, and can be itself installed on any operating system.
My computer is very clean. I don’t have any IDEs installed, nor any data science libraries. Well, that’s not really true. Actually I have many versions of each of these installed on my system. But they are only accessible in the directories to which they pertain, although they are not installed in those directories like they are when using virtual environments. And even though Python itself is installed system-wide on my computer, each development directory has its own version of Python.
Traditional Package Management
Package management for data scientists working in Python has been notoriously difficult. Pip, Conda, Miniconda, Mamba, ux, poetry, venvwrapper, Docker … the list goes on of the different package managers and solutions are available. They all work, mostly. But they all break, occasionally. None are truly reproducible, and they are all prone to the eventual library conflicts, when one package upgrades its own dependencies, and the upgraded dependencies aren’t compatible with another package. Returning to older projects after some time can be hazardous.
To be fair, package management is a complex problem when dozens of packages with shared dependencies need to be installed and maintained. The problem is worse when considering portability to other computers and other operating systems. I should say, it is a complex problem under the traditional FHS paradigm. I have written about the problems caused by the FHS, and the gymnastics required to work around its limitations. The basic problem is that the FHS does not easily accommodate multiple versions of the same package, and the traditional way of installing packages relies heavily on shared libraries.
Virtual environments go some way to resolve the package management problems. Instead of installing packages system-wide, they are installed in a subdirectory in the project folder. When the virtual environment is active (and you need to remember to manually do so), all relevant environment variables are set to point to these local packages. This way there are no version conflicts between different projects on the same machine.
But many problems still remain. Firstly, it doesn’t solve the problem of version conflicts among packages within the same project. Secondly, even with an accompanying requirements.txt
, there is no guarantee that someone else will get the exact same libraries as were used in the original project. And then there is the question of interoperablilty between OS. Docker solves that one, but then requires that a Docker or Podman process be running the whole time to manage the container, and each Docker image contains and runs a full blown operating system, with no version guarantees about that.
Nix, uniquely, sidesteps all these problems by abandoning the whole FHS paradigm with shared libraries, installing everything needed by a package, and not relying on anything else being installed elsewhere.
Nix package management
With Nix, a fully provisioned development environment is declared in a text file, which is basically a list of packages you want, although it is actually a functional program(!). It defines a graph of all of the dependencies and installs them in a single /nix/store
directory. Nix prepends the hash values of the packages/libraries (actually called derivations) to the to each filename in the store, ensuring no collisions between different versions of any particular library, and that each package has its own requirements satisfied independently.
Something else to note: by its nature, all installations are atomic, so either everything is installed or nothing is installed. Under the traditional way of installing elements sequentially through a series of commands, you can end up with partially installed packages. And on the other end, removing packages is a clean procedure. Nix installs packages declaratively, unlike the traditional imperative method.
The flake.nix
file can create the development environments automatically when you enter a directory with a mechanism called direnv
, avoiding the need to invoke the environment at all. Within this directory, and any sub-directories, you can launch your IDE of choice, and have access to all the libraries and tools you have declared. When you leave the directory you leave the shell, and you no longer have access to the packages in the directory. Another directory can have an entirely different set of packages, with potentially entirely different version numbers for packages, and there is no conflict.
Nix does have its challenges, notable around documentation, which is generally poor and inconsistent, but improving all the time.
Give it a try
Nixpkgs, the repository for Nix, is actually the largest software repository of all, hosting even more packages than Arch’s AUR. Nix can be installed on Linux, Mac, or Windows WSL. Official installation instructions can be found here, but I recommend using Zero to Nix to install Nix because it will configure the flakes feature for you. Installation boils down to running a simple command. Once Nix is installed, try typing nix shell nixpkgs#floorp
to try out a fun web browser. When you are done, exit the shell and program disappears. You can actually use it as the package manager for your system if you want instead of your native package manager, and I know many Mac users prefer it to homebrew.
To jump-start your experience, here is a flake for a Python data science environment. By placing this file in a directory and running nix develop
, you will have Jupyter Lab, VS Code with relevant extensions, and some basic data science libraries. To “install” more packages, just add them to the list and run nix develop
again. Even better, install direnv, and the environment will automatically load when you enter the directory and unload when you leave it. Unfortunately, there are just a few data science packages which are not available on nixpkgs. The flake provides the ability to pip install these, and these packages will be available along with the other packages without a need to activate an additional venv.
Happy coding!
```{nix}
# flake.nix
{
description = "Python Data Science";
inputs = {
nixpkgs = {
url = "github:nixos/nixpkgs/nixos-unstable";
};
flake-utils = {
url = "github:numtide/flake-utils";
};
};
outputs = { nixpkgs, flake-utils, ... }: flake-utils.lib.eachDefaultSystem (system:
let
pkgs = import nixpkgs {
inherit system;
config.allowUnfree = true;
};
myvscode = pkgs.vscode-with-extensions.override {
vscodeExtensions = (with pkgs.vscode-extensions; [
equinusocio.vsc-material-theme
equinusocio.vsc-material-theme-icons
vscodevim.vimms-python.python
ms-python.vscode-pylance
ms-toolsai.jupyter
ms-toolsai.jupyter-renderers
]);
};
in {
devShell = pkgs.mkShell {
name = "python-venv";
venvDir = "./.venv";
buildInputs = with pkgs; [
tmux
myvscode(python3.withPackages(ps: with ps; [
ipython
pip
jupyterwidgetsnbextension
jupyter-nbextensions-configurator
jedi-language-server
ipywidgets
mypy
notebook
pandas
numpy
matplotlib
seaborn
]))
];
postVenvCreation = ''
unset SOURCE_DATE_EPOCH
pip install -r requirements.txt
'';
shellHook = ''
# export BROWSER=brave
# Tells pip to put packages into $PIP_PREFIX instead of the usual locations.
export PIP_PREFIX=$(pwd)/_build/pip_packages
export PYTHONPATH="$PIP_PREFIX/${pkgs.python3.sitePackages}:$PYTHONPATH"
export PATH="$PIP_PREFIX/bin:$PATH"
unset SOURCE_DATE_EPOCH
# jupyter lab # uncomment to automatically launch jupyter
'';
postShellHook = ''
# allow pip to install wheels
unset SOURCE_DATE_EPOCH
'';
};
}
);
}
```