From one data scientist to another: how to use Docker to make your life easier.

# Docker

Any data scientist who has ever had to set up an environment for a deep learning framework knows that getting the combination of CUDA, cuDNN, deep learning framework and other libraries right is a frustrating exercise. With Docker you still have to go through the pain of figuring out the right combination once, but (and it is a BIG but!) once you have captured it in a blueprint called a Docker image, you can use that image on other machines as well; you will be up and running in seconds.
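To give a taste of that reuse, here is a minimal sketch of pulling and running a prebuilt image on a fresh machine. The registry host, image name and tag below are placeholders for illustration, not a real image, and the `--gpus all` flag requires the NVIDIA Container Toolkit on the host.

```shell
# pull the prebuilt image (registry/name/tag are placeholders)
docker pull myregistry.example.com/myteam/pytorch-cuda:1.0

# start an interactive, throwaway container with the host's GPUs
# available and a local data directory mounted into it
docker run --gpus all -it --rm \
    -v /home/me/data:/workspace/data \
    myregistry.example.com/myteam/pytorch-cuda:1.0 \
    /bin/bash
```

No test is attached, since these commands require a running Docker daemon (and a GPU host) to execute.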

This is by no means an exhaustive introduction to Docker (it can do a lot more!), but merely enough to get you started on your journey.

Disclaimer: This introduction was written using a Linux host system. If you are using macOS or WSL under Windows, your mileage may vary...

## Sections

To get started with Docker, have a look at the following sections:

- Basics - explains the most common operations that you will need to know
- Dockerfile - explains the structure of the Dockerfile used to build Docker images
- Best practices - what to keep in mind when creating and using images
- Repositories - where to find the base images that your own images will build on
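To show how these pieces fit together, here is a minimal Dockerfile sketch for a deep learning image. The base image tag and the pinned library version are illustrative only; check the repositories for what is current and compatible with your CUDA/cuDNN combination.

```dockerfile
# base image with CUDA and cuDNN preinstalled (tag is an example)
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

# install Python and pip, then clean up the apt cache to keep the image small
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# install the deep learning framework, pinned for reproducibility
RUN pip3 install --no-cache-dir "torch==2.0.1"

WORKDIR /workspace
CMD ["/bin/bash"]
```

Building it is a single command from the directory containing the Dockerfile, e.g. `docker build -t mydeeplearning:0.1 .`.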

If you encounter problems, have a look here:

Just like with any programming language or complex framework, there are certain things that can make your life easier, so do not forget to have a look at:

Once you have a handle on things and are getting tired of manually building images, you may want to automate your builds and perhaps also run your own registry/proxy. In that case, have a look at the following sections:
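As a rough sketch of what that automation boils down to, the commands below tag an image for a registry and push it there. The registry address and image name are placeholders; `registry:2` is the official registry image from Docker Hub.

```shell
# a private registry can itself run as a container on port 5000
docker run -d -p 5000:5000 --name registry registry:2

# build the image, tagging it with the registry address it will be pushed to
docker build -t localhost:5000/myteam/deeplearning:0.1 .

# push the image so other machines can pull it
docker push localhost:5000/myteam/deeplearning:0.1
```

No test is attached, since these commands require a running Docker daemon to execute; in practice they would be wrapped in a script or CI job rather than typed by hand.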

Finally, if you need to orchestrate multiple Docker containers, have a look at:
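For a flavor of what orchestration looks like, here is a minimal `docker-compose.yml` sketch that runs a training container alongside a TensorBoard container; the service names, image names, paths and port are placeholders for illustration, and the GPU reservation again assumes the NVIDIA Container Toolkit.

```yaml
# docker-compose.yml sketch (names and paths are placeholders)
services:
  training:
    image: myteam/deeplearning:0.1
    volumes:
      - ./data:/workspace/data
      - ./logs:/workspace/logs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  tensorboard:
    image: tensorflow/tensorflow:latest
    command: tensorboard --logdir /logs --host 0.0.0.0
    volumes:
      - ./logs:/logs
    ports:
      - "6006:6006"
```

With such a file in place, `docker compose up` starts both containers together and `docker compose down` tears them down again.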

## About the content

This page was generated using mkdocs. The source is hosted at github.com/waikato-datamining/docker-for-data-scientists and licensed under CC-BY-SA 4.0.