tflite model maker tools

TensorFlow made the tflite model maker available a while ago, which (according to the project website) simplifies the process of training a TensorFlow Lite model using a custom dataset. It uses transfer learning to reduce the amount of training data required and shorten the training time.

But instead of falling back on ad-hoc Python scripting, we pulled code from their example notebooks into a library that offers command-line utilities and published it on PyPI, to avoid writing the same Python code over and over for every new problem/dataset. It also makes it easy to build docker images from it, making it highly reusable.

video-frame-selector library released

The video-frame-selector Python 3 library is now available:

github.com/waikato-datamining/video-frame-selector

With this library you can present frames obtained from video files or a webcam to an image analysis framework, such as detectron2, and react to the generated predictions. For instance, you might want to trawl through a video from a trail camera and generate either JPG images or a shortened video that contains only the frames that show an actual animal. Or you might only look for a specific animal rather than all of them (e.g., only NZ pests such as rats or stoats). Frames that are to be kept can also be cropped to the smallest area that encompasses all the detected bounding boxes (you can enforce a margin around the cropped content and/or a minimum width/height).
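The cropping described above can be sketched in a few lines of plain Python. Note that the function and parameter names below are illustrative, not the library's actual API:

```python
def crop_region(boxes, frame_w, frame_h, margin=0, min_w=0, min_h=0):
    """Compute the smallest crop rectangle encompassing all detected
    bounding boxes, expanded by a margin and enforcing a minimum size.

    boxes: list of (x0, y0, x1, y1) tuples from the detection framework.
    Returns (x0, y0, x1, y1) clamped to the frame dimensions.
    """
    # union of all detected bounding boxes
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    # enforce a margin around the cropped content
    x0, y0, x1, y1 = x0 - margin, y0 - margin, x1 + margin, y1 + margin
    # enforce a minimum width/height by growing the region symmetrically
    if x1 - x0 < min_w:
        pad = (min_w - (x1 - x0)) / 2
        x0, x1 = x0 - pad, x1 + pad
    if y1 - y0 < min_h:
        pad = (min_h - (y1 - y0)) / 2
        y0, y1 = y0 - pad, y1 + pad
    # clamp the final rectangle to the frame
    return (max(0, int(x0)), max(0, int(y0)),
            min(frame_w, int(x1)), min(frame_h, int(y1)))
```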

Docker for Data Scientists website launched

Deep learning has opened a lot of new avenues in many domains for data scientists. However, the pain and suffering of setting up environments with the correct versions of CUDA, cuDNN, numpy and the actual deep learning framework is, unfortunately, all too real. Replicating an existing test environment on a production server can prove challenging. For quite some time now, we have resorted to using docker to ease the pain of reproducing the same setup on various servers for running experiments. Docker will not solve the problem of having to figure out the right combination of libraries, but at least you can easily reuse docker images and build pipelines with frameworks that would otherwise have conflicting requirements in terms of libraries.
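As a sketch of the idea, a Dockerfile can pin one known-good combination of CUDA/cuDNN base image and library versions so the exact same setup can be rebuilt on any server (the image tag and version numbers below are purely illustrative):

```dockerfile
# Base image bundles a specific CUDA/cuDNN combination
FROM nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu20.04

# Pin the Python libraries so every server runs identical versions
RUN apt-get update && apt-get install -y python3 python3-pip && \
    pip3 install "numpy==1.19.5" "torch==1.9.0" "torchvision==0.10.0"
```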

Long story short: in order to make it easier for data scientists to start their journey with docker, we compiled a little mkdocs website:

www.data-mining.co.nz/docker-for-data-scientists/

On this website, you will learn the docker basics and also find steps for creating (and using) a docker image that uses PyTorch's image classification facilities.

The website's source code is available from github and released under CC-BY-SA 4.0:

https://github.com/waikato-datamining/docker-for-data-scientists

simple-confusion-matrix library released

The simple-confusion-matrix Python 3 library has been released today:

github.com/waikato-datamining/simple-confusion-matrix

It is a simple library that can generate confusion matrices from CSV files or lists of actual and predicted labels. It can output the generated matrix in plain text or CSV format, either as a string or to a file.

Rather than just using counts, it can also generate:

  • percentages (all cells sum up to 1)

  • percentages per row (all cells in a row sum up to 1)

The latter is useful when dealing with imbalanced datasets, giving you a good idea of how well each label is being predicted.
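The per-row normalization can be sketched in plain Python. Note that the function names here are illustrative and not the library's actual API:

```python
def confusion_matrix(actual, predicted):
    """Build a confusion matrix as a nested dict
    (actual label -> predicted label -> count) from two parallel lists."""
    labels = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def row_percentages(matrix):
    """Normalize each row so its cells sum to 1, showing how each
    actual label is distributed over the predicted labels."""
    result = {}
    for a, row in matrix.items():
        total = sum(row.values())
        result[a] = {p: (c / total if total else 0.0) for p, c in row.items()}
    return result
```

For an imbalanced dataset, the row view makes it obvious when a minority label is mostly misclassified, even though the overall counts look dominated by the majority label.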