Llama-2 Docker images available

Llama-2, despite not actually being open-source as advertised, is a very powerful large language model (LLM) that can also be fine-tuned with custom data. With version 0.0.3 of our llm-dataset-converter Python library, it is now possible to generate data in jsonlines format that the new Docker images for Llama-2 can consume (a sample record is shown after the image list):

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-huggingface-transformers:4.31.0_cuda11.7_llama2

  • Docker Hub:

    • waikatodatamining/pytorch-huggingface-transformers:4.31.0_cuda11.7_llama2
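
Since jsonlines simply stores one JSON object per line, a fine-tuning record could look like the following. The field names used here (instruction, input, output) are only an assumption for illustration; please check the documentation of the Docker images for the exact schema they expect:

    {"instruction": "Summarize the following text.", "input": "Llama-2 is a family of large language models released by Meta...", "output": "Llama-2 is a family of LLMs from Meta."}
    {"instruction": "Translate to French.", "input": "Good morning!", "output": "Bonjour !"}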

Of course, you can use these Docker images in conjunction with our gifr Python library for gradio interfaces as well (gifr-textgen).

gifr release

Many of our Docker images let the user make predictions in two ways: via simple file-polling or via a Redis backend. File-polling is great for testing, but unsuitable for a production system due to the wear and tear on SSDs.

Initially, I developed a really simple library for sending and receiving data via Redis, called simple-redis-helper:

https://github.com/fracpete/simple-redis-helper

This library gives you some command-line tools for broadcasting, listening, etc. These are sufficient for someone who is comfortable with the command line (especially when logged in remotely via a terminal), but not so great for your clients.
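
To illustrate the Redis-based approach: a client publishes its data on one channel and listens for the result on another. Below is a minimal Python sketch that uses the redis library directly; the channel names (images, predictions) are only assumptions here, as the actual channels get configured per container:

    import redis

    # connect to the Redis instance that the Docker container uses
    r = redis.Redis(host="localhost", port=6379, db=0)

    # listen for predictions on the outgoing channel (name is an assumption)
    pubsub = r.pubsub()
    pubsub.subscribe("predictions")

    # broadcast an image on the incoming channel (name is an assumption)
    with open("example.jpg", "rb") as f:
        r.publish("images", f.read())

    # poll until the prediction comes back
    while True:
        message = pubsub.get_message(timeout=0.01)
        if message and message["type"] == "message":
            print(message["data"].decode())
            break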

Now, there is the brilliant gradio library that was developed specifically for such scenarios: creating easy-to-use and great-looking interfaces for your machine learning models.
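
For context, this is roughly all it takes to get a basic gradio interface up and running (the predict function below is just a stand-in for an actual model call):

    import gradio as gr

    # stand-in for an actual model call
    def predict(text: str) -> str:
        return "echo: " + text

    # a simple text-in/text-out interface, served in the browser
    gr.Interface(fn=predict, inputs="text", outputs="text").launch()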

Over the last couple of days, I have put together a new library tailored to our Docker images, called gifr:

https://github.com/waikato-datamining/gifr

With the first release, the following types of models are supported:

  • image classification

  • image segmentation

  • object detection/instance segmentation

  • text generation

llm-dataset-converter release

Over the last couple of months, we have been working on a little command-line tool, appropriately called llm-dataset-converter, that lets you convert LLM datasets from one format into another:

https://github.com/waikato-llm/llm-dataset-converter

With the first release (0.0.1), you can not only load data from and save it to various formats (csv/tsv, text, json, jsonlines, parquet); the tool also lets you define pipelines using the following format:

reader [filter [filter ...]] [writer]

Each component in the pipeline comes with its own set of command-line parameters. You can even tee off records and process them differently (e.g., writing the same data to different output formats).
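
A hypothetical invocation could look like the one below; the tool and plugin names are assumptions for illustration (the project README lists the actual readers, filters and writers), but the overall shape is always a reader, followed by optional filters and an optional writer:

    # hypothetical pipeline: read CSV, write jsonlines
    llm-convert \
      from-csv --input pairs.csv \
      to-jsonlines --output pairs.jsonl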

The library also offers additional tools, e.g., for downloading files or datasets from Hugging Face, or for combining text files.

To make such pipeline-oriented tools simpler to develop, we created a base library that manages the handling of plugins (and, if necessary, their compatibility), called seppl (Simple Entry Point PipeLines):

https://github.com/waikato-datamining/seppl

Thanks to seppl, the llm-dataset-converter library can be easily extended with additional modules, as it uses a dynamic approach to locating plugins: you only need to define which modules to search for which superclasses (like Reader, Filter, Writer).
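
As a minimal sketch of that idea (not seppl's actual API), locating plugins dynamically boils down to scanning a module for subclasses of a given superclass:

    import importlib
    import inspect

    def find_plugins(module_name: str, superclass: type) -> dict:
        """Returns all subclasses of `superclass` defined in the module."""
        module = importlib.import_module(module_name)
        plugins = {}
        for name, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, superclass) and obj is not superclass:
                plugins[name] = obj
        return plugins

    # e.g., collect all Reader plugins (module and class are placeholders):
    # readers = find_plugins("mylib.readers", Reader)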

Finetune GPT2-XL Docker images available

The finetune-gpt2xl repository allows fine-tuning and using GPT2-XL and GPT-Neo models (it builds on the Hugging Face transformers library) and is now available via the following Docker images:

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-huggingface-transformers:4.7.0_cuda11.1_finetune-gpt2xl_20220924

  • Docker Hub:

    • waikatodatamining/pytorch-huggingface-transformers:4.7.0_cuda11.1_finetune-gpt2xl_20220924

Segment-Anything in High Quality Docker images available

Docker images for Segment-Anything in High Quality (SAM-HQ) are now available.

Just like SAM, SAM-HQ is a great tool for assisting a human with annotating images for image segmentation or object detection, as it can determine a relatively good outline of an object from just a point or a box prompt. Only pre-trained models are available.
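
As a rough sketch of the point-prompt workflow, assuming SAM-HQ mirrors the upstream segment-anything predictor API (the checkpoint path, model type and coordinates below are placeholders):

    import numpy as np
    from PIL import Image
    from segment_anything import sam_model_registry, SamPredictor

    # load a pre-trained SAM-HQ checkpoint (path/model type are placeholders)
    sam = sam_model_registry["vit_h"](checkpoint="sam_hq_vit_h.pth")
    predictor = SamPredictor(sam)

    # the image to annotate, as an RGB numpy array (H x W x 3)
    image = np.array(Image.open("example.jpg").convert("RGB"))
    predictor.set_image(image)

    # a single foreground point prompt (label 1 = foreground)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )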

The code used by the Docker images is available here:

https://github.com/waikato-datamining/pytorch/tree/master/segment-anything-hq

The tags for the images are as follows:

  • In-house registry:

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-sam-hq:2023-08-17_cuda11.6

    • public.aml-repo.cms.waikato.ac.nz:443/pytorch/pytorch-sam-hq:2023-08-17_cpu

  • Docker Hub:

    • waikatodatamining/pytorch-sam-hq:2023-08-17_cuda11.6

    • waikatodatamining/pytorch-sam-hq:2023-08-17_cpu

Redis-related Docker image updates

The redis-docker-harness Python library, which is used by a lot of our Docker images, has received a number of updates (at the time of writing, version 0.0.4 is in use):

  • the ability to specify a password for the Redis server

  • the ability to specify the timeout parameter for the Redis client, with larger timeouts resulting in lower CPU load (the default is now 0.01 instead of 0.001)
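
The effect on CPU load is easy to see in a typical polling loop: with redis-py, get_message blocks for up to timeout seconds, so a larger value means fewer wake-ups per second. A minimal sketch:

    import redis

    pubsub = redis.Redis().pubsub()
    pubsub.subscribe("some_channel")

    while True:
        # blocks for up to `timeout` seconds: 0.01 wakes up at most
        # 100 times per second, whereas 0.001 polls up to 1000 times
        message = pubsub.get_message(timeout=0.01)
        if message and message["type"] == "message":
            ...  # handle the payload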

Unfortunately, this required re-releasing the most recent images of the following frameworks:

  • detectron2

  • mmdetection

  • mmsegmentation

  • yolov5

  • yolov7

  • Segment Anything (SAM)

  • DEXTR

The images kept their version numbers; you just need to pull them again, or use --pull always in conjunction with docker run (see the example below).
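
For example, to refresh one of the SAM-HQ images listed above:

    # re-download the image under the same tag
    docker pull waikatodatamining/pytorch-sam-hq:2023-08-17_cuda11.6

    # or let docker run check for a newer image automatically
    docker run --pull always waikatodatamining/pytorch-sam-hq:2023-08-17_cuda11.6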