PaddleX Docker Image release

The first Docker images for the PaddleX framework are now available:

https://github.com/waikato-datamining/paddlex

According to the project's website:

PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePaddle framework. It integrates numerous ready-to-use pre-trained models, enabling full-process development from model training to inference, supporting a variety of mainstream hardware both domestic and international, and aiding AI developers in industrial practice.

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.1.0. Docker images have been deployed as well.

This release represents a major overhaul under the hood, and a lot of new functionality has been added as well. With the added I/O and email plugins it is now possible to write more complex, integrated and reactive pipelines.

It is always worth checking out the examples.

The most notable changes since 0.0.12 are:

  • a new release of seppl allows for filters that support m-to-n not just 1-to-1 conversions

  • added list-to-sequence stream filter that forwards list items one by one

  • common code among the image-dataset-converter, audio-dataset-converter and spectral-data-converter libraries has been extracted and moved into separate libraries to reduce duplication: kasperl, kasperl_redis

  • the new kasperl_plots library adds basic support for terminal plots (textual and graphical) and matplotlib ones

  • the tee, trigger and sub-process filters now support conditional execution based on meta-data evaluations, as well as loading their sub-pipeline from a file, which allows breaking up large pipelines into smaller, logical chunks

  • added the block and stop filters for controlling the flow of data (via meta-data conditions)

  • the idc-exec tool now uses all trailing arguments as the pipeline to execute multiple times rather than as a single argument to a flag; alternatively, the pipeline can be loaded from a file

  • the idc-convert tool can load a pipeline from a file now as well

  • added the text-file and csv-file generators that work off files to populate the variable(s)

  • the readers from-grayscale-dp, from-indexed-png-is, from-blue-channel-is and from-grayscale-is now support reading only the annotations

  • added from-text-file reader and to-text-file writer

  • added a number of I/O related plugins: list-files, move-files, delete-files, copy-files, watch-dir

  • added email support with get-email reader and send-email writer

  • added console writer for outputting the data that is coming through on stdout

  • added count-specks filter that adds counts of small objects to meta-data

  • added support for caching plugins via IDC_CLASS_CACHE environment variable

  • added is-to-od filter that generates object detection annotations from contours determined in image segmentation layers

  • added to-metadata writer that outputs the meta-data of an image

  • added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through

  • added load-data filter to turn file names into data containers

  • added annotation-to-storage and annotation-from-storage filters

  • added delete-storage filter for removing objects from internal storage

  • annotation data is now type-checked when it is set
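The new list-to-sequence behaviour can be sketched in a few lines of Python. This is an illustration of the m-to-n filter idea, not the library's actual API; the function name and types are hypothetical:

```python
from typing import Iterable, Iterator, List


def list_to_sequence(batches: Iterable[List[str]]) -> Iterator[str]:
    """Hypothetical sketch of an m-to-n stream filter: each incoming
    list is unpacked and its items are forwarded one by one."""
    for batch in batches:
        for item in batch:
            yield item


# a 1-to-1 filter maps one item to exactly one item; an m-to-n filter
# like this one may emit any number of items per input
items = list(list_to_sequence([["a.png", "b.png"], ["c.png"]]))
print(items)  # ['a.png', 'b.png', 'c.png']
```

Downstream filters and writers then see a flat stream of items rather than lists.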

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.0.12. Docker images have been deployed as well.

The most notable changes since 0.0.11 are:

  • dropped numpy<2.0.0 restriction

  • added grayscale-to-binary filter

  • fix: sort-pixels, rgb-to-grayscale filters

  • the rename filter now supports lower/upper case placeholders of name and extension as well

  • requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources

  • added any-to-rgb filter for turning binary/grayscale images back into RGB ones

  • added label-to-metadata filter for transferring labels into meta-data

  • added metadata-to-placeholder filter for transferring meta-data into placeholders

  • added basic support for images with associated depth information: DepthData, DepthInformation

  • added depth-to-grayscale filter for converting depth information to grayscale image

  • added depth information readers from-grayscale-dp, from-numpy-dp, from-csv-dp and from-pfm-dp

  • added depth information writers to-grayscale-dp, to-numpy-dp, to-csv-dp and to-pfm-dp

  • added apply-ext-mask filter for applying external PNG masks to image containers (image and/or annotations)

  • added apply-label-mask filter for applying image segmentation label masks to their base images

  • added the label-present-ic and label-present-is filters, which ensure that certain label(s) are present and otherwise discard the image

  • the label-present filter was renamed to label-present-od, but label-present is kept as an alias for the time being

  • fix: imgseg_to_bluechannel, imgseg_to_indexedpng and imgseg_to_grayscale now handle overlapping pixels correctly, no longer adding them up and introducing additional labels

  • discard-by-name filter can use names of files in specified paths now as well

  • fixed the construction of the error messages in the pyfunc reader/filter/writer classes
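The idea behind converting depth information to a grayscale image can be sketched as a simple linear rescaling. This is a hypothetical stand-alone function for illustration, not the depth-to-grayscale filter's actual implementation:

```python
def depth_to_grayscale(depth, lo=None, hi=None):
    """Hypothetical sketch: linearly rescale depth readings into the
    0-255 range of an 8-bit grayscale image."""
    lo = min(depth) if lo is None else lo
    hi = max(depth) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid division by zero for flat depth maps
    return [round(255 * (d - lo) / span) for d in depth]


# nearest point maps to black, farthest to white
print(depth_to_grayscale([0.5, 1.0, 1.5]))  # [0, 128, 255]
```

Fixing lo/hi explicitly keeps the gray levels comparable across a whole dataset instead of normalizing each image on its own.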

llm-dataset-converter release

Version 0.2.7 of our llm_dataset_converter library has been released. New releases of ldc_doc, ldc_docx, ldc_faster_whisper, ldc_google, ldc_openai, ldc_pdf and ldc_tint have been made available as well.

The meta-library that combines all the libraries now stands at version 0.0.6:

llm-dataset-converter-all

A new Docker image is available as well:

https://hub.docker.com/r/waikatodatamining/llm-dataset-converter/tags

This release is mostly a maintenance release, but still includes some useful additions:

  • added set-placeholder filter for dynamically setting (temporary) placeholders at runtime

  • added remove-strings filter that just removes sub-strings

  • added strip-strings filter for stripping whitespaces from start/end of strings
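The behaviour of the two string filters can be sketched in plain Python. The function names here are hypothetical stand-ins, not the library's API:

```python
def remove_strings(text, subs):
    """Hypothetical sketch of the remove-strings idea: delete each
    listed sub-string from the text."""
    for sub in subs:
        text = text.replace(sub, "")
    return text


def strip_strings(text):
    """Sketch of the strip-strings idea: trim whitespace from the
    start and end of the string."""
    return text.strip()


print(remove_strings("foo [DRAFT] bar", ["[DRAFT] "]))  # foo bar
print(strip_strings("  hello  "))  # hello
```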

audio-dataset-converter release

A new release of our audio-dataset-converter library and its various dependent libraries is out.

The meta-library that combines all the libraries now stands at version 0.0.3:

audio-dataset-converter-all

A new Docker image is available as well:

https://hub.docker.com/r/waikatodatamining/audio-dataset-converter/tags

Notable changes:

  • improved support for placeholders via the set-placeholder and metadata-to-placeholder filters

  • added from-multi and to-multi for combining multiple readers/writers

  • added the --resume_from option to readers to allow resuming the processing from a specific file

  • added the --split_group option to writers: a regular expression with a single group used for keeping items in the same split, e.g., for identifying the base name of a file or the ID
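How a single-group regular expression can keep related files in the same split can be sketched as follows. The pattern and function name are illustrative assumptions, not the writers' actual code:

```python
import re


def split_group_key(filename, pattern=r"^(.*)-\d+\.\w+$"):
    """Hypothetical sketch of applying a --split_group expression:
    the single capturing group yields the key used to keep related
    items in the same split; no match falls back to the full name."""
    m = re.match(pattern, filename)
    return m.group(1) if m else filename


# files sharing a base name get the same key, so a splitter can keep
# them together in train/val/test
print(split_group_key("sample-1.wav"))  # sample
print(split_group_key("sample-2.wav"))  # sample
```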

spectral-data-converter release

The first release of our spectral-data-converter-all library is now available: 0.0.1. Docker images have been deployed as well.

This library allows you to define and run processing pipelines on the command-line, e.g., for:

  • converting data from one format into another (e.g., OPUS to NIR)

  • cleaning the data (e.g., IQR)

  • transforming the data (e.g., SIMPLS, PLS1, standardize)

  • building and applying scikit-learn models
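The IQR-based cleaning mentioned above can be sketched in plain Python. This is a stand-alone illustration of the interquartile-range rule, not the library's actual filter:

```python
import statistics


def iqr_clean(values, factor=1.5):
    """Hypothetical sketch of IQR-based cleaning: drop samples whose
    value falls outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lo <= v <= hi]


# the outlier 99 lies far above the upper fence and is dropped
print(iqr_clean([10, 11, 12, 11, 10, 99]))
```

In a real spectral pipeline the same rule would typically be applied per wavelength or to a summary statistic of each spectrum.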

You can find examples for various scenarios here:

data-mining.co.nz/spectral-data-converter-examples/