PaddleX Docker Image release

The first Docker images for the PaddleX framework are now available:

https://github.com/waikato-datamining/paddlex

According to the project's website:

PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePaddle framework. It integrates numerous ready-to-use pre-trained models, enabling full-process development from model training to inference, supporting a variety of mainstream hardware both domestic and international, and aiding AI developers in industrial practice.

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.1.0. Docker images have been deployed as well.

This release represents a major overhaul under the hood, and a lot of new functionality has been added as well. With the added I/O and email plugins it is now possible to write more complex, integrated and reactive pipelines.

It is always worth checking out the examples.

The most notable changes since 0.0.12 are:

  • a new release of seppl allows for filters that support m-to-n not just 1-to-1 conversions

  • added list-to-sequence stream filter that forwards list items one by one

  • common code among the image-dataset-converter, audio-dataset-converter and spectral-data-converter libraries has been extracted and moved into separate libraries to reduce duplication: kasperl, kasperl_redis

  • the new kasperl_plots library adds basic support for terminal plots (textual and graphical) and matplotlib ones

  • the tee, trigger and sub-process filters now support conditional execution based on meta-data evaluations, as well as loading their sub-pipeline from a file, which allows breaking up large pipelines into smaller, logical chunks

  • added the block and stop filters for controlling the flow of data (via meta-data conditions)

  • the idc-exec tool now uses all trailing arguments as the pipeline to execute multiple times rather than as a single argument to a flag; alternatively, the pipeline can be loaded from a file

  • the idc-convert tool can load a pipeline from a file now as well

  • added the text-file and csv-file generators that work off files to populate the variable(s)

  • the readers from-grayscale-dp, from-indexed-png-is, from-blue-channel-is and from-grayscale-is now support reading only the annotations

  • added from-text-file reader and to-text-file writer

  • added a number of I/O related plugins: list-files, move-files, delete-files, copy-files, watch-dir

  • added email support with get-email reader and send-email writer

  • added console writer for outputting the data that is coming through on stdout

  • added count-specks filter that adds counts of small objects to meta-data

  • added support for caching plugins via IDC_CLASS_CACHE environment variable

  • added is-to-od filter that generates object detection annotations from contours determined in image segmentation layers

  • added to-metadata writer that outputs the meta-data of an image

  • added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through

  • added load-data filter to turn file names into data containers

  • added annotation-to-storage and annotation-from-storage filters

  • added delete-storage filter for removing objects from internal storage

  • annotation data is now type-checked when it is set
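The new list-to-sequence behaviour can be sketched in a few lines of Python. This is an illustration of the m-to-n filter idea, not the library's actual API; the function name and types are hypothetical:

```python
from typing import Iterable, Iterator, List


def list_to_sequence(batches: Iterable[List[str]]) -> Iterator[str]:
    """Hypothetical sketch of an m-to-n stream filter: each incoming
    list is unpacked and its items are forwarded one by one."""
    for batch in batches:
        for item in batch:
            yield item


# a 1-to-1 filter maps one item to exactly one item; an m-to-n filter
# like this one may emit any number of items per input
items = list(list_to_sequence([["a.png", "b.png"], ["c.png"]]))
print(items)  # ['a.png', 'b.png', 'c.png']
```

Downstream filters and writers then see a flat stream of items rather than lists.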

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.0.12. Docker images have been deployed as well.

The most notable changes since 0.0.11 are:

  • dropped numpy<2.0.0 restriction

  • added grayscale-to-binary filter

  • fix: sort-pixels, rgb-to-grayscale filters

  • the rename filter now supports lower/upper case placeholders of name and extension as well

  • requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources

  • added any-to-rgb filter for turning binary/grayscale images back into RGB ones

  • added label-to-metadata filter for transferring labels into meta-data

  • added metadata-to-placeholder filter for transferring meta-data into placeholders

  • added basic support for images with associated depth information: DepthData, DepthInformation

  • added depth-to-grayscale filter for converting depth information to grayscale image

  • added depth information readers from-grayscale-dp, from-numpy-dp, from-csv-dp and from-pfm-dp

  • added depth information writers to-grayscale-dp, to-numpy-dp, to-csv-dp and to-pfm-dp

  • added apply-ext-mask filter for applying external PNG masks to image containers (image and/or annotations)

  • added apply-label-mask filter for applying image segmentation label masks to their base images

  • added the label-present-ic and label-present-is filters, which ensure that certain label(s) are present and otherwise discard the image

  • the label-present filter was renamed to label-present-od, but label-present is kept as an alias for the time being

  • fix: imgseg_to_bluechannel, imgseg_to_indexedpng and imgseg_to_grayscale now handle overlapping pixels correctly, no longer adding them up and introducing additional labels

  • discard-by-name filter can use names of files in specified paths now as well

  • fixed the construction of the error messages in the pyfunc reader/filter/writer classes
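The idea behind converting depth information to a grayscale image can be sketched as a simple linear rescaling. This is a hypothetical stand-alone function for illustration, not the depth-to-grayscale filter's actual implementation:

```python
def depth_to_grayscale(depth, lo=None, hi=None):
    """Hypothetical sketch: linearly rescale depth readings into the
    0-255 range of an 8-bit grayscale image."""
    lo = min(depth) if lo is None else lo
    hi = max(depth) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid division by zero for flat depth maps
    return [round(255 * (d - lo) / span) for d in depth]


# nearest point maps to black, farthest to white
print(depth_to_grayscale([0.5, 1.0, 1.5]))  # [0, 128, 255]
```

Fixing lo/hi explicitly keeps the gray levels comparable across a whole dataset instead of normalizing each image on its own.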

llm-dataset-converter release

Version 0.2.7 of our llm_dataset_converter library has been released. New releases of ldc_doc, ldc_docx, ldc_faster_whisper, ldc_google, ldc_openai, ldc_pdf and ldc_tint have been made available as well.

The meta-library that combines all the libraries now stands at version 0.0.6:

llm-dataset-converter-all

A new Docker image is available as well:

https://hub.docker.com/r/waikatodatamining/llm-dataset-converter/tags

This release is mostly a maintenance release, but still includes some useful additions:

  • added set-placeholder filter for dynamically setting (temporary) placeholders at runtime

  • added remove-strings filter that just removes sub-strings

  • added strip-strings filter for stripping whitespaces from start/end of strings
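The behaviour of the two string filters can be sketched in plain Python. The function names here are hypothetical stand-ins, not the library's API:

```python
def remove_strings(text, subs):
    """Hypothetical sketch of the remove-strings idea: delete each
    listed sub-string from the text."""
    for sub in subs:
        text = text.replace(sub, "")
    return text


def strip_strings(text):
    """Sketch of the strip-strings idea: trim whitespace from the
    start and end of the string."""
    return text.strip()


print(remove_strings("foo [DRAFT] bar", ["[DRAFT] "]))  # foo bar
print(strip_strings("  hello  "))  # hello
```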

audio-dataset-converter release

A new release of our audio-dataset-converter library and its various dependent libraries is out.

The meta-library that combines all the libraries now stands at version 0.0.3:

audio-dataset-converter-all

A new Docker image is available as well:

https://hub.docker.com/r/waikatodatamining/audio-dataset-converter/tags

Notable changes:

  • improved support for placeholders via the set-placeholder and metadata-to-placeholder filters

  • added from-multi and to-multi for combining multiple readers/writers

  • added the --resume_from option to readers to allow resuming the processing from a specific file

  • added the --split_group option to writers: a regular expression with a single group used for keeping items in the same split, e.g., for identifying the base name of a file or the ID
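How a single-group regular expression can keep related files in the same split can be sketched as follows. The pattern and function name are illustrative assumptions, not the writers' actual code:

```python
import re


def split_group_key(filename, pattern=r"^(.*)-\d+\.\w+$"):
    """Hypothetical sketch of applying a --split_group expression:
    the single capturing group yields the key used to keep related
    items in the same split; no match falls back to the full name."""
    m = re.match(pattern, filename)
    return m.group(1) if m else filename


# files sharing a base name get the same key, so a splitter can keep
# them together in train/val/test
print(split_group_key("sample-1.wav"))  # sample
print(split_group_key("sample-2.wav"))  # sample
```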

spectral-data-converter release

The first release of our spectral-data-converter-all library is now available: 0.0.1. Docker images have been deployed as well.

This library allows you to define and run processing pipelines on the command-line, e.g., for:

  • converting data from one format into another (e.g., OPUS to NIR)

  • cleaning the data (e.g., IQR)

  • transforming the data (e.g., SIMPLS, PLS1, standardize)

  • building and applying scikit-learn models
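The IQR-based cleaning mentioned above can be sketched in plain Python. This is a stand-alone illustration of the interquartile-range rule, not the library's actual filter:

```python
import statistics


def iqr_clean(values, factor=1.5):
    """Hypothetical sketch of IQR-based cleaning: drop samples whose
    value falls outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lo <= v <= hi]


# the outlier 99 lies far above the upper fence and is dropped
print(iqr_clean([10, 11, 12, 11, 10, 99]))
```

In a real spectral pipeline the same rule would typically be applied per wavelength or to a summary statistic of each spectrum.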

You can find examples for various scenarios here:

data-mining.co.nz/spectral-data-converter-examples/