S3000 Pod Support

Typically, our commercial framework for laboratories, S3000, trains models, cleaners, evaluators, etc. and generates predictions on the same server. That was mostly because S3000 instances started out with a small number of models, not because it was a requirement. Especially for mission-critical systems that output predictions 24/7, it makes sense to have a separate grunty server for training the models and one or more lightweight servers that serve the predictions.

However, copying the versioned directory structures from the training directory across to other servers is not particularly efficient and it is easy to make mistakes, especially if the setups and/or components have to be updated as well.

In order to speed up deployments of new models, or simply make it easier to replace only a few models, S3000 introduces the concept of Pods. A Pod is a self-contained zip file that combines all the serialized models, such as classifiers and evaluators, with a JSON configuration file describing the Pod. Pod-aware prediction generators read these Pod files and their configuration to generate a prediction workflow on the fly, which then loads the models into internal storage, ready to be used.
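The idea of bundling serialized models with a JSON description in a single zip file can be sketched in a few lines of Python. This is only an illustration and not the actual S3000 Pod layout, which is not described here: the file name pod.json, the keys name, models, file and type, and the helpers build_pod and load_pod are all hypothetical.

```python
import io
import json
import zipfile


def build_pod(models: dict[str, bytes], config: dict) -> bytes:
    """Pack serialized models and a JSON description into one in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # "pod.json" is a made-up name for the configuration file
        archive.writestr("pod.json", json.dumps(config, indent=2))
        for name, payload in models.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def load_pod(data: bytes) -> tuple[dict, dict[str, bytes]]:
    """Read the configuration first, then load every model it lists."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        config = json.loads(archive.read("pod.json"))
        # stand-in for the "internal storage" the workflow would use
        storage = {
            entry["file"]: archive.read(entry["file"])
            for entry in config["models"]
        }
    return config, storage


# placeholder bytes instead of real serialized models
pod = build_pod(
    {"classifier.model": b"\x00\x01", "evaluator.model": b"\x02"},
    {
        "name": "example",
        "models": [
            {"file": "classifier.model", "type": "classifier"},
            {"file": "evaluator.model", "type": "evaluator"},
        ],
    },
)
config, storage = load_pod(pod)
print(config["name"], sorted(storage))
```

Because the configuration travels inside the archive, a prediction server only needs the one file to know which models to load, which is what makes replacing a few models a single-file copy.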

Currently, only SIMPLE components are supported; support for FUSION will be added at a later stage.

S3000 HTML Support

During training runs, our commercial framework for laboratories, S3000, can output cross-validation statistics, which include summary statistics, spreadsheets and plots. Until now, the plots were PNG image files, making it easy to include them in reports or to quickly scan the performance of the models.

Simple scatter plot of actual vs predicted

A while ago, support for kernel-density estimate (KDE) plots was introduced, which can tell a better story for datasets with a lot of data points. The default scatter plot for actual vs predicted can be replaced using the ConfigTags.props file, e.g., with this KDE setup:

output_model_statistics#chart=adams.gui.visualization.jfreechart.chart.DensityScatterPlot -label-x actual -label-y predicted -mode KDE -generator "adams.gui.visualization.core.BiColorGenerator -first-color #0000ff -second-color #ffc800 -alpha 127"

Kernel-density estimate plot of actual vs predicted

However, static plots make it easy to spot outliers, but much harder to identify them. ADAMS introduced the generation of self-contained HTML plots in the WEKA Investigator, which can now be used in S3000 as well when outputting model statistics. The following ConfigTags.props entry enables the HTML output:

output_model_statistics#html=adams.flow.transformer.actualvspredictedprocessor.ClassifierErrorsKernelDensityEstimate -circle-size 10

Kernel-density estimate plot of actual vs predicted in HTML

Yolo26 Docker images available

Docker images for Yolo26 are now available.

The code used by the Docker images is available from here:

github.com/waikato-datamining/pytorch/tree/master/yolo26

The tags for the images are as follows:

  • In-house registry:

    • harbor.cms.waikato.ac.nz/public/pytorch/pytorch-yolo26:8.4.16_cuda12.6

    • harbor.cms.waikato.ac.nz/public/pytorch/pytorch-yolo26:8.4.16_cpu

  • Docker Hub:

    • waikatodatamining/pytorch-yolo26:8.4.16_cuda12.6

    • waikatodatamining/pytorch-yolo26:8.4.16_cpu

The tutorial on object detection is available from here:

www.data-mining.co.nz/applied-deep-learning/object_detection/yolo26/

PaddleX Docker Image release

The first Docker images for the PaddleX framework are now available:

https://github.com/waikato-datamining/paddlex

According to the project's website:

PaddleX 3.0 is a low-code development tool for AI models built on the PaddlePaddle framework. It integrates numerous ready-to-use pre-trained models, enabling full-process development from model training to inference, supporting a variety of mainstream hardware both domestic and international, and aiding AI developers in industrial practice.

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.1.0. Docker images have been deployed as well.

This release represents a major overhaul under the hood and lots of new functionality has been added as well. With this release it is now possible to write more complex/integrated/reactive pipelines with the added I/O and email plugins.

It is always worth checking out the examples.

The most notable changes since 0.0.12 are:

  • a new release of seppl allows for filters that support m-to-n not just 1-to-1 conversions

  • added list-to-sequence stream filter that forwards list items one by one

  • common code among the image-dataset-converter, audio-dataset-converter and spectral-data-converter libraries has been extracted and moved into separate libraries to reduce duplication: kasperl, kasperl_redis

  • the new kasperl_plots library adds basic support for terminal plots (textual and graphical) and matplotlib ones

  • the filters tee, trigger and sub-process support conditional execution based on meta-data evaluations now, as well as loading their sub-pipeline from a file to break up large pipelines into smaller, logical chunks

  • added block, stop filters for controlling the flow of data (via meta-data conditions)

  • the idc-exec tool now uses all trailing arguments as the pipeline to execute multiple times rather than as a single argument to a flag; alternatively, the pipeline can be loaded from a file

  • the idc-convert tool can load a pipeline from a file now as well

  • added the text-file and csv-file generators that work off files to populate the variable(s)

  • the readers from-grayscale-dp, from-indexed-png-is, from-blue-channel-is and from-grayscale-is now support reading only the annotations

  • added from-text-file reader and to-text-file writer

  • added a number of I/O related plugins: list-files, move-files, delete-files, copy-files, watch-dir

  • added email support with get-email reader and send-email writer

  • added console writer that outputs the data coming through on stdout

  • added count-specks filter that adds counts of small objects to meta-data

  • added support for caching plugins via IDC_CLASS_CACHE environment variable

  • added is-to-od filter that generates object detection annotations from contours determined in image segmentation layers

  • added to-metadata writer that outputs the meta-data of an image

  • added attach-metadata filter that loads meta-data from a directory and attaches it to the data passing through

  • added load-data filter to turn file names into data containers

  • added annotation-to-storage and annotation-from-storage filters

  • added delete-storage filter for removing objects from internal storage

  • annotation data is now being type-checked when setting it
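To illustrate what m-to-n filtering means compared to 1-to-1 conversions, here is a minimal sketch in plain Python. It does not use the seppl or image-dataset-converter APIs; the functions list_to_sequence, pairs and run are hypothetical stand-ins that merely mimic the idea of filters consuming a stream of items and emitting a different number of items.

```python
from typing import Any, Callable, Iterable, Iterator

# In this sketch a filter turns one stream of items into another stream,
# so it may emit fewer, as many, or more items than it receives.
Filter = Callable[[Iterable[Any]], Iterator[Any]]


def list_to_sequence(stream: Iterable[Any]) -> Iterator[Any]:
    """Forward list elements one by one (1-to-n); pass other items through."""
    for item in stream:
        if isinstance(item, list):
            yield from item
        else:
            yield item


def pairs(stream: Iterable[Any]) -> Iterator[tuple]:
    """Combine every two consecutive items into one tuple (m-to-1)."""
    pending = []
    for item in stream:
        pending.append(item)
        if len(pending) == 2:
            yield tuple(pending)
            pending = []
    if pending:
        # emit the leftover item when the stream has an odd length
        yield tuple(pending)


def run(items: Iterable[Any], filters: list[Filter]) -> list[Any]:
    """Chain the filters and collect whatever comes out at the end."""
    stream: Iterable[Any] = iter(items)
    for f in filters:
        stream = f(stream)
    return list(stream)


flat = run([["a", "b"], "c", ["d", "e"]], [list_to_sequence])
grouped = run([["a", "b"], "c", ["d", "e"]], [list_to_sequence, pairs])
print(flat)
print(grouped)
```

Here three incoming items become five after list_to_sequence and three again after pairs, which a strict 1-to-1 filter interface could not express.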

S3000 Customization

Out of the box, our S3000 software comes with predefined workflow generator plugins that can be configured to suit a customer's needs. However, sometimes it is necessary to further refine that configuration, e.g., when generating statistics during training runs. In order not to overload the generator plugins with even more options, or create customized generators, the notion of config tags has been introduced. Generators that support such customization will state so in their help screen and list the supported tags. These tags can then be customized in the ConfigTags.props properties file.
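As a rough illustration of how such a properties file could be read, the following Python sketch parses entries whose keys have the form name#tag. The splitting of the key at the # character, and the helper parse_config_tags itself, are assumptions made for this example; the sketch also ignores properties-file features like line continuations and escape sequences, and it is not how S3000 itself loads the file.

```python
def parse_config_tags(text: str) -> dict[str, dict[str, str]]:
    """Group 'name#tag=value' entries by name, then by tag."""
    tags: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines (leading # or !)
        if not line or line[0] in "#!":
            continue
        # split at the first '=' only, the value may contain further ones
        key, _, value = line.partition("=")
        name, _, tag = key.strip().partition("#")
        tags.setdefault(name, {})[tag] = value.strip()
    return tags


example = """
# customize the statistics output
output_model_statistics#html=adams.flow.transformer.actualvspredictedprocessor.ClassifierErrorsKernelDensityEstimate -circle-size 10
"""

tags = parse_config_tags(example)
for name, entries in tags.items():
    for tag, value in entries.items():
        print(name, tag, value.split()[0])
```

Keeping the tags in a separate file means the generator plugins keep their existing options, while individual aspects such as the plot type can still be swapped per installation.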

image-dataset-converter release

A new release of our image-dataset-converter-all library is now available: 0.0.12. Docker images have been deployed as well.

The most notable changes since 0.0.11 are:

  • dropped numpy<2.0.0 restriction

  • added grayscale-to-binary filter

  • fix: sort-pixels, rgb-to-grayscale filters

  • the rename filter now supports lower/upper case placeholders of name and extension as well

  • requiring seppl>=0.2.17 now for skippable plugin support and avoiding deprecated use of pkg_resources

  • added any-to-rgb filter for turning binary/grayscale images back into RGB ones

  • added label-to-metadata filter for transferring labels into meta-data

  • added metadata-to-placeholder filter for transferring meta-data into placeholders

  • added basic support for images with associated depth information: DepthData, DepthInformation

  • added depth-to-grayscale filter for converting depth information to grayscale image

  • added depth information readers from-grayscale-dp, from-numpy-dp, from-csv-dp and from-pfm-dp

  • added depth information writers to-grayscale-dp, to-numpy-dp, to-csv-dp and to-pfm-dp

  • added apply-ext-mask filter for applying external PNG masks to image containers (image and/or annotations)

  • added apply-label-mask filter for applying image segmentation label masks to their base images

  • added label-present-ic and label-present-is filters that ensure that certain label(s) are present, or discard the image otherwise

  • the label-present filter was renamed to label-present-od, with label-present kept as an alias for the time being

  • fix: imgseg_to_bluechannel, imgseg_to_indexedpng and imgseg_to_grayscale now handle overlapping pixels correctly, no longer adding them up and introducing additional labels

  • discard-by-name filter can use names of files in specified paths now as well

  • fixed the construction of the error messages in the pyfunc reader/filter/writer classes
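The overlapping-pixels fix mentioned in the list above can be illustrated with a small, self-contained Python sketch. It is not the library's code: the functions merge_by_sum and merge_by_overwrite are hypothetical, the image is reduced to a single row of pixels, and letting the later layer win on overlap is an assumption, since the notes only state that values are no longer added up.

```python
def merge_by_sum(layers: list[tuple[int, list[int]]], width: int) -> list[int]:
    """Naive merge: add the label index of every layer covering a pixel."""
    merged = [0] * width
    for index, mask in layers:
        for i, on in enumerate(mask):
            if on:
                merged[i] += index
    return merged


def merge_by_overwrite(layers: list[tuple[int, list[int]]], width: int) -> list[int]:
    """Overlap-safe merge: a covered pixel takes the index of the latest layer."""
    merged = [0] * width
    for index, mask in layers:
        for i, on in enumerate(mask):
            if on:
                merged[i] = index
    return merged


# two label layers (index, binary mask) that overlap at pixel 2
layers = [
    (1, [1, 1, 1, 0, 0]),
    (2, [0, 0, 1, 1, 0]),
]
summed = merge_by_sum(layers, 5)
overwritten = merge_by_overwrite(layers, 5)
print(summed)
print(overwritten)
```

In the summed result the overlapping pixel ends up with the value 3, which looks like a third label that was never annotated; the overwriting variant only ever produces indices that exist in the input layers.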