llm-dataset-converter release

Version 0.2.7 of our llm_dataset_converter library has been released. New releases of ldc_doc, ldc_docx, ldc_faster_whisper, ldc_google, ldc_openai, ldc_pdf and ldc_tint have been made available as well.

The meta-library that combines all the libraries now stands at version 0.0.6:

llm-dataset-converter-all

A new Docker image is available as well:

https://hub.docker.com/r/waikatodatamining/llm-dataset-converter/tags

This is mostly a maintenance release, but it still includes some useful additions:

  • added set-placeholder filter for dynamically setting (temporary) placeholders at runtime

  • added remove-strings filter that just removes sub-strings

  • added strip-strings filter for stripping whitespace from start/end of strings
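
To give a rough idea of what the two string filters do, here is a minimal plain-Python sketch of the behaviour described above. It does not use the library's API: the function names remove_strings and strip_strings and the sample text are made up for illustration, and the actual filters may offer more options than shown here.

```python
from typing import List


def remove_strings(text: str, substrings: List[str]) -> str:
    """Hypothetical helper: delete every occurrence of each sub-string."""
    for sub in substrings:
        text = text.replace(sub, "")
    return text


def strip_strings(text: str) -> str:
    """Hypothetical helper: strip whitespace from the start/end of a string."""
    return text.strip()


# Made-up sample record, cleaned by applying both steps in sequence.
sample = "  Hello [NOISE] world!  "
cleaned = strip_strings(remove_strings(sample, ["[NOISE] "]))
print(cleaned)
```

In a real pipeline the equivalent filters would simply be chained one after the other, much like the two function calls above.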