scikit-learn

Requirements#

Requires the spectral-data-converter-sklearn library.

Plugins#

Training a model#

The following command-line loads spectra in ADAMS format, leaves only every 4th wave number, then builds a PLS regression model with three components on the al.ext_usda.a1056_mg.kg target using sklearn-fit and saves it to disk:

sdc-convert -l INFO -b \
  from-adams \
    -l INFO \
    -i {CWD}/train/*.spec \
  downsample \
    -n 4 \
  sklearn-fit \
    -l INFO \
    -m sklearn.cross_decomposition.PLSRegression \
    -p "{\"n_components\": 3}" \
    -t al.ext_usda.a1056_mg.kg \
    -o {CWD}/model/al.pkl

Using a template#

For more complicated setups, it can be easier to create a pickled template of the estimator/pipeline. Such a template can then be loaded via the -T/--template option of the sklearn-fit plugin.

First, create the template:

import pickle
from sklearn.pipeline import Pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('pls', PLSRegression(n_components=3))
])

with open("./model/al_template.pkl", "wb") as fp:
    pickle.dump(pipe, fp)

Then, make use of the template:

sdc-convert -l INFO -b \
  from-adams \
    -l INFO \
    -i {CWD}/train/*.spec \
  downsample \
    -n 4 \
  sklearn-fit \
    -l INFO \
    -T {CWD}/model/al_template.pkl \
    -t al.ext_usda.a1056_mg.kg \
    -o {CWD}/model/al.pkl

Making predictions#

Having a trained model in place, we can use it to make predictions. The sklearn-predict filter loads a pickled model from disk and then generates a prediction for each spectrum passing through, setting the value under the specified target sample data field:

sdc-convert -l INFO -b \
  from-adams \
    -l INFO \
    -i {CWD}/test/*.spec \
  downsample \
    -n 4 
  sklearn-predict \
    -l INFO \
    -m {CWD}/model/al.pkl \
    -t al.ext_usda.a1056_mg.kg \
  to-adams \
    -l INFO \
    -o {CWD}/predictions \
    --output_sampledata