Naive Bayes Classification

Naive Bayes Classifier

Operator alias: Naive-Bayes-Classifier · Category: Supervised Classification

Naive Bayes Classification

Naive Bayes is a probabilistic classifier based on applying Bayes' theorem with the naïve assumption that features are conditionally independent given the class. In the Gaussian variant used for numeric features, each feature is modeled by a class-conditional normal distribution.

Given a feature vector x = (x1, …, xd) and class c, the proportional form is:

P(c|x) ∝ P(c) ∏ N(x_j; μ_{c,j}, σ_{c,j}^2)

where P(c) is the class prior, and μc,j, σc,j are the per-class mean and standard deviation for feature j.Prediction chooses argmaxc P(c | x).

The fully normalized form is:

Normalized Naive Bayes posterior

For numerical stability, computation is performed in log-space:

log P(c|x) = log P(c) + Σ log N(…)

The decision rule is:

class(x) = argmaxc P(C = c) * ∏i P(Xi = xi | C = c)
   

where P(C = c) is the prior probability of class c (estimated from training data), and the likelihood term P(Xi | C = c) for numeric features is computed from the Gaussian probability density function.

Operator Overview

The Naive Bayes Classification operator integratesWEKA weka.classifiers.bayes.NaiveBayes classifier into SNAP. It supports training from a label band/list of vectors or loading a pre-trained model, and classifies per pixel in a tile-based fashion.

Note: When classifying optical products, exclude clouds, shadows, water, or other unwanted areas using masks to help maintain the validity of the Gaussian assumptions and improve classification accuracy.

Note: For best results, use optical datasets (e.g., Sentinel-2) with surface reflectance or brightness temperature bands.

Inputs / Outputs

Source Products (sourceProducts) Any raster product (e.g., Sentinel-2 L1C/L2A) with numeric bands.
Target Product (targetProduct) Copy of the source geocoding and metadata; adds bands:
  • nb_class (for vector trained) or predicted<trainingBand> (for raster trained) (Int16): predicted class index
  • nb_confidence (Float32): posterior probability of the predicted class in [0,1]

Parameters

Name Type Description Default
doLoadClassifier Boolean Choose to save or load classifier. If true, the operator will load a previously trained and saved classifier. false
savedClassifierName String The name of a previously trained and saved classifier.
Only used if doLoadClassifier is true.
(empty)
trainOnRaster Boolean Train on raster (true) or vector data (false) false
numTrainSamples Integer The number of training samples, interval (1,*] 5000
trainingVectors String Name of the vectors that holds training labels.
The name of each vector is used as a class label and its position as a class code (e.g. 1, 2...) .
Only specified if trainOnRaster=true.
At least 2 vectors are required. If not specified, all the vectors of the first source product are used.
(empty)
trainingBands String Name of the band that holds training labels. Pixels with valid (non-NaN) values are used as training samples.
Values are interpreted as integer class codes (e.g. 0, 1, 2...).
Only specified if trainOnRaster=false.
If not specified, the first band of the first source product is used.
(empty)
doClassValQuantization Boolean Quantization for raster training. Ignored for vector training. true
minClassValue Double Quantization min class value for raster training. Ignored for vector training or if doClassValQuantization=false 0.0
classValStepSize Double Quantization step size for raster training. Ignored for vector training or if doClassValQuantization=false 5.0
classLevels Double Quantization class levels for raster training. Ignored for vector training or if doClassValQuantization=false 101
featureBands String array Names of bands used as features.
If not specified, all the bands of the first source product are used.
(empty)

Note: All input bands (features and label) must have identical raster size. Pixels containing NaN in any feature band are skipped for training and assigned confidence = 0 during classification.

Trained model save / load

When a model is trained, the operator writes a bundled model file under ~/.snap/auxdata/classification/NaiveBayes/:

The saved models can be later used to classify other products.

References