Naive Bayes Classifier

Operator alias: Naive-Bayes-Classifier · Category: Supervised Classification

Naive Bayes Classification

Naive Bayes is a probabilistic classifier based on applying Bayes' theorem with the naïve assumption that features are conditionally independent given the class. In the Gaussian variant used for numeric features, each feature is modeled by a class-conditional normal distribution.

Given a feature vector x = (x₁, …, x_d) and class c, the proportional form is:

$P(c \mid x) \propto P(c) \prod_{j=1}^{d} N(x_j; \mu_{c,j}, \sigma_{c,j}^2)$

where P(c) is the class prior, and μ_{c,j} and σ_{c,j} are the per-class mean and standard deviation of feature j. Prediction chooses argmax_c P(c | x).
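The parameter estimation described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the WEKA implementation that the operator uses: the function name `fit_gaussian_nb`, the use of the population variance, and the `var_floor` guard against zero variance are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Estimate P(c), mu_{c,j} and sigma_{c,j} from labelled feature vectors.

    Hypothetical helper for illustration; WEKA uses its own estimators.
    """
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)

    n_total = len(samples)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # Population variance per feature (an assumption of this sketch).
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(d)]
        # Floor the variance so a constant feature does not give sigma = 0.
        stds = [math.sqrt(max(v, var_floor)) for v in variances]
        model[c] = {"prior": n / n_total, "mean": means, "std": stds}
    return model


# Tiny made-up training set: two features, two classes.
samples = [[1.0, 10.0], [3.0, 14.0], [7.0, 2.0], [9.0, 4.0], [8.0, 3.0]]
labels = [0, 0, 1, 1, 1]
model = fit_gaussian_nb(samples, labels)
```

Each class ends up with one prior and one (mean, standard deviation) pair per feature, which is all the model needs to store.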

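The fully normalized form is:

$P(c \mid x) = \frac{P(c) \prod_{j=1}^{d} N(x_j; \mu_{c,j}, \sigma_{c,j}^2)}{\sum_{c'} P(c') \prod_{j=1}^{d} N(x_j; \mu_{c',j}, \sigma_{c',j}^2)}$

For numerical stability, computation is performed in log-space:

$\log P(c \mid x) = \log P(c) + \sum_{j=1}^{d} \log N(x_j; \mu_{c,j}, \sigma_{c,j}^2) + \text{const}$

where the constant is the negative log of the normalizing denominator; it is the same for every class and therefore does not affect the argmax.

The decision rule is: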

class(x) = argmax_c P(C = c) * ∏_i P(X_i = x_i | C = c)

where P(C = c) is the prior probability of class c (estimated from training data), and the likelihood term P(X_i | C = c) for numeric features is computed from the Gaussian probability density function.
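The log-space computation and the decision rule can be illustrated with a short Python sketch. It assumes a model stored as a dictionary of per-class priors, means, and standard deviations (a hypothetical layout chosen for this example, not the operator's internal format) and normalizes with the log-sum-exp trick so that the returned confidence is the posterior of the winning class.

```python
import math


def log_gaussian(x, mean, std):
    """Log of the normal density N(x; mean, std^2)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


def predict(x, model):
    """Return (argmax class, posterior probability of that class)."""
    # log P(c) + sum_j log N(x_j; mu_{c,j}, sigma_{c,j}^2) for every class
    log_joint = {
        c: math.log(p["prior"])
        + sum(log_gaussian(xj, m, s) for xj, m, s in zip(x, p["mean"], p["std"]))
        for c, p in model.items()
    }
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    posteriors = {c: math.exp(v - log_norm) for c, v in log_joint.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


# Hypothetical two-class, two-feature model.
model = {
    0: {"prior": 0.4, "mean": [2.0, 12.0], "std": [1.0, 2.0]},
    1: {"prior": 0.6, "mean": [8.0, 3.0], "std": [1.0, 1.0]},
}
label, confidence = predict([2.5, 11.0], model)
```

Multiplying the densities directly would underflow to zero for feature vectors far from every class mean; working with sums of logs keeps the comparison between classes well defined.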

Operator Overview

The Naive Bayes Classification operator integrates the WEKA `weka.classifiers.bayes.NaiveBayes` classifier into SNAP. It supports training from a label band or a list of vectors, or loading a pre-trained model, and classifies per pixel in a tile-based fashion.

Training: from a label band / list of vectors in the source product (supervised learning) or load a pre-trained model bundle.
Features: user-selected input bands.
Outputs: class label band and confidence band.
Model management: models are saved/loaded under SNAP's auxdata/classification/NaiveBayes directory with an associated parameter XML.

Note: When classifying optical products, exclude clouds, shadows, water, or other unwanted areas using masks to help maintain the validity of the Gaussian assumptions and improve classification accuracy.

Note: For best results, use optical datasets (e.g., Sentinel-2) with surface reflectance or brightness temperature bands.

Inputs / Outputs

Source Products (`sourceProducts`)	Any raster product (e.g., Sentinel-2 L1C/L2A) with numeric bands.
Target Product (`targetProduct`)	Copy of the source geocoding and metadata, plus two added bands: (1) `nb_class` (vector-trained) or `predicted<trainingBand>` (raster-trained), Int16, the predicted class index; (2) `nb_confidence`, Float32, the posterior probability of the predicted class, in [0, 1].

Parameters

Name	Type	Description	Default
`doLoadClassifier`	Boolean	Whether to load a classifier instead of training and saving one. If true, the operator loads a previously trained and saved classifier.	false
`savedClassifierName`	String	The name of a previously trained and saved classifier. Only used if `doLoadClassifier` is true.	(empty)
`trainOnRaster`	Boolean	Train on raster (`true`) or vector data (`false`)	false
`numTrainSamples`	Integer	The number of training samples, interval (1,*]	5000
`trainingVectors`	String	Names of the vectors that hold training labels. The name of each vector is used as a class label and its position as a class code (e.g. 1, 2, ...). Only used if `trainOnRaster=false`. At least 2 vectors are required. If not specified, all the vectors of the first source product are used.	(empty)
`trainingBands`	String	Name of the band that holds training labels. Pixels with valid (non-NaN) values are used as training samples. Values are interpreted as integer class codes (e.g. 0, 1, 2, ...). Only used if `trainOnRaster=true`. If not specified, the first band of the first source product is used.	(empty)
`doClassValQuantization`	Boolean	Quantization for raster training. Ignored for vector training.	true
`minClassValue`	Double	Quantization min class value for raster training. Ignored for vector training or if `doClassValQuantization=false`	0.0
`classValStepSize`	Double	Quantization step size for raster training. Ignored for vector training or if `doClassValQuantization=false`	5.0
`classLevels`	Double	Quantization class levels for raster training. Ignored for vector training or if `doClassValQuantization=false`	101
`featureBands`	String array	Names of bands used as features. If not specified, all the bands of the first source product are used.	(empty)
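The table does not spell out how `minClassValue`, `classValStepSize`, and `classLevels` turn a continuous training-band value into a class code. One plausible reading, shown here purely as an assumption, is uniform binning: subtract the minimum, divide by the step size, and clamp the result to the available levels. The function name and the clamping behavior are hypothetical; the operator's actual mapping may differ.

```python
import math


def quantize_class_value(value, min_class_value=0.0, step_size=5.0, class_levels=101):
    """Map a raster training value to an integer class index (assumed scheme).

    Defaults mirror the documented parameter defaults; the binning rule
    itself is an illustration, not confirmed operator behavior.
    """
    if math.isnan(value):
        return None  # NaN pixels are not usable as training samples
    index = math.floor((value - min_class_value) / step_size)
    # Clamp to the valid range of class levels: 0 .. class_levels - 1
    return max(0, min(class_levels - 1, index))


for v in (0.0, 12.3, 500.0, 9999.0):
    print(v, "->", quantize_class_value(v))
```

Under this reading, the defaults (minimum 0.0, step 5.0, 101 levels) cover training values from 0 to 500 in bins of width 5.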

Note: All input bands (features and label) must have identical raster size. Pixels containing NaN in any feature band are skipped for training and assigned confidence = 0 during classification.
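The NaN rule in the note above can be sketched as follows. The confidence of 0 for NaN pixels comes from the note; the class value written for such pixels is not stated there, so the `no_data_class=-1` placeholder is an assumption, as are the function names and the stub predictor.

```python
import math


def has_nan(features):
    """True if any feature value of the pixel is NaN."""
    return any(math.isnan(v) for v in features)


def select_training_samples(features, labels):
    """Drop pixels with NaN in any feature band before training."""
    return [(x, y) for x, y in zip(features, labels) if not has_nan(x)]


def classify_pixels(pixels, predict_fn, no_data_class=-1):
    """Classify each pixel; NaN pixels get confidence 0 (class value assumed)."""
    results = []
    for x in pixels:
        if has_nan(x):
            results.append((no_data_class, 0.0))
        else:
            results.append(predict_fn(x))
    return results


def stub_predict(x):
    """Stand-in for a trained classifier, returns (class, confidence)."""
    return (1, 0.9) if x[0] > 0.5 else (0, 0.8)


nan = float("nan")
pixels = [[0.2, 0.1], [0.7, nan], [0.9, 0.4]]
print(classify_pixels(pixels, stub_predict))
# [(0, 0.8), (-1, 0.0), (1, 0.9)]
```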

Trained model save / load

When a model is trained, the operator writes two files under ~/.snap/auxdata/classification/NaiveBayes/:

<classifierName>.model - Java-serialized bundle containing the classifier, the header-only dataset, and the training instance count.
<classifierName>.xml - associated XML with selected parameters (classes, feature bands, training band, etc.).

The saved models can be later used to classify other products.
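As an illustration of reusing a saved model, the sketch below assembles a command line for SNAP's Graph Processing Tool (`gpt`) using its usual `-P<name>=<value>` parameter and `-t <target>` conventions. The command is only built and printed, not executed. The file names and the classifier name `my_nb_model` are hypothetical, and passing the source product as a positional argument is assumed to work for this operator.

```python
def build_gpt_command(source, target, params, operator="Naive-Bayes-Classifier"):
    """Assemble a gpt argument list from operator parameters (illustrative)."""
    cmd = ["gpt", operator]
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # gpt expects lowercase booleans
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # array parameters as comma-separated values
        cmd.append(f"-P{name}={value}")
    cmd += ["-t", target, source]
    return cmd


cmd = build_gpt_command(
    source="S2_scene.dim",      # hypothetical input product
    target="S2_scene_nb.dim",   # hypothetical output product
    params={"doLoadClassifier": True, "savedClassifierName": "my_nb_model"},
)
print(" ".join(cmd))
# gpt Naive-Bayes-Classifier -PdoLoadClassifier=true -PsavedClassifierName=my_nb_model -t S2_scene_nb.dim S2_scene.dim
```

Building the arguments as a list rather than a single string keeps values with spaces intact if the list is later handed to `subprocess.run`.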
