KNN Imputer#
An unsupervised imputer that replaces each missing value in a dataset with an average of the corresponding values from the sample's k nearest neighbors (the donors), optionally weighted by distance. For a continuous feature, the average is the mean of the donors' values. For a categorical feature, it is the most frequent value among the donors.
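The averaging described above can be illustrated with a minimal plain-PHP sketch. It is not the library's implementation: the helper names `donorWeights`, `imputeContinuous`, and `imputeCategorical` are hypothetical, and the weighting scheme `1 / (1 + distance)` is an assumption chosen for illustration.

```php
<?php

// Hypothetical helper: convert neighbor distances into donor weights.
// Assumption: closer donors get larger weights via 1 / (1 + distance).
function donorWeights(array $distances, bool $weighted): array
{
    return array_map(
        fn (float $d): float => $weighted ? 1.0 / (1.0 + $d) : 1.0,
        $distances
    );
}

// Continuous feature: the (weighted) mean of the donors' values.
function imputeContinuous(array $values, array $weights): float
{
    $total = 0.0;

    foreach ($values as $i => $value) {
        $total += $weights[$i] * $value;
    }

    return $total / array_sum($weights);
}

// Categorical feature: the category with the largest total weight,
// which is the most frequent category when all weights are equal.
function imputeCategorical(array $values, array $weights): string
{
    $scores = [];

    foreach ($values as $i => $value) {
        $scores[$value] = ($scores[$value] ?? 0.0) + $weights[$i];
    }

    arsort($scores);

    return (string) array_key_first($scores);
}

// Three donors at distances 0, 1, and 3 from the incomplete sample.
$distances = [0.0, 1.0, 3.0];

$mean = imputeContinuous([10.0, 20.0, 40.0], donorWeights($distances, true));
$category = imputeCategorical(['red', 'blue', 'blue'], donorWeights($distances, true));
```

With these inputs, the nearest donor dominates the weighted result, so the weighted and unweighted categorical answers differ. That is the practical effect of the `weighted` parameter in this sketch.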
Note
Requires a NaN-safe distance kernel, such as Safe Euclidean, for continuous features.
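To show why a NaN-safe kernel is needed, here is a rough sketch of one way such a distance could work. The function `nanSafeEuclidean` is hypothetical and is not the library's Safe Euclidean class. It assumes that dimensions with a missing value are skipped and that the result is rescaled by the share of dimensions actually compared.

```php
<?php

// Hypothetical helper: Euclidean distance that ignores any dimension
// where either value is NaN, then rescales the squared sum so that
// vectors with fewer comparable dimensions are not unfairly "closer".
function nanSafeEuclidean(array $a, array $b): float
{
    $sum = 0.0;
    $used = 0;

    foreach ($a as $i => $valueA) {
        $valueB = $b[$i];

        if (is_nan((float) $valueA) || is_nan((float) $valueB)) {
            continue;
        }

        $sum += ($valueA - $valueB) ** 2;
        ++$used;
    }

    // No comparable dimensions: the distance is undefined.
    if ($used === 0) {
        return NAN;
    }

    return sqrt(count($a) / $used * $sum);
}

// The second dimension is skipped because one side is missing.
$partial = nanSafeEuclidean([3.0, NAN, 0.0], [0.0, 5.0, 4.0]);

// With no missing values this reduces to the ordinary Euclidean distance.
$full = nanSafeEuclidean([0.0, 0.0], [3.0, 4.0]);
```

A plain Euclidean kernel would instead propagate NaN through the sum, making every distance to an incomplete sample undefined and the neighbor search meaningless.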
Interfaces: Transformer, Stateful, Persistable
Data Type Compatibility: Depends on distance kernel
Parameters#
| # | Name | Default | Type | Description | 
|---|---|---|---|---|
| 1 | k | 5 | int | The number of nearest neighbor donors to consider when imputing a value. | 
| 2 | weighted | true | bool | Should we use distances as weights when selecting a donor sample? | 
| 3 | categoricalPlaceholder | '?' | string | The placeholder category that denotes a missing value in categorical features. | 
| 4 | tree | BallTree | Spatial | The spatial tree used to run nearest neighbor searches. | 
Example#
use Rubix\ML\Transformers\KNNImputer;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\SafeEuclidean;
$transformer = new KNNImputer(10, false, '?', new BallTree(30, new SafeEuclidean()));
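As a usage sketch, the transformer can be fitted and then applied to a dataset whose continuous features mark missing values with `NAN`. The sample values are made up for illustration, and the snippet assumes the library is installed through Composer so that `vendor/autoload.php` exists.

```php
<?php

// Assumption: Rubix ML is installed via Composer in this project.
require 'vendor/autoload.php';

use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Transformers\KNNImputer;
use Rubix\ML\Graph\Trees\BallTree;
use Rubix\ML\Kernels\Distance\SafeEuclidean;

// Illustrative data: the last sample is missing its second feature.
$samples = [
    [1.0, 2.0],
    [1.1, 2.1],
    [5.0, 6.0],
    [1.2, NAN],
];

$dataset = new Unlabeled($samples);

$transformer = new KNNImputer(2, true, '?', new BallTree(30, new SafeEuclidean()));

// Fit on the dataset to collect donors, then impute in place.
$transformer->fit($dataset);

$dataset->apply($transformer);

$imputed = $dataset->samples();
```

After the transformation, `$imputed` should hold the same samples with the missing entry filled in from its nearest complete neighbors.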
Additional Methods#
This transformer does not have any additional methods.
References#
- O. Troyanskaya et al. (2001). Missing value estimation methods for DNA microarrays.

Last update: 2021-03-03