Skip to content

Normalization.jl

This package allows you to easily normalize an array over any combination of dimensions, with a bunch of methods (z-score, sigmoid, centering, minmax, etc.) and modifiers (robust, mixed, NaN-safe).

Usage

Each normalization method is a subtype of AbstractNormalization. Each AbstractNormalization subtype has its own estimators and forward methods that define how parameters are calculated and the normalization formula. Each AbstractNormalization instance contains the concrete parameter values for a normalization, fit to a given input array.

You can work with AbstractNormalizations as either types or instances. The type approach is useful for concise code, whereas the instance approach is useful for performant mutations. In the examples below we use the ZScore normalization, but the same syntax applies to all Normalizations.

Fit to a type

julia
X = randn(100, 10)
N = fit(ZScore, X; dims=nothing) # eltype inferred from X
N = fit(ZScore{Float32}, X; dims=nothing) # eltype set to Float32
N isa AbstractNormalization && N isa ZScore # Returns a concrete AbstractNormalization

Fit to an instance

julia
X = randn(100, 10)
N = ZScore{Float64}(; dims=2) # Initializes with empty parameters
N isa AbstractNormalization && N isa ZScore # Returns a concrete AbstractNormalization
!isfit(N)

fit!(N, X; dims=1) # Fit normalization in-place, and update the `dims`
Normalization.dims(N) == 1

Normalization and denormalization

With a fit normalization, there are two approaches to normalizing data: in-place and out-of-place.

julia
_X = copy(X)
normalize!(_X, N) # Normalizes in-place, updating _X
Y = normalize(X, N) # Normalizes out-of-place, returning a new array
normalize(X, ZScore; dims=1) # For convenience, fits and then normalizes

For most normalizations, there is a corresponding denormalization that transforms data to the original space.

julia
Z = denormalize(Y, N) # Denormalizes out-of-place, returning a new array
Z  X
denormalize!(Y, N) # Denormalizes in-place, updating Y

Both syntaxes allow you to specify the dimensions to normalize over. For example, to normalize each 2D slice (i.e. iterating over the 3rd dimension) of a 3D array:

julia
X = rand(100, 100, 10)
N = fit(ZScore, X; dims=[1, 2])
normalize!(X, N) # Each [1, 2] slice is normalized independently
all(std(X; dims=[1, 2]) .≈ 1) # true

Normalization methods

Any of these normalizations will work in place of ZScore in the examples above:

Subtract the mean and scale by the standard deviation (aka standardization)

xμσ
julia
x = 1.5.*randn(100) .+ 0.5
N = fit(ZScore, x)
y = normalize(x, N)
normviz(x, y)

Normalization modifiers

What if the input data contains NaNs or outliers? We provide AbstractModifier types that can wrap an AbstractNormalization to modify its behavior.

Any concrete modifier type Modifier <: AbstractModifier (for example, NaNSafe) can be applied to a concrete normalization type Normalization <:AbstractNormalization:

julia
    N = NaNSafe{ZScore} # A combined type with a free `eltype` of `Any`
    N = NaNSafe{ZScore{Float64}} # A concrete `eltype` of `Float64`

Any AbstractNormalization can be used in the same way as an AbstractModifier.

NaN-safe normalizations

If the input array contains any NaN values, the ordinary normalizations given above will fit with NaN parameters and return NaN arrays. To circumvent this, any normalization can be made 'NaN-safe', meaning it ignores NaN values in the input array, using the NaNSafe modifier.

Robust modifier

The Robust modifier can be used with any AbstractNormalization that has mean and standard deviation parameters. The Robust modifier converts the mean to median and std to iqr/1.35, giving a normalization that is less sensitive to outliers.

Mixed modifier

The Mixed modifier defaults to the behavior of Robust but uses the regular parameters (mean and std) if the iqr is 0.

Properties and traits

The following are common methods defined for all AbstractNormalization subtypes and instances.

Type traits

  • Normalization.estimators(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}) returns the estimators N as a tuple of functions

  • forward(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}) returns the forward normalization function (e.g. x -> xμ/σ for the ZScore)

  • inverse(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}}) returns the inverse normalization function e.g. forward(N)(ps...) |> InverseFunctions.inverse

  • eltype(N::Union{<:AbstractNormalization,Type{<:AbstractNormalization}) returns the eltype of the normalization parameters

Concrete properties

  • Normalization.dims(N::<:AbstractNormalization) returns the dimensions of the normalization. The dimensions are determined by dims and correspond to the mapped slices of the input array.

  • params(N::<:AbstractNormalization) returns the parameters of N as a tuple of arrays. The dimensions of arrays are the complement of dims.

  • isfit(N::<:AbstractNormalization) checks if all parameters are non-empty