Examples - Approximation
OLS approximation
If we have lots of data that we want to summarise with OLS
# Generating example data
using Random
using Distributions
using DataStructures
Random.seed!(1)
obs = 1000
X = rand(obs)
y = X .+ rand(Normal(),obs) .+ 7
# And now making an approximation function
approxFunction, reg = create_ols_approximation(y, X, 2)Numerical Integration with Chebyshev polynomials
And if we want to approximate the sin function in the [2.3, 5.6] bound with 7 polynomial terms and 20 approximation nodes:
chebyshevs = create_chebyshev_approximation(sin, 20, 7, OrderedDict{Symbol,Tuple{Float64,Float64}}(:default => (2.3, 5.6)))We can integrate the above term in the normal way to achieve Gauss-Chebyshev quadrature:
integral(chebyshevs, 2.3, 5.6)Multivariate: MARS Spline for approximation
First we will generate some example data.
using MultivariateFunctions
using Random
using DataFrames
using Distributions
using DataStructures
Random.seed!(1992)
nObs = 1000
dd = DataFrame()
dd[!, :x] = rand( Normal(),nObs) + 0.1 .* rand( Normal(),nObs)
dd[!, :z] = rand( Normal(),nObs) + 0.1 .* rand( Normal(),nObs)
dd[!, :w] = (0.5 .* rand( Normal(),nObs)) .+ 0.7.*(dd[!, :z] .- dd[!, :x]) + 0.1 .* rand( Normal(),nObs)
dd[!, :y] = (dd[!, :x] .*dd[!, :w] ) .* (dd[!, :z] .- dd[!, :w]) .+ dd[!, :x] + rand( Normal(),nObs)
dd[7,:y] = 1.0
y = :y
x_variables = Set{Symbol}([:w, :x, :z])It is important to note here that we have a set of symbols for x_variables. This is the set of columns in the dataframe that we will use to predict y - the dependent variable.
We can then create an approximation with recursive partitioning:
number_of_divisions = 7
rp_4, rp_reg_4 = create_recursive_partitioning(dd, y, x_variables, number_of_divisions; rel_tol = 1e-3)We can also create a MARS approximation spline:
rp_1, rp_reg_1 = create_mars_spline(dd, y, x_variables, number_of_divisions; rel_tol = 1e-3)Note that the rel_tol here is the tolerance in the optimisation step for hinges (or divisions in the recursive partitioning case). In most applied cases it generally doesn't matter much if there is a hinge at 1.0006 or at 1.0007 so in most settings this can be set higher than you would generally set the tolerance for a numerical optimiser. For this reason the default value is 1e-02.
Monotonic MARS Spline
If you need a MARS approximation that is guaranteed to be monotonic in each dimension, use create_monotonic_mars_spline. First generate some example data:
using MultivariateFunctions
using Random
using DataFrames
using Distributions
using DataStructures
Random.seed!(1)
nObs = 500
dd = DataFrame()
dd[!, :x] = rand(Normal(), nObs)
dd[!, :z] = rand(Normal(), nObs)
dd[!, :y] = 3.0 .* max.(0.0, dd[!, :x] .- 0.5) .+ 2.0 .* max.(0.0, dd[!, :z] .+ 1.0) .+ 1.0By default all dimensions are monotone increasing:
result = create_monotonic_mars_spline(dd, :y, Set([:x, :z]), 5; rel_tol = 1e-3)
monotone_model = result.modelYou can specify directions per dimension. For example, increasing in :x and decreasing in :z:
result = create_monotonic_mars_spline(dd, :y, Set([:x, :z]), 5;
directions = Dict{Symbol,Symbol}(:x => :increasing, :z => :decreasing))The returned named tuple contains the model (a Sum_Of_Piecewise_Functions), the coefficients vector, and the rss (residual sum of squares).
Monotonicity is guaranteed by construction: each basis function is a product of non-negative, monotone hinge functions, and all non-intercept coefficients are constrained to be non-negative via NNLS fitting.
By default the result is monotone (non-decreasing) which allows flat regions where the hinge functions are zero. To ensure strict monotonicity (no flat regions), set min_gradient to a positive value. This adds a linear term with that slope in every dimension before fitting, so the partial derivative in each dimension is always at least min_gradient:
result = create_monotonic_mars_spline(dd, :y, Set([:x, :z]), 5;
min_gradient = 0.01)Weighted Approximation
All approximation methods accept an optional weights keyword. Weights must be a Vector{Float64} with one entry per observation:
using MultivariateFunctions
using DataFrames, Random, Distributions
Random.seed!(1)
nObs = 200
dd = DataFrame()
dd[!, :x] = rand(Normal(), nObs)
dd[!, :z] = rand(Normal(), nObs)
dd[!, :y] = 2.0 .* max.(0.0, dd[!, :x] .- 0.3) .+ 1.5 .* max.(0.0, dd[!, :z] .+ 0.5) .+ 1.0
# Random positive weights
w = rand(nObs) .+ 0.1
# Weighted OLS
model_w, reg_w = create_saturated_ols_approximation(dd, :y, [:x, :z], 2; weights = w)
# Weighted MARS
result_w = create_mars_spline(dd, :y, Set([:x, :z]), 5; weights = w)
# Weighted monotonic MARS
result_mw = create_monotonic_mars_spline(dd, :y, Set([:x, :z]), 5; weights = w)Near-zero weights effectively exclude observations, which is useful for downweighting outliers:
# Add outliers and downweight them
dd_out = vcat(dd, DataFrame(x = [0.0], z = [0.0], y = [1000.0]))
w_out = vcat(ones(nObs), [1e-10])
model_clean, _ = create_saturated_ols_approximation(dd_out, :y, [:x, :z], 2; weights = w_out)Iterative Fitting with MultivariateFitter
The MultivariateFitter allows iterative fitting where each call to fit! blends new data with the accumulated model. This is useful for daily signal-to-return mappings.
using MultivariateFunctions
using DataFrames
using Random
using Distributions
Random.seed!(1)
# Create a fitter: 4 basis functions per fit, trim to 8 every 5 fits
fitter = MultivariateFitter(:mars, Set([:x, :z]);
MaxM = 4, simplify_to = 8, simplification_frequency = 5,
weight_on_new = 0.5)
# Simulate daily fitting
for day in 1:10
dd = DataFrame()
dd[!, :x] = rand(Normal(), 200)
dd[!, :z] = rand(Normal(), 200)
dd[!, :y] = 2.0 .* max.(0.0, dd[!, :x] .- 0.3) .+ dd[!, :z] .+ rand(Normal(), 200)
fit!(fitter, dd, :y)
end
# Evaluate the accumulated model
dd_test = DataFrame(x = [1.0, 2.0], z = [0.5, -0.5])
predictions = evaluate(fitter, dd_test)Both fitter types support callable syntax as a shorthand for evaluate:
# These are equivalent:
predictions = evaluate(fitter, dd_test)
predictions = fitter(dd_test)
# Single-point evaluation:
val = fitter(Dict(:x => 1.0, :z => 0.5))Weights can be passed to fit! for weighted fitting:
w = rand(200) .+ 0.1
fit!(fitter, dd, :y; weights = w)Monotonic MARS with Iterative Fitting
The MultivariateFitter supports :monotonic_mars for iterative fitting with monotonicity guarantees:
fitter = MultivariateFitter(:monotonic_mars, Set([:x, :z]);
MaxM = 4, weight_on_new = 0.5,
directions = Dict(:x => :increasing, :z => :increasing),
min_gradient = 0.01)
for day in 1:10
dd = DataFrame()
dd[!, :x] = rand(Normal(), 200)
dd[!, :z] = rand(Normal(), 200)
dd[!, :y] = 3.0 .* max.(0.0, dd[!, :x]) .+ 2.0 .* dd[!, :z] .+ 1.0 .+ rand(Normal(), 200)
fit!(fitter, dd, :y)
endIterative Fitting with Group Adjustments
The MultivariateAdjustedFitter fits a shared shape f(x) with per-group affine coefficients y_g ≈ a_g + b_g * f(x):
fitter = MultivariateAdjustedFitter(:mars, Set([:x, :z]);
MaxM = 4, weight_on_new = 0.5,
coefficient_bounds = ((-2.0, 2.0), (0.1, 3.0)))
dd = DataFrame()
dd[!, :x] = rand(Normal(), 300)
dd[!, :z] = rand(Normal(), 300)
dd[!, :y] = rand(Normal(), 300)
groups = vcat(fill(:A, 150), fill(:B, 150))
fit!(fitter, dd, :y, groups)
# Group-specific predictions (these are equivalent)
predictions_A = evaluate(fitter, dd[1:10, :], :A)
predictions_A = fitter(dd[1:10, :], :A)
# Single-point evaluation
val = fitter(Dict(:x => 1.0, :z => 0.5), :B)The coefficient_bounds parameter clamps per-group (a, b) to specified ranges. Setting fit_intercept = false forces a = 0:
fitter = MultivariateAdjustedFitter(:mars, Set([:x, :z]);
MaxM = 4, weight_on_new = 0.5,
fit_intercept = false,
coefficient_bounds = ((-5.0, 5.0), (0.5, 2.0)))For unknown groups (not seen during fitting), predictions use default coefficients (a=0.0, b=1.0):
# Group :C was never in the training data
predictions_C = fitter(dd[1:5, :], :C) # uses a=0, b=1