Multi-Fidelity#

Multi-fidelity optimization algorithms like Hyperband, BOHB and ASHA evaluate configurations at varying levels of accuracy. A cheap low-fidelity evaluation (e.g. training on 10% of the data) filters out bad candidates early, while full-fidelity evaluations are reserved for promising ones.

Surfaces supports this pattern through the fidelity parameter on all ML test functions.

Basic Usage#

Pass fidelity as a keyword argument to any ML function call. The value must be in the range (0, 1], where 1.0 means full evaluation and smaller values use proportionally less training data.

from surfaces.test_functions import RandomForestClassifierFunction

func = RandomForestClassifierFunction(dataset="digits", cv=5)

# Full-fidelity evaluation (default behaviour, same as fidelity=1.0)
score_full = func({"n_estimators": 100, "max_depth": 10})

# Low-fidelity: train on 10% of the data
score_cheap = func({"n_estimators": 100, "max_depth": 10}, fidelity=0.1)

# Medium-fidelity: train on 50% of the data
score_mid = func({"n_estimators": 100, "max_depth": 10}, fidelity=0.5)

Calling without fidelity or with fidelity=None gives the same result as before this feature existed. Existing code is not affected.

Subsampling Strategies#

Different ML function categories use different subsampling strategies to ensure that the reduced dataset remains meaningful.

Category	Strategy	Rationale
Tabular Classification	Stratified	Preserves class distribution (important for imbalanced or sorted datasets like iris)
Tabular Regression	Shuffled	Deterministic random subset with fixed seed
Time-Series Forecasting	Sequential	Takes the first N% of the series to preserve temporal order
Time-Series Classification	Stratified	Same as tabular classification
Image Classification	Stratified	Same as tabular classification
Ensemble Optimization	Stratified	Uses classification datasets internally
Feature Engineering	Stratified	Uses classification datasets internally

All subsampling uses random_state=42 for full reproducibility. The same fidelity value always produces the same subset.

Successive Halving Example#

A typical Successive Halving pattern evaluates many configurations cheaply, then narrows down:

import numpy as np
from surfaces.test_functions import GradientBoostingClassifierFunction

func = GradientBoostingClassifierFunction(dataset="digits", cv=3)

# Generate random configurations
rng = np.random.RandomState(0)
configs = [
    {"n_estimators": int(rng.choice(func.search_space["n_estimators"])),
     "max_depth": int(rng.choice(func.search_space["max_depth"])),
     "learning_rate": rng.choice(func.search_space["learning_rate"])}
    for _ in range(27)
]

# Round 1: evaluate all 27 at low fidelity
scores = [(c, func(c, fidelity=0.1)) for c in configs]
top_9 = sorted(scores, key=lambda x: x[1])[:9]

# Round 2: evaluate top 9 at medium fidelity
scores = [(c, func(c, fidelity=0.3)) for c in [t[0] for t in top_9]]
top_3 = sorted(scores, key=lambda x: x[1])[:3]

# Round 3: evaluate top 3 at full fidelity
scores = [(c, func(c, fidelity=1.0)) for c in [t[0] for t in top_3]]
best = min(scores, key=lambda x: x[1])

Memory Cache#

When memory=True, the cache distinguishes between fidelity levels. The same hyperparameters evaluated at different fidelities produce separate cache entries:

func = GradientBoostingClassifierFunction(dataset="digits", memory=True)

params = {"n_estimators": 100, "max_depth": 5, "learning_rate": 0.1}

func(params, fidelity=0.1)   # computed
func(params, fidelity=0.1)   # cache hit
func(params, fidelity=1.0)   # computed (different fidelity)
func(params)                 # computed (fidelity=None, separate key)

Data Collection#

When fidelity is set, the recorded evaluation data includes the fidelity value:

func = GradientBoostingClassifierFunction(dataset="digits")
func({"n_estimators": 100, "max_depth": 5, "learning_rate": 0.1}, fidelity=0.3)

print(func.data.search_data[-1])
# {'n_estimators': 100, 'max_depth': 5, 'learning_rate': 0.1,
#  'score': -0.87, 'fidelity': 0.3}

Evaluations without fidelity do not include the key in the record, keeping backwards compatibility with existing data processing code.

Limitations#

Surrogates do not support fidelity. When use_surrogate=True, the surrogate always returns full-fidelity predictions regardless of the fidelity value. A UserWarning is raised in this case.

Very low fidelity on small datasets may fail. The subsampled data must have at least as many samples as the number of CV folds. For example, iris (150 samples, 3 classes) with cv=5 and fidelity=0.05 produces only about 8 samples, which is not enough for 5-fold cross-validation. A ValueError with a clear message is raised when this happens.

Algebraic functions ignore fidelity. Passing fidelity to an algebraic function like SphereFunction has no effect. The parameter is accepted without error (for API uniformity) but does not change the evaluation.