Introduction

Current optimization algorithms achieve good results on low-dimensional problems that are smooth and have wide basins of attraction. Examples of smooth manifolds with wide basins of attraction within materials science include process- and recipe-optimization problems such as tuning perovskite manufacturing variables to achieve higher efficiency1, optimizing microfluidics flow parameters to achieve ideal droplet formation2, optimizing silver nanoparticle recipes for optical properties3, and tuning perovskite compositions with physics-based constraints to maximize stability4. Optimization techniques like Bayesian optimization (BO) are well-suited to model these simple manifolds using a Gaussian Process (GP) surrogate5,6,7,8,9. However, the performance of BO with a GP surrogate breaks down as the manifold complexity increases. Material property optimization problems that have high technological significance, such as discovering materials with rare properties or materials with a specific combination of properties, have search space manifolds that more closely resemble a Needle-in-a-Haystack10, shown in Fig. 1b, than a smooth or convex space.

Fig. 1: Archetypal Manifolds in Materials Science Optimization.
figure 1

a In process optimization, there often exists a real and continuous path between each condition. This 3D projected manifold is from a perovskite process optimization process, where X1 is spray flow rate, X2 is plasma voltage, and f(X) is cell efficiency1. b However, in materials optimization, there are often only discrete combinations of properties that define real materials, resulting in a rough topology with extreme outliers. For example, Li2NbF6 and Li2ZrF6 lie close to each other in space because they have similar density, formation energy, and structure; however, they have vastly different target properties: Li2NbF6 has a Poisson’s ratio of − 1.7 while Li2ZrF6 has a Poisson’s ratio of 0.321. Extreme outliers, such as Li2NbF6, constitute only a small fraction of the manifold hypervolume, giving rise to a Needle-in-a-Haystack regime. This 3D projected manifold is obtained from the 6D Poisson’s ratio optimization problem presented in this paper, where X1 is density, X2 is formation energy, and f(X) is negative Poisson’s ratio20.

This Needle-in-a-Haystack (NiaH) problem arises when only a few optimum conditions exist within the entire dataset, resulting in an extreme imbalance. Interpolating the parameter space of an imbalanced dataset with an estimation function, such as a GP, results in smoothing over the optimum or over-predicting the properties of the materials found near the optimum11,12,13. Examples of NiaH materials optimization problems include discovering auxetic materials (i.e., materials that have a highly negative Poisson’s ratio, ν) for energy absorptive medical devices or protective armor14,15,16 and discovering materials that have a combination of high electrical conductivity and low thermal conductivity (i.e., a highly positive thermoelectric figure of merit, ZT) used for improving sensor technology to enable ubiquitous solid-state cooling17,18,19. Optimization of these rare material properties illustrates cases where an extreme data imbalance exists in the dataset because only a fraction of the total number of materials exhibit these rare properties14,20,21,22,23. This NiaH optimization challenge of extremely imbalanced datasets is largely applicable to many fields, not just materials science, including the fields of ecological resource management24,25, fraud detection26,27, and rare diseases27,28.

Several challenges exist within the current landscape of computational tools that inhibit effective optimization of these complex NiaH problems. Firstly, the “needle" makes up only a small percentage of the total manifold search space, resulting in a weak correlation between the measured input parameters and the target property of interest, inhibiting discovery of the region containing the needle11,29,30. This challenge requires the development of an algorithm that can more quickly determine the plausible region of the manifold where the needle exists. The second challenge for algorithms such as BO is the tendency of the acquisition function to pigeonhole sampling into local minima because of the narrowness of the needle’s basin of attraction31,32. Standard BO acquisition functions, including expected improvement (EI)33 and lower confidence bound (LCB)7,12, are static sampling techniques that only adjust sampling based on the output of the surrogate model, which smooths over the needle5,6,11. To overcome this challenge, active learning-based tuning of the acquisition function hyperparameters can be implemented to improve the sampling quality and avoid pigeonholing. Lastly, there exists a computing challenge for NiaH problems where, typically, several thousand samples must be observed to find an optimum when using an algorithm that is poorly suited to tackle NiaH manifolds10. The compute time of BO using a GP surrogate scales with complexity O(n³), where n is the number of experiments sampled; hence, the compute time of traditional BO grows rapidly as more data are required to find the optimum5,6,34,35,36,37,38. To solve this computing challenge, an algorithm must be designed that both efficiently optimizes the space in as few experiments as possible and reduces the effect of compounding compute times over the length of the optimization procedure.

In recent literature, algorithms have been developed to address some of these challenges individually, but not all of them together. The first class of solutions bounds the search space using a trust region approach to sample regions with a higher probability of containing the optimum. Eriksson et al. develop TuRBO39, which compiles a set of independent model runs, using separate GP surrogate models to compute a new, smaller search region narrowed in on the target optimum. Regis develops TRIKE40, which utilizes maximization of the EI acquisition function to bound a trust region containing the global optimum. Diouane et al. develop TREGO41, which interleaves sampling between global and local search regions, where the local search regions are defined by the single best historical experiment sampled. Although these methods offer solutions to one of the three challenges presented, each method has its drawbacks when optimizing NiaH problems. For example, TuRBO requires the computation of several GP model runs, which increases compute time, and it also does not guarantee that the needle will be resolved due to interpolation effects; TRIKE is inflexible to the use of other acquisition functions as it locks the user into using only EI, which may pigeonhole into local minima; TREGO uses only the best sampled experiment to define its search regions, which yields inconsistent or sub-optimal results when the needle occupies only a small fraction of the manifold and a single point is unlikely to land in its basin of attraction. The second class of solutions to the challenges presented in this paper is designed to decrease the computing time required to run an optimization procedure. A common method for reducing the compute time of BO with a GP surrogate is to introduce a sparse GP5,37,42. A sparse GP uses a small subset of pseudo data, often denoted as m, to reduce the GP time complexity from O(n³) to O(nm²)43. However, the process of selecting a useful subset requires minimizing the Kullback-Leibler divergence between the sparse GP and the true posterior GP, which is often a computationally intensive procedure involving variational inference44. In addition to sparse GPs, algorithms have been developed in the literature to improve the compute time of optimization in various ways. Van Stein et al. develop MiP-EGO45, which parallelizes the function evaluations of efficient global optimization (EGO) to discover optima faster and in fewer experiments using derivative-free computation46. Joy et al.47 use directional derivatives to accelerate hyperparameter tuning by 100× and achieve higher accuracy than the FABOLAS baseline by Klein et al.48. Zhang et al. develop FLASH49 to achieve optimization speed-ups of 50% by using a linear parametric model to guide algorithm search within high-dimensional spaces. Snoek et al.13 design a neural network-based parametric model that reduces the overall time complexity of BO to O(n) compared to the O(n³) complexity of standard BO with a GP surrogate model. These existing methods for accelerating compute time generally introduce external models to perform the optimization, such as neural networks, variational inference, or parametric models. While these external models do speed up compute time, they often lack the predictive capabilities to capture the weak correlation between measured input parameters and the target property of interest in NiaH problems.
We illustrate this mechanism later in the paper by comparing, on two materials science NiaH problems, the optimization results of the fast algorithm MiP-EGO with those of TuRBO, an algorithm better suited for discovering optima within narrow basins of attraction.

Although these methods from existing literature address some of the challenges in optimizing NiaH problems, none of them have been designed specifically to quickly and efficiently discover a needle-like optimum within a haystack of sub-optimal points, resulting in all of them falling short of a full solution. Therefore, in this paper, we design an algorithm that addresses all three of the challenges faced when optimizing NiaH problems by (1) zooming in the manifold search bounds iteratively and independently for each dimension based on the m best memory points to quickly converge to the plausible region containing the global optimum needle, (2) relieving compute utilization by pruning the low-performing and redundant memory points not being used to zoom in the search bounds, and (3) avoiding pigeonholing into local minima by using actively learned acquisition function hyperparameters to tune the exploitation-to-exploration ratio. The proposed algorithm, entitled [Zo]oming [M]emory-[B]ased [I]nitialization (ZoMBI), combines these three contributions into a method that optimizes NiaH problems quickly and efficiently. Figure 2 demonstrates the accelerated convergence ability of the proposed ZoMBI algorithm compared to standard BO. In essence, this process of scanning broadly and then focusing in on points of interest based on memory was inspired by the way we humans solve similar problems, but stands in contrast to the way standard BO methods with static acquisition functions solve problems. We demonstrate the performance of this algorithm on three vastly different NiaH problems in materials science and ecological resource management: (1) discovery of materials with negative Poisson’s ratio, (2) discovery of materials with both high electrical conductivity and low thermal conductivity, and (3) detection of environmental conditions conducive to sustaining wildfires. The performance of the proposed ZoMBI algorithm is compared against standard BO with static acquisition functions as well as against three further algorithms: (1) HEBO, the winning submission of the NeurIPS 2020 Black-Box Optimization Challenge50, and one algorithm from each of the two classes of partial NiaH solutions, (2) TuRBO (bounded search space)39 and (3) MiP-EGO (faster compute)45. Finally, we stress-test the proposed ZoMBI algorithm across 174 additional datasets varying the optimum needle width, optimum distance to edges, dimensionality, and initialization conditions.

Fig. 2: Accelerated Convergence to True Target using ZoMBI.
figure 2

Using a standard Bayesian optimization procedure, the discovery of a Needle-in-a-Haystack condition does not progress significantly after 10 additional experiments from the initial GP guess. However, using ZoMBI to zoom the bounds inward and prune redundant memory points, the needle-like optimum region is resolved to be accurately aligned with the true target. a The true target to optimize, which is a slice from the 6D Poisson’s Ratio dataset. b The initial guess of the target function using a GP surrogate with 20 randomly sampled experiments. c (top) The estimated target resolved by standard BO after 10 additional experiments sampled using a greedy LCB acquisition function (β = 0.1); (bottom) the estimated target resolved by ZoMBI after 10 additional experiments sampled using the same greedy LCB acquisition function. The red memory points do not assist in resolving this target after zooming in the bounds, hence, they are pruned from memory by ZoMBI.

Results

Zooming in the search bounds on the manifold addresses challenge number one of optimizing NiaH problems: finding the general hypervolume region that contains the needle-like optimum. Figure 3 illustrates how the ZoMBI algorithm iteratively zooms in the search bounds based on the number of activations, α. An Ackley function is used as a simulated example due to its non-convexity and needle-like global optimum51,52. For each activation, the m prior points that achieved the lowest target values, y, are retained in memory and used to zoom in the search bounds. This zooming occurs independently across each dimension and is based on the minimum and maximum values of the m memory points along each dimension, as shown in Equation (2). The red and orange rectangles illustrate the evolution of the bounds over space and time. Initially, sampling occurs across the entire manifold for ϕ forward experiments per activation, shown by the black markers. However, by using the best-performing memory points to zoom in the search bounds, pigeonholing into local minima can also be avoided as the search bounds are pulled away from these trap minima and move closer towards the global minimum basin of attraction. The iterative zooming of ZoMBI does not guarantee convergence on the global optimum, but if a sufficient initialization set is obtained, convergence often gets close to the global optimum, as shown across several examples in Figs. 5 and 8–10. Furthermore, we comprehensively demonstrate the performance limitations of ZoMBI where initializations miss extreme needle-like optima in Fig. 6 and where optima are near the edges of a manifold in Supplementary Figure 4.
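To make this per-dimension zooming step concrete, the snippet below computes new bounds from the m best memory points. It is a minimal numpy sketch under assumed array shapes, not the released ZoMBI implementation; the toy objective and the value of m are illustrative placeholders.

```python
import numpy as np

def zoom_bounds(X, y, m=10):
    """Return per-dimension [lower, upper] bounds spanned by the m best memory points."""
    best = np.argsort(y)[:m]        # indices of the m lowest target values (minimization)
    X_best = X[best]                # retained memory points, shape (m, d)
    return X_best.min(axis=0), X_best.max(axis=0)   # independent bounds for each dimension

# Toy usage: 200 random samples in a 4D unit cube with a synthetic objective whose
# minimum sits near 0.7 in every dimension; the bounds contract around that point.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = np.linalg.norm(X - 0.7, axis=1)
print(zoom_bounds(X, y, m=10))
```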

Fig. 3: Zooming Search Bounds.
figure 3

For every activation of ZoMBI, the search bounds are zoomed inward based on the prior best-performing memory points. A 4D Ackley function manifold is projected in 2D. The bounding regions of each 2D slice are illustrated by the red and orange boxes. The ϕ forward experiments sampled for each activation, α, are illustrated as black markers. The global optimum is indicated by the red region of the heatmap.

As more experiments are amassed and committed to memory to run traditional BO by computing the GP surrogate, the compute time increases polynomially, following the O(n³) time complexity of GP matrix inversion5,6,34,37,38,53. This complexity is unfavorable as it leads to compounding compute times as more experiments are run. Therefore, we implement a memory pruning feature into the ZoMBI algorithm that iteratively selects which prior data points to keep and which to prune from the memory during each activation, α. Memory pruning is demonstrated to remove redundant data points during the optimization procedure. Figure 2 illustrates how ZoMBI accelerates the convergence of a GP prediction to the precise location of the true target. However, only data within the newly computed bounds of ZoMBI are used for prediction of the true target; hence, all data outside this boundary become redundant and are pruned to decrease compute time.

Through memory pruning, the number of experiments used to train the GP surrogate varies between [i, i + ϕ] for every α, rather than being proportional to n, where the number of initialization samples is fixed at i = 5. In this paper, we use ϕ ∈ [0, 10], i.e., once ϕ = 10, the activation is complete and resets to ϕ = 0. This is computationally favorable because {Xi} ∪ {Xϕ} ⊆ {Xn}. Thus, for a single α, the time complexity is O((i+ϕ)³) ≈ O(ϕ³), since i is fixed. Furthermore, since the range of ϕ is capped, a non-increasing sawtooth pattern in compute time is exhibited, illustrated in Fig. 4. Therefore, the compute complexity of ZoMBI trends towards O(1) for α > 1 as a result of the efficient memory pruning process. After collecting 1000 experiments, the compute time of traditional BO trends towards > 400 seconds per experiment, whereas for ZoMBI the compute time maintains a constant trend of approximately 1 second per experiment. Therefore, the memory pruning feature of ZoMBI accelerates the optimization compute time by over 400× at n = 1000 and achieves further relative acceleration as n increases.
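The back-of-the-envelope sketch below illustrates this scaling argument; the operation counts are nominal cubic-cost proxies for GP fitting, not measured wall-clock times, and the values of i and ϕ follow those used in this paper.

```python
# Why memory pruning yields a roughly constant per-experiment cost: the GP is refit on at
# most i + phi points per activation instead of all n points, so the cubic fitting cost is
# bounded by (i + phi)^3 regardless of how many experiments have been collected in total.
i, phi = 5, 10
for n in [50, 100, 500, 1000]:
    standard_bo_cost = n ** 3            # grows without bound as experiments accumulate
    zombi_cost = (i + phi) ** 3          # capped by the pruned memory, independent of n
    print(f"n={n:5d}  standard O(n^3) ~ {standard_bo_cost:.1e}   "
          f"ZoMBI O((i+phi)^3) ~ {zombi_cost:.1e}")
```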

Fig. 4: Wall-clock Compute Time.
figure 4

The compute time per experiment is illustrated for traditional BO with a GP surrogate (orange) and for ZoMBI with a GP surrogate (blue) with the y-axis in log-scale. Four independent trials of each method were run to optimize a 6D Ackley function with a narrow basin of attraction using an NVIDIA Tesla Volta V100 GPU72. Each trial of standard BO and ZoMBI is run using one of the four acquisition functions: LCB, LCB Adaptive, EI, and EI Abrupt. The averages of the trials are shown as solid orange and blue lines while the shaded regions indicate the maximum and minimum compute time bounds. The red dashed line indicates the trend of the ZoMBI compute times. The measured compute time includes the time to compute the GP surrogate model and the time to acquire an experiment from the surrogate.

Pigeonholing into the local minima of a function occurs when an optimization algorithm has insufficient learned knowledge of the manifold topology to continue exploring potentially profitable regions or when the algorithm’s hyperparameters are improperly tuned, leading to overly exploitative tendencies1,9. The ZoMBI algorithm’s anti-pigeonholing capabilities are two-fold: (1) the zooming search bounds help the acquisition function to quickly stop sampling local minima once a better performing data point is found and (2) actively learned acquisition function hyperparameters use knowledge about the domain to help exit a local minimum. Figure 5 demonstrates the anti-pigeonholing capabilities of ZoMBI on optimizing a 6D Ackley function with both static and dynamic acquisition functions, compared to that of traditional BO.

Fig. 5: Acquisition Function Sampling Density.
figure 5

The colored heatmaps indicate the regions of a 2D slice from a 6D Ackley function where sampling density is high for each respective acquisition function: a LCB, b LCB Adaptive, c EI, and d EI Abrupt. The contour lines indicate the manifold topology with local minima as the circular and pointed regions of the contours. The red “x" indicates the global minimum. For each acquisition function, the left panel shows the sampling density after n = {20, 40, 80} evaluated experiments without the use of ZoMBI while the right panel shows the sampling density after n = {20, 40, 80} evaluated experiments with the use of ZoMBI.

The needle-like global minimum is indicated by the red “x" and the local minima are indicated by the circular and pointed regions of the contour lines. The sampling density of each acquisition function is illustrated by the heatmap, where the darker colors indicate higher sampling density regions. The goal is to achieve high sampling density near the red “x". It is shown that without ZoMBI being activated, the LCB, LCB Adaptive, and EI acquisition functions all end up pigeonholing into local minima. However, EI Abrupt initially pigeonholes into a local minimum but then switches from an exploitative to an explorative mode to jump out of the local minimum and converge closer to the global minimum. Conversely, when running the optimization procedure with ZoMBI active, all of the acquisition functions except the most exploitative, EI, converge onto the global minimum quickly. LCB Adaptive and EI are shown to initially start sampling towards a local minimum, but as ZoMBI is iteratively activated, the search bounds zoom in closer to the global minimum. Thus, with the combination of dynamic acquisition functions and zooming search bounds, pigeonholing into sub-optimal local minima can be more readily avoided while optimizing NiaH problems, although avoidance is not guaranteed, as shown by the sampling density of EI. The combination of the three foundational features of ZoMBI, (1) zooming bounds, (2) memory pruning, and (3) anti-pigeonholing, drives fast optimization of NiaH problems and, in most cases, does not sacrifice the ability to converge on the global optimum.

Before assessing the performance of ZoMBI on the three real-world datasets, we use 144 permutations of the Ackley function to stress-test the capability of ZoMBI to discover the global optimum basin of attraction, given two varying dataset hyperparameters: (1) basin of attraction width and (2) dimensionality. The basin of attraction hypervolume is determined by both the width of the basin and the dimensionality of the manifold; hence, as the basin becomes narrower in width and as the dimensionality increases, the percentage of hypervolume space taken up by the basin decreases, i.e., the optimum becomes more needle-like. The Ackley permutations have basin hypervolumes varying from 0.001% to 100% and manifold dimensionalities varying from 2D to 10D. For this experiment, we aim to determine the types of manifold topologies that ZoMBI best optimizes while quantifying those limits with the Pareto front.
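For reference, a standard form of the Ackley test function is sketched below. The basin-narrowing knob shown here (the b parameter) is an assumption made for illustration only; the exact permutations used in this work are detailed in the Supplementary Information.

```python
import numpy as np

def ackley(x, a=20.0, b=0.2, c=2 * np.pi):
    """Standard d-dimensional Ackley function with global minimum f(0) = 0."""
    x = np.asarray(x, dtype=float)
    d = x.size
    term1 = -a * np.exp(-b * np.sqrt(np.sum(x ** 2) / d))
    term2 = -np.exp(np.sum(np.cos(c * x)) / d)
    return term1 + term2 + a + np.e

# Increasing b steepens the function around the optimum, i.e., the basin of attraction
# narrows; the same nearby point then evaluates farther from the minimum value of 0.
for b in [0.2, 1.0, 5.0]:
    print(b, ackley(np.full(6, 0.05), b=b))
```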

Figure 6 shows the results of this large-scale optimization experiment of 48 independent trials of ZoMBI across each of the 144 unique permutations of the Ackley function dataset with varying optimum hypervolumes and dimensionality. All points below the grey-shaded region fall within the optimum basin of attraction. The red trace of the Pareto front indicates the narrowest optimum hypervolume and dimensionality conditions of a dataset that result in the best minimum function value being discovered. We show that with an initialization set of i = 5, ZoMBI can reliably discover the global minimum region for needles as narrow as 0.05% of the total hypervolume space. Moreover, as the optimum becomes narrower than 0.05% of the total hypervolume, the initialization set is no longer sufficient and ZoMBI gets trapped in local minima, as indicated by the greyed-out region. Conversely, as the optimum becomes wider than 5% of the total hypervolume, the manifold becomes flatter, exposing the greedy nature of ZoMBI, which falsely zooms inward onto less ideal function values than it would for narrower optimum conditions. This experiment quantifies ZoMBI’s Goldilocks zone to be between 0.05% and 5% optimum hypervolume. Therefore, for ideal performance, ZoMBI is best used on datasets where the optimum conditions make up between 0.05% and 5% of the total number of conditions. This optimum hypervolume trade-off of ZoMBI is further assessed relative to other optimization methods in Supplementary Figure 3.

Fig. 6: Varying Optimum Hypervolume.
figure 6

(left) Depiction of decreasing optimum basin of attraction hypervolume in 1D. (right) The Pareto-optimal dataset hyperparameters for usage with the ZoMBI algorithm over 144 analytical datasets with 48 independent trials each: 12 trials for each of the four acquisition functions, LCB, LCB Adaptive, EI, and EI Abrupt, for a total of 6912 independent trials. Each analytical dataset is a permutation of the Ackley function with a different optimum basin of attraction width and manifold dimensionality. Hypervolume percent makeup is synthetically decreased both by decreasing the basin of attraction width and by increasing the manifold dimensionality. Each scatter point represents the median final minimum function evaluation after 1000 experiments across the 48 independent trials initialized with a fixed set of i = 5 samples. The color of each scatter point represents the dimensionality of the manifold tested and the error bars represent the variance across the 48 trials. The possible function values for every dataset vary between [0, 25]; hence, for the Ackley function as further detailed in the Supplementary Information, trials achieving minimum function values < 10 are considered to have found the optimum basin of attraction, while trials with function values ≥ 10 after 1000 experiments are considered to be trapped in local minima. Both the x- and y-axes are in log-scale.

Three real-world datasets are optimized using ZoMBI—each of these datasets has an extreme data imbalance, illustrated in Fig. 7, that falls within the specified ideal range of ZoMBI performance. The 6D Poisson’s Ratio dataset has an imbalance of 0.82% optimum conditions, the 6D Thermoelectric Figure of Merit dataset has an imbalance of roughly 1.32% optimum conditions, and the 11D wildfire detection dataset has an imbalance of 4.16% optimum conditions. This ideal performance range of ZoMBI, between 0.05% and 5% optimum hypervolume, is facilitated by the initialization set. Hence, to improve performance for narrower optima, either the number of initialization samples must be increased or the initialization conditions should be adjusted. Additional initialization-condition experiments for ZoMBI are shown in the Supplementary Information.

Fig. 7: Data Distributions of Real-world Needle-in-a-Haystack Datasets.
figure 7

(top) The histogram distributions of the full real-world datasets with callouts for optimum conditions: a Poisson’s Ratio with 146k materials in the dataset and \({\nu }_{\min }=\{-1.7,-1.2\}\), b Thermoelectric Figure of Merit with 1k materials in the dataset computed by BoltzTraP57 and ZT\({}_{\max }=\{1.4,1.9\}\), c Wildfire Detection with 128k meteorological conditions collected over 33 months from January 2018 to September 2020 from CIMIS62 and ψ < 0 conditions indicating those with a high likelihood of wildfire outbreaks. (bottom) The noisy, non-convex manifold topologies of each dataset generated by a random forest regression with 500 trees. Each manifold is a projected 3D slice of higher dimensional space with the z-axis and colorbar indicating the target property, where a X1 is density and X2 is formation energy, b X1 is formation energy and X2 is band gap, c X1 is evapotranspiration and X2 is precipitation.

The first experimental dataset is 6-dimensional and consists of 146k materials from the publicly available Materials Project database with different mechanical properties, described by Poisson’s Ratio, ν20. Only 0.82% of the total 146k materials have a negative Poisson’s Ratio, ν < 014,15,20,21. Hence, for this experiment, we aim to minimize ν. A positive Poisson’s ratio, ν > 0, describes a material that expands in the direction orthogonal to an applied compressive load54,55. Conversely, a negative ν < 0 describes a material that contracts rather than expands in the orthogonal direction when compressed, denoted as an auxetic material—a rare phenomenon14,23. Auxetic materials with highly negative Poisson’s ratios have energy-absorptive properties that make them ideal for wearable medical devices and protective armor that must absorb the energy of large impacts to keep bones from shifting or to inhibit penetration of the protective layer15,16.
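To make the sign convention concrete, the small sketch below applies the standard definition ν = −ε_transverse / ε_axial to a 1% axial compression; the strain values are illustrative assumptions, and the dataset itself reports ν directly.

```python
# Illustration of the Poisson's ratio sign convention: nu = -eps_transverse / eps_axial.
def transverse_strain(nu, eps_axial):
    return -nu * eps_axial

eps_axial = -0.01                            # 1% axial compression
print(transverse_strain(0.3, eps_axial))     # +0.003: ordinary material bulges outward
print(transverse_strain(-1.7, eps_axial))    # -0.017: auxetic material (nu of Li2NbF6) contracts
```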

Figure 8 demonstrates the optimization performance of ZoMBI on the Poisson’s Ratio dataset compared to MiP-EGO, TuRBO, and HEBO. The ZoMBI algorithm is run with each of the four acquisition functions: LCB, LCB Adaptive, EI, and EI Abrupt. In under 100 evaluated experiments, LCB and LCB Adaptive discover the global minimum NiaH material, Li2NbF6 (ν ≈ − 1.7). The variance of ν values for the final experiment across all ensemble runs is illustrated as a KDE plot for each method to highlight the sampling density and general rate of success. HEBO discovers the global minimum after ZoMBI with LCB and LCB Adaptive; however, the spread of runs for ZoMBI is narrower than that of HEBO, which indicates that, for this problem, ZoMBI more consistently discovers the minimum, which is 3× lower than the minima discovered by MiP-EGO and TuRBO. Furthermore, the rate of convergence on Needle 1 is faster for ZoMBI than for HEBO.

Fig. 8: Discovery of Rare Negative Poisson’s Ratio Materials.
figure 8

The optimization objective is to find the material with the minimum Poisson’s ratio in 100 experiments from the dataset presented in Fig. 7a. The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.

Figure 7a illustrates the distribution of ν values within the full dataset. The ground truth “needle" materials with the lowest ν values are Li2NbF6 with ν ≈ −1.7 and Na2CO3 with ν ≈ −1.2. ZoMBI with the LCB and LCB Adaptive acquisition functions and HEBO discover Li2NbF6, while ZoMBI with the EI Abrupt acquisition function discovers Na2CO3.

The second experimental dataset is 6-dimensional and consists of 1k materials with different thermal and electrical properties, described by the Thermoelectric Figure of Merit, ZT. Since ZT values are always positive, there is no clear cutoff for what “optimum" conditions are, but with a threshold of ZT > 0.8, 1.32% of the total 1k materials are considered optimum. A higher ZT indicates that the material is better able to convert a thermal gradient into an electrical current56. Hence, for this experiment, we aim to maximize ZT. Unlike Poisson’s Ratio, the Thermoelectric Figure of Merit is determined by a combination of several variables rather than a single variable56:

$${{{\rm{ZT}}}}=\frac{{S}^{2}\sigma }{\kappa }T,$$
(1)

where S is the Seebeck coefficient, σ is the electrical conductivity, T is the average temperature, and κ is the thermal conductivity. The ZT is computed for each material with valid thermal and electrical properties in the Materials Project database using BoltzTraP57. ZT is a common figure of merit used to describe the thermal-to-electrical or electrical-to-thermal conversion efficiency of thermoelectric materials58,59,60,61. Materials with high ZT values have a range of applications, from solid-state cooling devices to sensors that produce an electrical signal when heated17,18,19.
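As a sanity check on Equation (1), the helper below evaluates ZT from the four quantities; the example property values are illustrative placeholders, not entries from the BoltzTraP-computed dataset.

```python
# Direct transcription of Equation (1): ZT = S^2 * sigma * T / kappa (dimensionless).
def figure_of_merit(seebeck_V_per_K, electrical_conductivity_S_per_m,
                    thermal_conductivity_W_per_mK, temperature_K):
    return (seebeck_V_per_K ** 2) * electrical_conductivity_S_per_m * temperature_K \
        / thermal_conductivity_W_per_mK

# e.g., S = 200 uV/K, sigma = 1e5 S/m, kappa = 1.5 W/(m K), T = 300 K  ->  ZT = 0.8
print(figure_of_merit(200e-6, 1e5, 1.5, 300))
```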

Figure 9 demonstrates the optimization performance of ZoMBI on the Thermoelectric Figure of Merit dataset compared to MiP-EGO, TuRBO, and HEBO. In this experiment, although none of the tested methods discover the maximum needle, LCB Adaptive discovers the second-highest needle-in-a-haystack material, Na4Al3Ge3IO12 (ZT ≈ 1.4), in under 100 experiments. None of HEBO, TuRBO, or MiP-EGO is capable of discovering any needle-like ZT optima, and MiP-EGO performs worse than random sampling in this experiment. The wide variance across runs for ZoMBI and HEBO, shown in the KDE plots, indicates that both methods operate relatively exploratively to discover maxima in this topology. Ultimately, this experiment demonstrates that ZoMBI can optimize material objective functions that have a complex combination of variables (Equation (1)) with roughly 2× better performance than HEBO.

Fig. 9: Discovery of Rare Positive Thermoelectric Figure of Merit Materials.
figure 9

The optimization objective is to find the material with the maximum Thermoelectric Figure of Merit in 100 experiments from the dataset presented in Fig. 7b. The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.

Figure 7b illustrates the distribution of ZT values within the full dataset. The ground truth “needle" materials with the highest ZT values are Sr4Al6SO12 with ZT ≈ 1.9 and Na4Al3Ge3IO12 with ZT ≈ 1.4. ZoMBI with the LCB Adaptive acquisition function is the only method that discovers one of these needles, Na4Al3Ge3IO12.

The third experimental dataset is 11-dimensional and consists of 128k meteorological conditions and an index, ψ, that determines whether the set of conditions has a high likelihood of generating or sustaining a wildfire in the state of California—publicly available from the California Irrigation Management Information System (CIMIS) weather stations62. Only 4.16% of the total 128k meteorological conditions have a negative wildfire detection index, ψ < 0. A highly negative ψ indicates a high risk of wildfires. Hence, for this experiment, we aim to minimize ψ to best detect meteorological conditions at high risk of wildfires. The dataset spans over two years of data collected from 2018 to 2020, during which over 2500 wildfires occurred, burning over 24 million acres of land63. In California, temperature and precipitation alone are poor indicators of wildfire outbreaks (see Supplementary Fig. 1), leading researchers to instead use computer-vision methods or convolutions of many meteorological variables to reliably detect wildfire conditions25,63. Thus, there is a strong need for algorithmic support to aid humans in early wildfire detection.

Figure 10 demonstrates the optimization performance of ZoMBI on the Wildfire Detection dataset compared to MiP-EGO, TuRBO, and HEBO. In this experiment, LCB Adaptive, EI, and HEBO discover the lowest index value, ψ ≈ − 3.5, for detecting wildfires based on a high-dimensional convolution of ten meteorological variables. TuRBO and MiP-EGO also discover a low index value, ψ ≈ − 2.5, however, these methods have widely distributed variances, as shown by the KDE plots, indicating inconsistent optimization results given only 100 sampled experiments. Similarly, HEBO has high variance across model runs while the LCB Adaptive and EI ZoMBI methods have a tight distribution, indicating more reliable optimization results with a higher rate of success. Furthermore, ZoMBI methods achieve a faster rate of convergence than HEBO onto the Needle 1 optimum, similar to the optimization results on the Poisson’s Ratio dataset.

Fig. 10: Detection of Environmental Conditions with Wildfire Risk.
figure 10

The optimization objective is to find the meteorological conditions with the minimum wildfire detection index, ψ, in 100 experiments from the dataset presented in Fig. 7c. Conditions with ψ < 0 have the highest risk of sustaining wildfire. The green, blue, red, and orange lines indicate the median best running evaluated sample of ZoMBI using the LCB, LCB Adaptive, EI, and EI Abrupt acquisition functions, respectively. The pink, black, and teal lines indicate the median best running evaluated sample of the methods MiP-EGO, TuRBO, and HEBO respectively. Random sampling is illustrated as a dashed grey line for benchmarking. The median for each method is taken over the best 12 independent model runs. The shaded regions indicate the variance between model runs. The cross-hatched region indicates the space discovered by standard BO methods, without the use of ZoMBI, which use the same hyperparameters. The distribution across all 12 model runs of the final sampled experiment for each method is shown as a kernel density estimation (KDE) along the y-axis. The y-values for the needle-like optima are indicated by dashed black lines.

Figure 7c illustrates the distribution of ψ values within the full dataset. The ground truth “needle" conditions for detecting wildfires are those with the most negative detection index values, ψ. Although ZoMBI with the LCB Adaptive and EI acquisition functions as well as HEBO discover the lowest needle-like ψ conditions after 100 sampled experiments, none of the tested methods are able to find the global \({\psi }_{\min }\approx -12\). These results imply that, even for ZoMBI, with a narrow enough needle-like optimum, an LHS initialization of i = 5 experiments may not be sufficient. Supplementary Fig. 4 demonstrates that extending the bounds of the LHS initialization improves the performance of ZoMBI on certain manifold topologies.

Discussion

In this paper, we proposed the [Zo]oming [M]emory-[B]ased [I]nitialization (ZoMBI) algorithm that builds on the principles of Bayesian optimization to accelerate the optimization of Needle-in-a-Haystack problems in two ways: first, by requiring fewer experiments than the existing MiP-EGO45, TuRBO39, and HEBO50 methods to reach a better optimum on a variety of real-world applications, and second, by pruning the memory of low-performing historical experiments to speed up compute time. The ZoMBI algorithm converges quickly onto narrow and sharp optima in Needle-in-a-Haystack datasets by (1) using the values of the m best performing previously sampled memory points to iteratively zoom in the search bounds of the manifold uniquely on each dimension and (2) implementing two custom acquisition functions, LCB Adaptive and EI Abrupt, that adapt their hyperparameters to tune sampling of new experimental conditions based on learned information from the surrogate model. The main contributions of this algorithm solve three fundamental challenges of optimizing non-convex Needle-in-a-Haystack problems: (1) the challenge of locating the hypervolume region of the manifold containing the narrow global optimum basin of attraction11,29,30 is alleviated by introducing iterative search bounds based on learned knowledge of the manifold; (2) the challenge of polynomially increasing compute times of BO using a GP surrogate5,6,34,35,36,37,38 is addressed by actively pruning the retained memory of the algorithm after each activation, α, in turn reducing the time complexity from O(n³) to O(ϕ³) for ϕ forward experiments per activation, which trends towards a constant O(1) when α > 1; (3) unwanted pigeonholing into local minima5,6,31,32 is avoided by both the zooming mechanics of ZoMBI and the two acquisition functions developed in this paper, LCB Adaptive and EI Abrupt, which tune their hyperparameters through adaptive learning. By developing the ZoMBI algorithm to solve these challenges, it becomes possible to quickly and efficiently find optimal solutions to complex Needle-in-a-Haystack problems in fewer experiments.

Solving a Needle-in-a-Haystack problem that arises from extremely imbalanced data is a significant challenge that has important implications in science and engineering, especially within the field of materials science10,29. In this paper, we use ZoMBI to discover the optimum materials in two real-world materials science Needle-in-a-Haystack datasets where only a small fraction of the entire search space consists of the target optimum conditions. For breadth, we also extend our analysis to a third real-world dataset for ecological resource management with the objective of discovering the environmental conditions that have a high likelihood of sustaining wildfires for early detection. In the first materials dataset, we discover a material with a highly negative Poisson’s ratio, ν20,21; in the second materials dataset, we discover a material with a highly positive thermoelectric figure of merit, ZT20,57, both rare material properties; and in the third dataset for ecological resource management, we discover a set of environmental conditions with a highly negative wildfire detection index, ψ25,62,63. For the first dataset, both the ZoMBI algorithm with the LCB and LCB Adaptive custom acquisition functions and HEBO50 discover the material with the minimum ν ≈ −1.7; however, the ZoMBI methods converge on this minimum in only 70 experiments while HEBO takes 90 experiments. TuRBO39 and MiP-EGO45 only discover materials with ν ≈ − 0.55 and ν ≈ − 0.20, respectively. For the second dataset, the ZoMBI algorithm with the LCB Adaptive custom acquisition function discovers the material with the maximum ZT ≈ 1.4, while HEBO50, TuRBO39, and MiP-EGO45 only discover ZT ≈ 0.78, ZT ≈ 0.65, and ZT ≈ 0.45, respectively. For the third dataset, the ZoMBI algorithm with all acquisition functions and HEBO50 discover a minimum ψ ≈ − 3, while TuRBO39 and MiP-EGO45 both only discover ψ ≈ − 2. However, the ZoMBI methods converge on the minimum faster and with less variance. In general, we note that HEBO50 outperforms the other benchmark methods, TuRBO39 and MiP-EGO45. Thus, for future investigation, we believe the performance of ZoMBI may be further improved by running optimization within the latent space of a variational autoencoder, similar to HEBO64,65. Overall, these results demonstrate that the ZoMBI algorithm is better suited to tackle various real-world Needle-in-a-Haystack optimization problems than current methods; however, ZoMBI has performance limitations for extremely narrow optima when instantiated with an insufficient initialization set. Therefore, to assess these limitations, we stress-tested ZoMBI on an additional 174 analytical datasets with varying optimum needle widths, optimum distance to edges, dimensionality, and initialization conditions. These results show that, with a fixed initialization set of 5 samples, ZoMBI performs best on datasets with needle-like optima consisting of between 0.05% and 5% of the total hypervolume space. Furthermore, by extending the range of the initialization set, ZoMBI is capable of discovering global minima that lie on the absolute edge of a manifold’s limits. Thus, in certain cases, convergence to a global optimum using ZoMBI is not guaranteed, but with slight modifications based on some a priori domain knowledge of the optimization landscape, ZoMBI produces high-performance and low-variance results.

Ultimately, the significance of developing the ZoMBI algorithm is to quickly and efficiently tackle difficult Needle-in-a-Haystack optimization problems in extremely imbalanced datasets. In this paper, we showcased the ability of the developed algorithm to discover rare materials and conditions with highly-optimized properties in a short period of time using few experiments. Discovering rare materials quickly and efficiently enables widespread access to a new range of materials applications, from engineering high-performance medical devices to ubiquitous solid-state cooling systems10,15,16,17,18,19. However, the application space for ZoMBI to accelerate the efficient discovery of highly-optimized solutions extends past materials science and is generally applicable to many Needle-in-a-Haystack problems, including those found in ecological resource management24,25, fraud detection26,27, and rare disease prediction27,28. We aim for this contribution to support the elimination of the time and resource barriers previously inhibiting the throughput of optimizing complex and challenging Needle-in-a-Haystack problems across a broad range of application spaces.

Methods

In this paper, we develop two major contributions: (1) the ZoMBI algorithm and (2) adaptive learning acquisition functions. Through the combination of these two contributions, the optimum region of a NiaH manifold can be quickly discovered in fewer experiments without pigeonholing into local minima. Thus, the three challenges of optimizing NiaH problems are addressed: (1) the challenge of finding a hypervolume within the manifold that contains the needle-like optimum11,29,30, (2) the challenge of the polynomially increasing compute times of BO using a GP surrogate5,6,35,36,37,38, and (3) the challenge of avoiding pigeonholing into local minima1,9,31,32. We demonstrate the implementation of ZoMBI on a 6D analytical Ackley function, a 6D dataset of materials with Poisson’s ratios, a 6D dataset of thermoelectric materials, and an 11D dataset for wildfire detection, all of which exhibit an extreme data imbalance and a NiaH regime, and compare the performance to that of MiP-EGO45, TuRBO39, and HEBO50. For each problem, the objective is to find the target value, y, with either the lowest or highest value depending on whether the problem is minimization or maximization. This optimum y-value resembles a needle for each problem because it is located within a narrow and steep basin of attraction. Precisely, the needle optimum for each problem has a value of y = 0 for the Ackley function (minimization), y = −1.7 for the Poisson’s ratio dataset (minimization), y = 1.9 for the thermoelectric figure of merit dataset (maximization), and y = −12 for the wildfire detection dataset (minimization). To extend the applicability of the ZoMBI optimization performance results to a wider array of applications, additional stress tests are conducted on 174 analytical datasets. First, a set of 144 analytical datasets is optimized to assess the failure and success conditions of ZoMBI on problems with extremely narrow optima and few initialization data points. Then, in the Supplementary Information, a set of 30 analytical datasets is optimized to assess the failure and success conditions of ZoMBI on problems with insufficient initialization data and cases where the global optimum is near the edge of the manifold.

The ZoMBI algorithm has two key features: (1) iterative inward bounding of successive search spaces using the m best-performing memory points within the prior search space and (2) iterative pruning of low-performing historical search space memory. The newly computed search space bounds are unique for each dimension, such that the optimum basin of attraction of complex, non-convex NiaH manifolds can be discovered. The algorithm leverages these two key features to guide the acquisition of new data towards more optimal regions while only fitting the surrogate within the suggested optimum region to resolve more detail of the space of interest, as shown in Figs. 2 and 3. This process subsequently reduces the compute time significantly compared to computing a GP over all historical data in a standard BO procedure, as shown in Fig. 4.

Algorithm 1

Zooming Memory-Based Initialization (ZoMBI)

We define m as the number of retained memory points during an activation of ZoMBI. The m memory points are saved while all other data are erased from memory. These are the historical data points that achieve the m lowest (for minimization) target values, y, and they are used to zoom in the search bounds. Using these memory points, the multi-dimensional upper and lower bounds of the zoomed search space are computed for each dimension, d. Let X ≔ {X1, X2, …, Xn} be a set of data points, where \({X}_{j}\in {{\mathbb{R}}}^{d}\). Let \(f:{{\mathbb{R}}}^{d}\to {\mathbb{R}}\) be the objective function. We first assume that the points in X are in general position so that f(X) contains unique elements. Then, for each m ≤ n, define X(m) = {Xπ(1), …, Xπ(m)}, where π is a permutation on {1, …, n} such that {f(Xπ(j))} is in ascending order. If f(X) contains repeated elements, we may first remove the points with repeated f values and apply the definition above. Then, for each d, the bounds are defined as:

$$\begin{array}{l}{{{{\mathcal{B}}}}}_{d}^{l}\,=\,\mathop{\min }\limits_{X\in {{{{\bf{X}}}}}^{(m)}}\{{X}_{d}\}\\ {{{{\mathcal{B}}}}}_{d}^{u}\,=\,\mathop{\max }\limits_{X\in {{{{\bf{X}}}}}^{(m)}}\{{X}_{d}\},\end{array}$$
(2)

where \({{{{\mathcal{B}}}}}_{d}^{l}\) and \({{{{\mathcal{B}}}}}_{d}^{u}\) are the computed lower and upper bounds for each dimension, d, respectively. The bounds \([{{{{\mathcal{B}}}}}_{d}^{l},{{{{\mathcal{B}}}}}_{d}^{u}]\) constrain the proceeding acquisition of new data as well as the computation of a GP, such that sampling cannot occur outside of the bounded region. This constraining process operates independently for each dimension, such that each dimension has a unique lower and upper bound. To initialize the algorithm with data from the constrained space, i data points are sampled from the bounded region using Latin Hypercube Sampling (LHS). LHS splits a d-dimensional space into i*d equally spaced strata, where i is the number of points to sample uniformly over d dimensions with low variability, unlike random sampling, which has high sampling variability66. A GP surrogate model is trained on these i LHS points sampled from the constrained space and then retrained for every subsequent experiment sampled from the space, denoted as a forward experiment. Thus, the GP is only trained on information within the constrained region, and as the constrained region iteratively zooms inward and decreases in hypervolume, so does the region computed by the GP. This process allows for more information to be resolved within regions plausibly containing the global optimum basin of attraction. Up to ϕ forward experiments are sampled in serial, where {Xi} ∪ {Xϕ} ⊆ {Xn}. These forward experiments are sampled by maximizing an acquisition value, a ∈ [0, 1], computed by a user-selected acquisition function from one of the four functions EI, EI Abrupt, LCB, and LCB Adaptive, described in the Methods. Once i + ϕ experiments are sampled, the bounds are re-constrained using the m best-performing experiments, i new experiments are sampled from the zoomed-in space using LHS, and then the memory is pruned. The process of collecting ϕ forward experiments is repeated. A complete constraining-resetting iteration is denoted as an activation, α. This iterative zooming and pruning process over several α significantly speeds up compute time. Implementation of ZoMBI is shown in Algorithm 1.
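The sketch below is one plausible reading of Algorithm 1 using off-the-shelf scipy/scikit-learn components, not the authors' released implementation: each activation draws i LHS samples inside the current bounds, acquires ϕ forward experiments with a simple LCB-style rule, then zooms the bounds on the m best memory points (Equation (2)) and prunes the rest. The objective, candidate-sampling scheme, and hyperparameter values are placeholders.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(X):                                      # placeholder needle-like objective
    return np.linalg.norm(X - 0.8, axis=1)

def zombi(dims=4, activations=3, i=5, phi=10, m=10, beta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    lower, upper = np.zeros(dims), np.ones(dims)       # initial search bounds
    mem_X, mem_y = np.empty((0, dims)), np.empty(0)    # retained memory across activations
    for alpha in range(activations):
        # initialize each activation with i LHS samples drawn inside the current bounds
        lhs = qmc.LatinHypercube(d=dims, seed=seed + alpha).random(n=i)
        X_act = qmc.scale(lhs, lower, upper)
        y_act = objective(X_act)
        for _ in range(phi):                           # phi forward experiments per activation
            gp = GaussianProcessRegressor().fit(X_act, y_act)   # GP sees only i..i+phi points
            cand = rng.uniform(lower, upper, size=(512, dims))  # candidates within the bounds
            mu, sigma = gp.predict(cand, return_std=True)
            x_next = cand[np.argmin(mu - beta * sigma)]         # LCB-style pick (minimization)
            X_act = np.vstack([X_act, x_next])
            y_act = np.append(y_act, objective(x_next[None]))
        mem_X = np.vstack([mem_X, X_act]); mem_y = np.append(mem_y, y_act)
        best = np.argsort(mem_y)[:m]                   # m best-performing memory points
        lower, upper = mem_X[best].min(axis=0), mem_X[best].max(axis=0)  # zoom bounds (Eq. 2)
        mem_X, mem_y = mem_X[best], mem_y[best]        # prune all other memory
    return mem_X[np.argmin(mem_y)], mem_y.min()

print(zombi())
```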

Traditional BO acquisition functions, such as EI67 and LCB68, use the computed means and variances from a surrogate model to compute an acquisition value; maximizing this acquisition value guides sampling of the manifold7,12,33. However, these traditional acquisition functions are static, such that they do not actively use any information about the performance of previously sampled experiments to guide sampling. Hence, we implement an adaptive learning approach into the acquisition functions to develop two functions, EI Abrupt and LCB Adaptive, that dynamically adapt their sampling based on the quantity and quality of previously sampled experiments. In contrast to a static acquisition function, these adaptive acquisition functions are initialized with an initial set of hyperparameter values to guide their search but then tune these values as sampling progresses. The developed EI Abrupt and LCB Adaptive functions are used within the ZoMBI framework to further accelerate optimization and avoid pigeonholing, see line 9 of Algorithm 1.

LCB Adaptive builds on previous work that also tunes sampling based on the number of experiments collected, n69,70,71. In this paper, we design LCB Adaptive to tune its hyperparameters to become less explorative as more samples are collected. For example, as n increases, LCB Adaptive decays its exploration weight, ϵⁿβ, to become less explorative and more exploitative. Specifically, this information feedback received by the function determines the contribution of both μ(X) and σ(X) to the acquisition value, a. Similar to EI Abrupt, LCB Adaptive computes an acquisition value, a ∈ [0, 1], for a given X, wherein the X with the highest a is selected by the acquisition function as the next suggested experiment to measure. LCB Adaptive is implemented for a minimization problem as:

$${a}_{{{{\rm{LCB}}}}{{{\rm{Adaptive}}}}}(X,n;\beta ,\epsilon )=\mu (X)-{\epsilon }^{n}\beta \sigma (X),$$
(3)

where n is the number of experiments sampled, and β = 3 and ϵ = 0.9 are hand-tuned initialization hyperparameters selected based on a priori domain knowledge of the function’s performance on a variety of different problems. Having a large β and an ϵ close to 1 supports a gradual decay from very explorative to very exploitative, rather than a rapid decay. The dynamic EI Abrupt and LCB Adaptive are shown to both discover optima faster and avoid pigeonholing into local minima better than their static counterparts by actively balancing the ratio of exploitation to exploration using learned information about the quality and quantity of previously sampled experiments.
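A minimal sketch of Equation (3) is shown below; μ and σ are assumed to be posterior means and standard deviations from the current GP over candidate points, and the selection convention in the example (taking the lowest value for minimization) is a simplification of the paper's normalized, maximized acquisition values.

```python
import numpy as np

def lcb_adaptive(mu, sigma, n, beta=3.0, epsilon=0.9):
    """Adaptive lower confidence bound value for each candidate point (Equation (3))."""
    # The decaying epsilon**n factor shifts weight from the sigma (exploration) term
    # toward the mu (exploitation) term as more experiments are sampled.
    return mu - (epsilon ** n) * beta * sigma

mu = np.array([0.3, 0.5, 0.1]); sigma = np.array([0.2, 0.6, 0.05])
for n in [1, 20, 100]:
    # early on the high-variance candidate (index 1) wins; later the low-mean one (index 2)
    print(n, np.argmin(lcb_adaptive(mu, sigma, n)))
```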

EI Abrupt is an acquisition function that flips between the exploitative EI67 and explorative LCB68 acquisition functions based on the computed finite differences of recently evaluated experiments. For example, if the evaluated experiment y-values plateau for three or more experiments in a row, EI Abrupt will abruptly switch from a greedy sampling policy to a more explorative sampling policy. Specifically, this information feedback received by the function determines if the current round of sampling should exploit the surrogate mean values, μ(X), or explore the surrogate variances, σ(X). EI Abrupt computes an acquisition value, a ∈ [0, 1], for a given X, wherein the X with the highest a is selected by the acquisition function as the next suggested experiment to measure. EI Abrupt is implemented for a minimization problem as:

$$\begin{array}{l}{a}_{{{{\rm{EI}}}}{{{\rm{Abrupt}}}}}(X,y;\beta ,\xi ,\eta )\,=\,\left\{\begin{array}{ll}\left(\mu (X)-{y}^{* }-\xi \right)\,\Phi \,(Z)+\sigma (X)\psi (Z),& {{\rm{if}}}\,\,| \Delta \{{y}_{n-3...n}\}| \le \eta \\ \mu (X)-\beta \sigma (X),& {{\rm{otherwise}}}\,\end{array}\right.\\ \qquad \qquad \qquad \qquad\,\,\, Z\,=\,\dfrac{\mu (X)\,-\,{y}^{* }\,-\,\xi }{\sigma (X)},\end{array}$$
(4)

where y* is the lowest measured target value thus far (i.e., the running minimum), Φ( ⋅ ) is the cumulative density function of the normal distribution, ψ( ⋅ ) is the probability density function of the normal distribution, and ∣Δ{yn−3...n}∣ is the absolute value of the finite differences of the set of target values of the last three sampled experiments. Moreover, β = 0.1, ξ = 0.1, and η = 0 are hand-tuned initialization hyperparameters used for the rest of the paper for EI Abrupt. Additionally, for standard LCB and EI, the hyperparameters β = 1 and ξ = 0.1 are used, respectively. These hyperparameters were selected based on a priori domain knowledge of EI Abrupt performance on a variety of different problems. The most important hyperparameter for efficient sampling is β, whose ideal value is non-obvious, but it is found that β = 0.1 allows EI Abrupt to switch into an explorative sampling policy while still having a strong weight on the surrogate means, implying that exploration does not veer far.
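The sketch below transcribes Equation (4) directly; y_history is assumed to hold previously evaluated targets with the most recent last, and the plateau check compares each finite difference of the last three targets against η, which is one plausible reading of the condition.

```python
import numpy as np
from scipy.stats import norm

def ei_abrupt(mu, sigma, y_history, beta=0.1, xi=0.1, eta=0.0):
    """EI Abrupt acquisition values over candidate points (Equation (4))."""
    y_star = np.min(y_history)                      # running minimum (best observed target)
    deltas = np.diff(y_history[-3:])                # finite differences of the last 3 targets
    if np.all(np.abs(deltas) <= eta):               # plateau detected: EI-style branch
        z = (mu - y_star - xi) / sigma
        return (mu - y_star - xi) * norm.cdf(z) + sigma * norm.pdf(z)
    return mu - beta * sigma                        # otherwise: LCB-style branch

mu = np.array([0.3, 0.5, 0.1]); sigma = np.array([0.2, 0.6, 0.05])
print(ei_abrupt(mu, sigma, y_history=[0.9, 0.9, 0.9]))   # plateaued history -> EI branch
print(ei_abrupt(mu, sigma, y_history=[0.9, 0.6, 0.3]))   # improving history -> LCB branch
```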