1 Introduction

Many inequalities are known in connection with the approximation problem. In 1947 the hypercircle inequality was applied to boundary value problems in mathematical physics [11]. In 1959, Golomb and Weinberger demonstrated the relevance of the hypercircle inequality (HI) to a large class of numerical approximation problems [5]. The method of hypercircle, which has a long history in applied mathematics, has since received attention from mathematicians in several directions [4, 12]. In 2011, Khompurngson and Micchelli [6] described HI and its potential application to kernel-based learning when the data are known exactly and then extended it to the situation where there is a known data error (Hide). Furthermore, that result was applied to the problem of learning a function value in a reproducing kernel Hilbert space. Specifically, a computational experiment with the method of hypercircle, when the data error is measured with the \(l^{p}\) norm \((1< p \leq \infty )\), was compared to the regularization method, which is a standard method for learning problems [6, 8]. We continued this line of research by presenting a full analysis of the hypercircle inequality for data error (Hide) measured with the \(l^{\infty}\) norm [10]. Despite this progress, the case of data error measured with the \(l^{1}\) norm remains to be treated. In this paper, we not only study the hypercircle inequality for data error measured with the \(l^{1}\) norm, but also provide an unexpected application of the hypercircle inequality for only one data error to the \(l^{\infty}\) minimization problem, which is the dual problem in this case.

Recently, we have been specifically interested in a detailed analysis of the hypercircle inequality for data error (Hide) measured with the \(l^{\infty}\) norm [10]. Let \(X= \{ x_{j}: j \in \mathbb{N}_{n}\}\) be a set of \(\mathit{{linearly\ independent}}\) vectors in a real Hilbert space H with inner product \(\langle \cdot,\cdot \rangle \) and norm \(\Vert \cdot \Vert \), where \(\mathbb{N}_{n} =\{ 1,2,\ldots,n\}\). The Gram matrix of the vectors in X is

$$\begin{aligned} G= \bigl( \langle x_{i}, x_{j} \rangle: i, j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

We define the linear operator \(L: H \longrightarrow \mathbb{R}^{n}\) as

$$\begin{aligned} L x = \bigl(\langle x,x_{j} \rangle: j \in \mathbb{N}_{n} \bigr),\quad x\in H. \end{aligned}$$

Consequently, the adjoint map \(L^{T}: \mathbb{R}^{n} \longrightarrow H\) is given as

$$\begin{aligned} L^{T} a = \sum_{j \in \mathbb{N}_{n}} a_{j} x_{j},\quad a \in \mathbb{R}^{n}. \end{aligned}$$

It is well known that for any \(d \in \mathbb{R}^{n}\), there is a unique vector \(x(d) \in M\) such that

$$\begin{aligned} x(d):= L^{T} \bigl(G^{-1}d \bigr):= \arg \min \bigl\{ \Vert x \Vert : x \in H, L(x) = d \bigr\} , \end{aligned}$$
(1)

where M is the n-dimensional subspace of H spanned by the vectors in X; see, for example, [9]. We start with a set \(I \subseteq \mathbb{N}_{n}\) that contains m elements \((m< n)\) and put \(J:= \mathbb{N}_{n} \setminus I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we also use the notations \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{m}\) and \(e_{{J}}= (e_{i}: i \in J) \in \mathbb{R}^{n-m}\). We define the set

$$\begin{aligned} \mathbb{E}_{\infty} = \bigl\{ e \in \mathbb{R}^{n}: e_{{I}}=0, \vert \!\vert \! \vert e _{{J}} \vert \!\vert \!\vert _{ \infty} \leq \varepsilon \bigr\} , \end{aligned}$$

where ε is some positive number. For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d|\mathbb{E}_{\infty}):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L(x)- d \in \mathbb{E}_{\infty} \bigr\} . \end{aligned}$$
(2)
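For computations, every object above reduces to Gram-matrix algebra: an element \(x = L^{T}(c)\) of M satisfies \(\Vert x \Vert ^{2} = c^{T} G c\) and \(L(x) = Gc\), and the coefficients of \(x(d)\) in (1) are \(G^{-1}d\). The following sketch illustrates this; the kernel, sample points, data, and tolerance are our own assumptions, not taken from the paper.

```python
import numpy as np

# Assumed setup: a Gaussian-kernel Gram matrix on four sample points.
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)   # G = (<x_i, x_j>)
d = 0.3 * G[:, 1]                             # data generated by the element 0.3*x_2
I, J = [0, 1], [2, 3]                         # exact / inexact coordinates
eps = 0.05

# Coefficients of the minimal-norm interpolant x(d) = L^T(G^{-1} d) in (1).
a = np.linalg.solve(G, d)                     # here a = (0, 0.3, 0, 0)

def in_partial_hyperellipse(c):
    """Test whether x = L^T(c) lies in the partial hyperellipse (2)."""
    e = G @ c - d                             # residual L(x) - d
    return (c @ G @ c <= 1.0                  # ||x||^2 <= 1
            and np.allclose(e[I], 0.0)        # exact data are matched
            and np.max(np.abs(e[J])) <= eps)  # inexact data within tolerance

print(in_partial_hyperellipse(a))             # True: x(d) lies in H(d | E_infty)
```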

Given \(x_{0} \in H\), our main goal here is to estimate \(\langle x, x_{0} \rangle \) for \(x \in \mathcal{H}(d|\mathbb{E}_{\infty})\). According to the midpoint algorithm, we define

$$\begin{aligned} I(x_{0}, d|\mathbb{E}_{\infty}) = \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty}) \bigr\} . \end{aligned}$$

We point out that \(I(x_{0}, d|\mathbb{E}_{\infty}) \) is a closed bounded subset in \(\mathbb{R}\). Therefore we obtain that

$$\begin{aligned} I(x_{0}, d|\mathbb{E}_{\infty}) = \bigl[ m_{-}(x_{0},d| \mathbb{E}_{\infty}), m_{+}(x_{0},d| \mathbb{E}_{\infty}) \bigr], \end{aligned}$$

where \(m_{-}(x_{0},d|\mathbb{E}_{\infty}) = \min \{\langle x,x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty})\}\) and \(m_{+}(x_{0},d|\mathbb{E}_{\infty}) = \max \{\langle x,x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty})\}\). Hence the best estimator is the midpoint of this interval. According to our previous work [10], we give a formula for the right-hand endpoint.

Theorem 1.1

If \(x_{0} \notin M \) and \(\mathcal{H}(d|\mathbb{E}_{\infty}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{\infty})= \min \bigl\{ \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{J}} \vert \!\vert \!\vert _{1} + (d,c): c \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$
(3)

Therefore the midpoint of the uncertainty \(I(x_{0},d|\mathbb{E}_{\infty}) \) is given by

$$\begin{aligned} m(x_{0},d|\mathbb{E}_{\infty})= \frac{m_{+} (x_{0},d|\mathbb{E}_{\infty})- m_{+} (x_{0},-d|\mathbb{E}_{\infty})}{2}. \end{aligned}$$
(4)
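The right-hand endpoint (3) is a finite-dimensional convex minimization that needs only Gram data, since \(\Vert x_{0} - L^{T}(c) \Vert ^{2} = \Vert x_{0} \Vert ^{2} - 2(c, Lx_{0}) + c^{T}Gc\). A minimal numerical sketch is given below; the kernel, sample points, and data are assumed for illustration, and a general-purpose optimizer stands in for a dedicated solver.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed Gram data: Gaussian kernel, x_0 = K_{0.5}.
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)   # Gram matrix of x_1, ..., x_n
k0 = np.exp(-(0.5 - t) ** 2)                  # L x_0 = (<x_0, x_j>)
x0_norm2 = 1.0                                # ||x_0||^2
d = 0.3 * G[:, 1]                             # data vector (assumed)
J, eps = [2, 3], 0.05                         # coordinates with l^infty-bounded error

def V(c, data):
    """Dual objective of (3): ||x_0 - L^T(c)|| + eps*||c_J||_1 + (data, c)."""
    sq = x0_norm2 - 2.0 * c @ k0 + c @ G @ c
    return np.sqrt(max(sq, 0.0)) + eps * np.sum(np.abs(c[J])) + data @ c

def m_plus(data):
    """Right-hand endpoint (3), computed by direct minimization."""
    return minimize(V, np.zeros(len(data)), args=(data,), method="Powell").fun

mp, mm = m_plus(d), m_plus(-d)
print("m_+ =", mp, "  midpoint (4) =", (mp - mm) / 2.0)
```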

Furthermore, we described every solution of the error bound problem (3), which is required to find the midpoint of the uncertainty interval. Specifically, the result was applied to the problem of learning the value of a function in the Hardy space of square-integrable functions on the unit circle, which has a well-known reproducing kernel. These formulas allow us to give the right-hand endpoint \(m_{+}(x_{0},d|\mathbb{E}_{\infty})\) explicitly when there is only one data error. We conjecture that the results of this case extend appropriately to the case of data error measured with the \(l^{1}\) norm, which is our motivation for studying this subject.

The paper is organized as follows. In Sect. 2, we provide basic concepts of the particular case of hypercircle inequality for only one data error. Specifically, we provide an explicit solution of a dual problem, which we need for main results. In Sect. 3, we solve the problem of hypercircle inequality for data error measured with the \(l^{1}\) norm. The main result in this section is Theorem 3.3, which establishes the solution for the \(l^{\infty}\) minimization problem, which is a dual problem in this case. Finally, we provide an example of a learning problem in the Hardy space of square-integrable functions on the unit circle and report on numerical experiments of the proposed methods.

2 Hypercircle inequality for only one data error

In this section, we describe HI for only one data error and its potential relevance to kernel-based learning. Given a set \(I \subseteq \mathbb{N}_{n}\) that contains \(n-1 \) elements, we assume that \(\mathbf{j} \notin I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we also use the notation \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{n-1}\). For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d, \varepsilon ):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L_{{I}}(x) = d_{{I}}, \bigl\vert \langle x, x_{{\mathbf{j}}} \rangle - d_{{ \mathbf{j}}} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(5)

Let \(x_{0} \in H\). Our purpose here is to find the best estimator for \(\langle x,x_{0} \rangle \) knowing that \(\Vert x \Vert \leq 1\),

$$\langle x,x_{i} \rangle = d_{i}\quad \text{for all }i \in I \quad\text{and}\quad \langle x,x_{{\mathbf{j}}} \rangle =d_{{\mathbf{j}}}+e,\quad \text{where } \vert e \vert \leq \varepsilon. $$

According to our previous work [7], we point out that \(\mathcal{H}(d, \varepsilon ) \) is weakly sequentially compact in the weak topology on H. It follows that \(I(x_{0}, d, \varepsilon ):= \{ \langle x,x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \} \) fills out a closed bounded interval in \(\mathbb{R}\). Clearly, the midpoint of the uncertainty interval is the best estimator for \(\langle x,x_{0} \rangle \) when \(x \in \mathcal{H}(d, \varepsilon )\). Therefore the hypercircle inequality for partially corrupted data is as follows.

Theorem 2.1

If \(x_{0} \in H\) and \(\mathcal{H}(d, \varepsilon ) \neq \emptyset \), then there is \(e_{0} \in \mathbb{R}\) such that \(|e_{0}| \leq \varepsilon \) and for any \(x \in \mathcal{H}(d, \varepsilon )\),

$$\begin{aligned} \bigl\vert \bigl\langle x(d+e_{0}), x_{0} \bigr\rangle - \langle x,x_{0} \rangle \bigr\vert \leq \frac{1}{2} \bigl(m_{+}(x_{0},d, \varepsilon ) - m_{-}(x_{0},d, \varepsilon ) \bigr), \end{aligned}$$

where \(x(d+e_{0}) = L^{T} (G^{-1}(d+e_{0}) ) \in \mathcal{H}(d, \varepsilon )\),

$$\begin{aligned} m_{+}(x_{0},d, \varepsilon ):= \max \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} , \end{aligned}$$
(6)

and

$$\begin{aligned} m_{-}(x_{0},d, \varepsilon ):= \min \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} . \end{aligned}$$
(7)

For the particular case \(\varepsilon =0\), we recall the following explicit bound, the classical hypercircle inequality.

Theorem 2.2

If \(x \in \mathcal{H}(d)\) and \(x_{0} \in H\), then

$$\begin{aligned} \bigl\vert \bigl\langle x(d), x_{0} \bigr\rangle - \langle x,x_{0} \rangle \bigr\vert \leq \operatorname{dist}(x_{0},M) \sqrt{1 - \bigl\Vert x(d) \bigr\Vert ^{2}}, \end{aligned}$$
(8)

where \(\operatorname{dist} (x_{0},M):= \min \{ \Vert x_{0} - y \Vert : y \in M \}\).

The inequality above guarantees an approximant, namely \(x(d)\), the point of the hyperplane \(\{x \in H: L(x)=d\}\) closest to the origin. Moreover, this approximant is independent of the vector \(x_{0}\). For the detailed proofs, see [2].
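Every quantity in (8) is again Gram-matrix algebra: \(\langle x(d), x_{0} \rangle = (Lx_{0})^{T}G^{-1}d\), \(\Vert x(d) \Vert ^{2} = d^{T}G^{-1}d\), and \(\operatorname{dist}(x_{0},M)^{2} = \Vert x_{0} \Vert ^{2} - (Lx_{0})^{T}G^{-1}(Lx_{0})\). A short sketch with an assumed kernel, points, and data:

```python
import numpy as np

# Assumed setup: Gaussian kernel on four points, x_0 = K_{0.5}.
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)     # Gram matrix
k0 = np.exp(-(0.5 - t) ** 2)                    # L x_0
x0_norm2 = 1.0                                  # ||x_0||^2
d = 0.3 * G[:, 1]                               # data, generated here by 0.3*x_2

Ginv_d = np.linalg.solve(G, d)
estimate = k0 @ Ginv_d                          # <x(d), x_0>
xd_norm2 = d @ Ginv_d                           # ||x(d)||^2
dist2 = x0_norm2 - k0 @ np.linalg.solve(G, k0)  # dist(x_0, M)^2
bound = np.sqrt(max(dist2, 0.0) * max(1.0 - xd_norm2, 0.0))

# Hypercircle inequality (8): |<x(d), x_0> - <x, x_0>| <= bound for all x in H(d).
print(estimate, bound)
```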

A formula for the right-hand endpoint of the uncertainty interval may be obtained from the following result. To this end, we define the function \(V:\mathbb{R}^{n} \rightarrow \mathbb{R} \) for each \(c \in \mathbb{R}^{n} \) by

$$\begin{aligned} V(c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{{\mathbf{j}}} \vert + (d,c). \end{aligned}$$

Theorem 2.3

If \(x_{0} \notin M \) and \(\mathcal{H}(d, \varepsilon ) \) contains more than one element, then

$$\begin{aligned} m_{+}(x_{0},d, \varepsilon )= \min \bigl\{ V(c): c \in \mathbb{R}^{n} \bigr\} , \end{aligned}$$
(9)

and the right-hand side of equation (9) has a unique solution.

Proof

See [7]. □

To state the midpoint of the uncertainty interval, we point out the following fact. The left-hand endpoint of the interval satisfies

$$\begin{aligned} -m_{+}(x_{0},-d, \varepsilon ) = m_{-}(x_{0},d, \varepsilon ):= \min \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} . \end{aligned}$$

The midpoint is given by

$$\begin{aligned} m(x_{0},d, \varepsilon )= \frac{m_{+} (x_{0},d, \varepsilon )- m_{+} (x_{0},-d, \varepsilon )}{2}. \end{aligned}$$
(10)

In the remainder of this section, we provide an explicit solution to (9).

Theorem 2.4

If \(x_{0} \notin M \) and \(\mathcal{H}(d, \varepsilon )\) contains more than one element, then we have:

1. \(\frac{x_{0}}{ \Vert x_{0} \Vert } \in \mathcal{H}(d, \varepsilon )\) if and only if \(m_{+} (x_{0},d, \varepsilon )= \Vert x_{0} \Vert \),

2. \(x_{+}(d_{{I}}) \in \mathcal{H}(d, \varepsilon )\) if and only if

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon ) = \bigl\langle x(d_{{I}}), x_{0} \bigr\rangle + \operatorname{dist}(x_{0},M_{{I}}) \sqrt{1 - \bigl\Vert x(d_{{I}}) \bigr\Vert ^{2}}, \end{aligned}$$

where the vector \(x_{+}(d_{{I}}):= \arg \max \{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d_{{I}})\} \),

3. \(\frac{x_{0}}{ \Vert x_{0} \Vert }, x_{+}(d_{{I}}) \notin \mathcal{H}(d, \varepsilon )\) if and only if

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon )= \max \bigl\{ \bigl\langle x_{+}(d + \varepsilon \mathbf{e} ), x_{0} \bigr\rangle , \bigl\langle x_{+}(d - \varepsilon \mathbf{e}), x_{0} \bigr\rangle \bigr\} , \end{aligned}$$

where the vector \(\mathbf{e} \in \mathbb{R}^{n}\) with \(\mathbf{e}_{{I}} = 0\) and \(|\mathbf{e}_{{\mathbf{j}}}| = 1 \).

Proof

According to our hypotheses, the minimization problem on the right-hand side of equation (9) has a unique solution \(c^{*} \in \mathbb{R}^{n}\).

(1) The proof directly follows from [7], that is, we can state that if \(x_{0} \notin M\), then

$$\begin{aligned} 0 = \arg \min \bigl\{ V (c): c \in \mathbb{R}^{n} \bigr\} \end{aligned}$$

if and only if \(\frac{x_{0}}{ \Vert x_{0} \Vert } \in \mathcal{H}(d, \varepsilon )\).

(2) Again from [7] it follows that \(c^{*} = \arg \min \{ V (c): c \in \mathbb{R}^{n}\}\) with \(c^{*}_{{\mathbf{j}}}=0\) if and only if

$$\begin{aligned} x_{+}(d_{{I}}) \in \mathcal{H}(d, \varepsilon ). \end{aligned}$$

By the hypercircle inequality and (8) we obtain that

$$\begin{aligned} \bigl\langle x_{+}(d_{{I}}), x_{0} \bigr\rangle = \bigl\langle x(d_{{I}}), x_{0} \bigr\rangle + \operatorname{dist}(x_{0},M_{{I}}) \sqrt{1 - \bigl\Vert x(d_{{I}}) \bigr\Vert ^{2}}. \end{aligned}$$

(3) Under our hypotheses and [7], \(c^{*} \in \mathbb{R}^{n}\) is the unique minimizer of the function V, and \(c^{*}_{{\mathbf{j}}} \neq 0\). Computing the gradient of V yields

$$\begin{aligned} -L \biggl( \frac{x_{0} - L^{T}c^{*}}{ \Vert x_{0} - L^{T}c^{*} \Vert } \biggr) + \varepsilon \operatorname{sgn} \bigl(c^{*}_{{\mathbf{j}}} \bigr)\mathbf{e} + d =0, \end{aligned}$$
(11)

which confirms that

$$L_{I} \bigl(x_{+}(d, \varepsilon ) \bigr) = d_{I} \quad\text{and}\quad \bigl\langle x_{+}(d, \varepsilon ), x_{{\mathbf{j}}} \bigr\rangle = d_{{ \mathbf{j}}} + \operatorname{sgn} \bigl(c^{*}_{{\mathbf{j}}} \bigr)\varepsilon, $$

where the vector \(x_{+}(d, \varepsilon ) \) is given by

$$\begin{aligned} x_{+}(d, \varepsilon ):= \frac{x_{0} - L^{T}c^{*}}{ \Vert x_{0} - L^{T}c^{*} \Vert } \in \mathcal{H}(d + \varepsilon \mathbf{e} ) \cup \mathcal{H}(d - \varepsilon \mathbf{e} ). \end{aligned}$$

Therefore we obtain that

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon )= \max \bigl\{ \bigl\langle x_{+}(d + \varepsilon \mathbf{e} ), x_{0} \bigr\rangle , \bigl\langle x_{+}(d - \varepsilon \mathbf{e}), x_{0} \bigr\rangle \bigr\} . \end{aligned}$$

 □
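Case (3) of Theorem 2.4 reduces the one-error problem to two exact-data problems, each of which is solved in closed form by Theorem 2.2. The sketch below evaluates the two candidate endpoints for the perturbed data \(d \pm \varepsilon \mathbf{e}\); the kernel, points, data, and the index of the inexact coordinate are assumed, and we suppose that case (3) applies.

```python
import numpy as np

# Assumed setup (Gaussian kernel, x_0 = K_{0.5}); jj is the single inexact coordinate.
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)
k0 = np.exp(-(0.5 - t) ** 2)                    # L x_0
x0_norm2 = 1.0
d = 0.3 * G[:, 1]
jj, eps = 3, 0.05

dist = np.sqrt(max(x0_norm2 - k0 @ np.linalg.solve(G, k0), 0.0))  # dist(x_0, M)

def upper_endpoint_exact(data):
    """<x_+(data), x_0> for exact data, via the formula of Theorem 2.2."""
    Ginv_data = np.linalg.solve(G, data)
    return k0 @ Ginv_data + dist * np.sqrt(max(1.0 - data @ Ginv_data, 0.0))

e = np.zeros(len(d))
e[jj] = 1.0                                     # the vector e with e_I = 0, |e_j| = 1

# Case (3) of Theorem 2.4 (assumed to apply): compare the two perturbed problems.
m_plus = max(upper_endpoint_exact(d + eps * e), upper_endpoint_exact(d - eps * e))
print(m_plus)
```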

We end this section by discussing a concrete example of the hypercircle inequality for only one data error for function estimation in a reproducing kernel Hilbert space (RKHS). Specifically, we report on a new numerical experiment in an RKHS using the material on HI above and our recent results. A real-valued function \(K(t,s)\) of t and s in \(\mathcal{T}\) is called a reproducing kernel of H if the following property is satisfied for all \(t \in \mathcal{T}\) and \(f \in H\):

$$\begin{aligned} f(t) = \langle K_{t}, f \rangle, \end{aligned}$$
(12)

where \(K_{t}\) is the function defined for \(s \in \mathcal{T}\) as \(K_{t}(s)=K(t,s)\). Moreover, for any kernel K, there is a unique RKHS with K as its reproducing kernel [1]. In our example, we choose the Gaussian kernel on \(\mathbb{R}\), that is,

$$\begin{aligned} K(s, t) = e^{-\frac{(s - t)^{2}}{10}},\quad s, t \in \mathbb{R}. \end{aligned}$$

The computational steps are organized in the following way. Let \(T=\{t_{j}:j\in \mathbb{N}_{n}\}\) be points of \(\mathbb{R}\) listed in increasing order. Consequently, we have a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) in H, where

$$\begin{aligned} K_{t_{j}}(t):=e^{-\frac{(t_{j} - t)^{2}}{10}},\quad j\in \mathbb{N}_{n}, t \in \mathbb{R}. \end{aligned}$$

Thus the vectors \(\{ x_{j}: j\in \mathbb{N}_{n}\}\) appearing above are identified with the functions \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\). Therefore the Gram matrix of \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) is given by

$$\begin{aligned} G(t_{1},\ldots,t_{n}):= \bigl( K(t_{i},t_{j}): i,j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

In our experiment, we choose the exact function

$$\begin{aligned} g(t) = -0.15K_{0.5}(t) + 0.05 K_{0.85}(t) - 0.25 K_{-0.5}(t) \end{aligned}$$

and compute the vector \(d=\{ g(t_{j}): j\in \mathbb{N}_{12}\}\) as shown in Fig. 1.

Figure 1. Exact function.

Given \(t_{0} = 3\), we want to estimate \(f(3) = \langle K_{t_{0}}, f \rangle \) knowing that \(\Vert f \Vert _{K} \leq \rho \) and \(f(t_{i}) = \langle K_{t_{i}}, f \rangle = d_{i}\) for all \(i \in \mathbb{N}_{12}\). In addition, we assume that one data value is missing, namely \(g(0)\). Therefore we approximate \(f(0)\) by \(f_{d_{I}}(0) =6.5768973 \), which is obtained from the hypercircle inequality, Theorem 2.2, whereas the exact value is \(g(0) = 6.576978\). Next, we wish to estimate \(f(3)=\langle f,K_{t_{0}} \rangle \) knowing that \(f(t_{j}) = d_{j}\) for all \(j \in \mathbb{N}_{12}\) and \(f(0) = 6.5768973 +e\), where \(|e| \leq \varepsilon \). Clearly, our data set contains both accurate and inaccurate data; specifically, there is only one data error in this case. By Theorem 2.4 we easily see that \(f_{d_{I}} \in \mathcal{H}(d, \varepsilon )\). Thus the best estimate of \(f(3)\) under these constraints is \(f_{d_{I}}(3) = 3.137912\), whereas the exact value is \(g(3) = 3.1395855\).
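A sketch of these computational steps is given below. For exact data the estimate of \(f(t_{0})\) is \(\langle x(d), K_{t_{0}} \rangle = k(t_{0})^{T}G^{-1}d\) with \(k(t_{0}) = (K(t_{0},t_{j}): j \in \mathbb{N}_{12})\). Since the twelve sample points T are not listed in the text, the grid below is our own assumption, so the printed values will not reproduce the figures reported above.

```python
import numpy as np

def K(s, t):
    """Gaussian kernel of this section."""
    return np.exp(-((s - t) ** 2) / 10.0)

T = np.linspace(-2.5, 3.5, 12)                    # assumed sample points t_1 < ... < t_12
G = K(T[:, None], T[None, :])                     # Gram matrix G(t_1, ..., t_12)

def g(t):
    """Exact function g = -0.15 K_{0.5} + 0.05 K_{0.85} - 0.25 K_{-0.5}."""
    return -0.15 * K(0.5, t) + 0.05 * K(0.85, t) - 0.25 * K(-0.5, t)

d = g(T)                                          # data vector d = (g(t_j))

def estimate(t0):
    """Exact-data estimate f_d(t0) = <x(d), K_{t0}> = k(t0)^T G^{-1} d."""
    return K(t0, T) @ np.linalg.solve(G, d)

print(estimate(3.0), "vs exact", g(3.0))
```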

3 Hypercircle inequality for data error measured with \(l^{1}\) norm

In the previous section, we provided the basic concepts of the particular case of the hypercircle inequality for only one data error. For our purposes, we now restrict our attention to the hypercircle inequality for partially corrupted data with the \(l^{1}\) norm. We start with a set \(I \subseteq \mathbb{N}_{n}\) that contains m elements \((m< n)\) and put \(J:= \mathbb{N}_{n} \setminus I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we also use the notations \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{m}\) and \(e_{{J}}= (e_{i}: i \in J) \in \mathbb{R}^{n-m}\). We define \(\mathbb{E}_{1} =\{ e \in \mathbb{R}^{n}: e_{{I}}=0, \vert \!\vert \!\vert e _{{J}} \vert \!\vert \!\vert _{1} \leq \varepsilon \}\), where ε is some positive number. For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d|\mathbb{E}_{1}):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L(x)- d \in \mathbb{E}_{1} \bigr\} . \end{aligned}$$
(13)

As we said earlier, it follows that \(\mathcal{H}(d|\mathbb{E}_{1})\) is weakly sequentially compact in the weak topology on H and \(I(x_{0},d|\mathbb{E}_{1}):=\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d|\mathbb{E}_{1})\}\) is a closed bounded interval in \(\mathbb{R}\). Again, the midpoint of the uncertainty interval is the best estimator for \(\langle x, x_{0} \rangle \) when \(x \in \mathcal{H}(d|\mathbb{E}_{1})\). Therefore the midpoint of the uncertainty \(I(x_{0},d|\mathbb{E}_{1}) \) is given by

$$\begin{aligned} m(x_{0},d|\mathbb{E}_{1})= \frac{m_{+} (x_{0},d|\mathbb{E}_{1})- m_{+} (x_{0},-d|\mathbb{E}_{1})}{2}. \end{aligned}$$
(14)

We easily see that the data set contains both accurate and inaccurate data. In the same manner, we provide the duality formula to obtain the right-hand endpoint of the uncertainty interval \(I(x_{0},d|\mathbb{E}_{1})\). To this end, let us define the convex function \(\mathbb{V}: \mathbb{R}^{n} \longrightarrow \mathbb{R}\) by

$$\begin{aligned} \mathbb{V}(c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{ \mathbf{J}}} \vert \!\vert \!\vert _{\infty} + (d,c), \quad c \in \mathbb{R}^{n}. \end{aligned}$$
(15)
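Numerically, (15)-(16) differ from (3) only in the penalty on \(c_{J}\): the \(l^{1}\) norm is replaced by the \(l^{\infty}\) norm. A minimal sketch under assumed Gram data, mirroring the earlier sketch for (3):

```python
import numpy as np
from scipy.optimize import minimize

# Assumed Gram data (Gaussian kernel, x_0 = K_{0.5}), as in the sketch for (3).
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)
k0 = np.exp(-(0.5 - t) ** 2)                      # L x_0
x0_norm2 = 1.0
d = 0.3 * G[:, 1]
J, eps = [2, 3], 0.05                             # coordinates with l^1-bounded error

def V(c, data):
    """Objective (15): ||x_0 - L^T(c)|| + eps * ||c_J||_inf + (data, c)."""
    sq = x0_norm2 - 2.0 * c @ k0 + c @ G @ c
    return np.sqrt(max(sq, 0.0)) + eps * np.max(np.abs(c[J])) + data @ c

m_plus = minimize(V, np.zeros(len(d)), args=(d,), method="Powell").fun     # (16)
m_minus = -minimize(V, np.zeros(len(d)), args=(-d,), method="Powell").fun  # = m_-
print("interval:", [m_minus, m_plus], "  midpoint (14):", (m_plus + m_minus) / 2.0)
```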

Theorem 3.1

If \(x_{0} \notin M \) and \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{1})= \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} , \end{aligned}$$
(16)

and the right-hand side of equation (16) has a unique solution. Moreover, \(x_{+}(d_{I}) \in \mathcal{H}(d|\mathbb{E}_{1})\) if and only if \(c^{*}_{J} = 0 \), where \(c^{*} = \arg \min \{ \mathbb{V}(c): c \in \mathbb{R}^{n}\} \).

Proof

See [10]. □

We begin the main results of this section with a useful observation. To this end, let us introduce the following notation. For each \(j\in \mathbf{J}\), we define the function \(\mathbb{V}_{j}: \mathbb{R}^{n} \rightarrow \mathbb{R} \) by

$$\begin{aligned} \mathbb{V}_{j}( c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{j} \vert + (c, d),\quad c \in \mathbb{R}^{n}. \end{aligned}$$
(17)

Clearly, the duality formula (17) corresponds to the partial hyperellipse with only one data error (here and below, \(B:= \{x \in H: \Vert x \Vert \leq 1\}\) denotes the unit ball of H)

$$\begin{aligned} \mathcal{H}_{j}(d, \varepsilon ) = \bigl\{ x: x \in B, L_{{ \mathbb{N}_{n} \setminus \{j\}}}(x) =d_{{\mathbb{N}_{n} \setminus \{j \}}}, \bigl\vert \langle x, x_{j} \rangle - d_{j} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(18)

By Theorem 2.4, if \(x_{0} \notin M \) and \(\mathcal{H}_{j}(d, \varepsilon )\) contains more than one element, then there is a unique \(a^{*} \in \mathbb{R}^{n}\) such that

$$\begin{aligned} \mathbb{V}_{j} \bigl( a^{*} \bigr)= \min \bigl\{ \mathbb{V}_{j}( a):a \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$

We can now state the first result.

Theorem 3.2

If \(x_{0} \notin M \), both \(\mathcal{H}(d|\mathbb{E}_{1})\) and \(\mathcal{H}_{j}(d, \varepsilon ) \) contain more than one point, and \(a^{*} = \arg \min \{ \mathbb{V}_{j}(a): a \in \mathbb{R}^{n}\}\) satisfies \(\vert \!\vert \!\vert a^{*}_{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} = |a^{*}_{j} |\), then

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{j}( a ): a \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$
(19)

Proof

For each \(c \in \mathbb{R}^{n}\), we observe that

$$\begin{aligned} \mathbb{V}_{j}( c) &= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{j} \vert + (c, d) \\ &\leq \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{ \mathbf{J}}} \vert \!\vert \!\vert _{ \infty} + (c, d) \\ &= \mathbb{V}( c), \end{aligned}$$

which means that \(\min \{\mathbb{V}_{j}(c ): c \in \mathbb{R}^{n} \} \leq \min \{ \mathbb{V}(c): c \in \mathbb{R}^{n}\}\).

According to our assumption, we obtain that

$$\begin{aligned} \mathbb{V}_{j} \bigl( a^{*} \bigr) &= \bigl\Vert x_{0} - L^{T} \bigl(a^{*} \bigr) \bigr\Vert + \varepsilon \bigl\vert a^{*}_{j} \bigr\vert + \bigl(a^{*}, d \bigr) \end{aligned}$$
(20)
$$\begin{aligned} &= \bigl\Vert x_{0} - L^{T} \bigl(a^{*} \bigr) \bigr\Vert + \varepsilon \vert \!\vert \!\vert a^{*}_{{ \mathbf{J}}} \vert \!\vert \!\vert _{ \infty} + \bigl(a^{*}, d \bigr), \end{aligned}$$
(21)

which completes the proof. □

To study the general case, let us introduce the following notations. We first denote the set

$$\begin{aligned} \Lambda _{\infty} = \bigl\{ \lambda: \lambda \in \mathbb{R}^{n}, \lambda _{{ \mathbf{I}}} = 0, \vert \!\vert \!\vert \lambda _{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} \leq 1 \bigr\} . \end{aligned}$$

For each \(\lambda \in \Lambda _{\infty}\), we denote the set of linearly independent vectors

$$\begin{aligned} X \bigl(\lambda ^{j} \bigr)= \{ x_{i}: i \in I\} \cup \bigl\{ \mathbf{x} \bigl( \lambda ^{j} \bigr) \bigr\} \end{aligned}$$

in H, where the vector

$$\begin{aligned} \mathbf{x} \bigl(\lambda ^{j} \bigr) = x_{j} + \sum _{i\in \mathbf{J}\backslash \{j\} } \lambda _{i} x_{i}. \end{aligned}$$

Consequently, we denote by \(M(X(\lambda ^{j})) \) the \((m+1)\)-dimensional linear subspace of H spanned by the vectors in \(X(\lambda ^{j})\). From now on, we denote by \(G( X(\lambda ^{j})) \) the Gram matrix of the vectors in \(X(\lambda ^{j})\), which is symmetric and positive definite. The vector \(\mathbf{d}(\lambda ^{j}) \in \mathbb{R}^{m+1}\) has the components

$$\mathbf{d} \bigl(\lambda ^{j} \bigr)_{i} = d_{i} \quad\text{for }i \in I\text{ and } \mathbf{d} \bigl(\lambda ^{j} \bigr)_{m+1} = d_{j} + \sum _{{i \in \mathbf{J} \backslash \{j\} }} \lambda _{i} d_{i}. $$

Therefore we obtain the following partial hyperellipse with data vector \(\mathbf{d}(\lambda ^{j}) \):

$$\begin{aligned} \mathcal{H} \bigl(\mathbf{d} \bigl(\lambda ^{j} \bigr), \varepsilon \bigr) = \bigl\{ x: x \in B, L_{I}(x) = d_{I}, \bigl\vert \bigl\langle x,\mathbf{x} \bigl(\lambda ^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\lambda ^{j} \bigr)_{m+1} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(22)
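The reduced Gram matrix \(G(X(\lambda ^{j}))\) and data vector \(\mathbf{d}(\lambda ^{j})\) can be assembled from the full Gram data by a change of basis: if B is the \(n \times (m+1)\) matrix whose columns are the unit vectors \(e_{i}\), \(i \in I\), followed by the coefficient vector of \(\mathbf{x}(\lambda ^{j})\), then \(G(X(\lambda ^{j})) = B^{T}GB\) and \(\mathbf{d}(\lambda ^{j}) = B^{T}d\). A short sketch with assumed inputs:

```python
import numpy as np

def reduced_problem(G, d, I, J, j, lam):
    """Gram matrix G(X(lambda^j)) and data vector d(lambda^j) for the subproblem (22).
    `lam` lists the weights lambda_i for i in J \\ {j}."""
    n = G.shape[0]
    w = np.zeros(n)
    w[j] = 1.0                                   # x(lambda^j) = x_j + sum_i lambda_i x_i
    for i, li in zip([i for i in J if i != j], lam):
        w[i] = li
    B = np.column_stack([np.eye(n)[:, I], w])    # columns: e_i (i in I) and w
    return B.T @ G @ B, B.T @ d

# Assumed example with n = 4; indices are 0-based: I = [0, 1], J = [2, 3], j = 3.
t = np.array([-0.9, -0.3, 0.3, 0.9])
G = np.exp(-(t[:, None] - t[None, :]) ** 2)
d = 0.3 * G[:, 1]
G_red, d_red = reduced_problem(G, d, I=[0, 1], J=[2, 3], j=3, lam=[0.4])
print(G_red.shape, d_red)                        # (m+1) x (m+1) Gram matrix, reduced data
```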

Next, the partial hyperellipse (22) with only one data error corresponds to a duality formula for the right-hand endpoint of the uncertainty interval, \(m_{+}(x_{0},\mathbf{d}(\lambda ^{j}), \varepsilon )\), as follows. For all \(j\in \mathbf{J} \) and \(\lambda \in \Lambda _{\infty}\), we define the function \(\mathbb{V}_{j}(\cdot | \lambda ): \mathbb{R}^{m+1} \rightarrow \mathbb{R} \) by

$$\begin{aligned} \mathbb{V}_{j}( c| \lambda ):= \bigl\Vert x_{0} - L^{T}_{I}(c_{I}) - c_{m+1} \bigl( \mathbf{x} \bigl(\lambda ^{j} \bigr) \bigr) \bigr\Vert + \varepsilon \vert c_{m+1} \vert + \bigl(c, \mathbf{d} \bigl( \lambda ^{j} \bigr) \bigr),\quad c \in \mathbb{R}^{m+1}. \end{aligned}$$
(23)

Theorem 3.3

If \(x_{0} \notin M\), \(\frac{x_{0}}{ \Vert x_{0} \Vert } \notin \mathcal{H}(d|\mathbb{E}_{1})\), and \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then there are \(\hat{\lambda} \in \Lambda _{\infty} \) and \(j \in \mathbf{J}\) such that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{ \mathbf{j}}( c|\hat{\lambda} ): c \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$
(24)

Proof

According to Theorem 3.1, the minimization problem on the left-hand side of equation (24) has a unique solution \(c^{*} \in \mathbb{R}^{n}\). Since \(\frac{x_{0}}{ \Vert x_{0} \Vert } \notin \mathcal{H}(d|\mathbb{E}_{1})\), the vector \(c^{*} \neq 0 \). We may assume that \(\vert \!\vert \!\vert c^{*}_{{ \mathbf{J}}} \vert \!\vert \!\vert _{\infty} = |c^{*}_{\mathbf{j}}|\) for some \(\mathbf{j} \in \mathbf{J} \). Consequently, there is \(\hat{\lambda} \in \Lambda _{\infty}\) such that \(c^{*}_{i} = \hat{\lambda}_{i} c^{*}_{\mathbf{j}}\) for all \(i \in \mathbf{J} \backslash \{\mathbf{j} \} \). Therefore we obtain that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} &= \bigl\Vert x_{0} - L^{T}_{I} \bigl(c^{*}_{I} \bigr) - c^{*}_{\mathbf{j}} \bigl(\mathbf{x} \bigl(\hat{ \lambda}^{j} \bigr) \bigr) \bigr\Vert + \varepsilon \bigl\vert c^{*}_{ \mathbf{j}} \bigr\vert + \bigl(c^{*}, \mathbf{d} \bigl(\hat{\lambda}^{j} \bigr) \bigr) \\ &=\min \bigl\{ \mathbb{V}_{\mathbf{j}}( a| \hat{\lambda} ): a \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$
(25)

 □

Computing the gradient of \(\mathbb{V}_{\mathbf{j}}( \cdot | \hat{\lambda} ) \), we see that the minimizer \(c^{*}_{I \cup \{ \mathbf{j} \}} = a^{*} \in \mathbb{R}^{m+1}\) is the unique solution of the nonlinear equations

$$\begin{aligned} L_{{\mathbf{I} }} \bigl(x_{+} \bigl(\mathbf{d} \bigl(\hat{ \lambda}^{j} \bigr), \varepsilon \bigr) \bigr) = d_{I}, \end{aligned}$$

and

$$\begin{aligned} \bigl\langle x_{+} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr),\mathbf{x} \bigl( \lambda ^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\lambda ^{j} \bigr)_{m+1} = \operatorname{sgn} \bigl(c^{\star}_{ \mathbf{j}} \bigr)\varepsilon, \end{aligned}$$

where the vector \(x_{+}(\mathbf{d}(\hat{\lambda}^{j}), \varepsilon ) \) is given by

$$\begin{aligned} x_{+} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr):= \frac{x_{0} - L^{T}_{I}(a^{*}_{I}) - a^{*}_{\mathbf{j}}(\mathbf{x}(\hat{\lambda}^{j})) }{ \Vert x_{0} - L^{T}_{I}(a^{*}_{I}) - a^{*}_{\mathbf{j}}(\mathbf{x}(\hat{\lambda}^{j})) \Vert } \in \mathcal{H} \bigl(\mathbf{d} \bigl( \hat{ \lambda}^{j} \bigr), \varepsilon \bigr), \end{aligned}$$

and the partial hyperellipse with data vector \(\mathbf{d}(\hat{\lambda}^{j})\) is

$$\begin{aligned} \mathcal{H} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr) = \bigl\{ x: x \in B, L_{I}(x) = d_{I}, \bigl\vert \bigl\langle x,\mathbf{x} \bigl(\hat{\lambda}^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\hat{\lambda}^{j} \bigr)_{m+1} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$

To this end, let us introduce the following notation. For each \(\lambda \in \Lambda _{\infty}\), we define

$$\begin{aligned} W(\lambda ):= \min \bigl\{ m_{i}(\lambda ): i \in \mathbf{J} \bigr\} , \end{aligned}$$

where \(m_{i}(\lambda ) = \min \{ \mathbb{V}_{i}( c| \lambda ): c \in \mathbb{R}^{m+1} \}\).

Theorem 3.4

If \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{1} ) = \min \bigl\{ W( \lambda ): \lambda \in \Lambda _{\infty} \bigr\} \end{aligned}$$
(26)

Proof

For each \(\lambda \in \Lambda _{\infty}\), we see that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq \min \bigl\{ \mathbb{V}_{i}( a|\lambda ): a \in \mathbb{R}^{m+1} \bigr\} \end{aligned}$$

for all \(i \in \mathbf{J}\), that is, for each \(\lambda \in \Lambda _{\infty}\),

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq W( \lambda ). \end{aligned}$$

Consequently, we obtain that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq \inf \bigl\{ W( \lambda ): \lambda \in \Lambda _{\infty} \bigr\} . \end{aligned}$$

According to Theorem 3.2, there are \(\hat{\lambda} \in \Lambda _{\infty}\) and \(\mathbf{j} \in \mathbf{J} \) such that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{ \mathbf{j}}( c|\hat{\lambda} ): c \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$

Therefore we can conclude that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ W(\lambda ): \lambda \in \Lambda _{\infty} \bigr\} . \end{aligned}$$

 □

We end this section by extending these results to the optimal estimation of any number of features. Let \(W:\mathcal{H}(d|\mathbb{E}_{1}) \rightarrow \mathbb{R}^{k}\) be the function defined for \(x \in \mathcal{H}(d|\mathbb{E}_{1})\) by

$$\begin{aligned} Wx = \bigl(\langle x,x_{-k + j}\rangle: j \in \mathbb{N}_{k} \bigr). \end{aligned}$$
(27)

In the case of estimating a single feature, the uncertainty set is an interval. For multiple features, the uncertainty set is a bounded set in a finite-dimensional space. Consequently, the corresponding uncertainty set is given as

$$\begin{aligned} U(d|\mathbb{E}_{1}):= \bigl\{ Wx: x \in \mathcal{H}(d| \mathbb{E}_{1}) \bigr\} . \end{aligned}$$
(28)

It is easy to check that \(U(d|\mathbb{E}_{1})\) is a convex compact subset of \(\mathbb{R}^{k}\). To get the best estimator, we need to find the center and radius of \(U(d|\mathbb{E}_{1})\). We recall the Chebyshev radius and center. For this purpose, we choose the \(l^{\infty}\) norm \(\Vert \vert \cdot \vert \Vert _{\infty} \) on \(\mathbb{R}^{k}\) and define the radius of \(U(d|\mathbb{E}_{1})\) as

$$\begin{aligned} r_{\infty} \bigl(U(d|\mathbb{E}_{1}) \bigr):= \inf _{y \in \mathbb{R}^{k}} \sup_{u \in U(d|\mathbb{E}_{1})} \vert \!\vert \!\vert u - y \vert \!\vert \!\vert _{\infty}. \end{aligned}$$

We denote its center by \(m_{\infty} \in \mathbb{R}^{k}\). In the theorem below, we show that the \(l^{\infty}\) center of the set \(U(d|\mathbb{E}_{1})\) is given by the vector

$$\begin{aligned} m_{\infty}:= \bigl( m(x_{-k+j},d|\mathbb{E}_{1}): j \in \mathbb{N}_{k} \bigr), \end{aligned}$$

where \(m(x_{-k +j},d|\mathbb{E}_{1})\) is the center of the interval \(I(x_{-k+j},d|\mathbb{E}_{1})\) for all \(j \in \mathbb{N}_{k}\).

Theorem 3.5

If \(\mathcal{H}(d|\mathbb{E}_{1}) \neq \emptyset \), then the \(l^{\infty}\) center of the uncertainty set is \(m_{\infty} = ( m(x_{-k+j}, d|\mathbb{E}_{1}): j \in \mathbb{N}_{k})\), and its radius is given by

$$\begin{aligned} r_{\infty} \bigl(U(d|\mathbb{E}_{1}) \bigr) = \max \bigl\{ r \bigl(I(x_{-k+j},d|\mathbb{E}_{1}) \bigr): j \in \mathbb{N}_{k} \bigr\} . \end{aligned}$$

Proof

This follows by the same method as in [6]. □
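In coordinates, Theorem 3.5 says that the \(l^{\infty}\) center is obtained componentwise from the interval midpoints and that the radius is the largest half-length among the k intervals (here \(r(I)\) is read as the half-length of the interval I). A two-line sketch, assuming the per-feature endpoints have already been computed:

```python
import numpy as np

# Assumed precomputed endpoints of I(x_{-k+j}, d | E_1) for k = 3 features.
m_plus = np.array([0.42, 0.15, -0.08])           # right-hand endpoints
m_minus = np.array([0.30, 0.03, -0.20])          # left-hand endpoints

center = (m_plus + m_minus) / 2.0                # l^infty (Chebyshev) center of U(d | E_1)
radius = np.max((m_plus - m_minus) / 2.0)        # l^infty radius
print(center, radius)
```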

We now present some results of a numerical experiment on estimating multiple features of a vector in the partial hyperellipse \(\mathcal{H}(d|\mathbb{E}_{1})\). For our computational experiments, we choose the Hardy space of square-integrable functions on the unit circle with reproducing kernel

$$\begin{aligned} K(z,\zeta ) = \frac{1}{1-\overline{\zeta}z},\quad \zeta, z \in \Delta, \end{aligned}$$

where \(\Delta:=\{ z: |z| < 1\}\) is the open unit disc [3]. Specifically, let \(H^{2}( \Delta ) \) be the set of all functions analytic in the unit disc Δ with norm

$$\begin{aligned} \Vert f \Vert = \sup_{\substack{0< r< 1}} \biggl(\frac{1}{2 \pi} \int _{0}^{2\pi} \bigl\vert f \bigl(re^{i \theta} \bigr) \bigr\vert ^{2} d\theta \biggr)^{\frac{1}{2}}. \end{aligned}$$

Specifically, let \(T=\{t_{j}:j\in \mathbb{N}_{n}\}\) be points of \((-1,1)\) listed in increasing order. Consequently, we have a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) in H, where

$$\begin{aligned} K_{t_{j}}(t):=\frac{1}{1-t_{j} t},\quad j\in \mathbb{N}_{n}, t \in \Delta. \end{aligned}$$

Thus the vectors \(\{ x_{j}: j\in \mathbb{N}_{n}\}\) appearing above are identified with the functions \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\). Therefore the Gram matrix of \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) is given by

$$\begin{aligned} G(t_{1},\ldots,t_{n}):= \bigl( K(t_{i},t_{j}): i,j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

Next, we recall the Cauchy determinant defined for \(\{ t_{j}: j \in \mathbb{N}_{n}\}\) and \(\{ s_{j}: j \in \mathbb{N}_{n}\}\) as

$$\begin{aligned} \operatorname{det} \biggl(\frac{1}{ 1- t_{i} s_{j}} \biggr)_{i,j \in \mathbb{N}_{n}} = \frac{ \prod_{ 1 \leq j < i \leq n} (t_{j} - t_{i})(s_{j} - s_{i})}{ \prod_{i,j \in \mathbb{N}_{n}} ( 1-t_{i} s_{j})}; \end{aligned}$$
(29)

see, for example, [2]. From this formula we obtain that

$$\begin{aligned} \operatorname{det} G(t_{1},\ldots,t_{n}) = \frac{ \prod_{1\leq i < j \leq n}(t_{i} - t_{j})^{2}}{ \prod_{i,j \in \mathbb{N}_{n}}(1-t_{i}t_{j})}. \end{aligned}$$
(30)

In our case, for any \(t_{0} \in (-1,1) \) and \(t_{0} \notin T:= \{ t_{j}: j \in \mathbb{N}_{n}\}\), we obtain that

$$\begin{aligned} \operatorname{dist} \bigl( K_{t_{0}}, \operatorname{span} \{K_{t_{j}}: j\in \mathbb{N}_{n} \} \bigr) = \frac{ \vert B(t_{0}) \vert }{\sqrt{1-t_{0}^{2}}}, \end{aligned}$$
(31)

where B is the rational function defined for \(t \in \mathbb{C}\setminus \{ t^{-1}_{j}: j \in \mathbb{N}_{n}\}\) by

$$\begin{aligned} B(t):= \prod_{j \in \mathbb{N}_{n}} \frac{t-t_{j}}{1-tt_{j}}, \end{aligned}$$
(32)

and the vector \(x_{0}\) appearing previously is identified with the function \(K_{t_{0}}\). We organize the computational steps as follows. We choose a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{6}\}\) in H with

$$\begin{aligned} t_{1}=-0.9,\qquad t_{2} = -0.6,\qquad t_{3} = -0.3,\qquad t_{4} = 0.3,\qquad t_{5} = 0.6 \quad\text{and} \quad t_{6} = 0.9. \end{aligned}$$

We choose the exact function

$$\begin{aligned} g(t) = -0.15K_{0.5}(t) + 0.05 K_{0.85}(t) - 0.25 K_{-0.5}(t) \end{aligned}$$

and compute the vector \(d=\{ g(t_{j}): j\in \mathbb{N}_{6}\}\). By (12) the linear operator \(L: H^{2}( \Delta ) \longrightarrow \mathbb{R}^{6}\) is defined for \(f \in H^{2}( \Delta )\) as follows:

$$\begin{aligned} Lf:= \bigl( f(t_{i}): i \in \mathbb{N}_{6} \bigr) = \bigl( \langle f, K_{t_{i}} \rangle: i \in \mathbb{N}_{6} \bigr). \end{aligned}$$

In our experiment, we choose

$$\begin{aligned} t_{-2}=-0.4,\qquad t_{-1} =0,\qquad t_{0} = 0.4, \end{aligned}$$

and we wish to estimate

$$\begin{aligned} Wf= \bigl( \langle f,K_{t_{-3+j}} \rangle = f(t_{-3+j}): j \in \mathbb{N}_{3} \bigr) \end{aligned}$$

knowing that

$$\begin{aligned} f(t_{j}) = d_{j}\quad\text{for all } j \in \mathbb{N}_{6} \setminus \{ 3, 4\}\quad\text{and}\quad \bigl\vert f(t_{3}) - d_{3} \bigr\vert + \bigl\vert f(t_{4}) - d_{4} \bigr\vert \leq \varepsilon = 0.1. \end{aligned}$$

According to Theorem 3.2, the functions \(\mathbb{V}_{1}(\cdot | \lambda )\) and \(\mathbb{V}_{2}(\cdot | \lambda )\) become

$$\begin{aligned} \mathbb{V}_{1}( c| \lambda ) = {}&\bigl\Vert K_{t_{0}} - (c_{1}K_{t_{1}} +c_{2} K_{t_{2}} + c_{3} K_{t_{5}} + c_{4} K_{t_{6}}) - c_{5}(K_{t_{3}} + \lambda K_{t_{4}} ) \bigr\Vert \\ &{}+ \varepsilon \vert c_{5} \vert + (c_{I},d_{I}) + c_{5}(d_{3} + \lambda d_{4} ) \end{aligned}$$

and

$$\begin{aligned} \mathbb{V}_{2}( c| \lambda ) ={}& \bigl\Vert K_{t_{0}} - (c_{1}K_{t_{1}} +c_{2} K_{t_{2}} + c_{3} K_{t_{5}} + c_{4} K_{t_{6}}) - c_{5}(K_{t_{4}} + \lambda K_{t_{3}} ) \bigr\Vert + \varepsilon \vert c_{5} \vert + (c_{I},d_{I}) \\ &{}+ c_{5}(d_{4} + \lambda d_{3} ). \end{aligned}$$

In this computation, we found that \(f^{+}_{d_{{I}}} \notin \mathcal{H}(d|\mathbb{E}_{1})\). To obtain the minimum of W, we must compare the values of \(m_{1}(t_{-3+j}, \lambda )\) and \(m_{2}(t_{-3+j}, \lambda )\), where

$$\begin{aligned} m_{1}(t_{-3+j}, \lambda ) = \max \bigl\{ f^{+}_{d(\lambda ^{1}) + \varepsilon \mathbf{e}}(t_{-3+j}), f^{+}_{d(\lambda ^{1}) - \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} \end{aligned}$$

and

$$\begin{aligned} m_{2}(t_{-3+j}, \lambda ) = \max \bigl\{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-3+j}), f^{+}_{d(\lambda ^{2}) - \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} , \end{aligned}$$

that is,

$$\begin{aligned} m_{j}(t_{-3+j}, \lambda ) &= \max \bigl\{ f^{+}_{d(\lambda ^{j}) \pm \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} \\ &= \max \bigl\{ f_{d(\lambda ^{j}) \pm \varepsilon \mathbf{e}}(t_{-3+j}) + \operatorname{dist} \bigl(K_{t_{-3+j}},M \bigl(\lambda ^{j} \bigr) \bigr) \sqrt{1 - \bigl\Vert f_{d( \lambda ^{j}) \pm \varepsilon \mathbf{e}} \bigr\Vert ^{2}} \bigr\} . \end{aligned}$$

To obtain the right-hand endpoint, we need to find the minimum of W over \(\lambda \in [-1,1]\). As explained earlier, the midpoint algorithm requires us to find numerically the minimum of the function \(\mathbb{V}\) for d and −d, that is, we compute \(v_{\pm}:= \min \{ \mathbb{V}(c): c \in \mathbb{R}^{n}\} \) with data ±d, and then our midpoint estimator is given by \(\frac{v_{+} - v_{-}}{2} \). The result of this computation is shown in Table 1.

Table 1 Optimal value
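A direct way to carry out this computation is to minimize the objective (15) numerically for d and −d and to form the midpoint (14), exactly as described above; the λ-parametrization of Theorem 3.4 gives an equivalent route. The sketch below uses the setup of this section (kernel, six points, g, ε = 0.1, errors on \(t_{3}, t_{4}\)); the choice of optimizer and its tolerances are ours, so the printed values are indicative rather than the entries of Table 1.

```python
import numpy as np
from scipy.optimize import minimize

K = lambda s, t: 1.0 / (1.0 - s * t)                            # Hardy-space kernel
T = np.array([-0.9, -0.6, -0.3, 0.3, 0.6, 0.9])                 # sample points
G = K(T[:, None], T[None, :])                                   # Gram matrix
g = lambda t: -0.15 * K(0.5, t) + 0.05 * K(0.85, t) - 0.25 * K(-0.5, t)
d = g(T)                                                        # data vector
J, eps = [2, 3], 0.1                                            # l^1-bounded error on t_3, t_4

def midpoint(t0):
    """Midpoint estimator (14) of f(t0) via the dual problem (16) for d and -d."""
    k0 = K(t0, T)                                               # (<K_{t0}, K_{t_j}>)
    x0_norm2 = K(t0, t0)                                        # ||K_{t0}||^2
    def V(c, data):                                             # objective (15)
        sq = x0_norm2 - 2.0 * c @ k0 + c @ G @ c
        return np.sqrt(max(sq, 0.0)) + eps * np.max(np.abs(c[J])) + data @ c
    v_plus = minimize(V, np.zeros(len(T)), args=(d,), method="Powell").fun
    v_minus = minimize(V, np.zeros(len(T)), args=(-d,), method="Powell").fun
    return (v_plus - v_minus) / 2.0

for t0 in (-0.4, 0.0, 0.4):                                     # features t_{-2}, t_{-1}, t_0
    print(t0, midpoint(t0), "exact:", g(t0))
```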

Furthermore, we see that \(m_{+}(t_{-2},d|\mathbb{E}_{1}) = \min \{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-2}): \lambda \in [-1, 1]\} \). To obtain \(m_{+}(t_{-2},-d|\mathbb{E}_{1}) \), we then plot \(m_{1}\) and \(m_{2}\) as functions of λ for \(t_{-2} = -0.4\), as shown in Fig. 2.

Figure 2. \(m_{i}(-0.4, \lambda )\) for −d.

Similarly, we find that

$$\begin{aligned} m_{+}(t_{-1},d|\mathbb{E}_{1}) = \min \bigl\{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-1}): \lambda \in [-1, 1] \bigr\} \end{aligned}$$

and

$$\begin{aligned} m_{+}(t_{-1},-d|\mathbb{E}_{1}) = \min \bigl\{ f^{+}_{-d(\lambda ^{1}) + \varepsilon \mathbf{e}}(t_{-1}): \lambda \in [-1, 1] \bigr\} . \end{aligned}$$

For the case \(t_{0} = 0.4\), we plot \(m_{1}\) and \(m_{2}\) as functions of λ for d and −d as shown in Figs. 3 and 4, respectively.

Figure 3. \(m_{i}(0.4, \lambda )\) for d.

Figure 4. \(m_{i}(0.4, \lambda )\) for −d.

4 Conclusions

In this paper, we described an unexpected application of the hypercircle inequality for only one data error to the \(l^{\infty}\) minimization problem (16). In two different settings, we applied what we have learned from these recent results to the problem of learning the value of a function in an RKHS, which can be beneficial in practice.