Computing highest density regions for continuous univariate distributions with known probability functions

  • Original paper
  • Published in: Computational Statistics

Abstract

We examine the problem of computing the highest density region (HDR) in a computational context where the user has access to a density function and quantile function for the distribution (e.g., in the statistical language R). We examine several common classes of continuous univariate distributions based on the shape of the density function; this includes monotone densities, quasi-concave and quasi-convex densities, and general multimodal densities. In each case we show how the user can compute the HDR from the quantile and density functions by framing the problem as a nonlinear optimisation problem. We implement these methods in R to obtain general functions to compute HDRs for classes of distributions, and for commonly used families of distributions. We compare our method to existing R packages for computing HDRs and we show that our method performs favourably in terms of both accuracy and average speed.
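As a rough illustration of the optimisation framing described in the abstract, the sketch below computes the HDR of a unimodal distribution from its quantile function alone, by minimising the interval width \(Q(\theta + 1 - \alpha) - Q(\theta)\) over the lower-tail probability \(\theta\). It is written in Python with the standard library only and is not the author's R implementation. The lognormal example, the function names (`lognormal_quantile`, `lognormal_density`, `hdr_unimodal`) and the use of golden-section search are all assumptions made for this illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used to build the lognormal


def lognormal_quantile(p, mu=0.0, sigma=0.5):
    """Quantile function Q(p) of a lognormal distribution, for 0 < p < 1."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


def lognormal_density(x, mu=0.0, sigma=0.5):
    """Density f(x) of the same lognormal distribution, for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def hdr_unimodal(quantile, alpha, tol=1e-10):
    """Shortest interval [Q(theta), Q(theta + 1 - alpha)] for theta in (0, alpha).

    Golden-section search is used here as a simple stand-in optimiser;
    it assumes the width is a unimodal function of theta.
    """
    def width(theta):
        return quantile(theta + 1.0 - alpha) - quantile(theta)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    eps = 1e-12                      # keep probabilities strictly inside (0, 1)
    a, b = eps, alpha - eps
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    wc, wd = width(c), width(d)
    while b - a > tol:
        if wc < wd:                  # minimum lies in [a, d]
            b, d, wd = d, c, wc
            c = b - inv_phi * (b - a)
            wc = width(c)
        else:                        # minimum lies in [c, b]
            a, c, wc = c, d, wd
            d = a + inv_phi * (b - a)
            wd = width(d)
    theta = 0.5 * (a + b)
    return quantile(theta), quantile(theta + 1.0 - alpha)


lower, upper = hdr_unimodal(lognormal_quantile, alpha=0.05)
print(lower, upper)
```

At the optimum the density takes approximately the same value at both endpoints, which gives a quick sanity check on the result, and the interval is shorter than the equal-tailed interval with the same coverage.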

Notes

  1. These distributions are in the stats package, which is now included in the base version of R.

  2. The notion of the “smallest” region requires us to work in a probability space with sufficient structure that we can measure the size of an event in the sample space. Our probability space takes the set of real numbers as the sample space and the corresponding class of Borel sets as the sigma-field of all measurable events. Within this context, every possible “region” under analysis is a Borel set. For discrete random variables the “smallness” of a region is measured by counting measure (i.e., the number of points in the region), and for absolutely continuous random variables it is measured by Lebesgue measure.

  3. These properties are trivial to establish. Since \(f\) is a density function, it is non-negative, so we have \(H\left( a \right) = 1\) for all \(a \le 0\). Since \(a < a^{\prime}\) implies \(\left\{ {f\left( X \right) \ge a} \right\} \supseteq \left\{ {f\left( X \right) \ge a^{\prime}} \right\}\) we also have \({\mathbb{P}}\left( {f\left( X \right) \ge a} \right) \ge {\mathbb{P}}\left( {f\left( X \right) \ge a^{\prime}} \right)\) so \(H\) is non-increasing. Note that we can restrict the domain of \(H\) to \({\mathbb{R}}_{0 + }\) without loss of usefulness. Here we have allowed the domain to include the whole real line, but this choice makes no difference to our subsequent analysis.

  4. More specifically, if the set of points in the mode has zero probability, then every non-empty HDR will contain the entire set of points in the mode. Conversely, if the set of points in the mode has nonzero probability, then every non-empty HDR will contain some of these points, up to the required coverage probability, and will only contain points outside the mode if it has not yet met the required coverage probability.

  5. The function \(u:{\mathbb{R}} \to {\mathbb{R}}\) is defined by \(u\left( x \right) = f^{\prime}\left( x \right)/f\left( x \right)\) for all \(x \in {\mathbb{R}}\). It can be obtained as the derivative of the log-density function. (By convention we set \(u\left( x \right) = 0\) when \(f^{\prime}\left( x \right) = f\left( x \right) = 0\) and we set \(u\left( x \right) = \infty\) when \(f^{\prime}\left( x \right) > 0\) and \(f\left( x \right) = 0\).)

  6. The up-down method is essentially a root-finding problem where we wish to solve \(H\left( a \right) = 1 - \alpha\) for \(a\), but this can easily be framed as an optimisation problem, to give the benefits of derivatives, etc., for rapid convergence to the solution. We frame the method as an optimisation here to give a clearer comparison to the left–right method.

  7. This can be done in cases where the HDR for a multivariate distribution can be reduced to a corresponding optimisation problem for a univariate distribution. For example, the HDR for a multivariate normal distribution is a closed set bounded by the ellipsoid using the Mahalanobis distance. The ellipsoid can be computed using an optimisation for the univariate chi-squared distribution.

  8. In most statistical applications, it is not usually problematic if the computed HDR has a coverage probability that is slightly below the stipulated minimum. If this minimum bound is important then it is possible to change the optimisation method to disallow outcomes that fail to meet the stipulated minimum coverage probability. This can be done either using “penalty” methods or by using iterative procedures that end by going back over the iterations to search for the last iteration that met the required minimum bound requirement.

  9. Since the random variable \(X\) is assumed to be continuous, we have \({\mathbb{P}}\left( {X < L} \right) = {\mathbb{P}}\left( {X \le L} \right) = F\left( L \right) = \theta\), so the parameter is the probability of the random variable being below a closed interval bounded below at \(L\).

  10. The case of the uniform distribution, which is not strictly unimodal, will be considered separately at the end of our analysis. This will ensure that our method covers all continuous unimodal distributions in base R.

  11. Note that the function \(u\) should not be confused with the score function for the random variable. The latter is the derivative of the logarithm of the likelihood function, with respect to the parameters of the distribution. The function \(u\) is instead the derivative of the logarithm of the density, with respect to the outcome value.

  12. For brevity, the functions we give in this paper do not include any checks on the inputs to ensure that they are of the appropriate type and that numerical values are in the admissible range. In actual implementation of these functions outside this paper we have added these checks. Note also that for consistency of the inputs between the two functions, we allow the input of the density function and the logarithmic-derivative-density function into HDR.monotone, even though these inputs are not used in that function.

  13. The quantile function of the distribution is a necessary input, but the density function and the logarithmic-derivative-density function may be omitted. The latter objects are used to generate analytic outputs for the first and second derivatives of the objective function. If these are omitted then these derivatives are approximated by numerical differentiation instead of being computed analytically from the formulae derived here. Tests by the author show that the function performs well in either case.

  14. For example, the author was able to use a looped calculation to compute the HDR intervals for the chi-squared distribution over 10,000 different values of the degrees-of-freedom parameter. This computation took 5.76 s on a standard personal computer (approximately 1735 HDR outputs per second).

  15. The same basic idea can be extended if the number of local minima is countably infinite, or if there are flat sections, etc. For non-continuous densities it may be more useful to use some sections that are quasi-convex. We will not give a full treatment here, but will instead show one broad case, to illustrate the general method.

  16. We set \(x_{0} = Q\left( 0 \right)\) and \(x_{m + 1} = Q\left( 1 \right)\) with corresponding changes to the openness of the interval if either of these values is infinite. It is also worth noting that there is overlap between these range segments at the boundary points. Since the HDR is defined as a closed interval, this does not cause problems, and indeed, it is the simplest way to frame the optimisation.

  17. This step is not necessary if the optimisation performed in the second step can go right up to the boundary of the minimising point. However, it is important to note that if the optimisation in this step is transformed to an unconstrained optimisation, then the computed optima will not go right up to the boundary, and so the HDR will appear to have tiny gaps around the minimising points, which should not be present. For this reason we do recommend adding the preliminary step to remove unnecessary segmentation of the support.

  18. If the support does not fall on a known countable set then one cannot really get started with the problem because the search space is uncountable.

  19. The case we examine is for a pivotal quantity where the function \(h_{X}\) and its inverse \(g_{X}\) are strictly decreasing. The same analysis can be repeated for a strictly increasing function \(h_{X}\) (so that \(g_{X}\) is also strictly increasing). In this case, we obtain the probability interval \(1 - \alpha = {\mathbb{P}}\left( {g_{X} \left( {L\left( \theta \right)} \right) \le \psi \le g_{X} \left( {U\left( \theta \right)} \right)} \right)\) (i.e., the lower and upper bounds are switched), giving the width function \(W\left( \theta \right) = g_{X} \left( {U\left( \theta \right)} \right) - g_{X} \left( {L\left( \theta \right)} \right)\). Thus, the width function and its derivatives are the same as shown in the main analysis, except that they are all negated, and the necessary and sufficient condition for the critical point \(\hat{\theta }\) to be a local minimum is the reverse of the inequality shown.

  20. The mclust package (Fraley et al. 2020) computes the density cut-off but does not compute the HDR.

  21. The HDR functions in the stat.extend package use a “defensive programming” method which exhaustively checks all the inputs to the functions to ensure that they are of the correct type. The functions also convert the computed HDR to a set object (using the sets package) and make other changes to the output to make it more user-friendly.

  22. This computation is the function call sort(1L:1000000L, decreasing = TRUE), which uses functions in the base package.

  23. Since the computed HDR roughly inverts the true HDR, its coverage probability is approximately \(\alpha\), and so the probability disparity is approximately \(1 - 2\alpha\).

  24. The code for the HDR functions in this package uses the algorithms in this paper, but there is also additional code to check inputs and add some further elements to the output. The underlying programming uses “helper” functions that split the coding into pieces. Code for functions used in the stat.extend package can be called in R using the relevant function commands (e.g., HDR.monotone, HDR.unimodal, HDR.bimodal, HDR.discrete.unimodal, HDR.discrete, etc.).

  25. One other possibility is that \({\mathcal{A}} \subset {\mathcal{H}}\) so that \({\mathcal{A}} - {\mathcal{H}} = \emptyset\) and so no density value exists over this set. In this case the second integral is zero and the first is positive, so the first integral is still strictly higher than the second.
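Notes 3 and 6 can be made concrete with a small numerical sketch. The Python code below (standard library only, not the author's R code) evaluates the function \(H\left( a \right) = {\mathbb{P}}\left( {f\left( X \right) \ge a} \right)\) for the standard normal distribution in closed form, and then solves \(H\left( a \right) = 1 - \alpha\) for the density cut-off. Plain bisection is used here as an assumed stand-in for the root-finding step; the names `intensity` and `density_cutoff` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def intensity(a):
    """H(a) = P(f(X) >= a) for the standard normal density f."""
    peak = 1.0 / math.sqrt(2.0 * math.pi)  # maximum of f, attained at x = 0
    if a <= 0.0:
        return 1.0   # f is non-negative, so the event always occurs
    if a > peak:
        return 0.0   # no point has density above the peak
    # f(x) >= a  <=>  |x| <= c, where c solves f(c) = a
    c = math.sqrt(-2.0 * math.log(a / peak))
    return 2.0 * Z.cdf(c) - 1.0


def density_cutoff(alpha, tol=1e-12):
    """Solve H(a) = 1 - alpha for a by bisection, using that H is non-increasing."""
    lo, hi = 0.0, 1.0 / math.sqrt(2.0 * math.pi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if intensity(mid) > 1.0 - alpha:
            lo = mid   # coverage too high: raise the cut-off
        else:
            hi = mid   # coverage too low or exact: lower the cut-off
    return 0.5 * (lo + hi)


alpha = 0.05
f_star = density_cutoff(alpha)
# Recover the HDR [-c, c] from the cut-off by inverting the density
half_width = math.sqrt(-2.0 * math.log(f_star * math.sqrt(2.0 * math.pi)))
print(f_star, (-half_width, half_width))
```

For the standard normal the resulting interval should agree with the familiar symmetric interval obtained from the quantile function at \(1 - \alpha /2\), which makes this a convenient test case.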

Acknowledgements

The author is grateful to two anonymous referees for their comments on an earlier draft of this paper. The comparison of accuracy and speed against other packages was added due to their excellent suggestions.

Author information

Corresponding author

Correspondence to Ben O’Neill.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendix: Proof of Theorems

In this appendix we give proofs of the lemmas and theorems shown in the main body of the paper. For ease of reference, we repeat these lemmas and theorems here.

Lemma 1

If \({\mathcal{H}}\) is a highest density region then for any set \({\mathcal{A}}\) we have:

$$ \begin{gathered} \left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right| \Rightarrow {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) < {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right){\text{,}} \hfill \\ \left| {\mathcal{A}} \right| \le \left| {\mathcal{H}} \right| \Rightarrow {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) \le {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right){\text{.}} \hfill \\ \end{gathered} $$

Proof of Lemma 1

For any set \({\mathcal{A}}\) we have:

$$ \begin{aligned} {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) - {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) &= \mathop \smallint \limits_{{\mathcal{H}}} f\left( x \right)dx - \mathop \smallint \limits_{{\mathcal{A}}} f\left( x \right)dx\\&= \mathop \smallint \limits_{{{\mathcal{H}} - {\mathcal{A}}}} f\left( x \right)dx - \mathop \smallint \limits_{{{\mathcal{A}} - {\mathcal{H}}}} f\left( x \right)dx{.} \end{aligned}$$

If \(\left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right|\) then we must have \(\left| {{\mathcal{A}} - {\mathcal{H}}} \right| < \left| {{\mathcal{H}} - {\mathcal{A}}} \right|\), so the range of integration in the first integral is larger than in the second integral. Moreover, since \({\mathcal{H}}\) is a highest-density region it encompasses all points with \(f\left( x \right) \ge f_{*}\), so we must have \(f\left( x \right) < f_{*} \le f\left( y \right)\) for all \(x \in {\mathcal{A}} - {\mathcal{H}}\) and \(y \in {\mathcal{H}} - {\mathcal{A}}\). This means that the integrand in the first integral is strictly higher than in the second integral (see Note 25). Since the first integral has a higher integrand and larger region of integration, this implies that it is larger than the second integral, so \({\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) > {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right)\). This gives us the first inequality result in the lemma; the second result follows analogously. □
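A quick numerical check of Lemma 1 may help fix ideas. The Python sketch below (standard library only) takes the 95% HDR of the standard normal distribution and compares its probability with that of a few competing intervals that are no larger than the HDR. The particular shifts and the shrink factor are arbitrary choices made for this illustration; they are not taken from the paper.

```python
from statistics import NormalDist

Z = NormalDist()                    # standard normal distribution
alpha = 0.05
c = Z.inv_cdf(1.0 - alpha / 2.0)    # the HDR of the standard normal is [-c, c]


def prob(lower, upper):
    """P(lower <= X <= upper) for a standard normal X."""
    return Z.cdf(upper) - Z.cdf(lower)


hdr_prob = prob(-c, c)

# Competing intervals with the same length as the HDR, shifted by s
same_length = [prob(-c + s, c + s) for s in (0.1, 0.5, 1.0, 2.0)]

# Competing intervals that are strictly shorter than the HDR
shorter = [prob(-c + s, 0.9 * c + s) for s in (-0.5, 0.0, 0.5)]

print("HDR probability:          ", hdr_prob)
print("same-length alternatives: ", same_length)
print("shorter alternatives:     ", shorter)
```

Every competing interval should carry no more probability than the HDR, and the strictly shorter ones strictly less, in line with the two inequalities in the lemma.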

Theorem 1

A highest density region \({\mathcal{H}}\) with actual coverage probability \(1 - \alpha\) is a smallest closed covering region with minimal coverage probability \(1 - \alpha\).

Proof of Theorem 1

Consider a HDR \({\mathcal{H}}\) with actual coverage probability \(1 - \alpha\). To show that \({\mathcal{H}}\) is a smallest closed covering region with minimal coverage probability \(1 - \alpha\), we have to show that there is no closed set \({\mathcal{A}}\) with \(\left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right|\) and \({\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) \ge 1 - \alpha\). This follows directly from the first inequality result in Lemma 1. □

Theorem 2

If the intensity function for \(X\) is continuous then a highest density region \({\mathcal{H}}\) formed with minimal coverage probability \(1 - \alpha\) has actual coverage probability \(1 - \alpha\).

Proof of Theorem 2

Consider a HDR \({\mathcal{H}}\) with minimum coverage probability \(1 - \alpha\). Since the intensity function \(H\) is continuous, the random variable \(X\) is also continuous, so we have:

$$ \begin{array}{*{20}c} {{\mathbb{P}}\left( {X \in {\mathcal{S}}} \right) = {\mathbb{P}}\left( {X \in {\text{closure}}{\mathcal{S}}} \right)} & {} & {{\text{for any set }}{\mathcal{S}}{.}} \\ \end{array} $$

Since the intensity function \(H\) is continuous we have \(H( {f_{*} } ) = 1 - \alpha\), and it follows that:

$$ \begin{aligned}{\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) &= {\mathbb{P}}\left( {X \in {\text{closure}}\left\{ {x \in {\mathbb{R}}{|}f\left( x \right) \ge f_{*} } \right\}} \right)\\&= {\mathbb{P}}\left( {X \in \left\{ {x \in {\mathbb{R}}{|}f\left( x \right) \ge f_{*} } \right\}} \right)\\&= {\mathbb{P}}\left( {f\left( X \right) \ge f_{*} } \right) \\ &= H( {f_{*} }) = 1 - \alpha {,} \end{aligned} $$

which establishes that the actual coverage probability is \(1 - \alpha\). □
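As a worked illustration of Theorem 2 (an example chosen here for its closed form, not taken from the paper), consider the exponential distribution with unit rate, which has density \(f\left( x \right) = e^{ - x}\) for \(x \ge 0\). For \(0 < a \le 1\) its intensity function is:

$$ \begin{aligned} H\left( a \right) &= {\mathbb{P}}\left( {e^{ - X} \ge a} \right) \\ &= {\mathbb{P}}\left( {X \le - \ln a} \right) \\ &= 1 - e^{\ln a} = 1 - a{,} \end{aligned} $$

with \(H\left( a \right) = 1\) for \(a \le 0\) and \(H\left( a \right) = 0\) for \(a > 1\), so \(H\) is continuous. Solving \(H( {f_{*} } ) = 1 - \alpha\) gives \(f_{*} = \alpha\), and the corresponding region and its coverage are:

$$ \begin{aligned} {\mathcal{H}} &= \left\{ {x \ge 0{|}e^{ - x} \ge \alpha } \right\} = \left[ {0, - \ln \alpha } \right]{,} \\ {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) &= 1 - e^{\ln \alpha } = 1 - \alpha {.} \end{aligned} $$

With \(\alpha = 0.05\) this gives the interval \(\left[ {0, - \ln 0.05} \right] \approx \left[ {0,2.996} \right]\), whose actual coverage probability is exactly \(0.95\), as the theorem asserts.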

Table 1 Accuracy of HDRs computed by different R packages

Cite this article

O’Neill, B. Computing highest density regions for continuous univariate distributions with known probability functions. Comput Stat 37, 613–649 (2022). https://doi.org/10.1007/s00180-021-01133-z

  • DOI: https://doi.org/10.1007/s00180-021-01133-z
