Computing highest density regions for continuous univariate distributions with known probability functions

O’Neill, Ben

doi:10.1007/s00180-021-01133-z

Original paper
Published: 05 August 2021

Computational Statistics, volume 37, pages 613–649 (2022)

Ben O’Neill ORCID: orcid.org/0000-0001-6899-0483¹

Abstract

We examine the problem of computing the highest density region (HDR) in a computational context where the user has access to a density function and quantile function for the distribution (e.g., in the statistical language R). We examine several common classes of continuous univariate distributions based on the shape of the density function; this includes monotone densities, quasi-concave and quasi-convex densities, and general multimodal densities. In each case we show how the user can compute the HDR from the quantile and density functions by framing the problem as a nonlinear optimisation problem. We implement these methods in R to obtain general functions to compute HDRs for classes of distributions, and for commonly used families of distributions. We compare our method to existing R packages for computing HDRs and we show that our method performs favourably in terms of both accuracy and average speed.

Notes

These distributions are in the stats package, which is now included in the base version of R.
The notion of the “smallest” region requires us to work in a probability space with sufficient structure that we can measure the size of an event in the sample space. Our probability space takes the set of real numbers as the sample space and the corresponding class of Borel sets as the sigma-field of all measurable events. Within this context, every possible “region” under analysis is a Borel set. For discrete random variables the “smallness” of a region is measured by counting measure (i.e., the number of points in the region), and for absolutely continuous random variables it is measured by Lebesgue measure.
These properties are trivial to establish. Since $f$ is a density function, it is non-negative, so we have $H\left( a \right) = 1$ for all $a \le 0$. Since $a < a^{\prime}$ implies $\left\{ {f\left( X \right) \ge a} \right\} \supseteq \left\{ {f\left( X \right) \ge a^{\prime}} \right\}$ we also have ${\mathbb{P}}\left( {f\left( X \right) \ge a} \right) \ge {\mathbb{P}}\left( {f\left( X \right) \ge a^{\prime}} \right)$ so $H$ is non-increasing. Note that we can restrict the domain of $H$ to ${\mathbb{R}}_{0 + }$ without loss of usefulness. Here we have allowed the domain to include the whole real line, but this choice makes no difference to our subsequent analysis.
More specifically, if the set of points in the mode has zero probability, then every non-empty HDR will contain the entire set of points in the mode. Conversely, if the set of points in the mode has nonzero probability, then every non-empty HDR will contain some of these points, up to the required coverage probability, and will only contain points outside the mode if it has not yet met the required coverage probability.
The function $u:{\mathbb{R}} \to {\mathbb{R}}$ is defined by $u\left( x \right) = f^{\prime}\left( x \right)/f\left( x \right)$ for all $x \in {\mathbb{R}}$. It can be obtained as the derivative of the log-density function. (By convention we set $u\left( x \right) = 0$ when $f^{\prime}\left( x \right) = f\left( x \right) = 0$ and we set $u\left( x \right) = \infty$ when $f^{\prime}\left( x \right) > 0$ and $f\left( x \right) = 0$.)
The up-down method is essentially a root-finding problem where we wish to solve $H\left( a \right) = 1 - \alpha$ for $a$, but this can easily be framed as an optimisation problem, to give the benefits of derivatives, etc., for rapid convergence to the solution. We frame the method as an optimisation here to give a clearer comparison to the left–right method.
This can be done in cases where the HDR for a multivariate distribution can be reduced to a corresponding optimisation problem for a univariate distribution. For example, the HDR for a multivariate normal distribution is a closed set bounded by the ellipsoid using the Mahalanobis distance. The ellipsoid can be computed using an optimisation for the univariate chi-squared distribution.
In most statistical applications, it is not usually problematic if the computed HDR has a coverage probability that is slightly below the stipulated minimum. If this minimum bound is important then it is possible to change the optimisation method to disallow outcomes that fail to meet the stipulated minimum coverage probability. This can be done either using “penalty” methods or by using iterative procedures that end by going back over the iterations to search for the last iteration that met the required minimum bound requirement.
Since the random variable $X$ is assumed to be continuous, we have ${\mathbb{P}}\left( {X < L} \right) = {\mathbb{P}}\left( {X \le L} \right) = F\left( L \right) = \theta$, so the parameter is the probability of the random variable being below a closed interval bounded below at $L$.
The case of the uniform distribution, which is not strictly unimodal, will be considered separately at the end of our analysis. This will ensure that our method covers all continuous unimodal distributions in base R.
Note that the function $u$ should not be confused with the score function for the random variable. The latter is the derivative of the logarithm of the likelihood function, with respect to the parameters of the distribution. The function $u$ is instead the derivative of the logarithm of the density, with respect to the outcome value.
For brevity, the functions we give in this paper do not check that the inputs are of the appropriate type or that numerical values are in the admissible range. In the actual implementation of these functions outside this paper we have added such checks. Note also that, for consistency of the inputs between the two functions, we allow the input of the density function and the logarithmic-derivative-density function into HDR.monotone, even though these inputs are not used in that function.
The quantile function of the distribution is a necessary input, but the density function and the logarithmic-derivative-density function may be omitted. The latter objects are used to generate analytic outputs for the first and second derivatives of the objective function. If these are omitted then these derivatives are approximated by numerical differentiation instead of being computed analytically from the formulae derived here. Tests by the author show that the function performs well in either case.
For example, the author was able to use a looped calculation to compute the HDR intervals for the chi-squared distribution over 10,000 different values of the degrees-of-freedom parameter. This computation took 5.76 s on a standard personal computer (approximately 1735 HDR outputs per second).
The same basic idea can be extended if the number of local minima is countably infinite, or if there are flat sections, etc. For non-continuous densities it may be more useful to use some sections that are quasi-convex. We will not give a full treatment here, but will instead show one broad case, to illustrate the general method.
We set $x_{0} = Q\left( 0 \right)$ and $x_{m + 1} = Q\left( 1 \right)$ with corresponding changes to the openness of the interval if either of these values is infinite. It is also worth noting that there is overlap between these range segments at the boundary points. Since the HDR is defined as a closed interval, this does not cause problems, and indeed, it is the simplest way to frame the optimisation.
This step is not necessary if the optimisation performed in the second step can go right up to the boundary of the minimising point. However, it is important to note that if the optimisation in this step is transformed to an unconstrained optimisation, then the computed optima will not go right up to the boundary, and so the HDR will appear to have tiny gaps around the minimising points, which should not be present. For this reason we do recommend adding the preliminary step to remove unnecessary segmentation of the support.
If the support does not fall on a known countable set then one cannot really get started with the problem because the search space is uncountable.
The case we examine is for a pivotal quantity based on a function $h_{X}$ that is strictly decreasing (so that its inverse $g_{X}$ is also strictly decreasing). The same analysis can be repeated for a strictly increasing function $h_{X}$ (so that $g_{X}$ is also strictly increasing). In this case, we obtain the probability interval $1 - \alpha = {\mathbb{P}}\left( {g_{X} \left( {L\left( \theta \right)} \right) \le \psi \le g_{X} \left( {U\left( \theta \right)} \right)} \right)$ (i.e., the lower and upper bounds are switched), giving the width function $W\left( \theta \right) = g_{X} \left( {U\left( \theta \right)} \right) - g_{X} \left( {L\left( \theta \right)} \right)$. Thus, the width function and its derivatives are the same as shown in the main analysis, except that they are all negated, and the necessary and sufficient condition for the critical point $\hat{\theta }$ to be a local minimum is the reverse of the inequality shown.
The mclust package (Fraley et al. 2020) computes the density cut-off but does not compute the HDR.
The HDR functions in the stat.extend package use a “defensive programming” method which exhaustively checks all the inputs to the functions to ensure that they are of the correct type. The functions also convert the computed HDR to a set object (using the sets package) and make other changes to the output to make it more user-friendly.
This computation is the function sort(1L:1000000L, decreasing = TRUE), which uses functions in the base package.
Since the computed HDR roughly inverts the true HDR, its coverage probability is approximately $\alpha$, and so the probability disparity is approximately $1 - 2\alpha$.
The code for the HDR functions in this package uses the algorithms in this paper, but there is also additional code to check inputs and add some further elements to the output. The underlying programming uses “helper” functions that split the coding into pieces. Code for functions used in the stat.extend package can be called in R using the relevant function commands (e.g., HDR.monotone, HDR.unimodal, HDR.bimodal, HDR.discrete.unimodal, HDR.discrete, etc.).
One other possibility is that ${\mathcal{A}} \subset {\mathcal{H}}$ so that ${\mathcal{A}} - {\mathcal{H}} = \emptyset$ and so no density value exists over this set. In this case the second integral is zero and the first is positive, so the first integral is still strictly higher than the second.
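The up-down and left–right formulations described in the notes above can be sketched in a few lines. The following is a minimal illustration only, not the author's R implementation: it uses Python's standard library, a Rayleigh distribution with unit scale chosen purely as a convenient unimodal example with closed-form density, distribution and quantile functions, and plain bisection and golden-section searches in place of the derivative-based optimiser discussed in the paper. All function names (`hdr_up_down`, `hdr_left_right`, `bisect`) are hypothetical.

```python
import math

# Illustrative distribution (an assumption, not from the paper): Rayleigh, scale 1.
def density(x):
    return x * math.exp(-0.5 * x * x) if x >= 0 else 0.0

def cdf(x):
    return 1.0 - math.exp(-0.5 * x * x) if x >= 0 else 0.0

def quantile(p):
    return math.sqrt(-2.0 * math.log1p(-p))

MODE = 1.0    # the Rayleigh(1) density peaks at x = 1
UPPER = 10.0  # far enough into the right tail that the density is negligible


def bisect(func, lo, hi, tol=1e-12):
    """Root of func on [lo, hi], assuming func changes sign on the interval."""
    f_lo = func(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def hdr_up_down(alpha):
    """Up-down: move the density cut-off a until H(a) = 1 - alpha."""
    def endpoints(a):
        # The density rises on [0, MODE] and falls on [MODE, UPPER].
        left = bisect(lambda x: density(x) - a, 0.0, MODE)
        right = bisect(lambda x: density(x) - a, MODE, UPPER)
        return left, right

    def coverage_gap(a):
        left, right = endpoints(a)
        return cdf(right) - cdf(left) - (1.0 - alpha)

    peak = density(MODE)
    cutoff = bisect(coverage_gap, 1e-9, peak * (1.0 - 1e-9))
    return endpoints(cutoff)


def hdr_left_right(alpha):
    """Left-right: minimise the width Q(theta + 1 - alpha) - Q(theta) over theta."""
    def width(theta):
        return quantile(theta + 1.0 - alpha) - quantile(theta)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, alpha
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    w_c, w_d = width(c), width(d)
    while hi - lo > 1e-10:          # golden-section search on theta
        if w_c < w_d:
            hi, d, w_d = d, c, w_c
            c = hi - inv_phi * (hi - lo)
            w_c = width(c)
        else:
            lo, c, w_c = c, d, w_d
            d = lo + inv_phi * (hi - lo)
            w_d = width(d)
    theta = 0.5 * (lo + hi)
    return quantile(theta), quantile(theta + 1.0 - alpha)


print(hdr_up_down(0.05))
print(hdr_left_right(0.05))
```

Both searches should return essentially the same interval. The left–right version fixes the coverage exactly through the quantile function and searches only over the lower-tail probability $\theta$, whereas the up-down version has to invert the density on each side of the mode at every step.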

References

Box GEP, Tiao GC (1973) Bayesian inference in statistical analysis. Addison-Wesley, Reading, MA
Chasnovski E (2019) pdqr: work with custom distribution functions. R package, Version 0.2.1. https://CRAN.R-project.org/package=pdqr
Fraley C, Raftery AE, Scrucca L, Murphy TB, Fop M (2020) mclust: Gaussian mixture modelling for model-based clustering, classification, and density estimation. R package, Version 5.4.6. https://CRAN.R-project.org/package=mclust
Gardner MJ, Altman DG (1986) Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J 292:746–750
Hyndman R (1996) Computing and graphing highest density regions. Am Stat 50(2):120–126
Hyndman R, Einbeck J, Wand M (2018) hdrcde: highest density regions and conditional density estimation. R package, Version 3.3. https://CRAN.R-project.org/package=hdrcde
Kruschke JK (2015) Doing Bayesian data analysis: a tutorial with R and BUGS, 2nd edn. Academic Press, London
Meredith M, Kruschke J (2020) HDInterval: highest (posterior) density intervals. R package, Version 0.2.2. https://CRAN.R-project.org/package=HDInterval
Mersmann O (2019) microbenchmark: accurate timing functions. R Package, Version 1.4.7. https://CRAN.R-project.org/package=microbenchmark
Meyer D, Hornik K (2009) Generalised and customizable sets in R. J Stat Softw 31(2):1–27
O’Neill B (2014) Some useful moment results in sampling problems. Am Stat 68(4):282–296
O’Neill B (2021) Smallest covering regions and highest density regions for discrete distributions. Under submission
O’Neill B, Fultz N (2020) stat.extend: highest density regions and other functions of distributions. R package, Version 0.1.1. https://CRAN.R-project.org/package=stat.extend
Royall RM (1986) Model robust confidence intervals using maximum likelihood estimators. Int Stat Rev 54(2):221–226
Schnabel RB, Koontz JE, Weiss BE (1985) A modular system of algorithms for unconstrained minimization. ACM Trans Math Softw 11(4):419–440
Tate RF, Klett GW (1959) Optimal confidence intervals for the variance of a normal distribution. J Am Stat Assoc 54(287):674–682

Acknowledgements

The author is grateful to two anonymous referees for their comments on an earlier draft of this paper. The comparison of accuracy and speed against other packages was added due to their excellent suggestions.

Author information

Authors and Affiliations

Research School of Population Health, Australian National University, Canberra, ACT, 0200, Australia
Ben O’Neill

Authors

Ben O’Neill

Corresponding author

Correspondence to Ben O’Neill.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Theorems

In this appendix we give proofs of the lemmas and theorems shown in the main body of the paper. For ease of reference, we repeat these lemmas and theorems here.

Lemma 1

If ${\mathcal{H}}$ is a highest density region then for any set ${\mathcal{A}}$ we have:

$$ \begin{gathered} \left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right| \Rightarrow {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) < {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right){\text{,}} \hfill \\ \left| {\mathcal{A}} \right| \le \left| {\mathcal{H}} \right| \Rightarrow {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) \le {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right){\text{.}} \hfill \\ \end{gathered} $$

Proof of Lemma 1

For any set ${\mathcal{A}}$ we have:

$$ \begin{aligned} {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) - {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) &= \int_{{\mathcal{H}}} f\left( x \right)dx - \int_{{\mathcal{A}}} f\left( x \right)dx\\ &= \int_{{{\mathcal{H}} - {\mathcal{A}}}} f\left( x \right)dx - \int_{{{\mathcal{A}} - {\mathcal{H}}}} f\left( x \right)dx. \end{aligned} $$

If $\left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right|$ then we must have $\left| {{\mathcal{A}} - {\mathcal{H}}} \right| < \left| {{\mathcal{H}} - {\mathcal{A}}} \right|$, so the range of integration in the first integral is larger than in the second integral. Moreover, since ${\mathcal{H}}$ is a highest-density region it encompasses all points with $f\left( x \right) \ge f_{*}$, so we must have $f\left( x \right) < f_{*} \le f\left( y \right)$ for all $x \in {\mathcal{A}} - {\mathcal{H}}$ and $y \in {\mathcal{H}} - {\mathcal{A}}$. This means that the integrand in the first integral is strictly higher than in the second integral (see Footnote 25, the final entry under Notes). Since the first integral has a higher integrand and larger region of integration, this implies that it is larger than the second integral, so ${\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) > {\mathbb{P}}\left( {X \in {\mathcal{A}}} \right)$. This gives us the first inequality result in the lemma; the second result follows analogously. □
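Lemma 1 can be checked numerically on a simple case. The sketch below is our own illustration rather than anything from the paper: it takes the standard normal distribution (via Python's `statistics.NormalDist`), uses the central 95% interval as the HDR, and compares it against a few hand-picked candidate regions that are no larger than the HDR. The candidate regions and the helper names `prob` and `length` are hypothetical choices made for the example.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal, chosen only for illustration
ALPHA = 0.05


def prob(region):
    """P(X in region), where region is a list of disjoint closed intervals."""
    return sum(Z.cdf(b) - Z.cdf(a) for a, b in region)


def length(region):
    """Lebesgue measure of a list of disjoint closed intervals."""
    return sum(b - a for a, b in region)


# For a symmetric unimodal density the HDR is the central interval.
c = Z.inv_cdf(1 - ALPHA / 2)
hdr = [(-c, c)]

# Candidate regions whose total length does not exceed that of the HDR.
candidates = {
    "shifted right": [(-c + 0.5, c + 0.5)],
    "shifted left": [(-c - 1.0, c - 1.0)],
    "two pieces": [(-1.0, 1.0), (2.0, 3.0)],
    "one-sided, slightly shorter": [(0.0, 2 * c - 0.01)],
}

for name, region in candidates.items():
    print(f"{name}: length {length(region):.3f} vs {length(hdr):.3f}, "
          f"probability {prob(region):.4f} vs {prob(hdr):.4f}")
```

In each case the candidate region has probability no greater than the HDR, as the lemma requires. A handful of examples is of course not a proof; the point is only to make the inequality concrete.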

Theorem 1

A highest density region ${\mathcal{H}}$ with actual coverage probability $1 - \alpha$ is a smallest closed covering region with minimal coverage probability $1 - \alpha$.

Proof of Theorem 1

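Consider an HDR ${\mathcal{H}}$ with actual coverage probability $1 - \alpha$. To show that ${\mathcal{H}}$ is a smallest closed covering region with minimal coverage probability $1 - \alpha$, we have to show that there is no closed set ${\mathcal{A}}$ with $\left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right|$ and ${\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) \ge 1 - \alpha$. This follows directly from the first inequality result in Lemma 1: any set ${\mathcal{A}}$ with $\left| {\mathcal{A}} \right| < \left| {\mathcal{H}} \right|$ has ${\mathbb{P}}\left( {X \in {\mathcal{A}}} \right) < {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) = 1 - \alpha$, so it cannot attain the required coverage. □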
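Theorem 1 can be illustrated with a monotone density, where the HDR starts at the lower bound of the support. The sketch below is an assumption-laden example of our own: it uses an exponential distribution with a hypothetical rate of 1, computes the HDR in closed form, and compares its width with other intervals that have exactly the same coverage but start further to the right. The names `upper_for` and `width_from` are introduced here for illustration only.

```python
import math

RATE = 1.0    # hypothetical rate parameter, chosen for illustration
ALPHA = 0.05


def cdf(x):
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


def upper_for(lower):
    """Upper bound b with P(lower <= X <= b) = 1 - ALPHA, or None if impossible."""
    tail = math.exp(-RATE * lower) - (1.0 - ALPHA)   # required value of P(X > b)
    if tail <= 0.0:
        return None   # too little probability remains to the right of `lower`
    return -math.log(tail) / RATE


def width_from(lower):
    upper = upper_for(lower)
    return None if upper is None else upper - lower


# The density is decreasing, so the HDR is the interval starting at zero.
hdr = (0.0, upper_for(0.0))

# Competing intervals with the same coverage but a later starting point.
lowers = [i * 0.005 for i in range(11)]
widths = [width_from(a) for a in lowers]

for a, w in zip(lowers, widths):
    print(f"lower = {a:.3f}, width = {w:.4f}")
```

The width grows as the lower bound moves away from zero, so none of the competing intervals is shorter than the HDR. This agrees with the theorem for this one family of competitors; it does not examine non-interval regions.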

Theorem 2

If the intensity function for $X$ is continuous then a highest density region ${\mathcal{H}}$ formed with minimal coverage probability $1 - \alpha$ has actual coverage probability $1 - \alpha$.

Proof of Theorem 2

Consider a HDR ${\mathcal{H}}$ with minimum coverage probability $1 - \alpha$. Since the intensity function $H$ is continuous, the random variable $X$ is also continuous, so we have:

$$ {\mathbb{P}}\left( {X \in {\mathcal{S}}} \right) = {\mathbb{P}}\left( {X \in {\text{closure}}\,{\mathcal{S}}} \right) \quad {\text{for any set }}{\mathcal{S}}. $$

Since the intensity function $H$ is continuous we have $H( {f_{*} } ) = 1 - \alpha$, and it follows that:

$$ \begin{aligned}{\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) &= {\mathbb{P}}\left( {X \in {\text{closure}}\left\{ {x \in {\mathbb{R}}{|}f\left( x \right) \ge f_{*} } \right\}} \right)\\&= {\mathbb{P}}\left( {X \in \left\{ {x \in {\mathbb{R}}{|}f\left( x \right) \ge f_{*} } \right\}} \right)\\&= {\mathbb{P}}\left( {f\left( X \right) \ge f_{*} } \right) \\ &= H( {f_{*} }) = 1 - \alpha {,} \end{aligned} $$

which establishes that the actual coverage probability is $1 - \alpha$. □
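As a worked illustration of Theorem 2 (our own example, not taken from the paper), consider an exponential distribution with density $f\left( x \right) = \lambda e^{ - \lambda x}$ for $x \ge 0$, where the rate $\lambda > 0$ is a symbol introduced here. Taking the intensity function to be $H\left( a \right) = {\mathbb{P}}\left( {f\left( X \right) \ge a} \right)$, as in the proof above, we have $H\left( a \right) = 1$ for $a \le 0$ and $H\left( a \right) = 0$ for $a > \lambda$, and on the remaining range:

$$ \begin{aligned} H\left( a \right) &= {\mathbb{P}}\left( {\lambda e^{ - \lambda X} \ge a} \right) = {\mathbb{P}}\left( {X \le \tfrac{1}{\lambda }\ln \left( {\lambda /a} \right)} \right) = 1 - \frac{a}{\lambda } \quad {\text{for }} 0 < a \le \lambda , \\ H\left( {f_{*} } \right) &= 1 - \alpha \;\; \Rightarrow \;\; f_{*} = \alpha \lambda , \\ {\mathcal{H}} &= \left\{ {x \ge 0 \,|\, \lambda e^{ - \lambda x} \ge \alpha \lambda } \right\} = \left[ {0, - \ln \left( \alpha \right)/\lambda } \right], \\ {\mathbb{P}}\left( {X \in {\mathcal{H}}} \right) &= 1 - e^{\ln \alpha } = 1 - \alpha . \end{aligned} $$

Here $H$ is continuous on the whole real line (the three pieces agree at $a = 0$ and $a = \lambda$), so the cut-off $f_{*}$ solves $H( {f_{*} } ) = 1 - \alpha$ exactly and the actual coverage equals the stipulated minimum, as the theorem states.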

Table 1 Accuracy of HDRs computed by different R packages

About this article

Cite this article

O’Neill, B. Computing highest density regions for continuous univariate distributions with known probability functions. Comput Stat 37, 613–649 (2022). https://doi.org/10.1007/s00180-021-01133-z

Received: 25 May 2020
Accepted: 05 July 2021
Published: 05 August 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s00180-021-01133-z
