1 Introduction

Our work presents an analysis of the persistence properties of Bitcoin's higher-order moments, using a modeling framework based on the representation of conditional densities with time-varying parameters, through the structure of Generalized Autoregressive Score models (Creal et al., 2013). This structure provides a common representation for several time-varying parameter models, including GARCH conditional variance models as a particular case, while also allowing autoregressive dynamics for the distribution parameters associated with higher-order moments, generating conditional skewness and kurtosis structures.

This common structure is especially relevant for analyzing the dynamics of the conditional distribution of Bitcoin returns, which is characterized by high volatility, extreme price variations and thus a very complex risk structure. There are several differences between Bitcoin and traditional assets, as discussed by (Härdle et al., 2020) and (Petukhina et al., 2020). Similar to other financial assets, cryptocurrency returns present time-varying conditional volatility structures. However, these assets display greater asymmetry in returns and more extreme values than traditional financial assets, such as stocks and indexes.

Bitcoin and other cryptocurrencies are characterized by negative skewness and other relevant deviations from the class of symmetric distributions, as discussed in (Cerqueti et al., 2020). These characteristics are especially important in investment decisions, since different probabilities for gains and losses violate a fundamental assumption of symmetry in the distribution of returns made by many portfolio allocation and risk management methods (Lezmi et al., 2018; Boudt et al., 2020).

These characteristics can be explained in part by the greater presence of jumps in the prices of these assets. Jumps, defined as large-magnitude price variations with a low probability of being generated by a continuous process, are notably more frequent in cryptocurrencies than in traditional assets, such as stocks, indexes and other financial instruments, as discussed in (Chaim and Laurini, 2018), (Scaillet et al., 2018) and (Bouri et al., 2020). Jump processes generate asymmetry and excess kurtosis in the distribution of financial assets, due to the impact of extreme values on third- and fourth-order moments, as discussed for example in (Manganelli et al., 2008).

The presence of jumps is an important factor in the design of risk management mechanisms and portfolio allocation, making these procedures more complex than for traditional assets with a low occurrence of jumps. Hedge mechanisms are more limited due to the lower availability of derivative markets for these assets, especially high-liquidity options. Futures contracts on Bitcoin were introduced on the Chicago Mercantile Exchange in December 2017, and options in January 2020, as discussed by (Geman and Price, 2020), but they do not yet have sufficient liquidity to manage the extreme risks associated with these assets. See the discussion in (Trimborn et al., 2019) on liquidity aspects of cryptocurrency portfolio allocation, and (Hou et al., 2019) for methods for pricing cryptocurrency options.

Motivated by these features, we estimate Generalized Autoregressive Score (GAS) models (Creal et al., 2013) with distributions that admit time-varying higher moments, using an autoregressive structure for the volatility (scale), skewness and shape parameters of the Bitcoin return series. Our main objective is to verify whether there are predictive gains from this approach in a risk management application. Using the Model Confidence Set (MCS) algorithm (Hansen et al., 2010) for model selection, we explore a series of alternative distributions and investigate how the models perform in predicting Value at Risk (VaR) through an asymmetric quantile loss function.

Modeling based on the GAS representation of time-varying conditional densities is a way of simultaneously capturing the presence of conditional volatility, asymmetry and kurtosis under several alternative distributions. Furthermore, it is an indirect way of incorporating the presence of jumps in the conditional distribution of returns, through the impact of jumps on the asymmetry and kurtosis of the process, without the need to incorporate the jump structure directly in the modeling of the process.

Models with the direct incorporation of the jump structure can be formulated using various representations, for example using structures based on compound binomial processes (e.g., (Chaim and Laurini, 2018) and (Chaim and Laurini, 2019b)), or models using Poisson processes (e.g., (Scaillet et al., 2018) and (Chen and Huang, 2021)). However, these models normally assume that the jump processes are (conditionally) independent, and thus the occurrence of a jump does not affect the probability of a jump occurring in the following periods. This is an important limitation, since the possible persistence in the occurrence of jumps is not incorporated in these models.

Indeed, there is relevant evidence against the hypothesis of independence of the jumps in the Bitcoin series, as pointed out by (Scaillet et al., 2018) using tests for jump clustering. It is possible to conjecture that the presence of dependent jumps can explain autoregressive structures in conditional skewness and kurtosis in financial time series, by making the impact of extreme values persistent in the third- and fourth-order moments, analogous to the impact of autoregressive structures on the second-order moment captured by GARCH and stochastic volatility models.

The relevant question is how to incorporate predictability/persistence in the jumps of the process. In (Scaillet et al., 2018), jump predictability is indirectly modeled through a binomial regression on a series of covariates; this estimates the probability of a jump occurring in the next period, not the direct impact of the jump on returns, which is what risk management procedures such as the calculation of Value-at-Risk require. Another alternative would be a non-homogeneous Poisson model, where the probability of jump occurrence is determined by exogenous covariates. This structure has two limitations. The first is the need to obtain relevant covariates for the estimation, which increases the information set needed for risk modeling. The second is that these models do not allow direct autoregressive dependence in the jumps, where the occurrence of a jump in one period directly affects the probability of jumps in the following periods. Direct dependence in jump occurrence can be incorporated, for example, using self-exciting Hawkes processes (Hawkes, 2018) or Poisson models with stochastic intensity, known as Cox processes (Cox, 1955). The relationship between Hawkes, Cox and other processes is discussed in (Jang and Oh, 2021).

Although these two formulations are interesting, they introduce considerable difficulties into the modeling of returns and risk, since jumps are a latent variable, not directly observed: we only have the series of observed returns. Thus, using these models for jump activity adds great complexity to the estimation process, since the likelihood function then depends on latent processes and cannot be evaluated directly.

The modeling proposed in this article avoids these complexities by directly modeling the impact of possible jumps, and the persistence of this process, on the conditional volatility, skewness and kurtosis structures. Thus, we do not need complex inference methods, for example Bayesian inference using Markov Chain Monte Carlo methods for the latent intensity of the jump process, since we use the representation of generalized score models for conditional densities with time-varying parameters, and the likelihood function can be evaluated directly. This formulation captures the persistence in the higher-order moments of the Bitcoin series, and thus allows the measurement of risk in this asset through the Value-at-Risk derived from the conditional density obtained by the GAS representation. We therefore have a direct way of not only capturing the existence of skewness and kurtosis in the Bitcoin series, but also of letting these moments vary in time.

The empirical literature on Bitcoin has expanded greatly in recent years. However, attention has focused on patterns of conditional volatility and the presence of outliers in the return series, and there is a vast literature on long memory patterns for this asset. Regarding the dynamics of volatility, many studies explored models of the GARCH family. (Dyhrberg, 2016a) and (Dyhrberg, 2016b) argue that it is possible to build hedge strategies for Bitcoin against stocks using these models. To find the best fit for the Bitcoin series, (Katsiampa, 2017) incorporates a long-term volatility component after comparing several GARCH models. Analyzing several cryptocurrencies, (Baur and Dimpfl, 2018) suggest that the best model for the conditional volatility of cryptocurrencies is an integrated GARCH, supporting this result with likelihood criteria and unconditional and conditional Value at Risk coverage tests.

Long-memory analysis for cryptocurrencies can be found in a series of papers. (Bariviera et al., 2017) analyzed stylized facts of the Bitcoin series from the calculation of Hurst exponents, and (Lahmiri et al., 2018) discussed the use of fractional GARCH models, finding long-term dependence in several Bitcoin markets. Charfeddine and Maouchi (2019) present a series of tests to argue that Bitcoin's volatility has genuine long memory rather than memory induced by level shifts; see also (Phillip et al., 2019) on this question. Additionally, there are two growing research topics in the cryptocurrency literature: bubbles and market efficiency tests. Market efficiency and cryptocurrency dynamics are intricately connected to the long memory literature, as discussed in (Urquhart, 2016) and (Cheah et al., 2018). For the bubble literature, we can cite (Chaim and Laurini, 2019a), (Cheah and Fry, 2015), Corbet et al. (2018), (Hafner, 2018) and (Li et al., 2019). Although our work does not directly discuss these issues, it is important to note that time-varying moment structures can be related to regime changes in parameters and the aggregation of distributions in time series. As discussed in (Diebold and Inoue, 2001), these processes can generate patterns compatible with the presence of long memory, with possible consequences for market efficiency tests and bubbles, as discussed in (Chaim and Laurini, 2019a) and (Chaim and Laurini, 2019b).

We contribute to the literature in two ways. First, we have not found any previous work analyzing the dynamics of time-varying higher-order moments in the Bitcoin series and their impact on risk management. Second, although GAS models have been applied to Bitcoin in (Troster et al., 2019), our work innovates by also evaluating the proposal in a Value at Risk prediction context, with the evaluation carried out through the robust inferential method of the Model Confidence Set (Hansen et al., 2010).

The article is divided into five sections, including this introduction. The methodology and the data are presented in the second and third sections, respectively. Section four presents and discusses our results. Section five concludes.

2 Methodology

This section describes the strategies used for the empirical analysis. The first step consists in estimating Generalized Autoregressive Score (GAS) models with a flexible moment structure, allowing not only the standard deviation to vary over time, as in the GARCH approach (Bollerslev, 1987), but also the skewness and shape parameters, following a conditional autoregressive representation; this makes the third and fourth moments time-varying as a consequence.

The second step of our exercise is checking whether there are statistically significant predictive gains from this strategy, using the Model Confidence Set (MCS) approach proposed by (Hansen et al., 2010). The idea is to build a set of models with different distributions and parameter specifications and test, from a predictive point of view, whether there is a winning model or a subset of models with statistically equivalent performance. For the estimation, we rely on the library developed by (Ardia et al., 2019). The MCS procedure is based on (Bernardi and Catania, 2018).

2.1 Generalized autoregressive score framework

We use the same notation as (Creal et al., 2013) to describe the framework of Generalized Autoregressive Score models. In the original GAS paper (Creal et al., 2013), the authors formulate a general class of observation-driven time-varying parameter models. Let \(y_t\) be the dependent variable, \(f_t\) the time-varying parameter vector, \(x_t\) a vector of covariates and \(\theta \) a vector of static parameters. Define \(Y^t = \{y_1, \dots , y_t\}\), \(F^t = \{f_0, f_1, \dots , f_t\}\) and \(X^t = \{x_1, \dots , x_t\}\). The available information set at time t consists of \(\{f_t, {\mathcal {F}}_t\}\), where:

$$\begin{aligned} {\mathcal {F}}_t = \{Y^{t - 1}, F^{t - 1}, X^t\}, \forall t = 1,\dots , n \end{aligned}$$
(1)

They assume that \(y_t\) is generated by the following observation density:

$$\begin{aligned} y_t \sim p(y_t | f_t, {\mathcal {F}}_t ; \theta ) \end{aligned}$$
(2)

The updating mechanism for the time-varying parameter \(f_t\) is given by the autoregressive formulation:

$$\begin{aligned} f_{t + 1} = \varvec{\omega } + \sum _{i = 1}^p A_i s_{t - (i - 1)} + \sum _{j = 1}^q B_j f_{t - (j - 1)} \end{aligned}$$
(3)

Here \(\omega \) is a vector of constants, \(A_i\) and \(B_j\) are coefficient matrices, and \(s_t\) is an appropriate function of past data:

$$\begin{aligned} s_t = s_t(y_t, f_t ; \theta ) \end{aligned}$$
(4)

Additionally, the unknown coefficients \(A_i\) and \(B_j\) are functions of \(\theta \): \(\omega = \omega (\theta ), A_i = A_i(\theta )\) and \(B_j = B_j(\theta ), \forall i = 1, \dots , p\) and \(\forall j = 1, \dots , q\). The approach is based on the observation density of \(y_t\) for a given \(f_t\). When an observation \(y_t\) is realized, we update the time-varying \(f_t\) to the next period \(t + 1\) using (3), with the scaled score:

$$\begin{aligned} s_t&= S_t \cdot \nabla _t \end{aligned}$$
(5)
$$\begin{aligned} \nabla _t&= \frac{\partial \ln {p(y_t | f_t, {\mathcal {F}}_t ; \theta )}}{\partial f_t} \end{aligned}$$
(6)
$$\begin{aligned} S_t&= S(t, f_t, {\mathcal {F}}_t ; \theta ), \end{aligned}$$
(7)

where \(S(\cdot )\) is a matrix function. In our econometric setup, we consider \(S_t = I\). It is also natural to consider a form of scaling that depends on the variance of the score, for example:

$$\begin{aligned} S_t&= {\mathcal {I}}_{t | t - 1}^{-1} \end{aligned}$$
(8)
$$\begin{aligned} {\mathcal {I}}_{t | t - 1}^{-1}&= {\mathbb {E}}_{t - 1} [\nabla _t \nabla _t^{'}] \end{aligned}$$
(9)

The set of equations (2) through (9) defines the Generalized Autoregressive Score model with orders p and q, i.e., GAS(p, q). For particular choices of \(S_t\), the GAS model encompasses a large class of traditional models in the literature, such as the GARCH approach of (Bollerslev, 1987). As mentioned, our estimation is based on (Ardia et al., 2019), by maximum likelihood (ML). A crucial property of GAS models is that, given the past information and the static parameter vector, the vector of time-varying parameters is perfectly predictable, and the log-likelihood function can be evaluated via the prediction error decomposition. Asymptotic properties of maximum likelihood estimation of GAS models can be found in (Harvey, 2011) and (Blasques et al., 2014). The distributions used in this work for modeling returns are described in Sect. 2.3.
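To fix ideas, the recursion (3)-(6) can be written out for the simplest case of a zero-mean Gaussian density with time-varying log-variance and identity scaling \(S_t = I\). The sketch below is purely illustrative and is not the estimation code of (Ardia et al., 2019); the log-variance parameterization and the initialization \(f_0\) are our assumptions.

```python
import numpy as np

def gas11_gaussian_logvar(y, omega, A, B, f0=0.0):
    """Minimal GAS(1,1) filter for a zero-mean Gaussian with time-varying
    log-variance f_t and identity scaling S_t = I (illustrative sketch).
    Returns the filtered log-variances and the log-likelihood."""
    n = len(y)
    f = np.empty(n)
    f[0] = f0
    loglik = 0.0
    for t in range(n):
        sigma2 = np.exp(f[t])
        # log-density contribution of observation t (prediction error decomposition)
        loglik += -0.5 * (np.log(2 * np.pi) + f[t] + y[t] ** 2 / sigma2)
        # score of the log-density with respect to f_t, cf. eq. (6)
        nabla = 0.5 * (y[t] ** 2 / sigma2 - 1.0)
        s = nabla  # S_t = I, so s_t = nabla_t, cf. eq. (5)
        if t + 1 < n:
            f[t + 1] = omega + A * s + B * f[t]  # eq. (3) with p = q = 1
    return f, loglik
```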

Although it is possible that the process mean varies at times, we do not find consistent evidence of a time-varying mean structure. In the GAS framework it is possible, and in general trivial, to make the mean (location) parameter of the distribution vary over time, but in our analysis this structure did not add significant gains in the Value at Risk fit. Overall, the apparently persistent deviations from the constant-mean and random walk hypotheses for the Bitcoin series seem to be explained by asymmetric conditional volatility processes, and do not actually correspond to permanent changes in the unconditional mean or to some time-varying conditional mean structure, as discussed for example in (Bariviera, 2017), (Aggarwal, 2019) and (Palamalai et al., 2021).

2.2 Forecasting evaluation

To assess the practical relevance of specifications with time-varying higher-order moments, we evaluate the performance of these models in calculating Value at Risk (VaR), a tail risk measure used in risk management. For that, we estimate a series of specifications, with fixed and time-varying moments, and evaluate which models present statistically significant superior performance, using the Model Confidence Set (MCS) proposed by (Hansen et al., 2010), with the implementation of (Bernardi and Catania, 2018).

The MCS algorithm builds a set of models such that the best model, from a predictive point of view, is an element of this set at a given confidence level. It sequentially tests the null hypothesis that the models have identical accuracy, constructing the finite-sample distributions by means of a bootstrap procedure and controlling the False Discovery Rate, as discussed in (Hansen et al., 2010). Based on an elimination criterion, the MCS algorithm selects the best model or set of models.

Let \(Y_t\) be our time series at time point t and \(\hat{Y}_{i, t}\) the fit of the i-th model at t. The first step is to define a loss function \(\ell _{i, t}\) associated with the i-th model, such that:

$$\begin{aligned} \ell _{i, t} = \ell (Y_t, \hat{Y}_{i, t}) \end{aligned}$$
(10)

The procedure begins with the determination of the loss differentials between the M models, in two forms:

$$\begin{aligned} d_{ij, t}&= \ell _{i, t} - \ell _{j, t} \end{aligned}$$
(11)
$$\begin{aligned} d_{i, t}&= \frac{1}{M - 1} \sum _j d_{ij, t} \end{aligned}$$
(12)

where \(\ell _{i, t}\) and \(\ell _{j, t}\) are the losses associated with models i and j. The null and alternative hypotheses are constructed as follows:

$$\begin{aligned}&\text {H}_0 : {\mathbb {E}}(d_i) = 0, \forall i, i = 1, \dots , M \end{aligned}$$
(13)
$$\begin{aligned}&\text {H}_A : {\mathbb {E}}(d_i) \ne 0, \exists i, i = 1, \dots , M \end{aligned}$$
(14)

The goal here is to investigate the null hypothesis of equal predictive ability between models. Two statistics, \(T_{R, M}\) and \(T_{\mathrm{max}, M}\), are used:

$$\begin{aligned} T_{R, M}&= \max \{|t_{ij}|\} \end{aligned}$$
(15)
$$\begin{aligned} T_{\mathrm{max}, M}&= \max \{t_i\} \end{aligned}$$
(16)

Here, \(t_{ij}\) and \(t_i\) are defined as:

$$\begin{aligned} t_{ij}&= \frac{\bar{d}_{ij}}{\sqrt{\hat{\text {var}}(\bar{d}_{ij})}} \end{aligned}$$
(17)
$$\begin{aligned} t_i&= \frac{\bar{d}_i}{\sqrt{\hat{\text {var}}(\bar{d}_i)}} \end{aligned}$$
(18)

\(\hat{\text {var}}(\bar{d}_{ij})\) and \(\hat{\text {var}}(\bar{d}_i)\) are estimated via bootstrap. \(\bar{d}_{ij}\) is the sample version of (11) and \(\bar{d}_i\) is the sample version of (12).

In the scenario where the null hypothesis is not rejected, all models form the MCS. Otherwise, the algorithm uses the following elimination rule:

$$\begin{aligned} e_{\mathrm{max}, M} = \mathop {\mathrm{arg}\,\mathrm{max}}\limits _{i \in M} \left\{ \frac{\bar{d}_i}{\sqrt{\hat{\text {var}}(\bar{d}_i)}}\right\} \end{aligned}$$
(19)

The model with the worst performance is eliminated. Then, the procedure restarts with \(M - 1\) models.
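A simplified sketch of this elimination loop, based on the \(T_{\mathrm{max}, M}\) statistic, is given below. It uses an i.i.d. bootstrap for the variance estimates and the null distribution, whereas the implementation of (Bernardi and Catania, 2018) that we actually use employs a block bootstrap; the function and variable names are ours.

```python
import numpy as np

def mcs_tmax(losses, alpha=0.2, n_boot=5000, seed=0):
    """Simplified Model Confidence Set (Hansen et al., 2010) on a (T, M)
    loss matrix, using the T_max statistic (eq. 16) and an i.i.d.
    bootstrap. Returns the indices of the surviving models."""
    rng = np.random.default_rng(seed)
    T, M_all = losses.shape
    alive = np.arange(M_all)
    while alive.size > 1:
        L = losses[:, alive]
        m = alive.size
        # d_{i,t} = (1/(m-1)) * sum_j d_{ij,t}, cf. eq. (12)
        d_it = (m * L - L.sum(axis=1, keepdims=True)) / (m - 1)
        d_bar = d_it.mean(axis=0)
        # bootstrap resamples of the mean loss differentials
        idx = rng.integers(0, T, size=(n_boot, T))
        d_bar_boot = d_it[idx].mean(axis=1)                 # (n_boot, m)
        var_hat = ((d_bar_boot - d_bar) ** 2).mean(axis=0) + 1e-12
        t_i = d_bar / np.sqrt(var_hat)                      # eq. (18)
        t_boot = (d_bar_boot - d_bar) / np.sqrt(var_hat)
        stat, stat_boot = t_i.max(), t_boot.max(axis=1)     # T_max, eq. (16)
        p_value = (stat_boot >= stat).mean()
        if p_value >= alpha:
            break                                           # H0 not rejected
        alive = np.delete(alive, np.argmax(t_i))            # elimination rule (19)
    return alive
```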

Our main goal is to analyze the importance of assuming time-varying higher moments, and to find the best specification for this process, using a metric of economic utility derived from these specifications. We therefore evaluate the predictive performance of the proposed models in risk management problems, an especially relevant task for cryptocurrencies, given the high volatility and large price variations observed in these assets. Specifically, we analyze performance in terms of a Value at Risk analysis built from the predictive models, by introducing a loss function related to this metric in the construction of the Model Confidence Set, as discussed in (Ardia et al., 2016), whose notation we follow. The main idea is to compare the VaR properties via backtesting procedures, checking the correct coverage, conditional and unconditional, of the left tail of the Log return distribution.

Correct unconditional coverage (UC) of VaR estimates was first considered by (Kupiec, 1995), and a Conditional Coverage (CC) version was proposed by (Christoffersen, 1998). UC considers correct coverage of the left tail of the marginal Log return distribution, \(f(r_t)\), while CC deals with the conditional density, \(f(r_t | {\mathcal {I}}_{t - 1})\). From an inferential perspective, UC compares the number of realized VaR violations observed in the data with the expected number of violations, \(\alpha H\), implied by the chosen confidence level \(\alpha \) over the forecast period. Here, H is the out-of-sample period length.
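As an illustration, the UC test of (Kupiec, 1995) amounts to a likelihood ratio on the violation count; the minimal sketch below assumes the count is strictly between 0 and H, and the function name is ours.

```python
import numpy as np
from scipy.stats import chi2

def kupiec_uc(returns, var_forecasts, alpha):
    """Kupiec (1995) unconditional coverage test: a likelihood ratio
    comparing the observed VaR violation rate with the nominal level
    alpha. Assumes 0 < number of violations < H. Returns the LR
    statistic and its asymptotic chi2(1) p-value."""
    r, v = np.asarray(returns), np.asarray(var_forecasts)
    hits = r < v                                # VaR violation indicator
    H, x = len(hits), int(hits.sum())
    pi_hat = x / H                              # realized violation rate
    # binomial log-likelihoods under H0 (rate alpha) and the observed rate
    ll0 = (H - x) * np.log(1 - alpha) + x * np.log(alpha)
    ll1 = (H - x) * np.log(1 - pi_hat) + x * np.log(pi_hat)
    lr = -2.0 * (ll0 - ll1)
    return lr, chi2.sf(lr, df=1)
```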

Due to the considerable number of possible models for VaR prediction, more than one model may achieve correct UC/CC coverage. This makes model comparison techniques necessary in order to choose the best model or set of models. Such a choice can be made via loss functions, which is especially convenient here since a loss function is needed for the MCS analysis. As we are analyzing a tail risk measure, a natural class of loss functions can be derived from quantile regression methods. The Quantile Loss (QL) used in quantile regression (Koenker and Bassett, 1978) is one of the most frequent choices in the VaR context. Formally, given a VaR prediction at confidence level \(\alpha \) for time \(t + 1\), the Quantile Loss \(QL_{t + 1} (\alpha )\) is defined as:

$$\begin{aligned} QL_{t + 1} (\alpha ) \equiv (\alpha - d_{t + 1}) (r_{t + 1} - VaR_{t + 1} (\alpha )) \end{aligned}$$
(20)

Here, \(d_{t + 1}\) is an indicator function equal to one if \(r_{t + 1} < VaR_{t + 1} (\alpha )\) holds and zero otherwise. In our exercise, \(\alpha = 5\%\) and \(1\%\).

Note that QL is an asymmetric loss function, as in (González-Rivera et al., 2004) and (Bernardi and Catania, 2018), penalizing more heavily, with weight \(1 - \alpha \), the observations for which returns exceed the VaR. Quantile losses are averaged over the forecasting period, and models with lower averages are preferred; that is, if \(QL_1/QL_2 < 1\), then model 1 outperforms model 2, and vice versa.
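In code, the average of the loss (20) over a forecast path amounts to a few lines; the sketch below is illustrative, with our own function name.

```python
import numpy as np

def average_quantile_loss(returns, var_forecasts, alpha):
    """Average asymmetric quantile loss (eq. 20) of a VaR forecast path.
    Violations (r < VaR) are weighted by 1 - alpha, non-violations by
    alpha, so both branches of (alpha - d) * (r - VaR) are positive."""
    r, v = np.asarray(returns), np.asarray(var_forecasts)
    d = (r < v).astype(float)        # violation indicator d_t
    ql = (alpha - d) * (r - v)       # eq. (20), per period
    return ql.mean()
```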

Other loss functions could be used in the performance analysis. It would be possible, for example, to analyze the predictive performance of the different GAS specifications in forecasting future volatility, using the Root Mean Squared Error between predicted and realized volatility, as is common in studies of the predictive performance of conditional volatility models. However, our central focus is performance in predictive VaR calculation, and in this respect the asymmetric Quantile Loss function is adequate, as it focuses on the direct performance of the calculated VaR and imposes an asymmetric penalty consistent with the practical use of this model in risk management. Thus, we use a loss function that directly targets the objective of the work.

As discussed in (Abad et al., 2015), other loss functions can be used in VaR analysis, with different weights for deviations from the predicted VaR, such as the quadratic loss function of (Lopez, 1999). Our choice of the asymmetric loss function is motivated by the fact that it penalizes more heavily the observations that exceed the VaR, which is consistent with the impact in terms of tail risk, where extreme observations exceeding the VaR represent more negative impacts for a long position in the asset. (Abad et al., 2015) discuss some robustness properties of VaR assessment results with respect to the loss function used.

2.3 Empirical application

In this article, our empirical exercise consists of the following components:

  • We estimate a benchmark GAS model for Bitcoin daily US dollar Log returns. We assume a constant mean (location) and allow the standard deviation, skewness and shape parameters to be time-varying. For this, we use Skew-Student-t innovations (Fernandez and Steel, 1998). Note that by making the scale (volatility), skewness and shape time-varying for this distribution, we also obtain, as a consequence, a time-varying kurtosis, which is useful for modeling variations in the risk of extremes in the series. This model is used to illustrate the empirical characteristics of the Bitcoin series in relation to possible changes in conditional higher-order moments.

  • Using the MCS algorithm, we evaluate several specifications to forecast the one-step-ahead conditional density of Bitcoin returns and construct predictive 5% and 1% Value at Risk measures. We use statistical distributions from (Ardia et al., 2019). The choice of distributions is determined by two main aspects. The first is the empirical fit to financial time series data. We start from the Gaussian distribution, the benchmark distribution in finance; we then move to a heavy-tailed distribution (Student-t), add the possibility of skewness in these two distributions (Skew-Normal and Skew-t), and then consider distributions with additional shape parameters (Asymmetric Student-t with one and two decay parameters and the Asymmetric Laplace Distribution), which allow more flexible modeling of skewness and kurtosis. The second aspect is computational complexity and stability. Some distributions used in the modeling of financial time series present difficulties in computational representation, such as Stable distributions (Nolan, 2003), and others present stability and convergence problems in the estimation process, such as the generalized hyperbolic distribution, an interesting class of distributions that allows for skewness and kurtosis; (Chu et al., 2015) discuss the use of this distribution in modeling Bitcoin returns. The univariate distributions used are listed in Table 1. We analyze these distributions with fixed and time-varying parameters.

  • To perform the MCS analysis, we need a suitable loss function based on the predictive performance of the analyzed models. In this article we follow the same strategy as (Ardia et al., 2016), i.e., we use the GAS models to calculate the one-step-ahead predictive Value-at-Risk (VaR). We then build a loss matrix containing the VaR losses associated with each of the different GAS specifications, using the asymmetric Quantile Loss function ((González-Rivera et al., 2004) and (Bernardi et al., 2014)) discussed in the previous section.

  • In this paper, we consider one-step-ahead \(5\%\) and \(1\%\) VaR predictions for the last 365 and 1095 days in the sample, corresponding to approximately one and three years of forecasted observations. We use two ways of building the predictions, with moving (rolling) and recursive windows defining the sample used in the estimation of the models. For both the moving and the recursive windows, we also use two specifications for updating the estimation window, re-estimating every 1 and every 22 days (a sketch of this forecast loop is given after this list). For each model, we also evaluate specifications with all moments fixed in time, with a time-varying scale parameter and fixed skewness and shape parameters, and with the scale and the other parameters (skewness, shape 1 and shape 2, depending on the distribution used) all time-varying, using a first-order GAS dynamic for the time-varying parameters. VaR is calculated directly from the quantile of the specified conditional distribution. The results are presented in Sect. 4.
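The out-of-sample loop can be sketched as follows; `fit_gas` and its `var_forecast` method are hypothetical placeholders standing in for the estimation and forecasting routines of (Ardia et al., 2019).

```python
import numpy as np

def oos_var_path(returns, h_out, alpha, fit_gas, refit_every=22, rolling=True):
    """One-step-ahead out-of-sample VaR forecasts for the last h_out days.
    `fit_gas` is a hypothetical estimation routine returning a fitted model
    with a .var_forecast(history, alpha) method for the next-day VaR
    quantile. rolling=True uses a fixed-length moving window; False uses a
    recursive (expanding) window. Re-estimation occurs every `refit_every`
    days (1 or 22 in our exercise)."""
    n = len(returns)
    start = n - h_out                       # first forecast origin
    var_path = np.empty(h_out)
    model = None
    for t in range(start, n):
        if (t - start) % refit_every == 0:
            lo = t - start if rolling else 0
            model = fit_gas(returns[lo:t])  # re-estimate on current window
        var_path[t - start] = model.var_forecast(returns[:t], alpha)
    return var_path
```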

Table 1 Table shows the analyzed statistical distributions

3 Data

As discussed in the introduction, we use Bitcoin US dollar price data in this study. The data source is Coinmetrics. As discussed in (Urquhart, 2021), Coinmetrics is a trusted source of cryptocurrency data, as it only uses data from reputable exchanges and applies a set of 35 filtering criteria to eliminate illiquid and unreliable transactions; these data have been used in several other studies on cryptocurrencies, such as (Liu and Tsyvinski, 2020), (Tsang and Yang, 2021), (Conlon et al., 2021) and (Urquhart, 2021), among others.

We use daily data with regular observations over time as it is the most common way of calculating performance and risk in financial assets. It is possible to work with irregularly spaced data from Bitcoin tick-by-tick transactions, and this is a proposed extension to our article. A discussion of GAS models in high-frequency and irregularly spaced data contexts can be found in (Buccheri et al., 2021).

The 2784 days in the sample cover the period from January 2013 through August 2020. The return \(r_t\) is calculated as \(r_t = \log (P_t / P_{t - 1})\), where \(P_t\) is the Bitcoin US dollar price on day t. As in (Chaim and Laurini, 2018), we present in Table 2 the descriptive statistics of Gold, S&P 500 and USD-EUR exchange rate returns for comparison. The average return is close to zero. An interesting result is the 5.70% standard deviation, substantially higher than for traditional assets, such as stocks and market indexes. Skewness is negative; the usual interpretation is that the median exceeds the mean. Kurtosis is high, though not the largest among the assets compared. Figure 1 shows the time series of Bitcoin prices and returns used in our study. We can see jumps in the price series and abrupt variations in the return series; the most recent is easy to interpret, being the variation caused by the coronavirus crisis.
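For reference, the return transformation and the moments reported in Table 2 can be computed as below; the function name is ours, and the exact layout of the coinmetrics.io export is an assumption.

```python
import numpy as np
import pandas as pd

def descriptive_stats(prices: pd.Series) -> pd.Series:
    """Log returns r_t = log(P_t / P_{t-1}) and descriptive statistics
    from a daily price series (e.g., a coinmetrics.io export)."""
    r = np.log(prices / prices.shift(1)).dropna()
    return pd.Series({
        "mean": r.mean(),
        "std. dev.": r.std(),
        "skewness": r.skew(),            # third standardized moment
        "excess kurtosis": r.kurt(),     # pandas reports excess kurtosis
        "min": r.min(),
        "max": r.max(),
    })
```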

Table 2 Table shows descriptive statistics for Bitcoin, Gold, S &P 500 index and EUR-USD exchange rate daily Log returns
Fig. 1
figure 1

Figure displays Bitcoin US dollar prices in top panel (a) and daily Log returns in bottom panel (b). Price data was collected from coinmetrics.io

To allow an interpretation of Bitcoin price dynamics, we retrieved from news and the literature the most relevant Bitcoin ups and downs of recent years. Bitcoin started 2013 at around US$ 13.03. The beginning of 2013 was a bullish phase, surpassing the previous historical high of US$ 32.00. In November 2013, we observe dramatic behavior, with an 87% decline in the price. At the end of 2013, the price was close to US$ 1,200.00. In November 2017, four years after Bitcoin first reached four digits, the cryptocurrency soared to US$ 10,000.00 and went to US$ 20,000.00 before losing strength. The five biggest daily falls in the sample are, respectively, 2020-03-12 (-37.1%), 2013-12-18 (-22.9%), 2015-01-14 (-20.45%), 2013-12-06 (-20.43%) and 2013-12-16 (-19.81%), and the biggest daily price gains are, respectively, 2013-04-17 (30.74%), 2013-11-18 (30.40%), 2013-04-12 (29.61%), 2013-12-19 (26.46%) and 2017-07-20 (22.40%).

Fig. 2
figure 2

Figure displays the posterior probability of jumps for the conditional volatility and mean using the double jump model of (Chaim and Laurini, 2018)

To get insight into the presence of jumps in the Bitcoin series, we used the stochastic volatility model with jumps in the conditional mean and volatility proposed in (Chaim and Laurini, 2018) to perform a statistical dating of the jumps. This model uses a compound binomial process framework to incorporate the impact of jumps on the first and second conditional moments. The jumps are interpreted as realizations of two Bernoulli distributions for each observation in the sample, where the parameters of the Bernoulli processes give the probability of a jump in the mean or in the conditional volatility of the process on each day. Jumps in the mean indicate the occurrence of a discontinuous variation in the return of the series, and it is assumed that they affect the return only on the day of occurrence, while jumps in volatility indicate a permanent change in the unconditional mean of the conditional volatility. The model is estimated with Bayesian methods via MCMC; the details are not repeated here for reasons of space, but can be found in the original article (Chaim and Laurini, 2018) or in the multivariate version proposed in (Chaim and Laurini, 2019b).

Box 1: Jump dating using the (Chaim and Laurini, 2018) model.

Dates with posterior probability of jumps larger than 0.5 in volatility:

2013-01-21, 2013-02-27, 2013-03-16, 2013-04-25, 2013-08-14, 2013-10-27, 2014-04-28, 2014-08-25, 2014-12-21, 2015-02-15, 2015-02-25, 2015-05-27, 2016-08-01, 2017-01-24, 2018-06-23, 2019-04-23.

Dates with posterior probability of jumps larger than 0.5 in mean:

2013-10-02, 2013-10-03, 2014-03-03, 2015-08-18, 2016-05-28, 2017-01-11, 2019-04-02, 2020-03-12.

Figure 2 shows the estimated posterior probability of jumps in the mean and in the conditional volatility of the process for the analyzed sample. We can observe a considerable number of days with probabilities greater than 0.5 for both the mean and the volatility. Box 1 lists the days that exceed this threshold: 16 days with estimated jumps in conditional volatility and 8 days with jumps in the mean, according to the model of (Chaim and Laurini, 2018).

4 Results

In this section, we present and discuss the results obtained. First, we show the results for the benchmark GAS model with Skew-Student-t (sstd) innovations and time-varying scale, skewness and shape parameters. We present the estimated model parameters and plots of the estimated conditional moments. Finally, we show the result of the MCS algorithm and present the winning models.

Table 3 Table reports the estimation results for the specification with sstd innovations

In Table 3 we present the estimation results for the model with sstd innovations. The first column shows the parameter estimates, the second the estimated standard errors, and the third and fourth the t values and p values, respectively. In all estimations we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Fletcher, 1987) to maximize the likelihood functions derived from the GAS representation of the models. BFGS is a general-purpose, computationally and memory efficient Quasi-Newton optimization method. The descent direction is determined by preconditioning the gradient with curvature information, through an iterative approximation of the Hessian matrix of the objective function based on a generalized secant method (Fletcher, 1987). BFGS is widely used in numerical maximization in econometric problems due to these good computational properties.
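As an illustration of this step, the GAS(1,1) sketch from Sect. 2.1 could be estimated along the following lines; this is a minimal sketch using a generic BFGS routine, not the actual estimation code of (Ardia et al., 2019), and the starting values are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_gas11(y):
    """Maximum likelihood estimation of the illustrative GAS(1,1) filter
    (gas11_gaussian_logvar, defined in the Sect. 2.1 sketch) by minimizing
    the negative log-likelihood with BFGS."""
    def neg_loglik(theta):
        omega, A, B = theta
        _, ll = gas11_gaussian_logvar(y, omega, A, B)
        return -ll
    res = minimize(neg_loglik, x0=np.array([0.0, 0.1, 0.9]), method="BFGS")
    return res.x, -res.fun  # estimated (omega, A, B) and log-likelihood
```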

Figure 3 plots Bitcoin absolute daily Log returns (black), as a proxy for the unobserved true volatility, and the model's predicted volatility (red). Graphically, we see a good fit of the estimate relative to our volatility proxy. It is clearly a series with large variations, some of them linked to the events discussed in the previous section.

Figure 4 is our main graph of interest. Panel (a) shows the conditional skewness and panel (b) the model-implied conditional kurtosis, derived from the shape parameter, both time-varying. Skewness presents relevant variation in time, being negative in most of the sample. Kurtosis shows even more pronounced time variation, with some periods of very extreme values, which can be related to the presence of jumps at those moments. This indicates not only that the Bitcoin return series is exposed to considerable tail risk, but also that this risk is time-varying.

Fig. 3
figure 3

Absolute Log returns (black color) and predicted volatility (red). These are the estimation results for the model with sstd innovations

Fig. 4
figure 4

Figure displays the predicted skewness of Bitcoin returns in the top panel (a) and the daily predicted kurtosis in the bottom panel (b)

We also performed an in-sample analysis of the performance of the distributions in fitting conditional volatility over the entire sample. We assume a constant mean and all other parameters (skewness and shape parameters) time-varying for all distributions. In this specification we also include the estimation of a GARCH(1,1) model, which is equivalent to a GAS model for a Gaussian distribution with a time-varying scale parameter. Figure 5 shows the predicted volatility of the garch, sstd, std and ald specifications for the Bitcoin returns.

Table 4 Table reports the Mean Error (ME), Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) of the conditional volatility estimated by each model in relation to absolute returns as a proxy for volatility

Table 4 shows the results of this analysis. The specifications with the sstd, std and ald distributions show better results in terms of ME, RMSE and MAE than the GARCH model traditionally used in conditional volatility estimation. In the full-sample analysis, the sstd model obtains the best performance in terms of mean error, and the model with the std distribution the best results in terms of RMSE and MAE. These results support the use of models with alternative distributions and time-varying higher-order moments in estimating the risk, measured by latent volatility, associated with the Bitcoin return series.
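The metrics in Table 4 can be computed as below; the sign convention for the mean error (proxy minus prediction) is our assumption.

```python
import numpy as np

def vol_fit_metrics(abs_returns, vol_pred):
    """ME, RMSE and MAE of predicted volatility against absolute returns
    used as the volatility proxy (as in Table 4). The error is defined
    here as proxy minus prediction, an assumed convention."""
    e = np.asarray(abs_returns) - np.asarray(vol_pred)
    return {"ME": e.mean(),
            "RMSE": np.sqrt((e ** 2).mean()),
            "MAE": np.abs(e).mean()}
```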

Fig. 5
figure 5

Absolute Log returns (black color) and predicted volatility of garch, sstd, std and ald distributions

Tables 5, 6, 7 and 8 show the MCS algorithm results comparing the alternative innovation distributions, window-updating sizes and methods for the predictive Value at Risk analysis. Tables 5 and 6 refer to the analysis for the last 1095 observations (VaR levels of 5% and 1%, respectively), while Tables 7 and 8 refer to the last 365 observations (VaR levels of 5% and 1%, respectively). We report the model rankings in the final Model Confidence Set and the test statistic for each model, based on 5,000 bootstrap replications, a confidence level of 0.2 and the Quantile Loss function discussed in Sect. 2.2, for a total of 76 tested specifications.

The predictive results for the last 1095 observations in the sample indicate a relevant reduction in the number of models in the model confidence set. For the 5% VaR level, 32 models are eliminated from the final MCS, and for the 1% level, 31 models are eliminated. Gaussian models with constant volatility are eliminated in this setting, as expected. Other relevant facts are that there are no notable differences between moving and recursive estimates, nor between re-estimation windows of 1 and 22 days. For the 5% VaR, the ranking points to the best point performance for the snorm specification, although it is important to remember that all models in the MCS are statistically equivalent in terms of performance. In this case, it is also notable that many of the eliminated models are specifications based on the std and sstd distributions.

Table 5 Table shows the MCS algorithm result for the last 1095 observations in the sample, with VaR level of 5%
Table 6 Table shows the MCS algorithm result for the last 1095 observations in the sample, with VaR level of 1%

The results for the 1% VaR level over the last 1095 observations are quite similar to those for the 5% VaR, with the best positions in the ranking for the snorm model, and without notable differences between moving and recursive estimates or between 1 and 22 observations in the definition of new estimation windows. In general, the ordering of models within the MCS shows a predominance of models with time-varying scale parameters, but many statistically equivalent specifications keep the higher-order parameters fixed in time, and the final MCS even includes models with scale and shape parameters fixed in time, such as the std model at position 41 in this ranking. Thus, the results seem to indicate some point gains from models with time-varying higher-order moments, but with performance statistically equivalent to models with higher-order moments fixed in time. In part, this result can be attributed to the nature of cryptocurrency return series, notably Bitcoin, which are characterized by extreme changes in the pattern of risk over time and the presence of price jumps.

Table 7 Table shows the MCS algorithm result for the last 365 observations in the sample, with VaR level of 5%

The predictive analysis for the last 365 observations in the sample, shown in Tables 7 and 8, reveals similarities and differences relative to the previous analysis. There are no notable gains between moving and recursive estimates, nor between 1 and 22 observations in the definition of new windows, and again the best models in the ranking assume time-varying scale parameters and higher-order moments. The reduction in the MCS is smaller for this sample, with the elimination of only 14 models for the 5% VaR and 25 for the 1% VaR. This can be partly explained by the smaller sample size used in this comparison.

Again, the results indicate the importance of time-varying conditional volatility, especially for the Gaussian distribution. In contrast, in this sample the best point performances are obtained by the std distribution, for both the 5% and 1% VaR levels, as opposed to the results for the previous sample. An interesting result involves the less traditional distributions ast, ast1 and ald; in particular, the distribution with the most parameters (ast1) attains only intermediate performance in all analyses. In this regard, we can confirm the large dynamic variation in the Bitcoin series, which makes it difficult to select a single best model or specification for this series, again reinforcing the complexity of cryptocurrency risk management.

Table 8 Table shows the MCS algorithm result for the last 365 observations in the sample, with VaR level of 1%

This analysis provides evidence that distributions with time-varying skewness and kurtosis bring predictive gains in risk management, although it is difficult to establish a single best specification, since the MCS results indicate many models with statistically equivalent performance. We also observe that the results depend on the sampling period used, and that there are no relevant differences between moving and recursive estimates or between short (1 day) and longer (22 days) windows in the definition of the new estimation sample.

To compare the results obtained for the Bitcoin series with a traditional asset, we replicated the same analysis using returns on the S&P 500 index over the same sample period. Table 9 shows the results of the MCS applied to the 5% VaR prediction for the last 1095 observations of the sample, making the results comparable to Table 5. The results for the 1% VaR have the same overall pattern and are therefore not reported here. Compared with the results for Bitcoin, more models are included in the final MCS, showing that the specific characteristics of the distributions and time-varying parameter specifications are less important for the S&P 500 series. The models with the best positions in the final MCS for this series are Asymmetric Student-t distributions with one decay parameter, but always with skewness and/or shape parameters fixed in time, while in the same analysis for Bitcoin the best models are from the Skew-Normal family with time-varying scale and skewness parameters. It thus seems that making higher-order moments time-varying is more relevant for Bitcoin than for a more traditional asset like the S&P 500, remembering that all models included in the final MCS are statistically equivalent and that this analysis is based on point results obtained with the specific sample used.

Table 9 Table shows the MCS algorithm result for the last 1095 observations in the sample for the S&P 500 index, with VaR level of 5%

5 Conclusion

Cryptocurrencies are a new and relevant class of financial assets, with empirical characteristics quite different from traditional assets such as stocks or bonds. These assets have more complex risk dynamics, with the presence of high volatility, relevant asymmetries and especially a high tail risk, associated with extreme variations in returns. These characteristics make the empirical modeling and risk management associated with these assets challenging problems.

In this work, we analyze the dynamics of Bitcoin returns, using a structure based on Generalized Autoregressive Score (GAS) models that allows the representation and inference of models with time-varying scale, skewness, and shape parameters, for a wide range of alternative distributions. This structure effectively allows the construction of models with conditional volatility, skewness, and kurtosis structures, generalizing the class of conditional volatility models traditionally used in the modeling of financial time series.

To verify the possible predictive gains from the use of these models, we conduct analyses in and out of sample, comparing conditional volatility measures and the comparative performance of these models in the predictive calculation of the most important tail risk measure, Value at Risk. Using an asymmetric loss function derived from the VaR estimation results, we compare these models through the construction of Model Confidence Sets, using different configurations with fixed and time-varying parameters, two window update configurations and also the use of moving and recursive estimates.

The results obtained indicate some gains from models with a dynamic structure for higher-order moments. In the in-sample analysis of conditional volatility, the use of a Student-t distribution with a time-varying shape parameter, inducing an autoregressive conditional kurtosis structure, led to the best results in terms of root mean squared error and mean absolute error relative to absolute returns, used as a proxy for latent volatility.

In the out-of-sample analyses comparing the predictive performance of the analyzed distributions, with fixed and time-varying parameters, the results are more heterogeneous. While the results generally indicate specific gains from models with time-varying skewness and shape parameters, the final model confidence sets also include models with some of the parameters related to higher-order moments held fixed, and different orderings depending on the period analyzed. The results support the importance of models with conditional volatility in the estimation of Value at Risk, a traditional result of the risk modeling literature, but also indicate possible gains from models with flexible structures for higher-order moments in risk estimation when we look at the point estimates.

We believe that the GAS methodology can also be used in three further lines of research in cryptocurrency analysis. The first direct extension is the use of multivariate GAS models for risk assessment in cryptocurrency portfolios. A second direct extension is the use of GAS models in the analysis of high-frequency cryptocurrency data, a particularly interesting setting due to the specific mechanisms of decentralized and asynchronous trading in these markets, using the formulation proposed in (Buccheri et al., 2021). A third possible extension is the use of GAS models with regime switching (Bernardi and Catania, 2019), an alternative formulation of models with changing parameters that can capture some stylized facts of the cryptocurrency market, as discussed in (Chaim and Laurini, 2019b).