Abstract
Cryptocurrencies represent a new and important class of investments but are associated with asymmetric distributions and extreme price changes. We use a modeling structure in which higher-order moments (scale, skewness and kurtosis) are time-varying, together with nontraditional innovation distributions, to study the return series of the most important cryptocurrency, Bitcoin. Based on the estimation of a series of Generalized Autoregressive Score (GAS) models, we compare predictive performance using a loss function based on Value at Risk performance.
1 Introduction
Our work presents an analysis of the persistence properties in Bitcoin’s higher-order moments, using a modeling framework based on the representation of conditional densities with time-varying parameters, through the structure of Generalized Autoregressive Score models (Creal et al., 2013). This structure allows a common representation for several models of time-varying parameters, including as particular case GARCH conditional variance models, but also allowing autoregressive dynamics for the parameters of the distribution associated with higher-order moments, generating conditional asymmetry and kurtosis structures.
This common structure is especially relevant for analyzing the conditional distribution of Bitcoin returns, which is characterized by high volatility, extreme price variations and thus a very complex risk structure. There are several differences between Bitcoin and traditional assets, as discussed by (Härdle et al., 2020) and (Petukhina et al., 2020). Like other financial assets, cryptocurrency returns present time-varying conditional volatility structures. However, these assets display greater asymmetry in returns and more extreme values than traditional financial assets, such as stocks and indices.
Bitcoin and other cryptocurrencies are characterized by the presence of negative skewness and other relevant deviations from the class of symmetric distributions, as discussed in (Cerqueti et al., 2020). These characteristics are especially important in investment decisions, since different probabilities for gains and losses violate the fundamental assumption of symmetric returns made in many portfolio allocation and risk management methods (Lezmi et al., 2018; Boudt et al., 2020).
These characteristics can be explained in part by the greater presence of jumps in the prices of these assets. Jumps, defined as price variations of large magnitude that have a low probability of being generated by a continuous process, are notably more frequent in cryptocurrencies than in traditional assets, such as stocks, indices and other financial instruments, as discussed in (Chaim and Laurini, 2018), (Scaillet et al., 2018) and (Bouri et al., 2020). Jump processes generate asymmetry and excess kurtosis in the distribution of financial assets, due to the impact of extreme values on third- and fourth-order moments, as discussed for example in (Manganelli et al., 2008).
The presence of jumps is an important factor in the design of risk management mechanisms and portfolio allocation, making these procedures more complex when compared to traditional assets with low occurrence of jumps. Hedge mechanisms are more limited due to the lower availability of derivative markets for these assets, especially high liquidity options. Futures contracts on Bitcoins were introduced in the Chicago Mercantile Exchange in December 2017, and options in January 2020, as discussed by (Geman and Price, 2020), but they do not yet have sufficient liquidity to manage the extreme risks associated with these assets. See the discussion in (Trimborn et al., 2019) about liquidity aspects in cryptocurrency portfolio allocation, and (Hou et al., 2019) for methods for pricing cryptocurrency options.
Motivated by these features, we estimate Generalized Autoregressive Score (GAS) (Creal et al., 2013) models with distributions that admit time-varying higher moments, using an autoregressive structure for the volatility (scale), skewness and shape parameters of the Bitcoin return series. Our main objective is to verify whether this approach yields predictive gains in a risk management application. Using the Model Confidence Set (MCS) (Hansen et al., 2010) algorithm for model selection, we explore a series of alternative distributions and investigate how the models perform in predicting Value at Risk (VaR) through an asymmetric quantile loss function.
The use of a modeling based on the GAS representation for time-varying conditional densities is a way of simultaneously capturing the presence of conditional volatility, asymmetry and kurtosis with several alternative distributions. Furthermore, it is an indirect way of incorporating the presence of jumps in the conditional distribution of returns, through the impact of jumps on the asymmetry and kurtosis of the process, without the need to directly incorporate the jump structure in the process modeling.
Models with the direct incorporation of the jump structure can be formulated using various representations, for example using structures based on compound binomial processes (e.g., (Chaim and Laurini, 2018) and (Chaim and Laurini, 2019b)), or models using Poisson processes (e.g., (Scaillet et al., 2018) and (Chen and Huang, 2021)). However, these models normally assume that the jump processes are (conditionally) independent, and thus the occurrence of a jump does not affect the probability of a jump occurring in the following periods. This is an important limitation, since the possible persistence in the occurrence of jumps is not incorporated in these models.
However, there is relevant evidence against the hypothesis of independence in the jumps in the Bitcoin series, as pointed out by (Scaillet et al., 2018) using tests for clustering in jumps. It is possible to conjecture that the presence of dependent jumps can explain the presence of autoregressive asymmetry structures and conditional kurtosis in financial time series, by making the impact of extreme values persistent in the third- and fourth-order moments, analogous to the impact of autoregressive structures in the second-order moment, as captured by the GARCH and stochastic volatility models.
The relevant question is how to incorporate the presence of predictability/persistence in the jumps of the process. In (Scaillet et al., 2018), this jump predictability is indirectly modeled through a binomial regression model using a series of covariates. This yields an estimate of the probability of a jump occurring in the next period, but not of the direct impact of the jump on returns, which is what risk management procedures such as the calculation of Value-at-Risk require. Another alternative would be to use a non-homogeneous Poisson model, where the probability of occurrence of jumps is determined by exogenous covariates. This structure has two limitations. The first is the need to obtain relevant covariates for this estimation, which increases the information set needed for risk modeling. The second is that these models do not incorporate the direct possibility of autoregressive dependence in jumps, where the occurrence of a jump in one period directly affects the possibility of jumps in the following periods. Direct dependence in jump occurrence can be incorporated, for example, using a self-exciting Hawkes process (Hawkes, 2018) or Poisson models with stochastic intensity, known as Cox processes (Cox, 1955). The relationship between Hawkes, Cox and other processes is discussed in (Jang and Oh, 2021).
Although these two formulations are interesting, they introduce considerable difficulties in the process of modeling returns and risk, since the use of these models to represent jumps in returns is complex due to jumps being a latent variable, that is, not directly observed, since we only have the series of observed returns. Thus, the use of these models for the modeling of jump activity adds a great complexity to the estimation process, since in these cases the likelihood function depends on latent processes and thus cannot be evaluated directly.
The modeling proposed in this article avoids these complexities by directly modeling the impact of possible jumps, and the persistence of this process, on the conditional volatility, skewness and kurtosis structures. Thus, we do not need complex inference methods, for example Bayesian inference using Markov Chain Monte Carlo methods, for the latent intensity of the jump process: we use the representation of generalized score models for conditional densities with time-varying parameters, and so the likelihood function can be directly evaluated. This formulation directly captures the persistence in the higher-order moments of the Bitcoin series, and thus allows the risk of this asset to be measured by the Value-at-Risk derived from the conditional density obtained from the GAS representation. We therefore have a direct way not only of capturing the existence of skewness and kurtosis in the Bitcoin series, but also of letting these moments vary in time.
The empirical literature on Bitcoin has expanded greatly in recent years. However, attention has been focused on patterns of conditional volatility and the presence of outliers in the return series, and there is a vast literature on long memory patterns for this asset. Regarding the dynamics of volatility, many studies have explored models of the GARCH family. (Dyhrberg, 2016a) and (Dyhrberg, 2016b) argue that it is possible to build hedge strategies from Bitcoin against stocks using these models. To find the best fit for the Bitcoin series, (Katsiampa, 2017) incorporates a long-term volatility component after comparing several GARCH models. Analyzing several cryptocurrencies, (Baur and Dimpfl, 2018) suggest that the best model for the conditional volatility of cryptocurrencies should be an integrated GARCH. The authors support this result with likelihood criteria and unconditional and conditional Value at Risk coverage tests.
Long-memory analysis for cryptocurrencies can be found in a series of papers. (Bariviera et al., 2017) analyzed stylized facts of the Bitcoin series based on the calculation of Hurst exponents, and (Lahmiri et al., 2018) discussed the use of fractional GARCH models. The authors find long-term dependency in several Bitcoin markets. (Charfeddine and Maouchi, 2019) present a series of tests to argue that Bitcoin's volatility has long memory rather than level shifts. See also (Phillip et al., 2019) on this question. Additionally, there are two growing research topics in the cryptocurrency literature: bubbles and market efficiency tests. Market efficiency and cryptocurrency dynamics are intricately connected to the work on long memory, as discussed in (Urquhart, 2016) and (Cheah et al., 2018). For the bubble literature, we can cite (Chaim and Laurini, 2019a), (Cheah and Fry, 2015), (Corbet et al., 2018), (Hafner, 2018) and (Li et al., 2019). Although our work does not directly discuss these issues, it is important to note that structures of time-varying moments can be related to regime changes in parameters and aggregation of distributions in time series. As discussed in (Diebold and Inoue, 2001), these processes can be a mechanism that generates patterns compatible with the presence of long memory, with possible consequences for market efficiency tests and bubbles, as discussed in (Chaim and Laurini, 2019a) and (Chaim and Laurini, 2019b).
We contribute to the literature in two aspects: we have not found any work so far analyzing the dynamics of time-varying higher-order moments in Bitcoin series and the impact on risk management. Despite the use of GAS models for Bitcoin in (Troster et al., 2019), our work innovates in the sense that we are also evaluating our proposal in a context of Value at Risk prediction, and our evaluation is done through the robust inferential method of the Model Confidence Set (Hansen et al., 2010).
The article is divided into five sections, including the present introduction. We have methodology and data presentation in the second and third sections, respectively. Section four presents and discusses our results. Section five concludes.
2 Methodology
This section describes the strategies used for the empirical analysis. The first step consists in estimating Generalized Autoregressive Score (GAS) models with a flexible moment structure, in which not only the standard deviation varies over time, as in the GARCH approach (Bollerslev, 1987), but the skewness and shape parameters are also time-varying, each following a conditional autoregressive representation. As a consequence, the third and fourth moments are time-varying as well.
The second step of our exercise is to check whether there are statistically significant predictive gains from adopting this strategy, using the Model Confidence Set (MCS) approach proposed by (Hansen et al., 2010). The idea is to build a set of models with different distributions and parameter specifications and test, from a predictive point of view, whether there is a winning model or a subset of models with the same statistical performance. For the estimation, we rely on the library developed by (Ardia et al., 2019). The MCS procedure is based on (Bernardi and Catania, 2018).
2.1 Generalized autoregressive score framework
We use the same notation as (Creal et al., 2013) to describe the framework of Generalized Autoregressive Score models. In the original GAS paper (Creal et al., 2013), the authors formulate a general class of observation-driven time-varying parameter models. Let \(y_t\) denote the dependent variable, \(f_t\) the time-varying parameter vector, \(x_t\) a vector of covariates and \(\theta \) a vector of static parameters. Define \(Y^t = \{y_1, \dots , y_t\}, F^t = \{f_0, f_1, \dots , f_t\}\) and \(X^t = \{x_1, \dots , x_t\}\). The available information set at time t consists of \(\{f_t, F_t\}\), such that:
\[F_t = \{Y^{t-1}, F^{t-1}, X^t\}, \quad t = 1, \dots , n. \tag{1}\]
They assume that \(y_t\) is generated by the following observation density:
\[y_t \sim p(y_t | f_t, F_t; \theta ). \tag{2}\]
The updating mechanism for the time-varying parameter \(f_t\) is given by the autoregressive formulation:
\[f_{t+1} = \omega + \sum _{i=1}^{p} A_i s_{t-i+1} + \sum _{j=1}^{q} B_j f_{t-j+1}. \tag{3}\]
Here \(\omega \) is a vector of constants, \(A_i\) and \(B_j\) are coefficient matrices, while \(s_t\) is an appropriate function of past data:
\[s_t = s_t(y_t, f_t, F_t; \theta ). \tag{4}\]
Additionally, the unknown coefficients are functions of \(\theta \): \(\omega = \omega (\theta ), A_i = A_i(\theta )\) and \(B_j = B_j(\theta ), \forall i = 1, \dots , p\) and \(\forall j = 1, \dots , q\). The approach is based on the observation density of \(y_t\) for a given \(f_t\). When an observation \(y_t\) is realized, we update the time-varying \(f_t\) to the next period \(t + 1\) using (3), with:
\[s_t = S_t \cdot \nabla _t, \tag{5}\]
\[\nabla _t = \frac{\partial \ln p(y_t | f_t, F_t; \theta )}{\partial f_t}, \tag{6}\]
\[S_t = S(t, f_t, F_t; \theta ), \tag{7}\]
where \(S(\cdot )\) is a matrix function. In our econometric setup, we are considering \(S_t = I\). It is natural to consider a form of scaling that depends on the variance of the score. For example:
\[S_t = {\mathcal {I}}_{t|t-1}^{-1}, \tag{8}\]
\[{\mathcal {I}}_{t|t-1} = E_{t-1}[\nabla _t \nabla _t^{\prime }]. \tag{9}\]
Equations (2) through (9) define the Generalized Autoregressive Score model with orders p and q, i.e., GAS (p, q). For this particular choice of \(S_t\), the GAS model encompasses a large class of traditional models in the literature, such as the GARCH approach of (Bollerslev, 1987). As mentioned, our estimation is based on (Ardia et al., 2019), by maximum likelihood (ML). A crucial property of GAS models is that, given the past information and the static parameter vector, the vector of time-varying parameters is perfectly predictable, and the log-likelihood function can be evaluated via the prediction error decomposition. Asymptotic properties of maximum likelihood estimation of GAS models can be found in (Harvey, 2011) and (Blasques et al., 2014). The distributions used in this work for modeling returns are described in Sect. 2.3.
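To make the recursion concrete, the following is a minimal sketch of a GAS(1,1) filter for the conditional variance of zero-mean Gaussian returns, assuming inverse-information scaling of the score. It is not the estimation routine of (Ardia et al., 2019); the function names, parameter values and simulated returns are hypothetical and chosen only for illustration. Under these assumptions the scaled score is \(s_t = y_t^2 - f_t\), so the update coincides with a GARCH(1,1) recursion with \(\alpha = A\) and \(\beta = B - A\), which the sketch checks numerically.

```python
import math
import random


def gas_gaussian_variance(y, omega, a, b, f0):
    """Filter a GAS(1,1) conditional variance for zero-mean Gaussian returns.

    f_t is the conditional variance. With inverse-information scaling the
    scaled score is s_t = y_t**2 - f_t, so the update reads
    f_{t+1} = omega + a * s_t + b * f_t.
    Returns the filtered variances (length len(y) + 1) and the log-likelihood.
    """
    f = [f0]
    loglik = 0.0
    for y_t in y:
        f_t = f[-1]
        # Prediction error decomposition: add log p(y_t | f_t).
        loglik += -0.5 * (math.log(2.0 * math.pi * f_t) + y_t ** 2 / f_t)
        score = (y_t ** 2 - f_t) / (2.0 * f_t ** 2)  # d log p / d f_t
        inv_info = 2.0 * f_t ** 2                    # inverse Fisher information
        s_t = inv_info * score                       # equals y_t**2 - f_t
        f.append(omega + a * s_t + b * f_t)
    return f, loglik


def garch11_variance(y, omega, alpha, beta, h0):
    """Standard GARCH(1,1) recursion, used here only as a cross-check."""
    h = [h0]
    for y_t in y:
        h.append(omega + alpha * y_t ** 2 + beta * h[-1])
    return h


# Hypothetical simulated returns and parameter values (illustration only).
random.seed(0)
returns = [random.gauss(0.0, 0.05) for _ in range(500)]
f_path, loglik = gas_gaussian_variance(returns, omega=1e-4, a=0.10, b=0.95, f0=0.0025)
h_path = garch11_variance(returns, omega=1e-4, alpha=0.10, beta=0.85, h0=0.0025)
max_gap = max(abs(u - v) for u, v in zip(f_path, h_path))
```

Because \(f_{t+1}\) depends only on data up to time t, the log-likelihood accumulates in a single pass over the sample, which is the property that makes direct maximum likelihood estimation feasible.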
Although it is possible that at times the process mean varies in time, we do not have consistent evidence for a time-varying mean structure. In the framework of Generalized Autoregressive Score (GAS) models, it is possible and in general trivial to make parameters associated with the non-conditional mean (first moment of the distribution) vary over time, but in our analysis this structure did not add significant gains in the adjustment of Value at Risk. Overall, the apparent persistent deviations from the constant mean and random walk hypothesis for the Bitcoin series seem to be explained by asymmetric processes of conditional volatility and do not actually correspond to permanent changes in the unconditional mean or the presence of some time-varying conditional mean structure, as discussed for example in (Bariviera, 2017), (Aggarwal, 2019) and (Palamalai et al., 2021).
2.2 Forecasting evaluation
To assess the practical relevance of using specifications with time varying higher-order moments, we assess the performance of these models to calculate Value at Risk (VaR), a tail risk measure used in risk management. For that, we will estimate a series of specifications, with fixed and time-varying moments, and we evaluate which models present a statistically significant superior performance, using the Model Confidence Set (MCS) proposed by (Hansen et al., 2010), using the implementation of (Bernardi and Catania, 2018).
The MCS algorithm builds a set of models such that the best model, from a predictive point of view, is an element of this set at a given confidence level. It sequentially tests the null hypothesis that the models have identical accuracy, constructing the finite sample distributions by means of a bootstrap procedure, and controlling for the False Discovery Rate, as discussed in (Hansen et al., 2010). Based on an elimination criterion, the MCS algorithm selects the best model or set of models.
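Let \(Y_t\) be our time series at time t and \(\hat{Y}_{i, t}\) the fit of the i-th model at time t. The first step is to define a loss function \(\ell _{i, t}\) associated with the i-th model, such that:
\[\ell _{i, t} = \ell (Y_t, \hat{Y}_{i, t}). \tag{10}\]
The procedure begins with the determination of the loss differential between the M models in two different forms:
\[d_{ij, t} = \ell _{i, t} - \ell _{j, t}, \quad i, j = 1, \dots , M, \tag{11}\]
\[d_{i, t} = (M - 1)^{-1} \sum _{j = 1}^{M} d_{ij, t}, \quad i = 1, \dots , M, \tag{12}\]
where \(\ell _{i, t}\) and \(\ell _{j, t}\) are the losses associated with models i and j. The null and alternative hypotheses are constructed as follows:
\[H_{0, M}: E(d_{ij, t}) = 0 \ \text {for all } i, j \quad \text {vs.} \quad H_{A, M}: E(d_{ij, t}) \ne 0 \ \text {for some } i, j,\]
\[H_{0, M}: E(d_{i, t}) = 0 \ \text {for all } i \quad \text {vs.} \quad H_{A, M}: E(d_{i, t}) \ne 0 \ \text {for some } i.\]
The goal here is to investigate the null hypothesis of equal predictive ability between models. Two statistics, \(T_{R, M}\) and \(T_{\mathrm{max}, M}\), will be used:
\[T_{R, M} = \max _{i, j} |t_{ij}|, \qquad T_{\mathrm{max}, M} = \max _{i} t_i.\]
Here, \(t_{ij}\) and \(t_i\) are defined as:
\[t_{ij} = \frac{\bar{d}_{ij}}{\sqrt{\hat{\text {var}}(\bar{d}_{ij})}}, \qquad t_i = \frac{\bar{d}_i}{\sqrt{\hat{\text {var}}(\bar{d}_i)}}.\]
\(\hat{\text {var}}(\bar{d}_{ij})\) and \(\hat{\text {var}}(\bar{d}_i)\) are estimated using the bootstrap. \(\bar{d}_{ij}\) is the sample version of (11) and \(\bar{d}_i\) is the sample version of (12).
In the scenario where the null hypothesis is not rejected, all models form the MCS. Otherwise, the algorithm uses the following elimination rule:
\[e_{R, M} = \arg \max _{i} \left\{ \sup _{j} \frac{\bar{d}_{ij}}{\sqrt{\hat{\text {var}}(\bar{d}_{ij})}} \right\} , \qquad e_{\mathrm{max}, M} = \arg \max _{i} \frac{\bar{d}_i}{\sqrt{\hat{\text {var}}(\bar{d}_i)}}.\]
The model with the worst performance is eliminated. Then, the procedure restarts with \(M - 1\) models.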
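The test-and-eliminate loop can be sketched as follows. This is a simplified illustration based on the \(T_{\mathrm{max}, M}\) statistic only, not the implementation of (Bernardi and Catania, 2018): it assumes an i.i.d. bootstrap over time indices (a block bootstrap is normally used for serially dependent losses), and the function name, the significance level and the toy losses are hypothetical.

```python
import math
import random


def model_confidence_set(losses, alpha=0.10, n_boot=500, seed=0):
    """Simplified MCS based on the T_max statistic.

    losses: dict mapping a model name to its list of per-period losses.
    Returns the names of the models that survive the elimination loop.
    Assumption: time indices are resampled independently (i.i.d. bootstrap).
    """
    rng = random.Random(seed)
    alive = list(losses)
    n = len(losses[alive[0]])

    def relative_losses(idx, names):
        # dbar_i = (M - 1)^{-1} * sum_j (mean loss_i - mean loss_j)
        avg = {k: sum(losses[k][t] for t in idx) / len(idx) for k in names}
        total = sum(avg.values())
        m = len(names)
        return {k: (m * avg[k] - total) / (m - 1) for k in names}

    while len(alive) > 1:
        full = relative_losses(range(n), alive)
        boot = []
        for _ in range(n_boot):
            idx = [rng.randrange(n) for _ in range(n)]
            boot.append(relative_losses(idx, alive))
        # Bootstrap standard deviation of each dbar_i.
        sd = {k: math.sqrt(sum((b[k] - full[k]) ** 2 for b in boot) / n_boot)
              for k in alive}
        t_stat = {k: full[k] / sd[k] for k in alive}
        t_max = max(t_stat.values())
        # Bootstrap distribution of T_max under the null (centred statistics).
        boot_max = [max((b[k] - full[k]) / sd[k] for k in alive) for b in boot]
        p_value = sum(bm >= t_max for bm in boot_max) / n_boot
        if p_value >= alpha:
            break  # equal predictive ability is not rejected
        alive.remove(max(t_stat, key=t_stat.get))  # eliminate the worst model
    return alive


# Hypothetical toy losses: model_c is constructed to be clearly worse.
rng = random.Random(1)
base = [rng.random() for _ in range(200)]
toy_losses = {
    "model_a": [b + 0.1 * rng.random() for b in base],
    "model_b": [b + 0.1 * rng.random() for b in base],
    "model_c": [b + 1.0 + 0.1 * rng.random() for b in base],
}
surviving = model_confidence_set(toy_losses)
```

In this toy setting the inferior model is removed in the first round, while the two remaining models are statistically hard to distinguish, so the surviving set may contain one or both of them.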
Our main goal is to analyze the importance of assuming time-varying higher moments, and what is the best specification for this process, using a metric of economic utility derived from these specifications. For this, we will evaluate the proposed models by analyzing the predictive performance of the models when applied to risk management problems, which is an especially relevant problem in the management of cryptocurrencies, due to the high volatility and the large price variations observed in these assets. For this, we will analyze the performance in terms of a Value at Risk analysis built using predictive models, through the introduction of a loss function related to this metric in the construction of the Model Confidence Set, as discussed in (Ardia et al., 2016). We also follow the same notation of this work. The main idea is to compare the VaR properties via backtesting procedures. In this type of strategy, the correct coverage of the left tail of Log returns distribution, conditional and unconditional, is checked.
Correct unconditional coverage (UC) of VaR estimates was first considered by (Kupiec, 1995), and a Conditional Coverage (CC) version was proposed by (Christoffersen, 1998). UC considers correct coverage of the left tail of the marginal log return distribution, \(f(r_t)\), while CC deals with the conditional density, \(f(r_t | {\mathcal {I}}_{t - 1})\). From an inferential perspective, UC looks at the ratio between the number of realized VaR violations observed in the data and the expected number of VaR violations implied by the chosen confidence level \(\alpha \) over the forecast period, \(\alpha H\). Here, H is the out-of-sample period length.
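As an illustration of the UC idea, the sketch below computes the standard likelihood-ratio statistic of the Kupiec test, which compares the observed violation frequency with \(\alpha \). The function name and the example counts are hypothetical and are not taken from the paper's results.

```python
import math


def kupiec_uc(violations, horizon, alpha):
    """Kupiec likelihood-ratio statistic for unconditional coverage.

    violations: number of days with r_t < VaR_t(alpha) in the forecast period.
    horizon:    out-of-sample length H.
    alpha:      VaR level, so the expected number of violations is alpha * H.
    Under correct coverage the statistic is asymptotically chi-square(1).
    """
    def loglik(p):
        # Binomial log-likelihood, with the convention 0 * log(0) = 0.
        out = 0.0
        if violations > 0:
            out += violations * math.log(p)
        if horizon - violations > 0:
            out += (horizon - violations) * math.log(1.0 - p)
        return out

    p_hat = violations / horizon
    return -2.0 * (loglik(alpha) - loglik(p_hat))


# Hypothetical counts: 30 violations in 365 days at the 5% level,
# against an expected number of 0.05 * 365 = 18.25.
lr_example = kupiec_uc(30, 365, 0.05)
reject_at_5pct = lr_example > 3.841  # chi-square(1) critical value at 5%
```

A statistic above the chi-square(1) critical value indicates that the violation frequency is not compatible with the nominal level, in either direction.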
Due to the considerable number of possible models for VaR prediction, more than one model may achieve correct UC/CC coverage. This makes model comparison techniques necessary in order to choose the best model or set of models. Such a choice can be made via loss functions, which is especially useful in our analysis since a loss function is required for the MCS analysis. As we are analyzing a tail risk measure, a possible class of loss functions can be derived from quantile regression methods. The Quantile Loss (QL) used for quantile regressions (Koenker and Bassett, 1978) is one of the most frequent choices in the VaR context. Formally, given a VaR prediction at confidence level \(\alpha \) for time \(t + 1\), the Quantile Loss \(QL_{t + 1} (\alpha )\) is defined as:
\[QL_{t + 1} (\alpha ) = (\alpha - d_{t + 1})(r_{t + 1} - VaR_{t + 1} (\alpha )).\]
Here, \(d_{t + 1}\) is an indicator function equal to one if \(r_{t + 1} < VaR_{t + 1} (\alpha )\) holds and zero otherwise. In our exercise, \(\alpha = 5\%\) and \(1\%\).
Note that QL is an asymmetric loss function, as in (González-Rivera et al., 2004) and (Bernardi and Catania, 2018), that penalizes more heavily, with weight \(1 - \alpha \), the observations for which returns exceed the VaR. Quantile losses are averaged over the forecasting period, and models with lower averages are preferred. That is, if \(QL_1/QL_2 < 1\), then model 1 outperforms model 2 and vice versa.
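A short numerical sketch of this loss, assuming the usual form \(QL_{t+1}(\alpha ) = (\alpha - d_{t+1})(r_{t+1} - VaR_{t+1}(\alpha ))\) with VaR expressed as a (negative) return quantile. The function names and the numbers are hypothetical.

```python
def quantile_loss(r, var, alpha):
    """Quantile loss for one observation.

    r:     realized return.
    var:   VaR forecast at level alpha, expressed as a return quantile.
    alpha: VaR level (e.g. 0.05 or 0.01).
    """
    d = 1.0 if r < var else 0.0  # indicator of a VaR violation
    return (alpha - d) * (r - var)


def average_quantile_loss(returns, var_forecasts, alpha):
    """Average the quantile loss over the forecasting period."""
    losses = [quantile_loss(r, v, alpha) for r, v in zip(returns, var_forecasts)]
    return sum(losses) / len(losses)


# Same distance of 0.02 from the VaR forecast, on either side of it.
loss_violation = quantile_loss(-0.10, -0.08, 0.05)     # weight 1 - alpha
loss_no_violation = quantile_loss(-0.06, -0.08, 0.05)  # weight alpha
```

For the same distance between the return and the VaR forecast, a violation is weighted by \(1 - \alpha = 0.95\) and a non-violation by \(\alpha = 0.05\), which is the asymmetry described above.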
It is feasible to use other loss functions in the performance analysis. It would be possible, for example, to analyze the predictive performance of the different GAS specifications used to predict future volatility, using, for example, the Root Mean Squared Error between predicted volatility and realized volatility, as is quite common in works analyzing the predictive performance of conditional volatility models. However, our central focus is the performance in terms of predictive VaR calculation, and in this aspect the asymmetric Quantile Loss function is adequate as it focuses on the direct performance of the calculated VaR, imposing an asymmetric penalty that is consistent with the practical use of this model in the risk management. Thus, we use a loss function that directly analyzes the objective function of the work.
As discussed in (Abad et al., 2015), there are other loss functions that can be used in the VaR analysis with different weights for deviations from the predicted VaR, such as (Lopez, 1999) quadratic loss function. Our choice for the asymmetric loss function is given by the fact that it penalizes more observations that exceed VaR, which is consistent with the impact in terms of tail risk, where extreme observations exceeding VaR represent more negative impacts for a long position in the asset. (Abad et al., 2015) discuss some robustness properties of VaR assessment results in terms of the loss function used.
2.3 Empirical application
In this article, our empirical exercise consists of the following components:
- We estimate a benchmark GAS model for Bitcoin daily US dollar log returns. We assume a constant mean (location) and allow the standard deviation, skewness, and shape to be time-varying. For this, we use Skew-Student-t innovations (Fernandez and Steel, 1998). Note that by making the scale (volatility), skewness and shape time-varying for this distribution, we also obtain as a consequence a time-varying kurtosis, which is useful for modeling variations in the risk of extremes in the series. This model is used to illustrate the empirical characteristics of the Bitcoin series in relation to possible changes in conditional higher-order moments.
- Using the MCS algorithm, we evaluate several specifications to forecast the one-step-ahead conditional density of Bitcoin returns and construct predictive 5% and 1% Value at Risk measures. We use statistical distributions from (Ardia et al., 2019). The choice of distributions used in the empirical analyses is determined by two main aspects. The first aspect is the empirical fit for financial time series data. We start from the Gaussian distribution, which is the benchmark distribution in finance. We then move on to a distribution with heavy tails (Student-t), add the possibility of skewness to these two distributions (Skew-Normal and Skew-t), and finally consider distributions with additional shape parameters (Asymmetric Student-t with one and two decay parameters and the Asymmetric Laplace Distribution) that allow more flexible modeling of skewness and kurtosis. The second aspect is related to computational complexity and stability. Some distributions used in the modeling of financial time series present difficulties in computational representation, such as Stable distributions (Nolan, 2003), and other distributions present stability and convergence problems in the estimation process, such as the generalized hyperbolic distribution, an interesting class of distributions that allows for skewness and kurtosis. (Chu et al., 2015) discuss the use of this distribution in modeling the distribution of Bitcoin returns. The univariate distributions used are listed in Table 1. We analyze these distributions with fixed and time-varying parameters.
- To perform the MCS analysis we need to define a suitable loss function based on the predictive performance of the analyzed models. In this article we choose the same strategy as (Ardia et al., 2016), i.e., using the GAS models to calculate the one-step-ahead predictive Value-at-Risk (VaR). We are therefore able to build a loss matrix that contains the VaR losses associated with each of the different GAS specifications, using the asymmetric Quantile Loss function ((González-Rivera et al., 2004) and (Bernardi et al., 2014)) discussed in the previous section.
- We consider one-step-ahead \(5\%\) and \(1\%\) VaR predictions for the last 365 and 1095 days in the sample, corresponding to approximately one and three years of forecasted observations. We build the predictions in two ways, using moving (rolling) and recursive windows to define the sample used in the estimation of the models. For both the moving and the recursive windows, we also use two specifications for the change in the estimation window, updating the estimation sample every 1 or 22 days. For each model, we evaluate three specifications: all moments fixed in time; a time-varying scale parameter with fixed skewness and shape parameters; and a time-varying scale together with the other parameters (skewness, shape 1 and shape 2, depending on the distribution used) varying in time, using first-order GAS dynamics for the time-varying parameters. VaR is calculated directly as the quantile of the specified conditional distribution. The results are presented in Sect. 4.
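The difference between the moving and recursive schemes, and the role of the 1-day and 22-day update frequencies, can be illustrated with a small index generator. This is a sketch under our own reading of the design described above; the function name and arguments are hypothetical and do not correspond to the interface of the estimation library.

```python
def forecast_windows(n_obs, n_forecast, refit_every=1, rolling=True):
    """Yield (fit_start, fit_end, target) index triples for one-step forecasts.

    Parameters are fitted on observations fit_start..fit_end-1 (Python slice
    convention) and used to forecast observation `target`.
    rolling=True keeps the estimation sample length fixed (moving window);
    rolling=False keeps fit_start at 0 (recursive, expanding window).
    The estimation sample is refreshed only every `refit_every` forecasts.
    """
    first_target = n_obs - n_forecast
    fit_start, fit_end = 0, first_target
    for k in range(n_forecast):
        target = first_target + k
        if k % refit_every == 0:
            fit_end = target
            fit_start = k if rolling else 0
        yield fit_start, fit_end, target


# 2784 daily observations, last 365 days forecast, sample updated every 22 days.
moving = list(forecast_windows(2784, 365, refit_every=22, rolling=True))
recursive = list(forecast_windows(2784, 365, refit_every=22, rolling=False))
n_refits = len({(s, e) for s, e, _ in moving})
```

With a 22-day update the static parameters are re-estimated far fewer times than with a daily update, while a forecast is still produced for every day in the out-of-sample period.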
3 Data
As discussed in the introduction, this study uses Bitcoin US dollar price data. The source of the data is Coinmetrics. As discussed in (Urquhart, 2021), Coinmetrics is a trusted source of cryptocurrency data, as it only uses data from reputable exchanges and applies a set of 35 filtering criteria to eliminate illiquid and unreliable transactions. These data have been used in several other studies on cryptocurrencies, such as (Liu and Tsyvinski, 2020), (Tsang and Yang, 2021), (Conlon et al., 2021) and (Urquhart, 2021), among other articles.
We use daily data with regular observations over time as it is the most common way of calculating performance and risk in financial assets. It is possible to work with irregularly spaced data from Bitcoin tick-by-tick transactions, and this is a proposed extension to our article. A discussion of GAS models in high-frequency and irregularly spaced data contexts can be found in (Buccheri et al., 2021).
The sample of 2784 days covers the period from January 2013 through August 2020. The return \(r_t\) is calculated as \(r_t = \log (P_t / P_ {t - 1})\), where \(P_t\) is the Bitcoin US dollar price on day t. As in (Chaim and Laurini, 2018), we also present in Table 2 the descriptive statistics of gold, S&P 500 and USD-EUR exchange rate returns for comparison. The average return is close to zero. An interesting result is the 5.70% standard deviation, a substantially higher value compared to traditional assets, such as stocks and market indexes. Skewness is negative, which typically indicates that the median is greater than the mean. Kurtosis has a high value, but it is not the largest. Figure 1 shows the time series of prices and returns of Bitcoin used in our study. We can see jumps in the price series and abrupt variations in the return series. The most recent is easy to interpret: it is the variation caused by the coronavirus crisis.
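The return definition and the descriptive statistics mentioned above can be computed as in the following sketch. The price values are hypothetical, and the moment estimators are the simple (biased) sample versions, which may differ from the ones used to produce Table 2.

```python
import math


def log_returns(prices):
    """r_t = log(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def sample_moments(x):
    """Mean, standard deviation, skewness and kurtosis (not excess kurtosis)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return {
        "mean": mu,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical prices, for illustration only.
prices = [100.0, 110.0, 99.0, 99.0, 120.0]
r = log_returns(prices)
stats = sample_moments(r)
```

Log returns add up over time, so the sum of the daily returns equals the log of the ratio between the last and the first price.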
To allow an interpretation of Bitcoin price dynamics, we have retrieved from news and the literature Bitcoin's most relevant ups and downs in recent years. Bitcoin started 2013 at around US$ 13.033. The beginning of 2013 was a bullish phase, surpassing its historical high of US$ 32.00. In November 2013, we observe a dramatic behavior with an 87% decline in the price. At the end of 2013, the price was close to US$ 1,200.00. In November 2017, 5 years after the four-digit Bitcoin, the cryptocurrency soared to US$ 10,000.00 and went to US$ 20,000.00 before it lost strength. Historically, the five biggest daily falls are, respectively: 3/12/2020 (-37.1%), 12/18/2013 (-22.9%), 14/01/2015 (-20.45%), 12/06/2013 (-20.43%) and 12/16/2013 (-19.81%), and the biggest daily price returns are, respectively: 17/04/2013 (30.74%), 18/11/2013 (30.40%), 12/04/2013 (29.61%), 19/12/2013 (26.46%) and 20/07/2017 (22.40%).
Fig. 2: Posterior probability of jumps in the conditional volatility and mean, using the double jump model of (Chaim and Laurini, 2018).
To gain insight into the presence of jumps in the Bitcoin series, we used the stochastic volatility model with jumps in the conditional mean and volatility proposed in (Chaim and Laurini, 2018) to perform the statistical dating of jumps. This model uses a compound binomial process framework to incorporate the impact of jumps on the first and second conditional moments. The jumps in this model are interpreted as the realization of two Bernoulli distributions for each observation in the sample, where the parameter associated with each Bernoulli process indicates the probability of occurrence of a jump in the mean or in the conditional volatility of the process on each day. In this model, jumps in the mean indicate a discontinuous variation in the return of the series and are assumed to affect the return only on the day the jump occurs, while jumps in volatility indicate a permanent change in the unconditional mean of the conditional volatility. This model is estimated by Bayesian methods using MCMC; the details are not repeated here for reasons of space but can be found in the original article (Chaim and Laurini, 2018) or in the multivariate version proposed in (Chaim and Laurini, 2019b).
Box 1: Jump dating using the (Chaim and Laurini, 2018) model.
Dates with posterior probability of jumps larger than 0.5 in volatility:
2013-01-21, 2013-02-27, 2013-03-16, 2013-04-25, 2013-08-14, 2013-10-27, 2014-04-28, 2014-08-25, 2014-12-21, 2015-02-15, 2015-02-25, 2015-05-27, 2016-08-01, 2017-01-24, 2018-06-23, 2019-04-23.
Dates with posterior probability of jumps larger than 0.5 in mean:
2013-10-02, 2013-10-03, 2014-03-03, 2015-08-18, 2016-05-28, 2017-01-11, 2019-04-02, 2020-03-12.
Figure 2 shows the estimated posterior probability of jumps in the mean and in the conditional volatility of the process for the analyzed sample. We observe a considerable number of days with probabilities greater than 0.5 for both mean and volatility. Box 1 lists the days that exceed this threshold: 16 days with estimated jumps in conditional volatility and 8 days with jumps in the mean, according to the model of (Chaim and Laurini, 2018).
4 Results
In this section, we present and discuss the results obtained. First, we show the results for the benchmark GAS model with Skew-Student-t (sstd) innovations and time-varying scale, skewness and shape parameters. We present the estimated model parameters and plots of the estimated conditional moments. Finally, we show the results of the MCS algorithm and present the winning models.
In Table 3 we present the estimation results for sstd innovations. The first column shows the parameter estimates, the second column the estimated standard errors, and the third and fourth columns the t value and p value statistics, respectively. In all estimations we use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm (Fletcher, 1987) to maximize the likelihood functions derived from the GAS representation of the models. BFGS is a general-purpose Quasi-Newton optimization method that is efficient in both computation and memory. The descent direction is determined by preconditioning the gradient with curvature information, using an iterative approximation of the Hessian matrix of the objective function obtained by a generalized secant method (Fletcher, 1987). BFGS is widely used for numerical maximization in econometric problems because of these computational properties.
Figure 3 plots the absolute daily log returns of Bitcoin (black), as a proxy for the unobserved true volatility, and the volatility predicted by the model (red). Graphically, the estimate fits our volatility proxy well. The series clearly shows large variations, some of them historically linked to the events discussed in the previous section.
Figure 4 is our main graph of interest. Panel (a) shows the conditional skewness and panel (b) the model-implied conditional kurtosis, derived from the shape parameter; both moments are time-varying. Skewness varies considerably over time and is negative in most of the sample. Kurtosis shows even more pronounced time variation, with some periods of very extreme values, which can be related to the presence of jumps in these moments. This indicates that the Bitcoin return series is not only exposed to considerable tail risk, but also that this risk is time-varying.
We also performed an in-sample analysis of how well each distribution fits conditional volatility over the entire sample. We assume a constant mean and let all other parameters (scale, skewness and shape) vary over time for all distributions. In this comparison we also include a GARCH(1,1) model, which is equivalent to a GAS model for a Gaussian distribution with a time-varying scale parameter. Figure 5 shows the predicted volatility of Bitcoin returns from the GARCH model and from the sstd, std and ald distributions.
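The equivalence between GARCH(1,1) and a Gaussian GAS model can be checked numerically. The sketch below assumes a zero-mean Gaussian density, the variance itself as the time-varying parameter, and scaling of the score by the inverse Fisher information, as in (Creal et al., 2013); under other scaling or link choices the match is not exact. In that case the score is (y² − f)/(2f²), the inverse information is 2f², and the scaled score reduces to y² − f, so the GAS recursion f' = ω + A(y² − f) + B·f is a GARCH(1,1) with α = A and β = B − A. Function names and numerical values are hypothetical.

```python
import math


def gas_gaussian_variance(y, omega, a, b, f0):
    """Filter the variance f_t with a Gaussian GAS(1,1) recursion."""
    f = [f0]
    for obs in y[:-1]:
        ft = f[-1]
        score = (obs ** 2 - ft) / (2.0 * ft ** 2)  # d log N(obs; 0, ft) / d ft
        inv_info = 2.0 * ft ** 2                   # inverse Fisher information
        f.append(omega + a * inv_info * score + b * ft)
    return f


def garch11_variance(y, omega, alpha, beta, s0):
    """Filter the variance with the usual GARCH(1,1) recursion."""
    s = [s0]
    for obs in y[:-1]:
        s.append(omega + alpha * obs ** 2 + beta * s[-1])
    return s


if __name__ == "__main__":
    y = [0.01, -0.03, 0.02, 0.05, -0.04, 0.00, 0.015]  # toy returns
    omega, a, b, f0 = 1e-5, 0.10, 0.95, 4e-4
    gas = gas_gaussian_variance(y, omega, a, b, f0)
    garch = garch11_variance(y, omega, alpha=a, beta=b - a, s0=f0)
    # The two variance paths agree up to floating-point error.
    print(all(math.isclose(u, v, rel_tol=1e-9) for u, v in zip(gas, garch)))
```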
Table 4 shows the results of this analysis. We can observe that the specifications with sstd, std and ald distributions show better results in terms of ME, RMSE and MAE in relation to the GARCH model traditionally used in conditional volatility estimates. In the analysis for the full sample, the sstd model obtains the best performance in terms of mean error, and the model with std distribution the best results in terms of RMSE and MAE. These results support the use of models with alternative distributions and time-varying high-order moments in risk estimation, measured by latent volatility, associated with the series of Bitcoin returns.
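For reference, the three criteria can be computed as in the following sketch, where absolute returns play the role of the volatility proxy. The sign convention for the mean error (prediction minus proxy) and the function name `fit_metrics` are assumptions made for illustration and may differ from the convention used in Table 4.

```python
import math


def fit_metrics(proxy, predicted):
    """Return (ME, RMSE, MAE) of predicted volatility against a proxy."""
    if len(proxy) != len(predicted) or not proxy:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(proxy, predicted)]  # prediction minus proxy
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    return me, rmse, mae


if __name__ == "__main__":
    returns = [0.010, -0.030, 0.020, -0.050]        # toy daily log returns
    proxy = [abs(r) for r in returns]               # absolute-return proxy
    predicted = [0.015, 0.020, 0.025, 0.030]        # toy volatility predictions
    print(fit_metrics(proxy, predicted))
```

A mean error close to zero indicates little systematic over- or under-prediction, while RMSE and MAE measure the typical size of the deviations, with RMSE penalizing large misses more heavily.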
Tables 5, 6, 7 and 8 show the results of the MCS algorithm comparing the alternative innovation distributions, updating window sizes and estimation methods in the predictive Value at Risk analysis. Tables 5 and 6 refer to the analysis for the last 1095 observations (VaR levels of 5% and 1%, respectively), while Tables 7 and 8 refer to the last 365 observations (VaR levels of 5% and 1%, respectively). We report the model rankings in the final Model Confidence Set and the test statistic for each model, based on 5,000 bootstrap replications, a confidence level of 0.2 and the Quantile Loss function discussed in Sect. 2.2, for a total of 76 tested specifications.
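As a reminder of how such a loss scores a VaR forecast, the sketch below uses the standard check (pinball) loss of (Koenker and Bassett, 1978), which we assume corresponds to the Quantile Loss of Sect. 2.2; the function names are hypothetical. A return below the VaR forecast (a violation) is weighted by 1 − α, while a return above it is weighted by α, which is what makes the loss asymmetric.

```python
def quantile_loss(ret, var, alpha):
    """Check loss for one observation: (alpha - 1{ret < var}) * (ret - var)."""
    hit = 1.0 if ret < var else 0.0
    return (alpha - hit) * (ret - var)


def mean_quantile_loss(returns, var_forecasts, alpha):
    """Average check loss over an out-of-sample period."""
    if len(returns) != len(var_forecasts) or not returns:
        raise ValueError("inputs must be non-empty and of equal length")
    losses = [quantile_loss(r, v, alpha) for r, v in zip(returns, var_forecasts)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    alpha = 0.05
    # Violation: the return falls 5 points below the VaR forecast.
    print(quantile_loss(-0.10, -0.05, alpha))
    # No violation: the return is 7 points above the VaR forecast.
    print(quantile_loss(0.02, -0.05, alpha))
```

In this toy example the violation misses the VaR by a smaller distance than the non-violation but receives a much larger loss, because the weight on violations is 0.95 against 0.05.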
The predictive results for the last 1095 observations in the sample indicate a relevant reduction in the number of models in the model confidence set. For VaR at the 5% level, 32 models are eliminated from the final MCS, and at the 1% level, 31 models are eliminated. As expected, Gaussian models with constant volatility are eliminated in this structure. There are also no notable differences between moving and recursive estimates, or between re-estimation windows of 1 or 22 days. For the VaR of 5%, the ranking points to a better point performance for the snorm specification, although it is important to remember that all models in the MCS are statistically equivalent in terms of performance. In this case, it is also notable that many of the eliminated models are specifications based on the std and sstd distributions.
The results for the VaR at the 1% level for the last 1095 observations are quite similar to those for the VaR of 5%: the snorm model takes the best positions in the ranking, and there are no notable differences between moving and recursive estimates or between the use of 1 or 22 observations in the definition of new estimation windows. In general, models with time-varying scale parameters predominate in the ordering within the MCS. However, many statistically equivalent specifications keep higher-order parameters fixed in time, and the final MCS even includes models with fixed scale and shape parameters, such as the std model at position 41 in this ranking. Thus, the results seem to indicate some specific gains from models with time-varying higher-order moments, but with a performance statistically equivalent to that of models with higher-order moments fixed in time. In part, we can interpret this result as a consequence of the nature of cryptocurrency returns, and of Bitcoin in particular, which are characterized by extreme changes in the pattern of risk over time and by the presence of price jumps.
The predictive analysis for the last 365 observations in the sample, shown in Tables 7 and 8, reveals similarities to and differences from the previous analysis. There are no notable gains between moving and recursive estimates, or between 1 and 22 observations in the definition of new windows, and again the best models in the ranking have time-varying scale parameters and higher-order moments. The reduction in the MCS is smaller for this sample, with the elimination of only 14 models for the VaR of 5% and 25 for the VaR of 1%. This can be partly explained by the smaller sample size used in this comparison.
Again, the results indicate the importance of time-varying conditional volatility, especially when the Gaussian distribution is used. In contrast to the previous sample, the best point performances in this sample are obtained by the std distribution, for both VaR levels of 5% and 1%. An interesting result is that the less traditional distributions (ast, ast1 and ald), including the distribution with more parameters (ast1), have intermediate performance in all analyses. This confirms the large dynamic variation in the Bitcoin series, which makes it difficult to select a single best model or specification and again reinforces the complexity of cryptocurrency risk management.
This analysis provides evidence that distributions with time-varying skewness and kurtosis bring predictive gains in risk management, although it is difficult to establish a single best specification, since the MCS results indicate many models with statistically equivalent performance. We can also observe that the results depend on the sampling period used, and that there are no relevant differences between moving and recursive estimates or between short (1 day) and longer (22 days) windows in the definition of the new estimation sample.
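The difference between the moving and recursive schemes, and the role of the 1 or 22 observation update, can be made concrete with a sketch that only generates the index ranges involved. It rests on our reading of the design, in which the model is re-estimated every 1 or 22 observations and the last 1095 or 365 observations are forecast out of sample; the function name `estimation_windows` and the total sample size used in the usage example are hypothetical.

```python
def estimation_windows(n_obs, n_oos, refit_every, scheme):
    """List (start, end, forecast_indices) for each re-estimation.

    The estimation sample is observations start..end-1; the fitted model is
    then used to forecast the observations in forecast_indices.
    """
    if scheme not in ("moving", "recursive"):
        raise ValueError("scheme must be 'moving' or 'recursive'")
    first_oos = n_obs - n_oos  # index of the first out-of-sample observation
    windows = []
    for origin in range(first_oos, n_obs, refit_every):
        # Moving: fixed-length window that slides forward with the origin.
        # Recursive: window always starts at the first observation and grows.
        start = origin - first_oos if scheme == "moving" else 0
        horizon = list(range(origin, min(origin + refit_every, n_obs)))
        windows.append((start, origin, horizon))
    return windows


if __name__ == "__main__":
    # Hypothetical sample of 2700 days with the last 1095 held out.
    for scheme in ("moving", "recursive"):
        w = estimation_windows(2700, 1095, 22, scheme)
        print(scheme, len(w), "re-estimations; first:", w[0][:2], "last:", w[-1][:2])
```

With a 1-observation update the model is re-estimated before every forecast, so the 22-observation update trades a small loss of information for roughly 22 times fewer estimations.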
To compare the results obtained for Bitcoin with those for a traditional asset, we replicated the same analysis using returns of the S&P500 index over the same sample period. Table 9 shows the results of the MCS applied to the VaR prediction at the 5% level for the last 1095 observations of the sample, making the results comparable to Table 5. The results for VaR (1%) have the same overall pattern and are therefore not reported here. Compared to the results for Bitcoin, more models are included in the final MCS, showing that the specific characteristics of the distributions and of the time-varying parameter specifications are less important for the S&P500 series. The models with the best positions in the final MCS for this series use the Asymmetric Student-t distribution with one decay parameter, but always with skewness and/or shape parameters fixed in time. In the same analysis for Bitcoin, the best models are from the Skew-normal family with time-varying scale and skewness parameters. It therefore seems that making higher-order moments time-varying is more relevant for Bitcoin than for a more traditional asset like the S&P500. It should be remembered, however, that all models included in the final MCS are statistically equivalent and that this analysis is based on the point results obtained with the specific sample used.
5 Conclusion
Cryptocurrencies are a new and relevant class of financial assets, with empirical characteristics quite different from traditional assets such as stocks or bonds. These assets have more complex risk dynamics, with the presence of high volatility, relevant asymmetries and especially a high tail risk, associated with extreme variations in returns. These characteristics make the empirical modeling and risk management associated with these assets challenging problems.
In this work, we analyze the dynamics of Bitcoin returns, using a structure based on Generalized Autoregressive Score (GAS) models that allow the representation and inference of models with time-varying scale, skewness, and shape parameters, for a wide range of alternative distributions. This structure effectively allows the construction of models with conditional volatility, asymmetry, and kurtosis structures, generalizing the class of conditional volatility traditionally used in the modeling of financial time series.
To verify the possible predictive gains from the use of these models, we conduct analyses in and out of sample, comparing conditional volatility measures and the comparative performance of these models in the predictive calculation of the most important tail risk measure, Value at Risk. Using an asymmetric loss function derived from the VaR estimation results, we compare these models through the construction of Model Confidence Sets, using different configurations with fixed and time-varying parameters, two window update configurations and also the use of moving and recursive estimates.
The results obtained indicate some gains from the use of models with a dynamic structure for higher-order moments. In the in-sample analysis for conditional volatility, a Student-t distribution with a time-varying shape parameter, which induces an autoregressive conditional kurtosis structure, led to the best results in terms of root mean squared error and mean absolute error in relation to absolute returns, used as a proxy for latent volatility.
In the out-of-sample analyses comparing the predictive performance of the analyzed distributions, with fixed parameters and variations over time, the results are more heterogeneous. While in general the results indicate specific gains in the use of models with time-varying skewness and shape parameters, the final confidence set models also include models with some fixed parameters related to higher-order moments, and different orderings depending on the analyzed period. The results support the importance of models with conditional volatility in the estimation of Value at Risk, a traditional result of the risk modeling literature, but also indicate possible gains in the use of models with flexible structures for higher-order moments in the estimation of the risk when we look at the point estimates.
We believe that the GAS methodology can also be used in three different lines of research in cryptocurrency analysis. The first direct extension is the use of multivariate GAS models for risk assessment in cryptocurrency portfolios. A second direct extension is the use of GAS models in the analysis of high-frequency cryptocurrency data, a particularly interesting analysis due to the specific mechanisms of decentralized and asynchronous trading in these markets, using the formulation proposed in (Buccheri et al., 2021). A third possible extension is the use of GAS models with regime switching (Bernardi and Catania, 2019), an alternative formulation of models with changing parameters that can capture some stylized facts of the cryptocurrency market, as discussed in (Chaim and Laurini, 2019b).
Data availability
All data from coinmetrics.io.
Code availability
Custom code developed by the authors.
References
Abad, P., Muela, S. B., & Martin, C. L. (2015). The role of the loss function in Value-at-Risk comparisons. Journal of Risk Model Validation, 9(1), 1–19.
Aggarwal, D. (2019). Do bitcoins follow a random walk model? Research in Economics, 73(1), 15–22.
Ardia, D., Boudt, K., & Catania, L. (2016). Value-at-Risk prediction in R with the GAS package. arXiv: Risk Management.
Ardia, D., Boudt, K., & Catania, L. (2019). Generalized Autoregressive Score models in R: The gas package. Journal of Statistical Software, Articles, 88(6), 1–28.
Bariviera, A. F. (2017). The inefficiency of Bitcoin revisited: A dynamic approach. Economics Letters, 161, 1–4.
Bariviera, A. F., Basgall, M. J., Hasperué, W., & Naiouf, M. (2017). Some stylized facts of the Bitcoin market. Physica A: Statistical Mechanics and its Applications, 484, 82–90.
Baur, D. G., & Dimpfl, T. (2018). Asymmetric volatility in cryptocurrencies. Economics Letters, 173, 148–151.
Bernardi, M., Catania, L., & Petrella, L. (2014). Are news important to predict large losses? Technical report, arXiv:1410.6898.
Bernardi, M., & Catania, L. (2018). The model confidence set package for R. International Journal of Computational Economics and Econometrics, 2(8), 144–158.
Bernardi, M., & Catania, L. (2019). Switching generalized autoregressive score copula models with application to systemic risk. Journal of Applied Econometrics, 34(1), 43–65.
Blasques, F., Koopman, S. J., & Lucas, A. (2014). Maximum likelihood estimation for generalized autoregressive score models. Technical Report 14-029/III, Tinbergen Institute Discussion Paper, Amsterdam and Rotterdam.
Bollerslev, T. (1987). A conditionally heteroskedastic time series model for speculative prices and rates of return. The Review of Economics and Statistics, 69(3), 542–547.
Boudt, K., Cornilly, D., Van Holle, F., & Willems, J. (2020). Algorithmic portfolio tilting to harvest higher moment gains. Heliyon, 6(3), e03516.
Bouri, E., Roubaud, D., & Shahzad, S. J. H. (2020). Do Bitcoin and other cryptocurrencies jump together? The Quarterly Review of Economics and Finance, 76, 396–409.
Buccheri, G., Bormetti, G., Corsi, F., & Lillo, F. (2021). A score-driven conditional correlation model for noisy and asynchronous data: An application to high-frequency covariance dynamics. Journal of Business & Economic Statistics, 39(4), 920–936.
Cerqueti, R., Giacalone, M., & Mattera, R. (2020). Skewed non-Gaussian GARCH models for cryptocurrencies volatility modelling. Information Sciences, 527, 1–26.
Chaim, P., & Laurini, M. P. (2018). Volatility and return jumps in Bitcoin. Economics Letters, 173, 158–163.
Chaim, P., & Laurini, M. P. (2019a). Is Bitcoin a bubble? Physica A: Statistical Mechanics and its Applications, 517, 222–232.
Chaim, P., & Laurini, M. P. (2019b). Nonlinear dependence in cryptocurrency markets. The North American Journal of Economics and Finance, 48, 32–47.
Charfeddine, L., & Maouchi, Y. (2019). Are shocks on the returns and volatility of cryptocurrencies really persistent? Finance Research Letters, 28, 423–430.
Cheah, E.-T., & Fry, J. (2015). Speculative bubbles in Bitcoin markets? An empirical investigation into the fundamental value of Bitcoin. Economics Letters, 130, 32–36.
Cheah, E.-T., Mishra, T., Parhi, M., & Zhang, Z. (2018). Long memory interdependency and inefficiency in Bitcoin markets. Economics Letters, 167, 18–25.
Chen, K.-S., & Huang, Y.-C. (2021). Detecting jump risk and jump-diffusion model for Bitcoin options pricing and hedging. Mathematics, 9(20).
Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39(4), 841–862.
Chu, J., Nadarajah, S., & Chan, S. (2015). Statistical analysis of the exchange rate of Bitcoin. PLOS ONE, 10, 1–27.
Conlon, T., Corbet, S., & McGee, R. J. (2021). Inflation and cryptocurrencies revisited: A time-scale analysis. Economics Letters, 206, 109996.
Corbet, S., Lucey, B., & Yarovaya, L. (2018). Datestamping the Bitcoin and Ethereum bubbles. Finance Research Letters, 26, 81–88.
Cox, D. R. (1955). Some statistical methods connected with series of events. Journal of the Royal Statistical Society. Series B (Methodological), 17(2), 129–164.
Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics, 28(5), 777–795.
Diebold, F. X., & Inoue, A. (2001). Long memory and regime switching. Journal of Econometrics, 105(1), 131–159. Forecasting and empirical methods in finance and macroeconomics.
Dyhrberg, A. H. (2016). Bitcoin, gold and the dollar - a GARCH volatility analysis. Finance Research Letters, 16, 85–92.
Dyhrberg, A. H. (2016). Hedging capabilities of Bitcoin. Is it the virtual gold? Finance Research Letters, 16, 139–144.
Fernandez, C., & Steel, M. (1998). On Bayesian modeling of fat tails and skewness. Journal of The American Statistical Association, 93, 359–371.
Fletcher, R. (1987). Practical methods of optimization (2nd ed.). John Wiley & Sons.
Geman, H., & Price, H. (2020). Bitcoin spot and derivatives markets: Searching for completeness. Risk and Decision Analysis, 1–13.
González-Rivera, G., Lee, T.-H., & Mishra, S. (2004). Forecasting volatility: A reality check based on option pricing, utility function, value-at-risk, and predictive likelihood. International Journal of Forecasting, 20(4), 629–645.
Hafner, C. M. (2018). Testing for bubbles in cryptocurrencies with time-varying volatility. Journal of Financial Econometrics, 18(2), 233–249.
Hansen, P., Nason, J., & Lunde, A. (2010). The model confidence set. Econometrica, 79, 453–497.
Härdle, W. K., Harvey, C. R., & Reule, R. C. G. (2020). Understanding cryptocurrencies. Journal of Financial Econometrics, 18(2), 181–208.
Harvey, A. (2011). Dynamic models for volatility and heavy tails: With applications to financial and economic time series (pp. 1–262).
Hawkes, A. G. (2018). Hawkes processes and their applications to finance: a review. Quantitative Finance, 18(2), 193–198.
Hou, A. J., Wang, W., Chen, C. Y., & Härdle, W. K. (2019). Pricing cryptocurrency options: The case of Bitcoin and CRIX. Technical report, SSRN.
Jang, J., & Oh, R. (2021). A review on poisson, cox, hawkes, shot-noise Poisson and dynamic contagion process and their compound processes. Annals of Actuarial Science, 15(3), 623–644.
Katsiampa, P. (2017). Volatility estimation for Bitcoin: A comparison of GARCH models. Economics Letters, 158, 3–6.
Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46(1), 33–50.
Kotz, S., Kozubowski, T., & Podgórski, K. (2001). The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Birkhäuser.
Kupiec, P. H. (1995). Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3(2), 73–84.
Lahmiri, S., Bekiros, S., & Salvi, A. (2018). Long-range memory, distributional variation and randomness of Bitcoin volatility. Chaos, Solitons & Fractals, 107, 43–48.
Lezmi, E., Malongo, H., Roncalli, T., & Sobotka, R. (2018). Portfolio allocation with skewness risk: A practical guide. Technical report, SSRN.
Li, Z.-Z., Tao, R., Su, C.-W., & Lobonţ, O.-R. (2019). Does Bitcoin bubble burst? Quality & Quantity: International Journal of Methodology, 53(1), 91–105.
Liu, Y., & Tsyvinski, A. (2020). Risks and returns of cryptocurrency. The Review of Financial Studies, 34(6), 2689–2727.
Lopez, J. A. (1999). Methods for evaluating value-at-risk estimates. Economic Review - Federal Reserve Bank of San Francisco, 2, 3–17.
Manganelli, S., White, H., & Kim, T.-H. (2008). Modeling autoregressive conditional skewness and kurtosis with multi-quantile CAViaR. Working Paper Series 957, European Central Bank.
Nolan, J. P. (2003). Modeling financial data with stable distributions. In S. T. Rachev (Ed.), Handbook of Heavy Tailed Distributions in Finance. volume 1 of Handbooks in Finance, (pp. 105–130). Amsterdam: North-Holland.
Palamalai, S., Kumar, K. K., & Maity, B. (2021). Testing the random walk hypothesis for leading cryptocurrencies. Borsa Istanbul Review, 21(3), 256–268.
Petukhina, A., Trimborn, S., Härdle, W. K., & Elendner, H. (2020). Investing with cryptocurrencies – evaluating their potential for portfolio allocation strategies. Technical report, SSRN.
Phillip, A., Chan, J., & Peiris, S. (2019). On long memory effects in the volatility measure of cryptocurrencies. Finance Research Letters, 28, 95–100.
Scaillet, O., Treccani, A., & Trevisan, C. (2018). High-frequency jump analysis of the Bitcoin market. Journal of Financial Econometrics, 18(2), 209–232.
Trimborn, S., Li, M., & Härdle, W. K. (2019). Investing with cryptocurrencies–a liquidity constrained investment approach. Journal of Financial Econometrics, 18(2), 280–306.
Troster, V., Tiwari, A. K., Shahbaz, M., & Macedo, D. N. (2019). Bitcoin returns and risk: A general GARCH and GAS analysis. Finance Research Letters, 30, 187–193.
Tsang, K. P., & Yang, Z. (2021). The market for Bitcoin transactions. Journal of International Financial Markets, Institutions and Money, 71, 101282.
Urquhart, A. (2021). Under the hood of the Ethereum blockchain. Finance Research Letters, 102628.
Urquhart, A. (2016). The inefficiency of Bitcoin. Economics Letters, 148, 80–82.
Zhu, D., & Galbraith, J. W. (2010). A generalized asymmetric Student-t distribution with application to financial econometrics. Journal of Econometrics, 157(2), 297–305.
Funding
The authors acknowledge funding from CNPq (303738/2015-4), FAPESP (2018/04654-9) and Capes (Finance Code 01).
Contributions
Equal participation of the two authors in all stages of the work.
Ethics declarations
Conflict of interest
The authors report the absence of any type of conflict of interest.
Additional information
We are grateful for the criticisms and suggestions sent by two anonymous reviewers.
About this article
Cite this article
Vieira, L.I., Laurini, M.P. Time-varying higher moments in Bitcoin. Digit Finance 5, 231–260 (2023). https://doi.org/10.1007/s42521-022-00072-8