1 Introduction

1.1 Background

Satellite rainfall products have recently been widely applied in environmental sciences, engineering forecasting, water resources, and related fields. Rainfall estimates from rain gauges are limited by their low spatial representativeness when compared with satellite rainfall products. In developing countries, rain gauge networks have low density, particularly in mountainous areas where the majority of rainfall takes place. Besides, rain gauge measurements are costly in terms of data monitoring and data maintenance.

It is still common practice in water resources and related studies to interpolate rainfall from point rain gauge measurements. However, this method is expensive because many gauges are needed to capture the typically large rainfall variability, so gauge numbers are often compromised and interpolation can then introduce high bias. Satellite rainfall estimates can fill this gap by providing spatio-temporal data coverage, although processed satellite rainfall products often have substantial biases when compared with gauge measurements, mainly due to their different spatial resolution. Besides, the performance of satellite products differs between areas. For a given study area, it is therefore essential to evaluate the error of different satellite rainfall estimates against the rain gauge network of that area, to choose on that basis an optimal satellite rainfall product for that particular area, and finally to remove the bias of that product.

Recent developments make satellite-based rainfall more accurate, accessible, consistent, and reliable than before (Boushaki et al. 2009; Michaelides et al. 2009). Various approaches have been developed to derive rainfall from satellites such as the Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis (TRMM TMPA) by Simpson et al. (1988) and Huffman et al. (2007), Climate Prediction Center Morphing Method (CMORPH) by Joyce et al. (2004), and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) by Sorooshian et al. (2000).

The performance of different satellite products, typically evaluated by comparing them with ground measurements, differs between geographical regions. For example, Li et al. (2013) concluded that TRMM performed better than PERSIANN and CMORPH in the Yangtze River Basin in China, where the latter two substantially underestimated rainfall. In the Philippines, TRMM also performed better than CMORPH (Jamandre and Narisma 2013). However, in the Illinois River Basin, USA (Behrangi et al. 2011), PERSIANN Adjusted performed better than CMORPH and even better than TRMM TMPA-RT (near real-time product/3B42RT) and TRMM TMPA-V6 (post-real-time research version product/3B42). In contrast, CMORPH and TRMM TMPA-RT performed better than TRMM TMPA-V6 and PERSIANN in a mountainous watershed in Ethiopia (Bitew et al. 2011). Even these few examples show that rainfall products perform differently depending on the area investigated. Therefore, before a satellite rainfall product is selected for an area, the candidate products should be validated there (if this has not been done before), as presented in this study for Bali Island, so that the most suitable and accurate product for that area can be chosen.

Satellite rainfall products have uncertainties attributed to different aspects such as spatial and temporal sampling methods, retrieval error, and geographical and climatic conditions (Seo and Krajewski 2015; Villarini et al. 2008). Spatial sampling bias is present in any satellite rainfall product, due to spatial rainfall integration within an analyzed pixel, and can also be caused by converting a coarse-scale (such as 0.50° and 0.25°) satellite product to a finer scale (downscaling). Temporal sampling bias can be caused by neglecting the diurnal cycle, when satellite visits over a land area or tropical ocean are not frequent enough to catch the process that influences rainfall occurrence. Satellite products such as CMORPH and TRMM combine data from Passive Micro-Wave (PMW) and Infra-Red (IR) sensors. The retrieval methods of the individual data are different and can influence the accuracy of each rainfall product. Some products, such as PERSIANN, incorporate rain gauges to decrease the uncertainty while CMORPH does not. It can be generally concluded that the differences in rainfall retrieval can lead to different rainfall estimation errors over different climatic and geographic regions (Dinku et al. 2009; Gebregiorgis and Hossain 2013).

Despite the obvious dependence of rainfall on many environmental factors such as elevation, humidity, and wind speed and direction, many studies evaluate the performance of satellite rainfall products without constraining them with any such environmental factor. Selected examples of such studies are as follows: Nile Basin (Habib et al. 2012); Iran (Katiraie-Boroujerdy et al. 2013; Moazami et al. 2013); Yangtze River Basin (Li et al. 2013); Ganjiang River Basin, China (Hu et al. 2013); and India (Prakash et al. 2014). A few other studies use a single environmental factor, mostly elevation (Dinku et al. 2008; Dinku et al. 2009; Hu et al. 2014). Only very few studies use more than one environmental factor, such as elevation and climatology zone (Gao and Liu 2013; Liu et al. 2015); elevation, geographical location, and slope condition (Yang and Luo 2014); or elevation, climatology type, and dry/wet season (Gebregiorgis and Hossain 2014).

To our knowledge, the only satellite-based rainfall study in Bali Island was done by As-Syakur et al. (2011). They evaluated TRMM (3B42 and 3B43) product with only three rain gauge stations, applying daily and monthly data over 1998–2002 period, without evaluating performance of other satellites and without constraining the rainfall assessment with environmental factors such as elevation and climatology zone. They concluded that the TRMM product provided a good performance on monthly basis in Bali Island and that it can be used to substitute rainfall from rain gauge observations.

Bali Island has a complex spatio-temporal pattern of rainfall due to variable topography, proximity of the ocean, and tropical island climatic exposure to the Asian-Australian monsoon system. It has an insufficient number of rain gauges to accurately characterize the spatio-temporal rainfall variability that is indispensable for proper management of water resources. Rainfall represents the most important input to water balances and to integrated hydrological models (e.g., GSFLOW) used nowadays in water resource evaluation and management (Hassan et al. 2014). As most such models use daily rainfall as input, there is a need to define the most accurate daily satellite rainfall product for Bali Island that could complement the available rain gauge network to provide the best-possible rainfall input for a distributed hydrological model of Bali Island. In response to that need, the main objective of this study is to validate various satellite rainfall products over Bali Island, accounting for terrain and climatic complexity, in order to select the most accurate daily satellite rainfall product, based on daily rainfall data of 34 rain gauge stations over a 3-year period (1 October 2003 to 30 September 2006). To realize that objective, descriptive and categorical statistics at various assessment scales were used, applying elevation and climatic variability as environmental constraints of the analysis.

The novelty of this research lies in the first validation of satellite rainfall over Bali Island that applies (i) daily ground-rainfall measurements from a large number of rain gauges, i.e., 34 gauges over a 3-year period; (ii) descriptive and categorical statistics, including bias decomposition, of four satellite rainfall products: CMORPH25, CMORPH8, TRMM, and PERSIANN; (iii) three different assessment scales, i.e., the whole island scale, the zonation scale, and the pixel scale; and (iv) two environmental constraints influencing rainfall, i.e., elevation and climate.

1.2 Description of study area

The Indonesian Bali Island is located between 8° 3′ 40″ S and 8° 50′ 48″ S latitude and 114° 25′ 53″ E and 115° 42′ 40″ E longitude. It occupies an area of 5380 km2, being 127.5 km long and 62.5 km wide. The Island has a tropical climate with minimum and maximum annual temperatures (as recorded within 1998–2012), ranging from 24.4 to 30.8 °C.

The Island consists of nine regencies, i.e., Jembrana, Tabanan, Badung, Gianyar, Klungkung, Bangli, Karangasem, Buleleng, and Denpasar. It has complex topography, which comprises six slope classes according to van Zuidam and van Zuidam-Cancelado (1979): flat topography with slope 0–3% (1010.89 km2), undulating topography with slope 3–8% (2427.83 km2), undulating rolling topography with slope 8–14% (1169.33 km2), rolling-hilly topography with slope 14–21% (545.39 km2), hilly steeply dissected topography with slope 21–56% (223.30 km2), and mountainous topography with slope more than 56% (3.26 km2). The altitude of the Bali Island varies from 0 to 3028 m a.s.l. In the middle part, there is the highest, latitudinal mountain range of Gunung Agung that separates Bali Island into two parts, northern narrower and southern broader (Fig. 1), the two with different rainfall characteristics. Based on data in 1984–2009 from Indonesian Agency for Meteorology, Climatology, and Geophysics (locally called BMKG), the average annual rainfalls in the northern and the southern parts are 1761.3 and 2024.5 mm year−1, respectively.

Fig. 1

Bali Island map with two types of grids, 0.25 × 0.25° and 0.0727 × 0.0727°, representing pixels of satellite rainfall products and network of pixels containing rain gauges marked by numbers and specific border lines, black (0.25°) for CMORPH25, TRMM, and PERSIANN and magenta (0.0727°) for CMORPH8 satellite rainfall products

2 Methods

Based on literature review (e.g., Haile et al. 2013; Li et al. 2013; Yang and Luo 2014), four satellite products, i.e., CMORPH25, CMORPH8, PERSIANN, and TRMM, were selected to be validated in this study.

CMORPH is a merged product of Infra-Red (IR) and Passive Micro-Wave (PMW) images using Lagrangian interpolation to morph PMW data sets. It is available around the globe between 60° N and 60° S (Joyce et al. 2010). The original spatial resolution of CMORPH is 0.0727 × 0.0727° lat/lon corresponding to ~ 8 × 8 km grid size at the equator, interpolated to obtain spatial resolution of 0.25 × 0.25° lat/lon corresponding to ~ 27 × 27 km grid size (Janowiak et al. 2005).

TRMM is developed by the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC). TRMM TMPA is available in two products: (i) a real-time version (3B42-RT) and (ii) a gauge-adjusted, post-real-time research version (3B42). Gridded rainfall of TRMM TMPA (3B42), published since 1998, is accumulated mainly at three hourly intervals on a 0.25 × 0.25° lat/lon grid, the same as CMORPH25. In this study, the 3B42 version was used and is further referred to as TRMM. TRMM is available between 50° N and 50° S (Huffman et al. 2010). It is a merged product of IR data from geosynchronous satellites and PMW data from low-orbit satellites, where the IR is used to detect cloud top temperatures and the PMW to observe cloud size and cloud phase based on available hydrometeors.

PERSIANN gridded rainfall is generated every 3 h at 0.25 × 0.25° lat/lon with global coverage between 60° N and 60° S. This product estimates rainfall using artificial neural network of IR brightness temperature every 30 min provided by PMW (Hsu et al. 1997). This data set provides three hourly and six hourly gridded data at 0.25 × 0.25° lat/lon, matching CMORPH25 and TRMM grids.

The three hourly PERSIANN and 30-min CMORPH8 products were aggregated into daily rainfall to be consistent with CMORPH25 and TRMM. The 0.25 × 0.25° lat/lon grid of the coinciding daily rainfall products CMORPH25, PERSIANN, and TRMM (thick-line grid) as well as the 0.0727 × 0.0727° lat/lon grid of CMORPH8 (thin-line grid) are presented in Fig. 1. All the maps and figures were prepared using ILWIS, ArcGIS 10.2.1, and R-software.
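The daily aggregation step can be sketched as follows. This is only an illustration: it assumes the sub-daily product is held as a NumPy array of accumulations (mm per time step) with time as the first axis, starting at the beginning of a day. The function name and array layout are hypothetical and not taken from the study. If a product is distributed as a rain rate (mm h−1), each step would first have to be multiplied by its duration.

```python
import numpy as np

def aggregate_to_daily(accum, steps_per_day):
    """Sum sub-daily rainfall accumulations (mm per step) into daily totals.

    accum: array of shape (n_steps, ...) with time as the first axis.
    steps_per_day: 8 for three hourly data, 48 for 30-min data.
    """
    accum = np.asarray(accum, dtype=float)
    n_steps = accum.shape[0]
    if n_steps % steps_per_day != 0:
        raise ValueError("time axis must cover whole days")
    n_days = n_steps // steps_per_day
    # Reshape to (n_days, steps_per_day, ...) and sum over the within-day axis.
    new_shape = (n_days, steps_per_day) + accum.shape[1:]
    return accum.reshape(new_shape).sum(axis=1)

# Two days of hypothetical three hourly values (mm per 3 h) for one pixel.
three_hourly = np.array([0, 0, 1.5, 2.0, 0, 0, 0, 0,
                         0, 4.0, 0, 0, 0, 0, 0.5, 0])
daily = aggregate_to_daily(three_hourly, steps_per_day=8)
print(daily)  # daily totals in mm per day
```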

To validate the rainfall products, 3-year daily rainfall data from 1 October 2003 to 30 September 2006 from 34 rain gauge stations (Fig. 1), obtained from the Research Center for Water Resources (locally known as PUSAIR), were used. Besides, 13 additional rain gauge stations (Fig. 1) with monthly data extending from 1984 to 2009, obtained from BMKG, were used to define climatology zones (for which the 34 rain gauges were also used). The validation was done using two different statistical methods: descriptive and categorical statistics. The descriptive statistics were implemented separately for wet and dry seasons at three different spatial assessment scales: island, zonation, and pixel scales. The categorical statistics and bias decomposition were implemented for island and pixel scales, without separation into wet and dry seasons.

2.1 Descriptive statistics

The satellite rainfall validation was performed applying the following descriptive statistical measures (Hu et al. 2014; Li et al. 2013): mean error (ME), mean absolute error (MAE), root mean square error (RMSE), accumulated absolute error (AAE), and relative error (RE). The performances of the four daily satellite rainfall products, i.e., CMORPH25, CMORPH8, TRMM, and PERSIANN, were evaluated using daily rainfall records of 34 rain gauges distributed over the island (Fig. 1) within the 3-year period, i.e., in total 1096 days from 1 October 2003 to 30 September 2006. The evaluation was done separately for three wet seasons from October to March (in total 547 days) and for three dry seasons from April to September (in total 549 days), because performance was expected to differ between seasons.
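The day counts quoted above can be checked with a short calendar calculation. The sketch below uses only the study period and the season definition given in the text (wet season October to March, dry season April to September); variable names are illustrative.

```python
from datetime import date, timedelta

start, end = date(2003, 10, 1), date(2006, 9, 30)
wet_months = {10, 11, 12, 1, 2, 3}  # October to March

n_wet = n_dry = 0
day = start
while day <= end:
    if day.month in wet_months:
        n_wet += 1
    else:
        n_dry += 1
    day += timedelta(days=1)

n_total = n_wet + n_dry
print(n_wet, n_dry, n_total)
```

The wet-season total is not simply 3 × 182 because the first wet season contains 29 February 2004.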

2.1.1 Island-scale approach

To evaluate the performance of the different satellite rainfall products over Bali Island according to the island-scale approach, each of the 34 daily gauge rainfall estimates was compared with the corresponding satellite-pixel rainfall estimate (Fig. 1), and temporal averages of the ME, MAE, RMSE, AAE, and RE error measures were calculated over both dry (549 days) and wet (547 days) seasons for each satellite rainfall product. The error measures are defined as follows:

$$ ME=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}\frac{1}{T}{\sum}_{t=1}^T\left({Rs}_{\lambda t}-{Rg}_{\lambda t}\right) $$
(1)

The ME (mean error) ranges from − ∞ to ∞, the best score is 0, and the negative values mean underestimation and positive overestimation.

$$ MAE=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}\frac{1}{T}{\sum}_{t=1}^T\left|{Rs}_{\lambda t}-{Rg}_{\lambda t}\right| $$
(2)

The MAE (mean absolute error) ranges from 0 to ∞ and 0 is the best score.

$$ \mathrm{RMSE}=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}\sqrt{\frac{1}{T}{\sum}_{t=1}^T{\left({Rs}_{\lambda t}-{Rg}_{\lambda t}\right)}^2} $$
(3)

The RMSE (root mean squared error) range is from 0 to ∞ and 0 is the best score.

$$ AAE=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left|{Rs}_{\lambda t}-{Rg}_{\lambda t}\right| $$
(4)

The AAE (accumulated absolute error) range is from 0 to ∞ and 0 is the best score.

$$ RE=\frac{AAE}{AR_g}=\frac{\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left|{Rs}_{\lambda t}-{Rg}_{\lambda t}\right|}{\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left({Rg}_{\lambda t}\right)} $$
(5)

The lower the RE (relative error), the more reliable the tested satellite product is.

In Eqs. 1–5, T is the total number of daily rainfall events t (in three dry seasons T = 3 × 183 = 549 days and in three wet seasons T = 2 × 182 + 183 = 547 days), $n_i$ is the total number of rain gauges (λ) in Bali Island (in this study $n_i$ = 34), $Rg_{\lambda t}$ is the rainfall at rain gauge λ on day t, $Rs_{\lambda t}$ is the satellite rainfall of the pixel matching rain gauge λ on day t, $AR_g$ is the accumulated rainfall averaged over all gauges available in the analyzed area, and $AR_g/T = MR_g$ is the daily mean rainfall of all gauges available in the analyzed area, in this case the whole of Bali Island and all days of the analyzed period (T).
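As an illustration of Eqs. 1–5, the five measures can be computed from two arrays of matching shape. The sketch below assumes complete records (no missing days) stored as NumPy arrays with one row per gauge and one column per day; the function name and the small data set are hypothetical.

```python
import numpy as np

def descriptive_stats(rs, rg):
    """Error measures of Eqs. 1-5 for arrays of shape (n_gauges, T), in mm/day.

    rs: satellite rainfall of the pixel matching each gauge.
    rg: gauge rainfall.
    """
    rs = np.asarray(rs, dtype=float)
    rg = np.asarray(rg, dtype=float)
    diff = rs - rg
    me = diff.mean(axis=1).mean()                    # Eq. 1
    mae = np.abs(diff).mean(axis=1).mean()           # Eq. 2
    rmse = np.sqrt((diff ** 2).mean(axis=1)).mean()  # Eq. 3, per-gauge RMSE averaged
    aae = np.abs(diff).sum(axis=1).mean()            # Eq. 4
    ar_g = rg.sum(axis=1).mean()                     # accumulated gauge rainfall
    re = aae / ar_g                                  # Eq. 5
    return {"ME": me, "MAE": mae, "RMSE": rmse, "AAE": aae, "ARg": ar_g, "RE": re}

# Hypothetical example: 2 gauges, 3 days.
rg = [[0.0, 10.0, 5.0], [2.0, 0.0, 8.0]]
rs = [[1.0, 6.0, 5.0], [0.0, 0.0, 12.0]]
stats = descriptive_stats(rs, rg)
print(stats)
```

Because AAE and $AR_g$ are both sums over the same T days, RE is independent of the length of the period, which is what allows the wet and dry seasons to be compared directly.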

2.1.2 Zonation-scale approach

In zonation-scale approach, two constraints were taken into account, elevation and climatology as in Figs. 2 and 3, respectively. In that approach, similar calculation as in the island-scale approach was done, but attributing gauges to the defined, particular zones.

Fig. 2

Elevation zonation of Bali Island and selected pixels for pixel-scale approach

Fig. 3

Climatology zonation of Bali Island

The elevation zonation was done using Digital Terrain Model (DTM) map downloaded from http://glcf.umd.edu/data/ with resolution 90 × 90 m. Using that DTM, the Bali Island was divided into six elevation intervals of 500 m: 0–500 m a.s.l., 500–1000 m a.s.l., 1000–1500 m a.s.l., 1500–2000 m a.s.l., 2000–2500 m a.s.l., and > 2500 m a.s.l. adapting Liu et al. (2015), because 500-m interval is the relative height difference of relief classes according to van Zuidam and van Zuidam-Cancelado (1979). However, as in this study area, there were no rain gauges above 1000 m a.s.l., finally, only two zones were considered (Fig. 2): zone I (< 500 m a.s.l.) and zone II (> 500 m a.s.l.). Based on relief classification (van Zuidam and van Zuidam-Cancelado 1979), the zone I with average slope < 15% was classified as flat-hilly topography (zone L), while the zone II with > 15%, as hilly-mountainous topography (zone H).

The climatology zonation was assigned based on the Schmidt and Ferguson (1951) tropical climatology classification, applying the Q-ratio of the number of dry to wet months in a certain long period, in this case from 1984 to 2009. According to that classification, a dry month is one with rainfall ≤ 60 mm and a wet month one with rainfall ≥ 100 mm. Eight main climatology types (from A to H) are considered: A (very wet region, tropical rain forest) with 0 ≤ Q < 0.143; B (wet region, tropical rain forest) with 0.143 ≤ Q < 0.333; C (somewhat wet region, deciduous forest in dry season) with 0.333 ≤ Q < 0.6; D (moderate climate, seasonal forest) with 0.6 ≤ Q < 1.0; E (somewhat dry climate, savanna forest) with 1.0 ≤ Q < 1.67; F (dry climate, savanna forest) with 1.67 ≤ Q < 3.00; G (very dry climate, grass) with 3.00 ≤ Q < 7.0; and H (extremely dry climate, grass) with Q ≥ 7.0. The climatology zonation of this study was defined from the 13 monthly and 34 daily rain gauge stations (Fig. 3), applying the mountain range of Gunung Agung, which divides Bali Island into dry northern and wet southern parts, as a climatological constraint.
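The classification rule can be sketched as below, using the dry/wet thresholds and Q intervals exactly as listed in the text. The function name, the handling of a record with no wet months, and the 12-month example record are assumptions made for illustration only.

```python
from bisect import bisect_right

import numpy as np

# Upper Q bounds of types A..G as listed in the text; type H is Q >= 7.0.
UPPER_BOUNDS = [0.143, 0.333, 0.6, 1.0, 1.67, 3.0, 7.0]
TYPES = "ABCDEFGH"

def schmidt_ferguson(monthly_mm, dry_max=60.0, wet_min=100.0):
    """Return (Q, type) from a series of monthly rainfall totals in mm."""
    monthly = np.asarray(monthly_mm, dtype=float)
    n_dry = int((monthly <= dry_max).sum())
    n_wet = int((monthly >= wet_min).sum())
    if n_wet == 0:
        # Assumption: a record without any wet month is treated as type H.
        return float("inf"), "H"
    q = n_dry / n_wet
    # bisect_right makes each lower bound inclusive, e.g. Q = 0.143 gives B.
    return q, TYPES[bisect_right(UPPER_BOUNDS, q)]

# Hypothetical single-year record (mm per month): 4 dry and 7 wet months.
record = [300, 280, 220, 120, 70, 40, 30, 20, 50, 110, 180, 260]
q, zone = schmidt_ferguson(record)
print(round(q, 3), zone)
```

Months with rainfall between the two thresholds count as neither dry nor wet, so they do not enter Q.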

In the zonation approach, the ME, MAE, RMSE, AAE, and RE were estimated per elevation or climatic zone in a similar way as in the island-scale approach, adapting Eqs. 1–5 so that each daily gauge estimate of rainfall was compared with the satellite rainfall estimate of the corresponding pixel within the analyzed zone (also for wet and dry seasons separately), for the $n_z$ rain gauges in that zone (the equivalent of $n_i$ in Eqs. 1–5).

2.1.3 Pixel-scale approach

In the pixel-scale approach, the three pixels with the largest number of rain gauges were analyzed, i.e., pixel 3 and pixel 8, both with 0.25 × 0.25° spatial resolution and including six and five rain gauges, respectively, and pixel A, composed of four pixels each of 0.0727 × 0.0727° spatial extent, together including five rain gauges (Figs. 1 and 2). In the selection of the three pixels, a condition was applied that sea coverage, which possibly has different rainfall conditions than the land, should contribute as little as possible; this is why, for example, pixel 9 (Fig. 1), with six rain gauges but with > 50% of its area occupied by the ocean, was not considered in the pixel-scale approach.

In each of the three selected pixels, the pixel-reference rainfall ($MR_g$) was defined as an average of all rain gauge estimates in that pixel (or set of pixels) at the given day of assessment t as follows (Villarini et al. 2008):

$$ {MR}_g=\frac{1}{n_p}{\sum}_{\lambda =1}^{n_p}\frac{1}{T}{\sum}_{t=1}^T{Rg}_{\lambda t} $$
(6)

where $Rg_{\lambda t}$ is the rainfall measurement at a single rain gauge λ on day t, while $n_p$ is the number of rain gauges in that pixel (the equivalent of $n_i$ in Eqs. 1–5) or in the set of pixels, as in the case of pixel A, where the satellite rainfall estimates of the four composite pixels were also averaged, similarly to the gauges.
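The pixel-scale averaging can be illustrated with a minimal sketch, assuming complete gauge records arranged as one row per gauge and one column per day. The function name and the example values are hypothetical; the same column-wise mean is applied to the four CMORPH8 sub-pixels of a composite pixel such as pixel A.

```python
import numpy as np

def pixel_reference(rg_pixel):
    """Gauge reference rainfall for one pixel.

    rg_pixel: array of shape (n_p, T) with daily gauge rainfall in mm/day.
    Returns the daily reference series (mean over gauges for each day)
    and MRg of Eq. 6 (mean over gauges and days).
    """
    rg_pixel = np.asarray(rg_pixel, dtype=float)
    daily_ref = rg_pixel.mean(axis=0)
    mr_g = rg_pixel.mean(axis=1).mean()
    return daily_ref, mr_g

# Hypothetical pixel with 2 gauges and 3 days.
daily_ref, mr_g = pixel_reference([[0.0, 6.0, 3.0],
                                   [2.0, 4.0, 3.0]])

# Composite pixel: average the four sub-pixel satellite series day by day.
rs_sub = np.array([[0.0, 4.0, 2.0],
                   [2.0, 4.0, 2.0],
                   [0.0, 8.0, 6.0],
                   [2.0, 4.0, 2.0]])
rs_composite = rs_sub.mean(axis=0)
print(daily_ref, mr_g, rs_composite)
```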

2.2 Categorical statistics and bias decomposition

Categorical statistics were used to cross-reference the rainfall frequency of each of the four satellite products against the rain gauge frequency, according to the contingency table (Table 1), involving four event combinations: (i) hit (H): a daily event when rain gauge and satellite both detect rainfall; (ii) miss (M): a daily event when the rain gauge detects rainfall and the satellite product does not; (iii) false alarm (FA): a daily event when the satellite detects rainfall and the rain gauge does not; (iv) correct negative (CN): a daily event when both rain gauge and satellite detect no rainfall.

Table 1 Contingency table of categorical statistics

If we assume that in a given number of days (in this study 1096 days), Ng and Ns are the numbers of days with rain recorded by gauges and satellite, respectively, then Ng = H + M and Ns = H + FA.
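The four event categories can be counted as in the sketch below. The rain/no-rain threshold of 0 mm is an assumption chosen to match the conditions written in Eqs. 8 and 9; the function name and the six-day example are hypothetical.

```python
import numpy as np

def contingency_counts(rs, rg, threshold=0.0):
    """Count hits, misses, false alarms and correct negatives.

    rs, rg: daily satellite and gauge rainfall (mm/day) of equal shape.
    threshold: rainfall above this value counts as a rainy day (assumed 0 mm).
    """
    sat_rain = np.asarray(rs, dtype=float) > threshold
    gauge_rain = np.asarray(rg, dtype=float) > threshold
    return {
        "H": int(np.sum(sat_rain & gauge_rain)),
        "M": int(np.sum(~sat_rain & gauge_rain)),
        "FA": int(np.sum(sat_rain & ~gauge_rain)),
        "CN": int(np.sum(~sat_rain & ~gauge_rain)),
    }

# Hypothetical six-day record for one gauge and its matching pixel.
rg = [0.0, 5.0, 0.0, 2.0, 0.0, 7.0]
rs = [0.0, 3.0, 1.0, 0.0, 0.0, 9.0]
c = contingency_counts(rs, rg)
n_g = c["H"] + c["M"]    # rainy days recorded by the gauge
n_s = c["H"] + c["FA"]   # rainy days recorded by the satellite
print(c, n_g, n_s)
```

Dividing each count by the number of sampled days gives the frequencies used in the categorical analysis.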

The total bias of each satellite rainfall estimation algorithm can be decomposed into three components: hit bias (HB, Eq. 7; with its absolute counterpart, the absolute hit bias AHB, Eq. 7a), miss bias (MB, Eq. 8), and false bias (FB, Eq. 9), as expressed after Haile et al. (2013). Total bias (TB) is the sum of the three bias components, as in Eq. 10.

$$ HB=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left({Rs}_{\lambda t}-{Rg}_{\lambda t}\right) $$
(7)
$$ AHB=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left|{Rs}_{\lambda t}-{Rg}_{\lambda t}\right| $$
(7a)
$$ MB=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left({Rg}_{\lambda t}\right):\left({Rs}_{\lambda t}=0;{Rg}_{\lambda t}>0\right) $$
(8)
$$ FB=\frac{1}{n_i}{\sum}_{\lambda =1}^{n_i}{\sum}_{t=1}^T\left({Rs}_{\lambda t}-{Rg}_{\lambda t}\right):\left({Rs}_{\lambda t}>0;{Rg}_{\lambda t}=0\right) $$
(9)
$$ TB= HB+ MB+ FB $$
(10)

Symbols in Eqs. 7–9 are the same as in Eqs. 1–6. The categorical statistics and bias decomposition were carried out for CMORPH25, CMORPH8, and TRMM, without splitting estimates into dry and wet seasons; the PERSIANN rainfall product was excluded because the descriptive statistics, as explained below, showed it to be the least reliable in Bali Island. All categorical statistics values were calculated as frequencies relative to the total number of sampled days, T = 1096.
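A minimal sketch of the bias decomposition is given below. Two points are assumptions rather than statements of the study's procedure: the sums of Eqs. 7 and 7a are restricted to hit days (satellite and gauge both > 0), since the hit condition is implied by the component name but not written in the equations, and MB is taken as the sum of gauge rainfall on missed days, as Eq. 8 is printed. The function name and example data are hypothetical.

```python
import numpy as np

def bias_decomposition(rs, rg):
    """Bias components after Eqs. 7-10 for arrays of shape (n_gauges, T)."""
    rs = np.asarray(rs, dtype=float)
    rg = np.asarray(rg, dtype=float)
    diff = rs - rg
    hit = (rs > 0) & (rg > 0)     # assumed condition for Eqs. 7 and 7a
    miss = (rs == 0) & (rg > 0)   # condition of Eq. 8
    false = (rs > 0) & (rg == 0)  # condition of Eq. 9

    def gauge_mean(values, mask):
        # Sum over days where mask is True, then average over gauges.
        return np.where(mask, values, 0.0).sum(axis=1).mean()

    hb = gauge_mean(diff, hit)
    ahb = gauge_mean(np.abs(diff), hit)
    mb = gauge_mean(rg, miss)     # Eq. 8 as printed
    fb = gauge_mean(diff, false)
    return {"HB": hb, "AHB": ahb, "MB": mb, "FB": fb, "TB": hb + mb + fb}

# Hypothetical six-day record for a single gauge.
rg = [[0.0, 5.0, 0.0, 2.0, 0.0, 7.0]]
rs = [[0.0, 3.0, 1.0, 0.0, 0.0, 9.0]]
bias = bias_decomposition(rs, rg)
print(bias)
```

In this example the two hit days have errors of opposite sign, so HB cancels to zero while AHB does not, which is the reason for reporting both.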

2.2.1 Island-scale approach

In the island-scale approach, the 34 gauge-specific daily averages of categorical statistics (H, M, FA, and CN) and bias decomposition components (HB, MB, FB, and TB) were averaged over the $n_i$ = 34 available rain gauges.

2.2.2 Pixel-scale approach

In the pixel-scale approach, the validation of the CMORPH25 and TRMM at the 0.25° pixels 3 and 8 and of the CMORPH8 at the composite pixel A were carried out in the same manner as in the island-scale; only in the pixel A, the satellite rainfall was calculated as the average of four CMORPH8 rainfall pixel estimates. The bias decomposition was defined not only for each of the three pixels but also individually for each gauging station of the three pixels to illustrate spatial variability of bias decomposition components in these pixels. Moreover, the 3-year daily wind speed and wind direction data of the three locations (Fig. 1) were used to build wind roses and to overview the performances of satellite products as the wind circulation plays important role in spatio-temporal rainfall variability in tropics (Sato et al. 2009).

Additionally, to evaluate satellite performance in detecting low (≤ 20 mm) and high (≥ 50 mm) rainfall events (BMKG 2010), in low (< 500 m a.s.l.) and high (> 500 m a.s.l.) altitudes, the rainfall of the pixels 3 and 8 was analyzed, applying failure ratio (FR = M/H). The FR was calculated as the ratio of the two most important categorical analysis indicators, i.e., miss (M) and hit (H); obviously, the satellite rainfall product performance is the best if the M = 0 and H = max, i.e. if FR approaches 0.
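The failure ratio can be sketched as follows. The sketch assumes that an event is assigned to the low or high class by the gauge rainfall of that day and that the satellite "detects" the event when its rainfall is above 0 mm; neither detail is specified in the text, and the function name and data are hypothetical.

```python
import numpy as np

def failure_ratio(rs, rg, event_mask):
    """FR = M / H over the gauge days selected by event_mask."""
    rs = np.asarray(rs, dtype=float)
    event_mask = np.asarray(event_mask, dtype=bool)
    hits = int(np.sum(event_mask & (rs > 0)))
    misses = int(np.sum(event_mask & (rs == 0)))
    # FR is undefined when there are no hits; report infinity in that case.
    return misses / hits if hits > 0 else float("inf")

# Hypothetical seven-day record (mm/day).
rg = np.array([1.0, 15.0, 60.0, 0.0, 80.0, 10.0, 55.0])
rs = np.array([0.0, 4.0, 20.0, 2.0, 0.0, 3.0, 0.0])

low_events = (rg > 0) & (rg <= 20)   # low rainfall events, <= 20 mm
high_events = rg >= 50               # high rainfall events, >= 50 mm

fr_low = failure_ratio(rs, rg, low_events)
fr_high = failure_ratio(rs, rg, high_events)
print(fr_low, fr_high)
```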

3 Results

3.1 Descriptive statistics

3.1.1 Island-scale approach

The results of the island-scale approach applying descriptive statistics are presented in Figs. 4, 5, and 6 and Table 2. The boxplots made for rainy and dry seasons (Fig. 4) show that CMORPH25, CMORPH8, and TRMM substantially underestimated daily rainfall, in both rainy and dry seasons, while PERSIANN only slightly underestimated in the rainy seasons and substantially overestimated in the dry seasons.

Fig. 4

Boxplot of daily rainfall events in rainy and dry seasons within 3-year period (1 October 2003 to 30 September 2006) for 34 rain gauges and 4 satellite rainfall products. Y-axis was cut to 35 mm day−1 by sacrificing one, 43.3 mm day−1 outlier of TRMM

Fig. 5

Histograms of mean error (ME) of daily rainfall events per 5 mm day−1 rainfall classes, in rainy season and dry season, within 3-year period (1 October 2003 to 30 September 2006) for 34 rain gauges and 4 satellite rainfall products

Fig. 6

Histograms of root mean square error (RMSE) of daily rainfall events per 5 mm day−1 rainfall classes, in rainy season and dry season, within 3-year period (1 October 2003 to 30 September 2006) for 34 rain gauges and 4 satellite rainfall products

Table 2 Summary of satellite performance on a daily basis at the island scale in the 3-year period from 1 October 2003 to 30 September 2006, for wet and dry seasons separately. ME [mm day−1] mean error; RMSE [mm day−1] root mean square error; MRg [mm day−1] mean rainfall; ARg [mm] accumulated rainfall; AAE [mm] accumulated absolute error; RE [ ] relative error

Considering the statistical distribution of daily rainfall within the three study years, the PERSIANN rainfall product seemed to have the distribution most similar to the gauged rainfall distribution in the rainy season, and therefore also the best performance, although it is shown below how misleading that first impression was.

The performance differences in rainfall detection between each of the four satellite rainfall products and the reference rain gauges (averages of the 34 rain gauges), at the island scale, are shown by histograms of ME in Fig. 5 and by histograms of RMSE in Fig. 6, for rainy and dry seasons separately. In the rainy seasons, all four ME histograms were quite similar and close to normal distribution. In the dry seasons, the similar CMORPH25, CMORPH8, and TRMM histograms were (i) different than in the rainy seasons, being substantially skewed towards negative values (satellite rainfall underestimation), with the most frequent underestimation of the smallest rainfall events, i.e., ME in range from − 5 to 0 mm day−1, and (ii) different than the ME histogram of the PERSIANN, which was skewed towards positive values with the most frequent overestimation of the smallest rainfall events with ME in range from 0 to + 5 mm day−1, but also substantial overestimations of rain events in range 5 to 10 and even 10 to 15 mm day−1.

The RMSE distributions of CMORPH25, CMORPH8, and TRMM are quite similar to each other (Fig. 6) but very different between the rainy and the dry seasons. In the rainy seasons, all four histograms have semi-normal distributions, with the largest RMSE in the range of 15–20 mm day−1, being skewed towards small values. In the dry seasons, the histograms of CMORPH25, CMORPH8, and TRMM are remarkably skewed towards the smallest RMSE of 0–5 mm day−1 range, representing 74, 73, and 71% of the total RMSE, respectively. The dry season histogram of PERSIANN was quite different. The smallest RMSE range of 0–5 mm day−1 represented only 31% while the larger RMSE range of 5–10 mm day−1 was the most frequent (42%) while the next 10–15 mm day−1 range, representing 12%, was mainly responsible for large rainfall overestimation by PERSIANN. The similarities of ME and RMSE histograms of CMORPH25, CMORPH8, and TRMM are likely because these three satellite products retrieve rainfall using PMW and IR, in contrast to PERSIANN, which retrieves rainfall using only IR.

Table 2 presents the error measures (ME, RMSE, AAE, RE) of the island-scale approach, for the wet and the dry seasons separately. In general, the performance of all four satellite products was relatively poor because (i) the RMSE was large compared with $MR_g$; (ii) ME diverged substantially from 0 towards negative values, indicating significant satellite rainfall underestimation; and (iii) AAE was larger than $AR_g$ (Eq. 5), so RE was larger than one, or much larger than one as in the case of PERSIANN (RE > 3 in the dry season), practically disqualifying it. Among the three other satellite products, the relatively best performance was shown by CMORPH25, which had the lowest RMSE, AAE, and RE in the dry and wet seasons, although when only ME is considered, TRMM had the best performance, i.e., was the least divergent from 0. The descriptive statistics at the island scale emphasized reasonable performance of CMORPH25, CMORPH8, and TRMM and much worse performance of PERSIANN, particularly in dry seasons.

3.1.2 Zonation-scale approach

Table 3 presents the error assessment (ME, RMSE, AAE, and RE) of the satellite rainfall products as dependent on elevation, i.e., per two elevation zones, the low (L) altitude zone of < 500 m a.s.l. and the high (H) altitude zone of > 500 m a.s.l., for wet and dry seasons separately; unfortunately, there are no rain gauges in the very high altitude zone > 1000 m a.s.l. In the wet season, when the H-zone received more rain than the L-zone, the RMSE of the H-zone was larger and the ME was more divergent from 0; in the dry season, when the H-zone received less rain than the L-zone, the RMSE of the H-zone was still larger and the ME more divergent from 0 than in the L-zone. In the wet season, better performance of the satellite rainfall products in the H-zone than in the L-zone was marked by lower RE (except for PERSIANN). In contrast, in the dry season, the lower RE in the L-zone (except for PERSIANN, for which the RE was ~ 3 in both L- and H-zones) emphasized better performance of the satellite rainfall products in the L-zone than in the H-zone. The three satellite products, CMORPH25, CMORPH8, and TRMM, performed relatively similarly; CMORPH25 had the lowest RMSE, AAE, and RE, followed by the similarly performing CMORPH8, while TRMM had the ME least divergent from 0. Based on RE (Table 3), it can be concluded that the satellite rainfall products performed better in the wet season than in the dry season. PERSIANN consistently performed better in high (H) altitudes, although its overall performance was weak. CMORPH25, CMORPH8, and TRMM performed better in high (H) than in low (L) altitudes in the wet season, while in the dry season, those satellite rainfall products performed worse in high (H) than in low (L) altitudes.

Table 3 Daily satellite rainfall performance within the 3-year period from 1 October 2003 to 30 September 2006, in the low (L) altitude (< 500 m a.s.l.) zone and in the high (H) altitude (> 500 m a.s.l.) zone, presented separately for wet and dry seasons. ME [mm day−1] mean error; RMSE [mm day−1] root mean square error; MRg [mm day−1] mean rainfall; ARg [mm] accumulated rainfall; AAE [mm] accumulated absolute error; RE [ ] relative error

The performance of the four satellite rainfall products as dependent on climatological zonation of Schmidt and Ferguson (1951) is presented in Table 4. The rainfall of the CMORPH25, CMORPH8, and TRMM at the wetter zones C–E was generally underestimated (negative ME), more in the wet season than in the dry season (larger RMSE), although the RE was smaller in wet than in dry season. In the driest zone F, all the satellite rainfall products overestimated rainfall in both wet and dry seasons, providing high RE > 1.5, higher in dry than in wet season, extremely high in the case of PERSIANN in dry season. However, the performance evaluation in the zone F is uncertain, as it is based on the mean of only three gauges, with very different rainfall estimates.

Table 4 Daily satellite rainfall performance within the 3-year period from 1 October 2003 to 30 September 2006, in climatological zones following the Schmidt and Ferguson (1951) classification, presented separately for wet and dry seasons. ME [mm·day−1] mean error; RMSE [mm·day−1] root mean square error; MRg [mm·day−1] mean rainfall; ARg [mm] accumulated rainfall; AAE [mm] accumulated absolute error; RE [ ] relative error

The spatial distribution of the mean absolute error (MAE) and of the sign of the ME of all four satellite rainfall estimates, compared to the 34 gauge rainfall measurements, is presented in Figs. 7 and 8 for wet and dry seasons, respectively. In those figures, the areas above (H-zone) and below (L-zone) 500 m a.s.l. and the climatological zones are delineated. Even a quick visual assessment of the maps shows that in the wet season (Fig. 7), in the H-zone, all four satellite-based products had relatively high MAE and generally underestimated rainfall, while in the L-zone, MAE values were generally lower, with common, local, small overestimations in coastal areas. In the dry season (Fig. 8; note that, for clarity of illustration, the MAE scale in this figure differs from the wet-season scale in Fig. 7), the MAE values were much lower; in the H-zone, the CMORPH25, CMORPH8, and TRMM rainfall was still underestimated, while in the L-zone, there were fewer coastal overestimates, so in general, these three satellite products underestimated rainfall; in contrast, PERSIANN strongly overestimated rainfall.

Fig. 7

Spatially distributed, daily performance of satellite rainfall by mean absolute error (MAE) estimated over 34 rain gauges for the three wet seasons in 3-year period from 1 October 2003 to 30 September 2006; the diameter of the circle is proportional to MAE, while plus or minus indicates sign of the mean error (ME) representing either satellite rainfall overestimation or underestimation respectively

Fig. 8

Spatially distributed, daily performance of satellite rainfall by mean absolute error (MAE) estimated over 34 rain gauges for the three dry seasons in the 3-year period from 1 October 2003 to 30 September 2006; the diameter of the circle is proportional to MAE, while plus or minus indicates the sign of the mean error (ME), representing either satellite rainfall overestimation or underestimation, respectively

The climatological zonation presented in Figs. 7 and 8 generally follows the topography, i.e., drier zones are in coastal areas and wetter zones in higher altitude inland areas. An exception is the south-western coast of Bali Island, characterized by the moderately wet zone C. That zone extends towards the center of Bali Island over a hilly and mountainous area, where the even wetter climate zone B is present, although the zone B classification is based on only one monthly rain gauge (for location, see Fig. 1). The disadvantage of the presented Schmidt and Ferguson (1951) climatological classification is that its spatial distribution depends largely on the availability, and thus the density, of rain gauges; for example, in this study, there are no gauges in the highest altitude areas and only one in the central part of the island, where the largest rainfall is expected.

3.1.3 Pixel-scale approach

Table 5 presents the pixel-scale error analysis (ME, RMSE, AAE, and RE), carried out for the selected pixels 3, 8, and A, i.e., those with the largest number of rain gauges (6, 5, and 5, respectively; Fig. 2). In each of these pixels, the reference rainfall (MRg) was estimated according to Eq. 6, for comparison with the satellite rainfall estimate at each pixel. Similar to the earlier analysis, the negative ME in Table 5 indicates that CMORPH25, CMORPH8, TRMM, and PERSIANN underestimated rainfall in the wet season, while in the dry season, CMORPH25, CMORPH8, and TRMM underestimated less, but PERSIANN heavily overestimated rainfall at all 34 rain gauges.

Table 5 Daily satellite rainfall performance within the 3-year period (1 October 2003–30 September 2006) at the pixel scale, presented separately for wet and dry seasons. ME [mm·day−1] mean error; RMSE [mm·day−1] root mean square error; MRg [mm·day−1] mean rainfall; ARg [mm] accumulated rainfall; AAE [mm] accumulated absolute error; RE [ ] relative error

In general, all satellite products performed rather unsatisfactorily in the pixel-scale approach, having relatively large RMSE compared to MRg, reflected also by quite large RE > 1, which in the case of PERSIANN in the dry season was extremely large, exceeding 2.6. All satellite rainfall products performed relatively better in the wet than in the dry season, as indicated by lower RE in the wet season. Among the four products, CMORPH25 seemed to perform the best, having the lowest RE, despite having a larger negative ME in pixel 3 and a higher RMSE in pixel 8 than TRMM. CMORPH8 had error measures comparable with those of CMORPH25 and TRMM, but a direct comparison was impossible because its error measures referred to a different pixel size, with possibly different rainfall conditions. In the wet season, all indicators of the PERSIANN rainfall product were slightly worse than those of the other three satellite products, but in the dry season, the PERSIANN estimates were grossly inaccurate, as was also the case in the island and zonation approaches; therefore, the PERSIANN satellite rainfall product was excluded from the subsequent categorical analysis.

3.2 Categorical statistics and bias decomposition

3.2.1 Island- and pixel-scale approaches

Table 6 presents categorical statistics and bias decomposition of three satellite rainfall products CMORPH25, CMORPH8, and TRMM in the island-scale and pixel-scale approaches. Generally, CMORPH25 and TRMM had larger rainfall frequency (Ns) compared to rain gauge rainfall frequency (Ng), but CMORPH8 had lower. Considering categorical statistics, the CMORPH25 provided best results having the lowest M and the highest H. The FA was the lowest in CMORPH8 but that frequency measure is not a credible indicator of the quality of satellite rainfall-product performance as discussed below.

Table 6 Island- and pixel-scale daily categorical statistics and bias decomposition analyzed within the 3-year period from 1 October 2003 to 30 September 2006

Considering accumulated satellite rainfall (ARs, calculated analogously to ARg), all three satellite products underestimated rainfall compared to ARg, so ARs < ARg; the least underestimated was the ARs of TRMM, and the most underestimated was that of CMORPH8; TRMM also had the lowest HB and AHB. In contrast, CMORPH25 had the lowest MB, while CMORPH8 had the largest. The bias decomposition did not provide a clear answer as to which of the three satellite products performed best, although it emphasized the poor performance of all three satellite rainfall products, all substantially underestimating daily rainfall. This can be seen particularly well in the large negative MB and HB values (Table 6). Regarding the latter, the small or absent differences between HB and AHB clearly indicate that hit overestimates at the island scale are very rare, so satellite rainfall underestimation is the dominant problem.

All satellite rainfall approaches presented so far either at the whole island or at the pixel scale were compared with average rainfall of a number of gauges. In contrast, Table 7 compares satellite rainfall estimates at the two selected pixels 3 and 8, with rainfall estimates at individual gauging stations, applying the failure ratio (FR) performance indicator. The analysis presented in Table 7 focused on testing low rain (≤ 20 mm day−1) and high rain (≥ 50 mm day−1), additionally separating gauges into those located in low (L) altitudes (< 500 m a.s.l.) and those in high (H) altitudes (> 500 m a.s.l.) zones. It is remarkable that the FR of CMORPH25 was the lowest for most of the FR combinations (low rain in L and H altitude and high rain at L altitude), except for high rains ≥ 50 mm day−1 at H altitude, for which TRMM had the lowest FR. In contrast, the FR of the CMORPH8 was the highest for all FR combinations (even with frequent FR > 1), except for low rains at H altitudes, for which the FR was generally lower than in TRMM but still higher than in CMORPH25.

Table 7 Use of frequencies of hit (H), miss (M), and failure ratio (FR = M/H), for the assessment of daily satellite rainfall performance in detecting low rainfall events (≤ 20 mm) and high rainfall events (≥ 50 mm), in low (L) altitude zone (< 500 m a.s.l.) and high (H) altitude zone (> 500 m a.s.l.); the analyzed pixel 3, 8, and their rain gauge stations are presented in Figs. 2 and 9
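The failure ratio used in Table 7 can be illustrated with a minimal Python sketch. It only assumes what the caption states (FR = M/H, with events classified by gauge rainfall as low, ≤ 20 mm, or high, ≥ 50 mm). The function name, the rain/no-rain cutoff of 0 mm, and the sample values are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence, Tuple

RAIN_THRESHOLD = 0.0  # mm/day; hypothetical rain/no-rain cutoff (assumption)


def failure_ratio(
    gauge: Sequence[float],
    satellite: Sequence[float],
    lo: Optional[float] = None,
    hi: Optional[float] = None,
) -> Tuple[int, int, float]:
    """Return (H, M, FR) for gauge rain days whose intensity lies in [lo, hi].

    H counts days on which both gauge and satellite record rain,
    M counts days on which only the gauge records rain, FR = M / H.
    """
    hits = misses = 0
    for g, s in zip(gauge, satellite):
        if g <= RAIN_THRESHOLD:
            continue  # no gauge rain: neither a hit nor a miss
        if lo is not None and g < lo:
            continue  # below the intensity class
        if hi is not None and g > hi:
            continue  # above the intensity class
        if s > RAIN_THRESHOLD:
            hits += 1
        else:
            misses += 1
    fr = misses / hits if hits else float("inf")
    return hits, misses, fr


# Made-up daily values in mm, for illustration only.
gauge = [5, 12, 0, 60, 18, 75, 3, 55]
sat = [2, 0, 1, 40, 0, 10, 4, 0]

low_rain = failure_ratio(gauge, sat, hi=20.0)   # events <= 20 mm
high_rain = failure_ratio(gauge, sat, lo=50.0)  # events >= 50 mm
print(low_rain, high_rain)
```

With these made-up values, FR > 1 would mean that the satellite product misses more gauge-recorded events than it detects in that intensity class.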

Figure 9 presents a graphical analysis of the bias decomposition per gauge in pixels 3, 8, and A (for the locations of the pixels, see Fig. 3). Pixel 3 is the only one that captures areas (with gauges) both south and north of the Gunung Agung mountain range. In the southern, windward location, the two rain gauges located in the H-zone, Bongancina (8448 mm in 3 years, i.e., 2816 mm year−1; see Fig. 9) and Pempatan (2848 mm year−1), have substantially larger satellite rainfall biases than the gauges on the northern, leeward side, which is characterized by generally lower rainfall and lower biases, except for Umadesa (2676 mm year−1), which has comparable biases, likely because of comparable rainfall rates. Pixels 8 and A are both located entirely within the southern, windward side of Bali Island. Knowing that pixel 8 is nearly twice as large as pixel A, it can be deduced that the topographical gradient measured perpendicularly to the coastline is approximately twice as large in pixel A as in pixel 8. Pixel 8 represents rather flat, homogeneous land between sea and mountains, characterized by moderate and quite uniform biases and rainfall rates at the five available gauges, which are well distributed within that pixel. In contrast, in pixel A, as in pixel 3, there is large variability of biases. The largest MB = 1819 mm year−1 is observed for Negara BMKG (2198 mm year−1), while the lowest, 547 mm year−1, is observed for Negara DPU (1585 mm year−1); the latter is also lower than the MB = 1358 mm year−1 of the closely and similarly located, wettest station Pohsanten (3023 mm year−1). The reason for such a large rainfall difference between such closely and similarly located stations is not clear. It is also remarkable that, despite the largest rainfall at the Pohsanten location, the largest bias in pixel A is not at that location but at Negara BMKG, which suggests that pixel A is likely affected by local wind conditions.

Fig. 9

Bias decomposition of daily rainfall at six rain gauges of pixel 3 and five rain gauges of pixel 8 against three satellite products, and at five rain gauges of pixel A against CMORPH8, within the 3-year period (1 October 2003 to 30 September 2006). CM25 means CMORPH25, CM8 means CMORPH8. The Y-axis represents the total accumulation of rainfall bias in millimeters, with positive values reflecting overestimation and negative values underestimation compared to gauged rainfall. Note that the scale of the pixel A panel is about half that of the other two

4 Discussion and conclusions

The relatively poor agreement between the satellite rainfall products and ground rainfall measurements observed in this study is mainly caused by (1) scale difference between satellite products and rain gauges used for validation; (2) fine, daily resolution of validation; and (3) complexity of the Bali Island.

A large, if not the largest, problem in comparing satellite rainfall with the rainfall measured by gauges is the scale difference between the two types of measurements, i.e., the large spatial scale of satellite rainfall products and the small spatial scale of rain gauges. A rain gauge measurement represents a rainfall estimate at the local scale, limited to ~ 200 m2 coverage (Morin and Gabella 2007), while the satellite-pixel rainfall is a regional rainfall estimate with spatial coverage on the order of tens of km2 (e.g., CMORPH8) or even hundreds of km2 (CMORPH25 or TRMM). For example, in pixel 3, six rain gauges together represent a rainfall area of ~ 1200 m2, while the satellite rainfall estimated within a pixel represents an area of ~ 729 km2. The scale-difference problem can be further aggravated by the fine temporal resolution of satellite data acquisition and by terrain complexity, both discussed below.
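The size of this scale mismatch can be made explicit with a short calculation. The sketch below uses only the numbers quoted in the paragraph above (six gauges, ~ 200 m2 per gauge, ~ 729 km2 per pixel); the function and variable names are hypothetical, and the result is an order-of-magnitude illustration rather than a measured quantity.

```python
GAUGE_FOOTPRINT_M2 = 200.0   # approximate area represented by one gauge (from the text)
N_GAUGES = 6                 # number of gauges in pixel 3 (from the text)
PIXEL_AREA_KM2 = 729.0       # approximate satellite pixel area (from the text)
M2_PER_KM2 = 1_000_000.0


def scale_mismatch(n_gauges: int, footprint_m2: float, pixel_km2: float):
    """Return (gauged area in m2, fraction of pixel sampled, pixel/gauged area ratio)."""
    gauged_area_m2 = n_gauges * footprint_m2
    pixel_area_m2 = pixel_km2 * M2_PER_KM2
    return (
        gauged_area_m2,
        gauged_area_m2 / pixel_area_m2,
        pixel_area_m2 / gauged_area_m2,
    )


gauged_area, sampled_fraction, area_ratio = scale_mismatch(
    N_GAUGES, GAUGE_FOOTPRINT_M2, PIXEL_AREA_KM2
)
print(f"gauged area: {gauged_area:.0f} m2")
print(f"fraction of pixel sampled: {sampled_fraction:.2e}")
print(f"pixel area / gauged area: {area_ratio:.0f}")
```

Under these figures, the six gauges directly sample only about two millionths of the pixel area, which is why the gauge mean is an imperfect reference for the pixel value.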

The fine, daily, temporal resolution of validation of satellite rainfall was applied in this study, because nowadays, the daily (not weekly or monthly) rainfall is typically required as driving force input of integrated hydrological models, slowly becoming a standard tool in watershed management. Daily satellite-based rainfall estimates can also improve scientific knowledge on water cycle and energy budget. Unfortunately, the correct daily satellite rainfall estimate is by far more demanding and difficult to obtain than weekly, decadal, or monthly, because temporal data accumulation reduces the systematic error (AghaKouchak et al. 2012; Tian et al. 2007). As such, the satellite rainfall products at daily resolution are more prone to errors and more vulnerable to large spatio-temporal rainfall variability, characteristic for Bali Island.

The large complexity of the Bali Island is the third reason of generally poor agreement between the analyzed satellite rainfall products and rain gauges. That complexity is mainly because of (i) diverse topography, (ii) proximity of sea and mountains, and (iii) local conditions of wind circulation.

The diverse topography of Bali Island enhances the spatio-temporal variability of rainfall. Orographic lifting and blocking can modify rainfall over short distances, especially in island conditions (Lee et al. 2014). Air masses of different moisture contents in the rainy and dry seasons entering the island at low altitudes are forced by the mountainous area to move dynamically with strong vertical components over a short distance, creating large spatio-temporal variability in rainfall intensities. CMORPH25, CMORPH8, and TRMM performed differently in low altitudes (L-zone) and high altitudes (H-zone) in the rainy and dry seasons. The ME values of CMORPH25, CMORPH8, and TRMM were all negative, i.e., these satellite-based products underestimated rainfall compared to the gauge-measured rainfall, in both L and H elevation zones, in rainy and dry seasons (e.g., Table 3). The RE of these three satellite rainfall products (Table 3) was the lowest in the H-zone in the rainy season (best performance) and the highest in the H-zone in the dry season (poorest performance). PERSIANN showed the poorest match in the dry season (e.g., extremely large RE), because it had low accuracy in detecting local convective rainfall, which mostly occurs in the dry season (Table 3). The similar performance of PERSIANN in the L-zone and in the H-zone can be attributed to its retrieval algorithm, which neglects the altitude of the ground objects (Cai et al. 2016).

The proximity of sea and mountains in the Bali Island is another reason enhancing discrepancy between satellite and gauge estimates. Short distance and large topographical gradient between sea coast and mountains, typical for Bali Island, accelerate abrupt and rapid growth of rain clouds, resulting in successive sensor snapshots of miss (M) events of short lifetime and limited spatial extent. The whole island is affected by this problem because of its small size, although at different spatial extent and intensity. An example of that problem is evidenced by coastal pixel A with large topographical gradient, in which 3 year MB = − 3500 mm was substantially larger than MB = − 2789 mm for the entire island (Table 6). Each satellite sensor handles mixed land-ocean pixels differently that can influence the performance of satellite rainfall products at inlands and coastal areas. Figures 7 and 8 show that despite general inland rainfall underestimations, the coastal areas exhibit overestimations. The inland underestimates are because PMW sensor-based rainfall products have tendencies to underestimate light rainfall. The brightness temperature thresholds to define rain and no-rain clouds, cannot separate the warm cloud of light rainfall from the warm background of the land, resulting in underestimation of rainfall. This is not the case over the cold background of the ocean. The coastal overestimates are because satellite products have tendencies to overestimate air moisture transport since the atmosphere beneath the cloud is dry implying rainfall evaporation before reaching the land surface (Dinku et al. 2011). The complexity of convective systems over coastal and inland areas could not be captured sufficiently well to describe the local circulation in Bali Island with typical proximity of sea and mountains and land-sea contrast (Sato et al. 2009).

Local wind circulation (Meisner and Arkin 1987) in Bali Island is another reason for the large and complex rainfall variability, which enhances differences between satellite and gauged rainfall. For example, because of local wind circulation conditions, satellite sensors cannot properly quantify windward heavy rain south of the Gunung Agung mountain range (Fig. 1) and leeward light rain north of the mountain range during the southwest monsoon in the rainy season. Winds from south-easterly directions (windward) bring moist air to the south of the Gunung Agung mountain range, creating localized heavy rains, while the downwinds from the top of the mountain range to the north (leeward) bring dry air, creating localized, light rain. The most typical, moderate wind (11–17 knots, with a frequency of 52.1%) that blows in Bali Island from coastal areas towards highland areas can also influence the convergence circulation associated with the land-sea breeze. As such, the local wind circulation is responsible for large differences in rainfall itself and in rainfall biases. These differences can be seen well in Fig. 9, for example by comparing the pixel A stations Negara DPU and Pohsanten, which are located next to each other and in similar conditions but differ substantially in rainfall rate and in satellite-estimated rainfall biases. It seems that coarse-resolution satellite products are unable to properly depict the implications of local wind circulation, as they cannot capture the shifting timing of rainfall occurrence within the diurnal cycle of precipitation (Qian 2008), which in Bali Island is associated with variable land-sea breeze conditions.

The descriptive statistics show that in the wet season, the absolute error measures (e.g., RMSE, MAE) are larger than in the dry season, but the relative error (RE) is lower. The larger absolute error in the wet season arises because the dominant, larger rainfall events, those of > 5 mm day−1 (Fig. 4), also have larger absolute error, as indicated in Fig. 6 by the RMSE frequency distribution. In contrast, in the dry season, the most frequent type is rain of small intensity, < 5 mm day−1, so the majority of the absolute error is restricted to that rain intensity, resulting in low absolute error. As the absolute error depends on the quantity of rainfall, a better assessment of satellite rainfall performance is provided by RE. At most of the analyzed scales (i.e., Tables 2, 3, 4, and 5), the RE was lower in the wet than in the dry season. The larger RE in the dry season can likely be attributed to two reasons: first, the majority of rainfall events in the dry season are very small (Fig. 4), so the RE naturally tends to be high; second, in a dry atmosphere, the direct evaporation of precipitation is more intense, so rain produced by the cloud is strongly diminished before reaching the ground (Johansson and Chen 2003). As a result, in the drier climatic zones (e.g., E, F), the absolute error measures were also lower than in the wetter ones (e.g., C, D), in both the wet and dry seasons, but the RE was higher (Table 4). The rainfall dependence on altitude was less distinct, likely because only two elevation zones were implemented due to data limitation at high altitudes. In the wet season, in the H-zone, there was larger rainfall, larger absolute error, and lower RE than in the L-zone, while in the dry season, in the H-zone, the rainfall was lower but the absolute error measures (particularly RMSE) and RE (except for PERSIANN) were larger, which can also be attributed to the dominance of convective rains (Table 3).
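The contrast between absolute and relative error measures can be sketched numerically. The example below is only an illustration under stated assumptions: the defining equations of ME, RMSE, AAE, and RE are not reproduced in this section, so RE is assumed here to be the accumulated absolute error divided by the accumulated gauge rainfall (AAE/ARg), which is consistent with its dimensionless unit in the table captions but may differ from the study's exact formula. The function name and the daily values are hypothetical.

```python
import math
from typing import Dict, Sequence


def error_metrics(gauge: Sequence[float], satellite: Sequence[float]) -> Dict[str, float]:
    """Descriptive error measures of satellite rainfall against gauge rainfall.

    ASSUMPTION: RE = AAE / ARg; the study's own definition is not shown here.
    """
    n = len(gauge)
    diffs = [s - g for g, s in zip(gauge, satellite)]  # satellite minus gauge
    me = sum(diffs) / n                                # mean error, mm/day
    rmse = math.sqrt(sum(d * d for d in diffs) / n)    # root mean square error, mm/day
    aae = sum(abs(d) for d in diffs)                   # accumulated absolute error, mm
    arg = sum(gauge)                                   # accumulated gauge rainfall, mm
    re = aae / arg if arg > 0 else float("nan")        # relative error, dimensionless
    return {"ME": me, "RMSE": rmse, "AAE": aae, "ARg": arg, "RE": re}


# Made-up daily rainfall in mm, for illustration only.
wet = error_metrics(gauge=[20, 30, 10, 40], satellite=[15, 20, 10, 25])
dry = error_metrics(gauge=[2, 1, 0, 1], satellite=[0, 0, 1, 0])

for name, m in (("wet", wet), ("dry", dry)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up case, the dry-season sample has the smaller RMSE and AAE but the larger RE, because its errors are divided by a much smaller rainfall total.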

In categorical statistics, the low M confirms reliability of a satellite product, because it informs about low amount of failures of satellite rainfall detection, i.e., when the gauge in a given pixel records rainfall, but that rainfall is not recorded by satellite. In contrast to M, the H shows confirmation of the ability of satellite to record properly the rain that was recorded by the gauge, although there is always a potential risk that different showers are recorded by satellite than the gauge. The least reliable, in fact misleading, is FA (false alarm, referring to the case when satellite registers rain but that rain is not recorded by the gauge), as there could be a rain within a pixel area that is not recorded by the gauge.

Regarding bias decomposition, the most reliable validation component of a satellite rainfall product is MB (miss bias), which represents the rain bias of an event not accounted for by a satellite. The hit bias (HB) is a less reliable measure of a bias than MB, because it is possible that different showers are recorded by satellite than by gauge and also because the hit events characterized by more rain measured by satellite than by gauge cancel out with hit events characterized by less rain measured by the satellite than by gauge when these hits are accumulated (see Eq. 7). Because of the latter reason, an absolute hit bias (AHB) introduced in this study (see Eq. 7a) is a better, hit bias measure than HB. For FB (false bias), the same problem applies as for the FA, i.e., the truly recorded rain by satellite, that takes place only in a certain part of a pixel where gauge or gauges are not present, may not be recorded by a gauge, just because of no rain at that gauge. Finally, the TB cannot characterize well the performance of satellite rainfall products (Yong et al. 2016), because, having different signs, its components can cancel.
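The cancellation effects described above can be shown with a minimal sketch of a hit/miss/false bias decomposition. Eq. 7 and Eq. 7a are not reproduced in this section, so the sign conventions below are assumptions: HB sums satellite minus gauge over hit days, AHB sums the absolute values of those differences, MB is the negative sum of gauge rain on miss days (consistent with the negative MB values reported in the text), FB sums satellite rain on false days, and TB = HB + MB + FB. The function name, the 0 mm rain threshold, and the data are hypothetical.

```python
from typing import Dict, Sequence


def decompose_bias(
    gauge: Sequence[float], satellite: Sequence[float], threshold: float = 0.0
) -> Dict[str, float]:
    """Split total satellite-minus-gauge bias into hit, miss, and false components.

    ASSUMPTION: sign conventions chosen so that TB = HB + MB + FB equals
    accumulated satellite rainfall minus accumulated gauge rainfall.
    """
    hb = ahb = mb = fb = 0.0
    for g, s in zip(gauge, satellite):
        gauge_rain, sat_rain = g > threshold, s > threshold
        if gauge_rain and sat_rain:      # hit: both record rain
            hb += s - g                  # signed differences can cancel
            ahb += abs(s - g)            # absolute differences cannot
        elif gauge_rain:                 # miss: only the gauge records rain
            mb -= g
        elif sat_rain:                   # false: only the satellite records rain
            fb += s
    return {"HB": hb, "AHB": ahb, "MB": mb, "FB": fb, "TB": hb + mb + fb}


# Made-up daily rainfall in mm, for illustration only.
gauge = [10, 0, 5, 20, 0, 8]
sat = [14, 3, 0, 12, 0, 8]
bias = decompose_bias(gauge, sat)
print(bias)
```

In this example, the hit overestimate (+4 mm) partly offsets the hit underestimate (−8 mm), so the magnitude of HB (4 mm) is smaller than AHB (12 mm), and the positive FB further reduces the magnitude of TB.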

Remarkable performance differences were observed between the three satellite rainfall products CMORPH25, CMORPH8, and TRMM, all primarily based on PMW algorithms, and the PERSIANN product, based on an IR algorithm. The first three rainfall products analyze electromagnetic emission and scattering of the rainfall, while PERSIANN does not analyze the signal from the rainfall itself, but cloud conditions. CMORPH25, CMORPH8, and TRMM tended to underestimate rainfall compared to rain gauge estimates, while PERSIANN tended to overestimate it. The PMW sensors capture brightness temperature from the clouds (Ferraro et al. 1998). However, the rapid evolution of stratiform rain and the rain intensity at the base and at the top of the clouds could not be captured by successive scans of the sensors (Ebert et al. 2007; Tian et al. 2009); therefore, these three satellite rainfall products tend to miss rain clouds and underestimate rainfall. In contrast, the IR sensors of PERSIANN tended to overestimate rainfall compared to rain gauges, because the long time scale of convective activity in the cloud is misidentified as rainy events (Janowiak et al. 2005; Pfeifroth et al. 2016). The PERSIANN rainfall estimates in the dry season were overestimated so much (Tables 2, 3, 4, and 5) that this product was excluded from further investigations directly after the descriptive statistical analysis.

The statistical analysis carried out in this study showed that CMORPH25 had slightly better accuracy in estimating rainfall in Bali Island than TRMM and CMORPH8 although none of the three products provided sufficiently accurate rainfall estimates to be directly used without bias correction. That conclusion with respect to TRMM is in agreement with As-Syakur et al. (2011) who also concluded that TRMM does not have good performance on daily basis.

The statistically defined advantages of CMORPH25 over TRMM are as follows: (i) lower RMSE, MAE, and RE; (ii) larger H and lower M frequencies; and (iii) lower MB (although larger HB). CMORPH25 also showed consistently better agreement with gauged rainfall than TRMM (at all 12 gauges analyzed, Table 7) for low-rate rainfall events (≤ 20 mm day−1), which represent 70% of Bali rainfall, having larger H, lower M, and lower FR. For high-rate rainfall events (≥ 50 mm day−1), which represent ~ 9% of Bali rainfall, neither of the two satellite products had a clear advantage.

The poorer performance of CMORPH8 than CMORPH25 and TRMM, despite its better spatial resolution, is evidenced by the larger RMSE, MAE, and RE. CMORPH8 missed also more rainfall events, resulting in the largest M and the largest MB. In addition, CMORPH8 had the lowest H, all reflecting its poor rainfall detection. The poorest performance of CMORPH8 estimates is related to its highest spatial resolution implying largest vulnerability to error enhanced by high spatio-temporal variability of rainfall in the Bali Island; the reason is lower error compensation due to lesser influence of error accumulation in spatial and temporal sampling (AghaKouchak et al. 2012; Sato et al. 2009). As such, the coarser spatial resolution CMORPH25 better represents the shifting time of rainfall events from coastal areas to inlands (Qian 2008), which could not be properly detected by successive snapshots of CMORPH8 sensors.

Generally, the daily estimates of all four satellite rainfall products analyzed in this study showed a large discrepancy compared to gauge data, as demonstrated by large ME, RMSE, MAE, and RE and also by large M and low H compared with Ng, which contributed to large values of HB, MB, and TB. Despite the quite weak performance of all four satellite rainfall products, they provide spatio-temporal estimates of rainfall that are not available from gauges. Besides, those estimates can be improved by bias correction (Addor and Seibert 2014; AghaKouchak et al. 2009; Müller and Thompson 2013) or by merging/blending (Chappell et al. 2013; Li and Shao 2010; Woldemeskel et al. 2013) with gauge data as reference. Geophysical and climatological constraints (Jia et al. 2011), such as elevation as applied in this study, and possibly also distance to the sea (Abtew et al. 2011; Johansson and Chen 2003) and wind direction (Castro et al. 2014), can also improve the performance of satellite rainfall products, in contrast to climatological zonation, which was not particularly beneficial. The relatively best performing satellite-based rainfall product (among the four validated in this study) on a daily basis in Bali Island was CMORPH25. In spite of its coarse spatial resolution, that satellite rainfall product is recommended for use in Bali Island after blending/merging.