Abstract
An explanatory model from time series data allows us to identify performance drivers and forecast performance given specific driver values, just as regression models from cross sectional data do. When decision makers want to forecast future performance in the shorter term, a time series of past performance is used to identify drivers and fit a model. A time series model can be used to identify drivers whose variation over time is associated with later variation in performance over time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Author information
Authors and Affiliations
Appendices
Excel 10.1: Build and Fit an Explanatory Time Series Model
Tesla Revenues
Build a model of Tesla revenues which potentially includes past the economic indicators, represented by World GDPPC and World GDP, and World CO2 emissions. The data are in Tesla revenues.
Excel 10.2: Create Potential Driver Lags
Create working columns that will contain the lagged drivers by copying columns A through E and then pasting into column G:
For the desired forecast, emissions, GDP and GDPPC must be lagged 3–5 years, providing data in 2023 and later years upon which to make the forecast. For emissions and GDP, make 3, 4 and 5 year lags. For GDPPC, make a 3 year lag. To make the three three year lags, either insert three empty cells in rows 2–4 or select the data and drag down three rows. Copy those 3 year lags for emissions and GDP and paste into new columns, and then insert one more empty cell in row 2 to create 4 year lags. Copy those 4 year lags and paste into new columns, then inserte one more empty cell in row 2 to make 5 year lags. Label the eight columns with the appropriate lags.
The top of the data series becomes:
Now, delete the rows with incomplete data, 2–6:
Next assess skewness of the dependent variable, revenues, and the seven potential drivers, using only rows through 2020:
Since revenues are positively skewed outside the approximately Normal range, add a new column I with the square roots and confirm that skewness of the square roots is within the approximately Normal range of −1 to +1.
Excel 10.3: Select the Most Promising Driver
To increase the chances that the model will be valid, identify the potential driver(s) that show a matching pattern in the recent years (excluding the two most recent years).
Plot the square roots and the potential drivers, adding trendlines. Plot through the 2020 year only, shown in Figs. 10.9, 10.10, 10.11, and 10.12. The longest lags for emissions and GDP are shown below, though all six of the lags should be considered.
Revenues increased every year, including the four most recent years.
World GDPPC, lagged 3 years, has increased three out of four of the most recent years.
World GDP, lagged 5 years, has increased in three of the four most recent years. This is also the case for the 3 and 4 year lags.
World emissions, lagged 5 years, have increased annually in three of the four most recent years. The 3 and 4 year lags have fewer matching increases.
Based on patterns revealed by the plots, the GDPPC 3 year lag, the GDP 3, 4 and 5 year lags, and the emissions 5 year lag match better than other lags, and tie for three out of four matches in the four most recent years. Move the emissions lags which match less well out of the way, for now. To decide which of these lags is preferred, look at correlations with the revenue square roots, using only data through 2020.
sqrt revenue ($B) | |
---|---|
sqrt revenue ($B) | 1 |
World GDP (T$) t−3 | 0.838261 |
World GDPPC ($K) t−3 | 0.703793 |
World GDP (T$) t−4 | 0.846541 |
World CO2 emissions (B tons) t−5 | 0.933914 |
World GDP (T$) t−5 | 0.908683 |
Only the first column of correlations with the revenue square roots is of interest. The correlation with the 5 year emissions lag is strongest, make that the best choice for the one driver model.
Build a one driver model of the revenue square roots with the 5 year emissions lag.
Variation in past World emissions accounts for 87% of the variation in the revenue square roots. The model is significant. The emissions coefficient has the correct positive sign, and the one tail pvalue would be much smaller than .05.
Plot the residuals, setting major units to the standard error, .7, and assess Durbin Watson.
DW is between the lower and upper critical values, indicating possible positive autocorrelation. Residuals in 2018–2020 illustrate a positive impact on revenues, presumably due to availability of Model 3. Add a Model 3 indicator, set to one in 2018 and later years and build a two driver model.
With past World emissions and the Model 3 indicator, 96% of the variation in revenue is accounted for. The model is significant. Both coefficient estimates have the correct positive signs and both one tail pvalues would be much smaller than .05.
Plot residuals, setting major units to the standard error, .4, and assess Durbin Watson.
DW is now greater than two, signaling absence of positive autocorrelation. All residuals are within two standard errors of zero. Nonetheless, a third driver could improve the model. To allow for this possibility, copy the residuals and paste next to the lagged drivers, and then run correlations between the lagged drivers for GDP and GDPPC and residuals.
World GDP (T$) t−3 | World GDPPC ($K) t−3 | World GDP (T$) t−4 | World GDP (T$) t−5 | Residuals | |
---|---|---|---|---|---|
Residuals | 0.31776 | 0.361931 | −0.02193 | −0.01632 | 1 |
It is the residual row of correlations that will reveal any drivers that could help to account for remaining residual variation. The largest correlation is with past GDPPC, lagged 3 years. Note that adding that driver to the model will restrict the forecast to 3 years. Run that three driver model.
Adding past GDPPC increases RSquare to 98.1%. The model is significant, all three coefficient estimates are positive, as expected, and all three have one tail pvalues much smaller than .05.
Assess residuals:
All residuals are within two standard errors of zero and DW is above two. Residuals are free from positive autocorrelation.
Excel 10.4: Test the Model’s Forecasting Validity
Test the model’s predictive validity.
Copy Year, and revenue and revenue square roots from the data sheet and paste next to residuals.
JMP 10.1: Find Individual Prediction Interval Bounds
Use JMP to find the individual prediction interval bounds to test predictive validity.
Copy year, sqrt revenue, Model 3, lagged World emissions, lagged World GDPPC and paste into JMP.
Hide/exclude 2021 and 2022.
Analyze, Fit Model.
In the regression output, click on the red triangle at the top, Save Columns, Indiv Confidence interval.
Copy the individual prediction interval bounds (Indiv Confidence Intervals) and paste into your excel regression sheet.
Comparing the predictions for 2021 and 2022 with actual values reveals that the forecasts are too low, presumably due to the impact of the popular Model Y launch.
Add a Model Y indicator, set to one 2021 through 2025, and set to zero before 2021.
Excel 10.5: Recalibrate to Forecast
Recalibrate the model with regression using all years, model 3, Model Y, emissions and GDPPC.
Accounting for Model Y availability increases RSquare to 98.6%. All four drivers are significant with correct signs.
To make the fit and forecast, copy year and revenues and paste next to residuals.
Use JMP to find the individual prediction interval bounds.
Copy the year, revenue, revenue square roots, the two indicators, past World emissions, and World GDPPC and paste into a new data table in JMP.
Analyze, Fit Model.
Add the revenue square roots to Y and Add Model 3, Model Y, World emissions and World GDPPC to Construct Model Effects. Change emphasis to Minimal and click Keep Dialog Open.
Run, then click the red triangle, Save Columns, Indiv Confidence Intervals:
Select and copy the prediction intervals, which will have been added to the data table:
Paste into the excel regression sheet, next to the square roots. Create two new columns which are the squares of the prediction interval bounds in square roots. Add labels:
Excel 10.6: Illustrate the Fit and Forecast
To see the model fit and forecast, plot the prediction intervals with actual revenues through 2020. After selecting Year and Revenues through 2025, hold CNTL and then, with your mouse, select the prediction interval bounds. Insert a scatterplot.
Adjust the horizontal axis to make good use of white space, add a vertical axis title, remove unwelcome zeros, change the prediction intervals to one color, without markers, change the actual revenues to markers only, and add a stand alone title:
Excel 10.7: Assess the Impact of Drivers
To see the contributions of the four drivers to revenue forecasts, make the part worths. To make the part worths, copy Model 3, Model Y, World emissions and World GDPPC from the data sheet and paste into the regression sheet.
The intercept is in $B$17.
The Model 3 part worth is the product of its coefficient, $B$18, and the value of Model 3 in a year.
The Model Y part worth is the product of its coefficient, $B$19, and the value of recession in a year.
The World emissions part worth is the product of its coefficient, $B$20, and the value of World emissions t−5.
The World GDPPC part worth is the product of its coefficient, $B$21, and the value of World GDPPC t−3.
Predicted revenue square roots are the sum of the five part worths, and when squared, produce predicted revenues.
Use the part worths to compare the impacts of each of the drivers on model forecasts. The ranges (maximum-minimum) in the part worths are measures of driver importances.
Copy driver labels and paste above the part worth ranges, and then insert a column chart (2D bar).
Add a horizontal axis title and add a stand alone title.
Using sensitivity analysis, the impacts of the Model 3 and Model Y availability can be revealed. Since this is a nonlinear model, that impacts will differ each year. Set the Model 3 values to zero, copy and paste special (without formulas) predicted revenues.
Reset the Model 3 indicator to 1 in 2018–2025 and find the difference between predicted revenue with and without Model 3.
The Model 3 estimated impact ranges from $16.1B to $33.1B. While the impact generally increases each year, there are several years where this is not true, due to interactions with past World GDPPC and World emissions.
Repeat this process to assess the expected impact of Model Y in years 2021–2025.
The estimated impact of Model Y is $30.6B–$35.6B, also increasing most years, but not all years due to interactions with past World emissions and World GDPPC.
Case 10.1: Identifying Drivers and Forecasting Aluminum Production in China and North America
Alcoa (and other U.S. aluminum producers) face increasing competition from Chinese aluminum production. China has designated aluminum as a strategic resource, and now produces more than demanded in China. Overproduction has produced a glut in World markets, lowering prices and profits of producers. To bolster aluminum production and sales in the U.S., the Whitehouse placed tariffs on aluminum imports from China in March 2018.
Alcoa managers are interested in forecasting production of aluminum in China and in North America, as well as identifying drivers of production in both global regions
Managers believe that population, GDP and GDP per capita drive aluminum production, since increasing population and wealth drives demand for aluminum intensive products, such as soft drinks, beer, cars, and construction. They are not sure which is the most powerful driver.
The price of aluminum probably influences production. Higher prices motivate increased production, while falling prices lead to reduced production.
Finally, aluminum supply may drive aluminum production. Production in North America may influence production in China, and production in China may influence production in North America.
Have tariffs significantly driven production in China and in North America? (Address this question after you’ve built valid models by then adding a tariff indicator to your recalibrated models.)
Alcoa would like forecasts of aluminum production in China and North America, ideally for the next 5 years, 2019–2023, but at least through 2022.
-
Production in China
CHN aluminum.xlsx contains Chinese aluminum production for 2007 through 2018, North American aluminum production, World GDP and per capita GDP, World population, as well as World aluminum prices from 2002 to 2018.
Hide the two most recent observations for aluminum production (until you are ready to recalibrate).
Plot Chinese aluminum production to identify patterns to be explained.
Identify potential drivers and set up lags which will allow a 4–5 year forecast.
Assess skewness and rescale if skewness is outside the approximately Normal −1 to +1 range.
-
1.
Plot the potential drivers and identify those with most pattern matches to aluminum production in China in the four most recent years (among those not hidden). Choosing one of those with most pattern matches will increase chances that your model is valid. Specify best matches with their lag(s). (Note: matches with NA production need to be moving in the opposite direction.)
Lag(s)
Lag(s)
Lag(s)
Lag(s)
NA prod
GDPPC
GDP
Alum price
-
2.
Among potential driver lags with most matches in the four most recent years, identify those that have higher correlations (of correct sign) with aluminum production. (Consider “higher” correlations those within .02 of the highest correlation.) Specify higher correlations with their lag(s). (If a potential driver does not have the most matches in the four most recent years, leave that cell blank.)
Lag(s)
Lag(s)
Lag(s)
NA prod
GDPPC
GDP
population
Alum price
-
3.
Build a one driver model, favoring a highly correlated driver, favoring a longer lag to enable a longer forecast. Driver and lag you chose: _________
Assess and plot residuals to identify patterns to be explained.
-
4.
Add an indicator to account for the apparent shock to improve the model. (Compare results with the indicator turned on alternate years to find the best years to turn on.)
Your indicator is turned on in years: ________ to ________
Assess and plot residuals to look for patterns to be explained.
-
5.
Compare correlations with residuals, identifying potential lagged drivers with higher (.02 of highest) correlations. (Consider only those which were not chosen for the one driver model.)
Lag(s) | Lag(s) | Lag(s) | |||
NA prod | GDPPC | GDP | |||
population | Alum price |
Build a three driver model, adding one of the more highly correlated drivers, preferably with a longer lag.
Validate and recalibrate your best model, either two or three driver. (Note: if the third driver is not significant, go with your two driver model.) Then, add a tariff indicator to assess significance on production.
-
6.
How much have the 2018 tariffs on Chinese imports significantly reduced production? _______ to ________ K mt. (from the confidence interval for the coefficient)
-
7.
Present your fit and forecast of your best model (with the tariff, if significant; without the tariff if not significant), removing unwanted decimals, adding axes and chart titles, and using 12 pt font:
-
8.
Present your final model equation:
-
9.
Illustrate driver importances with a bar chart of part worth ranges.
-
10.
Identify all continuous lagged drivers of Chinese aluminum production. Run regressions with each continuous driver not in your model. (Choose one lag for each.) Which potential drivers, including those not in your model and those in your model, drive China aluminum production?
___ World population ___ NA production ___ GDP ___ GDPPC ___aluminum price
-
Production in North America
NA aluminum.xlsx contains North American aluminum production for 2007 through 2018, Chinese aluminum production, World GDP, per capita GDP, and population, as well as World aluminum prices from 2002 to 2018.
Hide the two most recent observations for aluminum production (until you are ready to recalibrate).
Plot North American aluminum production to identify patterns to be explained.
Identify potential drivers and set up lags which will allow a 4–5 year forecast.
Assess skewness and rescale if skewness is outside the approximately Normal −1 to +1 range.
-
1.
Plot the potential drivers and identify those with most pattern matches to aluminum production in North America in the four most recent years (among those not hidden). Choosing one of those with most pattern matches will increase chances that your model is valid. Specify best matches with their lag(s). (Note: to match, Chinese production needs to move in the opposite direction.)
Lag(s)
Lag(s)
Lag(s)
Chinese prod
GDP
Aluminum price
World population
GDPPC
-
2.
Among potential driver lags with most matches in the four most recent years, identify those that have higher correlations (of correct sign) with aluminum production. (Consider “higher” correlations those within .02 of the highest correlation.) Specify higher correlations with their lag(s). (If a potential driver does not have the most matches in the four most recent years, leave that cell blank.)
Lag(s)
Lag(s)
Lag(s)
Chinese prod
GDP
Aluminum price
World population
GDPPC
-
3.
Build a one driver model, favoring a highly correlated driver with longer lag.
Assess and plot residuals to identify patterns to be explained. The driver and lag that you chose: __________
-
4.
Add an indicator to account for the apparent shock to improve the model. (Compare results with the indicator turned on alternate years to find the best years to turn on.) Years you chose to turn on: _______
Assess and plot residuals to look for patterns to be explained.
-
5.
Compare correlations with residuals, identifying potential lagged drivers with higher (.02 of highest) correlations. (Consider only those which were not chosen for the one driver model.)
Lag(s) | Lag(s) | Lag(s) | |||
Chinese prod | GDP | Aluminum price | |||
World population | GDPPC |
Build a three driver model, adding one of the more highly correlated drivers, preferably with a longer lag.
Validate and recalibrate your best model. (Note: if the third driver is not significant, go with your two driver model.) Then, add a tariff indicator to assess significance on production.
-
6.
How much have the 2018 tariffs on Chinese imports significantly increased North American production? _______ to ________ mt.
-
7.
Present your fit and forecast. (Note: if the tariff is not significant, go with your recalibrated model without tariff.) Remove unwanted decimals, add axis and chart titles, and using 12 pt font:
-
8.
Present your final model equation:
-
9.
Illustrate driver importances with a bar chart of part worth ranges.
-
10.
Identify all continuous lagged drivers of North American aluminum production. Run regressions with each continuous driver not in your model, including the indicator(s). (Choose one lag for each.) Which potential drivers, including those not in your model and those in your model, drive North American aluminum production?
___ Chinese prod ___ GDP ___ GDPPC ___alum price ___ World population
Case 10.2: Identify Drivers and Forecast Amazon Sales
Amazon has grown at amazing rates over its short history. Several investors want to know what is driving that growth and whether that sort of growth is sustainable. With World population, World GDP, or World GDPPC, build a model of Amazon’s sales to identify drivers and forecast sales through at least 2025, and, possibly through 2027. You are able to use 4 or 5 year lags for World Population and World GDP, but World GDPPC should be limited to a 3 year lag.
Recent events may have led to improved sales.
Whole Foods was acquired late in 2017, with 2018 the first full year of operation as part of Amazon.
Prime Expansion. In 2020, partly due to the Pandemic, Amazon began offering greatly expanded benefits to its prime members, such as
-
Amazon Fresh, where prime members in select cities can shop for food and nonfood essentials,
-
Amazon Day, where prime members could shop during a week, then choose free delivery of multiple items in a single box,
-
Discount of 20% off to prime members on subscription purchases of six or more items, such as diapers and baby food.
-
Door step WFM deliveries
Data containing Amazon sales from 2011 through 2022 is in Amazon Revenue.excel.
-
1.
Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and pvalues:
-
2.
Present your regression equation:
-
3.
Explain what your equation tells us:
-
4.
Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture.
-
5.
Illustrate the importance of drivers in your model with a bar chart:
-
6.
Illustrate the impact on predicted revenues of the most influential indicator:
-
7.
Which potential drivers, including those in your model and those not in your model drive sales:
___ Population ___ GDPPC ____GDP
Case 10.3: Identify Drivers and Forecast EV Sales
A Ford investor seeks to explain the apparent success of EV sales. It is believed that economic productivity, GDP, may drive EV sales. It is also believed that personal income, GDPPC, may drive EV sales, since EVs are expensive, relative to conventional cars. Finally, it is thought that increasing emissions may drive demand for electric vehicles. Build a model of EV vehicle sales to both explain sales fluctuations and forecast sales in the next through 2025, at the least, and possibly through 2027.
Consider these following potential drivers:
-
World Per capita GDP(K)q−l
-
World GDP(trillion USD)q−l
-
World CO2 emissions(B tons)t−l
For World GDPPC, consider only a 3 year lag, since the delayed impact of changes in personal income wouldn’t be a long as 4 or 5 years. Consider 4 and 5 year lags of World GDP and World CO2 emissions, which would enable a longer forecast if GDPPC isn’t used.
Some EV model releases and availability may significantly boost sales:
-
Tesla Model S was launched in June 2012, with 2013 the first full year of availability.
-
Ford Focus was launched mid 2012, with 2013 the first year of availability, and available through 2017.
-
Toyota RAV4 was launched mid 2013, with 2013 the first year of availability, available through 2014.
-
Tesla Model X was launched mid 2015, with 2016 the first full year of availability.
-
Chevy Bolt was launched mid 2016, with 2017 the first full year of availability.
-
Tesla Model 3 was introduced July 2017, with first full year of availability in 2018.
-
Tesla Model Y was introduced March 2020, with first full year of availability in 2021.
-
1.
Plot EV sales, excluding the two most recent years. If you rescaled, plot in the rescaled units. Add a trendline:
-
2.
Copy and paste the excel cells showing drivers, coefficients, pvalues, and one tail pvalues from your final, recalibrated model:
-
3.
Present the equation describing your final, recalibrated model:
-
4.
Explain what your equation tells us:
-
5.
Present a bar graph showing the contributions to predicted EV sales:
-
6.
Present your model fit and forecast:
-
7.
Use sensitivity analysis to illustrate the impact of one of the shifts or shocks included in your model. Illustrate with a scatterplot:
Case 10.4: Identify Drivers and Forecast Sephora Sales
Managers want to know what is driving Sephora’s sales, including a forecast through 2025. Build a model of Sephora’s sales to both explain and forecast sales through 2025. Sephora.xlsx contains, World Population, World Urban Population, World GDP, and World GDPPC, all of which are thought to positively drive sales.
Recent events may have influenced sales:
Class action suit. In 2014, four Asian customers filed a class action suit alleging race discrimination. Sephora had discovered that some Asia customers were buying in bulk and reselling in Asian markets. Sephora deactivated some Asian accounts. This may have been a temporary shock or it may have longer term impacts.
Smart phone app. In 2016 Sephora introduced a smart phone app which enabled customers to try cosmetics virtually.
Mexico. In 2018, Sephora entered Mexico.
The Pandemic. The pandemic may have hurt sales growth in 2020. Stores temporarily closed. Any Pandemic impact is thought to be temporary.
Data containing Sephora sales from 2009 through 2020 is in Sephora.excel.
-
8.
Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and pvalues:
-
9.
List the indicators in your recalibrated model, along with years they are each turned on (set to one).
-
10.
Present your regression equation:
-
11.
Explain what each component of your equation, other than the intercept, tells us:
-
12.
Present the DW(n,K) from your validation model and your interpretation:
-
1.
6, Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture.
-
2.
What is your 2025 forecast, with 95% certainty? (Specify units) _____ to _____
-
3.
Illustrate the importance of drivers in your model with a bar chart:
-
4.
Which potential drivers, including those in your model and those not in your model drive sales:
___ Population ___ Urban Population ___ GDP ___ GDPPC
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fraser, C. (2024). Explanatory Time Series Models. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_10
Download citation
DOI: https://doi.org/10.1007/978-3-031-42555-4_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)