Explanatory Time Series Models

Fraser, Cynthia

doi:10.1007/978-3-031-42555-4_10

Cynthia Fraser²

99 Accesses

Abstract

An explanatory model from time series data allows us to identify performance drivers and forecast performance given specific driver values, just as regression models from cross sectional data do. When decision makers want to forecast future performance in the shorter term, a time series of past performance is used to identify drivers and fit a model. A time series model can be used to identify drivers whose variation over time is associated with later variation in performance over time.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Hardcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Author information

Authors and Affiliations

McIntire School of Commerce, University of Virginia, Charlottesville, VA, USA
Cynthia Fraser

Authors

Cynthia Fraser
View author publications
You can also search for this author in PubMed Google Scholar

10.1 Electronic Supplementary Material(s)

Amazon Revenue master (XLSX 57 kb)

China NA alum prod master (XLSX 88 kb)

EV vehicle forecast master (XLSX 150 kb)

explanatory Tesla master (XLSX 77 kb)

NA alum prod master (XLSX 90 kb)

sephora trend forecast master (XLSX 41 kb)

Appendices

Excel 10.1: Build and Fit an Explanatory Time Series Model

Tesla Revenues

Build a model of Tesla revenues which potentially includes past the economic indicators, represented by World GDPPC and World GDP, and World CO₂ emissions. The data are in Tesla revenues.

Excel 10.2: Create Potential Driver Lags

Create working columns that will contain the lagged drivers by copying columns A through E and then pasting into column G:

Two tables in the Excel sheet with 5 columns and 18 rows depict revenue in billion dollars, world C O 2 emissions in billion tons, world G D P in thousand dollars, and world G D P P C in dollars thousand for the years from 2005 to 2022.

For the desired forecast, emissions, GDP and GDPPC must be lagged 3–5 years, providing data in 2023 and later years upon which to make the forecast. For emissions and GDP, make 3, 4 and 5 year lags. For GDPPC, make a 3 year lag. To make the three three year lags, either insert three empty cells in rows 2–4 or select the data and drag down three rows. Copy those 3 year lags for emissions and GDP and paste into new columns, and then insert one more empty cell in row 2 to create 4 year lags. Copy those 4 year lags and paste into new columns, then inserte one more empty cell in row 2 to make 5 year lags. Label the eight columns with the appropriate lags.

The top of the data series becomes:

Two tables. 1. It depicts revenue, world emissions, world G D P, and world G D P P C for the years 2005 to 2022. 2. It depicts revenue, world emissions and world G D P for t minus 3, t minus 4, and t minus 5, and world G D P P C t minus 3 for the years 2005 to 2025.

Now, delete the rows with incomplete data, 2–6:

An Excel sheet of 8 columns read revenue, world C O 2 emissions t minus 3, world G D P t minus 3, world G D P P C t minus 3, world C O 2 emissions t minus 4, world G D P t minus 4, world C O 2 emission t minus 5, and world G D P t minus 5 for the years 2010 to 2025.

Next assess skewness of the dependent variable, revenues, and the seven potential drivers, using only rows through 2020:

An Excel sheet of 9 columns reads revenue, world C O 2 emissions t minus 3, world G D P in t minus 3, world G D P P C t minus 3, world C O 2 emissions t minus 4, world G D P t minus 4, world C O 2 emission t minus 5, and world G D P t minus 5, along with their skewness for the years 2020 to 2025.

Since revenues are positively skewed outside the approximately Normal range, add a new column I with the square roots and confirm that skewness of the square roots is within the approximately Normal range of −1 to +1.

An Excel sheet with 4 columns reads world G D P P C in dollars thousand, year, revenue in dollars billion, and square root revenue in dollars billion for the years from 2010 to 2025.

Excel 10.3: Select the Most Promising Driver

To increase the chances that the model will be valid, identify the potential driver(s) that show a matching pattern in the recent years (excluding the two most recent years).

Plot the square roots and the potential drivers, adding trendlines. Plot through the 2020 year only, shown in Figs. 10.9, 10.10, 10.11, and 10.12. The longest lags for emissions and GDP are shown below, though all six of the lags should be considered.

A scatterplot of the revenue in dollars million square roots versus years from 2010 to 2020. It plots a positive growth with the best-fit line. The growth is consistent from 2015 to 2018. — **Fig. 10.9**

A scatterplot of the world G D P P C in dollars kilo t minus 3 versus years from 2010 to 2020. It plots a positive trend with a best-fit line. The growth is consistent from 2014 to 2017. — **Fig. 10.10**

A scatterplot of the world G D P in dollars thousand t minus 5 versus years from 2010 to 2020. It plots a positive growth with a best-fit line. The growth is consistent from 2016 to 2019. — **Fig. 10.11**

A scatterplot of the world C O 2 emissions in billion tons versus years from 2010 to 2020. It plots a positive growth with a best-fit line. The growth is consistent from 2011 to 2013 and 2015 to 2017. — **Fig. 10.12**

Revenues increased every year, including the four most recent years.

World GDPPC, lagged 3 years, has increased three out of four of the most recent years.

World GDP, lagged 5 years, has increased in three of the four most recent years. This is also the case for the 3 and 4 year lags.

World emissions, lagged 5 years, have increased annually in three of the four most recent years. The 3 and 4 year lags have fewer matching increases.

Based on patterns revealed by the plots, the GDPPC 3 year lag, the GDP 3, 4 and 5 year lags, and the emissions 5 year lag match better than other lags, and tie for three out of four matches in the four most recent years. Move the emissions lags which match less well out of the way, for now. To decide which of these lags is preferred, look at correlations with the revenue square roots, using only data through 2020.

An Excel sheet depicts the values of revenue in dollar billion, square root revenue, world G D P t minus 3, t minus 4, and t minus 5, world G D P P C t minus 3, and world C O 2 emissions. The dialog box for correlation is displayed on the sheet, with options to set input and output ranges.

	sqrt revenue ($B)
sqrt revenue ($B)	1
World GDP (T$) t−3	0.838261
World GDPPC ($K) t−3	0.703793
World GDP (T$) t−4	0.846541
World CO₂ emissions (B tons) t−5	0.933914
World GDP (T$) t−5	0.908683

Only the first column of correlations with the revenue square roots is of interest. The correlation with the 5 year emissions lag is strongest, make that the best choice for the one driver model.

Build a one driver model of the revenue square roots with the 5 year emissions lag.

An Excel sheet depicts the values of square root revenue in dollars billion, world G D P t minus 3 and t minus 4, world G D P P C t minus 3, and world C O 2 emissions t minus 5. The dialog box for regression is displayed on the sheet, with options to set input Y and X ranges and output range.

Three tables of summary output. 1. The table of regression statistics has multiple R, R square, adjusted R square, standard error, and observations. The columns have 2. d f, S S, M S, F, and significance F. 3. coefficients, standard error, t stat, P-value, and lower and upper 95 and 95.05 percentages.

Variation in past World emissions accounts for 87% of the variation in the revenue square roots. The model is significant. The emissions coefficient has the correct positive sign, and the one tail pvalue would be much smaller than .05.

Plot the residuals, setting major units to the standard error, .7, and assess Durbin Watson.

An Excel sheet with a residual plot on the right. The residual plot plots the residuals versus the years from 2010 to 2020. The table on the left has the values of residuals from 2010 to 2020. The sheet also has the values of d L, d U, and D W.

DW is between the lower and upper critical values, indicating possible positive autocorrelation. Residuals in 2018–2020 illustrate a positive impact on revenues, presumably due to availability of Model 3. Add a Model 3 indicator, set to one in 2018 and later years and build a two driver model.

An Excel sheet with columns revenue in dollars billion, square root revenue in dollars billion, model 3, and world C O 2 emissions in billion tons t minus 5. The dialog box for regression on the right has options to set input Y and X ranges, labels, output range, new worksheet ply, and residuals.

With past World emissions and the Model 3 indicator, 96% of the variation in revenue is accounted for. The model is significant. Both coefficient estimates have the correct positive signs and both one tail pvalues would be much smaller than .05.

Plot residuals, setting major units to the standard error, .4, and assess Durbin Watson.

An Excel sheet with a residual plot on the right. The residual plot plots the residuals from the year 2010 to 2020 with a maximum of 0.596415 in 2014 and a minimum of negative 0.51708 in 2012. The table on the left depicts the values of residuals from 2010 to 2020 and the value of D W.

DW is now greater than two, signaling absence of positive autocorrelation. All residuals are within two standard errors of zero. Nonetheless, a third driver could improve the model. To allow for this possibility, copy the residuals and paste next to the lagged drivers, and then run correlations between the lagged drivers for GDP and GDPPC and residuals.

An Excel sheet with world G D P in thousand dollars t minus 3 and t minus 4, world G D P P C in dollars kilo t minus 3, world G D P t minus 5, and residuals. The dialog box for correlation on the right has options to choose input range, grouped by, output range, new worksheet ply, and new workbook.

	World GDP (T$) t−3	World GDPPC ($K) t−3	World GDP (T$) t−4	World GDP (T$) t−5	Residuals
Residuals	0.31776	0.361931	−0.02193	−0.01632	1

It is the residual row of correlations that will reveal any drivers that could help to account for remaining residual variation. The largest correlation is with past GDPPC, lagged 3 years. Note that adding that driver to the model will restrict the forecast to 3 years. Run that three driver model.

An Excel sheet with square root revenue in dollars billion, model 3, world C O 2 emissions t minus 5, world G D P P C t minus 3, and world G D P t minus 3, t minus 4, and t minus 5. The dialog box for regression to the right has options to choose input Y and X ranges, output range, and residuals.

Adding past GDPPC increases RSquare to 98.1%. The model is significant, all three coefficient estimates are positive, as expected, and all three have one tail pvalues much smaller than .05.

Assess residuals:

An Excel sheet with a residual plot on the right. The residual plot plots the residuals from 2010 to 2020 with a maximum of 0.364976 in 2017 and a minimum of negative 0.33571 in 2015. The table on the left has the values of residuals from 2010 to 2020 along with the value of D W.

All residuals are within two standard errors of zero and DW is above two. Residuals are free from positive autocorrelation.

Excel 10.4: Test the Model’s Forecasting Validity

Test the model’s predictive validity.

Copy Year, and revenue and revenue square roots from the data sheet and paste next to residuals.

An Excel sheet with 5 columns and 16 rows with residuals from 2010 to 2020, D W, revenue in dollars billion, and square root revenue in dollars billion.

JMP 10.1: Find Individual Prediction Interval Bounds

Use JMP to find the individual prediction interval bounds to test predictive validity.

Copy year, sqrt revenue, Model 3, lagged World emissions, lagged World GDPPC and paste into JMP.

Hide/exclude 2021 and 2022.

A table with 5 columns and 16 rows with revenue in dollars billion, revenue in dollars billion square root, model 3, and world emissions in billion tons t minus 5 for the years 2010 to 2025.

Analyze, Fit Model.

A screenshot of model specification with options to select columns, pick role variables, personality, emphasis, and construct model effects. The pick role variables have the options to choose Y, weight, frequency, validation, and by. The options of help, recall, remove, and run are on the right.

In the regression output, click on the red triangle at the top, Save Columns, Indiv Confidence interval.

A screenshot of response revenue in dollars billion square root. The red triangle at the top left corner has the options of regression reports, estimates, and save columns. The option save columns is selected. The individual confidence interval under the save columns option is highlighted.

A sheet with 2 columns and 14 rows depicts lower 95 percent individual revenue in dollars billion square root and upper 95 percent individual revenue in dollars billion square root.

Copy the individual prediction interval bounds (Indiv Confidence Intervals) and paste into your excel regression sheet.

An Excel sheet with 5 columns and 14 rows depicts revenue in dollars billion and revenue in dollars billion square roots from the year 2010 to 2023 with the data of 2021 and 2021 selected.

Comparing the predictions for 2021 and 2022 with actual values reveals that the forecasts are too low, presumably due to the impact of the popular Model Y launch.

Add a Model Y indicator, set to one 2021 through 2025, and set to zero before 2021.

Excel 10.5: Recalibrate to Forecast

Recalibrate the model with regression using all years, model 3, Model Y, emissions and GDPPC.

An Excel sheet with revenue and square root revenue in dollars billion, model 3, model y, world C O 2 emissions t minus 5, and world G D P P C t minus 3. The dialog box for regression to the right has options to choose input Y and X ranges, output range, and a new worksheet ply.

Three tables of summary output. The columns of 1. regression statistics with multiple R, R square, adjusted R square, standard error, and observations. 2. d f, S S, M S, F, and significance F for regression, residual, and total. 3. coefficients, standard error, t s t a t, P-value, and one tail p value.

Accounting for Model Y availability increases RSquare to 98.6%. All four drivers are significant with correct signs.

To make the fit and forecast, copy year and revenues and paste next to residuals.

An Excel sheet with 4 columns and 16 rows has square root revenue and residuals for observations 1 to 13 and the revenues in dollars billion for the years from 2010 to 2025.

Use JMP to find the individual prediction interval bounds.

Copy the year, revenue, revenue square roots, the two indicators, past World emissions, and World GDPPC and paste into a new data table in JMP.

A table with 6 columns and 16 rows has revenues in dollars billion, square root revenue in dollars billion, model 3, model Y, and world C O 2 emissions in tons t minus 5 for the years from 2010 to 2025.

Analyze, Fit Model.

Add the revenue square roots to Y and Add Model 3, Model Y, World emissions and World GDPPC to Construct Model Effects. Change emphasis to Minimal and click Keep Dialog Open.

Run, then click the red triangle, Save Columns, Indiv Confidence Intervals:

Select and copy the prediction intervals, which will have been added to the data table:

Paste into the excel regression sheet, next to the square roots. Create two new columns which are the squares of the prediction interval bounds in square roots. Add labels:

An Excel sheet with 6 columns and 16 rows depicts revenues in dollars billion, lower individual prediction intervals, and upper individual prediction intervals for the years from 2010 to 2025.

Excel 10.6: Illustrate the Fit and Forecast

To see the model fit and forecast, plot the prediction intervals with actual revenues through 2020. After selecting Year and Revenues through 2025, hold CNTL and then, with your mouse, select the prediction interval bounds. Insert a scatterplot.

An Excel sheet with a 3-line graph to the right. The table has revenues in dollars billion, lower and upper individual prediction intervals for the years from 2010 to 2025. The lines of revenues and lower and upper individual prediction intervals versus years from 2010 to 2025 plot an increasing trend.

Adjust the horizontal axis to make good use of white space, add a vertical axis title, remove unwelcome zeros, change the prediction intervals to one color, without markers, change the actual revenues to markers only, and add a stand alone title:

A 2-line and dot plot of the revenues in dollars billion, lower and upper individual prediction intervals versus years from 2010 to 2025. The lines plot an increasing trend from 0 in 2010 to 50 and 116 billion dollars in 2025 respectively.

Excel 10.7: Assess the Impact of Drivers

To see the contributions of the four drivers to revenue forecasts, make the part worths. To make the part worths, copy Model 3, Model Y, World emissions and World GDPPC from the data sheet and paste into the regression sheet.

The intercept is in $B$17.

The Model 3 part worth is the product of its coefficient, $B$18, and the value of Model 3 in a year.

The Model Y part worth is the product of its coefficient, $B$19, and the value of recession in a year.

The World emissions part worth is the product of its coefficient, $B$20, and the value of World emissions t−5.

The World GDPPC part worth is the product of its coefficient, $B$21, and the value of World GDPPC t−3.

Predicted revenue square roots are the sum of the five part worths, and when squared, produce predicted revenues.

Use the part worths to compare the impacts of each of the drivers on model forecasts. The ranges (maximum-minimum) in the part worths are measures of driver importances.

Copy driver labels and paste above the part worth ranges, and then insert a column chart (2D bar).

An Excel sheet with 11 columns and 16 rows with data and the maximum, minimum, and range of model Y, model 3, world emissions, and world G D P P C.

A horizontal bar graph of the contribution to predicted revenues in square root dollars billion for 4 variables. The values are world G D P P C 2.568598, world emissions 2.534747, model 3 2.094603, and model Y 2.274531. The table below has the values of contribution to predicted revenue.

Add a horizontal axis title and add a stand alone title.

A horizontal bar graph of the contribution to predicted revenues in square root dollars billion for 4 variables. The contribution values of world G D P P C are 2.7, world emissions 2.6, model 3 2.2, and model Y 2.3.

Using sensitivity analysis, the impacts of the Model 3 and Model Y availability can be revealed. Since this is a nonlinear model, that impacts will differ each year. Set the Model 3 values to zero, copy and paste special (without formulas) predicted revenues.

Reset the Model 3 indicator to 1 in 2018–2025 and find the difference between predicted revenue with and without Model 3.

An Excel sheet with 13 columns and 16 rows with model 3, world emissions in billion tons t minus 5, world G D P P C t minus 3, intercept, model y p, model 3 p, world emissions and G D P P C p w in square root dollars billion, predicted, predicted revenues, and predicted revenues without model 3.

The Model 3 estimated impact ranges from $16.1B to $33.1B. While the impact generally increases each year, there are several years where this is not true, due to interactions with past World GDPPC and World emissions.

Repeat this process to assess the expected impact of Model Y in years 2021–2025.

An Excel sheet with 6 columns and 16 rows with predicted, predicted revenues in dollars billion, predicted revenues in dollars billion without model 3, impact of model 3, predicted revenues in dollars billion without model Y, and impact of model Y.

The estimated impact of Model Y is $30.6B–$35.6B, also increasing most years, but not all years due to interactions with past World emissions and World GDPPC.

Case 10.1: Identifying Drivers and Forecasting Aluminum Production in China and North America

Alcoa (and other U.S. aluminum producers) face increasing competition from Chinese aluminum production. China has designated aluminum as a strategic resource, and now produces more than demanded in China. Overproduction has produced a glut in World markets, lowering prices and profits of producers. To bolster aluminum production and sales in the U.S., the Whitehouse placed tariffs on aluminum imports from China in March 2018.

Alcoa managers are interested in forecasting production of aluminum in China and in North America, as well as identifying drivers of production in both global regions

Managers believe that population, GDP and GDP per capita drive aluminum production, since increasing population and wealth drives demand for aluminum intensive products, such as soft drinks, beer, cars, and construction. They are not sure which is the most powerful driver.

The price of aluminum probably influences production. Higher prices motivate increased production, while falling prices lead to reduced production.

Finally, aluminum supply may drive aluminum production. Production in North America may influence production in China, and production in China may influence production in North America.

Have tariffs significantly driven production in China and in North America? (Address this question after you’ve built valid models by then adding a tariff indicator to your recalibrated models.)

Alcoa would like forecasts of aluminum production in China and North America, ideally for the next 5 years, 2019–2023, but at least through 2022.

Production in China

CHN aluminum.xlsx contains Chinese aluminum production for 2007 through 2018, North American aluminum production, World GDP and per capita GDP, World population, as well as World aluminum prices from 2002 to 2018.

Hide the two most recent observations for aluminum production (until you are ready to recalibrate).

Plot Chinese aluminum production to identify patterns to be explained.

Identify potential drivers and set up lags which will allow a 4–5 year forecast.

Assess skewness and rescale if skewness is outside the approximately Normal −1 to +1 range.

1.
Plot the potential drivers and identify those with most pattern matches to aluminum production in China in the four most recent years (among those not hidden). Choosing one of those with most pattern matches will increase chances that your model is valid. Specify best matches with their lag(s). (Note: matches with NA production need to be moving in the opposite direction.)

Lag(s)

Lag(s)

Lag(s)

Lag(s)
NA prod

GDPPC

GDP

Alum price
2.
Among potential driver lags with most matches in the four most recent years, identify those that have higher correlations (of correct sign) with aluminum production. (Consider “higher” correlations those within .02 of the highest correlation.) Specify higher correlations with their lag(s). (If a potential driver does not have the most matches in the four most recent years, leave that cell blank.)

Lag(s)

Lag(s)

Lag(s)
NA prod

GDPPC

GDP

population

Alum price
3.
Build a one driver model, favoring a highly correlated driver, favoring a longer lag to enable a longer forecast. Driver and lag you chose: _________

Assess and plot residuals to identify patterns to be explained.
4.
Add an indicator to account for the apparent shock to improve the model. (Compare results with the indicator turned on alternate years to find the best years to turn on.)

Your indicator is turned on in years: ________ to ________

Assess and plot residuals to look for patterns to be explained.
5.
Compare correlations with residuals, identifying potential lagged drivers with higher (.02 of highest) correlations. (Consider only those which were not chosen for the one driver model.)

	Lag(s)		Lag(s)		Lag(s)
NA prod		GDPPC		GDP
population		Alum price

Build a three driver model, adding one of the more highly correlated drivers, preferably with a longer lag.

Validate and recalibrate your best model, either two or three driver. (Note: if the third driver is not significant, go with your two driver model.) Then, add a tariff indicator to assess significance on production.

6.
How much have the 2018 tariffs on Chinese imports significantly reduced production? _______ to ________ K mt. (from the confidence interval for the coefficient)
7.
Present your fit and forecast of your best model (with the tariff, if significant; without the tariff if not significant), removing unwanted decimals, adding axes and chart titles, and using 12 pt font:
8.
Present your final model equation:
9.
Illustrate driver importances with a bar chart of part worth ranges.
10.
Identify all continuous lagged drivers of Chinese aluminum production. Run regressions with each continuous driver not in your model. (Choose one lag for each.) Which potential drivers, including those not in your model and those in your model, drive China aluminum production?

___ World population ___ NA production ___ GDP ___ GDPPC ___aluminum price

Production in North America

NA aluminum.xlsx contains North American aluminum production for 2007 through 2018, Chinese aluminum production, World GDP, per capita GDP, and population, as well as World aluminum prices from 2002 to 2018.

Hide the two most recent observations for aluminum production (until you are ready to recalibrate).

Plot North American aluminum production to identify patterns to be explained.

Identify potential drivers and set up lags which will allow a 4–5 year forecast.

Assess skewness and rescale if skewness is outside the approximately Normal −1 to +1 range.

1.
Plot the potential drivers and identify those with most pattern matches to aluminum production in North America in the four most recent years (among those not hidden). Choosing one of those with most pattern matches will increase chances that your model is valid. Specify best matches with their lag(s). (Note: to match, Chinese production needs to move in the opposite direction.)

Lag(s)

Lag(s)

Lag(s)
Chinese prod

GDP

Aluminum price

World population

GDPPC
2.
Among potential driver lags with most matches in the four most recent years, identify those that have higher correlations (of correct sign) with aluminum production. (Consider “higher” correlations those within .02 of the highest correlation.) Specify higher correlations with their lag(s). (If a potential driver does not have the most matches in the four most recent years, leave that cell blank.)

Lag(s)

Lag(s)

Lag(s)
Chinese prod

GDP

Aluminum price

World population

GDPPC
3.
Build a one driver model, favoring a highly correlated driver with longer lag.

Assess and plot residuals to identify patterns to be explained. The driver and lag that you chose: __________
4.
Add an indicator to account for the apparent shock to improve the model. (Compare results with the indicator turned on alternate years to find the best years to turn on.) Years you chose to turn on: _______

Assess and plot residuals to look for patterns to be explained.
5.
Compare correlations with residuals, identifying potential lagged drivers with higher (.02 of highest) correlations. (Consider only those which were not chosen for the one driver model.)

	Lag(s)		Lag(s)		Lag(s)
Chinese prod		GDP		Aluminum price
World population		GDPPC

Build a three driver model, adding one of the more highly correlated drivers, preferably with a longer lag.

Validate and recalibrate your best model. (Note: if the third driver is not significant, go with your two driver model.) Then, add a tariff indicator to assess significance on production.

6.
How much have the 2018 tariffs on Chinese imports significantly increased North American production? _______ to ________ mt.
7.
Present your fit and forecast. (Note: if the tariff is not significant, go with your recalibrated model without tariff.) Remove unwanted decimals, add axis and chart titles, and using 12 pt font:
8.
Present your final model equation:
9.
Illustrate driver importances with a bar chart of part worth ranges.
10.
Identify all continuous lagged drivers of North American aluminum production. Run regressions with each continuous driver not in your model, including the indicator(s). (Choose one lag for each.) Which potential drivers, including those not in your model and those in your model, drive North American aluminum production?

___ Chinese prod ___ GDP ___ GDPPC ___alum price ___ World population

Case 10.2: Identify Drivers and Forecast Amazon Sales

Amazon has grown at amazing rates over its short history. Several investors want to know what is driving that growth and whether that sort of growth is sustainable. With World population, World GDP, or World GDPPC, build a model of Amazon’s sales to identify drivers and forecast sales through at least 2025, and, possibly through 2027. You are able to use 4 or 5 year lags for World Population and World GDP, but World GDPPC should be limited to a 3 year lag.

Recent events may have led to improved sales.

Whole Foods was acquired late in 2017, with 2018 the first full year of operation as part of Amazon.

Prime Expansion. In 2020, partly due to the Pandemic, Amazon began offering greatly expanded benefits to its prime members, such as

Amazon Fresh, where prime members in select cities can shop for food and nonfood essentials,
Amazon Day, where prime members could shop during a week, then choose free delivery of multiple items in a single box,
Discount of 20% off to prime members on subscription purchases of six or more items, such as diapers and baby food.
Door step WFM deliveries

Data containing Amazon sales from 2011 through 2022 is in Amazon Revenue.excel.

1.
Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and pvalues:
2.
Present your regression equation:
3.
Explain what your equation tells us:
4.
Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture.
5.
Illustrate the importance of drivers in your model with a bar chart:
6.
Illustrate the impact on predicted revenues of the most influential indicator:
7.
Which potential drivers, including those in your model and those not in your model drive sales:

___ Population ___ GDPPC ____GDP

Case 10.3: Identify Drivers and Forecast EV Sales

A Ford investor seeks to explain the apparent success of EV sales. It is believed that economic productivity, GDP, may drive EV sales. It is also believed that personal income, GDPPC, may drive EV sales, since EVs are expensive, relative to conventional cars. Finally, it is thought that increasing emissions may drive demand for electric vehicles. Build a model of EV vehicle sales to both explain sales fluctuations and forecast sales in the next through 2025, at the least, and possibly through 2027.

Consider these following potential drivers:

World Per capita GDP(K)_q−l
World GDP(trillion USD)_q−l
World CO₂ emissions(B tons)_t−l

For World GDPPC, consider only a 3 year lag, since the delayed impact of changes in personal income wouldn’t be a long as 4 or 5 years. Consider 4 and 5 year lags of World GDP and World CO₂ emissions, which would enable a longer forecast if GDPPC isn’t used.

Some EV model releases and availability may significantly boost sales:

Tesla Model S was launched in June 2012, with 2013 the first full year of availability.
Ford Focus was launched mid 2012, with 2013 the first year of availability, and available through 2017.
Toyota RAV4 was launched mid 2013, with 2013 the first year of availability, available through 2014.
Tesla Model X was launched mid 2015, with 2016 the first full year of availability.
Chevy Bolt was launched mid 2016, with 2017 the first full year of availability.
Tesla Model 3 was introduced July 2017, with first full year of availability in 2018.
Tesla Model Y was introduced March 2020, with first full year of availability in 2021.

1.
Plot EV sales, excluding the two most recent years. If you rescaled, plot in the rescaled units. Add a trendline:
2.
Copy and paste the excel cells showing drivers, coefficients, pvalues, and one tail pvalues from your final, recalibrated model:
3.
Present the equation describing your final, recalibrated model:
4.
Explain what your equation tells us:
5.
Present a bar graph showing the contributions to predicted EV sales:
6.
Present your model fit and forecast:
7.
Use sensitivity analysis to illustrate the impact of one of the shifts or shocks included in your model. Illustrate with a scatterplot:

Case 10.4: Identify Drivers and Forecast Sephora Sales

Managers want to know what is driving Sephora’s sales, including a forecast through 2025. Build a model of Sephora’s sales to both explain and forecast sales through 2025. Sephora.xlsx contains, World Population, World Urban Population, World GDP, and World GDPPC, all of which are thought to positively drive sales.

Recent events may have influenced sales:

Class action suit. In 2014, four Asian customers filed a class action suit alleging race discrimination. Sephora had discovered that some Asia customers were buying in bulk and reselling in Asian markets. Sephora deactivated some Asian accounts. This may have been a temporary shock or it may have longer term impacts.

Smart phone app. In 2016 Sephora introduced a smart phone app which enabled customers to try cosmetics virtually.

Mexico. In 2018, Sephora entered Mexico.

The Pandemic. The pandemic may have hurt sales growth in 2020. Stores temporarily closed. Any Pandemic impact is thought to be temporary.

Data containing Sephora sales from 2009 through 2020 is in Sephora.excel.

8.
Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and pvalues:
9.
List the indicators in your recalibrated model, along with years they are each turned on (set to one).

A table with 2 columns and 3 rows. The column headings are indicator and years turned on. There are no entries in the rows.

10.
Present your regression equation:
11.
Explain what each component of your equation, other than the intercept, tells us:
12.
Present the DW(n,K) from your validation model and your interpretation:
1.
6, Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture.
2.
What is your 2025 forecast, with 95% certainty? (Specify units) _____ to _____
3.
Illustrate the importance of drivers in your model with a bar chart:
4.
Which potential drivers, including those in your model and those not in your model drive sales:

___ Population ___ Urban Population ___ GDP ___ GDPPC

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Fraser, C. (2024). Explanatory Time Series Models. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_10

Download citation

DOI: https://doi.org/10.1007/978-3-031-42555-4_10
Published: 05 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics