Chapter 1: Introduction to Time Series

1 Introduction

1.1 Learning Objectives

By the end of this chapter, students should be able to:

  1. Understand Time Series Fundamentals.
  2. Identify Time Series Components.
  3. Understand the Relationship Among Components.
  4. Measure Forecast Performance.
  5. Apply these concepts using R.

1.2 Definition

A time series is a time-oriented, chronological sequence of observations measured on a variable of interest. The observations are typically collected sequentially at fixed, equally spaced points in time; the spacing between observations is called the sampling interval.

Depending on the context, time series data can take several forms:

  • Instantaneous measurements: A reading taken at a specific point in time, such as the viscosity of a chemical product at the moment it is measured.
  • Cumulative quantities: An accumulated total over the interval, such as the total product demand or sales during a month.
  • Summary statistics: A metric that reflects activity of the variable during the time period, such as the daily closing price of a stock.

1.2.1 Statistical and Mathematical Definition

A time series is modelled as a discrete-time stochastic process — a collection of random variables indexed according to the discrete time order in which they are obtained.

A time series of length \(n\) is represented as:

\[\{y_t : t = 1, 2, \ldots, n\}\]

where the subscript \(t\) denotes the discrete time period.
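In R, a single realization of such a process is stored as a `ts` object; a minimal sketch using the illustrative unemployment values from Table 1:

```r
# A time series of length n = 6, observed annually: {y_t : t = 1, ..., 6}
y   <- c(3.4, 3.3, 3.9, 4.5, 3.7, 3.5)
yts <- ts(y, start = 2018, frequency = 1)  # sampling interval = 1 year

length(yts)  # n, the length of the realization
time(yts)    # the discrete time index t mapped to calendar years
```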

An important distinction:

| Concept | Description |
|---|---|
| The Time Series Model | The theoretical sequence of underlying random variables — the ensemble of possibilities |
| The Realization | The actual historical sequence of data observed; a single realization of the stochastic process |

1.2.2 Arrangement of Time Series Data

Show R Code
df_example <- data.frame(
  t     = 1:6,
  Year  = 2018:2023,
  y_t   = c(3.4, 3.3, 3.9, 4.5, 3.7, 3.5)
)
knitr::kable(df_example, col.names = c("$t$", "Year", "$y_t$ (Unemployment Rate, %)"),
             align = "c")
Table 1: Example arrangement of time series data
\(t\) Year \(y_t\) (Unemployment Rate, %)
1 2018 3.4
2 2019 3.3
3 2020 3.9
4 2021 4.5
5 2022 3.7
6 2023 3.5

1.3 Categorization of Forecasting Models

Source: Wang, Shouyi & Chaovalitwongse, Wanpracha (2011). Evaluating and Comparing Forecasting Models.

1.4 Visualization of Time Series

A time series can be visualized via a time series plot.

1.4.1 Example 1: Malaysia Unemployment Rate (1982–2020)

Show R Code
unrate   <- read.csv("data/employment.csv")
unratets <- ts(unrate$u_rate, start = 1982)

plot(unratets,
     ylab = "Unemployment Rate (%)", xlab = "Year", xaxt = "n",
     col = "steelblue", lwd = 1.5)
years <- seq(1982, 2020, by = 2)
axis(1, at = years, labels = years, las = 2)
title(sub = "Figure 1: Malaysia Unemployment Rate from 1982 to 2020",
      col.sub = "grey40")
Figure 1: Malaysia Unemployment Rate, 1982–2020

1.4.2 Example 2: Monthly Oil Price (Jan 1986 – Jan 2006)

Show R Code
library(TSA)  # provides the oil.price dataset
data(oil.price)
plot(oil.price, ylab = "Price per Barrel (USD)", type = "l",
     col = "darkred", lwd = 1.5)
title(sub = "Figure 2: Monthly Price of Oil: Jan 1986 – Jan 2006",
      col.sub = "grey40")
Figure 2: Monthly price of oil, January 1986 – January 2006

1.4.3 Example 3: Monthly Sales of Specialty Oil Filters (1983–1987)

Show R Code
library(TSA)  # provides the oilfilters dataset and season()
data(oilfilters)
plot(oilfilters, type = "l", ylab = "Sales", col = "darkgreen", lwd = 1.5)
points(y = oilfilters, x = time(oilfilters),
       pch = as.vector(season(oilfilters)))
title(sub = "Figure 3: Monthly Sales of Specialty Oil Filters, Jul 1983 – Jun 1987",
      col.sub = "grey40")
Figure 3: Monthly sales of specialty oil filters, 1983–1987

1.5 Interpretation of Time Series Plot

When interpreting a time series plot, always include the following:

Element Description
Minimum The smallest value in the plot
Maximum The largest value in the plot
Outliers Values much further away from the rest of the data points
Trends Patterns such as upward/downward movements or clusters

Example interpretation (Figure 1): Malaysia’s unemployment rate ranged from a minimum of 2.4% (1997) to a maximum of 7.4% (2020). The series shows a general downward trend from 1986 to 1997 and fluctuates considerably thereafter, with a sharp spike in 2020 likely attributable to the COVID-19 pandemic.


2 Components of Time Series

A time series is typically composed of four components:

| Component | Notation | Description |
|---|---|---|
| Trend | \(T_t\) | Long-run direction of the series |
| Seasonal | \(S_t\) | Regular, repeating fluctuations within a fixed period |
| Cyclical | \(C_t\) | Long-wave rises and falls around the trend |
| Irregular/Random | \(I_t\) | Unpredictable, residual variation |

2.1 Trend

The trend component describes the general upward or downward movement of the series over the long run, a feature common to most economic and business activities.

2.1.4 Trend Using Simple Linear Regression

In simple linear regression for trend, the independent variable is time \(t\) where \(t = 1, 2, \ldots, n\):

\[\hat{T}_t = \hat{\beta}_0 + \hat{\beta}_1 t\]

Show R Code
library(ggplot2)
library(ggpubr)  # provides stat_regline_equation()

prod     <- read.csv("data/electric.csv")
electric <- data.frame(t = 1:108, index = prod[1:108, 3])

ggplot(data = electric, aes(x = t, y = index)) +
  geom_line(colour = "grey50", linewidth = 0.7) +
  geom_smooth(method = "lm", colour = "#d73027", se = TRUE) +
  stat_regline_equation(label.x = 30, label.y = 130) +
  labs(title = "Output of Electricity Index with Linear Trend",
       x = "Time (months, Jan 2015 = 1)",
       y = "Output of Electricity (Index)") +
  theme_minimal()
Figure 5: Output of electricity index with fitted linear trend (Jan 2015 – Dec 2023)
Show R Code
lm_fit <- lm(index ~ t, data = electric)
# Linear regression output
summary(lm_fit)

Call:
lm(formula = index ~ t, data = electric)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.2613  -2.7849   0.3872   3.8791  10.5326 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 103.05133    1.08696   94.81   <2e-16 ***
t             0.22672    0.01731   13.10   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.609 on 106 degrees of freedom
Multiple R-squared:  0.618, Adjusted R-squared:  0.6144 
F-statistic: 171.5 on 1 and 106 DF,  p-value: < 2.2e-16

Excel Tutorial: https://youtu.be/n6e71G9PImc

2.1.5 Trend Using Moving Average

For odd \(k\), the centered \(k\)-period moving average is

\[M_t = \frac{1}{k}\sum_{i=-(k-1)/2}^{(k-1)/2} y_{t+i}\]

  • If \(k = 3\): the first average is \(M_2 = \dfrac{y_1 + y_2 + y_3}{3}\) (3-period moving average)
  • If \(k = 5\): the first average is \(M_3 = \dfrac{y_1 + y_2 + y_3 + y_4 + y_5}{5}\) (5-period moving average)
Note
  • The larger \(k\) is, the smoother the series, but the more observations are lost at both ends.
  • The first \(\lfloor k/2 \rfloor\) and last \(\lfloor k/2 \rfloor\) observations are lost from the moving average series.
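The loss of observations described in the note can be seen directly with base R's `stats::filter()`, which computes a centered moving average when `sides = 2` (the series here is illustrative):

```r
y <- c(5, 7, 6, 8, 10, 9, 11)

# Centered 3-period moving average: each point is averaged with its
# neighbours, so the first and last floor(3/2) = 1 values are NA (lost)
ma3 <- stats::filter(y, rep(1/3, 3), sides = 2)
as.numeric(ma3)  # NA at both ends; the second value is (5 + 7 + 6)/3 = 6
```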
Show R Code
library(forecast)  # ma() computes a centered moving average
ma3 <- ma(electric$index, 3)
ma5 <- ma(electric$index, 5)

matplot(electric$t,
        cbind(electric$index, ma3, ma5),
        type = "l", lty = 1, col = c("red", "blue", "green"),
        xlab = "Time (months)", ylab = "Output Index",
        main = "Moving Average Trend")
legend("bottom", legend = c("Actual", "3-period MA", "5-period MA"),
       col = c("red", "blue", "green"), lty = 1, bty = "n")
Figure 6: Electricity output index: actual vs. 3-period and 5-period moving averages

Excel Tutorial: https://youtu.be/FODxTjhY5TM


2.2 Seasonal

The seasonal component refers to a recurring pattern that repeats at regular, fixed intervals (daily, weekly, monthly, or yearly).

2.2.1 Causes of Seasonal Patterns

Factor Example
Weather Palm oil production influenced by rainfall (November–March, east coast Malaysia)
Holidays Tourist numbers peak during school holidays
Cultural Events Zakat collection highest during Hari Raya Eidulfitri

2.2.2 Identifying Seasonal Components

  • Visual Inspection: Plotting the time series and observing recurring patterns.
  • Seasonal Decomposition: STL or classical decomposition methods.
  • Statistical Tests: Autocorrelation function (ACF) analysis (Chapter 5).
Show R Code
electric$date <- seq(from = as.Date("2015-01-01"),
                     to   = as.Date("2023-12-01"),
                     by   = "month")

ggplot(data = electric, aes(x = date, y = index)) +
  geom_line(colour = "steelblue", linewidth = 0.8) +
  scale_x_date(date_labels = "%b %Y", date_breaks = "6 months") +
  labs(title = "Monthly Electricity Output Index (Jan 2015 to Dec 2023)",
       x = "Year", y = "Electric Consumption (Index)") +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
Figure 7: Monthly electricity output index showing seasonal pattern (2015–2023)
Show R Code
elec_ts <- ts(electric$index, start = c(2015, 1), frequency = 12)
decomp  <- decompose(elec_ts, type = "additive")
autoplot(decomp) +
  labs(title = "Additive Decomposition: Electricity Output Index") +
  theme_bw(base_size = 11)
Figure 8: Classical additive decomposition of the electricity index

Excel Tutorial: https://youtu.be/Wwlx2IgB7uw


2.3 Cyclical

Cyclical variation refers to rises and falls of the series over an unspecified period, usually around the long-run trend line.

Important
  • Requires sufficiently long data — typically more than 5 years.
  • Easier to detect with yearly data.
  • There is no fixed period for cyclical recurrence (unlike seasonal).
  • Identification is challenging because data patterns are typically not stable.

Source: https://www.mathworks.com/help/econ/choose-time-series-filter-for-business-cycle-analysis.html

2.3.1 Methods to Identify Cyclical Variation

2.3.1.1 Residual Method

\[\text{Percent of Trend} = \frac{y_t}{\hat{T}_t} \times 100\]

Value Interpretation
\(< 100\) Economy contracting (recession)
\(> 100\) Economy expanding

2.3.1.2 Relative Cyclical Residual Method

\[\text{Relative Cyclical Residual} = \frac{y_t - \hat{T}_t}{\hat{T}_t} \times 100\]

Value Interpretation
Negative Trend pulled down by recession
Positive Output above average (expansion)
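Both measures can be computed in a few lines; a sketch on simulated data (the series below is illustrative, not one of the chapter's datasets):

```r
# Simulated annual series: linear trend plus one long cycle
t <- 1:12
y <- 100 + 2 * t + 8 * sin(2 * pi * t / 12)

trend_hat <- fitted(lm(y ~ t))                            # estimated trend T_t
percent_of_trend   <- y / trend_hat * 100                 # > 100: expansion
relative_cyc_resid <- (y - trend_hat) / trend_hat * 100   # > 0: expansion

# By construction the two measures differ by exactly 100,
# so both methods flag the same periods as expansion/contraction
head(percent_of_trend - relative_cyc_resid)
```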

2.4 Irregular / Random

The irregular or random component is the portion that cannot be explained by trend, seasonal, or cyclical components. It has four distinct sub-types:

| Sub-type | Key Characteristic | Visual Pattern |
|---|---|---|
| Turning Point | Trend permanently changes direction | Sustained shift in slope |
| Random Shock | Sudden, temporary spike or dip | Sharp impulse, then recovery |
| Outlier | One extreme isolated value | Single point far from neighbours |
| Pure Noise | Small unpredictable fluctuations | Irregular scatter around zero |

2.4.1 Turning Point

A turning point is where the long-run direction permanently reverses — from upward to downward or vice versa. Unlike a shock, the series does not recover to its prior direction.

Examples:

  • Introduction of robotic manufacturing permanently raises production.
  • Sudden change in consumer preference shifts demand to a new sustained level.
  • Policy change (e.g., new minimum wage legislation) permanently alters labour market dynamics.
Show R Code
unrate <- read.csv("data/employment.csv")
df_u   <- data.frame(
  Year   = as.numeric(format(as.Date(unrate$date), "%Y")),
  u_rate = unrate$u_rate
)

ggplot(df_u, aes(x = Year, y = u_rate)) +
  geom_line(colour = "steelblue", linewidth = 0.9) +
  geom_point(size = 1.8, colour = "steelblue") +
  geom_point(data = df_u[df_u$Year == 1986, ],
             aes(x = Year, y = u_rate),
             colour = "#d73027", size = 4, shape = 17) +
  annotate("label", x = 1986, y = df_u$u_rate[df_u$Year == 1986] + 0.5,
           label = "Peak (1986)\nTurning Point ↓",
           colour = "#d73027", fill = "white", size = 3.2, label.size = 0.3) +
  geom_point(data = df_u[df_u$Year == 1997, ],
             aes(x = Year, y = u_rate),
             colour = "#1a9641", size = 4, shape = 25) +
  annotate("label", x = 1997, y = df_u$u_rate[df_u$Year == 1997] - 0.6,
           label = "Trough (1997)\nTurning Point ↑",
           colour = "#1a9641", fill = "white", size = 3.2, label.size = 0.3) +
  geom_point(data = df_u[df_u$Year == 2020, ],
             aes(x = Year, y = u_rate),
             colour = "#fc8d59", size = 4, shape = 17) +
  annotate("label", x = 2019, y = df_u$u_rate[df_u$Year == 2020] + 0.5,
           label = "COVID-19 (2020)\nShock / Turning Point?",
           colour = "#fc8d59", fill = "white", size = 3.2, label.size = 0.3) +
  labs(title = "Turning Points in Malaysia's Unemployment Rate (1982–2020)",
       subtitle = "▲ Peak = trend reverses downward  |  ▽ Trough = trend reverses upward",
       x = "Year", y = "Unemployment Rate (%)") +
  theme_ts()
Figure 9: Turning points in Malaysia’s unemployment rate (1982–2020)
Show R Code
t_tp  <- 1:80
# Before turning point: upward; after: downward
y_tp  <- ifelse(t_tp <= 45,
                20 + 0.5 * t_tp,
                20 + 0.5 * 45 - 0.4 * (t_tp - 45)) +
         rnorm(80, 0, 1.2)

df_tp <- data.frame(t = t_tp, y = y_tp)

ggplot(df_tp, aes(x = t, y = y)) +
  geom_line(colour = "#4575b4", linewidth = 0.9) +
  geom_vline(xintercept = 45, linetype = "dashed", colour = "#d73027",
             linewidth = 1) +
  annotate("label", x = 45, y = max(y_tp) - 1,
           label = "Turning Point\n(t = 45)", colour = "#d73027",
           fill = "white", size = 3.5, label.size = 0.3) +
  annotate("segment", x = 5, xend = 42, y = 22, yend = 40,
           arrow = arrow(length = unit(0.15, "cm")),
           colour = "#1a9641", linewidth = 0.8) +
  annotate("text", x = 15, y = 26, label = "Upward trend",
           colour = "#1a9641", size = 3.5) +
  annotate("segment", x = 48, xend = 75, y = 42, yend = 30,
           arrow = arrow(length = unit(0.15, "cm")),
           colour = "#d73027", linewidth = 0.8) +
  annotate("text", x = 62, y = 38, label = "Downward trend",
           colour = "#d73027", size = 3.5) +
  labs(title = "Turning Point in a Time Series",
       subtitle = "Before t = 45: upward trend  |  After t = 45: downward trend",
       x = "Time", y = "Value") +
  theme_ts()
Figure 10: Turning point — the trend permanently reverses direction at the marked point
Note

Turning point vs. Random Shock: A turning point results in a permanent direction change. After a random shock, the series recovers toward its previous level.


2.4.2 Random Shock

A random shock is an abrupt, temporary spike or drop caused by an unexpected external event. The series typically recovers toward its prior level once the event passes.

Examples:

  • Crude oil price sudden drop due to COVID-19.
  • Production of masks increasing tremendously during the pandemic.
  • Stock market crash following a geopolitical crisis or natural disaster.
Show R Code
data(oil.price)
oil_df <- data.frame(Date = as.numeric(time(oil.price)),
                     Price = as.numeric(oil.price))

ggplot(oil_df, aes(x = Date, y = Price)) +
  geom_line(colour = "steelblue", linewidth = 0.8) +
  annotate("rect", xmin = 1990, xmax = 1991.5,
           ymin = -Inf, ymax = Inf, alpha = 0.15, fill = "#d73027") +
  annotate("label", x = 1990.75, y = 40,
           label = "Gulf War\nshock (1990–91)",
           colour = "#d73027", fill = "white", size = 3, label.size = 0.3) +
  annotate("rect", xmin = 1997.5, xmax = 1999.5,
           ymin = -Inf, ymax = Inf, alpha = 0.15, fill = "#4575b4") +
  annotate("label", x = 1998.5, y = 44,
           label = "Asian Financial\nCrisis (1998)",
           colour = "#4575b4", fill = "white", size = 3, label.size = 0.3) +
  annotate("rect", xmin = 2003, xmax = 2006,
           ymin = -Inf, ymax = Inf, alpha = 0.12, fill = "#fc8d59") +
  annotate("label", x = 2004.5, y = 20,
           label = "Iraq War &\nprice surge",
           colour = "#a63603", fill = "white", size = 3, label.size = 0.3) +
  labs(title = "Random Shocks in Monthly Oil Prices (Jan 1986 – Jan 2006)",
       subtitle = "Each shaded region marks an unexpected external event",
       x = "Year", y = "Price per Barrel (USD)") +
  theme_ts()
Figure 11: Random shocks in monthly oil prices (Jan 1986 – Jan 2006)
Show R Code
oil_sub <- oil_df[oil_df$Date >= 1989.5 & oil_df$Date <= 1995, ]
lm_pre  <- lm(Price ~ Date, data = oil_df[oil_df$Date < 1990, ])
oil_sub$trend <- predict(lm_pre, newdata = data.frame(Date = oil_sub$Date))

ggplot(oil_sub, aes(x = Date)) +
  geom_line(aes(y = Price, colour = "Observed"), linewidth = 0.9) +
  geom_line(aes(y = trend, colour = "Pre-shock trend"),
            linetype = "dashed", linewidth = 1) +
  annotate("segment", x = 1990.25, xend = 1991.5, y = 38, yend = 22,
           arrow = arrow(length = unit(0.15, "cm")), colour = "#d73027") +
  annotate("text", x = 1990, y = 39, label = "Spike then\nrecovery",
           colour = "#d73027", size = 3.2) +
  scale_colour_manual(values = c("Observed" = "steelblue",
                                 "Pre-shock trend" = "#d73027")) +
  labs(title = "Gulf War Shock: Price Recovers to Pre-shock Level",
       x = "Year", y = "Price per Barrel (USD)", colour = NULL) +
  theme_ts()
Figure 12: Gulf War shock (1990–91): price spikes then recovers to prior trend
Show R Code
t_sh  <- 1:80
trend_sh <- 30 + 0.2 * t_sh
noise_sh <- rnorm(80, 0, 1.2)
shock    <- rep(0, 80)
shock[40] <- -18   # sudden sharp downward shock
shock[41] <- -10
shock[42] <- -4    # gradual recovery

y_sh  <- trend_sh + shock + noise_sh
df_sh <- data.frame(t = t_sh, y = y_sh)

ggplot(df_sh, aes(x = t, y = y)) +
  geom_line(colour = "#4575b4", linewidth = 0.9) +
  annotate("rect", xmin = 39, xmax = 43, ymin = -Inf, ymax = Inf,
           alpha = 0.15, fill = "#d73027") +
  annotate("label", x = 41, y = min(y_sh) + 3,
           label = "Shock period\n(e.g. COVID-19)", colour = "#d73027",
           fill = "white", size = 3.5, label.size = 0.3) +
  annotate("segment", x = 43, xend = 55, y = y_sh[42] + 1, yend = 35,
           arrow = arrow(length = unit(0.15, "cm")), colour = "#1a9641",
           linewidth = 0.8) +
  annotate("text", x = 58, y = 35.5, label = "Recovery",
           colour = "#1a9641", size = 3.5) +
  labs(title = "Random Shock in a Time Series",
       subtitle = "Sharp temporary drop followed by recovery to original trend",
       x = "Time", y = "Value") +
  theme_ts()
Figure 13: Random shock — a sudden spike followed by recovery to the previous level
Show R Code
t2    <- 1:60
base  <- 20 + 0.15 * t2

# Single-period impulse
y_imp <- base + rnorm(60, 0, 0.8)
y_imp[30] <- y_imp[30] + 12

# Multi-period sustained shock (e.g. supply chain disruption for 5 months)
shock_sus <- rep(0, 60)
shock_sus[30:34] <- c(8, 10, 9, 7, 4)
y_sus <- base + shock_sus + rnorm(60, 0, 0.8)

df_imp <- data.frame(t = t2, y = y_imp, type = "Single-period Impulse")
df_sus <- data.frame(t = t2, y = y_sus, type = "Sustained Shock (5 periods)")
df_cmp <- rbind(df_imp, df_sus)

ggplot(df_cmp, aes(x = t, y = y)) +
  geom_line(colour = "#4575b4", linewidth = 0.8) +
  facet_wrap(~type, scales = "free_y") +
  labs(title = "Types of Random Shock",
       x = "Time", y = "Value") +
  theme_ts()
Figure 14: Comparison: impulse shock (single period) vs. sustained shock (multiple periods)

2.4.3 Outliers

An outlier is an isolated data point that significantly deviates from the overall pattern. Unlike a random shock (which spans several periods), an outlier is typically a single observation.

Outliers arise from: errors in data collection, measurement instrument failure, or genuinely unusual one-off events.

Show R Code
t_out  <- 1:60
y_out  <- 50 + 0.3 * t_out + rnorm(60, 0, 2)
# Inject outliers
y_out[15] <- y_out[15] + 20   # high outlier
y_out[42] <- y_out[42] - 18  # low outlier

# Z-score to flag outliers
z_scores  <- abs(scale(y_out))
is_outlier <- z_scores > 2.5

df_out <- data.frame(t = t_out, y = y_out, outlier = as.vector(is_outlier))

ggplot(df_out, aes(x = t, y = y)) +
  geom_line(colour = "grey50", linewidth = 0.7) +
  geom_point(aes(colour = outlier, size = outlier)) +
  scale_colour_manual(values = c("FALSE" = "#4575b4", "TRUE" = "#d73027"),
                      labels = c("FALSE" = "Normal", "TRUE" = "Outlier")) +
  scale_size_manual(values = c("FALSE" = 1.5, "TRUE" = 3.5)) +
  annotate("label", x = 15, y = y_out[15] + 2,
           label = "High outlier\n(data entry error?)",
           colour = "#d73027", fill = "white", size = 3.2, label.size = 0.3) +
  annotate("label", x = 42, y = y_out[42] - 3,
           label = "Low outlier\n(instrument failure?)",
           colour = "#d73027", fill = "white", size = 3.2, label.size = 0.3) +
  labs(title = "Outliers in a Time Series",
       subtitle = "Flagged using Z-score threshold > 2.5",
       x = "Time", y = "Value", colour = NULL, size = NULL) +
  theme_ts()
Figure 15: Outliers in a time series — isolated extreme data points

How to detect outliers:

  1. Visual inspection of the time series plot.
  2. Z-score method — flag where \(|z_t| = |\frac{y_t - \bar{y}}{s}| > 2.5\).
  3. IQR method — flag outside \([Q_1 - 1.5 \times IQR,\ Q_3 + 1.5 \times IQR]\).
  4. Decomposition residuals — inspect the irregular component after removing trend and seasonal effects.
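The IQR rule in step 3 can be sketched as follows (simulated data; the injected outlier is ours, not from the chapter's datasets):

```r
set.seed(1)
y <- 50 + rnorm(60, 0, 2)
y[42] <- y[42] - 18                 # inject a low outlier

q1  <- unname(quantile(y, 0.25))
q3  <- unname(quantile(y, 0.75))
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

which(y < lower | y > upper)        # flagged indices include 42
```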

How to deal with outliers:

Strategy When to Use
Remove Confirmed data entry errors or instrument failures
Winsorize Replace with boundary value (5th/95th percentile)
Impute Replace with average of neighbours
Robust models Use outlier-resistant forecasting techniques

2.4.4 Random (Pure Noise)

The random sub-component refers to the small, unpredictable fluctuations that remain after all systematic effects (trend, seasonal, cyclical) are removed; it is also called the residual or error term.

Random errors are expected to be \(I_t \sim N(0, \sigma^2)\) and uncorrelated across time.

Show R Code
elec_ts <- ts(electric$index, start = c(2015, 1), frequency = 12)
decomp  <- decompose(elec_ts, type = "additive")
resid_df <- data.frame(Date     = as.numeric(time(elec_ts)),
                       Residual = as.numeric(decomp$random))

p1 <- ggplot(resid_df, aes(x = Date, y = Residual)) +
  geom_line(colour = "grey40", linewidth = 0.7, na.rm = TRUE) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "#d73027") +
  geom_hline(yintercept =  2 * sd(decomp$random, na.rm = TRUE),
             linetype = "dotted", colour = "#4575b4") +
  geom_hline(yintercept = -2 * sd(decomp$random, na.rm = TRUE),
             linetype = "dotted", colour = "#4575b4") +
  annotate("text", x = 2023.5, y =  2 * sd(decomp$random, na.rm = TRUE) + 0.2,
           label = "+2σ", colour = "#4575b4", size = 3.2, hjust = 1) +
  annotate("text", x = 2023.5, y = -2 * sd(decomp$random, na.rm = TRUE) - 0.3,
           label = "−2σ", colour = "#4575b4", size = 3.2, hjust = 1) +
  labs(title = "Irregular Component (Random Noise)", x = "Year",
       y = expression(I[t])) + theme_ts()

p2 <- ggplot(na.omit(resid_df), aes(x = Residual)) +
  geom_histogram(aes(y = after_stat(density)), bins = 12,
                 fill = "#4575b4", alpha = 0.7, colour = "white") +
  stat_function(fun  = dnorm,
                args = list(mean = mean(decomp$random, na.rm = TRUE),
                            sd   = sd(decomp$random, na.rm = TRUE)),
                colour = "#d73027", linewidth = 1) +
  labs(title = "Distribution of Residuals",
       subtitle = expression("Should approximate " * N(0, sigma^2)),
       x = expression(I[t]), y = "Density") + theme_ts()

grid.arrange(p1, p2, ncol = 2)
Figure 16: Irregular (residual) component from the electricity index decomposition

2.4.5 Summary: Distinguishing Irregular Sub-types

| Sub-type | Duration | Reversible? | Real Example |
|---|---|---|---|
| Turning Point | Permanent | No | Malaysia unemployment 1986, 1997 |
| Random Shock | Few periods | Yes — recovers | Gulf War oil spike 1990 |
| Outlier | Single period | N/A | Flagged oil filter observation |
| Pure Noise | Throughout | N/A | Electricity residuals |
Show R Code
t_all <- 1:80
base_all <- 30 + 0.2 * t_all

# 1. Turning Point
y1 <- ifelse(t_all <= 45,
             base_all,
             base_all[45] - 0.35 * (t_all - 45)) + rnorm(80, 0, 1)

# 2. Random Shock
shock2 <- rep(0, 80); shock2[c(35, 36, 37)] <- c(-15, -8, -3)
y2 <- base_all + shock2 + rnorm(80, 0, 1)

# 3. Outlier
y3 <- base_all + rnorm(80, 0, 1)
y3[25] <- y3[25] + 22

# 4. Pure Noise
y4 <- base_all + rnorm(80, 0, 3.5)

make_plot <- function(t, y, title, subtitle, highlight = NULL,
                      vline = NULL) {
  df <- data.frame(t = t, y = y)
  g  <- ggplot(df, aes(x = t, y = y)) +
    geom_line(colour = "#4575b4", linewidth = 0.8) +
    labs(title = title, subtitle = subtitle, x = "Time", y = "Value") +
    theme_ts() +
    theme(plot.subtitle = element_text(size = 9))
  if (!is.null(highlight)) {
    g <- g +
      geom_point(data = df[highlight, ], aes(x = t, y = y),
                 colour = "#d73027", size = 3)
  }
  if (!is.null(vline)) {
    g <- g +
      geom_vline(xintercept = vline, linetype = "dashed",
                 colour = "#d73027", linewidth = 0.9)
  }
  g
}

pa <- make_plot(t_all, y1, "Turning Point",
                "Permanent change in trend direction", vline = 45)
pb <- make_plot(t_all, y2, "Random Shock",
                "Sharp temporary impulse, then recovery", highlight = 35:37)
pc <- make_plot(t_all, y3, "Outlier",
                "Single isolated extreme value", highlight = 25)
pd <- make_plot(t_all, y4, "Pure Random Noise",
                "Small unpredictable fluctuations throughout")

grid.arrange(pa, pb, pc, pd, nrow = 2)
Figure 17: Side-by-side comparison of the four irregular component sub-types
Tip

Key distinctions at a glance:

  • Turning point → slope changes permanently after one point.
  • Random shock → sharp deviation for a few periods, then the series recovers.
  • Outlier → exactly one data point is extreme; neighbours are normal.
  • Pure noise → small irregular deviations with no pattern throughout the whole series.
Tip

Quick visual check:

  • Does the trend permanently change direction? → Turning Point
  • Does the series spike then return to where it was? → Random Shock
  • Is one single point unusually extreme? → Outlier
  • Are there small unpatterned wiggles throughout? → Pure Noise

3 Relationship Among Components

3.1 Additive Effect

In the additive model, seasonal variation is constant — its magnitude is independent of the level of the series.

\[Y_t = T_t + S_t + C_t + I_t\]

3.2 Multiplicative Effect

In the multiplicative model, seasonal variation grows proportionally with the level of the data series.

\[Y_t = T_t \times S_t \times C_t \times I_t\]
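Taking logarithms converts the multiplicative model into additive form, since logarithms turn products into sums; this is the basis for the log-transform linearisation noted in Section 3.3:

```latex
\[\log Y_t = \log T_t + \log S_t + \log C_t + \log I_t\]
```

After a log transform, methods designed for additive decomposition can therefore be applied to a multiplicative series.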

3.3 Additive vs. Multiplicative

Show R Code
elec_ts <- ts(electric$index, start = c(2015, 1), frequency = 12)
p_add <- autoplot(decompose(elec_ts, "additive")) +
  labs(title = "Additive Decomposition") + theme_bw(base_size = 10)
p_mul <- autoplot(decompose(elec_ts, "multiplicative")) +
  labs(title = "Multiplicative Decomposition") + theme_bw(base_size = 10)
grid.arrange(p_add, p_mul, ncol = 2)
Figure 18: Additive vs. multiplicative decomposition of the electricity output index
| Feature | Additive | Multiplicative |
|---|---|---|
| Seasonal magnitude | Constant | Grows with level |
| Use when | Seasonal swings roughly constant | Seasonal swings widen as series grows |
| Linearisation | Not needed | Apply log transform |
| R function | decompose(ts, type = "additive") | decompose(ts, type = "multiplicative") |
Tip

If seasonal fluctuations grow proportionally with the level of the series → use multiplicative. If they remain roughly constant → use additive.

3.3.1 Examples

(a) Example 1 — Airline passengers (top): seasonal swings widen as the series grows → multiplicative. Milk production (bottom): seasonal swings remain roughly constant → additive.
(b) Example 2 — CO₂ concentration (left) and Australian electricity production (right): both exhibit seasonal fluctuations that grow proportionally with the level → multiplicative.
(c) Example 3 — Additive seasonality (left): the band of variation around the trend (dashed lines) stays constant. Multiplicative seasonality (right): the band widens as the level rises.
Figure 19: Real-world examples illustrating additive and multiplicative seasonal patterns.

4 Measuring Performance

4.1 Data Partition

| Part | Purpose | Rule of Thumb |
|---|---|---|
| Estimation (Training) | Fit the forecasting model | 80% of data |
| Evaluation (Test) | Assess forecast accuracy | 20% of data |
Show R Code
cpi   <- read.csv("data/CPI Malaysia.csv", check.names = FALSE)
cpits <- ts(cpi$`Consumer price index (2010 = 100)`,
            start = 1960, frequency = 1)

n_train  <- round(0.8 * length(cpits))   # explicit 80/20 split
est_part <- head(cpits, n_train)
eva_part <- tail(cpits, length(cpits) - n_train)

str(est_part)
 Time-Series [1:50] from 1960 to 2009: 21.3 21.3 21.3 22 21.9 21.8 22.1 23.1 23 22.9 ...
Show R Code
str(eva_part)
 Time-Series [1:13] from 2010 to 2022: 100 103 105 107 110 ...
Show R Code
n_est <- length(est_part)
n_eva <- length(eva_part)

df_cpi <- data.frame(
  Year  = as.numeric(time(cpits)),
  Value = as.numeric(cpits),
  Part  = c(rep("Estimation (80%)", n_est), rep("Evaluation (20%)", n_eva))
)

ggplot(df_cpi, aes(x = Year, y = Value, colour = Part)) +
  geom_line(linewidth = 0.9) +
  geom_vline(xintercept = as.numeric(time(cpits))[n_est],
             linetype = "dashed", colour = "black") +
  annotate("label",
           x = as.numeric(time(cpits))[n_est],
           y = max(as.numeric(cpits)) * 0.7,
           label = "Partition\ncut-off", size = 3.2,
           fill = "white", label.size = 0.3) +
  scale_colour_manual(values = c("Estimation (80%)" = "#4575b4",
                                 "Evaluation (20%)" = "#d73027")) +
  labs(title = "Malaysia CPI: Data Partition for Forecasting Evaluation",
       x = "Year", y = "CPI (2010 = 100)", colour = NULL) +
  theme_ts()
Figure 20: Data partition: 80% estimation, 20% evaluation — Malaysia CPI

4.2 Error Measures

An error measure is the criterion used to distinguish a good forecasting model from a poor one.

4.2.1 Mean Square Error (MSE)

\[\text{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2\]

Advantage Easy to compute; penalises large errors more heavily
Disadvantage Highly sensitive to large forecast errors; not in original units

4.2.2 Mean Absolute Percentage Error (MAPE)

\[\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\]

Advantage Easy to interpret (%); scale-independent — allows comparison across datasets
Disadvantage Undefined when \(y_t = 0\); not suitable for negative values

4.2.3 Mean Absolute Error (MAE)

\[\text{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|\]

Advantage Robust to outliers; in original units; easy to compute
Disadvantage Treats all forecast errors equally regardless of magnitude
Note
  • The best model produces the lowest error measure value.
  • A truly good model gives consistently low values across multiple error measures.
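The three measures above can be collected into one helper function (a sketch; `error_measures` is our name, not a standard function):

```r
error_measures <- function(actual, predicted) {
  e <- actual - predicted
  c(MSE  = mean(e^2),                     # penalises large errors heavily
    MAE  = mean(abs(e)),                  # same units as the data
    MAPE = mean(abs(e / actual)) * 100)   # percentage; undefined if any actual == 0
}

# Toy check: every forecast is off by exactly 1 unit
error_measures(actual = c(10, 20, 25, 40), predicted = c(11, 21, 26, 41))
#   MSE = 1, MAE = 1, MAPE = 5.375
```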

4.3 Example: Malaysia CPI Data

4.3.1 Model 1 — Linear Trend

Show R Code
n          <- length(cpits)
n_train    <- length(est_part)   # derived from actual 80% split
n_test     <- n - n_train
time_train <- 1:n_train
time_test  <- (n_train + 1):n

model1 <- lm(est_part ~ time_train)
summary(model1)

Call:
lm(formula = est_part ~ time_train)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2836 -3.3028 -0.2493  2.3069 10.7824 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.82882    1.11813   7.896 3.16e-10 ***
time_train   1.68883    0.03816  44.255  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.894 on 48 degrees of freedom
Multiple R-squared:  0.9761,    Adjusted R-squared:  0.9756 
F-statistic:  1959 on 1 and 48 DF,  p-value: < 2.2e-16
Show R Code
pred1      <- predict(model1, newdata = data.frame(time_train = time_test))
actual     <- as.numeric(eva_part)
predicted1 <- as.numeric(pred1)

mse1  <- mean((actual - predicted1)^2)
mape1 <- mean(abs((actual - predicted1) / actual)) * 100
mae1  <- mean(abs(actual - predicted1))

cat("=== Model 1: Linear Trend ===\n")
=== Model 1: Linear Trend ===
Show R Code
cat("MSE  :", round(mse1,  4), "\n")
MSE  : 90.1995 
Show R Code
cat("MAE  :", round(mae1,  4), "\n")
MAE  : 9.2001 
Show R Code
cat("MAPE :", round(mape1, 4), "%\n")
MAPE : 7.959 %

4.3.2 Model 2 — Quadratic Trend

Show R Code
model2 <- lm(est_part ~ time_train + I(time_train^2))
summary(model2)

Call:
lm(formula = est_part ~ time_train + I(time_train^2))

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4353 -1.2541 -0.1506  1.3311  4.3873 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     16.03953    1.05250  15.239  < 2e-16 ***
time_train       0.85682    0.09520   9.000 8.57e-12 ***
I(time_train^2)  0.01631    0.00181   9.014 8.17e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.382 on 47 degrees of freedom
Multiple R-squared:  0.9912,    Adjusted R-squared:  0.9909 
F-statistic:  2657 on 2 and 47 DF,  p-value: < 2.2e-16
Show R Code
pred2      <- predict(model2, newdata = data.frame(time_train = time_test))
predicted2 <- as.numeric(pred2)

mse2  <- mean((actual - predicted2)^2)
mape2 <- mean(abs((actual - predicted2) / actual)) * 100
mae2  <- mean(abs(actual - predicted2))

cat("=== Model 2: Quadratic Trend ===\n")
=== Model 2: Quadratic Trend ===
Show R Code
cat("MSE  :", round(mse2,  4), "\n")
MSE  : 21.6595 
Show R Code
cat("MAE  :", round(mae2,  4), "\n")
MAE  : 3.8183 
Show R Code
cat("MAPE :", round(mape2, 4), "%\n")
MAPE : 3.2507 %

4.3.3 Model Comparison

Show R Code
library(kableExtra)

knitr::kable(
  data.frame(Model = c("Linear", "Quadratic"),
             MSE   = round(c(mse1, mse2),  4),
             MAE   = round(c(mae1, mae2),  4),
             MAPE  = round(c(mape1, mape2), 4)),
  col.names = c("Model", "MSE", "MAE", "MAPE (%)"),
  align = "c"
) %>%
  kable_styling() %>%
  row_spec(2, bold = TRUE, background = "#FFFFCC")
Table 2: Error measure comparison: linear vs. quadratic trend
Model MSE MAE MAPE (%)
Linear 90.1995 9.2001 7.9590
Quadratic 21.6595 3.8183 3.2507

4.3.4 4-Year Forecast

Show R Code
h             <- 4
time_all      <- 1:n
time_forecast <- (n + 1):(n + h)

final_model     <- lm(cpits ~ time_all + I(time_all^2))
forecast_values <- predict(final_model,
                           newdata  = data.frame(time_all = time_forecast),
                           interval = "prediction")
print(forecast_values)
       fit      lwr      upr
1 132.2664 127.1072 137.4256
2 134.9212 129.7168 140.1255
3 137.6011 132.3471 142.8551
4 140.3063 134.9979 145.6146
Figure 21: Malaysia CPI: observed data and 4-year quadratic trend forecast

5 Summary

  • A time series is defined as \(\{y_t : t = 1, 2, \ldots, n\}\) — data indexed in ascending time order.
  • A time series plot interpretation should include minimum, maximum, outliers, and overall trends.
  • Four components: Trend, Seasonal, Cyclical, and Irregular/Random.
  • The Irregular component has four sub-types: Turning Point, Random Shock, Outlier, and Pure Noise.
  • Component relationships are modelled as Additive or Multiplicative.
  • Performance is measured via MSE, MAPE, and MAE on an 80/20 data split — lowest value wins.

6 References