# Time-series decomposition and trend analysis in Python

There are a number of methods to accomplish time-series decompositions in R, including the decompose and STL commands.

I haven’t come across a seasonal decomposition method in Python comparable to R’s STL. However, statsmodels 0.6 added a naive seasonal decomposition method similar to R’s decompose that is not as powerful as the LOESS method used in STL. Let’s run through an example:

import urllib2
import datetime as datetime
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt

# Import the sample streamflow dataset
data = urllib2.urlopen('https://raw.github.com/mps9506/Sample-Datasets/master/Streamflow/USGS-Monthly_Streamflow_Bend_OR.tsv')

# The yyyy,mm, and dd are in seperate columns, we need to make this a single column
df['dti'] = df[['year_nu','month_nu','dd_nu']].apply(lambda x: datetime.datetime(*x),axis=1)

# Let use this as our index since we are using pandas
df.index = pd.DatetimeIndex(df['dti'])
# Clean the dataframe a bit
df = df.drop(['dd_nu','year_nu','month_nu','dti'],axis=1)
df = df.resample('M',how='mean')
fig,ax = plt.subplots(1,1, figsize=(6,4))
flow = df['mean_va']
flow = flow['1949-01':]

res = sm.tsa.seasonal_decompose(flow)
fig = res.plot()
fig.show()


Each component can then be accessed with:

residual = res.residual
seasonal = res.seasonal
trend = res.trend
print trend['1950':'1951']

1950-01-31    1441.591667
1950-02-28    1468.133333
1950-03-31    1499.883333
1950-04-30    1521.466667
1950-05-31    1540.633333
1950-06-30    1572.079167
1950-07-31    1611.412500
1950-08-31    1666.541667
1950-09-30    1720.658333
1950-10-31    1759.700000
1950-11-30    1780.408333
1950-12-31    1789.491667
1951-01-31    1800.950000
1951-02-28    1810.950000
1951-03-31    1819.616667
1951-04-30    1848.866667
1951-05-31    1889.850000
1951-06-30    1895.979167
1951-07-31    1878.858333
1951-08-31    1841.137500
1951-09-30    1806.308333
1951-10-31    1807.850000
1951-11-30    1826.516667
1951-12-31    1856.683333
Freq: M, Name: mean_va, dtype: float64


If we want to determine if there is a simple monotonic trend in this data we can utilize the Mann-Kendall test for trend. This doesn’t appear to be available in scipy.stats or statsmodels yet. I came across a function written by Sat Kumar Tomer, the homepage with the software package seems to be gone, so I verified the output to implementations in R and uploaded to GitHub so it won’t disappear.

import numpy as np
from scipy.stats import norm, mstats

def mk_test(x, alpha = 0.05):
"""
Input:
x:   a vector of data
alpha: significance level (0.05 default)

Output:
trend: tells the trend (increasing, decreasing or no trend)
h: True (if trend is present) or False (if trend is absence)
p: p value of the significance test
z: normalized test statistics

Examples
--------
>>> x = np.random.rand(100)
>>> trend,h,p,z = mk_test(x,0.05)
"""
n = len(x)

# calculate S
s = 0
for k in range(n-1):
for j in range(k+1,n):
s += np.sign(x[j] - x[k])

# calculate the unique data
unique_x = np.unique(x)
g = len(unique_x)

# calculate the var(s)
if n == g: # there is no tie
var_s = (n*(n-1)*(2*n+5))/18
else: # there are some ties in data
tp = np.zeros(unique_x.shape)
for i in range(len(unique_x)):
tp[i] = sum(unique_x[i] == x)
var_s = (n*(n-1)*(2*n+5) + np.sum(tp*(tp-1)*(2*tp+5)))/18

if s>0:
z = (s - 1)/np.sqrt(var_s)
elif s == 0:
z = 0
elif s<0:
z = (s + 1)/np.sqrt(var_s)

# calculate the p_value
p = 2*(1-norm.cdf(abs(z))) # two tail test
h = abs(z) > norm.ppf(1-alpha/2)

if (z<0) and h:
trend = 'decreasing'
elif (z>0) and h:
trend = 'increasing'
else:
trend = 'no trend'

return trend, h, p, z


Let’s see if there is a trend direction in the first decade of data 1950-1960:

trend = res.trend['1950':'1960']
test_trend,h,p,z = mk_test(trend,alpha=0.05)
print test_trend, h
print z, p

decreasing True
-4.05429896945 5.02848722452e-05


The test indicates a monotonic decreasing trend over the time period, with a Mann-Kendall Z stat = -4.05 and p<0.05.