Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | ||||
4 | 5 | 6 | 7 | 8 | 9 | 10 |
11 | 12 | 13 | 14 | 15 | 16 | 17 |
18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 | 31 |
Tags
- pandas
- sktime tutorial
- pandas row 제거
- Does GNN Pretraining Help Molecular Representation?
- 비선형함수 딥러닝
- weight 일부 고정
- Graph Theory
- python 경우의 수
- 일부 레이어 고정
- pandas 특정 조건 열 제거
- 경우의 수 파이썬
- sktime 튜토리얼
- pandas 조건
- 선형함수 딥러닝
- sktime 예제
- pytorch dataset split
- Prompt Tuning for Graph Neural Networks
- molecular representation
- 모델 freeze
- 판다스 조건
- 비선형함수
- sktime
- pretraining
- EDA in python
- 시계열 라이브러리
- pandas 행 제거
- pytorch 데이터셋 나누기
- Skip connection
- layer 일부 freeze
- EDA 추천 파이썬
Archives
- Today
- Total
MoonNote
sktime (time series 라이브러리) 본문
Sktime 이란?¶
- 파이썬의 시계열 분석을 위한 라이브러리
- scikit-learn과 호환되며 다양한 시계열 알고리즘 및 도구 제공
- https://opensourcelibs.com/libs/time-series-classification
Install¶
- pip install sktime
- conda install -c conda-forge sktime
Task¶
- Forecasting (Regression)
- Classification
Dataset¶
- 각 task에 적합한 총 18개의 데이터셋을 제공한다.
- univariate for forecasting
- multivariate for forecasting
- univariate for classification
- multivariate for classification
In [1]:
from sktime.datasets import (
load_airline, # 1 univariate forecasting
load_PBS_dataset, # 1 univariate forecasting
load_shampoo_sales, # 1 univariate forecasting
load_lynx, # 1 univariate forecasting
load_macroeconomic, # 2 multivariate forecasting
load_longley, # 2 multivariate forecasting
load_uschange, # 2 multivariate forecasting
load_acsf1, # 3 univariate classification
load_arrow_head, # 3 univariate classification
load_gunpoint, # 3 univariate classification
load_italy_power_demand, # 3 univariate classification
load_osuleaf, # 3 univariate classification
load_unit_test, # 3 univariate classification
load_japanese_vowels, # 4 multivariate classification
load_basic_motions, # 4 multivariate classification
#load_electric_devices_segmentation,
#load_gun_point_segmentation,
#load_UCR_UEA_dataset
)
Forecasting¶
- univariate forecasting
- multivariate forecasting
In [2]:
#from sktime.forecasting.all import *
Univariate Forecasting¶
- 데이터셋 불러오기
- 데이터셋 나누기
- Forecasting 실행
- 시각화
- 성능 측정
데이터셋 불러오기¶
In [3]:
y = load_airline()
y.head()
Out[3]:
Period
1949-01 112.0
1949-02 118.0
1949-03 132.0
1949-04 129.0
1949-05 121.0
Freq: M, Name: Number of airline passengers, dtype: float64
In [4]:
y.index
Out[4]:
PeriodIndex(['1949-01', '1949-02', '1949-03', '1949-04', '1949-05', '1949-06',
'1949-07', '1949-08', '1949-09', '1949-10',
...
'1960-03', '1960-04', '1960-05', '1960-06', '1960-07', '1960-08',
'1960-09', '1960-10', '1960-11', '1960-12'],
dtype='period[M]', name='Period', length=144, freq='M')
데이터셋 나누기¶
In [5]:
from sktime.forecasting.model_selection import temporal_train_test_split
y_train, y_test = temporal_train_test_split(y, test_size=36)
데이터셋 시각화¶
In [6]:
from sktime.utils.plotting import plot_series
plot_series(y_train, y_test, labels=["y", "y_test"])
Out[6]:
(<Figure size 1152x288 with 1 Axes>,
<AxesSubplot:ylabel='Number of airline passengers'>)
Forecasting 실행¶
In [7]:
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.naive import NaiveForecaster
# 예측 범위 지정
fh = ForecastingHorizon(y_test.index, is_relative=False)
# 모델 설정
forecaster = NaiveForecaster(strategy="last", sp=12)
forecaster.fit(y_train)
# 예측
y_pred = forecaster.predict(fh)
In [8]:
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
# step 1: 데이터 불러오기 및 나누기
y = load_airline()
y_train, y_test = temporal_train_test_split(y, test_size=36)
# step 2: forecasting 실행
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = NaiveForecaster(strategy="last", sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
# 시각화
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
# step 3: evaluation metric 지정
# step 4: 성능 측정
mean_absolute_percentage_error(y_test, y_pred)
Out[8]:
0.145427686270316
모델 변경¶
- sktime은 다양한 통계적 forecasting algorithm들을 제공한다.
- https://github.com/alan-turing-institute/sktime/blob/922cd71d0d82d849025a080be826cf1c3c4777e5/sktime/forecasting/all/__init__.py#L88
In [9]:
from sktime.registry import all_estimators
import pandas as pd
all_estimators("forecaster", as_dataframe=True)
Out[9]:
name | estimator | |
---|---|---|
0 | ARIMA | <class 'sktime.forecasting.arima.ARIMA'> |
1 | AutoARIMA | <class 'sktime.forecasting.arima.AutoARIMA'> |
2 | AutoETS | <class 'sktime.forecasting.ets.AutoETS'> |
3 | AutoEnsembleForecaster | <class 'sktime.forecasting.compose._ensemble.A... |
4 | BATS | <class 'sktime.forecasting.bats.BATS'> |
5 | ColumnEnsembleForecaster | <class 'sktime.forecasting.compose._column_ens... |
6 | Croston | <class 'sktime.forecasting.croston.Croston'> |
7 | DirRecTabularRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Dir... |
8 | DirRecTimeSeriesRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Dir... |
9 | DirectTabularRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Dir... |
10 | DirectTimeSeriesRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Dir... |
11 | EnsembleForecaster | <class 'sktime.forecasting.compose._ensemble.E... |
12 | ExponentialSmoothing | <class 'sktime.forecasting.exp_smoothing.Expon... |
13 | ForecastingGridSearchCV | <class 'sktime.forecasting.model_selection._tu... |
14 | ForecastingPipeline | <class 'sktime.forecasting.compose._pipeline.F... |
15 | ForecastingRandomizedSearchCV | <class 'sktime.forecasting.model_selection._tu... |
16 | HCrystalBallForecaster | <class 'sktime.forecasting.hcrystalball.HCryst... |
17 | MultioutputTabularRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Mul... |
18 | MultioutputTimeSeriesRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Mul... |
19 | MultiplexForecaster | <class 'sktime.forecasting.compose._multiplexe... |
20 | NaiveForecaster | <class 'sktime.forecasting.naive.NaiveForecast... |
21 | OnlineEnsembleForecaster | <class 'sktime.forecasting.online_learning._on... |
22 | PolynomialTrendForecaster | <class 'sktime.forecasting.trend.PolynomialTre... |
23 | Prophet | <class 'sktime.forecasting.fbprophet.Prophet'> |
24 | RecursiveTabularRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Rec... |
25 | RecursiveTimeSeriesRegressionForecaster | <class 'sktime.forecasting.compose._reduce.Rec... |
26 | StackingForecaster | <class 'sktime.forecasting.compose._stack.Stac... |
27 | TBATS | <class 'sktime.forecasting.tbats.TBATS'> |
28 | ThetaForecaster | <class 'sktime.forecasting.theta.ThetaForecast... |
29 | TransformedTargetForecaster | <class 'sktime.forecasting.compose._pipeline.T... |
30 | TrendForecaster | <class 'sktime.forecasting.trend.TrendForecast... |
31 | UnobservedComponents | <class 'sktime.forecasting.structural.Unobserv... |
32 | VAR | <class 'sktime.forecasting.var.VAR'> |
In [10]:
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
y = load_airline()
y_train, y_test = temporal_train_test_split(y, test_size=36)
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = ExponentialSmoothing(trend="add", seasonal="additive", sp=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
mean_absolute_percentage_error(y_test, y_pred)
Out[10]:
0.05027655720606656
In [11]:
from sktime.forecasting.arima import ARIMA
forecaster = ARIMA(
order=(1, 1, 0), seasonal_order=(0, 1, 0, 12), suppress_warnings=True
)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
mean_absolute_percentage_error(y_pred, y_test)
Out[11]:
0.04257105757347649
Multivariate Forecasting¶
- 데이터셋 불러오기
- 데이터셋 나누기
- Forecasting 실행
- 시각화
- 성능 측정
In [12]:
_, y = load_longley()
y.head()
Out[12]:
GNPDEFL | GNP | UNEMP | ARMED | POP | |
---|---|---|---|---|---|
Period | |||||
1947 | 83.0 | 234289.0 | 2356.0 | 1590.0 | 107608.0 |
1948 | 88.5 | 259426.0 | 2325.0 | 1456.0 | 108632.0 |
1949 | 88.2 | 258054.0 | 3682.0 | 1616.0 | 109773.0 |
1950 | 89.5 | 284599.0 | 3351.0 | 1650.0 | 110929.0 |
1951 | 96.2 | 328975.0 | 2099.0 | 3099.0 | 112075.0 |
In [13]:
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
# step 1: 데이터 불러오기 및 나누기
_, y = load_longley()
y_train, y_test = temporal_train_test_split(y)
# step 2: forecasting 실행
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = NaiveForecaster(strategy="last")
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
# 시각화
plot_series(y_train["POP"], y_test["POP"], y_pred["POP"], labels=["y_train", "y_test", "y_pred"])
# step 3: evaluation metric 지정
# step 4: 성능 측정
mean_absolute_percentage_error(y_test, y_pred)
Out[13]:
0.08039499198190428
In [74]:
import warnings
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sktime.datasets import load_shampoo_sales, load_italy_power_demand
from sktime.forecasting.compose import RecursiveTimeSeriesRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
sns.set_style('whitegrid')
In [14]:
from sktime.forecasting.var import VAR
# step 1: 데이터 불러오기 및 나누기
_, y = load_longley()
y_train, y_test = temporal_train_test_split(y)
# step 2: forecasting 실행
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = VAR()
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
# 시각화
plot_series(y_train["POP"], y_test["POP"], y_pred["POP"], labels=["y_train", "y_test", "y_pred"])
# step 3: evaluation metric 지정
# step 4: 성능 측정
mean_absolute_percentage_error(y_test, y_pred)
Out[14]:
0.08482383879246463
Univariate time series classification¶
In [46]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sktime.utils.slope_and_trend import _slope
In [47]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
In [48]:
# univariate 데이터
X_train.head()
Out[48]:
dim_0 | |
---|---|
122 | 0 -1.6961 1 -1.6806 2 -1.6574 3 ... |
32 | 0 -1.6737 1 -1.6715 2 -1.6602 3 ... |
142 | 0 -1.8981 1 -1.8790 2 -1.8566 3 ... |
30 | 0 -1.9204 1 -1.9015 2 -1.8864 3 ... |
73 | 0 -1.8132 1 -1.8255 2 -1.8166 3 ... |
In [49]:
# target variable
labels, counts = np.unique(y_train, return_counts=True)
print(labels, counts)
['0' '1' '2'] [58 49 51]
In [50]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for label in labels:
X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=f"class {label}")
plt.legend()
ax.set(title="Example time series", xlabel="Time");
Scikit-learn 방식¶
- time point에 해당하는 값들을 feature로 치환
In [51]:
from sklearn.ensemble import RandomForestClassifier
from sktime.datatypes._panel._convert import from_nested_to_2d_array
X_train_tab = from_nested_to_2d_array(X_train)
X_test_tab = from_nested_to_2d_array(X_test)
X_train_tab.head()
Out[51]:
dim_0__0 | dim_0__1 | dim_0__2 | dim_0__3 | dim_0__4 | dim_0__5 | dim_0__6 | dim_0__7 | dim_0__8 | dim_0__9 | ... | dim_0__241 | dim_0__242 | dim_0__243 | dim_0__244 | dim_0__245 | dim_0__246 | dim_0__247 | dim_0__248 | dim_0__249 | dim_0__250 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
122 | -1.6961 | -1.6806 | -1.6574 | -1.6443 | -1.6187 | -1.5873 | -1.5372 | -1.5189 | -1.4789 | -1.4287 | ... | -1.4995 | -1.5552 | -1.5921 | -1.6109 | -1.6262 | -1.6402 | -1.6658 | -1.6798 | -1.6816 | -1.6834 |
32 | -1.6737 | -1.6715 | -1.6602 | -1.6349 | -1.6061 | -1.5588 | -1.5562 | -1.5173 | -1.4901 | -1.4263 | ... | -1.4081 | -1.4331 | -1.4963 | -1.5221 | -1.5602 | -1.5768 | -1.6097 | -1.6362 | -1.6612 | -1.6625 |
142 | -1.8981 | -1.8790 | -1.8566 | -1.8160 | -1.8048 | -1.7729 | -1.7545 | -1.7027 | -1.6606 | -1.6129 | ... | -1.6285 | -1.6869 | -1.7297 | -1.7631 | -1.7927 | -1.8192 | -1.8330 | -1.8704 | -1.8827 | -1.8985 |
30 | -1.9204 | -1.9015 | -1.8864 | -1.8678 | -1.8133 | -1.7729 | -1.7501 | -1.7205 | -1.6654 | -1.6369 | ... | -1.5643 | -1.6283 | -1.6402 | -1.6773 | -1.7094 | -1.7512 | -1.7945 | -1.8678 | -1.9019 | -1.9039 |
73 | -1.8132 | -1.8255 | -1.8166 | -1.8025 | -1.7866 | -1.7659 | -1.7616 | -1.7547 | -1.7455 | -1.7145 | ... | -1.2668 | -1.3390 | -1.4362 | -1.5041 | -1.5512 | -1.6177 | -1.6687 | -1.7403 | -1.7732 | -1.8038 |
5 rows × 251 columns
In [52]:
classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train_tab, y_train)
y_pred = classifier.predict(X_test_tab)
accuracy_score(y_test, y_pred)
Out[52]:
0.8490566037735849
Feature extraction¶
- 시계열 데이터에서 특징을 추출한 후 이를 활용
In [53]:
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
transformer = TSFreshFeatureExtractor(default_fc_parameters="minimal")
extracted_features = transformer.fit_transform(X_train)
extracted_features.head()
/usr/local/lib/python3.6/dist-packages/sktime/transformations/panel/tsfresh.py:164: UserWarning:
tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
Feature Extraction: 100%|██████████| 5/5 [00:00<00:00, 32.56it/s]
Out[53]:
dim_0__sum_values | dim_0__median | dim_0__mean | dim_0__length | dim_0__standard_deviation | dim_0__variance | dim_0__root_mean_square | dim_0__maximum | dim_0__minimum | |
---|---|---|---|---|---|---|---|---|---|
0 | 0.000197 | 0.218390 | 7.848606e-07 | 251.0 | 0.998006 | 0.996017 | 0.998006 | 1.2427 | -1.6961 |
1 | -0.000356 | 0.312720 | -1.418327e-06 | 251.0 | 0.998003 | 0.996011 | 0.998003 | 1.1377 | -1.6737 |
2 | 0.000279 | -0.020420 | 1.111554e-06 | 251.0 | 0.998007 | 0.996018 | 0.998007 | 1.3738 | -1.8985 |
3 | 0.000071 | -0.166200 | 2.828685e-07 | 251.0 | 0.998009 | 0.996021 | 0.998009 | 1.5740 | -1.9204 |
4 | 0.000015 | -0.020305 | 5.976096e-08 | 251.0 | 0.998005 | 0.996013 | 0.998005 | 1.3624 | -1.8255 |
In [54]:
from sklearn.pipeline import make_pipeline
classifier = make_pipeline(
TSFreshFeatureExtractor(show_warnings=False), RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/usr/local/lib/python3.6/dist-packages/sktime/transformations/panel/tsfresh.py:164: UserWarning:
tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
Feature Extraction: 100%|██████████| 5/5 [00:09<00:00, 1.95s/it]
/usr/local/lib/python3.6/dist-packages/sktime/transformations/panel/tsfresh.py:164: UserWarning:
tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00, 1.51it/s]
Out[54]:
0.8679245283018868
Time series classification¶
- Time series forest : 랜덤 포레스트의 시계열 버전
- 데이터를 여러 개의 random한 구간으로 분할한다.
- 각 구간에서 특징(평균, 표준편차, 기울기)을 추출하고,
- 추출된 특징에 대해 학습한다.
- 1 - 3 step 앙상블
In [55]:
from sktime.transformations.panel.summarize import RandomIntervalFeatureExtractor
steps = [
(
"extract",
RandomIntervalFeatureExtractor(
n_intervals="sqrt", features=[np.mean, np.std, _slope]
),
),
("clf", DecisionTreeClassifier()),
]
time_series_tree = Pipeline(steps)
In [56]:
time_series_tree.fit(X_train, y_train)
time_series_tree.score(X_test, y_test)
Out[56]:
0.7358490566037735
In [57]:
tsf = ComposableTimeSeriesForestClassifier(
estimator=time_series_tree,
n_estimators=100,
criterion="entropy",
bootstrap=True,
oob_score=True,
random_state=1
)
In [58]:
tsf.fit(X_train, y_train)
if tsf.oob_score:
print(tsf.oob_score_)
0.8481012658227848
In [59]:
tsf = ComposableTimeSeriesForestClassifier()
tsf.fit(X_train, y_train)
tsf.score(X_test, y_test)
Out[59]:
0.8679245283018868
Feature 중요도¶
In [31]:
fi = tsf.feature_importances_
# renaming _slope to slope.
fi.rename(columns={"_slope": "slope"}, inplace=True)
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi.plot(ax=ax)
ax.set(xlabel="Time", ylabel="Feature importance");
Multivariate time series classification¶
- Time series concatenation
- Column ensembling
- Bespoke classification algorithms
In [32]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
In [33]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
In [34]:
# multivariate input data
X_train.head()
Out[34]:
dim_0 | dim_1 | dim_2 | dim_3 | dim_4 | dim_5 | |
---|---|---|---|---|---|---|
9 | 0 -0.407421 1 -0.407421 2 2.355158 3... | 0 1.413374 1 1.413374 2 -3.928032 3... | 0 0.092782 1 0.092782 2 -0.211622 3... | 0 -0.066584 1 -0.066584 2 -3.630177 3... | 0 0.223723 1 0.223723 2 -0.026634 3... | 0 0.135832 1 0.135832 2 -1.946925 3... |
24 | 0 0.383922 1 0.383922 2 -0.272575 3... | 0 0.302612 1 0.302612 2 -1.381236 3... | 0 -0.398075 1 -0.398075 2 -0.681258 3... | 0 0.071911 1 0.071911 2 -0.761725 3... | 0 0.175783 1 0.175783 2 -0.114525 3... | 0 -0.087891 1 -0.087891 2 -0.503377 3... |
5 | 0 -0.357300 1 -0.357300 2 -0.005055 3... | 0 -0.584885 1 -0.584885 2 0.295037 3... | 0 -0.792751 1 -0.792751 2 0.213664 3... | 0 0.074574 1 0.074574 2 -0.157139 3... | 0 0.159802 1 0.159802 2 -0.306288 3... | 0 0.023970 1 0.023970 2 1.230478 3... |
7 | 0 -0.352746 1 -0.352746 2 -1.354561 3... | 0 0.316845 1 0.316845 2 0.490525 3... | 0 -0.473779 1 -0.473779 2 1.454261 3... | 0 -0.327595 1 -0.327595 2 -0.269001 3... | 0 0.106535 1 0.106535 2 0.021307 3... | 0 0.197090 1 0.197090 2 0.460763 3... |
34 | 0 0.052231 1 0.052231 2 -0.54804... | 0 -0.730486 1 -0.730486 2 0.70700... | 0 -0.518104 1 -0.518104 2 -1.179430 3... | 0 -0.159802 1 -0.159802 2 -0.239704 3... | 0 -0.045277 1 -0.045277 2 0.023970 3... | 0 -0.029297 1 -0.029297 2 0.29829... |
In [35]:
# multi-class target variable
np.unique(y_train)
Out[35]:
array(['badminton', 'running', 'standing', 'walking'], dtype=object)
In [36]:
# step 1: 데이터 불러오기 및 나누기
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
Time series concatenation¶
- multivariate 데이터를 긴 univariate data로 변환하여 univariate의 분류기 적용
In [37]:
steps = [
("concatenate", ColumnConcatenator()),
("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
Out[37]:
1.0
Column ensembling¶
- 각 시계열 열에 대해 예측하는 모델들을 앙상블
In [38]:
clf = ColumnEnsembleClassifier(
estimators=[
("TSF0", TimeSeriesForestClassifier(n_estimators=100), [0]),
("BOSSEnsemble3", BOSSEnsemble(max_ensemble_size=5), [3]),
]
)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
Out[38]:
0.95
Bespoke classification algorithms¶
In [39]:
clf = MrSEQLClassifier()
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
Out[39]:
1.0
In [ ]:
'Study > Machine Learning' 카테고리의 다른 글
Pretext task (0) | 2021.12.01 |
---|---|
Shapley Value (0) | 2021.11.29 |
딥러닝에서 비선형 활성화함수를 쓰는 이유? (0) | 2021.11.11 |
Model Assessment and Selection (0) | 2021.10.23 |
Multilabel classification (0) | 2021.05.05 |
Comments