
Application of k-nearest neighbor method to time series data

This question is an extension of an earlier question.

I graphed the anomaly score of the x-axis acceleration by applying the k-nearest neighbor method to time-series data (acceleration data).

The anomaly score does come out clearly high at the anomalous part.

Two questions remain:

- How large should the anomaly score be before a point is judged anomalous? In other words, how should the threshold be determined?
- What kind of code should I write to evaluate how accurate the anomaly judgments actually are, compared to the original data?

I would appreciate it if someone could explain.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors


def embed(lst, dim):
    """Divide the data into sliding windows of width dim."""
    emb = np.empty((0, dim), float)
    for i in range(lst.size - dim + 1):
        tmp = np.array(lst[i:i + dim])[::-1].reshape((1, -1))
        emb = np.append(emb, tmp, axis=0)
    return emb


def main():
    df = pd.read_csv("20191121.csv")
    # Remove unneeded columns from the DataFrame
    df = df.drop(['name', 'x_rad/s', 'y_rad/s', 'z_rad/s'], axis=1)
    df = df.set_index('time')
    # Visualize the x-, y-, and z-axis accelerations
    df.plot().legend(loc='upper left')
    # The first 2480 x-axis accelerations are used as training data,
    # the following 2479 as test data.
    # df.iloc[2479] ---> 53845130
    # df.iloc[2480] ---> 53845150
    train_data = df.loc[:53845130, 'x_ags']
    test_data = df.loc[53845150:, 'x_ags'].reset_index(drop=True)
    # Window width
    width = 30
    # k of the k-nearest neighbor method
    nk = 1
    # Create sets of window vectors
    train = embed(train_data, width)
    test = embed(test_data, width)
    # Fit the k-nearest neighbor model
    neigh = NearestNeighbors(n_neighbors=nk)
    neigh.fit(train)
    # Calculate the distance of each test window to its nearest training window
    d = neigh.kneighbors(test)[0]
    # Normalize the distances
    mx = np.max(d)
    d = d / mx
    # Training data
    plt.subplot(221)
    plt.plot(train_data, label='Training')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Amplitude", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    # Anomaly score
    plt.subplot(222)
    plt.plot(d, label='d')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Amplitude", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    # Test data
    plt.subplot(223)
    plt.plot(test_data, label='Test')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Amplitude", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    plt.show()


if __name__ == '__main__':
    main()
```
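Regarding the two questions, one common heuristic (an assumption here, not the only approach) is to set the threshold at a high quantile of the anomaly scores, and, when ground-truth labels for the anomalous region are available, to evaluate the resulting judgments with precision and recall. A minimal sketch with synthetic data — `scores` and `labels` below are made-up stand-ins for the normalized distances `d` and hand-labeled ground truth:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Synthetic stand-in for the normalized anomaly scores `d`:
# 95 "normal" samples with low scores, 5 "anomalous" ones with high scores
rng = np.random.default_rng(0)
scores = np.concatenate([rng.uniform(0.0, 0.3, 95),
                         rng.uniform(0.7, 1.0, 5)])
labels = np.concatenate([np.zeros(95), np.ones(5)])  # assumed ground truth

# Heuristic threshold: the 95th percentile of the observed scores
threshold = np.quantile(scores, 0.95)
pred = (scores > threshold).astype(int)

# Compare the judgments against the ground truth
print("threshold:", threshold)
print("precision:", precision_score(labels, pred))
print("recall:", recall_score(labels, pred))
```

The quantile (here 0.95) encodes a prior belief about how rare anomalies are, so it should be chosen from domain knowledge; with labeled data one can also sweep the threshold and pick the value that best trades off precision against recall.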

You may have misunderstood the k-nearest neighbor method.

The k-nearest neighbor method with k = 1 simply "adopts the label of the single closest training example"; the magnitude of the distance plays no role in the decision.

Wikipedia: k-nearest neighbors algorithm

"The k-nearest neighbor method with k = 1 is called the nearest neighbor method, and the class of the training example closest to the query is adopted."

Note that k = 1 is not essential to this point. Even when k > 1, the k training examples are selected in order of increasing distance, regardless of how large those distances are, and the prediction is the majority vote of their labels. In other words, once k is fixed, the prediction in the k-nearest neighbor method depends on the *order* of the distances but not on their absolute magnitude. Conversely, if your domain knowledge tells you that the judgment should change depending on the absolute distance, then the k-nearest neighbor method is not a good fit, and you should consider other techniques.
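To illustrate this point with a toy example (not the asker's data): scaling every input by a constant changes all absolute distances, but not their order, so a k-nearest-neighbor classifier's predictions are unchanged.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: two well-separated clusters labeled 0 and 1
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[2.0], [9.0]])

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)
pred_original = clf.predict(X_test)

# Multiplying everything by 1000 makes every absolute distance
# 1000 times larger, but the ranking of neighbors is identical,
# so the predicted classes do not change.
clf_scaled = KNeighborsClassifier(n_neighbors=1)
clf_scaled.fit(X_train * 1000, y_train)
pred_scaled = clf_scaled.predict(X_test * 1000)

print(pred_original, pred_scaled)  # same labels in both cases
```

This is why the absolute distance by itself carries no meaning in kNN classification, whereas in the anomaly-detection setup above the distance *is* the score — two different uses of the same neighbor search.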