
### Python: evaluating the accuracy of the k-nearest neighbor method on time series data

This question is a follow-up to: Application of k-nearest neighbor method to time series data

Using the k-nearest neighbor method, I plotted an anomaly score for the x-axis acceleration of time series data (acceleration data). The score is clearly high in the anomalous region, which is what I hoped for.

Two questions remain:

- How large does the anomaly score need to be before a point should be judged anomalous? In other words, how should I determine the threshold?
- What kind of code should I write to evaluate how accurate the anomaly judgment actually is, compared against the original data?

I would appreciate any guidance.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors


def main():
    df = pd.read_csv("20191121.csv")
    # Remove extra data from the DataFrame and index by time
    df = df.set_index('time')
    # Visualize the x-, y-, and z-axis acceleration
    df.plot().legend(loc='upper left')
    # The first 2480 x-axis accelerations are training data,
    # the next 2479 are test data.
    # df.iloc[2479] ---> 53845130
    # df.iloc[2480] ---> 53845150
    train_data = df.loc[:53845130, 'x_ags']
    test_data = df.loc[53845150:, 'x_ags'].reset_index(drop=True)
    # Window width
    width = 30
    # k of the k-nearest neighbor method
    nk = 1
    # Create a set of window vectors using the window width
    train = embed(train_data, width)
    test = embed(test_data, width)
    # Fit the k-nearest neighbor model
    neigh = NearestNeighbors(n_neighbors=nk)
    neigh.fit(train)
    # Distance from each test window to its nearest training window
    d = neigh.kneighbors(test)[0]
    # Normalize the distances
    mx = np.max(d)
    d = d / mx
    # Training data
    plt.subplot(221)
    plt.plot(train_data, label='Training')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Amplitude", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    # Anomaly score
    plt.subplot(222)
    plt.plot(d, label='d')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Anomaly score", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    # Test data
    plt.subplot(223)
    plt.plot(test_data, label='Test')
    plt.xlabel("Sample", fontsize=12)
    plt.ylabel("Amplitude", fontsize=12)
    plt.grid()
    leg = plt.legend(loc=1, fontsize=15)
    leg.get_frame().set_alpha(1)
    plt.show()


def embed(lst, dim):
    """Divide the series into sliding windows of the given width."""
    emb = np.empty((0, dim), float)
    for i in range(lst.size - dim + 1):
        tmp = np.array(lst[i:i + dim])[::-1].reshape((1, -1))
        emb = np.append(emb, tmp, axis=0)
    return emb


if __name__ == '__main__':
    main()
```
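For reference, here is a minimal self-contained run of the `embed` helper above on a toy array (not from the original post), showing the shape of the window vectors it produces:

```python
import numpy as np

def embed(lst, dim):
    # Build one row per sliding window; each window is reversed,
    # matching the helper in the question.
    emb = np.empty((0, dim), float)
    for i in range(lst.size - dim + 1):
        tmp = np.array(lst[i:i + dim])[::-1].reshape((1, -1))
        emb = np.append(emb, tmp, axis=0)
    return emb

x = np.arange(6, dtype=float)  # [0, 1, 2, 3, 4, 5]
print(embed(x, 3))
# Each row is one length-3 window, reversed:
# [[2. 1. 0.]
#  [3. 2. 1.]
#  [4. 3. 2.]
#  [5. 4. 3.]]
```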
• Answer #1

You may have misunderstood the k-nearest neighbor method.

With k = 1, the k-nearest neighbor method simply adopts the label of the single closest training example; the magnitude of that distance plays no role in the decision.

Wikipedia - k-nearest neighbor method:
"The k-nearest neighbor method with k = 1 is called the nearest neighbor method, and the class of the closest training example is adopted."

Note that this is not specific to k = 1. Even for k > 1, the k training examples are selected in order of closeness, regardless of absolute distance, and the prediction is the majority vote of their labels. In the k-nearest neighbor method, once k is fixed, only the *ordering* of the distances matters for the prediction; their absolute magnitudes are irrelevant.
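This invariance is easy to check: uniformly rescaling every coordinate changes all absolute distances but not their ordering, so a k-nearest neighbor classifier's predictions are unchanged. A minimal sketch with made-up 1-D data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array([0, 0, 1, 1])

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)

# Scaling every coordinate by 1000 changes all absolute distances,
# but not their ordering -- so the predictions are identical.
clf_scaled = KNeighborsClassifier(n_neighbors=1)
clf_scaled.fit(X * 1000, y)

q = np.array([[2.0], [9.0]])
print(clf.predict(q))                # [0 1]
print(clf_scaled.predict(q * 1000))  # [0 1]
```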

Conversely, if your domain knowledge tells you that the judgment *should* depend on the absolute distance, that means the k-nearest neighbor classifier is not the right tool. Consider other techniques.
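If, as in the question, the nearest-neighbor *distance* is used directly as an anomaly score rather than for classification, one common approach is to set the threshold from the score distribution on normal training data, e.g. a high quantile. This sketch uses synthetic stand-ins for the window vectors (the real CSV and `embed` output are not available here), with a few anomalous windows injected at the end of the test set:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
# Stand-ins for the window vectors: 200 "normal" training windows,
# 50 normal test windows, then 5 injected anomalous windows.
train = rng.normal(0.0, 1.0, size=(200, 30))
test = np.vstack([rng.normal(0.0, 1.0, size=(50, 30)),
                  rng.normal(5.0, 1.0, size=(5, 30))])

# n_neighbors=2 when scoring the training set against itself: the first
# neighbor of a training point is the point itself (distance 0), so use
# the second column as the genuine nearest-neighbor distance.
nn = NearestNeighbors(n_neighbors=2).fit(train)
train_d = nn.kneighbors(train)[0][:, 1]

# Threshold = 99th percentile of the normal scores; anything above is flagged.
threshold = np.quantile(train_d, 0.99)

test_d = nn.kneighbors(test, n_neighbors=1)[0][:, 0]
flags = test_d > threshold

# If ground-truth labels exist, the judgment can be evaluated with the
# usual classification metrics.
truth = np.array([0] * 50 + [1] * 5)
print(precision_score(truth, flags), recall_score(truth, flags))
```

The quantile (here 0.99) trades off false alarms against missed anomalies; with labeled data you can tune it by sweeping thresholds and comparing precision/recall.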