
### python - how to calculate the center of gravity of time-varying data

As I understand it, k-means decides which cluster each data point belongs to by measuring the distance between the point and each cluster centroid. However, I do not know how to obtain and display the centroid of the data and the centroids of the clusters, and I would like someone to show me how.
The data are the hourly power consumption over one day for 540 houses.

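```python
import pandas as pd
import numpy as np
import glob
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

df = glob.glob('*_m.csv')  # list of per-house CSV file paths
df2 = pd.DataFrame()
for i in range(546):
    i += 1
    a = df[i-1]
    # Assumption: the line that loads the file into df1 is missing from the post.
    df1 = pd.read_csv(a)
    # Average every 60 rows (presumably one-minute readings) into hourly values.
    df3 = df1.groupby(df1.index // 60).mean()
    for l in range(24):
        l += 1
        # Hourly value for house i: the sum of the second and third columns.
        df2.loc[l-1, i] = df3.iloc[l-1, 1] + df3.iloc[l-1, 2]
array = np.array(df2)
array = array.T  # rows = houses, columns = 24 hours
pred = KMeans(n_clusters=30).fit_predict(array)
for n in range(30):
    df4 = pd.DataFrame()
    n += 1
    for i in range(546):
        i += 1
        if pred[i-1] == n-1:
            for l in range(24):
                l += 1
                df4.loc[l-1, i-1] = df2.iloc[l-1, i-1]
    # One CSV per cluster, holding the houses assigned to it.
    df4.to_csv(f'30{n-1}.csv')
```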

The code runs without error, but I want to examine the variance when evaluating the clustering, so I would like to know how to calculate the centroid of each cluster and the centroid of the data.

Data center of gravity:

Do you mean the centroid of all the data (all 540 houses)?
https://code.i-harness.com/en/q/13070
With that page in mind, think about what a centroid means in 540 dimensions, and decide which definition to use based on the purpose you need the centroid for.
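As one possible reading, if each house is treated as a single point with 24 hourly values, the data centroid is simply the mean over all houses. The following is a minimal sketch under that assumption; the random `array` is a hypothetical stand-in for the real data, and the names `data_centroid` and `dist_to_centroid` are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the real data: 540 houses x 24 hourly values.
rng = np.random.default_rng(0)
array = rng.random((540, 24))

# Centroid of all data: the mean of each hourly value over all houses.
data_centroid = array.mean(axis=0)

# Euclidean distance from each house to the data centroid.
dist_to_centroid = np.linalg.norm(array - data_centroid, axis=1)

print(data_centroid.shape)     # (24,)
print(dist_to_centroid.shape)  # (540,)
```

If a different viewpoint is needed (for example, one centroid per hour across houses), the same idea applies with `axis=1` instead.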

Cluster centroid:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

```python
km = KMeans(n_clusters=30).fit(array)
# The cluster centroids are then available as km.cluster_centers_.
```
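To make this concrete, here is a small sketch of reading the centroids and measuring each house's distance to them. The random `array` is again a hypothetical stand-in for the real data, and `n_init=10` and `random_state=0` are choices made here only so the example is reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the real data: 540 houses x 24 hourly values.
rng = np.random.default_rng(0)
array = rng.random((540, 24))

km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(array)

centers = km.cluster_centers_  # one 24-hour centroid per cluster, shape (30, 24)
labels = km.labels_            # cluster index of each house, shape (540,)

# Distance from each house to the centroid of its own cluster.
dist_to_own_center = np.linalg.norm(array - centers[labels], axis=1)

# Distances from every house to every centroid, shape (540, 30).
dist_to_all = km.transform(array)
```

To display the centroids, something like `plt.plot(centers.T)` with matplotlib would draw each one as a 24-hour curve.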

I want to examine the variance:

If you want to aggregate all the data that belong to the same cluster,

```python
for k in range(30):
    my_members = km.labels_ == k  # boolean mask of the houses in cluster k
    dati = array[my_members]      # all rows (houses) assigned to cluster k
```

how about working with the data selected this way?
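One way to turn that selection into a variance-like measure is to sum the squared distances of each cluster's members to its centroid. This is only a sketch of one possible evaluation, not necessarily the one the question has in mind; the random `array` is a hypothetical stand-in, and `within_ss` and `mean_sq_dist` are names made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the real data: 540 houses x 24 hourly values.
rng = np.random.default_rng(0)
array = rng.random((540, 24))
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(array)

# Within-cluster sum of squared distances to the centroid, per cluster.
within_ss = np.zeros(km.n_clusters)
for k in range(km.n_clusters):
    members = array[km.labels_ == k]
    sq_dist = ((members - km.cluster_centers_[k]) ** 2).sum(axis=1)
    within_ss[k] = sq_dist.sum()

# Mean squared distance per cluster (guarding against an empty cluster).
sizes = np.bincount(km.labels_, minlength=km.n_clusters)
mean_sq_dist = within_ss / np.maximum(sizes, 1)

# The total over all clusters should agree with scikit-learn's inertia_.
print(np.isclose(within_ss.sum(), km.inertia_))
```

Comparing `mean_sq_dist` across clusters shows which clusters are tight and which are spread out.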