When k-means is run, cluster membership is determined by measuring the distance between the centroid of the data and the centroid of each cluster. However, I do not know how to display the centroid of the data or the centroid of each cluster. Could you tell me how?
For reference, the data is the hourly power consumption over one day for 540 houses.

import pandas as pd
import numpy as np
import glob
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# One CSV file per house
df = glob.glob('*_m.csv')
df2 = pd.DataFrame()
for i in range(546):
    i += 1
    a = df[i-1]
    df1 = pd.read_csv(f'{a}')
    # Average every 60 rows into one value (presumably minute data -> hourly data)
    df3 = df1.groupby(df1.index // 60).mean()
    for l in range(24):
        l += 1
        # Hourly value for house i = sum of columns 1 and 2
        df2.loc[l-1, i] = df3.iloc[l-1, 1] + df3.iloc[l-1, 2]
array = np.array(df2)
array = array.T  # rows = houses, columns = hours
pred = KMeans(n_clusters=30).fit_predict(array)
# Write the members of each cluster to a separate CSV file
for n in range(30):
    df4 = pd.DataFrame()
    n += 1
    for i in range(546):
        i += 1
        if pred[i-1] == n-1:
            for l in range(24):
                l += 1
                df4.loc[l-1, i-1] = df2.iloc[l-1, i-1]
            df4.to_csv(f'30{n-1}.csv')


The code runs without errors, but I would like to know how to calculate the centroid (center of gravity) of each cluster and of the data, because I want to examine the variance when evaluating the clustering.

  • Answer #1

    Centroid of the data:

    Do you mean the centroid of all the data (all 540 houses)?
    https://code.i-harness.com/en/q/13070
    With this page in mind, think of the data in 540 dimensions and decide which definition to use after considering what you need the centroid for.

    Cluster centroid:

    Please see the documentation.
    http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

    km = KMeans(n_clusters=30).fit(array)
    The centroids are then available in km.cluster_centers_.
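
A minimal sketch of reading the centroids, using randomly generated numbers in place of the real measurements. The shape of `array`, the seed and the smaller cluster count are assumptions for illustration only. `cluster_centers_` holds one row per cluster and one column per hour, and `transform` returns the distance from every house to every centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the real data: 60 houses x 24 hourly values (assumed shape).
rng = np.random.default_rng(0)
array = rng.random((60, 24))

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(array)

centers = km.cluster_centers_   # shape (n_clusters, 24)
labels = km.labels_             # cluster index assigned to each house

# Euclidean distance from each house to the centroid of its own cluster.
dist_to_own_center = np.linalg.norm(array - centers[labels], axis=1)

# transform() gives the distance from every house to every centroid.
dist_to_all_centers = km.transform(array)   # shape (60, 5)
```

With the real data, the same attributes would be read from the model fitted with `n_clusters=30`.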

    Examining the variance:

    If you want to gather all the data belonging to the same cluster:

    for k in range(30):
        my_members = km.labels_ == k   # boolean mask of the houses in cluster k
        dati = array[my_members]       # all rows (houses) assigned to cluster k


    Would it work to use the data selected this way?
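
One way the per-cluster variance could be computed from such a selection is sketched below. Here "variance" is assumed to mean the mean squared Euclidean distance of the members to their cluster centroid; other definitions are possible. The random data, the seed and the cluster count are placeholders, not the real measurements.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the real data: 60 houses x 24 hourly values (assumed shape).
rng = np.random.default_rng(0)
array = rng.random((60, 24))
n_clusters = 5
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(array)

cluster_variance = {}
total_sq_dist = 0.0
for k in range(n_clusters):
    members = array[km.labels_ == k]   # rows (houses) assigned to cluster k
    # Squared Euclidean distance of each member to the centroid of cluster k.
    sq_dist = ((members - km.cluster_centers_[k]) ** 2).sum(axis=1)
    cluster_variance[k] = sq_dist.mean()
    total_sq_dist += sq_dist.sum()

# total_sq_dist should agree with km.inertia_, the within-cluster
# sum of squared distances reported by scikit-learn.
```

Comparing `total_sq_dist` with `km.inertia_` is a quick check that the members were selected correctly.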

  • Answer #2

    http://www.heisei-u.ac.jp/ba/fukui/pdf/stattext05.pdf

    Isn't the center of gravity simply the mean?
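
If the center of gravity is taken to be the mean, as this answer suggests, both centroids can be obtained with `mean(axis=0)`. The sketch below uses random placeholder data (shape, seed and cluster count are assumptions) and compares the per-cluster means with `cluster_centers_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the real data: 60 houses x 24 hourly values (assumed shape).
rng = np.random.default_rng(0)
array = rng.random((60, 24))
n_clusters = 5
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(array)

# Centroid of all the data: the mean over houses, one value per hour.
data_centroid = array.mean(axis=0)   # shape (24,)

# Centroid of each cluster, recomputed as the mean of its members.
member_means = np.vstack(
    [array[km.labels_ == k].mean(axis=0) for k in range(n_clusters)]
)

# Largest difference from the centroids stored by scikit-learn;
# this is expected to be close to zero once k-means has converged.
max_diff = np.abs(member_means - km.cluster_centers_).max()
```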