
I am currently writing a nearest-neighbor search function using the sklearn.neighbors.NearestNeighbors API.

I specify kd_tree for NearestNeighbors's algorithm parameter to improve search efficiency, but I can't confirm how the k-d tree is actually built.
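For context, here is a minimal sketch of the setup in question (the data and query point are made up for illustration):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    X = np.random.randint(0, 101, (20, 2))                            # toy 2-D points
    nn = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(X)
    dist, ind = nn.kneighbors([[50, 50]])                             # 3 nearest points to (50, 50)
    print(ind)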

Can't it be visualized, the way sklearn.tree.DecisionTreeClassifier can be displayed with sklearn.tree.plot_tree?

I can get the values stored internally with KDTree(X).__getstate__(), but I don't know what they mean.
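For reference, this is how I am peeking at it; I am not assuming anything about what the elements of the returned tuple mean:

    import numpy as np
    from sklearn.neighbors import KDTree

    X = np.random.randint(0, 101, (20, 2))
    state = KDTree(X).__getstate__()     # the pickle state: a tuple of arrays and parameters
    for i, item in enumerate(state):     # just list what is in there
        print(i, type(item), getattr(item, "shape", ""))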

I want to see how the tree is divided even if I can't visualize it ...

Sorry for just throwing the whole question at you, but I would appreciate an answer.

  • Answer #1

    Looking at the source code comments, the tree structure seems to be managed as a one-dimensional array stored in the node_data attribute of the KDTree object.

    According to the comments, KDTree.node_data[0] is the root node, and the two child nodes of node KDTree.node_data[i] are KDTree.node_data[2 * i + 1] and KDTree.node_data[2 * i + 2].

    See the source code comments for the meaning of each element of KDTree.node_data. (The files run to about 3000 lines and I haven't read them all, so I don't know the details of the implementation.)
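    If you just want to dump that array, one way is via the public get_arrays() method; assuming it returns the data, index, node-data and node-bounds arrays in that order (check against your scikit-learn version), something like this prints one record per node:

    import numpy as np
    from sklearn.neighbors import KDTree

    np.random.seed(0)
    X = np.random.randint(0, 101, (20, 2))
    tree = KDTree(X, leaf_size=2)

    data, idx_array, node_data, node_bounds = tree.get_arrays()
    print(node_data.dtype.names)        # field names of the per-node record array
    for i, nd in enumerate(node_data):  # node 0 is the root
        print(i, nd)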

    The KDTree class inherits from the BinaryTree class, so the following two source files are the relevant ones.

    scikit-learn/_binary_tree.pxi

    scikit-learn/_kd_tree.pyx

    # In a typical KD Tree or Ball Tree implementation, the nodes are implemented
    # as dynamically allocated structures with pointers linking them. Here we
    # take a different approach, storing all relevant data in a set of arrays
    # so that the entire tree object can be saved in a pickle file. For efficiency,
    # the data can be stored in such a way that explicit pointers are not
    # necessary: for node data stored at index i, the two child nodes are at
    # index (2 * i + 1) and (2 * i + 2); the parent node is (i - 1) // 2
    # (where // indicates integer division).
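    As a quick sanity check of that index arithmetic (no scikit-learn involved):

    # For every non-root index i, the parent recovered via (i - 1) // 2
    # must have i among its children 2*p + 1 and 2*p + 2.
    for i in range(1, 15):
        p = (i - 1) // 2
        assert i in (2 * p + 1, 2 * p + 2)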
    Code to interpret the 1D array as a tree structure and check it
    # An external library called anytree is used to visualize the tree structure.
    # You can install it with `pip install anytree`.
    import numpy as np
    from anytree import Node, RenderTree
    from sklearn.neighbors import KDTree

    np.random.seed(0)
    X = np.random.randint(0, 101, (20, 2))
    tree = KDTree(X, leaf_size=2)
    # node_data is the third array returned by get_arrays(); its length = number of nodes
    n_nodes = len(tree.get_arrays()[2])

    def traverse(i=0, parent=None):
        node = Node(i, parent)           # one anytree Node per KDTree node index
        if i * 2 + 1 < n_nodes:          # children live at 2*i + 1 and 2*i + 2
            traverse(i * 2 + 1, node)
            traverse(i * 2 + 2, node)
        return node

    print(RenderTree(traverse()))
    Node('/0')
    ├── Node('/0/1')
    │   ├── Node('/0/1/3')
    │   │   ├── Node('/0/1/3/7')
    │   │   └── Node('/0/1/3/8')
    │   └── Node('/0/1/4')
    │       ├── Node('/0/1/4/9')
    │       └── Node('/0/1/4/10')
    └── Node('/0/2')
        ├── Node('/0/2/5')
        │   ├── Node('/0/2/5/11')
        │   └── Node('/0/2/5/12')
        └── Node('/0/2/6')
            ├── Node('/0/2/6/13')
            └── Node('/0/2/6/14')
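    To see how the data itself is divided, you can combine this with the idx_array and node_data arrays. The sketch below assumes the node records expose idx_start, idx_end and is_leaf fields (check node_data.dtype.names in your version); it continues from the code above:

    # List the training points owned by each leaf node.
    data, idx_array, node_data, node_bounds = tree.get_arrays()
    for i, nd in enumerate(node_data):
        if nd["is_leaf"]:
            members = idx_array[nd["idx_start"]:nd["idx_end"]]   # original row indices
            print(f"leaf node {i}: rows {list(members)}")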