
I'm running code that uses MeCab to extract words of specific parts of speech in their base form, but the process never finishes.
I've waited about 5 hours and it still hasn't ended, so I'd like to know whether there is a problem with the code.
The TSV file is only about 500 KB, so I don't think its size is the cause. Other code processes the same TSV file quickly.
Please suggest changes that would make it run faster, or point out any problems.

Relevant code

with open("jurycomment2.tsv", mode='r', encoding='utf-8') as f:
    # reports.tsv contains a review ID and the review text on each row, separated by a tab
    reader = csv.reader(f, delimiter="\t")
    for report_id, report in reader:
        words = []
        node = mt.parseToNode(report)
        while node:
                if node.feature.split(",")[0] == u"名詞":  # noun
                        words.append(node.surface)
                elif node.feature.split(",")[0] == u"形容詞":  # adjective
                        words.append(node.feature.split(",")[6])
                elif node.feature.split(",")[0] == u"動詞":  # verb
                        words.append(node.feature.split(",")[6])
                        node = node.next
        stopword = []
        words2 = [token for token in words if token not in stopword]
        # words2 is the list of words in the review; tags holds the review ID
        reports.append(TaggedDocument(words=words2, tags=[report_id]))
  • Answer #1

    If a node is neither a noun, an adjective, nor a verb, node = node.next is never called, so the loop never terminates.
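
To see the hang without waiting for hours, here is a minimal sketch that replays the question's loop on a hand-built chain of nodes. Everything around the loop is an assumption made for illustration: `Node` is a hypothetical stand-in for MeCab's node objects (only `surface`, `feature`, and `next` are modeled), the feature strings are hand-written IPADIC-style examples, and `max_steps` is a safety cap that the original code does not have.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a MeCab node (not the real class)."""
    surface: str
    feature: str
    next: Optional["Node"] = None


def buggy_walk(node, max_steps=10):
    """Replay the question's loop; return (visited surfaces, hit_cap)."""
    visited = []
    steps = 0
    while node:
        if steps >= max_steps:
            # Safety cap added for this demo only; the real loop spins forever.
            return visited, True
        steps += 1
        visited.append(node.surface)
        pos = node.feature.split(",")[0]
        if pos == "動詞":
            node = node.next  # as in the question: advance only after a verb
    return visited, False


# Two-node chain: 走る (verb) -> 猫 (noun)
noun = Node("猫", "名詞,一般,*,*,*,*,猫,ネコ,ネコ")
verb = Node("走る", "動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル", noun)

visited, stuck = buggy_walk(verb)
print(visited, stuck)
```

The verb is passed once, after which the loop keeps revisiting the noun until the cap stops it. Without the cap, that is the 5-hour wait.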


    Addendum

    I'm glad it was solved, but on closer inspection the original loop fails as soon as it meets any node that is not a verb, because node = node.next runs only inside the verb branch.

    while node:
        if node.feature.split(",")[0] == u"名詞":  # noun
            words.append(node.surface)
        elif node.feature.split(",")[0] == u"形容詞":  # adjective
            words.append(node.feature.split(",")[6])
        elif node.feature.split(",")[0] == u"動詞":  # verb
            words.append(node.feature.split(",")[6])
        node = node.next


    Isn't the loop above, which runs node = node.next for every node, what you want?
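
One way to make this mistake harder to repeat is to advance the cursor in exactly one place and split `feature` only once per node. The following is only a sketch under assumptions: `Node` is a hypothetical stand-in for MeCab's node objects so that the snippet runs without MeCab installed, `iter_nodes`, `extract_words`, and `link` are names made up here, and the feature strings are hand-written IPADIC-style examples (part of speech in field 0, base form in field 6).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a MeCab node (not the real class)."""
    surface: str
    feature: str
    next: Optional["Node"] = None


def iter_nodes(node):
    """Yield every node in the chain; the cursor advances only here."""
    while node is not None:
        yield node
        node = node.next


def extract_words(head):
    """Collect noun surfaces and the base forms of adjectives and verbs."""
    words = []
    for node in iter_nodes(head):
        fields = node.feature.split(",")  # split once per node
        pos = fields[0]
        if pos == "名詞":  # noun: keep the surface form
            words.append(node.surface)
        elif pos in ("形容詞", "動詞"):  # adjective or verb: keep the base form
            words.append(fields[6])
    return words


def link(nodes):
    """Chain the stand-in nodes together and return the head."""
    for cur, nxt in zip(nodes, nodes[1:]):
        cur.next = nxt
    return nodes[0]


# Hand-written analysis of 赤い花が咲いた, for illustration only
head = link([
    Node("", "BOS/EOS,*,*,*,*,*,*,*,*"),
    Node("赤い", "形容詞,自立,*,*,形容詞・アウオ段,基本形,赤い,アカイ,アカイ"),
    Node("花", "名詞,一般,*,*,*,*,花,ハナ,ハナ"),
    Node("が", "助詞,格助詞,一般,*,*,*,が,ガ,ガ"),
    Node("咲い", "動詞,自立,*,*,五段・カ行イ音便,連用タ接続,咲く,サイ,サイ"),
    Node("た", "助動詞,*,*,*,特殊・タ,基本形,た,タ,タ"),
    Node("", "BOS/EOS,*,*,*,*,*,*,*,*"),
])

words = extract_words(head)
print(words)
```

With real MeCab, the head of the chain would come from mt.parseToNode(report), as in the question. The particle, the auxiliary verb, and the BOS/EOS nodes are skipped without stalling the loop.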