
I use PHP on Ubuntu to fetch data from Twitter, save it as JSON, and run morphological analysis on the [text] field of that JSON.
I'm using php-mecab, but I don't know how to extract only the words that have been classified as "region".

BOS/EOS, *, *, *, *, *, *, *, *
Tokyo-to Minato-ku Roppongi
Noun, proper noun, region, general, *, *, Tokyo-to Minato-ku Roppongi, Tokyo-to Minato-ku Roppongi, Tokyo-to Minato-ku Roppongi

Symbol, blank, *, *, *, *, , ,
Roppongi Hills
Noun, proper noun, general, *, *, *, Roppongi Hills, Roppongi Hills, Roppongi Hills
no
Particle, adnominal, *, *, *, *, no, no, no
ue
Noun, non-independent, adverb possible, *, *, *, ue, ue, ue
ni
Particle, case particle, general, *, *, *, ni, ni, ni
i
Verb, independent, *, *, ichidan, continuative form, iru, i, i
masu
Auxiliary verb, *, *, *, special/masu, basic form, masu, masu, masu
.
Symbol, punctuation, *, *, *, *, ., ., .

BOS/EOS, *, *, *, *, *, *, *, *

Applicable source code
<?php
// Load the saved Twitter API response and decode it into an associative array.
$file = file_get_contents("tweets.json");
$options = array('-d', '/usr/local/lib/mecab/dic/mecab-ipadic-neologd');
$file2 = json_decode($file, true);
$json_count = count($file2["statuses"]);
// Analyze only the text of the first tweet.
$tweets = $file2['statuses'][0]['text'];
$mecab = new \MeCab\Tagger($options);
$nodes = $mecab->parseToNode($tweets);
// Print each node's surface form, then its comma-separated feature string.
foreach ($nodes as $n)
{
    echo $surface = $n->getSurface() . "<br />";
    echo $feature = $n->getFeature() . "<br />";
}
?>
Supplemental information (FW/tool version etc.)

PHP 7.0.33

  • Answer # 1

    How about splitting $feature on commas, checking whether the third element is "region", and extracting the word when it is?

    Note that "Roppongi Hills" in this example cannot be extracted as a region, because its third element is not "region".
    If you want to handle that case, you need to either find a dictionary that registers "Roppongi Hills" as "region" and install it in MeCab, or add code at extraction time that decides for itself whether a word counts as a region.
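
The approach described in this answer could be sketched roughly as follows. This is only an illustration under some assumptions: `extractRegionWords()`, `$regionLabel`, and `$extraRegions` are hypothetical names, not part of php-mecab, and the sample pairs are made-up values loosely modelled on the output in the question. The label is passed in as a parameter because the output above is shown translated; in untranslated IPADIC-style output the third field for such nouns is 地域, so that is probably the value to pass in practice.

```php
<?php
// Sketch only: extractRegionWords() is a hypothetical helper, not a php-mecab API.

/**
 * Return the surfaces whose feature string has $regionLabel as its third field.
 *
 * @param array  $pairs        list of [surface, feature] pairs
 * @param string $regionLabel  label to look for in the third field
 * @param array  $extraRegions surfaces to accept even when the dictionary
 *                             does not tag them as a region (manual allowlist)
 * @return array
 */
function extractRegionWords(array $pairs, $regionLabel, array $extraRegions = [])
{
    $words = [];
    foreach ($pairs as list($surface, $feature)) {
        // The feature string is comma-separated; index 2 is the third field.
        $fields = explode(',', $feature);
        $isRegion = isset($fields[2]) && trim($fields[2]) === $regionLabel;
        if ($isRegion || in_array($surface, $extraRegions, true)) {
            $words[] = $surface;
        }
    }
    return $words;
}

// Illustrative sample data (assumed values, translated labels).
$pairs = [
    ['Minato-ku', 'Noun,proper noun,region,general,*,*'],
    ['Roppongi Hills', 'Noun,proper noun,general,*,*,*'],
    ['ni', 'Particle,case particle,general,*,*,*'],
];

// Dictionary tag only: "Roppongi Hills" is not picked up.
print_r(extractRegionWords($pairs, 'region'));

// With a manual allowlist: "Roppongi Hills" is accepted as well.
print_r(extractRegionWords($pairs, 'region', ['Roppongi Hills']));
```

To use this with the code in the question, the pairs could be collected inside the existing foreach loop from `$n->getSurface()` and `$n->getFeature()` instead of being echoed. The allowlist is just one simple way to cover the "Roppongi Hills" case without changing the dictionary.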