
I am using Python 3 on Colaboratory to collect Twitter hashtags. I also collect the combinations of hashtags that appear together in the same tweet, to measure how often they co-occur.
At first I collected tweets with Tweepy, but the Twitter API cannot retrieve tweets older than about a week, so I decided to collect them with a package called GetOldTweets instead.
However, when I converted my Tweepy program to use GetOldTweets, the following error occurred.

Update: After trying the suggestion in the answer, a different error occurred. I have updated the source code below accordingly.

Error message
AttributeError                            Traceback (most recent call last)
<ipython-input-35-d2284eea22e6> in <module>()
     15 for v in tweet:
     16   print(v.text)
---> 17   for tag0, tag1 in itertools.combinations(v.entities['hashtags'], 2):
     18     tag0 = tag0['text']
     19     tag1 = tag1['text']

AttributeError: 'Tweet' object has no attribute 'entities'
Relevant source code
!git clone https://github.com/Jefferson-Henrique/GetOldTweets-python
!pip install lxml pyquery
import os
os.chdir('GetOldTweets-python')
import got3 as got
import json
import itertools
import networkx as nx
G = nx.Graph()
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('#heatstroke').setSince("2018-07-10").setUntil(
    "2018-08-30").setMaxTweets(10000)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)
print(tweet)
for v in tweet:
  print(v.text)
  for tag0, tag1 in itertools.combinations(v.entities['hashtags'], 2):
    tag0 = tag0['text']
    tag1 = tag1['text']
    if G.has_edge(tag0, tag1):
      G[tag0][tag1]["weight"] += 1
    else:
      G.add_edge(tag0, tag1, weight=1)
What I tried

I rewrote the Tweepy-based program to use GetOldTweets and simply ran it.
However, I could not find any documentation on how to write such a program, so I have no idea what to try next.

Supplemental information (FW/tool version etc.)
import itertools
import json

import networkx as nx
import tweepy
from tweepy.streaming import StreamListener

G = nx.Graph()

class MyStreamListener(StreamListener):
  def __init__(self, api, **kw):
    super().__init__(api)  # StreamListener stores the API object as self.api
    self.twcnt = 0
  def on_status(self, tweet):
    self.twcnt += 1
    for tag0, tag1 in itertools.combinations(tweet.entities['hashtags'], 2):
      tag0 = tag0['text']
      tag1 = tag1['text']
      if G.has_edge(tag0, tag1):
        G[tag0][tag1]["weight"] += 1
      else:
        G.add_edge(tag0, tag1, weight=1)
    if self.twcnt > 10000:
      return False  # returning False disconnects the stream
  def on_error(self, status):
    return True  # anything other than False keeps the stream reconnecting

# consumer_key, consumer_secret, access_token and access_token_secret
# must be set to your own credentials
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
stream = tweepy.Stream(auth, MyStreamListener(api))
stream.filter(track=['Follow #RT people'])


The image below shows part of the result of collecting in real time with the program above.

I would like to use GetOldTweets to extract hashtag combinations like those in the image above.
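The Tweepy listener above reads `tweet.entities['hashtags']`, which in the Twitter v1.1 API is a list of dicts with a `'text'` key. A minimal sketch, assuming that dict layout (the helper name `hashtag_texts` and the sample data are hypothetical), pulls out just the tag strings, so the pair-counting step no longer depends on which library supplied the tweet:

```python
def hashtag_texts(entities):
    """Return the hashtag strings from a Tweepy-style entities dict."""
    # Each entry is assumed to look like {'text': 'tag', 'indices': [start, end]}.
    return [tag["text"] for tag in entities.get("hashtags", [])]


# Hypothetical sample shaped like tweet.entities for "#heatstroke #hot".
sample_entities = {
    "hashtags": [
        {"text": "heatstroke", "indices": [0, 11]},
        {"text": "hot", "indices": [12, 16]},
    ]
}
print(hashtag_texts(sample_entities))  # ['heatstroke', 'hot']
```

GetOldTweets does not provide `entities`, so on that side the same list of strings has to come from somewhere else.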

  • Answer #1

    I have never used GetOldTweets, or even Tweepy.
    For now, the current error can be fixed by importing itertools:

    import itertools
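For reference, `itertools.combinations(iterable, 2)` yields each unordered pair exactly once, in input order, and yields nothing when there are fewer than two items. That means tweets with a single hashtag add no edges. A small check with made-up tags:

```python
import itertools

tags = ["heatstroke", "hot", "summer"]
pairs = list(itertools.combinations(tags, 2))
print(pairs)
# [('heatstroke', 'hot'), ('heatstroke', 'summer'), ('hot', 'summer')]

# A single hashtag produces no pairs.
print(list(itertools.combinations(["heatstroke"], 2)))  # []
```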
    Addendum

    Since I had already gotten involved, I tried it out a bit on my own machine.
    You can get the hashtags as follows.

    Local environment

    Microsoft Windows [Version 10.0.17134.228]

    Python 3.6.6 :: Anaconda, Inc.

    Modified GetOldTweets

    I don't know whether Twitter's page format has changed, but as it stands the package does not extract hashtags correctly.
    Replace the code on line 39 of got3/manager/TweetManager.py as follows:

    # txt = re.sub(r"\s+", " ", tweetPQ("p.js-tweet-text").text().replace('# ', '#').replace('@ ', '@'))
    txt = tweetPQ("p.js-tweet-text").text()
    txt = re.sub(r"#\s?", '#', txt)
    txt = re.sub(r"@\s?", '@', txt)
    txt = re.sub(r"\s+", ' ', txt)
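To see what the replacement changes, here is a standalone sketch that applies both versions to a made-up string. `normalize_original` copies the commented-out got3 line, `normalize_modified` copies the new lines, and `extract_hashtags` is assumed to mirror the `#\w*` pattern got3 uses to fill `Tweet.hashtags`; all three names are hypothetical. One visible difference: `\s` also matches newlines and the ideographic space U+3000, which `.replace('# ', '#')` leaves alone, so the original code can leave a bare `#` behind. This does not prove that this is the cause on the live site.

```python
import re


def normalize_original(raw):
    # The got3 line that the answer comments out.
    return re.sub(r"\s+", " ", raw.replace("# ", "#").replace("@ ", "@"))


def normalize_modified(raw):
    # The answer's replacement lines.
    txt = re.sub(r"#\s?", "#", raw)  # "# tag" -> "#tag" for any one whitespace char
    txt = re.sub(r"@\s?", "@", txt)  # "@ user" -> "@user"
    txt = re.sub(r"\s+", " ", txt)   # collapse remaining whitespace runs
    return txt


def extract_hashtags(txt):
    # Assumed to match how got3 builds Tweet.hashtags (space-separated).
    return " ".join(re.findall(r"#\w*", txt))


raw = "Stay cool # heatstroke\n@ user #\u3000hot"
print(extract_hashtags(normalize_original(raw)))  # #heatstroke #
print(extract_hashtags(normalize_modified(raw)))  # #heatstroke #hot
```

The stray `#` from the first version is also why the code that follows filters out `tag != '#'`.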
    Code I ran
    import got3 as got
    tweet_criteria = got.manager.TweetCriteria() \
        .setQuerySearch('#heatstroke') \
        .setSince("2018-07-10") \
        .setUntil("2018-08-30") \
        .setMaxTweets(10)
    tweets = got.manager.TweetManager.getTweets(tweet_criteria)
    for tweet in tweets:
        # tweet.hashtags is a space-separated string such as "#a #b"
        hash_tags = [
            tag.lstrip('#') for tag in tweet.hashtags.split()
            if tag != '#'
        ]
        print(hash_tags)

    After executing the above code, the following results were obtained.

    ['heatstroke', 'heatstroke measures', 'hot', 'blog update http']
    ['care', 'heatstroke']
    ['Kanto', 'weather', 'heatstroke']
    ['heatstroke']
    ['heatstroke']
    ['vertigo', 'vomiting', 'heatstroke', 'convulsions', 'muscle pain']
    ['heatstroke']
    ['Gyudon', 'Yoshinoya', 'Frozen', 'Free Shipping', 'Rakuten', 'Summer', 'Hot', 'Heatstroke']
    ['heatstroke attention', 'heatstroke prevention', 'heatstroke measures', 'severe heat', 'heatstroke']
    ['drug', 'pharmacist', 'heatstroke', 'dispensing pharmacy', 'dehydration', 'oral rehydration solution', 'dehydration symptoms']

    If you take the combinations of these lists, it should work.
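A minimal sketch of that last step, assuming the space-separated `tweet.hashtags` strings shown above (the helper names and the two sample strings are hypothetical). It counts each unordered pair once per tweet and produces `(tag0, tag1, weight)` tuples, the shape networkx's `Graph.add_weighted_edges_from` accepts. Only the standard library is used, so the counting can be checked without networkx or network access:

```python
import itertools
from collections import Counter


def parse_hashtags(hashtags):
    """Turn got3's space-separated hashtag string into bare tag names."""
    return [tag.lstrip("#") for tag in hashtags.split() if tag != "#"]


def count_pairs(hashtag_strings):
    """Count co-occurring hashtag pairs across tweets."""
    counts = Counter()
    for hashtags in hashtag_strings:
        # Deduplicate and sort so ('a', 'b') and ('b', 'a') share one key,
        # like a single edge in an undirected nx.Graph.
        tags = sorted(set(parse_hashtags(hashtags)))
        counts.update(itertools.combinations(tags, 2))
    return counts


samples = ["#heatstroke #hot #", "#hot #heatstroke #summer"]
pair_counts = count_pairs(samples)
weighted_edges = [(a, b, w) for (a, b), w in sorted(pair_counts.items())]
print(weighted_edges)
# [('heatstroke', 'hot', 2), ('heatstroke', 'summer', 1), ('hot', 'summer', 1)]

# With networkx installed, the edges could then be loaded with:
#   G = nx.Graph()
#   G.add_weighted_edges_from(weighted_edges)
```

Unlike the question's loop, this ignores a tag repeated within one tweet, which would otherwise create a self-loop pair; drop the `set()` if repeats should count.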

    Note, however:

    GetOldTweets does not seem to be well maintained for Python 3.x.
    (Judging from the error, I even wonder whether it works correctly on Python 2.x.)

    This package assumes using Python 2.x. The Python3 "got3" folder is maintained as experimental and is not officially supported.

    Source: GetOldTweets-python/README.md

    I'm sorry I can't suggest a specific replacement, but I'd recommend looking for an alternative.