I need to write a loop so that the parser collects data from all pages, but my version does not work. How could I implement it differently?

    import time
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    from selenium.webdriver import Chrome
    from datetime import datetime
    PAGES_COUNT = 13
    url = 'https://rozetka.com.ua/search/?page=^^^&producer=gazer&seller=rozetka&text=Gazer'
    url = url.replace('/?page=0', '')
    webdriver = r"C:\Users\K. Boyar (Second)\source\repos\RozetaParcer\chromedriver.exe"
    def getPageData():
        driver = Chrome(webdriver)
    driver.implicitly_wait(10)
    driver.get("https://rozetka.com.ua/search/?producer=gazer&seller=rozetka&text=Gazer")
    total = []
    items = driver.find_elements_by_css_selector(".goods-tile.ng-star-inserted")
    cur_date = datetime.now().strftime("%d_%m_%Y")
    for item in items:
        t_name = item.find_element_by_css_selector('.goods-tile__title').text
        t_price = item.find_element_by_css_selector('.goods-tile__price-value').text
        t_nal = item.find_element_by_css_selector('.goods-tile__availability').text
        row = cur_date, t_name, t_price, t_nal
        total.append(row)
        for pageIdx in range(0, PAGES_COUNT + 1):
            total += getPageData(pageIdx)
    driver.close()
    df = pd.DataFrame(total, columns=['Date', 'Name', 'Price', 'Nal'])
    df.to_csv(f'Rozetka_parcer_{cur_date}.csv')

Don't use a JS snippet for Python code. Put ``` before and after the code to format it. Speaking of formatting, the code itself also needs to be indented properly: right now it is not clear what is happening where.

gil9red 2021-10-13 10:36:21

Fixed, it won't happen again!

Константин Николаевич Бояр II 2021-10-13 10:37:34

Thanks. I don't have time to dig into the parser and write a full answer, so here is just an idea. Hardcoding the page count is not a good approach. I would do this: 1) load the page with products and parse them; 2) look at the bottom of the page for a pagination button that leads to the next page; 3) if it is there, extract its link, follow it, and repeat from step 1. That site does have a next-page button.

gil9red 2021-10-13 10:45:04
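For illustration, the three steps from the comment above (parse the page, look for a next-page link, follow it, repeat) might look roughly like the sketch below. It is only a sketch under assumptions: the item selectors come from the question, but `NEXT_PAGE_SELECTOR` is a hypothetical placeholder that has to be checked against the real page markup, and it assumes the next-page button is a link with an `href` (if it is not, it would have to be clicked instead). It uses `find_elements("css selector", ...)` because the `find_element_by_*` helpers from the question were removed in newer Selenium 4 releases. `max_pages` is just a safety cap, not a hardcoded page count.

```python
from datetime import datetime

CSS = "css selector"  # the same string as selenium's By.CSS_SELECTOR
ITEM_SELECTOR = ".goods-tile.ng-star-inserted"  # from the question
# Hypothetical selector for the next-page link: inspect the real page to confirm it.
NEXT_PAGE_SELECTOR = "a.pagination__direction--forward"


def parse_page(driver, cur_date):
    """Collect (date, name, price, availability) rows from the currently loaded page."""
    rows = []
    for item in driver.find_elements(CSS, ITEM_SELECTOR):
        name = item.find_element(CSS, ".goods-tile__title").text
        price = item.find_element(CSS, ".goods-tile__price-value").text
        avail = item.find_element(CSS, ".goods-tile__availability").text
        rows.append((cur_date, name, price, avail))
    return rows


def scrape_all_pages(driver, start_url, max_pages=100):
    """Follow next-page links from start_url until there is no next link."""
    cur_date = datetime.now().strftime("%d_%m_%Y")
    total, url, visited = [], start_url, set()
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)  # guards against a pagination link that loops back
        driver.get(url)
        total += parse_page(driver, cur_date)
        # find_elements returns an empty list (no exception) when nothing matches
        next_links = driver.find_elements(CSS, NEXT_PAGE_SELECTOR)
        url = next_links[0].get_attribute("href") if next_links else None
    return total
```

With a real browser this would be called once, for example `rows = scrape_all_pages(Chrome(), start_url)`, and the resulting rows passed to `pd.DataFrame(rows, columns=['Date', 'Name', 'Price', 'Nal'])` as in the question.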

Understood, I'll try it now. If it works out, I'll post it here as an answer!

Константин Николаевич Бояр II 2021-10-13 10:46:03

... but this method has 2 drawbacks: 1) you may have to scroll the page to the very bottom, because otherwise Selenium may complain (check this on the spot; if it does not complain, fine); 2) if the page does not have that pagination element, the script will wait for the time specified in implicitly_wait and only then raise an error. That is not optimal for speed, so the code would need to be rewritten with explicit waits (I wrote about this in the previous answer).

gil9red 2021-10-13 10:47:33
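As a rough illustration of the explicit-wait idea, a helper along these lines could look for the next-page link with its own short timeout and scroll it into view. This is a sketch under assumptions, not the code from the answer mentioned above: `NEXT_PAGE_SELECTOR` and the function name `find_next_page_url` are hypothetical, the 3-second timeout is arbitrary, and it assumes `driver.implicitly_wait(...)` is not set, since the Selenium documentation advises against mixing implicit and explicit waits.

```python
# Hypothetical selector for the next-page link: verify it against the real page.
NEXT_PAGE_SELECTOR = "a.pagination__direction--forward"


def find_next_page_url(driver, timeout=3.0):
    """Return the next-page link's href, or None if no link shows up within `timeout` seconds."""
    # Selenium is imported here so the helper can be defined even where it is not installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # Explicit wait: poll for this one element, for at most `timeout` seconds.
        link = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, NEXT_PAGE_SELECTOR))
        )
    except TimeoutException:
        return None  # probably the last page: no next-page link appeared

    # Scroll the link into view in case the page only handles visible elements.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", link)
    return link.get_attribute("href")
```

The point of the design is that only the pagination lookup pays the short timeout on the last page, instead of every lookup paying the 10-second implicit wait from the question.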