
https://www.jrf-reit.com/portfolio/list.html

The goal is to obtain the property name and address from the property listing page of this real estate investment corporation.

from bs4 import BeautifulSoup
import requests

url = 'https://www.jrf-reit.com/portfolio/list.html'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')
print(soup.tbody)


Result

<tbody v-bind:key="index" v-for="(item, index) in filtered_data">
<tr>
<td rowspan="2">
<span></span></td>
<td rowspan="2">

<p>{{item.summary}}</p>
<p v-html="item.name"></p>
<p>{{item.addr}}</p>

</td>
<td rowspan="2">{{item.date | shortDate}}</td>
<td rowspan="2"><span v-if="site == 'IIF'">{{item.build | shortDate}}</span><span v-else="">{{item.build | calcAge}}</span></td>
<td rowspan="2">{{item.space | localeString}}</td>
<td>{{item.price | localeString}}</td>
<td>{{item.valuation | localeString}}</td>
<td>{{item.tenant_num}}</td>
<td>{{item.op_rate}}</td>
</tr>
<tr>
<td>{{calcRatio(item.price, total_price)}}</td>
<td>{{calcRatio(item.valuation, total_valuation)}}</td>
<td colspan="2">
<p><span v-html="item.major_tenant"></span></p>
</td>
</tr>
</tbody>

Lines 7-9 of the result:

<p>{{item.summary}}</p>
<p v-html="item.name"></p>
<p>{{item.addr}}</p>

I think this is the information I want, but instead of the actual property name or address, it only shows placeholders such as {{item.addr}}.
It looks like some kind of template context (similar to Django's template engine), but what I want are the real values, such as "G Building Minami Aoyama 02" and "5-8-5 Minami Aoyama, Minato-ku, Tokyo".
How can I get this information?
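The `v-for`, `v-bind:key`, `v-html` and `v-if`/`v-else` attributes and the `{{ ... | filter}}` expressions in the result look like Vue.js template syntax rather than Django's, which suggests the table is filled in by JavaScript in the browser, so `requests` only receives the unrendered template. A minimal sketch for detecting that case, assuming the placeholders look like `{{item.addr}}` as in the output above (`find_unrendered_placeholders` is a hypothetical helper name):

```python
import re

from bs4 import BeautifulSoup

# Matches mustache-style placeholders such as {{item.addr}} or {{item.date | shortDate}}
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def find_unrendered_placeholders(html: str) -> list[str]:
    """Return the template expressions still present in the first <tbody>'s text."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        return []
    return PLACEHOLDER.findall(tbody.get_text())
```

Usage: `find_unrendered_placeholders(requests.get(url).text)` returning a non-empty list means the values are not in the static HTML and have to come from a browser or from the data the page loads.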

  • Answer #1

    For now, you can get the addresses with Selenium and headless Chrome.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from bs4 import BeautifulSoup

    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    # Selenium 4.6+ locates chromedriver by itself (Selenium Manager)
    driver = webdriver.Chrome(options=options)
    driver.get('https://www.jrf-reit.com/portfolio/list.html')
    # Wait up to 10 s until Vue has replaced the {{item.addr}} placeholders
    WebDriverWait(driver, 10).until(lambda d: '{{item.addr}}' not in d.page_source)
    html = driver.page_source.encode('utf-8')
    driver.quit()
    soup = BeautifulSoup(html, "html.parser")
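Once the page has been rendered, `soup` should contain the real values. A minimal sketch for pulling out the name and address, assuming the rendered rows keep the template's layout: in the first `<tr>` of each `<tbody>`, the second `<td>` holds three `<p>` elements (summary, name, address). `extract_name_and_address` is a hypothetical helper, and the layout assumption should be checked against the actual rendered HTML.

```python
from bs4 import BeautifulSoup


def extract_name_and_address(soup: BeautifulSoup) -> list[tuple[str, str]]:
    """Collect (name, address) pairs from the rendered portfolio table."""
    results = []
    for tbody in soup.find_all("tbody"):
        first_row = tbody.find("tr")
        if first_row is None:
            continue
        cells = first_row.find_all("td", recursive=False)
        if len(cells) < 2:
            continue
        # Assumed layout of the second cell: <p>summary</p><p>name</p><p>address</p>
        paragraphs = cells[1].find_all("p")
        if len(paragraphs) < 3:
            continue
        # The name is inserted via v-html, so it may contain tags such as <br>
        name = paragraphs[1].get_text(" ", strip=True)
        address = paragraphs[2].get_text(strip=True)
        results.append((name, address))
    return results
```

Usage: `for name, address in extract_name_and_address(soup): print(name, address)`.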

    Everything except the address can also be obtained from the portfolio list (CSV, 12.9 KB) offered on the site.
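If the CSV is enough, it can be read with the standard `csv` module. This is only a sketch: the CSV's URL and column names are not given here, and the default `cp932` (Shift_JIS) encoding is an assumption, common for Japanese CSV downloads, that should be verified. `parse_portfolio_csv` is a hypothetical helper.

```python
import csv
import io


def parse_portfolio_csv(data: bytes, encoding: str = "cp932") -> list[dict[str, str]]:
    """Decode downloaded CSV bytes and return one dict per row, keyed by the header row."""
    # Pass encoding="utf-8-sig" instead if the file turns out to be UTF-8 with a BOM
    text = data.decode(encoding)
    # newline="" leaves \r\n untouched so the csv module can handle line endings itself
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)
```

Usage, with the download URL left as a placeholder: `rows = parse_portfolio_csv(requests.get(csv_url).content)`.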

    The data is also available as a JavaScript array at the URL below, so it should be possible to convert it to JSON:
    https://www.jrf-reit.com/common/data/object_list.js
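A minimal sketch of that conversion, assuming `object_list.js` assigns a single array literal and that the literal is already valid JSON (double-quoted keys and strings, no comments or trailing commas); if it is not, `json.loads` will raise and a more lenient parser would be needed. The `name`/`addr` keys are guessed from `item.name` and `item.addr` in the template, and `parse_js_array` and `name_and_address` are hypothetical helpers.

```python
import json

from bs4 import BeautifulSoup


def parse_js_array(js_text: str) -> list:
    """Extract the outermost [...] literal from a JS file and parse it as JSON."""
    start = js_text.find("[")
    end = js_text.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no array literal found")
    return json.loads(js_text[start:end + 1])


def name_and_address(items: list) -> list[tuple[str, str]]:
    """Pick the assumed name/addr keys; name is shown via v-html, so strip any tags."""
    pairs = []
    for item in items:
        raw_name = item.get("name", "")
        name = BeautifulSoup(raw_name, "html.parser").get_text(" ", strip=True)
        pairs.append((name, item.get("addr", "")))
    return pairs
```

Usage, assuming the file is UTF-8: `items = parse_js_array(requests.get("https://www.jrf-reit.com/common/data/object_list.js").content.decode("utf-8"))`, then `name_and_address(items)`. Decoding `.content` explicitly avoids relying on `requests` guessing the charset of a `.js` response.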

  • Answer #2

    print(soup.body)
