I want to extract only the contents of the gml:posList tags from the XML file below.
Eventually I plan to output just the position coordinates in CSV format.
For now I am only trying to print them to standard output, but it doesn't work.
Could someone tell me what I'm doing wrong?
My Python code is below.

# coding: utf-8
from bs4 import BeautifulSoup
import csv
import pandas as pd
import os
xml = open("A31-12_13.xml", "r", encoding="utf-8").read()
soup = BeautifulSoup(xml, 'lxml-xml')
info = {}
data = soup.find_all("gml:posList")
print(data)

I thought find_all might return so much data that a memory error or something similar was preventing any output, so I tried find instead, but the result was None.

<?xml version="1.0" encoding="UTF-8"?>
<ksj:Dataset gml:id="A31Dataset" xmlns:ksj="http://nlftp.mlit.go.jp/ksj/schemas/ksj-app" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nlftp.mlit.go.jp/ksj/schemas/ksj-app KsjAppSchema-A31.xsd">
<gml:description>National Land Numerical Information Inundation Expected Area Instance Document</gml:description>
<!-- Data provision range -->
<gml:boundedBy>
    <gml:EnvelopeWithTimePeriod srsName="JGD2000/(B, L)" frame="GC/JST">
        <gml:lowerCorner>20.0 123.0</gml:lowerCorner>
        <gml:upperCorner>46.0 154.0</gml:upperCorner>
        <gml:beginPosition calendarEraName="AD">1900</gml:beginPosition>
        <gml:endPosition indeterminatePosition="unknown"/>
    </gml:EnvelopeWithTimePeriod>
</gml:boundedBy>
<gml:Curve gml:id="c00001">
    <gml:segments>
        <gml:LineStringSegment>
            <gml:posList>
35.817677 139.767502
35.817654 139.767160
35.817635 139.767028
35.817580 139.766645
35.817475 139.766214
35.817265 139.765878
35.816824 139.765594
35.816578 139.765308
35.816492 139.765208
35.816435 139.765142
35.816361 139.765211
35.816338 139.765231
35.816223 139.765338
35.815745 139.765779
35.815693 139.765699
35.815020 139.766155
35.814327 139.765866
35.813789 139.766001
35.813731 139.767758
35.814411 139.767795
35.815425 139.767733
35.815438 139.768314
35.815339 139.768599
35.815413 139.769020
35.815673 139.769181
35.815487 139.769552
35.814745 139.770431
35.815017 139.770530
35.816101 139.770586
35.816044 139.770143
35.816024 139.769277
35.816093 139.768886
35.815850 139.768255
35.815754 139.767677
35.815600 139.767350
35.815369 139.766599
35.815888 139.766445
35.816850 139.765963
35.817292 139.766425
35.817254 139.767157
35.817372 139.767640
35.817677 139.767502
            </gml:posList>
        </gml:LineStringSegment>
    </gml:segments>
</gml:Curve>
<gml:Surface gml:id="a00001">
    <gml:patches>
        <gml:PolygonPatch>
            <gml:exterior>
                <gml:Ring>
                    <gml:curveMember xlink:href="#c00001"/>
                </gml:Ring>
            </gml:exterior>
        </gml:PolygonPatch>
    </gml:patches>
</gml:Surface>
  • Answer #1

    It was solved with the following code. Thank you very much.

    # coding: utf-8
    from bs4 import BeautifulSoup
    import csv
    import pandas as pd
    import os
    with open('A31-12_13.xml', 'r', encoding='utf-8') as file:
        read = file.read()
    soup = BeautifulSoup(read, "lxml-xml")
    # Search by local name, without the "gml:" prefix
    position = soup.find_all("posList")
    position = str(position)
    position_split = position.split('\n')
    with open('sinsui_tokyo_xmldata.csv', 'w', encoding='CP932', newline="") as file:
        writer = csv.writer(file)
        # Writes every line as one field of a single CSV row
        writer.writerows([position_split])
    # Print the first 101 lines (fewer if the list is shorter)
    for line in position_split[:101]:
        print(line)
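
For the CSV goal mentioned in the question, a namespace-aware variant that uses only the standard library might look like the sketch below. It is not the code from the answer above: it swaps BeautifulSoup for xml.etree.ElementTree, and the function names (extract_positions, write_csv), the column names, and the shortened SAMPLE document are made up for illustration. It also assumes that each pair in gml:posList is latitude followed by longitude, which the "(B, L)" in srsName and the sample values suggest but which should be checked against the data specification.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:gml declaration in the sample file.
GML_URI = "http://www.opengis.net/gml/3.2"
GML_NS = {"gml": GML_URI}

# Shortened stand-in for A31-12_13.xml (hypothetical, two points only).
SAMPLE = """<ksj:Dataset xmlns:ksj="http://nlftp.mlit.go.jp/ksj/schemas/ksj-app"
             xmlns:gml="http://www.opengis.net/gml/3.2">
  <gml:Curve gml:id="c00001">
    <gml:segments>
      <gml:LineStringSegment>
        <gml:posList>
35.817677 139.767502
35.817654 139.767160
        </gml:posList>
      </gml:LineStringSegment>
    </gml:segments>
  </gml:Curve>
</ksj:Dataset>"""


def extract_positions(xml_text):
    """Return (curve_id, latitude, longitude) tuples for every gml:posList."""
    root = ET.fromstring(xml_text)
    rows = []
    for curve in root.iterfind(".//gml:Curve", GML_NS):
        # gml:id is a namespaced attribute, so the key needs the full URI.
        curve_id = curve.get("{%s}id" % GML_URI)
        for pos_list in curve.iterfind(".//gml:posList", GML_NS):
            values = (pos_list.text or "").split()
            # Assumption: values alternate latitude, longitude.
            for lat, lon in zip(values[0::2], values[1::2]):
                rows.append((curve_id, lat, lon))
    return rows


def write_csv(rows, out):
    """Write a header line and one coordinate pair per CSV row."""
    writer = csv.writer(out)
    writer.writerow(["curve_id", "latitude", "longitude"])
    writer.writerows(rows)


rows = extract_positions(SAMPLE)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To run it against the real file, the text read from A31-12_13.xml could be passed to extract_positions, and a file opened with newline="" could be passed to write_csv in place of the StringIO buffer. Keeping the coordinates as strings avoids any change in the printed precision.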