
I have a question about regular expressions in PHP.

<table>
    <tr><th>Item</th><td>Content</td></tr>
    <tr><th>Item</th><td>Content</td></tr>
    <tr>
        <th>Item</th>
        <td>
        <table>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
        </table>
        </td>
    </tr>
    <tr><th>Item</th><td>Content</td></tr>
</table>
Given HTML like the above, I want to convert it into the following, with the outer table wrapped in a div:

<div><table>
    <tr><th>Item</th><td>Content</td></tr>
    <tr><th>Item</th><td>Content</td></tr>
    <tr>
        <th>Item</th>
        <td>
        <table>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
        </table>
        </td>
    </tr>
    <tr><th>Item</th><td>Content</td></tr>
</table></div>
I am looking for a regular expression that performs this replacement.

The simple pattern

'/<table(.*?)<\/table>/i'

does not work when tables are nested: because the match is lazy (shortest match), it stops at the closing tag of the child table.

The result looks like this:

<div><table>
    <tr><th>Item</th><td>Content</td></tr>
    <tr><th>Item</th><td>Content</td></tr>
    <tr>
        <th>Item</th>
        <td>
        <table>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
        </table></div>
        </td>
    </tr>
    <tr><th>Item</th><td>Content</td></tr>
</table>
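
The behaviour above can be reproduced with a short script. This is only a minimal sketch: the function name `wrapTablesLazy` is hypothetical, and the `s` modifier is an addition (not in the original pattern) so that `.` can also cross line breaks.

```php
<?php
// Hypothetical helper that applies the simple lazy pattern from the question.
// The "s" modifier is an assumption added here so "." also matches newlines.
function wrapTablesLazy(string $html): string
{
    // $0 is the whole match; fall back to the input if preg_replace fails.
    return preg_replace('/<table(.*?)<\/table>/is', '<div>$0</div>', $html) ?? $html;
}

// A nested table on one line, to keep the effect easy to see.
$nested = '<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>';
echo wrapTablesLazy($nested), "\n";
```

Because `.*?` stops at the first `</table>` it can find, the closing `</div>` is placed right after the inner table's closing tag rather than after the outer one.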


The same problem occurs with other tags, such as div.
Is there a way to match the parent element when it contains child elements of the same type?

php
  • Answer # 1

    In the first place, it is incorrect to write <pre> directly inside <pre>. Even within the contents of <pre>, the < and > that make up a tag must be written as the character references &lt; and &gt;.

  • Answer # 2

    It is not impossible with regular expressions, but if the structure is complex,
    it is easier to handle it with DOMDocument.
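
As a rough illustration of the DOMDocument approach suggested in Answer # 2, the sketch below wraps every table that has no table ancestor in a div. The function name `wrapOuterTables` and the XPath expression are assumptions chosen for this example, not something given in the answer. It also assumes the input is an HTML fragment and does no special encoding handling, so non-ASCII input may need extra care.

```php
<?php
// Hypothetical helper: wrap each outermost <table> in a <div> using the DOM.
function wrapOuterTables(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings quietly instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select tables that are not nested inside another table.
    $xpath = new DOMXPath($doc);
    $tables = iterator_to_array($xpath->query('//table[not(ancestor::table)]'));

    foreach ($tables as $table) {
        $div = $doc->createElement('div');
        // Put the new <div> where the table was, then move the table into it.
        $table->parentNode->replaceChild($div, $table);
        $div->appendChild($table);
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$nested = '<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>';
echo wrapOuterTables($nested), "\n";
```

Since the parser tracks nesting itself, the inner table stays untouched and only the outer one gains a wrapper. The serializer may adjust whitespace, so the output is not guaranteed to be byte-for-byte identical to the input apart from the added div.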

  • Answer # 3

    If there are no line breaks, something like this might work, but it is not practical, so I am holding off on recommending it.

    '/<(table)(.*?)>(.*[<table.*?<\/table>]*.*)<\/table>/i'
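
For completeness, PCRE (the engine behind PHP's preg_* functions) supports recursive patterns via `(?R)`, which is one way a regular expression can follow nesting. The sketch below is an illustration under assumptions rather than a recommended solution: the function name `wrapOuterTablesRegex` is hypothetical, the pattern assumes well-formed, properly closed table tags, and very large inputs may hit PCRE's backtracking or recursion limits.

```php
<?php
// Hypothetical helper: wrap each outermost <table> in a <div> using a
// recursive PCRE pattern. Assumes every <table> has a matching </table>.
function wrapOuterTablesRegex(string $html): string
{
    $pattern = '~<table\b[^>]*>'          // opening tag, with optional attributes
             . '(?:'
             . '(?:(?!</?table\b).)++'    // text that starts no table open/close tag
             . '|(?R)'                    // or a complete nested table (recursion)
             . ')*+'
             . '</table>~is';             // i: case-insensitive, s: dot matches newline

    // $0 is the whole outer table; fall back to the input if matching fails.
    return preg_replace($pattern, '<div>$0</div>', $html) ?? $html;
}

$nested = '<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>';
echo wrapOuterTablesRegex($nested), "\n";
```

Because the search resumes after the end of each match, a nested table is consumed as part of its parent and does not get a wrapper of its own. Malformed markup, such as an unclosed inner table, would make the pattern fail to match, which is one reason the DOM-based route tends to be more robust.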