
How to correctly match nested HTML using PHP regular expressions
This is a continuation of the question linked above.

I want to wrap only the outermost (parent) element of nested HTML (here, <table>) in a specific tag (here, <div>).
I first tried PHP regular expressions, but that didn't work, so I switched to the DOM.
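
To illustrate why a regular expression struggles with nested tags, here is a minimal sketch. The pattern actually tried is not shown in the question, so the patterns and sample strings below are assumptions chosen only for illustration.

```php
<?php
// Illustrative input (an assumption): one table nested inside another.
$html = '<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>';

// Lazy quantifier: the match ends at the FIRST </table>, which closes the
// inner table, so the outer table is cut short.
preg_match('~<table>.*?</table>~s', $html, $lazy);

// Greedy quantifier: with two sibling tables it swallows both, plus
// everything between them, as a single match.
$siblings = '<table><tr><td>a</td></tr></table><p>text</p><table><tr><td>b</td></tr></table>';
preg_match('~<table>.*</table>~s', $siblings, $greedy);

var_dump($lazy[0] === $html);       // bool(false): outer table not fully matched
var_dump($greedy[0] === $siblings); // bool(true): both tables matched as one
```

Neither quantifier can pair each opening tag with its own closing tag, which is why a DOM parser is usually the more reliable tool here.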

HTML before replacement

<table>
    <tr><th>Item</th><td>Content</td></tr>
    <tr><th>Item</th><td>Content</td></tr>
    <tr>
        <th>Item</th>
        <td>
        <table>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
        </table>
        </td>
    </tr>
    <tr><th>Item</th><td>Content</td></tr>
</table>


HTML after replacement

<div><table>
    <tr><th>Item</th><td>Content</td></tr>
    <tr><th>Item</th><td>Content</td></tr>
    <tr>
        <th>Item</th>
        <td>
        <table>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
            <tr><th>Item</th><td>Content</td></tr>
        </table>
        </td>
    </tr>
    <tr><th>Item</th><td>Content</td></tr>
</table></div>

My approach adds a marker class to the table tag of each child element (described here as '.table-child'; the code below uses 'dummy-wrapper') and skips any table that has that class.
After that, the marker class is removed.

$dom = new DOMDocument();
$dom->loadHTML($content);
// Template wrapper; cloned for each top-level table
$tables_wrap = $dom->createElement('div');
$tables_wrap->setAttribute('class', 'table-wrapper');
// Copy the live node list to an array so moving nodes cannot disturb the loop
$tables = iterator_to_array($dom->getElementsByTagName('table'));
foreach ($tables as $table) {
    // Mark every nested table so it is skipped when the loop reaches it;
    // the marker is appended so existing classes are kept
    foreach ($table->getElementsByTagName('table') as $child) {
        $child->setAttribute('class', trim($child->getAttribute('class') . ' dummy-wrapper'));
    }
    $classes = $table->getAttribute('class');
    if (preg_match('/dummy-wrapper/', $classes)) {
        // Nested table: strip every copy of the marker and leave it unwrapped
        $classes = trim(preg_replace('/\s*dummy-wrapper/', '', $classes));
        if (empty($classes)) {
            $table->removeAttribute('class');
        } else {
            $table->setAttribute('class', $classes);
        }
        continue;
    }
    // Top-level table: put a wrapper where the table was, then move the table into it
    $table_wrap = $tables_wrap->cloneNode();
    $table->parentNode->replaceChild($table_wrap, $table);
    $table_wrap->appendChild($table);
}
return $dom->saveHTML();

Is there a better way to do this?
Is this approach correct?

php
  • Answer #1

    For example

    <?php
    $html = <<<eof
    <table>
        <tr><th>Item</th><td>Content</td></tr>
        <tr><th>Item</th><td>Content</td></tr>
        <tr>
            <th>Item</th>
            <td>
            <table>
                <tr><th>Item</th><td>Content</td></tr>
                <tr><th>Item</th><td>Content</td></tr>
                <tr><th>Item</th><td>Content</td></tr>
            </table>
            </td>
        </tr>
        <tr><th>Item</th><td>Content</td></tr>
    </table>
    eof;
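    $doc = new DOMDocument();
    // The meta tag tells libxml to treat the input as UTF-8
    $doc->loadHTML("<meta http-equiv='Content-Type' content='text/html;charset=UTF-8' />\n" . $html);
    // Only the first (outermost) table is wrapped
    $node = $doc->getElementsByTagName("table")->item(0);
    $new_node = $doc->createElement('div');
    $new_node->setAttribute("class", "table-wrapper");
    // Put the div where the table was, then move the table into the div
    $node->parentNode->replaceChild($new_node, $node);
    $new_node->appendChild($node);
    // $node->parentNode is now the wrapper div
    print $doc->saveXML($node->parentNode);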
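
    The example above wraps only the first table. If a document may contain several top-level tables, one possible extension is to select every table that has no table ancestor with XPath. The following is a minimal sketch under that assumption; the function name wrap_top_level_tables is hypothetical and not from the original post, and it assumes the input contains at least one element so that libxml creates a body.

```php
<?php
// Hypothetical helper (not from the original post): wraps every <table>
// that has no <table> ancestor in <div class="table-wrapper">.
function wrap_top_level_tables(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Same UTF-8 charset hint as in the example above.
    $doc->loadHTML("<meta http-equiv='Content-Type' content='text/html;charset=UTF-8' />\n" . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the result into an array so moving nodes cannot disturb the loop.
    $tables = iterator_to_array($xpath->query('//table[not(ancestor::table)]'));
    foreach ($tables as $table) {
        $wrapper = $doc->createElement('div');
        $wrapper->setAttribute('class', 'table-wrapper');
        // Put the wrapper where the table was, then move the table into it.
        $table->parentNode->replaceChild($wrapper, $table);
        $wrapper->appendChild($table);
    }

    // Serialize only the children of <body>, dropping the added <html>/<head>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html; // nothing was parsed; return the input unchanged
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

    It could be called as `echo wrap_top_level_tables($content);`. Because nested tables are excluded by the XPath expression itself, no marker class has to be added and removed afterwards.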
