Home>

Although it is immediately, as shown in the title, we are manually organizing the information currently managed by Excel.
Since the number of target data is enormous, there is a limit in manual work, so it can be processed using VBA
am thinking.

The process is as follows.
1. Delete cells other than"name"characters
2. Deleted single-byte alphanumeric characters
3. Remove "symbol"
4). Delete blank line

■ Example --------------------------------------------- -----------------------------------
Pre-processing A1:<div id ="name">Surname Name</div>
A2:<div id ="age">20</div>
A3:<div id ="address">Tokyo</div>

After processing A1: Last name
A2:

A3: Tokyo

※ As shown in the example, the HTML source is pasted in Excel and we want to extract only the information.

By referring to the website, steps 1 and 2 were realized, but I couldn't find any useful information for steps 3 and 4.
In addition, since VBA itself is a beginner, a button is provided for each process and executed one by one.

I'd like you to teach me the code.

  • Answer # 1

    I wrote a regular expression.
    It is supposed to process 100 lines.
    Correct the number of lines accordingly.
    Delete blank lines after deleting non-double-byte characters from each line.
    Please correct the"sheet name"accordingly.

    Sub test ()
        Set reg = CreateObject ("VBscript.RegExp")
        Set sh = Worksheets ("Sheet name")
        With reg
            .Pattern = "[\ x01- \ x7E \ xA1- \ xDF]"
            .IgnoreCase = False
            .Global = True
        End With
        'Remove all non-double-byte characters
        For r = 1 To 100
            sh.Cells (r, 1) = reg.Replace (sh.Cells (r, 1), "")
        Next
        'Delete blank line
        For r = 100 To 1 Step -1
            If sh.Cells (r, 1) = "" Then
                sh.Cells (r, 1) .EntireRow.Delete
            End If
        Next
        Set reg = Nothing
    End Sub


    Addendum
    Sorry for the correction after closing the question.
    The purpose is to make work more efficient and simple, and I don't think that the efficiency of the code itself is related.
    Since the loop seems to be one time, I will put an improved version.

    Sub test ()
        Set reg = CreateObject ("VBscript.RegExp")
        Set sh = Worksheets ("Sheet name")
        With reg
            .Pattern = "[\ x01- \ x7E \ xA1- \ xDF]"
            .IgnoreCase = False
            .Global = True
        End With
        For r = 100 To 1 Step -1
            'Remove all non-double-byte characters
            sh.Cells (r, 1) = reg.Replace (sh.Cells (r, 1), "")
            'Delete if blank
            If sh.Cells (r, 1) = "" Then
                sh.Cells (r, 1) .EntireRow.Delete
            End If
        Next
        Set reg = Nothing
    End Sub