Thank you for your help.
This is my first time asking a question here.

I would like to identify a whitespace character on a web page.

Specifically, in a table on this page there are spaces between the characters of 代 表 者 (representative, the person in charge).

Between the characters there is a non-breaking space (&nbsp;), but in front of it there is a mysterious half-width space.
What character is this?

After conversion to Unicode escapes, the other characters become escape sequences such as \u3000, but only this space stays as a literal blank.
Does it come from another character encoding such as Shift_JIS? Or is it some special character?

If anyone knows, could you please tell me?

  • Answer #1

    As far as I can see in Chrome, it looks like an ordinary half-width space (\u0020).


    \u4EE3\u0020\u0026\u006E\u0062\u0073\u0070\u003B\u8868\u0020\u0026\u006E\u0062\u0073\u0070\u003B\u8005


    \u4EE3\u0020\u0020\u8868\u0020\u0020\u8005
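
    One way to check this kind of thing yourself is to print the code point and Unicode name of every character. The following is only a minimal sketch: `describe` is a hypothetical helper name, and the sample string is assumed to be the escaped sequence quoted above with its &nbsp; entities still in place.

```python
import html
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Assumed sample: the sequence quoted above, with &nbsp; still an HTML entity.
raw = "\u4EE3\u0020&nbsp;\u8868\u0020&nbsp;\u8005"

# html.unescape turns the &nbsp; entity into the real character U+00A0.
text = html.unescape(raw)

for code_point, name in describe(text):
    print(code_point, name)
```

    With this input, the character in front of each non-breaking space is reported as U+0020 SPACE, which is the ordinary half-width space.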

  • Answer #2

    st = "代 \xa0表 \xa0者"

    Before: re.sub("\u3000\xa0\u2002\u0020", "", st)
    After: re.sub("[\u3000\xa0\u2002\u0020]", "", st)

    I had simply left out the [] of the regular expression's character class.
    I apologize to both of you for taking up your time.
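
    The difference between the two patterns can be sketched as follows. The sample string is an assumption (the three characters with a half-width space and a non-breaking space between them), not necessarily the exact string from the page.

```python
import re

# Assumed sample: 代, SPACE, NO-BREAK SPACE, 表, SPACE, NO-BREAK SPACE, 者
st = "代 \xa0表 \xa0者"

# Without brackets the pattern only matches the four characters in exactly
# this order (U+3000, U+00A0, U+2002, U+0020), which never occurs in st.
before = re.sub("\u3000\xa0\u2002\u0020", "", st)

# With brackets it is a character class: any ONE of the four characters
# matches, so every such whitespace character is removed.
after = re.sub("[\u3000\xa0\u2002\u0020]", "", st)

print(before == st)  # the string is unchanged
print(after)
```

    Here `before` is identical to `st`, while `after` contains only the three characters 代表者.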

    When I converted the string to Unicode escapes with Python's ascii() function, only the half-width space did not become \u0020.
    For some reason it was left as it was, so I guessed it was a special character.
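
    The reason is that ascii() escapes only non-ASCII characters, and U+0020 is itself a printable ASCII character, so it is left untouched. A small sketch of this behavior follows; `escape_all` is a hypothetical helper added here for illustration, and the sample string is an assumption.

```python
def escape_all(text):
    """Escape every character, including ASCII ones, as \\uXXXX or \\UXXXXXXXX."""
    return "".join(
        f"\\u{ord(ch):04x}" if ord(ch) <= 0xFFFF else f"\\U{ord(ch):08x}"
        for ch in text
    )


# Assumed sample containing a half-width space, U+00A0, and U+3000.
s = "代 \xa0表\u3000者"

# ascii() leaves the half-width space as a literal blank.
print(ascii(s))

# The helper escapes it too, so it shows up as \u0020.
print(escape_all(s))
```

    In the ascii() output the half-width space is still a literal blank between \u4ee3 and \xa0, whereas escape_all writes it out as \u0020.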

    Thank you very much.