
On Windows, I saved an Excel sheet as Unicode text and shared the file with a CentOS virtual machine through VirtualBox file sharing.
The file contains many tab characters, and I want to delete them all at once.

What I tried and the problems I ran into

# tr -d '' < test1-4.txt > test1-4
Nothing changed.

# tr -d \t < test1-4.txt > test1-4
This did nothing either.

# tr -d '\t' < test1-4.txt > test1-4
This just produced garbled characters.
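A quick ASCII-only experiment (not part of the original post) shows what the shell actually hands to tr in the second and third attempts. It assumes a POSIX `sh` and `tr`, and uses a made-up sample string instead of test1-4.txt:

```shell
# Hypothetical sample text: "tab", a TAB character, then "here".
sample=$(printf 'tab\there')

# Unquoted \t: the shell drops the backslash, so tr receives the single
# letter "t" and deletes every lowercase t, leaving the TAB in place.
unquoted=$(printf '%s' "$sample" | tr -d \t)

# Quoted '\t': the backslash reaches tr, which reads it as the TAB escape.
quoted=$(printf '%s' "$sample" | tr -d '\t')

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"
```

So the unquoted form only deletes the letter t, which is consistent with nothing changing (the data shown in the question contains no t), while the quoted form really does delete tab bytes.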

Supplementary information (firmware/tool versions, etc.)

Windows 10
CentOS Linux release 7.5.1804 (Core)

Addendum (about the garbled characters)

1 1 5 1 11B
2 0 2 1 2 1 215
3 0 3 1 6 1 31D
4 0 4 1 1 1 413
5 0 5 1 3 1 517

becomes the following:

0 ㄀ ^@ 1 㔀 ^@ 1 ^@ 11B ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 2 ㄀ ^@ 2 ㄀ ^@ ㈀ ㄀ 㔀 ^@ ^@ ^@ ^@ ^@ ^@ ^@@
^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@
2 0㌀ ^@ 1 㘀 ^@ 1 ^@ 31D ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 4 ㄀ ^@ 1 ㄀ ^@ 㐀 ㄀ ㌀ ^@ ^@ ^@ ^@ ^@ ^@ ^^
@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@
3 0 㔀 ^@ 1㌀ ^@ 1 ^@ 517 ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 6 ㄀ ^@ 4 ㄀ ^@ 㘀 ㄀ 㤀 ^@ ^@ ^@ ^@ ^@ ^@ ^^
@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@
4 0 㜀 ^@ 1㈀ ^@ 1 ^@ 715 ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 8 ㄀ ^@ 5 ㄀ ^@ 㠀 ㄀ 䈀 ^@ ^@ ^@ ^@ ^@ ^@ ^^
@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@
5 0 㤀 ^@ 1 㜀 ^@ 1 ^@ 91F ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 10 ㄀ ^@ 6 ㄀ ^@ 䄀 ㄀ 䐀 ^@ ^@ ^@ ^@ ^@ ^@@
^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@
6 0 ㄀ ㄀ ^@ 1㌀ ^@ 1 ^@ B17 ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ഀ ਀ ^@ 12 ㄀ ^@ 5 ㄀ ^@ 䌀 ㄀ 䈀 ^@ ^@ ^@ ^@ ^@ ^@@
^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@


Output of hexdump -c test1-4:

0000000 377 376   0  \0  \0   1  \0  \0   1  \0  \0   5  \0  \0   1  \0
0000010  \0  \0   1  \0   1  \0   B  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000030  \0  \0  \0  \0  \0  \r  \0  \n  \0   0  \0  \0   2  \0  \0   1
0000040  \0  \0   2  \0  \0   1  \0  \0  \0   2  \0   1  \0   5  \0  \0
0000050  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

Output of od -tx1 test1-4:

0000000 ff fe 30 00 00 31 00 00 31 00 00 35 00 00 31 00
0000020 00 00 31 00 31 00 42 00 00 00 00 00 00 00 00 00
0000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000060 00 00 00 00 00 0d 00 0a 00 30 00 00 32 00 00 31
0000100 00 00 32 00 00 31 00 00 00 32 00 31 00 35 00 00
0000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000140 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 0a 00
0000160 30 00 00 33 00 00 31 00 00 36 00 00 31 00 00 00

(Excerpts from the beginning of the file.)
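The dumps above can be reproduced with a tiny hand-made file. In UTF-16LE a TAB is the two bytes 09 00, but tr works byte by byte, so `tr -d '\t'` removes only the 09 and leaves the 00 behind; every following character is then read one byte off. This sketch assumes a POSIX `printf` (octal escapes) and `od`; the file names are hypothetical:

```shell
# Hand-built UTF-16LE sample: BOM (ff fe), "0" (30 00), TAB (09 00), "1" (31 00).
# Octal escapes are used because POSIX printf guarantees \ddd, not \x.
printf '\377\376\060\000\011\000\061\000' > sample-utf16.txt

# tr deletes single bytes: only the 0x09 goes, its 0x00 partner stays.
tr -d '\t' < sample-utf16.txt > sample-after-tr.txt

od -An -tx1 sample-utf16.txt      # ff fe 30 00 09 00 31 00
od -An -tx1 sample-after-tr.txt   # ff fe 30 00 00 31 00
```

The leftover 00 pairs with the next byte, so 00 31 is read as U+3100 (㄀) instead of "1", which matches the garbled characters shown in the question.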

  • Answer #1

    On Windows, "Unicode" means the UTF-16 encoding, right?
    If so, the standard Linux text-processing commands cannot process it correctly.
    Convert it to UTF-8 first, and then process it.

    iconv -f utf-16 -t utf-8 test1-4.txt | tr -d '\t' | iconv -f utf-8 -t utf-16 > test1-4

    The example above deletes the tabs and then converts the result back to UTF-16. On Linux, though, it is better to work in UTF-8 and convert to UTF-16 only when you really need it.
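    A sketch of the same pipeline as a reusable shell function. Assumptions: glibc iconv as shipped with CentOS 7, and an input that starts with the BOM ff fe seen in the od dump, so `-f UTF-16` can detect the byte order. The function name `strip_tabs_utf16` is hypothetical. Writing the BOM by hand and encoding to UTF-16LE is one way to keep the same byte layout Windows produced; plain `-t UTF-16` may also work, but its byte order and BOM depend on the iconv implementation.

    ```shell
    # Hypothetical helper: delete all TABs from a UTF-16 (with BOM) text file.
    strip_tabs_utf16() {
        src=$1
        dst=$2
        {
            # Write the little-endian BOM (ff fe) that Windows put at the start.
            printf '\377\376'
            # Decode (BOM consumed), delete TABs in UTF-8, re-encode without BOM.
            iconv -f UTF-16 -t UTF-8 "$src" | tr -d '\t' | iconv -f UTF-8 -t UTF-16LE
        } > "$dst"
    }

    # Usage with the file names from the question:
    # strip_tabs_utf16 test1-4.txt test1-4
    ```

    CR LF line endings pass through unchanged, since only TAB is deleted.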

  • Answer #2

    In the console, pressing Ctrl-V and then Tab inserts a literal tab character, so try tr -d "[TAB character]".
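    A portable variant, assuming a POSIX shell: `printf '\t'` can produce the literal tab instead of typing Ctrl-V Tab. Because the file is UTF-16, this sketch still decodes it to UTF-8 first, as in Answer #1; deleting the raw tab byte from the UTF-16 file would garble it the same way as in the question. The function name `remove_literal_tabs` is hypothetical.

    ```shell
    # Literal TAB character, generated instead of typed with Ctrl-V Tab.
    tab=$(printf '\t')

    # Hypothetical helper: decode UTF-16 to UTF-8, then delete the literal TAB.
    remove_literal_tabs() {
        iconv -f UTF-16 -t UTF-8 "$1" | tr -d "$tab"
    }

    # Usage (output stays UTF-8): remove_literal_tabs test1-4.txt > test1-4
    ```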

  • Answer #3

    sed -e "s/\t//g" < test1-4.txt > test1-4

    Doesn't this remove them?
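    The same sed expression works once the text is UTF-8. A sketch, assuming GNU sed (the CentOS default), where `\t` in a regular expression matches a TAB; this is a GNU extension, so other seds may need a literal tab instead. The function name `sed_strip_tabs` is hypothetical, and the result is left in UTF-8.

    ```shell
    # Hypothetical helper: Answer #3's substitution applied to decoded text.
    sed_strip_tabs() {
        iconv -f UTF-16 -t UTF-8 "$1" | sed -e 's/\t//g'
    }

    # Usage: sed_strip_tabs test1-4.txt > test1-4
    ```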