
I want to remove the BOM from a file so that I can process it with Python and other tools.

OS: Linux

What I tried

First, confirm that the file has a BOM:

$ file BOMfile.rpt
BOMfile.rpt: UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators
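The `file` output is enough here, but the BOM can also be checked directly by looking at the first three bytes, because a UTF-8 BOM is always the byte sequence EF BB BF. The sketch below creates a small hypothetical file, sample.rpt, as a stand-in for BOMfile.rpt (BOM plus CRLF line endings) and inspects it with `head` and `od`:

```shell
#!/bin/sh
# Hypothetical sample file: UTF-8 BOM (octal 357 273 277 = hex EF BB BF)
# followed by two lines with CRLF endings.
printf '\357\273\277line1\r\nline2\r\n' > sample.rpt

# Dump the first three bytes in hex; a UTF-8 BOM shows up as "ef bb bf".
head -c 3 sample.rpt | od -An -tx1

# The same check as a condition that can be used in scripts.
if [ "$(head -c 3 sample.rpt | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    echo "sample.rpt starts with a UTF-8 BOM"
else
    echo "sample.rpt has no UTF-8 BOM"
fi
```

For this sample the script prints "sample.rpt starts with a UTF-8 BOM"; once the BOM is removed, the first three bytes are ordinary text instead.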

Remove the BOM with nkf:

$ nkf --overwrite --oc=UTF-8 BOMfile.rpt
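If nkf is not available, GNU sed can remove the BOM as well. This is a swapped-in alternative, not what was used above; it assumes GNU sed (for `-i` without a suffix argument and for `\xHH` escapes in the pattern) and again works on a hypothetical sample.rpt:

```shell
#!/bin/sh
# Hypothetical sample file with a UTF-8 BOM and CRLF line endings.
printf '\357\273\277line1\r\nline2\r\n' > sample.rpt

# GNU sed: on line 1 only, delete the three BOM bytes if they are present.
# -i edits the file in place (GNU syntax).
sed -i '1s/^\xEF\xBB\xBF//' sample.rpt

# Show the remaining bytes; the file now starts with "line1".
od -c sample.rpt
```

Unlike the nkf command, this only strips the BOM; the CRLF line endings are left untouched.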

Check again:

$ file BOMfile.rpt
BOMfile.rpt: ASCII text, with very long lines, with CRLF line terminators

Questions

(1) When I overwrote the file with the command above, a new file named BOMfile.rpt.nkftmpX9pww2 was also created. What is this? Judging by its line count, it seems to be considerably shorter than the original file.
(2) I also want to convert the line endings from CRLF to LF. Can I do that at the same time as removing the BOM?

Also, if there is a better way to do this, please let me know.
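As one possible alternative that does both steps without nkf, GNU sed can strip the BOM from the first line and the trailing CR from every line in a single in-place pass. This is a sketch assuming GNU sed and a hypothetical sample.rpt; note that it does not convert the character encoding, so it only fits files that are already UTF-8, as this one is:

```shell
#!/bin/sh
# Hypothetical sample file with a UTF-8 BOM and CRLF line endings.
printf '\357\273\277line1\r\nline2\r\n' > sample.rpt

# One pass with GNU sed:
#   1s/^\xEF\xBB\xBF//  removes the BOM from the first line only
#   s/\r$//             removes the CR before each LF (CRLF -> LF)
sed -i -e '1s/^\xEF\xBB\xBF//' -e 's/\r$//' sample.rpt

# Show the result byte by byte: only "\n" line endings should remain.
od -c sample.rpt
```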

  • Answer #1

    (1) When I overwrote the file with the command above, a new file named BOMfile.rpt.nkftmpX9pww2 was also created. What is this?

    I couldn't reproduce this in my environment, but I think it is a temporary file that nkf creates while it performs the conversion. It should be safe to delete it once the conversion has finished.
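    If you want to clean up such leftovers, a small loop like the following is one option. It is a sketch that assumes the temporary files follow the <name>.nkftmp<random> pattern seen in the question and that no nkf run is still in progress. Since the leftover file was shorter than the original, it is worth confirming that BOMfile.rpt itself is complete (for example with `wc -l`) before deleting anything:

```shell
#!/bin/sh
# Remove leftover nkf temporary files (e.g. BOMfile.rpt.nkftmpX9pww2)
# from the current directory. Run this only after nkf has finished.
for f in ./*.nkftmp*; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    echo "removing $f"
    rm -- "$f"
done
```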

    (2) I also want to convert the line endings from CRLF to LF. Can I do that at the same time as removing the BOM?

    The nkf command has an option for converting line endings (-L). Adding -Lu to the same command converts the line endings to LF (Unix style) while the BOM is removed:

    $ nkf --version
    Network Kanji Filter Version 2.1.4 (2015-12-12)
    Copyright (C) 1987, FUJITSU LTD. (I.Ichikawa).
    Copyright (C) 1996-2015, The nkf Project.
    $ file BOMfile.rpt
    BOMfile.rpt: UTF-8 Unicode (with BOM) text, with CRLF line terminators
    $ wc -l BOMfile.rpt
    16 BOMfile.rpt
    $ nkf --overwrite --oc=UTF-8 -Lu BOMfile.rpt
    $ file BOMfile.rpt
    BOMfile.rpt: UTF-8 Unicode text
    $ wc -l BOMfile.rpt
    16 BOMfile.rpt
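    To confirm the result without relying on the wording of `file`, you can check the first three bytes and count the carriage returns directly. The `check_file` function below is a hypothetical helper, not part of nkf, and it is run on a small sample file in place of BOMfile.rpt:

```shell
#!/bin/sh
# Hypothetical helper: report whether a file starts with a UTF-8 BOM
# and how many carriage-return (CR, 0x0D) bytes it still contains.
check_file() {
    f=$1
    if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        echo "$f: BOM present"
    else
        echo "$f: no BOM"
    fi
    # Delete every byte except CR, then count what is left.
    cr=$(tr -cd '\r' < "$f" | wc -c | tr -d ' ')
    echo "$f: $cr CR bytes"
}

# Sample file as it should look after conversion: no BOM, LF line endings.
printf 'line1\nline2\n' > converted.rpt
check_file converted.rpt
```

    For the sample file this prints "converted.rpt: no BOM" and "converted.rpt: 0 CR bytes"; running `check_file BOMfile.rpt` after the nkf command should give the same kind of report for the real file.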