I want to delete duplicate values in each row of a CSV file with a Linux command.
I'm thinking of looping over each line, splitting it with cut -d, or similar using the comma as the delimiter, finding the duplicate words and deleting them.
Specifically, I want to convert this:
kingyo,panda,pig,pig
neko,inu,sakana,penguin
sea,see,sea,mountain
taro,taro,taro1,taro2
kanji,hiragana,katakana,eigo,kanji
into this:
kingyo,panda,pig
neko,inu,sakana,penguin
sea,see,mountain
taro,taro1,taro2
kanji,hiragana,katakana,eigo
Is there a good way to do this?
[Update]
Thank you all for your answers.
I also need to remove the numbers attached to the end of each word, so I used datamash transpose as shown below and process the file one line at a time.
The original file is given as $1 and the output file name as $2.
while read -r row; do
  echo "$row" |
  sed -e 's/,/\t/g' |           # , => tab
  datamash transpose |          # transpose: one word per line
  sed -e 's/[0-9]*$//' |        # Remove the digits at the end of each word
  sed -e 's/[0-9][0-9]*$//' |
  sort -u |                     # Remove duplicate lines
  tr '\n' ',' |                 # Join the lines with commas (newline => comma)
  sed 's/^,//' >>"$2"           # Remove comma at beginning of line
done <"$1"
I would be grateful for your opinion on whether anything is wrong with this method.
P.S. This data is the part of the original data from the fifth column onward, extracted with cut. After processing, it has to be merged back into the original file. However, when I merge with the paste command, several rows and columns end up in a single cell in several places. So instead of cutting out part of the original data, I want to use the sed command or similar to process only the items from a given column onward while keeping the original data intact. Is there a good way to do this?
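About the pipeline in the update above, two points are worth noting. First, sort -u returns the words in sorted order, not their original order. For example, neko,inu,sakana,penguin would come back as inu,neko,penguin,sakana. Second, tr turns the final newline into a comma, so each row ends with a trailing comma and no newline.

Below is a minimal sketch of an order-preserving variant, assuming GNU coreutils. It uses a hypothetical helper name, dedup_row, and makes three substitutions:
- tr replaces datamash transpose.
- awk '!seen[$0]++' replaces sort -u.
- paste -sd, replaces the tr/sed joining step.

```shell
#!/bin/sh
# Sketch only: dedup_row is a hypothetical helper, not part of the question.
# Splits one CSV row into words, strips trailing digits, removes repeated
# words while keeping the first occurrence, and joins the rest with commas.
dedup_row() {
  printf '%s\n' "$1" |
  tr ',' '\n' |           # one word per line (instead of datamash transpose)
  sed 's/[0-9]*$//' |     # strip trailing digits: taro1 -> taro
  awk '!seen[$0]++' |     # drop repeats, keep first-seen order (instead of sort -u)
  paste -sd, -            # join with commas; paste also ends the row with a newline
}
```

Inside the question's loop, the body would become dedup_row "$row" >>"$2". Because paste already writes the newline, no separate echo is needed.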
-
Answer # 1
-
Answer # 2
I tried it with a Perl one-liner.
$ cat file.csv
kingyo,panda,pig,pig
neko,inu,sakana,penguin
sea,see,sea,mountain
taro,taro,taro1,taro2
kanji,hiragana,katakana,eigo,kanji
$ perl -nle 'print join ",", grep {!$buf{$_}++} split ",", $_;' file.csv
kingyo,panda,pig
neko,inu,sakana,penguin
sea,see,mountain
taro,taro1,taro2
kanji,hiragana,katakana,eigo
-
Answer # 3
When I tried to write it with awk, it got a bit long, so I gave up and used sed instead.
Since the same word may appear more than twice, the script loops with the t command until no duplicate word is left.
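Below is a sketch of the technique described. It is not necessarily the answerer's script. One substitution removes a single later copy of a word that already appeared earlier in the line, and the t command repeats it until nothing changes. It assumes GNU sed, which accepts -E together with back-references. The function name dedup_fields is hypothetical.

```shell
#!/bin/sh
# Sketch assuming GNU sed (-E with back-references); dedup_fields is hypothetical.
# Group 1: start of line or a comma, group 2: a whole field,
# group 3: the fields in between, then ",\2" is a later identical field,
# group 5: the comma or end of line that follows it.
# The replacement keeps everything except that later ",\2".
dedup_fields() {
  sed -E -e ':a' \
         -e 's/(^|,)([^,]+)((,[^,]*)*),\2(,|$)/\1\2\3\5/' \
         -e 'ta'     # branch back to :a while a substitution succeeded
}
```

Because (,|$) must follow the repeated field, taro1 is not treated as a copy of taro. The ta loop also handles words that occur three or more times.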
Answer # 4
For a change of pace, I tried it in Perl using only a regular expression.
$ perl -ple 's/(,?)([^,]+)(?{$`!~$2?$1.$2:""})/$^R/g' file.csv
-
Answer # 5
count=1
while read -r row; do
  echo "Executing commands in row $count ..."
  echo "..."
  echo "..."
  count=$((count + 1))
  echo "$row" |
  cut -d, -f 5- |              # from column 5 onward
  nkf -X --overwrite |         # Change half-width kana to full-width kana
  sed -e 's/,/\t/g' |          # , => tab
  datamash transpose |         # transpose
  sort -u |                    # Remove duplicate lines
  tr '\n' ',' |                # Join the lines with commas (newline => comma)
  sed 's/^"//' |               # Delete a quotation mark at the beginning of the line
  sed 's/^,//' >>"$2"          # Remove comma at beginning of line
  echo >>"$2"
done <"$1"
take88's answer uses the same idea. However, if the working hash is not declared with my, it is not reset between lines, so a problem occurs when the same word reappears on another line.
Using the -F, option lets you write a simpler one-liner.
Regarding the additional question
I think transposing rows and columns is wasteful when all you want is to remove the number from each item.
With sed, you can use:
s/[0-9]*,/,/g;s/[0-9]*$//;
A script that also supports the additional requirements. As expected, it became hard to write as a one-liner, so I made it a script file.
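Below is a sketch of such a script. It is not the answerer's code and uses awk instead of sed. From a given column onward, it strips trailing digits and removes repeated words, keeping each first occurrence in its original position. Columns before that are copied unchanged, which avoids the cut/paste round trip described in the P.S. A few things to note:
- The function name dedup_from and its column argument are hypothetical.
- Quoted fields containing commas are not handled.
- The nkf and quote-stripping steps from Answer # 5 are not included.

```shell
#!/bin/sh
# Sketch: dedup_from START reads CSV on stdin and writes CSV on stdout.
# Columns 1..START-1 pass through untouched; in columns START..NF trailing
# digits are removed and values already kept on the same line are dropped.
dedup_from() {
  awk -F, -v start="${1:-1}" '
  BEGIN { start += 0 }               # make sure START is compared as a number
  {
    split("", seen)                  # empty the table for every new line
    out = ""; n = 0
    for (i = 1; i <= NF; i++) {
      v = $i
      if (i >= start) {
        sub(/[0-9]+$/, "", v)        # taro1 -> taro
        if (v in seen) continue      # already kept on this line: skip it
        seen[v] = 1
      }
      out = (n++ ? out "," : "") v   # append with a comma separator
    }
    print out
  }'
}
```

For example, dedup_from 5 <original.csv >result.csv processes columns 5 onward and leaves columns 1 to 4 exactly as they were.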