KarmaTek

KarmaTek

Code | Collaborate | Create
info@karmatek.io | +12028055054

Cleaning up .csv files with awk

Super quick, super easy!


Reformatting a .csv with awk

So I received a request to reformat a large csv file with about 100,000 lines, removing all rows that had no value entered in one of the fields. I had been producing the report using a PHP script and while I could have gone back in to refactor the code and check for an empty value, I thought I'd take an alternative approach to utilize some other tools. I decided to use the Unix awk utility to execute a one-liner after the report was created to elimiate the unecessary lines in the file. There was an extra challenge with the file since it was tab delimited so I had to add some additional arguments to account for this.

Here's the command, which removes any records in the file which have an empty field in the 12th column.

awk -F"\t" '$12!=""' input.csv > output.csv

-F"\t" processes the file as tab delimited.

'$12!=""' finds lines where the 12th column is not blank and input.csv > output.csv takes in the input file and redirects to an output file (awk will print the output by default).

That's it! In no time the file is cleaned of all records with empty values in the 12th column.