awk and tabs in input and output
July 9, 2022
Why does awk incorrectly detect tab-delimited data boundaries (input field separator)?
The following command prints an empty line instead of the expected third column:
echo '1 2 3 4 5 6' | awk -F'\t' '{ print $3 }'
Here the -F'\t' option replaces the default FS (input field separator), which is a single space, with a tab character, written as “\t”.
The problem is that the fields in this input are not actually separated by tabs but by spaces (possibly several in a row). Since the line contains no tab at all, awk treats the whole line as a single field, and $3 is empty.
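A quick way to confirm this is to print NF, the number of fields awk found. The sketch below assumes a POSIX-compatible awk such as gawk or mawk, and uses printf to build a space-separated line: with -F'\t' the whole line ends up in $1.

```shell
# Space-separated input, but a tab declared as the field separator
printf '1  2  3  4\n' | awk -F'\t' '{ print NF }'
# prints: 1

# The entire line is field $1 (brackets show its exact extent)
printf '1  2  3  4\n' | awk -F'\t' '{ print "[" $1 "]" }'
# prints: [1  2  3  4]
```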
In other words, the -F option is not needed at all for this input:
echo '1 2 3 4 5 6' | awk '{ print $3 }'
3
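Even though the fields are separated by several spaces, you don't need to describe that with the -F option; awk interprets such input correctly by default. The default FS is a single space, which awk treats as a special case: fields are separated by runs of blanks (spaces and tabs, plus newlines if a record contains any), and leading and trailing blanks are ignored. This behaves much like the regular expression [ \t]+, or [[:blank:]]+ with POSIX character classes, except that an explicit regular expression does not skip leading blanks.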
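The following sketch, assuming a POSIX-compatible awk such as gawk or mawk, shows both parts of that rule: mixed spaces and tabs split as expected with the default FS, while an explicit FS = "[ \t]+" turns leading blanks into an empty first field. The bracket expression [ \t]+ is used instead of [[:blank:]]+ only because some older awk builds (older mawk, for example) may lack POSIX character classes.

```shell
# Default FS: leading/trailing blanks are ignored, runs of spaces and tabs split fields
printf '  1 \t 2\t\t3  \n' | awk '{ print NF, $3 }'
# prints: 3 3

# Explicit regex FS: the empty text before the first separator becomes $1
printf '  1 2\n' | awk 'BEGIN { FS = "[ \t]+" } { print NF }'
# prints: 3

# Same line with the default FS: leading blanks are skipped
printf '  1 2\n' | awk '{ print NF }'
# prints: 2
```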
This is also why awk handles data that really is tab-delimited (printf is used here because, unlike echo, it expands \t into a tab character consistently across shells):
printf '1\t2\t3\t4\n' | awk '{ print $3 }'
3
In this case, the -F'\t' option also works as expected:
printf '1\t2\t3\t4\n' | awk -F'\t' '{ print $3 }'
3
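Note that a field separator longer than one character is treated as a regular expression, while a single character other than a space is used literally. Consecutive separators are merged into one only with the default FS (a space) or when the regular expression itself allows repetition, as in -F'\t+'. With -F'\t', two tabs in a row enclose an empty field.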
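A short check of the difference, again assuming a POSIX-compatible awk: with a single-character FS every tab ends a field, so two adjacent tabs produce an empty field, while the regular expression \t+ treats the whole run as one separator.

```shell
# Single-character FS: "a", "" and "b" are three separate fields
printf 'a\t\tb\n' | awk -F'\t' '{ print NF }'
# prints: 3

# Regex FS allowing one or more tabs: the run collapses into one separator
printf 'a\t\tb\n' | awk -F'\t+' '{ print NF, $2 }'
# prints: 2 b
```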
To check which non-printable characters are present in your input, use cat -A. It shows tabs as ^I, marks the end of each line with $, and displays bytes outside ASCII in M- notation. For example:
printf '1\t2\t3 4\n' | cat -A
1^I2^I3 4$
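cat -A is a GNU coreutils option, and some other cat implementations (for example BSD-derived ones) may not accept it. A sketch of a more portable spelling, assuming a cat that supports -v, -e and -t: in GNU cat, -A is documented as equivalent to -vET, and -vet combines the same three behaviors, so the markers should match.

```shell
# -v: show non-printing bytes, -e: mark line ends with $, -t: show tabs as ^I
printf '1\t2\t3 4\n' | cat -vet
# prints: 1^I2^I3 4$
```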
How to make awk output fields separated by tabs
The following command will output the third and fourth columns separated by a space:
echo '1 2 3 4 5' | awk '{ print $3,$4 }'
3 4
If you want the output fields to be separated by tabs (or any other character), set that character as the value of OFS (output field separator). For example:
echo '1 2 3 4 5' | awk 'BEGIN {OFS="\t"}; { print $2,$3 }'
2	3
OFS is inserted only between print arguments separated by commas. Arguments written side by side without a comma are simply concatenated, so the following command prints no tab, not even a space, between the fields:
echo '1 2 3 4 5' | awk 'BEGIN {OFS="\t"}; { print $2 $3 }'
23
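OFS is also used when awk rebuilds the whole record, which happens after any field is assigned. The sketch below shows a common idiom for turning an entire space-separated line into a tab-separated one by assigning a field to itself; without that assignment, a bare print outputs $0 exactly as it was read. It sets OFS with -v instead of a BEGIN block, which POSIX awk accepts and which expands \t into a tab.

```shell
# $0 is printed as read: OFS is not applied
echo '1 2 3' | awk -v OFS='\t' '{ print }'
# prints: 1 2 3

# Assigning to a field forces awk to rebuild $0, joining fields with OFS
echo '1 2 3' | awk -v OFS='\t' '{ $1 = $1; print }'
# prints: 1<TAB>2<TAB>3
```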
Besides changing the OFS (output field separator) value, you can put a tab character directly into the print statement. For example, the following command uses the default OFS (a space) between the second and third fields and an explicit tab between the third and fourth:
echo '1 2 3 4 5' | awk '{ print $2,$3"\t"$4 }'
2 3	4
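Another option, sketched here for a POSIX-compatible awk, is printf, which writes exactly what its format string says and ignores OFS entirely. This keeps the separator visible in one place, at the cost of having to add the trailing \n yourself.

```shell
# printf does not use OFS; the tab and newline come from the format string
echo '1 2 3 4 5' | awk '{ printf "%s\t%s\n", $2, $3 }'
# prints: 2<TAB>3
```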