awk and tabs in input and output

Why does awk incorrectly detect tab-delimited field boundaries (input field separator)?

The following command prints an empty line instead of the expected third column:

echo '1     2     3     4     5     6' | awk -F'\t' '{ print $3 }'

This command replaces the default FS (input field separator), which is a single space, with the value passed to the -F option: “\t”, a tab character.

The problem with the previous command is that in the input, the fields are not actually tab-delimited, but separated by multiple spaces.

That is, using the -F option is not necessary in the previous command:

echo '1     2     3     4     5     6' | awk '{ print $3 }'

Even though the data is separated by multiple spaces, you don't need to say so with the -F option: awk splits this input correctly on its own. The default value of FS is a single space, which awk treats specially: fields are separated by runs of blanks (spaces or tabs), roughly [ \t]+ or, with POSIX classes, [[:blank:]]+, and leading and trailing blanks are ignored.
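As a small illustration of the default splitting, the sketch below feeds awk a made-up line that starts with spaces and mixes spaces and tabs between the values. It prints the number of fields (NF) followed by the third field:

```shell
# Input has leading spaces and a mix of spaces and tabs between the values.
printf '  1 \t 2\t\t3\n' | awk '{ print NF ": " $3 }'
# prints: 3: 3
```

The leading spaces do not create an empty first field, and each run of blanks counts as one separator.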

This is why the default settings also handle data that really is tab-delimited:

echo '1	2	3	4' | awk '{ print $3 }'

In this case, the -F'\t' option works as expected:

echo '1	2	3	4' | awk -F'\t' '{ print $3 }'

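Note that the field separator in awk can be a regular expression, but only the default separator (a single space) collapses runs of blanks into one split. Any other single character, including the tab set with -F'\t', is matched literally, so two consecutive tabs enclose an empty field. To treat a run of tabs as a single separator, use a regular expression such as -F'\t+'.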
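One caveat is easy to check for yourself: with a single tab as the separator, repeated tabs are not merged. The sketch below uses a made-up line with two consecutive tabs and prints the field count for each separator:

```shell
# Two consecutive tabs: with a literal tab separator the middle field is empty.
printf 'a\t\tc\n' | awk -F'\t' '{ print NF }'     # prints 3
# With the regular expression \t+ the run of tabs counts as one separator.
printf 'a\t\tc\n' | awk -F'\t+' '{ print NF }'    # prints 2
```

Keeping the empty field is usually what you want for TSV files, where an empty column is still a column.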

To check which non-printable characters are present in your input, use cat -A (a GNU coreutils option), which shows each tab as ^I and marks the end of each line with $. For example:

echo '1	2	3    4' | cat -A
1^I2^I3    4$
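On systems where cat has no -A option, the l command of sed is a possible alternative (this is a suggestion, not part of the original recipe). It writes each line in an unambiguous form, with tabs shown as \t and the end of the line marked with $:

```shell
# sed -n l prints the line with tabs as \t and a trailing $.
printf '1\t2\t3    4\n' | sed -n l
# prints: 1\t2\t3    4$
```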

How to make awk output fields separated by tabs

The following command will output the third and fourth columns separated by a space:

echo '1	2	3	4	5' | awk '{ print $3,$4 }'
3 4

If you want the output data to be separated by tabs (or any other character), then it must be set as the value of OFS (output field separator). For example:

echo '1	2	3	4	5' | awk 'BEGIN {OFS="\t"}; { print $2,$3 }'
2	3
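As a variation on the same idea, OFS can also be set on the command line with -v, which avoids the BEGIN block. The second command in this sketch illustrates a related point: print with no arguments outputs the record unchanged, so OFS only shows up in the whole line after a field assignment such as $1 = $1 makes awk rebuild it. The input lines are made up for the example:

```shell
# -v assigns OFS before any input is read, so no BEGIN block is needed.
printf '1 2 3 4 5\n' | awk -v OFS='\t' '{ print $2, $3 }'
# Assigning to a field makes awk rebuild $0 with OFS between the fields.
printf '1 2 3\n' | awk -v OFS='\t' '{ $1 = $1; print }'
```

The first command prints 2 and 3 separated by a tab. The second prints the whole line with its spaces replaced by tabs.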

OFS is inserted only between expressions separated by commas in the print statement. Without a comma the fields are simply concatenated, so the following command prints 23, with neither a tab nor a space between the fields:

echo '1	2	3	4	5' | awk 'BEGIN {OFS="\t"}; { print $2 $3 }'

Instead of changing the OFS (output field separator) value, you can put a tab character directly into the print statement. For example, the following command uses the default OFS (that is, a space) to separate the second and third fields, and inserts a tab character between the third and fourth fields:

echo '1	2	3	4	5' | awk '{ print $2,$3"\t"$4 }'
2 3	4
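When the layout mixes several separators like this, printf with an explicit format string may be easier to read. A minimal sketch that should produce the same output as the command above:

```shell
# The format string spells out every separator; printf adds no newline itself.
printf '1\t2\t3\t4\t5\n' | awk '{ printf "%s %s\t%s\n", $2, $3, $4 }'
```

Unlike print, printf ignores OFS and ORS, so the trailing \n in the format string is required.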
