Tag: console text processing and console text editors

awk and tabs in input and output

Why does awk incorrectly detect tab delimited data boundaries (input field separator)

The following command will return an empty result instead of the expected third column:

echo '1     2     3     4     5     6' | awk -F'\t' '{ print $3 }'

Here the -F'\t' option replaces the default FS (input field separator), which is whitespace, with “\t”, that is, a tab character.

The problem is that the input fields are not actually tab-delimited; they are separated by multiple spaces. Since the line contains no tabs, awk sees it as a single field, and $3 is empty.

That is, using the -F option is not necessary in the previous command:

echo '1     2     3     4     5     6' | awk '{ print $3 }'
3

Even though the data is separated by multiple spaces, you don't need to specify this with the -F option, because awk interprets such input correctly. The default field separator in awk is a single space, which awk treats specially: fields are separated by runs of one or more spaces or tabs (and newlines), roughly [ \t]+ or, with POSIX classes, [[:blank:]]+, and leading and trailing whitespace is ignored.

This is why, even if the data is actually tab delimited, the awk command handles it correctly:

echo '1	2	3	4' | awk '{ print $3 }'
3

In this case, the -F'\t' option works as expected:

echo '1	2	3	4' | awk -F'\t' '{ print $3 }'
3

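It should be noted that a field separator longer than one character is treated by awk as a regular expression. Collapsing of consecutive separators, however, happens only with the default separator: with -F'\t', every tab is a field boundary, so two tabs in a row produce an empty field between them. To treat a run of tabs as a single separator, use a regular expression such as -F'\t+'.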
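A minimal sketch of how repeated separators are handled, assuming a POSIX-compatible awk; the variable names are made up for illustration. The sample line has two tabs in a row between “a” and “c”:

```shell
# Two tabs in a row between "a" and "c" (built with printf so the tabs are explicit).
line=$(printf 'a\t\tc')

# Default FS: runs of blanks collapse, so "c" is the second field.
default_second=$(printf '%s\n' "$line" | awk '{ print $2 }')

# FS set to a single tab: each tab is a boundary, so field 2 is empty and "c" is field 3.
tab_second=$(printf '%s\n' "$line" | awk -F'\t' '{ print $2 }')
tab_third=$(printf '%s\n' "$line" | awk -F'\t' '{ print $3 }')

# FS set to the regular expression "one or more tabs": the run is one boundary again.
regex_second=$(printf '%s\n' "$line" | awk -F'\t+' '{ print $2 }')

printf 'default=[%s] tab=[%s|%s] regex=[%s]\n' \
    "$default_second" "$tab_second" "$tab_third" "$regex_second"
```

With this input the last command should print default=[c] tab=[|c] regex=[c].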

To check which non-printable characters are present in your input, use cat -A (tabs are shown as ^I and the end of each line as $). For example:

echo '1	2	3    4' | cat -A
1^I2^I3    4$

How to make awk output fields separated by tabs

The following command will output the third and fourth columns separated by a space:

echo '1	2	3	4	5' | awk '{ print $3,$4 }'
3 4

If you want the output data to be separated by tabs (or any other character), then it must be set as the value of OFS (output field separator). For example:

echo '1	2	3	4	5' | awk 'BEGIN {OFS="\t"}; { print $2,$3 }'
2	3

OFS is inserted only between print arguments separated by commas. Arguments written next to each other are concatenated, so the following command prints neither a tab nor a space between the fields:

echo '1	2	3	4	5' | awk 'BEGIN {OFS="\t"}; { print $2 $3 }'
23

In addition to changing the OFS (output field separator) value, you can specify a tab character in the output template. For example, the following command will use standard OFS (that is, a space) to separate the second and third fields, and a tab character will be inserted between the third and fourth columns:

echo '1	2	3	4	5' | awk '{ print $2,$3"\t"$4 }'
2 3	4
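If the whole line should be re-delimited rather than a few selected fields, one possible approach (a sketch, assuming a POSIX-compatible awk; the variable names are illustrative) is to set OFS with -v and assign a field to itself, which makes awk rebuild $0 with the new separator:

```shell
# Input with irregular spacing between three fields.
input='a   b  c'

# -v OFS='\t' sets the output separator before any input is read;
# assigning $1 to itself forces awk to rebuild $0 using OFS.
tabbed=$(printf '%s\n' "$input" | awk -v OFS='\t' '{ $1 = $1; print }')

printf '%s\n' "$tabbed"
```

Piping the result through cat -A is a quick way to confirm that the fields are now separated by single tabs.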

How to convert a string to lowercase in Bash

This note will show you how to convert a string to lowercase (small letters) on the Linux command line.

To convert a string to lower case regardless of its current case, use one of the following commands.

tr

echo "Hi all" | tr '[:upper:]' '[:lower:]'
hi all

Attention! If you want to change the case of any letters other than Latin (national alphabets, letters with diacritics), do not use tr; use one of the other solutions suggested below. This is because tr, including the GNU version, operates on single bytes and does not handle multibyte (UTF-8) characters.

AWK

echo "Hi all" | awk '{print tolower($0)}'
hi all
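In a script, the awk variant can be wrapped in a small function, for instance to compare user input case-insensitively. This is only a sketch: to_lower and the variable names are hypothetical helpers, not part of any standard tool.

```shell
# to_lower: hypothetical helper that lowercases its first argument with awk.
to_lower() {
    printf '%s\n' "$1" | awk '{ print tolower($0) }'
}

# Normalize the answer before comparing it with the expected value.
answer='YeS'
if [ "$(to_lower "$answer")" = "yes" ]; then
    matched=1
else
    matched=0
fi

echo "matched=$matched"
```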

Bash

a="Hi all"
echo "${a,,}"
hi all

Starting with Bash 5.1, there is a conversion option “L”, which is intended to convert a string to lowercase:

${var@L}

Example:

v="heLLo"
echo "${v@L}"
hello

sed

echo "Hi all" | sed -e 's/\(.*\)/\L\1/'
hi all

Or the same with a here-string instead of a pipe:

a="Hi all"
sed -e 's/\(.*\)/\L\1/' <<< "$a"
hi all

Another solution:

echo "Hi all" | sed 's/./\L&/g'

Perl

echo "Hi all" | perl -ne 'print lc'
hi all

Python

a="Hi all"
b=$(python3 -c "print('$a'.lower())"); echo "$b"

Ruby

a="Hi all"
b=`echo "print '$a'.downcase" | ruby`; echo $b

PHP

b=`php -r "print strtolower('$a');"`; echo $b

NodeJS

b=`node -p "\"$a\".toLowerCase()"`; echo $b

In zsh

a="Hi all"
echo $a:l

How to convert a string to uppercase in Bash

This note will show you how to convert a string to upper case (capital letters, uppercase) on the Linux command line.

To convert a string to capital letters regardless of its current case, use one of the following commands.

tr

echo "Hi all" | tr '[:lower:]' '[:upper:]'
HI ALL

Attention! If you want to change the case of any letters other than Latin (national alphabets, letters with diacritics), do not use tr; use one of the other solutions suggested below. This is because tr, including the GNU version, operates on single bytes and does not handle multibyte (UTF-8) characters.

AWK

echo "Hi all" | awk '{print toupper($0)}'
HI ALL

Bash

a="Hi all"
echo "${a^^}"
HI ALL

Starting with Bash 5.1, there is a conversion option U, which is intended to convert a string to uppercase:

${var@U}

Example:

v="heLLo"
echo "${v@U}"
HELLO

sed

echo "Hi all" | sed -e 's/\(.*\)/\U\1/'
HI ALL

Or the same with a here-string instead of a pipe:

a="Hi all"
sed -e 's/\(.*\)/\U\1/' <<< "$a"
HI ALL

Another solution:

echo "Hi all" | sed 's/./\U&/g'

Perl

echo "Hi all" | perl -ne 'print uc'
HI ALL

Python

a="Hi all"
b=$(python3 -c "print('$a'.upper())"); echo "$b"

Ruby

a="Hi all"
b=`echo "print '$a'.upcase" | ruby`; echo $b

PHP

b=`php -r "print strtoupper('$a');"`; echo $b

NodeJS

b=`node -p "\"$a\".toUpperCase()"`; echo $b

In zsh

a="Hi all"
echo $a:u

How to print from specific column to last in Linux command line

In this note, we will consider how to display from a specific column to the last one. For example:

  • how to output from second column to last
  • how to display from third column to last
  • how to display from the fourth column to the last
  • how to output from nth column to last
  • how to display the last column

To display from a certain column to the very last, the “cut” command is the most convenient. If for some reason you prefer “awk”, examples of the same output with awk are shown further below.

cut

How to output from the nth column to the last one:

Use the following command, in which “N” is replaced by the number of the column from which you want to start the output:

cut -fN- -d' '

How to output from the second column to the last:

cut -f2- -d' '

Example:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | cut -f2- -d' '

How to output from the third column to the last:

cut -f3- -d' '

Example:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | cut -f3- -d' '

How to display from the fourth column to the last:

cut -f4- -d' '

Example:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | cut -f4- -d' '
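One caveat with cut is that it counts every single space as a delimiter, so columns separated by several spaces produce empty fields. A possible workaround, sketched here with made-up variable names, is to squeeze runs of spaces with tr -s first:

```shell
# Sample line with irregular spacing between columns.
line='vddp   vddpi vss  cb0 cb1'

# cut alone: the extra spaces count as empty fields, so "field 3" is empty.
raw=$(printf '%s\n' "$line" | cut -d' ' -f3-)

# Squeeze repeated spaces first, then cut from the third column.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f3-)

printf 'raw=[%s]\nsqueezed=[%s]\n' "$raw" "$squeezed"
```

Here raw keeps a leading space and starts at “vddpi”, while squeezed is “vss cb0 cb1”.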

awk

How to output from the nth column to the last one:

In awk you can use a construct like:

awk '{$1=$2=$3="";print $0}'

This example will clear the contents of the first three columns and show all columns starting from the fourth column.

However, the separators that stood between the cleared columns remain in the output as leading spaces. To remove them, additionally use the “sed” command:

awk '{$1=$2=$3="";print $0}' | sed -r "s/^\s+//g"

Or you can use the following awk command to clean up the spaces left by the cleared columns ($0=$0 splits the line into fields again, and $1=$1 rebuilds it with single separators):

awk '{$1=$2=$3=""; $0=$0; $1=$1; print}'

How to output from the second column to the last:

awk '{$1="";print $0}' | sed -r "s/^\s+//g"
awk '{$1=""; $0=$0; $1=$1; print}'

Examples:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1="";print $0}' | sed -r "s/^\s+//g"
echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1=""; $0=$0; $1=$1; print}'

How to output from the third column to the last:

awk '{$1=$2="";print $0}' | sed -r "s/^\s+//g"
awk '{$1=$2=""; $0=$0; $1=$1; print}'

Examples:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1=$2="";print $0}' | sed -r "s/^\s+//g"
echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1=$2=""; $0=$0; $1=$1; print}'

How to display from the fourth column to the last:

awk '{$1=$2=$3="";print $0}' | sed -r "s/^\s+//g"
awk '{$1=$2=$3=""; $0=$0; $1=$1; print}'

Examples:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1=$2=$3="";print $0}' | sed -r "s/^\s+//g"
echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{$1=$2=$3=""; $0=$0; $1=$1; print}'
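Instead of clearing the leading fields, awk can also loop from the starting column to NF. The following is a sketch under the assumption of a POSIX-compatible awk; print_from is a hypothetical helper name, and the output fields are joined with OFS:

```shell
# print_from N: print fields N..NF of every input line, joined by OFS.
print_from() {
    awk -v n="$1" '{
        out = ""
        sep = ""
        for (i = n; i <= NF; i++) {
            out = out sep $i   # append the field, preceded by the separator
            sep = OFS          # from the second printed field on, use OFS
        }
        print out
    }'
}

result=$(echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | print_from 4)
echo "$result"
```

The starting column is passed as an argument, so the same function covers the second, third, or nth column without editing the awk program.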

How to display the last column

To display the last column with awk, use the command:

awk '{print $NF}'

For example:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | awk '{print $NF}'

If you want to do without awk, then use the following set of commands:

rev | cut -d' ' -f1 | rev

For example:

echo 'vddp vddpi vss cb0 cb1 cb2 cb3 ct0 ct1 ct2 ct3' | rev | cut -d' ' -f1 | rev

How to remove newline from command output and files on Linux command line

How to remove newline from a string in Bash

The following characters are used for line breaks in operating systems:

  • '\n' (newline)
  • '\r' (carriage return)

Linux and other Unix-like systems use \n (also called EOL, End of Line, or newline). Windows uses the pair \r\n, so files created there may contain carriage returns as well.

By default, many programs and Linux command line utilities add a newline character to the end of their output; in general, this makes the output more readable. But sometimes the newline character is not needed. This note is about how to remove the newline character from an output string or from the lines of a file.

How to remove newline character from a string

echo

If you're echoing a string or the result of a command with “echo”, then you can use the -n option, which means don't print the trailing newline character.

Note the different output of the commands:

echo -n 'HackWare.ru' | md5sum
ce7d43633e2bfb3d283f2cfbdbeb0d2a -

echo 'HackWare.ru' | md5sum
19acfcdef400742c5de064e0bf9e9a87 -

The first command calculates the checksum of the “HackWare.ru” string, and the second command calculates the checksum of the “HackWare.ru” string to which the trailing newline character is added.
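The extra character can also be made visible by counting bytes. This sketch uses printf '%s' in place of echo -n, on the assumption that it behaves the same way in more shells; the variable names are illustrative:

```shell
# "HackWare.ru" is 11 characters long; echo appends one newline byte.
with_newline=$(( $(echo 'HackWare.ru' | wc -c) ))

# printf '%s' prints the string as is, without a trailing newline.
without_newline=$(( $(printf '%s' 'HackWare.ru' | wc -c) ))

echo "with newline: $with_newline bytes, without: $without_newline bytes"
```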

tr

You can remove newline characters with tr (note that this deletes every newline in the input, not only the trailing one) using the construct

tr -d '\n'

For example:

echo 'HackWare.ru' | tr -d '\n' | md5sum
ce7d43633e2bfb3d283f2cfbdbeb0d2a -

sed

You can remove newlines with sed using the following construct (this command removes all "\n" and "\r" characters):

sed -z 's/[\n\r]//g'

For example:

echo 'HackWare.ru' | sed -z 's/[\n\r]//g' | md5sum
ce7d43633e2bfb3d283f2cfbdbeb0d2a -

Perl

The following Perl construct also removes the newline character (from every line of the input):

perl -pe 'chomp'

Example:

echo 'HackWare.ru' | perl -pe 'chomp' | md5sum
ce7d43633e2bfb3d283f2cfbdbeb0d2a -

Another usage example:

wc -l < log.txt | perl -pe 'chomp'

awk

With awk, you can remove newline characters using the following construct:

awk '{ printf "%s", $0 }'

For example:

echo 'HackWare.ru' | awk '{ printf "%s", $0 }' | md5sum

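Another option (use it only when the input cannot contain “%” characters, because $0 is used here as the printf format string):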

awk '{printf $0}'

For example:

echo 'HackWare.ru' | awk '{printf $0}' | md5sum

Removing newline from command output

All previous examples can be used to remove newline from command output by piping the output (“|”). Here are a few more constructs that you can use to remove newline from the output of a command.

printf

Put the COMMAND in the expression (the double quotes keep spaces in the output intact):

printf '%s' "$(COMMAND)"

The result of executing the COMMAND without the trailing newline will be displayed.

For example:

printf '%s' "$(echo 'HackWare.ru')" | md5sum
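Whether the command substitution is quoted matters as soon as the output contains spaces. A short sketch of the difference (the variable names are made up for illustration):

```shell
# Unquoted: the output is split into the words "a", "b", "c", and printf
# reuses the '%s' format for each word, gluing them together.
unquoted=$(printf '%s' $(echo 'a b  c'))

# Quoted: the output is passed to printf as one argument, spaces intact.
quoted=$(printf '%s' "$(echo 'a b  c')")

printf 'unquoted=[%s]\nquoted=[%s]\n' "$unquoted" "$quoted"
```

The first line printed should be unquoted=[abc] and the second quoted=[a b  c].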

xargs and echo

To suppress the output of the newline character, you can use the xargs construct:

COMMAND | xargs echo -n

Be careful with the previous construct, as it also compresses runs of spaces. To see the effect, examine the output of the following command, in which neither xargs nor the unquoted command substitution preserves the four spaces:

echo "a    b" | xargs echo -n; echo -n $(echo "a    b")

Because xargs can be very slow, you can use the following construct:

echo -n `COMMAND`

Remember that if the output starts with -e, then the previous construct will interpret the output as an “echo” option.

Command substitution

In the following examples, the output of the command enclosed in "$(COMMAND)" is printed without the trailing newline, because command substitution strips trailing newlines:

echo -n "$(wc -l < log.txt)"
printf "%s" "$(wc -l < log.txt)"

How to remove only last newline character from multiline output

All previous examples assume single-line output. If you need to remove only the final newline from multiline output, the following shows how to do it.

Perl

The following command will output the contents of the log.txt file, removing only the single newline character at the very end of the file; all other newlines are preserved. If the file ends with several newline characters, only the last one is removed.

perl -pe 'chomp if eof' log.txt

printf

The following example will also remove the newline character at the end of the log.txt file. Because command substitution strips every trailing newline, a file that ends with several newline characters loses all of them:

printf "%s" "$(< log.txt)"

How to remove newline from a file in Bash

You can use file output in conjunction with any of the above constructs to remove the newline. For example:

cat log.txt | tr -d '\n'

Similar to the previous command:

tr -d '\n' < log.txt

The awk, sed, perl, and other commands can either process standard input or take the name of the file to be processed as an argument. Note that the perl example below uses the -i option and therefore modifies the file in place. Examples:

awk '{ printf "%s", $0 }' log.txt
awk '{printf $0}' file
sed ':a;N;$!ba;s/\n//g' file.txt
perl -p -i -e 's/\R//g;' filename

How to remove newline from a variable in Bash

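To remove the newline character (or any other characters) you can use pattern substitution (a kind of shell parameter expansion). An empty TO removes the match instead of replacing it. The format is as follows:

  • To replace all matches:
${VARIABLE//FROM/TO}
  • To replace the first match:
${VARIABLE/FROM/TO}
  • To replace a match at the end of a string:
${VARIABLE/%FROM/TO}
  • Replacing all matches and assigning the new value to the same variable:
VARIABLE=${VARIABLE//FROM/TO}

In the examples below the variable holds the two-character sequence \n (a backslash followed by “n”), which echo -e turns into a newline when printing, so the backslash in the pattern must itself be escaped: \\n. To match a real newline character stored in a variable, use $'\n' as the pattern instead.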

Variable output without removing newline:

text='hello\n\nthere\nagain\n'
echo -e ${text}

Result:

hello

there
again

Variable output with all newlines removed:

text='hello\n\nthere\nagain\n'
echo -e ${text//\\n/}

Result:

hellothereagain

Variable output with only the first newline removed:

text='hello\n\nthere\nagain\n'
echo -e ${text/\\n/}

Result:

hello
there
again

Variable output with the last newline removed:

text='hello\n\nthere\nagain\n'
echo -e ${text/%\\n/}

Result:

hello

there
again
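When the variable holds real newline characters rather than the two-character sequence \n, the same expansions can be written with $'\n'. A sketch that assumes Bash (both $'...' quoting and pattern substitution are Bash features; the variable names are illustrative):

```shell
# A variable that contains real newline characters.
text=$'hello\n\nthere\nagain\n'

# Remove every newline.
no_newlines=${text//$'\n'/}

# Remove only a newline at the very end of the value.
no_trailing=${text/%$'\n'/}

printf '[%s]\n' "$no_newlines"
printf '[%s]\n' "$no_trailing"
```

The first printf should show [hellothereagain]; the second keeps the inner newlines and drops only the final one.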

How to replace newline ("\n") with space (" ")

tr

To replace newline ("\n") with a space you can use the following construct:

tr '\n' ' '

For example:

echo -e 'hello\n\nthere\nagain\n' | tr '\n' ' '

sed

GNU sed:

sed ':a;N;$!ba;s/\n/ /g' FILE

Cross platform compatible syntax that works with BSD and OS X sed:

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' FILE

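GNU sed has a -z option that treats the input as NUL-separated records, so a text file (which normally contains no NUL bytes) is processed as a single record and \n can be matched directly. You can simply call: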

sed -z 's/\n/ /g'

Example:

sed -z 's/\n/ /g' FILE

Bash

Slow solution (IFS= and -r keep leading whitespace and backslashes in each line intact):

while IFS= read -r line; do printf "%s" "$line "; done < FILE

Another solution:

cat FILE.txt | while IFS= read -r line; do echo -n "$line "; done

One more solution:

while IFS= read -r line; do echo -n "$line "; done < FILE.txt

Perl

Perl solution, speed is about the same as with sed:

perl -p -e 's/\n/ /' FILE

paste

Solution with paste, speed is about the same as with tr; each newline can only be replaced by a single character:

paste -s -d ' ' FILE
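A practical difference between tr and paste is the end of the output: tr also converts the final newline into a space, while paste -s keeps it as a newline. A small sketch with illustrative variable names (the trailing “-” tells paste to read standard input):

```shell
# tr replaces every newline, including the last one, so a trailing space remains.
joined_tr=$(printf 'one\ntwo\nthree\n' | tr '\n' ' ')

# paste -s joins the lines with the delimiter and ends the result with a newline.
joined_paste=$(printf 'one\ntwo\nthree\n' | paste -s -d ' ' -)

printf 'tr=[%s]\npaste=[%s]\n' "$joined_tr" "$joined_paste"
```

This should print tr=[one two three ] and paste=[one two three].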

awk

Solution with awk, speed is about the same as with tr:

awk 1 ORS=' ' FILE

Explanation:

An awk program consists of rules, each of which is a condition followed by a code block, that is:

condition { code block }

If the code block is omitted, the default value is used: { print $0 }. Thus, 1 is always interpreted as a true condition, and print $0 is executed for each line.

When awk reads input, it breaks it into records based on the RS (Record Separator) value, which is newline by default, so awk will parse the input line by line by default. The split also includes the removal of RS from the input record.

Now, when printing a record, ORS (Output Record Separator) is added to it, by default again newline. Thus, since we have replaced the ORS value with a space, all newlines are replaced with spaces.
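The rule structure described above can be tried out on a short input. The sketch below assumes a POSIX-compatible awk and uses made-up variable names; the second command uses a condition (NR > 1) to print the separator only before the second and later records, which avoids a trailing separator:

```shell
# ORS replaced by a space: every record, including the last, is followed by a space.
with_ors=$(printf 'a\nb\nc\n' | awk 1 ORS=' ')

# Three rules: print "," before every record except the first, print the record
# itself without a newline, and finish with a single newline at the end.
joined=$(printf 'a\nb\nc\n' | awk 'NR > 1 { printf "," } { printf "%s", $0 } END { print "" }')

printf 'with_ors=[%s]\njoined=[%s]\n' "$with_ors" "$joined"
```

The expected output is with_ors=[a b c ] and joined=[a,b,c].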

Another option to replace all newlines with spaces using awk without reading the entire file into memory:

awk '{printf "%s ", $0}' FILE

If you want the final newline to be present:

awk '{printf "%s ", $0} END {printf "\n"}' FILE

You can use more than just the space character (in this case, instead of a space, the separator is the “|” character):

awk '{printf "%s|", $0} END {printf "\n"}' FILE

Another simple awk solution:

awk '{printf $0 " "}' FILE

xargs

Simple xargs solution:

xargs < FILE.txt