Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

bash - Trailing new line after piping to a command: is there any standard?

While answering "How to remove the last CR char with cut", I found out that some programs add a trailing newline to the end of a string, while others don't:

Say we have the string foobar and print it with printf so that we don't get an extra new line:

$ printf "foobar" | od -c
0000000   f   o   o   b   a   r
0000006

Or with echo -n:

$ echo -n "foobar" | od -c
0000000   f   o   o   b   a   r
0000006

(echo's default behaviour is to print its arguments followed by a newline, so echo "foobar" outputs f o o b a r \n.)

Neither sed nor cat adds any extra character:

$ printf "foobar" | sed 's/./&/g' | od -c
0000000   f   o   o   b   a   r
0000006
$ printf "foobar" | cat - | od -c
0000000   f   o   o   b   a   r
0000006

Both awk and cut do add one, however, as do xargs and paste:

$ printf "foobar" | cut -b1- | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | awk '1' | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | xargs | od -c
0000000   f   o   o   b   a   r  \n
0000007
$ printf "foobar" | paste | od -c
0000000   f   o   o   b   a   r  \n
0000007

So I was wondering: why is this different behaviour? Is there anything POSIX suggests about this?

Note that I am running all of this in Bash 4.3.11, with the following tool versions:

  • GNU Awk 4.0.1
  • sed (GNU sed) 4.2.2
  • cat (GNU coreutils) 8.21
  • cut (GNU coreutils) 8.21
  • xargs (GNU findutils) 4.4.2
  • paste (GNU coreutils) 8.21

1 Answer

So I was wondering: why is this different behaviour? Is there anything POSIX suggests about this?

Some commands (printf, for example) are simple interfaces to libc library calls (e.g. printf()), which don't add a \n automatically. Most *NIX text processing commands add a \n at the end of the last line.

From the Definitions of POSIXv7, a textual line has to have a newline on the end:

3.206 Line

A sequence of zero or more non- <newline> characters plus a terminating <newline> character.

If the newline is missing, it becomes this:

3.195 Incomplete Line

A sequence of one or more non- <newline> characters at the end of the file.
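
One way to see the difference between a line and an incomplete line is wc -l, which counts <newline> characters rather than visible lines. A minimal sketch (the variable names are only for illustration):

```shell
# wc -l counts <newline> characters, so an incomplete line is not counted.
incomplete=$(printf 'foobar' | wc -l)      # no terminating newline
complete=$(printf 'foobar\n' | wc -l)      # properly terminated line

# Arithmetic expansion strips the padding some wc implementations print.
echo "incomplete: $((incomplete))"
echo "complete:   $((complete))"
```

The first count is 0 and the second is 1, even though both inputs look like "one line" on a terminal.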

The general idea is that a text file can be treated as a list of records, where every record is terminated by \n. In other words, \n is not something between lines - it is part of the line. See for example the fgets() function: the \n is included in the returned string whenever a complete line was read, which lets the caller tell whether the text line was read completely or not. If the last line is missing the \n, then one has to do more checks to read the file correctly.
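
The same kind of extra check shows up in shell scripts. The sketch below is only an illustration (count_plain and count_checked are hypothetical helper names): the read built-in returns a non-zero status when it hits end-of-file before a \n, although in common shells such as bash and dash it still stores the partial line, so a plain while loop silently skips an incomplete last line.

```shell
# Hypothetical helpers: count the lines seen by a plain read loop
# and by one that also checks for a leftover incomplete line.
count_plain() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

count_checked() {
    n=0
    # read fails at EOF, but $line still holds the unterminated text,
    # so the extra test lets the loop body run one more time.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

plain=$(printf 'foo\nbar' | count_plain)
checked=$(printf 'foo\nbar' | count_checked)
echo "plain loop saw $plain line(s), checked loop saw $checked line(s)"
```

With the input foo\nbar the plain loop sees one line and the checked loop sees two.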

In general, as long as your text files are created on *NIX by *NIX programs/scripts, it is fine to expect that the last line is properly terminated. But many Java applications, as well as Windows applications, do not handle this correctly or consistently. Not only do they often forget to add the last \n, they also often incorrectly treat the trailing \n as an additional empty line.
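
For files that come from such sources, one possible way to detect and repair a missing final \n is sketched below. It assumes a POSIX tail and mktemp are available; lacks_final_newline is a hypothetical helper name, not a standard utility.

```shell
# Hypothetical helper: succeed if the file is non-empty and its last
# byte is not a newline.
lacks_final_newline() {
    # Command substitution strips trailing newlines, so the result is
    # empty when the last byte is "\n" (or when the file is empty).
    [ -n "$(tail -c 1 "$1")" ]
}

tmp=$(mktemp)
printf 'foo\nbar' > "$tmp"        # last line is incomplete

if lacks_final_newline "$tmp"; then
    before=missing
    printf '\n' >> "$tmp"         # terminate the last line
else
    before=present
fi

if lacks_final_newline "$tmp"; then after=missing; else after=present; fi
echo "before: $before, after: $after"
rm -f "$tmp"
```

The check relies on the shell stripping trailing newlines from command substitution output, so it needs no byte-level comparison.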
