Bug 921714

Summary: grep -e behaviour does not match expected behaviour
Product: [openSUSE] openSUSE Tumbleweed Reporter: Forgotten User 3LNXxBNEaD <forgotten_3LNXxBNEaD>
Component: Basesystem Assignee: Karl Eichwalder <ke>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: astieger, bwiedemann, jsmeix, lmb, rw
Version: 201501*   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 13.2   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 932494    

Description Forgotten User 3LNXxBNEaD 2015-03-11 10:45:04 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2292.0 Safari/537.36
Build Identifier: 

When using grep -e, the results given by grep versions 2.21 and 2.22 do not match those of grep version 2.14 when the input contains ^@ (NUL) characters, such as those found in /proc/<pid>/cmdline. This is breaking some init scripts, in my case specifically the /etc/init.d/ceph script.


Reproducible: Always

Steps to Reproduce:
1. grep -e -<insert filter that will match> /proc/<pid>/cmdline && echo Found
Actual Results:  
no output

Expected Results:  
"Found" is printed to the terminal.

An example:

mythMedia:~ # grep --version
grep (GNU grep) 2.14
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
mythMedia:~ # grep -e -i.a /proc/6000/cmdline 
Binary file /proc/6000/cmdline matches
mythMedia:~ # less /proc/6000/cmdline
/usr/bin/ceph-mon^@-i^@a^@--pid-file^@/var/run/ceph/mon.a.pid^@-c^@/etc/ceph/ceph.conf^@--cluster^@ceph^@-f^@




microserver-1:~ # grep --version
grep (GNU grep) 2.21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
microserver-1:~ # grep -e -i.b /proc/10209/cmdline
microserver-1:~ # echo $?
1
microserver-1:~ # less /proc/10209/cmdline
/usr/bin/ceph-mon^@-i^@b^@--pid-file^@/var/run/ceph/mon.b.pid^@-c^@/etc/ceph/ceph.conf^@--cluster^@ceph^@-f^@


I have no idea what else this might be breaking, as grep is used quite a lot.
Comment 1 Bernhard Wiedemann 2015-03-11 13:58:05 UTC
I found that the old behaviour is still present in 2.20 (as shipped in 13.2).

The breakage probably comes from:
* Fri Nov 28 2014 andreas.stieger@gmx.de
- GNU grep 2.21
  * When searching binary data, grep now may treat non-text bytes as
    line terminators.  This can boost performance significantly.


I also found that adding the --text option to grep
might restore the old matching behaviour (but with different output).
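
A minimal sketch of that comparison on a synthetic file, so that no running ceph-mon is needed. The temporary file and its sample bytes are assumptions modelled on the cmdline shown in the description. The exit status of each call depends on the installed grep version, so no particular result is claimed here.

```shell
#!/bin/sh
# Build a NUL-separated sample resembling /proc/<pid>/cmdline (made-up data).
sample=$(mktemp)
printf '/usr/bin/ceph-mon\0-i\0a\0--cluster\0ceph\0-f\0' > "$sample"

# Default binary-file handling: the result may differ between grep versions.
grep -q -e '-i.a' "$sample"
default_status=$?

# --text asks grep to process the file as if it were text.
grep -q --text -e '-i.a' "$sample"
text_status=$?

# Exit status 0 means a match, 1 means no match.
echo "default: $default_status, --text: $text_status"
rm -f "$sample"
```

Running this once with the old and once with the new grep package should show whether --text is enough for a given pattern.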
Comment 2 Andreas Stieger 2015-03-11 14:22:40 UTC
This is documented and intended behaviour in grep 2.21 and later. As such, grep -e really does match expected behaviour, so I consider the issue invalid as per the current summary.

I realize that this does not do what you need for the script, but should this not be fixed in the affected init scripts and in upstream ceph, especially since the script can no longer rely on legacy grep behaviour on other platforms either?
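
One way a script could become independent of grep's binary-file handling is to turn the NUL separators into ordinary text before matching. The following is only a sketch under that assumption, not the actual ceph fix. The helper name cmdline_has_args and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the NUL-separated file $1 contains the
# fixed string $2, where arguments in $2 are written space-separated.
cmdline_has_args() {
    # tr maps every NUL byte to a space, so grep only ever sees plain text.
    # -F treats the pattern as a fixed string, -q suppresses output.
    tr '\000' ' ' < "$1" | grep -q -F -e "$2"
}

# Synthetic stand-in for /proc/<pid>/cmdline (made-up data).
sample=$(mktemp)
printf '/usr/bin/ceph-mon\0-i\0a\0--cluster\0ceph\0-f\0' > "$sample"

if cmdline_has_args "$sample" '-i a '; then
    result=Found
else
    result='Not found'
fi
echo "$result"
rm -f "$sample"
```

Note that this is a plain substring match on the flattened command line, so it cannot tell a whole argument from the tail of a longer one. A real script may need stricter matching.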
Comment 3 Forgotten User 3LNXxBNEaD 2015-03-11 14:30:45 UTC
All I will say is it breaks compatibility and without any caveats added to --help or the manpage.

I doubt the intention of the performance change was to actually change what matched and what did not. 

While it might be behaving as expected in light of the code change it does not feel like it is behaving in line with expectations due to historical behavior and lack of obvious 'changed behavior' information.

Who knows how many other init or make scripts rely on the old behaviour.
Comment 4 Andreas Stieger 2015-03-11 14:51:10 UTC
(In reply to Malcolm Haak from comment #3)
> All I will say is it breaks compatibility and without any caveats added to
> --help or the manpage.

That is a different statement than the summary.

> I doubt the intention of the performance change was to actually change what
> matched and what did not. 

Actually, that was exactly what was mentioned as an accepted effect of the change:
"grep now may treat non-text bytes as line terminators."

> While it might be behaving as expected in light of the code change it does
> not feel like it is behaving in line with expectations due to historical
> behavior and lack of obvious 'changed behavior' information. 

Uniform behaviour between all platforms using grep 2.21 or later is desired. Commonly used tools including grep will change every once in a while. I am also certain most compiler warnings and errors will be fixed in the compiled code rather than the compiler, unless shown to deviate from spec or intention.
 
> Who knows how many other init or make scripts rely on the old behaviour.

I am confident that we can fix it all. Deviating from intended upstream behaviour, albeit annoying, should not be patched in without a very good reason. I do not see this to be the case here.
Comment 5 Forgotten User 3LNXxBNEaD 2015-03-11 22:32:53 UTC
(In reply to Andreas Stieger from comment #4)
> (In reply to Malcolm Haak from comment #3)
> > All I will say is it breaks compatibility and without any caveats added to
> > --help or the manpage.
> 
> That is a different statement than the summary.
> 

Cool, change the name.

> > I doubt the intention of the performance change was to actually change what
> > matched and what did not. 
> 
> Actually, that was exactly what was mentioned as an accepted effect of the
> change:
> "grep now may treat non-text bytes as line terminators."

So put it under a different flag, --fastbin or something. It seems ridiculous to just break grep's behaviour between versions with little more than a changelog mention.

> 
> > While it might be behaving as expected in light of the code change it does
> > not feel like it is behaving in line with expectations due to historical
> > behavior and lack of obvious 'changed behavior' information. 
> 
> Uniform behaviour between all platforms using grep 2.21 or later is desired.
> Commonly used tools including grep will change every once in a while. I am
> also certain most compiler warnings and errors will be fixed in the compiled
> code rather than the compiler, unless shown to deviate from spec or
> intention.

So patching grep upstream makes more sense than patching every other thing, known and unknown, that relies on the old behaviour.

>  
> > Who knows how many other init or make scripts rely on the old behaviour.
> 
> I am confident that we can fix it all. Deviating from intended upstream
> behaviour, albeit annoying, should not be patched in without a very good
> reason. I do not see this to be the case here.

I guess that's your call.
Comment 6 Andreas Stieger 2015-03-11 23:14:33 UTC
The described behaviour is (upstream) intended/accepted behaviour as of grep 2.21.
I am not convinced that we should alter this as a special case.
Comment 7 Bernhard Wiedemann 2015-04-28 18:27:28 UTC
Maybe worth a mention in the release notes:

> When using grep-2.21+ on binary files you might get fewer matches than before.
> Use the --text option to get more matches again.
Comment 8 Johannes Meixner 2015-10-23 08:57:49 UTC
Only an addendum FYI:

I strongly recommend never deviating in any way
from upstream intended/accepted behaviour.

In my personal experience, deviating from upstream
intended/accepted behaviour is the topmost reason
for an endless sequence of subsequent issues that
pile up into a monstrosity of "patches" that pervert
a software package into a totally unmaintainable mess.

We had such a totally unmaintainable mess with grep in the past
and it was duwe@suse.de who finally did the only right thing
to clean it up by a reset to full compliance with upstream, cf.:
------------------------------------------------------------------------
$ osc cat Base:System grep grep.changes
...
Thu Jul 22 15:45:31 CEST 2010 - jsmeix@suse.de

- Forwarded the below "upgrade to grep-2.6.3" to openSUSE:Factory.
  This is also a reset to full compliance with upstream.
  All our own patches and "speedups" were dropped in the below
  "upgrade to grep-2.6.3" because they had bad side effects
  like bnc#618455 (SLES11-SP1) and bug#616037 (SLES9-SP4)
  which do not happen with an upstream compliant grep
  (regardless of an old version 2.5.1 or the new 2.6.3).

- On Fri Apr 9 16:43:45 CEST 2010 duwe@suse.de did a version
  upgrade to grep-2.6.3, which brings among various compile fixes
  vast improvements for UTF-8 / multibyte handling.
  Fixes bnc#255977 (SLES10-SP2) and bnc#415228 (SLES9-SP3).
------------------------------------------------------------------------

In general:

Maybe worth a mention everywhere:

 "When you use tools, in particular when you use them in a special way,
  read the documentation so that you really know how the tool behaves.
  Otherwise you may get unexpected results. When you get unexpected
  results, read the documentation so that you really know how the tool
  is meant to be used and what the intended behaviour of the tool is."

An example from my personal experience of what I am talking about:

In the past I got several complaints and "severe bug" reports
about "grep does not behave as expected" where the root cause
was that the users did not understand the meaning of "locale".
They all had called grep in a non-POSIX locale but expected
POSIX-conformant behaviour. Because of this I made
https://en.opensuse.org/SDB:Plain_Text_versus_Locale
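
As a small sketch of the locale point: forcing the C (POSIX) locale for a single grep call makes things like bracket ranges follow POSIX rules regardless of the user's environment. The sample input is made up for illustration.

```shell
#!/bin/sh
# In the C locale the range [a-z] covers only the ASCII lowercase letters,
# so only the first of the two input lines is counted.
count=$(printf 'abc\nABC\n' | LC_ALL=C grep -c '[a-z]')
echo "$count"   # prints 1
```

In other locales the same range may cover a different set of characters, which is the kind of surprise those reports were about.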

Before that, instead of telling users how to use grep,
we had added our own "patches" and "speedups" to make it
"behave better", but in the end those introduced real bugs
in grep. Then I was made maintainer of grep, but I
(and apparently everyone else) was unable to maintain the
monstrosity that we had made of grep until duwe@suse.de
put an end to that wrong approach.