Bug 627972

Summary: getcwd(2) returns bogus path
Product: [openSUSE] openSUSE 11.3 Reporter: Harald Koenig <koenig>
Component: Basesystem Assignee: Leonardo Chiquitto <lchiquitto>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P3 - Medium CC: forgotten_sLJ7K2dvxj, lchiquitto
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 11.3   
See Also: https://bugzilla.novell.com/show_bug.cgi?id=565151

Description Harald Koenig 2010-08-03 17:09:04 UTC
recently we updated the kernel from 2.6.27.45-0.1.1 to 2.6.27.48-0.1.1. my $HOME is mounted via autofs.

now, for the 1st time, I notice the following strange problems with some shell scripts:

in one (of many open ;) xterm, /bin/pwd run from the bash inside now returns
"koenig/dir" instead of the "/home/koenig/dir" that other bash instances in the same dir report (note esp. the missing leading / in the broken /bin/pwd output!).
in other xterm/bash instances this does not happen at all, while it's 100% reproducible in that one bash.

"strace /bin/pwd" shows that the bad string comes directly from getcwd(2):

good bash:	getcwd("/home/koenig/dir", 4096) = 17
bad bash:	getcwd("koenig/dir", 4096)      = 11


a "cd $PWD" or "cd subdir" fixes the problem for this shell, but not the parent process/shell (same for interactive shells/tests):

$ bash -c "/bin/pwd"
koenig/dir
$ bash -c "cd $PWD ; /bin/pwd"
/home/koenig/dir
$ bash -c "cd subdir ; /bin/pwd"
/home/koenig/dir/subdir
$ /bin/pwd
koenig/dir
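
The "cd $PWD" workaround shown above can be wrapped into a small guard for scripts. This is only a sketch: the function names `is_absolute_path` and `ensure_absolute_cwd` are made up for illustration, and it assumes (as in this report) that the shell's cached $PWD is still correct while /bin/pwd returns the truncated path.

```shell
# Hypothetical helpers, not part of any package.

# Succeeds if the given string is an absolute path (starts with "/").
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Re-enter $PWD if the path reported by /bin/pwd is not absolute.
ensure_absolute_cwd() {
    real_cwd=$(/bin/pwd 2>/dev/null) || real_cwd=""
    if is_absolute_path "$real_cwd"; then
        return 0                      # nothing to repair
    fi
    # Assumption: $PWD (the shell's own cached copy) still holds the full path.
    cd "$PWD" || return 1
    # Check again after the cd.
    is_absolute_path "$(/bin/pwd 2>/dev/null)"
}
```

Calling `ensure_absolute_cwd` near the top of a script repairs only that script's own process; as noted above, the parent shell stays broken until it does its own cd.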


more facts:

"stat ." shows identical output in shells with good and broken pwd output.

that directory is quite old; it was not removed/recreated/renamed/whatsoever in the last decade

/home is ext3fs

this happened twice today in two different directories! in both cases the 1st error was from acroread (it claims "ERROR: Cannot determine current directory.").  the 1st time I just did "cd $PWD" (suspecting some "rm dir ; mkdir dir" problem); on the 2nd occurrence I started some more testing to see what's going on.

we installed that new kernel on July 29, so it has only been running for 2 office days, with at least 2 instances of this new(?) behaviour.

autofs did not change recently (install date Mar 16 2009)

/proc/self/cwd shows the same broken information for that bash process:

lrwxrwxrwx 1 koenig s+c 0 Aug  3 17:55 cwd -> koenig/dir
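
Since the broken path is visible in the cwd link, the same check can be run over all processes to see which ones are affected. A rough sketch, assuming only that a healthy cwd link always starts with "/"; the function name `list_bogus_cwds` and its optional argument are made up for illustration.

```shell
# Hypothetical helper: print "PID target" for every process whose cwd link
# does not start with "/". $1 is the proc mount point (defaults to /proc).
list_bogus_cwds() {
    proc_root=${1:-/proc}
    for link in "$proc_root"/[0-9]*/cwd; do
        # readlink fails for processes we are not allowed to inspect; skip them
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            /*) ;;                                  # absolute: looks healthy
            *)  pid_dir=${link%/cwd}
                echo "${pid_dir##*/} $target" ;;    # relative: affected
        esac
    done
}
```

Run as root to see processes of all users; as a normal user only your own processes are readable.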

dmesg doesn't show any error or (to me) significant output 

a closer look at the strace output shows more weird differences:  the st_ino=... return values
for all stat() and fstat() calls differ. surprisingly, the strace of the "bad" /bin/pwd
shows the correct st_ino values (compared with "ls -i file" and "stat file" in both
a broken and a good bash instance -- all show the same inode numbers!)



any idea what's going on here ?

any problem in the new kernel, like some weird memory corruption or similar ?



I'll run a full fsck on that partition overnight -- just in case....
Comment 1 Leonardo Chiquitto 2010-08-04 13:09:09 UTC
Hi Harald, thanks for the bug report. I'm afraid this is a known problem (please see bug #565151).

The summary is:

The problem only happens when AutoFS is restarted. Running processes, with $CWD set to an automounted directory, will get "truncated" results from getcwd().

Here's the status for our supported openSUSE releases:

  openSUSE 11.3: fixed.

  openSUSE 11.2: not fixed yet. All the code is there, but we need to update
    the AutoFS init script and sysconfig to enable the feature that fixes it.
    I will do it the next time we update AutoFS.

  openSUSE 11.1: not fixed. Unfortunately it's not easy to fix. It's basically
    a new feature that requires Kernel changes plus an AutoFS version update.

I'm sorry but, considering that the problem is not critical for most use cases, I think this is a WONTFIX for openSUSE 11.1.
Comment 2 Leonardo Chiquitto 2010-08-05 14:27:25 UTC
Harald, considering the previous comment, are you OK with closing it as "won't fix"?
Comment 3 Harald Koenig 2010-08-05 18:03:38 UTC
(In reply to comment #2)
> Harald, considering the previous comment, are you OK with closing it as "won't
> fix"?

ACK, 11.1 is more or less end-of-life anyway;)

but one comment reading your interesting information:  
I did not find any evidence that autofs on my PC got restarted.  I've checked /var/log/messages etc.; unfortunately I did not keep the output of ps before rebooting, so I'll have to wait for the next time this happens to do more checks (or just cross fingers;) -- ticket closed for now...

thanks for the info!
Comment 4 Leonardo Chiquitto 2010-08-05 18:12:51 UTC
Thanks! Closing as WONTFIX (for 11.1, already fixed for 11.3).
Comment 5 Harald Koenig 2010-08-05 18:21:47 UTC
(In reply to comment #3)
> I did not find any evidence that autofs on my PC got restarted.

FYI a quick update: I just had the chance to restart autofs on a suse 11.1 system (did not want to test this on my PC right now -- one never knows...;-)

restarting autofs on that system shows this msg in syslog with the old PID of automount:  "umount_autofs_indirect: ask umount returned busy /home"

I find the same message in my own PC's syslog file:

   Aug  3 15:18:09 atuin pm-suspend[29642]: Entering suspend. In case of problems, please check /var/log/pm-suspend.log
   Aug  3 15:18:10 atuin automount[5602]: umount_autofs_indirect: ask umount returned busy /home

so you're totally right:  the restart of autofs got triggered by a test of suspend2ram for my PC,  and it all was about restarting autofs.
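
To check whether a machine has gone through such a restart, the syslog can be searched for the message quoted above. A minimal sketch; the function name `autofs_busy_events` is made up, and the log path in the usage note is the one mentioned earlier in this report and may differ on other setups.

```shell
# Hypothetical helper: print all log lines where automount reported a busy
# umount, i.e. the message seen here when autofs was restarted.
# $1 is the log file to search.
autofs_busy_events() {
    grep -F 'umount_autofs_indirect: ask umount returned busy' "$1"
}
```

For example, `autofs_busy_events /var/log/messages` lists the timestamps to compare against the first bogus getcwd() results.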


thanks again!
Comment 6 Harald Koenig 2011-04-04 17:01:43 UTC
(In reply to comment #1)
> Hi Harald, thanks for the bug report. I'm afraid this is a known problem
> (please see bug #565151).
> 
> The summary is:
> 
> The problem only happens when AutoFS is restarted. Running processes, with $CWD
> set to an automounted directory, will get "truncated" results from getcwd().
> 
> Here's the status for our supported openSUSE releases:
> 
>   openSUSE 11.3: fixed.

RUMORS!!!

actually today I did the same suspend/resume test with my desktop PC,
now running opensuse 11.3 -- and surprise: I slipped into the same bogus behaviour 
as last year with opensuse 11.1!!

	atuin > acroread 
	ERROR: Cannot determine current directory.
	
	atuin > pwd
	/home/koenig/dir
	
	atuin > /bin/pwd  ; echo
	koenigdir
	
	atuin > strace -e getcwd /bin/pwd
	getcwd("koenigdir", 4096)          = 9
	

please (also?!) note the missing slash between my home dir name "koenig" and the subdir name "dir" !
the 2.6.27 kernel from opensuse 11.1 at least did still print that slash which is now missing too ;-)


	atuin > uname -a
	Linux atuin 2.6.34.7-0.7-default #1 SMP 2010-12-13 11:13:53 +0100 x86_64 x86_64 x86_64 GNU/Linux

	atuin > rpm -q autofs
	autofs-5.0.5-7.2.x86_64



Harald -- now offline for a reboot... :-(((
Comment 7 Leonardo Chiquitto 2011-04-04 17:11:45 UTC
Please attach /etc/sysconfig/autofs here.
Comment 8 Harald Koenig 2011-04-04 20:04:12 UTC
(In reply to comment #7)
> Please attach /etc/sysconfig/autofs here.

DEFAULT_BROWSE_MODE=no

that's all.... autofs gets all its data via NIS:

atuin koenig > grep auto /etc/nsswitch.conf
automount:      nis  files

atuin koenig > ypmatch /home auto.master
auto.home -rw,grpid,hard,intr,nodevs,nosuid

atuin koenig > ypmatch koenig auto.home  
atuin:/net/atuin/fs1/home/&
Comment 9 Leonardo Chiquitto 2011-04-04 20:19:39 UTC
> > Please attach /etc/sysconfig/autofs here.
> 
> DEFAULT_BROWSE_MODE=no
> 
> that's all.... autofs gets all its data via NIS:

That explains why you're still seeing the problem. You need to add the following line to /etc/sysconfig/autofs:

  USE_MISC_DEVICE="yes"

This is set by default in the sysconfig file shipped with the package, but you removed it for some reason. This means AutoFS is *not* using the misc device (/dev/autofs), the feature that resolves this bug.

Although the original bug is fixed (if you have the option explicitly set to "yes"), your comments have made me realize we still have a bug in our init script: if $USE_MISC_DEVICE is not defined, it should be interpreted as "yes" by default (currently this is not the case and that's why you hit the bug again).
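
The intended default can be illustrated with shell parameter expansion. This is only a sketch of the idea, not the actual init script or the fix that was shipped; the function name `misc_device_setting` is made up, and it assumes the sysconfig file consists of plain shell VAR=value lines.

```shell
# Hypothetical helper: print the effective USE_MISC_DEVICE value for a
# sysconfig-style file, treating an unset variable as "yes".
# $1 is the path to the file.
misc_device_setting() {
    (
        # Run in a subshell so the caller's environment is not touched.
        unset USE_MISC_DEVICE
        [ -r "$1" ] && . "$1"
        # ${VAR:-yes} falls back to "yes" when VAR is unset or empty.
        echo "${USE_MISC_DEVICE:-yes}"
    )
}
```

With a file that contains only DEFAULT_BROWSE_MODE=no this prints "yes", while an explicit USE_MISC_DEVICE="no" is still respected.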

I'll report this in a new bug and fix it in openSUSE Factory.
Comment 10 Harald Koenig 2011-04-05 16:24:01 UTC
(In reply to comment #9)
> That explains why you're still seeing the problem. You need to add the
> following line to /etc/sysconfig/autofs:
> 
>   USE_MISC_DEVICE="yes"

ACK! with USE_MISC_DEVICE="yes" and "rcautofs restart" there are no longer getcwd() problems after suspend2ram!

thanks a lot for your quick help (and the fix in #684997 -- *please* feed this change into updates for 11.4 and 11.3, too!)