Bug 779145

Summary: nfs4 mounts with sec=krb5 not mounted during boot
Product: [openSUSE] openSUSE 12.2 Reporter: H Markert <mephisto>
Component: BasesystemAssignee: E-mail List <bnc-team-screening>
Status: VERIFIED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: fcrozat, jsuchome, nfbrown, werner
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 12.2   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: dmesg output after booting with systemd.log_level=debug, hw clock initially wrong
dmesg output after booting with systemd.log_level=debug, hw clock initially right
clocktest --- a bash script to check clock setup
strace warpclock

Description H Markert 2012-09-06 20:09:45 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0

I try to mount home directories through nfs4 using sec=krb5 on a client machine using openSUSE 12.2.
During bootup, mounting fails, I can find a "Permission denied by server" message in the logs.
After the system finished booting, a manual mount via "mount /home" succeeds (hence, the fstab entry as well as all kerberos stuff is working).

My guess would be that during startup some important service (rpc.gssd?) is not yet available when systemd tries to mount the home directory.

Reproducible: Always

Steps to Reproduce:
1. Prepare your system to mount a nfs4 share with kerberos security enabled.
2. Add a corresponding line to fstab to automate the mounting
3. reboot
Actual Results:  
nfs directory not mounted.

Expected Results:  
nfs directory should be mounted
Comment 1 H Markert 2012-09-07 19:14:47 UTC
I tried booting with sysvinit-init, which shows the same behavior.

I have no idea what causes this bug. According to boot.log, all relevant services seem to be started before the mount is actually performed, and still it fails with "permission denied by server".

After the system finished booting, logging in as root and performing
systemctl restart home.mount
perfectly works and mounts the nfs home directories, so everything seems to be set up fine.

I tried to replace the hostname with an IP number in /etc/fstab today, and also added the server to /etc/hosts, but without success.
Comment 2 H Markert 2012-09-09 16:08:03 UTC
I tried replacing line 247
 mount -at nfs,nfs4 || rc_failed 1
with
 sleep 2 && mount -at nfs,nfs4 &
in /etc/init.d/nfs as suggested in the comment a few lines above in the same file.
This makes the nfs-mount come up on boot for me.

I now think this is some sort of dependency issue, as obviously the two seconds delay are enough to start the missing components.

I compared the /etc/init.d/nfs scripts in 12.1 and 12.2, they are identical. In 12.1, nfs mounts come up on boot fine.
There was however a similar issue with nfs mounts during boot, see bug 738448 (I don't know if it is related to this issue).
Comment 5 H Markert 2012-09-12 19:06:22 UTC
I finally found the reason for that: It's a problem with the system time.
If the hw clock is set to UTC, then when the system tries to mount the nfs share, the system time is still set to UTC, as the clock is adjusted later in the boot process.
This causes a difference to "real time" of 2 hours for me (Germany). With such a high offset, the kerberos mount is denied by the server, which causes the problem.

As a workaround I did set my hw clock to local time, which makes the problem disappear (at least as long as there is no clock offset due to daylight saving changes).

A better solution would be to set the system date and time earlier in the boot process, at least before trying to mount nfs shares.
Comment 6 Neil Brown 2012-09-12 23:31:13 UTC
Thanks for finding this! I doubt we would have found such a thing in testing.

I'm a bit confused though - the hw clock *should* be set to UTC.  That should work best.

What does:
 grep HWCLOCK /etc/sysconfig/clock

show?  If it shows 'HWCLOCK="--localtime"', then you might want to change it to
'HWCLOCK="-u"'

If it is already "-u", then it is very strange that setting the clock to localtime would fix anything.

Another thing you could try is editing /etc/init.d/nfs, finding the "Required-Start:" line (Shortly after "### BEGIN INIT INFO") and add "boot.clock" to the end of the line.  That should ensure that the clock is set properly before 'nfs' starts.

Please let me know if either of those helps.
Comment 7 H Markert 2012-09-29 17:47:14 UTC
Hi,

sorry for the delay, I did not have time to look further into this earlier.

a
 grep HWCLOCK /etc/sysconfig/clock
outputs
 USE_HWCLOCK="yes"
Which is probably not quite what was expected.

I have my system clock set to localtime, as I sometimes have to use Windows on that particular computer. Hence, I did set the corresponding option in yast that my hardware clock is set to localtime.

After some experiments today, I found that everything works as expected if I config yast to set the system clock to UTC and actually set the hardware clock accordingly.
If I set it to localtime in yast and adjust the hardware clock prior to booting SuSE 12.2, everything also works well.
However, on shutdown or reboot, SuSE sets the hardware clock back to UTC, even if localtime is configured in yast. This causes the time offset during the next boot. Things get fixed when sntpd is started (because it readjusts the clock immediately), but this is too late in the boot process.

For me, this basically seems not to be a bug in NFS, but in the scripts that store the system time back to hardware clock on reboot/shutdown.

I did not yet try to add "boot.clock" to "Required-Start:" in the init script, and currently I think it will not help much. Probably, sntpd should be started prior to NFS, which would allow to correct large drifts of the hardware clock when the computer is off. However, the real cause of the problem seems to be in the shutdown scripts.
Comment 8 Neil Brown 2012-10-02 00:37:59 UTC
My turn to apologise - that HWCLOCK setting doesn't exist any more in 12.2.
The /etc/adjtime file is used instead.

The third line of that file is either "UTC" "LOCAL".

Werner: could you look at this please?  It seems that in 12.2, if the hwclock is set to "localtime" something goes wrong leading to the time not being correct in early boot, and initial kerberos authentication (which requires good time sync) fails.
Comment 9 Dr. Werner Fink 2012-10-02 06:54:11 UTC
The question rises if /etc/sysconfig/clock and /etc/adjtime from disk are in sync with /etc/sysconfig/clock and /etc/adjtime in the initrd as otherwise the warpclock utility can not inform the kernel about the broken local time in CMOS and tell the kernel the offset to enable the kernel to set its own system clock back to real UTC.  The can be check with the command

  lsinitrd -l /boot/initrd | grep -E '(sysconfig/clock|etc/adjtime|warpclock)'

In fact if there is no old Windows[tm] system in parallel then UTC in CMOS is recommended!  With Windows7[tm] and above it is possible to use UTC in CMOS. For older Windows[tm] systems it is essential to help the linux kernel to have correct UTC in its system clock as this is the reference for network time as well as for the user time.  For this mkinitrd is used to copy /etc/sysconfig/clock and /etc/adjtime together with the warpclock tool from disk into the initrd, the initial ram disk to set the system clock of the kernel back to UTC before any file system is checked and touched by the kernel.
Comment 10 H Markert 2012-10-04 20:08:59 UTC
lsinitrd -l /boot/initrd | grep -E '(sysconfig/clock|etc/adjtime|warpclock)'
outputs
-rw-r--r--   1 root     root           48 Oct  4 20:02 etc/adjtime
-rwxr-xr-x   1 root     root        10608 Aug 15 13:23 bin/warpclock

on my system, apparently "sysconfig/clock" is missing.

However, etc/adjtime has the same modification time and date as my local /etc/adjtime file, so this seems to be transferred to initrd correctly.

I am aware that generally UTC would be a better setting for the system clock, however I have a Windows Vista install on that machine, where system clock in UTC does not work reliably.
Comment 11 Dr. Werner Fink 2012-10-05 07:34:48 UTC
The script /lib/mkinitrd/scripts/boot-clock.sh should be part of the initrd as well below the boot/ directory as e.g. boot/05-clock.sh in the initrd and this script does touch a controls file below /dev/shm/  ... that is you should have a file /dev/shm/warpclock after the system has booted.

Also you should be aware that you have to make sure that the Daylight Saving Time switch is not done automatically, that is this has to be done by you or the by the foreign OS.
Comment 12 H Markert 2012-10-09 19:17:16 UTC
I have boot/05-clock.sh in initrd, and /dev/shm/warpclock gets created during boot.
Still, the time is off by two hours during early boot.

Is it possible that the error occurs during shutdown, when system time is written back to hw clock?
Comment 13 Dr. Werner Fink 2012-10-10 06:58:13 UTC
AFAIK for the the boot scrip /etc/init.d/boot.clock has to be enabled and AFAIK this is *not* the case for openSUSE 12.2.

Beside this up to the point where warpclock informs the kernel about the wrong CMOS clock the kernels initial system clock is not in UTC as the kernel always assumes that the CMOS clock is in UTC.  Only the warpclock program in initrd tells the kernel that the CMOS clock is not in UTC and provides the clock offset with the settimeofday(2) system call with the correct struct timezone. Compare this with the source code of the warpclock program.

You may check you CMOS clock by switching into the BIOS before booting into Linux. Then you should check if both the /etc/localtime in initrd and on disk is correct and identical with the zoneinfo file below /usr/share/zoneinfo for your timezone specified in /etc/sysconfig/clock in the variable TIMEZONE.  E.g. by using

  . /etc/sysconfig/clock
  cmp /etc/localtime /usr/share/zoneinfo/$TIMEZONE
  zcat /boot/initrd | cpio -i --quiet --to-stdout etc/localtime | cmp /etc/localtime -

Also please make sure to have all updates for openSUSE 12.2 installed.
Comment 14 H Markert 2012-10-11 18:15:44 UTC
When I set the hardware clock in BIOS to correct local time before booting SuSE, startup and mounting kerberos nfs share works.
However, after the system finished booting, I get
hwclock
Thu Oct 11 18:14:21 2012  -0.188092 seconds
and
date
Thu Oct 11 20:14:21 CEST 2012

When I shut down under that conditions, the system clock shows the wrong time in BIOS, and the next boot fails because of the time offset.

Some startup script seems to modify the hardware clock in a wrong way during boot...
Comment 15 Dr. Werner Fink 2012-10-12 06:44:23 UTC
Try

   grep -rsl hwclock /etc/init.d/*.d/  /etc/systemd/ /lib/systemd/ \
                     /etc/udev/ /lib/udev/ /etc/profile.d/

to check this.

Add the maintainer of systemd to CC list as this could be a problem with one of the systemd scriplets.
Comment 16 H Markert 2012-10-12 09:47:43 UTC
The output of the grep command is
/lib/systemd/systemd
/lib/systemd/systemd-timedated
which are both ELF-binaries, no scripts I can look into.

Unfortunately I have no idea who maintains systemd, so I cannot add the maintainer to the CC list...
Comment 17 Dr. Werner Fink 2012-10-12 10:53:36 UTC
I've added Frederic already :)

Now let us ask Frederic if the problem could occur dues /lib/systemd/systemd and/or /lib/systemd/systemd-timedated as the strings command shows

  /suse/werner> strings /lib/systemd/systemd | grep hwclock
  hwclock_apply_localtime_delta
  /suse/werner> strings /lib/systemd/systemd-timedated | grep hwclock
  hwclock_apply_localtime_delta
  hwclock_get_time
  hwclock_set_time

before this I'd like to know the content of /etc/adjtime of both initrd and disk as well the value of the variable TIMEZONE in /etc/sysconfig/clock of both initrd and disk.
Comment 18 Frederic Crozat 2012-10-12 13:36:23 UTC
at startup, systemd is calling hwclock_apply_localtime_delta, so it shouldn't be a issue.

Could you boot with "systemd.log_level=debug systemd.log_target=kmsg" and attach dmesg output after booting ?
Comment 19 H Markert 2012-10-27 19:03:17 UTC
/etc/adjtime (initrd):
436.719742 1348907835 0.000000
1348907835
LOCAL

/etc/adjtime (system):
436.719742 1348907835 0.000000
1348907835
LOCAL

value of TIMEZONE in /etc/sysconfig/clock (system):
TIMEZONE="Europe/Berlin"

File /etc/sysconfig/clock does not exist in initrd:
# ls -l /tmp/initrd/etc/sysconfig
total 4
-rw-r--r-- 1 root root    0 Oct 27 18:55 kernel
drwxr-xr-x 6 root root 4096 Oct 27 18:55 network


From my understanding this would mean that the system does not know about the correct timezone during early boot, as initrd does not contain the required information. This seems to explain the behavior.

Given that /etc/sysconfig/clock does not exist in initrd, do you still need the dmesg debug output of systemd?
Comment 20 Dr. Werner Fink 2012-10-29 08:31:47 UTC
(In reply to comment #19)

The command

  lsinitrd /boot/initrd

should show

  a) etc/localtime    which should be identical with current /etc/localtime
  b) etc/adjtime      which las t line should be identical with /etc/adjtime
  c) usr/share/zoneinfo/UTC
  d) bin/warpclock    which uses the file above in initrd to correct the kernel
                      system time

In other words id mkinitrd has been called after changing /etc/localtime or the last line in /etc/adjtime the early system in initrd *knows* about the timezone
Comment 21 H Markert 2012-11-03 15:43:30 UTC
The files

etc/localtime and etc/adjtime are available in my initrd and identical to those on /etc (at least identical time stamp and file size).

usr/share/zoneinfo/UTC is available in the initrd and identical to the version in /usr, according to time stamp and file size

bin/warpclok is available in initrd.
Comment 22 H Markert 2012-11-03 16:57:21 UTC
Created attachment 511853 [details]
dmesg output after booting with systemd.log_level=debug, hw clock initially wrong

This is the output of dmesg after booting the system with systemd.log_legel=debug, systemd.log_target=kmsg and the hardware clock erroneously being set to UTC, when it is configured to be in localtime.
This is the state the system brings itself into after a reboot when the hardware clock should be set in localtime.
Comment 23 H Markert 2012-11-03 17:01:37 UTC
Created attachment 511854 [details]
dmesg output after booting with systemd.log_level=debug, hw clock initially right

This is the output of dmesg after booting with systemd.log_level=debug, systemd.log_target=kmsg and the system clock being set to localtime manually before booting.
Note that in this case, the first boot is successful (/home partition gets mounted successfully via kerberized nfs4). While booting the hwclock is however set to UTC, so that subsequent boots fail.
Comment 24 H Markert 2012-11-03 17:09:10 UTC
When setting the hwclock correctly in the bios prior to booting, the system boots successfully.

However, after the first boot the hardware clock is off again (it seems to be reset to UTC during boot). In the booted system, "hwclock --localtime" gives the wrong time, "hwclock --utc" shows the correct time - I would have expected this to be the other way around when hwclock should be set to local time.

This triggers the bug on the next reboot.

Setting the hwclock manually ("hwclock --systohc") corrects the hwclock setting, but this lasts only until the next boot, where hwclock is again being set to UTC.

I tried setting
SYSTOHC="yes"
FORCE_SYSTOHC="yes"
USE_HWCLOCK="yes"
in /etc/sysconfig/clock as a workaround, hoping that this will call a "hwclock --systohc" on reboot. While this would not fix the original issue, at least it should be a workaround. However, the setting seems to be completely ignored.

I checked whether boot.clock service is active in yasts runlevel editor (it was not), and manually activated it, however, this did not change the situation at all. Note that I was never able to see the
"Set Hardware Clock to the current System Time"
message during shutdown that I would have expected to see from /etc/init.d/boot.clock with the above settings.

Is there some log file where shutdown messages are being stored?
Comment 25 H Markert 2012-12-24 19:08:22 UTC
Has any progress on this bug been archived?
Comment 26 Forgotten User b5BnQSUi71 2013-01-16 05:05:31 UTC
Setting NEEDINFO TO fcrozat as the information he asked in Comment#18 has been provided.
Comment 27 Frederic Crozat 2013-01-31 17:37:28 UTC
there is something problematic in your systemd log:
if systemd was correctly detecting your system configured as localtime, we should see on the log something like :
"[    6.894360] systemd[1]: RTC configured in localtime, applying delta of 60 minutes to system time."

or an error message. And yet, we don't see it in both logs, so systemd doesn't apply localtime delta to your system, which could explain your issue (and since you use ntp, clock is becoming "fixed" somehow while system is running.

Please attach /etc/adjtime to this bug report
Comment 28 Dr. Werner Fink 2013-02-01 08:05:11 UTC
(In reply to comment #27)

It should be noted that this should already done in the initrd by the warpclock tool before the root file system is checked and mounted.  As the kernel does accept this only once the second time in systemd does not hurt.

There not only the /etc/adjtime is required but also to outout of

      lsinitrd /boot/initrd | grep -E 'clock|warp|adjtime|localtime'
      hwclock --debug
Comment 29 Dr. Werner Fink 2013-02-01 08:08:25 UTC
Created attachment 522946 [details]
clocktest --- a bash script to check clock setup

A bash script I've wrote to check clock configuration
Comment 30 Frederic Crozat 2013-02-01 09:14:32 UTC
(In reply to comment #28)
> (In reply to comment #27)
> 
> It should be noted that this should already done in the initrd by the warpclock
> tool before the root file system is checked and mounted.  As the kernel does
> accept this only once the second time in systemd does not hurt.

Hmm, I tested on a 12.2 system (with localtime set), booting with "shell" to be stopped in the initrd and then I called warpclock manually (making sure /etc/adjtime in initrd was properly set) but the time returned by "date" was incorrect (of 1 hour). Then continuing boot under systemd "fixed" the clock..
Comment 31 Dr. Werner Fink 2013-02-01 09:32:00 UTC
(In reply to comment #30)

Please could you retest as I'd like to see an strace of the warpclock call that is please add strace to the initrd and reboot to do so.  The mkinitrd package should have the changelog entry

 Wed Jul  4 2012 werner@suse.de
 - Fix boot clock scriptlet by testing /etc/adjtime before
   /etc/sysconfig/clock to make sure that warpclock is executed
   if CMOS clock is in local time

otherwise it will not work correct
Comment 32 Frederic Crozat 2013-02-01 15:00:49 UTC
Created attachment 523039 [details]
strace warpclock

(In reply to comment #31)

> Please could you retest as I'd like to see an strace of the warpclock call that
> is please add strace to the initrd and reboot to do so.  The mkinitrd package
> should have the changelog entry

Correction, I did the test in initrd on both up to date Factory and 12.2 (which have the changelog you mentionned). 

I'm attaching strace of warpclock, done on up to date 12.2.
Comment 33 Dr. Werner Fink 2013-02-01 16:25:24 UTC
Thanks ... found a bug. the only problem that the mkinitrd for 12.2 is locked due an other maintenance act (#672)
Comment 34 Frederic Crozat 2013-02-01 16:34:47 UTC
if you want me to test an interim test package on OBS (either 12.2 or Factory), just ping me.
Comment 35 Dr. Werner Fink 2013-02-01 16:40:57 UTC
Thanks a lot ... in Base:System you find a fixed version of mkinitrd.  This one should do the job.
Comment 36 Frederic Crozat 2013-02-01 17:40:12 UTC
I just tested the Base:System package and I can confirm warpclock works fine there (but the package lacks a changelog entry ;)

I'm not 100% sure it will be the issue Heiner reported, since its systemd trace should still show "RTC configured in localtime...". Anyway, we'll see with updated mkinitrd + output from your test script.
Comment 37 Bernhard Wiedemann 2013-02-01 18:00:09 UTC
This is an autogenerated message for OBS integration:
This bug (779145) was mentioned in
https://build.opensuse.org/request/show/150758 Factory / mkinitrd
Comment 38 Bernhard Wiedemann 2013-02-01 19:00:19 UTC
This is an autogenerated message for OBS integration:
This bug (779145) was mentioned in
https://build.opensuse.org/request/show/150762 Factory / mkinitrd
Comment 39 Frederic Crozat 2013-02-05 08:50:23 UTC
just wondering, could this bug be a duplicate of bnc#791106 ? Heiner, could you try updating ntp from update and see if clock is becoming wrong after two boots ?
Comment 41 Bernhard Wiedemann 2013-02-08 16:00:19 UTC
This is an autogenerated message for OBS integration:
This bug (779145) was mentioned in
https://build.opensuse.org/request/show/154938 Maintenance /
Comment 42 Bernhard Wiedemann 2013-02-14 14:00:10 UTC
This is an autogenerated message for OBS integration:
This bug (779145) was mentioned in
https://build.opensuse.org/request/show/155423 Maintenance /
Comment 43 Benjamin Brunner 2013-02-21 12:34:01 UTC
The mkinitrd update for openSUSE 12.2 is released. Feel free to close the bug if everything is done.
Comment 44 Swamp Workflow Management 2013-02-21 13:04:34 UTC
openSUSE-RU-2013:0317-1: An update that has one recommended fix can now be installed.

Category: recommended (low)
Bug References: 779145
CVE References: 
Sources used:
openSUSE 12.2 (src):    mkinitrd-2.7.1-62.9.1
Comment 45 H Markert 2013-02-21 20:53:43 UTC
The update "openSUSE-RU-2013:0317-1" fixes the problem for me, thank you!
Comment 46 Fei Xiang Zhang 2013-04-18 03:17:28 UTC
so I think the bug can be closed.