|
Bugzilla – Full Text Bug Listing |
| Summary: | nfs4 mounts with sec=krb5 not mounted during boot | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.2 | Reporter: | H Markert <mephisto> |
| Component: | Basesystem | Assignee: | E-mail List <bnc-team-screening> |
| Status: | VERIFIED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | fcrozat, jsuchome, nfbrown, werner |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 12.2 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
dmesg output after booting with systemd.log_level=debug, hw clock initially wrong
dmesg output after booting with systemd.log_level=debug, hw clock initially right clocktest --- a bash script to check clock setup strace warpclock |
||
|
Description
H Markert
2012-09-06 20:09:45 UTC
I tried booting with sysvinit-init, which shows the same behavior. I have no idea what causes this bug. According to boot.log, all relevant services seem to be started before the mount is actually performed, and still it fails with "permission denied by server". After the system finished booting, logging in as root and performing systemctl restart home.mount perfectly works and mounts the nfs home directories, so everything seems to be set up fine. I tried to replace the hostname with an IP number in /etc/fstab today, and also added the server to /etc/hosts, but without success. I tried replacing line 247 mount -at nfs,nfs4 || rc_failed 1 with sleep 2 && mount -at nfs,nfs4 & in /etc/init.d/nfs as suggested in the comment a few lines above in the same file. This makes the nfs-mount come up on boot for me. I now think this is some sort of dependency issue, as obviously the two seconds delay are enough to start the missing components. I compared the /etc/init.d/nfs scripts in 12.1 and 12.2, they are identical. In 12.1, nfs mounts come up on boot fine. There was however a similar issue with nfs mounts during boot, see bug 738448 (I don't know if it is related to this issue). I finally found the reason for that: It's a problem with the system time. If the hw clock is set to UTC, then when the system tries to mount the nfs share, the system time is still set to UTC, as the clock is adjusted later in the boot process. This causes a difference to "real time" of 2 hours for me (Germany). With such a high offset, the kerberos mount is denied by the server, which causes the problem. As a workaround I did set my hw clock to local time, which makes the problem disappear (at least as long as there is no clock offset due to daylight saving changes). A better solution would be to set the system date and time earlier in the boot process, at least before trying to mount nfs shares. Thanks for finding this! I doubt we would have found such a thing in testing. I'm a bit confused though - the hw clock *should* be set to UTC. That should work best. What does: grep HWCLOCK /etc/sysconfig/clock show? If it shows 'HWCLOCK="--localtime"', then you might want to change it to 'HWCLOCK="-u"' If it is already "-u", then it is very strange that setting the clock to localtime would fix anything. Another thing you could try is editing /etc/init.d/nfs, finding the "Required-Start:" line (Shortly after "### BEGIN INIT INFO") and add "boot.clock" to the end of the line. That should ensure that the clock is set properly before 'nfs' starts. Please let me know if either of those helps. Hi, sorry for the delay, I did not have time to look further into this earlier. a grep HWCLOCK /etc/sysconfig/clock outputs USE_HWCLOCK="yes" Which is probably not quite what was expected. I have my system clock set to localtime, as I sometimes have to use Windows on that particular computer. Hence, I did set the corresponding option in yast that my hardware clock is set to localtime. After some experiments today, I found that everything works as expected if I config yast to set the system clock to UTC and actually set the hardware clock accordingly. If I set it to localtime in yast and adjust the hardware clock prior to booting SuSE 12.2, everything also works well. However, on shutdown or reboot, SuSE sets the hardware clock back to UTC, even if localtime is configured in yast. This causes the time offset during the next boot. Things get fixed when sntpd is started (because it readjusts the clock immediately), but this is too late in the boot process. For me, this basically seems not to be a bug in NFS, but in the scripts that store the system time back to hardware clock on reboot/shutdown. I did not yet try to add "boot.clock" to "Required-Start:" in the init script, and currently I think it will not help much. Probably, sntpd should be started prior to NFS, which would allow to correct large drifts of the hardware clock when the computer is off. However, the real cause of the problem seems to be in the shutdown scripts. My turn to apologise - that HWCLOCK setting doesn't exist any more in 12.2. The /etc/adjtime file is used instead. The third line of that file is either "UTC" "LOCAL". Werner: could you look at this please? It seems that in 12.2, if the hwclock is set to "localtime" something goes wrong leading to the time not being correct in early boot, and initial kerberos authentication (which requires good time sync) fails. The question rises if /etc/sysconfig/clock and /etc/adjtime from disk are in sync with /etc/sysconfig/clock and /etc/adjtime in the initrd as otherwise the warpclock utility can not inform the kernel about the broken local time in CMOS and tell the kernel the offset to enable the kernel to set its own system clock back to real UTC. The can be check with the command lsinitrd -l /boot/initrd | grep -E '(sysconfig/clock|etc/adjtime|warpclock)' In fact if there is no old Windows[tm] system in parallel then UTC in CMOS is recommended! With Windows7[tm] and above it is possible to use UTC in CMOS. For older Windows[tm] systems it is essential to help the linux kernel to have correct UTC in its system clock as this is the reference for network time as well as for the user time. For this mkinitrd is used to copy /etc/sysconfig/clock and /etc/adjtime together with the warpclock tool from disk into the initrd, the initial ram disk to set the system clock of the kernel back to UTC before any file system is checked and touched by the kernel. lsinitrd -l /boot/initrd | grep -E '(sysconfig/clock|etc/adjtime|warpclock)' outputs -rw-r--r-- 1 root root 48 Oct 4 20:02 etc/adjtime -rwxr-xr-x 1 root root 10608 Aug 15 13:23 bin/warpclock on my system, apparently "sysconfig/clock" is missing. However, etc/adjtime has the same modification time and date as my local /etc/adjtime file, so this seems to be transferred to initrd correctly. I am aware that generally UTC would be a better setting for the system clock, however I have a Windows Vista install on that machine, where system clock in UTC does not work reliably. The script /lib/mkinitrd/scripts/boot-clock.sh should be part of the initrd as well below the boot/ directory as e.g. boot/05-clock.sh in the initrd and this script does touch a controls file below /dev/shm/ ... that is you should have a file /dev/shm/warpclock after the system has booted. Also you should be aware that you have to make sure that the Daylight Saving Time switch is not done automatically, that is this has to be done by you or the by the foreign OS. I have boot/05-clock.sh in initrd, and /dev/shm/warpclock gets created during boot. Still, the time is off by two hours during early boot. Is it possible that the error occurs during shutdown, when system time is written back to hw clock? AFAIK for the the boot scrip /etc/init.d/boot.clock has to be enabled and AFAIK this is *not* the case for openSUSE 12.2. Beside this up to the point where warpclock informs the kernel about the wrong CMOS clock the kernels initial system clock is not in UTC as the kernel always assumes that the CMOS clock is in UTC. Only the warpclock program in initrd tells the kernel that the CMOS clock is not in UTC and provides the clock offset with the settimeofday(2) system call with the correct struct timezone. Compare this with the source code of the warpclock program. You may check you CMOS clock by switching into the BIOS before booting into Linux. Then you should check if both the /etc/localtime in initrd and on disk is correct and identical with the zoneinfo file below /usr/share/zoneinfo for your timezone specified in /etc/sysconfig/clock in the variable TIMEZONE. E.g. by using . /etc/sysconfig/clock cmp /etc/localtime /usr/share/zoneinfo/$TIMEZONE zcat /boot/initrd | cpio -i --quiet --to-stdout etc/localtime | cmp /etc/localtime - Also please make sure to have all updates for openSUSE 12.2 installed. When I set the hardware clock in BIOS to correct local time before booting SuSE, startup and mounting kerberos nfs share works. However, after the system finished booting, I get hwclock Thu Oct 11 18:14:21 2012 -0.188092 seconds and date Thu Oct 11 20:14:21 CEST 2012 When I shut down under that conditions, the system clock shows the wrong time in BIOS, and the next boot fails because of the time offset. Some startup script seems to modify the hardware clock in a wrong way during boot... Try
grep -rsl hwclock /etc/init.d/*.d/ /etc/systemd/ /lib/systemd/ \
/etc/udev/ /lib/udev/ /etc/profile.d/
to check this.
Add the maintainer of systemd to CC list as this could be a problem with one of the systemd scriplets.
The output of the grep command is /lib/systemd/systemd /lib/systemd/systemd-timedated which are both ELF-binaries, no scripts I can look into. Unfortunately I have no idea who maintains systemd, so I cannot add the maintainer to the CC list... I've added Frederic already :) Now let us ask Frederic if the problem could occur dues /lib/systemd/systemd and/or /lib/systemd/systemd-timedated as the strings command shows /suse/werner> strings /lib/systemd/systemd | grep hwclock hwclock_apply_localtime_delta /suse/werner> strings /lib/systemd/systemd-timedated | grep hwclock hwclock_apply_localtime_delta hwclock_get_time hwclock_set_time before this I'd like to know the content of /etc/adjtime of both initrd and disk as well the value of the variable TIMEZONE in /etc/sysconfig/clock of both initrd and disk. at startup, systemd is calling hwclock_apply_localtime_delta, so it shouldn't be a issue. Could you boot with "systemd.log_level=debug systemd.log_target=kmsg" and attach dmesg output after booting ? /etc/adjtime (initrd): 436.719742 1348907835 0.000000 1348907835 LOCAL /etc/adjtime (system): 436.719742 1348907835 0.000000 1348907835 LOCAL value of TIMEZONE in /etc/sysconfig/clock (system): TIMEZONE="Europe/Berlin" File /etc/sysconfig/clock does not exist in initrd: # ls -l /tmp/initrd/etc/sysconfig total 4 -rw-r--r-- 1 root root 0 Oct 27 18:55 kernel drwxr-xr-x 6 root root 4096 Oct 27 18:55 network From my understanding this would mean that the system does not know about the correct timezone during early boot, as initrd does not contain the required information. This seems to explain the behavior. Given that /etc/sysconfig/clock does not exist in initrd, do you still need the dmesg debug output of systemd? (In reply to comment #19) The command lsinitrd /boot/initrd should show a) etc/localtime which should be identical with current /etc/localtime b) etc/adjtime which las t line should be identical with /etc/adjtime c) usr/share/zoneinfo/UTC d) bin/warpclock which uses the file above in initrd to correct the kernel system time In other words id mkinitrd has been called after changing /etc/localtime or the last line in /etc/adjtime the early system in initrd *knows* about the timezone The files etc/localtime and etc/adjtime are available in my initrd and identical to those on /etc (at least identical time stamp and file size). usr/share/zoneinfo/UTC is available in the initrd and identical to the version in /usr, according to time stamp and file size bin/warpclok is available in initrd. Created attachment 511853 [details]
dmesg output after booting with systemd.log_level=debug, hw clock initially wrong
This is the output of dmesg after booting the system with systemd.log_legel=debug, systemd.log_target=kmsg and the hardware clock erroneously being set to UTC, when it is configured to be in localtime.
This is the state the system brings itself into after a reboot when the hardware clock should be set in localtime.
Created attachment 511854 [details]
dmesg output after booting with systemd.log_level=debug, hw clock initially right
This is the output of dmesg after booting with systemd.log_level=debug, systemd.log_target=kmsg and the system clock being set to localtime manually before booting.
Note that in this case, the first boot is successful (/home partition gets mounted successfully via kerberized nfs4). While booting the hwclock is however set to UTC, so that subsequent boots fail.
When setting the hwclock correctly in the bios prior to booting, the system boots successfully.
However, after the first boot the hardware clock is off again (it seems to be reset to UTC during boot). In the booted system, "hwclock --localtime" gives the wrong time, "hwclock --utc" shows the correct time - I would have expected this to be the other way around when hwclock should be set to local time.
This triggers the bug on the next reboot.
Setting the hwclock manually ("hwclock --systohc") corrects the hwclock setting, but this lasts only until the next boot, where hwclock is again being set to UTC.
I tried setting
SYSTOHC="yes"
FORCE_SYSTOHC="yes"
USE_HWCLOCK="yes"
in /etc/sysconfig/clock as a workaround, hoping that this will call a "hwclock --systohc" on reboot. While this would not fix the original issue, at least it should be a workaround. However, the setting seems to be completely ignored.
I checked whether boot.clock service is active in yasts runlevel editor (it was not), and manually activated it, however, this did not change the situation at all. Note that I was never able to see the
"Set Hardware Clock to the current System Time"
message during shutdown that I would have expected to see from /etc/init.d/boot.clock with the above settings.
Is there some log file where shutdown messages are being stored?
Has any progress on this bug been archived? Setting NEEDINFO TO fcrozat as the information he asked in Comment#18 has been provided. there is something problematic in your systemd log: if systemd was correctly detecting your system configured as localtime, we should see on the log something like : "[ 6.894360] systemd[1]: RTC configured in localtime, applying delta of 60 minutes to system time." or an error message. And yet, we don't see it in both logs, so systemd doesn't apply localtime delta to your system, which could explain your issue (and since you use ntp, clock is becoming "fixed" somehow while system is running. Please attach /etc/adjtime to this bug report (In reply to comment #27) It should be noted that this should already done in the initrd by the warpclock tool before the root file system is checked and mounted. As the kernel does accept this only once the second time in systemd does not hurt. There not only the /etc/adjtime is required but also to outout of lsinitrd /boot/initrd | grep -E 'clock|warp|adjtime|localtime' hwclock --debug Created attachment 522946 [details]
clocktest --- a bash script to check clock setup
A bash script I've wrote to check clock configuration
(In reply to comment #28) > (In reply to comment #27) > > It should be noted that this should already done in the initrd by the warpclock > tool before the root file system is checked and mounted. As the kernel does > accept this only once the second time in systemd does not hurt. Hmm, I tested on a 12.2 system (with localtime set), booting with "shell" to be stopped in the initrd and then I called warpclock manually (making sure /etc/adjtime in initrd was properly set) but the time returned by "date" was incorrect (of 1 hour). Then continuing boot under systemd "fixed" the clock.. (In reply to comment #30) Please could you retest as I'd like to see an strace of the warpclock call that is please add strace to the initrd and reboot to do so. The mkinitrd package should have the changelog entry Wed Jul 4 2012 werner@suse.de - Fix boot clock scriptlet by testing /etc/adjtime before /etc/sysconfig/clock to make sure that warpclock is executed if CMOS clock is in local time otherwise it will not work correct Created attachment 523039 [details] strace warpclock (In reply to comment #31) > Please could you retest as I'd like to see an strace of the warpclock call that > is please add strace to the initrd and reboot to do so. The mkinitrd package > should have the changelog entry Correction, I did the test in initrd on both up to date Factory and 12.2 (which have the changelog you mentionned). I'm attaching strace of warpclock, done on up to date 12.2. Thanks ... found a bug. the only problem that the mkinitrd for 12.2 is locked due an other maintenance act (#672) if you want me to test an interim test package on OBS (either 12.2 or Factory), just ping me. Thanks a lot ... in Base:System you find a fixed version of mkinitrd. This one should do the job. I just tested the Base:System package and I can confirm warpclock works fine there (but the package lacks a changelog entry ;) I'm not 100% sure it will be the issue Heiner reported, since its systemd trace should still show "RTC configured in localtime...". Anyway, we'll see with updated mkinitrd + output from your test script. This is an autogenerated message for OBS integration: This bug (779145) was mentioned in https://build.opensuse.org/request/show/150758 Factory / mkinitrd This is an autogenerated message for OBS integration: This bug (779145) was mentioned in https://build.opensuse.org/request/show/150762 Factory / mkinitrd just wondering, could this bug be a duplicate of bnc#791106 ? Heiner, could you try updating ntp from update and see if clock is becoming wrong after two boots ? This is an autogenerated message for OBS integration: This bug (779145) was mentioned in https://build.opensuse.org/request/show/154938 Maintenance / This is an autogenerated message for OBS integration: This bug (779145) was mentioned in https://build.opensuse.org/request/show/155423 Maintenance / The mkinitrd update for openSUSE 12.2 is released. Feel free to close the bug if everything is done. openSUSE-RU-2013:0317-1: An update that has one recommended fix can now be installed. Category: recommended (low) Bug References: 779145 CVE References: Sources used: openSUSE 12.2 (src): mkinitrd-2.7.1-62.9.1 The update "openSUSE-RU-2013:0317-1" fixes the problem for me, thank you! so I think the bug can be closed. |