Bugzilla – Full Text Bug Listing
| Summary: | systemd hangs when processing swap on lvm | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.1 | Reporter: | Marcus Schaefer <ms> |
| Component: | Basesystem | Assignee: | Frederic Crozat <fcrozat> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | | |
| Priority: | P5 - None | CC: | cschum, mmarek, rjschwei, shshyukriev |
| Version: | Final | | |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | Other | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Bug Depends on: | | | |
| Bug Blocks: | 735634 | | |
| Attachments: | debug dmesg output; udev rule and sane udev exit | | |
Created attachment 462985 [details]
debug dmesg output

I also tested whether the problem goes away when I use the real device name, /dev/dm-3, instead of /dev/systemVG/LVSwap, but that didn't help either; systemd just sits and waits.

This bites us in Studio as well. Would be great if we could get this fixed soon.

Could you test with the package from home:fcrozat:systemd / systemd? It contains LVM-related fixes.

The test image I built for you in my NFS home dir was built with systemd from your home repo. Did you add changes there since I opened the report? If yes, I will rebuild the image; if not, I fear those changes did not help.

The changes I'm talking about were done on 2011-11-10 18:17:05 (not 100% sure when the packages were ready, but I guess it should be safe).

Sorry, still the same problem with the latest systemd version from your home project: systemd-37-298.1.x86_64. Have you tried with my debug image, as described in comment #1? You can reliably reproduce the problem there.

I'll try to reproduce it here.

I can confirm the bug with your image. Strangely, it works fine on the second boot. It looks like the kiwi initrd might leave the system in a "strange" state, causing systemd (and/or udevd) not to propagate LVM events. I'll investigate further.

Yes, that's what I found out as well, but I wasn't able to identify any difference in state between the first boot (kiwi initrd) and subsequent boots with the SUSE (mkinitrd) initrd. That's also why I invoke a debug shell right before systemd is started, to give you the chance to debug the environment. Thanks for your effort.

OK, I've compared the situation with dracut (the Fedora initramfs), and it looks like we are missing some bits (which are explained in the udev 168 release notes):
"The running udev daemon can now cleanly shut down with:
udevadm control --exit
Udev in initramfs should clean the state of the udev database
with: udevadm info --cleanup-db which will remove all state left
behind from events/rules in initramfs. If initramfs uses
--cleanup-db and device-mapper/LVM, the rules in initramfs need
to add OPTIONS+="db_persist" for all dm devices. This will
prevent removal of the udev database for these devices.
"
so, I think (I didn't test, since I can't easily rebuild the kiwi initrd) we need to:
- ensure we stop udev "safely", i.e. using:
udevadm control --exit
udevadm info --cleanup-db
- flag dm/lvm devices in the udev database, when running under kiwi, as "persistent", by adding an additional udev rule, like dracut does (11-dm.rules):
SUBSYSTEM!="block", GOTO="dm_end"
KERNEL!="dm-[0-9]*", GOTO="dm_end"
ACTION!="add|change", GOTO="dm_end"
OPTIONS+="db_persist"
LABEL="dm_end"
it looks like we should do this for our "regular" initrd too..
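To make the matching logic of that rule easier to follow, here is a small shell restatement. This is purely an illustrative analogy of my own, not udev or kiwi code: udev itself evaluates the real rule file, and the function below only mimics which events the rule would tag with db_persist.

```shell
# Analogy only: which uevents would the 11-dm.rules snippet above tag
# with OPTIONS+="db_persist"?  The rule requires SUBSYSTEM=="block",
# a kernel name matching dm-[0-9]*, and an add or change action.
matches_dm_rule() {
    # $1 = SUBSYSTEM, $2 = KERNEL, $3 = ACTION
    [ "$1" = "block" ] || return 1
    case "$2" in dm-[0-9]*) ;; *) return 1 ;; esac
    case "$3" in add|change) return 0 ;; *) return 1 ;; esac
}

matches_dm_rule block dm-3 change && echo "dm-3 gets db_persist"
matches_dm_rule block sda1 add || echo "sda1 is skipped"
```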
I did the suggested changes and gave them a test, but the result was still the same. I will attach my patch; maybe you see an error there.

Created attachment 466382 [details]
udev rule and sane udev exit

Anything wrong here?
The patch looks good, but I can't guarantee it is supposed to fix the bug :( Could you attach the kiwi file you are using, so I can try to rebuild the appliance myself locally and test various things?

If you have kiwi and kiwi-templates installed you can simply type:

kiwi --build suse-12.1-JeOS -d /tmp/mytest --lvm --type oem

That will build the LVM-enabled OEM appliance, including the OEM installation ISO. You can patch your local kiwi with the attached patch; it's the same one I used for testing.

Thanks, will do. Adding Greg as CC, since he is our udev maintainer ATM, so he might have a clue about this too ;)

If you need a special build with a debug shell or something, just tell me; I can build that for you so you don't have to spend time on appliance creation.

Still debugging the issue. After discussing with Kay Sievers: changing the way we quit udev in kiwi won't fix the issue (it is unrelated to this problem). Moreover, systemd has no knowledge of LVM (only of device-mapper). So I'm guessing we might need to wait for boot.lvm to complete before mounting swap in systemd (I had to do a similar fix for cryptsetup and fsck). I'll test this hypothesis.

No change from adding a dependency on boot.lvm before running the mount command. In fact, comparing the output of:

udevadm info --query=all --name /dev/kiwiVG/LVSwap

between the initial boot (when the kiwi initrd is used) and a "normal" boot shows a lot of differences, mostly in the symlinks for devices, which are not in the udev database. That would explain why systemd doesn't "react" to the dependency on this device. I'm more and more convinced the bug is in udev handling between the kiwi initrd, the "standard" initrd, and the boot after the initrd. Let's see if I can find more info on this.

Running udevadm trigger --action=change --sysname-match=dm-* correctly fills the udev database (but since I'm doing that after logging in, it is too late for udev to catch up). Maybe we should do that in the "udev trigger" service file, but only when booting after kiwi?
Some info here too: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=593625#25

Hmm, sorry, but it has no effect in my test. I changed the following in the kiwi prepared root tree:
cat /lib/systemd/system/udev-trigger.service
[Unit]
Description=udev Coldplug all Devices
Wants=udev.service
After=udev-kernel.socket udev-control.socket
DefaultDependencies=no
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/udevadm trigger --type=subsystems --action=add ; /sbin/udevadm trigger --type=devices --action=add ; udevadm trigger --action=change --sysname-match=dm-*
I built the image with this modification and gave it a try... no change.
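A side note on the unit file above, purely as a stylistic alternative I have not tested against this bug: with Type=oneshot, systemd also accepts multiple ExecStart= lines that run in sequence, which avoids the fragile " ; " separators inside a single line:

```ini
# Hypothetical rewrite of the [Service] section above; the three
# commands are unchanged, only split into separate ExecStart= lines.
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/udevadm trigger --type=subsystems --action=add
ExecStart=/sbin/udevadm trigger --type=devices --action=add
ExecStart=/sbin/udevadm trigger --action=change --sysname-match=dm-*
```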
OK, found the bug: udev now stores its database in /run/udev (and one part in /dev/.udev), so both need to be moved from the initrd to the running system (this only affects "recent" distributions, like Fedora 16 and openSUSE 12.1). /run must be mounted as tmpfs just before starting udevd:

# mount run tmpfs
mount -t tmpfs -o mode=0755,nodev,nosuid tmpfs /run

Then, before killing udev:

mount --move /run /mnt/run

We should probably not kill udev the way we do ATM, but it is too risky to change that now; better to postpone this for 12.2.

I can verify that this fixed the problem, and I submitted new kiwi packages. Thanks a lot for all your help.

openSUSE-RU-2012:0668-1: An update that has 16 recommended fixes can now be installed.
Category: recommended (low)
Bug References: 728885,729251,729315,729636,729857,730763,731457,732247,736491,740033,740073,743159,745548,747898,752259,754344
CVE References:
Sources used: openSUSE 12.1 (src): kiwi-4.98.35-1.4.1
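The essential point of the fix is that the udev database written under /run inside the initrd must survive the switch to the real root. As an unprivileged illustration only (the real mechanism is `mount --move /run /mnt/run`; the temporary directory names below are invented for this sketch), the handoff can be pictured like this:

```shell
# Simulate the initrd-to-real-root handoff with plain directories:
# "$tmp/initrd-run" stands in for /run inside the initrd, and
# "$tmp/newroot" stands in for /mnt, the mounted real root.
tmp=$(mktemp -d)
mkdir -p "$tmp/initrd-run/udev" "$tmp/newroot"
echo "dm-3: db_persist" > "$tmp/initrd-run/udev/db"  # pretend udev db entry

mv "$tmp/initrd-run" "$tmp/newroot/run"              # stand-in for mount --move

cat "$tmp/newroot/run/udev/db"                       # the entry survived
rm -rf "$tmp"
```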
Description:

I built an appliance which creates a swap partition on LVM; on first boot systemd hangs when trying to activate the swap space. Later, when I simply call swapon -a, the swap is there and works. The fstab looks like this:

/dev/systemVG/LVSwap swap swap defaults 0 0

I will attach the dmesg output from systemd started with --log-level=debug --log-target=kmsg.

How to reproduce this: the appliance exists as a debugging install ISO in my NFS home directory, ~ms/LimeJeOS-openSUSE-12.1.x86_64-1.12.1.iso. To run an installation, call the following:

1. qemu-img create /tmp/mydisk 4G
2. qemu-kvm -hda /tmp/mydisk \
   -cdrom LimeJeOS-openSUSE-12.1.x86_64-1.12.1.iso -boot d

During the installation the system will stop with a shell right before run-init/systemd is called. You can check the environment there, and if you simply type 'exit' the system will boot and you will see the timeout happening.

This is really important for us, the Studio team, and the preload team. The problem does not happen if the swap space is not on LVM or if sysvinit is used.