Bug 905746

Summary: DMraid fails to populate /dev/mapper with partitions
Product: [openSUSE] openSUSE Distribution
Reporter: Forgotten User A9YPvv3rEp <forgotten_A9YPvv3rEp>
Component: Upgrade Problems
Assignee: Thomas Renninger <trenn>
Status: RESOLVED FIXED
QA Contact: Jiri Srain <jsrain>
Severity: Critical
Priority: P5 - None
CC: auxsvr, bwiedemann, forgotten_lNYeazqpWh, hare, mcaj, mpluskal, srid, trenn
Version: 13.2
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 13.2
Whiteboard: ?
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Screenshot of dracut status
0001-dmraid-rely-on-kpartx-udev-rules.patch

Description Forgotten User A9YPvv3rEp 2014-11-17 13:47:32 UTC
Upgrade of 13.1 to 13.2 done as a fresh install on system with Adaptec 5081 mirrored pair RAID.

Installation appears to go OK, but on first reboot into new system boot process hangs in dracut as the system is unable to find '/' ('/dev/mapper/ddf1_System_part1') and '/home' ('/dev/mapper/ddf1_System_part2') partitions.

On investigation, '/dev/mapper/ddf1_System' is present but the links to the actual partitions are missing.

I could find no way out of this for a successful boot.

Reverted to 13.1 and that behaved as expected.
Comment 1 Neil Brown 2014-11-20 02:37:26 UTC
That is strange.

The "dmraid.sh" script in dracut runs "dmraid -ay" to activate the array,
then runs "kpartx -a /dev/mapper/...", which should create the partitions.

Some step there is presumably going wrong.

Maybe there needs to be a 'udevadm settle' between the 'dmraid -ay' and the 'kpartx' to make sure the /dev/mapper/XX device does exist.

Can you experiment?  It would require:
 - update back to 13.2
 - when dracut gives you a shell, check the expected devices are missing
 - run 
      kpartx -a /dev/mapper/ddf1_System
 - check that the expected devices now appear
 - 'exit' - I think that will allow the 'boot' to continue
 - edit 
     /usr/lib/dracut/modules.d/90dmraid/dmraid.sh
   and insert 'udevadm settle' after each 'dmraid...' command.
 - run mkinitrd
 - reboot
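
The edit step in the list above could be scripted roughly as follows. This is only a sketch: add_settle is a hypothetical helper, and it assumes that the dmraid invocations sit at the start of a line (after optional indentation) in dmraid.sh, which has not been checked against the shipped script. Calls embedded in $(...) or later in a pipeline would still need editing by hand.

```shell
# Hypothetical helper: print a copy of the given script with
# 'udevadm settle' inserted after every line whose first word is 'dmraid'.
# The original file is not modified.
add_settle() {
    awk '
        { print }
        /^[ \t]*dmraid([ \t]|$)/ {
            indent = $0
            sub(/[^ \t].*$/, "", indent)   # keep only the leading whitespace
            print indent "udevadm settle"
        }
    ' "$1"
}
```

For example, "add_settle /usr/lib/dracut/modules.d/90dmraid/dmraid.sh > /tmp/dmraid.sh.new" writes the modified copy to a scratch file, so it can be reviewed before it replaces the original and mkinitrd is run.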

And let me know how far you get.

Thanks.
Comment 2 Forgotten User A9YPvv3rEp 2014-11-20 07:31:51 UTC
At the moment the system is not available for testing. I had to do a rebuild for other reasons and thought I might as well try 13.2 while I was at it as it had been fine on other systems. It will be a while before I can release it for further testing, but I will see if I can sort something out.
Comment 3 Forgotten User A9YPvv3rEp 2014-11-24 08:38:13 UTC
Created attachment 614666 [details]
Screenshot of dracut status

Showing how dmraid is using a different naming convention to what the boot up process expects
/dev/mapper/ddf1_System_part1 <-> /dev/mapper/ddf1_System1
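
A mismatch like this can be confirmed from the dracut emergency shell with a small loop. The following is a minimal sketch; missing_nodes is a hypothetical helper name, and the node names passed to it are whatever the boot process is waiting for on the affected system.

```shell
# Hypothetical helper: print each expected name that does not exist
# in the given directory.
# usage: missing_nodes DIR NAME...
missing_nodes() {
    dir=$1
    shift
    for n in "$@"; do
        [ -e "$dir/$n" ] || printf '%s\n' "$n"
    done
}
```

For the system in this report the call would be "missing_nodes /dev/mapper ddf1_System ddf1_System_part1 ddf1_System_part2".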
Comment 4 Forgotten User A9YPvv3rEp 2014-11-24 08:41:27 UTC
I have attempted the suggested fix, but it does not work. See the previous attachment screenshot.

I notice that other versions of dmraid, e.g. Partition Magic, use this "different" naming convention, so maybe there has been a change in the dmraid API.
Comment 5 Neil Brown 2014-11-25 03:39:31 UTC
I'm sure I've seen this before.....

When kpartx makes partitions, it defaults to using 'p' to separate
the partition number from the device name if the latter ends in a digit, and an empty string otherwise.
That is why you see "ddf1_System1": dracut is letting kpartx use the default.

When udev triggers the creation of partitions via /usr/lib/udev/rules.d/66-kpartx.rules, kpartx is run as
  kpartx -u -p -part /dev/$name

The "-p -part" should tell it to use "-part" as the separator resulting in
 /dev/mapper/ddf1_System-part1
So why dracut is waiting for /dev/mapper/ddf1_System_part1 (underscore instead of hyphen) I don't know.
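
The naming rule described above can be illustrated with a few lines of shell. This is a sketch of the rule as stated in this comment, not kpartx's actual code; kpartx_part_name is a hypothetical helper.

```shell
# Hypothetical helper mimicking the naming rule described above.
# usage: kpartx_part_name MAPNAME PARTNUM [DELIM]
kpartx_part_name() {
    name=$1
    num=$2
    if [ $# -ge 3 ]; then
        delim=$3                 # explicit separator, as with 'kpartx -p DELIM'
    else
        case $name in
            *[0-9]) delim=p ;;   # name ends in a digit: default separator is 'p'
            *)      delim= ;;    # otherwise: no separator
        esac
    fi
    printf '%s%s%s\n' "$name" "$delim" "$num"
}
```

With "ddf1_System 1" this prints ddf1_System1 (what dracut produces), with "ddf1_System 1 -part" it prints ddf1_System-part1 (what the udev rule produces), and with "ddf1_System 1 _part" it prints ddf1_System_part1 (what the boot process waits for).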

This is probably related to /lib/udev/rules.d/67-kpartx-compat.rules which
should create symlinks with the "_part1" name.

That seems to only work if "DM_PART" is set, and I cannot see where it gets set.

So if you modify 
  /usr/lib/dracut/modules.d/90dmraid/dmraid.sh

to add  "-p _part" to the options to kpartx, and then "mkinitrd", it might work.
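
That modification could be made with a one-line sed filter. This is a sketch under an assumption: it presumes the script calls kpartx literally as "kpartx -a " followed by the device, which has not been verified against the shipped dmraid.sh. add_part_delim is a hypothetical helper name.

```shell
# Hypothetical helper: read a script on stdin and write it to stdout with
# '-p _part' added to each 'kpartx -a' call that has no '-p' option yet.
add_part_delim() {
    sed '/kpartx -a -p /!s/kpartx -a /kpartx -a -p _part /'
}
```

It is meant to be used as a filter, e.g. "add_part_delim < dmraid.sh > dmraid.sh.new", so the result can be inspected before it replaces the original and mkinitrd is run.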

But it is at least a little confusing.

Hannes: do you know what is meant to happen here? In particular, what sets DM_PART, and why isn't it being set inside dracut?
Comment 6 Forgotten User A9YPvv3rEp 2014-11-25 07:39:15 UTC
Is "-p -part" a typo that should be "-p _part" ?

It might be interpreted as -p "" and -part (unknown option), leading to the lack of the suffix.
Comment 7 Neil Brown 2014-11-25 07:45:28 UTC
No, "-p -part" doesn't get confused by the "-"; it uses "-part" literally as the separator (I just tested to double-check).
Comment 8 Neil Brown 2014-12-15 03:42:41 UTC
I think this is a dracut problem.
dracut should call "kpartx" with the same arg that udev does.

So re-assigning to Thomas.
Comment 9 Neil Brown 2014-12-15 03:43:33 UTC
*** Bug 907522 has been marked as a duplicate of this bug. ***
Comment 10 Martin Caj 2015-05-18 13:19:17 UTC
Hi,

This weekend, while updating a server from openSUSE 13.1 to 13.2, we ran into the same trouble as Tim.

We spent many hours testing and fixing grub2 (where we suspected the bug might be), but today, with a fresh mind, I found this bug.

That server (Supermicro) has an Adaptec RAID and runs dmraid for system and data.

I'm very sad that the bug has been open since 2014-11-17.
Are there any plans to fix it soon?

There are a lot of people who are running dmraid on servers and, surprisingly, some of them are using openSUSE!
We don't tell them: "you know, yes there is a bug, we know about it, but we don't fix it, go to Fedo*** or Ubu*** if you have dmraid ...", do we?
Comment 12 Iakov Karpov 2015-06-17 10:29:35 UTC
For everyone who uses the workaround suggested by Neil (#c5):
The official update for openSUSE 13.2, openSUSE-2015-427, will revert your modified /usr/lib/dracut/modules.d/90dmraid/dmraid.sh to its original state!
Be careful! Double-check that you have corrected everything after installing it.
I broke my system yesterday, since I had forgotten that I had this problem on my system.
And many thanks to the openSUSE maintainer team, who fix critical bugs that prevent your computer from booting as soon as possible!
Comment 13 Hannes Reinecke 2015-06-17 14:43:42 UTC
Created attachment 638240 [details]
0001-dmraid-rely-on-kpartx-udev-rules.patch

dmraid: rely on kpartx udev rules to generate partitions.
Comment 14 Hannes Reinecke 2015-06-17 14:44:02 UTC
Please test with the above patch.
Comment 15 Iakov Karpov 2015-06-17 17:50:21 UTC
(In reply to Hannes Reinecke from comment #14)
> Please test with the above patch.

The patch does the trick, many thanks! OS starts flawlessly, with all partitions mounted. "Bootloader" module in YaST works too (workaround in #c5 did break it).
Comment 16 Thomas Renninger 2015-06-24 16:10:57 UTC
JFI: In mainline there was a change to remove all dmraid udev rules:
--- a/modules.d/90dmraid/module-setup.sh
+++ b/modules.d/90dmraid/module-setup.sh
@@ -74,8 +74,6 @@ install() {
 
     inst "$moddir/dmraid.sh" /sbin/dmraid_scan
 
-    inst_rules 64-md-raid.rules
-
     inst_libdir_file "libdmraid-events*.so*"
 
     inst_rules "$moddir/61-dmraid-imsm.rules"


IMO this is the wrong way. It took me quite some time to adjust everything to dracut version 042, and I added the above patch.
I will continue in this order:
  - Submit this to factory
  - Submit the above patch to 13.2
  - Submit things (this and other ones) to mainline

-> Be aware that at least the last one may take some days...
Comment 17 Bernhard Wiedemann 2015-07-02 13:00:08 UTC
This is an autogenerated message for OBS integration:
This bug (905746) was mentioned in
https://build.opensuse.org/request/show/314848 13.2 / dracut
Comment 18 Thomas Renninger 2015-07-07 13:56:44 UTC
See the last autobot comment.
Comment 19 Swamp Workflow Management 2015-07-13 10:07:59 UTC
openSUSE-RU-2015:1230-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 905746
CVE References: 
Sources used:
openSUSE 13.2 (src):    dracut-037-17.15.1
Comment 20 Iakov Karpov 2015-07-19 20:06:39 UTC
dmraid boot is still broken. Looks like a typo, the following fix to module-setup.sh fixes boot:

--- module-setup.sh.orig	2015-07-06 11:04:34.000000000 +0300
+++ module-setup.sh	2015-07-19 22:54:08.000000000 +0300
@@ -76,7 +76,7 @@
 
     inst "$moddir/dmraid.sh" /sbin/dmraid_scan
 
-    66-kpartx.rules 67-kpartx-compat.rules
+    inst_rules 66-kpartx.rules 67-kpartx-compat.rules
 
     inst_libdir_file "libdmraid-events*.so*"
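
Whether the rebuilt initrd actually contains both rules files can be checked from its file listing. The following is a minimal sketch: has_kpartx_rules is a hypothetical helper, and it assumes the listing arrives on stdin with one path per line, for example from lsinitrd.

```shell
# Hypothetical helper: read an initrd file listing on stdin and report
# whether both kpartx udev rules files appear in it.
has_kpartx_rules() {
    listing=$(cat)
    for r in 66-kpartx.rules 67-kpartx-compat.rules; do
        # -F: match the file name as a fixed string, not as a regex
        if ! printf '%s\n' "$listing" | grep -qF -- "$r"; then
            echo "missing: $r"
            return 1
        fi
    done
    return 0
}
```

A typical call would be "lsinitrd | has_kpartx_rules && echo ok", run after mkinitrd and before rebooting.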
Comment 21 Thomas Renninger 2015-07-23 12:54:13 UTC
Oh god, sorry about this. At least the mainline version I sent was correct..
Resubmitted without typo for 13.2 inclusion:

home:trenn:branches:openSUSE:13.2:Update/dracut.openSUSE_13.2_Update> eosc submitpac
created request id 318172
Comment 22 Bernhard Wiedemann 2015-07-23 13:00:10 UTC
This is an autogenerated message for OBS integration:
This bug (905746) was mentioned in
https://build.opensuse.org/request/show/318172 13.2 / dracut
Comment 23 Forgotten User A9YPvv3rEp 2015-07-30 06:23:24 UTC
I seem to be getting 'spammed' by Swamp Workflow Management on this bug, with batches of toggling between whiteboard= "? obs:running:3938:moderate " and "?". Is this really necessary? It's very unhelpful from my point of view, especially as the bug is supposedly "RESOLVED, FIXED".
Comment 24 Swamp Workflow Management 2015-08-05 10:09:07 UTC
openSUSE-RU-2015:1348-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 905746
CVE References: 
Sources used:
openSUSE 13.2 (src):    dracut-037-17.18.1
Comment 28 Swamp Workflow Management 2015-10-15 14:10:56 UTC
SUSE-RU-2015:1762-1: An update that has 13 recommended fixes can now be installed.

Category: recommended (important)
Bug References: 898711,904533,905746,912734,919179,922676,931307,932981,936736,939101,940100,940585,943312
CVE References: 
Sources used:
SUSE Linux Enterprise Server 12 (src):    dracut-037-51.13.1
SUSE Linux Enterprise Desktop 12 (src):    dracut-037-51.13.1