Bug 760859

Summary: raid1 with pata and sata disk started as 2 arrays
Product: openSUSE 12.1
Component: Kernel
Reporter: Volker Kuhlmann <bugz57>
Assignee: Neil Brown <nfbrown>
QA Contact: E-mail List <qa-bugs>
CC: bugz57, stefan.bruens, suse-beta
Status: RESOLVED DUPLICATE
Severity: Major
Priority: P2 - High
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
Found By: ---
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Disk partitioning
             raid details
             mdadm config
             /proc/mdstat

Description Volker Kuhlmann 2012-05-06 01:43:34 UTC

There seems to be a race condition between the PATA disk, the SATA disk, and a USB flash card reader over which of them gets allocated device names in /dev first.

If, after the raid1 was created with the two disks as /dev/sd[ab], it ends up like this:

# lsscsi 
[0:0:0:0]    disk    ATA      SAMSUNG HD103UJ  1AA0  /dev/sda 
[4:0:0:0]    disk    Generic  IC1210        CF 1.9C  /dev/sdb 
[4:0:0:1]    disk    Generic  IC1210        MS 1.9C  /dev/sdc 
[4:0:0:2]    disk    Generic  IC1210    MMC/SD 1.9C  /dev/sdd 
[4:0:0:3]    disk    Generic  IC1210        SM 1.9C  /dev/sde 
[5:0:0:0]    disk    ATA      ST3120026A       3.06  /dev/sdf 

then two separate arrays are started, both degraded.

This used to work fine before systemd, i.e. up to 11.1 (I didn't try 11.[234]), but 12.1 is broken.

# cat /etc/mdadm.conf
DEVICE containers partitions
ARRAY /dev/md/linux1:system UUID=2276a9a1:da6d0888:554fa14a:f4f32b37
ARRAY /dev/md/linux1:home   UUID=ae0f83e0:3304dfea:feed7f14:ef394574

# mdadm -Ds
ARRAY /dev/md126 metadata=1.0 name=linux1:system UUID=2276a9a1:da6d0888:554fa14a:f4f32b37
ARRAY /dev/md/linux1:home metadata=1.2 name=linux1:home UUID=ae0f83e0:3304dfea:feed7f14:ef394574
ARRAY /dev/md/linux1:home metadata=1.2 name=linux1:home UUID=ae0f83e0:3304dfea:feed7f14:ef394574

Initrd contains

# cat etc/mdadm.conf
AUTO -all
ARRAY /dev/md126 metadata=1.0 name=linux1:system UUID=2276a9a1:da6d0888:554fa14a:f4f32b37

The problem occurs only with the second array (for /home), not with the first array (for /).


Reproducible: Always

Steps to Reproduce:
1. Set up system as described.
2. Reboot several times, examining raid status each time.
Actual Results:  
For the rootfs: 1 raid1 array started with 2 active disks.

For /home: 2 raid1 arrays started with 1 active disk each, both degraded. Both arrays have the same array UUID.

Expected Results:  
2 raid1 arrays started, both with 2 active disks.
Comment 1 Neil Brown 2012-05-07 09:01:47 UTC
Thanks for the report.

I suspect this is a known problem, the fix for which isn't in 12.1 yet.

I'll get you an rpm to test, but it might be a couple of days...
Comment 2 Volker Kuhlmann 2012-05-08 08:53:43 UTC
Thanks Neil, but I won't be able to test this for the next few weeks.
Comment 3 Neil Brown 2012-05-21 04:55:13 UTC
Probably easiest to try the mdadm from Factory:

http://download.opensuse.org/repositories/Base:/System/openSUSE_Factory/x86_64/mdadm-3.2.5-55.1.x86_64.rpm

If that makes a difference you can either just stick with that, or I might be able to schedule an update for 12.1.
Comment 4 Volker Kuhlmann 2012-06-25 20:49:43 UTC
The current version of mdadm in Factory is slightly newer and doesn't install on oS 12.1, but
zypper -vv si mdadm
and recompiling the package is easy.

On a quick investigation the problem indeed seems to be solved. The IDE disk is now always enumerated after the flash card reader disks.

Thanks Neil for all your work on mdadm!
Comment 5 Volker Kuhlmann 2012-06-26 06:57:14 UTC
Sorry Neil, NOT fixed.

I recompiled the package from Factory: mdadm-3.2.5-66.1.x86_64.
When installing it, the initrds were re-created, so I didn't put in any extra effort there. But it's not the rootfs array that has a problem, it's the /home array.

After the first reboot a degraded array was created - expected.
I resynced that by adding the second disk and waited until that was finished.
After a reboot (or maybe two) the array showed up correctly.
After another reboot the array was degraded again:

 /proc/mdstat
   Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
   md127 : active raid1 sda6[2]
	 99546668 blocks super 1.0 [2/1] [_U]

Another reboot later two arrays were again created:

 /proc/mdstat
   Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
   md125 : active raid1 sdb6[3]
	 99546668 blocks super 1.0 [2/1] [U_]

   md127 : active (auto-read-only) raid1 sda6[2]
	 99546668 blocks super 1.0 [2/1] [_U]

This array is used for /home. The other array uses sd[ab]5, is used for /, and doesn't show this split problem.

There is no entry in syslog that suggests there might be a hardware error. The same hardware has been rock-solid through successive openSUSE versions for 5 years.
Comment 6 Neil Brown 2012-06-27 01:34:40 UTC
There seem to be two problems here.

Firstly, the array is sometimes assembled without sdb6 being present. Presumably this is a race between "mdadm -As" running and sdb6 appearing.
I don't really know what to do about that.  Presumably you would want that to happen if sdb6 really had died and was not coming back.  But if there are unpredictable delays in sdb6 appearing, we don't know how long to wait for it.

Could there be a dependency problem?  Is there some init.d script that makes sdb appear?  Maybe it doesn't always get run before the boot.md script?

The second problem is that once the array is marked as degraded, the two halves can be assembled as separate arrays.  This certainly shouldn't happen automatically but I can imagine how it might.  I'll try to see how best to fix that.
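For reference, one way to see what state each half is in when such a split happens is to compare the member superblocks directly. The device names below are the ones from this report, and the exact fields shown depend on the mdadm version:

  # the member with the higher "Events" count holds the most recent data
  mdadm --examine /dev/sda6 /dev/sdb6 | grep -E 'Array UUID|Update Time|Events|Device Role'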
Comment 7 Volker Kuhlmann 2012-06-27 08:04:15 UTC
Thanks Neil!

If you have an mdadm fixing the second problem I'll be happy to test.

Re first problem:

[0:0:0:0]    disk    ATA      SAMSUNG HD103UJ  1AA0  /dev/sda 
[4:0:0:0]    disk    Generic  IC1210        CF 1.9C  /dev/sdb 
[4:0:0:1]    disk    Generic  IC1210        MS 1.9C  /dev/sdc 
[4:0:0:2]    disk    Generic  IC1210    MMC/SD 1.9C  /dev/sdd 
[4:0:0:3]    disk    Generic  IC1210        SM 1.9C  /dev/sde 
[5:0:0:0]    disk    ATA      ST3120026A       3.06  /dev/sdf 

sda reliably remains the Samsung. The Seagate PATA disk randomly gets inserted as sdb (before the card reader drives) or sdf (after them). The card reader is internal but USB-connected.

My uneducated guess for the delay of sd[bf] is that the kernel has pushed initialisation of PATA right to the end, after SATA and USB. Combined with systemd, there's no telling about the timing...

I don't see any boot script affecting access to sd[bf]. That disk is old, and contains 2 swap partitions and 2 raid partitions (for / and /home). The rootfs surely must be started first, it never has any problems, and it would start sd[bf] IIUC. Both raids are raid1; the boot loader is set up with

setup --stage2=/boot/grub/stage2 --force-lba (hd0) (hd0,4)
setup --stage2=/boot/grub/stage2 --force-lba (hd1) (hd1,4)
quit

The only non-SuSE init scripts I use are using these dependencies

# Provides:          boot.local-rc
# Required-Start:    $local_fs boot.rootfsck
# Should-Start:      boot.quota boot.cleanup
# Required-Stop:     $null
# Should-Stop:       $null
# Default-Start:     B
# Default-Stop:      

and they call
  env SYSTEMD_NO_WRAP=1 /etc/init.d/boot.cleanup
to get the usual cleanup behaviour with systemd, and they create a few files in /tmp. All of this operates on the rootfs, not on /home. The /home raid is the one which isn't behaving.

The details of the home array are identical to those for the rootfs, but I had to create it manually. When I installed oS 12.1 (fresh install), yast was giving me some problems with superblock versions. I didn't want 0.9, and 1.2 didn't work, so I had to manually force the use of 1.0. The system boots from the rootfs raid1. (I have been using this type of setup for almost 10 years now.)
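For reference, creating such an array with a 1.0 superblock from the command line looks roughly like this (device names are the ones used elsewhere in this report; array naming options are left out for brevity):

  # 1.0 puts the superblock at the end of each member, so the partitions
  # can still be mounted individually in an emergency
  mdadm --create /dev/md/home --metadata=1.0 --level=1 --raid-devices=2 /dev/sda6 /dev/sdb6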

If there's any other useful information I can provide I'd be happy to.
Comment 8 Neil Brown 2012-09-12 03:42:23 UTC
Sorry it has been so long since I last looked at this ....

It is a little hard to track exactly which version has exactly which fix in it ... could you please check if your "/etc/init.d/boot.md" contains:

                # firstly finish any incremental assembly that has started.
                $mdadm_BIN -IRs
                $mdadm_BIN -A -s -c $mdadm_CONFIG

towards the end of the "start" section?  It is the "$mdadm_BIN -IRs" bit which I think might be important.  If not, a newer mdadm package should have that.

If you do have that (and I suspect you do) could you add a call to
   udevadm settle
just before the "$mdadm_BIN -IRs" call.

I haven't yet had a chance to make sure that mdadm never assembles the same array twice (from different bits), but the above might arrange that it doesn't try.
Comment 9 Neil Brown 2012-09-12 07:53:36 UTC
I've changed my mind.  Don't try adding "udevadm settle" - it won't help (and it is already there anyway).

You still need to change /etc/init.d/boot.md though.

Please find the line near the top which reads: 

# Required-Start:  boot.udev boot.rootfsck

and add another word to the end (after a space):
   udev-trigger

This will ensure that new events have been triggered before boot.md runs, so that the "udevadm settle" can wait for them, and md won't try to assemble incomplete arrays.
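The header of /etc/init.d/boot.md would then contain something like the following (surrounding header lines abbreviated; the exact header on 12.1 may differ slightly):

  ### BEGIN INIT INFO
  # Provides:          boot.md
  # Required-Start:    boot.udev boot.rootfsck udev-trigger
  # Should-Start:      boot.scsidev boot.multipath
  ### END INIT INFO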
Comment 10 Neil Brown 2012-09-26 02:38:18 UTC
I've submitted a maintenance request for mdadm to add 'udev-trigger' to the 'Should-Start' line, which is more correct.

So closing this as 'fixed'.  Please reopen if the above does not fix your problem.
Thanks.
Comment 11 Swamp Workflow Management 2012-10-04 20:08:40 UTC
openSUSE-RU-2012:1291-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 760859,772286
CVE References: 
Sources used:
openSUSE 12.1 (src):    mdadm-3.2.2-4.9.1
Comment 12 Volker Kuhlmann 2012-10-06 02:08:54 UTC
Very sorry Neil, should have remembered to test it :-(

I installed the new mdadm package; the changes relative to the original 12.1 package are
--- orig/boot_mdadm-3.2.2-4.1.2.md      2011-10-30 05:30:53.000000000 +1300
+++ boot.md     2012-09-27 23:23:37.000000000 +1200
-# Should-Start: boot.scsidev boot.multipath
+# Should-Start: boot.scsidev boot.multipath udev-trigger
+               # firstly finish any incremental assembly that has started.
+               $mdadm_BIN -IRs

Both arrays were fine. After a reboot, one array was in a state of mental confusion:

kereru|root[1]:~# mdadm --manage -f /dev/md127 /dev/sda6
mdadm: set device faulty failed for /dev/sda6:  No such device
kereru|root[1]:~# mdadm --manage -r /dev/md127 /dev/sda6
mdadm: hot remove failed for /dev/sda6: No such device or address
kereru|root[1]:~# mdadm --manage -a /dev/md127 /dev/sda6
mdadm: /dev/sda6 reports being an active member for /dev/md127, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda6" first.

I had to delete the superblock to get anywhere:

kereru|root[1]:~# mdadm --zero-superblock /dev/sda6
kereru|root[1]:~# mdadm --manage -a /dev/md127 /dev/sda6
mdadm: added /dev/sda6

After resync:

Personalities : [raid1] [raid0] [raid10] [raid6] [raid5] [raid4] 
md127 : active raid1 sda6[2] sdf6[3]
      99546668 blocks super 1.0 [2/2] [UU]

reboot

md125 : active raid1 sdb6[3]
      99546668 blocks super 1.0 [2/1] [U_]
md127 : inactive sda6[2](S)
      99546668 blocks super 1.0

After a further reboot (without fixing the array):

reboot

md125 : active raid1 sda6[2]
      99546668 blocks super 1.0 [2/1] [_U]
md127 : active (auto-read-only) raid1 sdb6[3]
      99546668 blocks super 1.0 [2/1] [U_]

Conclusion: Nothing has changed since 12.1 was released.

Surprisingly, the disk can be added to the correct array without needing a resync:

# mdadm --stop /dev/md127 
mdadm: stopped /dev/md127

md125 : active raid1 sda6[2]
      99546668 blocks super 1.0 [2/1] [_U]

# mdadm -f /dev/md125 /dev/sdb6
mdadm: set device faulty failed for /dev/sdb6:  No such device
# mdadm -r /dev/md125 /dev/sdb6
mdadm: hot remove failed for /dev/sdb6: No such device or address
# mdadm -a /dev/md125 /dev/sdb6
mdadm: re-added /dev/sdb6
[no resync here]

md125 : active raid1 sdb6[3] sda6[2]
      99546668 blocks super 1.0 [2/2] [UU]
Comment 13 Neil Brown 2012-10-07 22:38:03 UTC
Hmm... it seems this is more subtle than I thought.
Can you try removing /lib/udev/64-md-raid.rules from the initrd?
The easiest way to do this is to edit /lib/mkinitrd/scripts/setup-udev.sh
and remove the line
   64-md-raid.rules \

and then run "mkinitrd".

This will make sure the initrd doesn't even try to assemble the 'home' array. I think that might be the problem: it gets assembled too early, before all the devices are available.
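To confirm that the rebuilt initrd really no longer carries the rule, listing its contents should do (the initrd path is the usual default and may differ):

  # no output means the rule file was left out of the new initrd
  zcat /boot/initrd | cpio -it 2>/dev/null | grep 64-md-raid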

I understand you might not be able to try this immediately, but if you could let me know when you might be able to try, that would be helpful.

Thanks.
Comment 14 Volker Kuhlmann 2012-10-08 11:00:58 UTC
Thanks for outlining the procedure! I removed that one line and ran mkinitrd.
After the first boot all was well.
After the second boot:

md125 : active raid1 sdb6[3]
      99546668 blocks super 1.0 [2/1] [U_]
md127 : inactive sda6[2](S)
      99546668 blocks super 1.0

Taking the initrd apart with gunzip/cpio shows that the 64-md-raid.rules file was indeed gone.
Both times when booting, the screen log said that md126 was started, then there was some resuming, then a 30 s timeout waiting for one disk (by UUID) to come online, then a prompt asking whether to fall back to /dev/md126, default yes. I pressed enter.
The UUID it was waiting for belongs to md126.
The next time I pressed n<return>; it seems to make no difference.

Restoring original initrd behaviour resolves the boot timeout.

Is it possible something gets confused with the actual device name of the disks? Sometimes it's sd[ab], sometimes sd[af], sort of randomly for each boot. The 4 device names in between are taken up by an internal USB flash card reader.
Comment 15 Neil Brown 2012-10-09 01:23:10 UTC
Well, that wasn't it... or maybe...
64-md-raid.rules does a few different things. I want to disable some of them, but apparently not all.

Could you try editing /lib/udev/rules.d/64-md-raid.rules (maybe take a copy first for safety) and comment out every line that contains "--incremental".
Then run "mkinitrd" and try again?

Also, what is in the "etc/mdadm.conf" file in your initrd? You included it in the original report, but the latest mdadm package tries to avoid using names like "md126"; I just want to check that it is getting that right.

The changing of device names is expected and shouldn't cause any problems.
Comment 16 Volker Kuhlmann 2012-10-09 07:54:28 UTC
Content of /etc/mdadm.conf:
DEVICE containers partitions
ARRAY /dev/md/kereru:system UUID=2276a9a1:da6d0888:554fa14a:f4f32b37
ARRAY kereru:home   UUID=d7994183:c9a6983b:743f2dcd:3dc163a3

I tried to use logical names where possible. I remember having some trouble forcing it to a v1.0 superblock, so that I can mount the raid1 constituents individually if I absolutely have to.

Changes:

--- 64-md-raid.rules    2012-09-27 23:23:37.000000000 +1200
+++ /lib/udev/rules.d/64-md-raid.rules  2012-10-09 20:32:54.085371671 +1300
-ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
+#ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
-ENV{ID_FS_TYPE}=="isw_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
+#ENV{ID_FS_TYPE}=="isw_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

After running mkinitrd and rebooting 3 times in a row, each boot gives (notwithstanding sdf perhaps being sdb):

md127 : active raid1 sdf6[3] sda6[2]
      99546668 blocks super 1.0 [2/2] [UU]

So it looks like you cracked it! Thank you very much Neil. Those 2 incremental lines must have been put there for a purpose; if they need to be replaced with something else, I'm happy to test again.
Comment 17 Neil Brown 2012-10-11 04:55:52 UTC
Encouraging!
I'm still not 100% sure what is going on.

Is the mdadm.conf you gave the one from the root filesystem or the one from the initrd?
The easiest way to get the one from the initrd is:

 zcat /boot/initrd | cpio -i --to-stdout etc/mdadm.conf

Am I correct that your 2 raid1 arrays both use the same two devices?

It might be useful to see config/md.sh from the initrd too:

 zcat /boot/initrd | cpio -i --to-stdout config/md.sh
Comment 18 Volker Kuhlmann 2012-10-11 06:11:56 UTC
The mdadm.conf in comment #16 is from the rootfs.
The initrd one is (2 lines)

AUTO -all
ARRAY /dev/md126 metadata=1.0 name=kereru:system UUID=2276a9a1:da6d0888:554fa14a:f4f32b37

The initrd md.sh is (2 lines):

[ "$need_mdadm" ] || need_mdadm='1'
[ "$md_devs" ] || md_devs='/dev/md126'

So the initrd only deals with the rootfs array, and fixes the device name to md126.

Just in case, the rootfs /etc/sysconfig/mdadm is:
MDADM_DELAY=60
MDADM_MAIL="root@localhost"
MDADM_PROGRAM=""
MDADM_RAIDDEVICES=""
MDADM_SCAN=yes
MDADM_CONFIG="/etc/mdadm.conf"
MDADM_SEND_MAIL_ON_START=no
MDADM_DEVICE_TIMEOUT="60"

The physical disk layout is (relevant lines)

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes  (SATA)
/dev/sda5         4016313    35342999    15663343+  fd  Linux raid autodetect
/dev/sda6        35343063   234436481    99546709+  fd  Linux raid autodetect

Disk /dev/sdb: 120.0 GB, 120033041920 bytes  (PATA)
/dev/sdb5         4016313    35342999    15663343+  fd  Linux raid autodetect
/dev/sdb6        35343063   234436544    99546741   fd  Linux raid autodetect

The idea here is to get redundancy for the important rootfs and /home, but not for the rest of the junk yard. So IIUYC, yes, both raid1 arrays are located on the same 2 physical disks.
Comment 19 Neil Brown 2012-10-18 01:53:41 UTC
Thanks for the extra details.

I've run out of ideas.  Everything looks right - it should work properly.
I'm quite sure that nothing is going wrong inside the initrd.

It really looks like there is some race between "mdadm --incremental" being run by udev, and "mdadm -A" being run by boot.md.
However the addition of "udev-trigger" to the "Should-Start" line, combined with the "udevadm settle" call already in boot.md, should prevent any race.

The latest change you made prevents udev from running "mdadm --incremental", so there is now no chance of a race.

I've made a couple of changes to upstream mdadm to improve the interaction between "-I" and "-A" but they are too big to push into 12.2 (or 12.1) without good reason.  I'll make sure they get into 12.3 (or whatever comes next).

For now I think you'll just have to manage with your current work-around.
Comment 20 Volker Kuhlmann 2012-10-19 10:27:58 UTC
Thank you for having a look at it. Making a minor modification to the init script is not a problem for me.

Unfortunately I can't narrow down the change any further; the previous release running on this system was 11.1. Could systemd have anything to do with it? It has much more potential to cause all sorts of interesting race conditions.
Comment 21 Volker Kuhlmann 2012-10-26 05:47:07 UTC
Drats, here's a different problem. On a more modern box I use a similar setup: 12.1, 2x SATA disks, one raid1 each for / and /home.

Yesterday morning I rebooted, and there was a raid array failure I didn't pay too much attention to and couldn't solve from the root shell, so I rebooted again. No trouble that I noticed then, but it must have come up with md1 (/home) with only 1 active disk. At midday there was a power cut, followed by a boot in the evening. I was missing files I had created that morning, and found that md1 was running with only 1 active disk, this time the *OTHER* one, or at least only that explains it: my missing files were on the non-active disk. It took me a few hours to clean up from that.

Just in case I reverted to mdadm-3.2.2-4.4.1.x86_64.

Now I rebooted twice for testing, all fine, though I'm unsure what 
         bitmap: 1/2 pages [4KB], 65536KB chunk
means.

I then tried mdadm-3.2.2-4.9.1.x86_64.rpm again. I rebooted twice, and each time md1 came up with only 1 active disk. Reverting back to mdadm-3.2.2-4.4.1 fixes that problem.

I'm increasing the importance of this bug: while the update may help the issue with SATA/PATA raid, for SATA-only raid it seems dangerous. And a raid implementation that randomly starts with 1 out of 2 disks is less than useless (it's begging for data loss).

The update to mdadm-3.2.2-4.9.1 was a little while ago, and yesterday was the first time I noticed the problem. There has been at least one more reboot since then, and it's theoretically possible I failed to notice running on only 1 disk.

I attach some status info from when running with mdadm-3.2.2-4.9.1.

smartctl shows no disk problem - this is a software problem.
Comment 22 Volker Kuhlmann 2012-10-26 05:47:50 UTC
Created attachment 510988 [details]
Disk partitioning
Comment 23 Volker Kuhlmann 2012-10-26 05:48:18 UTC
Created attachment 510989 [details]
raid details
Comment 24 Volker Kuhlmann 2012-10-26 05:48:48 UTC
Created attachment 510990 [details]
mdadm config
Comment 25 Volker Kuhlmann 2012-10-26 05:49:22 UTC
Created attachment 510991 [details]
/proc/mdstat
Comment 26 Christian Boltz 2012-10-26 19:15:03 UTC
(In reply to comment #21)
> [...] I was missing files I had created that morning, and found
> that md1 was running with only 1 active disk - this time the *OTHER* one, or at
> least only that explains it - my missing files were on the non-active disk. It
> took me a few hours to clean up from that.

I also had this behaviour with 12.2 (and yes, also with the OTHER disk active sometimes), but it didn't happen again after I updated my system to Factory-Tested around 2012-10-03 (the latest changelog entry for mdadm is from Sep 20 2012).
Comment 27 Volker Kuhlmann 2012-11-01 08:02:12 UTC
Installed the updates from yesterday, including the dbus packages. Rebooted; md1 failed to run altogether. This is on the machine with 2 SATA disks in RAID1, and with the previous mdadm-3.2.2-4.4.1.x86_64.

Rebooted again; md1 is started, but with only 1 disk. Rebooted some more; it appears md1 is started with only one disk each time, and it's random which one that is.

I am guessing that the boot failure resulted from neither disk being available for md1.

Reverting to the previous mdadm package used to make md1 start with both disks, but it seems that's no longer enough.

This whole thing smells like a race problem, and it seems irrelevant whether one of the disks is PATA. dbus may be involved; I don't see any other system-level package change since I last tested this 6 days ago.

It also looks like Linux md raid is currently severely broken :-(
Comment 28 Stefan BrĂ¼ns 2012-11-19 23:14:21 UTC
I have another idea about what's happening, and I found a workaround which works for me:

Workaround:
--- /etc/init.d/boot.md 2012-06-05 13:29:11.000000000 +0200
+++ /tmp/boot.md        2012-11-20 00:02:42.281518551 +0100
@@ -123,6 +123,7 @@
 
        # Wait for udev to settle
        if [ "$MDADM_DEVICE_TIMEOUT" -gt 0 ] ; then
+            /sbin/udevadm trigger --verbose --subsystem-match=block --attr-match=partition
            /sbin/udevadm settle --timeout="$MDADM_DEVICE_TIMEOUT"
        fi

(the --verbose is purely diagnostic)

Possible explanation:

The required udev-trigger (Should-Start ...) is run very early, before all the necessary devices are available. The settle then waits for an event queue which does not yet contain the required devices.

I currently don't have access to the affected machine, so I cannot debug it myself, but this may be an approach:

1. In the udev-trigger.service, echo the current timestamp and the output of "udevadm trigger --dry-run --verbose --attr-match=partition" to some log file.
2. In boot.md, log the timestamp before the settle, and log the output of "/sbin/udevadm settle --timeout=0 ; echo $?", which shows whether there are outstanding events (to see if the settle is a no-op).
3. In the udev md-raid rules, log the timestamp and device name of every "add" event.
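As a rough sketch of step 2, the settle block quoted above could be instrumented like this (logging to the kernel log so it survives early boot; the wording and destination are only suggestions):

    # log when boot.md reaches this point, and whether the udev queue is already empty
    echo "boot.md: $(date '+%F %T') before settle" > /dev/kmsg
    /sbin/udevadm settle --timeout=0
    echo "boot.md: udev queue already empty: $? (0 = yes)" > /dev/kmsg
    /sbin/udevadm settle --timeout="$MDADM_DEVICE_TIMEOUT"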
Comment 29 Neil Brown 2012-11-29 04:00:42 UTC
That workaround feels a lot like a hack. It is little more than adding something like
   sleep 5

in there.

Maybe the problem is async device discovery - a phrase that I think I've heard but don't know the details of.

The current approach assumes that all devices have been discovered by the kernel when "udevadm trigger" is run by systemd, and so once 'udevadm settle' has run, all the devices will be visible.

But if discovery happens asynchronously (rather than during the module 'init' process) then I cannot see any way to reliably wait for the devices.

And without that, we cannot boot reliably.

I'll see if I can find someone to ask about this.
Comment 30 Neil Brown 2013-03-13 04:06:30 UTC
I think this bug (or what's left of it) is the same as bug 793954, so I'm marking it a duplicate of that.
There seems to be a problem with "udevadm settle" not really waiting properly but I don't really know why (yet).

*** This bug has been marked as a duplicate of bug 793954 ***
Comment 31 Volker Kuhlmann 2013-03-14 02:26:23 UTC
This problem has nothing to do with PATA disks and continues to exist on current hardware.