Bug 805415 - raid1: started degraded after crash
Summary: raid1: started degraded after crash
Status: RESOLVED DUPLICATE of bug 793954
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Version: 13.1 Milestone 2
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Neil Brown
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-23 22:13 UTC by Jiri Slaby
Modified: 2013-03-12 02:46 UTC

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
/var/log/message excerpt (71.27 KB, text/plain)
2013-02-25 09:52 UTC, Jiri Slaby

Description Jiri Slaby 2013-02-23 22:13:17 UTC
Hi Neil,

when my system crashes, my raid1 is started degraded after the next boot (I'm not sure if it happens every time):
# journalctl |grep \\.md
Feb 23 22:26:37 bellona.site boot.md[1538]: Starting MD RAID mdadm: /dev/md1 has been started with 1 drive (out of 2).
Feb 23 22:26:37 bellona.site boot.md[1538]: ..done
# cat /proc/mdstat
Personalities : [raid0] [raid1] 
md1 : active raid1 sda2[1]
      48794496 blocks super 1.2 [2/1] [_U]
      

I have to manually call mdadm -a:
# mdadm -a /dev/sdb2
mdadm: added /dev/sdb2
bellona:~ # cat /proc/mdstat
Personalities : [raid0] [raid1] 
md1 : active raid1 sdb2[2] sda2[1]
      48794496 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.0% (6528/48794496) finish=248.5min speed=3264K/sec


fdisk output:
/dev/sda2          411648    98066431    48827392   83  Linux
/dev/sdb2            2048    97656831    48827392   83  Linux


# mdadm --misc --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sat Sep  8 20:18:43 2012
     Raid Level : raid1
     Array Size : 48794496 (46.53 GiB 49.97 GB)
  Used Dev Size : 48794496 (46.53 GiB 49.97 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Feb 23 23:11:34 2013
          State : clean, degraded, recovering 
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 1% complete

           Name : bellona:1
           UUID : 3bad2815:4e6bfc83:b113591c:22fd63f8
         Events : 12658

    Number   Major   Minor   RaidDevice State
       2     259    917504        0      spare rebuilding   /dev/sdb2
       1     259    262144        1      active sync   /dev/sda2


# rpm -q mdadm
mdadm-3.2.6-4.1.x86_64
# uname -a
Linux bellona.site 3.8.0-rc7-next-20130218_64+ #1768 SMP Mon Feb 18 10:08:51 CET 2013 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Neil Brown 2013-02-25 02:24:19 UTC
Can I get the kernel logs when it was booting?

I think this is most likely caused by some sort of race where one of the devices is being held busy by udev in some way while mdadm is trying to assemble the array - so it only manages to get one device.
Hopefully the kernel logs will have more hints.
Comment 2 Jiri Slaby 2013-02-25 09:52:59 UTC
Created attachment 526376 [details]
/var/log/message excerpt

Yeah, sure.
Comment 3 Neil Brown 2013-02-26 02:40:39 UTC
Nothing much useful there, unfortunately.  It does confirm that sda2 is definitely visible and working before mdadm runs, but it doesn't show why mdadm didn't use it.

does 
  journalctl | grep mdadm

show anything?

If/when it happens again, could you collect the output of
  mdadm -E /dev/sd[ab]2
before adding the missing device back in?

Is it always the same device that is missing?
Comment 4 Jiri Slaby 2013-03-05 10:08:53 UTC
(In reply to comment #3)
> does 
>   journalctl | grep mdadm
> 
> show anything?

I tried that when it happened and there was nothing for mdadm, nor for \\.md.

> If/when it happens again, could you collect the output of
>   mdadm -E /dev/sd[ab]2
> before adding the missing device back in.

It hasn't happened again yet, even though I tried dd if=/dev/urandom of=file combined with echo c >/proc/sysrq-trigger several times. As soon as it recurs, I will provide that info.
Comment 5 Neil Brown 2013-03-12 02:46:29 UTC
I'm pretty sure this is the same problem as bug 793954.
You'll find a work-around there.  I'll hopefully come up with a proper fix soon.

*** This bug has been marked as a duplicate of bug 793954 ***