Bugzilla – Full Text Bug Listing |
| Summary: | MD RAID installation doesn't work after first reboot - erroneous partition entry in fstab | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 13.1 | Reporter: | Forgotten User TSA7_7Eqlt <forgotten_TSA7_7Eqlt> |
| Component: | Basesystem | Assignee: | E-mail List <bnc-team-screening> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | aschnell, bwiedemann, forgotten_TSA7_7Eqlt, nfbrown, rmilasan, systemd-maintainers, thomas.blume, trenn |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 13.1 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | dmesg output; fstab; journalctl -xb output |
Description
Forgotten User TSA7_7Eqlt
2014-12-01 07:44:13 UTC
You have a named MD RAID, and for those YaST used the name in fstab. That also worked in previous versions. Apparently /dev/md/backup is no longer created, which looks like a regression.

Is this really about 13.1? I think we covered md-raid for system partitions there via openQA. Anything else special? Do you use partitions below the md-raid, or logical volumes?

Yes, it's surely about 13.1 (download 08.09.2014). No, no logical volumes. Because it's a test system for a small fileserver, it is a text-only system with just one SATA HD (later there will be 2 HDs of course; I'm just developing some scripting for that now). Partitioning:
- /dev/sda1 (20G) / ext4
- /dev/sda2 (5G) swap
- /dev/sda3 and /dev/sda4 (each 458G) md-RAID1 with vfat (so they will be accessible in a Windows environment outside the raid in an emergency).
Samba, vsftp, apache2 and ssh servers are installed. Nothing special, I think. jth

There should normally be symlinks from the raid name to the real raid device. These are created in /lib/udev/rules.d/63-md-raid-arrays.rules:
ENV{DEVTYPE}=="partition", ENV{MD_DEVNAME}=="*[^0-9]", SYMLINK+="md/$env{MD_DEVNAME}%n"
ENV{DEVTYPE}=="partition", ENV{MD_DEVNAME}=="*[0-9]", SYMLINK+="md/$env{MD_DEVNAME}p%n"
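As a sketch of what these two rules compute: for partition %n of an array whose MD_DEVNAME ends in a non-digit, the link is md/&lt;name&gt;&lt;n&gt;; if the name already ends in a digit, a `p` is inserted. The helper below is hypothetical (not part of mdadm or udev), just mirroring the rule logic:

```shell
# Hypothetical helper mirroring the two udev rules above: given an array
# name (MD_DEVNAME) and a partition number (%n), print the symlink path
# that 63-md-raid-arrays.rules would create under /dev.
md_partition_symlink() {
    name=$1
    partnum=$2
    case $name in
        *[0-9]) echo "md/${name}p${partnum}" ;;  # name ends in a digit -> insert 'p'
        *)      echo "md/${name}${partnum}"  ;;  # name ends in a non-digit
    esac
}

md_partition_symlink backup 1     # -> md/backup1
md_partition_symlink testraid1 2  # -> md/testraid1p2
```

So an array named "backup" should yield /dev/md/backup plus /dev/md/backup1, /dev/md/backup2, ... for its partitions.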
Please boot the failing system with:
rd.udev.log-priority=debug udev.log-priority=debug
and attach /var/log/messages after it went into the emergency shell.
I will do that as fast as I can. But as I'm currently working with the test system, I will have to save all this data and then wipe the HD and reinstall. Maybe I won't find time for that this week. The rules you named do exist in /lib/udev/rules.d/63-md-raid-arrays.rules, and I think they have been there unchanged since installation, as their last modification date is Jun 12. IMHO it's not a problem with these rules but with an incorrect /etc/fstab. But I will reinstall as soon as I find the time. thx & cu jth

(In reply to Joerg Thuemmler from comment #6)
> I will do that as fast as I can. But as I'm actually working with the test
> system, I will have to save all this data and then clean the hd and
> reinstall. Maybe I won't find time for that this week.

No need, I could reproduce the issue on a test machine with this fstab:

/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part1  swap         swap  defaults       0 0
/dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part2  /            ext4  acl,user_xattr 1 1
/dev/md/testraid1                                /raidmount1  ext4  acl,user_xattr 1 2
/dev/md/testraid2                                /raidmount2  ext4  acl,user_xattr 1 2

The mount actually fails because the raid symlinks have different names:

# ll /dev/md/
total 0
lrwxrwxrwx 1 root root 8 Dez  5 15:19 testraid1_0 -> ../md127
lrwxrwxrwx 1 root root 8 Dez  5 15:19 testraid2_0 -> ../md126
# /sbin/mdadm --detail --export /dev/md127
MD_LEVEL=raid1
MD_DEVICES=2
MD_METADATA=1.0
MD_UUID=a95f0440:4ece5ab9:389a8dcd:6106d458
MD_DEVNAME=testraid1_0
MD_NAME=testraid1

Robert, see comment #5. Should the rule perhaps use ENV{MD_NAME} instead of ENV{MD_DEVNAME}?

Thomas, that is not my decision; the rule is part of mdadm. Let's add Neil here. @Neil, can you check comment #7? I agree with it, but it's not my call.

The fact that "_0" is being appended to the name implies that:
1/ the array is not listed in /etc/mdadm.conf, and
2/ the host name stored in the metadata does not match the hostname of the host.
Can you please report:
1/ mdadm --examine of one of the component devices
2/ hostname
3/ content of /etc/mdadm.conf
4/ content of /etc/mdadm.conf in the initrd

My guess is that the hostname in the metadata is some sort of generic name, and that the array is being assembled from the initrd. If that is the case, the best fix is to make sure the initrd *only* assembles arrays listed in /etc/mdadm.conf in the initrd. I thought we did that already, but I'm not sure.

Hm, that's interesting.
There is no mdadm.conf in the installed system, nor in the initrd.
Still, the yast installation logs show:
--<--
# zgrep mdadm.conf y2log-1.gz
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] SystemCmd.cc(execute):90 SystemCmd Executing:"/sbin/mdadm --examine --scan --config=partitions > /tmp/libstorage-PuArm3/mdadm.conf"
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] SystemCmd.cc(doExecute):279 stopwatch 0.022622s for "/sbin/mdadm --examine --scan --config=partitions > /tmp/libstorage-PuArm3/mdadm.conf"
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] AsciiFile.cc(reload):62 loading file /tmp/libstorage-PuArm3/mdadm.conf
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] AsciiFile.cc(logContent):119 content of /tmp/libstorage-PuArm3/mdadm.conf
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] SystemCmd.cc(execute):90 SystemCmd Executing:"/sbin/mdadm --assemble --scan --config=/tmp/libstorage-PuArm3/mdadm.conf"
2014-12-05 08:13:12 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] SystemCmd.cc(doExecute):279 stopwatch 0.028434s for "/sbin/mdadm --assemble --scan --config=/tmp/libstorage-PuArm3/mdadm.conf"
2014-12-05 14:17:55 <1> pxeboot236.hwlab.suse.de(3241) [libstorage] AsciiFile.cc(reload):62 loading file /mnt/etc/mdadm.conf
-->--
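The mismatch Thomas found in comment #7 (MD_DEVNAME with a "_0" suffix vs. the bare MD_NAME) can be checked mechanically. A sketch, with `check_md_name` a hypothetical helper fed from a here-doc copying comment #7's data; on a live system you would pipe `mdadm --detail --export /dev/mdNNN` into it instead:

```shell
# Sketch: detect the "_0"-suffix mismatch from `mdadm --detail --export`
# style output read on stdin. MD_NAME may carry a "homehost:" prefix
# (e.g. "any:testraid1"), which is stripped before comparing.
check_md_name() {
    devname= name=
    while IFS='=' read -r key val; do
        case $key in
            MD_DEVNAME) devname=$val ;;
            MD_NAME)    name=${val#*:} ;;  # strip optional "homehost:" prefix
        esac
    done
    if [ "$devname" != "$name" ]; then
        echo "mismatch: udev name '$devname' vs metadata name '$name'"
    else
        echo "ok: $name"
    fi
}

check_md_name <<'EOF'
MD_LEVEL=raid1
MD_DEVICES=2
MD_METADATA=1.0
MD_UUID=a95f0440:4ece5ab9:389a8dcd:6106d458
MD_DEVNAME=testraid1_0
MD_NAME=testraid1
EOF
```

For the comment #7 data this reports a mismatch, which is exactly why the /dev/md/testraid1 entry in fstab fails to mount.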
Neil, here is the requested information.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 vda1[1] sda3[0]
1051584 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 vda2[1] sda4[0]
1039296 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
# mdadm --examine /dev/sda3
/dev/sda3:
Magic : a92b4efc
Version : 1.0
Feature Map : 0x1
Array UUID : a95f0440:4ece5ab9:389a8dcd:6106d458
Name : testraid1
Creation Time : Fri Dec 5 14:17:40 2014
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2105312 (1028.16 MiB 1077.92 MB)
Array Size : 1051584 (1027.11 MiB 1076.82 MB)
Used Dev Size : 2103168 (1027.11 MiB 1076.82 MB)
Super Offset : 2105328 sectors
Unused Space : before=0 sectors, after=2144 sectors
State : clean
Device UUID : 24bb4e93:21ebe260:3c36cc33:2b36acd4
Internal Bitmap : -16 sectors from superblock
Update Time : Mon Dec 8 08:24:52 2014
Bad Block Log : 512 entries available at offset -8 sectors
Checksum : 47a451b3 - correct
Events : 38
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
# hostname
vm-gnocchi
# cat /etc/mdadm.conf
cat: /etc/mdadm.conf: No such file or directory
# lsinitrd initrd-3.11.6-4-desktop | grep mdadm.conf
#
Seems that the installer failed to copy mdadm.conf into the installed system.
Joerg, can you see an mdadm.conf on your machine?
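For reference, a pinned /etc/mdadm.conf would contain one ARRAY line per array, roughly like the sketch below (UUID and name taken from the `mdadm --examine` output above; this is not the file YaST actually wrote, and the real lines should be generated with `mdadm --examine --scan`):

```
# /etc/mdadm.conf -- sketch; regenerate with: mdadm --examine --scan
ARRAY /dev/md/testraid1 metadata=1.0 name=testraid1 UUID=a95f0440:4ece5ab9:389a8dcd:6106d458
```

With such a line present (in the system and in the initrd), mdadm assembles the array under its listed name and no "_0" suffix is appended.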
(In reply to Thomas Blume from comment #10)
> Seems that the installer failed to copy mdadm.conf into the installed system.

Sorry, my confusion. The file was initially there, otherwise it couldn't have been loaded:

> 2014-12-05 14:17:55 <1> pxeboot236.hwlab.suse.de(3241) [libstorage]
> AsciiFile.cc(reload):62 loading file /mnt/etc/mdadm.conf

Since I didn't delete it, I'm wondering where it has gone. Neil, is it possible that this is an effect of a raid sync in the wrong direction?

Created attachment 616230 [details]
dmesg output
Created attachment 616231 [details]
fstab
Created attachment 616233 [details]
journalctl -xb output
Hi again, I've just reinstalled. Changed nothing but the disk partition sizes (just to be sure) and the software selection. Same results, with the difference that it was possible to launch emergency mode (last time this was offered, but there was no prompt to do so). Changed the boot command line to activate the logging (rd.udev, udev). It turns out there is no /var/log/messages at all... I have therefore attached the output of dmesg and of journalctl -xb, and the /etc/fstab. But IMHO there will be nothing special to find, as I assume fsck fails before trying to check /dev/md/backup. thx & cu jth

@Thomas sorry, forgot: there is no mdadm.conf on the machine. cu jth

(In reply to Joerg Thuemmler from comment #16)
> @Thomas
>
> sorry, forgot: there is no mdadm.conf on the machine.
>
> cu jth

Joerg, can you please grep for mdadm.conf in /var/log/YaST2/y2log* ? For the gzipped files (e.g. y2log-1.gz) use zgrep.

Here it comes:

# grep "mdadm.conf" <y2log-2
2014-11-26 13:46:19 <1> linux(2645) [libstorage] SystemCmd.cc(execute):90 SystemCmd Executing:"/sbin/mdadm --examine --scan --config=partitions > /tmp/libstorage-ya4USK/mdadm.conf"
2014-11-26 13:46:20 <1> linux(2645) [libstorage] SystemCmd.cc(doExecute):279 stopwatch 0.046697s for "/sbin/mdadm --examine --scan --config=partitions > /tmp/libstorage-ya4USK/mdadm.conf"
2014-11-26 13:46:20 <1> linux(2645) [libstorage] AsciiFile.cc(reload):62 loading file /tmp/libstorage-ya4USK/mdadm.conf
2014-11-26 13:46:20 <1> linux(2645) [libstorage] AsciiFile.cc(logContent):119 content of /tmp/libstorage-ya4USK/mdadm.conf
2014-11-26 13:46:20 <1> linux(2645) [libstorage] SystemCmd.cc(execute):90 SystemCmd Executing:"/sbin/mdadm --assemble --scan --config=/tmp/libstorage-ya4USK/mdadm.conf"
2014-11-26 13:46:20 <1> linux(2645) [libstorage] SystemCmd.cc(doExecute):279 stopwatch 0.046732s for "/sbin/mdadm --assemble --scan --config=/tmp/libstorage-ya4USK/mdadm.conf"
2014-11-26 14:13:48 <1> linux(2645) [libstorage] AsciiFile.cc(reload):62 loading file /mnt/etc/mdadm.conf

13:46 is the time of the first install, I believe, but I'm not sure about that. After the reboot failed, I commented out the /dev/md... entry in fstab, rebooted and used the YaST partitioner to reinstall the RAID. This time it was /dev/md127. Today I reinstalled as I wrote before; same error. Then I restored via clonezilla, as I have to work on the system. Interesting: after restoring sda1 the raid fails (maybe because I had to repartition), and when I reinstalled the raid in the YaST partitioner it got the name /dev/md0 ...? thx & cu jth

I've tested this on 13.2 and there it works correctly. /etc/mdadm.conf is there after the initial reboot, and the symlinks are created with the raid device names:

# ll /dev/md
total 0
lrwxrwxrwx 1 root root 8 Dez 11 16:56 testraid1 -> ../md127
lrwxrwxrwx 1 root root 8 Dez 11 16:56 testraid2 -> ../md126

Compare with comment #7:

# /sbin/mdadm --detail --export /dev/md127
MD_LEVEL=raid1
MD_DEVICES=2
MD_METADATA=1.0
MD_UUID=2a1cc153:ebca019e:5042aecc:97048d55
MD_DEVNAME=testraid1
MD_NAME=any:testraid1
[...]

@Thomas that's ok. As the workaround is easy (just correct the entries after first boot, or simply set up the raid after a second reboot), I think there's no need to work on this further if it's fixed in 13.2. If 13.1 becomes an "evergreen" version, someone should add a hint in the notes to be careful with md-RAIDs ... thx jth P.S. I'm not familiar with the rules here. If it's ok for you, someone can close this.

Ok, thanks for the feedback. I also verified that the issue is fixed on SLES12. Closing the bug as fixed in the latest version.