Bugzilla – Full Text Bug Listing

| Summary: | dracut stop booting after btrfs rootfilesystem went full | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Diego Ercolani <diego.ercolani> |
| Component: | Bootloader | Assignee: | Thomas Renninger <trenn> |
| Status: | RESOLVED INVALID | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | dsterba, trenn |
| Version: | 13.2 | ||
| Target Milestone: | --- | ||
| Hardware: | x86-64 | ||
| OS: | openSUSE 13.2 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | a tar of all the files regarding this issue; rpm -Va showing some lack of libraries (not involved in boot process) | ||

This bug is probably tied to bug 905615.

I tried replacing the newly generated initrd with the initrd that worked until the 19th of May (before the filesystem got full); as far as I know, I have the same disk geometry and hardware as before. I have the same issue. So something happened to the filesystem?!

I tried to examine the filesystem from the rescue disk and everything seems fine: I can access every directory and every subvolume.

Please, as the system is completely down: before I reinstall everything (fortunately I have a backup), can someone point me to debugging steps that could also be helpful for other users?

There are no apparent errors that would be related to a failed mount of the root filesystem. The logs in rdsosreport-pre-mount.txt.gz contain the loading of the btrfs module and the first device scan; the mount attempt is not there. rdsosreport-pre-mount-2.txt.gz shows a successful mount, then systemd drops to the emergency shell. journal.gz seems from the POV of a filesystem.

(In reply to David Sterba from comment #3)
> journal.gz seems from the POV of a filesystem.

... seems ok ...

There are some errors on usb device 11-1, but otherwise nothing suspicious.

[52.130683] casaregno kernel: BTRFS info (device md127): disk space caching is enabled
[79.076469] casaregno systemd[1]: Started Dracut Emergency Shell.

The timestamp delta is about 27 seconds; this looks like some timeout, but there are no further details.

(In reply to Diego Ercolani from comment #2)
> I tried to replace the "new generated" initrd with the initrd that worked
> until 19th of May (before the filesystem got full) (as, as far as I know I
> have the same disk geometry and hardware like before)
> I have the same issue. So something happened to the fileystem?!?!

Hm, I'd try the same. It's still possible that the failed update left some package half updated and this stops the boot.

You can try to verify the installed files against the rpm database with 'rpm -vVa' ("verbose verify of all packages") and look for "missing" files or wrong md5 checksums.

Created attachment 635087 [details]
rpm -Va showing some lack of libraries (not involved in boot process)
For the usb device, yes, my motherboard is some kind of crap...
System Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: GA-990XA-UD3
I think it's the usb3 controller...
casaregno:~ # lsusb -s 11:1 -v
Bus 011 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x1d6b Linux Foundation
idProduct 0x0001 1.1 root hub
bcdDevice 3.16
iManufacturer 3 (error)
iProduct 2 OHCI PCI host controller
iSerial 1 0000:00:16.0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 4
wHubCharacteristic 0x0002
No power switching (usb 1.0)
Ganged overcurrent protection
bPwrOn2PwrGood 2 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0xff
Hub Port Status:
Port 1: 0000.0101 power connect
Port 2: 0000.0100 power
Port 3: 0000.0100 power
Port 4: 0000.0100 power
Device Status: 0x0001
Self Powered
For the test you suggested: I tried it, and there were indeed some missing libraries, as you can see in the rpmVa.log file I attached. I recovered the missing libraries and regenerated the initrd, but nothing changed.
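The verification suggested earlier in the thread produces a long listing, so a small filter can help pick out the relevant lines. The following is a minimal sketch, not something used in this report: the function name `rpm_verify_problems` is hypothetical, and it assumes the usual `rpm -V` output layout, where a line either starts with the word `missing` or with an attribute string whose third character is `5` when the file digest differs.

```shell
# Print only the `rpm -Va` lines that report a missing file or a changed digest.
# Reads verification output on stdin (hypothetical helper, assumed output layout).
rpm_verify_problems() {
    awk '
        # The file recorded in the rpm database is absent from disk.
        $1 == "missing" { print; next }
        # Attribute string such as "S.5....T.": position 3 is the digest flag.
        length($1) >= 8 && substr($1, 3, 1) == "5" { print }
    '
}
```

Typical use would be `rpm -Va 2>/dev/null | rpm_verify_problems`. Lines that only differ in modification time or permissions are dropped, which keeps the focus on files that are actually gone or altered.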
Solved! This is what I did:
1. create a logical volume to receive the btrfs root filesystem, created by formatting the device
2. create the whole subvolume structure in the new btrfs volume
3. copy all the files with "cp -ax" from every subvolume
4. mount -o bind all the /dev /sys /*** to the new rootfs
5. mkinitrd
6. grub2-install /dev/sda; grub2-install /dev/sdb
7. grub2-mkconfig -o /boot/grub2/grub.cfg
8. reboot

Everything is working now on the new partition. So my conclusion is: there is something odd in the original btrfs partition that the boot process cannot understand.

Since I resolved the issue on my own, do you think I can trash the old partition, or do you need me to try to understand what went wrong? (If so, please point out what to do.)

Thanks for the offer, nothing from me. I don't think the filesystem is corrupted or damaged in another way; that would already be indicated by some messages, and you were able to mount it manually.

Ok. I am not a fs specialist...
In future I'd like to postpone subsystem specific bugs (lvm, multipath, btrfs,...)
to submaintainers...
David: For btrfs that would be you ;)
No worries, there are not many, but with all dracut bugs counted up, it's
a lot of work that needs a lot of specialized knowledge...
So what shall we do with this one?
Is it a won't fix as everything works now and we never will find out what happened or is there something we can/should do?
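For other users who end up in the same state, the reporter's recovery steps can be laid out as a dry run. This is only a sketch under stated assumptions: the function name `print_migration_plan`, all paths and the device name are hypothetical, `/proc` is added to the bind mounts as an assumption (the report only names `/dev` and `/sys`), and running `mkinitrd` and the grub2 tools inside a chroot is a guess at how steps 5 to 7 were done. It prints the commands for review and executes nothing.

```shell
# Dry-run sketch: prints the commands for review instead of executing them.
#   $1     old root filesystem, mounted from the rescue system (hypothetical path)
#   $2     new block device, e.g. an LVM logical volume (hypothetical name)
#   $3     mount point for the new filesystem (hypothetical path)
#   $4...  subvolume paths relative to the root, e.g. the subvol= values in /etc/fstab
print_migration_plan() {
    old=$1
    dev=$2
    new=$3
    shift 3
    echo "mkfs.btrfs $dev"
    echo "mount $dev $new"
    for sv in "$@"; do
        # Nested subvolumes such as usr/local need their parent directory first.
        echo "mkdir -p $new/$(dirname "$sv")"
        echo "btrfs subvolume create $new/$sv"
    done
    # cp -x does not cross subvolume boundaries, so each one is copied separately.
    echo "cp -ax $old/. $new/"
    for sv in "$@"; do
        echo "cp -ax $old/$sv/. $new/$sv/"
    done
    # Assumption: /proc is bind-mounted too; the report names only /dev and /sys.
    for fs in dev proc sys; do
        echo "mount -o bind /$fs $new/$fs"
    done
    echo "chroot $new mkinitrd"
    echo "chroot $new grub2-install /dev/sda"
    echo "chroot $new grub2-install /dev/sdb"
    echo "chroot $new grub2-mkconfig -o /boot/grub2/grub.cfg"
}
```

For example, `print_migration_plan /mnt/old /dev/vg0/newroot /mnt/new usr/local var/log` lists the commands for two subvolumes. Since the fstab in this report mounts everything by `LABEL=rootfs`, the new filesystem would also need that label, or the fstab would need to be adjusted, before rebooting.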
David: Do we still have to do something here? I have no ideas where to look further. See comment #8. David expects the fs was really broken, not only full...
Created attachment 634966 [details]
a tar of all the files regarding this issue

I have a system that uses snapper to take periodic snapshots on a btrfs filesystem configured with volumes. This btrfs sits on a raid1 partition (managed by mdadm, as it can report errors via mail in case of degradation).

I experienced the classic problem with btrfs: as time goes by, "df -h" shows free space while for btrfs the volume is full. To manage this situation I added another partition to the btrfs and reclaimed the space with "btrfs balance start /"; I made a new initrd with "mkinitrd" and then rebooted. The dracut process hung (without dropping to an emergency shell) during the rootfs mount: on the console it appears that dracut is trying to mount the correct uuid, but for some reason it doesn't work.

So I booted with a rescue disk, removed some snapper-generated subvolumes to "make space" and then also removed the secondary btrfs volume (thinking that this would recover from the "boot hang") with "btrfs device delete <volume> <path>". I regenerated the initrd with mkinitrd and rebooted, with the same result: the boot process hangs trying to mount the rootfs.

So I set the dracut command line to drop to a shell during the boot pre-mount stage:

rd.break=pre-mount

This dropped correctly to a shell and generated the diagnostic logs that I attached (rdsosreport-pre-mount.txt). From the command line I issued the mount command:

mount /dev/disk/by-uuid/c591f436-cc33-42b1-a272-1fc85386e2cb /sysroot/

without any problem, then pressed control-d; dracut then dropped again to the emergency shell and generated the other log I attached (rdsosreport-pre-mount-2.txt). After that I simply pressed control-d again and the boot process started, presenting the openSUSE 13.2 banner, but the problem was not solved: this time systemd complained that it cannot mount (?) the root filesystem (which I had mounted from the dracut emergency console) and dropped me to an emergency login, where I found that the rootfs was correctly mounted and where I manually started the network and the ssh daemon for remote access. I also attach the dump of the journal regarding this issue, where you can see the issue at this line:

May 21 08:42:24 casaregno systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-rootfs.device.

Here is the environment:

[/etc/fstab]
...
LABEL=rootfs / btrfs compress 1 1
LABEL=rootfs /usr/local btrfs subvol=usr/local,compress 0 0
LABEL=rootfs /boot/grub2/i386-pc btrfs subvol=boot/grub2/i386-pc 0 0
#LABEL=rootfs /boot/grub2/x86_64-efi btrfs subvol=boot/grub2/x86_64-efi 0 0
#LABEL=rootfs /home btrfs subvol=home 0 0
LABEL=rootfs /opt btrfs subvol=opt 0 0
LABEL=rootfs /srv btrfs subvol=srv 0 0
LABEL=rootfs /tmp btrfs subvol=tmp 0 0
LABEL=rootfs /var/crash btrfs subvol=var/crash 0 0
#LABEL=rootfs /var/lib/mailman btrfs subvol=var/lib/mailman 0 0
#LABEL=rootfs /var/lib/named btrfs subvol=var/lib/named 0 0
#LABEL=rootfs /var/lib/pgsql btrfs subvol=var/lib/pgsql 0 0
LABEL=rootfs /var/log btrfs subvol=var/log 0 0
LABEL=rootfs /var/opt btrfs subvol=var/opt 0 0
LABEL=rootfs /var/spool btrfs subvol=var/spool 0 0
LABEL=rootfs /var/tmp btrfs subvol=var/tmp 0 0
LABEL=rootfs /.snapshots btrfs subvol=.snapshots 0 0
...

and the environment:

dracut-037-17.9.1.x86_64
kernel-desktop-devel-3.16.7-21.1.x86_64
kernel-devel-3.16.7-21.1.noarch
kernel-desktop-3.16.7-21.1.x86_64
btrfsmaintenance-0.1-1.1.noarch
btrfsprogs-3.16.2-4.1.x86_64
plymouth-dracut-0.9.0-1.1.x86_64
kernel-source-3.16.7-21.1.noarch
libbtrfs0-3.16.2-4.1.x86_64
kernel-macros-3.16.7-21.1.noarch
kernel-firmware-20141122git-5.1.noarch
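
The journal line quoted in the description names the device as a systemd unit, `dev-disk-by\x2dlabel-rootfs.device`, which is the escaped form of the path that `LABEL=rootfs` in the fstab resolves to. A minimal sketch of decoding such a name by hand follows; the helper name `unit_to_device_path` is hypothetical, and it only handles `.device` units with `\xNN` escapes. Newer systemd releases ship `systemd-escape` for this, which may not be present on a system of this age.

```shell
# Turn a systemd .device unit name back into the device path it waits for.
# Hypothetical helper; handles only the "-" and "\xNN" escaping rules.
unit_to_device_path() {
    printf '%s\n' "$1" | awk '
        function hexval(c) { return index("0123456789abcdef", tolower(c)) - 1 }
        {
            s = $0
            sub(/\.device$/, "", s)   # drop the unit type suffix
            gsub(/-/, "/", s)         # unescaped dashes stand for slashes
            out = ""
            # Replace each \xNN sequence with the character it encodes.
            while ((i = index(s, "\\x")) > 0) {
                code = hexval(substr(s, i + 2, 1)) * 16 + hexval(substr(s, i + 3, 1))
                out = out substr(s, 1, i - 1) sprintf("%c", code)
                s = substr(s, i + 4)
            }
            print "/" out s
        }'
}
```

From the emergency shell, `ls -l "$(unit_to_device_path 'dev-disk-by\x2dlabel-rootfs.device')"` would then show whether the by-label symlink that systemd timed out on actually exists, even though the mount by UUID succeeded.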