Bug 950999

Summary: Tumbleweed Live USB Stick broken from btrfs/kernel bug killing systemd-journal
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Konstantin Voinov <kv>
Component: Kernel
Assignee: Marcus Schaefer <ms>
Status: VERIFIED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P5 - None
CC: bwiedemann, coolo, crrodriguez, cyberorg, dsterba, forgotten_-GI11M788Y, forgotten_b58K0yj8zK, hillwoodroc, jengelh, ms, rbrown, tilman.vogel, tiwai
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: SUSE Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: serial log with backtrace

Description Konstantin Voinov 2015-10-19 12:21:22 UTC
I wrote openSUSE-Tumbleweed-Rescue-CD-x86_64-Snapshot20151014-Media.iso and, separately, openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20151012-Media.iso to a physical ("hardware") USB flash drive.

Both start to boot fine, but stall while starting the journald service with the message:

"A start job is running for Journal Service" for 8 minutes, until the watchdog kills it.

In a VM these ISOs work fine. The USB drive itself is fully functional; it works with the 13.2 Rescue image, for example.
Comment 1 Konstantin Voinov 2015-10-19 12:47:02 UTC
The drive was prepared with dd (under Linux) and with OSForensics ImageUSB.
Comment 2 Bernhard Wiedemann 2015-10-26 14:02:55 UTC
I can reproduce it here,
both on hardware and in a VM, using:

dd if=openSUSE-Tumbleweed-Rescue-CD-x86_64-Snapshot20151017-Media.iso of=test.img bs=64k
dd if=/dev/zero bs=1M count=1000 >> test.img
qemu-kvm -hda test.img -m 1000 -vnc :0 -monitor stdio

I guess we do not cover live images on USB/HDD in openQA testing...

BTW: leaving out the appended zeroes makes it work
(not an option for a physical USB stick, though).
Comment 3 Dr. Werner Fink 2015-10-27 08:12:25 UTC
IMHO you should disable persistent journaling on a USB flash drive, that is, remove /var/log/journal and set Storage=volatile in /etc/systemd/journald.conf.

-> man:systemd-journald.8.gz
-> man:journald.conf(5)

Otherwise systemd-journald writes its journal below /var/log/journal/<machine-id> and memory-maps the log file.

IMHO this is INVALID
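A minimal shell sketch of the suggestion above, assuming a systemd version that reads drop-ins from /etc/systemd/journald.conf.d/ (see journald.conf(5)). The helper name make_volatile_journal_conf and the drop-in file name volatile.conf are hypothetical. It takes a root directory so it can also be pointed at a mounted image rather than the running system:

```shell
# Hypothetical helper: switch journald to volatile (RAM-only) storage
# below the root directory given as $1 (use / for the running system).
make_volatile_journal_conf() {
    root="${1:?usage: make_volatile_journal_conf ROOT}"
    dropin_dir="$root/etc/systemd/journald.conf.d"

    mkdir -p "$dropin_dir"
    # Storage=volatile keeps the journal under /run/log/journal only.
    printf '[Journal]\nStorage=volatile\n' > "$dropin_dir/volatile.conf"

    # Remove the persistent journal directory, as suggested in comment 3.
    rm -rf -- "$root/var/log/journal"
}
```

On a running system, make_volatile_journal_conf / followed by systemctl restart systemd-journald would apply it. A drop-in is used instead of editing journald.conf in place so the packaged file stays untouched.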
Comment 4 Bernhard Wiedemann 2015-10-27 11:36:12 UTC
Via the serial console I found out that it is a btrfs/kernel fault
that crashes systemd-journald.
Comment 5 Bernhard Wiedemann 2015-10-27 11:36:42 UTC
Created attachment 653346
serial log with backtrace
Comment 6 Dr. Werner Fink 2015-10-27 12:14:07 UTC
(In reply to Bernhard Wiedemann from comment #4)

BtrFS on a USB flash drive, ouch.
Comment 7 David Sterba 2015-10-27 12:38:03 UTC
(In reply to Dr. Werner Fink from comment #6)
> BtrFS on a USB flash drive, ouch.

Why?
Comment 8 Bernhard Wiedemann 2015-10-28 08:36:43 UTC
The download itself only contains an ISO9660 filesystem,
but for some reason kiwi creates a btrfs filesystem on the remaining space;
file(1) says it is
> BTRFS Filesystem (label "hybrid", sectorsize 4096, nodesize 4096, leafsize 4096)

and probably overlays it as persistent storage
instead of the tmpfs used when booting it as CDROM.

And that is where the btrfs/kernel invalid pointer dereference happens
and brings the whole boot process to a standstill.


(In reply to David Sterba from comment #7)
> (In reply to Dr. Werner Fink from comment #6)
> > BtrFS on a USB flash drive, ouch.
> 
> Why?

I think it is not good because systemd-journald
(same as old syslogd)
does synchronous writes to the filesystem for each new entry,
which causes a large write-amplification factor
combined with the fact that most USB flash drives
have very poor wear leveling algorithms
which means that you can physically destroy the flash hardware
within a few days of operating this way.
Comment 9 Takashi Iwai 2015-11-05 11:59:24 UTC
(In reply to Bernhard Wiedemann from comment #8)
> The download itself only contains an ISO9660 filesystem,
> but for some reason kiwi creates a btrfs filesystem on the remaining space;
> file(1) says it is
> > BTRFS Filesystem (label "hybrid", sectorsize 4096, nodesize 4096, leafsize 4096)
> 
> and probably overlays it as persistent storage
> instead of the tmpfs used when booting it as CDROM.

Yes, it's a kiwi hybrid image, which is what is used for live images. You can save data there. In the past the overlay was ext3/4, but it seems that btrfs is now taken as the default fs there, too.
 
> And that is where the btrfs/kernel invalid pointer dereference happens
> and brings the whole boot process to a standstill.
 
Right, this doesn't look good.  Do you still see this with the latest image?

I wonder whether we can test it on KVM somehow...

> (In reply to David Sterba from comment #7)
> > (In reply to Dr. Werner Fink from comment #6)
> > > BtrFS on a USB flash drive, ouch.
> > 
> > Why?
> 
> I think it is not good because systemd-journald
> (same as old syslogd)
> does synchronous writes to the filesystem for each new entry,
> which causes a large write-amplification factor
> combined with the fact that most USB flash drives
> have very poor wear leveling algorithms
> which means that you can physically destroy the flash hardware
> within a few days of operating this way.

Well, btrfs isn't worse than others in this regard; if anything, it's better.
But it's quite off topic, so let's stop here. It's only worth discussing if there is a very solid reason not to use btrfs on USB flash, and then better somewhere other than Bugzilla.
Comment 10 Jeff Mahoney 2015-11-06 19:22:48 UTC
(In reply to Bernhard Wiedemann from comment #5)
> Created attachment 653346
> serial log with backtrace

This looks like a regression introduced in 4.2 that broke the assumption that file->f_path.dentry->d_inode is the same as file->f_inode.  What happens is that ->fsync gets a file pointer with file->f_path.dentry->d_inode that points to an overlayfs inode instead of a btrfs inode.  What you see is the aftermath.

I can work around it partially, but before committing to anything I'm discussing on linux-fsdevel whether that breakage was intentional. Until then, overlayfs is just unsafe to use on btrfs.
Comment 11 Takashi Iwai 2015-11-16 16:54:52 UTC
*** Bug 955085 has been marked as a duplicate of this bug. ***
Comment 12 Takashi Iwai 2015-11-16 16:55:38 UTC
This bug seems to hit Leap as well. So, if there is a fix, don't forget to backport it to the 4.1.x kernel, too.
Comment 13 Takashi Iwai 2015-11-16 20:33:04 UTC
(In reply to Jeff Mahoney from comment #10)
> (In reply to Bernhard Wiedemann from comment #5)
> > Created attachment 653346
> > serial log with backtrace
> 
> This looks like a regression introduced in 4.2 that broke the assumption
> that file->f_path.dentry->d_inode is the same as file->f_inode.

Bug 955085 shows a similar log, but with the Leap 4.1.12 kernel. So the cause exists even before 4.2?
Comment 14 Takashi Iwai 2015-11-20 11:32:09 UTC
(In reply to Takashi Iwai from comment #13)
> (In reply to Jeff Mahoney from comment #10)
> > (In reply to Bernhard Wiedemann from comment #5)
> > > Created attachment 653346
> > > serial log with backtrace
> > 
> > This looks like a regression introduced in 4.2 that broke the assumption
> > that file->f_path.dentry->d_inode is the same as file->f_inode.
> 
> Bug 955085 shows a similar log, but with the Leap 4.1.12 kernel. So the cause
> exists even before 4.2?

OK, this turned out to come from a backport in the stable 4.1.x tree: the very same offending commit 4bacc9c9234c was backported there.
Comment 15 Takashi Iwai 2015-11-20 11:34:43 UTC
Meanwhile, we can work around it by changing the fs type to another one, e.g. ext4. It's just storage for a live image, so snapshots and other nice features aren't that important; a lighter fs is often the better choice.

Coolo, could you modify the kiwi config.xml to specify the filesystem attribute, if not done yet?
Comment 16 Stephan Kulow 2015-11-20 12:54:23 UTC
I don't really want to. This is the default of the kiwi template. If btrfs is unusable for that, kiwi's default should change.
Comment 17 Takashi Iwai 2015-11-20 13:02:02 UTC
(In reply to Stephan Kulow from comment #16)
> I don't really want to. This is the default of the kiwi template. If btrfs
> is unusable for that, kiwi's default should change.

btrfs is unusable for that only with 4.1.x and later kernels, i.e. Leap and TW.
I don't mind where it's changed, but this should be a short stop-gap until the real fix is done, so in general it's easier to apply it in the upper-level config.
Comment 18 Bernhard Wiedemann 2015-11-26 07:29:09 UTC
Marcus, can you please change the default hybrid overlay fs from btrfs to ext4
so that we get working USB Live images again?
Comment 21 Takashi Iwai 2015-11-26 09:54:36 UTC
The issue hits Leap, too. Could you backport the fix and submit it to Leap maintenance?
Comment 22 Bernhard Wiedemann 2015-11-26 10:00:15 UTC
This is an autogenerated message for OBS integration:
This bug (950999) was mentioned in
https://build.opensuse.org/request/show/346339 Factory / kiwi
https://build.opensuse.org/request/show/346341 42.1 / kiwi
Comment 23 Swamp Workflow Management 2015-12-08 11:12:53 UTC
openSUSE-RU-2015:2225-1: An update that has 5 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 944017,946387,949046,950735,950999
CVE References: 
Sources used:
openSUSE Leap 42.1 (src):    kiwi-7.03.38-5.4
Comment 24 Forgotten User -GI11M788Y 2015-12-30 01:05:21 UTC
*** Bug 960229 has been marked as a duplicate of this bug. ***
Comment 25 Tilman Vogel 2016-02-03 16:47:43 UTC
As of today, the rescue image for x86_64 linked here still exhibits the problem: 

https://en.opensuse.org/openSUSE:Tumbleweed_installation#Intel_.2864-bit_and_32-bit.29_2

Is there a newer image available?
Comment 26 Bernhard Wiedemann 2016-02-05 08:36:42 UTC
Reopening: a btrfs filesystem is still created
for the hybrid overlay partition.
Comment 27 Marcus Schaefer 2016-02-05 09:19:34 UTC
Have you set hybridpersistent_filesystem="ext4" in the configuration?
Which OBS project builds this image?
Comment 28 Jigish Gohil 2016-02-05 09:28:05 UTC
It is built from config in https://build.opensuse.org/project/show/openSUSE:Factory:Live

Changes mentioned in this bug report are included in:

https://build.opensuse.org/request/show/357822
Comment 29 Marcus Schaefer 2016-02-05 09:39:09 UTC
Thanks

   openSUSE:Factory:Live/kiwi-config-openSUSE/config.xml.in

Add to the type section:

   hybridpersistent_filesystem="ext4"
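To check which filesystem a given kiwi config will request for the hybrid overlay, a small shell sketch may help. The helper name hybrid_overlay_fs is hypothetical, and it is only a plain text match, not an XML parser: it assumes the attribute appears at most once and on a single line.

```shell
# Hypothetical helper: print the value of hybridpersistent_filesystem from a
# kiwi config file, or "unset" if the attribute is absent (kiwi then uses its
# own default, which per this report was btrfs at the time).
# Text match only: assumes the attribute is on one line and appears once.
hybrid_overlay_fs() {
    conf="${1:?usage: hybrid_overlay_fs CONFIG_FILE}"
    value=$(sed -n 's/.*hybridpersistent_filesystem="\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}
```

Run against config.xml.in with the change above in place, it would be expected to print ext4; "unset" means the image falls back to kiwi's default.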
Comment 30 Bernhard Wiedemann 2016-03-01 07:17:46 UTC
Confirmed that the Rescue CD works now.