Bug 807188

Summary: kernel panic after resume in fat_detach
Product: [openSUSE] openSUSE 12.3 Reporter: Juergen Weigert <jw>
Component: Hotplug Assignee: Jan Kara <jack>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: jeffm, oneukum
Version: RC 1   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on:    
Bug Blocks: 357354    

Description Juergen Weigert 2013-03-03 13:36:46 UTC
An external VGA monitor was connected and an sdcard was left in the slot 
when the machine went into s2ram.
When resuming, this happened:

[29232.811649] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[29232.811659] e1000e 0000:02:00.0 eth0: 10/100 speed: disabling TSO
[29232.811771] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[29234.954065] mmc0: card b368 removed
[29234.958009] sdhci-pci 0000:15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[29234.992434] VFS: Busy inodes after unmount of mmcblk0p1. Self-destruct in 5 seconds.  Have a nice day...
[29234.993063] BUG: unable to handle kernel NULL pointer dereference at 000000b4
[29234.993138] IP: [<c0707cc1>] _raw_spin_lock+0x11/0x30
[29234.993190] *pdpt = 0000000035e7b001 *pde = 0000000000000000 
[29234.993243] Oops: 0002 [#1] PREEMPT SMP 
[29234.993291] Modules linked in: nls_iso8859_1 nls_cp437 lp parport_pc ppdev parport usblp cdc_acm btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix vfat msdos fat jfs xfs reiserfs dm_mod fuse af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave snd_hda_codec_analog snd_hda_intel snd_hda_codec acpi_cpufreq snd_hwdep snd_pcm mperf thinkpad_acpi snd_seq coretemp kvm_intel btusb kvm snd_timer mmc_block sdhci_pci sdhci pcmcia bluetooth i2c_i801 snd_seq_device iTCO_wdt iTCO_vendor_support snd mmc_core lpc_ich sg yenta_socket pcmcia_rsrc arc4 e1000e iwl3945 iwlegacy mac80211 mfd_core firewire_ohci firewire_core microcode pcspkr pcmcia_core nsc_ircc irda soundcore cfg80211 crc_itu_t rfkill crc_ccitt snd_page_alloc battery ac tpm_tis tpm tpm_bios autofs4 i915 drm_kms_helper thermal drm i2c_algo_bit button processor video thermal_sys scsi_dh_alua scsi_dh_rdac scsi_dh_emc scsi_dh_hp_sw scsi_dh ata_generic pata_acpi ata_piix
[29234.994022] Pid: 1093, comm: gvfsd-trash Not tainted 3.7.10-1.1-desktop #1 LENOVO 1704R8G/1704R8G
[29234.994022] EIP: 0060:[<c0707cc1>] EFLAGS: 00010202 CPU: 1
[29234.994022] EIP is at _raw_spin_lock+0x11/0x30
[29234.994022] EAX: 000000b4 EBX: d23d3150 ECX: 00000002 EDX: 00000100
[29234.994022] ESI: 00000000 EDI: 000000b4 EBP: f1b75f3c ESP: f09fff5c
[29234.994022]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[29234.994022] CR0: 8005003b CR2: 000000b4 CR3: 35ef9000 CR4: 000007f0
[29234.994022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[29234.994022] DR6: ffff0ff0 DR7: 00000400
[29234.994022] Process gvfsd-trash (pid: 1093, ti=f09fe000 task=f1b250b0 task.ti=f09fe000)
[29234.994022] Stack:
[29234.994022]  f836a476 d23d3150 d23d31ec f836d400 c035af1d f5eebb80 f1b75f00 f5eebb94
[29234.994022]  c037d839 00000000 d23d3150 f087a480 f5eebb80 f1b75f4c 00000001 c037f2dc
[29234.994022]  00000001 00000007 08b13228 08b13198 f09fe000 c0707ee0 00000007 00000009
[29234.994022] Call Trace:
[29234.994022]  [<f836a476>] fat_detach+0x26/0xd0 [fat]
[29234.994022]  [<c035af1d>] evict+0x8d/0x150
[29234.994022]  [<c037d839>] fsnotify_destroy_mark+0x149/0x170
[29234.994022]  [<c037f2dc>] sys_inotify_rm_watch+0x5c/0xa0
[29234.994022]  [<c0707ee0>] syscall_call+0x7/0xb
[29234.994022]  [<b7387db1>] 0xb7387db0
[29234.994022] Code: 26 00 f3 90 0f b6 11 38 c2 75 f7 c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 89 e2 81 e2 00 e0 ff ff 83 42 14 01 ba 00 01 00 00 <f0> 66 0f c1 10 0f b6 ce 38 d1 74 0c 8d 76 00 f3 90 0f b6 10 38
[29234.994022] EIP: [<c0707cc1>] _raw_spin_lock+0x11/0x30 SS:ESP 0068:f09fff5c
[29234.994022] CR2: 00000000000000b4
[29235.093957] ---[ end trace 88265abf28ed2f16 ]---
[29235.093965] note: gvfsd-trash[1093] exited with preempt_count 1
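
The fault address in the oops is small (0xb4) rather than zero, and EAX, EDI and CR2 all hold that same value. That pattern is what taking a lock through a NULL structure pointer looks like: the CPU faults at NULL plus the offset of the lock member. A minimal sketch of that arithmetic follows. The struct and function names are hypothetical, and the padding is chosen only so that the lock lands at offset 0xb4; it does not reproduce the real layout of any FAT superblock structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the padding is sized so
 * that the lock member sits at offset 0xb4, matching CR2 in the oops. */
struct demo_sb_info {
    unsigned char other_fields[0xb4];
    unsigned int  demo_lock;
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct demo_sb_info, demo_lock) == 0xb4,
              "demo_lock must sit at offset 0xb4");

/* Address the CPU would touch when taking &sbi->demo_lock, computed
 * with integer arithmetic so that nothing is actually dereferenced. */
static uintptr_t demo_lock_address(uintptr_t sbi_base)
{
    return sbi_base + offsetof(struct demo_sb_info, demo_lock);
}
```

With a base of 0 (a NULL pointer), demo_lock_address() yields 0xb4, the address reported in the oops.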
Comment 1 Jeff Mahoney 2013-07-15 18:57:20 UTC
Ok, the real problem here isn't the oops, it's this:

[29234.992434] VFS: Busy inodes after unmount of mmcblk0p1. Self-destruct in 5 seconds.  Have a nice day...


... anything can happen once we see that message.
Comment 2 Jan Kara 2013-07-16 20:22:32 UTC
Yeah, and I'd also add that just before it is this line:

[29234.954065] mmc0: card b368 removed

So what likely happened is that after resume we got an event about the removal of the card, and that triggered an attempt to unmount the filesystem on the card. But some inodes were busy (open files on the fs are enough for that), so that failed.

I'm not sure whether the remove request from the card reader is OK. It makes some sense, since someone could have removed the card and inserted a different one while the machine was suspended. OTOH we have no way of unmounting a filesystem while it is in use (we'd need revoke support for that), so a removal event in such a case will likely result in oopses like the one above.

Maybe Oliver will know whether a removal event from the card reader is expected or not. Oliver?
Comment 3 Oliver Neukum 2013-07-16 20:44:32 UTC
(In reply to comment #2)

> Maybe Oliver will know whether removal event from the card reader is expected
> or not. Oliver?

USB card readers almost invariably report a medium change. PCI readers usually do. But in any case, surprise removal must never crash the system, for whatever reason it may happen.
Comment 4 Jeff Mahoney 2013-07-16 21:00:17 UTC
A removal event shouldn't result in a crashed kernel -- it should be handled like a disk path being lost. Spew I/O errors all day, but we shouldn't be getting to the point where we're allowing a umount to succeed with open inodes. That seems like we're missing some refcounting.
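
The refcounting rule described here can be sketched as follows: teardown is refused while anything still pins an inode, and only proceeds once the last reference is dropped. This is a toy model with hypothetical names (toy_fs, toy_get, toy_put, toy_umount); it does not mirror how the VFS actually tracks mounts or inodes.

```c
#include <assert.h>
#include <errno.h>

/* Toy model, not kernel code: one counter standing in for every
 * reference that still pins an inode on the filesystem. */
struct toy_fs {
    int pinned_refs;   /* references held by open files, watches, ... */
    int mounted;       /* 1 while the filesystem is mounted */
};

/* Take one reference. */
static void toy_get(struct toy_fs *fs)
{
    fs->pinned_refs++;
}

/* Drop one reference. */
static void toy_put(struct toy_fs *fs)
{
    assert(fs->pinned_refs > 0);   /* an unbalanced put is a bug */
    fs->pinned_refs--;
}

/* Refuse to tear down while anything is still pinned. */
static int toy_umount(struct toy_fs *fs)
{
    if (fs->pinned_refs > 0)
        return -EBUSY;
    fs->mounted = 0;
    return 0;
}
```

In this model, a toy_umount() call made while a reference is held returns -EBUSY and leaves the filesystem mounted. The "Busy inodes after unmount" message indicates the opposite outcome: teardown went ahead while references were still outstanding.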
Comment 5 Jan Kara 2013-07-16 22:04:48 UTC
Yeah, right. The umount should have failed in the first place. Since we oopsed while running sys_inotify_rm_watch(), it seems that inotify was holding the inode reference that prevented evict_inodes() from removing all the inodes. However, fsnotify_unmount_inodes() should have dropped all the references that inotify was holding... unless there's a bug in the fsnotify code and we can grab an inode reference twice. But then I would guess we would see the same bug more often (it's not limited to fat & suspend). Strange. Anyway, I'll check the inotify code in more detail tomorrow.
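
For reference, the userspace side of the path in the call trace looks roughly like the sketch below: a process adds an inotify watch, which keeps the watched inode referenced, and later removes it with inotify_rm_watch(). The helper name demo_watch_cycle is hypothetical, and this is not taken from gvfsd-trash. It only shows the syscall sequence and does not attempt to reproduce the race.

```c
#include <assert.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Add an inotify watch on `path` and remove it again.
 * Returns 0 on success, -1 if any step fails. */
static int demo_watch_cycle(const char *path)
{
    assert(path != NULL);

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;

    /* While the watch exists, the kernel keeps the watched inode pinned. */
    int wd = inotify_add_watch(fd, path, IN_CREATE | IN_DELETE);
    if (wd < 0) {
        close(fd);
        return -1;
    }

    /* This is the syscall at the bottom of the call trace in the report. */
    int rc = inotify_rm_watch(fd, wd);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

In the reported crash, the equivalent of the inotify_rm_watch() step ran after the filesystem had already been torn down, so dropping the last inode reference led into evict() and fat_detach() on stale state.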
Comment 6 Jan Kara 2013-07-17 11:57:22 UTC
OK, so this really is a bug in fsnotify, and it has been fixed upstream by changing the locking of fsnotify (merge commit 96680d2b9174668100824d763382240c71baa811). Since the changes went into 3.8-rc1, they should be relatively easy to backport (although it's quite a few patches).
Comment 7 Jan Kara 2013-07-17 13:06:10 UTC
OK, I've just pushed 11 relevant patches from the merge to the openSUSE-12.3 tree.
Since this is a hard-to-trigger race, I don't think waiting for reproduction is
reasonable. So I'm closing the bug now.
Comment 8 Jan Kara 2013-07-17 13:06:36 UTC
Ah, forgot to set proper state after mid-air collision...
Comment 9 Swamp Workflow Management 2013-12-30 20:06:12 UTC
openSUSE-SU-2013:1971-1: An update that solves 34 vulnerabilities and has 19 fixes is now available.

Category: security (moderate)
Bug References: 799516,801341,802347,804198,807153,807188,807471,808827,809906,810144,810473,811882,812116,813733,813889,814211,814336,814510,815256,815320,816668,816708,817651,818053,818561,821612,821735,822575,822579,823267,823342,823517,823633,823797,824171,824295,826102,826350,826374,827749,827750,828119,828191,828714,829539,831058,831956,832615,833321,833585,834647,837258,838346
CVE References: CVE-2013-0914,CVE-2013-1059,CVE-2013-1819,CVE-2013-1929,CVE-2013-1979,CVE-2013-2141,CVE-2013-2148,CVE-2013-2164,CVE-2013-2206,CVE-2013-2232,CVE-2013-2234,CVE-2013-2237,CVE-2013-2546,CVE-2013-2547,CVE-2013-2548,CVE-2013-2634,CVE-2013-2635,CVE-2013-2851,CVE-2013-2852,CVE-2013-3222,CVE-2013-3223,CVE-2013-3224,CVE-2013-3226,CVE-2013-3227,CVE-2013-3228,CVE-2013-3229,CVE-2013-3230,CVE-2013-3231,CVE-2013-3232,CVE-2013-3233,CVE-2013-3234,CVE-2013-3235,CVE-2013-3301,CVE-2013-4162
Sources used:
openSUSE 12.3 (src):    kernel-docs-3.7.10-1.24.1, kernel-source-3.7.10-1.24.1, kernel-syms-3.7.10-1.24.1