Bug 590744

Summary: Unclean ext3 after online update
Product: [openSUSE] openSUSE 11.3
Reporter: Forgotten User UiLQSAEfTt <forgotten_UiLQSAEfTt>
Component: Kernel
Assignee: Jan Kara <jack>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P5 - None
CC: forgotten_UiLQSAEfTt, jeffm, ricreig
Version: Final
Target Milestone: Final
Hardware: i586
OS: openSUSE 11.3
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: screenshot of fsck.ext3

Description Forgotten User UiLQSAEfTt 2010-03-24 09:10:34 UTC
Created attachment 350223
screenshot of fsck.ext3

User-Agent:       Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.8) Gecko/20100219 Firefox/3.5.8

I installed a minimal openSUSE 11.2 (AMD64) system on an Opteron machine using the sata_nv driver.

Partitions:
/dev/sda1 128M ext3 /boot
/dev/sda2 16G swap -
/dev/sda3 rest ext3 /

The first boot works fine.


Reproducible: Always

Steps to Reproduce:
1. Install openSUSE
2. zypper ref
3. zypper up; sync; reboot
Actual Results:  
After the reboot, the journal on /dev/sda3 has to be recovered.

Expected Results:  
clean fs

I am almost sure this happens with other SATA controllers (e.g. ahci) as well.
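
Whether the root filesystem was left unclean can be checked before it is mounted again (for example from a rescue system), since ext3 sets the needs_recovery feature flag while the journal is in use and clears it on a clean unmount. A minimal sketch, assuming the superblock header is read with `dumpe2fs -h`; the helper name journal_needs_recovery is made up for illustration and is not part of e2fsprogs:

```shell
# Hypothetical helper: reads `dumpe2fs -h` output on stdin and succeeds
# (exit status 0) if the "Filesystem features:" line lists needs_recovery.
journal_needs_recovery() {
    grep -E '^Filesystem features:' | grep -qw 'needs_recovery'
}

# Intended use, as root and on an UNMOUNTED device (a filesystem that is
# mounted read-write always carries the flag). /dev/sda3 is the root
# partition from this report:
#   dumpe2fs -h /dev/sda3 2>/dev/null | journal_needs_recovery \
#       && echo "journal still needs recovery"
```

The helper only inspects text, so it can also be run against a saved copy of the dumpe2fs output.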
Comment 2 Forgotten User UiLQSAEfTt 2010-03-29 08:46:17 UTC
I tested on a Core 2 Duo with ICH9DO and a Xeon with ICH10 in AHCI mode: same issue.
Comment 4 Jeff Mahoney 2010-04-02 15:00:21 UTC
The important bit isn't the booting part, it's the shutdown part. Can you capture that somehow? I expect it's going to say something about the device staying busy, and if that's the case, we'll need to identify what didn't get shut down properly.
Comment 5 Forgotten User UiLQSAEfTt 2010-04-07 09:22:34 UTC
I don't see anything strange.

http://dedi3.fuckner.net/~molli123/temp/update.avi (9MB)
Comment 6 Forgotten User UiLQSAEfTt 2010-06-04 06:45:06 UTC
Is there any more info I can provide?
Comment 7 Richard Creighton 2010-07-14 23:24:36 UTC
I have a problem that seems related. It seems to be related to the unmount at shutdown and in my case affects my LVM.

In my case, my LVM handles /home, which is ALWAYS fsck'ed when I reboot because it isn't cleanly unmounted. The other day, I tried a 'umount /home' as root and fsck'ed it, and it was bad without the reboot. Therefore, I think that even if the LVM ends up corrupted at shutdown, it is not because of the shutdown, it is because of the unmount operation itself, and like you, I think this may well be a kernel issue, as I doubt 'umount' itself would cause this.

So far, if I switch to root after logging out of the user account and forgo unmounting /home (on the LVM), fsck reports it as clean; but if I umount it and re-run fsck, more often than not it tells me the filesystem is not clean and a full fsck is required. I have run fsck with the badblocks check and it found no bad blocks on the LVM.

This problem was first noticed on the 11.3 M7 -> RC1 update. I have tested every drive in the LVM with a separate diagnostic tool on another machine and it passes every test, and these are quite new drives (WD 1TB drives). I have also tested the memory overnight with 2 different memory testers (2 GB, AMD dual-core 4000+). I don't believe this to be a hardware problem.

FWIW, it is currently formatted as EXT4, but it did the same thing when it was EXT3.

If this is a different problem, I will file another bug report, but I believe it is a kernel problem that shows during umount, most often noticed when shutting down, which seems to be the theme of this bug report.
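
The umount-then-check sequence described in this comment could be scripted roughly as follows. This is only a sketch: /dev/system/home is a hypothetical logical volume name, describe_e2fsck_status is a helper invented here, and the bit meanings are the exit codes documented in the e2fsck man page.

```shell
# Hypothetical helper: translate an e2fsck exit status (a bitmask) into words.
describe_e2fsck_status() {
    fsck_status=$1
    if [ "$fsck_status" -eq 0 ]; then
        echo "clean"
        return 0
    fi
    fsck_msg=""
    # 1: errors corrected, 2: corrected and reboot advised,
    # 4: errors left uncorrected, 8: operational error.
    if [ $((fsck_status & 1)) -ne 0 ]; then fsck_msg="$fsck_msg errors-corrected"; fi
    if [ $((fsck_status & 2)) -ne 0 ]; then fsck_msg="$fsck_msg reboot-needed"; fi
    if [ $((fsck_status & 4)) -ne 0 ]; then fsck_msg="$fsck_msg errors-uncorrected"; fi
    if [ $((fsck_status & 8)) -ne 0 ]; then fsck_msg="$fsck_msg operational-error"; fi
    # 16, 32 and 128 (usage error, cancelled, library error) are lumped together.
    if [ $((fsck_status & 240)) -ne 0 ]; then fsck_msg="$fsck_msg other"; fi
    echo "${fsck_msg# }"
}

# Intended use, as root (the device path is an assumption, adjust as needed):
#   umount /home
#   e2fsck -f -n /dev/system/home    # -f forces a full check, -n changes nothing
#   describe_e2fsck_status $?
```

Running e2fsck with -n keeps the check read-only, so the output can be attached to the report without altering the filesystem.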
Comment 8 Jeff Mahoney 2010-09-03 17:28:53 UTC
openSUSE 11.2 is in security-maintenance mode. Please reopen if this issue still occurs with 11.3 or Factory.
Comment 9 Richard Creighton 2010-09-03 18:12:42 UTC
(In reply to comment #8)
> openSUSE 11.2 is in security-maintenance mode. Please reopen if this issue
> still occurs with 11.3 or Factory.

The problem DID NOT go away with the release of 11.3 GM.

This seems to be part of a more general problem with LVM but manifests itself in this particularly annoying manner. From my experiments, I don't think it is the umount program. I have seen problems with the partitioner involving LVM, and spurious errors detected during fsck of the LVM, never in the same block/sector twice, while badblocks checks verify the drives used (on another machine and on the same machine outside of the LVM environment).

The problem is *worse* with EXT4 but also exists with EXT3. The drives are all Western Digital, a mixture of 1TB and 400G drives previously used in my raid5 and system area and now in use as LVM for /tmp, /var/log and /multimedia/workarea. The raid5 was upgraded to all 1TB drives. All drives used in the LVM have repeatedly been checked for bad blocks, both on the 'affected' machine and on a 2nd machine by physically moving them. One machine uses an ASUS motherboard with its internal SATA controller, the other uses a RocketRaid controller card, but neither finds problems with any of the drives/partitions in the LVM. The errors appear only when the drives are used IN an LVM, and the problem also occurs when I use substitute temporary drives. In other words, I don't think it is a hardware problem, and as I noted, I first saw it in 11.3 RC1; prior to that, it had been working for me.
Comment 10 Jan Kara 2011-04-26 14:55:28 UTC
Richard, can you share more details about the problem? If I understand your comments correctly, you say that if you umount the /home directory and then run e2fsck, it finds problems with the filesystem? Can you attach the output of e2fsck here? Thanks.
Comment 11 Jan Kara 2011-05-04 19:40:56 UTC
Ping?
Comment 12 Jan Kara 2011-05-09 10:21:40 UTC
OK, Richard sent me a private email saying he has moved away from 11.3 and does not observe the problem anymore because he does not reboot his 11.4 machines. Michael's report looks very much like bug 450196, so I'm closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 450196 ***