Bugzilla – Bug 332588
libata: pata_ali: exceptions and timeouts in log and extremely slow system
Last modified: 2008-04-03 04:55:31 UTC
On an HP Compaq nx9005 the pata_ali driver is now used by default for the PATA devices (HDD and combi-cdrom). The system now takes ages to boot and feels _very_ slow in use, and I see lots of "ata2.00: exception ..... frozen (timeout)" and "ata2: soft resetting link" messages (while those are shown the system, e.g. the bootup, pauses). Further, only with pata_ali active, I get a warning after bootup that SMART reports 66 unreadable sectors. Without libata I don't get this error. Disabling libata (with hwprobe=-modules.pata) makes all those symptoms disappear (it then boots up really fast without such errors). Will attach more detailed info and logs soon. Could we fix this or at least blacklist this controller?
Created attachment 177452 [details] lspci -v of nx9005
Created attachment 177453 [details] hwinfo --storage-ctrl of nx9005
Created attachment 177454 [details] dmesg of boot with 10.3-GM default kernel with "time" bootparameter given
In the beginning those pauses are "just" about five seconds each:
---
[ 2.604000] ata1.00: ATA-6: IC25N040ATMR04-0, MO2OAD5A, max UDMA/100
[ 2.620000] ata1.00: 78140160 sectors, multi 16: LBA48
[ 2.652000] ata1.00: configured for UDMA/100
[ 2.988000] ata2.00: ATAPI: HL-DT-STCD-RW/DVD DRIVE GCC-4241N, 0C29, max MWDMA2
[ 3.176000] ata2.00: configured for MWDMA2
[ 3.192000] scsi 0:0:0:0: Direct-Access ATA IC25N040ATMR04-0 MO2O PQ: 0 ANSI: 5
[ 8.712000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
---
but this one is really nasty, especially as it always happens, on every bootup (while suspend/hibernate don't work yet), and it also seems to "pause" the system sometimes during normal operation:
---
[ 38.164000] ALSA sound/pci/ali5451/ali5451.c:1935: ali mixer 1 creating error.
[ 63.800000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 63.816000] ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 128 in
[ 63.816000] res 40/00:03:00:00:20/00:00:00:00:00/a0 Emask 0x4 (timeout)
[ 63.856000] ata2: soft resetting link
[ 64.368000] ata2.00: configured for MWDMA1
[ 64.384000] ata2: EH complete
[ 94.404000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 94.420000] ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x5a data 128 in
[ 94.420000] res 40/00:03:00:00:20/00:00:00:00:00/a0 Emask 0x4 (timeout)
[ 94.460000] ata2: soft resetting link
[ 94.968000] ata2.00: configured for MWDMA1
[ 94.984000] ata2: EH complete
[ 125.004000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
---
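For anyone comparing boot logs from this report, the spacing of the stalls can be read off the timestamps. The following is only a minimal sketch: the function name `exception_gaps` and the regular expression are illustrative assumptions, and the sample lines are the "exception ... frozen" entries quoted above.

```python
import re

# Matches kernel log lines such as:
# [ 63.800000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
EXCEPTION_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+\.\d+): exception Emask")


def exception_gaps(log_text):
    """Return {device: [seconds between consecutive exception reports]}."""
    stamps = {}
    for match in EXCEPTION_RE.finditer(log_text):
        stamps.setdefault(match.group(2), []).append(float(match.group(1)))
    return {
        dev: [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
        for dev, times in stamps.items()
    }


# The four exception lines quoted in this comment.
SAMPLE = """\
[ 8.712000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 63.800000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 94.404000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 125.004000] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
"""

for device, gaps in exception_gaps(SAMPLE).items():
    print(device, gaps)
```

On the excerpt above, the later exceptions come roughly 30 seconds apart, which fits the repeating pauses described here.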
We now have this machine available here. Please let me know how i can help on this or if you'd need this machine so we could send it to you.
Does kernel parameter "libata.pata_dma=1" help?
Yes, absolutely. Booted with this parameter there are no more pauses, no ata exceptions and also the smart-warning is gone.
Created attachment 177608 [details] dmesg with libata.pata_dma=1 given on boot
Thanks. Can you post kernel boot log with IDE driver (brokenmodules=pata_ali or from SL102)?
Created attachment 177638 [details] bootlog with ali15x3 ide driver A 10.2 installation wasn't on the system and brokenmodules=pata_ali didn't work on the 10.3 system for some reason (even with a rebuilt initrd etc.), but I got those boot messages out of a 10.3 rescue system booted with brokenmodules=pata_ali.
And you can access ODD without any problem from the rescue boot, right? libata currently has quite some problems with MWDMA devices on various drivers. There definitely is something wrong with MWDMA handling but we don't know what yet. I'll dig into the code. Thanks.
ODD means the dvd/cd-drive? Yes, I can access it, even though it feels very slow. Would it help to send you the machine?
Yeah, ODD means dvd/cd-drives and it would be great if you can send the machine but as I probably am half-globe away from you, I'll look into the code a bit first. I'll let you know if I get stuck. Thanks.
Hi, same problem on an old ACER 210TER. It needs brokenmodules=pata_ali or the installation hangs on loading the pata module. Extract from dmesg:
ata2: soft resetting link
ata2.01: failed to IDENTIFY (I/O error, err_mask=0x1)
ata2: failed to recover some devices, retrying in 5 secs
ata2: soft resetting link
ata2.01: failed to IDENTIFY (I/O error, err_mask=0x1)
ata2: failed to recover some devices, retrying in 5 secs
ata2: soft resetting link
ata2.01: failed to IDENTIFY (I/O error, err_mask=0x1)
ata2: failed to recover some devices, retrying in 5 secs
ata2: soft resetting link
ata2.00: configured for MWDMA1
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x28 data 4096 in
res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
This notebook has run SUSE since 7.1. Starting from 10.0 it needs ACPI=off to install/boot. After installation I can provide more info and logs if needed.
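The "cmd"/"res" pair in these reports can be picked apart to see which request timed out. The sketch below rests on assumptions that are not stated in this report: that the first two fields after "cmd" are the ATA command and feature registers, that the first field after "res" is the status register, and that bit 0 of the feature register of a PACKET command requests DMA. The opcode names come from the SCSI command set (0x28 is READ(10), 0x5a is MODE SENSE(10)); the function name `decode_report` is made up for illustration.

```python
import re

# Assumed layout: "cmd <command>/<feature>:... tag N cdb 0x<opcode> data <bytes> <in|out>"
CMD_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/(?P<feature>[0-9a-f]{2}):\S+ tag (?P<tag>\d+) "
    r"cdb 0x(?P<cdb>[0-9a-f]+) data (?P<length>\d+) (?P<direction>in|out)"
)
# Assumed layout: "res <status>/<error>:... Emask 0x<mask> (<reason>)"
RES_RE = re.compile(
    r"res (?P<status>[0-9a-f]{2})/(?P<error>[0-9a-f]{2}):\S+ "
    r"Emask 0x(?P<emask>[0-9a-f]+) \((?P<reason>[^)]*)\)"
)

ATA_COMMANDS = {0xA0: "PACKET"}
SCSI_OPCODES = {0x28: "READ(10)", 0x5A: "MODE SENSE(10)"}
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]


def decode_report(text):
    """Decode one cmd/res pair; returns None if either part is missing."""
    cmd = CMD_RE.search(text)
    res = RES_RE.search(text)
    if cmd is None or res is None:
        return None
    command = int(cmd["command"], 16)
    feature = int(cmd["feature"], 16)
    opcode = int(cmd["cdb"], 16)
    status = int(res["status"], 16)
    return {
        "ata_command": ATA_COMMANDS.get(command, hex(command)),
        # Only meaningful for PACKET commands (assumption, see lead-in).
        "uses_dma": command == 0xA0 and bool(feature & 0x01),
        "scsi_opcode": SCSI_OPCODES.get(opcode, hex(opcode)),
        "bytes": int(cmd["length"]),
        "direction": cmd["direction"],
        "status": [name for bit, name in STATUS_BITS if status & bit],
        "reason": res["reason"],
    }


SAMPLE = (
    "ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x28 data 4096 in "
    "res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)"
)
print(decode_report(SAMPLE))
```

Read this way, the failing request in the extract is a 4096-byte READ(10) sent as a DMA packet command that ended in a timeout. The nx9005 log earlier in this report shows the same pattern with opcode 0x5a.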
Frank, does installing KOTD fix the problem? I.e., if you install KOTD and remove libata.pata_dma=1, does the system keep working? Daniele, you can install by adding the following to the boot parameters: options="libata=pata_dma=1" After installation, update to KOTD (kernel of the day), remove the libata.pata_dma=1 kernel parameter, then test and report whether it works. ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/ Thanks.
Sorry, but the kotd (2.6.23.1-20071013190126-default) doesn't seem to solve this. I installed it, rebooted (without the libata.pata_dma=1) and again have the old behaviour (exceptions, timeouts etc.).
Created attachment 178957 [details] dmesg output with kotd (just in case ...)
# 15 Sorry, too late, already installed with brokenmodules=pata_ali. Let me know if I can do something (reinstalling no, please..) to help you.
Created attachment 179154 [details] libata_scsi-Fix-ATAPI-transfer-lengths.patch Does the attached patch fix the problem?
Created attachment 179213 [details] dmesg of boot with last patch and without libata.pata_dma=1 I just built a testkernel with this patch and tried it on that machine, but it still shows the same behaviour with this patch (without using pata_dma=1).
Created attachment 179247 [details] atapi-dma-for-rw-only.patch Please test this one. Detection should work just fine. Please also test reading, recording and ripping CDs. Keep an eye on cpu usage and r/w speed and report any anomalies. Thanks.
Created attachment 179432 [details] dmesg of boot with both patches applied Well, after adding this new patch and rebuilding the kernel (i.e. with both patches applied) I can boot without errors and timeouts (while not using libata.pata_dma=1). BUT, as soon as I put a CD in the disc drive I get those errors and pauses again (possibly also due to hal trying to immediately automount the medium). It does not seem possible to actually read a disc this way, and I didn't dare to try writing to a medium this way.
Created attachment 179433 [details] dmesg output while inserting a CD into discdrive (having both patches applied to currently running kernel)
Another note: while the machine tries to access the medium, the load and CPU usage climb rapidly (top shows nearly 100% (IO)wait and the load nearly reaches 2.0).
Yeah, as IO is halted for quite some secs, those reactions are expected. There are some differences in mwdma mode programming between IDE alim driver and pata_ali. The IDE driver just uses BIOS programmed values while the libata one tries to configure by itself, apparently incorrectly. Can you please post the result of "lspci -nnvvvxxx" from both the IDE and libata drivers? Thanks.
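One way to compare the two requested dumps is to diff the raw configuration-space bytes that "lspci -xxx" prints as lines of the form "offset: byte byte ...". The sketch below assumes each input holds the dump of a single device (for example one selected with "lspci -s <slot>"); the function names are illustrative, and the sample bytes are invented and not taken from the attachments.

```python
import re

# Hex dump lines look like "50: 00 00 00 00 11 22 33 44" (offset, then up to 16 bytes).
HEX_LINE = re.compile(r"^([0-9a-f]{2,3}):((?: [0-9a-f]{2}){1,16})\s*$")


def parse_config_space(dump):
    """Map config-space offset -> byte value from the hex lines of one device."""
    space = {}
    for line in dump.splitlines():
        match = HEX_LINE.match(line.strip())
        if match is None:
            continue  # skip the descriptive (non-hex) lines
        base = int(match.group(1), 16)
        for index, byte in enumerate(match.group(2).split()):
            space[base + index] = int(byte, 16)
    return space


def diff_config_space(dump_a, dump_b):
    """Return [(offset, value_a, value_b)] for every offset whose byte differs."""
    space_a = parse_config_space(dump_a)
    space_b = parse_config_space(dump_b)
    return [
        (offset, space_a.get(offset), space_b.get(offset))
        for offset in sorted(set(space_a) | set(space_b))
        if space_a.get(offset) != space_b.get(offset)
    ]


# Invented example bytes, for illustration only.
WITH_IDE = "50: 00 00 00 00 11 22 33 44\n"
WITH_LIBATA = "50: 00 00 00 00 11 2a 33 40\n"

for offset, ide_value, libata_value in diff_config_space(WITH_IDE, WITH_LIBATA):
    print(f"0x{offset:02x}: IDE 0x{ide_value:02x}, libata 0x{libata_value:02x}")
```

Any offsets that differ between the two drivers would be the first candidates to check against the controller's timing registers.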
Created attachment 179646 [details] output of lspci -nnvvvxxx with libata loaded
Created attachment 179647 [details] output of lspci -nnvvvxxx with old IDE
Hi, any news here? On the live CD there's no workaround: hwprobe=-modules.pata doesn't help, and brokenmodules=pata_ali doesn't help either (it seems to be available only on the installation media). No live CD on my notebook :(
Created attachment 183935 [details] bug332588-pata_ali-timing-update.patch Please apply the attached patch and report whether it fixes the problem && the result of "lspci -nnvvvxxx". Also, with alim15x3 driver loaded, please do "dd if=/dev/sr0 of=/dev/null bs=1M count=32" with a data cd inserted and post the result of "dmesg" and the dd command (it will report how fast it transferred). Thanks.
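For testers who want a number that is directly comparable between the two drivers, the dd measurement above can be approximated in a few lines. This is only a rough sketch, not a replacement for dd: the function name `read_throughput` is made up, and the defaults mirror the bs=1M count=32 arguments of the command above.

```python
import time


def read_throughput(path, block_size=1024 * 1024, count=32):
    """Read up to `count` blocks from `path`; return (bytes, seconds, bytes/second)."""
    total = 0
    start = time.monotonic()
    # buffering=0 gives raw reads, so each read() maps to one request to the OS.
    with open(path, "rb", buffering=0) as source:
        for _ in range(count):
            chunk = source.read(block_size)
            if not chunk:
                break  # end of medium reached early
            total += len(chunk)
    elapsed = time.monotonic() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


# Example (needs a data CD in the drive and read permission on the device):
# total, seconds, rate = read_throughput("/dev/sr0")
# print(f"{total} bytes in {seconds:.2f} s ({rate / 1e6:.2f} MB/s)")
```

Running it once with alim15x3 and once with pata_ali on the same medium gives two figures to put side by side with the dmesg output.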
Patch for the installed system? It works well (installed with brokenmodules..). The big problem is with the live CD, where the workarounds do not work. My notebook is very slow and rebuilding the whole kernel takes a lot of time, so I'll do it ASAP.
We first need to determine what's wrong to properly fix the problem. Doesn't "hwprobe=-modules.pata" work with live cd?
Daniele: No, the patch was meant for me to test, not for an installed system ;-) Tejun: I applied this patch (from comment #29) to a freshly installed 10.3 kernel source (2.6.22.12-2-default). Or was this patch meant to be applied on top of all the others (already posted by you here)? With this patch applied I couldn't see an improvement. I'll attach the logs in a few moments.
Created attachment 184062 [details] output of lspci -nnvvvxxx with libata/pata_ali loaded and this new patch
Created attachment 184064 [details] output of lspci -nnvvvxxx with alim15x3 probably the same as in comment #27, but anyway ;-)
Created attachment 184065 [details] output of dd with pata_ali (and last patch)
Created attachment 184066 [details] output of dd with alim15x3 (and same medium as from dd with pata_ali)
Created attachment 184067 [details] dmesg output while dd was run (with pata_ali and the last patch)
Comment on attachment 184065 [details] output of dd with pata_ali (and last patch) (arg, bugzilla seems to have a problem with this attachment... trying to repost)
Created attachment 184069 [details] output of dd with pata_ali (and last patch) - 2nd try
Frank, can you please ship the hardware to me? I'll send you my address via email. Thanks.
Hi guys, the bug is still present in openSUSE 11 (factory alpha-3). Any news about it?
I've spent quite some time with the hardware but it looks like there's no reason pata_ali doesn't work when the IDE driver does. I also asked around other ATA developers but they were lost about it too. I still have the hardware in my room. I'll give it another shot and if it doesn't get resolved I'm afraid I'll have to disable DMA for pata_ali for openSUSE 11.
Okay, did another round of testing with various modifications. Still no luck. It seems we'll have to disable pata_ali ATAPI DMA for SL110. I'm also forwarding patch to disable ATAPI DMA on pata_ali. Resolving as FIXED. If anyone has an idea about something to try, please lemme know. :-(