Bugzilla – Bug 386555
High number of hard drive load cycles on notebooks
Last modified: 2008-07-04 14:26:51 UTC
Following what I read about some users' experiences with the latest releases of Fedora and Ubuntu, which showed "clicks" and high load cycle counts on certain laptops (see https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695 for example), I asked for some testing in Beta 2. One hundred load cycles per hour were reported on a Lenovo R61 using openSUSE 11 Beta 2, as you can read here: http://lists.opensuse.org/opensuse-factory/2008-05/msg00066.html I think further investigation is required, as this issue is quite serious considering its impact on disk life. Regards, Alberto
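To put the reported rate in perspective, here is a rough back-of-the-envelope sketch. The 600,000-cycle rating is an assumption (a figure often quoted for notebook drives; the datasheet of the actual drive should be checked), and `hours_to_reach` is a hypothetical helper name, not something from this report.

```python
def hours_to_reach(rated_cycles: int, cycles_per_hour: float) -> float:
    """Powered-on hours until the load cycle count reaches the rated value."""
    if cycles_per_hour <= 0:
        raise ValueError("cycles_per_hour must be positive")
    return rated_cycles / cycles_per_hour


RATED_CYCLES = 600_000  # assumption, not taken from this bug report
REPORTED_RATE = 100     # load cycles per hour, as reported for the R61

hours = hours_to_reach(RATED_CYCLES, REPORTED_RATE)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous use")
# prints "6000 hours, about 250 days of continuous use"
```

Under that assumed rating, the reported rate would use up the rated cycles in well under a year of continuous operation.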
Aieeeeeeeee.......... This is nasty. The BIOS / drive vendors are setting APM to aggressive values w/ the idle IO access pattern of Windows in mind. The setting is too aggressive to the point of being fragile, and a different idle IO pattern causes the drive to unload like crazy && you can't really expect Linux or any other operating system to have an idle IO pattern similar to Windows. :-( The interesting part is that such aggressive settings can cause crazy unloads even on Windows. My friend's tp tablet does crazy unloadings under Windows but not under SL103. Maybe the antivirus program my friend is running makes the idle IO pattern different or something. So, this basically is the vendor excessively fine-tuning their hardware to achieve a slightly better battery benchmark and the 'excessive' part, unsurprisingly, breaking down when the circumstances change. This is tough to solve. The ATA APM feature set is vague about which values mean what and, as expected, not all drive manufacturers follow even the vague outlines. Vendors tuning their hardware 'excessively' know which drives they're gonna use and can determine their settings. Value 255 is supposed to disable the feature but apparently it just wraps around to 127/8 on certain drives. Value 254 means maximum performance but we're never sure what it will do on odd ones, and there are drives which seemingly overheat if APM is set to 255 or 254 - these drives are using APM to control aspects of the drive other than head loading/unloading - be it drive RPM, electric circuit power mode or whatever, which is legit as the ATA spec doesn't dictate how the power should be saved when APM is configured. So, there's no golden default value we can use to solve it. One way to solve it is to issue IO every 1 sec if no actual IO is in progress and the idle timeout hasn't expired, which is a seriously ugly solution and will require considerable time and effort to get right. 
Another more realistic approach is to develop a blacklist of machines / drives which have this problem and turn it off during boot or resume. Hmmm... if the BIOS does it right and implements proper SATA ACPI and turns on APM via _GTF after hard resets, the blacklist somehow needs to be invoked after each reset or the driver should remember the setting and restore it. Eeeewwww........ The BIOS shouldn't configure it w/ such fragile values. Please take a look at the following wiki page for more information. http://www.thinkwiki.org/wiki/Problem_with_hard_drive_clicking Ideally, this should be solved by updating the firmware or disabling APM from the BIOS, but this problem will kill drives pretty fast and we'd better have some kind of workaround, although I can't think of a good one at this point. Alberto, can you please cc other people who are seeing this problem here? Thanks.
Added three users to CC. They discussed the issue with me on the Factory ML. Regards, Alberto
Changing status to opened.
I'd say you have buggy hardware, too bad. Report it to the system vendor. (Plus, we should be careful not to certify such broken machines). A blacklist is certainly a good solution for now... but I don't think it is a "major" problem, at least it is not a major _Linux_ problem. The long-term solution would be to generate less disk activity by default. Maybe 5 second commits on ext3 are not such a good idea? ...It would also allow longer spindowns and save power...
Pavel, I pretty much agree with you here, but as this can actually kill disks, which can make people genuinely miserable, I think adding a minimal workaround is a good idea. Say, invoking hdparm -B w/ the correct value on certain configurations during boot and resume. Where would be the correct place to do that?
I agree it depends on the hardware configuration, but saying it's not a Linux problem is just not seeing it. Calling that hardware buggy when it works OK with other operating systems like Windows and Mac OS X (a MacBook is also affected, the user is in CC) is funny, to say the least. If I have to decide between the life of my hard disk and Linux, without a doubt I won't use Linux and will keep my hardware safe. That's why, for me and other users, it is an issue of major, if not higher, severity. Regards, Alberto
Lemme play a bat here. Alberto, I see where you're coming from too and I _am_ considering it seriously (w/ due amount of cursing of course). Pavel is just pointing out where the problem lies and that's usually the place where the fix should go. Solving it at a different layer, especially in this case, is nasty and cleaning up a mess someone else made for a nickel is annoying, so please understand our frustration. Anyways, I'm working on a script which can be called during boot and resume and can issue some hdparm and/or smartctl commands to handle this, so we should be able to handle these weirdos soon. Please stand by a bit. Thanks.
I perfectly understand your point of view, but I just represent the average user who behaves as I described. If I read that my brand new laptop will have a damaged hard drive under Linux, with all the consequences of the case (read: void warranty), I surely won't use it. That's my point. It's not a question of how seriously you (at Novell) take the problem. I never wrote you're not considering it seriously. But I also think that if this hardware works correctly under Windows and Mac OS X, it should work correctly without ugly hacks under Linux too. Just my two cents.
(In reply to comment #9 from Alberto Passalacqua) > It's not a question of how seriously you (at Novell) take the problem. I never > wrote you're not considering it seriously. But I also think that if this > hardware works correctly under Windows and Mac OS X, it should work > correctly without ugly hacks under Linux too. Other points I can agree to, but it working under Mac OS X is just dumb luck. The setting just isn't healthy. Let's say the vendor sets the unload timeout to 10 seconds and under nominal laptop circumstances Windows issues IOs every 8 secs unless it's completely idle. It will work fine there. Let's say Mac does so every 7 to 9 seconds, which will work fine too. Now, let's say Linux does so every 10-12 seconds. Now you have a problem. Being idle for a longer period of time is a good thing which we should strive for, but in this case it's fast death for the drive. This is why some people are reporting the problem goes away when laptop mode is disabled, because then IOs will be issued more frequently. Such short fixed timeouts just can't work generically. It's bound to break. They should have gone with a longer timeout or an adaptive one. So, it's not like something is wrong with Linux, it's just different and the setting is way too aggressive for the real world. Please think of my friend's laptop I mentioned earlier. That drive is going to be toasted pretty soon even if it's just a differently configured Windows.
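The timeout argument above can be illustrated with a deliberately simplified model. This is only a sketch: it assumes IO arrives at a perfectly fixed interval and that the drive unloads exactly once per idle gap longer than the timeout. `count_unloads` is a hypothetical name, and the 8 s / 11 s / 10 s figures are the made-up numbers from the comment, not measurements.

```python
def count_unloads(io_interval_s: float, unload_timeout_s: float,
                  duration_s: float) -> int:
    """Count head unloads over duration_s for IO issued every io_interval_s.

    Simplified model: if the gap between two IOs exceeds the drive's
    unload timeout, the heads unload once in that gap and reload on the
    next IO. Otherwise the timer is reset before it ever expires.
    """
    if io_interval_s <= 0:
        raise ValueError("io_interval_s must be positive")
    if io_interval_s <= unload_timeout_s:
        return 0
    return int(duration_s // io_interval_s)


ONE_HOUR = 3600
print(count_unloads(8, 10, ONE_HOUR))   # IO every 8 s, timeout 10 s: prints 0
print(count_unloads(11, 10, ONE_HOUR))  # IO every 11 s, timeout 10 s: prints 327
```

In this model a small shift in the idle IO interval moves the drive from no unloads at all to several hundred per hour, which is the fragility being described.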
Can we increase the write-back time (especially of the journal) by default, say by a factor of 4-10? This would lower the severity of the issue, at least a bit. BTW: Windows probably also wakes up the disk really often over some weeks or months...
pm-utils is the right place. The question is: "what is the correct value"? My observations with different drives usually show the following (all hdparm -B values):
1-127: Drive unloads heads and does auto-spindown, depending on access pattern
128-191: drive unloads heads if no access for a second or so
>192: drive does not unload heads
from the specs, 255 disables APM
On my machine, i usually set it to 192 when on AC power and to 128 when on battery power.
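The observed bands can be written down as a small lookup. This is a sketch of the observations in this comment only, not of the ATA specification, and as noted earlier in this report some drives behave differently. Treating 192 itself as part of the "no unload" band is an assumption based on the AC-power setting mentioned above; `describe_apm` is a hypothetical name.

```python
def describe_apm(value: int) -> str:
    """Map an hdparm -B value to the behaviour observed in this comment."""
    if not 1 <= value <= 255:
        raise ValueError("APM value must be between 1 and 255")
    if value == 255:
        return "APM disabled"
    if value <= 127:
        return "unloads heads and may spin down"
    if value <= 191:
        return "unloads heads after about a second without access"
    # Assumption: 192 is grouped with the values above it.
    return "does not unload heads"


for level in (128, 192, 255):
    print(level, describe_apm(level))
```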
As written above, I don't think there's one magic value we can use. We'll have to build a blacklist for problematic laptops (we should be able to take some from the thinkwiki page). I'm almost done with a bash script to do it. Will post it soon after adding some comments.
Created attachment 212826 [details] storage-fixup script Okay, here's the script. The comment on top of it should explain most things. I'll attach a sample rule file soon.
Created attachment 212833 [details] storage-fixup slightly updated. Turned off debug mode and misc fix up.
Created attachment 212834 [details] storage-fixup.conf Sample configuration file.
Stefan, does this look good enough? Or does pm-utils have better facility to do this? Alberto and other owners of troubled machines, can you guys please post the result of "dmidecode" and "hdparm -I /dev/sda" and which APM value works for the machine? Thanks.
No, I think that pm-utils does not have a better facility. We should just run your script at boot and after resume. Additionally, I'll add a pm-utils hook that gets executed whenever the "set low power" function is called (with "true" or "false" as an argument, to set low power mode or to unset low power mode). This script will do nothing by default - but it will have a configuration file, where the user can enter values that will be used for AAM and APM settings. The config file will probably look something like:
LOW_POWER_APM="128"
HI_POWER_APM="192"
LOW_POWER_AAM=""
HI_POWER_AAM=""
But I have to think about that.
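A rough sketch of how such a hook might turn the proposed settings into an hdparm command line. The real pm-utils hook would be a shell script; this only models the selection logic. `parse_config` and `hdparm_args` are hypothetical names, and the variable names are the tentative ones from this comment. The only hdparm options used are `-B` (APM level) and `-M` (acoustic management).

```python
import shlex

SAMPLE_CONFIG = '''
LOW_POWER_APM="128"
HI_POWER_APM="192"
LOW_POWER_AAM=""
HI_POWER_AAM=""
'''


def parse_config(text: str) -> dict:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips the shell-style quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def hdparm_args(settings: dict, low_power: bool, device: str) -> list:
    """Build the hdparm command for the given power mode, or [] if unset."""
    prefix = "LOW_POWER" if low_power else "HI_POWER"
    args = []
    apm = settings.get(prefix + "_APM", "")
    aam = settings.get(prefix + "_AAM", "")
    if apm:
        args += ["-B", apm]
    if aam:
        args += ["-M", aam]
    return ["hdparm"] + args + [device] if args else []


config = parse_config(SAMPLE_CONFIG)
print(hdparm_args(config, low_power=True, device="/dev/sda"))
print(hdparm_args(config, low_power=False, device="/dev/sda"))
```

Empty values produce no option at all, which matches the idea that the hook does nothing unless the user fills in the file.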
I don't have that hardware, I remove myself from the CC list.
hello here are the results of "dmidecode" and "hdparm -I /dev/sda" from my Thinkpad T60 Linux linux 2.6.25-26-pae #1 SMP 2008-04-30 07:56:05 +0200 i686 i686 i386 GNU/Linux Running hdparm -I /dev/sda.... /dev/sda: ATA device, with non-removable media Model Number: Hitachi HTS722020K9SA00 Serial Number: 071006DP0410DTG4VSSP Firmware Revision: DC4OC54P Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5; Revision: ATA8-AST T13 Project D1697 Revision 0b Standards: Used: ATA-8-ACS revision 3f Supported: 8 7 6 5 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 390721968 device size with M = 1024*1024: 190782 MBytes device size with M = 1000*1000: 200049 MBytes (200 GB) Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Vendor, no device specific minimum R/W multiple sector transfer: Max = 16 Current = 16 Advanced power management level: 128 Recommended acoustic management value: 128, current value: 254 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE * Advanced Power Management feature set Power-Up In Standby feature set * SET_FEATURES required to spinup after power up SET_MAX security extension Automatic Acoustic Management feature set * 48-bit Address feature set * Device Configuration Overlay feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * WRITE_{DMA|MULTIPLE}_FUA_EXT * 64-bit 
World wide name * IDLE_IMMEDIATE with UNLOAD * WRITE_UNCORRECTABLE_EXT command * SATA-I signaling speed (1.5Gb/s) * Native Command Queueing (NCQ) * Host-initiated interface power management * Phy event counters * unknown 76[12] Non-Zero buffer offsets in DMA Setup FIS DMA Setup Auto-Activate optimization * Device-initiated interface power management In-order data delivery * Software settings preservation * SMART Command Transport (SCT) feature set * SCT LBA Segment Access (AC2) * SCT Error Recovery Control (AC3) * SCT Features Control (AC4) * SCT Data Tables (AC5) Security: Master password revision code = 65534 supported not enabled not locked frozen not expired: security count supported: enhanced erase 72min for SECURITY ERASE UNIT. 74min for ENHANCED SECURITY ERASE UNIT. Logical Unit WWN Device Identifier: 5000cca53ec235f6 NAA : 5 IEEE OUI : cca Unique ID : 53ec235f6 Checksum: correct Running dmidecode.... # dmidecode 2.9 SMBIOS 2.4 present. 68 structures occupying 2250 bytes. Table at 0x000E0010. 
Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: LENOVO Version: 79ETE1WW (2.21 ) Release Date: 02/05/2008 Address: 0xDC000 Runtime Size: 144 kB ROM Size: 2048 kB Characteristics: PCI is supported PC Card (PCMCIA) is supported PNP is supported BIOS is upgradeable BIOS shadowing is allowed ESCD support is available Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported ACPI is supported USB legacy is supported BIOS boot specification is supported Targeted content distribution is supported BIOS Revision: 2.33 Firmware Revision: 1.7 Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: LENOVO Product Name: 1952W5R Version: ThinkPad T60 Serial Number: L3C8047 UUID: 81E12281-482B-11CB-B37C-91674F2F476C Wake-up Type: Power Switch SKU Number: Not Specified Family: ThinkPad T60 Handle 0x0002, DMI type 2, 8 bytes Base Board Information Manufacturer: LENOVO Product Name: 1952W5R Version: Not Available Serial Number: VF0BC6730XA Handle 0x0003, DMI type 3, 13 bytes Chassis Information Manufacturer: LENOVO Type: Notebook Lock: Not Present Version: Not Available Serial Number: Not Available Asset Tag: No Asset Information Boot-up State: Unknown Power Supply State: Unknown Thermal State: Unknown Security Status: Unknown Handle 0x0004, DMI type 126, 13 bytes Inactive Handle 0x0005, DMI type 126, 13 bytes Inactive Handle 0x0006, DMI type 4, 35 bytes Processor Information Socket Designation: None Type: Central Processor Family: Other Manufacturer: GenuineIntel ID: F6 06 00 00 FF FB EB BF Version: Intel(R) Core(TM)2 CPU Voltage: 1.3 V External Clock: 167 MHz Max Speed: 2333 MHz Current Speed: 2333 MHz Status: Populated, Enabled Upgrade: None L1 Cache Handle: 0x000A L2 Cache Handle: 0x000C L3 Cache Handle: Not Provided Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x0007, DMI type 5, 20 bytes Memory Controller Information Error Detecting Method: None Error Correcting 
Capabilities: None Supported Interleave: One-way Interleave Current Interleave: One-way Interleave Maximum Memory Module Size: 2048 MB Maximum Total Memory Size: 4096 MB Supported Speeds: Other Supported Memory Types: DIMM SDRAM Memory Module Voltage: 2.9 V Associated Memory Slots: 2 0x0008 0x0009 Enabled Error Correcting Capabilities: Unknown Handle 0x0008, DMI type 6, 12 bytes Memory Module Information Socket Designation: DIMM Slot 1 Bank Connections: 0 3 Current Speed: Unknown Type: DIMM SDRAM Installed Size: 2048 MB (Double-bank Connection) Enabled Size: 2048 MB (Double-bank Connection) Error Status: OK Handle 0x0009, DMI type 6, 12 bytes Memory Module Information Socket Designation: DIMM Slot 2 Bank Connections: 4 7 Current Speed: Unknown Type: DIMM SDRAM Installed Size: 1024 MB (Double-bank Connection) Enabled Size: 1024 MB (Double-bank Connection) Error Status: OK Handle 0x000A, DMI type 7, 19 bytes Cache Information Socket Designation: Internal L1 Cache Configuration: Enabled, Socketed, Level 1 Operational Mode: Write Back Location: Internal Installed Size: 64 KB Maximum Size: 64 KB Supported SRAM Types: Synchronous Installed SRAM Type: Synchronous Speed: Unknown Error Correction Type: Single-bit ECC System Type: Instruction Associativity: 8-way Set-associative Handle 0x000B, DMI type 7, 19 bytes Cache Information Socket Designation: Internal L1 Cache Configuration: Enabled, Socketed, Level 1 Operational Mode: Write Back Location: Internal Installed Size: 64 KB Maximum Size: 64 KB Supported SRAM Types: Synchronous Installed SRAM Type: Synchronous Speed: Unknown Error Correction Type: Single-bit ECC System Type: Data Associativity: 8-way Set-associative Handle 0x000C, DMI type 7, 19 bytes Cache Information Socket Designation: Internal L2 Cache Configuration: Enabled, Socketed, Level 2 Operational Mode: Write Back Location: Internal Installed Size: 4096 KB Maximum Size: 4096 KB Supported SRAM Types: Burst Installed SRAM Type: Burst Speed: Unknown Error 
Correction Type: Single-bit ECC System Type: Unified Associativity: 8-way Set-associative Handle 0x000D, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: Infrared External Connector Type: Infrared Port Type: Other Handle 0x000E, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: External Monitor External Connector Type: DB-15 female Port Type: Video Port Handle 0x000F, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: Microphone Jack External Connector Type: Mini Jack (headphones) Port Type: Audio Port Handle 0x0010, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: Headphone Jack External Connector Type: Mini Jack (headphones) Port Type: Audio Port Handle 0x0011, DMI type 126, 9 bytes Inactive Handle 0x0012, DMI type 126, 9 bytes Inactive Handle 0x0013, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: Modem External Connector Type: RJ-11 Port Type: Modem Port Handle 0x0014, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: Ethernet External Connector Type: RJ-45 Port Type: Network Port Handle 0x0015, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: USB 1 External Connector Type: Access Bus (USB) Port Type: USB Handle 0x0016, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External 
Reference Designator: USB 2 External Connector Type: Access Bus (USB) Port Type: USB Handle 0x0017, DMI type 8, 9 bytes Port Connector Information Internal Reference Designator: Not Available Internal Connector Type: None External Reference Designator: USB 3 External Connector Type: Access Bus (USB) Port Type: USB Handle 0x0018, DMI type 126, 9 bytes Inactive Handle 0x0019, DMI type 126, 9 bytes Inactive Handle 0x001A, DMI type 126, 9 bytes Inactive Handle 0x001B, DMI type 126, 9 bytes Inactive Handle 0x001C, DMI type 126, 9 bytes Inactive Handle 0x001D, DMI type 126, 9 bytes Inactive Handle 0x001E, DMI type 126, 9 bytes Inactive Handle 0x001F, DMI type 126, 9 bytes Inactive Handle 0x0020, DMI type 9, 13 bytes System Slot Information Designation: ExpressCard Slot 1 Type: x1 PCI Express Current Usage: Available Length: Other ID: 0 Characteristics: Hot-plug devices are supported Handle 0x0021, DMI type 9, 13 bytes System Slot Information Designation: CardBus Slot 1 Type: 32-bit PC Card (PCMCIA) Current Usage: Available Length: Other ID: Adapter 1, Socket 0 Characteristics: 5.0 V is provided 3.3 V is provided PC Card-16 is supported Cardbus is supported Zoom Video is supported Modem ring resume is supported PME signal is supported Hot-plug devices are supported Handle 0x0022, DMI type 126, 13 bytes Inactive Handle 0x0023, DMI type 126, 13 bytes Inactive Handle 0x0024, DMI type 126, 13 bytes Inactive Handle 0x0025, DMI type 10, 6 bytes On Board Device Information Type: Other Status: Disabled Description: IBM Embedded Security hardware Handle 0x0026, DMI type 11, 5 bytes OEM Strings String 1: IBM ThinkPad Embedded Controller -[79HT50WW-1.07 ]- Handle 0x0027, DMI type 13, 22 bytes BIOS Language Information Installable Languages: 1 enUS Currently Installed Language: enUS Handle 0x0028, DMI type 15, 25 bytes System Event Log Area Length: 0 bytes Header Start Offset: 0x0000 Header Length: 16 bytes Data Start Offset: 0x0010 Access Method: General-purpose non-volatile data 
functions Access Address: 0x0000 Status: Invalid, Full Change Token: 0x000000EF Header Format: Type 1 Supported Log Type Descriptors: 1 Descriptor 1: POST error Data Format 1: POST results bitmap Handle 0x0029, DMI type 16, 15 bytes Physical Memory Array Location: System Board Or Motherboard Use: System Memory Error Correction Type: None Maximum Capacity: 2 GB Error Information Handle: Not Provided Number Of Devices: 2 Handle 0x002A, DMI type 17, 27 bytes Memory Device Array Handle: 0x0029 Error Information Handle: No Error Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: SODIMM Set: None Locator: DIMM 1 Bank Locator: Bank 0/1 Type: DDR2 Type Detail: Synchronous Speed: Unknown Manufacturer: Not Specified Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x002B, DMI type 17, 27 bytes Memory Device Array Handle: 0x0029 Error Information Handle: No Error Total Width: 64 bits Data Width: 64 bits Size: 1024 MB Form Factor: SODIMM Set: None Locator: DIMM 2 Bank Locator: Bank 2/3 Type: DDR2 Type Detail: Synchronous Speed: Unknown Manufacturer: Not Specified Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x002C, DMI type 18, 23 bytes 32-bit Memory Error Information Type: OK Granularity: Unknown Operation: Unknown Vendor Syndrome: Unknown Memory Array Address: Unknown Device Address: Unknown Resolution: Unknown Handle 0x002D, DMI type 19, 15 bytes Memory Array Mapped Address Starting Address: 0x00000000000 Ending Address: 0x000BFFFFFFF Range Size: 3 GB Physical Array Handle: 0x0029 Partition Width: 0 Handle 0x002E, DMI type 20, 19 bytes Memory Device Mapped Address Starting Address: 0x00000000000 Ending Address: 0x0007FFFFFFF Range Size: 2 GB Physical Device Handle: 0x002A Memory Array Mapped Address Handle: 0x002D Partition Row Position: 1 Handle 0x002F, DMI type 20, 19 bytes Memory Device Mapped Address Starting Address: 0x00080000000 Ending Address: 0x000BFFFFFFF Range Size: 
1 GB Physical Device Handle: 0x002B Memory Array Mapped Address Handle: 0x002D Partition Row Position: 1 Handle 0x0030, DMI type 21, 7 bytes Built-in Pointing Device Type: Track Point Interface: PS/2 Buttons: 3 Handle 0x0031, DMI type 21, 7 bytes Built-in Pointing Device Type: Touch Pad Interface: PS/2 Buttons: 0 Handle 0x0032, DMI type 24, 5 bytes Hardware Security Power-On Password Status: Disabled Keyboard Password Status: Disabled Administrator Password Status: Disabled Front Panel Reset Status: Unknown Handle 0x0033, DMI type 32, 11 bytes System Boot Information Status: No errors detected Handle 0x0034, DMI type 131, 17 bytes OEM-specific Type Header and Data: 83 11 34 00 01 02 03 FF FF 1F 00 00 00 00 00 02 00 Strings: BOOTINF 20h BOOTDEV 21h KEYPTRS 23h Handle 0x0035, DMI type 131, 11 bytes OEM-specific Type Header and Data: 83 0B 35 00 00 00 E8 FF C5 01 01 Strings: IBM System Metrics Handle 0x0036, DMI type 131, 22 bytes OEM-specific Type Header and Data: 83 16 36 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 Strings: TVT-Enablement Handle 0x0037, DMI type 132, 7 bytes OEM-specific Type Header and Data: 84 07 37 00 01 D8 36 Handle 0x0038, DMI type 133, 5 bytes OEM-specific Type Header and Data: 85 05 38 00 01 Strings: KHOIHGIUCCHHII Handle 0x0039, DMI type 133, 17 bytes OEM-specific Type Header and Data: 85 11 39 00 30 30 2E 35 00 60 6F BF 00 00 00 00 01 Strings: Audit Boot History Handle 0x003A, DMI type 134, 13 bytes OEM-specific Type Header and Data: 86 0D 3A 00 04 07 06 20 00 00 00 00 00 Handle 0x003B, DMI type 134, 16 bytes OEM-specific Type Header and Data: 86 10 3B 00 00 41 54 4D 4C 01 01 00 00 02 01 02 Strings: TPM INFO System Reserved Handle 0x003C, DMI type 135, 13 bytes OEM-specific Type Header and Data: 87 0D 3C 00 54 50 07 00 01 00 00 00 00 Handle 0x003D, DMI type 135, 18 bytes OEM-specific Type Header and Data: 87 12 3D 00 54 50 07 01 01 A3 07 00 00 00 01 00 00 00 Handle 0x003E, DMI type 135, 35 bytes OEM-specific Type Header and 
Data: 87 23 3E 00 54 50 07 02 42 41 59 20 49 2F 4F 20 01 00 02 00 00 0E 00 F0 01 F6 03 02 00 0F 00 70 01 76 03 Handle 0x003F, DMI type 136, 6 bytes OEM-specific Type Header and Data: 88 06 3F 00 5A 5A Handle 0x0040, DMI type 137, 28 bytes OEM-specific Type Header and Data: 89 1C 40 00 0C 02 00 01 01 00 00 01 50 57 4D 53 20 49 6E 66 6F 72 6D 61 74 69 6F 6E Handle 0x0041, DMI type 138, 40 bytes OEM-specific Type Header and Data: 8A 28 41 00 14 01 01 01 07 01 01 0C 01 01 0C 01 01 0C 00 00 42 49 4F 53 20 50 61 73 73 77 6F 72 64 20 46 6F 72 6D 61 74 Handle 0x0042, DMI type 139, 37 bytes OEM-specific Type Header and Data: 8B 25 42 00 11 01 0A 00 00 00 00 00 00 00 00 00 00 50 57 4D 53 20 4B 65 79 20 49 6E 66 6F 72 6D 61 74 69 6F 6E Handle 0x0043, DMI type 127, 4 bytes End Of Table
So, for your machine, the rule would be something like the following.
rule tp-t60
dmi system-manufacturer LENOVO
dmi system-product-name 1952W5R
dmi system-version ThinkPad T60
hal storage.model Hitachi HTS722020K9SA00
act hdparm -B 255 $DEV
Can you please put the above into /etc/storage-fixup.conf, run the attached storage-fixup and verify it makes the quick unloads go away? Also, please attach large outputs instead of pasting inline. Pasting them inline makes the bug report difficult to follow. Thanks.
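For readers without the attachment, the idea behind such a rule can be sketched as follows. This is not the storage-fixup script itself (that is a bash script, and its matching may differ, for instance by using patterns). The dictionary layout and the names `rule_matches` and `action_for` are hypothetical; the values are the ones from the T60 output posted above.

```python
# Hypothetical in-memory form of the tp-t60 rule.
T60_RULE = {
    "name": "tp-t60",
    "match": {
        ("dmi", "system-manufacturer"): "LENOVO",
        ("dmi", "system-product-name"): "1952W5R",
        ("dmi", "system-version"): "ThinkPad T60",
        ("hal", "storage.model"): "Hitachi HTS722020K9SA00",
    },
    "action": ["hdparm", "-B", "255", "$DEV"],
}

# Facts as they would be collected from dmidecode and HAL on that machine.
T60_FACTS = dict(T60_RULE["match"])


def rule_matches(rule: dict, facts: dict) -> bool:
    """A rule applies only if every listed key has exactly the listed value."""
    return all(facts.get(key) == wanted for key, wanted in rule["match"].items())


def action_for(rule: dict, facts: dict, device: str):
    """Return the command with $DEV substituted, or None if the rule does not apply."""
    if not rule_matches(rule, facts):
        return None
    return [device if arg == "$DEV" else arg for arg in rule["action"]]


print(action_for(T60_RULE, T60_FACTS, "/dev/sda"))
```

Requiring every condition to match keeps the workaround limited to the exact machine and drive combination that was reported, which fits the blacklist approach discussed earlier.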
This is what comes out from the HP Pavillion dv6500 of a friend who actually found the problem. # dmidecode 2.9 SMBIOS 2.4 present. 19 structures occupying 731 bytes. Table at 0x000DC010. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: Hewlett-Packard Version: F.53 Release Date: 04/02/2008 Address: 0xE6F30 Runtime Size: 102608 bytes ROM Size: 1024 kB Characteristics: ISA is supported PCI is supported PNP is supported BIOS is upgradeable BIOS shadowing is allowed ESCD support is available Boot from CD is supported Selectable boot is supported Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) ACPI is supported USB legacy is supported AGP is supported Smart battery is supported BIOS boot specification is supported Targeted content distribution is supported Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Hewlett-Packard Product Name: HP Pavilion dv6500 Notebook PC Version: Rev 1 Serial Number: CNF73565FT UUID: 434E4637-3335-3635-4654-001B249711B6 Wake-up Type: Power Switch SKU Number: GT460EA#ABZ Family: 103C_5335KV Handle 0x0002, DMI type 2, 8 bytes Base Board Information Manufacturer: Quanta Product Name: 30D2 Version: 79.2B Serial Number: None Handle 0x0003, DMI type 3, 17 bytes Chassis Information Manufacturer: Quanta Type: Notebook Lock: Not Present Version: N/A Serial Number: None Asset Tag: Boot-up State: Safe Power Supply State: Safe Thermal State: Safe Security Status: None OEM Information: 0x00000004 Handle 0x0004, DMI type 4, 35 bytes Processor Information Socket Designation: U2E1 Type: Central Processor Family: Other Manufacturer: Intel ID: FA 06 00 00 FF FB EB BF Version: Intel(R) Core(TM)2 Duo CPU T7300 Voltage: 3.3 V External Clock: 800 MHz Max Speed: 2000 MHz Current Speed: 2000 MHz Status: Populated, Enabled Upgrade: Socket 478 L1 Cache Handle: 0x0005 L2 Cache Handle: 0x0006 L3 Cache Handle: Not 
Provided Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x0005, DMI type 7, 19 bytes Cache Information Socket Designation: L1 Cache Configuration: Enabled, Socketed, Level 1 Operational Mode: Write Back Location: Internal Installed Size: 64 KB Maximum Size: 64 KB Supported SRAM Types: Burst Pipeline Burst Asynchronous Installed SRAM Type: Asynchronous Speed: Unknown Error Correction Type: Unknown System Type: Unknown Associativity: Unknown Handle 0x0006, DMI type 7, 19 bytes Cache Information Socket Designation: L2 Cache Configuration: Enabled, Socketed, Level 2 Operational Mode: Write Back Location: External Installed Size: 4096 KB Maximum Size: 4096 KB Supported SRAM Types: Burst Pipeline Burst Asynchronous Installed SRAM Type: Burst Speed: Unknown Error Correction Type: Unknown System Type: Unknown Associativity: Unknown Handle 0x0007, DMI type 9, 13 bytes System Slot Information Designation: PCI Express Slot 1 Type: 64-bit PCI Express Current Usage: Available Length: Long ID: 0 Characteristics: 5.0 V is provided 3.3 V is provided Handle 0x0008, DMI type 9, 13 bytes System Slot Information Designation: PCI Express Slot 2 Type: 64-bit PCI Express Current Usage: Available Length: Long ID: 0 Characteristics: 5.0 V is provided 3.3 V is provided PME signal is supported Hot-plug devices are supported Handle 0x0009, DMI type 9, 13 bytes System Slot Information Designation: PCI Express Slot 6 Type: 64-bit PCI Express Current Usage: Available Length: Long ID: 0 Characteristics: 5.0 V is provided 3.3 V is provided PME signal is supported Handle 0x000A, DMI type 10, 6 bytes On Board Device Information Type: Video Status: Enabled Description: Handle 0x000B, DMI type 11, 5 bytes OEM Strings String 1: $HP$ String 2: LOC#ABZ String 3: ABS 70/71 79 7A 7B 7C Handle 0x000C, DMI type 16, 15 bytes Physical Memory Array Location: System Board Or Motherboard Use: System Memory Error Correction Type: None Maximum Capacity: 4 GB Error 
Information Handle: Not Provided Number Of Devices: 2 Handle 0x000D, DMI type 17, 27 bytes Memory Device Array Handle: 0x000C Error Information Handle: No Error Total Width: 64 bits Data Width: 64 bits Size: 1024 MB Form Factor: DIMM Set: 1 Locator: DIMM 1 Bank Locator: Bank 0,1 Type: DDR2 Type Detail: Synchronous Speed: 667 MHz (1.5 ns) Manufacturer: Not Specified Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x000E, DMI type 17, 27 bytes Memory Device Array Handle: 0x000C Error Information Handle: No Error Total Width: 64 bits Data Width: 64 bits Size: 1024 MB Form Factor: DIMM Set: 1 Locator: DIMM 2 Bank Locator: Bank 2,3 Type: DDR2 Type Detail: Synchronous Speed: 667 MHz (1.5 ns) Manufacturer: Not Specified Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Handle 0x000F, DMI type 19, 15 bytes Memory Array Mapped Address Starting Address: 0x00000000000 Ending Address: 0x0007FFFFFFF Range Size: 2 GB Physical Array Handle: 0x000C Partition Width: 0 Handle 0x0010, DMI type 20, 19 bytes Memory Device Mapped Address Starting Address: 0x00000000000 Ending Address: 0x0003FFFFFFF Range Size: 1 GB Physical Device Handle: 0x000D Memory Array Mapped Address Handle: 0x000F Partition Row Position: 2 Interleave Position: 2 Interleaved Data Depth: 2 Handle 0x0011, DMI type 32, 20 bytes System Boot Information Status: <OUT OF SPEC> -------------------- hdparm -i /dev/sda /dev/sda: Model=SAMSUNG HM250JI , FwRev=HS100-10, SerialNo=S15YJD0P821334 Config={ Fixed DTR>10Mbs } RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16? 
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 AdvancedPM=yes: unknown setting WriteCache=enabled Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0: ATA/ATAPI-1,2,3,4,5,6,7 * signifies the current active mode Regards, Alberto
*** Bug 338230 has been marked as a duplicate of this bug. ***
Created attachment 214783 [details] storage-fixup.conf
Okay, here's storage-fixup.conf for the above two machines. Can you guys please put the attached file under /etc and verify that the storage-fixup script does the right thing? Also, please attach long outputs instead of pasting them inline.
this doesn't only happen on notebooks but also with enterprise disks for 24/7 usage.

>Aieeeeeeeee.......... This is nasty. The BIOS / drive vendors are setting APM
>to aggressive values w/ idle IO access pattern of windows on mind. The setting
>is too aggressive to the point of being fragile and different idle IO pattern
>causes the drive to unload like crazy && you can't really expect Linux or any
>other operating system to have similar idle IO pattern as windows. :-(

even worse - there are disks on the market which don't let you tune the timeouts with standard methods. you need to get a proprietary DOS *sigh* tool to tune the unload interval - and WD support may give that to you - or not....

see these threads:
http://marc.info/?l=linux-kernel&m=120777293511872&w=2
http://marc.info/?l=linux-kernel&m=121071269907588&w=2
Created attachment 216248 [details] dmidecode dmidecode for dell e1505
Created attachment 216249 [details] hdparm -I hdparm -I /dev/sda for dell e1505
inspidell:~ # smartctl -a /dev/sda | grep Load_Cycle_Count
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 769211

This is for a Dell e1505.
(In reply to comment #25 from roland kletzing)
> this doesn't only happen on notebooks but also with enterprise disks for
> 24/7 usage.
>
> even worse - there are disks on the market which don't let you tune the
> timeouts with standard methods. you need to get a proprietary DOS *sigh* tool
> to tune the unload interval - and WD support may give that to you - or not....
>
> see these threads:
> http://marc.info/?l=linux-kernel&m=120777293511872&w=2
> http://marc.info/?l=linux-kernel&m=121071269907588&w=2

Yeah, I recall the thread. I don't really think we can do anything about it tho. The only possible solution is to periodically issue commands to the drive to keep its head from unloading. Yuk..
Created attachment 216281 [details] storage-fixup.conf
Okay, here's the updated storage-fixup.conf. You can test this by

1. Saving https://bugzilla.novell.com/attachment.cgi?id=212833 as ~root/bin/storage-fixup
2. Saving this attachment as /etc/storage-fixup.conf
3. Executing storage-fixup and verifying that it did the right thing.

Please verify it works. It needs to be tested before it is shipped.
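For anyone who wants to collect this number from a script rather than by eye, here is a minimal sketch of pulling the raw value out of a smartctl attribute row like the one above. It assumes the usual ten-column attribute layout (ID, name, flag, value, worst, thresh, type, updated, when-failed, raw); the function name `smart_raw_value` is made up for illustration and is not part of smartmontools.

```python
import re


def smart_raw_value(smartctl_output, attribute_name):
    """Return the integer raw value of a SMART attribute, or None if absent.

    Assumes rows shaped like:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == attribute_name:
            # Keep only the leading digits of the raw column.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# The row reported above for the Dell e1505.
sample = "193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 769211"
print(smart_raw_value(sample, "Load_Cycle_Count"))
```

In practice the text would come from running `smartctl -a /dev/sda`, as in the report above.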
Can we please get this tested? We *really* need to get it tested before RC1 if this problem is going to be worked around in SL110.
It works on the HP Pavilion. Moreover, it seems the problem is also being addressed by the manufacturer with a new firmware. Regards, Alberto
>This is tough to solve.

yes, probably. so why not create better "transparency" for the end user here? smartctl can read the power-on hours and the load_cycle_count. each value by itself doesn't tell much... you won't even need the power_on_hours value if you just check how much the load cycle count increases over specific time intervals.

if we want to stop harddisk vendors from doing such dumb things, more users must raise an eyebrow. the more noise about this, the better the chances that vendors will do something about it. but how should a technically inexperienced user know that he actually suffers from this issue? one step in that direction would be adding a feature to smartmontools to give out a warning message when it detects an excessive load cycle count. see http://sourceforge.net/mailarchive/forum.php?thread_name=261451287%40web.de&forum_name=smartmontools-support

---

>It works on the HP Pavilion. Moreover, it seems the problem is also being
>addressed by the manufacturer with a new firmware.

any link/information for that? WD support was pretty ignorant about this issue and it wouldn't hurt telling them that other vendors do something about that problem.
Yeah, that could help. I'll ping Bruce Allen with the idea and cc you. Stefan, can you please include the script and config file? We'll need to continue updating the config file and look for other ways to improve the situation but for now the dirty little script seems to be all we've got.
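The interval check suggested above (compare two Load_Cycle_Count readings taken some time apart) can be sketched roughly as follows. This is only an illustration: the function names are invented here, and the default limit of 60 cycles per hour is an arbitrary assumption, not a value from smartmontools or from any drive vendor.

```python
def load_cycles_per_hour(count_before, count_after, elapsed_seconds):
    """Rate of load cycles between two readings of Load_Cycle_Count."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_after - count_before) * 3600.0 / elapsed_seconds


def is_excessive(count_before, count_after, elapsed_seconds, limit_per_hour=60.0):
    """True if the observed rate exceeds the (assumed) warning limit."""
    rate = load_cycles_per_hour(count_before, count_after, elapsed_seconds)
    return rate > limit_per_hour


# 50 extra cycles in half an hour is 100 per hour, the rate reported
# for the Lenovo R61 at the top of this bug.
print(load_cycles_per_hour(1000, 1050, 1800))
print(is_excessive(1000, 1050, 1800))
```

A real warning would need a limit agreed on upstream, since what counts as "excessive" depends on the load cycle rating of the drive.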
Created attachment 217657 [details] storage-fixup storage-fixup with typo fixed.
Where do we include it? In the hdparm package? And the matching, at least for thinkpads, is much too fine. Generally, the first 4 digits are the model number, the last 3 characters are the specific variety (different keyboard, different CD burner, etc), so you'll need a huge list to match all the thinkpads. Or do we really want to only do this on the (very few) machines reported as having problems?
It should depend on hal, dmidecode, hdparm and smartctl but doesn't really belong to any of them. pm-utils? As for the matching, yeah, that can be a problem. I'll see if I can extend the current script to do globbing. Thanks.
At least HAL and smartctl already have init scripts where it could be called, which pm-utils has not ;-), so I think it should go in one of those two packages. Since we will need very frequent updates of the blacklist, HAL is probably also a bad idea, since a HAL update pulls in a huge QA effort every time.
Hmmm... Maybe a separate package then? I'm almost done implementing glob matching w/ hal info caching. Man.. bash is a great programming language. :-( If a separate package is the way to go, I'll package it and put it on OBS. I've never submitted a package before. What else do I need to do? Thanks.
Created attachment 217758 [details] storage-fixup storage-fix v0.1
Created attachment 217759 [details] storage-fixup.conf
Posted are the updated versions of storage-fixup and its configuration file, which now do glob matching and match drives in the same family. It's also improved in many aspects including documentation. The git tree is set up. http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git I started upstream discussion regarding this problem and we'll probably set up a page on linux-ata.org so that we can track things better and other distros can use it too.
here are some more for storage-fixup.conf

Western Digital:
WD5000AACS
WD7500AYPS
WD1000FYPS
act sendmsgtouser "please check your disk for load_cycle_count numbers. you may need wdidle3.exe on a bootable dos disk/cd to fix it. this is not available for download, so you may need to ask WD support for that tool or for a firmware upgrade."

TOSHIBA:
MK1032GAX
(In reply to comment #39 from Tejun Heo)
> Hmmm... Maybe a separate package then? I'm almost done implementing glob
> matching w/ hal info caching. Man.. bash is a great programming language. :-(
>
> If a separate package is the way to go, I'll package it and put it on OBS. I've
> never submitted a package before. What else do I need to do?

Add it in the PDB and submit it to autobuild. I can do that for you. If you already have it in OBS, please put the package "source dir" somewhere accessible (hm, I should be able to check it out from the OBS...). Make sure you select a good license, so that we can get past legal quickly ;-) Then we need management's approval to still get it into 11.0, so I'll add the relevant people to CC now.
Here's the OBS project. The license is beerware. Is that something legal can agree with?
https://build.opensuse.org/package/show?package=storage-fixup&project=home%3Ateheo
Every file is accessible through OBS but just in case, the git tree is at...
http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=shortlog;h=suse
I don't know anything about PDB so I would appreciate it if you could help me with that. Thanks.
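To illustrate why glob matching helps with the ThinkPad case raised earlier (four-digit model number followed by a three-character variety), here is a small sketch using shell-style patterns. It is not the storage-fixup script or its config format: the rule table, the product numbers, the drive model strings and the `-B` levels are all hypothetical placeholders.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (system product glob, drive model glob, hdparm -B level).
# None of these entries are real recommendations.
RULES = [
    ("1234???", "EXAMPLE HD250*", 255),
    ("*", "EXAMPLE HD100*", 254),
]


def find_apm_level(system_product, drive_model, rules=RULES):
    """Return the -B level of the first rule matching both globs, else None."""
    for product_glob, model_glob, level in rules:
        if fnmatchcase(system_product, product_glob) and fnmatchcase(drive_model, model_glob):
            return level
    return None


# One pattern covers every variety suffix of the same (made-up) model number.
print(find_apm_level("1234ABC", "EXAMPLE HD250JI"))
print(find_apm_level("1234XYZ", "EXAMPLE HD250JI"))
print(find_apm_level("1234ABC", "SOME OTHER DRIVE"))
```

First match wins here, so more specific rules would have to be listed before catch-all ones.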
Roland, thanks for the info. Hmmm... Echoing the message to console won't do too much good for most users. It'll just pass by and in most cases the pretty boot splash won't even show them. I'll ping the research mailing list on the best way to notify the user about dying but not-workaroundable hard disks.
(In reply to comment #45 from Tejun Heo)
> Here's the OBS project. The license is beerware. Is that something legal can
> agree with?

Let's hope so, I added Ciaran and Jürgen to CC. Feel free to remove yourself again ;-) If this is not "good enough", we can probably also dual-license it "beerware / BSD 3-clause". BSD is known to go easily through legal ;)

> https://build.opensuse.org/package/show?package=storage-fixup&project=home%3Ateheo
>
> Every file is accessible through OBS but just in case, the git tree is at...

Yes, I was able to check it out with OSC. Everything is easy if you have a BS account, it's only hard for people without.

> http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=shortlog;h=suse
>
> I don't know anything about PDB so I would appreciate if you can help me with
> that.

I'll take care of it.
Please just use GPL or BSD3clause. Having beerware in default install is a nice joke, but...
Created attachment 218052 [details] patch against OBS package (rev. 9) I submitted a package to autobuild, with the attached diff (pm-utils hooks belong in /usr/lib/pm-utils now, some rpmlint warnings). PDB seems not to be unhappy about the license btw ;)
Thanks Stefan. I'll change the license to BSD and incorporate your changes into the repo. It also seems we'll need to use a different way to get storage.model, as the hal output is truncated.
Please update your buildservice repo once you are finished, I will pick up the changes from there.
License switched to BSD and the dependency on HAL removed. It now uses hdparm and sg_inq to acquire storage device information. Also, the rc script has been moved to the boot stage, after boot.localfs. OBS project updated. Thanks.
Thanks, I submitted a package to the build system and changed the license information in PDB => everything should be fine ;-)
As the workaround has gone in, lowering priority to normal. I'm gonna keep this bug entry open to keep track of this problem. Thanks.
Thank you for the excellent work! Alberto
Stefan Seyfried, there's no storage-fixup in SL110. Do you know where it went? Thanks.
No. Everybody necessary is in CC, but maybe it was not severe enough to warrant late inclusion. Coolo, can we push it out via online update?
please read mails I send around.
coolo: I do not remember being Cced on those mails. I'd still like to know what is going on. Could you attach a short summary to the bug?
In short: I have nothing to do with maintenance of 11.0 unless the maintenance coordinator needs me.
Then, is "storage-fixup" the package that fixes this problem? From http://software.opensuse.org , it seems that it has versions for factory, SLED 10 and opensuse-10.3. Is it safe to use the factory rpm in 11.0? Thanks a lot for the info
After reading storage-fixup.conf, it seems that -B 255 is being used, but after reading https://wiki.ubuntu.com/DanielHahler/Bug59695 , it seems that some drives would need -B 254 instead. Also, laptop-mode-tools has used 254 instead of 255 since version 1.35, as can be read in http://samwel.tk/laptop_mode/changelog

Also, is fully disabling power management really needed? Wouldn't it cause problems with overheating or shorter battery life?

It seems that my hard drive is currently using the value 128:

# hdparm -I /dev/sda | grep Advanced
Advanced power management level: 128
* Advanced Power Management feature set
# smartctl -a /dev/sda -d ata | grep Load
193 Load_Cycle_Count 0x0032 090 090 000 Old_age Always - 218785
# smartctl -a /dev/sda -d ata | grep Power
9 Power_On_Seconds 0x0032 076 076 000 Old_age Always - 12338h+14m+13s
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 5498
192 Power-Off_Retract_Count 0x0032 098 098 000 Old_age Always - 569

I am not sure if 218785 is a good value, but "man hdparm" says:

-B Set Advanced Power Management feature, if the drive supports it. A low value means aggressive power management and a high value means better performance. Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive (not all drives support disabling it, but most do).

So it seems that 128 doesn't permit spin-down. Anyway, the hdparm man page also says that 254 could be needed in some cases instead of 255.

Thanks a lot
(In reply to comment #70 from Pacho Ramos)
> Then, is "storage-fixup" the package that fixes this problem? From
> http://software.opensuse.org , it seems that it has versions for factory, SLED 10
> and opensuse-10.3. Is it safe to use the factory rpm in 11.0?

Yes. But there will also be an online update for 11.0 delivered soon, so it should pop up in YOU shortly without the need for adding any buildservice repositories.
(In reply to comment #71 from Pacho Ramos)
> After reading storage-fixup.conf, it seems that -B 255 is being used, but after
> reading https://wiki.ubuntu.com/DanielHahler/Bug59695 , it seems that some drives
> would need -B 254 instead. Also, laptop-mode-tools has used 254 instead of 255
> since version 1.35, as can be read in http://samwel.tk/laptop_mode/changelog
>
> Also, is fully disabling power management really needed? Wouldn't it cause
> problems with overheating or shorter battery life?

As discussed above, no one value suits every drive. I wish it were that simple, but the standard isn't too specific about which value means exactly what. Furthermore, this is ATA and vendors often forget to follow the spec. So, for some drives 255 is a good value, for others 254, and for yet others 128. That's why storage-fixup matches specific machines and uses the appropriate commands. We'll need to find out which value is the appropriate one for each specific machine and add it machine-by-machine.
OK, thanks a lot for the explanation
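One way to put the 218785 figure in context is to divide it by the power-on time reported in the same output. The sketch below does that; it assumes the raw power-on value always has the `NNNh+NNm+NNs` shape shown above (other drives report it differently), and both function names are made up for illustration.

```python
import re


def power_on_hours(raw):
    """Convert a raw value such as '12338h+14m+13s' to fractional hours."""
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw.strip())
    if match is None:
        raise ValueError("unexpected power-on format: %r" % raw)
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours + minutes / 60 + seconds / 3600


def average_load_cycles_per_hour(load_cycle_count, power_on_raw):
    """Lifetime average of load cycles per power-on hour."""
    return load_cycle_count / power_on_hours(power_on_raw)


# Values taken from the smartctl output in the comment above.
print(round(average_load_cycles_per_hour(218785, "12338h+14m+13s"), 1))
```

For these numbers the lifetime average comes out at a little under 18 load cycles per power-on hour. That is an average over the whole life of the drive, so it says nothing about the rate under the current APM setting.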
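For reference, the `-B` ranges quoted from the hdparm man page in this thread can be written down as a small lookup. This only restates what the man page documents; as noted above, individual drives may interpret the values differently, and the function name is invented for this sketch.

```python
def describe_apm_level(level):
    """Describe an hdparm -B value according to the man page ranges."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        return "APM disabled (not supported by all drives)"
    if level >= 128:
        return "spin-down not permitted"
    return "spin-down permitted"


for value in (1, 127, 128, 254, 255):
    print(value, describe_apm_level(value))
```

Whether a given level also stops the head unloads discussed in this bug is a separate question that has to be checked per drive, for example by watching Load_Cycle_Count.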
released
From now on, please use the following wiki page to track this problem and report new ones to linux-ide@vger.kernel.org as this problem is not specific to SUSE. http://ata.wiki.kernel.org/index.php/Known_issues Thanks.