| Summary: | High number of hard drive load cycles on notebooks | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Alberto Passalacqua <alberto.passalacqua> |
| Component: | Kernel | Assignee: | Tejun Heo <teheo> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | aj, ciaran.farrell, cmoess, coolo, elchevive68, forgotten_qMyteedNxa, forgotten_sLJ7K2dvxj, noiano, pacho, pearson45j, sbrabec, sontek, sshaw, wm |
| Version: | Beta 2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | storage-fixup script, storage-fixup slightly updated., storage-fixup.conf, storage-fixup.conf, dmidecode, hdparm -I, storage-fixup.conf, storage-fixup, storage-fixup, storage-fixup.conf, patch against OBS package (rev. 9) | ||
Description
Alberto Passalacqua
2008-05-05 03:03:42 UTC
Aieeeeeeeee.......... This is nasty. The BIOS / drive vendors are setting APM to aggressive values with the idle IO access pattern of Windows in mind. The setting is so aggressive that it is fragile: a different idle IO pattern causes the drive to unload like crazy, and you can't really expect Linux or any other operating system to have the same idle IO pattern as Windows. :-(

The interesting part is that such aggressive settings can cause crazy unloads even on Windows. My friend's tp tablet does crazy unloading under Windows but not under SL103. Maybe the antivirus program my friend is running makes the idle IO pattern different or something. So, this basically is the vendor excessively fine-tuning their hardware to achieve a slightly better battery benchmark, and the 'excessive' part, unsurprisingly, breaks down when the circumstances change.

This is tough to solve. The ATA APM feature set is vague about which values mean what and, as expected, not all drive manufacturers follow even the vague outlines. Vendors tuning their hardware 'excessively' know which drives they're gonna use and can determine their settings. Value 255 is supposed to disable the feature but apparently it just wraps around to 127/8 on certain drives. Value 254 means maximum performance but we're never sure what it will do on odd ones, and there are drives which seemingly overheat if APM is set to 255 or 254. These drives use APM to control aspects of the drive other than head loading/unloading, be it drive RPM, electric circuit power mode or whatever, which is legit as the ATA spec doesn't dictate how power should be saved when APM is configured. So, there's no golden default value we can use to solve it.

One way to solve it is to issue IO every 1 sec if no actual IO is in progress and the idle timeout hasn't expired, which is a seriously ugly solution and will require considerable time and effort to get right.
Another, more realistic approach is to develop a blacklist of machines / drives which have this problem and turn it off during boot or resume. Hmmm... if the BIOS does it right, implements proper SATA ACPI and turns on APM via _GTF after hard resets, the blacklist somehow needs to be invoked after each reset, or the driver should remember the setting and restore it. Eeeewwww........ The BIOS shouldn't configure it w/ such fragile values.

Please take a look at the following wiki page for more information.

http://www.thinkwiki.org/wiki/Problem_with_hard_drive_clicking

Ideally, this should be solved by updating the firmware or disabling APM from the BIOS, but this problem will kill drives pretty fast and we'd better have some kind of workaround, although I can't think of a good one at this point.

Alberto, can you please cc other people who are seeing this problem here? Thanks.

Added three users to CC. They discussed the issue with me on the Factory ML.

Regards,
Alberto

Changing status to opened.

I'd say you have buggy hardware, too bad. Report it to the system vendor. (Plus, we should be careful not to certify such broken machines.) A blacklist is certainly a good solution for now... but I don't think it is a "major" problem, at least it is not a _Linux_ major problem.

A long-term solution would be to generate less disk activity by default. Maybe 5-second commits on ext3 are not that good an idea? ...It would also allow longer spindowns and save power...

Pavel, I pretty much agree with you here, but this can actually kill disks, which can make people genuinely miserable. I think adding a minimal workaround is a good idea. Say, invoking hdparm -B w/ the correct value on certain configurations during boot and resume. Where would be the correct place to do that?

I agree it depends on hardware configuration, but saying it's not a Linux problem is just not seeing it.
Calling that hardware buggy when it works OK with other operating systems like Windows and Mac OS X (a MacBook is also affected, the user is in CC) is funny, to say the least. If I have to decide between the life of my hard disk and Linux, without a doubt I won't use Linux and keep my hardware safe. That's why, for me and other users, it is a major, if not higher severity, issue.

Regards,
Alberto

Lemme play a bat here. Alberto, I see where you're coming from too and I _am_ considering it seriously (w/ due amount of cursing of course). Pavel is just pointing out where the problem lies, and that's usually the place where the fix should go. Solving it at a different layer, especially in this case, is nasty, and cleaning up a mess someone else made for a nickel is annoying, so please understand our frustration.

Anyways, I'm working on a script which can be called during boot and resume and can issue some hdparm and/or smartctl commands to handle this, so we should be able to handle these weirdos soon. Please stand by a bit. Thanks.

I perfectly understand your point of view, but I just represent the average user who behaves as I described. If I read that my brand new laptop will have a damaged hard drive under Linux, with all the consequences of the case (read: void warranty), I surely won't use it. That's my point.

It's not a question of how seriously you (at Novell) take the problem. I never wrote you're not considering it seriously. But I also think that if this hardware works correctly under Windows and Mac OS X, it should work correctly without ugly hacks under Linux too. Just my two cents.

(In reply to comment #9 from Alberto Passalacqua)
> It's not a question of how seriously you (at Novell) take the problem. I never
> wrote you're not considering it seriously. But I also think that if this
> hardware works correctly under Windows and Mac OS X, it should work
> correctly without ugly hacks under Linux too.
Other points I can agree with, but it working under Mac OS X is just dumb luck. The setting just isn't healthy. Let's say the vendor sets the unload timeout to 10 seconds and, under nominal laptop circumstances, Windows issues IOs every 8 secs unless it's completely idle. It will work fine there. Let's say Mac does so every 7 to 9 seconds, which will work fine too. Now, let's say Linux does so every 10-12 seconds. Now you have a problem. Being idle for longer periods of time is a good thing which we should strive for, but in this case it's fast death for the drive. This is why some people are reporting that the problem goes away when laptop mode is disabled, because then IOs will be issued more frequently.

Such short fixed timeouts just can't work generically. They're bound to break. The vendors should have gone with a longer timeout or an adaptive one. So, it's not like something is wrong with Linux; it's just different, and the setting is way too aggressive for the real world. Please think of my friend's laptop I mentioned earlier. That drive is going to be toast pretty soon even though it's just a differently configured Windows.

Can we increase the write-back time (especially of the journal) by default, say by a factor of 4-10? This would lower the severity of the issue, at least a bit.

BTW: Windows probably also wakes up the disk really often over some weeks or months...

pm-utils is the right place. The question is: "what is the correct value"?
My observations with different drives usually show the following (all hdparm -B values):
1-127: drive unloads heads and does auto-spindown, depending on access pattern
128-191: drive unloads heads if no access for a second or so
192-254: drive does not unload heads
255: disables APM (from the specs)
On my machine, I usually set it to 192 when on AC power and to 128 when on battery power.
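
To make the bands above easier to reason about, here is a minimal Python sketch that maps an `hdparm -B` value to the behaviour reported in this comment. The function names are hypothetical, the bands are one commenter's observations rather than anything the ATA spec guarantees, and 192 itself is assumed to belong to the "no unload" band because that is the value used on AC power.

```python
def describe_apm_level(level: int) -> str:
    """Map an `hdparm -B` value to the behaviour observed in this thread.

    These bands are observations from one commenter's drives, not a spec;
    other drives may behave differently.
    """
    if not 1 <= level <= 255:
        raise ValueError("APM level must be in the range 1..255")
    if level == 255:
        return "APM disabled"
    if level >= 192:  # assumption: 192 itself counts as "no unload"
        return "no head unload"
    if level >= 128:
        return "head unload after about a second idle"
    return "head unload and auto-spindown"


def pick_apm_level(on_ac_power: bool) -> int:
    """Return the values the commenter uses: 192 on AC, 128 on battery."""
    return 192 if on_ac_power else 128


if __name__ == "__main__":
    for on_ac in (True, False):
        level = pick_apm_level(on_ac)
        print(on_ac, level, describe_apm_level(level))
```

The sketch only describes values; it does not call hdparm, so it is safe to run on any machine.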
As written above, I don't think there's one magic value we can use. We'll have to build a blacklist for problematic laptops (we should be able to take some from the thinkwiki page). I'm almost done with a bash script to do it. Will post it soon after adding some comments.

Created attachment 212826 [details]
storage-fixup script
Okay, here's the script. The comment on top of it should explain most things. I'll attach a sample rule file soon.
Created attachment 212833 [details]
storage-fixup slightly updated.
Turned off debug mode and misc fix up.
Created attachment 212834 [details]
storage-fixup.conf
Sample configuration file.
Stefan, does this look good enough? Or does pm-utils have a better facility to do this?

Alberto and other owners of troubled machines, can you guys please post the result of "dmidecode" and "hdparm -I /dev/sda" and which APM value works for the machine? Thanks.

No, I think that pm-utils does not have a better facility. We should just run your script at boot and after resume. Additionally, I'll add a pm-utils hook that gets executed whenever the "set low power" function is called (with "true" or "false" as an argument, to set or unset low power mode). This script will do nothing by default, but it will have a configuration file where the user can enter values to be used for the AAM and APM settings. The config file will probably look something like:

LOW_POWER_APM="128"
HI_POWER_APM="192"
LOW_POWER_AAM=""
HI_POWER_AAM=""

But I have to think about that.

I don't have that hardware, so I'm removing myself from the CC list.

Hello, here are the results of "dmidecode" and "hdparm -I /dev/sda" from my ThinkPad T60:
Linux linux 2.6.25-26-pae #1 SMP 2008-04-30 07:56:05 +0200 i686 i686 i386 GNU/Linux
Running hdparm -I /dev/sda....
/dev/sda:
ATA device, with non-removable media
Model Number: Hitachi HTS722020K9SA00
Serial Number: 071006DP0410DTG4VSSP
Firmware Revision: DC4OC54P
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: ATA-8-ACS revision 3f
Supported: 8 7 6 5
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 390721968
device size with M = 1024*1024: 190782 MBytes
device size with M = 1000*1000: 200049 MBytes (200 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Vendor, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 128
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
* WRITE_UNCORRECTABLE_EXT command
* SATA-I signaling speed (1.5Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* unknown 76[12]
Non-Zero buffer offsets in DMA Setup FIS
DMA Setup Auto-Activate optimization
* Device-initiated interface power management
In-order data delivery
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
72min for SECURITY ERASE UNIT. 74min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000cca53ec235f6
NAA : 5
IEEE OUI : cca
Unique ID : 53ec235f6
Checksum: correct
Running dmidecode....
# dmidecode 2.9
SMBIOS 2.4 present.
68 structures occupying 2250 bytes.
Table at 0x000E0010.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: LENOVO
Version: 79ETE1WW (2.21 )
Release Date: 02/05/2008
Address: 0xDC000
Runtime Size: 144 kB
ROM Size: 2048 kB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
BIOS Revision: 2.33
Firmware Revision: 1.7
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: LENOVO
Product Name: 1952W5R
Version: ThinkPad T60
Serial Number: L3C8047
UUID: 81E12281-482B-11CB-B37C-91674F2F476C
Wake-up Type: Power Switch
SKU Number: Not Specified
Family: ThinkPad T60
Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
Manufacturer: LENOVO
Product Name: 1952W5R
Version: Not Available
Serial Number: VF0BC6730XA
Handle 0x0003, DMI type 3, 13 bytes
Chassis Information
Manufacturer: LENOVO
Type: Notebook
Lock: Not Present
Version: Not Available
Serial Number: Not Available
Asset Tag: No Asset Information
Boot-up State: Unknown
Power Supply State: Unknown
Thermal State: Unknown
Security Status: Unknown
Handle 0x0004, DMI type 126, 13 bytes
Inactive
Handle 0x0005, DMI type 126, 13 bytes
Inactive
Handle 0x0006, DMI type 4, 35 bytes
Processor Information
Socket Designation: None
Type: Central Processor
Family: Other
Manufacturer: GenuineIntel
ID: F6 06 00 00 FF FB EB BF
Version: Intel(R) Core(TM)2 CPU
Voltage: 1.3 V
External Clock: 167 MHz
Max Speed: 2333 MHz
Current Speed: 2333 MHz
Status: Populated, Enabled
Upgrade: None
L1 Cache Handle: 0x000A
L2 Cache Handle: 0x000C
L3 Cache Handle: Not Provided
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x0007, DMI type 5, 20 bytes
Memory Controller Information
Error Detecting Method: None
Error Correcting Capabilities:
None
Supported Interleave: One-way Interleave
Current Interleave: One-way Interleave
Maximum Memory Module Size: 2048 MB
Maximum Total Memory Size: 4096 MB
Supported Speeds:
Other
Supported Memory Types:
DIMM
SDRAM
Memory Module Voltage: 2.9 V
Associated Memory Slots: 2
0x0008
0x0009
Enabled Error Correcting Capabilities:
Unknown
Handle 0x0008, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM Slot 1
Bank Connections: 0 3
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 2048 MB (Double-bank Connection)
Enabled Size: 2048 MB (Double-bank Connection)
Error Status: OK
Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: DIMM Slot 2
Bank Connections: 4 7
Current Speed: Unknown
Type: DIMM SDRAM
Installed Size: 1024 MB (Double-bank Connection)
Enabled Size: 1024 MB (Double-bank Connection)
Error Status: OK
Handle 0x000A, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1 Cache
Configuration: Enabled, Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 64 KB
Maximum Size: 64 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Instruction
Associativity: 8-way Set-associative
Handle 0x000B, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L1 Cache
Configuration: Enabled, Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 64 KB
Maximum Size: 64 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Data
Associativity: 8-way Set-associative
Handle 0x000C, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal L2 Cache
Configuration: Enabled, Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 4096 KB
Maximum Size: 4096 KB
Supported SRAM Types:
Burst
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Unified
Associativity: 8-way Set-associative
Handle 0x000D, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: Infrared
External Connector Type: Infrared
Port Type: Other
Handle 0x000E, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: External Monitor
External Connector Type: DB-15 female
Port Type: Video Port
Handle 0x000F, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: Microphone Jack
External Connector Type: Mini Jack (headphones)
Port Type: Audio Port
Handle 0x0010, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: Headphone Jack
External Connector Type: Mini Jack (headphones)
Port Type: Audio Port
Handle 0x0011, DMI type 126, 9 bytes
Inactive
Handle 0x0012, DMI type 126, 9 bytes
Inactive
Handle 0x0013, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: Modem
External Connector Type: RJ-11
Port Type: Modem Port
Handle 0x0014, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: Ethernet
External Connector Type: RJ-45
Port Type: Network Port
Handle 0x0015, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: USB 1
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0016, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: USB 2
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0017, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Not Available
Internal Connector Type: None
External Reference Designator: USB 3
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0018, DMI type 126, 9 bytes
Inactive
Handle 0x0019, DMI type 126, 9 bytes
Inactive
Handle 0x001A, DMI type 126, 9 bytes
Inactive
Handle 0x001B, DMI type 126, 9 bytes
Inactive
Handle 0x001C, DMI type 126, 9 bytes
Inactive
Handle 0x001D, DMI type 126, 9 bytes
Inactive
Handle 0x001E, DMI type 126, 9 bytes
Inactive
Handle 0x001F, DMI type 126, 9 bytes
Inactive
Handle 0x0020, DMI type 9, 13 bytes
System Slot Information
Designation: ExpressCard Slot 1
Type: x1 PCI Express
Current Usage: Available
Length: Other
ID: 0
Characteristics:
Hot-plug devices are supported
Handle 0x0021, DMI type 9, 13 bytes
System Slot Information
Designation: CardBus Slot 1
Type: 32-bit PC Card (PCMCIA)
Current Usage: Available
Length: Other
ID: Adapter 1, Socket 0
Characteristics:
5.0 V is provided
3.3 V is provided
PC Card-16 is supported
Cardbus is supported
Zoom Video is supported
Modem ring resume is supported
PME signal is supported
Hot-plug devices are supported
Handle 0x0022, DMI type 126, 13 bytes
Inactive
Handle 0x0023, DMI type 126, 13 bytes
Inactive
Handle 0x0024, DMI type 126, 13 bytes
Inactive
Handle 0x0025, DMI type 10, 6 bytes
On Board Device Information
Type: Other
Status: Disabled
Description: IBM Embedded Security hardware
Handle 0x0026, DMI type 11, 5 bytes
OEM Strings
String 1: IBM ThinkPad Embedded Controller -[79HT50WW-1.07 ]-
Handle 0x0027, DMI type 13, 22 bytes
BIOS Language Information
Installable Languages: 1
enUS
Currently Installed Language: enUS
Handle 0x0028, DMI type 15, 25 bytes
System Event Log
Area Length: 0 bytes
Header Start Offset: 0x0000
Header Length: 16 bytes
Data Start Offset: 0x0010
Access Method: General-purpose non-volatile data functions
Access Address: 0x0000
Status: Invalid, Full
Change Token: 0x000000EF
Header Format: Type 1
Supported Log Type Descriptors: 1
Descriptor 1: POST error
Data Format 1: POST results bitmap
Handle 0x0029, DMI type 16, 15 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 2 GB
Error Information Handle: Not Provided
Number Of Devices: 2
Handle 0x002A, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: SODIMM
Set: None
Locator: DIMM 1
Bank Locator: Bank 0/1
Type: DDR2
Type Detail: Synchronous
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x002B, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: SODIMM
Set: None
Locator: DIMM 2
Bank Locator: Bank 2/3
Type: DDR2
Type Detail: Synchronous
Speed: Unknown
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x002C, DMI type 18, 23 bytes
32-bit Memory Error Information
Type: OK
Granularity: Unknown
Operation: Unknown
Vendor Syndrome: Unknown
Memory Array Address: Unknown
Device Address: Unknown
Resolution: Unknown
Handle 0x002D, DMI type 19, 15 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x000BFFFFFFF
Range Size: 3 GB
Physical Array Handle: 0x0029
Partition Width: 0
Handle 0x002E, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Range Size: 2 GB
Physical Device Handle: 0x002A
Memory Array Mapped Address Handle: 0x002D
Partition Row Position: 1
Handle 0x002F, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00080000000
Ending Address: 0x000BFFFFFFF
Range Size: 1 GB
Physical Device Handle: 0x002B
Memory Array Mapped Address Handle: 0x002D
Partition Row Position: 1
Handle 0x0030, DMI type 21, 7 bytes
Built-in Pointing Device
Type: Track Point
Interface: PS/2
Buttons: 3
Handle 0x0031, DMI type 21, 7 bytes
Built-in Pointing Device
Type: Touch Pad
Interface: PS/2
Buttons: 0
Handle 0x0032, DMI type 24, 5 bytes
Hardware Security
Power-On Password Status: Disabled
Keyboard Password Status: Disabled
Administrator Password Status: Disabled
Front Panel Reset Status: Unknown
Handle 0x0033, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Handle 0x0034, DMI type 131, 17 bytes
OEM-specific Type
Header and Data:
83 11 34 00 01 02 03 FF FF 1F 00 00 00 00 00 02
00
Strings:
BOOTINF 20h
BOOTDEV 21h
KEYPTRS 23h
Handle 0x0035, DMI type 131, 11 bytes
OEM-specific Type
Header and Data:
83 0B 35 00 00 00 E8 FF C5 01 01
Strings:
IBM System Metrics
Handle 0x0036, DMI type 131, 22 bytes
OEM-specific Type
Header and Data:
83 16 36 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 01
Strings:
TVT-Enablement
Handle 0x0037, DMI type 132, 7 bytes
OEM-specific Type
Header and Data:
84 07 37 00 01 D8 36
Handle 0x0038, DMI type 133, 5 bytes
OEM-specific Type
Header and Data:
85 05 38 00 01
Strings:
KHOIHGIUCCHHII
Handle 0x0039, DMI type 133, 17 bytes
OEM-specific Type
Header and Data:
85 11 39 00 30 30 2E 35 00 60 6F BF 00 00 00 00
01
Strings:
Audit Boot History
Handle 0x003A, DMI type 134, 13 bytes
OEM-specific Type
Header and Data:
86 0D 3A 00 04 07 06 20 00 00 00 00 00
Handle 0x003B, DMI type 134, 16 bytes
OEM-specific Type
Header and Data:
86 10 3B 00 00 41 54 4D 4C 01 01 00 00 02 01 02
Strings:
TPM INFO
System Reserved
Handle 0x003C, DMI type 135, 13 bytes
OEM-specific Type
Header and Data:
87 0D 3C 00 54 50 07 00 01 00 00 00 00
Handle 0x003D, DMI type 135, 18 bytes
OEM-specific Type
Header and Data:
87 12 3D 00 54 50 07 01 01 A3 07 00 00 00 01 00
00 00
Handle 0x003E, DMI type 135, 35 bytes
OEM-specific Type
Header and Data:
87 23 3E 00 54 50 07 02 42 41 59 20 49 2F 4F 20
01 00 02 00 00 0E 00 F0 01 F6 03 02 00 0F 00 70
01 76 03
Handle 0x003F, DMI type 136, 6 bytes
OEM-specific Type
Header and Data:
88 06 3F 00 5A 5A
Handle 0x0040, DMI type 137, 28 bytes
OEM-specific Type
Header and Data:
89 1C 40 00 0C 02 00 01 01 00 00 01 50 57 4D 53
20 49 6E 66 6F 72 6D 61 74 69 6F 6E
Handle 0x0041, DMI type 138, 40 bytes
OEM-specific Type
Header and Data:
8A 28 41 00 14 01 01 01 07 01 01 0C 01 01 0C 01
01 0C 00 00 42 49 4F 53 20 50 61 73 73 77 6F 72
64 20 46 6F 72 6D 61 74
Handle 0x0042, DMI type 139, 37 bytes
OEM-specific Type
Header and Data:
8B 25 42 00 11 01 0A 00 00 00 00 00 00 00 00 00
00 50 57 4D 53 20 4B 65 79 20 49 6E 66 6F 72 6D
61 74 69 6F 6E
Handle 0x0043, DMI type 127, 4 bytes
End Of Table
So, for your machine, the rule would be something like the following.

rule tp-t60
dmi system-manufacturer LENOVO
dmi system-product-name 1952W5R
dmi system-version ThinkPad T60
hal storage.model Hitachi HTS722020K9SA00
act hdparm -B 255 $DEV

Can you please put the above into /etc/storage-fixup.conf, run the attached storage-fixup and verify it makes the quick unloads go away? Also, please attach large outputs instead of pasting inline. It makes the bug report difficult to follow. Thanks.

This is what comes out from the HP Pavilion dv6500 of a friend who actually found the problem.
# dmidecode 2.9
SMBIOS 2.4 present.
19 structures occupying 731 bytes.
Table at 0x000DC010.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: Hewlett-Packard
Version: F.53
Release Date: 04/02/2008
Address: 0xE6F30
Runtime Size: 102608 bytes
ROM Size: 1024 kB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
AGP is supported
Smart battery is supported
BIOS boot specification is supported
Targeted content distribution is supported
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Hewlett-Packard
Product Name: HP Pavilion dv6500 Notebook PC
Version: Rev 1
Serial Number: CNF73565FT
UUID: 434E4637-3335-3635-4654-001B249711B6
Wake-up Type: Power Switch
SKU Number: GT460EA#ABZ
Family: 103C_5335KV
Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
Manufacturer: Quanta
Product Name: 30D2
Version: 79.2B
Serial Number: None
Handle 0x0003, DMI type 3, 17 bytes
Chassis Information
Manufacturer: Quanta
Type: Notebook
Lock: Not Present
Version: N/A
Serial Number: None
Asset Tag:
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000004
Handle 0x0004, DMI type 4, 35 bytes
Processor Information
Socket Designation: U2E1
Type: Central Processor
Family: Other
Manufacturer: Intel
ID: FA 06 00 00 FF FB EB BF
Version: Intel(R) Core(TM)2 Duo CPU T7300
Voltage: 3.3 V
External Clock: 800 MHz
Max Speed: 2000 MHz
Current Speed: 2000 MHz
Status: Populated, Enabled
Upgrade: Socket 478
L1 Cache Handle: 0x0005
L2 Cache Handle: 0x0006
L3 Cache Handle: Not Provided
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x0005, DMI type 7, 19 bytes
Cache Information
Socket Designation: L1 Cache
Configuration: Enabled, Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 64 KB
Maximum Size: 64 KB
Supported SRAM Types:
Burst
Pipeline Burst
Asynchronous
Installed SRAM Type: Asynchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x0006, DMI type 7, 19 bytes
Cache Information
Socket Designation: L2 Cache
Configuration: Enabled, Socketed, Level 2
Operational Mode: Write Back
Location: External
Installed Size: 4096 KB
Maximum Size: 4096 KB
Supported SRAM Types:
Burst
Pipeline Burst
Asynchronous
Installed SRAM Type: Burst
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x0007, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Express Slot 1
Type: 64-bit PCI Express
Current Usage: Available
Length: Long
ID: 0
Characteristics:
5.0 V is provided
3.3 V is provided
Handle 0x0008, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Express Slot 2
Type: 64-bit PCI Express
Current Usage: Available
Length: Long
ID: 0
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Hot-plug devices are supported
Handle 0x0009, DMI type 9, 13 bytes
System Slot Information
Designation: PCI Express Slot 6
Type: 64-bit PCI Express
Current Usage: Available
Length: Long
ID: 0
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
Handle 0x000A, DMI type 10, 6 bytes
On Board Device Information
Type: Video
Status: Enabled
Description:
Handle 0x000B, DMI type 11, 5 bytes
OEM Strings
String 1: $HP$
String 2: LOC#ABZ
String 3: ABS 70/71 79 7A 7B 7C
Handle 0x000C, DMI type 16, 15 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 4 GB
Error Information Handle: Not Provided
Number Of Devices: 2
Handle 0x000D, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x000C
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: 1
Locator: DIMM 1
Bank Locator: Bank 0,1
Type: DDR2
Type Detail: Synchronous
Speed: 667 MHz (1.5 ns)
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x000E, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x000C
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: 1
Locator: DIMM 2
Bank Locator: Bank 2,3
Type: DDR2
Type Detail: Synchronous
Speed: 667 MHz (1.5 ns)
Manufacturer: Not Specified
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Handle 0x000F, DMI type 19, 15 bytes
Memory Array Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Range Size: 2 GB
Physical Array Handle: 0x000C
Partition Width: 0
Handle 0x0010, DMI type 20, 19 bytes
Memory Device Mapped Address
Starting Address: 0x00000000000
Ending Address: 0x0003FFFFFFF
Range Size: 1 GB
Physical Device Handle: 0x000D
Memory Array Mapped Address Handle: 0x000F
Partition Row Position: 2
Interleave Position: 2
Interleaved Data Depth: 2
Handle 0x0011, DMI type 32, 20 bytes
System Boot Information
Status: <OUT OF SPEC>
--------------------
hdparm -i /dev/sda
/dev/sda:
Model=SAMSUNG HM250JI , FwRev=HS100-10, SerialNo=S15YJD0P821334
Config={ Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0: ATA/ATAPI-1,2,3,4,5,6,7
* signifies the current active mode
Regards,
Alberto
*** Bug 338230 has been marked as a duplicate of this bug. ***

Created attachment 214783 [details]
storage-fixup.conf
Okay, here's storage-fixup.conf for the above two machines. Can you guys please put the attached file under /etc and verify that the storage-fixup script does the right thing? Also, please attach long outputs instead of pasting inline.
this doesn`t only happen on notebooks but also with enterprise disks for 24/7hrs useage.

>Aieeeeeeeee.......... This is nasty. The BIOS / drive vendors are setting APM
>to aggressive values w/ idle IO access pattern of windows on mind. The setting
>is too aggressive to the point of being fragile and different idle IO pattern
>causes the drive to unload like crazy && you can't really expect Linux or any
>other operating system to have similar idle IO pattern as windows. :-(

even worse - there are disks on the market which don`t let you tune the timeouts with standard methods. you need to get a proprietary DOS *sigh* tool to tune the unload interval - and WD support may give that to you - or not....

see these threads:
http://marc.info/?l=linux-kernel&m=120777293511872&w=2
http://marc.info/?l=linux-kernel&m=121071269907588&w=2

Created attachment 216248 [details]
dmidecode
dmidecode for dell e1505
Created attachment 216249 [details]
hdparm -I
hdparm -I /dev/sda for dell e1505
inspidell:~ # smartctl -a /dev/sda | grep Load_Cycle_Count
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 769211

This is for a dell e1505

(In reply to comment #25 from roland kletzing)
> this doesn`t only happen on notebooks but also with enterprise disks for
> 24/7hrs useage.
>
> even worse - there are disks on the market which don`t let you tune the
> timeouts with standard methods. you need to get a proprietary DOS *sigh* tool
> to tune the unload interval - and WD support may give that to you - or not....
>
> see these threads:
> http://marc.info/?l=linux-kernel&m=120777293511872&w=2
> http://marc.info/?l=linux-kernel&m=121071269907588&w=2
>

Yeah, I recall the thread. I don't really think we can do something about it tho. The only possible solution is to periodically issue commands to the drive to keep its head from unloading. Yuk..

Created attachment 216281 [details]
storage-fixup.conf

Okay, here's the updated storage-fixup.conf. You can test this by

1. Saving https://bugzilla.novell.com/attachment.cgi?id=212833 as ~root/bin/storage-fixup
2. Saving this attachment as /etc/storage-fixup.sh
3. Execute storage-fixup and verify it did the right thing.

Please verify it works. It needs to be tested before it is shipped.

Can we please get this tested? We *really* need to get it tested before RC1 if this problem is going to be worked around in SL110.

It works on the HP Pavilion. Moreover, it seems the problem is being addressed also by the manufacturer with a new firmware.

Regards,
Alberto

>This is tough to solve.

yes, probably. so why not creating better "transparency" for the end user here ? smartctl can read the power-on-hours and the load_cycle_count. each value for itself doesn't tell much... you won`t even need the power_on_hours value if you just check for load cycle count increase for specific time intervals.

if we want to stop harddisk vendors doing such dumb things, more users must raise an eyebrow.
the more noise about this, the more chances vendors will do something about that. but how should an technically inexperienced user know, that he actually suffers from that issue ? one step into that direction would be adding a feature to smartmontools to give out a warning message, when it detects excessive load cycle count number.

see http://sourceforge.net/mailarchive/forum.php?thread_name=261451287%40web.de&forum_name=smartmontools-support

---

>It works on the HP Pavillion. Moreover, it seems the problem is being
>addressed also by the manufacturer with a new firmware.

any link/information for that? WD support was pretty ignorant about this issue and it wouldn`t hurt telling them that other vendors do something about that problem.

Yeah, that could help. I'll ping Bruce Allen with the idea and cc you.

Stefan, can you please include the script and config file? We'll need to continue updating the config file and look for other ways to improve the situation but for now the dirty little script seems all we've got.

Created attachment 217657 [details]
storage-fixup
storage-fixup with typo fixed.
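The suggestion above, to watch how quickly Load_Cycle_Count grows over a time interval rather than looking at a single reading, can be sketched roughly as follows. This is only an illustration and not part of smartmontools or storage-fixup: the function names are made up here, the parser assumes the ten-column attribute row layout seen in the smartctl output pasted in this bug, and the second reading is invented to show the arithmetic. No "excessive" threshold is given because the thread does not establish one.

```python
def load_cycle_count(smartctl_output: str) -> int:
    """Extract the raw Load_Cycle_Count from `smartctl -a` style attribute rows."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed row layout:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    raise ValueError("Load_Cycle_Count attribute not found")


def cycles_per_hour(count_before: int, count_after: int,
                    interval_seconds: float) -> float:
    """Average number of head load cycles per hour between two readings."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (count_after - count_before) * 3600.0 / interval_seconds


# Attribute row reported for the Dell e1505 earlier in this bug.
sample = "193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 769211"
first = load_cycle_count(sample)

# Invented second reading, ten minutes later, purely to show the calculation.
second = first + 15
rate = cycles_per_hour(first, second, 600)
print(f"{rate:.0f} load cycles per hour")
```

Sampling twice and dividing by the interval avoids needing the power-on-hours attribute at all, which is the point made in the comment above.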
Where do we include it? In the hdparm package?

And the matching, at least for thinkpads, is much too fine. Generally, the first 4 digits are the model number, the last 3 characters are the specific variety (different keyboard, different CD burner, etc), so you'll need a very huge list to match all the thinkpads. Or do we really want to only do this on the (very few) machines reported as having problems?

It should depend on hal, dmidecode, hdparm and smartctl but doesn't really belong to any of them. pm-utils? As for the matching, yeah, that can be a problem. I'll see if I can extend the current script to do globbing. Thanks.

At least HAL and smartctl already have init scripts where it could be called, which pm-utils has not ;-), so i think it should go in one of those two packages. Since we will need very frequent updates of the blacklist, HAL is probably also a bad idea, since a HAL update pulls in a huge QA effort every time.

Hmmm... Maybe a separate package then? I'm almost done implementing glob matching w/ hal info caching. Man.. bash is a great programming language. :-(

If separate package is the way to go, I'll package it and put it on OBS. I've never submitted a package before. What else do I need to do?

Thanks.

Created attachment 217758 [details]
storage-fixup
storage-fix v0.1
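To illustrate the glob matching discussed above (one pattern covering every variety of a ThinkPad machine type instead of listing each 7-character ID), here is a minimal sketch. It is not the storage-fixup script, which is written in bash, and it does not reproduce the storage-fixup.conf syntax. The rule table, the machine IDs, the drive model strings and the APM levels below are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rules: (machine-ID glob, drive-model glob, level for `hdparm -B`).
# "1234???" stands for: machine type 1234, any 3-character variety suffix.
RULES = [
    ("1234???", "*", 255),
    ("*", "EXAMPLE HD250*", 254),
]


def apm_level_for(machine: str, drive_model: str) -> Optional[int]:
    """Return the APM level of the first matching rule, or None if none match."""
    machine = machine.strip()
    drive_model = drive_model.strip()  # reported model strings may be space-padded
    for machine_glob, drive_glob, level in RULES:
        if fnmatchcase(machine, machine_glob) and fnmatchcase(drive_model, drive_glob):
            return level
    return None


print(apm_level_for("1234AB7", "SOME DISK"))          # matched by machine type
print(apm_level_for("9999XYZ", "EXAMPLE HD250JI "))   # matched by drive family
print(apm_level_for("9999XYZ", "OTHER DISK"))         # no rule applies
```

Returning nothing when no rule matches keeps the fixup limited to machines and drive families that have actually been reported, which is the cautious behaviour argued for in the comments above.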
Created attachment 217759 [details]
storage-fixup.conf
Posted are the updated versions of storage-fixup and configuration file which can do glob matching and matches drives in the same family. It's also improved in many aspects including documentation.

git tree is set up.

http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git

I started upstream discussion regarding this problem and we'll probably set up a page on linux-ata.org so that we can track things better and other distros can use it too.

here are some more for storage fixup.conf

Western Digital:
WD5000AACS
WD7500AYPS
WD1000FYPS

act sendmsgtouser "please check your disk for load_cycle_count numbers. you may need wdidle3.exe on bootable dos disk/cd to fix it. this is not available for download, so you may need to ask WD support for that tool or for a firmware upgrade."

TOSHIBA:
MK1032GAX

(In reply to comment #39 from Tejun Heo)
> Hmmm... Maybe a separate package then? I'm almost done implementing glob
> matching w/ hal info caching. Man.. bash is a great programming language. :-(
>
> If separate package is the way to go, I'll package it and put it on OBS. I've
> never submitted a package before. What else do I need to do?

Add it in the PDB and submit it to autobuild. I can do that for you. If you already have it in OBS, please put the package "source dir" somewhere accessible (hm, i should be able to check it out from the OBS...). Make sure you select a good license, so that we can get past legal quickly ;-)

Then we need management's approval to get that still into 11.0, so i'll add relevant people to cc now.

Here's the OBS project. The license is beerware. Is that something legal can agree with?

https://build.opensuse.org/package/show?package=storage-fixup&project=home%3Ateheo

Every file is accessible through OBS but just in case, the git tree is at...

http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=shortlog;h=suse

I don't know anything about PDB so I would appreciate if you can help me with that.

Thanks.

Roland, thanks for the info.
Hmmm... Echoing the message to console won't do too much good for most users. It'll just pass by and in most cases the pretty boot splash won't even show them. I'll ping the research mailing list on the best way to notify the user about dying but not-workaroundable hard disks.

(In reply to comment #45 from Tejun Heo)
> Here's the OBS project. The license is beerware. Is that something legal can
> agree with?

Let's hope so, i added Ciaran and Jürgen to CC. Feel free to remove yourself again ;-) If this is not "good enough", we can probably also dual-license it "beerware / BSD 3-clause". BSD is known to go easily through legal ;)

> https://build.opensuse.org/package/show?package=storage-fixup&project=home%3Ateheo
>
> Every file is accessible through OBS but just in case, the git tree is at...

Yes, i was able to check it out with OSC. Everything is easy if you have an BS account, it's only hard for people without.

> http://git.kernel.org/?p=linux/kernel/git/tj/storage-fixup.git;a=shortlog;h=suse
>
> I don't know anything about PDB so I would appreciate if you can help me with
> that.

I'll take care of it.

Please just use GPL or BSD3clause. Having beerware in default install is a nice joke, but...

Created attachment 218052 [details]
patch against OBS package (rev. 9)
I submitted a package to autobuild, with the attached diff (pm-utils hooks belong into /usr/lib/pm-utils now, some rpmlint warnings).
PDB seems not to be unhappy about the license btw ;)
Thanks Stefan. I'll change the license to BSD, incorporate your changes into the repo and it seems we'll need to use different way to get storage.model as hal output is truncated.

Please update your buildservice repo once you are finished, i will pick up the changes from there.

License switched to BSD and dependency to HAL removed. It now uses hdparm and sg_inq to acquire storage device information. Also, rc script is moved to boot stage after boot.localfs. OBS project updated. Thanks.

Thanks, i submitted a package to the build system and changed the license information in PDB => everything should be fine ;-)

As workaround has gone in, lowering priority to normal. I'm gonna keep this bug entry open to keep track of this problem. Thanks.

Thank you for the excellent work!

Alberto

Stefan Seyfried, there's no storage-fixup in SL110. Do you know where it went? Thanks.

No. Everybody necessary is in CC, but maybe it was not severe enough to warrant late inclusion. Coolo, can we push it out via online update?

please read mails I send around.

coolo: I do not remember being Cced on those mails. I'd still like to know what is going on. Could you attach short summary to the bug?

In short: I have nothing to do with maintenance of 11.0 unless the maintenance coordinator needs me.

Then, Is "storage-fixup" the package who fixes this problem? From http://software.opensuse.org , seems that it has versions for factory, SLED 10 and opensuse-10.3, Is save use factory rpm in 11.0 ?

Thanks a lot for info

After reading storage-fixup.conf, seems that -B 255 is being used but, after reading https://wiki.ubuntu.com/DanielHahler/Bug59695 , seems that some drives would need -B 254 instead. Also laptop-mode-tools now uses 254 instead of 255 from 1.35 version as can be read in http://samwel.tk/laptop_mode/changelog

Also, Is fully disabling Power Manager really needed? Wouldn't it cause some problems related with overheat or short battery lifetime?
Seems that my hardrive is now using 128 value:

# hdparm -I /dev/sda | grep Advanced
Advanced power management level: 128
* Advanced Power Management feature set

# smartctl -a /dev/sda -d ata | grep Load
193 Load_Cycle_Count 0x0032 090 090 000 Old_age Always - 218785

# smartctl -a /dev/sda -d ata | grep Power
9 Power_On_Seconds 0x0032 076 076 000 Old_age Always - 12338h+14m+13s
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 5498
192 Power-Off_Retract_Count 0x0032 098 098 000 Old_age Always - 569

I am not sure if 218785 is a good value, but, after reading "man hdparm":

-B Set Advanced Power Management feature, if the drive supports it. A low value means aggressive power management and a high value means better performance. Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive (not all drives support disabling it, but most do).

Then, seems that 128 doesn't permit spin-down. Anyway, in hdparm man page also tells you that 254 could be needed in some cases instead of 255

Thanks a lot

(In reply to comment #70 from Pacho Ramos)
> Then, Is "storage-fixup" the package who fixes this problem? From
> http://software.opensuse.org , seems that it has versions for factory, SLED 10
> and opensuse-10.3, Is save use factory rpm in 11.0 ?

Yes. But there will also be an online update for 11.0 delivered soon, so it should pop up in YOU shortly without the need for adding any buildservice repositories.

(In reply to comment #71 from Pacho Ramos)
> After reading storage-fixup.conf, seems that -B 255 is being used but, after
> reading https://wiki.ubuntu.com/DanielHahler/Bug59695 , seems that some drives
> would need -B 254 instead.
> Also laptop-mode-tools now uses 254 instead of 255
> from 1.35 version as can be read in http://samwel.tk/laptop_mode/changelog
>
> Also, Is fully disabling Power Manager really needed? Wouldn't it cause some
> problems related with overheat or short battery lifetime?

As discussed above, no one value suits every drive. I hope it were like that but the standard isn't too specific about which value means exactly what. Furthermore, this is ATA and vendors often forget to follow the spec. So, for some drives, 255 is a good value for others 254, yet others 128. That's why storage-fixup matches specific machines and use appropriate commands. We'll need to find out which value is the appropriate one for specific machine and add it machine-by-machine.

OK, thanks a lot for explanation

released

From now on, please use the following wiki page to track this problem and report new ones to linux-ide@vger.kernel.org as this problem is not specific to SUSE.

http://ata.wiki.kernel.org/index.php/Known_issues

Thanks.
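For reference, the value ranges from the hdparm man page excerpt quoted above can be written down as a small lookup. This is a sketch of what the man page text says and nothing more: the function name is made up, and, as discussed in this bug, real drives do not always behave as documented for a given value, so the result describes the nominal meaning only.

```python
def describe_apm_level(level: int) -> str:
    """Nominal meaning of an `hdparm -B` value per the quoted man page text."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        return "APM disabled (not supported by every drive)"
    if level >= 128:
        return "power management without spin-down"
    return "power management with spin-down permitted"


# 128 is the level reported by `hdparm -I` in the comment above.
for level in (1, 128, 254, 255):
    print(level, describe_apm_level(level))
```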