|
Bugzilla – Full Text Bug Listing |
| Summary: | Samsung R510 reboots on x86_64 - disabling ACPI components or mem=4GB helps | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | franco rossi <rossif8> |
| Component: | Kernel | Assignee: | Thomas Renninger <trenn> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P5 - None | CC: | acpi, andi-nbz, drtl, forgotten_a525umNONh, gbv, kent.liu, rossif8, youquan.song |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | 64bit | ||
| OS: | openSUSE 11.1 | ||
| Whiteboard: | |||
| Found By: | Community User | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
boot_log
acpidump new BIOS
DSDT.dsl Modified
Boot_386_suse11_1
boot.msg with current kernel and pci=noacpi
dmidecode samsung e152 eron
dmesg output with acpi_root_table=rsdt acpi_debug_level=0x1f processor.max_cstate=1 |
||
|
Description
franco rossi
2008-10-20 11:19:22 UTC
Created attachment 246516 [details]
boot_log
The log of the boot with acpi=on
Changing Severity. Reassign to kernel-maintainers.

*** Bug 441367 has been marked as a duplicate of this bug. ***

A Marvell network card...
Please try to boot with boot param:
pcie_aspm=off
Please try to reboot several times (AFAIK it does not always hang?).

I tried pcie_aspm=off at boot time (openSUSE 11.0 x86_64) but the result is the same: the system does not start, and the boot sequence restarts from the BIOS welcome screen continuously. I have updated the BIOS too.
With openSUSE 11.1 RC1 i386 (live) everything is OK!
With openSUSE 11.1 RC1 x86_64 (live) it fails: the system does not start.
Thanks for your interest

Yes, it is unrelated to pcie_aspm.
> Without acpi=off, the boot sequence rerun from the Bios welcome screen
> continuously.
With this info and the boot log from above, is this correct:
Everything boots fine until runlevel X is reached, and while booting the machine starts to shut down again, I mean: starts to reboot?
If you pass 1 as a boot parameter, does it work (prompt for login instead of shutting down/rebooting?).
If not, does it work with init=/bin/bash boot param?
If yes, can you switch to runlevel 2,3 and 5 then (init 2... init 3...)?
Created attachment 258115 [details]
acpidump new BIOS
With acpi=off the system boots fine: the X server is OK, KDE 4 is OK, and the system works very well (minus battery check, fan, etc.). I have tried with init=/bin/bash 1 (single user, with ACPI on), but the system restarts from the BIOS welcome screen continuously. I am sending the acpidump taken after updating to the latest BIOS. Thanks

Please remove:
processor thermal fan
from:
INITRD_MODULES="..."
in:
/etc/sysconfig/kernel
invoke:
mkinitrd
reboot and try again with:
init=/bin/bash
Does it still reboot?

Created attachment 258626 [details]
DSDT.dsl Modified
Thomas,
after the modification the result is the SAME: the system does not boot.
I have tried Ubuntu 8.10 i386, which is OK; with Ubuntu 8.10 x86_64 it fails (the system does not start).
I have modified the DSDT.dsl and copied the DSDT.aml into the initrd... without success (sigh!).
I will install openSUSE 11.1 i386.
Bye bye. Many thanks for your interest, I learned many things.

It's too late for 11.1 Goldmaster now anyway. You could keep a little partition for x86_64 tests? Or you could install i386 and later try x86_64 kernels with it: i386 base system + x86_64 kernel works, the other way around it does not.
Let's keep this bug open for a while, maybe this gets fixed mainline. If you find anything related to this bug, please let us know. Otherwise I'd like to look at this again as soon as I find some time.

Thomas,
many thanks for your support. I hope the new kernel will solve my problem. I wish you a Merry Christmas and a Happy New Year. (Excuse me for my bad English.)
Franco

I have the same problem on 11.1 with a Samsung E152 laptop. I have the current kernel 2.6.27.7-9-default. If you need someone to track down the error, I can help you.

I solved it by installing openSUSE 11.1 i386 with the pae kernel, but I would like to know what the problem is on openSUSE 11.1 x86_64. Thanks for your help

Can you provide i386 boot dmesg output also, please. I hope you still have the x86_64 installation?
In the first comment #1, providing boot_log, you passed acpi=on and acpi=off. This is not good. Can you try to boot without both of them and send the boot log again (dmesg).

No, I do not have the x86_64 installation. The acpi=on parameter is not supported. I boot without an acpi parameter to activate ACPI. I will send the i386 boot dmesg.

Created attachment 264938 [details]
Boot_386_suse11_1
> I boot without acpi parameter for activate acpi.
Yep.
You can boot an x86_64 kernel on an i386 installed system (not the other way around).

How to install x86_64 kernel on an i386 installation
====================================================
Best you take the -default and -default-base kernels from here:
ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64
This is the latest kernel, and if we find it you can get a fixed kernel from there before an update kernel is published.
Download them and additionally install the kernel like this:
rpm -ivh --ignorearch /mounts/dist/kerneltest/HEAD/x86_64/kernel-default{-base,}.rpm
I separately post how you might be able to boot it without the reboot issue.

Exclude possibly offending modules from initrd. Add the modules to /etc/modprobe.conf.local:
blacklist processor
blacklist fan
blacklist thermal
then rebuild the initrd, best only for the newly installed x86_64 kernel:
cd /boot
mkinitrd -k vmlinuz-... -i initrd-...
You also should remove the splash=... boot param from the new kernel in /boot/grub/menu.lst and also replace the vga=... boot param with vga=normal. Like that you should see more when the kernel is booting.
Also add a simple 1 as boot param there. This avoids that these modules get loaded by other services later.
Now try to boot the kernel. Does it still reboot?
If it works, log in and do:
echo 0x1f >/sys/module/acpi/parameters/debug_level
modprobe fan; modprobe processor; modprobe thermal; dmesg >/tmp/dmesg
All in one line so that the dmesg command is still executed if it starts to reboot.
Then send the output of dmesg.
If it still does not boot, it happens even earlier and I still have a last other idea how to get some more info out of the machine....

Created attachment 264967 [details]
boot.msg with current kernel and pci=noacpi
I installed the current kernel and booted with
1 acpi_debug_level=0x1f
Booting was only possible for runlevel "s", else the reboot problem was there.
Then, I booted with
1 acpi_debug_level=0x1f pci=noacpi
This worked, but I could not insert the modules, because it rebooted.
I attached the boot.msg for the latter boot process.
Runlevel 5 does not work with pci=noacpi, even though it gets much further in the boot process.
There is no output in /tmp/dmesg
Are these commands started sequentially or in parallel?
modprobe fan; modprobe processor; modprobe thermal; dmesg >/tmp/dmesg
Perhaps it should be
dmesg > /tmp/dmesg & modprobe fan; modprobe processor; modprobe thermal
But I am not very good with shells...

Why do you need pci=noacpi?
Better do not add additional parameters or things do get confusing.
From above I'd say pci=noacpi does not affect the reboot problem, so it's unrelated?
> Then, I booted with
> 1 acpi_debug_level=0x1f pci=noacpi
> This worked, but I could not insert the modules, because it rebooted.
What means it worked? If it rebooted it did not work and it is the same as with booting without pci=noacpi?!?
> I installed the current kernel and booted with
> 1 acpi_debug_level=0x1f
> Booting was only possible for runlevel "s", else the reboot problem was there.
That's great. Now we can debug a bit there...
Did this only work with processor, fan and thermal blacklisted?
I very much expect that a kernel driver is causing the reboot.
If processor, fan and thermal had been blacklisted, I'd try to load them manually. The first task now is to find out which driver/module causes the reboot.

Hmm, maybe this is related to something else. If pci=noacpi changes the reboot behaviour it could be related to something else. Can you separately try to boot (without additional boot params, only these):
pcie_aspm=off noaer
> There is no output in /tmp/dmesg
If it's too early for dmesg you could try to do:
cat /proc/kmsg >/tmp/kmsg
the command will block and /proc/kmsg will be empty afterwards, but that should not matter.

(In reply to comment #24)
> Why do you need pci=noacpi?
If I do not use pci=noacpi, the machine reboots during the boot procedure. I have it from an ACPI FAQ page.
> > Then, I booted with
> > 1 acpi_debug_level=0x1f pci=noacpi
> > This worked, but I could not insert the modules, because it rebooted.
> What means it worked? If it rebooted it did not work and it is the same as with
> booting without pci=noacpi?!?
I can only get to a shell if I use that parameter. Else the system reboots during initialization.
> > I installed the current kernel and booted with
> > 1 acpi_debug_level=0x1f
> > Booting was only possible for runlevel "s", else the reboot problem was there.
> That's great. Now we can debug a bit there...
> Did this only work with processor, fan and thermal blacklisted?
> I very much expect that a kernel driver is causing the reboot.
> If processor, fan and thermal had been blacklisted, I'd try to load them
> manually. The first task now is to find out which driver/module causes the
> reboot.
If I modprobe these modules, the system reboots. I cannot get any dmesg output with
modprobe fan; modprobe processor; modprobe thermal; dmesg >/tmp/dmesg
> Hmm, maybe this is related to something else. If pci=noacpi changes the reboot
> behaviour it could be related to something else. Can you separately try to boot
> (without additional boot params, only these):
> pcie_aspm=off noaer
>
> > There is no output in /tmp/dmesg
> If it's too early for dmesg you could try to do: cat /proc/kmsg >/tmp/kmsg
> the command will block and /proc/kmsg will be empty afterwards, but that should
> not matter.

Oops, forgot to answer the last part. pcie_aspm=off causes a reboot during the boot sequence too.

I think I got it:
Can you try acpi_root_table=rsdt as a boot param.
Please attach dmidecode output if that works.

Created attachment 265276 [details]
dmidecode samsung e152 eron
I booted the current kernel (i.e not the 11.1) with acpi_root_table=rsdt
I started with runlevel s
Then I did
modprobe acpi
without problems
dmesg shows
ACPI: SSDT BFD1AC20, 0265 (r1 PmRef Cpu0Ist 3000 INTL 20050624)
Parsing all Control Methods:
Table [SSDT](id 0089) - 6 Objects with 0 Devices 3 Methods 0 Regions
ACPI: SSDT BFD18020, 0C78 (r1 PmRef Cpu0Cst 3001 INTL 20050624)
Parsing all Control Methods:
Table [SSDT](id 008A) - 1 Objects with 0 Devices 1 Methods 0 Regions
Monitor-Mwait will be used to enter C-1 state
Monitor-Mwait will be used to enter C-2 state
ACPI: CPU0 (power states: C1[C1] C2[C2])
processor ACPI_CPU:00: registered as cooling_device0
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: SSDT BFD19CA0, 01CF (r1 PmRef ApIst 3000 INTL 20050624)
Parsing all Control Methods:
Table [SSDT](id 0094) - 12 Objects with 0 Devices 12 Methods 0 Regions
ACPI: SSDT BFD19F20, 008D (r1 PmRef ApCst 3000 INTL 20050624)
Parsing all Control Methods:
Table [SSDT](id 0096) - 3 Objects with 0 Devices 3 Methods 0 Regions
ACPI: CPU1 (power states: C1[C1] C2[C2])
processor ACPI_CPU:01: registered as cooling_device1
ACPI: Processor [CPU1] (supports 8 throttling states)
Marking TSC unstable due to TSC halts in idle
then I did ( I think in this sequence )
modprobe fan
modprobe processor
modprobe thermal
ACPI: Transitioning device [FAN0] to D3
fan PNP0C0B:00: registered as cooling_device2
ACPI: Fan [FAN0] (off)
ACPI: Transitioning device [FAN1] to D3
fan PNP0C0B:01: registered as cooling_device3
ACPI: Fan [FAN1] (off)
thermal LNXTHERM:01: registered as thermal_zone0
ACPI: Thermal Zone [TZ00] (44 C)
thermal LNXTHERM:02: registered as thermal_zone1
ACPI: Thermal Zone [TZ01] (44 C)
I attached the output of dmidecode
Great.
The boot parameter should make the machine fully work without any further problems.
If you pass it at installation time, it will be used on further installed/updated kernels.
Franco: Can you also attach dmidecode output please.
I am going to blacklist your machines, so that no boot parameter is needed anymore for those in future kernels.

> The boot parameter should make the machine fully work without any further
> problems.
Oh, I am sorry I was not clear enough in my last post.
I can get to bootlevel s. If I want to go to 3 or 5, the system reboots. (I will try 1 in a second)
I videotaped it and the last line of the boot process before rebooting is:
ACPI: Processor [CPU1] (supports 8 throttling states)
I do not know what part of the initialization follows, but that seems to cause the reboot.
Runlevel 1 causes a reboot too.

Does the machine immediately (hard) reboot or is the ordinary, software controlled reboot process happening (runlevel services are stopped, etc.)?
So:
acpi_root_table=rsdt
does not help at all?
If it is a hard reboot, maybe C2 is causing this.
Then processor.max_cstate=1
or
idle=poll
boot params may help?
If the machine shuts down in the ordinary way (services are stopped, etc.), then it's probably the thermal module causing this by a wrongly read temperature (the OS thinks a critical temperature is reached and shuts down the machine).
Can you also attach full dmesg of a x86_64 boot, please (hope you still have it on your disk?).

> Does the machine immediately (hard) reboot or is the ordinary, software
> controlled reboot process happening (runlevel services are stopped, etc.)?
No, the computer does a hard reboot
> So:
> acpi_root_table=rsdt
> does not help at all?
It helps. Without the option I cannot get to runlevel s.
> If it is a hard reboot, maybe C2 is causing this.
> Then processor.max_cstate=1
> or
> idle=poll
> boot params may help?
No, this did not help. I get as far as runlevel s. If I type init 1 the system reboots
> If the machine ordinary shuts down (services are stopped, etc.), then it's
> probably the thermal module causing this by a wrongly read temperature (OS
> things a critical temperature is reached and shuts down the machine).
dmesg shows a temperature of 48 degrees. So this does not seem to be the problem.
I compared the dmesg output with that of the 32-bit livecd which works. After the ACPI, the sound is initialized, perhaps this causes problems? After all, I can modprobe acpi without problems...
This follows the acpi init in the 32-bit livecd:
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
hda_codec: Unknown model for ALC262, trying auto-probe from BIOS...
I'll send you a full dmesg later.
I found this in the forum:
my notebook: Samsung E152 -Aura T5750 Eron, 4GB RAM
my OS: Kubuntu Intrepid
$ uname -r
2.6.27-9-generic
I have some trouble with acpi. Without special boot parameters, Intrepid will not boot. With acpi=off it boots. I also found that it boots with ACPI using the flag "mem=4096M" (don't understand why, but it works...). But when I use this flag, the OS recognizes only 3 GB of my 4 GB RAM:
$ free -m
total used free shared buffers cached
Mem: 3012 769 2242 0 12 342
-/+ buffers/cache: 414 2597
Swap: 7906 0 7906
Also, most Fn keys don't work when ACPI is in use, but that is a known bug...
Created attachment 265332 [details]
dmesg output with acpi_root_table=rsdt acpi_debug_level=0x1f processor.max_cstate=1
I started with the aforementioned parameters and entered runlevel s. Then I modprobed acpi, fan, processor, thermal. init 1 caused a hard reboot.

> I have some trouble with acpi. Without special boot parameters, Intrepid will
> not boot. With acpi=off it boots. I also found that it boots with ACPI using
> the flag "mem=4096M" (don't understand why, but it works...). But when I use
> this flag, the OS recognizes only 3 GB of my 4 GB RAM:
I can confirm this.
with mem=4096MB I am able to boot and ACPI works. (unplugging my power cord caused a hard reboot before, now kde recognizes this)
My machine shows only 3GB RAM too.
Windows Vista correctly shows 4GB of RAM
32-bit openSUSE livecd with mem=4G shows 3GB (acpi=on)
64-bit openSUSE kernel with mem=4G shows 3GB (acpi=on)
64-bit current kernel with mem=4G shows 3GB (acpi=on)
But anyway, having 1GB less RAM is better than not being able to disconnect the laptop :-)

This looks like a memory corruption or say memory setup problem, not an ACPI problem.
Summary:
- x86_64 is broken, i386 works fine
- booting without any acpi modules loaded into single user mode only works
with pci=noacpi. Without the boot param -> reboot
- Booted into single user mode loading a specific acpi module -> reboot
- As pci=noacpi and the processor or thermal module do not have to do with
each other much, it looks like specific ACPI mem accesses trigger this,
especially because:
- mem=4GB works, but restricts the system to 3GB, instead of 4GB installed mem
- The FADT's 64 bit addresses also have a problem. Disassembling the table
shows: "Invalid zero length subtable" messages in FACP1.dsl. But not using
them (taking the rsdt table) does change some behaviour, but does not fix the
reboot issue.
It is a hard reboot.
It looks as if ACPI pokes at specific memory addresses, which triggers the hard reboot.
From dmesg:
this already looks very suspicious:
Phoenix BIOS detected: BIOS may corrupt low RAM, working it around.
Maybe it's the PATables being wrong or wrongly setup?
Anyway, I can't fix this. Someone with more memory knowledge is needed. As this is an Intel machine, CC'ing some Intel people, maybe someone already sees something in dmesg output from comment #35
If I look at my dmesg output I see
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009e400 (usable)
BIOS-e820: 000000000009e400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved)
BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bf8a1000 (usable)
BIOS-e820: 00000000bf8a1000 - 00000000bf8a7000 (reserved)
BIOS-e820: 00000000bf8a7000 - 00000000bf9b4000 (usable)
BIOS-e820: 00000000bf9b4000 - 00000000bfa0f000 (reserved)
BIOS-e820: 00000000bfa0f000 - 00000000bfb07000 (usable)
BIOS-e820: 00000000bfb07000 - 00000000bfd0f000 (reserved)
BIOS-e820: 00000000bfd0f000 - 00000000bfd18000 (usable)
BIOS-e820: 00000000bfd18000 - 00000000bfd1f000 (reserved)
BIOS-e820: 00000000bfd1f000 - 00000000bfd65000 (usable)
BIOS-e820: 00000000bfd65000 - 00000000bfd9f000 (ACPI NVS)
BIOS-e820: 00000000bfd9f000 - 00000000bfe00000 (ACPI data)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
user-defined physical RAM map:
user: 0000000000000000 - 000000000009e400 (usable)
user: 000000000009e400 - 00000000000a0000 (reserved)
user: 00000000000d2000 - 00000000000d4000 (reserved)
user: 00000000000dc000 - 0000000000100000 (reserved)
user: 0000000000100000 - 00000000bf8a1000 (usable)
user: 00000000bf8a1000 - 00000000bf8a7000 (reserved)
user: 00000000bf8a7000 - 00000000bf9b4000 (usable)
user: 00000000bf9b4000 - 00000000bfa0f000 (reserved)
user: 00000000bfa0f000 - 00000000bfb07000 (usable)
user: 00000000bfb07000 - 00000000bfd0f000 (reserved)
user: 00000000bfd0f000 - 00000000bfd18000 (usable)
user: 00000000bfd18000 - 00000000bfd1f000 (reserved)
user: 00000000bfd1f000 - 00000000bfd65000 (usable)
user: 00000000bfd65000 - 00000000bfd9f000 (ACPI NVS)
user: 00000000bfd9f000 - 00000000bfe00000 (ACPI data)
0xbfe00000 == 3GB. This is what I have available.
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
is missing in the user-defined physical RAM map.
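The gap between the two maps can also be checked mechanically. This is a minimal sketch in POSIX shell, not something posted in this report: it assumes the dmesg output was saved to a file with one map entry per line and no timestamp prefix, and the function name `usable_mib` and the file name are hypothetical.

```sh
# usable_mib PREFIX FILE
# Sum all "(usable)" ranges of one RAM map in a saved dmesg file and print
# the total in MiB. PREFIX is "BIOS-e820" or "user" (hypothetical helper).
# Assumes lines of the form: "PREFIX: <start hex> - <end hex> (type)".
usable_mib() (
    prefix=$1
    file=$2
    total=0
    while read -r tag start dash end kind; do
        # Skip lines from the other map and all non-usable ranges.
        [ "$tag" = "$prefix:" ] && [ "$kind" = "(usable)" ] || continue
        total=$(( $total + 0x$end - 0x$start ))
    done < "$file"
    echo $(( $total / 1024 / 1024 ))
)
```

Called as `usable_mib BIOS-e820 dmesg.txt` and `usable_mib user dmesg.txt`, the difference between the two results is the amount of RAM that the user-defined map no longer exposes.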
0x140000000 - 0x100000000 = 0x40000000 = 1073741824 bytes = 1GB
This could be the missing gigabyte of RAM, right after the ACPI data. Perhaps this could be a hint to where the memory went.

*** Bug 465699 has been marked as a duplicate of this bug. ***

openSUSE 11.1 x86_64. Samsung R510 NP-R510-XA05ES. 4 GB RAM.
This bug is still here. It hit me this week when I bought a Samsung R510 laptop. The first solution I found was to deactivate ACPI at boot, but this gave me a lot of problems: hard resets when I tried to change the display brightness, clock delays and many other horror stories.
Fortunately I found the mem=4GB workaround, and at least it is stable and fast. I just lost 1 GB.
Kerneltrap lists, Ubuntu and some other lists also have reports about this bug, and the only fix is, at the moment, the mem=4GB trick. Some users have reported that a BIOS update fixed this problem, but I'm not among them because Samsung still has only the 07IL BIOS version for my laptop.
After reading here and there, it seems to be caused by a bug in the BIOS that the kernel is not able to resolve.

Thanks for the info. You might want to add references to the most important info you found. If I find some time I look at it. Thanks.

OK, these are other bug list threads where the problem is shown.
In Ubuntu Launchpad (the second link is more interesting):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/354824
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/272530
Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=467519
and in the Linux kernel bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=12106
http://bugzilla.kernel.org/show_bug.cgi?id=11658
Hope this helps

Oops, I accidentally closed this one.

Last attempt..., we should have a resolved -> documented flag for such bugs.
Quick summary:
- On a Samsung R510 the machine reboots during the boot process.
- mem=4GB helps
- mem=4GB hides 1GB and only 3 of 4 GB memory are visible
- mem=4GB makes the e820 entry:
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
disappear. Looks like accessing memory in this area causes the reboot.
Any other outcome, possibly a new BIOS exists?
I am going to close this one won't fix now as I expect there is nothing we can do and this looks nicely documented by google.
Please update the bug if you still find out more.

The problem seems to be fixed with the current BIOS release (08LI of 13.11.2009) and release 11.2. I have not checked on 11.1.
I now have the correct amount of RAM in /proc/meminfo.
I had to restore the default settings after the upgrade. Before that, the error persisted.
Can anyone confirm this on his laptop?

I've just updated the firmware of my R510 laptop to the 08LI version, on openSUSE 11.2. After restoring the default settings and after a shutdown I now have 4 GB of memory! Note that I also removed the mem=4GB kernel boot parameter I needed before. And also note that a restart is not sufficient: I had to shut down the system.
It actually seems the new BIOS release resolves the problem, at least on 11.2 |
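The effect of mem=4GB noted in the quick summary can be reproduced on the e820 ranges quoted earlier in this report. The following is only a sketch in POSIX shell. It assumes that mem= acts as a cap on the highest usable physical address (4GB = 0x100000000) rather than on the amount of RAM, which would be consistent with the user-defined map shown above. The function name `clip_mib` is hypothetical.

```sh
# clip_mib START END LIMIT
# Print how many MiB of the range [START, END) lie below LIMIT.
# All three arguments are hex addresses without the 0x prefix.
# Assumption: mem= behaves like LIMIT, cutting off everything above it.
clip_mib() (
    start=$(( 0x$1 ))
    end=$(( 0x$2 ))
    limit=$(( 0x$3 ))
    # Truncate the range at the limit.
    [ "$end" -gt "$limit" ] && end=$limit
    if [ "$end" -gt "$start" ]; then
        echo $(( ($end - $start) / 1024 / 1024 ))
    else
        # The whole range sits at or above the limit.
        echo 0
    fi
)

# The range that disappears in the user-defined map:
clip_mib 0000000100000000 0000000140000000 100000000
# The large usable range below 4GB is untouched by the cap:
clip_mib 0000000000100000 00000000bf8a1000 100000000
```

Under that assumption, the 1GB range starting at 0x100000000 is cut off entirely while the ranges below 4GB are kept, which matches the 3GB that the reporters see with the workaround.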