|
Bugzilla – Full Text Bug Listing |
| Summary: | mga[2x G550] Xserver hangs in int10 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.2 | Reporter: | Stephan Lauffer <lauffer> |
| Component: | X.Org | Assignee: | Egbert Eich <eich> |
| Status: | RESOLVED UPSTREAM | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Enhancement | ||
| Priority: | P4 - Low | CC: | andreas.schallenberg, eich, sndirsch |
| Version: | Alpha 5 | ||
| Target Milestone: | --- | ||
| Hardware: | i586 | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | Beta-Customer | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
Xorg.0.log with x11-org-server-6.9.0-48
Xorg.0.log with x11-org-server-6.9.0-50.17
output of the lspci -vv command
xorg-x11-server-6.9.0-50.24.i586.rpm
Xorg.0.log corresponding to xorg-x11-server rpm from #11
tgz with a lot of 'lspci -v' results - see #29
Xorg.99.log
sysdata-22229 |
||
|
Description
Stephan Lauffer
2006-10-09 11:53:25 UTC
sorry for my cut&paste failure above... indeed the last device has the Identifier "Device[3]" and not 2.

Changes in xorg-x11-server-6.9.0-50.17 since xorg-x11-server from 10.1:

* Thu Jun 29 2006 - sndirsch@suse.de
- p_ia64-console.diff:
  * fixes MCA after start of second Xserver (Bug #177011)
* Wed Jun 28 2006 - sndirsch@suse.de
- p_initialize-pci-tag.diff:
  * initialize PCI tag correctly, which is used by an IA64 specific patch (see Bug #147261 for details); fixes Xserver crashes with fglrx driver - and possibly other drivers like vesa - during initial startup (!), VT switch and startup of second Xserver (SLED10 Blocker Bugs #180535, #170991, #158806)
* Thu Jun 08 2006 - sndirsch@suse.de
- p_xnest-ignore-getimage-errors.diff:
  * ignores the X error on GetImage in Xnest (Bug #174228)
* Fri Jun 02 2006 - sndirsch@suse.de
- pc_xf86-pci.diff:
  * fixes broken BIOS reading (due to changes in recent Linux kernels), which is required for dual card support (Bug #171453, X.Org Bug #6751)
- removed no longer required patch "p_xlib_skip_select_substructure_redirect.diff" again (Bug #151836)
* Wed May 31 2006 - sndirsch@suse.de
- generate /usr/X11R6/lib/X11/fonts/misc/fonts.dir on s390/s390x during %install (Bug #178315)
* Tue May 30 2006 - sndirsch@suse.de
- fixed check for empty /usr/X11R6/lib/X11/fonts/misc/fonts.dir (Bug #178315)
* Mon May 29 2006 - sndirsch@suse.de
- p_xlib_skip_select_substructure_redirect.diff:
  * fool java swing apps that no WM is running (Bug #151836)
* Mon May 22 2006 - sndirsch@suse.de
- p_xlib_skip_ext_env.diff:
  * added support for disabling extensions through environment variables (Bug #167317)
- no longer remove NVIDIA installer in %pre of xorg-x11-server-glx, since it's no longer conflicting with the NVIDIA driver package (Bug #175683)
- make sure that /usr/X11R6/lib/X11/fonts/misc/fonts.dir is not empty (Bug #178315)
* Fri May 19 2006 - sndirsch@suse.de
- /etc/X11/xdm/Xsetup:
  * start compiz on gdm when GLX_EXT_texture_from_pixmap is available to fix horrible performance (Bug #173901)
* Thu May 11 2006 - sndirsch@suse.de
- p_pci-legacy-mmap.diff:
  * fixes legacy area mapping on IA64 (Bug #166112)
- p_xorg-fbcompose-radek2.diff:
  * fixes massive Xrender corruption (Bug #152730, X.Org Bug #6827)
* Tue May 09 2006 - sndirsch@suse.de
- %post of xorg-x11-server: "/dev/mouse" --> "/dev/input/mice" in /etc/X11/xorg.conf (Bug #172260)

Hm. This *could* be a broken IA64 patch I'm currently working on. However, this is just a rough idea... Candidates are:
- p_ia64-console.diff
- p_initialize-pci-tag.diff
- pc_xf86-pci.diff
My favorite is pc_xf86-pci.diff. Could you also attach /var/log/Xorg.0.log for the original xorg-x11-server RPM from the CD/DVD *and* for the 6.9.0-50.17 one? I hope to see some interesting differences in the output. Thanks.

Created attachment 101099 [details]
Xorg.0.log with x11-org-server-6.9.0-48
Created attachment 101100 [details]
Xorg.0.log with x11-org-server-6.9.0-50.17
Created attachment 101101 [details]
output of the lspci -vv command
Ok, I got some new information - for the logs see #5, #6 and #7.
The "crash" is more like getting an unusable system:
- If the machine is in init 3 and I start '/usr/X11R6/bin/Xorg -verbose 9'
with 6.9.0-50.17, my eth device stops working - the machine gets into an
"unusable state", BUT i/o on the hdd device is ok AND Xorg is coming up!
- If the machine is in init 3 and I switch to init 5... I see no X
coming up and the machine looks like it has crashed (no keyboard response, no ping
replies..., blank screens).
If I only start Xorg (like in the 1st example above) and keep my eyes on ping, I
see something like this:
lisa:~ # ping www.suse.de
[...]
64 bytes from turing.suse.de (195.135.220.3): icmp_seq=114 ttl=54 time=24.8 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
[...]
Breaking this ping process and restarting it gives me a "Destination Host
Unreachable". But I can bring the interface down, rmmod the eth module and
reload it... followed by this log information from /var/log/warn:
Oct 10 11:53:40 lisa kernel: PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
Oct 10 11:53:40 lisa kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
Oct 10 11:53:40 lisa kernel: e100: probe of 0000:02:0b.0 failed with error -11
During a normal boot and insmod I got this info for e100...:
lisa:~ # dmesg | grep e100
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
e100: eth0: e100_probe: addr 0xef800000, irq 6, MAC addr 00:02:B3:B8:20:7A
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
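A probe failure like the one quoted from /var/log/warn above can be picked out of a log mechanically. The following is only a minimal sketch: the function name `find_probe_failures` and the regular expression are made up for illustration and assume the "probe of <PCI address> failed with error <n>" wording seen in this report.

```python
import re

# Hypothetical pattern, modelled on the kernel message quoted in this report:
#   "e100: probe of 0000:02:0b.0 failed with error -11"
PROBE_FAILED = re.compile(
    r"(?P<driver>\w+): probe of "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error (?P<errno>-?\d+)"
)


def find_probe_failures(log_text):
    """Return (driver, pci_address, error_code) for every probe failure."""
    # Collapse all whitespace first so that messages wrapped across two
    # lines (as in the pasted log) still match.
    flat = " ".join(log_text.split())
    return [
        (m.group("driver"), m.group("pci"), int(m.group("errno")))
        for m in PROBE_FAILED.finditer(flat)
    ]
```

Running it over a saved copy of the log before and after starting X would show whether the failure only appears once the X server has touched the PCI bus.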
The log files...
#5 Working X, see log from /var/log/Xorg.0.log in Xorg.0.log-6.9.0-48
#6 Trouble-X, see log from /var/log/Xorg.0.log in Xorg.0.log-6.9.0-50.17
#7 'lspci -vv' output in lspci-vv.txt
greetings!
Oh well. :-(

--- Xorg.0.log-6.9.0-48 2006-10-10 12:31:56.000000000 +0200
+++ Xorg.0.log-6.9.0-50.17 2006-10-10 12:31:46.000000000 +0200
[...]
-Requesting insufficient memory window!: start: 0xf0800000 end: 0xf1efffff size 0x2000000
-(EE) Cannot find empty range to map base to
-(WW) MGA(0): Video BIOS info block not detected!
+(II) Truncating PCI BIOS Length to 36864
+(--) MGA(0): Video BIOS info block at offset 0x07CE0
[...]
-(--) MGA(0): Max pixel clock is 600 MHz
+(--) MGA(0): Max pixel clock is 1200 MHz
[...]
-(II) MGA(0): Clock range: 12.00 to 600.00 MHz
+(II) MGA(0): Clock range: 12.00 to 1200.00 MHz

Probably some bogus broken BIOS values are used now ... I think this is related to the "pc_xf86-pci.diff" patch, which needs to be verified first. I'll attach an RPM without this patch for testing ASAP.

Created attachment 101111 [details]
xorg-x11-server-6.9.0-50.24.i586.rpm
RPM for testing.
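For anyone repeating the log comparison between server builds, here is a rough sketch of how the driver-specific lines of two Xorg logs could be diffed. The helper names `driver_lines` and `diff_driver_lines` are hypothetical, and the default tag "MGA(0)" is simply the one that appears in the logs of this report.

```python
import difflib


def driver_lines(log_text, tag="MGA(0)"):
    """Keep only the log lines that mention the given driver tag."""
    return [line.rstrip() for line in log_text.splitlines() if tag in line]


def diff_driver_lines(old_log, new_log, tag="MGA(0)"):
    """Return '-'/'+' prefixed lines that differ between the two logs."""
    old = driver_lines(old_log, tag)
    new = driver_lines(new_log, tag)
    diff = list(difflib.unified_diff(old, new, "old", "new", lineterm="", n=0))
    # The first two entries are the '---'/'+++' file headers; hunk headers
    # start with '@@'. Drop both so only the changed lines remain.
    return [line for line in diff[2:] if not line.startswith("@@")]
```

Filtering on the driver tag first keeps unrelated noise (module load order, timestamps in other sections) out of the comparison, at the cost of hiding changes in lines that do not carry the tag.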
Please give it a try.
Created attachment 101117 [details]
Xorg.0.log corresponding to xorg-x11-server rpm from #11
I'm still having problems with the NIC (eth0 is dead), but the diff between the Xorg logs of build 48 and your 50.24 from #11 shows fewer changes... hm...
So does it fix the initial problem?

Yes, it fixes the crash of the whole system! Now I can start X/KDE with the 2*2 head setup. :) The new Xorg now only crashes the ethernet controller. This problem is still open:
PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
kernel: e100: probe of 0000:02:0b.0 failed with error

(In reply to comment #15)
> Yes, it fixes the crash of the whole system! Now I can start X/KDE with the
> 2*2 head setup. :)
Ok. Unfortunately it's not an option to simply remove this patch again. See Bug #171453, X.Org Bug #6751.
> The new Xorg now only crashes the ethernet controller. This problem is still
> open:
> PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
> kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
> kernel: e100: probe of 0000:02:0b.0 failed with error
I think this is a different issue.

Egbert, any ideas what's wrong with pc_xf86-pci.diff? At least it breaks Matrox Dualhcard support. :-(
* Fri Jun 02 2006 - sndirsch@suse.de
- pc_xf86-pci.diff:
  * fixes broken BIOS reading (due to changes in recent Linux kernels), which is required for dual card support (Bug #171453, X.Org Bug #6751)

*** Bug 184002 has been marked as a duplicate of this bug. ***

Egbert?

(In reply to comment #15)
> The new Xorg now only crashes the ethernet controller. This problem is still
> open:
> PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
> kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
> kernel: e100: probe of 0000:02:0b.0 failed with error
This is a boot message - right? If so it would get generated before X is started. Do you still see the ethernet lockups after you start X or do you only get this message? If yes - please test with a bare Xserver: just run 'X' from a root console in runlvl 3.

(In reply to comment #17)
> Egbert, any ideas what's wrong with pc_xf86-pci.diff. At least it breaks Matrox
> Dualhcard support. :-(
Your assumption about the BIOS seems plausible. For testing we could provide a patched driver which ignores the BIOS table.

(In reply to comment #20)
Egbert: The e100 error was not a boot message, it was caused after starting the X server. Booting in runlevel 3 and all worked fine. You can give me a patched version and I'll give it a try. But it could take some hours until I can test it here.

Stephan: would you please run lspci -v
a. before you start X,
b. while X is running,
c. after you have terminated X.
Does your network come back after you take down X? The EEPROM data is loaded when the driver is initialized. Usually network drivers are initialized before X starts. I do not know of any reason why the startup of X would reinitialize the network device.
I've run across this checksum problem on Intel NICs before. In fact I've reported a bug on this -> #57976. The problem was unrelated to any X activities.
Stephan: can you verify that the message also appears when you have booted in runlvl 3, logged into a console as root and started a bare Xserver as described in attachment #21 [details]?
Err, I meant to say: comment #21.

Bug #57976 and #177440 have some information on what can be done to fix the EEPROM issue.

Stephan, any comments on Egbert's questions?

Sorry, not now. Some weeks ago I pulled out the 2nd card, and it has been in use in another machine since then. I'm waiting for the delivery of an ordered PNY Quadro4 440 NVS card. As soon as this card arrives here, I can play with this stuff and start new tests. But I guess it will be January... is this too late (I will not forget the test)?

That's ok. :-)

Ok, back on bug 210988 with a dedicated test machine (same hardware as before). You can get full root access if you like... Here are the test results for Egbert (see #24) in short; the complete lspci logs will be sent in some minutes. I really wonder why the old test I did in #13 and #15 (with the test rpm xorg-x11-server-6.9.0-50.24.i586.rpm in #11 from Stefan) did NOT crash while now it crashes the host!

I made these tests with several xorg-x11 SL-10.1 packages:
--------------------------------------------------------
1. Boot in runlevel 3
2. lspci -v > <file...>-0
3. X -verbose 1
4. lspci -v > <file...>-1
5. kill the X from 3.
6. lspci -v > <file...>-2

I made the tests above (as far as possible) with the following SL-10.1 xorg installations:
a) xorg in the installation from the first/final SL-10.1 ftp release
b) ...with the xorg-x11-driver-video update with patch level 46.15
c) ...with the xorg-x11-driver-video update with patch level 46.20
d) ...with the xorg update packages patch level 50.17
   -> Important note: This was the last working X installation! After starting (and killing) X the ethernet card becomes "unusable"!
e) ...with... 50.20
f) ...with... 50.24
g) ...with... Stefan's patched xorg-x11-server 50.24 from #11

In all the tests a..g the "interrupt mode" in the BIOS was "APIC". So I started some more tests with the "PIC" mode. But the results there were no different from the APIC mode.

Ok... wait some minutes, I'll upload a tgz with the lspci outputs.
Created attachment 112449 [details]
tgz with a lot of 'lspci -v' results - see #29
Filenames <-> comment:
lspci-50.17-0 <-> patch level 50.17, runlevel 3
lspci-50.17-1 <-> ditto, X has been started
lspci-50.17-2 <-> ... X has been stopped
lspci-50.20-0, lspci-50.24-0, lspci-patched-package-bug-210988-50.24-0 <-> lspci before starting X, host freezes/crashes after start
Other files:
lspci-50.24-pic-old_video-driver-46-0 <-> SL-10.1 final with no video-driver updates but the latest server.
lspci-patched-package-bug-210988-50.24-pic-old_video-driver-46-0 <-> SL-10.1 final with no video-driver updates but the patched server from Stefan (see #11)
And some other files with the "PIC" (not "APIC") mode. These files have "pic" in the filename.
Unfortunately you will not see big differences in all these lspci outputs, but I made them all so that no questions remain open in this case.
(don't forget my note in #29: I can grant you a root login on this dedicated test machine...)
Greetings, Stephan
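Comparing the before/during/after `lspci -v` snapshots by eye is tedious, so here is a minimal sketch of how two snapshots could be compared per device. It assumes the usual `lspci -v` layout (one block per device, blocks separated by blank lines, the slot address as the first token of each block); the function names `parse_lspci` and `changed_devices` are made up for this example.

```python
def parse_lspci(text):
    """Split `lspci -v` output into {slot: [lines]} blocks.

    Assumes devices are separated by blank lines and each block starts
    with the slot address (for example "02:0b.0").
    """
    devices = {}
    for block in text.strip().split("\n\n"):
        lines = [line.rstrip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        slot = lines[0].split()[0]
        devices[slot] = lines
    return devices


def changed_devices(before_text, after_text):
    """Return the sorted slots whose block differs between two snapshots."""
    before = parse_lspci(before_text)
    after = parse_lspci(after_text)
    return sorted(
        slot
        for slot in before.keys() | after.keys()
        if before.get(slot) != after.get(slot)
    )
```

Applied to a pair such as lspci-50.17-0 and lspci-50.17-1, this would list only the slots whose registers or flags changed once X was started, which makes a disturbed NIC configuration easier to spot.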
Egbert, could you comment on Stephan's investigations triggered by your questions? Thanks.

Egbert?

Date: Mon, 12 Feb 2007 15:42:22 +0100
From: Stephan Lauffer <lauffer@ph-freiburg.de>
To: sndirsch@novell.com
Subject: Re: [Bug 210988] kernel crash caused by xorg with two DH Matrox G550

> ------- Comment #32 from sndirsch@novell.com 2007-01-30 13:35 MST -------
> Egbert?

Hi there! Well... back then I simply gave up and bought a quad card with an nVidia chipset. I don't know whether you really still want to investigate this in depth. I could also "mis"use the test box for other things. So if this isn't that important to you... then I'll dismantle the idle box (a P4 with 1 GB RAM, after all). I then won't be able to help much any more... what do you think?
--
Best regards,
Stephan Lauffer

finally closing as WONTFIX.

I will never get a chance to look at this one if it's closed. reopening to assign to myself.

Adjusting priority.

So you have two G550 cards? One of them PCI? Otherwise I'm afraid you can't investigate this issue.

No, but I haven't had a chance to look at the pile of logs attached here.

Let's reduce the priority on this one for now.

JFYI, Matthias. This is a bugreport, which is assigned to Egbert/me or with Egbert/me in CC or reported by Egbert/me.

(In reply to comment #17 from Stefan Dirsch)
> Egbert, any ideas what's wrong with pc_xf86-pci.diff. At least it breaks Matrox
> Dualhcard support. :-(
>
> * Fri Jun 02 2006 - sndirsch@suse.de
> - pc_xf86-pci.diff:
>   * fixes broken BIOS reading (due to changes in recent Linux
>     kernels), which is required for dual card support (Bug #171453,
>     X.Org Bug #6751)
This patch no longer exists in 10.2. I suggest testing first whether this issue still exists with openSUSE 11.0 before investigating it.

Meanwhile I have a pile of PCI G450 cards here. There's no kernel crash happening any more, but X hangs here:
(--) MGA(0): Chipset: "mgag400" (G450)
(==) MGA(0): Depth 24, (==) framebuffer bpp 32
(==) MGA(0): RGB weight 888
(II) Loading sub module "int10"
(II) LoadModule: "int10"
(II) Loading /usr/lib64/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
compiled for 1.4.0.90, module version = 1.0.0
ABI class: X.Org Video Driver, version 2.0
(II) MGA(0): Initializing int10
This is on x86_64. I'll attach the truncated logfile.
Created attachment 208351 [details]
Xorg.99.log
Created attachment 208352 [details]
sysdata-22229
xorg.conf
Marcus is currently working on a similar bug --> Bug #380298.

This one sounds well suited for Luc
1. Old hardware
2. Exotic setup (Multicard)
:-)

Implementing enhancement Bug #381644 would resolve this issue as well.

(In reply to comment #52 from Stefan Dirsch)
> Implementing enhancement Bug #381644 would resolve this issue as well.
done.

At least there is some hope to get this feature again.
http://lists.x.org/archives/xorg-devel/2009-May/000828.html
http://lists.x.org/archives/xorg-devel/2009-May/000928.html

Needs to be addressed upstream. |