Bug 210988

Summary: mga[2x G550] Xserver hangs in int10
Product: [openSUSE] openSUSE 10.2
Reporter: Stephan Lauffer <lauffer>
Component: X.Org
Assignee: Egbert Eich <eich>
Status: RESOLVED UPSTREAM
QA Contact: E-mail List <xorg-maintainer-bugs>
Severity: Enhancement
Priority: P4 - Low
CC: andreas.schallenberg, eich, sndirsch
Version: Alpha 5
Target Milestone: ---
Hardware: i586
OS: Other
Whiteboard:
Found By: Beta-Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  Xorg.0.log with x11-org-server-6.9.0-48
  Xorg.0.log with x11-org-server-6.9.0-50.17
  output of the lspci -vv command
  xorg-x11-server-6.9.0-50.24.i586.rpm
  Xorg.0.log corresponding to xorg-x11-server rpm from #11
  tgz with a lot of 'lspci -v' results - see #29
  Xorg.99.log
  sysdata-22229

Description Stephan Lauffer 2006-10-09 11:53:25 UTC
My "quad head setup" with two G550 DH (one AGP card, one PCI) worked fine for a long time. But since the xorg-x11-server-6.9.0-50.17 update for SL-10.1 the machine crashes during the start of X11. This bug is in OSS-10.2 Alpha5, too.

I see no error logs from the kernel crash (console on ttyS) and no errors logged by the Xserver (-verbose 9).

The related part of the xorg.conf:

<<snip>>
[...]

Section "Module"
  Load         "type1"
  Load         "dbe"
  Load         "freetype"
  Load         "glx"
  Load         "v4l"
  Load         "extmod"
EndSection

Section "Device"
  BoardName    "G550"
  BusID        "PCI:1:0:0"
  Driver       "mga"
  Identifier   "Device[0]"
  Option       "hwcursor" "on"
  Option       "MGASDRAM" "off"
  Chipset       "mgag550"
  VideoRam     32768
  Screen       0
  VendorName   "Matrox"
EndSection

Section "Device"
  BoardName    "G550"
  BusID        "PCI:1:0:0"
  Driver       "mga"
  Identifier   "Device[1]"
  Option       "hwcursor" "on"
  Option       "MGASDRAM" "off"
  Chipset       "mgag550"
  VideoRam     32768
  Screen       1
  VendorName   "Matrox"
EndSection

Section "Device"
  BoardName    "G550"
  BusID        "PCI:3:0:0"
  Driver       "mga"
  Identifier   "Device[2]"
  Option       "hwcursor" "on"
  Option       "MGASDRAM" "off"
  Chipset       "mgag550"
  VideoRam     32768
  Screen       0
  VendorName   "Matrox"
EndSection

Section "Device"
  BoardName    "G550"
  BusID        "PCI:3:0:0"
  Driver       "mga"
  Identifier   "Device[2]"
  Option       "hwcursor" "on"
  Option       "MGASDRAM" "off"
  Chipset       "mgag550"
  VideoRam     32768
  Screen       1
  VendorName   "Matrox"
EndSection

[...]

<<snap>>

How can I help?
Comment 1 Stephan Lauffer 2006-10-09 12:03:51 UTC
Sorry for my cut & paste failure above... the last device indeed has the Identifier "Device[3]", not 2.
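
A copy-and-paste slip like the duplicated "Device[2]" above can be caught mechanically. The following is only a minimal sketch, not part of any X.Org tooling: the function name `duplicate_identifiers` is made up for illustration, and it assumes a conventional xorg.conf layout with one `Section`, `Identifier` or `EndSection` keyword per line.

```python
import re
from collections import Counter

# Assumed layout: one keyword per line, as in the xorg.conf excerpt above.
SECTION_RE = re.compile(r'^\s*Section\s+"([^"]+)"', re.IGNORECASE)
END_RE = re.compile(r'^\s*EndSection\b', re.IGNORECASE)
IDENT_RE = re.compile(r'^\s*Identifier\s+"([^"]+)"', re.IGNORECASE)


def duplicate_identifiers(conf_text, section="Device"):
    """Return identifiers that appear in more than one section of this type."""
    current = None          # name of the section we are currently inside
    seen = Counter()        # identifier -> number of sections using it
    for line in conf_text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            continue
        if END_RE.match(line):
            current = None
            continue
        match = IDENT_RE.match(line)
        if match and current is not None and current.lower() == section.lower():
            seen[match.group(1)] += 1
    return sorted(name for name, count in seen.items() if count > 1)
```

Feeding it the text of /etc/X11/xorg.conf (for example via `open(...).read()`) would list any Device identifier used twice; commented-out lines are ignored because they do not start with a keyword.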
Comment 2 Stefan Dirsch 2006-10-09 12:29:24 UTC
Changes in xorg-x11-server-6.9.0-50.17 since xorg-x11-server from 10.1:

* Thu Jun 29 2006 - sndirsch@suse.de
- p_ia64-console.diff:
  * fixes MCA after start of second Xserver (Bug #177011)

* Wed Jun 28 2006 - sndirsch@suse.de
- p_initialize-pci-tag.diff:
  * initialize PCI tag correctly, which is used by an IA64 specific
  patch (see Bug #147261 for details); fixes Xserver crashes with
  fglrx driver - and possibly other drivers like vesa - during
  initial startup (!), VT switch and startup of second Xserver
  (SLED10 Blocker Bugs #180535, #170991, #158806)

* Thu Jun 08 2006 - sndirsch@suse.de
- p_xnest-ignore-getimage-errors.diff:
  * ignores the X error on GetImage in Xnest (Bug #174228)

* Fri Jun 02 2006 - sndirsch@suse.de
- pc_xf86-pci.diff:
  * fixes broken BIOS reading (due to changes in recent Linux
  kernels), which is required for dual card support (Bug #171453,
  X.Org Bug #6751)
- removed no longer required patch
  "p_xlib_skip_select_substructure_redirect.diff" again (Bug #151836)

* Wed May 31 2006 - sndirsch@suse.de
- generate /usr/X11R6/lib/X11/fonts/misc/fonts.dir on s390/s390x
  during %install (Bug #178315)

* Tue May 30 2006 - sndirsch@suse.de
- fixed check for empty /usr/X11R6/lib/X11/fonts/misc/fonts.dir
  (Bug #178315)

* Mon May 29 2006 - sndirsch@suse.de
- p_xlib_skip_select_substructure_redirect.diff:
  * fool java swing apps that no WM is running (Bug #151836)

* Mon May 22 2006 - sndirsch@suse.de
- p_xlib_skip_ext_env.diff:
  * added support for disabling extensions through environment
  variables (Bug #167317)
- no longer remove NVIDIA installer in %pre of xorg-x11-server-glx,
  since it's no longer conflicting with the NVIDIA driver package
  (Bug #175683)
- make sure that /usr/X11R6/lib/X11/fonts/misc/fonts.dir is not
  empty (Bug #178315)

* Fri May 19 2006 - sndirsch@suse.de
- /etc/X11/xdm/Xsetup:
  * start compiz on gdm when GLX_EXT_texture_from_pixmap is
  available to fix horrible performance (Bug #173901)

* Thu May 11 2006 - sndirsch@suse.de
- p_pci-legacy-mmap.diff:
  * fixes legacy area mapping on IA64 (Bug #166112)
- p_xorg-fbcompose-radek2.diff:
  * fixes massive Xrender corruption (Bug #152730, X.Org Bug #6827)

* Tue May 09 2006 - sndirsch@suse.de
- %post of xorg-x11-server: "/dev/mouse" --> "/dev/input/mice" in
  /etc/X11/xorg.conf (Bug #172260)
Comment 3 Matthias Hopf 2006-10-09 12:33:27 UTC
Hm. This *could* be a broken IA64 patch I'm currently working on. However, this is just a rough idea...
Comment 4 Stefan Dirsch 2006-10-09 12:39:51 UTC
Candidates are:
- p_ia64-console.diff
- p_initialize-pci-tag.diff
- pc_xf86-pci.diff

My favorite is pc_xf86-pci.diff. Could you also attach /var/log/Xorg.0.log for
the original xorg-x11-server RPM from the CD/DVD *and* for the 6.9.0-50.17 one? I hope to see some interesting differences in the output. Thanks.
Comment 5 Stephan Lauffer 2006-10-10 10:26:15 UTC
Created attachment 101099 [details]
Xorg.0.log with x11-org-server-6.9.0-48
Comment 6 Stephan Lauffer 2006-10-10 10:28:47 UTC
Created attachment 101100 [details]
Xorg.0.log with x11-org-server-6.9.0-50.17
Comment 7 Stephan Lauffer 2006-10-10 10:29:33 UTC
Created attachment 101101 [details]
output of the lspci -vv command
Comment 8 Stephan Lauffer 2006-10-10 10:31:33 UTC
OK, I got some new information; see the logs in #5, #6 and #7.

The "crash" is more like the system becoming unusable:
  - If the machine is in init 3 and I start '/usr/X11R6/bin/Xorg -verbose 9'
    with 6.9.0-50.17, my eth device stops working and the machine gets into an
    "unusable state", BUT i/o on the hdd device is OK AND Xorg is coming up!
  - If the machine is in init 3 and I switch to init 5, X does not come up
    and the machine looks crashed (no keyboard response, no ping replies,
    blank screens).

If I only start Xorg (like the 1st example above) and keep my eyes on ping I
see something like this:
  lisa:~ # ping www.suse.de
    [...] 
  64 bytes from turing.suse.de (195.135.220.3): icmp_seq=114 ttl=54 time=24.8 ms
  ping: sendmsg: No buffer space available
  ping: sendmsg: No buffer space available
    [...]

Stopping this ping process and restarting it gives me a "Destination Host
Unreachable". But I can bring the interface down, rmmod the eth module and
reload it... followed by this log information from /var/log/warn:
  Oct 10 11:53:40 lisa kernel: PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
  Oct 10 11:53:40 lisa kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
  Oct 10 11:53:40 lisa kernel: e100: probe of 0000:02:0b.0 failed with error -11

During a normal boot and insmod I got this info for e100...:
lisa:~ # dmesg | grep e100
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
e100: eth0: e100_probe: addr 0xef800000, irq 6, MAC addr 00:02:B3:B8:20:7A
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
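
For anyone who wants to tell the failing e100 messages from the healthy ones without reading the whole log, here is a rough sketch. It only assumes the "probe of <slot> failed with error <n>" wording quoted above; the function name `failed_probes` is invented for this example.

```python
import re

# Matches e.g. "e100: probe of 0000:02:0b.0 failed with error -11".
# \s+ between words also tolerates a log line that was wrapped by a mailer.
PROBE_FAIL_RE = re.compile(
    r"(\w+):\s+probe\s+of\s+(\S+)\s+failed\s+with\s+error\s+(-?\d+)"
)


def failed_probes(log_text):
    """Return (driver, pci_slot, error_code) for every failed driver probe."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in PROBE_FAIL_RE.finditer(log_text)
    ]
```

Run over the contents of /var/log/warn, this would return an empty list after a normal boot and one entry for 0000:02:0b.0 after the failed module reload.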

The log files...
#5 Working X, see log from /var/log/Xorg.0.log in Xorg.0.log-6.9.0-48
#6 Trouble-X, see log from /var/log/Xorg.0.log in Xorg.0.log-6.9.0-50.17
#7 'lspci -vv' output in lspci-vv.txt

greetings!
Comment 9 Stefan Dirsch 2006-10-10 10:48:09 UTC
Oh well. :-(

--- Xorg.0.log-6.9.0-48 2006-10-10 12:31:56.000000000 +0200
+++ Xorg.0.log-6.9.0-50.17      2006-10-10 12:31:46.000000000 +0200
[...]
-Requesting insufficient memory window!: start: 0xf0800000 end: 0xf1efffff size 0x2000000
-(EE) Cannot find empty range to map base to
-(WW) MGA(0): Video BIOS info block not detected!
+(II) Truncating PCI BIOS Length to 36864
+(--) MGA(0): Video BIOS info block at offset 0x07CE0
[...]
-(--) MGA(0): Max pixel clock is 600 MHz
+(--) MGA(0): Max pixel clock is 1200 MHz
[...]
-(II) MGA(0): Clock range:  12.00 to 600.00 MHz
+(II) MGA(0): Clock range:  12.00 to 1200.00 MHz

Probably some bogus broken BIOS values are used now ...

I think this is related to "pc_xf86-pci.diff" patch, which needs to be verified first.
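
A comparison like the one above can be reproduced for any two Xorg logs. This is just a sketch using Python's standard difflib; the helper name `changed_lines` and its `contains` filter are assumptions made for illustration, not an existing tool.

```python
import difflib


def changed_lines(old_log, new_log, contains=None):
    """Return removed ('-') and added ('+') lines between two log texts.

    If `contains` is given (e.g. "MGA(0)"), only lines containing that
    substring are reported.
    """
    old, new = old_log.splitlines(), new_log.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # 'delete' has an empty new range, 'insert' an empty old range.
        result += ["-" + line for line in old[i1:i2]]
        result += ["+" + line for line in new[j1:j2]]
    if contains is not None:
        result = [line for line in result if contains in line]
    return result
```

Calling it with the contents of the two attached logs and `contains="MGA(0)"` should reduce the output to driver lines such as the pixel clock change.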
Comment 10 Stefan Dirsch 2006-10-10 10:57:16 UTC
I'll attach an RPM without this patch for testing ASAP.
Comment 11 Stefan Dirsch 2006-10-10 12:06:29 UTC
Created attachment 101111 [details]
xorg-x11-server-6.9.0-50.24.i586.rpm

RPM for testing.
Comment 12 Stefan Dirsch 2006-10-10 12:06:59 UTC
Please give it a try. 
Comment 13 Stephan Lauffer 2006-10-10 12:54:54 UTC
Created attachment 101117 [details]
Xorg.0.log corresponding to xorg-x11-server rpm from #11

Still having problems with the NIC (eth0 is dead), but the diff between the Xorg logs of build 48 and your 50.24 from #11 shows fewer changes... hm...
Comment 14 Stefan Dirsch 2006-10-10 13:14:49 UTC
So does it fix the initial problem?
Comment 15 Stephan Lauffer 2006-10-10 13:33:17 UTC
Yes, it fixes the crash of the whole system! Now I can start X/KDE with the 2*2 head setup. :)

The new Xorg now only crashes the ethernet controller. This problem is still open:
PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
kernel: e100: probe of 0000:02:0b.0 failed with error
Comment 16 Stefan Dirsch 2006-10-10 13:58:34 UTC
(In reply to comment #15)
> Yes, it fixes the crash of the whole system! Now I can start X/KDE with the
> 2*2 head setup. :)
Ok. Unfortunately it's not an option to simply remove this patch again. See 
Bug #171453, X.Org Bug #6751.

> The new Xorg now only crashes the ethernet controller. This problem is still
> open:
> PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
> kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
> kernel: e100: probe of 0000:02:0b.0 failed with error

I think this is a different issue.
Comment 17 Stefan Dirsch 2006-10-22 15:54:03 UTC
Egbert, any ideas what's wrong with pc_xf86-pci.diff? At least it breaks Matrox dual-card support. :-(

* Fri Jun 02 2006 - sndirsch@suse.de
- pc_xf86-pci.diff:
  * fixes broken BIOS reading (due to changes in recent Linux
  kernels), which is required for dual card support ( Bug #171453,
  X.Org  Bug #6751)
Comment 18 Stefan Dirsch 2006-10-23 11:21:14 UTC
*** Bug 184002 has been marked as a duplicate of this bug. ***
Comment 19 Stefan Dirsch 2006-11-30 00:07:26 UTC
Egbert?
Comment 20 Egbert Eich 2006-11-30 09:00:41 UTC
(In reply to comment #15)
> The new Xorg now only crashes the ethernet controller. This problem is still
> open:
> PCI: Enabling device 0000:02:0b.0 (0000 -> 0003)
> kernel: e100: 0000:02:0b.0: e100_eeprom_load: EEPROM corrupted
> kernel: e100: probe of 0000:02:0b.0 failed with error
> 
This is a boot message - right? If so it would get generated before X is started. Do you still see the ethernet lockups after you start X or do you only get this message?
If yes - please test with a bare Xserver: just run 'X' from a root console in runlvl 3.
Comment 21 Egbert Eich 2006-11-30 09:03:39 UTC
(In reply to comment #17)
> Egbert, any ideas what's wrong with pc_xf86-pci.diff? At least it breaks Matrox
> dual-card support. :-(
> 

Your assumption about the BIOS seems plausible.
For testing we could provide a patched driver which ignores the BIOS table.
Comment 22 Stephan Lauffer 2006-11-30 09:11:03 UTC
(In reply to comment #20)
Egbert: The e100 error was not a boot message; it appeared after starting the X server. When booting into runlevel 3, everything worked fine.

You can give me a patched version and I'll give it a try. But it could take a few hours until I can test it here.
Comment 23 Egbert Eich 2006-11-30 11:05:55 UTC
Stephan:
would you please run lspci -v a) before you start X, b) while X is running, and c) after you have terminated X?
Does your network come back after you take down X?
Comment 24 Egbert Eich 2006-11-30 11:32:11 UTC
The EEPROM data is loaded when the driver is initialized. Usually network drivers are initialized before X starts. I do not know of any reason why the startup of X would reinitialize the network device. 
I've run across this checksum problem on Intel NICs before. In fact I've reported a bug on this -> #57976. The problem was unrelated to any X activities.

Stephan: can you verify that the message appears also when you have booted in runlvl 3, logged into a console as root and started a bare Xserver as described in attachment #21 [details]?
Comment 25 Egbert Eich 2006-11-30 11:39:39 UTC
Err, I meant to say: comment #21.
Bug #57976 and #177440 have some information on what can be done to fix the EEPROM issue.
Comment 26 Stefan Dirsch 2006-12-19 21:40:00 UTC
Stephan, any comments on Egbert's questions?
Comment 27 Stephan Lauffer 2006-12-20 08:00:15 UTC
Sorry, not now. Some weeks ago I unplugged the 2nd card, and it has been in use in another machine since then. I'm waiting for the delivery of an ordered PNY Quadro4 440 NVS card. As soon as this card arrives, I can play with this stuff and start new tests. But I guess it will be January... is this too late (I will not forget the test)? 
Comment 28 Stefan Dirsch 2006-12-20 09:54:55 UTC
That's ok. :-)
Comment 29 Stephan Lauffer 2007-01-11 13:08:49 UTC
Ok, back on bug 210988 with a dedicated test machine (same hardware as before). You can get full root access if you like...

Here are the test results for Egbert (see #24) in short; the complete lspci logs will be sent in a few minutes.

I really wonder why the old test I did in #13 and #15 (with the test rpm xorg-x11-server-6.9.0-50.24.i586.rpm in #11 from Stefan) did NOT crash, while now it crashes the host!

I made these tests with several xorg-x11 SL-10.1 packages:
--------------------------------------------------------
1. Boot in runlevel 3
2. lspci -v > <file...>-0
3. X -verbose 1
4. lspci -v > <file...>-1
5. kill the X from 3.
6. lspci -v > <file...>-2
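
The three lspci snapshots in the steps above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (`snapshot_name`, `take_snapshot`) are invented here, the output file naming simply mirrors the <file...>-0/-1/-2 scheme, and starting and killing X between the calls is still done by hand.

```python
import subprocess
from pathlib import Path

# Stage numbers follow the steps above: 0 = before X, 1 = X running, 2 = X killed.
STAGES = {0: "before starting X", 1: "while X is running", 2: "after X was killed"}


def run_lspci():
    """Return the output of `lspci -v` (requires pciutils to be installed)."""
    return subprocess.run(
        ["lspci", "-v"], capture_output=True, text=True, check=True
    ).stdout


def snapshot_name(prefix, stage):
    """Build the file name for a stage, e.g. 'lspci-50.17-1'."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}")
    return f"{prefix}-{stage}"


def take_snapshot(prefix, stage, capture=run_lspci):
    """Write one lspci snapshot to '<prefix>-<stage>' and return that name."""
    name = snapshot_name(prefix, stage)
    Path(name).write_text(capture())
    return name
```

One would call `take_snapshot("lspci-50.17", 0)` in runlevel 3, start `X -verbose 1`, call it again with stage 1, kill X, and call it with stage 2. The `capture` parameter only exists so the helper can be tried without real hardware.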

I ran the tests above (as far as possible) with the following SL-10.1 xorg installations:
a) xorg in the installation from the first/final SL-10.1 ftp release
b) ...with the xorg-x11-driver-video update with patch level 46.15
c) ...with the xorg-x11-driver-video update with patch level 46.20
d) ...with the xorg update packages patch level 50.17
   -> Important note: This was the last working X installation!
      After starting (and killing) X the ethernet card becomes "unusable"!
e) ...with... 50.20
f) ...with... 50.24
g) ...with... Stefan's patched xorg-x11-server 50.24 from #11

In all the tests a..g the "interrupt mode" in the BIOS was "APIC". So I started some more tests with the "PIC" mode. But the results were no different from APIC mode.

Ok... wait some minutes, I'll upload a tgz with the lspci outputs. 




Comment 30 Stephan Lauffer 2007-01-11 13:21:39 UTC
Created attachment 112449 [details]
tgz with a lot of 'lspci -v' results - see #29

Filenames <-> comment:

lspci-50.17-0  <-> patch level 50.17, runlevel 3
lspci-50.17-1  <-> ditto, X has been started
lspci-50.17-2  <-> ... X has been stopped

lspci-50.20-0, lspci-50.24-0, lspci-patched-package-bug-210988-50.24-0 <-> lspci before starting X, host freezes/crashes after start

Other files:

lspci-50.24-pic-old_video-driver-46-0 <-> SL-10.1 final with no video-driver updates but the latest server.

lspci-patched-package-bug-210988-50.24-pic-old_video-driver-46-0 <-> SL-10.1 final with no video-driver updates but the patched server from Stefan (see #11)

And some other files with the "PIC" (not "APIC") mode. These files have "pic" in the filename.

Unfortunately you will not see big differences in these lspci outputs, but I collected them all so that no questions remain open in this case.
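
To check quickly which devices differ between two of these snapshots, a per-device comparison is handier than a plain diff. The sketch below assumes the usual `lspci -v` layout (one block per device, blocks separated by blank lines, slot as the first word); the function names are made up for this example.

```python
def parse_lspci(text):
    """Map each PCI slot (e.g. '02:0b.0') to the lines of its `lspci -v` block."""
    devices = {}
    for block in text.strip().split("\n\n"):
        lines = [line.rstrip() for line in block.splitlines() if line.strip()]
        if lines:
            slot = lines[0].split(None, 1)[0]   # first word of the header line
            devices[slot] = lines
    return devices


def changed_devices(before, after):
    """Return the slots whose block differs, appeared or disappeared."""
    a, b = parse_lspci(before), parse_lspci(after)
    return sorted(slot for slot in a.keys() | b.keys() if a.get(slot) != b.get(slot))
```

For example, `changed_devices(open("lspci-50.17-0").read(), open("lspci-50.17-2").read())` would list the slots that look different after X was started and stopped, if any.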

(don't forget my note in #29: I can grant you a root login on this dedicated test machine...)

Greetings, Stephan
Comment 31 Stefan Dirsch 2007-01-11 16:19:34 UTC
Egbert, could you comment on Stephan's investigations triggered by your questions? Thanks.
Comment 32 Stefan Dirsch 2007-01-30 20:35:41 UTC
Egbert?
Comment 34 Stefan Dirsch 2007-02-12 14:47:07 UTC
Date: Mon, 12 Feb 2007 15:42:22 +0100
From: Stephan Lauffer <lauffer@ph-freiburg.de>
To: sndirsch@novell.com
Subject: Re: [Bug 210988] kernel crash caused by xorg with two DH Matrox G550

> ------- Comment #32 from sndirsch@novell.com  2007-01-30 13:35 MST -------
> Egbert?

Hi there!

Well... back then I simply gave up and bought a quad-head card with an
nVidia chipset. I don't know whether you really still want to investigate
this in depth. I could also "mis"-use the test box for other things.
So if this isn't that important to you... I'll dismantle the idle box
(a P4 with 1 GB RAM, after all). I then won't be able to help much
any more... what do you think?


--
Kind regards, with best regards
Stephan Lauffer
Comment 35 Stefan Dirsch 2007-02-12 14:49:07 UTC
finally closing as WONTFIX.
Comment 36 Egbert Eich 2007-02-15 14:27:28 UTC
I will never get a chance to look at this one if it's closed.
reopening to assign to myself.
Comment 37 Egbert Eich 2007-02-15 14:28:24 UTC
Adjusting priority.
Comment 38 Stefan Dirsch 2007-02-15 14:39:08 UTC
So you have two G550 cards? One of them PCI? Otherwise I'm afraid you can't investigate this issue.
Comment 39 Egbert Eich 2007-02-15 19:09:07 UTC
No, but I haven't had a chance to look at the pile of logs attached here.
Comment 40 Egbert Eich 2007-04-30 09:21:31 UTC
Let's reduce the priority on this one for now.
Comment 41 Stefan Dirsch 2007-05-12 10:42:38 UTC
JFYI, Matthias. This is a bugreport, which is assigned to Egbert/me or with Egbert/me in CC or reported by Egbert/me.
Comment 42 Stefan Dirsch 2008-04-16 13:33:34 UTC
(In reply to comment #17 from Stefan Dirsch)
> Egbert, any ideas what's wrong with pc_xf86-pci.diff? At least it breaks Matrox
> dual-card support. :-(
> 
> * Fri Jun 02 2006 - sndirsch@suse.de
> - pc_xf86-pci.diff:
>   * fixes broken BIOS reading (due to changes in recent Linux
>   kernels), which is required for dual card support ( Bug #171453,
>   X.Org  Bug #6751)

This patch no longer exists in 10.2. I suggest testing first whether this issue still exists with openSUSE 11.0 before investigating it. Meanwhile I have
a pile of PCI G450 cards here.


Comment 43 Stefan Dirsch 2008-04-16 15:00:43 UTC
There's no longer a kernel crash, but X hangs here:

(--) MGA(0): Chipset: "mgag400" (G450)
(==) MGA(0): Depth 24, (==) framebuffer bpp 32
(==) MGA(0): RGB weight 888
(II) Loading sub module "int10"
(II) LoadModule: "int10"
(II) Loading /usr/lib64/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
        compiled for 1.4.0.90, module version = 1.0.0
        ABI class: X.Org Video Driver, version 2.0
(II) MGA(0): Initializing int10

This is on x86_64. I'll attach the truncated logfile.
Comment 44 Stefan Dirsch 2008-04-16 15:03:33 UTC
Created attachment 208351 [details]
Xorg.99.log
Comment 45 Stefan Dirsch 2008-04-16 15:04:04 UTC
Created attachment 208352 [details]
sysdata-22229

xorg.conf
Comment 46 Stefan Dirsch 2008-04-16 16:24:35 UTC
Marcus is currently working on a similar bug --> Bug #380298.
Comment 47 Stefan Dirsch 2008-04-19 21:23:49 UTC
This one sounds well suited for Luc

1. Old hardware
2. Exotic setup (Multicard)

:-)
Comment 52 Stefan Dirsch 2008-05-08 09:32:52 UTC
Implementing enhancement Bug #381644 would resolve this issue as well.
Comment 53 Stefan Dirsch 2008-07-19 12:38:01 UTC
(In reply to comment #52 from Stefan Dirsch)
> Implementing enhancement Bug #381644 would resolve this issue as well.

done.

Comment 54 Stefan Dirsch 2009-05-21 11:48:44 UTC
At least there is some hope to get this feature again.

http://lists.x.org/archives/xorg-devel/2009-May/000828.html
http://lists.x.org/archives/xorg-devel/2009-May/000928.html
Comment 55 Stefan Dirsch 2010-08-14 10:16:54 UTC
Needs to be addressed upstream.