Bug 345866

Summary: System crashes after launching Yast under Xen kernel
Product: [openSUSE] openSUSE 10.3
Reporter: Henry Laurent <laurent.henry>
Component: Xen
Assignee: Jan Beulich <jbeulich>
Status: RESOLVED NORESPONSE
QA Contact: Jiri Srain <jsrain>
Severity: Major
Priority: P5 - None
CC: carnold, laurent.henry, marcus
Version: Final
Target Milestone: ---
Hardware: i386
OS: openSUSE 10.3
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  save_y2logs output
  xend log
  dmesg with the working kernel
  dmesg with the kernel producing the kernel
  xen boot.msg file
  boot.msg with default kernel

Description Henry Laurent 2007-12-04 13:51:47 UTC
On a fresh installation of openSUSE 10.3 (32-bit):
When booted with the default-2.6.22.13-0.3 kernel, I can use YaST normally.

When booted with the xen-2.6.22.13-0.3 kernel, launching YaST instantly crashes the whole system (it freezes, all control is lost, and the machine no longer even answers pings).

I saw this error message just before the crash (it is in French; "Erreur de segmentation" means "Segmentation fault"):
sbin/yast: line 386: 4075 Erreur de segmentation $ybindir/y2base menu ncurses $NCTHREADS


After this I ran apt-get update and apt-get dist-upgrade, and then managed to use YaST once without problems.
Now I still get the same crash every time I try to launch YaST, but without the error message.
Comment 1 Henry Laurent 2007-12-04 13:55:54 UTC
Created attachment 185816 [details]
save_y2logs output
Comment 2 Arvin Schnell 2007-12-07 10:36:46 UTC
Originally, which YaST module did you start that led to the crash? (Was
this really just the menu in ncurses?)

What version of YaST do you have installed now (best provide rpm -qa)?
Comment 3 Henry Laurent 2007-12-07 13:16:26 UTC
- The crash occurs as soon as I type "yast" at the root prompt.

I have the feeling it could be a kernel problem, such as a bad memory access.
I can't imagine how YaST alone could freeze the whole system.

- rpm -qa|grep yast

yast2-storage-lib-2.15.27-4
yast2-xml-2.15.0-55
yast2-control-center-qt-2.15.4-12
yast2-ncurses-2.15.27-16
yast2-2.15.58-12
yast2-country-2.15.20-7
yast2-sound-2.15.11-18
yast2-firewall-2.15.8-8
yast2-runlevel-2.15.3-19
yast2-x11-2.15.11-22
yast2-fingerprint-reader-2.15.2-27
yast2-kerberos-client-2.15.7-32
yast2-ldap-client-2.15.12-37
yast2-users-2.15.38-7
yast2-inetd-2.15.1-41
autoyast2-installation-2.15.17-17
autoyast2-2.15.17-17
yast2-restore-2.15.4-22
yast2-online-update-frontend-2.15.24-0.1
yast2-repair-2.15.8-0.1
yast2-backup-2.15.5-0.1
yast2-schema-2.15.0-123
yast2-trans-stats-2.15.0-32
yast2-transfer-2.14.0-107
yast2-hardware-detection-2.15.8-36
yast2-perl-bindings-2.15.3-29
yast2-qt-2.15.16-19
yast2-control-center-2.15.4-12
yast2-mouse-2.15.1-81
yast2-printer-2.15.6-4
yast2-vm-2.16.1-48
yast2-bluetooth-2.15.4-17
yast2-irda-2.15.1-94
yast2-pam-2.14.0-128
yast2-scanner-2.15.5-42
yast2-sysconfig-2.15.3-58
yast2-network-2.15.81-2
yast2-ntp-client-2.15.12-7
yast2-tv-2.15.7-23
yast2-installation-2.15.54-4
yast2-samba-client-2.15.11-33
yast2-packager-2.15.81-4
yast2-update-2.15.23-21
yast2-iscsi-client-2.15.2-39
yast2-metapackage-handler-0.7.1-9
yast2-registration-2.15.3-15
yast2-sudo-2.15.3-86
yast2-bootloader-2.15.29-2
yast2-add-on-2.15.17-4
yast2-online-update-2.15.24-0.1
yast2-core-2.15.13-0.1
yast2-profile-manager-2.15.1-0.1
yast2-theme-openSUSE-2.15.14-4
yast2-slp-2.15.0-31
yast2-pkg-bindings-2.15.51-4
yast2-ldap-2.15.1-83
yast2-apparmor-2.1-26
yast2-nfs-client-2.15.0-25
yast2-support-2.15.3-14
yast2-nis-client-2.15.3-21
yast2-security-2.15.1-23
yast2-mail-2.15.23-2
yast2-samba-server-2.15.7-57
yast2-storage-2.15.27-4
yast2-tune-2.15.7-20
yast2-trans-fr-2.15.16-2.1
Comment 4 Arvin Schnell 2007-12-07 14:36:42 UTC
"yast" just starts the menu; not even hardware probing should be involved
there.
Comment 5 Henry Laurent 2007-12-07 15:53:27 UTC
Something is definitely going really wrong: with any Xen kernel it crashes, whereas when I reboot with the default 32-bit kernel everything is fine.


Could it be a memory management problem with these kernels?
Comment 6 Charles Arnold 2008-01-07 16:33:07 UTC
Please attach the kernel logs (dmesg, etc) and xend.log so that we may better understand what is happening on your system.
Comment 7 Charles Arnold 2008-01-07 16:33:34 UTC
*** Bug 346178 has been marked as a duplicate of this bug. ***
Comment 8 Henry Laurent 2008-01-08 09:17:16 UTC
Created attachment 189700 [details]
xend log
Comment 9 Henry Laurent 2008-01-08 09:17:56 UTC
Created attachment 189701 [details]
dmesg with the working kernel
Comment 10 Henry Laurent 2008-01-08 09:21:15 UTC
Created attachment 189705 [details]
dmesg with the kernel producing the kernel
Comment 11 Henry Laurent 2008-01-08 11:06:08 UTC
I am noticing something really weird under the Xen kernels: ntp goes crazy. I am not sure it has anything to do with the actual problem, but it too occurs only with the Xen kernels on this hardware.


In /var/log/messages it shows up as:
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4696602488 shadow=1100000067915 offset=4136832
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4696519541 shadow=1100000067915 offset=4219705
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4704805144 shadow=1100000067915 offset=4228629
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4704781345 shadow=1100000067915 offset=4252501
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4706074107 shadow=1100000067915 offset=4498243
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4706189123 shadow=1100000067915 offset=4528109
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4706248675 shadow=1100000067915 offset=4552807
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4706455918 shadow=1100000067915 offset=4578997
Jan  8 10:36:28 xen1 kernel: clocksource/1: Time went backwards: delta=-4706791654 shadow=1100000067915 offset=4661999
Jan  8 10:36:55 xen1 kernel: printk: 5 messages suppressed.
Jan  8 10:36:55 xen1 kernel: Timer ISR/1: Time went backwards: delta=-26763989925 delta_cpu=4716010075 shadow=1104000078321 off=712004995 processed=1131476065650 cpu_processed=1099996065650
Jan  8 10:36:55 xen1 kernel:  0: 1131472065650
Jan  8 10:36:55 xen1 kernel:  1: 1099996065650
Jan  8 10:36:55 xen1 kernel: clocksource/1: Time went backwards: delta=-26765288604 shadow=1104000078321 offset=712224591
Jan  8 10:36:55 xen1 kernel: clocksource/1: Time went backwards: delta=-26771352533 shadow=1104000078321 offset=712269391
Jan  8 10:36:55 xen1 kernel: clocksource/1: Time went backwards: delta=-26771343265 shadow=1104000078321 offset=712278173
Jan  8 10:36:55 xen1 kernel: clocksource/1: Time went backwards: delta=-26771317593 shadow=1104000078321 offset=712303921
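
To get an overview of how often and how far the clock jumps back, the clocksource lines above can be summarized per CPU. The following is only a minimal sketch: the regular expression is derived from the sample lines quoted in this comment and may not match other kernel versions, and the function name summarize_backward_jumps is hypothetical.

```python
import re

# Pattern derived only from the clocksource lines quoted in this report;
# other kernels may format these messages differently (assumption).
PATTERN = re.compile(
    r"clocksource/(?P<cpu>\d+): Time went backwards: "
    r"delta=(?P<delta>-?\d+) shadow=(?P<shadow>\d+) offset=(?P<offset>\d+)"
)


def summarize_backward_jumps(lines):
    """Return {cpu: (message_count, most_negative_delta)} for matching lines."""
    summary = {}
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # ignore unrelated log lines
        cpu = int(match.group("cpu"))
        delta = int(match.group("delta"))
        count, worst = summary.get(cpu, (0, 0))
        summary[cpu] = (count + 1, min(worst, delta))
    return summary
```

It could be fed the contents of /var/log/messages, for example via open("/var/log/messages").read().splitlines(), before and after a kernel update to check whether the messages have disappeared.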
Comment 12 Charles Arnold 2008-01-08 16:18:01 UTC
About comment #11, this sounds like bug 279062 found and fixed in sles10sp1.  The same fix has been taken for 10.3 but is not yet available in the maintenance channel for download.
Comment 13 Henry Laurent 2008-01-08 16:28:14 UTC
I get a 'permission denied' error when I try to open that bug.

In any case, it will be difficult to apply the fix and see whether there is a link with what I am experiencing, since launching YaST crashes the system.
Comment 14 Jan Beulich 2008-02-06 08:03:43 UTC
The messages in #11 should be gone with 2.6.22.16-0.1 - please try that kernel.
Comment 15 Henry Laurent 2008-02-25 10:21:23 UTC
Update from 2.6.22.13-0.3 to 2.6.22.17-0.1 done.

I am still getting segfaults when launching YaST, and only with the Xen kernels.
Comment 16 Jan Beulich 2008-02-25 15:29:40 UTC
Can you make a statement regarding the 'time went backwards' messages with the new kernel?

As to the seg-faulting - without you providing more detail on it (e.g. messages printed by the kernel or Xen, if any) and with the understanding that you are not using the PAE kernel flavor (for which a possibly similar problem was found) I'm afraid there's not much else we can do.
Comment 17 Jan Beulich 2008-02-25 15:30:22 UTC
Oh, perhaps your list of loaded modules might also provide some hint.
Comment 19 Henry Laurent 2008-02-25 15:53:37 UTC
About the ntp issue: the time given by ntp is correct now, and I no longer find any error messages about it.

About the segfault: the problem is exactly the same as the one I mentioned in my first posts.

Under the xen and xen-pae kernels (the same happens with both), when logged in as root, every time I type the yast command my system instantly freezes and all I can do is power off manually. All I can see on the screen is the following message:

#yast
sbin/yast: line 386: 4075 Erreur de segmentation $ybindir/y2base menu ncurses $NCTHREADS
(It is in French; 'Erreur de segmentation' means segfault.)

I can't find any relevant log about the crash, and I am open to running whatever tests are needed.

output of lsmod (Linux xen1 2.6.22.17-0.1-xen #1 SMP 2008/02/10 20:01:04 UTC i686 i686 i386 GNU/Linux)
Module                  Size  Used by
af_packet              29064  0
bridge                 53528  1
netbk                  78420  0 [permanent]
netloop                10752  0
blkbk                  25504  0 [permanent]
blktap                118696  2 [permanent]
xenbus_be               8064  3 netbk,blkbk,blktap
iptable_filter          6912  0
ip_tables              16324  1 iptable_filter
ip6_tables             17476  0
x_tables               18308  2 ip_tables,ip6_tables
microcode               8072  0
firmware_class         13568  1 microcode
edd                    12996  0
apparmor               40736  0
ext3                  131848  1
jbd                    68276  1 ext3
mbcache                12292  1 ext3
loop                   21892  0
dm_mod                 56880  0
ide_cd                 40324  0
cdrom                  37148  1 ide_cd
pata_serverworks       13824  0
ata_generic            11524  0
libata                139472  2 pata_serverworks,ata_generic
thermal                18440  0
processor              27808  1 thermal
button                 12304  0
parport_pc             40764  0
serverworks            11400  0 [permanent]
generic                 8836  0 [permanent]
8250_pnp               13568  0
shpchp                 34836  0
e100                   38924  0
i2c_piix4              12300  0
8250                   31384  1 8250_pnp
mii                     9344  1 e100
pci_hotplug            33216  1 shpchp
ide_core              123972  3 ide_cd,serverworks,generic
i2c_core               27520  1 i2c_piix4
parport                37960  1 parport_pc
serial_core            24704  1 8250
serio_raw              10756  0
rtc_cmos               12448  0
rtc_core               23304  1 rtc_cmos
sworks_agp             13984  0
agpgart                37428  1 sworks_agp
rtc_lib                 7040  1 rtc_core
sg                     36908  0
reiserfs              232500  1
sd_mod                 30976  4
usbhid                 41556  0
hid                    29184  1 usbhid
ff_memless              9352  1 usbhid
aic7xxx               157732  3
scsi_transport_spi     26880  1 aic7xxx
scsi_mod              140504  5 libata,sg,sd_mod,aic7xxx,scsi_transport_spi
ohci_hcd               24068  0
usbcore               124908  3 usbhid,ohci_hcd
xenblk                 20976  0
xennet                 29960  0



The same with the working kernel (2.6.22.17-0.1-default)

Module                 Size  Used by
iptable_filter          6912  0
ip_tables              16324  1 iptable_filter
ip6_tables             17476  0
x_tables               18308  2 ip_tables,ip6_tables
microcode              15372  0
firmware_class         13568  1 microcode
apparmor               40736  0
ext3                  131848  1
jbd                    68148  1 ext3
mbcache                12292  1 ext3
loop                   21636  0
dm_mod                 56880  0
e100                   38156  0
parport_pc             40892  0
mii                     9344  1 e100
rtc_cmos               12064  0
parport                37832  1 parport_pc
button                 12560  0
sworks_agp             13344  0
shpchp                 35092  0
rtc_core               23048  1 rtc_cmos
agpgart                35764  1 sworks_agp
rtc_lib                 7040  1 rtc_core
serio_raw              10756  0
pci_hotplug            33216  1 shpchp
i2c_piix4              12556  0
i2c_core               27520  1 i2c_piix4
sr_mod                 19492  0
sg                     37036  0
cdrom                  37020  1 sr_mod
usbhid                 41300  0
hid                    29184  1 usbhid
ff_memless              9352  1 usbhid
ohci_hcd               23684  0
sd_mod                 31104  4
usbcore               124268  3 usbhid,ohci_hcd
edd                    12996  0
reiserfs              233140  1
fan                     9220  0
aic7xxx               157348  3
scsi_transport_spi     27008  1 aic7xxx
pata_serverworks       13824  0
libata                139216  1 pata_serverworks
scsi_mod              140376  6 sr_mod,sg,sd_mod,aic7xxx,scsi_transport_spi,libata
thermal                20872  0
processor              40876  1 thermal


Comment 20 Jan Beulich 2008-02-25 16:52:44 UTC
I think preventing at least sworks_agp, pata_serverworks, and the two non-Xen modules not loaded in -default at all (ata_generic and generic) from loading might be a reasonable first step. For these last two modules it'd be especially interesting to know why they get loaded in -xen, but not in -default.

And please be so kind as to re-attach /var/log/boot.msg for -default and -xen with the kernel version you just installed.
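
The comparison of the two lsmod listings can also be done mechanically. The snippet below is a rough sketch, assuming the usual lsmod text layout (a "Module  Size  Used by" header followed by one module per line); the function names loaded_modules and only_in_first are hypothetical.

```python
def loaded_modules(lsmod_output):
    """Return the set of module names found in `lsmod` text output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def only_in_first(first, second):
    """Return, sorted, the modules present in `first` but not in `second`."""
    return sorted(loaded_modules(first) - loaded_modules(second))
```

Calling only_in_first(xen_listing, default_listing) with the two outputs saved as strings would list every module loaded only under -xen, including the Xen-specific ones, so the result still has to be filtered by hand.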
Comment 22 Marcus Robst 2008-02-26 20:46:10 UTC
Hi, 

I've got exactly the same issue. I have reinstalled three times and added more physical RAM, but YaST still segfaults.

I got a little further when installing a minimal system without a desktop: I could start YaST under Xen, but it crashes when installing something. With either GNOME or KDE installed, it crashes as soon as you type "yast" as root over an ssh connection.

/sbin/yast: line 386:  4247 Segmentation fault      $ybindir/y2base menu ncurses $NCTHREADS

The server is a 32-bit Dell PowerEdge SC430 with 1.5 GB of RAM running a clean install of openSUSE 10.3. I'm happy to provide more data, install anything or possibly provide remote access; the box is currently unused.

thanks
Marcus
Comment 23 Jan Beulich 2008-02-27 08:11:45 UTC
Yes, getting remote access might help, unless we're able to duplicate this in-house (which is currently being attempted). Since I wouldn't immediately have the cycles to do debugging this way, I'll get back to that offer once I know what our lab folks say about it.

Of course, if you want to do some debugging of this meanwhile - what we would minimally need to get would be register state and backtrace from gdb at the point of the SEGV.

The other things to do (independently) would be to
- collect Xen *and* kernel messages over serial (to grab a possible kernel crash's printout), or (if that doesn't provide anything)
- check whether SysRq still works at the point of the hang, and if so, collect SysRq-p and SysRq-t output (again over serial), or (if that doesn't work)
- collect Xen's response to sending 'd' over the serial line (after switching input to Xen).
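
For the gdb part of the request above, one possible way to collect register state and a backtrace without typing into gdb interactively is to run it in batch mode. This is only a sketch: -batch, -ex and --args are standard gdb options, but the helper name gdb_batch_command and the y2base path used in the usage note are placeholders, since the report never shows what $ybindir expands to.

```python
import shlex


def gdb_batch_command(program, program_args):
    """Build a gdb command line that runs `program` and, once it stops
    (for example on SIGSEGV), prints the registers and a backtrace."""
    return [
        "gdb", "-batch",
        "-ex", "run",
        "-ex", "info registers",
        "-ex", "bt",
        "--args", program, *program_args,
    ]


def as_shell_line(command):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(command)
```

For example, as_shell_line(gdb_batch_command("/path/to/y2base", ["menu", "ncurses"])) yields a line that can be pasted at the root prompt. Since the whole machine may freeze, the output should preferably go to the serial console or a remote session rather than to a local file.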
Comment 24 Henry Laurent 2008-02-27 18:03:15 UTC
Created attachment 197497 [details]
xen boot.msg file
Comment 25 Henry Laurent 2008-02-27 18:06:52 UTC
Created attachment 197498 [details]
boot.msg with default kernel
Comment 26 Henry Laurent 2008-02-27 18:10:18 UTC
- I have blacklisted or rmmod'ed numerous modules, the ones you mentioned and a few more (xenblk and xennet), and it is still crashing. There are some I can't remove yet (generic is always reported as "busy").

- I also prevented the xend, xendomains and avahi processes from starting, without any more success.

- It took me about 20 tries, with crashes and reboots, to do all this. Three times along the way YaST launched fine (I managed to edit runlevels and search for online updates). I am not able to reproduce what makes it work sometimes; it seems to be random behavior.

- I've uploaded the two boot.msg files for the current kernel.

- About redirecting kernel messages to a console: why not, but I have no idea how to do so.


PS: regarding Marcus's message, my own server is a 32-bit Dell PowerEdge 2500.
Comment 27 Jan Beulich 2008-02-28 08:55:52 UTC
>- About console redirecting of kernel messages, why not, by i have no idea how
>to do so.

This requires collecting messages over serial, and the 'xencons=xvc' (or 'xencons=ttyS') kernel (not Xen) boot option.

Also, the other information requested in the last paragraph of comment #23 would also apply to your system; without getting an understanding on the kind of crash/hang I don't think there's much we can do.
Comment 28 Christoph Thiel 2008-04-25 14:49:21 UTC
Closing as NORESPONSE, since the requested information has been missing for more than 21 days. Please feel free to reopen and provide the requested information.