Bug 861488

Summary: systemd crash on resume
Product: [openSUSE] openSUSE 13.1 Reporter: Michael Meeks <mmeeks>
Component: KernelAssignee: E-mail List <kernel-maintainers>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Critical    
Priority: P2 - High CC: crrodriguez, systemd-maintainers
Version: RC 1   
Target Milestone: ---   
Hardware: i686   
OS: openSUSE 13.1   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: syslog fragment
patch
hwinfo.txt

Description Michael Meeks 2014-01-31 10:21:03 UTC
I suspended with powersave -u, on resume I came back to an X server and no input devices =)

Rebooting showed:

2014-01-31T10:01:58.302527+00:00 linux-tc27 systemd[1]: Caught <SEGV>, dumped core as pid 3824.
2014-01-31T10:01:58.303013+00:00 linux-tc27 systemd[1]: Freezing execution.

$ sudo systemd-coredumpctl list
No coredumps found

file /tmp/systemd-crash ...

$ rpm -q systemd
systemd-208-15.1.i586
$ rpm -q --changelog systemd | head
* Mon Dec 16 2013 fcrozat@suse.com
- Update apply-ACL-for-nvidia-device-nodes.patch with latest fixes
  for Nvidia cards (bnc#808319).

* Thu Dec 12 2013 fcrozat@suse.com
- Update insserv-generator.patch: fix crash in insserv generator
  (bnc#854314).

I attach the relevant chunk of the syslog. Ignore the USB message - it is a normal part of an ongoing kernel developer plot to kill my SSDs in an untimely way ;-) and has been long filed.
Comment 1 Michael Meeks 2014-01-31 10:21:24 UTC
Created attachment 576660 [details]
syslog fragment
Comment 2 Dr. Werner Fink 2014-02-11 08:08:45 UTC
Without core dump I can do exacvtly nothing
Comment 3 Michael Meeks 2014-02-11 13:35:02 UTC
Where would I expect to find the core-dump ? the log says it was dumped; however I see nothing in the file-system in /var/lib/systemd/coredump/.

Thanks.
Comment 4 Frederic Crozat 2014-02-11 13:40:30 UTC
and in / ?
Comment 5 Michael Meeks 2014-02-11 14:43:33 UTC
Ah ! there is one there, it subsequently crashed again on resume hopefully with similar symptoms; I guess this is the guy:

(gdb) bt
#0  0xb7799424 in __kernel_vsyscall ()
#1  0xb7677276 in raise (sig=sig@entry=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#2  0x08052a88 in crash (sig=11) at src/core/main.c:143
#3  <signal handler called>
#4  __strnlen_sse2 () at ../sysdeps/i386/i686/multiarch/strlen-sse2.S:125
#5  0xb74fb7c6 in _IO_vfprintf_internal (s=s@entry=0xbfeeee20, format=<optimized out>, 
    format@entry=0x8117dec "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", 
    ap=0xbfeeef78 "V\216\021\b\326|\021\b\001", ap@entry=0xbfeeef64 "\006") at vfprintf.c:1629
#6  0xb75b4580 in ___vsnprintf_chk (s=s@entry=0xbfeef2ec "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=", maxlen=<optimized out>, maxlen@entry=2048, 
    flags=flags@entry=1, slen=slen@entry=4294967295, 
    format=format@entry=0x8117dec "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", 
    args=args@entry=0xbfeeef64 "\006") at vsnprintf_chk.c:63
#7  0xb75b44a7 in ___snprintf_chk (s=s@entry=0xbfeef2ec "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=", maxlen=maxlen@entry=2048, 
    flags=flags@entry=1, slen=slen@entry=4294967295, 
    format=format@entry=0x8117dec "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n") at snprintf_chk.c:34
#8  0x080b8235 in snprintf (__fmt=0x8117dec "PRIORITY=%i\nSYSLOG_FACILITY=%i\n%s%.*s%s%s%.*i%s%s%.*s%s%s%.*s%sSYSLOG_IDENTIFIER=%s\n", __n=2048, 
    __s=0xbfeef2ec "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=") at /usr/include/bits/stdio2.h:64
#9  log_do_header (header=header@entry=0xbfeef2ec "PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=", level=level@entry=30, 
    file=file@entry=0x80e5c6e "src/core/manager.c", line=line@entry=1736, func=func@entry=0x80e7556 <__func__.13183> "process_event", 
    object_name=object_name@entry=0x0, object=object@entry=0x0, size=2048) at src/shared/log.c:445
#10 0x080b99d9 in log_struct_internal (level=30, level@entry=6, file=file@entry=0x80e5c6e "src/core/manager.c", line=line@entry=1736, 
    func=func@entry=0x80e7556 <__func__.13183> "process_event", 
    format=format@entry=0x80e6bf0 "MESSAGE_ID=%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x") at src/shared/log.c:750
#11 0x0805a66d in process_event (ev=0xbfeefba8, m=<optimized out>) at src/core/manager.c:1733
#12 manager_loop (m=0x8559658) at src/core/manager.c:1858
#13 0x080505e3 in main (argc=2, argv=0xbfef00c4) at src/core/main.c:1652

The bug is here:

441	static int log_do_header(char *header, size_t size,
442	                         int level,
443	                         const char *file, int line, const char *func,
444	                         const char *object_name, const char *object) {
445	        snprintf(header, size,
446	                 "PRIORITY=%i\n"
447	                 "SYSLOG_FACILITY=%i\n"
448	                 "%s%.*s%s"
449	                 "%s%.*i%s"
(gdb) 
450	                 "%s%.*s%s"
451	                 "%s%.*s%s"
452	                 "SYSLOG_IDENTIFIER=%s\n",
453	                 LOG_PRI(level),
454	                 LOG_FAC(level),
455	                 file ? "CODE_FILE=" : "",
456	                 file ? LINE_MAX : 0, file, /* %.0s means no output */
457	                 file ? "\n" : "",
458	                 line ? "CODE_LINE=" : "",
459	                 line ? 1 : 0, line, /* %.0d means no output too, special case for 0 */
(gdb) 
460	                 line ? "\n" : "",
461	                 func ? "CODE_FUNCTION=" : "",
462	                 func ? LINE_MAX : 0, func,
463	                 func ? "\n" : "",
464	                 object ? object_name : "",
465	                 object ? LINE_MAX : 0, object, /* %.0s means no output */
466	                 object ? "\n" : "",
467	                 program_invocation_short_name);
468	        header[size - 1] = '\0';
469	        return 0;
(gdb) p file
$1 = 0x80e5c6e "src/core/manager.c"
(gdb) p func
$2 = 0x80e7556 <__func__.13183> "process_event"
(gdb) p object
$3 = 0x0
(gdb) p object_name
$4 = 0x0

Patch attached ... will submit to the mailinglist ...
Comment 6 Michael Meeks 2014-02-11 14:44:12 UTC
Created attachment 578072 [details]
patch
Comment 7 Dr. Werner Fink 2014-02-11 16:12:23 UTC
Thanks a lot, just added.
Comment 8 Michael Meeks 2014-02-11 16:38:32 UTC
The suggestion on-list is that the patch is pointless =)

https://bugzilla.redhat.com/show_bug.cgi?id=1010603

is more likely the cause .... Sorry pardon ! perhaps we shipped that already ?
Comment 9 Dr. Werner Fink 2014-02-11 16:42:48 UTC
Does this mean this is a i686 thinkpad?
Comment 10 Dr. Werner Fink 2014-02-11 16:49:51 UTC
*** Bug 862678 has been marked as a duplicate of this bug. ***
Comment 11 Michael Meeks 2014-02-11 16:50:02 UTC
Created attachment 578112 [details]
hwinfo.txt

Sure an oldish Thinkpad W500 with 32bit OS on it (to save space); doesn't crash every time I suspend/resume either - it's intermittent - say one in ten ... [ and I've no bandwidth to run tests either - sorry ].
Comment 12 Dr. Werner Fink 2014-02-11 16:59:34 UTC
OK this is kernel stuff:

https://bugzilla.kernel.org/show_bug.cgi?id=61781
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c511851de162e8ec03d62e7d7feecbdf590d881d

and it seems not to be part of openSUSE:13.1:Update/kernel-source.2530
Comment 13 Cristian Rodríguez 2014-02-11 17:01:32 UTC
This is a known kernel bug .. see also #851044 for another symptom.
Comment 14 Dr. Werner Fink 2014-02-11 17:09:29 UTC
Next time the i686 should be in Platform.

*** This bug has been marked as a duplicate of bug 851044 ***