Bug 911358

Summary: ogg123 ends with SIGSEGV at 0x56F71BB: __lll_unlock_elision (in /lib64/libpthread-2.19.so)
Product: [openSUSE] openSUSE Distribution
Component: Other
Status: RESOLVED FIXED
Severity: Major
Priority: P5 - None
Version: 13.2
Target Milestone: ---
Hardware: x86-64
OS: openSUSE 13.2
Reporter: Ulrich Windl <Ulrich.Windl>
Assignee: Takashi Iwai <tiwai>
QA Contact: E-mail List <qa-bugs>
CC: Ulrich.Windl, whdu
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Bug Depends on: 911588
Bug Blocks:

Description Ulrich Windl 2014-12-29 22:16:55 UTC
When playing Ogg Vorbis files (that played fine in openSUSE 12.3), ogg123 plays them correctly (AFAIK), but when done, the program dies with SIGSEGV, which is not expected. Earlier I found out that when you press ^C (to skip to the next file) with multiple files given on the command line, ogg123 also gets a SIGSEGV.
The problem also occurs with files freshly encoded with oggenc.
Running ogg123 under valgrind I get:
==4584== 40,17 [00:00,00] of 01:40,17  (163,3 kbps)  Output Buffer   0,0% (EOS) 
==4584== Process terminating with default action of signal 11 (SIGSEGV)
==4584==  General Protection Fault
==4584==    at 0x56F71BB: __lll_unlock_elision (in /lib64/libpthread-2.19.so)
==4584==    by 0x408CD3: ??? (in /usr/bin/ogg123)
==4584==    by 0x404099: ??? (in /usr/bin/ogg123)
==4584==    by 0x6284B04: (below main) (in /lib64/libc-2.19.so)
==4584== 
[...]
vorbis-tools-1.4.0-17.1.3.x86_64
Comment 1 Weihua Du 2014-12-30 02:55:22 UTC
Takashi, would you please take a look? Thanks!
Comment 2 Takashi Iwai 2014-12-30 09:28:43 UTC
This happens only after suspend/resume, right?  If so, it's a dup of bug 903874.

The latest upstream kernel should already have the fix.  A temporary workaround would be to uninstall/downgrade the CPU microcode in the ucode-intel package and recreate the initrd.
Comment 3 Ulrich Windl 2014-12-31 17:04:07 UTC
(In reply to Takashi Iwai from comment #2)
> This happens only after suspend/resume, right?
Did you try to reproduce? Usually I don't suspend my computer during playback of music ;-) No, I did not suspend/resume the computer.
Apart from that, I think SIGILL (invalid instruction) is quite different from SIGSEGV (bad memory access).
For what it's worth: My CPU is an "Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz" with microcode 0x19 (stepping 3). A quick look at bug 903874 showed that the microcode in use there is 0x1c (much newer than what I have).
If the only common thing in both bugs is "__lll_lock_elision()", that code could just be broken (assuming Intel CPUs are not).
So if you can: Play any Ogg/Vorbis file on an Intel CPU in an x86_64 environment. I can reproduce it 100%, and when you specify multiple files on the command line, pressing ^C does not continue with the next file but aborts the command. Testing this is really easy!
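The microcode revision quoted in this comment can be read from /proc/cpuinfo. The following is a minimal Python sketch, assuming an x86 Linux system where each CPU entry carries a "microcode" line with a hexadecimal value; the function name and the sample text are illustrative and not taken from this report.

```python
import re

# Hypothetical sample in the style of /proc/cpuinfo on x86 Linux.
SAMPLE = (
    "model name\t: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz\n"
    "stepping\t: 3\n"
    "microcode\t: 0x19\n"
)

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in cpuinfo-style text."""
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    # One line per logical CPU; a set collapses identical revisions.
    return {int(match.group(1), 16) for match in pattern.finditer(cpuinfo_text)}

# On a live system one would pass open("/proc/cpuinfo").read() instead.
print([hex(rev) for rev in sorted(microcode_revisions(SAMPLE))])
```

More than one value in the result would mean that the cores are running different microcode revisions.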
Comment 4 Takashi Iwai 2014-12-31 17:44:25 UTC
(In reply to Ulrich Windl from comment #3)
> (In reply to Takashi Iwai from comment #2)
> > This happens only after suspend/resume, right?
> Did you try to reproduce?

This problem is known to happen only on certain broken CPU and BIOS, so I cannot reproduce here at all.

> Usually I don't suspend my computer during
> playback of music ;-) No, I did not suspend/resume the computer.
> Apart from that, I think SIGILL (invalid instruction) is quite different from
> SIGSEGV (bad memory access).
> For what it's worth: My CPU is an "Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz"
> with microcode 0x19 (stepping 3). A quick look at bug 903874 showed that the
> microcode in use there is 0x1c (much newer than what I have).
> If the only common thing in both bugs is "__lll_lock_elision()", that code
> could just be broken (assuming Intel CPUs are not).
> So if you can: Play any Ogg/Vorbis file on an Intel CPU in an x86_64
> environment. I can reproduce it 100%, and when you specify multiple files on
> the command line, pressing ^C does not continue with the next file but aborts
> the command. Testing this is really easy!

So, in your case, installing the latest ucode-intel package should fix the problem.  This will disable TSX that is known to be broken.  Please check it.

Of course, due to bug 903874, the problem may reappear after suspend/resume.
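Whether TSX is still advertised can be checked from the CPU flags. The sketch below is a rough Python illustration, assuming that a CPU with TSX enabled lists "hle" and/or "rtm" in the "flags" line of /proc/cpuinfo and that a microcode update which disables TSX removes them; the function name and sample strings are hypothetical.

```python
def tsx_flags(cpuinfo_text):
    """Return the TSX-related flags (hle, rtm) listed for the first CPU entry."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Intersect the advertised flags with the two TSX feature names.
            return sorted({"hle", "rtm"} & set(value.split()))
    return []

# Hypothetical flag lines, shortened for readability.
with_tsx = "flags\t\t: fpu vme sse2 hle avx2 rtm\n"
without_tsx = "flags\t\t: fpu vme sse2 avx2\n"

print(tsx_flags(with_tsx))
print(tsx_flags(without_tsx))
```

An empty result after installing the microcode and rebooting would suggest that TSX is no longer offered to user space, so glibc's lock elision path should not be taken.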
Comment 5 Ulrich Windl 2015-01-03 19:58:39 UTC
(In reply to Takashi Iwai from comment #4)
[...]
> This problem is known to happen only on certain broken CPU and BIOS, so I
> cannot reproduce here at all.
[...]
> So, in your case, installing the latest ucode-intel package should fix the
> problem.  This will disable TSX that is known to be broken.  Please check it.
[...]
I checked my system: There seems to be no package related to "microcode" installed, and I checked YaST: There is no microcode-related package matching "microcode|intel|processor". My expectation was that such a package should be installed by default, and it should be on the installation media.
Only on http://software.opensuse.org/package/ucode-intel I could find the package.
So I tried "http://download.opensuse.org/repositories/openSUSE:/13.2/standard/x86_64/ucode-intel-20140913-1.1.x86_64.rpm"
For some obscure reason, the CPUs' microcode did not update; for some strange reason, a reboot seems to be required. Ok, you'll have to wait...
Comment 6 Takashi Iwai 2015-01-03 20:44:40 UTC
(In reply to Ulrich Windl from comment #5)
> (In reply to Takashi Iwai from comment #4)
> [...]
> > This problem is known to happen only on certain broken CPU and BIOS, so I
> > cannot reproduce here at all.
> [...]
> > So, in your case, installing the latest ucode-intel package should fix the
> > problem.  This will disable TSX that is known to be broken.  Please check it.
> [...]
> I checked my system: There seems to be no package related to "microcode"
> installed, and I checked YaST: There is no microcode-related package
> matching "microcode|intel|processor". My expectation was that such a package
> should be installed by default, and it should be on the installation media.

Maybe it's missing from the installation pattern after the package split.  Please open another bug report.

> Only on http://software.opensuse.org/package/ucode-intel I could find the
> package.
> So I tried
> "http://download.opensuse.org/repositories/openSUSE:/13.2/standard/x86_64/
> ucode-intel-20140913-1.1.x86_64.rpm"
> For some obscure reason, the CPUs' microcode did not update;

This is intentional, IIRC.  Updating the microcode on a running system could easily lead to crashes (the opposite direction of this bug, for example).
You can still trigger the microcode reload manually by writing to a sysfs file.

> for some
> strange reason, a reboot seems to be required.

The same reason.  The CPU microcode is nowadays loaded earlier, even before initrd execution, for consistency.  At the risk of a crash, you can also load the microcode yourself, as mentioned above.
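The manual reload mentioned above can be sketched in Python as follows. This assumes the kernel exposes the late-loading trigger at /sys/devices/system/cpu/microcode/reload (the comment does not name the file, so the path is an assumption), that the caller is root, and that the crash risk described in this comment is accepted; the function name is hypothetical.

```python
from pathlib import Path

# Assumed location of the late-loading trigger; not named in this report.
RELOAD_PATH = Path("/sys/devices/system/cpu/microcode/reload")

def request_microcode_reload(path=RELOAD_PATH):
    """Write "1" to the reload trigger to ask the kernel to reload microcode.

    Needs root on a real system. Reloading on a running machine may crash
    processes that already depend on features the new microcode removes.
    """
    Path(path).write_text("1")

# Deliberately not called here: run request_microcode_reload() as root only
# when a reboot is not an option.
```

Rebooting after installing the package remains the safer route, since the microcode is then applied before any user-space code runs.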
Comment 7 Takashi Iwai 2015-01-20 15:23:14 UTC
The kernel side was already fixed.  Let's close.