Bugzilla – Full Text Bug Listing
| Summary: | X with kms_swrast_dri causes SIGFPE when built with GCC 6 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Richard Biener <rguenther> |
| Component: | X.Org | Assignee: | E-mail List <xorg-maintainer-bugs> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | rguenther, schwab |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
Possibly X doesn't check whether pthread_barrier_init succeeds and passes it a zero count (upon which it fails, leaving the barrier uninitialized and thus likely all zeros).

Any idea why this doesn't happen with gcc < 6?

No idea yet - trying to create a Test-DVD with gdb and the required debuginfo/source packages to see what happens.
But certainly in ./src/gallium/auxiliary/os/os_thread.h

    typedef pthread_barrier_t pipe_barrier;

    static inline void pipe_barrier_init(pipe_barrier *barrier, unsigned count)
    {
       pthread_barrier_init(barrier, NULL, count);
    }

    static inline void pipe_barrier_destroy(pipe_barrier *barrier)
    {
       pthread_barrier_destroy(barrier);
    }

    static inline void pipe_barrier_wait(pipe_barrier *barrier)
    {
       pthread_barrier_wait(barrier);
    }

doesn't properly verify pthread_barrier_init succeeds (nor does it return the return value).
In ./src/gallium/drivers/llvmpipe/lp_rast.c I see

    for (i = 0; i < MAX2(1, num_threads); i++) {
       struct lp_rasterizer_task *task = &rast->tasks[i];
    ...
    rast->num_threads = num_threads;

    rast->no_rast = debug_get_bool_option("LP_NO_RAST", FALSE);

    create_rast_threads(rast);

    /* for synchronizing rasterization threads */
    pipe_barrier_init( &rast->barrier, rast->num_threads );

so the loop accounts for num_threads < 1, which suggests to me that it can be zero, and this value is passed to pipe_barrier_init unmodified.
Looks like a non-GCC specific issue (I didn't try to see if it reproduces
with GCC 5). Note that I have a new LLVM in Staging:Gcc6 as well (not
sure if that matters).
Caller has

    screen->num_threads = util_cpu_caps.nr_cpus > 1 ? util_cpu_caps.nr_cpus : 0;
    #ifdef PIPE_SUBSYSTEM_EMBEDDED
    screen->num_threads = 0;
    #endif
    screen->num_threads = debug_get_num_option("LP_NUM_THREADS", screen->num_threads);
    screen->num_threads = MIN2(screen->num_threads, LP_MAX_THREADS);

    screen->rast = lp_rast_create(screen->num_threads);

so there are certainly cases where num_threads == 0. I suppose in that case the pipe_barrier should not be used at all? In fact if nr_cpus is 1 (the default in qemu) num_threads will be zero.
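The thread-count selection quoted above can be reduced to a small standalone sketch. `pick_num_threads` and `LP_MAX_THREADS_ASSUMED` are made-up names, the value 16 is an assumption about Mesa's LP_MAX_THREADS, and the LP_NUM_THREADS override and the PIPE_SUBSYSTEM_EMBEDDED case are left out.

```c
#include <assert.h>

/* Assumption: stands in for Mesa's LP_MAX_THREADS; the value is a guess. */
#define LP_MAX_THREADS_ASSUMED 16u

/* Hypothetical reduction of the caller logic: a single CPU (or none
 * detected) yields zero rasterizer threads, anything else is clamped. */
static unsigned pick_num_threads(unsigned nr_cpus)
{
    unsigned n = nr_cpus > 1 ? nr_cpus : 0;

    if (n > LP_MAX_THREADS_ASSUMED)
        n = LP_MAX_THREADS_ASSUMED;

    assert(n != 1); /* the selection never produces exactly one thread */
    return n;
}
```

With `nr_cpus == 1` this returns 0, which is the value that ends up as the barrier count.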
Hmm, running X in gdb just hangs the machine :/

Ok, probably "caused" by the new glibc as b02840ba introduced the modulo operation.

    commit b02840bacdefde318d2ad2f920e50785b9b25d69
    Author: Torvald Riegel <triegel@redhat.com>
    Date: Wed Jun 24 14:37:32 2015 +0200

        New pthread_barrier algorithm to fulfill barrier destruction requirements.

        The previous barrier implementation did not fulfill the POSIX requirements
        for when a barrier can be destroyed. Specifically, it was possible that
        threads that haven't noticed yet that their round is complete still access
        the barrier's memory, and that those accesses can happen after the barrier
        has been legally destroyed.
        The new algorithm does not have this issue, and it avoids using a lock
        internally.

You need to fix Mesa to not call pthread_barrier_init with zero count (or not destroy or wait on such barrier). You likely don't wait on it already as that has the same modulo operation.

Guard in ./src/gallium/drivers/llvmpipe/lp_rast.c

    void lp_rast_destroy( struct lp_rasterizer *rast )
    {
    ...
       /* for synchronizing rasterization threads */
       pipe_barrier_destroy( &rast->barrier );

with if rast->num_threads >= 1

the pipe_thread_wait in this function is already guarded by means of the loop

       for (i = 0; i < rast->num_threads; i++) {
    #ifdef _WIN32
          pipe_semaphore_wait(&rast->tasks[i].work_done);
    #else
          pipe_thread_wait(rast->threads[i]);
    #endif
       }

Oh, and best not call pipe_barrier_init with num-threads == 0 either.

And X indeed works fine with qemu -smp cpus=2 ...

Talked to Richard. I'll add the patch to Mesa in X11:XOrg and submit it to factory/TW.

(In reply to Stefan Dirsch from comment #9)
> Talked to Richard. I'll add the patch to Mesa in X11:XOrg and submit it to
> factory/TW.

done (SR#373686)

Richard, feel free to linkpac to obs://X11:XOrg/Mesa for now. ;-)

This is an autogenerated message for OBS integration:
This bug (971350) was mentioned in
https://build.opensuse.org/request/show/373686 Factory / Mesa

This is an autogenerated message for OBS integration:
This bug (971350) was mentioned in
https://build.opensuse.org/request/show/373998 Factory / Mesa
In openSUSE:Factory:Staging:Gcc6 on x86_64 launching X causes

    [ 26.429] (EE) Backtrace:
    [ 26.429] (EE) 0: /usr/bin/X (xorg_backtrace+0x4a) [0x59721a]
    [ 26.429] (EE) 1: /usr/bin/X (0x400000+0x19b569) [0x59b569]
    [ 26.429] (EE) 2: /lib64/libc.so.6 (0x7f2cdee91000+0x34b10) [0x7f2cdeec5b10]
    [ 26.429] (EE) 3: /lib64/libpthread.so.0 (pthread_barrier_destroy+0x9) [0x7f2cdec81c19]
    [ 26.429] (EE) 4: /usr/lib64/dri/kms_swrast_dri.so (0x7f2cd8a74000+0x63d45f) [0x7f2cd90b145f]
    [ 26.429] (EE) 5: /usr/lib64/dri/kms_swrast_dri.so (0x7f2cd8a74000+0x648b01) [0x7f2cd90bcb01]
    [ 26.429] (EE) 6: /usr/lib64/dri/kms_swrast_dri.so (0x7f2cd8a74000+0x2f8c5f) [0x7f2cd8d6cc5f]
    [ 26.429] (EE) 7: /usr/lib64/dri/kms_swrast_dri.so (0x7f2cd8a74000+0x2f8d05) [0x7f2cd8d6cd05]
    [ 26.429] (EE) 8: /usr/lib64/dri/kms_swrast_dri.so (0x7f2cd8a74000+0x2f71a2) [0x7f2cd8d6b1a2]
    [ 26.429] (EE) 9: /usr/lib64/libgbm.so.1 (0x7f2cda047000+0x328d) [0x7f2cda04a28d]
    [ 26.429] (EE) 10: /usr/lib64/xorg/modules/libglamoregl.so (0x7f2cda255000+0x746b) [0x7f2cda25c46b]
    [ 26.429] (EE) 11: /usr/lib64/xorg/modules/libglamoregl.so (glamor_egl_init+0x117) [0x7f2cda25d6a7]
    [ 26.429] (EE) 12: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7f2cdaa9d000+0x87ac) [0x7f2cdaaa57ac]
    [ 26.429] (EE) 13: /usr/bin/X (InitOutput+0xa6d) [0x47e7ed]
    [ 26.429] (EE) 14: /usr/bin/X (0x400000+0x3d106) [0x43d106]
    [ 26.429] (EE) 15: /lib64/libc.so.6 (__libc_start_main+0xf1) [0x7f2cdeeb1721]
    [ 26.429] (EE) 16: /usr/bin/X (_start+0x29) [0x4283e9]
    [ 26.429] (EE)
    [ 26.429] (EE) Floating point exception at address 0x7f2cdec81c19
    [ 26.429] (EE)
    Fatal server error:
    [ 26.429] (EE) Caught signal 8 (Floating point exception). Server aborting

where glibc does

    int
    pthread_barrier_destroy (pthread_barrier_t *barrier)
    {
      struct pthread_barrier *bar = (struct pthread_barrier *) barrier;

      /* Destroying a barrier is only allowed if no thread is blocked on it.
         Thus, there is no unfinished round, and all modifications to IN will
         have happened before us (either because the calling thread took part
         in the most recent round and thus synchronized-with all other threads
         entering, or the program ensured this through other synchronization).
         We must wait until all threads that entered so far have confirmed that
         they have exited as well. To get the notification, pretend that we have
         reached the reset threshold. */
      unsigned int count = bar->count;
      unsigned int max_in_before_reset = BARRIER_IN_THRESHOLD
                                         - BARRIER_IN_THRESHOLD % count;

thus barrier->count is zero somehow.

This is from within Qemu with the Test-DVD for openQA. Note the project has glibc from Base:System which is at 2.23 (factory is 2.22 still).