Bug 1162365

Summary: if the lock does not use lock elision pthread_mutex_destroy will fail
Product: [openSUSE] openSUSE Distribution
Reporter: Jan Michalski <jan.m.michalski>
Component: Other
Assignee: Andreas Schwab <schwab>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Major
Priority: P2 - High
CC: alynx.zhou, lukasz.dorau, marcin.slusarz
Version: Leap 15.1   
Target Milestone: ---   
Hardware: x86-64   
OS: All   
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=23275
Whiteboard:
Found By: Customer
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Jan Michalski 2020-01-31 12:29:38 UTC
Source code: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c
Makefile: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/Makefile
Distro: openSUSE Leap 15.1
Kernel: 4.12.14-lp151.28.36-default
Glibc: glibc-devel-2.26-lp151.18.7.x86_64
CPU: Intel(R) Xeon(R) Gold 6142M CPU @ 2.60GHz

Scenario:
Two worker threads concurrently use a common set of primitives:
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};
One of the threads waits on the pthread_cond_t while the other sets val to 1.
Everything happens in the action_cancel_worker function: https://github.com/janekmi/pmdk/blob/test-pthread-3/src/test/obj_pmalloc_mt/locking_issue_repro.c#L159
After the worker threads exit, all mutexes should be unlocked, so it should be possible to destroy them. It is not: pthread_mutex_destroy fails with EBUSY.
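
For readers who do not want to open the full reproducer, the scenario can be sketched roughly as below. This is a simplified illustration, not the code from locking_issue_repro.c: the names waiter, setter and run_action_once are hypothetical, and it handles a single action instead of an array of them.

```c
#include <assert.h>
#include <pthread.h>

/* Same layout as the struct in the report. */
struct action {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned val;
};

/* Waiter (hypothetical name): blocks on cond until val becomes non-zero. */
static void *waiter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	while (a->val == 0)
		pthread_cond_wait(&a->cond, &a->lock);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/* Setter (hypothetical name): sets val to 1 under the lock, wakes the waiter. */
static void *setter(void *arg)
{
	struct action *a = arg;

	pthread_mutex_lock(&a->lock);
	a->val = 1;
	pthread_cond_signal(&a->cond);
	pthread_mutex_unlock(&a->lock);
	return NULL;
}

/*
 * Runs one waiter/setter pair on a fresh action and returns the result of
 * pthread_mutex_destroy(): 0 is expected, EBUSY is the failure reported here.
 */
static int run_action_once(void)
{
	struct action a;
	pthread_t t0, t1;
	int ret;

	a.val = 0;
	ret = pthread_mutex_init(&a.lock, NULL);
	assert(ret == 0);
	ret = pthread_cond_init(&a.cond, NULL);
	assert(ret == 0);

	ret = pthread_create(&t0, NULL, waiter, &a);
	assert(ret == 0);
	ret = pthread_create(&t1, NULL, setter, &a);
	assert(ret == 0);

	/* Both threads have released the mutex by the time they are joined. */
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);

	pthread_cond_destroy(&a.cond);
	return pthread_mutex_destroy(&a.lock);
}
```

Calling run_action_once() in a loop from main and checking for a non-zero return approximates what the reproducer does with many actions and iterations. Build with -pthread.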

Repro:
$ ./locking_issue_repro 32 1000
pthread_mutex_destroy: Device or resource busy

Note:
After each pthread_mutex_lock and pthread_mutex_unlock call, the internal state of the mutex is dumped to the file /dev/shm/obj_pmalloc_mt_dump.
The format of each entry is:
TID -> actions[worker-id][op-id] = {data read from the pthread_mutex_t} (stage of the worker)
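
A dump line in this format could be produced along the following lines. This is only a sketch: format_mutex_state is a hypothetical helper, not the function used in the reproducer, and it assumes glibc on Linux, where pthread_mutex_t exposes the internal fields __data.__nusers, __data.__owner and __data.__kind. These fields are not a public API and may change between glibc versions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/*
 * Formats the glibc-internal mutex fields in the style of the dump entries.
 * ASSUMPTION: glibc on Linux; the __data members are implementation details.
 * Returns the value of snprintf(), i.e. the length of the formatted line.
 */
static int format_mutex_state(char *buf, size_t len, long tid,
		unsigned worker, unsigned op,
		const pthread_mutex_t *m, const char *stage)
{
	assert(buf != NULL && m != NULL && stage != NULL);

	return snprintf(buf, len,
		"%ld -> actions[%u][%u] = {nusers: %u, owner: %d, kind: %d} (%s)",
		tid, worker, op,
		m->__data.__nusers, m->__data.__owner, m->__data.__kind,
		stage);
}
```

Reading these fields without holding the mutex is racy by nature, so a dump taken this way is a diagnostic hint rather than a consistent snapshot.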

Issue (appears sporadically, but in at least 1 of 5 runs):
$ cat /dev/shm/obj_pmalloc_mt_dump | tail
2793 -> actions[7][996] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][997] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][998] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][999] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
2777 -> actions[7][794] = {nusers: 1, owner: 2793, kind: 256} (dump)

Clues:
All of the locks are of the kind PTHREAD_MUTEX_ELISION_NP, so nearly all of them look as follows:
2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t0)
2792 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t0)
2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (lock t1)
2793 -> actions[7][711] = {nusers: 0, owner: 0, kind: 256} (unlock t1)
So it looks like all of them use lock elision.
But if any of them does not use lock elision, it behaves strangely:
- it seems locked all the time:
$ cat /dev/shm/obj_pmalloc_mt_dump | grep \\[7\\] | grep 710
2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t1) // no matter if it is after lock
2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (lock t0)
2792 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t0) // or after unlock
2793 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (unlock t1)
2777 -> actions[7][710] = {nusers: 1, owner: 2793, kind: 256} (dump)
- but at the same time, they work fine!
- except that it is impossible to destroy them
- the rule is: if the lock does not use lock elision, pthread_mutex_destroy will fail on it.
Comment 1 Marcin Ślusarz 2020-01-31 15:11:24 UTC
FTR, this bug was discovered by PMDK test and reported here:
https://github.com/pmem/pmdk/issues/4510
Comment 2 Andreas Schwab 2020-02-20 14:33:15 UTC
dup

*** This bug has been marked as a duplicate of bug 1131330 ***