Bugzilla – Bug 380751
k45.suse.de - pthread_cond_latency hangs after several runs
Last modified: 2009-03-09 20:07:58 UTC
Kernel: SLERT 10 SP2 Beta 6 + NO_IRQ_HW_AFFINITY
ltp-realtime: ltp-realtime-20080229-1
Host: K45.suse.de

pthread_cond_latency keeps hanging when it is run several times. The test was run in a loop at least three times, as part of the SLERT-ltp-realtime test suite. According to Felix, each run of the entire SLERT-ltp-realtime suite takes longer than the previous one:
#1 approx. 3h
#2 approx. 4.5h
#3 hanging for 1 day 2h
-----
K45:~ # rpm -q kernel-rt --changelog
* Fri Apr 11 2008 - dgollub@suse.de
- patches.rt/shield-procs: got touched to apply smoothly for recent revert of CPU affinity for hardware IRQ threads. Avoid CPU affinity on hardware IRQ threads.
----
(gdb) thread apply all bt

Thread 2 (Thread 1082132800 (LWP 10093)):
#0 0x00002ac13c22e5f6 in poll () from /lib64/libc.so.6
#1 0x0000000000401f93 in childfunc (arg=<value optimized out>) at pthread_cond_latency.c:116
#2 0x00002ac13be04143 in start_thread () from /lib64/libpthread.so.0
#3 0x00002ac13c2368cd in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()

Thread 1 (Thread 47009427580624 (LWP 10090)):
#0 0x00002ac13c21fb17 in sched_yield () from /lib64/libc.so.6
#1 0x0000000000401cb3 in test_signal (broadcast_flag=1, iter=4) at pthread_cond_latency.c:165
#2 0x0000000000401f03 in main (argc=2, argv=0x7fff6edc4eb8) at pthread_cond_latency.c:237
#3 0x00002ac13c193184 in __libc_start_main () from /lib64/libc.so.6
#4 0x0000000000401a49 in _start ()
#0 0x00002ac13c21fb17 in sched_yield () from /lib64/libc.so.6
----
The ltp-realtime package is available at /suse/dgollub/SLERT/testpackages-20080409/ltp-realtime/
Felix, could you stop your test run and try to reproduce this by calling pthread_cond_latency several times in a row? If pthread_cond_latency doesn't hang after 10 cycles, please report back and try to find a way to reproduce the issue quickly.
The first test run works fine, but subsequent runs of the same test hang. Sometimes two runs work fine and then the third run hangs. Even longer pauses between runs of pthread_cond_latency don't help. I called pthread_cond_latency with the parameter "4", which makes each test run perform 4 loops itself. If I run the test with "1" as the parameter, it works fine. The original script from rt-tests/ltp-realtime also uses four loops.
Felix, please capture a stack trace (sysrq-t) when the test hangs and attach it. I am looking into a similar hang in the pthread-detach test and would like to see whether there is a correlation. Thanks.
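The reproduction requested above (run pthread_cond_latency several times in a row, and grab sysrq-t if it hangs) could be scripted roughly as follows. This is a minimal sketch, not part of ltp-realtime: the function name run_cycles, the per-run time limit and the SYSRQ_ON_HANG switch are hypothetical. It starts each run in the background, polls it once per second, and treats a run that is still alive after the limit as a hang. If asked, it writes "t" to /proc/sysrq-trigger (root only) before killing the run, so the task dump reflects the hung state rather than the aftermath.

```shell
#!/bin/bash
# Sketch: rerun a test command several times in a row and flag a hang.
# Hypothetical (not from the bug report or ltp-realtime): the function
# name run_cycles, the per-run limit and the SYSRQ_ON_HANG switch.

# Usage: run_cycles CYCLES LIMIT_SECONDS CMD [ARGS...]
# Returns 0 if every run exits within LIMIT_SECONDS, 1 at the first run
# that is still alive after the limit (that run is killed).
run_cycles() {
    local cycles=$1 limit=$2 i=1 pid elapsed status
    shift 2
    while [ "$i" -le "$cycles" ]; do
        "$@" &
        pid=$!
        elapsed=0
        # Poll once per second until the run exits or the limit is reached.
        while kill -0 "$pid" 2>/dev/null && [ "$elapsed" -lt "$limit" ]; do
            sleep 1
            elapsed=$((elapsed + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            echo "run $i: still running after ${limit}s, treating it as a hang" >&2
            # Optional: dump all task states (sysrq-t) while the run is still
            # stuck. Needs root; the output goes to the kernel log.
            if [ "${SYSRQ_ON_HANG:-0}" = 1 ]; then
                echo t > /proc/sysrq-trigger
            fi
            kill -KILL "$pid" 2>/dev/null || true
            wait "$pid" 2>/dev/null || true
            return 1
        fi
        status=0
        wait "$pid" || status=$?
        echo "run $i: finished with exit status $status" >&2
        i=$((i + 1))
    done
    return 0
}
```

For example, as root after sourcing the function, `SYSRQ_ON_HANG=1 run_cycles 10 3600 ./pthread_cond_latency 4` would mirror the suggestion above (10 cycles, parameter "4"). The binary path is an assumption, and the one-hour per-run limit is an arbitrary guess, since the report does not say how long a single run normally takes. Killing the hung run lets the loop report which cycle hung instead of blocking the whole suite for a day.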
This sounds like a futex is not being unlocked properly; subsequent tests would then sleep while trying to lock that futex.
Created attachment 208686: sysrq-t output
Attached is the sysrq-t output from /var/log/messages (unnecessary parts removed).
Still valid?
Closing - no response from reporter.