Bugzilla – Bug 514437
DPC thread consumes 100% CPU utilization
Last modified: 2009-07-13 19:53:32 UTC
After upgrading from SLERT Update 4 to Update 5, the 'DPC thread' kernel thread consumes 100% of a CPU core when an application uses the ESD-CAN kernel module. The issue is not present in Update 4 (kernel 2.6.22.19-0.19-rt_bigsmp). As advised by Sven Dietrich, the ESD CAN driver was compiled against the kernel source (linux -> linux-2.6.22.19-0.21) while running the 2.6.22.19-0.21-rt_bigsmp kernel. The issue is still present after compiling under the Update 5 kernel.
Just got the supportconfig output. Here is some evidence of the problem:

#==[ Top 10 CPU Processes ]=========================#
%CPU   PID USER   CMD
95.7  7888 adacs  can_test
45.9  7883 root   [dpc thread]
 1.5  7890 root   /bin/bash /sbin/supportconfig
 0.2  7735 adacs  top
 0.2  6907 root   [IRQ-17]
 0.0  8680 root   sed -e /^%/d
 0.0  8679 root   head -11
 0.0  8678 root   sort -k 1 -r -n
 0.0  8677 root   ps axwwo %cpu,pid,user,cmd
 0.0  7885 root   [IRQ-27]

These are the modules that are tainting the kernel:

module=esdcan_pci405 license=None supported=no
module=nls_iso8859_1 license=Dual BSD/GPL supported=yes
module=nls_cp437 license=Dual BSD/GPL supported=yes
module=nvidia license=NVIDIA supported=no
module=agpgart license=GPL and additional rights supported=yes
module=rtc_cmos license=GPL supported=no
module=rtc_core license=GPL supported=no
module=rtc_lib license=GPL supported=no
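The process listing above appears to come from `ps axwwo %cpu,pid,user,cmd`, sorted by the first column. As a rough sketch for anyone who wants to check other machines for the same symptom, the following picks out kernel threads (which ps shows in square brackets) that are above a CPU threshold. The function name `busy_kernel_threads` and the 40% threshold are assumptions made for illustration, not part of supportconfig.

```python
def busy_kernel_threads(ps_output, threshold=40.0):
    """Return (cpu, pid, name) for kernel threads at or above threshold %CPU.

    Expects text in the column layout of `ps axwwo %cpu,pid,user,cmd`.
    Kernel threads are recognised by their name being wrapped in brackets.
    The threshold is an arbitrary illustrative value.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)  # %CPU, PID, USER, remainder is CMD
        if len(fields) < 4:
            continue
        cpu_text, pid_text, _user, cmd = fields
        try:
            cpu, pid = float(cpu_text), int(pid_text)
        except ValueError:
            continue  # header line or malformed row
        cmd = cmd.strip()
        if cmd.startswith("[") and cmd.endswith("]") and cpu >= threshold:
            hits.append((cpu, pid, cmd[1:-1]))
    return sorted(hits, reverse=True)


# A few rows taken from the supportconfig output in this bug.
sample = """%CPU   PID USER   CMD
95.7  7888 adacs  can_test
45.9  7883 root   [dpc thread]
 0.2  6907 root   [IRQ-17]
"""
print(busy_kernel_threads(sample))
```

With the sample rows, only the `dpc thread` entry is reported; `can_test` is skipped because it is a user process, and `[IRQ-17]` because it is below the threshold.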
Is it possible to confirm that the issue is present in the i386 and the i386-bigsmp Kernel?
Can you specify the exact kernel revision you are speaking about. Are you talking about the SLES Kernel?
(In reply to comment #4)
> Can you specify the exact kernel revision you are speaking about. Are you
> talking about the SLES Kernel?

Can you reproduce the issue in kernel-rt-2.6.22.19-0.19 as well as in kernel-rt_bigsmp-2.6.22.19-0.19?
As noted above, the issue is not present in kernel kernel-rt_bigsmp-2.6.22.19-0.19, which is update 4. I will test the kernel-rt-2.6.22.19-0.19 and report back.
(In reply to comment #6)
> As noted above, the issue is not present in kernel
> kernel-rt_bigsmp-2.6.22.19-0.19, which is update 4. I will test the
> kernel-rt-2.6.22.19-0.19 and report back.

Yes, I completely misstated Comment #5. The expectation is that the issue is NOT found in any -0.19 (or prior) kernels, while any kernel of the -0.21 variety should exhibit high CPU utilization from the DPC thread.

More importantly, does the PTF-2 kernel I provided today exhibit the problem?
***2.6.22.19-DPC_2-rt test kernel exhibits DPC Thread 100% of CPU issue***

Attachment to follow with requested /proc hardware info.

Installed: 2.6.22.19-DPC_2-rt

Fresh compile of CAN kernel module from: Driver for ESD CAN cards Version > 3.0.0 05.02.2008

Loaded with insmod /lib/modules/2.6.22.19-DPC_2-rt/esdcan/esdcan-pci405

lsmod|grep -i esd
esdcan_pci405 1499404 2

Executed: can_test

Cut from top:
  PID USER   PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 5281 root   20  0    0    0    0 R   99  0.0 4:23.23 dpc thread
 5322 adacs  20  0 106m 1856 1008 S   99  0.1 4:23.28 can_test

lspci -v for card:
07:06.0 CANBUS: IBM 405GP PLB to PCI Bridge (rev 01)
        Subsystem: ESD Electronic System Design GmbH Unknown device 0407
        Flags: bus master, 66MHz, medium devsel, latency 71, IRQ 27
        Memory at d1000000 (32-bit, prefetchable) [size=16M]
        Memory at d0000000 (32-bit, prefetchable) [size=16M]
        Capabilities: [58] Power Management version 2
Created attachment 299553 [details] As requested by ESD, lspci, interrupts and cpuinfo data
Created attachment 299620 [details]
The adaptive-locking timeout patch which masks the DPC thread issue.

This patch was initially included with the adaptive timeout patch set. When the SLERT dev team pushed the technology upstream to the RT developer community, the adaptive timeout was dropped. The reasoning for dropping it was that infinite wait times were technically not possible. This patch re-introduces the adaptive locking timeout as a workaround, pending resolution of the CAN bus driver issue. Generation of a corresponding PTF kernel and source tree is in progress.
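For readers unfamiliar with the idea, here is a minimal user-space sketch of what a timeout on an adaptive (spin-then-sleep) lock looks like in general. It is not the attached patch and does not reflect the kernel's rtmutex code; the function name `adaptive_acquire` and the `spin_budget` parameter are hypothetical. The point it illustrates is that a waiter only busy-waits for a bounded time and then falls back to sleeping, so a lock holder that never releases cannot keep the waiter spinning at 100% CPU.

```python
import threading
import time


def adaptive_acquire(lock, spin_budget=0.01):
    """Acquire lock by spinning first, then blocking once the budget is spent.

    Returns "spin" if the lock was taken while busy-waiting and "sleep" if
    the caller had to fall back to a blocking acquire. spin_budget is in
    seconds and is an illustrative value only.
    """
    deadline = time.monotonic() + spin_budget
    while True:
        if lock.acquire(blocking=False):
            return "spin"
        if time.monotonic() >= deadline:
            break
    lock.acquire()  # budget exhausted: stop burning CPU and wait normally
    return "sleep"


lock = threading.Lock()
print(adaptive_acquire(lock))               # uncontended: taken immediately
threading.Timer(0.3, lock.release).start()  # holder lets go after 300 ms
print(adaptive_acquire(lock))               # contended longer than the budget
lock.release()
```

Without the deadline check, the loop would spin for as long as the lock stays held, which is the kind of unbounded busy-wait the timeout is meant to cap.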
Created attachment 299621 [details]
Break-out of just the adaptive patch, eliminating peripheral clutter

This shows just the changes introduced by the adaptive timeout patch, eliminating changes made to resolve conflicts in other patches.
I have run preliminary testing on the 3.8.12 driver from ESD with Novell's esd_complete.patch file applied to the 3.8.12 source. The cantest application reports that the interface is working correctly, and we are now receiving data in our data acquisition system. Functionally, everything seems to be fine, but we are going to run the system over the weekend as a very basic stability test. I am currently waiting on ESD to provide a more robust test methodology, based on ESD's cantest application, to validate the patched 3.8.12 driver.
We have found the patched version of the ESD 3.8.12 driver to be unstable. When we ran our acquisition system with 4 user monitoring applications, the system crashed and executed the kdump core capture process. On Friday I configured the system to use only one instance of our user application, and the crash occurred 10 hours later, again captured automatically by kdump. I have uploaded the core file and test notes in file 514437-cantest_core-2009-06-27-03-20.tgz to ftp.novell.com/incoming. I am currently collecting a core file from a test advised by ESD that I executed in runlevel 3. Using this test I can reproduce the kernel crash in less than 5 minutes. I will update this bug when I have uploaded this core dump, as it may be easier to trace the issue with fewer applications and services running.
From Roy Kinsella:

Hello everyone,

We have been testing the new driver for 24 hours on our test system running the SLERT kernel 2.6.22.19-0.21-rt_bigsmp, with continuous real I/O traffic on four channels, with complete success. We are also able to perform numerous restarts of our application-level interface without system lockups. It looks like this one is golden. Thank you everyone for your effort and support.

Roy

Roy Kinsella, P.Eng.
Manager - Applications and Software Engineering
Horiba ATS
Marking fixed.