Bug 514437 - DPC thread consumes 100% CPU utilization
Summary: DPC thread consumes 100% CPU utilization
Status: RESOLVED FIXED
Alias: None
Product: SUSE Linux Enterprise Real Time 10 SP2 (SLERT 10 SP2)
Classification: SUSE Linux Enterprise Real Time Extension
Component: kernel
Version: Update4
Hardware: i386 Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: Sven Dietrich
QA Contact: Erik Hamera
URL:
Whiteboard:
Keywords: DSLA_REQUIRED
Depends on:
Blocks:
 
Reported: 2009-06-18 15:53 UTC by Mike Latimer
Modified: 2009-07-13 19:53 UTC

See Also:
Found By: ---
Services Priority: 400
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
As requested by ESD, lspci, interrupts and cpuinfo data (3.37 KB, application/x-compressed-tar)
2009-06-22 15:58 UTC, Alan Walker
The adaptive-locking timeout patch which masks the DPC thread issue. (11.51 KB, patch)
2009-06-22 21:12 UTC, Sven Dietrich
Break-out just the adaptive patch, eliminating peripheral clutter (5.08 KB, patch)
2009-06-22 21:17 UTC, Sven Dietrich

Description Mike Latimer 2009-06-18 15:53:25 UTC
After upgrading from SLERT Update 4 to Update 5, the 'DPC thread' kernel thread consumes 100% of a CPU core when an application utilizes the ESD-CAN kernel module.  

This issue is not present in Update 4 (kernel 2.6.22.19-0.19-rt_bigsmp).

As per Sven Dietrich's request, the ESD CAN driver was compiled against the kernel source (linux -> linux-2.6.22.19-0.21) while running the 2.6.22.19-0.21-rt_bigsmp kernel.

The issue is still present after compiling under the Update 5 kernel.
Comment 1 Mike Latimer 2009-06-18 16:08:22 UTC
Just got the supportconfig output. Here is some evidence of the problem:

#==[ Top 10 CPU Processes ]=========================#
%CPU   PID USER     CMD
95.7  7888 adacs    can_test
45.9  7883 root     [dpc thread]
 1.5  7890 root     /bin/bash /sbin/supportconfig
 0.2  7735 adacs    top
 0.2  6907 root     [IRQ-17]
 0.0  8680 root     sed -e /^%/d
 0.0  8679 root     head -11
 0.0  8678 root     sort -k 1 -r -n
 0.0  8677 root     ps axwwo %cpu,pid,user,cmd
 0.0  7885 root     [IRQ-27]
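
For triage, a listing like the one above can be scanned for busy kernel threads automatically. The following is only a minimal sketch: it assumes the column layout shown here (%CPU, PID, USER, CMD) and relies on the ps convention of showing kernel threads in square brackets. The function name and the 25% threshold are hypothetical, chosen for illustration.

```python
# Sample rows copied from the supportconfig listing above.
SAMPLE = """\
%CPU   PID USER     CMD
95.7  7888 adacs    can_test
45.9  7883 root     [dpc thread]
 1.5  7890 root     /bin/bash /sbin/supportconfig
 0.2  6907 root     [IRQ-17]
"""


def busy_kernel_threads(listing, threshold=25.0):
    """Return (cpu, pid, name) for bracketed entries at or above threshold."""
    hits = []
    for line in listing.splitlines():
        # Split into at most four fields so that CMD keeps its spaces.
        fields = line.split(None, 3)
        if len(fields) < 4 or fields[0] == "%CPU":
            continue  # skip the header and malformed lines
        cpu, pid, _user, cmd = fields
        cmd = cmd.strip()
        if cmd.startswith("[") and cmd.endswith("]") and float(cpu) >= threshold:
            hits.append((float(cpu), int(pid), cmd[1:-1]))
    return hits


print(busy_kernel_threads(SAMPLE))  # [(45.9, 7883, 'dpc thread')]
```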

These are the modules that are tainting the kernel:

module=esdcan_pci405   license=None                       supported=no
module=nls_iso8859_1   license=Dual      BSD/GPL          supported=yes
module=nls_cp437       license=Dual      BSD/GPL          supported=yes
module=nvidia          license=NVIDIA                     supported=no
module=agpgart         license=GPL and additional rights  supported=yes
module=rtc_cmos        license=GPL                        supported=no
module=rtc_core        license=GPL                        supported=no
module=rtc_lib         license=GPL                        supported=no
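
To pick out the unsupported modules from a listing like this one, a small parser is enough. This is a sketch that assumes the `module=... license=... supported=yes|no` layout shown above; the function name is hypothetical and is not part of supportconfig.

```python
import re

# Sample rows copied from the module listing above.
MODULES = """\
module=esdcan_pci405   license=None                       supported=no
module=nls_iso8859_1   license=Dual      BSD/GPL          supported=yes
module=nls_cp437       license=Dual      BSD/GPL          supported=yes
module=nvidia          license=NVIDIA                     supported=no
module=agpgart         license=GPL and additional rights  supported=yes
module=rtc_cmos        license=GPL                        supported=no
module=rtc_core        license=GPL                        supported=no
module=rtc_lib         license=GPL                        supported=no
"""

# The license text may contain spaces, so it is matched lazily up to "supported=".
LINE_RE = re.compile(r"module=(\S+)\s+license=.*?\s+supported=(yes|no)\s*$")


def unsupported_modules(report):
    """Return the names of modules flagged supported=no, in listing order."""
    names = []
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == "no":
            names.append(match.group(1))
    return names


print(unsupported_modules(MODULES))
# ['esdcan_pci405', 'nvidia', 'rtc_cmos', 'rtc_core', 'rtc_lib']
```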
Comment 3 Sven Dietrich 2009-06-18 16:55:34 UTC
Is it possible to confirm that the issue is present in both the i386 and the i386-bigsmp kernels?
Comment 4 Alan Walker 2009-06-18 17:26:04 UTC
Can you specify the exact kernel revision you are speaking about?  Are you talking about the SLES Kernel?
Comment 5 Sven Dietrich 2009-06-18 19:30:54 UTC
(In reply to comment #4)
> Can you specify the exact kernel revision you are speaking about?  Are you
> talking about the SLES Kernel?

Can you reproduce the issue in kernel-rt-2.6.22.19-0.19 as well as in kernel-rt_bigsmp-2.6.22.19-0.19?
Comment 6 Alan Walker 2009-06-18 22:57:20 UTC
As noted above, the issue is not present in kernel-rt_bigsmp-2.6.22.19-0.19, which is Update 4.  I will test the kernel-rt-2.6.22.19-0.19 and report back.
Comment 7 Sven Dietrich 2009-06-18 23:07:05 UTC
(In reply to comment #6)
> As noted above, the issue is not present in
> kernel-rt_bigsmp-2.6.22.19-0.19, which is Update 4.  I will test the
> kernel-rt-2.6.22.19-0.19 and report back.

Yes, I completely mis-stated Comment #5.

The expectation is that the issue is NOT found in any -0.19 (or prior) kernels, while all kernels of the -0.21 variety should exhibit high CPU utilization from the DPC thread.
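
The expectation above can be written down as a small check on the kernel release string. This is only a sketch: it assumes the build revision is the "0.NN" part that follows the base version, and the helper names are hypothetical. Test kernels without such a revision (for example 2.6.22.19-DPC_2-rt) are reported as unknown (None) and still have to be tested by hand.

```python
import re

# Base version such as 2.6.22.19, followed by "-<major>.<minor>" build revision.
REVISION_RE = re.compile(r"\d+(?:\.\d+)+-(\d+)\.(\d+)")


def build_revision(release):
    """Return the build revision as a tuple, e.g. (0, 21), or None if absent."""
    match = REVISION_RE.search(release)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def expected_affected(release, first_bad=(0, 21)):
    """True if the release is expected to show the issue, None if unknown."""
    revision = build_revision(release)
    if revision is None:
        return None
    return revision >= first_bad


for name in ("kernel-rt_bigsmp-2.6.22.19-0.19",
             "2.6.22.19-0.21-rt_bigsmp",
             "2.6.22.19-DPC_2-rt"):
    print(name, expected_affected(name))
```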

More importantly, does the PTF-2 Kernel I provided today exhibit the problem?
Comment 8 Alan Walker 2009-06-22 15:57:36 UTC
***2.6.22.19-DPC_2-rt test kernel exhibits DPC Thread 100% of CPU issue***

Attachment to follow with requested /proc hardware info.

Installed:  2.6.22.19-DPC_2-rt
Fresh compile of CAN kernel module from:
Driver for ESD CAN cards              Version > 3.0.0               05.02.2008

Loaded with 
insmod /lib/modules/2.6.22.19-DPC_2-rt/esdcan/esdcan-pci405

lsmod|grep -i esd
esdcan_pci405        1499404  2


Executed: can_test

Cut from top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5281 root      20   0     0    0    0 R   99  0.0   4:23.23 dpc thread
 5322 adacs     20   0  106m 1856 1008 S   99  0.1   4:23.28 can_test


lspci -v for card:

07:06.0 CANBUS: IBM 405GP PLB to PCI Bridge (rev 01)
        Subsystem: ESD Electronic System Design GmbH Unknown device 0407
        Flags: bus master, 66MHz, medium devsel, latency 71, IRQ 27
        Memory at d1000000 (32-bit, prefetchable) [size=16M]
        Memory at d0000000 (32-bit, prefetchable) [size=16M]
        Capabilities: [58] Power Management version 2
Comment 9 Alan Walker 2009-06-22 15:58:56 UTC
Created attachment 299553
As requested by ESD, lspci, interrupts and cpuinfo data
Comment 10 Sven Dietrich 2009-06-22 21:12:54 UTC
Created attachment 299620
The adaptive-locking timeout patch which masks the DPC thread issue.

This patch was initially included with the adaptive timeout patch set.
When the SLERT dev team pushed the technology upstream to the RT developer community, the adaptive timeout was dropped, on the reasoning that infinite wait times were technically not possible.
This patch re-introduces the adaptive locking timeout as a workaround, pending resolution of the CAN bus driver issue.
Generation of a corresponding PTF Kernel and source tree is in progress.
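
For readers unfamiliar with the idea, here is a user-space illustration of what a spin timeout on an adaptive lock means in general. It is not the attached patch and does not reproduce kernel code: the class name, the return values and the timeout values are all hypothetical. A waiter polls the lock for a bounded time and then falls back to a blocking (sleeping) wait, so it cannot burn a CPU indefinitely if the holder never releases.

```python
import threading
import time


class AdaptiveLock:
    """Poll for the lock for a bounded time, then fall back to blocking."""

    def __init__(self, spin_timeout_s=0.001):
        self._lock = threading.Lock()
        self.spin_timeout_s = spin_timeout_s

    def acquire(self):
        """Acquire the lock; report whether it was taken by spinning or blocking."""
        deadline = time.monotonic() + self.spin_timeout_s
        # Spin phase: busy-poll, but only until the deadline passes.
        while time.monotonic() < deadline:
            if self._lock.acquire(blocking=False):
                return "spin"
        # Timeout reached: stop consuming CPU and sleep until the lock is free.
        self._lock.acquire()
        return "block"

    def release(self):
        self._lock.release()


lock = AdaptiveLock(spin_timeout_s=0.05)
print(lock.acquire())  # uncontended: taken during the spin phase
lock.release()
```

Without the deadline, the spin loop would run for as long as the lock is held, which is the kind of behaviour that shows up as one thread pinned at 100% CPU.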
Comment 11 Sven Dietrich 2009-06-22 21:17:29 UTC
Created attachment 299621
Break-out just the adaptive patch, eliminating peripheral clutter

This shows just the changes introduced by the adaptive timeout patch, eliminating changes made to resolve conflicts in other patches.
Comment 12 Alan Walker 2009-06-26 15:28:45 UTC
I have run preliminary testing on the 3.8.12 driver from ESD with Novell's esd_complete.patch file applied to the 3.8.12 source.  The cantest application reports that the interface is working correctly, and we are now receiving data in our data acquisition system.  Functionally, everything seems to be fine, but we are going to run the system over the weekend to give it a very basic stability test.  I am currently waiting on ESD to provide a more robust test methodology based on ESD's cantest application to validate the patched 3.8.12 driver.
Comment 13 Alan Walker 2009-06-29 15:09:18 UTC
We have found the patched version of the ESD 3.8.12 driver to be unstable.  When we ran our acquisition system with 4 user monitoring applications, the system would crash and execute the kdump core capture process.  On Friday I configured the system to use only one instance of our user application, and the crash occurred 10 hours later, captured automatically by kdump.  I have uploaded the core file and test notes in file 514437-cantest_core-2009-06-27-03-20.tgz to ftp.novell.com/incoming.

I am currently collecting a core file from a test advised by ESD that I executed in runlevel 3.  Using this test I can reproduce the kernel crash in less than 5 minutes.  I will update this bug when I have uploaded this core dump, as it may be easier to trace the issue with fewer applications/services running.
Comment 14 Sven Dietrich 2009-07-13 19:52:26 UTC
From Roy Kinsella:

Hello everyone, 

We have been testing the new driver for 24 hours, on our test system 
running the SLERT kernel 2.6.22.19-0.21-rt_bigsmp,  with continuous real 
i/o traffic on four channels with complete success.  We are also able to 
perform numerous restarts of  our application level interface without 
system lockups.  It looks like this one is golden.

Thank you everyone for your effort and support.

Roy


Roy Kinsella, P.Eng.
Manager - Applications and Software Engineering
Horiba ATS
Comment 15 Sven Dietrich 2009-07-13 19:53:32 UTC
Marking fixed.