Bug 186061

Summary: can not start crypto helper: failed to find any available worker
Product: [openSUSE] SUSE Linux 10.1 Reporter: Andreas Schwab <schwab>
Component: Network Assignee: Marius Tomaschewski <mt>
Status: RESOLVED WONTFIX QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: lmuelle, radmanic, suse-beta
Version: Final   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Other Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: grep -F 'pluto[17401]' /var/log/messages

Description Andreas Schwab 2006-06-17 15:16:44 UTC
Every now and then (about once per hour, probably depending on the traffic) the IPsec SA expires because "can not start crypto helper: failed to find any available worker".  That only happens when using NAT traversal.
Comment 1 Andreas Schwab 2006-06-17 15:18:33 UTC
Created attachment 90031 [details]
grep -F 'pluto[17401]' /var/log/messages
Comment 2 Marius Tomaschewski 2006-06-20 10:19:13 UTC
If you have more than two CPUs (reported by sysconf(_SC_NPROCESSORS_ONLN))
pluto starts ncpu_online-1 helpers. Otherwise it starts only one helper.
BTW: You can also override this value using the "nhelpers" parameter in
the "config setup" section of the /etc/ipsec.conf.
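
The rule described above can be sketched as follows. This is only an illustration of the comment, not pluto's actual source; the function names are hypothetical, and os.cpu_count() stands in for sysconf(_SC_NPROCESSORS_ONLN).

```python
import os


def default_helper_count(ncpu_online):
    """Helper count as described above: ncpu_online-1 for more than
    two CPUs, otherwise a single helper. (Hypothetical name.)"""
    if ncpu_online > 2:
        return ncpu_online - 1
    return 1


def effective_helper_count(nhelpers=None, ncpu_online=None):
    """An explicit "nhelpers" setting overrides the CPU-based default."""
    if nhelpers is not None:
        return nhelpers
    if ncpu_online is None:
        # Assumption: os.cpu_count() is a stand-in for the sysconf() call.
        ncpu_online = os.cpu_count() or 1
    return default_helper_count(ncpu_online)
```

With one or two CPUs this yields a single helper, which is the case discussed in this report.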

What happens is this: if you have multiple tunnels (to one destination), the
IPsec SAs are reinitialized asynchronously, but the requests are serialized
because no worker is available at that moment; that is, all workers
(usually only one) are busy with work for another tunnel.
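
A toy model may make the failure mode easier to see. It is a sketch under stated assumptions, not pluto's implementation: the class and method names are hypothetical, and the "inline" behaviour for nhelpers=0 assumes that disabling helpers means the main process does the work itself.

```python
class HelperPool:
    """Toy model of a fixed pool of crypto helpers (hypothetical)."""

    def __init__(self, nhelpers):
        self.nhelpers = nhelpers
        self.busy = 0

    def submit(self, tunnel):
        # Assumption: with no helpers configured, work is done inline.
        if self.nhelpers == 0:
            return f"{tunnel}: computed inline"
        # Every helper is busy with another tunnel: the request fails.
        if self.busy >= self.nhelpers:
            return f"{tunnel}: failed to find any available worker"
        self.busy += 1
        return f"{tunnel}: handed to helper"

    def finish(self):
        # A helper completes its job and becomes available again.
        if self.busy > 0:
            self.busy -= 1
```

With HelperPool(1), a second submit() before finish() returns the "failed to find any available worker" message; with HelperPool(0) every request is handled inline and the message never appears.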

If this happens, the tunnel SA may expire, but it is marked for a reinit
on demand. As soon as there are packets for this tunnel, pluto reinits
it again.

You can see this in log lines like these:

Jun 17 17:10:09 whitebox pluto[17401]:
initiate on demand from 10.204.0.116:0 to 149.44.160.50:0 proto=0 state:
 fos_start because: acquire
"schwab-novell1" #26: initiating Quick Mode RSASIG+ENCRYPT+TUNNEL+PFS+UP
 {using isakmp#21}
"schwab-novell1" #26: transition from state STATE_QUICK_I1 to state
 STATE_QUICK_I2
"schwab-novell1" #26: STATE_QUICK_I2: sent QI2, IPsec SA established 
 ...

This is how pluto is working now... I can't change this default.

You can also set "force_keepalive=yes" (and optionally "keep_alive=20")
in the "config setup" section of /etc/ipsec.conf.
Comment 3 Andreas Schwab 2006-06-20 15:54:22 UTC
Then why does it _never_ happen without NAT traversal?
Comment 4 Andreas Schwab 2006-06-24 08:59:31 UTC
Neither keep_alive nor force_keepalive is documented in ipsec.conf(5).
Comment 5 Marius Tomaschewski 2006-06-26 13:07:20 UTC
Yes, I know that they're documented.

  nhelpers=< number of helpers >= 0, 0 disables use of helpers >
  keep_alive=< in seconds, e.g. 20 >
  force_keepalive=< yes | no >

I've tested the current 2.4.5; there is no difference in behaviour.
The "initiate on demand" is still used (and I think it'll remain).

Today I started to test the 2.4.6rc1 version... I'll submit it to
our BETA dist tree later.
Comment 6 Marius Tomaschewski 2006-06-26 13:12:10 UTC
(In reply to comment #5)
> Yes, I know that they're documented.
                         not
Comment 7 Andreas Schwab 2006-07-01 09:55:50 UTC
force_keepalive does not change anything.
Comment 8 Marius Tomaschewski 2006-07-10 13:12:51 UTC
BTW:
I reported this issue a long time ago (08-26-05) in the openswan bug
tracking system: http://bugs.xelerance.com/view.php?id=412
It is still open and assigned to mcr at xelerance.
Comment 9 Marius Tomaschewski 2006-08-30 14:16:15 UTC
I've updated to openswan-2.4.6 (in BETA at the moment) and built
RPMs for 10.0 and 10.1 (and stable) at:

  http://www.suse.de/~mt/openswan/RPMs/

The ipsec.conf now contains "nhelpers=0" by default, which should
avoid this problem.
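
For reference, the workaround would look roughly like this in /etc/ipsec.conf. This is a minimal fragment, assuming the usual layout of a "config setup" section with indented parameters; any other settings in that section are omitted here.

```
config setup
	# 0 disables the use of crypto helpers
	nhelpers=0
```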

Please try it out and see whether it works for you. Thanks!
Comment 10 Marius Tomaschewski 2006-09-25 15:07:09 UTC
Fixed by the "nhelpers=0" option, which is used by default on STABLE (10.2).
Comment 11 Egbert Eich 2007-01-22 21:46:29 UTC
Marius thinks that my patch in Bug #234042 may fix this problem so the nhelpers = 0 workaround isn't needed any more.
Andreas, could you give it a try?
If not, we should close the ticket again; otherwise we should mark it as a duplicate.
Comment 12 Andreas Schwab 2007-01-23 19:37:40 UTC
It doesn't help.
Comment 13 Milisav Radmanic 2007-03-14 15:38:56 UTC
Is this still an open bug?

Did the proposed workaround (adding "nhelpers=0") do the trick?

At least the fix for Bug #234042, suggested in comment #11, does not seem to affect this bug?

Please comment.
Comment 14 Marius Tomaschewski 2007-03-16 10:11:34 UTC
No, as Andreas already wrote, the fix from bug 234042 does not help
against this problem. The "nhelpers=0" workaround is still needed
and is currently the only "official fix" as provided upstream.