Bug 436310

Summary: nscd crashes when an LDAP client is used for authentication
Product: [openSUSE] openSUSE 11.0
Reporter: Pedro Oliveira <pmsoliveira>
Component: Other
Assignee: Petr Baudis <pbaudis>
Status: RESOLVED NORESPONSE
QA Contact: E-mail List <qa-bugs>
Severity: Critical
Priority: P2 - High
CC: harbaugh
Version: Final
Target Milestone: ---
Hardware: i586
OS: openSUSE 11.0
Whiteboard:
Found By: Integration Test
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: nsswitch.conf, nscd.conf

Description Pedro Oliveira 2008-10-16 22:39:27 UTC
Created attachment 246131 [details]
nsswitch.conf

Hi, I'm seeing the following behaviour on my openSUSE 11.0 system:

After boot, the nscd service stops at random (sometimes after 1 minute, sometimes after 10). I am authenticating against an OpenLDAP server.

My architecture:
a cluster running IPVS, with 100 servers
 
This shouldn't be a major problem, but I have about 50-70 users with remote access on each machine. Without nscd the system gets really slow (because every passwd lookup is verified remotely), and the worst part is that the OpenLDAP cluster gets knocked out by the volume of requests.

I've tried several Novell distros and they all show the same symptom (openSUSE 10.0-10.3, SLED 10, SLED 9, SLES 10, SLES 10 x86-64).

As a workaround I have a tiny script that checks for the nscd PID every second and restarts nscd if it is not running.
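
For anyone who wants to reproduce the workaround, here is a minimal sketch of such a watchdog. It is not the reporter's actual script (that one was not posted). The function names, the use of `pgrep -x`, and the `/etc/init.d/nscd` init-script path are assumptions and may need adjusting for your system.

```python
import subprocess
import time


def nscd_is_running():
    """Return True if a process named exactly 'nscd' exists (pgrep exits 0)."""
    result = subprocess.run(["pgrep", "-x", "nscd"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_nscd():
    """Restart nscd through its init script (path is an assumption)."""
    subprocess.run(["/etc/init.d/nscd", "restart"], check=False)


def watchdog_step(is_running=nscd_is_running, restart=restart_nscd):
    """Do one check; restart only if nscd is gone. Return True if restarted."""
    if is_running():
        return False
    restart()
    return True


def watch(interval=1.0):
    """Check once per `interval` seconds, forever. Run as root: watch()"""
    while True:
        watchdog_step()
        time.sleep(interval)
```

This only hides the crash. Every restart starts with an empty cache, so the LDAP servers still see a burst of lookups each time nscd dies.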

Attached are:
/etc/nsswitch.conf
/etc/nscd.conf
Comment 1 Pedro Oliveira 2008-10-16 22:40:33 UTC
Created attachment 246132 [details]
nscd.conf
Comment 2 Petr Baudis 2008-11-19 19:28:32 UTC
Huh, the 10.3 nscd used to be very stable. It would be interesting if you checked that one again, pasted the last few messages from nscd -d, and included the core (call ulimit -c unlimited, then run nscd -d from the same shell).

For 11.0, can you do the same, please?
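
If it helps, the capture steps above can be wrapped in a small helper. This is only a sketch: `raise_core_limit` and `run_and_tail` are made-up names, and it assumes a Linux system with Python 3.7 or later. It raises the soft core limit as far as the hard limit allows (which matches `ulimit -c unlimited` only when the hard limit is unlimited), runs the command in the foreground, and keeps the last lines of output.

```python
import collections
import resource
import subprocess


def raise_core_limit():
    """Raise the soft core-size limit to the hard limit; return the new limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard


def run_and_tail(cmd, keep=50):
    """Run cmd until it exits; return (exit status, last `keep` output lines)."""
    raise_core_limit()  # child processes inherit the raised limit
    tail = collections.deque(maxlen=keep)
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        tail.append(line.rstrip("\n"))
    return proc.wait(), list(tail)


# Intended use (as root): status, tail = run_and_tail(["nscd", "-d"])
```

A negative status means the process was killed by a signal. On Linux, where a core file ends up (if anywhere) is controlled by /proc/sys/kernel/core_pattern.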
Comment 3 Toni Harbaugh-Blackford 2009-01-17 14:43:19 UTC
I am seeing this bug also.  The problem of nscd dying unexpectedly is causing
other problems - see bug 467161 (openSUSE Factory as of 01/16/09, also
SLES 11 RC1)

I will try to get the 'nscd -d' output.
Comment 4 Toni Harbaugh-Blackford 2009-01-19 01:56:21 UTC
I ran nscd interactively with -d.  It crashed at an assertion:

.
.
.
30010: considering GETPWBYUID entry "4001", timeout 1232293240
30010: considering GETPWBYUID entry "74", timeout 1232293240
30010: considering GETPWBYUID entry "106", timeout 1232293240
30010: considering GETPWBYNAME entry "postfix", timeout 1232293240
30010: considering GETPWBYNAME entry "haldaemon", timeout 1232293240
30010: considering GETPWBYNAME entry "messagebus", timeout 1232293240
30010: considering GETPWBYNAME entry "ntp", timeout 1232293240
30010: considering GETPWBYUID entry "76", timeout 1232293240
30010: considering GETPWBYNAME entry "topol", timeout 1232293240
30010: considering GETPWBYUID entry "66", timeout 1232293240
30010: considering GETPWBYUID entry "3912", timeout 1232293240
30010: considering GETPWBYUID entry "100", timeout 1232293240
30010: considering GETPWBYNAME entry "volfovsn", timeout 1232293240
30010: remove GETPWBYUID entry "101"
30010: remove GETPWBYNAME entry "myi"
30010: remove GETPWBYNAME entry "ldap"
30010: remove GETPWBYNAME entry "sshd"
30010: remove GETPWBYUID entry "2287"
30010: remove GETPWBYNAME entry "avahi"
30010: remove GETPWBYNAME entry "harbaugh"
30010: remove GETPWBYUID entry "71"
30010: remove GETPWBYUID entry "51"
30010: remove GETPWBYNAME entry "nobody"
30010: remove GETPWBYNAME entry "wnn"
30010: remove GETPWBYUID entry "1596"
30010: remove GETPWBYUID entry "65534"
30010: remove GETPWBYUID entry "4001"
30010: remove GETPWBYNAME entry "postfix"
30010: remove GETPWBYUID entry "106"
30010: remove GETPWBYUID entry "74"
30010: remove GETPWBYNAME entry "haldaemon"
30010: remove GETPWBYNAME entry "messagebus"
30010: remove GETPWBYNAME entry "ntp"
30010: remove GETPWBYUID entry "76"
30010: remove GETPWBYNAME entry "topol"
30010: remove GETPWBYUID entry "66"
30010: remove GETPWBYUID entry "3912"
30010: remove GETPWBYUID entry "100"
30010: remove GETPWBYNAME entry "volfovsn"
nscd: mem.c:412: gc: Assertion `next_data < &he_data[db->head->nentries]' failed.
Aborted


I set 'ulimit -c unlimited', but no core file was produced.
Comment 5 Toni Harbaugh-Blackford 2009-01-19 02:04:27 UTC
This appears to be the same problem as bug 387202


nscd: mem.c:412: gc: Assertion `next_data < &he_data[db->head->nentries]'
failed.
Aborted
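
For comparing crash reports like the two above, a rough sketch that pulls the assertion details out of `nscd -d` output may be handy. `find_assertion` is a made-up helper. The pattern is based only on the message format shown in this report, and it tolerates the line wrap before "failed.".

```python
import re

# Matches: prog: file:line: func: Assertion `expr' failed.
ASSERT_RE = re.compile(
    r"^(?P<prog>[^:\n]+): (?P<file>[^:\n]+):(?P<line>\d+): "
    r"(?P<func>[^:\n]+): Assertion `(?P<expr>.*)'\s+failed\.",
    re.MULTILINE,
)


def find_assertion(text):
    """Return a dict describing the first failed assertion in text, or None."""
    match = ASSERT_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])  # source line number as an integer
    return info
```

Two logs that yield the same file, function and expression are probably hitting the same assertion. The line number can differ between glibc builds, so it is a weaker signal.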
Comment 6 Petr Baudis 2009-01-21 00:29:58 UTC
This has nothing to do with LDAP; it is indeed the same problem as bug 387202. No details about the LDAP interaction are available, so I'm closing this.