Bug 395114

Summary: flock interface change between 10.2 and 10.3
Product: [openSUSE] openSUSE 10.3 Reporter: Ben Harris <bjh21>
Component: Other Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED NORESPONSE QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: jeffm, werner
Version: Final   
Target Milestone: ---   
Hardware: i686   
OS: openSUSE 10.3   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Bug Depends on: 394787    
Bug Blocks:    

Description Ben Harris 2008-05-28 15:36:41 UTC
This is kind of a followup to #394787, in that the underlying startup problem has now manifested on three different openSUSE 10.3 systems, so I don't think it's the fault of the oddities of the original system.

Running the standard syslogd-1.4.1-632 package, when I try to start syslogd by running "/etc/init.d/syslog start", I end up with these processes and no working syslog:

root     29725     1  0 16:16 pts/0    00:00:00 /sbin/syslogd
root     29726 29725  0 16:16 ?        00:00:00 [syslogd] <defunct>

Manually starting syslogd under strace reveals the following behaviour in the child:

[pid 30475] open("/var/run/syslogd.pid", O_RDONLY) = 0
[pid 30475] fstat64(0, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 30475] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eef000
[pid 30475] read(0, "", 4096)           = 0
[pid 30475] close(0)                    = 0
[pid 30475] munmap(0xb7eef000, 4096)    = 0
[pid 30475] open("/var/run/syslogd.pid", O_RDWR|O_CREAT, 0644) = 0
[pid 30475] fcntl64(0, F_GETFL)         = 0x2 (flags O_RDWR)
[pid 30475] fstat64(0, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 30475] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eef000
[pid 30475] _llseek(0, 0, [0], SEEK_CUR) = 0
[pid 30475] flock(0, LOCK_EX|LOCK_NB)   = -1 EACCES (Permission denied)
[pid 30475] read(0, "", 4096)           = 0
[pid 30475] close(0)                    = 0
[pid 30475] munmap(0xb7eef000, 4096)    = 0
[pid 30475] fstat64(1, 0xbfef5da8)      = -1 EBADF (Bad file descriptor)
[pid 30475] mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eee000
[pid 30475] write(1, "Can\'t lock, lock is held by pid 0.\n", 35) = -1 EBADF (Bad file descriptor)
[pid 30475] exit_group(1)               = ?

If I delete syslogd.pid first, I instead get:

[pid 31181] open("/var/run/syslogd.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 31181] open("/var/run/syslogd.pid", O_RDWR|O_CREAT, 0644) = 0
[pid 31181] fcntl64(0, F_GETFL)         = 0x2 (flags O_RDWR)
[pid 31181] fstat64(0, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 31181] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7efd000
[pid 31181] _llseek(0, 0, [0], SEEK_CUR) = 0
[pid 31181] flock(0, LOCK_EX|LOCK_NB)   = -1 EACCES (Permission denied)
[pid 31181] read(0, "", 4096)           = 0
[pid 31181] close(0)                    = 0
[pid 31181] munmap(0xb7efd000, 4096)    = 0
[pid 31181] fstat64(1, 0xbf9a2048)      = -1 EBADF (Bad file descriptor)
[pid 31181] mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7efc000
[pid 31181] write(1, "Can\'t lock, lock is held by pid 0.\n", 35) = -1 EBADF (Bad file descriptor)
[pid 31181] exit_group(1)               = ?

The point where it all goes wrong is the failure of the flock() call.  The man page for flock() doesn't document EACCES as an error code for it, and since syslogd is running as root and has the file open read/write, I can't see any obvious cause.  The problem occurs on systems with both ReiserFS and EXT3 root filesystems, and even occurs if I replace the syslogd package with the one from openSUSE 10.2.  syslogd worked correctly in openSUSE 10.2.
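
To separate the locking failure from the rest of syslogd, the open/flock sequence seen in the trace can be replayed in isolation. The following is a minimal sketch, not code taken from syslogd: the helper name try_flock is hypothetical, and it only mirrors the open flags (O_RDWR|O_CREAT, mode 0644) and lock flags (LOCK_EX|LOCK_NB) that the strace output shows.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not from syslogd): open `path` with the flags seen
 * in the trace and try a non-blocking exclusive flock() on it.
 * Returns 0 if the lock was taken, otherwise the errno value of the
 * failing call. The descriptor is closed before returning, which also
 * drops the lock, so this only probes whether locking works at all. */
int try_flock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        int err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(err));
        return err;
    }

    int err = 0;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        err = errno;
        fprintf(stderr, "flock(%s): %s (errno %d)\n",
                path, strerror(err), err);
    }

    close(fd);
    return err;
}
```

Calling try_flock("/var/run/syslogd.pid") as root from a small main() would show whether the failure reproduces outside syslogd. A result of EWOULDBLOCK would mean another process holds the lock, which is the documented contention case; EACCES, as in the trace, is not a documented flock() error and would suggest the refusal comes from somewhere other than ordinary lock contention.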
Comment 1 Dr. Werner Fink 2008-05-29 15:51:54 UTC
There is *no* difference between the code for opening, locking, and writing
a pid file between syslog from 10.2 and 10.3 ... in other words there must
be a difference between provided glibc macros and/or glibc function flock(2).

Petr?  What is going on there?
Comment 2 Dr. Werner Fink 2008-05-30 11:01:20 UTC
I've switched from flock() to fcntl(F_SETLK) to avoid the problem with
the changed flock() interface.  Nevertheless the question remains why
there is a difference between flock() from 10.2 and 10.3 *and* why the
syslogd binary from 10.2 seems to work correctly even on 10.3.
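
For readers unfamiliar with the alternative mentioned here, a whole-file lock taken with fcntl(F_SETLK) looks roughly like the sketch below. This is an illustration only and not the actual change made to syslogd; the helper name lock_whole_file is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the actual syslogd patch: take a non-blocking
 * exclusive (write) lock on the whole file with fcntl(F_SETLK).
 * `fd` must be open for writing. Returns 0 on success or the errno
 * value on failure. */
int lock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;  /* l_start is relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    if (fcntl(fd, F_SETLK, &fl) < 0)
        return errno;
    return 0;
}
```

With fcntl(F_SETLK), EACCES and EAGAIN are both documented results when another process holds a conflicting lock, so callers normally treat the two alike. Note also that fcntl record locks belong to the process and are not inherited across fork(), which matters for a daemon that takes the lock before forking.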
Comment 3 Petr Baudis 2008-12-04 12:29:05 UTC
In glibc, flock() is just a thin syscall wrapper.
Comment 4 Jeff Mahoney 2008-12-05 19:20:49 UTC
-EACCES could be returned by security_file_lock. Are there any messages in your dmesg?
Comment 5 Jeff Mahoney 2009-05-15 18:19:30 UTC
Closing due to lack of response.