|
Bugzilla – Full Text Bug Listing |
| Summary: | D-Bus rc-script exits before dbus-daemon is ready/up | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 10.3 | Reporter: | Daniel Rose <tremonium2100> |
| Component: | Mobile Devices | Assignee: | Holger Macht <hmacht> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | behlert, benjamin.j.mccann, ffiene, frasirchan, gleixner, hmuelle, martin.wilck, pkreussel, rastislav.krupansky, ro |
| Version: | Final | ||
| Target Milestone: | --- | ||
| Hardware: | i686 | ||
| OS: | openSUSE 10.3 | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
output of lshal when Supend greyed out in Kpowersave
modified dbus rc script |
||
|
Description
Daniel Rose
2007-10-05 02:34:52 UTC
Hmm, works for me. Danny? I have the same problem: At first, suspend to disk an ram worked out of the box on my Thinkpad R60e with 10.3 final. Now, both are disabled in Kpowersave, buttons produce the error message stated above, after rebooting *sometimes* the menuentries are enabled. s2ram and s2disk work. Anything to do with "Unknown username gdm" in my boot.msg? <notice>startproc: execve (/usr/sbin/console-kit-daemon) [ /usr/sbin/console-kit-daemon ], [ CONSOLE=/dev/console ROOTFS_FSTYPE=ext3 SHELL=/bin/sh TERM=linux ROOTFS_FSCK=0 LC_ALL=POSIX INIT_VERSION=sysvinit-2.86 REDIRECT=/dev/tty1 COLUMNS=124 PATH=/bin:/sbin:/usr/bin:/usr/sbin vga=0x317 RUNLEVEL=5 PWD=/ SPLASHCFG=/etc/bootsplash/themes/SuSE/config/bootsplash-1024x768.cfg PREVLEVEL=N LINES=44 HOME=/ SHLVL=2 splash=silent SPLASH=yes ROOTFS_BLKDEV=/dev/disk/by-id/scsi-SATA_FUJITSU_MHV2060_NW06T6626Y0P-part5 _=/sbin/startproc DAEMON=/usr/sbin/console-kit-daemon ] Unknown username "gdm" in message bus configuration file (In reply to comment #1 from Kay Sievers) > Hmm, works for me. Danny? Never seen something like that, works for me too. Can only assume here some problems in PolicyKit or ConsoleKit atm. (In reply to comment #0 from Daniel Rose) > Suspending as root still worked. I "solved" the bug by adding a config-file to > /etc/pm/config.d with the line S2RAM_OPTS="-f -p -m". I later removed this file > and suspend still worked. Then it stopped working again and after rebooting > worked again without me changing anything. Don't change something in the pm config, this has absolutely nothing to do with your problem! > While suspend not working, "suspend -u" declared that PolicyKit denied suspend > (I do not have the correct output, because it works by now). We can't do anything without the correct error message. You should also check the state of hal (work lshal) and consolekit (rcconsolekit status). Created attachment 178861 [details]
output of lshal when Supend greyed out in Kpowersave
Since I needed to repartition my hard disk, I resinstalled OpenSuse 10.3 - everything the same now. After first reboot, Suspend to Disk and Suspend to RAM are greyed out in Kpowersave. Error Messages when pressing the Shortcut-Keys is "Supend to Disk/RAM durch Administrator deaktiviert". "suspend" is an internal shell command. What error messages do you need? I attached the output of lshal. Status of rcconsolekit was "unused". So I started the Service, restarted rcpowersaved and kpowersave. Didn't help, though. You need to log out and in again after starting consolekit for the changes to take effect. BTW.: Did you do an update? (In reply to comment #6 from Holger Macht) > You need to log out and in again after starting consolekit for the changes to > take effect. > > BTW.: Did you do an update? @Holger: Due to his last comment I would assume he did a clean install. This all looks to me as if we have a general problem with get ConsoleKit running on boot. The strange is: I have seen this never on one of my test machines with GM. I saw this only as we introduced the new HAL/policyKit/ConsoleKit the first time while beta. We should think about put all the ConsoleKit bugs together to two bugs: one for the update from 10.2 to 10.3 and one for clean 10.3 install. (In reply to comment #6 from Holger Macht) > You need to log out and in again after starting consolekit for the changes to > take effect. I see. So that happens when and only when consolekit is not running. > > BTW.: Did you do an update? > No, I did a clean install into an empty partition. I did a clean install as well. The only thing I kept was my home-directory One of you, please follow the suggestions in bug 333735, comment #12 and tell us whether it makes the problem vanish. Thanks. I just changed the boot-script and rebooted 5 times. Works for me now. I had the same problem. Making matters worse, both haldaemon and consolekit were not running (rchaldaemon status -> "unused"). Apparently this is due to dbus being started asynchronously by /etc/init.d/dbus, so that the rc script may finish before is dbus actually available, causing subsequent dependent services to fail (fail SILENTLY though, there were no error messages from the rc scripts). I put a "sleep 2" statement in /etc/init.d/haldaemon, and it helped. The hint in bug 333735, comment #12 should also cause dbus invocation to be serialized if I understand it correctly, but the reporter there said it didn't help. In my case, the delays in dbus startup were apparently caused by a problem in the ConsoleKit package. /etc/dbus-1/system.d/ConsoleKit.conf refers to a user gdm who didn't exist on my system (KDE installaiton, no gdm package installed). This caused an error message "" to be displayed by dbus, and may have caused a delay of some sort, too. Here is a boot.msg excerpt: Starting D-Bus daemondone <notice>startproc: execve (/usr/sbin/console-kit-daemon) [ /usr/sbin/console-kit-daemon ], [ CON SOLE=/dev/console ROOTFS_FSTYPE=ext3 SHELL=/bin/sh TERM=linux ROOTFS_FSCK=0 LC_ALL=POSIX INIT_VE RSION=sysvinit-2.86 REDIRECT=/dev/tty1 COLUMNS=124 PATH=/bin:/sbin:/usr/bin:/usr/sbin vga=0x317 RUNLEVEL=2 PWD=/ SPLASHCFG=/etc/bootsplash/themes/SuSE/config/bootsplash-1024x768.cfg PREVLEVEL= N LINES=44 HOME=/ SHLVL=2 splash=silent SPLASH=yes ROOTFS_BLKDEV=/dev/dm-0 _=/sbin/startproc DAE MON=/usr/sbin/console-kit-daemon ] Starting ConsoleKitdone acpid: 2 rules loaded Starting HAL daemonChecking/updating CPU microcode done <notice>startproc: execve (/usr/sbin/hald) [ /usr/sbin/hald --daemon=yes ], [ CONSOLE=/dev/conso le ROOTFS_FSTYPE=ext3 SHELL=/bin/sh TERM=linux ROOTFS_FSCK=0 LC_ALL=POSIX INIT_VERSION=sysvinit- 2.86 REDIRECT=/dev/tty1 COLUMNS=124 PATH=/bin:/sbin:/usr/bin:/usr/sbin vga=0x317 RUNLEVEL=2 PWD= / SPLASHCFG=/etc/bootsplash/themes/SuSE/config/bootsplash-1024x768.cfg PREVLEVEL=N LINES=44 HOME =/ SHLVL=2 splash=silent SPLASH=yes ROOTFS_BLKDEV=/dev/dm-0 _=/sbin/startproc DAEMON=/usr/sbin/h ald ] done Loading CPUFreq modulesUnknown username "gdm" in message bus configuration file done It can be seen that the dbus rc script exits, and much later (after the hal daemon was started), the dbus error msg about the "gdm" user is diplayed. At least, since I installed gdm, the error has disappeared for me. Nevertheless I reckon that it could still unless the dbus startup script is fixed. It needs to wait until dbus is fully initialized before it exits. The "fix" to install gdm (or just create a "gdm" user) isn't solid. So, 2 problems here: 1 dbus rc script must not exit before dbus is really up 2 ConsoleKit shouldn't refer to the "gdm" user (or depend on the gdm package, which would be annoying for KDE users) Would you mind testing the proposal from bug 333735, comment #12, just to make sure...and maybe remove the gdm package again (the warning about the missing gdm user this is a WONTFIX for 10.3, but could be looked at again if it is critical). With the proposal from bug 333735, comment #12, it was ok for me in ~20 subsequent boots. It's not a clean solution though. Solution proposal: a) remove <fork/> directive in /etc/dbus-1/system.conf b) use "startproc -s" to start dbus in /etc/init.d/dbus, like this: startproc -s -f -p $DBUS_DAEMON_PID $DBUS_DAEMON_BIN This has the same affect as <fork/>, but now startproc is aware of the fact dbus detaches and waits until dbus is initialized (verified that by means of my "gdm" error message). The current approach (with <fork/>) is broken, because there is no means to make sure that dbus is up when other services depending on dbus start. The problem would be more obvious if the haldaemon and consolekit rc scripts failed obviously when dbus isn't functional. Currently the rc scripts pretend to have succeeded, but later the service is marked "unused". Ok, thanks for the proposal. There are a few other bugs where I think they are related to this issue. Unfortunately, Timo is on vacation right now. When the other bugs have been confirmed, I'll care about... How would startproc wait and know until D-Bus is fully initialized? Having D-Bus fork itself sounds right, the D-Bus init script will not return until the daemon is running properly, and no dependency will start earlier. (In reply to comment #20 from Kay Sievers) > How would startproc wait and know until D-Bus is fully initialized? > > Having D-Bus fork itself sounds right, the D-Bus init script will not return > until the daemon is running properly, and no dependency will start earlier. If D-Bus forks, the init script returns _before_ D-Bus is up and running. I think starproc behaviour without any options is to start all processes asyncronous. That would be a silly bug in D-Bus, to fork before complete init. Did you see this, or are you guessing? With looking at the code, at least it calls bus_context_new(...) which in turn does the fork before calling _dbus_loop_run (...). And it even calls process_config_postinit(...) (whatever this does) and friends before becoming a daemon AFAICS. But the actual problem is that startproc returns no matter if D-Bus forks at the beginning or at the end of initialization if I see things right. (In reply to comment #22 from Kay Sievers) > That would be a silly bug in D-Bus, to fork before complete init. Did you see > this, or are you guessing? We saw it. That's what this bug is all about. It's easy to verify with the "invalid user gdm" problem, at least on my system. With the current OpenSUSE packages, the warning comes _after_ the rc script is finished: # rcdbus start starting DBUS services ... Done [ warning message about gdm user ] Using my proposal from comment #18: # rcdbus start starting DBUS services... [ warning message about gdm user ] Done Note that this is an SMP (Core-2) system. wrt comment #20: > How would startproc wait and know until D-Bus is fully initialized? Of course startproc does not know if dbus is 100% initialized yet. But there is a difference because now startproc itself does the daemonizing. Do be 100% sure, a test for working dbus services must be added after the startproc in /etc/init.d/dbus, with a wait loop if the service isn't there yet. I don't know how to write such a test though. Perhaps a test for exisitence of the dbus socket is sufficient. But that needs to be checked by dbus experts. Again, you can't use startproc, it will just return, without knowing what's going on. (In reply to comment #24 from Martin Wilck) > (In reply to comment #22 from Kay Sievers) > > That would be a silly bug in D-Bus, to fork before complete init. Did you see > > this, or are you guessing? > > We saw it. That's what this bug is all about. As Holger says, it appears to be actually that startproc exits immediately, it daemonizes the *parent* process immediately and doesn't wait for it to exit normally. If it did dbus really would be all setup. As Kay pointed out to me: startproc /bin/sleep 5 - it exits immediately, not after 5 seconds. > It's easy to verify with the "invalid user gdm" problem, at least on my system. > > With the current OpenSUSE packages, the warning comes _after_ the rc script is > finished: > > # rcdbus start > starting DBUS services ... Done > [ warning message about gdm user ] > > Using my proposal from comment #18: > > # rcdbus start > starting DBUS services... > [ warning message about gdm user ] > Done > > Note that this is an SMP (Core-2) system. > > wrt comment #20: > > How would startproc wait and know until D-Bus is fully initialized? > > Of course startproc does not know if dbus is 100% initialized > yet. But there is a difference because now startproc itself does the > daemonizing. > > Do be 100% sure, a test for working dbus services must be added after the > startproc in /etc/init.d/dbus, with a wait loop if the service isn't there yet. > I don't know how to write such a test though. Perhaps a test for exisitence of > the dbus socket is sufficient. But that needs to be checked by dbus experts. If so startproc is worse than useless here and we should just eliminate it like was done for 11.0 as per bug 332845. It seems startproc is more for things that don't do their own daemonization properly. The observed behavior must have another reason then. There is a lot of fork()ing involved when processes daemonize. Perhaps this is some weird scheduler behavior. In any case, for the rc script must make sure that associated service is _up_ before it finishes. Otherwise the whole startpar/dependency tracking stuff is doomed to fail randomly, and we are back at serial system initialization. Properly written services do not return from forking until they are ready to use. If D-Bus has problems here, we should fix the daemon itself, and not try to ensure anything in the init script. I guess, it seemed to work for you because D-Bus without internal fork() is just that bit faster, that it is initialized before the script terminates. But it's still unreliable and unpredictable, and letting D-Bus itself block the init script execution is the right solution. If external tools (startproc, init replacements) should do the forking/supervising/babysitting of a system service, the services itself must provide a signaling method when its initialization has finished, otherwise the forking process will never be able to guarantee the requirements for the dependent services. Startproc (in its current async behavior) can not be used for any service that has dependencies. Even for a complete serialized startup, it may cause problems when services are initializing in the background, while later services depend on it. Created attachment 180667 [details]
modified dbus rc script
Here is my latest attempt to create a simple workaround. I admit that fixing dbus would be better, but that's your job :-)
I found that my messages file was full of the following messages:
Oct 24 02:37:28 psyche console-kit-daemon[2583]: WARNING: Couldn't connect to system bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
That inspired the attached workaround.
Here is an strace of a self-forking D-Bus. It
- starts
- creates the socket
- listens to it
- sets the socket world readable
- forks the daemon
- writes the pid file
- exits
09:47:15 execve("/bin/dbus-daemon", ["dbus-daemon", "--system"], [/* 58 vars */]) = 0
09:47:15 stat64("/var/run/dbus/pid", 0xbfd677bc) = -1 ENOENT (No such file or directory)
09:47:15 socket(PF_FILE, SOCK_STREAM, 0) = 3
09:47:15 stat64("/var/run/dbus/system_bus_socket", 0xbfd6760c) = -1 ENOENT (No such file or directory)
09:47:15 bind(3, {sa_family=AF_FILE, path="/var/run/dbus/system_bus_socket"}, 33) = 0
09:47:15 listen(3, 30) = 0
09:47:15 fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
09:47:15 fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
09:47:15 chmod("/var/run/dbus/system_bus_socket", 0777) = 0
09:47:15 fcntl64(3, F_GETFD) = 0
09:47:15 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
09:47:15 chdir("/") = 0
09:47:15 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7dcc708) = 5490
09:47:15 open("/var/run/dbus/pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4
09:47:15 fcntl64(4, F_GETFL) = 0x1 (flags O_WRONLY)
09:47:15 fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
09:47:15 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f37000
09:47:15 _llseek(4, 0, [0], SEEK_CUR) = 0
09:47:15 write(4, "5490\n", 5) = 5
09:47:15 close(4) = 0
09:47:15 exit_group(0) = ?
What are you trying to tell me with this trace? The "Failed to connect to socket: No such file or directory" messages are there, and they indicate pretty clearly that it's possible that console-kit-daemon starts before the socket is created. I don't care if this is fixed in dbus or in the rc scipt, but it must be fixed somehow. The attachment in comment #29 represents a potential fix which is easy to understand and to implement and has almost zero overhead in the non-error case. Please note that this is a nasty problem. Important functionality (NetworkManager, Suspend/Sleep, Hotplug) don't work without consolekit, and the reason is hard to find out even for experienced users (for me, personally, this was the first encounter with "ConsoleKit" and "PolicyKit"). It's the "Linux doesn't work properly on my Hardware" kind of error. One more note: It's weird that the console-kit rc script doesn't print FAILED messages during startup. If I manually stop dbus and try to start console-kit, it prints a FAILED message. I never saw that during startup. Interestingly, the rchal startscript starts OK even without dbus (but you see failure messages in syslog). I am repeating myself, but the only solid solution is to make sure that rcdbus doesn't exit with success before dbus is functional. My attempt in comment #29 may be too naive (the socket may exist but dbus still initializing), but it's a start. (In reply to comment #31 from Martin Wilck) > What are you trying to tell me with this trace? The "Failed to connect to > socket: No such file or directory" messages are there, and they indicate pretty > clearly that it's possible that console-kit-daemon starts before the socket is > created. Its a trace of starting dbus without startproc. It shows that the socket is setup and pid file is written before dbus exits if startproc is *not* used. If startproc is used, you get the error. As per bug 332845, replacing the startproc line with just: $DBUS_DAEMON_BIN $DBUS_DAEMON_PARAMETER should fix it. *** Bug 337394 has been marked as a duplicate of this bug. *** *** Bug 335499 has been marked as a duplicate of this bug. *** *** Bug 338390 has been marked as a duplicate of this bug. *** Update package candidates are here: ftp://ftp.suse.com/pub/people/hmacht/10.3/dbus-1/ Please test! Also make sure that when updating, the dbus daemon does not get restarted. As a result, you should _not_ get something like this in /var/log/messages: powersaved[30146]: ERROR (filter_function:99) DBus daemon disconnected. Trying to reconnect... Adjusting needinfo... I updated the packages from holger and this was my first boot since upgrade from 10.2. where haldaemon and consolekit started. Thanks! *** Bug 338389 has been marked as a duplicate of this bug. *** *** Bug 332057 has been marked as a duplicate of this bug. *** Patchinfo and fixed package submitted to 10.3 tree. *** Bug 333706 has been marked as a duplicate of this bug. *** Released for 10.3 *** Bug 340535 has been marked as a duplicate of this bug. *** *** Bug 338390 has been marked as a duplicate of this bug. *** |