Bugzilla – Full Text Bug Listing

| Summary: | Installing lighttpd patch crashes webyast | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.2 | Reporter: | Josef Reidinger <jreidinger> |
| Component: | Other | Assignee: | Marcus Rückert <mrueckert> |
| Status: | RESOLVED FIXED | QA Contact: | Klaus Kämpf <kkaempf> |
| Severity: | Critical | ||
| Priority: | P1 - Urgent | CC: | jkrupa, jsuchome, kkaempf, maint-coord, maintenance, meissner, mmarek, mrueckert, mvidner, radmanic, ro, schubi, security-team, werner |
| Version: | Final | ||
| Target Milestone: | Final | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | maint:released:sle11:29465 maint:released:sle11:30475 maint:released:11.1:34862 maint:released:11.2:34862 maint:released:11.3:34862 | ||
| Found By: | Development | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | start script for WebYaST client; start script for WebYaST service; rclighttpd.diff; lighty-webyast-init-test; /tmp/rclighttpd.diff, with PID_FILE parsing; lighty-webyast-init-test, complete and prettyfied; lighty-webyast-init-test, extended with today's bugs | ||

Description
Josef Reidinger
2009-12-01 14:39:08 UTC
Oops, looks like we cannot handle updates of the running webyast. This might be tricky anyways. Can you log in again?

jsuchome can also reproduce it. It is related to the lighttpd update.

And what is worse, now I cannot communicate with the appliance even if I manually restart the services. So no page is shown anymore. I cannot log in.

I'm using VirtualBox with NAT and with the next attempt, I cannot connect to port 54984.

I use a bridged network. I tried restarting the network, yastw* and lighttpd services and nothing helps. If I restart the target machine, then it starts working.

Josef, I assign this to you for further investigation.

Martin suggested that SDK patches (like this one) should be already installed.

OK, I see that the patch is successfully installed. If I try a ruby update, it works. So the problem is in the lighttpd update. Darix - as lighttpd maintainer, do you have any idea what is going wrong during the update? Maybe some service restart is missing.

is the lighttpd running?

(In reply to comment #9)
> is the lighttpd running?
Yes, webyast uses lighttpd to run.

any test machine which i can access?

After the update is installed, lighttpd is running with /etc/lighttpd/lighttpd.conf, and the two lighttpds for yastwc and yastws are not running (BUT the init scripts think they are. a bug!)
/var/log/yastws/log/production.log shows
/usr/lib/ruby/gems/1.8/gems/rails-2.3.4/lib/fcgi_handler.rb:160:in `exit',
probably a result of the RPM script restarting the server.

Yes, I can confirm. After the update:
lighttpd 5535 1.8 8.1 61012 41656 ? S 12:00 0:28 /usr/bin/ruby /srv/www/yast/public/dispatch.fcgi
lighttpd 8846 0.0 0.1 5560 896 ? S 12:22 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
Before the update of lighttpd we had:
lighttpd 5532 0.0 0.5 6780 2748 ? S 12:00 0:00 /usr/sbin/lighttpd -f /srv/www/yast/config/lighttpd.conf
lighttpd 5535 0.7 8.1 61012 41656 ? S 12:00 0:12 /usr/bin/ruby /srv/www/yast/public/dispatch.fcgi
yastws 5557 0.0 0.3 6244 1664 ? S 12:00 0:00 /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf

Setting DISABLE_RESTART_ON_UPDATE in /etc/sysconfig/services helps, I've just installed all updates with this and webyast looks healthy. However, I do not know if we can do this, it will affect all services...

(In reply to comment #16)
> Setting DISABLE_RESTART_ON_UPDATE in /etc/sysconfig/services
I mean, setting DISABLE_RESTART_ON_UPDATE to "yes"

So, i would suggest that this flag should be set in the appliance and the user should be informed in the release notes that we have set this.

Well, that will also disable restarts of other services, even those that handle it gracefully, e.g. ssh.

The lighttpd init script kills all instances of lighttpd as it doesn't pass a pidfile to killproc. IIRC killproc would only kill one instance if it had a pid file. So this problem could be solved by fixing the lighttpd init script.

i can not rely on the pid file, as it is set in the config file and i dont really think parsing the config will work reliably.

Darix, do you see another solution for this problem? Otherwise I see only the solution which we have made in comment#18.

... or we can include the current lighttpd patch in the appliance (comment 7) and hope that its next update will also somehow behave correctly on restart.

comment#22 That would be at least usable for appliances.

Sure. BTW, we are setting the pid files in lighttpd.conf:
##
## store a pid file
##
server.pid-file = state_dir + "/yastwc.pid"
##
## store a pid file
##
server.pid-file = state_dir + "/yastws.pid"
I think Darix as the package maintainer of lighttpd is the right person :-)

to behave 100% correctly a lighttpd restart would also need to restart your lighty instances.

(In reply to comment #25)
> to behave 100% correctly an lighttpd restart would also need to restart your
> lighty instances.
True, but we can include our own logic for that. If only the init script was not so eager in killing all instances, but I see that the current config layout makes selective killing quite hard :-(

After checking back with mrueckert (lighttpd maintainer) and mls (rpm maintainer), we have now settled on "%triggerin" to restart yast{wc,ws} if lighttpd gets updated.
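The DISABLE_RESTART_ON_UPDATE switch discussed above can be illustrated with a small sketch of the gate a package scriptlet might apply before restarting anything. This is not the actual macro from the distribution; `restart_allowed` and the `SYSCONFIG_SERVICES` override are hypothetical names introduced only so the logic can be tried against a scratch file.

```shell
#!/bin/sh
# Sketch only: decide whether an update scriptlet may restart services.
# SYSCONFIG_SERVICES is a hypothetical override for experiments; the file
# discussed in this report is /etc/sysconfig/services.
SYSCONFIG_SERVICES="${SYSCONFIG_SERVICES:-/etc/sysconfig/services}"

restart_allowed() {
    # Subshell: sourcing the sysconfig file must not leak variables
    # into the caller.
    (
        DISABLE_RESTART_ON_UPDATE=no
        if [ -r "$SYSCONFIG_SERVICES" ]; then
            . "$SYSCONFIG_SERVICES"
        fi
        [ "$DISABLE_RESTART_ON_UPDATE" != "yes" ]
    )
}

# Possible use in a scriptlet (left as a comment on purpose):
#   restart_allowed && /etc/init.d/lighttpd try-restart
```

As noted in the comments, such a gate is global: it would suppress restarts for every package that honours it, not only for lighttpd.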
package checked into obs
Martin, please create a new image for testing
No, it does not work :-( I have tested with these packages and build 22. As I understood Darix, a patch for lighttpd is also needed, and an additional problem with startproc has been raised. He promised yesterday to write an email to Werner concerning that stuff but I have not seen one :-( In order to go on we have decided to pick up Martin's proposal from comment #7. Martin, could you please add the latest version of lighttpd to our repos?

Jsrain has confirmed that the triggers did not help.
The actual triggers use %restart_on_update and this macro uses the try-restart init script action. It restarts the service only if it is running. But lighty kills yastw[sc] and so they do not restart.
Darix, we do need lighttpd fixed to work only on its own instance. This is a critical problem for WebYaST.

(In reply to comment #30)
> The actual triggers use %restart_on_update and this macro uses the try-restart
> init script action. It restarts the service only if it is running.
> But lighty kills yastw[sc] and so they do not restart.
Actually there is a bug where rcyastw[sc] think they are running when in fact only the generic lighttpd is running. I will fork this report. That means try-restart should restart webyast in the end because the bugs should cancel out. But it doesn't, and I don't know why yet.

(In reply to comment #30)
> Jsrain has confirmed that the triggers did not help.
>
> The actual triggers use %restart_on_update and this macro uses the try-restart
> init script action. It restarts the service only if it is running.
> But lighty kills yastw[sc] and so they do not restart.
Then let's try calling "...yast{wc,ws} restart" directly!
>
> Darix, we do need lighttpd fixed to work only on its own instance. This is a
> critical problem for WebYaST.
Adapting component now, this is no WebYaST bug.

(In reply to comment #32)
> The lets try calling "...yast{wc,ws} restart" directly !
That won't help, it turns out to be related to the status problem. See bug 560302.

A simple restart or try-restart does not work:
webyast-demo:~ # ps aux|grep light
lighttpd 5535 0.7 8.1 61012 41656 ? S 12:00 0:17 /usr/bin/ruby /srv/www/yast/public/dispatch.fcgi
lighttpd 8902 0.3 0.1 5560 896 ? S 12:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
root 8911 0.0 0.1 3264 792 pts/0 S+ 12:38 0:00 grep light
webyast-demo:~ # rcyastws stop
Shutting down yastws done
webyast-demo:~ # cat /var/run/yastws.pid
5557
webyast-demo:~ # rcyastws status
Checking for service yastws running
webyast-demo:~ # ps aux|grep light
lighttpd 5535 0.7 8.1 61012 41656 ? S 12:00 0:17 /usr/bin/ruby /srv/www/yast/public/dispatch.fcgi
lighttpd 8902 0.0 0.1 5560 896 ? S 12:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
root 8930 0.0 0.1 3264 788 pts/0 S+ 12:39 0:00 grep light
webyast-demo:~ #
webyast-demo:~ #
webyast-demo:~ # rcyastws restart
Shutting down yastws done
Starting yastws done
webyast-demo:~ # ps aux|grep light
lighttpd 5535 0.7 8.1 61012 41656 ? S 12:00 0:17 /usr/bin/ruby /srv/www/yast/public/dispatch.fcgi
lighttpd 8902 0.0 0.1 5560 896 ? S 12:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
root 8956 0.0 0.1 3264 792 pts/0 S+ 12:39 0:00 grep light
webyast-demo:~ # cat /var/run/yastws.pid
5557

Created attachment 330735 [details]
start script for WebYaST client
Created attachment 330736 [details]
start script for WebYaST service
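To make the try-restart behaviour from the comments above concrete, here is a minimal sketch of the usual contract: restart only when status reports the service as running. The three `service_*` functions are hypothetical stand-ins backed by a shell variable, not the real yastws or yastwc init script actions.

```shell
#!/bin/sh
# Sketch of the try-restart contract. The service_* functions are
# hypothetical stubs; a real init script would call checkproc,
# killproc and startproc instead.
service_status() { [ -n "$FAKE_RUNNING" ]; }
service_stop()   { FAKE_RUNNING=; }
service_start()  { FAKE_RUNNING=1; STARTED=$((${STARTED:-0} + 1)); }

try_restart() {
    if service_status; then
        service_stop
        service_start
    fi
    # A service that was not running is not an error for try-restart.
    return 0
}
```

Under this contract, a service that another script has already killed is simply left stopped, which matches the observation that the triggers did not bring yastw[sc] back.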
I think we have collected enough data to ask Werner for help :-) Werner, please have a look at both init scripts and especially at comment#33. Perhaps we have made a general error here. Could you please check? If you would like to see it "online" just call me.

Werner, do I understand it right that checkproc -k does not need -i? And because -k means "imitate killproc", then killproc does not need -i either? I have just removed them from yastwS and it indeed left yastwC untouched.

Created attachment 330771 [details]
rclighttpd.diff
Here's a proposed patch to /etc/init.d/lighttpd.
As said earlier, figuring out PID_FILE properly is very hard, I'll think about it and what to do if it fails.
But that does not really help. The whole concept of -p and -i works differently than we need, especially with startproc which (understandably) does not have a killproc-like option.
startproc, even with -p, does not start a process if it is already running. And because the generic lighttpd cannot know about the other packages that also use its executable (yastwc, yastws), it cannot use -i.
So, if webyast is running, "rclighttpd start" falsely reports success and does not start it.
I think that startproc and friends are simply incompatible with a daemon that has multiple independent instances using a single executable.

The -p option works as LSB told us. And the option -i works, e.g. see sendmail. The -k option for checkproc simply does:
-k This option makes checkproc work like killproc(8) which changes
the operation mode, e.g. the exit status of the program will be
that of killproc(8). Without this option, checkproc works like
startproc (8) and finds all processes with an executable that
matches the specified pathname, even if a given pid file (see
option -p) isn't up-to-date. Nevertheless it uses its own
exit status (see section EXIT CODES).
I simply do not have the time to implement another option that uses only the pid file even if it is no longer valid, that is, one that does not scan the /proc file system when the pid file is broken.
(In reply to comment #43)
> The -p option works as LSB told us.
Now that you mention it, no, it works wrong. Here I have yastwc and yastws running, and the generic lighttpd stopped.
# pgrep -fl lighttpd
6884 /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf
6911 /usr/sbin/lighttpd -f /srv/www/yast/config/lighttpd.conf
# rm /var/run/lighttpd.pid
Now, this should start the generic one, but it doesn't:
# startproc -p /var/run/lighttpd.pid \
    -e /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
# pgrep -fl lighttpd
6884 /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf
6911 /usr/sbin/lighttpd -f /srv/www/yast/config/lighttpd.conf
(I understand why our implementation doesn't. And it needs the ignore file because of that.) But LSB does specify that the PID file be obeyed:
http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html
"""The start_daemon, killproc and pidofproc functions shall use the following algorithm for determining the status and the process identifiers of the specified program.
1. [...] If the -p pidfile option is specified and the named pidfile does not exist, the functions shall assume that the daemon is not running.
"""

AFAICR from comment #34 the pid file exists.

I think that could be a workaround:

in the spec file of yast2-webservice:
#---------------------------------------------------------------
# restart yastws on lighttpd update (bnc#559534)
%triggerin -- lighttpd
killproc -i /var/run/yastwc.pid /usr/sbin/lighttpd
if test -f '/var/run/yastws.pid' ; then
  rm /var/run/yastws.pid
  startproc -p /var/run/yastws.pid -i /var/run/yastwc.pid /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf
fi

in the spec file of yast2-webclient:
#---------------------------------------------------------------
# restart yastwc on lighttpd update (bnc#559534)
%triggerin -- lighttpd
killproc -i /var/run/yastws.pid /usr/sbin/lighttpd
if test -f '/var/run/yastwc.pid' ; then
  rm /var/run/yastwc.pid
  startproc -p /var/run/yastwc.pid -i /var/run/yastws.pid /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf
fi

The killproc kills all wrongly started lighttpd daemons. The test -f '/var/run/yastw?.pid' checks if the process has run before.
I have made these calls step by step after the lighttpd package has been installed and both services are running again. So I have the hope (if the trigger works) that we can use this workaround. Any opinions?

The disadvantage would be that all other lighttpd daemons (not yast daemons) will be stopped. But at that moment they are already killed by the lighttpd update.

(In reply to comment #45)
> AFACIR from comment #34 the pid file exists
In comment 44, where I was demonstrating how start_daemon(*) does not conform to LSB, the PID file does not exist.
But anyway, I can demonstrate the brokenness on comment 34 too. When running yastws start, the pid file exists, and contains a pid that does not exist. Yet afterwards the service is not started.
man startproc: "startproc does not use the pid to search for a process but the full path of the corresponding program which is used to identify the executable"
LSB: "Conforming implementations may use other mechanisms besides those based on pidfiles, unless the -p pidfile option has been used. Conforming applications should not rely on such mechanisms and should always use a pidfile."
(*) I used startproc, but meanwhile I checked that start_daemon is broken too.

Re comment 46 and 47:
mvidner: [this is incorrect, you are] killing other lightys (vendor's payload?)
schubi: at that moment there are no valid lightys available. At least we have no other fix. As I said it is just a "workaround"
schubi: At least I like it more than patching "DISABLE_RESTART_ON_UPDATE"
mvidner: IMHO the workaround is "don't update lighty" and the fix is patching webyast, lighty, and sysvinit (and physics)
mvidner: I was hoping for a simple fix, but by now I don't see one
schubi: sure. For the moment it fits.

I believe that with my workaround we could also "survive" the next lighttpd updates without solving the real problem.

Try mbuild boole-werner-84 for a better version of startproc, as in fact *one* line was missing to enable force when a pid file is provided. It also makes the manual page clearer about how startproc works.

Created attachment 330994 [details]
lighty-webyast-init-test
Werner, thank you! The sysvinit fix is working (together with the comment 41 fix for rclighttpd and fixes for yastw[cs] that I have yet to post). I am attaching a working version of the test suite.

I'm indebted to you as your report has taken me to the missing line within startproc.c :)

Created attachment 331020 [details]
/tmp/rclighttpd.diff, with PID_FILE parsing
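For readers wondering what parsing PID_FILE out of a lighttpd configuration might involve, the following is a rough sketch and not the content of the attached diff. The function name `lighttpd_pid_file` is hypothetical, and it assumes only two forms: a quoted literal, or `name + "/file"` where `var.name = "..."` is defined earlier in the same file. Includes, conditionals and nested expressions are not handled, which is part of why the maintainer considered config parsing unreliable.

```shell
#!/bin/sh
# Sketch only: print the server.pid-file value of a lighttpd config.
# Handles 'server.pid-file = "/path"' and 'server.pid-file = name + "/file"'
# with 'var.name = "/dir"' defined earlier in the same file.
lighttpd_pid_file() {
    awk '
        # Remember simple string variables, e.g. var.state_dir = "/var/run"
        /^[ \t]*var\.[A-Za-z_]+[ \t]*=[ \t]*"[^"]*"/ {
            name = $0; sub(/^[ \t]*var\./, "", name); sub(/[ \t]*=.*/, "", name)
            value = $0; sub(/^[^"]*"/, "", value); sub(/".*/, "", value)
            vars[name] = value
        }
        /^[ \t]*server\.pid-file[ \t]*=/ {
            rhs = $0; sub(/^[^=]*=[ \t]*/, "", rhs)
            prefix = ""
            if (rhs !~ /^"/) {
                # Form: name + "/suffix" (an optional var. prefix is dropped)
                name = rhs; sub(/[ \t]*\+.*/, "", name); sub(/^var\./, "", name)
                if (!(name in vars)) exit 1
                prefix = vars[name]
                sub(/^[^"]*/, "", rhs)
            }
            sub(/^"/, "", rhs); sub(/".*/, "", rhs)
            print prefix rhs
            found = 1
            exit 0
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

The function exits non-zero when no pid file setting is found or a variable is unknown, so a caller can fall back to some other strategy instead of guessing.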
Created attachment 331050 [details]
lighty-webyast-init-test, complete and prettyfied
Submitted a fixed sysvinit package to factory *and* 11.2. IMHO this requires a SWAMPID for an update, not only for sysvinit :)

Thank you. Actually WebYaST needs the fix in SLE 11 GA. (Sorry for not mentioning it earlier, we only track WY bugs under 11.2)

OK, I've submitted it also to SLES11-SP1.

Please open a separate bug report for the sysvinit issue! Note that we need it for SLE 11 GA, before SP1.

done

Unfortunately, despite my wonderful test case, there's still a bug, where "rclighttpd restart" kills webyast.
Now it is killproc ignoring -p :
# pgrep -fl lighttpd
3631 /usr/sbin/lighttpd -f /etc/yastws/lighttpd.conf
# grep . /var/run/yastw[cs].pid
/var/run/yastwc.pid:3203
/var/run/yastws.pid:3631
# cat /var/run/lighttpd.pid
cat: /var/run/lighttpd.pid: No such file or directory
# killproc -TERM -p /var/run/lighttpd.pid /usr/sbin/lighttpd
# pgrep -fl lighttpd
#
So killproc kills yastws, contrary to common sense and LSB ("If the -p pidfile option is specified and the named pidfile does not exist, the functions shall assume that the daemon is not running.")
Additionally, checkproc (and pidofproc) without -k ignore the pid file in a similar way. I could work around this by using -k (and rclighttpd does it), but the killproc problem needs a fix in sysvinit.

Created attachment 331340 [details]
lighty-webyast-init-test, extended with today's bugs
And finally, startproc -p will not start the process IFF the pid file is stale and a process is already running with that executable.
Attaching an extended test script for c61-c63.
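The pid-file-only behaviour argued for in these comments can be sketched as a status check that never looks at the executable name. This is an illustration, not the sysvinit implementation: `pidfile_status` is a hypothetical helper, the return codes follow the LSB init script status convention (0 running, 1 dead but pid file exists, 3 not running), and the liveness test assumes a Linux /proc file system.

```shell
#!/bin/sh
# Sketch only: status derived purely from a pid file.
# Returns 0 = running, 1 = pid file exists but process is gone (stale),
# 3 = no pid file, so the daemon is assumed not to be running.
pidfile_status() {
    pidfile=$1
    [ -f "$pidfile" ] || return 3
    # read fails on a missing trailing newline, so also accept a value.
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content: stale
    esac
    # Linux-specific assumption: a live process has a /proc/<pid> directory.
    [ -d "/proc/$pid" ] && return 0
    return 1
}
```

With a check like this, other lighttpd instances started from the same binary do not influence the result, since only the pid recorded in the given file is consulted. It does not cover the case of a recycled pid now belonging to a different binary.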
(In reply to comment #63)
How should startproc determine that the pid file is stale? It could be a crash of one of several processes of one binary without having the pid file removed. This case is AFAICS not handled by the LSB specs.

> How should startproc determine that the pid file is stale?
A pid file is stale if the pid it contains is not present (or if it is referencing a process running a different binary, more about that in http://perfec.to/stalepid.html but that is not the case here).
> It could be a crash of one of several processes of one binrary without having
> the pid file removed.
Yes, like in our case, where the single lighty binary is used for 3 separate services.
> This case is AFAICS from LSB specs not handled.
In this case, I think it is handled, with the quote from comment 48:
http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html
LSB: "Conforming implementations may use other mechanisms besides those based on pidfiles, unless the -p pidfile option has been used. Conforming applications should not rely on such mechanisms and should always use a pidfile."
I interpret that as directing startproc to check the pid only and not look at other processes using that binary.

The mbuild boole-werner-90 is running, please check it out.

(In reply to comment #66)
Somehow libselinux-devel seems to be broken??

boole-werner-91 has passed all 73 assertions in 9 test cases. Thanks, Werner! Please submit to /work/src/done/SLES11 and SUSE:SLE-11:GA:Products:Test . I will copy the built package to http://www.suse.de/~mvidner/webyast-lighty/ .
For WebYaST packages, Build0025 is fine already, I will rebuild the appliance with this sysvinit fix as 0.25.1 and call that GMC.

Darix, is lighttpd available for 11.2 already, in some devel project?

the fixed lighty is in sle11. i didnt submit one for 11.2

Update released for: sysvinit, sysvinit-debuginfo, sysvinit-debugsource
Products:
SLE-DEBUGINFO 11 (i386, ia64, ppc64, s390x, x86_64)
SLE-DESKTOP 11 (i386, x86_64)
SLE-SERVER 11 (i386, ia64, ppc64, s390x, x86_64)

lighty is still hanging in /work/src/done/SLE11/lighttpd while it is needed for WebYaST which is officially launched tomorrow :-((

ah nice. "is in sle11" was associated with "in sle11 GA", not with "submitted for update".

New SWAMPINFO required

The SWAMPID for this issue is 30469. Please submit the patch and patchinfo file using this ID. (https://swamp.suse.de/webswamp/wf/30469)

Raising prio because of bnc#562236

Update released for: lighttpd, lighttpd-debuginfo, lighttpd-debugsource, lighttpd-mod_cml, lighttpd-mod_magnet, lighttpd-mod_mysql_vhost, lighttpd-mod_rrdtool, lighttpd-mod_trigger_b4_dl, lighttpd-mod_webdav
Products:
SLE-DEBUGINFO 11 (i386, ia64, ppc64, s390x, x86_64)
SLE-SDK 11 (i386, ia64, ppc64, s390x, x86_64)

update released

*** Bug 545338 has been marked as a duplicate of this bug. ***

Update released for: sysvinit, sysvinit-debuginfo, sysvinit-debugsource, sysvinit-tools, sysvinit-tools-debuginfo
Products:
openSUSE 11.1 (debug, i586, ppc, x86_64)
openSUSE 11.2 (debug, i586, x86_64)
openSUSE 11.3 (debug, i586, x86_64)

This is an autogenerated message for OBS integration: This bug (559534) was mentioned in https://build.opensuse.org/request/show/38504 Factory / lighttpd