Bug 464680

Summary: software management: provide user-friendly way to remove dead-locks
Product: [openSUSE] openSUSE 11.1
Reporter: macias - <bluedzins>
Component: YaST2
Assignee: Ladislav Slezák <lslezak>
Status: RESOLVED INVALID
QA Contact: Jiri Srain <jsrain>
Severity: Enhancement
Priority: P5 - None
Version: Final
Target Milestone: ---
Hardware: i586
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: y2logs.tgz

Description macias - 2009-01-08 22:18:10 UTC
You run SM, it crashes, you run it again, and SM refuses to start because of the lock. Now, what is weekend user Joe Doe supposed to do? Open Konsole and run
sudo rm -f ...
?

Add "force run" as another option alongside retry/cancel, because after a crash neither retry nor cancel does any good.
Comment 1 Cyril Hrubis 2009-02-13 14:14:01 UTC
Do you have any logs from the crash and from the moment SM complained about the locks?
Comment 2 macias - 2009-02-13 15:46:11 UTC
No, but this is a general problem, not an "in my case..." one: it concerns the principles of SM's design, namely how it handles "dangling" locks.
Comment 3 Ladislav Slezák 2009-02-23 09:30:01 UTC
The locks are handled quite well in the package manager: the file /var/run/zypp.pid contains the PID of the process that holds the lock. If that process is not running, the lock is removed automatically, so you should never need to remove the lock manually after SM has crashed.

Please attach the YaST logs; they contain more information about the lock (why it could not be obtained).
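
The stale-lock rule described in this comment can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not libzypp's actual implementation: the function names (`pid_is_running`, `read_lock_pid`, `check_lock`) and the returned status strings are hypothetical. Only the idea that the lock file stores the holder's PID and that the lock is dropped when that process no longer exists comes from the comment. The sketch assumes a POSIX system, where sending signal 0 checks for a process without affecting it.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def read_lock_pid(lock_path: str):
    """Return the PID stored in the lock file, or None if it is unreadable."""
    try:
        with open(lock_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def check_lock(lock_path: str) -> str:
    """Hypothetical helper: 'free', 'stale-removed', or 'held'."""
    if not os.path.exists(lock_path):
        return "free"
    pid = read_lock_pid(lock_path)
    if pid is not None and pid_is_running(pid):
        return "held"  # a live process owns the lock; do not touch it
    os.remove(lock_path)  # owner is gone (or file is garbage): clean up
    return "stale-removed"
```

Under these assumptions, `check_lock("/var/run/zypp.pid")` would report `"held"` only while the recorded process is alive. A real implementation would also need to guard against races between the check and the removal, which this sketch does not attempt.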
Comment 4 macias - 2009-02-23 14:07:33 UTC
Created attachment 274665 [details]
y2logs.tgz

You know, usually in software, theory is one thing, and practice is another :-) -- I can only tell what I see.
Comment 5 Ladislav Slezák 2009-02-23 14:42:56 UTC
Um, the logs do not contain any failed attempt, all attempts to obtain the lock were successful.

Maybe the old logs were removed from the system (yast keeps 10 previous log files by default).

Please try to reproduce the problem and attach the logs again.
Comment 6 macias - 2009-02-27 19:12:32 UTC
OK, I am looking right at the dialog reporting the dead-lock, and I have created the logs. The link is: https://bugzilla.novell.com/attachment.cgi?id=276111
Comment 7 Ladislav Slezák 2009-03-02 13:46:42 UTC
The log contains these messages from the failed SM start:

2009-02-27 19:59:28 <5> linux-dqhf(6873) [zypp] Exception.cc(log):133 Close this application before trying again.
2009-02-27 19:59:28 <5> linux-dqhf(6873) [zypp] Exception.cc(log):133 PkgFunctions.cc(zypp_ptr):103 RETHROW:  ZYppFactory.cc(getZYpp):366: System management is locked by the application with pid 2863 (/usr/lib/YaST2/bin/y2base).
2009-02-27 19:59:28 <5> linux-dqhf(6873) [zypp] Exception.cc(log):133 Close this application before trying again.
2009-02-27 19:59:28 <3> linux-dqhf(6873) [Pkg] PkgFunctions.cc(Connect):165 Error in Connect: FactoryException: System management is locked by the application with pid 2863 (/usr/lib/YaST2/bin/y2base).
Close this application before trying again.

Process 2863 is still running and prints messages

2009-02-27 20:02:21 <1> linux-dqhf(2863) [satsolver] PoolImpl.cc(logSat):81 installing old packages
2009-02-27 20:02:21 <1> linux-dqhf(2863) [satsolver] PoolImpl.cc(logSat):81 resolving job rules

to the log.

So there is no "dangling" lock: another YaST instance is running, and YaST handles the locks correctly.
Comment 8 macias - 2009-03-02 14:38:13 UTC
Ladislav, then you can say that SM crashes incorrectly. From the user's POV, SM crashed. I would like to run it again, and my wish is that the cleanup be automatic. That's all I wish for.

Whether it is an incorrect crash or an incorrect lock cleanup is really a secondary question, because that depends on the design. I can only report what I see, so I am reopening: the problem _exists_.
Comment 9 Ladislav Slezák 2009-03-02 14:54:21 UTC
Sorry, there is no crash; another YaST _is_ still running.

See the log or check the list of running processes (Ctrl+Esc in KDE). Process 2863 is solving package dependencies in a loop (that's another bug), so a second YaST instance of course fails to start because the first one is still running.

You have to close (kill) the "frozen" Yast (reported in bug 480553), which has process ID 2863, before starting Yast again.
Comment 10 macias - 2009-03-02 15:53:32 UTC
I give up. 

The fact that you see it in the logs does not mean it is running, and if it is split into two processes, a frontend and a backend, then this is exactly what this wish is about: the ability to run SM after a crash without chasing locks and processes.

If I have to clean up the mess myself, then it is fairly easy for SM to start again "automatically".
Comment 11 Ladislav Slezák 2009-03-02 16:58:58 UTC
Oh, OK, I got it. I had considered only two cases: that YaST had crashed completely, or that it was running (completely). I didn't think about a half-running YaST.

Anyway, the popup states clearly what to do in such a case: close the application whose name and PID are displayed in the popup. You can close it using the X button or with a process manager (if there is no window).

The only way to "fix" it is to explicitly kill the process. But that's too dangerous IMO: the process might be a running update, and killing it in the middle of the update may result in a broken (unusable) system.

The _user_ should decide what to do; no "magic" is desirable in such a potentially dangerous situation.
Comment 12 macias - 2009-03-02 17:39:15 UTC
Ladislav, I agree with you that a fully automatic override could be dangerous, but I don't agree that in the current state the user is the one who decides.

Now, there is one extreme side:
* hey user, the pid 85757 is locked, figure out what to do
(exaggeration, on purpose)

The other extreme is:
* hey user, we just killed another SM, next time watch out

The latter is bad, but the former is no better: there is _no hint_ of what the user could actually do.

And you said it should be up to the user; it should be the user's decision.

So now, let's imagine such a dialog:
"Another SM seems to be running at the same time.
[ retry ] [ kill other SM ] [ cancel ]"

Is this the user's decision? 100%. Is it easy for the user to manage any crashes, deadlocks, etc.? Sure.

That is the behaviour I wish for: that I, as a user, do not have to search for a solution. Just answer one or two questions, give one confirmation, and voila, I am in SM.

Of course, this is not about the details: there could be internal communication, so that SM could verify the other instance is really running. There could be
[ ] advanced options
sections, and so on, but I just wanted to show the big picture, the middle ground between
user on his own
and
SM leaves no space for user decision.
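
The three-button dialog proposed above could be wired up roughly as follows. This is only a sketch of the idea in Python, under stated assumptions: nothing here is existing YaST or libzypp code, and `Choice`, `resolve_lock_conflict`, and the injected callbacks `try_lock`, `terminate`, and `confirm` are all hypothetical names introduced for illustration. The extra confirmation step reflects the "one confirmation" mentioned in this comment and the concern from comment 11 that killing a running update is dangerous.

```python
from enum import Enum
from typing import Callable


class Choice(Enum):
    RETRY = "retry"
    KILL_OTHER = "kill other SM"
    CANCEL = "cancel"


def resolve_lock_conflict(
    choice: Choice,
    holder_pid: int,
    try_lock: Callable[[], bool],
    terminate: Callable[[int], None],
    confirm: Callable[[int], bool],
) -> bool:
    """Act on the user's choice; return True if the lock was obtained.

    All callbacks are hypothetical hooks supplied by the caller:
    try_lock() attempts to take the lock, terminate(pid) stops the
    other instance, confirm(pid) asks the user to confirm the kill.
    """
    if choice is Choice.CANCEL:
        return False
    if choice is Choice.KILL_OTHER:
        # Never kill silently: the other process may be a running update.
        if not confirm(holder_pid):
            return False
        terminate(holder_pid)
    # Both RETRY and a confirmed KILL_OTHER end with a new lock attempt.
    return try_lock()
```

Passing the actions in as callbacks keeps the decision logic separate from the UI and from process handling, so the user remains the one who decides while the tool does the chasing of PIDs.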