Bug 1208273

Summary: crm cluster stop took too long to complete (~1hr)
Product: [SUSE Linux Enterprise High Availability Extension] PUBLIC SUSE Linux Enterprise High Availability 15 SP3
Reporter: Pri Bal <priyanka.14balotra>
Component: Pacemaker
Assignee: Yan Gao <ygao>
Status: RESOLVED DUPLICATE
QA Contact: SUSE Linux Enterprise High Availability Team <ha-bugs>
Severity: Normal
Priority: P5 - None
CC: priyanka.14balotra
Version: unspecified
Target Milestone: unspecified
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Pri Bal 2023-02-15 06:32:02 UTC
crm cluster stop (a cluster-level operation) took about an hour to complete: the stop operation began at 17:21:04 and did not finish until 18:21:03.


Oct 01 17:21:04 FILE-2 pacemakerd          [30001] (stop_child)         notice: Stopping pacemaker-based | sent signal 15 to process 30015
Oct 01 17:21:04 FILE-2 pacemaker-based     [30015] (crm_signal_dispatch)        notice: Caught 'Terminated' signal | 15 (invoking handler)
Oct 01 17:21:04 FILE-2 pacemaker-based     [30015] (cib_shutdown)       info: Waiting on 1 clients to disconnect (0)
Oct 01 17:21:04 FILE-2 pacemaker-based     [30015] (cib_shutdown)       info: All clients disconnected (0)
Oct 01 17:21:04 FILE-2 pacemaker-based     [30015] (terminate_cib)      info: initiate_exit: Exiting from mainloop...
Oct 01 17:21:04 FILE-2 pacemaker-based     [30015] (crm_cluster_disconnect)     info: Disconnecting from cluster infrastructure: corosync
Oct 01 17:21:04 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:21:04 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:21:05 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:21:05 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:21:05 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
.
.
.
Oct 01 17:24:03 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:24:03 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (crm_cs_flush)       info: Sent 0 CPG messages  (2 remaining, last=13): Try again (6)
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (escalate_shutdown)  error: Child pacemaker-based not terminating in a timely manner, forcing
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (stop_child)         notice: Stopping pacemaker-based | sent signal 11 to process 30015
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (pcmk_child_exit)    error: pacemaker-based[30015] terminated with signal 11 (core=0)
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (pcmk_shutdown_worker)       notice: Shutdown complete
Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (qb_ipcs_us_withdraw)        info: withdrawing server sockets
Oct 01 18:21:03 FILE-2 pacemakerd          [30001] (crm_xml_cleanup)    info: Cleaning up memory from libxml2
Oct 01 18:21:03 FILE-2 pacemakerd          [30001] (crm_exit)   info: Exiting pacemakerd | with status 0
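
To see where the hour went, the timestamps in the excerpt above can be compared pairwise. The following is a minimal sketch, not a supported tool: the helper names (parse_timestamp, largest_gap) are hypothetical, the sample lines are copied from the excerpt with whitespace shortened, and since the log carries no year a fixed one is assumed purely so the timestamps can be parsed.

```python
from datetime import datetime

# Sample lines copied from the excerpt above (whitespace shortened).
LOG_LINES = [
    "Oct 01 17:21:04 FILE-2 pacemakerd [30001] (stop_child) notice: Stopping pacemaker-based | sent signal 15 to process 30015",
    "Oct 01 17:24:04 FILE-2 pacemakerd [30001] (escalate_shutdown) error: Child pacemaker-based not terminating in a timely manner, forcing",
    "Oct 01 17:24:04 FILE-2 pacemakerd [30001] (qb_ipcs_us_withdraw) info: withdrawing server sockets",
    "Oct 01 18:21:03 FILE-2 pacemakerd [30001] (crm_xml_cleanup) info: Cleaning up memory from libxml2",
    "Oct 01 18:21:03 FILE-2 pacemakerd [30001] (crm_exit) info: Exiting pacemakerd | with status 0",
]


def parse_timestamp(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' of a log line.

    The log has no year, so `year` is an assumption used only for parsing.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silent interval."""
    stamped = [(parse_timestamp(line), line) for line in lines]
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


total = parse_timestamp(LOG_LINES[-1]) - parse_timestamp(LOG_LINES[0])
gap, before, after = largest_gap(LOG_LINES)

print("total stop time:", total)
print("largest gap:    ", gap)
print("  after: ", before)
print("  before:", after)
```

On these sample lines, the first three minutes cover the wait before pacemakerd escalated the shutdown of pacemaker-based, and the longest silent interval falls between "withdrawing server sockets" and "Cleaning up memory from libxml2". Running the same comparison over the full pacemaker.log would show whether anything else was logged in that window.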
Comment 1 Yan Gao 2023-02-15 14:47:34 UTC
(In reply to Pri Bal from comment #0)
> Oct 01 17:24:04 FILE-2 pacemakerd          [30001] (qb_ipcs_us_withdraw)    
> info: withdrawing server sockets

Was pacemaker-controld already stopped by then? Was there anything else logged in pacemaker.log or syslog after this and until 18:21:03?

Could the system have been in difficulty during that time, for example under high load or somehow frozen or hung?
Comment 2 Yan Gao 2023-03-16 09:38:24 UTC
This looks like a duplicate of bsc#1205214.

Let's track it there since supportconfig was provided there.

*** This bug has been marked as a duplicate of bug 1205214 ***