Bug 1035838

Summary: Cluster lvm with sanlock misbehaves
Product: [openSUSE] openSUSE Tumbleweed
Reporter: zhen ren <zren>
Component: High Availability
Assignee: zhen ren <zren>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
Version: Current
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description zhen ren 2017-04-25 03:31:36 UTC
1. versions

tw1:~ # rpm -qa|grep lvm2
lvm2-lockd-2.02.168-257.1.x86_64
lvm2-2.02.168-257.1.x86_64

tw1:~ # rpm -qa|grep sanlock
sanlock-3.4.0-1.1.x86_64
libsanlock1-3.4.0-1.1.x86_64

2. errors

tw2:~/lvm2test # systemctl status sanlock
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-04-10 14:55:42 CST; 2 weeks 0 days ago
  Process: 10742 ExecStart=/usr/sbin/sanlock daemon $SANLOCKOPTS (code=exited, status=0/SUCCESS)
 Main PID: 10744 (sanlock)
    Tasks: 7 (limit: 4915)
   CGroup: /system.slice/sanlock.service
           ├─10744 /usr/sbin/sanlock daemon -U sanlock -G sanlock
           └─10745 /usr/sbin/sanlock daemon -U sanlock -G sanlock

Apr 20 09:58:20 tw2 sanlock[10744]: 2017-04-20 09:58:15+0800 858385 [11412]: s1 renewal error -202 delta_length 10 last_success 858354
Apr 20 09:59:11 tw2 sanlock[10744]: 2017-04-20 09:59:11+0800 858441 [11412]: s1 delta_renew read timeout 10 sec offset 0 /dev/mapper/testvg-lvmlock
Apr 20 09:59:11 tw2 sanlock[10744]: 2017-04-20 09:59:11+0800 858441 [11412]: s1 renewal error -202 delta_length 10 last_success 858410
Apr 20 09:59:26 tw2 sanlock[10744]: 2017-04-20 09:59:26+0800 858455 [11412]: s1 delta_renew long write time 11 sec
Apr 20 10:05:21 tw2 sanlock[10744]: 2017-04-20 10:05:21+0800 858810 [11412]: s1 delta_renew long write time 27 sec
Apr 20 10:05:21 tw2 sanlock[10744]: 2017-04-20 10:05:21+0800 858810 [11412]: s1 renewed 858783 delta_length 33 too long
Apr 20 10:05:31 tw2 sanlock[10744]: 2017-04-20 10:05:31+0800 858821 [11412]: s1 delta_renew read timeout 10 sec offset 0 /dev/mapper/testvg-lvmlock
Apr 20 10:05:31 tw2 sanlock[10744]: 2017-04-20 10:05:31+0800 858821 [11412]: s1 renewal error -202 delta_length 10 last_success 858783
Apr 20 10:06:06 tw2 sanlock[10744]: 2017-04-20 10:06:06+0800 858855 [11412]: s1 delta_renew long write time 26 sec
Apr 20 10:06:06 tw2 sanlock[10744]: 2017-04-20 10:06:06+0800 858855 [11412]: s1 renewed 858829 delta_length 34 too long

tw2:~/lvm2test # systemctl status wdmd
● wdmd.service - Watchdog Multiplexing Daemon
   Loaded: loaded (/usr/lib/systemd/system/wdmd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-04-10 14:55:41 CST; 2 weeks 0 days ago
  Process: 10741 ExecStart=/usr/sbin/wdmd $WDMDOPTS (code=exited, status=0/SUCCESS)
 Main PID: 10743 (wdmd)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/wdmd.service
           └─10743 /usr/sbin/wdmd -G sanlock

Apr 19 23:08:40 tw2 wdmd[10743]: test warning now 819409 ping 819399 close 0 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:40 tw2 wdmd[10743]: /dev/watchdog0 closed unclean
Apr 19 23:08:41 tw2 wdmd[10743]: test warning now 819410 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:42 tw2 wdmd[10743]: test failed rem 59 now 819411 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:43 tw2 wdmd[10743]: test failed rem 58 now 819412 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:44 tw2 wdmd[10743]: test failed rem 57 now 819413 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:45 tw2 wdmd[10743]: test failed rem 56 now 819414 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:46 tw2 wdmd[10743]: test failed rem 55 now 819415 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:47 tw2 wdmd[10743]: test failed rem 54 now 819416 ping 819399 close 819410 renewal 819331 expire 819411 client 10744 sanlock_lvm_testvg:2
Apr 19 23:08:47 tw2 wdmd[10743]: /dev/watchdog0 reopen
Comment 1 zhen ren 2017-04-25 04:54:42 UTC
Apr 25 12:53:41 tw1 sanlock[2112]: 2017-04-25 12:53:41+0800 4930 [2116]: s2 add_lockspace fail result -203
Apr 25 12:53:41 tw1 lvmlockd[2676]: 1493096021 S lvm_testvg add_lockspace_san add_lockspace error -203
Apr 25 12:53:40 tw1 sanlock[2112]: 2017-04-25 12:53:40+0800 4929 [2699]: s2 connect_watchdog failed -1
Apr 25 12:53:40 tw1 sanlock[2112]: 2017-04-25 12:53:40+0800 4929 [2699]: s2 wdmd_connect failed -111
Apr 25 12:50:30 tw1 lvmlockd[2676]: [D] creating /run/lvm/lvmlockd.socket
Apr 25 12:50:29 tw1 systemd[1]: Started LVM2 lock daemon.
Apr 25 12:50:29 tw1 lvmlockd[2676]: 1493095829 lvmlockd started
Comment 2 zhen ren 2017-04-25 05:12:47 UTC
wdmd.service requires full control of the watchdog device. However, sbd.service also uses the watchdog device, so you must make sure SBD/Pacemaker is stopped before trying to start wdmd.
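
A minimal sketch of the ordering described above, assuming sbd.service and pacemaker.service are the unit names on this system (adjust to match the cluster stack actually installed):

```shell
# Both wdmd and SBD open /dev/watchdog; only one process may hold it at a time.
# Unit names below are assumptions for illustration.
systemctl stop pacemaker sbd    # release the watchdog device first
systemctl start wdmd            # wdmd can now claim /dev/watchdog
systemctl start sanlock         # sanlock connects to wdmd for lease expiry fencing
```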