Bug 959966

Summary: dlm_start_0:2168:stderr [ mount: none is already mounted or /sys/kernel/config busy ]
Product: [openSUSE] openSUSE Distribution
Reporter: Roger Zhou <zzhou>
Component: High Availability
Assignee: Lidong Zhong <lidong.zhong>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: ghe, ygao, zren
Version: Leap 42.1
Target Milestone: ---
Hardware: Other
OS: Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Roger Zhou 2015-12-22 04:08:31 UTC
Filing this bug before I forget. It seems to happen on a brand-new installation, since I couldn't reliably reproduce it on my existing system. I will try to reproduce it later when I find time.

Symptom
==================

1. error log from journalctl

Dec 15 17:31:03 Leap421-02 lrmd[1672]: notice: dlm_start_0:2168:stderr [ mount: none is already mounted or /sys/kernel/config busy ]
Dec 15 17:31:03 Leap421-02 lrmd[1672]: notice: finished - rsc:dlm action:start call_id:42 pid:2168 exit-code:0 exec-time:1135ms queue-time:0ms
Dec 15 17:31:03 Leap421-02 crmd[1675]: notice: Operation dlm_start_0: ok (node=Leap421-02, call=42, rc=0, cib-update=21, confirmed=true)

2. Meanwhile, "crm status" reports no errors

Leap421-02:~ # crm status
Last updated: Mon Dec 14 17:06:03 2015		Last change: Mon Dec 14 13:32:37 2015 by root via cibadmin on Leap421-01
Stack: corosync
Current DC: Leap421-02 (version 1.1.13-12.2-7906df9) - partition with quorum
2 nodes and 11 resources configured

Online: [ Leap421-01 Leap421-02 ]

Full list of resources:

 stonith_sbd	(stonith:external/sbd):	Started Leap421-02
 Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
     Masters: [ Leap421-02 ]
     Slaves: [ Leap421-01 ]
 Resource Group: g_nfs
     p_lvm_nfs	(ocf::heartbeat:LVM):	Started Leap421-02
     p_fs_engineering	(ocf::heartbeat:Filesystem):	Started Leap421-02
     p_fs_sales	(ocf::heartbeat:Filesystem):	Started Leap421-02
     p_ip_nfs	(ocf::heartbeat:IPaddr2):	Started Leap421-02
 Clone Set: cl_exportfs_root [p_exportfs_root]
     Started: [ Leap421-01 Leap421-02 ]
 Clone Set: base-clone [base-group]
     Started: [ Leap421-01 Leap421-02 ]


3. It looks like DLM is not functional, which causes a problem for mkfs.ocfs2

Leap421-02:~ # mkfs.ocfs2 -N 32 /dev/sdd
mkfs.ocfs2 1.8.4
Cluster stack: classic o2cb
Label: 
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
Block size: 1024 (10 bits)
Cluster size: 4096 (12 bits)
Volume size: 524288000 (128000 clusters) (512000 blocks)
Cluster groups: 17 (tail covers 5120 clusters, rest cover 7680 clusters)
Extent allocator size: 0 (0 groups)
Journal size: 16777216
Node slots: 32
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: mkfs.ocfs2: Unable to find available bit while formatting journal "journal:0031"


Further trouble shooting
==================

1. Stop all other RAs and restart pacemaker. DLM then seems to work well.

Leap421-02:~ # crm status full
Last updated: Thu Dec 17 14:52:28 2015		Last change: Thu Dec 17 13:51:49 2015 by root via cibadmin on Leap421-02
Stack: corosync
Current DC: Leap421-01 (version 1.1.13-12.2-7906df9) - partition with quorum
2 nodes and 11 resources configured

Node Leap421-01: online
	stonith_sbd	(stonith:external/sbd):	Started
	dlm	(ocf::pacemaker:controld):	Started
Node Leap421-02: online
	dlm	(ocf::pacemaker:controld):	Started

Inactive resources:

 Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
     Stopped: [ Leap421-01 Leap421-02 ]
 Resource Group: g_nfs
     p_lvm_nfs	(ocf::heartbeat:LVM):	(target-role:Stopped) Stopped
     p_fs_engineering	(ocf::heartbeat:Filesystem):	(target-role:Stopped) Stopped
     p_fs_sales	(ocf::heartbeat:Filesystem):	(target-role:Stopped) Stopped
     p_ip_nfs	(ocf::heartbeat:IPaddr2):	(target-role:Stopped) Stopped
 Clone Set: cl_exportfs_root [p_exportfs_root]
     Stopped: [ Leap421-01 Leap421-02 ]

Operations:
* Node Leap421-01:
   stonith_sbd: migration-threshold=1000000
    + (11) start: last-rc-change='Tue Dec 15 17:30:59 2015' last-run='Tue Dec 15 17:30:59 2015' exec-time=1760ms queue-time=0ms rc=0 (ok)
   p_drbd_nfs: migration-threshold=1000000
    + (46) monitor: interval=15000ms last-rc-change='Tue Dec 15 17:31:02 2015' exec-time=94ms queue-time=0ms rc=8 (master)
    + (73) stop: last-rc-change='Thu Dec 17 13:50:30 2015' last-run='Thu Dec 17 13:50:30 2015' exec-time=352ms queue-time=0ms rc=0 (ok)
   p_lvm_nfs: migration-threshold=1000000
    + (49) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03 2015' exec-time=21ms queue-time=0ms rc=0 (ok)
    + (67) stop: last-rc-change='Thu Dec 17 13:50:14 2015' last-run='Thu Dec 17 13:50:14 2015' exec-time=356ms queue-time=0ms rc=0 (ok)
   p_fs_engineering: migration-threshold=1000000
    + (51) monitor: interval=10000ms last-rc-change='Tue Dec 15 17:31:03 2015' exec-time=38ms queue-time=0ms rc=0 (ok)
    + (65) stop: last-rc-change='Thu Dec 17 13:50:14 2015' last-run='Thu Dec 17 13:50:14 2015' exec-time=115ms queue-time=0ms rc=0 (ok)
   p_fs_sales: migration-threshold=1000000
    + (53) monitor: interval=10000ms last-rc-change='Tue Dec 15 17:31:03 2015' exec-time=36ms queue-time=0ms rc=0 (ok)
    + (63) stop: last-rc-change='Thu Dec 17 13:50:13 2015' last-run='Thu Dec 17 13:50:13 2015' exec-time=701ms queue-time=0ms rc=0 (ok)
   p_ip_nfs: migration-threshold=1000000
    + (55) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03 2015' exec-time=29ms queue-time=0ms rc=0 (ok)
    + (61) stop: last-rc-change='Thu Dec 17 13:50:13 2015' last-run='Thu Dec 17 13:50:13 2015' exec-time=34ms queue-time=0ms rc=0 (ok)
   p_exportfs_root: migration-threshold=1000000
    + (43) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:01 2015' exec-time=20ms queue-time=0ms rc=0 (ok)
    + (57) stop: last-rc-change='Thu Dec 17 13:49:39 2015' last-run='Thu Dec 17 13:49:39 2015' exec-time=41ms queue-time=0ms rc=0 (ok)
   dlm: migration-threshold=1000000
    + (74) start: last-rc-change='Thu Dec 17 13:51:48 2015' last-run='Thu Dec 17 13:51:48 2015' exec-time=1426ms queue-time=0ms rc=0 (ok)
    + (75) monitor: interval=60000ms last-rc-change='Thu Dec 17 13:51:50 2015' exec-time=20ms queue-time=0ms rc=0 (ok)
* Node Leap421-02:
   p_drbd_nfs: migration-threshold=1000000
    + (44) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03 2015' exec-time=82ms queue-time=0ms rc=0 (ok)
    + (54) stop: last-rc-change='Thu Dec 17 13:50:31 2015' last-run='Thu Dec 17 13:50:31 2015' exec-time=302ms queue-time=0ms rc=0 (ok)
   p_exportfs_root: migration-threshold=1000000
    + (41) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:02 2015' exec-time=23ms queue-time=0ms rc=0 (ok)
    + (47) stop: last-rc-change='Thu Dec 17 13:49:40 2015' last-run='Thu Dec 17 13:49:40 2015' exec-time=46ms queue-time=0ms rc=0 (ok)
   dlm: migration-threshold=1000000
    + (55) start: last-rc-change='Thu Dec 17 13:51:49 2015' last-run='Thu Dec 17 13:51:49 2015' exec-time=1136ms queue-time=0ms rc=0 (ok)
    + (56) monitor: interval=60000ms last-rc-change='Thu Dec 17 13:51:50 2015' exec-time=22ms queue-time=0ms rc=0 (ok)


2. journalctl reports no suspicious errors.

journalctl -n400 -u pacemaker.service  > tt

[..]

Dec 17 13:51:49 Leap421-02 stonith-ng[1671]: notice: Relying on watchdog integration for fencing
Dec 17 13:51:49 Leap421-02 lrmd[1672]: notice: executing - rsc:dlm action:start call_id:55
Dec 17 13:51:49 Leap421-02 dlm_controld[19559]: 159703 dlm_controld 4.0.2 started
Dec 17 13:51:50 Leap421-02 lrmd[1672]: notice: finished - rsc:dlm action:start call_id:55 pid:19539 exit-code:0 exec-time:1136ms queue-time:0ms
Dec 17 13:51:50 Leap421-02 crmd[1675]: notice: Operation dlm_start_0: ok (node=Leap421-02, call=55, rc=0, cib-update=27, confirmed=true)
Dec 17 13:55:28 Leap421-02 dlm_controld[19559]: 159922 uevent message has 3 args
Comment 1 Lidong Zhong 2015-12-22 04:41:14 UTC
I can't download the libdlm package over the VPN, so I will dig into it tomorrow.
Comment 2 Lidong Zhong 2015-12-24 03:32:32 UTC
(In reply to Roger Zhou from comment #0)
> Filing this bug before I forget. It seems to happen on a brand-new
> installation, since I couldn't reliably reproduce it on my existing system.
> I will try to reproduce it later when I find time.
> 
> Symptom
> ==================
> 
> 1. error log from journalctl
> 
Here is the related source code:

        if [ ! -e $OCF_RESKEY_configdir ]; then
            modprobe configfs
            if [ ! -e $OCF_RESKEY_configdir ]; then
                ocf_log err "$OCF_RESKEY_configdir not available"
                return $OCF_ERR_INSTALLED
            fi
        fi

        mount | grep "type configfs" > /dev/null
        if [ $? != 0 ]; then
            mount -t configfs none $OCF_RESKEY_configdir
        fi

When lrmd starts the dlm_controld daemon, the RA first loads the configfs kernel module and then checks the mount output to see whether configfs is already mounted; if not, it mounts configfs on the configured directory. The error message evidently appears because the check finds no configfs mount, yet by the time the RA runs mount, configfs has already been mounted. So I guess it is a timing issue that leads to the error.

However, this doesn't affect the final result: the dlm_controld daemon is started successfully.
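The check-then-mount sequence is a classic time-of-check/time-of-use race. Below is a minimal sketch of an idempotent variant that tolerates the race; note that mount_configfs is a hypothetical helper, not part of the RA, and it assumes the mountpoint utility from util-linux is available:

```shell
# Hypothetical sketch: mount configfs idempotently, tolerating the race
# where something else (e.g. a modprobe side effect or systemd's
# sys-kernel-config.mount) mounts it between our check and our mount.
mount_configfs() {
    dir="$1"
    # Already mounted: nothing to do.
    mountpoint -q "$dir" && return 0
    # Try to mount; on failure, re-check, since another process may have
    # mounted it in the meantime, which is fine for our purposes.
    if ! mount -t configfs none "$dir" 2>/dev/null; then
        mountpoint -q "$dir" || return 1
    fi
    return 0
}
```

With this shape, the spurious "already mounted" message never reaches stderr, while a genuine mount failure still propagates as an error.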

> Dec 15 17:31:03 Leap421-02 lrmd[1672]: notice: dlm_start_0:2168:stderr [
> mount: none is already mounted or /sys/kernel/config busy ]
> Dec 15 17:31:03 Leap421-02 lrmd[1672]: notice: finished - rsc:dlm
> action:start call_id:42 pid:2168 exit-code:0 exec-time:1135ms queue-time:0ms
> Dec 15 17:31:03 Leap421-02 crmd[1675]: notice: Operation dlm_start_0: ok
> (node=Leap421-02, call=42, rc=0, cib-update=21, confirmed=true)
> 
> 2. Meanwhile, "crm status" reports no errors
> 
> Leap421-02:~ # crm status
> Last updated: Mon Dec 14 17:06:03 2015		Last change: Mon Dec 14 13:32:37
> 2015 by root via cibadmin on Leap421-01
> Stack: corosync
> Current DC: Leap421-02 (version 1.1.13-12.2-7906df9) - partition with quorum
> 2 nodes and 11 resources configured
> 
> Online: [ Leap421-01 Leap421-02 ]
> 
> Full list of resources:
> 
>  stonith_sbd	(stonith:external/sbd):	Started Leap421-02
>  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>      Masters: [ Leap421-02 ]
>      Slaves: [ Leap421-01 ]
>  Resource Group: g_nfs
>      p_lvm_nfs	(ocf::heartbeat:LVM):	Started Leap421-02
>      p_fs_engineering	(ocf::heartbeat:Filesystem):	Started Leap421-02
>      p_fs_sales	(ocf::heartbeat:Filesystem):	Started Leap421-02
>      p_ip_nfs	(ocf::heartbeat:IPaddr2):	Started Leap421-02
>  Clone Set: cl_exportfs_root [p_exportfs_root]
>      Started: [ Leap421-01 Leap421-02 ]
>  Clone Set: base-clone [base-group]
>      Started: [ Leap421-01 Leap421-02 ]
> 
> 
> 3. It looks like DLM is not functional, which causes a problem for
> mkfs.ocfs2
> 

I can't find any evidence that this failure is related to DLM not working normally: no lockspace is created and no lock operations happen while the ocfs2 file system is being initialized.

> Leap421-02:~ # mkfs.ocfs2 -N 32 /dev/sdd
> mkfs.ocfs2 1.8.4
> Cluster stack: classic o2cb
> Label: 
> Features: sparse extended-slotmap backup-super unwritten inline-data
> strict-journal-super xattr indexed-dirs refcount discontig-bg
> Block size: 1024 (10 bits)
> Cluster size: 4096 (12 bits)
> Volume size: 524288000 (128000 clusters) (512000 blocks)
> Cluster groups: 17 (tail covers 5120 clusters, rest cover 7680 clusters)
> Extent allocator size: 0 (0 groups)
> Journal size: 16777216
> Node slots: 32
> Creating bitmaps: done
> Initializing superblock: done
> Writing system files: done
> Writing superblock: done
> Writing backup superblock: 0 block(s)
> Formatting Journals: mkfs.ocfs2: Unable to find available bit while
> formatting journal "journal:0031"

I think this is what actually leads to the initialization error.

> 
> 
> Further trouble shooting
> ==================
> 
> 1. Stop all other RAs and restart pacemaker. DLM then seems to work well.
> 
> Leap421-02:~ # crm status full
> Last updated: Thu Dec 17 14:52:28 2015		Last change: Thu Dec 17 13:51:49
> 2015 by root via cibadmin on Leap421-02
> Stack: corosync
> Current DC: Leap421-01 (version 1.1.13-12.2-7906df9) - partition with quorum
> 2 nodes and 11 resources configured
> 
> Node Leap421-01: online
> 	stonith_sbd	(stonith:external/sbd):	Started
> 	dlm	(ocf::pacemaker:controld):	Started
> Node Leap421-02: online
> 	dlm	(ocf::pacemaker:controld):	Started
> 
> Inactive resources:
> 
>  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>      Stopped: [ Leap421-01 Leap421-02 ]
>  Resource Group: g_nfs
>      p_lvm_nfs	(ocf::heartbeat:LVM):	(target-role:Stopped) Stopped
>      p_fs_engineering	(ocf::heartbeat:Filesystem):	(target-role:Stopped)
> Stopped
>      p_fs_sales	(ocf::heartbeat:Filesystem):	(target-role:Stopped) Stopped
>      p_ip_nfs	(ocf::heartbeat:IPaddr2):	(target-role:Stopped) Stopped
>  Clone Set: cl_exportfs_root [p_exportfs_root]
>      Stopped: [ Leap421-01 Leap421-02 ]
> 
> Operations:
> * Node Leap421-01:
>    stonith_sbd: migration-threshold=1000000
>     + (11) start: last-rc-change='Tue Dec 15 17:30:59 2015' last-run='Tue
> Dec 15 17:30:59 2015' exec-time=1760ms queue-time=0ms rc=0 (ok)
>    p_drbd_nfs: migration-threshold=1000000
>     + (46) monitor: interval=15000ms last-rc-change='Tue Dec 15 17:31:02
> 2015' exec-time=94ms queue-time=0ms rc=8 (master)
>     + (73) stop: last-rc-change='Thu Dec 17 13:50:30 2015' last-run='Thu Dec
> 17 13:50:30 2015' exec-time=352ms queue-time=0ms rc=0 (ok)
>    p_lvm_nfs: migration-threshold=1000000
>     + (49) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03
> 2015' exec-time=21ms queue-time=0ms rc=0 (ok)
>     + (67) stop: last-rc-change='Thu Dec 17 13:50:14 2015' last-run='Thu Dec
> 17 13:50:14 2015' exec-time=356ms queue-time=0ms rc=0 (ok)
>    p_fs_engineering: migration-threshold=1000000
>     + (51) monitor: interval=10000ms last-rc-change='Tue Dec 15 17:31:03
> 2015' exec-time=38ms queue-time=0ms rc=0 (ok)
>     + (65) stop: last-rc-change='Thu Dec 17 13:50:14 2015' last-run='Thu Dec
> 17 13:50:14 2015' exec-time=115ms queue-time=0ms rc=0 (ok)
>    p_fs_sales: migration-threshold=1000000
>     + (53) monitor: interval=10000ms last-rc-change='Tue Dec 15 17:31:03
> 2015' exec-time=36ms queue-time=0ms rc=0 (ok)
>     + (63) stop: last-rc-change='Thu Dec 17 13:50:13 2015' last-run='Thu Dec
> 17 13:50:13 2015' exec-time=701ms queue-time=0ms rc=0 (ok)
>    p_ip_nfs: migration-threshold=1000000
>     + (55) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03
> 2015' exec-time=29ms queue-time=0ms rc=0 (ok)
>     + (61) stop: last-rc-change='Thu Dec 17 13:50:13 2015' last-run='Thu Dec
> 17 13:50:13 2015' exec-time=34ms queue-time=0ms rc=0 (ok)
>    p_exportfs_root: migration-threshold=1000000
>     + (43) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:01
> 2015' exec-time=20ms queue-time=0ms rc=0 (ok)
>     + (57) stop: last-rc-change='Thu Dec 17 13:49:39 2015' last-run='Thu Dec
> 17 13:49:39 2015' exec-time=41ms queue-time=0ms rc=0 (ok)
>    dlm: migration-threshold=1000000
>     + (74) start: last-rc-change='Thu Dec 17 13:51:48 2015' last-run='Thu
> Dec 17 13:51:48 2015' exec-time=1426ms queue-time=0ms rc=0 (ok)
>     + (75) monitor: interval=60000ms last-rc-change='Thu Dec 17 13:51:50
> 2015' exec-time=20ms queue-time=0ms rc=0 (ok)
> * Node Leap421-02:
>    p_drbd_nfs: migration-threshold=1000000
>     + (44) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:03
> 2015' exec-time=82ms queue-time=0ms rc=0 (ok)
>     + (54) stop: last-rc-change='Thu Dec 17 13:50:31 2015' last-run='Thu Dec
> 17 13:50:31 2015' exec-time=302ms queue-time=0ms rc=0 (ok)
>    p_exportfs_root: migration-threshold=1000000
>     + (41) monitor: interval=30000ms last-rc-change='Tue Dec 15 17:31:02
> 2015' exec-time=23ms queue-time=0ms rc=0 (ok)
>     + (47) stop: last-rc-change='Thu Dec 17 13:49:40 2015' last-run='Thu Dec
> 17 13:49:40 2015' exec-time=46ms queue-time=0ms rc=0 (ok)
>    dlm: migration-threshold=1000000
>     + (55) start: last-rc-change='Thu Dec 17 13:51:49 2015' last-run='Thu
> Dec 17 13:51:49 2015' exec-time=1136ms queue-time=0ms rc=0 (ok)
>     + (56) monitor: interval=60000ms last-rc-change='Thu Dec 17 13:51:50
> 2015' exec-time=22ms queue-time=0ms rc=0 (ok)
> 
> 
> 2. journalctl reports no suspicious errors.
> 
> journalctl -n400 -u pacemaker.service  > tt
> 
> [..]
> 
> Dec 17 13:51:49 Leap421-02 stonith-ng[1671]: notice: Relying on watchdog
> integration for fencing
> Dec 17 13:51:49 Leap421-02 lrmd[1672]: notice: executing - rsc:dlm
> action:start call_id:55
> Dec 17 13:51:49 Leap421-02 dlm_controld[19559]: 159703 dlm_controld 4.0.2
> started
> Dec 17 13:51:50 Leap421-02 lrmd[1672]: notice: finished - rsc:dlm
> action:start call_id:55 pid:19539 exit-code:0 exec-time:1136ms queue-time:0ms
> Dec 17 13:51:50 Leap421-02 crmd[1675]: notice: Operation dlm_start_0: ok
> (node=Leap421-02, call=55, rc=0, cib-update=27, confirmed=true)
> Dec 17 13:55:28 Leap421-02 dlm_controld[19559]: 159922 uevent message has 3
> args

Since it can't be reproduced anymore, I suggest closing it for now. It can be reopened if it becomes reproducible again.
Comment 3 Roger Zhou 2015-12-29 02:42:01 UTC
(In reply to Lidong Zhong from comment #2)
> The error message here appears because the check finds no configfs mount,
> yet by the time the RA runs mount, configfs has already been mounted. So I
> guess it is a timing issue that leads to the error.
> 
> However, this doesn't affect the final result: the dlm_controld daemon is
> started successfully.
> 

[...]

> Since it can't be reproduced anymore, I suggest closing it for now. It can be reopened if it becomes reproducible again.

Reasonable explanation. Looking further, I treat this as a valid but trivial bug: the error handling can be improved so that it stops misleading the user. There is no value in reporting this false-positive message, and it can easily be fixed.
Comment 4 Lidong Zhong 2015-12-29 05:00:28 UTC
Hi Yan, 

Please help review this patch. Since the dlm kernel module depends on configfs, configfs will be auto-loaded when the dlm module is loaded.
What's your opinion? Thanks


diff --git a/extra/resources/controld b/extra/resources/controld
index a0eb8f2..04a1e67 100644
--- a/extra/resources/controld
+++ b/extra/resources/controld
@@ -143,19 +143,6 @@ controld_start() {
       *) return $OCF_ERR_GENERIC;;
     esac
 
-        if [ ! -e $OCF_RESKEY_configdir ]; then
-           modprobe configfs
-           if [ ! -e $OCF_RESKEY_configdir ]; then
-               ocf_log err "$OCF_RESKEY_configdir not available"
-                      return $OCF_ERR_INSTALLED
-           fi
-       fi
-
-       mount | grep "type configfs" > /dev/null
-       if [ $? != 0 ]; then
-          mount -t configfs none $OCF_RESKEY_configdir    
-       fi
-
        if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
            modprobe dlm
           if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
Comment 5 Yan Gao 2016-01-08 17:35:28 UTC
(In reply to Lidong Zhong from comment #4)
> Hi Yan, 
> 
> Please help to review this patch. Since dlm kernel module depends on
> configfs, configfs will be auto loaded when loading the dlm module.
> What's your opinion? Thanks
> 
> 
> diff --git a/extra/resources/controld b/extra/resources/controld
> index a0eb8f2..04a1e67 100644
> --- a/extra/resources/controld
> +++ b/extra/resources/controld
> @@ -143,19 +143,6 @@ controld_start() {
>        *) return $OCF_ERR_GENERIC;;
>      esac
>  
> -        if [ ! -e $OCF_RESKEY_configdir ]; then
> -           modprobe configfs
> -           if [ ! -e $OCF_RESKEY_configdir ]; then
> -               ocf_log err "$OCF_RESKEY_configdir not available"
> -                      return $OCF_ERR_INSTALLED
> -           fi
> -       fi
> -
> -       mount | grep "type configfs" > /dev/null
> -       if [ $? != 0 ]; then
> -          mount -t configfs none $OCF_RESKEY_configdir    
> -       fi
> -
>         if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
>             modprobe dlm
>            if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
I'm not sure whether the dependency is automatically resolved on old kernels. Given that this logic has existed for years, wouldn't removing it introduce a potential compatibility issue?

I'm curious where this stderr message:

"[ mount: none is already mounted or /sys/kernel/config busy ]"

is from. Given the logic in the RA, it won't perform "modprobe configfs" again if configfs already exists, and it won't perform "mount" again if configfs is already mounted, right?
Comment 6 Lidong Zhong 2016-01-11 03:22:46 UTC
(In reply to Yan Gao from comment #5)
> (In reply to Lidong Zhong from comment #4)
> > Hi Yan, 

Thanks for your reply :)

snip

> > -
> > -       mount | grep "type configfs" > /dev/null
> > -       if [ $? != 0 ]; then
> > -          mount -t configfs none $OCF_RESKEY_configdir    
> > -       fi
> > -
> >         if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
> >             modprobe dlm
> >            if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
> I'm not sure if the dependencies are automatically resolved in old kernels.
> Given that the logic has been existing for years, this probably would
> introduce potential compatibility issue?
> 
> I'm curious where this stderror message:
> 
> "[ mount: none is already mounted or /sys/kernel/config busy ]"

Once configfs is already mounted, this error message is printed when running this command:

mount -t configfs none $OCF_RESKEY_configdir

So it seems it's a timing issue.

> 
> is from. Given the logic in the RA, it won't perform "modprobe configfs"
> again if configfs already exists, it won't perform "mount" again if configfs
> is already mounted, right?

Yes, that's right. Basically, the code I am removing shouldn't cause any problem for the logic; it looks more like a protection. 
Anyway, please ignore the patch for now. I will close this bug; we can reopen it once we can reproduce the problem.

BTW, I just checked the history of dlm and found that even the first kernel patch already depended on configfs, so I think a compatibility issue is quite unlikely.
Comment 7 Yan Gao 2016-01-11 12:06:39 UTC
(In reply to Lidong Zhong from comment #6)
> (In reply to Yan Gao from comment #5)
> > (In reply to Lidong Zhong from comment #4)
> > > Hi Yan, 
> 
> Thanks for your reply :)
> 
> snip
> 
> > > -
> > > -       mount | grep "type configfs" > /dev/null
> > > -       if [ $? != 0 ]; then
> > > -          mount -t configfs none $OCF_RESKEY_configdir    
> > > -       fi
> > > -
> > >         if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
> > >             modprobe dlm
> > >            if [ ! -e $OCF_RESKEY_configdir/dlm ]; then
> > I'm not sure if the dependencies are automatically resolved in old kernels.
> > Given that the logic has been existing for years, this probably would
> > introduce potential compatibility issue?
> > 
> > I'm curious where this stderror message:
> > 
> > "[ mount: none is already mounted or /sys/kernel/config busy ]"
> 
> After the configfs is already mounted , this error message will be printed
> out when running this command:
> 
> mount -t configfs none $OCF_RESKEY_configdir
> 
> So it seems it's a timing issue.
Hmm, probably. Sounds like "modprobe configfs" itself ends up mounting configfs as well, right? So there is a pretty short time window between

"modprobe configfs"

and the test

mount | grep "type configfs"

during which the mount doesn't appear yet.
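A side note on the test itself: grepping human-oriented `mount` output is fragile. A hedged sketch that checks the fstype field of /proc/mounts instead; configfs_mounted is a hypothetical helper, and the optional file argument exists only so the logic can be exercised against a sample file:

```shell
# Hypothetical helper: detect a configfs mount by reading /proc/mounts,
# whose whitespace-separated fields are:
#   device mountpoint fstype options dump pass
configfs_mounted() {
    awk '$3 == "configfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}
```

Matching on the third field avoids false positives from device names or mount points that merely contain the string "configfs".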