Bug 743213

Summary: InfiniBand network interface initialization hangs (caused by systemd?)
Product: [openSUSE] openSUSE 12.1
Reporter: Bart Van Assche <bart.vanassche>
Component: Basesystem
Assignee: Frederic Crozat <fcrozat>
Status: RESOLVED DUPLICATE
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: SUSE Other
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Contents of /var/log/messages with systemd logging enabled
Contents of /var/log/messages with systemd logging enabled (2)
Contents of /var/log/messages with systemd logging enabled
Output of "systemd-analyze plot"
/etc/init.d/openibd script used during this test
/var/log/messages with systemd-37-3.149.1.x86_64
Contents of /var/log/messages with systemd logging enabled
/etc/init.d/openibd script used during this test

Description Bart Van Assche 2012-01-24 20:02:48 UTC
Created attachment 472566
Contents of /var/log/messages with systemd logging enabled

I have been using the OFED startup scripts on openSUSE for a long time. They worked fine with openSUSE 11.4 and earlier but not with openSUSE 12.1 (installed from scratch). If I configure the openibd and opensmd scripts to run during startup, startup hangs before switching from console mode to the X11 desktop.

Reproducible: Always

Steps to Reproduce:
1. Install openSUSE 12.1.
2. Download, configure, build and install OFED 1.5.4 (http://www.openfabrics.org/downloads/OFED/ofed-1.5.4/OFED-1.5.4.tgz).
3. Enable the OFED scripts to run during boot:
chkconfig -s openibd on
chkconfig -s opensmd 235
4. Reboot.

Actual Results:  
Startup never completes.

Expected Results:  
System boots normally and the graphical (KDE) desktop appears after a reasonable time.

This can probably only be reproduced on a system with at least one InfiniBand HCA.

# head /etc/sysconfig/network/ifcfg-ib*
==> /etc/sysconfig/network/ifcfg-ib0 <==
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='192.168.5.1/24'
MTU=''
NAME='MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'

==> /etc/sysconfig/network/ifcfg-ib1 <==
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='192.168.6.1/24'
MTU=''
NAME='MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
Comment 1 Bart Van Assche 2012-01-24 20:03:57 UTC
Note: the OFED initialization scripts work fine if started after startup has finished:

# time { /etc/init.d/openibd start; /etc/init.d/opensmd start; }
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
    ib0       device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
Bringing up interface ib0:                                 [  OK  ]
    ib1       device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
Bringing up interface ib1:                                 [  OK  ]
Setting up service network . . .                           [  done  ]
redirecting to systemctl

real    0m4.917s
user    0m0.070s
sys     0m0.960s
Comment 2 Bart Van Assche 2012-01-25 11:52:46 UTC
Note: apparently systemd got the startup order wrong. openibd must be started before opensmd, while systemd decided to start these services in the reverse order. From /var/log/messages:

Jan 24 19:34:53 asus kernel[1]: Installed new job opensmd.service/start as 84
Jan 24 19:34:53 asus kernel[1]: Installed new job openibd.service/start as 88
Comment 3 Bart Van Assche 2012-01-26 19:03:06 UTC
My system starts normally again after removing openibd from Should-Start in network, adding $network as Required-Start in openibd, and adding openibd as Required-Start for opensmd:

# grep -E 'Required-Start:|Should-Start:' network openibd opensmd
network:# Required-Start:       $local_fs
network:# Should-Start:         earlysyslog isdn SuSEfirewall2_init
openibd:# Required-Start: $local_fs $network
opensmd:# Required-Start: $syslog openibd

So why is openibd specified as a Should-Start dependency of network in sysconfig-0.75.4-2.2.2.x86_64.rpm? Is that correct?
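
For reference, the ordering described in this comment could be expressed in the LSB header of /etc/init.d/opensmd roughly as follows. This is only a sketch: the Required-Start line is taken from the grep output above and the runlevels from the chkconfig command in the description, while the Provides, Required-Stop and Default-Stop values are assumptions and were not copied from the OFED script.

```shell
# Sketch of an LSB header for /etc/init.d/opensmd.
# Assumptions (not from the OFED script): Provides, Required-Stop, Default-Stop.
### BEGIN INIT INFO
# Provides:       opensmd
# Required-Start: $syslog openibd
# Required-Stop:  $syslog openibd
# Default-Start:  2 3 5
# Default-Stop:   0 1 6
### END INIT INFO
```

With a header like this, opensmd should be ordered after openibd instead of before it.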
Comment 4 Frederic Crozat 2012-01-30 09:17:29 UTC
Please reboot after adding "systemd.log_level=debug systemd.log_target=kmsg" to the kernel command line, and attach the dmesg output to this bug report.
Comment 5 Bart Van Assche 2012-01-30 19:12:59 UTC
Created attachment 473319
Contents of /var/log/messages with systemd logging enabled (2)

Contents of /var/log/messages for the following system configuration:
* Unmodified /etc/init.d/network (rpm --verify sysconfig-0.75.4-2.2.2.x86_64; echo $? gives 0 as output).
* openibd has $local_fs as Required-Start
* opensmd has $local_fs, $network and openibd as Required-Start
Comment 6 Frederic Crozat 2012-01-31 10:33:56 UTC
openibd startup times out.

From a quick look at the "initscript" shipped by OFED, it seems completely broken and doesn't integrate properly with openSUSE initscripts (it does not source /etc/rc.status, which breaks systemd integration when the script is called manually).

It is not clear if this script is supposed to start a daemon or not.

Try adding the following to the LSB header (it requires systemd from the 12.1 maintenance update to work properly):
# X-Systemd-RemainAfterExit: true
Comment 7 Bart Van Assche 2012-02-01 18:35:12 UTC
Created attachment 473854
Contents of /var/log/messages with systemd logging enabled

Even after adding "X-Systemd-RemainAfterExit: true", there is still a ten-minute delay during startup with openibd enabled. I also noticed a strange message while enabling openibd:

asus:~ # systemctl --system daemon-reload
asus:~ # systemctl disable openibd.service
openibd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig openibd off
asus:~ # systemctl enable openibd.service
openibd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig openibd on
Warning: unit files do not carry install information. No operation executed.
# find /etc/init.d/rc* -name \*openibd
/etc/init.d/rc2.d/S01openibd
/etc/init.d/rc2.d/K10openibd
/etc/init.d/rc3.d/S01openibd
/etc/init.d/rc3.d/K10openibd
/etc/init.d/rc5.d/S01openibd
/etc/init.d/rc5.d/K10openibd
Comment 8 Bart Van Assche 2012-02-01 18:38:00 UTC
Created attachment 473855
Output of "systemd-analyze plot"
Comment 9 Bart Van Assche 2012-02-01 18:41:25 UTC
Created attachment 473863
/etc/init.d/openibd script used during this test

Please note that I've attempted to make the openibd startup script LSB-compliant (see attachment).
Comment 10 Bart Van Assche 2012-02-01 18:44:27 UTC
(In reply to comment #6)
> It is not clear if this script is supposed to start a daemon or not.

As far as I know, the openibd startup script does not start any daemons. It loads several kernel modules, adjusts some system settings and brings up the IPoIB network interfaces.
Comment 11 Frederic Crozat 2012-02-08 17:42:45 UTC
Could you:
- move the "# X-Systemd-RemainAfterExit" line above the "# Description" line (otherwise, it is ignored)
- install the latest version of the systemd package (from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ )
- edit /etc/init.d/openibd and add -x to the first line:
#!/bin/bash -x

and attach /var/log/messages after booting?
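
The requested changes to /etc/init.d/openibd might look roughly like the sketch below. The Required-Start value and the runlevels follow what was reported earlier in this bug; the Provides, Required-Stop and Default-Stop values and the Description text are assumptions, not copied from the OFED script.

```shell
#!/bin/bash -x
# Sketch of the top of /etc/init.d/openibd with the changes requested above.
# Assumptions: Provides, Required-Stop, Default-Stop and the Description text.
### BEGIN INIT INFO
# Provides:       openibd
# Required-Start: $local_fs
# Required-Stop:  $local_fs
# Default-Start:  2 3 5
# Default-Stop:   0 1 6
# X-Systemd-RemainAfterExit: true
# Description:    Load the InfiniBand drivers and bring up the IPoIB interfaces
### END INIT INFO
```

The point of the ordering is that the X-Systemd-RemainAfterExit line comes before the Description line, since it is ignored otherwise.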
Comment 12 Bart Van Assche 2012-02-09 17:38:02 UTC
Created attachment 475354
/var/log/messages with systemd-37-3.149.1.x86_64

Unfortunately, startup still takes 10 minutes with openibd enabled.
Comment 13 Frederic Crozat 2012-02-10 10:09:33 UTC
From the log, it looks like modprobe mlx4_core is blocking in openibd core, so it doesn't seem systemd-related; systemd just highlights the issue.
Comment 14 Bart Van Assche 2012-02-10 11:09:11 UTC
(In reply to comment #13)
> From the log, it looks like modprobe mlx4_core is blocking in openibd core, so
> it doesn't seem systemd-related; systemd just highlights the issue.

Are you sure? From /var/log/messages:

Feb  9 16:42:58 asus openibd[875]: + /sbin/modprobe mlx4_ib
Feb  9 16:42:58 asus openibd[875]: + /sbin/modprobe mlx4_en
Feb  9 16:42:59 asus kernel[875]: + /sbin/modprobe mlx4_core

So before the openibd startup script issues the shell command "/sbin/modprobe mlx4_core", the modules mlx4_ib and mlx4_en have already been loaded successfully. Since both of these modules depend on mlx4_core, loading either of them also loads mlx4_core. So how can the "/sbin/modprobe mlx4_core" command hang when that module has already been loaded?
Comment 15 Bart Van Assche 2012-02-10 11:45:50 UTC
After having had a closer look at /var/log/messages, my conclusion is not only that the messages in /var/log/messages are out of order but also that the timestamps are out of order. There are several messages in the log that show that loading both mlx4_ib and mlx4_en succeeded, so loading mlx4_core must also have succeeded and that last module must have been loaded before mlx4_ib and mlx4_en.

So my conclusion is that the command that caused /etc/init.d/openibd to hang is:

Feb  9 16:42:58 asus openibd[875]: + /sbin/ifup ib0
Feb  9 16:42:58 asus openibd[875]: ib0       device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)

I'm not sure, though, what ifup could be waiting for, given that a static IP address is being assigned to interface ib0.
Comment 16 Frederic Crozat 2012-02-10 11:50:57 UTC
Well, something is blocking and from the log, it seems to be mlx4_core but it could be something else.

Please attach the dmesg output rather than /var/log/messages.
Comment 17 Bart Van Assche 2012-02-10 18:31:15 UTC
Created attachment 475662
Contents of /var/log/messages with systemd logging enabled
Comment 18 Bart Van Assche 2012-02-10 18:32:08 UTC
Created attachment 475664
/etc/init.d/openibd script used during this test
Comment 19 Bart Van Assche 2012-02-10 18:53:40 UTC
I've inserted "strace -f -tt" in front of ifup in /etc/init.d/openibd, rebooted and attached the resulting system log. This is what I have derived from the output of "grep -E 'openibd.*execve|openibd.*clone' var-log-messages.txt":
/etc/init.d/openibd
-> /sbin/ifup ib0
  -> [1095] scripts/ifup-sysctl ib0
  -> [1102] scripts/ifup-wireless ib0
  -> [1112] scripts/ifup-infiniband ib0
    -> [1114] /sbin/ip link set up dev ib0
    -> [1117] /sbin/ip addr add dev ib0 local 192.168.5.1/24 ...
  -> [1122] /etc/sysconfig/network/scripts/ifup-route
  -> [1131] scripts/ifup-services
  -> [1134] if-up.d/21-cifs
    -> [1139] /usr/sbin/rcnmb start
    -> [1139] /bin/systemctl start nmb.service       <- hangs

I'm not sure why that last command hangs.
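
The filtering step above can be sketched as a small shell helper. This is an illustration only: the function name list_openibd_execs is hypothetical, it assumes the strace output ends up in the log tagged with "openibd", and it only extracts the execve targets (the clone lines matched by the grep above are not handled).

```shell
# Sketch: print the path of every program exec'ed under openibd, in log order.
# Assumes strace lines of the form: ... openibd[PID]: ... execve("/path", ...
# $1 is a saved copy of the system log, e.g. var-log-messages.txt.
list_openibd_execs() {
    grep -E 'openibd.*execve' "$1" |
        sed -E 's/.*execve\("([^"]+)".*/\1/'
}
```

Calling list_openibd_execs var-log-messages.txt prints one path per line; the last entry is the command to look at first when the script hangs.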
Comment 20 Bart Van Assche 2012-02-11 10:07:55 UTC
I've also noticed the following:
* "systemctl restart network.service" runs fine if STARTMODE is set to manual for all InfiniBand interfaces (in /etc/sysconfig/network/ifcfg-ib*).
* "systemctl restart network.service" hangs if STARTMODE is set to auto for at least one InfiniBand interface.

My conclusion is that this issue is not caused by the OFED startup script but is probably some interaction between ifup and systemd. The versions installed here of these packages are:
# type ifup
ifup is /sbin/ifup
# rpm -qf /sbin/ifup
sysconfig-0.75.4-2.5.1.x86_64
# rpm --verify sysconfig-0.75.4-2.5.1.x86_64; echo $?
0
# type systemd 
systemd is /bin/systemd
# rpm -qf /bin/systemd
systemd-37-3.149.1.x86_64
# rpm --verify systemd-37-3.149.1.x86_64; echo $?
0
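
The STARTMODE comparison above can be repeated with a helper along these lines. It is a sketch only: the function name set_startmode is hypothetical, and it assumes GNU sed (for -i) and a single STARTMODE= line per ifcfg file.

```shell
# Sketch: set STARTMODE in one ifcfg file to the given mode (e.g. manual or auto).
# $1 is the mode, $2 is the file, e.g. /etc/sysconfig/network/ifcfg-ib0.
# Assumes GNU sed; the file is edited in place.
set_startmode() {
    sed -i -E "s/^STARTMODE=.*/STARTMODE='$1'/" "$2"
}

# Example (run as root), followed by "systemctl restart network.service":
#   for f in /etc/sysconfig/network/ifcfg-ib*; do set_startmode manual "$f"; done
```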
Comment 21 Frederic Crozat 2012-02-13 09:54:35 UTC
Hanging on samba is bug #725503

Could you install the samba-client package from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ ? This might explain the entire lock-up of the openibd service (since it starts the network stack).
Comment 22 Bart Van Assche 2012-02-14 20:24:00 UTC
# rpm -q samba-client
samba-client-3.6.1-34.7.1.x86_64

# time systemctl start openibd.service

real    0m5.640s
user    0m0.000s
sys     0m0.000s

Seems to help - I'll test this further.
Comment 23 Bart Van Assche 2012-02-15 17:44:15 UTC
Startup also works fine with openibd and opensmd enabled - thanks!
Comment 24 Frederic Crozat 2012-02-16 08:47:47 UTC
Marked as a duplicate of the samba bug.

*** This bug has been marked as a duplicate of bug 732395 ***