Bugzilla – Full Text Bug Listing

| Summary: | InfiniBand network interface initialization hangs (caused by systemd ?) | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 12.1 | Reporter: | Bart Van Assche <bart.vanassche> |
| Component: | Basesystem | Assignee: | Frederic Crozat <fcrozat> |
| Status: | RESOLVED DUPLICATE | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P5 - None | | |
| Version: | Final | | |
| Target Milestone: | --- | | |
| Hardware: | x86-64 | | |
| OS: | SUSE Other | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |

Attachments:
* Contents of /var/log/messages with systemd logging enabled
* Contents of /var/log/messages with systemd logging enabled (2)
* Contents of /var/log/messages with systemd logging enabled
* Output of "systemd-analyze plot"
* /etc/init.d/openibd script used during this test
* /var/log/messages with systemd-37-3.149.1.x86_64
* Contents of /var/log/messages with systemd logging enabled
* /etc/init.d/openibd script used during this test
Description
Bart Van Assche
2012-01-24 20:02:48 UTC
Note: the OFED initialization scripts work fine if started after startup has finished:
# time { /etc/init.d/openibd start; /etc/init.d/opensmd start; }
Loading HCA driver and Access Layer: [ OK ]
Setting up InfiniBand network interfaces:
ib0 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
Bringing up interface ib0: [ OK ]
ib1 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
Bringing up interface ib1: [ OK ]
Setting up service network . . . [ done ]
redirecting to systemctl
real 0m4.917s
user 0m0.070s
sys 0m0.960s
Note: apparently systemd got the startup order wrong. openibd must be started before opensmd, but systemd decided to start these services in the reverse order. From /var/log/messages:
Jan 24 19:34:53 asus kernel[1]: Installed new job opensmd.service/start as 84
Jan 24 19:34:53 asus kernel[1]: Installed new job openibd.service/start as 88

My system boots normally again after removing openibd from Should-Start in network, adding $network to Required-Start in openibd, and adding openibd to Required-Start in opensmd:
# grep -E 'Required-Start:|Should-Start:' network openibd opensmd
network:# Required-Start: $local_fs
network:# Should-Start: earlysyslog isdn SuSEfirewall2_init
openibd:# Required-Start: $local_fs $network
opensmd:# Required-Start: $syslog openibd

So why is openibd specified as a Should-Start dependency of network in sysconfig-0.75.4-2.2.2.x86_64.rpm? Is that correct?

Please reboot with "systemd.log_level=debug systemd.log_target=kmsg" added to the kernel command line and attach the dmesg output to this bug report.

Created attachment 473319 [details]
Contents of /var/log/messages with systemd logging enabled (2)
Contents of /var/log/messages for the following system configuration:
* Unmodified /etc/init.d/network (rpm --verify sysconfig-0.75.4-2.2.2.x86_64; echo $? gives 0 as output).
* openibd has $local_fs as Required-Start
* opensmd has $local_fs, $network and openibd as Required-Start
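For reference, the dependency setup listed above maps onto LSB comment headers roughly as follows. This is a sketch only: the Required-Start values come from the comment above, while Provides, Required-Stop, Default-Start/Default-Stop (chosen to match runlevels 2, 3 and 5, where openibd's rc links appear later in this report) and Short-Description are assumptions, not the contents of the actual OFED scripts.

```bash
# Sketch only: LSB headers matching the configuration described above.
# Fields other than Required-Start are assumed, not taken from OFED.

# /etc/init.d/openibd
### BEGIN INIT INFO
# Provides:          openibd
# Required-Start:    $local_fs
# Required-Stop:     $local_fs
# Default-Start:     2 3 5
# Default-Stop:      0 1 6
# Short-Description: Load InfiniBand drivers and bring up IPoIB interfaces
### END INIT INFO

# /etc/init.d/opensmd: listing openibd here orders opensmd after it.
### BEGIN INIT INFO
# Provides:          opensmd
# Required-Start:    $local_fs $network openibd
# Required-Stop:     $local_fs $network openibd
# Default-Start:     2 3 5
# Default-Stop:      0 1 6
# Short-Description: Start the OpenSM InfiniBand subnet manager
### END INIT INFO
```

Listing openibd in opensmd's Required-Start is what expresses the "openibd before opensmd" ordering the reporter needed.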
The openibd startup times out. From a quick look at the init script shipped by OFED, it seems completely broken and doesn't integrate properly with openSUSE init scripts (it doesn't source /etc/rc.status, which will break systemd integration when the script is called manually). It is not clear if this script is supposed to start a daemon or not. Try adding this to the LSB header (it requires systemd from the 12.1 maintenance update to work properly):
# X-Systemd-RemainAfterExit: true

Created attachment 473854 [details]
Contents of /var/log/messages with systemd logging enabled
Even after adding "X-Systemd-RemainAfterExit: true" there is still a ten-minute delay during startup with openibd enabled. I also noticed a strange message while enabling openibd:
asus:~ # systemctl --system daemon-reload
asus:~ # systemctl disable openibd.service
openibd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig openibd off
asus:~ # systemctl enable openibd.service
openibd.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig openibd on
Warning: unit files do not carry install information. No operation executed.
# find /etc/init.d/rc* -name \*openibd
/etc/init.d/rc2.d/S01openibd
/etc/init.d/rc2.d/K10openibd
/etc/init.d/rc3.d/S01openibd
/etc/init.d/rc3.d/K10openibd
/etc/init.d/rc5.d/S01openibd
/etc/init.d/rc5.d/K10openibd
Created attachment 473855 [details]
Output of "systemd-analyze plot"
Created attachment 473863 [details]
/etc/init.d/openibd script used during this test
Please note that I've attempted to make the openibd startup script LSB-compliant (see attachment).
(In reply to comment #6)
> It is not clear if this script is supposed to start a daemon or not.

As far as I know the openibd startup script does not start any daemons. What it does is load several kernel modules, adjust some system settings and bring up the IPoIB network interfaces.

Could you:
* move the "# X-Systemd-RemainAfterExit" line above the "# Description" line (otherwise, it is ignored);
* install the latest version of the systemd package (from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ );
* edit /etc/init.d/openibd and add -x to the first line: #!/bin/bash -x
Then attach /var/log/messages after booting.

Created attachment 475354 [details]
/var/log/messages with systemd-37-3.149.1.x86_64
Unfortunately, startup still takes 10 minutes with openibd enabled.
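Combining the suggestions from this thread (source /etc/rc.status, and put X-Systemd-RemainAfterExit above Description), a minimal openSUSE-style skeleton for openibd might look like the fragment below. It is a sketch under assumptions: the rc_reset, rc_status -v and rc_exit calls follow the usual openSUSE skeleton convention, header values other than X-Systemd-RemainAfterExit are illustrative, and the real work of the OFED script is only indicated by comments. As the report shows, these changes alone did not remove the ten-minute delay.

```bash
#!/bin/bash -x
# -x: trace every command into the log (debugging aid requested above).
### BEGIN INIT INFO
# Provides:          openibd
# Required-Start:    $local_fs
# Default-Start:     2 3 5
# Default-Stop:      0 1 6
# X-Systemd-RemainAfterExit: true
# Description:       Loads the InfiniBand kernel modules, adjusts some
#                    system settings and brings up the IPoIB interfaces.
### END INIT INFO

# openSUSE init script helpers (rc_reset, rc_status, rc_exit).
. /etc/rc.status
rc_reset

case "$1" in
    start)
        echo -n "Setting up InfiniBand"
        # (module loading, system settings and ifup of ib0/ib1 go here)
        rc_status -v
        ;;
    stop)
        echo -n "Shutting down InfiniBand"
        # (ifdown of the IPoIB interfaces and module unloading go here)
        rc_status -v
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
esac
rc_exit
```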
From the log, it looks like modprobe mlx4_core is blocking in openibd core, so it doesn't seem systemd-related; systemd just highlights the issue.

(In reply to comment #13)
> from the log, it looks like modprobe mlx4_core is blocking in openibd core, so
> it doesn't seem systemd related, it is just that systemd highlight the issue.

Are you sure? From /var/log/messages:
Feb 9 16:42:58 asus openibd[875]: + /sbin/modprobe mlx4_ib
Feb 9 16:42:58 asus openibd[875]: + /sbin/modprobe mlx4_en
Feb 9 16:42:59 asus kernel[875]: + /sbin/modprobe mlx4_core

So before the openibd startup script issues the shell command "/sbin/modprobe mlx4_core", the modules mlx4_ib and mlx4_en have already been loaded successfully. Since both of these modules depend on mlx4_core, loading either of them also loads mlx4_core. So how can the "/sbin/modprobe mlx4_core" command hang when that module has already been loaded?

After a closer look at /var/log/messages, my conclusion is that not only are the messages in /var/log/messages out of order, but their timestamps are out of order as well. Several messages in the log show that loading both mlx4_ib and mlx4_en succeeded, so loading mlx4_core must also have succeeded, and that module must have been loaded before mlx4_ib and mlx4_en. So my conclusion is that the command that caused /etc/init.d/openibd to hang is:
Feb 9 16:42:58 asus openibd[875]: + /sbin/ifup ib0
Feb 9 16:42:58 asus openibd[875]: ib0 device: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)

I'm not sure, though, what ifup could be waiting for, given that a static IP address is being assigned to interface ib0.

Well, something is blocking, and from the log it seems to be mlx4_core, but it could be something else. Please attach dmesg output and not /var/log/messages.

Created attachment 475662 [details]
Contents of /var/log/messages with systemd logging enabled
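The dependency argument in the previous comment (loading mlx4_ib or mlx4_en pulls in mlx4_core) can be checked on a running system. The helpers below are hypothetical, not part of OFED or openSUSE; they rely on two facts: `modinfo -F depends` prints a module's dependencies as a comma-separated list, and a loaded module appears as a directory under /sys/module.

```bash
#!/bin/bash
# Hypothetical helpers for checking module load state and dependencies.

# Succeeds if module $1 is currently loaded: loaded modules have a
# directory under /sys/module (sysfs uses '_' where modprobe accepts '-').
module_loaded() {
    local name=${1//-/_}
    [ -d "/sys/module/$name" ]
}

# Prints the modules that $1 depends on, one per line, from
# "modinfo -F depends" (comma-separated). Prints nothing if the module
# is unknown or has no dependencies.
module_depends() {
    modinfo -F depends "$1" 2>/dev/null | tr ',' '\n' | sed '/^$/d'
}

# Reports whether mlx4_ib / mlx4_en depend on mlx4_core and whether
# mlx4_core is loaded right now.
check_mlx4() {
    local m
    for m in mlx4_ib mlx4_en; do
        if module_depends "$m" | grep -qx mlx4_core; then
            echo "$m depends on mlx4_core"
        fi
    done
    if module_loaded mlx4_core; then
        echo "mlx4_core is loaded"
    else
        echo "mlx4_core is not loaded"
    fi
}
```

Run `check_mlx4` after the hang: if mlx4_core is already loaded, a blocking `modprobe mlx4_core` is unlikely to be the culprit, which is the reporter's conclusion.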
Created attachment 475664 [details]
/etc/init.d/openibd script used during this test
I've inserted "strace -f -tt" in front of ifup in /etc/init.d/openibd, rebooted and attached the resulting system log. This is what I have derived from the output of "grep -E 'openibd.*execve|openibd.*clone' var-log-messages.txt":
/etc/init.d/openibd
-> /sbin/ifup ib0
-> [1095] scripts/ifup-sysctl ib0
-> [1102] scripts/ifup-wireless ib0
-> [1112] scripts/ifup-infiniband ib0
-> [1114] /sbin/ip link set up dev ib0
-> [1117] /sbin/ip addr add dev ib0 local 192.168.5.1/24 ...
-> [1122] /etc/sysconfig/network/scripts/ifup-route
-> [1131] scripts/ifup-services
-> [1134] if-up.d/21-cifs
-> [1139] /usr/sbin/rcnmb start
-> [1139] /bin/systemctl start nmb.service <- hangs
I'm not sure why that last command hangs.
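The tracing step above can be reproduced with a small wrapper. This is a hypothetical helper, not part of OFED or sysconfig, and assumes strace is installed. It uses `-f` (follow child processes), `-tt` (time of day with microseconds), `-e trace=process` (record only process-management system calls such as clone and execve) and `-o FILE`, so the trace can be inspected with `tail -f` while the traced command is still hanging. The second function applies the same grep filter the comment used on the system log.

```bash
#!/bin/bash
# Hypothetical helpers for finding which child process hangs.

# Usage: trace_spawns OUTFILE COMMAND [ARGS...]
# Runs COMMAND under strace, following children, and writes only
# process-management events (clone, execve, exit, ...) to OUTFILE.
trace_spawns() {
    local out=$1
    shift
    strace -f -tt -e trace=process -o "$out" "$@"
}

# Usage: spawn_lines LOGFILE
# Keeps only the openibd lines that record execve or clone calls,
# i.e. the filter applied to var-log-messages.txt in the comment above.
spawn_lines() {
    grep -E 'openibd.*execve|openibd.*clone' "$1"
}
```

For example, `trace_spawns /tmp/ifup.trace /sbin/ifup ib0` in one terminal and `tail -f /tmp/ifup.trace` in another; the last execve without a later "+++ exited" line for the same PID points at the process that is stuck.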
I've also noticed the following:
* "systemctl restart network.service" runs fine if STARTMODE is set to manual for all InfiniBand interfaces (in /etc/sysconfig/network/ifcfg-ib*).
* "systemctl restart network.service" hangs if STARTMODE is set to auto for at least one InfiniBand interface.

My conclusion is that this issue is not caused by the OFED startup script but is probably some interaction between ifup and systemd. The versions of these packages installed here are:
# type ifup
ifup is /sbin/ifup
# rpm -qf /sbin/ifup
sysconfig-0.75.4-2.5.1.x86_64
# rpm --verify sysconfig-0.75.4-2.5.1.x86_64; echo $?
0
# type systemd
systemd is /bin/systemd
# rpm -qf /bin/systemd
systemd-37-3.149.1.x86_64
# rpm --verify systemd-37-3.149.1.x86_64; echo $?
0

Hanging on samba is bug #725503. Could you install the samba-client package from http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ ? This might explain the entire lock-up of the openibd service (since it starts the network stack).

# rpm -q samba-client
samba-client-3.6.1-34.7.1.x86_64
# time systemctl start openibd.service
real 0m5.640s
user 0m0.000s
sys 0m0.000s

Seems to help; I'll test this further.

Startup also works fine with openibd and opensmd enabled. Thanks!

Marked as duplicate of the samba bug.

*** This bug has been marked as a duplicate of bug 732395 ***
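The STARTMODE observation in the last comments can be checked mechanically. The helper below is a hypothetical sketch, not part of sysconfig. It assumes the openSUSE layout `<dir>/ifcfg-<interface>` (default /etc/sysconfig/network) with shell-syntax lines such as `BOOTPROTO='static'`, `IPADDR='192.168.5.1/24'` (the address seen in the strace output) and `STARTMODE='auto'`, and prints the InfiniBand interfaces set to auto, i.e. the ones ifup brings up when the network service starts.

```bash
#!/bin/bash
# Hypothetical helper: list InfiniBand interfaces whose ifcfg file
# sets STARTMODE to auto.
# Usage: auto_ib_interfaces [DIR]   (DIR defaults to /etc/sysconfig/network)
auto_ib_interfaces() {
    local dir=${1:-/etc/sysconfig/network}
    local f mode
    for f in "$dir"/ifcfg-ib*; do
        [ -f "$f" ] || continue   # glob matched nothing
        # Read STARTMODE in a subshell so nothing leaks into this shell.
        mode=$(unset STARTMODE; . "$f"; printf '%s' "$STARTMODE")
        if [ "$mode" = auto ]; then
            printf '%s\n' "${f##*/ifcfg-}"   # strip path and "ifcfg-" prefix
        fi
    done
}
```

Setting such interfaces to STARTMODE='manual' avoided the hang in the reporter's tests, but the actual fix turned out to be the updated samba-client package.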