Bug 883565

Summary: Race condition in systemd startup/shutdown affecting CIFS remote mounts
Product: [openSUSE] openSUSE 13.1 Reporter: Rodney Baker <rodney.baker>
Component: NetworkAssignee: systemd maintainers <systemd-maintainers>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: arvidjaar, fbui, mt, wicked-maintainers
Version: Final   
Target Milestone: ---   
Hardware: x86-64   
OS: openSUSE 13.1   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---

Description Rodney Baker 2014-06-20 13:28:02 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0

With the default service/unit-file dependencies as shipped with openSuSE 13.1, the remote-fs service is started before if-up controlled network interfaces are available, and is shut down after the network interfaces. 

This results in remote mounts that should be mounted by remote-fs.service (for example, cifs mounts defined in /etc/samba/cifstab) failing to mount at startup, and if mounted manually, not being unmounted cleanly at shutdown. The shutdown process also hangs until the unmount job times out since the network has been taken down before the remote file systems are unmounted.

To fix this locally, I had to add the line:

Requires=network@eth0.service

to /usr/lib/systemd/system/remote-fs.service.

This ensures that remote-fs.service is not started before the network interface is up and that the network interface is not stopped until remote-fs.service has exited during shutdown. 

I tried using "Requires=network.service" and "Requires=network.target" - these fixed mounting at start up but did not fix the delayed shutdown. 

Unforunately my hack is too system specific because it requires that the correct interface name is known when remote-fs.service is created. This will not be the same on every system and, in the case of multiple network interfaces, there may be multiple dependencies.

Although it works as a short-term workaround a more permanent and more generic solution is needed.

Reproducible: Always

Steps to Reproduce:
1. Define remote smb mounts in /etc/samba/cifstab
2. Have LAN interface controlled by if-up
3. Startup the system - remote cifs mounts will not be mounted.
4. Manually mount the cifs shares (systemctl restart remote-fs will do it).
5. Shutdown the systme - the system will hang during shutdown because the network interface is stopped before the cifs shares are unmounted.
Actual Results:  
As described above.

Expected Results:  
The remote-fs service should only be started after the relevant network interface(s) are started and online.

The remote shares should be unmounted before the relevant network interface(s) is/are shut down.
Comment 1 Andrei Borzenkov 2014-06-20 15:59:03 UTC
(In reply to comment #0)
> to /usr/lib/systemd/system/remote-fs.service.
> 

Please show output of

"rpm -qif /usr/lib/systemd/system/remote-fs.service"
Comment 2 Rodney Baker 2014-06-20 17:39:52 UTC
Sorry - correction - the file should be remote-fs.target (and this is where I added the Requires=network@eth0.service line.

Name        : systemd
Version     : 208
Release     : 19.1
Architecture: x86_64
Install Date: Fri Mar 14 02:51:03 2014
Group       : System/Base
Size        : 6235015
License     : LGPL-2.1+
Signature   : RSA/SHA256, Mon Mar  3 20:05:01 2014, Key ID b88b2fd43dbdc284
Source RPM  : systemd-208-19.1.src.rpm
Build Date  : Mon Feb 24 21:37:07 2014
Build Host  : build08
Relocations : (not relocatable)
Packager    : http://bugs.opensuse.org
Vendor      : openSUSE
URL         : http://www.freedesktop.org/wiki/Software/systemd
Summary     : A System and Session Manager
Description :
Systemd is a system and service manager, compatible with SysV and LSB
init scripts for Linux. systemd provides aggressive parallelization
capabilities, uses socket and D-Bus activation for starting services,
offers on-demand starting of daemons, keeps track of processes using
Linux cgroups, supports snapshotting and restoring of the system state,
maintains mount and automount points and implements an elaborate
transactional dependency-based service control logic. It can work as a
drop-in replacement for sysvinit.
Distribution: openSUSE 13.1


My edited remote-fs.target unit file is as follows:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Remote File Systems
Documentation=man:systemd.special(7)
After=remote-fs-pre.target network.target
DefaultDependencies=no
Conflicts=shutdown.target
Requires=network.service network@eth0.service

Note that I first added Requires=network.service but this did not solve the problem - it was necessary to require the interface. I did not try network-online.target though...
Comment 3 Andrei Borzenkov 2014-06-20 17:54:49 UTC
(In reply to comment #2)
> Sorry - correction - the file should be remote-fs.target

So what service mounts filesystems from /etc/samba/cifstab? remote-fs.target is pure synchronization point and does not do anything by itself.
Comment 4 Andrei Borzenkov 2014-06-21 16:30:42 UTC
Could you test updated package. It should make legacy services to be started after network-online.target; network@xxx.service are ordered before it by default.

zypper add obs://home:arvidjaar:bnc:883565/standard bnc883565
zypper refresh bnc883565
zypper dup -r bnc883565

And revert your change in remote-fs.target (should be overwritten on update anyway).
Comment 5 Rodney Baker 2014-06-22 10:12:33 UTC
OK, thanks. Please give me a day or so to test and respond.
Comment 6 Rodney Baker 2014-06-22 10:34:32 UTC
Actually I got it done quicker. Happy to report that the fix appears to have resolved the issue for me. A couple of shutdown/startup cycles and the CIFS shares are mounted and unmounted correctly each time. Thank you!
Comment 7 Andrei Borzenkov 2014-06-22 15:04:54 UTC
@systemd-maintainers - this is fixed upstream in commit 0404c609f399b2092a3de52eef9d75b0dc12e94c which makes legacy scripts that need $network be ordered after network-online.target. network@.service is already ordered before network-onlilne.target.

For factory request 238278 contains backport of this commit (it does not apply directly). For 13.1 I believe we need patch as well.
Comment 8 Dr. Werner Fink 2014-06-23 08:23:48 UTC
(In reply to comment #7)

This requires that in wicked.service there is a line

     Wants=network-online.target

as otherwise we get into trouble with OS 13.2 and SLES12

Marius?
Comment 9 Andrei Borzenkov 2014-06-23 09:45:57 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> This requires that in wicked.service there is a line
> 
>      Wants=network-online.target
> 

No, it does not. Systemd automatically adds Wants=network-online.target to legacy sysvinit services. That's the main point of upstream patch.

This requires that network-online.target comes after wicked services; I assumed this is already the case?
Comment 10 Dr. Werner Fink 2014-06-23 12:15:46 UTC
This will interfere with 0018-Make-LSB-Skripts-know-about-Required-and-Should.patch and insserv-generator.patch
Comment 11 Rodney Baker 2014-06-23 12:55:22 UTC
Er, I'm confused. I can find no wicked.service on my machine at all.
Comment 12 Dr. Werner Fink 2014-06-23 13:18:06 UTC
I'm talking about the soon coming 13.2 and not 13.1
Comment 13 Rodney Baker 2014-06-23 13:44:08 UTC
Ah, OK. This report was for 13.1, hence my confusion at your response. :)
Comment 14 Andrei Borzenkov 2014-06-23 17:10:27 UTC
(In reply to comment #10)
> This will interfere with
> 0018-Make-LSB-Skripts-know-about-Required-and-Should.patch

Could you explain. My patch is on top of 0018-Make-LSB-Skripts-know-about-Required-and-Should.patch and complements it by adding compatible upstream behavior. It does not change what 0018-Make-LSB-Skripts-know-about-Required-and-Should.patch does.

> and insserv-generator.patch

You are right, I missed it. I submitted SR#238415 which updates insserv-generator.patch to use network-onlilne.target for $network.
Comment 15 Dr. Werner Fink 2014-06-24 07:21:34 UTC
(In reply to comment #14)

the patch is on top of 0018-Make-LSB-Skripts-know-about-Required-and-Should.patch but as you may have seen this patch also uses UNIT_REQUIRES_OVERRIDABLE for all required services .. therefore I wonder if your boolean does match:

   if (streq(m, SPECIAL_NETWORK_ONLINE_TARGET) && d == UNIT_AFTER && e == _UNIT_DEPENDENCY_INVALID)
               e = UNIT_WANTS;

do you have traced this to see how often this does match?

I'm currently testing out your new SR#238415 ... beside some other changes.
Comment 16 Dr. Werner Fink 2014-06-24 12:02:33 UTC
For openSUSE 13.1 I have to use

%if 0%{?suse_version} <= 1310
#
# Older versions like oS 13.1 do not distinguish between
# network.target and network-online.target
#
for f in src/core/service.c src/insserv-generator/insserv-generator.c
do
  sed -ri '/"network",.*SPECIAL_NETWORK_ONLINE_TARGET,/{ s/SPECIAL_NETWORK_ONLINE_TARGET/SPECIAL_NETWORK_TARGET/}' $f
done
%endif

to make it work
Comment 17 Marius Tomaschewski 2014-06-25 12:56:50 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> This requires that in wicked.service there is a line
> 
>      Wants=network-online.target
> 
> as otherwise we get into trouble with OS 13.2 and SLES12
> 
> Marius?

Yes. Fix is in git master and will be submitted today.
[https://github.com/openSUSE/wicked/commit/e6d1f904ba]
Comment 18 Bernhard Wiedemann 2014-06-25 17:00:44 UTC
This is an autogenerated message for OBS integration:
This bug (883565) was mentioned in
https://build.opensuse.org/request/show/238676 Factory / wicked
Comment 20 Andrei Borzenkov 2014-06-25 18:48:53 UTC
(In reply to comment #16)
> For openSUSE 13.1 I have to use
> 
> %if 0%{?suse_version} <= 1310
> #
> # Older versions like oS 13.1 do not distinguish between
> # network.target and network-online.target
> #
> for f in src/core/service.c src/insserv-generator/insserv-generator.c
> do
>   sed -ri '/"network",.*SPECIAL_NETWORK_ONLINE_TARGET,/{
> s/SPECIAL_NETWORK_ONLINE_TARGET/SPECIAL_NETWORK_TARGET/}' $f
> done
> %endif
> 
> to make it work

?? That is what we have right now. And it is broken.

I think I understand the problem. In 13.1 network@<if>.service is not part of transaction. It is started explicitly by /etc/init,d/network via "systemctl start network@<if>". This means we just replaced one race condition with another one. If job for network@<if> is queued *before* network-online.target, it works. But if network-online@target happens to be finished before, we lose.

Note that upstream commit 58e027023b47b32e42cf93dd4a629b869ee1ef25 adds explicit After=network.target to network-online.target. This should fix it - it ensures that network-online.target is started only after /etc/init.d/network is finished, which means all jobs for network@if are submitted.

I added backport of this patch to my repo, which is rebuilding right now. Could you test if it works for you?
Comment 21 Andrei Borzenkov 2014-06-25 18:50:51 UTC
(In reply to comment #17)
> (In reply to comment #8)
> > (In reply to comment #7)
> > 
> > This requires that in wicked.service there is a line
> > 
> >      Wants=network-online.target
> > 
> > as otherwise we get into trouble with OS 13.2 and SLES12
> > 
> > Marius?
> 
> Yes. Fix is in git master and will be submitted today.
> [https://github.com/openSUSE/wicked/commit/e6d1f904ba]

Folks, that is totally wrong, please do not do it. network-online.target should be pulled in by *consumers* of network, not by *providers*. What you must do, is to order wicked.service before network-online.target, but this is automatic with commit 58e027023b47b32e42cf93dd4a629b869ee1ef25 from upstream.
Comment 22 Dr. Werner Fink 2014-06-26 06:55:43 UTC
(In reply to comment #21)

The commit 58e027023b47b32e42cf93dd4a629b869ee1ef25 is now part of Base:System/systemd ... nevertheless I've seen that upstream there is a systemd-networkd-wait-online.service which has

 [Install]
 WantedBy=network-online.target

... I wonder if this should go into wicked.service as this one is what the upstream service does.
Comment 23 Bernhard Wiedemann 2014-06-26 22:00:37 UTC
This is an autogenerated message for OBS integration:
This bug (883565) was mentioned in
https://build.opensuse.org/request/show/238832 Factory / wicked
Comment 24 Bernhard Wiedemann 2014-06-27 12:00:38 UTC
This is an autogenerated message for OBS integration:
This bug (883565) was mentioned in
https://build.opensuse.org/request/show/238861 Factory / wicked
Comment 25 Marius Tomaschewski 2014-06-30 07:25:24 UTC
(In reply to comment #21)
> (In reply to comment #17)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > 
> > > This requires that in wicked.service there is a line
> > > 
> > >      Wants=network-online.target
> > > 
> > > as otherwise we get into trouble with OS 13.2 and SLES12
> > > 
> > > Marius?
> > 
> > Yes. Fix is in git master and will be submitted today.
> > [https://github.com/openSUSE/wicked/commit/e6d1f904ba]
> 
> Folks, that is totally wrong, please do not do it. network-online.target should
> be pulled in by *consumers* of network, not by *providers*. What you must do,
> is to order wicked.service before network-online.target, but this is automatic
> with commit 58e027023b47b32e42cf93dd4a629b869ee1ef25 from upstream.

So we have to remove this again. :-(

Done in https://github.com/openSUSE/wicked/pull/291

wicked.service contains a Before=...network-online.target...

Andrey, could you review if it is OK so far now again?

https://github.com/openSUSE/wicked/blob/6a5cdf4889b29b3a1c498bf8804d6f7fd1ef6508/etc/systemd/wicked.service.in
Comment 26 Dr. Werner Fink 2014-06-30 07:28:29 UTC
I'd like to suggest to use WantedBy=network-online.target
Comment 27 Marius Tomaschewski 2014-06-30 07:35:42 UTC
Andrey,

do we need a WantedBy instead?:

 [Install]
 WantedBy=network-online.target
Comment 28 Marius Tomaschewski 2014-06-30 07:37:20 UTC
(In reply to comment #26)
> I'd like to suggest to use WantedBy=network-online.target

I'll add it when it is correct to add it.
Comment 29 Dr. Werner Fink 2014-06-30 07:55:46 UTC
IMHO it is correct:

 systemd/Upstream> grep network-online.target -rs units/
 units/systemd-networkd-wait-online.service.in:Before=network.target network- online.target
 units/systemd-networkd-wait-online.service.in:WantedBy=network-online.target

From manual page man:systemd.unit(5)
       WantedBy=, RequiredBy=
           This option may be used more than once, or a space-separated list of unit names may be given. A symbolic link is created in the .wants/ or .requires/ directory of each of the listed units when this unit is installed by systemctl enable. This has the effect that a dependency of type Wants= or Requires= is added from the listed unit to the current unit. The primary result is that the current unit will be started when the listed unit is started. See the description of Wants= and Requires= in the [Unit] section for details.

           WantedBy=foo.service in a service bar.service is mostly equivalent to Alias=foo.service.wants/bar.service in the same file. In case of template units, systemctl enable must be called with an instance name, and this instance will be added to the .wants/ or .requires/ list of the listed unit. E.g. WantedBy=getty.target in a service getty@.service will result in systemctl enable getty@tty2.service creating a getty.target.wants/getty@tty2.service link to getty@.service.
Comment 30 Andrei Borzenkov 2014-07-03 16:23:33 UTC
(In reply to comment #22)
> systemd-networkd-wait-online.service which has
>  [Install]
>  WantedBy=network-online.target
> 
> ... I wonder if this should go into wicked.service as this one is what the
> upstream service does.

But this is backward! WantedBy != Wants, it is exactly reverse dependency.

(In reply to comment #27)
> Andrey,
> 
> do we need a WantedBy instead?:
> 
>  [Install]
>  WantedBy=network-online.target

Well ... upstream has two different units - service that actively configures networking and service that passively checks (or waits) for network to be configured. Service, that waits for network, is not started automatically to avoid delays on startup for everyone. That's the reason of WantedBy - to pull in service to wait for network if someone needs to wait.

As I understand, there is no such difference in wicked. wicked.service is synchronous, right? When wicked.service is has finished startup, networking is configured. So it is enough to have it ordered before network-online.target as it already is.

So no, I think it is not required.
Comment 31 Dr. Werner Fink 2014-07-03 17:14:12 UTC
(In reply to comment #30)

... yes and no ... as the future of wicked will become asynchronous (AFAIK) and therefore I'd like to see this WantedBy=network-online.target to avoid trouble in future ;)

Don't know how dail-in on demand will fit into this scheme but I guess if the device is up the network is somehow online.
Comment 32 Andrei Borzenkov 2014-07-03 17:49:27 UTC
(In reply to comment #31)
> (In reply to comment #30)
> 
> ... yes and no ... as the future of wicked will become asynchronous

Then it should provide second service wicked-online or similar.

>                                                                    (AFAIK) and
> therefore I'd like to see this WantedBy=network-online.target to avoid trouble
> in future ;)
> 

How can WantedBy on *wicked* service help in this case?!? Then you will need WantedBy on service that will wait until network is online (wicked-online.service or whatever). Having WantedBy on a service that just triggers network configuration without waiting for it is pointless. That is exactly the problem we try to fix here, in this bug report.

> Don't know how dail-in on demand will fit into this scheme but I guess if the
> device is up the network is somehow online.
Comment 33 Dr. Werner Fink 2014-07-04 10:19:01 UTC
This works only because network-online.target will be wanted by service_load_sysv_path() for the LSB services.  Without LSB scripts any other unit depending on network-online.target would also require a Wants=network-online.target
Comment 34 Andrei Borzenkov 2014-07-04 10:51:12 UTC
(In reply to comment #33)
> This works only because network-online.target will be wanted by
> service_load_sysv_path() for the LSB services.  Without LSB scripts any other
> unit depending on network-online.target would also require a
> Wants=network-online.target

I lost you here. network-online.target is by design intended to be *EXPLICITLY* pulled in by units that want to be delayed on startup until network is configured. For those cases when units are autogenerated by systemd (like network mounts) dependency is added automatically. For units that you ship manually you need to add this manually to unit file if unit really needs it.

So what exactly are you arguing here? Sorry, I do not get it.
Comment 35 Dr. Werner Fink 2014-07-04 11:12:49 UTC
If there is a unit written by a user/customer which has network-online.target in its After= line and no LSB script is enabled then this unit may fail as it could be up before network
Comment 36 Andrei Borzenkov 2014-07-04 12:50:00 UTC
(In reply to comment #35)
> If there is a unit written by a user/customer which has network-online.target
> in its After= line and no LSB script is enabled then this unit may fail as it
> could be up before network

I'm afraid we are going in circles. As my last comment:

- short version -
This is a bug in respective unit and user/customer needs to fix it.

- long version -
network-online.target should be explicitly pulled in by units that need it. Quoting http://www.freedesktop.org/software/systemd/man/systemd.special.html:

network-online.target

    Units that strictly require a configured network connection should pull in network-online.target (via a Wants= type dependency) and order themselves after it. 

And some more detailed explanation in http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Comment 37 Bernhard Wiedemann 2014-07-04 13:00:41 UTC
This is an autogenerated message for OBS integration:
This bug (883565) was mentioned in
https://build.opensuse.org/request/show/239589 Factory / wicked
Comment 39 Dr. Werner Fink 2014-07-04 13:12:03 UTC
(In reply to comment #36)

I'm aware ... nevertheless in my experience documentation will not be read but a bug created causing work load :(
Comment 40 Swamp Workflow Management 2016-02-03 14:35:01 UTC
openSUSE-RU-2016:0320-1: An update that has 146 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 737690,742774,750845,818044,838475,841544,849870,852015,852021,852232,853293,854884,856389,856392,856858,857204,858864,859072,859365,860574,860937,861316,861489,863217,864745,864904,865834,866732,866933,867128,867663,867664,867840,868019,868230,868439,868931,869142,869603,872929,873432,873444,874665,875502,876587,876694,877021,877674,878525,880438,880732,881125,881559,881942,882393,882714,883565,884271,884403,885232,885288,886211,886599,886852,888178,888215,888612,889297,889357,890977,892096,892162,892300,893797,895087,896664,897799,897801,897803,898233,898240,898432,900558,901481,902240,902901,903009,903963,904214,904517,904828,905550,906709,906900,907318,907393,908476,909358,910643,911347,912030,912334,913517,916420,918118,919095,920195,921831,921898,921920,926169,927250,927457,928265,931388,932284,933365,933512,933521,933533,934077,934901,937512,937900,938908,939571,940264,941576,944132,944799,945282,947212,948458,948555,948705,949574,949683,949739,950510,951265,951663,953241,954336,954781,955635,961576
CVE References: 
Sources used:
openSUSE 13.1 (src):    systemd-210-40.1, systemd-mini-210-40.1
Comment 41 Franck Bui 2016-02-08 09:50:32 UTC
This issue should be fixed by the latest update release of systemd. Feel free to reopen it if you don't think it's the case.

Closing.