Bug 741590

Summary: systemd: socket unit crashing
Product: [openSUSE] openSUSE 12.1
Reporter: Peter Conrad <conrad-novell.com>
Component: Basesystem
Assignee: Frederic Crozat <fcrozat>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: fcrozat
Version: Final
Target Milestone: ---
Hardware: x86-64
OS: Other
URL: https://bugs.freedesktop.org/show_bug.cgi?id=39016
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Peter Conrad 2012-01-16 13:43:02 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.14 (KHTML, like Gecko) Chrome/18.0.972.0 Safari/535.14 SUSE/18.0.972.0

I've created a socket-activated service with Accept=true. Everything's been working nicely for a couple of days, until today the socket unit started crashing. /var/log/messages says:

Jan 16 14:05:18 b071 systemd[1]: netqmail-smtpd.socket failed to queue socket startup job: Transport endpoint is not connected
Jan 16 14:05:18 b071 systemd[1]: Unit netqmail-smtpd.socket entered failed state.

After this message, the socket is indeed down:

b071:/var/spool/qmail-queue # systemctl status netqmail-smtpd.socket
netqmail-smtpd.socket - Network socket for incoming SMTP connections
          Loaded: loaded (/etc/systemd/system/netqmail-smtpd.socket; enabled)
          Active: failed since Mon, 16 Jan 2012 14:05:18 +0100; 33s ago
        Accepted: 1493; Connected: 0
          CGroup: name=systemd:/system/netqmail-smtpd.socket

I've managed to capture tcp packets a couple of minutes later:

14:11:55.826550 IP 80.46.66.38.25924 > 62.141.42.71.25: Flags [S], seq 3295731215, win 65535, options [mss 1400,nop,nop,sackOK], length 0
14:11:55.826615 IP 62.141.42.71.25 > 80.46.66.38.25924: Flags [S.], seq 2159025160, ack 3295731216, win 14600, options [mss 1460,nop,nop,sackOK], length 0
14:11:55.884948 IP 80.46.66.38.25924 > 62.141.42.71.25: Flags [.], ack 1, win 65535, length 0
14:11:55.886595 IP 80.46.66.38.25924 > 62.141.42.71.25: Flags [R.], seq 1, ack 1, win 0, length 0

which corresponds to:

Jan 16 14:11:55 b071 systemd[1]: netqmail-smtpd.socket failed to queue socket startup job: Transport endpoint is not connected
Jan 16 14:11:55 b071 systemd[1]: Unit netqmail-smtpd.socket entered failed state.

Apparently, the remote side has closed the connection immediately. Probably some kind of probe.

Looking at http://cgit.freedesktop.org/systemd/systemd/tree/src/socket.c I'd guess that getpeername() in instance_from_socket() returns ENOTCONN because at that time the remote side has already closed the connection.

IMO it's a really bad idea to shut down the listening socket when an error regarding the accept()ed socket occurs. systemd needs to be much more robust here.
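
To illustrate the kind of robustness meant here, a minimal sketch follows. It is Python for illustration only and is not systemd's code; peer_label() and serve_once() are hypothetical names, and it assumes the failure really is ENOTCONN from getpeername() on the accepted socket, as guessed above. The point is that the error is handled per connection and the listening socket stays open.

```python
import errno
import socket


def peer_label(conn):
    """Return "host:port" for conn, or None if the peer is already gone."""
    try:
        # [:2] also copes with the 4-tuple returned for IPv6 sockets.
        host, port = conn.getpeername()[:2]
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # The remote side reset the connection before we looked at it.
            return None
        raise
    return f"{host}:{port}"


def serve_once(listener, handle):
    """Accept one connection; a dead peer must not take the listener down."""
    conn, _ = listener.accept()
    label = peer_label(conn)
    if label is None:
        # Per-connection failure: drop this socket, keep listening.
        conn.close()
        return False
    try:
        handle(conn, label)
    finally:
        conn.close()
    return True
```

A caller would run serve_once() in a loop and simply continue when it returns False.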


Reproducible: Always

Comment 1 Frederic Crozat 2012-01-19 13:35:13 UTC
Please report this bug upstream at https://bugs.freedesktop.org/, thanks.
Comment 2 Peter Conrad 2012-01-19 14:35:10 UTC
Probably a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=39016 .
I don't have an account on bugs.freedesktop.org, could you attach a pointer to here?
Comment 3 Frederic Crozat 2012-01-19 14:48:04 UTC
done, I'll monitor this upstream
Comment 4 Frederic Crozat 2012-03-14 18:00:24 UTC
I've just committed the upstream fix, which should be available in http://download.opensuse.org/repositories/home:/fcrozat:/systemd/openSUSE_12.1/ pretty soon.

Could you test?
Comment 5 Peter Conrad 2012-03-15 15:01:17 UTC
I managed to reproduce the problem reliably with nmap -PA 127.0.0.1 .
After installing the above version of systemd the socket unit no longer crashes, so I guess the fix is working.
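
For anyone without nmap at hand, a hand-rolled stand-in for the probe might look like the sketch below. It is not what nmap does internally and was not used in this report. It assumes that what matters is the pattern seen in the packet capture: a completed TCP handshake followed immediately by a reset. connect_and_reset() is a hypothetical helper name.

```python
import socket
import struct


def connect_and_reset(host, port):
    """Complete a TCP handshake, then abort the connection with an RST."""
    s = socket.create_connection((host, port), timeout=5)
    local_port = s.getsockname()[1]
    # SO_LINGER with l_onoff=1 and l_linger=0 makes close() send an RST
    # instead of going through the normal FIN shutdown.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    s.close()
    return local_port
```

Pointing this at the port the socket unit listens on (25 in the capture above) should produce a similar SYN, SYN-ACK, ACK, RST exchange. Whether it triggers the failure as reliably as nmap does is untested.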
Comment 6 Frederic Crozat 2012-03-15 15:06:15 UTC
thanks, it will be part of next maintenance update
Comment 7 Frederic Crozat 2012-03-19 14:11:12 UTC
released