Bug 1164617 - sendfile(2) no longer works after switching NIC from e1000e to r8169
Summary: sendfile(2) no longer works after switching NIC from e1000e to r8169
Status: RESOLVED DUPLICATE of bug 1144162
Alias: None
Product: openSUSE Distribution
Classification: openSUSE
Component: Kernel
Version: Leap 15.1
Hardware: Other Other
Importance: P5 - None : Normal
Target Milestone: ---
Assignee: E-mail List
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-22 10:25 UTC by Stefan Seyfried
Modified: 2020-02-23 01:27 UTC
CC List: 3 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Description Stefan Seyfried 2020-02-22 10:25:11 UTC
Recently I added an r8169 PCIe NIC to my home server because the onboard e1000e had started acting up.

A day later, the content served by my Apache web server looked "strange".
Using strace on httpd-prefork, I eventually found this:

[pid 16528] openat(AT_FDCWD, "/srv/www/htdocs/owncloud/core/img/loading-dark.gif", O_RDONLY|O_CLOEXEC) = 22
[pid 16528] read(21, 0x5587151055c8, 8000) = -1 EAGAIN (Resource temporarily unavailable)
[pid 16528] setsockopt(21, SOL_TCP, TCP_CORK, [1], 4) = 0
[pid 16528] writev(21, [{iov_base="HTTP/1.1 200 OK\r\nDate: Sat, 22 F"..., iov_len=234}], 1) = 234
[pid 16528] sendfile(21, 22, [0], 2316) = -1 EOPNOTSUPP (Operation not supported)
[pid 16528] setsockopt(21, SOL_TCP, TCP_CORK, [0], 4) = 0
[pid 16528] write(15, "192.168.200.12 - - [22/Feb/2020:"..., 126) = 126
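The trace shows a common corked-response pattern: set TCP_CORK, writev() the response headers, sendfile() the body, then clear TCP_CORK. Here sendfile() fails with EOPNOTSUPP and httpd does not retry; it uncorks and logs the request, so the client receives the 234 header bytes and no body. Below is a minimal Python sketch of the same sequence, assuming Linux (os.sendfile, socket.TCP_CORK), that adds a plain read()/sendall() fallback when sendfile() reports EOPNOTSUPP. The names send_response and use_sendfile are hypothetical, and this is not how Apache itself is implemented.

```python
import errno
import os
import socket


def send_response(sock: socket.socket, header: bytes, path: str,
                  use_sendfile=os.sendfile) -> int:
    """Send `header` followed by the contents of `path`; return body bytes sent.

    Mirrors the traced sequence: cork, write headers, sendfile() the body,
    uncork. Unlike the traced httpd, it falls back to read()/sendall() when
    sendfile() fails with EOPNOTSUPP instead of giving up on the body.
    """
    # TCP_CORK only applies to TCP sockets (Linux-specific option).
    cork = sock.family in (socket.AF_INET, socket.AF_INET6)
    if cork:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        sock.sendall(header)  # httpd uses a single-buffer writev() here
        sent = 0
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            try:
                while sent < size:
                    n = use_sendfile(sock.fileno(), f.fileno(), sent, size - sent)
                    if n == 0:  # file shrank underneath us
                        break
                    sent += n
            except OSError as exc:
                if exc.errno != errno.EOPNOTSUPP:
                    raise
                # Fallback: copy the remaining bytes through user space.
                f.seek(sent)
                while chunk := f.read(8192):
                    sock.sendall(chunk)
                    sent += len(chunk)
        return sent
    finally:
        if cork:
            # Clearing the cork sends any queued partial frames.
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

With the real os.sendfile the first branch normally delivers the whole body; passing a use_sendfile that raises OSError(errno.EOPNOTSUPP, ...) exercises the fallback, which corresponds to the failure in the trace.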

curl reported a short read for the same file:
seife@strolchi:~> curl -v http://server/owncloud/core/img/loading-dark.gif
*   Trying 192.168.200.1:80...
* TCP_NODELAY set
* Connected to server (192.168.200.1) port 80 (#0)
> GET /owncloud/core/img/loading-dark.gif HTTP/1.1
> Host: server
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 22 Feb 2020 09:08:59 GMT
< Server: Apache/2.4.33 (Linux/SUSE)
< Last-Modified: Wed, 03 Jul 2019 11:17:17 GMT
< ETag: "90c-58cc501036940"
< Accept-Ranges: bytes
< Content-Length: 2316
< Content-Type: image/gif
< 
* transfer closed with 2316 bytes remaining to read
* Closing connection 0
curl: (18) transfer closed with 2316 bytes remaining to read
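curl's error 18 means the server announced Content-Length: 2316 and then closed the connection with all 2316 body bytes missing, which matches the failed sendfile() on the server side. Below is a minimal Python sketch that replays the captured response head through the standard library's http.client parser, offline, to show the same condition surfacing as http.client.IncompleteRead. _ReplaySocket and remaining_after_read are hypothetical helpers; _ReplaySocket only provides makefile() because, in CPython, that is the only socket method HTTPResponse calls while parsing.

```python
import http.client
import io

# Response head as captured by curl above; the body is missing entirely.
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 2316\r\n"
    b"Content-Type: image/gif\r\n"
    b"\r\n"
)


class _ReplaySocket:
    """Minimal socket stand-in that replays fixed bytes to http.client."""

    def __init__(self, data: bytes):
        self._data = data

    def makefile(self, mode, *args, **kwargs):
        return io.BytesIO(self._data)


def remaining_after_read(raw: bytes) -> int:
    """Return how many announced body bytes never arrived (0 if complete)."""
    resp = http.client.HTTPResponse(_ReplaySocket(raw))
    resp.begin()  # parse status line and headers
    try:
        resp.read()  # read exactly Content-Length bytes
    except http.client.IncompleteRead as exc:
        return exc.expected  # bytes still outstanding
    return 0


if __name__ == "__main__":
    print(remaining_after_read(RAW))  # 2316, like curl's message
```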

I then set "EnableSendfile off" in the default Apache configuration, and now everything works fine again.

The running kernel is:
Linux server 4.12.14-lp151.28.36-default #1 SMP Fri Dec 6 13:50:27 UTC 2019 (8f4a495) x86_64 x86_64 x86_64 GNU/Linux

This worked "forever" before switching the NIC. I am pretty sure that I did not change anything but the NIC between the working and the non-working state.
Comment 1 Stefan Seyfried 2020-02-22 10:30:10 UTC
server:~ # lspci -nv -d ::0200
00:19.0 0200: 8086:10df (rev 02)
        Subsystem: 1734:114d
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Memory at fc600000 (32-bit, non-prefetchable) [size=128K]
        Memory at fc627000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 1820 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

01:00.0 0200: 10ec:8168 (rev 06)
        Subsystem: 10ec:0123
        Flags: bus master, fast devsel, latency 0, IRQ 29
        I/O ports at 2000 [size=256]
        Memory at fc904000 (64-bit, prefetchable) [size=4K]
        Memory at fc900000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number ec-25-00-00-68-4c-e0-00
        Kernel driver in use: r8169
        Kernel modules: r8169

00:19.0 is the (presumably broken) e1000e; 01:00.0 is the Realtek chip.
Comment 2 Stefan Seyfried 2020-02-22 10:32:47 UTC
server:~ # dmesg|grep r8169
[    5.412053] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    5.412133] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
[    5.412868] r8169 0000:01:00.0 eth0: RTL8168evl/8111evl at 0xffffc90000ff9000, 00:13:3b:28:01:58, XID 0c900800 IRQ 29
[    5.413002] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Comment 3 Stefan Seyfried 2020-02-22 10:45:22 UTC
One more data point: it works with Kernel:stable's 5.5.5-3.g5157fff-default. I will try the 15.2 beta kernel next.
Comment 4 Stefan Seyfried 2020-02-22 10:53:29 UTC
Leap 15.2's kernel 5.3.18 also works fine.
Comment 5 Michal Kubeček 2020-02-22 10:56:31 UTC
Sounds like the same problem as in bsc#1144162 (which I had forgotten about). Can you check whether enabling scatter-gather also helps in your case?
Comment 6 Stefan Seyfried 2020-02-22 11:35:06 UTC
Yes, "ethtool -K eth0 tx-scatter-gather on" fixes the issue.
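To check the state before and after that command, `ethtool -k eth0` (lower-case -k) lists the offload features, with tx-scatter-gather indented under scatter-gather. Below is a minimal Python sketch that parses that output into a dictionary so two NICs can be compared. parse_ethtool_features and scatter_gather_enabled are hypothetical helpers, and the parser assumes the usual `name: on|off [fixed]` line format.

```python
from __future__ import annotations


def parse_ethtool_features(text: str) -> dict[str, tuple[bool, bool]]:
    """Map each feature in `ethtool -k IFACE` output to (enabled, fixed).

    Assumes lines of the form "name: on", "name: off [fixed]" or
    "name: off [requested on]", optionally indented with a tab.
    """
    features: dict[str, tuple[bool, bool]] = {}
    for line in text.splitlines():
        name, sep, rest = line.strip().partition(":")
        words = rest.split()
        # Skip the "Features for eth0:" header and anything unparseable.
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        features[name] = (words[0] == "on", "[fixed]" in rest)
    return features


def scatter_gather_enabled(text: str) -> bool:
    """True if tx-scatter-gather is reported as on."""
    enabled, _fixed = parse_ethtool_features(text).get(
        "tx-scatter-gather", (False, False))
    return enabled
```

The input can come from `subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout`. Going by the fix above and the next comment, the r8169 port presumably reported tx-scatter-gather as off before the `ethtool -K` call, while the e1000e port reported it as on.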
Comment 7 Stefan Seyfried 2020-02-22 11:37:43 UTC
And indeed, e1000e (at least on my hardware) has scatter-gather enabled by default, which explains why it works with one driver but not with the other.
Comment 8 Michal Kubeček 2020-02-23 01:27:45 UTC
Let's keep only one bug.

*** This bug has been marked as a duplicate of bug 1144162 ***