Bugzilla – Bug 1164617
sendfile(2) no longer works after switching NIC from e1000e to r8169
Last modified: 2020-02-23 01:27:45 UTC
Recently I added an r8169 PCI-E NIC to my home server machine, because the onboard e1000e was starting to act up. A day later, the content of my apache webserver looked "strange". Finally, using strace on httpd-prefork, I found this:

    [pid 16528] openat(AT_FDCWD, "/srv/www/htdocs/owncloud/core/img/loading-dark.gif", O_RDONLY|O_CLOEXEC) = 22
    [pid 16528] read(21, 0x5587151055c8, 8000) = -1 EAGAIN (Resource temporarily unavailable)
    [pid 16528] setsockopt(21, SOL_TCP, TCP_CORK, [1], 4) = 0
    [pid 16528] writev(21, [{iov_base="HTTP/1.1 200 OK\r\nDate: Sat, 22 F"..., iov_len=234}], 1) = 234
    [pid 16528] sendfile(21, 22, [0], 2316) = -1 EOPNOTSUPP (Operation not supported)
    [pid 16528] setsockopt(21, SOL_TCP, TCP_CORK, [0], 4) = 0
    [pid 16528] write(15, "192.168.200.12 - - [22/Feb/2020:"..., 126) = 126

curl complained about a short read for this:

    seife@strolchi:~> curl -v http://server/owncloud/core/img/loading-dark.gif
    * Trying 192.168.200.1:80...
    * TCP_NODELAY set
    * Connected to server (192.168.200.1) port 80 (#0)
    > GET /owncloud/core/img/loading-dark.gif HTTP/1.1
    > Host: server
    > User-Agent: curl/7.68.0
    > Accept: */*
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Date: Sat, 22 Feb 2020 09:08:59 GMT
    < Server: Apache/2.4.33 (Linux/SUSE)
    < Last-Modified: Wed, 03 Jul 2019 11:17:17 GMT
    < ETag: "90c-58cc501036940"
    < Accept-Ranges: bytes
    < Content-Length: 2316
    < Content-Type: image/gif
    <
    * transfer closed with 2316 bytes remaining to read
    * Closing connection 0
    curl: (18) transfer closed with 2316 bytes remaining to read

Then I switched the default apache config to use "EnableSendfile off" and now everything works fine again.

Kernel is

    Linux server 4.12.14-lp151.28.36-default #1 SMP Fri Dec 6 13:50:27 UTC 2019 (8f4a495) x86_64 x86_64 x86_64 GNU/Linux

This worked "forever" before switching the NIC. I am pretty sure that I did not change anything but the NIC between the working and the non-working state.
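For readers who hit the same EOPNOTSUPP from sendfile(2) in their own code, the following is a minimal sketch of a sender that falls back to a plain read/write loop when sendfile is refused. It is not how Apache handles the error (in the trace above the connection is simply closed after the failed call). The function name `send_file`, the injectable `sendfile` parameter, the set of errno values treated as "fall back", and the chunk size are all assumptions made for this illustration. It also assumes a blocking socket on Linux.

```python
import errno
import os

# errno values treated here as "sendfile cannot be used for this transfer".
# This set is an assumption for the sketch, not an exhaustive list.
FALLBACK_ERRNOS = {errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS}


def send_file(sock, path, sendfile=os.sendfile, chunk_size=65536):
    """Send the file at `path` over the blocking socket `sock`.

    Tries sendfile first and switches to read + sendall if the kernel
    refuses it. Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        try:
            while sent < size:
                n = sendfile(sock.fileno(), f.fileno(), sent, size - sent)
                if n == 0:  # unexpected end of file
                    break
                sent += n
            return sent
        except OSError as exc:
            if exc.errno not in FALLBACK_ERRNOS:
                raise  # e.g. EPIPE or ECONNRESET: a real error, not a refusal

        # Fallback path: continue from the offset sendfile had reached.
        f.seek(sent)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent
```

The `sendfile` parameter exists only so the fallback path can be exercised without a NIC that shows the problem; in normal use it is left at its default. Setting "EnableSendfile off", as described above, achieves a similar effect for Apache by avoiding sendfile altogether.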
    server:~ # lspci -nv -d ::0200
    00:19.0 0200: 8086:10df (rev 02)
        Subsystem: 1734:114d
        Flags: bus master, fast devsel, latency 0, IRQ 28
        Memory at fc600000 (32-bit, non-prefetchable) [size=128K]
        Memory at fc627000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 1820 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] PCI Advanced Features
        Kernel driver in use: e1000e
        Kernel modules: e1000e

    01:00.0 0200: 10ec:8168 (rev 06)
        Subsystem: 10ec:0123
        Flags: bus master, fast devsel, latency 0, IRQ 29
        I/O ports at 2000 [size=256]
        Memory at fc904000 (64-bit, prefetchable) [size=4K]
        Memory at fc900000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number ec-25-00-00-68-4c-e0-00
        Kernel driver in use: r8169
        Kernel modules: r8169

00:19.0 is the (presumably broken) e1000e, 01:00.0 is the realtek chip
    server:~ # dmesg|grep r8169
    [ 5.412053] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
    [ 5.412133] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
    [ 5.412868] r8169 0000:01:00.0 eth0: RTL8168evl/8111evl at 0xffffc90000ff9000, 00:13:3b:28:01:58, XID 0c900800 IRQ 29
    [ 5.413002] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
One more data point: it works with Kernel:stable's 5.5.5-3.g5157fff-default. I will try the 15.2 beta kernel next.
Leap 15.2's kernel 5.3.18 also works fine.
Sounds like the same problem as in bsc#1144162 (which I forgot about). Can you check if enabling scatter-gather also helps in your case?
Yes, "ethtool -K eth0 tx-scatter-gather on" fixes the issue.
And indeed, e1000e (at least with my hardware) has scatter-gather enabled by default, which explains why it works with one driver but not with the other.
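To check whether an interface is affected before flipping the setting, the feature state can be read with "ethtool -k" (lowercase k lists features, uppercase K changes them). The sketch below parses that listing into a dictionary. The helper names `parse_features` and `scatter_gather_enabled` are made up for this example, and the parser assumes the usual "name: on|off [fixed]" line layout, which may differ between ethtool versions.

```python
import subprocess


def parse_features(text):
    """Parse `ethtool -k`-style 'name: state' lines into a dict.

    Each entry maps a feature name to {"on": bool, "fixed": bool}.
    The line layout is an assumption about ethtool's output format.
    """
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # blank lines and the "Features for <iface>:" header
        parts = value.split()
        features[name] = {"on": parts[0] == "on", "fixed": "[fixed]" in parts}
    return features


def scatter_gather_enabled(iface="eth0"):
    """Return True if tx-scatter-gather is reported as on for `iface`."""
    out = subprocess.run(
        ["ethtool", "-k", iface],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_features(out).get("tx-scatter-gather", {}).get("on", False)
```

If `scatter_gather_enabled("eth0")` returns False on a kernel that shows this bug, the "ethtool -K eth0 tx-scatter-gather on" command from the comment above is the workaround that was confirmed here.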
Let's keep only one bug.

*** This bug has been marked as a duplicate of bug 1144162 ***