Bug 229848

Summary: unable to access certain web sites
Product: [openSUSE] openSUSE 10.2
Reporter: Felix Miata <mrmazda>
Component: Kernel
Assignee: Hartmut Meyer <hartmut.meyer>
Status: RESOLVED WONTFIX
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: bob_l_lewis, dimstar, forgotten_5jFyFBvk-I, forgotten_gxNgjAWAcH, forgotten_x7Ms9HcTTQ, frank.buschlinger, goldstein.mark, MARTIN, michaelnel, scott, stelian.iancu, tonyn
Version: Final
Target Milestone: ---
Hardware: Other
OS: Other
URL: http://www.marymount.edu/
Whiteboard:
Found By: Other
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: tcpdump output
vanilla 2.6.19 kernel config file for successfully accessing sites
a more nearly stock vanilla 2.6.19 config file that works
tcpdump -i eth0 -vv -n -l host www.marymount.edu

Description Felix Miata 2006-12-20 00:54:25 UTC
There's a long thread discussion about this on the opensuse mailing list. http://www.marymount.edu/ and http://www.keh.com/ are the only two sites so far known to exhibit this problem. The thread was started by Chip Cooper at http://lists.opensuse.org/opensuse/2006-12/msg02707.html because the subject URL is his work web site.

To reproduce:
1-open any web browser in 10.2 
2-try to open subject URL or the alternate URL

Actual behavior:
1-perpetually "waiting for www.marymount.edu" (e.g. from Firefox)

Expected behavior:
1-site opens fully in short order

Notes:
1-firewalls, proxies and hardware are not factors
2-most who try fail to reach site (very few reports of success)
3-failure also using Debian Etch 2.6.18-3 and Fedora 6 2.6.18-1 on exact same system as 10.2
4-failure does not occur in 10.0 2.6.13-15.12 or Ubuntu 6.10 2.6.15-27 on exact same system as 10.2
5-failure in SUSE occurs with 2.6.18.2-33 and 2.6.18.2-34 kernels, default and bigmem
6-booting 10.2 with 2.6.16 kernel solves problem for the OP
7-booting 10.2 with generic 2.6.19 solves problem for me
8-failure also occurs attempting to fetch URL with wget
9-ping and traceroute also fail to reach the URL
10-as yet no reports of useful information from anyone who's tried wireshark
Comment 1 Forgotten User YLzcEHequO 2006-12-20 02:44:17 UTC
Computer #1: Compaq ML330 server openSuse 10.0 connects in short order
PC #2: older laptop either with builtin nic or Linksys WRT54G PCMCIA card, openSuse 10.2 _cannot_ connect with either card to http://www.marymount.edu/
Comment 2 Hartmut Meyer 2006-12-20 07:41:15 UTC
There is at least one other site that appears to show the same problem:

  http://www.keh.com

This was also reported on the opensuse mailing list and I can confirm it (works fine with Konqueror on OES but doesn't work with either Konqueror or Firefox in 10.2). As with http://www.marymount.com, the "download" starts (the title is displayed in the Konqueror/Firefox title bar) but then stalls.
Comment 3 Martin Mielke 2006-12-20 07:43:02 UTC
Unable to access through squid-2.6.STABLE5-7 running on openSuSE 10.2 with kernel 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux.

The squid logs show lines like these when trying to access the problematic website from another PC on the LAN:

---
1166565117.647      0 192.168.0.204 TCP_DENIED/403 1398 GET http://www.marymount.edu/ - NONE/- text/html

1166565117.895      4 192.168.0.204 TCP_DENIED/403 1420 GET http://www.marymount.edu/favicon.ico - NONE/- text/html
---

Comment 4 Hartmut Meyer 2006-12-20 07:45:50 UTC
Created attachment 110467 [details]
tcpdump output
Comment 5 Dominique Leuenberger 2006-12-20 08:17:43 UTC
The same problem applies on a WiFi based connection:
openSUSE 10.2, i586, latest updates. It does not depend on the browser software; even a simple
telnet www.marymount.edu 80
get / HTTP/1.1

fails and stalls.

A tcpdump showed no anomalies in the connection (no CRC errors, nothing weird).

A discussion on #linux (irc.freenode.org) showed that several other people, not openSUSE users, had the same problem with this site with Linux >= 2.6.17.

Accessing the site from behind a proxy using openSUSE 10.2 works.
Comment 6 Mark Goldstein 2006-12-20 09:01:06 UTC
I tried both sites from 2 different boxes with openSuSE 10.2

One was IBM Thinkpad with Intel Pro/100 NIC, kernel 2.6.18.2-34-default, behind company firewall / proxy.

Another was IBM Aptiva with RTL-8139 NIC, same kernel. First time it was behind firewall/NAT (iptables on Fedora Core 3) and then it was directly connected to ADSL modem.

(Note: right now the first site, http://www.marymount.edu/, seems to be inaccessible altogether, probably because too many people are trying, but the second one, http://www.keh.com, works fine for me.)
Comment 7 Stelian Iancu 2006-12-20 11:28:55 UTC
I cannot access this site either. I have only tried from my laptop, over a wifi connection.

thor:~ # lspci | grep Network
02:02.0 Network controller: Intel Corporation PRO/Wireless LAN 2100 3B Mini PCI Adapter (rev 04)

thor:~ # uname -a
Linux thor 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux
Comment 8 Michael Nelson 2006-12-20 13:38:57 UTC
I'm running 10.1 with a self compiled 2.6.18 kernel from kernel.org.  I am able to reach the site fine with Firefox 1.5.08 and 2.0, as well as with Konqueror and Opera.
Comment 9 Manfred Hollstein 2006-12-20 13:44:46 UTC
Both sites (www.keh.com and www.marymount.edu) fail here on a DELL D620 running
with current updates (ie. kernel-default-2.6.18.2-34.x86_64.rpm); NIC is the onboard

  09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02)

connected to a GigE Netgear GS105 switch connected to a Fritz 7170 A-DSL modem with no proxy in between.
Comment 10 Michael Nelson 2006-12-20 13:50:12 UTC
OK, I got snookered by a post on the mailing list.  www.marymount.com works.  www.marymount.edu and www.keh.com both hang without loading, with 10.1 and a kernel.org self-compiled 2.6.18 kernel.
Comment 11 Felix Miata 2006-12-21 16:20:17 UTC
Created attachment 110712 [details]
vanilla 2.6.19 kernel config file for successfully accessing sites
Comment 12 Felix Miata 2006-12-21 16:33:18 UTC
Created attachment 110714 [details]
a more nearly stock vanilla 2.6.19 config file that works
Comment 13 Mark Goldstein 2006-12-21 19:49:27 UTC
I was advised by John Andersen to check whether my provider uses a transparent proxy. I used tcptraceroute and it looks like John's right (tcptraceroute to port 80 completes after the second hop, while traceroute takes many more hops and a different number every time). So in the second case from comment #6 (IBM Aptiva through ADSL modem), the successful connection was most probably also made from behind a proxy.
Comment 14 Robert Lewis 2006-12-31 01:18:56 UTC
Another web site that illustrates this issue is:
furniture33.com

Thanks for the work of providing the kernel patches needed to resolve this.

Please advise if we know which released/shipping kernel version will
incorporate this fix. If we know that, then when is SUSE likely to
release it via the software updater mechanism?
Comment 16 Anders Johansson 2007-01-03 16:23:11 UTC
I think it has something to do with window scaling. Try

echo "0" > /proc/sys/net/ipv4/tcp_window_scaling

and then try loading the site again. I can reproduce the problem with window scaling turned on, but for me it works when I disable it. I think these sites have problems fragmenting packets.
Comment 17 Felix Miata 2007-01-03 16:43:25 UTC
comment 16 fix works for me
Comment 18 Tony Nelson 2007-01-03 16:54:14 UTC
(In reply to comment #16)
> I think it has something to do with window scaling. Try
> 
> echo "0" > /proc/sys/net/ipv4/tcp_window_scaling
> 
> and then try loading the site again. I can reproduce the problem with window
> scaling turned on, but for me it works when I disable it. I think these sites
> have problems fragmenting packets
> 

Works for me too!  Thanks.  Hope the fix makes it into an update soon.
Comment 19 Forgotten User x7Ms9HcTTQ 2007-01-04 04:55:12 UTC
(In reply to comment #16)
> I think it has something to do with window scaling. Try
> 
> echo "0" > /proc/sys/net/ipv4/tcp_window_scaling
> 
> and then try loading the site again. I can reproduce the problem with window
> scaling turned on, but for me it works when I disable it. I think these sites
> have problems fragmenting packets
> 

Well, someone may have problems with fragmenting packets, but it's not THESE sites. It's clearly the SUSE kernel.

10.1 had window scaling turned on, and had no trouble loading marymount.edu.
10.2 with the same setting fails.

So setting scaling off merely masks the problem. It's not a solution.

Comment 20 Anders Johansson 2007-01-04 16:15:09 UTC
s/masks/shows/

I never said disabling tcp window scaling was a solution, but it does point to where the problem is.

These sites must have problems with TCP fragmentation, and I suspect the reason why it isn't an issue with other versions is that the default window is too small. A LAN trace shows that the remote site never responds, probably because the fragmented packets get caught somewhere. My guess is that on the versions which work, the initial window size is larger, allowing these sites to respond. The solution in this case would be to fix the remote sites.

It looks like no matter what I set tcp_rmem to, the window size is always down around 5K on 10.2, which to me looks like a bug as well, so a fix for this would also be a part of the solution.
Comment 21 Karsten Keil 2007-01-04 19:53:57 UTC
In my analysis it seems that these sites do not handle window scaling correctly. The big difference between the 10.1 and 10.2 kernels is that the default /proc/sys/net/ipv4/tcp_rmem[2] size is much larger: 10.1 has
131072 and 10.2 has 4194304, which results in a shift count of 2 in 10.1 and 7 in 10.2.

For me marymount.edu works if I do a
echo "4096 16384 131072" > /proc/sys/net/ipv4/tcp_rmem

I leave /proc/sys/net/ipv4/tcp_window_scaling on.

I do not think that this is a bug in 10.2.
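
The shift counts quoted in this comment can be reproduced with a short sketch. It assumes a simplified model in which the stack picks the smallest shift (capped at 14) that brings the maximum receive buffer down to a value that fits the 16-bit window field; the function name wscale_for is made up for illustration, and real kernels may also take other settings (such as net.core.rmem_max) into account.

```shell
#!/bin/sh
# wscale_for BYTES: print the smallest shift count (capped at 14) that
# reduces BYTES to 65535 or less. Simplified model, not the kernel's code.
wscale_for() {
    space=$1
    shift_count=0
    while [ "$space" -gt 65535 ] && [ "$shift_count" -lt 14 ]; do
        space=$((space >> 1))
        shift_count=$((shift_count + 1))
    done
    echo "$shift_count"
}

wscale_for 131072    # 10.1 default maximum quoted in this comment
wscale_for 4194304   # 10.2 default maximum quoted in this comment
```

On a Linux box the live value could be fed in with something like wscale_for "$(awk '{print $3}' /proc/sys/net/ipv4/tcp_rmem)".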
Comment 22 Forgotten User x7Ms9HcTTQ 2007-01-05 04:05:45 UTC
>In my analysis it seems the these sites does not handle Window scaling
>correctly.
>I do not think, that this is a bug in 10.2.

That might be an approach to take if you were Microsoft, but a small distro like openSUSE can not stand up on its hind legs and declare itself right and Solaris wrong, and to hell with what anyone else thinks.

Every other Linux distro can paint that site with no problem.  

Comment 23 Karsten Keil 2007-01-05 13:54:13 UTC
That I think it is not a bug does not mean that we do not need a workaround. But in general, offering larger windows via the window scaling option is not a brand-new feature; the RFC is from 1992. If the site doesn't support it, it should not offer the window scaling option in its SYN-ACK. That the problem was hidden up to now was pure luck, because we never allowed such large windows before. The larger window should help with high performance in the local network, so I think it's a good default; all newer kernels have it. Without knowing what exactly causes the problem, I don't think that I can convince the netdev people to change the current default.

I see the same problem with vanilla 2.6.19 and the config from comment #12, so I do not know why it worked on Felix's machine; maybe some other things influence the setup.
Felix, how much memory do you have in this machine?
Can you retest with your kernel from comment 12 and, if it still works, do a tcpdump? (e.g. tcpdump -i eth0 -vv -n -l host www.marymount.edu |tee mary.log). Please also post the tcp_rmem settings (cat /proc/sys/net/ipv4/tcp_rmem).
Comment 24 Olaf Kirch 2007-01-05 14:21:15 UTC
John, this is a problem we can do little about. This is neither a bug
in our stack nor in the destination system's stack - I assume there's
a connection tracking router/filter at the remote site that does not
understand window scaling. This problem has been cropping up for quite some
time on the netdev mailing list and in other places. Essentially what happens
is that the router happily passes all TCP options during the SYN handshake,
including the options announcing window scaling. However, the router doesn't
understand window scaling, so when it sees the Linux client announcing a
window of "47" (which is really 47 << some scaling factor), it will discard
any packets that are not fully inside that 47 byte window.

When the stack announces different windows which (scaled or unscaled) are
large enough, you will not see any connection hangs (but the download may
be rather slow).
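
The mismatch described above can be sketched with a little arithmetic. This is an illustration only: the raw value 47 is the example from this comment, the shift of 7 is borrowed from comment 21 as an assumed scaling factor, and the function name scaled_window is hypothetical.

```shell
#!/bin/sh
# scaled_window RAW SHIFT: print the byte count that a raw 16-bit window
# field represents once the negotiated shift is applied.
scaled_window() {
    echo $(( $1 << $2 ))
}

raw_window=47   # raw field value from the example in this comment
wscale=7        # assumed shift, taken from comment 21 for illustration

# What the Linux client means by the announced window.
echo "client accepts up to $(scaled_window "$raw_window" "$wscale") bytes"

# What a device that ignores scaling assumes (shift of 0).
echo "scaling-unaware device allows only $(scaled_window "$raw_window" 0) bytes"
```

Under these assumptions the client is prepared for several kilobytes, while a device that ignores the scale factor would treat anything beyond a few dozen bytes as out of window.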

The fact this issue didn't show up with 10.1 is more of a coincidence,
I suspect - the window size announced by the stack depends on lots of
factors, including the rmem values and your amount of physical RAM.

There's really nothing we can do about this, short of turning off window
scaling globally for all peers. The only thing you can do is turn off
window scaling locally.

(Note - I think this should be put into an article somewhere on the wiki)
Comment 25 patrick shanahan 2007-01-05 14:34:11 UTC
NOTE that this issue *is* present in 10.1 as I reported earlier.

09:29 wahoo:~ > cat /proc/sys/net/ipv4/tcp_rmem 
4096    87380   4194304

2.6.18.5-jen40-default #1 SMP x86_64 

09:30 wahoo:~ > cat /etc/SuSE-release |head -1
SUSE LINUX 10.1 (x86_64)
Comment 26 Felix Miata 2007-01-05 14:36:11 UTC
Created attachment 111644 [details]
tcpdump -i eth0 -vv -n -l host www.marymount.edu

Still works in Firefox, though it was slow to load the first time today. furniture 33 & keh loaded right up.

512M RAM

tcp_rmem 4096 87380 131072

IIRC, I configured this box with ipv6 disabled on installation.

Someone needs to test to see if doz Vista has this problem.
Comment 27 Hartmut Meyer 2007-01-05 14:42:27 UTC
Slawek Ligus has just volunteered to give it a shot (create a knowledge base article in the opensuse.org wiki). Sometime next week ...

Thank you Slawek!
Comment 28 Felix Miata 2007-01-05 15:00:31 UTC
(In reply to comment #22)
> Every other Linux distro can paint that site with no problem.  
 
Fedora 6 with 2.6.18-1 kernel hangs on marymount keh & furniture33 as well.
Comment 29 Lars Marowsky-Bree 2007-01-05 15:54:57 UTC
This doesn't seem to be a kernel bug. Proposal: Invalid.

Karsten, Olaf, is there any workaround you wish to implement in the kernel? Or does it require firewall rules to be specified/fixed?
Comment 30 Felix Miata 2007-01-05 16:01:36 UTC
This may be invalid as a kernel bug, but people certainly need to be able to access the same web sites windoz users can access, and so needs a solution from SUSE.
Comment 31 Hartmut Meyer 2007-01-05 16:29:04 UTC
SDB article is available now:

http://en.opensuse.org/SDB:Problem_with_establishing_TCP/IP_connection_in_openSUSE_10.2
Comment 32 Karsten Keil 2007-01-05 17:36:38 UTC
Thanks, Slawek, for writing the first version, but it seems that it refers to the wrong description. The problem is not related to TCP fragmentation. Olaf gave the correct description in comment #24.

And here is a second workaround possible with 10.2 which does not limit the
window to 64K globally, you can add a special route for the problematic sites:
ip route add <ipaddress>/32 via <your default gateway> window 65535

ipaddress is the address of the affected site.
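
As a rough sketch of how the per-site route above might be assembled, the helper below only prints the command rather than running it. The function name build_window_route is made up for illustration, and 192.0.2.10 and 192.0.2.1 are documentation placeholder addresses, not the real addresses of any affected site or gateway.

```shell
#!/bin/sh
# build_window_route IP GATEWAY: print (do not execute) the route command
# from this comment, limiting the window to 65535 for a single host.
build_window_route() {
    printf 'ip route add %s/32 via %s window 65535\n' "$1" "$2"
}

# Placeholder addresses; substitute the affected site's IP address and
# your own default gateway before using the printed command.
build_window_route 192.0.2.10 192.0.2.1
```

The printed line would then be run as root. Printing first makes it easy to check the addresses before changing the routing table.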
Comment 33 Slawek Ligus 2007-01-05 19:28:21 UTC
Hi everyone,

Karsten, thanks for pointing me to that. 
I've just updated the article.
Comment 34 patrick shanahan 2007-01-05 20:08:20 UTC
(In reply to comment #32)
> Thanks Slawek for writing the first version, but it seems that he refer to the
> wrong description. It is not related to TCP fragmentation. Olaf gave the
> correct description in comment #24.
> 
> And here is a second workaround possible with 10.2 which does not limit the
> window to 64K globally, you can add a special route for the problematic sites:
> ip route add <ipaddress>/32 via <your default gateway> window 65535
> 
> ipaddress is the address of the affected site.
> 

Note that this also affects 10.1.....
Comment 35 Lars Marowsky-Bree 2007-01-05 21:53:14 UTC
Re-assigning, as kernel-maintainers can't do anything about it ;-)
Comment 36 Greg Kroah-Hartman 2007-01-12 04:31:08 UTC
*** Bug 227090 has been marked as a duplicate of this bug. ***
Comment 37 Hartmut Meyer 2007-01-12 07:57:55 UTC
If no one objects, I will close this bug. The problem (and its solution/workaround) is described in

  http://en.opensuse.org/SDB:Problem_with_establishing_TCP/IP_connection_in_openSUSE_10.2
Comment 38 Felix Miata 2007-01-13 18:51:57 UTC
"Situation" at that URL needs fixup. It's not even close to just a Firefox problem. Wget and other web browsers suffer too.

I have my doubts that most users will recognize the problem and seek out the workarounds. IMO this bug should remain open until something better than obtuse workarounds materializes. Has anyone looked to see if Fedora is doing anything about this, and if so, what?
Comment 39 Hartmut Meyer 2007-01-28 08:51:40 UTC
Closing, because the documentation of the issue (incl. workarounds) is the only thing possible right now.

Please re-open if you find a real solution.
Comment 40 Felix Miata 2007-01-28 11:44:10 UTC
http://en.opensuse.org/SDB:Problem_with_establishing_TCP/IP_connection_in_openSUSE_10.2 has not yet been fixed to say this is a general problem rather than a Firefox problem. People who can't reach such pages with SeaMonkey or Konq or Opera or Epiphany or Curl or Wget won't find a workaround that claims to be a Firefox problem.
Comment 41 Hartmut Meyer 2007-01-28 15:42:00 UTC
I have slightly reworded the text in the knowledge base article:

It now reads "e. g. by accessing a particular website with a web browser" instead of "e. g. by accessing a particular website with firefox"

Have adjusted the English and the German version of the article.

However, I think the article was already fine before, since its scope is much wider than just web browsers (or Firefox, for that matter). That's why it said "Establishing a connection, e. g. by accessing a particular website ...". Have a look at the title of the article. It is "Problem with establishing TCP/IP connection in openSUSE 10.2".
Comment 42 Felix Miata 2007-01-29 06:46:24 UTC
Replacing "Firefox" with "web browser" helped, but there's still room for improvement. It's currently written in Geek for Geeks. Normal web users' browsers "can't reach page" or "can't load page", not "can't establish connection". "waiting for www.marymount.edu" means a connection has been initiated, but the fact that it never changes means that the existing connection is somehow defective.

The grammar is poor too. If I was the author, I would replace "Establishing a connection, e. g. by accessing a particular website with a web browser, can be failed in some cases on the default  installation of openSUSE 10.2." with "Establishing a connection, e. g. by trying to reach a particular website with a web browser, can in some cases fail, using the default configuration of openSUSE 10.2, and other Linux distributions using 2.6.18 or newer kernels."

The current reading without the last clause above that I would include might lead one to believe this problem is exclusive to SUSE.
Comment 43 Hartmut Meyer 2007-01-29 11:56:28 UTC
Felix: please improve the article if you can. It's a wiki! You need the same login/password as for bugzilla ...
Comment 44 Karsten Keil 2007-02-08 17:34:33 UTC
*** Bug 237421 has been marked as a duplicate of this bug. ***
Comment 45 Scott Couston 2008-03-27 03:47:23 UTC
I have never seen this issue, nor been able to replicate it, with old comms hardware and old versions of openSUSE.
I am running my home/office with a D-Link UTM DFL-260 with a D-Link DSL 502-T ADSL 2/+ Modem/Router and a DES-1016D Switch and 18 PC's OpenSuse version 10.0 to now 10.3_64
I have just finished an install using New Hardware for a client, DFL-860 UTM and 6 x DKVM-8E KVM switches combined SUSE_64 and Windows XP_64 PC and a DSL-2740B Triple Play Modem/Router.
I don't want to sound like a walking advertisement; however, as there has been no major constant in the above issue with respect to comms hardware, I think that is where the issue lies. Some users report not having the issue, so in addition to all the other considerations in this bug, I think we must all state the comms hardware on which we see this issue reliably demonstrated AND our ISP. I mention the ISP because I have had major issues with at least 2 ISPs in THEIR ability to resolve DNS, if you are reliant on them in this respect.