|
Bugzilla – Full Text Bug Listing |
| Summary: | Xvnc: memory corruption during vnc installation when using ppc as a client and ppc as a server | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.0 | Reporter: | Olaf Hering <ohering> |
| Component: | X.Org | Assignee: | Stefan Dirsch <sndirsch> |
| Status: | RESOLVED DUPLICATE | QA Contact: | E-mail List <xorg-maintainer-bugs> |
| Severity: | Normal | ||
| Priority: | P5 - None | CC: | dmueller, eich, mls |
| Version: | RC 1 | ||
| Target Milestone: | --- | ||
| Hardware: | PowerPC | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Found By: | Development | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: |
xorg-server-1.4-vnc.patch
Xvnc-valgrind.txt |
||
|
Description
Olaf Hering
2008-02-05 15:12:36 UTC
On which machine could I install and test Xvnc (STABLE)? pear.suse.de runs factory.
better backtrace from an inst-sys:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xf7ff4000 (LWP 2771)]
0x0f85dddc in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x0f85dddc in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0f85f7d8 in *__GI_abort () at abort.c:88
#2 0x0f89c098 in __libc_message (do_abort=2, fmt=0xf965880 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#3 0x0f8a33c0 in malloc_printerr (action=3, str=0xf9659b4 "free(): invalid next size (normal)", ptr=<value optimized out>) at malloc.c:5891
#4 0x0f8a5878 in *__GI___libc_free (mem=0x105016c0) at malloc.c:3626
#5 0x1005db48 in rfbTranslateNone (pScreen=<value optimized out>, table=<value optimized out>, in=0x1047c244, out=<value optimized out>, optr=0x1050bb40 "",
bytesBetweenInputLines=<value optimized out>, width=2, height=273684236, x=0, y=3) at translate.c:183
#6 0x10057c78 in SendSubrect (cl=0x104d6278, x=0, y=3, w=2, h=38) at tight.c:557
#7 0x10059610 in SendRectSimple (cl=0x104d6278, x=0, y=3, w=2, h=38) at tight.c:533
#8 0x1005999c in rfbSendRectEncodingTight (cl=0x104d6278, x=0, y=3, w=2, h=38) at tight.c:340
#9 0x1004adc8 in rfbSendFramebufferUpdate (pScreen=0x10487588, cl=0x104d6278) at rfbserver.c:1588
#10 0x1003df60 in rfbDeferredUpdateCallback (timer=<value optimized out>, now=<value optimized out>, arg=<value optimized out>) at draw.c:1959
#11 0x103a1088 in DoTimer (timer=<value optimized out>, now=<value optimized out>, prev=<value optimized out>) at WaitFor.c:465
#12 0x103a1968 in WaitForSomething (pClientsReady=0xff826fe0) at WaitFor.c:294
#13 0x100729f8 in Dispatch () at dispatch.c:425
#14 0x1008a784 in main (argc=21, argv=0xff827734, envp=<value optimized out>) at main.c:452
(gdb)
Well, starting vncserver on pear.suse.de and connecting via vncviewer locally on pear.suse.de works for me. Connecting from a different machine via vncviewer didn't work for me for some reason.
After disabling the firewall on pear.suse.de this works as well. Tried this from an x86_64 machine (shannon).
Could you check if Xvnc has been started with option "-noreset"? See also Bug #351338. I thought this had been fixed for YaST2.
yes, -noreset is there.
Ok. Then I have no idea how to reproduce this without doing an installation with a more or less un-debuggable instsys. :-(
Looking closer at the debug output. Something strange happens with the height/h argument. It changes from initially 38 in rfbSendRectEncodingTight() to 273684236 in rfbTranslateNone(). Then probably the malloc fails in rfbTranslateNone().
Check xorg-server-1.4-vnc.patch in xorg-x11-server package for VNC sources.
Created attachment 193303 [details]
xorg-server-1.4-vnc.patch
For your convenience.
This is the code in question:
if ((x + truewidth > pVNC->width) || truewidth != width) {
    unsigned char *buffer = malloc(truewidth * height * in->bitsPerPixel / 8);
    unsigned char *buf = buffer;
    (*pScreen->GetImage)(pDraw, x, y, truewidth, height, ZPixmap, ~0, (char *)buf);
    while (height--) {
        memcpy(optr, buf, width * in->bitsPerPixel / 8);
        optr += width * in->bitsPerPixel / 8;
        buf += truewidth * in->bitsPerPixel / 8;
    }
    free(buffer);
It looks rather unsuspicious. The value of 'height' in gdb is the height at the time of the crash. It should be 0 because of the while loop.
I have no idea why it isn't.
The malloc checker complains because the header of the malloced area has been overwritten.
I don't see offhand what might have caused this.
There are some things you could try:
1. comment out the GetImage() function call.
2. comment out the memcpy()
3. add an ErrorF(">> %p %p %i\n",optr, buf, height);
4. if possible also try building with '-O0'.
At present I have no good idea on how to exchange things on an inst-sys. I've done this once or twice in the distant past.
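A side note on the remark that 'height' should be 0 after the copy loop: with a post-decrement test, `while (height--)` leaves the counter at -1 once the loop exits, not 0. The sketch below isolates just that loop. The function name `height_after_copy_loop` is made up for illustration and is not part of the VNC sources.

```c
/* Hypothetical helper, not from the VNC patch: runs the same
 * post-decrement loop as the quoted snippet, without the memcpy(). */
int height_after_copy_loop(int height)
{
    while (height--) {
        /* one row would be copied here */
    }
    /* The final test saw 0 and then decremented once more. */
    return height;
}
```

For a starting height of 38 this returns -1. A value such as 273684236 matches neither 0 nor -1, so it says nothing reliable about the loop.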
I will check if valgrind finds anything.
the bogus h value is a gdb bug, don't worry.
Thanks. So I've been searching in the wrong direction. :-(
Created attachment 193395 [details]
Xvnc-valgrind.txt
valgrind did not catch it.
looks like we have to browse the source code.
Dirk, does the valgrind log show anything useful?
sure,
==2823== Invalid read of size 1
==2823==    at 0xFFBB048: memcpy (in /usr/lib/valgrind/ppc32-linux/vgpreload_memcheck.so)
==2823==    by 0x1005DB10: rfbTranslateNone (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10057C74: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059548: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059C64: rfbSendRectEncodingTight (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004ADC4: rfbSendFramebufferUpdate (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004CBD4: rfbProcessClientMessage (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10055578: rfbCheckFds (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10047618: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x100777C4: WakeupHandler (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x103A1430: WaitForSomething (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x100729F4: Dispatch (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==  Address 0x7d38d20 is 0 bytes after a block of size 2,048 alloc'd
==2823==    at 0xFFB9C54: malloc (in /usr/lib/valgrind/ppc32-linux/vgpreload_memcheck.so)
==2823==    by 0x1005DAB0: rfbTranslateNone (in /mounts/mp_0001/usr/bin/Xvnc)
means it reads out of bounds.. smells like an off-by-one somewhere
this is where it causes heap corruption:
==2823== Invalid write of size 4
==2823==    at 0x102D165C: fbBlt (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x102D1918: fbBltStip (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x102D4704: fbGetImage (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x102F3A38: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004F7B8: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1005DAE0: rfbTranslateNone (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10057C74: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1005960C: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059C64: rfbSendRectEncodingTight (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059D0C: rfbSendRectEncodingTight (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004ADC4: rfbSendFramebufferUpdate (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004CBD4: rfbProcessClientMessage (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==  Address 0x7e2ece8 is 0 bytes after a block of size 1,960 alloc'd
==2823==    at 0xFFB9C54: malloc (in /usr/lib/valgrind/ppc32-linux/vgpreload_memcheck.so)
==2823==    by 0x1005DAB0: rfbTranslateNone (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10057C74: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1005960C: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059C64: rfbSendRectEncodingTight (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10059D0C: rfbSendRectEncodingTight (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004ADC4: rfbSendFramebufferUpdate (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x1004CBD4: rfbProcessClientMessage (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10055578: rfbCheckFds (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x10047618: (within /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x100777C4: WakeupHandler (in /mounts/mp_0001/usr/bin/Xvnc)
==2823==    by 0x103A1430: WaitForSomething (in /mounts/mp_0001/usr/bin/Xvnc)
where is the source for rfbTranslateNone?
Dirk, VNC code is in patch of comment #9. If it's more convenient you can also look at Alan's xf4vnc CVS sources in
~sndirsch/local/src/xf4vnc/modular/src/xserver
The patch has been generated on this base.
don't have much time studying the source, but two things I find suspicious upon first look:
(x + truewidth > pVNC->width)
this condition could possibly be >= width or x + width >= width
(*pScreen->GetImage)(pDraw, x, y, truewidth, height, ZPixmap, ~0, (char*)buf);
it probably wants to use "width" here and not "truewidth".
Thanks a lot for your input, Dirk!
its only depth 16, everything works as expected with depth 8,15,24,32.
Thanks for the information. Still it seems it's only reproducible during installation. So I need a ppc machine which I can reboot regularly for testing the installation and on which this problem is reproducible. Please let me know where I can find such a machine. I'm afraid I likely need it in my office for pressing the power button all the time. So it needs to be transportable.
it happens also in the running system.
gdb --quiet --readnow --ex 'b main' -ex 'r -noreset -inetd -once -query localhost -geometry 1024x768 -depth 16' /usr/bin/Xvnc
(In reply to comment #23 from Olaf Hering)
> it happens also in the running system.
>
> gdb --quiet --readnow --ex 'b main' -ex 'r -noreset -inetd -once -query
> localhost -geometry 1024x768 -depth 16' /usr/bin/Xvnc
# gdb --quiet --readnow --ex 'b main' -ex 'r -noreset -inetd -once -query localhost -geometry 1024x768 -depth 16' /usr/bin/Xvnc
(no debugging symbols found)
Breakpoint 1 at 0x1008a344
Starting program: /usr/bin/Xvnc -noreset -inetd -once -query localhost -geometry 1024x768 -depth 16
[Thread debugging using libthread_db enabled]
[New Thread 0xf7fa9000 (LWP 19888)]
[Switching to Thread 0xf7fa9000 (LWP 19888)]
Breakpoint 1, 0x1008a344 in main ()
(gdb)
And now what? How to connect to this VNC server? Pressing 'c' in gdb afterwards?
Tried this instead:
# Xvnc r -noreset -once -query localhost -geometry 1024x768 -depth 16
connected via "vncviewer pear". Works fine. I didn't use -inetd. This is the only difference.
of course, that should have been something like:
Xvnc -noreset -geometry 1024x768 -rfbport 5901 -rfbwait 120000 -depth 16 :42
its from bigendian to bigendian. I guess you see no garbage when connecting from little endian to big endian.
a 11.0a1 (and later) vncviewer can connect to any vncserver (from sles10 to 11.0a2)
an older vncviewer (from sles10 to 10.3) can not connect to an 11.0a1 (and later) vncserver
did the protocol version change after 10.3? Looks like a missing byte swap somewhere
(In reply to comment #25 from Olaf Hering)
> of course, that should have been something like:
> Xvnc -noreset -geometry 1024x768 -rfbport 5901 -rfbwait 120000 -depth 16 :42
port 5901 doesn't work. There's another VNC already running. Tried it instead with port 6000:
pear:~ # Xvnc -noreset -geometry 1024x768 -rfbport 6000 -rfbwait 120000 \
-depth 16 :42
and connected with
shannon(x86_64):~ # vncviewer pear::6000
Still works fine.
(In reply to comment #26 from Olaf Hering)
> its from bigendian to bigendian. I guess you see no garbage when connecting
> from little endian to big endian.
No garbage when connecting from x86_64(vncviewer) to ppc64(Xvnc). Maybe you're using a different VNC viewer? On which architecture do you run your VNC viewer?
(In reply to comment #27 from Olaf Hering)
> a 11.0a1 (and later) vncviewer can connect to any vncserver (from sles10 to
> 11.0a2)
> an older vncviewer (from sles10 to 10.3) can not connect to an 11.0a1 (and
> later) vncserver
> did the protocol version change after 10.3? Looks like a missing byte swap
> somewhere
On x86_64 the vncviewer of tightvnc of 10.1, 10.2, 10.3 and STABLE works fine. So on which machine do I need to start tightvnc to reproduce this issue?
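The "missing byte swap" guess above can be made concrete. A 16-bit pixel exchanged between hosts of different byte order needs its two bytes swapped, while between two hosts of the same byte order no swap is needed. The following is a minimal sketch of that operation only. `swap16` and `swap_pixels16` are illustrative names, and nothing here is taken from the Xvnc or vncviewer sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange the two bytes of one 16-bit pixel value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Swap every pixel of a 16 bpp row in place. */
void swap_pixels16(uint16_t *row, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        row[i] = swap16(row[i]);
    }
}
```

A swap that is wrongly applied or wrongly skipped would show up as garbled colours rather than as a crash, so this hypothesis alone would not account for the heap corruption reported earlier.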
running vncviewer (logged in locally) on mac.suse.de to vncserver on mac.suse.de, no corruption
running vncviewer (logged in via ssh) on mac.suse.de to vncserver on mac.suse.de, got corruption today (not yesterday)
same on papaya, sles10, no corruption
> running vncviewer (logged in via ssh) on mac.suse.de to vncserver on
> mac.suse.de,
I cannot reproduce this corruption.
Olaf, you need to show me this. :-) We don't get anywhere in Bugzilla.
Ok. Thanks to Olaf I can finally reproduce this issue.
1) vncviewer weissichgradnich:1 (x86 machine)
2) login as root with session type twm
3) start "Xvnc -noreset -geometry 1024x768 -rfbport 6666 -rfbwait 120000 \
-depth 16 :42" on pear (ppc machine)
4) on weissichgradnich: vncviewer pear::6666 --> works (x86 --> ppc)
5) ssh to pear
6) on pear: vncviewer pear::6666 --> Xvnc crashes (ppc --> ppc)
So vncviewer needs to be started on a ppc machine connecting to Xvnc also running on a ppc machine (in this case the same machine). But this vncviewer needs to run on an Xvnc server started on an x86 machine. Something like this. Weird setup, I know.
I'm not sure why the first step is needed. Could it be that it's not possible to connect with a BE and LE client at the same time because this doesn't get negotiated per connection but per server instance? It could be that a per-connection implementation is intended but not fully carried through.
Egbert, the two vncviewers involved are connected to *different* Xvnc servers. I couldn't find an easier setup to reproduce this issue. :-(
Stefan,
> 4) on weissichgradnich: vncviewer pear::6666 --> works (x86 --> ppc)
> 5) ssh to pear
> 6) on pear: vncviewer pear::6666 --> Xvnc crashes (ppc --> ppc)
To me 4 and 6 seem to connect to the same Xvnc server. What seems to happen here, is that the vnc viewer in 4 is running inside the vncviewer started in 1 (on what machine?).
Which server crashes by the way, the one on pear or the one on 'weissichgradnich'?
(In reply to comment #38 from Egbert Eich)
> Stefan,
> > 4) on weissichgradnich: vncviewer pear::6666 --> works (x86 --> ppc)
> > 5) ssh to pear
> > 6) on pear: vncviewer pear::6666 --> Xvnc crashes (ppc --> ppc)
>
> To me 4 and 6 seem to connect to the same Xvnc server.
Yes, but I never tried to run them simultaneously. I only wanted to show that vncviewer started on weissichgradnich does not crash the Xvnc on pear.
> What seems to happen here, is that the vnc viewer in 4 is running inside
> the vncviewer started in 1 (on what machine?).
Yes, that's correct. The vncviewer started in 1 is running on shannon (x86_64).
> Which server crashes by the way, the one on pear or the one on
> 'weissichgradnich'?
The Xvnc on pear is crashing.
So the crash would also happen without 4)? so 6) has nothing to do with the previous steps. 6 alone should make things crash, right? The only condition is that the pear -> pear (BE->BE) vncviewer is viewed itself on an LE machine, right?
(In reply to comment #40 from Egbert Eich)
> So the crash would also happen without 4)?
Yes.
> so 6) has nothing to do with the previous steps. 6 alone should make things
> crash, right? The only condition is that the pear -> pear (BE->BE) vncviewer
> is viewed itself on an LE machine, right?
On a Xvnc on a LE machine apparently. Otherwise I would have been able to reproduce the crash also by ssh'ing directly from shannon (LE/x86_64) to pear.
You only run into this issue if Xvnc width on pear >= Xvnc width on weissichgradnich. In this case both have 1024. This explains why we need this indirection to reproduce this issue.
Olaf, you use too many indirections. And in the end you no longer know what you're trying to do. :-) So the setup has become easier now.
Xvnc -geometry 1024x768 -depth 16 :42 &
DISPLAY=:42 twm &
vncviewer :42
In this Xsession:
ssh pear
Xvnc -geometry 1024x768 -depth 16 :42 &
vncviewer :42
Due to the strange setup, which is required to reproduce this issue --> MINIMAL.
its more the other way around.
Olaf, you didn't understand me. In order to reproduce this issue you must be running vncviewer on a Xvnc which is not bigger than the Xvnc on the machine you want to install (usually it isn't fun to scroll around in vncviewer when doing an installation, right?). And even this is not enough. You need to log in via ssh on the machine you want to install first (which doesn't make sense for a vnc installation; ssh is not enabled by default when vnc is enabled AFAIK) to crash the Xvnc on the machine you want to install.
--- Summary: ... when using a strange setup
+++ Summary: ... when using ppc as a client and ppc as a server
I still do not believe this. I'm unable to do vnc installs from my workstation, unless I run vncviewer from an x86 host
BTW, it would be good to have at least one ppc machine in our X team ...
this does apparently work with the current 11.0rc1 inst-sys. It was still broken a few days ago, with 11.0beta snapshots. I can connect to rc1 with a 11.0 and a 10.3 client. Maybe its fixed, maybe something in my setup changed. closing
Unfortunately, its still broken. If the client is displayed on a 10.3-ppc, the Xvnc in the inst-sys will crash
I logged into a 11.0rc1 from a 10.3-ppc and ran vncviewer host:1
note from Marius while debugging something else:
I think it happens with vncviewer 10.3-i386 and inst-sys 11.0-x86_64.
Or some other combination of vncviewer {10.3,11.0}-{i386,x86_64} and inst-sys {10.3,11.0}-{i386,x86_64}
*** Bug 389386 has been marked as a duplicate of this bug. ***
*** Bug 389386 has been marked as a duplicate of this bug. ***
Could well be a color depth issue on the client. See Bug #389386, comment #46.
The code in rfbTranslateNone assumes 32bit bpp, all other bpps are broken. Just look at the "truewidth" calculation, the "/ 4" is obviously bogus. I've attached a new version of the vnc patch to bug #389386.
*** This bug has been marked as a duplicate of bug 389386 *** |
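The thread never quotes the "truewidth" calculation itself, so the following is only a sketch of how a "/ 4" in such a calculation could undersize the buffer handed to GetImage. It rests on two assumptions that are not confirmed by the thread: image rows are padded to a 32-bit boundary, and truewidth is derived as the padded row size in bytes divided by 4. All function names are invented for illustration.

```c
#include <stddef.h>

/* Bytes per image row when rows are padded to 32 bits (assumed pad). */
size_t padded_row_bytes(size_t width, size_t bits_per_pixel)
{
    return ((width * bits_per_pixel + 31) / 32) * 4;
}

/* ASSUMPTION: a width calculation that treats every pixel as 4 bytes. */
size_t truewidth_div4(size_t width, size_t bits_per_pixel)
{
    return padded_row_bytes(width, bits_per_pixel) / 4;
}

/* Padded row width in pixels, using the actual pixel size instead. */
size_t truewidth_by_bpp(size_t width, size_t bits_per_pixel)
{
    return padded_row_bytes(width, bits_per_pixel) / (bits_per_pixel / 8);
}

/* Same size expression as the malloc() call in the quoted snippet. */
size_t alloc_bytes(size_t truewidth, size_t height, size_t bits_per_pixel)
{
    return truewidth * height * bits_per_pixel / 8;
}
```

Under these assumptions, the w=2, h=38 rectangle from the backtrace at 16 bpp gets a 76-byte allocation while the padded rows occupy 152 bytes. At 32 bpp the two sizes agree. This does not by itself explain every observation in the thread, and the actual fix is the patch attached to bug #389386, which may look different.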