|
Bugzilla – Full Text Bug Listing |
| Summary: | y2base occasionally freezes during install due to bug exposed by glibc: SR#295007 | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Dominique Leuenberger <dimstar> |
| Component: | Installation | Assignee: | Martin Vidner <mvidner> |
| Status: | RESOLVED FIXED | QA Contact: | Jiri Srain <jsrain> |
| Severity: | Normal | ||
| Priority: | P1 - Urgent | CC: | dimstar, mgorman, mpluskal, mvidner, schwab |
| Version: | 201503* | Flags: | mvidner: needinfo?(mgorman) |
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
|
Description
Dominique Leuenberger
2015-05-06 06:26:19 UTC
Note that this probably indicates that the installer uses uninitialised memory. The primary impact this patch has is that some new allocations that were filled with zeros now contain uninitialised data. In older versions of glibc the application would work by co-incidence. Reverting the patch avoids the problem temporarily but it'll recur when glibc 2.22 is released if it's used by openSUSE.

One way of testing would be to force the installer to globally set MALLOC_CHECK_=2 during installation and see does that "fix" it. I don't know how to setup a temporary installation environment like that but some of the yast people should.

Marcus, I see you assigned this to Andreas but did you see comment 2 where it was stated that this is very likely to be a bug in the installer using uninitialised memory?

(In reply to Mel Gorman from comment #2)
> Marcus, I see you assigned this to Andreas but did you see comment 2 where
> it was stated that this is very likely to be a bug in the installer using
> uninitialised memory?

Or rpm - or any of the rpm scriptlets running code. or libzypp, or [...]

In the various tests I'd seen, the lockup was not always in the same package(s).

(In reply to Dominique Leuenberger from comment #3)
> (In reply to Mel Gorman from comment #2)
> > Marcus, I see you assigned this to Andreas but did you see comment 2 where
> > it was stated that this is very likely to be a bug in the installer using
> > uninitialised memory?
>
> Or rpm - or any of the rpm scriptlets running code. or libzypp, or [...]
>
> In the various tests I'd seen, the lockup was not always in the same
> package(s).

I think the installation scripts are a bad fit because we'd expect the same packages to freeze each time. It's also very likely that they are single-threaded which means they are unaffected by the glibc patch. rpm also feels like a bad fit because it's short-lived and I don't see calls to pthread_create in there.
libzypp, zypper or the installer are better candidates because at least zypper is threaded and they're long-lived enough to eventually see an unlucky allocation pattern that gets uninitialised memory. I guessed the installer simply because zypper use on an installed system seems ok. Bugs due to uninitialised memory are not a bug in glibc though so the assignee still is inappropriate.

Please file bug reports for every lockup you see and assign to the respective maintainer.

The installer is the most likely component to be locking up here but I don't know how to setup the appropriate test environment. I'm going to attempt a reassign and see if the maintainers respond. To be clear, based on previous tests I believe that the installer is using uninitialised memory and getting confused. If it's not fixed now, it'll just be a problem later when glibc is next updated.

> One way of testing would be to force the installer to globally set MALLOC_CHECK_=2 during installation and see does that "fix" it. I don't know how to setup a temporary installation environment like that but some of the yast people should.

Sure :-)
Simply use a boot parameter MALLOC_CHECK_=2 and the installer will export it to the environment, producing the desired result. It seems even PID 1 has it.
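
The thread does not show how the installer turns a boot parameter into an environment variable, so the following is only a sketch of the general idea, not the installer's actual code. The helper name `cmdline_value` is hypothetical; `/proc/cmdline` is the standard Linux location of the kernel command line.

```sh
# Hypothetical helper (not taken from the installer): print the value of
# KEY from a kernel-style command line ("word word KEY=VALUE word").
# Returns non-zero and prints nothing if KEY is absent.
cmdline_value() {
    key=$1
    cmdline=$2
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case $word in
            "$key"=*)
                printf '%s\n' "${word#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage sketch: export MALLOC_CHECK_ only if it was passed at boot.
if [ -r /proc/cmdline ] && value=$(cmdline_value MALLOC_CHECK_ "$(cat /proc/cmdline)"); then
    export MALLOC_CHECK_="$value"
fi
```

Anything started afterwards from the same shell inherits the variable, which matches the observation that the setting ends up in the environment of the installed processes.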
(In reply to Martin Vidner from comment #7)
> > One way of testing would be to force the installer to globally set MALLOC_CHECK_=2 during installation and see does that "fix" it. I don't know how to setup a temporary installation environment like that but some of the yast people should.
>
> Sure :-)
>
> Simply use a boot parameter MALLOC_CHECK_=2 and the installer will export it
> to the environment, producing the desired result. It seems even PID 1 has it.

All righty Martin, thanks.

Dominique, I know these are dumb questions but I never deal with the installer and just want to push this along so we don't get burned in the future when glibc updates again. Is there still an ISO image available that freezes during install? I can at least download it and see if MALLOC_CHECK_=2 "fixes" it. That would at least indicate that something in the installer has an uninitialised memory bug.

@Mel,

The link in the original comment to openQA also allows you to get the ISO file used for the task.

https://openqa.opensuse.org/tests/60367 =>
https://openqa.opensuse.org/tests/60367/asset/3037

The difficulty in finding the root cause will likely be that it's not necessarily the yast installer failing, but it could as well be RPM (as we spawn rpm ever so often), zypp/libzypp, or any of the rpm scriptlets commands that might possibly cause this.

(In reply to Dominique Leuenberger from comment #9)
> @Mel,
>
> The link in the original comment to openQA also allows you to get the ISO
> file used for the task.
>
> https://openqa.opensuse.org/tests/60367 =>
> https://openqa.opensuse.org/tests/60367/asset/3037
>

Well, I get a duh prize.

I used the ISO and KVM to reproduce this. 1 in 5 installations appear to fail with a freeze where the UI ceases to interact -- X pointer works, no text can be selected and the UI cannot be interacted with. Terminal switching still works and using that I checked what was active. There were no RPM scripts active or any portion of rpm.
tar existed as a zombie process that was a child of y2base. Even if they were the problem with packages, the UI would not freeze and besides, it would always be the same package that froze. The window manager is not threaded so that's not likely to be the problem. What appears to be frozen is y2base. I'll now test with MALLOC_CHECK_=2 and see does it freeze but right now, y2base appears to be the primary candidate as the problem.

Martin, would you be able to run the installer through valgrind, or identify someone on the yast team who could, to see if it spits out any warnings about uninitialised memory use and debug it? Ideally it would be with the devel version of glibc but it's not strictly necessary as uninitialised memory use is unconditionally a bug regardless of system libraries used.

(In reply to Mel Gorman from comment #10)
> (In reply to Dominique Leuenberger from comment #9)
> > @Mel,
> >
> > The link in the original comment to openQA also allows you to get the ISO
> > file used for the task.
> >
> > https://openqa.opensuse.org/tests/60367 =>
> > https://openqa.opensuse.org/tests/60367/asset/3037
> >
>
> <SNIP>
> I used the ISO and KVM to reproduce this. 1 in 5 installations appear to fail
> with a freeze where the UI ceases to interact -- X pointer works, no text
> can be selected and the UI cannot be interacted with. Terminal switching
> still works and using that I checked what was active.
>
> I'll now test with MALLOC_CHECK_=2 and see does it freeze

I successfully installed 10 times without freezes with MALLOC_CHECK_=2 specified as a boot parameter. At this point, it really looks like y2base is the source. Based on the experiences with llvm regression suites, I also suspect it's due to an uninitialised memory bug. I updated the bug title accordingly. Martin, any thoughts?

I will test the installation with valgrind myself.

The test is still running, and it has found some bugs but I guess they are pretty harmless.
The Tumbleweed repo doesn't have that glibc patch though.

(In reply to Martin Vidner from comment #14)
> The test is still running, and it has found some bugs but I guess they are
> pretty harmless. The Tumbleweed repo doesn't have that glibc patch though.

Anything resembling an uninitialised memory usage bug or a use-after-free bug could cause problems with the newer version of glibc. It's not in Tumbleweed because it was backed out due to the installer occasionally freezing. The devel project still has the updates though and it builds cleanly against factory https://build.opensuse.org/package/show/Base:System/glibc .

I have used https://github.com/openSUSE/mksusecd to make an installation ISO with the new glibc, but I still cannot reproduce the problem.

I have used kvm on x86_64, first with a single cpu, then with "-smp 2". I have used MALLOC_CHECK_=3 and run y2base under valgrind. It has uncovered problems that I reported in bug 932306, but they all seem minor and not related to the UI thread.

(In reply to Martin Vidner from comment #16)
> I have used https://github.com/openSUSE/mksusecd to make an installation ISO
> with the new glibc, but I still cannot reproduce the problem.
>

Have you tried with the iso linked at https://openqa.opensuse.org/tests/60367/asset/3037? I was definitely able to stall that when installing under KVM. I was using a machine with 8 logical CPUs and the launch command

qemu-kvm \
    -cpu host \
    -hda disk.img \
    -drive file=3037.iso,media=cdrom \
    -net nic,model=rtl8139 -net user,hostname=installcheck \
    -m 1G \
    -monitor stdio \
    -name Installer \
    "$@"

It's not 100% reproducible. Only 1 in 5 installations failed.

> I have used kvm on x86_64, first with a single cpu, then with "-smp 2".
> I have used MALLOC_CHECK_=3 and run y2base under valgrind. It has uncovered
> problems that I reported in bug 932306, but they all seem minor and not
> related to the UI thread.

There is an outside possibility that this was fixed since by accident.
If you make the ISO you used available somewhere then I can try installing with it and see can I hit the problem.

Thank you, Mel. But https://openqa.opensuse.org/tests/60367/asset/3037 seems to have expired. Do you have the image around?

I am testing with https://w3.suse.de/~mvidner/glibc-bsc929806.iso which I run as

qemu-kvm -m 4096 -smp 2 -cdrom ~/svn/mksusecd/glibc-bsc929806.iso scratch.qcow2

with the boot option VALGRIND=1

The CD was made with:

./mksusecd --verbose \
    --create glibc-bsc929806.iso \
    --micro \
    --initrd ~/tmp/glibc-debuginfo-2.21-409.39.x86_64.rpm \
    --initrd ~/tmp/glibc-2.21-409.39.x86_64.rpm \
    --initrd ~/tmp/valgrind-3.10.1-2.1.x86_64.rpm \
    --initrd ~/tmp/yast2-core-3.1.17-2.1.x86_64.rpm \
    --initrd ~/tmp/yast2-core-debuginfo-3.1.17-2.1.x86_64.rpm \
    --initrd ~/dl/yast2-storage-debuginfo-3.1.55-1.3.x86_64.rpm \
    --initrd ~/dl/libstorage6-debuginfo-2.25.20-2.2.x86_64.rpm \
    --initrd ~/dl/libstorage6-2.25.20-2.2.x86_64.rpm \
    --initrd ~/dl/yast2-storage-3.1.55-1.3.x86_64.rpm \
    --initrd ~/dl/ruby2.2-debuginfo-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/ruby2.2-stdlib-debuginfo-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/libruby2_2-2_2-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/ruby2.2-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/ruby2.2-stdlib-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/libruby2_2-2_2-debuginfo-2.2.2-1.3.x86_64.rpm \
    --initrd ~/dl/libstorage-ruby-debuginfo-2.25.20-2.2.x86_64.rpm \
    --initrd ~/dl/libstorage-ruby-2.25.20-2.2.x86_64.rpm \
    --initrd ~/dl/gdb-7.9-2.1.x86_64.rpm \
    --initrd initrd \
    /dist/install/openSUSE-UNTESTED/openSUSE-Tumbleweed-NET-x86_64-Snapshot20150525-Media.iso

(master) mvidner@mrakoplas:mksusecd$ diff -u initrd/usr/lib/YaST2/startup/YaST2.call{.orig,}
--- initrd/usr/lib/YaST2/startup/YaST2.call.orig 2015-05-26 10:23:36.479179018 +0200
+++ initrd/usr/lib/YaST2/startup/YaST2.call 2015-05-27 10:07:59.813261210 +0200
@@ -307,8 +307,15 @@
     log "\tUI_ARGS: $Y2_UI_ARGS"
     log "\tQT_IM_MODULE: $QT_IM_MODULE"

+    if [ "$VALGRIND" = 1 ]; then
+        VALGRIND="valgrind --leak-check=no --track-origins=yes \
+            --time-stamp=yes \
+            --main-stacksize=10000000 \
+            --log-file=/tmp/valgrind"
+    fi
+
     if [ "$Y2GDB" != "1" ]; then
-        $OPT_FBITERM y2base \
+        $OPT_FBITERM $VALGRIND y2base \
             "$Y2_MODULE_NAME" \
             $Y2_MODE_FLAGS \
             $Y2_MODULE_ARGS \

(In reply to Martin Vidner from comment #18)
> Thank you, Mel. But https://openqa.opensuse.org/tests/60367/asset/3037 seems
> to have expired. Do you have the image around?
>

I have a copy locally but it could take a few days to complete an upload due to limited upstream bandwidth. Does anyone cc'd have a copy on a machine in an office that they could make available?

> I am testing with https://w3.suse.de/~mvidner/glibc-bsc929806.iso which I
> run as
>
> qemu-kvm -m 4096 -smp 2 -cdrom ~/svn/mksusecd/glibc-bsc929806.iso
> scratch.qcow2
> with the boot option VALGRIND=1
>

I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April? It's possible it got accidentally fixed or worked around since the original glibc submission. Related to that, is the version of yast used the same as what it is in Factory? If so then it might be appropriate to try resubmit SR#295007. At worst, the same problem will recur but there will be a problematic ISO available.

> I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April?

Actually, yes, we have fixed some GCC warnings: https://github.com/yast/yast-core/pull/100
I *think* this should not change things related to uninitialized memory, but it seems best to retry the glibc submission.
I am not sure how to do that since https://build.opensuse.org/request/show/295007 is marked as Accepted.

Mel, can you resubmit glibc and then resolve this as Works For Me please?

(In reply to Martin Vidner from comment #20)
> > I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April?
>
> Actually, yes, we have fixed some GCC warnings:
> https://github.com/yast/yast-core/pull/100
> I *think* this should not change things related to uninitialized memory, but
> it seems best to retry the glibc submission.
> I am not sure how to do that since
> https://build.opensuse.org/request/show/295007 is marked as Accepted.

As glibc was reverted post-accept, you will have to create a new submitrequest:

> osc sr Base:System glibc openSUSE:Factory -m "Let's retry to see what this brings"

(In reply to Dominique Leuenberger from comment #21)
> (In reply to Martin Vidner from comment #20)
> > > I'm unable to reproduce the freeze with this ISO. Has anything changed in yast since about mid-April?
> >
> > Actually, yes, we have fixed some GCC warnings:
> > https://github.com/yast/yast-core/pull/100
> > I *think* this should not change things related to uninitialized memory, but
> > it seems best to retry the glibc submission.
> > I am not sure how to do that since
> > https://build.opensuse.org/request/show/295007 is marked as Accepted.
>
> As glibc was reverted post-accept, you will have to create a new submitrequest:

As there have been no changes to the Base:System glibc project since, I went ahead and created a new request 309677. Thanks.

(In reply to Martin Vidner from comment #20)
> Mel, can you resubmit glibc and then resolve this as Works For Me please?

It's resubmitted but I have not closed this as resolved; that can wait until we see if the ISO created for openQA testing reproduces the problem or not.

Status update: The new submission https://build.opensuse.org/request/show/309677 revealed a crash in mksquashfs.

Mel has made a patch for that yesterday: https://sourceware.org/ml/libc-alpha/2015-06/msg00255.html which I don't see in our builds yet.

(In reply to Martin Vidner from comment #24)
> Mel has made a patch for that yesterday:
> https://sourceware.org/ml/libc-alpha/2015-06/msg00255.html which I don't see
> in our builds yet.

It's not included in the builds yet because I need upstream to review and merge it before I can add it to Base:System/glibc.

glibc has now been updated in Factory and the installer was fine. Closing this bug now. Thanks Martin for all your help on this. |
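
As a rough aid for reading the reproduction figures quoted in the thread (about 1 in 5 installs freezing, then 10 clean installs with MALLOC_CHECK_=2), the arithmetic below estimates how often 10 clean runs would happen by luck alone. It is a sketch that assumes the runs are independent and that the 1-in-5 rate is accurate, neither of which the thread establishes; the function name `p_all_clean` is made up for illustration.

```sh
# Probability of N consecutive clean runs given a per-run failure rate,
# assuming independent runs: (1 - rate)^N.
# Both arguments are illustrative inputs, not measured values.
p_all_clean() {
    awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - rate) ^ n }'
}

p_all_clean 0.2 10   # prints 0.107
```

Under those assumptions, ten clean installs would occur by chance roughly one time in nine, so the MALLOC_CHECK_=2 result is suggestive rather than conclusive on its own, which is consistent with the later valgrind runs and the resubmission test.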