Bugzilla – Bug 1018262
Installation failure "cpio: rename" PowerPC multipath openQA test
Last modified: 2022-02-13 23:58:28 UTC
Created attachment 708670 [details]
install_and_reboot-y2logs.tar.bz2

This bug is a follow-up to boo#1009472, created to continue investigating the same error after the worker update. I am using the same Summary: Installation failure "cpio: rename" PowerPC multipath openQA test

As explained below, I need help to continue investigating this problem.

[Build 20170104] openQA test fails in install_and_reboot

## Observation

openQA test in scenario opensuse-Tumbleweed-DVD-ppc64le-install_only_ppc@ppc64le-multipath fails in
[install_and_reboot](http://openqa.opensuse.org/tests/329391/modules/install_and_reboot/steps/21)

## Reproducible

Fails since (at least) Build [20161110](http://openqa.opensuse.org/tests/303570)

## Expected result

Last good: [20161107](http://openqa.opensuse.org/tests/303068) (or more recent)

## Further details

Always latest result in this scenario: [latest](http://openqa.opensuse.org/tests/latest?flavor=DVD&arch=ppc64le&version=Tumbleweed&test=install_only_ppc&distri=opensuse&machine=ppc64le-multipath)

The status below is copied from https://bugzilla.suse.com/show_bug.cgi?id=1009472#c15. I need suggestions on how to continue the investigation from this point.

Current_Status:
* The failure is specific to the disk multipath test with btrfs on TW PowerPC; the error reported in y2log is "cpio: rename".
* No failure for Leap 42.2.
* Unable to recreate the failure outside the openQA environment.
* The same failure does not occur with ext4 in place of btrfs.
* The error reported by YaST is a package installation failure, and y2log reports a "cpio: rename" error with no error number.
* The "cpio: rename" string comes from an error in the fsmRename function in lib/fsm.c. It is reported by rpm via the libzypp traces (ExternalProgram.cc, Exception.cc, RpmDb.cc). The last error is reported by the rpmpsmUnpack function in rpm's psm.c as an error from rpmPackageFilesInstall. The related emsg string (output of rpmfileStrerror), "cpio: rename", is built in rpmfileStrerror by decoding the RPMERR_RENAME_FAILED return code.

Summary of related source lines:
===
./rpm-4.12.0.1/lib/psm.c:671: fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files,
===
fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files, psm, &failedFile);
emsg = rpmfileStrerror(fsmrc);
rpmlog(RPMLOG_ERR, _("unpacking of archive failed%s%s: %s\n"),
       (failedFile != NULL ? _(" on file ") : ""),
       (failedFile != NULL ? failedFile : ""), emsg);
===
./rpm-4.12.0.1/lib/rpmfi.c:2111:char * rpmfileStrerror(int rc)
./rpm-4.12.0.1/lib/fsm.c:535: static int fsmRename(const char *opath, const char *path)
./rpm-4.12.0.1/lib/rpmarchive.h RPMERR_RENAME_FAILED = -32774,
===
*** Bug 1009472 has been marked as a duplicate of this bug. ***
Looks more like a kernel btrfs problem. There have been such cases in the past, e.g. bug #950178 and bug #963020.
Two test cases (not multipath tests) that normally run with the default HDDMODEL=virtio-blk and were temporarily forced to HDDMODEL=scsi-hd are reporting a similar problem. So the source of the problem is not only btrfs but also the scsi-hd device driver.
===
https://openqa.opensuse.org/tests/380110
https://openqa.opensuse.org/tests/380111
===
2017-03-31 14:41:21 <1> install(3004) [zypp++] ExternalProgram.cc(start_program):249 Executing 'rpm' '--root' '/mnt' '--dbpath' '/var/lib/rpm' '-U' '--percent' '--noglob' '--force' '--nodeps' '--' '/mnt/var/cache/zypp/packages/openSUSE-Tumbleweed-20170322-0/suse/ppc64le/perl-5.24.0-5.53.ppc64le.rpm'
2017-03-31 14:41:21 <1> install(3004) [zypp++] ExternalProgram.cc(start_program):412 pid 5357 launched
2017-03-31 14:41:22 <1> install(3004) [zypp++] ExternalProgram.cc(checkStatus):506 Pid 5357 exited with status 1
2017-03-31 14:41:22 <5> install(3004) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/lib/perl5/5.24.0/unicore/lib/InSC/Cantilla.pl: cpio: rename
2017-03-31 14:41:22 <5> install(3004) [zypp] Exception.cc(log):137 error: perl-5.24.0-5.53.ppc64le: install failed
2017-03-31 14:41:22 <5> install(3004) [zypp] Exception.cc(log):137
===
I see the same error again in the same scenario, but not in every job: only in about 1 of 10 runs recently. Before that it happened reproducibly in (nearly) every run.
Latest y2log shows:
```
2017-05-25 14:12:56 <1> install(3312) [zypp] RpmDb.cc(doInstallPackage):1928 RpmDb::installPackage(/mnt/var/cache/zypp/packages/openSUSE-20170524-0/suse/noarch/kbd-legacy-2.0.3-4.1.noarch.rpm,0x0000000c)
2017-05-25 14:12:56 <1> install(3312) [zypp++] ExternalProgram.cc(start_program):249 Executing 'rpm' '--root' '/mnt' '--dbpath' '/var/lib/rpm' '-U' '--percent' '--noglob' '--force' '--nodeps' '--' '/mnt/var/cache/zypp/packages/openSUSE-20170524-0/suse/noarch/kbd-legacy-2.0.3-4.1.noarch.rpm'
2017-05-25 14:12:56 <1> install(3312) [zypp++] ExternalProgram.cc(start_program):412 pid 4998 launched
2017-05-25 14:12:57 <1> install(3312) [zypp++] ExternalProgram.cc(checkStatus):506 Pid 4998 exited with status 1
2017-05-25 14:12:57 <5> install(3312) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/kbd/keymaps/legacy/include/compose.latin3: cpio: rename
2017-05-25 14:12:57 <5> install(3312) [zypp] Exception.cc(log):137 error: kbd-legacy-2.0.3-4.1.noarch: install failed
2017-05-25 14:12:57 <5> install(3312) [zypp] Exception.cc(log):137
2017-05-25 14:12:57 <5> install(3312) [zypp] Exception.cc(log):137
2017-05-25 14:12:57 <1> install(3312) [Ruby] modules/PackageCallbacks.rb:422 DonePackage(error: 3, reason: 'Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/kbd/keymaps/legacy/include/compose.latin3: cpio: rename
error: kbd-legacy-2.0.3-4.1.noarch: install failed
```
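For comparing many runs, the failing file and the reason string can be pulled out of a y2log automatically. The following is only a minimal sketch: the regular expression is derived from the excerpts quoted in this report and may not match other y2log versions, and the names `FAILURE_RE` and `find_unpack_failures` are made up for illustration.

```python
import re

# Pattern for the zypp exception line that carries the rpm unpack failure.
# The layout is taken from the y2log excerpts in this report; other log
# versions may differ.
FAILURE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) .*"
    r"unpacking of archive failed on file (?P<path>\S+): (?P<reason>cpio: .*)$"
)


def find_unpack_failures(lines):
    """Yield (timestamp, file path, reason) for each rpm unpack failure."""
    for line in lines:
        # Only look at the THROW line; the Ruby callback line repeats
        # the same message and would otherwise be counted twice.
        if "THROW:" not in line:
            continue
        match = FAILURE_RE.search(line.strip())
        if match:
            yield (
                f"{match['date']} {match['time']}",
                match["path"],
                match["reason"],
            )
```

Called as `list(find_unpack_failures(open("y2log")))`, this would make it easy to see whether the reason is the bare "cpio: rename" or carries an errno text.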
I now also have the problem on Leap 42.3, since snapshot Build0071 (1), with a similar cpio rename error reported in y2log, except that a non-empty error code is reported: "cpio: rename failed - No space left on device". (I did not have any error on Leap 42.3 Build0054 (0).)

By default the disk size is set to 10GB. If I try with a 40GB disk, I still get the same reported error (2). I do not know whether this new error code helps the investigation.

(0) https://openqa.opensuse.org/tests/399191# Build0054: no failure, kernel 4.4.62-1, disk 10GB
(1) https://openqa.opensuse.org/tests/410912#step/install_and_reboot/21 Build0071: "cpio: rename failed - No space left on device", kernel 4.4.68-2, disk 10GB
(2) https://openqa.opensuse.org/tests/411068#step/install_and_reboot/13 "cpio: rename failed - No space left on device", kernel 4.4.68-2, disk 40GB
To complete comment #12: 6 Leap 42.3 openQA tests are now failing with the same error in Build0083:
https://openqa.opensuse.org/tests/overview?groupid=30&version=42.3&build=0083&distri=opensuse

I do not have access to https://bugzilla.suse.com/show_bug.cgi?id=1039504, but could it be a similar problem?
bug 1039504 is closed as duplicate of bug 1040182 which is the same issue on SLE
To complete comment #12 and comment #13: if I continue on Leap 42.3 Build0083 and run clone_job with FILESYSTEM=ext4, the generated job does not fail. That confirms the cpio rename error is related to the btrfs FS.
===
$ /usr/share/openqa/script/clone_job.pl --from https://openqa.opensuse.org 417515 --host https://openqa.opensuse.org FILESYSTEM=ext4 --skip-download
Created job #417938: opensuse-42.3-DVD-ppc64le-Build0083-minimalx@ppc64le
===
https://openqa.opensuse.org/tests/417515# <= btrfs failure
https://openqa.opensuse.org/tests/417938# <= ext4 passed
===
I am changing the priority and severity because Leap 42.3 openQA tests are now failing for the ppc64le arch with the default btrfs FS, as reported in comment #12, comment #13 and comment #15. What needs to be done to help isolate and solve this btrfs problem?
There is work underway to fix this bug. Unfortunately the bug is not reliably reproducible inside QA and is very hard to reproduce outside QA. So finding the bug may take some time. If you can provide a test case that reproduces the bug without running a full QA installation test that would be helpful. Also using such test to point out a particular kernel commit that causes the bug or makes it more prominent would be helpful.
(In reply to Michal Suchanek from comment #17)
> There is work underway to fix this bug.
>
> Unfortunately the bug is not reliably reproducible inside QA and is very
> hard to reproduce outside QA. So finding the bug may take some time.

Well, it *is* reproducible within the openQA tests and therefore what I consider "inside QA". https://openqa.opensuse.org/tests/418998 is the latest example from yesterday and the logs explicitly show that it is the same error:

```
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 error: xorg-x11-fonts-7.6-32.1.noarch: install failed
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
2017-06-10 21:45:02 <1> install(3321) [Ruby] modules/PackageCallbacks.rb:422 DonePackage(error: 3, reason: 'Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
error: xorg-x11-fonts-7.6-32.1.noarch: install failed
```

> If you can provide a test case that reproduces the bug without running a
> full QA installation test that would be helpful.

It might be possible to reproduce the same error by just repeatedly trying to install/uninstall a package using rpm. Other than this, what is the problem with the "full QA installation test"? Only other alternative I have in mind right now is running a specific subset of "xfstests" but I don't know which one would be feasible. @Michel Normand: Maybe you can try running xfstests in an environment similar to the one that fails here?

> Also using such test to point out a particular kernel commit that causes the
> bug or makes it more prominent would be helpful.

In case no one has done that yet, I recommend checking the kernel version differences between the first failed and the last good build and then looking into the changelog to identify the corresponding submit requests and commits.
(In reply to Oliver Kurz from comment #18)
> (In reply to Michal Suchanek from comment #17)
> > There is work underway to fix this bug.
> >
> > Unfortunately the bug is not reliably reproducible inside QA and is very
> > hard to reproduce outside QA. So finding the bug may take some time.
>
> Well, it *is* reproducible within the openQA tests and therefore what I
> consider "inside QA". https://openqa.opensuse.org/tests/418998 is the latest
> example from yesterday and the logs explicitly show that it is the same
> error:
>
> ```
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
> RpmDb.cc(doInstallPackage):2043 THROW: Subprocess failed. Error: RPM
> failed: error: unpacking of archive failed on file
> /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137 error:
> xorg-x11-fonts-7.6-32.1.noarch: install failed
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
> 2017-06-10 21:45:02 <5> install(3321) [zypp] Exception.cc(log):137
> 2017-06-10 21:45:02 <1> install(3321) [Ruby] modules/PackageCallbacks.rb:422
> DonePackage(error: 3, reason: 'Subprocess failed. Error: RPM failed: error:
> unpacking of archive failed on file
> /usr/share/fonts/100dpi/courO14-ISO8859-10.pcf.gz: cpio: rename
> error: xorg-x11-fonts-7.6-32.1.noarch: install failed
> ```

And about half of the tests succeed for recent builds and most of them for Build20170527. That is what I call not reliably reproducible.

> > If you can provide a test case that reproduces the bug without running a
> > full QA installation test that would be helpful.
>
> It might be possible to reproduce the same error by just repeatedly trying
> to install/uninstall a package using rpm.

Yes, it *might*. But nobody has reproduced it that way so far. So if you have exact steps that lead to the error with reasonable probability, go ahead and share them.

> Other than this, what is the problem with the "full QA installation test"?

That it happens after a lengthy process on a virtual machine somewhere in QA which is trashed after the test, rather than on a developer machine where the state of the system can be analyzed after the error.

> Only other alternative I have in mind right now is running a specific subset
> of "xfstests" but I don't know which one would be feasible.

Or some tar or cpio benchmarks come to mind, yes.
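One possible shape for such a standalone test is sketched below. It is not a confirmed reproducer: nobody in this report has triggered the bug this way. It assumes that rpm unpacks each file under a temporary name and then renames it into place, which is what the fsmRename(opath, path) signature quoted earlier suggests. The function name `stress_rename`, the file names and the `;tmp` suffix are hypothetical.

```python
import errno
import os
import tempfile


def stress_rename(target_dir, count=1000, size=4096):
    """Write `count` files under a temporary name, then rename each one.

    Returns a list of (final_path, errno_name) for every failed step.
    """
    failures = []
    payload = b"x" * size
    for i in range(count):
        final_path = os.path.join(target_dir, f"file-{i:06d}")
        temp_path = final_path + ";tmp"  # hypothetical temporary suffix
        try:
            with open(temp_path, "wb") as handle:
                handle.write(payload)
            os.rename(temp_path, final_path)
        except OSError as exc:
            # Record the symbolic errno name, e.g. ENOSPC.
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((final_path, name))
    return failures


if __name__ == "__main__":
    # Point this at a directory on the btrfs file system under test;
    # a throwaway temporary directory is used here only as a placeholder.
    with tempfile.TemporaryDirectory() as workdir:
        for path, name in stress_rename(workdir):
            print(f"{path}: {name}")
```

Run on a developer machine against a btrfs mount, a failure here would leave the file system available for analysis, which is the point raised above about the full installation test.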
FYI, as a workaround I added a retry of the package installation in openQA (1); the retry allows Leap 42.3 ppc64le Build0089 to complete.

(1) https://openqa.opensuse.org/tests/421918#step/install_and_reboot/3
(In reply to Michel Normand from comment #20)
> FYIO, as a bypass I added in openQA a retry of packages install (1),
> retry that allow to complete the Leap 42.3 ppc64le Build0089.
>
> (1) https://openqa.opensuse.org/tests/421918#step/install_and_reboot/3

The same workaround also works for the latest TW snapshot 20170615 (ppc64/ppc64le):
https://openqa.opensuse.org/tests/422452#step/install_and_reboot/3
https://openqa.opensuse.org/tests/422451#step/install_and_reboot/3
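The idea of the workaround, retrying a failed package installation, can be illustrated outside openQA as follows. This is only a sketch and not the actual openQA change, which is not shown in this report. The rpm options are a subset of those visible in the y2log excerpts; the function name `install_with_retry` and its parameters are hypothetical.

```python
import subprocess


def install_with_retry(rpm_path, root="/mnt", attempts=3, run=subprocess.run):
    """Run the rpm install command up to `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. `run` is injectable so the loop can be
    exercised without calling rpm.
    """
    command = ["rpm", "--root", root, "-U", "--force", "--nodeps", "--", rpm_path]
    for attempt in range(1, attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
    return None
```

A retry only hides the symptom, so the returned attempt number is worth logging: any value above 1 means the rename error was hit at least once in that run.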
(In reply to Michal Suchanek from comment #19)
> And about half of the tests succeed for recent builds and most of them for
> Build20170527. That is what I call not reliably reproducible.
> ...[CUT]...

With the latest Leap 42.3 Build 0101 the failure is reproducible in trial runs, as shown by the two examples (1) and (2).

Some btrfs disk capacity data was captured for the similar bug #1039504 (I do not have access to this bug; could you add me to Cc?), as detailed in (3). Is that captured data sufficient, and if not, what needs to be added?

Note that (1) and (2) are clone_job runs with increased HDDSIZEGB, as per (4).

(1) https://openqa.opensuse.org/tests/433628#step/install_and_reboot/6 (DVD)
(2) https://openqa.opensuse.org/tests/433630#step/install_and_reboot/6 (NET)
(3) https://github.com/os-autoinst/os-autoinst-distri-opensuse/commit/22add07cf40044352acf5e846e774bfb317248ba
(4)
===
$ /usr/share/openqa/script/clone_job.pl --from https://openqa.opensuse.org 433351 --host https://openqa.opensuse.org HDDSIZEGB=20 BETA=1 --skip-download
Created job #433628: opensuse-42.3-DVD-ppc64le-Build0101-minimalx@ppc64le -> https://openqa.opensuse.org/t433628
===
$ /usr/share/openqa/script/clone_job.pl --from https://openqa.opensuse.org 433343 --host https://openqa.opensuse.org HDDSIZEGB=20 BETA=1 --skip-download
Created job #433630: opensuse-42.3-NET-ppc64le-Build0101-minimalx@ppc64le -> https://openqa.opensuse.org/t433630
===
As per https://bugzilla.suse.com/show_bug.cgi?id=1040182#c129, we are waiting for the related kernel patch (1) to be rebuilt for Leap 42.3. It is not yet in ISO Build0102, as shown by the failing openQA result (2).

(1) http://kernel.opensuse.org/cgit/kernel-source/commit/?h=openSUSE-42.3&id=8bf31dae2ad1a3c5471841801bc4f12233e3c2ec
(2) https://openqa.opensuse.org/tests/435028#step/install_and_reboot/6
Seems this has not happened in past month so closing. There were fixes that went into the btrfs kernel driver to address this.
OK to close, as there are no more failures in TW openQA runs. I will check again with the next Leap 15 when available.
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: mru-install-multipath-remote
https://openqa.suse.de/tests/6145303

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released"
3. The label in the openQA scenario is removed
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: create_hdd_tumbleweed_kde
https://openqa.opensuse.org/tests/2134907

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`
This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp1_ltss_media_basesys-srv-desk-dev-contm-lgm-py2-wsm_all_full_x11
https://openqa.suse.de/tests/8150203

To prevent further reminder comments one of the following options should be followed:
1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
3. The bugref in the openQA scenario is removed or replaced, e.g. `label:wontfix:boo1234`