Bug 1009472

Summary: Installation failure "cpio: rename" PowerPC multipath openQA test
Product: [openSUSE] openSUSE Tumbleweed Reporter: Michel Normand <normand>
Component: OtherAssignee: E-mail List <bnc-team-screening>
Status: RESOLVED DUPLICATE QA Contact: E-mail List <qa-bugs>
Severity: Normal    
Priority: P5 - None CC: coolo, forgotten_ny8t7SHjD_, igonzalezsosa, jreidinger, msuchanek, normand
Version: Current   
Target Milestone: ---   
Hardware: PowerPC-64   
OS: Other   
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: y2log.txt.gz
openqa_save_y2logs_failure.png

Description Michel Normand 2016-11-10 11:12:56 UTC
Created attachment 701436 [details]
y2log.txt.gz

Installation of package kbd-legacy failed for PowerPC multipath openQA test
as reported in (1)
The associated y2log extracted from there is attached to this bug.

There is no information in this log of why the install failed (2)


(1) https://openqa.opensuse.org/tests/303022#step/start_install/16
(2) extract of attached y2log:
===
2016-11-10 04:27:57 <1> install(3310) [zypp:fetcher++] Fetcher.cc(addCachePath):319 Adding fetcher cache: '/mnt/var/cache/zypp/packages/openSUSE-20161107-0'.
2016-11-10 04:27:57 <1> install(3310) [zypp:fetcher] Fetcher.cc(downloadAndReadIndexList):723 No indexes to read.
2016-11-10 04:27:57 <1> install(3310) [zypp:fetcher] Fetcher.cc(provideFromCache):350 start fetcher with 1 cache directories.
2016-11-10 04:27:57 <1> install(3310) [zypp:fetcher] Fetcher.cc(provideToDest):547 Not found in cache, downloading
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaSetAccess.cc(provide):203 Going to try to provide  file ./suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm from media number 1
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaManager.cc(checkDesired):112 checkDesired(8): desired (cached)
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaManager.cc(checkDesired):112 checkDesired(8): desired (cached)
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaHandler.cc(provideFile):999 provideFile(./suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm)
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaManager.cc(checkDesired):112 checkDesired(8): desired (cached)
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(start_program):249 Executing '/bin/cp' '--remove-destination' '--' '/var/adm/mount/AP_0xq76z9N/suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm' '/mnt/var/cache/zypp/packages/openSUSE-20161107-0/suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm'
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(start_program):412 pid 4827 launched
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(checkStatus):513 Pid 4827 successfully completed
2016-11-10 04:27:57 <1> install(3310) [zypp++] MediaSetAccess.cc(releaseFile):85 Going to release file ./suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm from media number 1
2016-11-10 04:27:57 <1> install(3310) [zypp:fetcher] Fetcher.cc(validate):392 Checking job [/mnt/var/cache/zypp/packages/openSUSE-20161107-0/suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm] (1 checkers )
2016-11-10 04:27:57 <1> install(3310) [Progress++] ProgressData.cc(report):88 {#85|}END
2016-11-10 04:27:57 <1> install(3310) [Ruby] modules/PackageSlideShow.rb:1225 pkg_name: kbd-legacy
2016-11-10 04:27:57 <1> install(3310) [Pkg] modules/PackageSlideShow.rb:1253 Pkg Builtin called: PkgInstalled
2016-11-10 04:27:57 <1> install(3310) [Pkg] Package.cc(searchPackage):582 Package 'kbd-legacy' installed: false
2016-11-10 04:27:57 <1> install(3310) [Pkg] modules/SlideShowCallbacks.rb:296 Pkg Builtin called: PkgDU
2016-11-10 04:27:57 <1> install(3310) [Pkg] modules/SlideShowCallbacks.rb:314 Pkg Builtin called: TargetAvailable
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(start_program):249 Executing 'rpm' '--root' '/mnt' '--dbpath' '/var/lib/rpm' '-U' '--percent' '--noglob' '--force' '--nodeps' '--' '/mnt/var/cache/zypp/packages/openSUSE-20161107-0/suse/noarch/kbd-legacy-2.0.3-1.1.noarch.rpm'
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(start_program):412 pid 4828 launched
2016-11-10 04:27:57 <1> install(3310) [zypp++] ExternalProgram.cc(checkStatus):506 Pid 4828 exited with status 1
2016-11-10 04:27:57 <5> install(3310) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW:    Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/kbd/keymaps/legacy/include/compose.latin3: cpio: rename
2016-11-10 04:27:57 <5> install(3310) [zypp] Exception.cc(log):137 error: kbd-legacy-2.0.3-1.1.noarch: install failed
===
Comment 1 Michel Normand 2016-11-10 16:07:19 UTC
If doing a new trial on another openQA server:
* the install failing package is not the same:
  (one trial failed on perl-DateTime-Local install)
  (another one on manufacturer-PPDs install)
* There is no such failure if openQA job is started without MULTIPATH=1
=== failure:
/var/lib/openqa/script/client  jobs  post ISO=openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20161107-Media.iso DESKTOP=minimalx DISTRI=opensuse DVD=1 VERSION=Tumbleweed FLAVOR=DVD BUILD=20161107 INSTALLONLY=1 MACHINE=ppc64le-multipath MULTIPATH=1 QEMU=ppc64 QEMUCPU=host VGA=std WORKER_CLASS=qemu_ppc64le NOIMAGES=1 ARCH=ppc64le OFW=1 QEMUCPUS=8 QEMUTHREADS=8 TEST=install_minimalx Test=install_minimalx
=== no failure:
/var/lib/openqa/script/client  jobs  post ISO=openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20161107-Media.iso DESKTOP=minimalx DISTRI=opensuse DVD=1 VERSION=Tumbleweed FLAVOR=DVD BUILD=20161107 INSTALLONLY=1 MACHINE=ppc64le-multipath QEMU=ppc64 QEMUCPU=host VGA=std WORKER_CLASS=qemu_ppc64le NOIMAGES=1 ARCH=ppc64le OFW=1 QEMUCPUS=8 QEMUTHREADS=8 TEST=install_minimalx Test=install_minimalx
===
Comment 2 Michel Normand 2016-11-10 16:45:58 UTC
Note: Since a while the openQA server had an incorrect setup of ppc64le-multipath machine definition (1). That explain why this bug failure was not reported before for ppc64le.
This should appear on futur new test run on this server.

(1)
===
(10:56:06) michel_mno: There is an error in openQA database about ppc64le multipath machine http://paste.opensuse.org/view/raw/16432658    MULTIPATH1 is set, but should be MULTIPATH 
(10:56:06) michel_mno: the templates is correct in git, the database has to be modified (I do not have admin access)
(10:58:03) okurz: michel_mno: I updated the machine definition for ppc64le-multipath
===
https://openqa.opensuse.org/admin/machines
ppc64le-multipath	qemu	
MULTIPATH1=1  <= this has to be replaced by MULTIPATH=1
NOVIDEO=1
OFW=1
QEMUCPU=host
QEMUCPUS=8
QEMURAM=4096
QEMUTHREADS=8
WORKER_CLASS=qemu_ppc64le
===
Comment 3 Josef Reidinger 2016-11-11 13:33:35 UTC
Hmm, it is package failure during package unpackaing. Does multipath scenario have less disk space? My suspicious is some relation of btrfs disk usage and maybe multipath?
Comment 4 Forgotten User ny8t7SHjD_ 2016-11-14 08:58:44 UTC
It seems that we just need to rebuild a whole tree against recent rpm.. 

It was kbd-legacy, then kernel-firmware, now netpbm.. All are cured by rebuilding those packages. (I'm not certain about kbd-legacy though as Andreas did something to it.)

Anyway rebuild is in progress..
Comment 5 Michel Normand 2016-11-14 09:15:52 UTC
(In reply to Dinar Valeev from comment #4)
> It seems that we just need to rebuild a whole tree against recent rpm.. 
> 

Hi Dinar, I would be interested to understand what informations make you point to the need for such rebuild; could you give me some details ?
Comment 6 Michel Normand 2016-11-16 07:43:01 UTC
same error with Snapshot 20161115 after rebuild as reported in
https://openqa.opensuse.org/tests/305788
Comment 7 Michel Normand 2016-11-16 08:10:31 UTC
(In reply to Michel Normand from comment #6)
> same error with Snapshot 20161115 after rebuild as reported in
> https://openqa.opensuse.org/tests/305788

in this retry same error of rpm command, cpio rename.
===
2016-11-16 01:50:42 <1> install(3311) [zypp++] ExternalProgram.cc(start_program):249 Executing 'rpm' '--root' '/mnt' '--dbpath' '/var/lib/rpm' '-U' '--percent' '--noglob' '--force' '--nodeps' '--' '/mnt/var/cache/zypp/packages/openSUSE-20161115-0/suse/noarch/kbd-legacy-2.0.3-1.2.noarch.rpm'
2016-11-16 01:50:42 <1> install(3311) [zypp++] ExternalProgram.cc(start_program):412 pid 4862 launched
2016-11-16 01:50:42 <1> install(3311) [zypp++] ExternalProgram.cc(checkStatus):506 Pid 4862 exited with status 1
2016-11-16 01:50:42 <5> install(3311) [zypp] Exception.cc(log):137 RpmDb.cc(doInstallPackage):2043 THROW:    Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /usr/share/kbd/keymaps/legacy/include/compose.latin4: cpio: rename
===
Comment 8 Michel Normand 2016-11-16 11:03:33 UTC
Created attachment 702238 [details]
openqa_save_y2logs_failure.png

* in previous failing conditions default disk size is used (qcow2 10GB)
* on my local openqa server 
  (because I am not authorized to issue openQA client cmd on main instance)
  * if setup HDDSIZEGB=40 then still rpm cpio rename error.
  * if setup HDDSIZEGB=80 then no more rpm cpio rename error.
    so assumption of comment #3 by  Josef Reidinger is valid, and
    yast2 ultimately stuck with "Saving bootloader configuration"
    But openQA unable to capture y2log because save_y2log failed
    as reported by this openqa_save_y2logs_failure.png
Comment 9 Forgotten User ny8t7SHjD_ 2016-11-18 22:21:19 UTC
this test defaults to HDDSIZEGB=10

To exclude btrfs, I'd run the same test but with ext3/ext4 fs. There is definitely something wrong with disk in this setup.
Comment 10 Michel Normand 2016-11-21 14:15:19 UTC
(In reply to Dinar Valeev from comment #9)
> this test defaults to HDDSIZEGB=10
> 
> To exclude btrfs, I'd run the same test but with ext3/ext4 fs. There is
> definitely something wrong with disk in this setup.

the problem may be specific to multipath disk setup but only for TW,
because same test is not failing for Leap 42.2:

https://openqa.opensuse.org/tests/308190#  <= Leap 42.2 passed
https://openqa.opensuse.org/tests/308165   <= TW failed
Comment 11 Michel Normand 2016-11-22 06:07:12 UTC
(In reply to Michel Normand from comment #8)
> Created attachment 702238 [details]
> openqa_save_y2logs_failure.png
> 
> * in previous failing conditions default disk size is used (qcow2 10GB)
> * on my local openqa server 
>   (because I am not authorized to issue openQA client cmd on main instance)
>   * if setup HDDSIZEGB=40 then still rpm cpio rename error.
>   * if setup HDDSIZEGB=80 then no more rpm cpio rename error.
>     so assumption of comment #3 by  Josef Reidinger is valid, and
>     yast2 ultimately stuck with "Saving bootloader configuration"
>     But openQA unable to capture y2log because save_y2log failed
>     as reported by this openqa_save_y2logs_failure.png

The hung condition at "Saving bootloader configuration" is tracked by new bug
https://bugzilla.suse.com/show_bug.cgi?id=1011529
Comment 12 Imobach Gonzalez Sosa 2016-11-24 14:53:21 UTC
Hi Dinar,

What's expected from the YaST team? It looks a problem with Btrfs and/or multipath set-up. How can we help?
Comment 13 Michel Normand 2016-11-29 11:02:42 UTC
next step is for me to add debug traces to understand the problem around the cpio rename error (without qualification string as per log extract of comment #7) and why is it specific to multipath configuration (use of dev mapper)
Comment 14 Michel Normand 2016-12-01 09:05:53 UTC
investigation blocked by new bug
https://bugzilla.suse.com/show_bug.cgi?id=1012825
Comment 15 Michel Normand 2016-12-06 16:00:34 UTC
I need suggestion to continue investigation as per following status.

Current_Status:
* The failure is specific to disk multipath test and btrfs for TW PowerPC
  the reported error in y2log is "cpio: rename" error
* No failure for Leap 42.2
* Unable to recreate the failure without openQA env.                 
* Not same failure in ext4 FS in place of btrfs.
* The error reported by Yast is any package installation failure
  and the y2log reports a "cpio: rename" error with no error number.
* the "cpio: rename" string is related to error from fsmRename fct in lib/fsm.c:
  Reported by rpm via the zypp traces from libzypp                    
  (for ExternalProgram.cc, Exception.cc, RpmDb.cc)
  the last error is reported by rpm psm.c rpmpsmUnpack fct as error from rpmPackageFilesInstall
  the related string from emsg (output of rpmfileStrerror)
  string "cpio: rename" is build in this rpmfileStrerror by decoding of RPMERR_RENAME_FAILED RC
  Summary of related source lines:  
===
./rpm-4.12.0.1/lib/psm.c:671: fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files,
===
    fsmrc = rpmPackageFilesInstall(psm->ts, psm->te, psm->files, psm, &failedFile);
    emsg = rpmfileStrerror(fsmrc);
    rpmlog(RPMLOG_ERR,
            _("unpacking of archive failed%s%s: %s\n"),          
            (failedFile != NULL ? _(" on file ") : ""),
            (failedFile != NULL ? failedFile : ""),
            emsg);                  
===
./rpm-4.12.0.1/lib/rpmfi.c:2111:char * rpmfileStrerror(int rc)
./rpm-4.12.0.1/lib/fsm.c:535: static int fsmRename(const char *opath, const char *path)
./rpm-4.12.0.1/lib/rpmarchive.h RPMERR_RENAME_FAILED = -32774,
===
Comment 16 Michel Normand 2016-12-07 17:58:26 UTC
In my openQA instance there is no such failure if eatmydata package is removed !
This is used in os-autoinst qemu backend:
===
backend/qemu.pm:719: unshift(@params, "/usr/bin/eatmydata") if (-e "/usr/bin/eatmydata");
===

Do we really need this eatmydata call ? that seems to really eat data.
Comment 17 Michel Normand 2016-12-08 08:07:39 UTC
(In reply to Michel Normand from comment #16)
> In my openQA instance there is no such failure if eatmydata package is
> removed !
> This is used in os-autoinst qemu backend:
> ===
> backend/qemu.pm:719: unshift(@params, "/usr/bin/eatmydata") if (-e
> "/usr/bin/eatmydata");
> ===
> 
> Do we really need this eatmydata call ? that seems to really eat data.

wait for openQA update with PR
https://github.com/os-autoinst/os-autoinst/pull/660
Comment 18 Michel Normand 2016-12-15 08:16:49 UTC
(In reply to Michel Normand from comment #17)
> [CUT] ...
> 
> wait for openQA update with PR
> https://github.com/os-autoinst/os-autoinst/pull/660

Hello Michal,
do you know who is in charge to update the PowerPC workers used by openQA ?
I would like to know when those workers will be updated with the correction
of above PR.

The yesterday's run of the related test (1) still trace that eatmydata is still used by the worker.

(1) https://openqa.opensuse.org/tests/321033/file/autoinst-log.txt
===
+++ worker notes +++
start time: 2016-12-14 13:16:02
running on power8:16 
...
13:16:09.9790 153031 starting: /usr/bin/eatmydata /usr/bin/qemu-system-ppc64 ...
===
Comment 19 Michal Suchanek 2016-12-15 11:22:40 UTC
No idea. Will reassign to OpenQA
Comment 20 Harald Mueller-Ney 2016-12-16 14:12:59 UTC
The SUSE tool channel is used for our internal tracking.

openQA is developed in the open as a community project:
https://progress.opensuse.org/projects/openqav3

Please raise the issue in the community project:
https://progress.opensuse.org/projects/openqav3/issues

I close the bug and move it back to openSUSE Tumbleweed -  QA Tools component, so that you can reference the bug to avoid loosing information.
Comment 21 Michel Normand 2016-12-19 08:18:43 UTC
as suggested I am creating a new issue
https://progress.opensuse.org/issues/15548
Comment 22 Michel Normand 2017-01-04 08:19:22 UTC
(In reply to Michel Normand from comment #21)
> as suggested I am creating a new issue
> https://progress.opensuse.org/issues/15548

above issue created in wrong project, replaced by
https://progress.opensuse.org/issues/15770  in 
https://progress.opensuse.org/projects/openqav3/issues)
Comment 23 Michel Normand 2017-01-05 08:53:29 UTC
after power8 worker update as per issue
https://progress.opensuse.org/issues/15770
set this bug as duplicate of new boo#1018262
to continue investigation from status summary of comment #15

*** This bug has been marked as a duplicate of bug 1018262 ***