Bug 1189590

Summary: YaST crashes in libzypp when evaluating upgrade
Product: [openSUSE] openSUSE Tumbleweed Reporter: Fabian Vogt <fvogt>
Component: InstallationAssignee: Ladislav Slezák <lslezak>
Status: RESOLVED FIXED QA Contact: Jiri Srain <jsrain>
Severity: Critical    
Priority: P3 - Medium CC: bzeller, dgonzalez, dimstar, lslezak, ma, zypp-maintainers
Version: Current   
Target Milestone: ---   
Hardware: Other   
OS: Other   
URL: https://trello.com/c/LZ0YbPvf
Whiteboard:
Found By: --- Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: YaST logs (from the assets page)

Description Fabian Vogt 2021-08-18 15:22:12 UTC
Starting two days ago, the upgrade_staging test in Tumbleweed stagings fails in most cases (there are a few random successful ones as well).
When showing the installation overview, yast causes a crash in libzypp (libsolv).
Example: https://openqa.opensuse.org/tests/1878709 (yast logs attached).
This unfortunately blocks all recent stagings, so Tumbleweed can't accept any new changes until this is fixed.

Backtrace:

2021-08-18 02:28:48 <1> install(3808) [zypp::solver] SATResolver.cc(solving):522 Checking droplists ...
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 Error: signal 11
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [hd]: (-3) /lib64/libzypp.so.1722 : zypp::dumpBacktrace(std::ostream&)+0x39 [0x7f7e97405ae9]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [hd]: (-2) /lib64/libzypp.so.1722 : +0x356696 [0x7f7e973da696]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [hd]: (-1) /lib64/libc.so.6 : +0x3d3a0 [0x7f7ea53733a0]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 vvvvvvvvvv----------------------------------------
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (0) /lib64/libc.so.6 : +0xa9af2 [0x7f7ea53dfaf2]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (1) /lib64/libzypp.so.1722 : +0x472392 [0x7f7e974f6392]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (2) /lib64/libzypp.so.1722 : +0x470f8c [0x7f7e974f4f8c]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (3) /lib64/libzypp.so.1722 : zypp::IdString::IdString(char const*)+0x21 [0x7f7e9734aa01]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (4) /lib64/libzypp.so.1722 : +0x3601b9 [0x7f7e973e41b9]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (5) /lib64/libzypp.so.1722 : +0x46b601 [0x7f7e974ef601]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (6) /lib64/libzypp.so.1722 : +0x2aa6c2 [0x7f7e9732e6c2]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (7) /lib64/libzypp.so.1722 : zypp::Capability::Capability(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, zypp::Rel, zypp::Edition const&, zypp::ResKind const&)+0x30 [0x7f7e9732ea80]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (8) /lib64/libzypp.so.1722 : zypp::sat::Solvable::valuesOfNamespace(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x397 [0x7f7e973e5957]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (9) /lib64/libzypp.so.1722 : zypp::solver::detail::SATResolver::solving(std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&)+0x12d5 [0x7f7e9730f325]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (10) /lib64/libzypp.so.1722 : zypp::solver::detail::SATResolver::resolvePool(std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::unordered_set<zypp::Capability, std::hash<zypp::Capability>, std::equal_to<zypp::Capability>, std::allocator<zypp::Capability> > const&, std::__cxx11::list<zypp::PoolItem, std::allocator<zypp::PoolItem> > const&, std::set<zypp::Repository, std::less<zypp::Repository>, std::allocator<zypp::Repository> > const&)+0x992 [0x7f7e97310b52]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (11) /usr/lib64/YaST2/plugin/libpy2Pkg.so.2 : PkgFunctions::PkgUpdateAll(YCPMap const&)+0x391 [0x7f7e9772baf1]
2021-08-18 02:28:48 <5> install(3808) [zypp] ZYppFactory.cc(sigsegvHandler):58 [bt]: (12) /usr/lib64/YaST2/plugin/libpy2Pkg.so.2 : Y2PkgFunction::evaluateCall()+0x126f4 [0x7f7e97794f24]

I'm occasionally able to reproduce it (openQA triggers it fairly reliably) and was able to gdb into the crashing process once.
It's visible that libsolv's stringpool is intact, but the to be inserted string pointer points into deallocated (unmapped) space.

It can also be seen that the crashed process has a open file handle to a deleted solv cache:

35 -> /var/cache/zypp/solv/openSUSE-1-0/solv (deleted)
37 -> /mnt/var/cache/zypp/solv/@System/solv

That implies that it deleted an in-use pool, which seems dangerous and seems related to bug 1183711.
Comment 1 Fabian Vogt 2021-08-18 15:22:52 UTC
Created attachment 851899 [details]
YaST logs (from the assets page)
Comment 2 Ladislav Slezák 2021-08-19 08:05:18 UTC
Um, that should not happen, I have reverted all changes which touched the libzypp cache:

- https://github.com/yast/yast-packager/pull/565 (https://bugzilla.suse.com/show_bug.cgi?id=1183711)

- https://github.com/yast/yast-installation/pull/900 (https://bugzilla.suse.com/show_bug.cgi?id=1179415)

YaST should not delete anything from the libzypp cache.


The log does not contain any hint about the solv file besides creating it and successfully loading:

2021-08-18 02:27:48 <1> install(3808) [DEFINE_LOGGROUP++] forkspawnengine.cc(start):149 Executing 'repo2solv' '-o' '/var/cache/zypp/solv/openSUSE-1-0/solv' '-X' '/var/cache/zypp/raw/openSUSE-
1-0'
2021-08-18 02:27:48 <1> install(3808) [DEFINE_LOGGROUP++] forkspawnengine.cc(start):301 pid 4578 launched
2021-08-18 02:27:49 <1> install(3808) [DEFINE_LOGGROUP++] abstractspawnengine.cc(checkStatus):180 Pid 4578 successfully completed

...

2021-08-18 02:27:49 <1> install(3808) [zypp] Repository.cc(addSolv):336 sat::repo(openSUSE-1-0){prio 0.0, size 3773} after adding /var/cache/zypp/solv/openSUSE-1-0/solv
2021-08-18 02:27:49 <1> install(3808) [zypp::satpool] PoolImpl.cc(setDirty):253 setRepoInfo openSUSE-1-0
2021-08-18 02:27:49 <1> install(3808) [zypp] Repository.cc(setInfo):288 sat::repo(openSUSE-1-0){prio -99.2, size 3773}
2021-08-18 02:27:49 <1> install(3808) [Pkg] Source_Resolvables.cc(LoadResolvablesFrom):160 Pool size at end: 3773 (loaded 3773 resolvables)


Michael, any hint? BTW that "DEFINE_LOGGROUP++" name looks strange...
Comment 3 Michael Andres 2021-08-19 10:38:10 UTC
"DEFINE_LOGGROUP++" is just a missing define as we relocated the sources. 
This does not affect anything else.
Comment 4 Fabian Vogt 2021-08-19 13:51:11 UTC
I see some more strange behaviour when I try to reproduce it locally.
This happens fairly reliably and in almost 100% of cases:
When the installation overview appears, it shows that it's going to upgrade only 5 packages and remove 1932. This is clearly wrong, and going into the packager shows that it's going to upgrade most of them instead. If I hit the "Back" and then the "Next" button again, it shows the correct summary.

Sometimes when I hit the "Abort" button, it shows that /mnt/var/cache/ fails to unmount, because /mnt/var/cache/zypp/solv/openSUSE-1-0/solv is being used.
Comment 5 Michael Andres 2021-08-19 14:46:04 UTC
I'm not 100% sure, what YAST is doing, but I suspects it's triggered by YASTs own repo managemet they perform beside zypps repo manager. The two managers appear to have a different view to the system:

grep 'openSUSE-1-0\|RepoManager\|setDirty' y2log:
> 2021-08-18 02:27:43 <1> install(3808) [Pkg] PkgFunctions.cc(CreateRepoManager):236 Path to repository files: /etc/zypp/repos.d
> 2021-08-18 02:27:43 <1> install(3808) [zypp] RepoManager.cc(init_knownRepositories):839 start construct known repos
> 2021-08-18 02:27:43 <1> install(3808) [zypp] RepoManager.cc(repositories_in_dir):308 directory /etc/zypp/repos.d
> 2021-08-18 02:27:43 <1> install(3808) [zypp] RepoManager.cc(init_knownRepositories):904 end construct known repos
> 2021-08-18 02:27:46 <1> install(3808) [zypp] RepoManager.cc(probe):1443 going to probe the repo type at cd:/?devices=/dev/disk/by-id/scsi-0QEMU_QEMU_CD-ROM_cd0 ()
> 2021-08-18 02:27:46 <1> install(3808) [zypp] RepoManager.cc(probe):1469 Probed type RPMMD at cd:/?devices=/dev/disk/by-id/scsi-0QEMU_QEMU_CD-ROM_cd0 ()
> 2021-08-18 02:27:46 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):241 Name of the repository: 'openSUSE-1-0'
> 2021-08-18 02:27:46 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):283 - alias       : openSUSE-1-0
> 2021-08-18 02:27:46 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):283 - name        : openSUSE-1-0
> 2021-08-18 02:27:46 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):283 - metadataPath: /var/cache/zypp/raw/openSUSE-1-0
> 2021-08-18 02:27:46 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):283 - packagesPath: /var/cache/zypp/packages/openSUSE-1-0
> ...
> 2021-08-18 02:27:46 <1> install(3808) [zypp] RepoManager.cc(checkIfToRefreshMetadata):994 Check if to refresh repo openSUSE-1-0 at cd:/?devices=/dev/disk/by-id/scsi-0QEMU_QEMU_CD-ROM_cd0 (rpm-md)
> ...
> 2021-08-18 02:27:49 <1> install(3808) [Pkg] Source_Create.cc(createManagedSource):305 Added source 'openSUSE-1-0': 'cd:/?devices=/dev/disk/by-id/scsi-0QEMU_QEMU_CD-ROM_cd0', enabled: true, autorefresh: false
> 2021-08-18 02:27:49 <1> install(3808) [Pkg] Source_Resolvables.cc(LoadResolvablesFrom):67 Loading resolvables from 'openSUSE-1-0', pool size at start: 0
> 2021-08-18 02:27:49 <1> install(3808) [zypp::satpool] PoolImpl.cc(setDirty):253 _createRepo openSUSE-1-0
> 2021-08-18 02:27:49 <1> install(3808) [zypp::satpool] PoolImpl.cc(setDirty):253 _addSolv openSUSE-1-0

Here it looks like YAST ([Pkg]) creates its own openSUSE-1-0 repository. The zypp::RepoManager is created, but only to perform the refresh. The ReopoInfo YAST created is not passed to RepoManager (i.e. it is not added as a persistent repo to the system; no entry in /etc/zypp/repos.d).

Then YAST loads the repo data (from the .solv file) into the pool.


> 2021-08-18 02:28:10 <1> install(3808) [Pkg] PkgFunctions.cc(RepoManagerUpdateTarget):459 Updating RepoManager (target changed from  to /)
> 2021-08-18 02:28:10 <0> install(3808) [Pkg] PkgFunctions.cc(RepoManagerUpdateTarget):463 repomanager options size: 0
> 2021-08-18 02:28:10 <1> install(3808) [zypp] RepoManager.cc(init_knownRepositories):839 start construct known repos
> 2021-08-18 02:28:10 <1> install(3808) [zypp] RepoManager.cc(repositories_in_dir):308 directory /etc/zypp/repos.d
> 2021-08-18 02:28:10 <1> install(3808) [DEFINE_LOGGROUP] PathInfo.cc(recursive_rmdir):434 recursive_rmdir /var/cache/zypp/raw/openSUSE-1-0 
> 2021-08-18 02:28:10 <1> install(3808) [DEFINE_LOGGROUP] PathInfo.cc(recursive_rmdir):434 recursive_rmdir /var/cache/zypp/solv/openSUSE-1-0 
> ...

That's probably where the issue begins. YAST again creates a zypp::RepoManager for the system. The startup code finds no repos in /etc/zypp/repos.d but now apparently orphaned data in the cache which are cleaned. This is where the cache vanishes for YASTs openSUSE-1-0 repo.

Yast should either stay with it's Initial RepoManager for the system or use one with a private cache for it's own stuff.


> 2021-08-18 02:28:42 <1> install(3808) [Pkg] Source_Save.cc(SourceSaveAll):220 Adding repository 'openSUSE-1-0'
> 2021-08-18 02:28:42 <0> install(3808) [Pkg] Callbacks.cc(start):382 ProgressStart: id:52, Adding repository 'openSUSE-1-0'
> 2021-08-18 02:28:42 <0> install(3808) [Pkg] Callbacks.cc(progress):401 ProgressProgress: id:52, Adding repository 'openSUSE-1-0': 0%
> 2021-08-18 02:28:42 <1> install(3808) [zypp] RepoManager.cc(addRepository):1654 Try adding repo --------------------------------------

Later YAST adds its repos to the zypp::RepoManager, This workflow would basically be ok, if YAST would stay with the initially created RepoManager for the system. 

"PkgFunctions.cc(RepoManagerUpdateTarget) ... Updating RepoManager (target changed from  to /)"
This shoudl probably handle an empty root like "/", in case this woudl prevent RepoManagerUpdateTarget from creating a new RepoManger instance.
Comment 6 Michael Andres 2021-08-19 14:54:09 UTC
> 2021-08-18 02:28:47 <1> install(3808) [Pkg] 
> PkgFunctions.cc(RepoManagerUpdateTarget):459 Updating RepoManager 
> (target changed from /mnt to /mnt)

Looks like YAST wrongly assumes a target change more often.
Comment 7 Fabian Vogt 2021-08-19 15:32:11 UTC
What I noticed is that sometimes it uses /var/cache/zypp/solv/openSUSE-1-0/solv and sometimes /mnt/var/cache/zypp/solv/openSUSE-1-0/solv.

This looks odd too, and probably reinforces comment 5:

2021-08-19 09:56:37 <1> install(3874) [Pkg] Source_Load.cc(SourceLoadImpl):452 Resolvables from 'openSUSE-1-0' are already present, not loading
2021-08-19 09:56:37 <1> install(3874) [Pkg] Source_Resolvables.cc(LoadResolvablesFrom):67 Loading resolvables from 'openSUSE-1-0', pool size at start: 5936
2021-08-19 09:56:37 <1> install(3874) [zypp] Repository.cc(eraseFromPool):300 sat::repo(openSUSE-1-0){prio -99.2, size 3773} removed from pool
2021-08-19 09:56:37 <1> install(3874) [zypp::satpool] PoolImpl.cc(setDirty):253 _deleteRepo openSUSE-1-0
2021-08-19 09:56:37 <1> install(3874) [zypp::satpool] PoolImpl.cc(setDirty):253 _createRepo openSUSE-1-0
2021-08-19 09:56:37 <1> install(3874) [zypp::satpool] PoolImpl.cc(setDirty):253 _addSolv openSUSE-1-0

It probably gets confused because the installation medium results in a repo called "openSUSE-1-0" and the to be upgraded system also contains /mnt/etc/zypp/repos.d/openSUSE-1-0.repo.

This would explain why it only causes issues in staging, where the version number is always "1-0".

I did some tests with different upgrade system repos, with the repo removed and renamed. After renaming, YaST actually proposes to remove it for the upgrade.

openSUSE-1-0: 3/3 failures (wrong actions in overview)
None: 2/2 success
openSUSE-2-0: 2/2 success
Comment 8 Fabian Vogt 2021-08-20 12:02:19 UTC
(In reply to Fabian Vogt from comment #7)
> What I noticed is that sometimes it uses
> /var/cache/zypp/solv/openSUSE-1-0/solv and sometimes
> /mnt/var/cache/zypp/solv/openSUSE-1-0/solv.
> 
> This looks odd too, and probably reinforces comment 5:
> 
> 2021-08-19 09:56:37 <1> install(3874) [Pkg]
> Source_Load.cc(SourceLoadImpl):452 Resolvables from 'openSUSE-1-0' are
> already present, not loading
> 2021-08-19 09:56:37 <1> install(3874) [Pkg]
> Source_Resolvables.cc(LoadResolvablesFrom):67 Loading resolvables from
> 'openSUSE-1-0', pool size at start: 5936
> 2021-08-19 09:56:37 <1> install(3874) [zypp]
> Repository.cc(eraseFromPool):300 sat::repo(openSUSE-1-0){prio -99.2, size
> 3773} removed from pool
> 2021-08-19 09:56:37 <1> install(3874) [zypp::satpool]
> PoolImpl.cc(setDirty):253 _deleteRepo openSUSE-1-0
> 2021-08-19 09:56:37 <1> install(3874) [zypp::satpool]
> PoolImpl.cc(setDirty):253 _createRepo openSUSE-1-0
> 2021-08-19 09:56:37 <1> install(3874) [zypp::satpool]
> PoolImpl.cc(setDirty):253 _addSolv openSUSE-1-0
> 
> It probably gets confused because the installation medium results in a repo
> called "openSUSE-1-0" and the to be upgraded system also contains
> /mnt/etc/zypp/repos.d/openSUSE-1-0.repo.
> 
> This would explain why it only causes issues in staging, where the version
> number is always "1-0".

I just realized that this isn't accurate, and caused by my method of local
testing. After reproducing the issue, I did not rollback the source .qcow2 to
its original state, but YaST actually writes into the target system's zypp conf
at the installation overview, even before the upgrade starts. That changed the
behaviour of the following tests, which then had only the openSUSE-1-0 repo in
the target system. Originally, the target system does not have any
"openSUSE-1-0" repo, but it's added when computing the upgrade. In that scenario
I also got the crash again instead of the broken calculated upgrade.

This means my theory about why it only seems to fail in stagings is invalid,
it affects hit all upgrades. It does still confirm that there is some confusion
with repos and caches from the inst-sys and the target system.
Comment 9 Ladislav Slezák 2021-08-20 14:43:42 UTC
(In reply to Michael Andres from comment #5)

> That's probably where the issue begins. YAST again creates a
> zypp::RepoManager for the system. The startup code finds no repos in
> /etc/zypp/repos.d but now apparently orphaned data in the cache which are
> cleaned. This is where the cache vanishes for YASTs openSUSE-1-0 repo.

Would saving the repository to a /etc/zypp/repos.d/*.repo file fix the problem?
Comment 10 Michael Andres 2021-08-23 06:46:08 UTC
(In reply to Ladislav Slezák from comment #9)
> Would saving the repository to a /etc/zypp/repos.d/*.repo file fix the
> problem?

Probably, the zypp::RepomManager keeps (root)/etx/zypp/repos.d and (root)/var/cache/zypp in sync upon it's construction. If the removed cache entry is the cause, this should fix it.
Comment 11 Stefan Hundhammer 2021-08-25 13:51:47 UTC
See also bug #1189793
Comment 12 Stefan Hundhammer 2021-09-02 13:40:42 UTC
See also bug #1190113 (also segfault)
Comment 13 Ladislav Slezák 2021-09-06 13:43:17 UTC
*** Bug 1190113 has been marked as a duplicate of this bug. ***
Comment 14 Ladislav Slezák 2021-09-07 07:32:34 UTC
I have tested a fix, saving the repositories really fixes the crash.

BTW it was quite difficult to reproduce the problem, I used latest TW image and it was fine, I downloaded the Staging:J image and still was fine.

Then I downloaded the disk image with the old system from openQA. Then I could reproduce the problem! After removing the old repositories it was again fine. So it seems the crash is also affected by repositories present in the old system.
Comment 15 Michael Andres 2021-09-07 07:53:03 UTC
(In reply to Ladislav Slezák from comment #14)
> again fine. So it seems the crash is also affected by repositories present
> in the old system.

From others present or from the order loaded. It depends on the data queried. We may fall back or refer to data from installed/ other available packages, e.g. when looking for translations.
Comment 16 Ladislav Slezák 2021-09-07 12:07:47 UTC
Fixed in yast2-update-4.4.3 (https://github.com/yast/yast-update/pull/171).

With the patch the upgrade process does not crash anymore.