Bug 1127849

Summary: Zypper segfaults in official openSUSE docker image (opensuse/ namespace)
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Petr Vorel <petr.vorel>
Component: Containers
Assignee: Containers Team <containers-bugowner>
Status: RESOLVED FIXED
QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: asarai, bart.vanassche+novell, fvogt, petr.vorel, rbrown, rjschwei, turtlevt
Version: Current   
Target Milestone: ---   
Hardware: Other   
OS: Other   
See Also: http://bugzilla.suse.com/show_bug.cgi?id=1127508
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments: Travis log

Description Petr Vorel 2019-03-05 07:43:53 UTC
Created attachment 798829 [details]
Travis log

Zypper segfaults in official openSUSE docker image (opensuse/ namespace), see [1] or attachment.

Retrieving: libxcrypt-devel-4.4.3-1.1.x86_64.rpm ---------------------[starting]./opensuse.sh: line 22:     6 Segmentation fault      (core dumped) zypper --non-interactive install --no-recommends autoconf automake clang gcc gzip make kernel-default-devel keyutils-devel libacl-devel libaio-devel libcap-devel libnuma-devel libopenssl-devel libselinux-devel libtirpc-devel linux-glibc-devel lsb-release

I understand the switch [2], but it'd be nice to have reliable openSUSE images; otherwise we'll have to stop testing LTP against openSUSE.

[1] https://travis-ci.org/linux-test-project/ltp/jobs/501823207
[2] https://github.com/docker-library/official-images/issues/5371
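For context, the failing setup is roughly this: Travis starts the official container image and runs the project's distro-specific install script inside it. A minimal sketch (the image tag, mount paths, and script name are illustrative assumptions, not taken from LTP's actual .travis.yml):

```yaml
# Hypothetical Travis job sketch: run the distro setup script
# (the one invoking zypper) inside the official openSUSE image.
services:
  - docker
script:
  - docker run --rm -v "$PWD:/ltp" -w /ltp opensuse/tumbleweed ./opensuse.sh
```

The segfault occurs while zypper runs inside the container, so the host kernel and Docker version of the Travis worker are part of the effective environment even though they are outside the image's control.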
Comment 1 Fabian Vogt 2019-03-05 08:17:35 UTC
This only happens in Travis and gitlab.com environments.

I reverted the image to an older state already, but so far I don't know what's causing this.

I assume it's either an incompatible docker or host kernel version.
Comment 2 Richard Brown 2019-03-05 09:40:42 UTC
(In reply to Petr Vorel from comment #0)

> I understand the switch [2], but it'd be nice to have reliable opensuse
> images otherwise we'll have to stop testing LTP against openSUSE.

If you want reliable testing, use a reliable test host instead of services with old/obsolete hosts (the current leading hypothesis for this failure).

I can recommend openQA ;)
Comment 3 Petr Vorel 2019-03-05 10:48:54 UTC
(In reply to Richard Brown from comment #2)
> If you want reliable testing, use a reliable test host instead of those with
> old/obsolete hosts (the current leading hypothesis for this failure)
> 
> I can recommend openQA ;)
Well, the idea that openQA is suitable for everything is simply wrong and IMHO hurts openQA. Anyway, we do have LTP tests in openQA, but those are for testing openSUSE/SLES. The openSUSE images in Travis are for LTP upstream CI testing; using openQA there would be neither the best solution nor acceptable to upstream.
Comment 4 Petr Vorel 2019-03-05 10:49:38 UTC
(In reply to Fabian Vogt from comment #1)
> This only happens in Travis and gitlab.com environments.
> 
> I reverted the image to an older state already, but so far I don't know
> what's causing this.
> 
> I assume it's either an incompatible docker or host kernel version.

Thanks for fixing it. Please let me know if I can help prevent this in the future.
Comment 5 Richard Brown 2019-03-05 10:55:44 UTC
(In reply to Petr Vorel from comment #3)
> (In reply to Richard Brown from comment #2)
> > If you want reliable testing, use a reliable test host instead of those with
> > old/obsolete hosts (the current leading hypothesis for this failure)
> > 
> > I can recommend openQA ;)
> Well, this idea that openQA is suitable for everything is simply wrong and
> IMHO hurts openQA. Anyway, we do have LTP tests in openQA, but that's for
> testing openSUSE/SLES. openSUSE images in travis are for LTP upstream CI
> testing, using openQA wouldn't be the best solution nor acceptable for
> upstream.

Ok, but how is that going to work? There is no kernel in our OCI containers, so does upstream LTP actually want a situation where openSUSE LTP is run against unknown, unvetted, arbitrary kernels from their CI host?

That seems like an invitation for the sort of unexplained problems you are currently seeing here.
Comment 6 Petr Vorel 2019-03-05 11:05:18 UTC
(In reply to Richard Brown from comment #5)
> Ok but how is that going to work? There is no kernel in our OCI
> containers..so upstream LTP actually want a situation where openSUSE LTP is
> run against unknown, unvetted, arbitrary kernels from their CI host?
> That seems like an invitation for the sort of unexplained problems as you
> currently see here..
That LTP CI is for LTP itself, i.e. for building LTP. But that's a bit off-topic for this bug. It'd be nice if these bugs were caught before they reach users.
Comment 7 Richard Palethorpe 2019-03-05 11:07:27 UTC
In GitLab it is quite easy to use our own test runners, where we have control over the host. GitHub is another matter, especially as we are talking about an upstream project, where it might not be so easy to get access to cloud.suse.de, where I am hosting the runners.

However we might be able to get some public cloud hosting from the Linux Foundation.

This also leads into using Cyril's new test runner to run the actual LTP test cases in CI, but that needs to run either on bare metal or on a host that is guaranteed to support nested virtualization.
Comment 8 Cyril Hrubis 2019-03-05 12:34:18 UTC
> Ok but how is that going to work? There is no kernel in our OCI
> containers..so upstream LTP actually want a situation where openSUSE LTP is
> run against unknown, unvetted, arbitrary kernels from their CI host?
> 
> That seems like an invitation for the sort of unexplained problems as you
> currently see here..

Looks like there is some confusion here: what we are talking about is the LTP build bot, which makes sure that LTP builds fine with different gcc/libc combinations, and also with and without some devel libraries. It runs for each commit on GitHub using Travis, which in turn uses different Docker images for the most common distributions.
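A build matrix of that kind can be sketched as follows; the environment variable names, image tags, and build script are illustrative assumptions, not LTP's actual configuration:

```yaml
# Hypothetical sketch of a build-bot matrix: each entry builds LTP
# inside a different container image and/or with a different compiler.
matrix:
  include:
    - env: DISTRO=opensuse/tumbleweed CC=gcc
    - env: DISTRO=opensuse/tumbleweed CC=clang
    - env: DISTRO=debian:stable CC=gcc
script:
  - docker run --rm -v "$PWD:/ltp" -w /ltp -e CC "$DISTRO" ./build.sh
```

The point is that only the build is exercised here; no LTP test cases run against the host kernel.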
Comment 9 Petr Vorel 2019-03-14 12:36:07 UTC
(In reply to Fabian Vogt from comment #1)
> This only happens in Travis and gitlab.com environments.
> 
> I reverted the image to an older state already, but so far I don't know
> what's causing this.
> 
> I assume it's either an incompatible docker or host kernel version.

Unfortunately the problem occurred again. Can you please revert the image again?

https://travis-ci.org/pevik/ltp/jobs/506193277
https://api.travis-ci.org/v3/job/506193277/log.txt
Comment 10 Fabian Vogt 2019-03-14 12:54:34 UTC
(In reply to Petr Vorel from comment #9)
> (In reply to Fabian Vogt from comment #1)
> > This only happens in Travis and gitlab.com environments.
> > 
> > I reverted the image to an older state already, but so far I don't know
> > what's causing this.
> > 
> > I assume it's either an incompatible docker or host kernel version.
> 
> Unfortunately the problem occurred again. Can you please revert the image
> again?
> 
> https://travis-ci.org/pevik/ltp/jobs/506193277
> https://api.travis-ci.org/v3/job/506193277/log.txt

Due to a bug, an older version of the image got published instead of the latest build with the libcurl fix. It's fixed now.

Please reopen if it breaks again.
Comment 11 Neil Rickert 2019-03-17 17:16:32 UTC
*** Bug 1128945 has been marked as a duplicate of this bug. ***
Comment 12 Neil Rickert 2019-03-17 17:19:20 UTC
*** Bug 1129480 has been marked as a duplicate of this bug. ***