Bugzilla – Full Text Bug Listing

| Summary: | kubelet service (1.10.2) fails to start: failed to get device for dir "/var/lib/kubelet" | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Tumbleweed | Reporter: | Maximilian Meister <mmeister> |
| Component: | Kubic | Assignee: | Maximilian Meister <mmeister> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P2 - High | CC: | aherzig, rbrown, vrothberg |
| Version: | Current | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | obs:running:10751:important | ||
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |

Description
Maximilian Meister
2018-05-30 06:08:10 UTC
i've run the conformance tests for 1.10.4 [0] (sle based environment with updated cri-o and crio-tools) and they were green, which makes me wonder why this only happens on kubic.

i skimmed through the k8s changelogs but couldn't find any meaningful entry about something having fixed this issue

@richard any idea what could be the difference here? or should we test it again on kubic with 1.10.4? last test was done with 1.10.3 IIRC

also feel free to adapt the priority of the bug

[0] http://jenkins.caasp.suse.net/job/caasp-manual-sandbox/job/master/60/

(In reply to Maximilian Meister from comment #1)
> @richard any idea what could be the difference here? or should we test it
> again on kubic with 1.10.4? last test was done with 1.10.3 IIRC

SLE12 SP3 (CaaSP until v3) has /var/lib/kubelet as subvolume.
Kubic (CaaSP from v4) has /var as subvolume and /var/lib/kubelet is a directory inside this subvolume.

I bet that this is what confuses kubernetes.

(In reply to Thorsten Kukuk from comment #2)
> (In reply to Maximilian Meister from comment #1)
> > @richard any idea what could be the difference here? or should we test it
> > again on kubic with 1.10.4? last test was done with 1.10.3 IIRC
>
> SLE12 SP3 (CaaSP until v3) has /var/lib/kubelet as subvolume
> Kubic (CaaSP from v4) has /var as subvolume and /var/lib/kubelet is a
> directory inside this subvolume.
>
> I bet that this is what confuses kubernetes.

Indeed - my guesstimate suggests that https://github.com/google/cadvisor/pull/1668 only works if /var/lib/kubelet is its own subvolume.

It's only a guesstimate because I really don't understand how Go's 'stat' works, so I'm a little blind as to how that fix worked in the past. But one thing we can say for sure is that it doesn't work on Kubic, and the difference in the subvolume layout is the biggest change that is likely to trigger any difference in logic for volume/partition ID detection.
That change isn't just present in Kubic - we can expect similar behaviour in any SLE 15 based CaaSP also (e.g. CaaSP v4).

So I'd recommend running any conformance tests for 1.10.x against both SLE 12/CaaSP v3 and SLE 15/Kubic/CaaSP v4 - assuming both are being targeted for k8s 1.10 releases.

Bumping up the severity and priority on the grounds that Kubic/CaaSP v4 without kubernetes is as useful as a submarine with a sunroof or an inflatable dartboard ;) I'd recommend the bug be considered equally important for CaaSP v4 until it's proven that it doesn't exist there.

i've added a patch as part of [0] to fix this bug, and after a local test, k8s was running fine and the error message hasn't appeared anymore, i only ran into a failing openldap as a followup but this was more or less expected

[0] https://build.opensuse.org/request/show/617020

(In reply to Maximilian Meister from comment #4)
> i've added a patch as part of [0] to fix this bug, and after a local
> test, k8s was running fine and the error message hasn't appeared anymore, i
> only ran into a failing openldap as a followup but this was more or less
> expected
>
> [0] https://build.opensuse.org/request/show/617020

old sr, this is the correct one -> https://build.opensuse.org/request/show/617501

(In reply to Maximilian Meister from comment #4)
> i've added a patch as part of [0] to fix this bug, and after a local
> test, k8s was running fine and the error message hasn't appeared anymore, i
> only ran into a failing openldap as a followup but this was more or less
> expected

Looks like the containers were not part of the last Tumbleweed snapshot ... has been accepted to devel now.

the fix is part of this factory SR -> https://build.opensuse.org/request/show/617520

fixed

SUSE-SU-2018:4020-1: An update that solves two vulnerabilities and has 7 fixes is now available.
Category: security (important)
Bug References: 1084765,1095131,1108195,1111341,1112967,1112980,1114645,1116933,1118198
CVE References: CVE-2016-8859,CVE-2018-1002105
Sources used:
SUSE CaaS Platform 3.0 (src): caasp-container-manifests-3.0.0+git_r291_33f7b2d-3.6.3, cri-o-1.10.6-4.8.5, cri-tools-1.0.0beta2-3.3.3, kubernetes-1.10.11-4.8.2, kubernetes-salt-3.0.0+git_r888_7af7095-3.33.2

This is an autogenerated message for OBS integration:
This bug (1095131) was mentioned in
https://build.opensuse.org/request/show/658922 15.0 / kubectl

This is an autogenerated message for OBS integration:
This bug (1095131) was mentioned in
https://build.opensuse.org/request/show/659046 15.0+Backports:SLE-12 / kubectl

This is an autogenerated message for OBS integration:
This bug (1095131) was mentioned in
https://build.opensuse.org/request/show/659074 15.0 / kubectl

This is an autogenerated message for OBS integration:
This bug (1095131) was mentioned in
https://build.opensuse.org/request/show/714707 15.1 / kubernetes

This is an autogenerated message for OBS integration:
This bug (1095131) was mentioned in
https://build.opensuse.org/request/show/714723 15.1 / kubernetes

openSUSE-SU-2020:0554-1: An update that solves 7 vulnerabilities and has 22 fixes is now available.

Category: security (important)
Bug References: 1039663,1042383,1042387,1057277,1059207,1061027,1065972,1069469,1084765,1084766,1085009,1086185,1086412,1095131,1095154,1096773,1097473,1100838,1101010,1104598,1104821,1112980,1118897,1118898,1136403,1144065,1155323,1161056,1161179
CVE References: CVE-2016-5195,CVE-2016-8859,CVE-2017-1002101,CVE-2018-1002105,CVE-2018-16873,CVE-2018-16874,CVE-2019-10214
Sources used:
openSUSE Leap 15.1 (src): cri-o-1.17.1-lp151.2.2, cri-tools-1.18.0-lp151.2.1, go1.14-1.14-lp151.6.1, kubernetes-1.18.0-lp151.5.1
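
Note on the device lookup discussed in the comments: the thread suspects that the "failed to get device for dir" error comes from how the device ID of /var/lib/kubelet is detected when the directory is not its own subvolume. The following is only a rough sketch of that general idea, written in Python rather than Go, and it is not the cadvisor or kubelet code. The helper names `device_of` and `nearest_mount_point` are hypothetical. It shows how stat(2) exposes a device number for a directory and how one might walk up to the enclosing mount point. Whether the actual fix works this way is an assumption.

```python
import os


def device_of(path):
    """Return the (major, minor) device numbers that stat(2) reports for path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def nearest_mount_point(path):
    """Walk up from path until os.path.ismount() reports a mount point.

    Hypothetical helper for illustration only. The loop always terminates
    because "/" is a mount point.
    """
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


if __name__ == "__main__":
    # The path from the bug summary; fall back to "/" if it does not exist.
    target = "/var/lib/kubelet" if os.path.isdir("/var/lib/kubelet") else "/"
    mount = nearest_mount_point(target)
    print("directory:  ", target, device_of(target))
    print("mount point:", mount, device_of(mount))
```

Comparing the two printed device pairs on a SLE 12 style layout and on a Kubic style layout would be one way to check the guess from the thread that the subvolume layout changes what the lookup sees.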