Bug 1011529 - YaST2 hung ofpathname "too many arguments" for ppc64le multipath configuration
Summary: YaST2 hung ofpathname "too many arguments" for ppc64le multipath configuration
Status: RESOLVED FIXED
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Installation
Version: Current
Hardware: PowerPC-64 Other
Importance: P5 - None : Major
Target Milestone: ---
Assignee: Forgotten User ny8t7SHjD_
QA Contact: Jiri Srain
Reported: 2016-11-22 05:52 UTC by Michel Normand
Modified: 2017-01-04 10:27 UTC

Attachments
logs from /var/log (736.57 KB, application/octet-stream)
2016-11-22 05:58 UTC, Michel Normand
xx.txt is linuxrc session accessed via ssh when yast is hung (11.45 KB, text/plain)
2016-11-23 13:22 UTC, Michel Normand
grub2-install debug trace (102.33 KB, text/plain)
2016-11-23 17:21 UTC, Michel Normand

Description Michel Normand 2016-11-22 05:52:08 UTC
YaST2 hangs on "Saving bootloader configuration" for a multipath ppc64le guest.

* Create a ppc64le guest with the qemu parameters in (1), with a disk accessed by two paths
  (to mimic the configuration used for the multipath test in openQA (2)).
* YaST2 hangs on "Saving bootloader configuration".
* Via ssh access I was able to retrieve the list of hung processes (3);
  the strace of ofpathname shows an infinite loop (4)
  and the y2log is full of "too many arguments" errors (5).

(1)
===
$qemu-img create raid/l1 -f raw 10G
$qemu-system-ppc64 -vga std -m 4096 -machine usb=off -cpu host -nographic -netdev user,id=qanet0,hostfwd=::10022-:22,hostname=qemu2 -device virtio-net,netdev=qanet0 -device virtio-scsi-pci,id=scsi0 -device virtio-scsi-pci,id=scsi1 -device scsi-hd,drive=hd1a,bus=scsi0.0 -drive file=raid/l1,cache=none,if=none,id=hd1a,serial=mpath1,format=raw -device scsi-hd,drive=hd1b,bus=scsi1.0 -drive file=raid/l1,cache=none,if=none,id=hd1b,serial=mpath1,format=raw -drive media=cdrom,if=none,id=cd0,format=raw,file=/home/michel/raid/openSUSE-Tumbleweed-DVD-ppc64le-Snapshot20161115-Media.iso -device scsi-cd,drive=cd0,bus=scsi0.0 -boot once=d,menu=on,splash-time=5000 -smp 8,threads=8 -enable-kvm -no-shutdown -monitor stdio -serial pty -S -append 'linemode=1 linuxrc.log=/var/log/YaST2/linuxrc.log linuxrc.debug=1 startshell=1 insecure=1 UseSSH=1 SSHPassword=root' -kernel /home/michel/raid/linux -initrd /home/michel/raid/initrd
===
(2) https://openqa.opensuse.org/tests/305788#step/start_install/15
(3)
===
$ps axf
...
15729 pts/1    S+     0:00 \_ /usr/bin/perl /usr/lib/YaST2/servers_non_y2/ag_uid
16212 pts/1    S+     0:00 \_ /usr/sbin/grub2-install --target=powerpc-ieee1275 --force --skip-fs-probe /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
16231 pts/1    S+     0:00     \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
16256 pts/1    S+     0:12         \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
===
(4)
===
# strace -p 16256
strace: Process 16256 attached
read(3, "sda\nsdb\n", 128)              = 8
read(3, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15267, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 15267
waitpid(-1, 0x3ffff795e9d4, WNOHANG)    = -1 ECHILD (No child processes)
rt_sigreturn()                          = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x10025e68, [], 0}, {SIG_IGN, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], 0}, {0x10025e68, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], 0}, {SIG_IGN, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
open("slaves/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents(3, /* 4 entries */, 65536)     = 96
getdents(3, /* 0 entries */, 65536)     = 0
close(3)                                = 0
write(2, "/usr/sbin/ofpathname: line 412: "..., 55) = 55
pipe([3, 5])                            = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x3fff8da64800) = 15270
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x1008b670, [], SA_RESTART}, {0x1008b670, [], SA_RESTART}, 8) = 0
close(5)                                = 0
read(3, "sda\nsdb\n", 128)              = 8
read(3, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15270, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
...
===
(5)
===
$tail -n3 /var/log/YaST2/y2log
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
2016-11-21 19:02:03 <3> linux-9de6(4188) [Ruby] lib/cheetah.rb:206 Error output: /usr/sbin/ofpathname: line 412: cd: too many arguments
===
Comment 1 Michel Normand 2016-11-22 05:58:06 UTC
Created attachment 703148 [details]
logs from /var/log
Comment 2 Michel Normand 2016-11-22 06:05:22 UTC
This bug tracks the hang previously reported while investigating bug https://bugzilla.suse.com/show_bug.cgi?id=1009472#c8
Comment 3 Michel Normand 2016-11-23 13:22:44 UTC
Created attachment 703492 [details]
xx.txt is linuxrc session accessed via ssh when yast is hung

While YaST is hung calling ofpathname, I was able to access the linuxrc session, but from there I cannot find the /usr/sbin/ofpathname file that is reported as the hung process (details in the attached xx.txt).
===
2:linux-ac6f:~ # ls -l  /usr/sbin/grub2* /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
ls: cannot access '/usr/sbin/grub2*': No such file or directory
ls: cannot access '/usr/sbin/ofpathname': No such file or directory
lrwxrwxrwx 1 root root 7 Nov 23 13:52 /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1 -> ../dm-0
===

I need help to continue the investigation.
Comment 4 Michel Normand 2016-11-23 13:39:07 UTC
(In reply to Michel Normand from comment #3)
> [CUT]
> 
> I would need help to continue investigation.

If I chroot to /mnt and kill the hung ofpathname, the parent process then runs its cleanup, which unmounts the filesystems, so I am unable to call ofpathname manually to better understand the original hang.
===
2:linux-ac6f:~ # chroot /mnt
linux-ac6f:/ # ps axf
...
17473 pts/0    S+     0:00  |                       \_ /usr/sbin/grub2-install --target=powerpc-ieee1275 --force --skip-fs-probe /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
17489 pts/0    S+     0:00  |                           \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
17505 pts/0    S+     1:38  |                               \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
...
linux-ac6f:/ # kill -9 17505
linux-ac6f:/ # ls /usr/sbin/ofpathname
/usr/sbin/ofpathname

linux-ac6f:/ # /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
sed: can't read /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
/usr/bin/find: '/sys/class/net': No such file or directory
ofpathname: Could not retrieve Open Firmware device path
            for logical device "/dev/mapper/0QEMU_QEMU_HARDDISK_mpath1".
===
Comment 5 Michel Normand 2016-11-23 14:34:40 UTC
If I do not kill the hung ofpathname and manually execute the same command, I get the same "too many arguments" error (trace below).
So I assume the argument passed by YaST to grub is incorrect in the multipath case.

===
linux-0uxa:/ # /bin/bash -x /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
...
+ case $DEVICE in
++ get_slave dm-0
++ cd /sys/class/block/dm-0
+++ ls slaves
++ [[ -n sda
sdb ]]
++ cd slaves/sda slaves/sdb
/usr/sbin/ofpathname: line 412: cd: too many arguments
===
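
The trace above suggests a loop that keeps descending into slaves/ for as long as that directory has entries. The following is a minimal sketch of that shape, not the actual ofpathname source: the scratch directory layout, the variable names and the iteration cap are assumptions made only so the example can run and terminate. With two slaves the glob expands to two words, which bash 4.4 and later reject with "too many arguments", so the working directory never changes and the loop condition stays true.

```shell
#!/bin/bash
# Scratch stand-in for /sys/class/block/dm-0 with two slaves (hypothetical layout).
tmp=$(mktemp -d)
mkdir -p "$tmp/dm-0/slaves/sda" "$tmp/dm-0/slaves/sdb"
cd "$tmp/dm-0" || exit 1

iterations=0
max_iterations=5   # cap added only so that this sketch terminates

# Loop shape inferred from the trace: descend while slaves/ is not empty.
while [[ -n "$(ls slaves 2>/dev/null)" ]]; do
    # The glob expands to two words here; bash 4.4 and later refuse that.
    if ! cd slaves/* 2>/dev/null; then
        iterations=$((iterations + 1))
        [ "$iterations" -ge "$max_iterations" ] && break
    fi
done

echo "loop ended after $iterations failed cd attempts"
cd / && rm -rf "$tmp"
```

Without the cap, a failing cd would leave the loop spinning, which would match the repeated read of "sda\nsdb\n" in the strace. Older bash versions silently ignore the extra argument to cd, so there the loop ends on its own.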
Comment 6 Michel Normand 2016-11-23 14:53:37 UTC
The parameter YaST passes to grub2-install,
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
is changed to the following one by the time it is passed to ofpathname:
/dev/mapper/0QEMU_QEMU_HARDDISK_mpath1

So the next step is to understand why grub2-install makes this change.

===
linux-0uxa:/ # ps axf
...
15670 ?        S+     0:00  | \_ /usr/sbin/grub2-install --target=powerpc-ieee1275 --force --skip-fs-probe /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
15686 ?        S+     0:00  |     \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
15703 ?        S+     7:48  |         \_ /bin/bash /usr/sbin/ofpathname /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
===
linux-0uxa:/ # ls -l  /dev/mapper
total 0
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1 -> ../dm-0
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1-part1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1-part2 -> ../dm-2
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1-part3 -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1_part1 -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1_part2 -> ../dm-2
lrwxrwxrwx 1 root root       7 Nov 23 15:00 0QEMU_QEMU_HARDDISK_mpath1_part3 -> ../dm-3
crw------- 1 root root 10, 236 Nov 23 14:59 control
linux-0uxa:/ # ls -l  /dev/disk/by-id/scsi-*
lrwxrwxrwx 1 root root  9 Nov 23 14:59 /dev/disk/by-id/scsi-0QEMU_QEMU_CD-ROM_cd0 -> ../../sr0
lrwxrwxrwx 1 root root 10 Nov 23 15:00 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Nov 23 15:00 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Nov 23 15:00 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part2 -> ../../dm-2
lrwxrwxrwx 1 root root 10 Nov 23 15:00 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part3 -> ../../dm-3
lrwxrwxrwx 1 root root  9 Nov 23 14:59 /dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_mpath1 -> ../../sdb
===
linux-0uxa:/ # rpm -qf /usr/sbin/grub2-install
grub2-2.02~beta3-15.2.ppc64le
===
Comment 7 Michel Normand 2016-11-23 17:21:20 UTC
Created attachment 703548 [details]
grub2-install debug trace

I am wondering if the attached xx3.log debug trace would explain why grub2-install changed from input
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1
to
/dev/mapper/0QEMU_QEMU_HARDDISK_mpath1
when calling ofpathname script.
Comment 8 Michel Normand 2016-11-23 18:37:10 UTC
I have an RFC patch for ofpathname
https://github.com/nfont/powerpc-utils/pull/14

but I do not know if this is the correct way to solve this problem.
Comment 9 Michel Normand 2016-11-28 15:55:03 UTC
(In reply to Michel Normand from comment #8)
> I have an RFC patch for ofpathname
> https://github.com/nfont/powerpc-utils/pull/14
> 
> but I do not know if this is the correct way to solve this problem.

I verified with a DUD file that the above patch is sufficient
to avoid the YaST hang on "Saving bootloader configuration".
Comment 10 Michal Suchanek 2016-11-28 17:05:22 UTC
Maybe someone who is familiar with the grub scripts could tell if all the slaves are needed or picking one suffices?
Comment 11 Michael Chang 2016-11-29 06:52:23 UTC
(In reply to Michel Normand from comment #7)
> Created attachment 703548 [details]
> grub2-install debug trace
> 
> I am wondering if the attached xx3.log debug trace would explain why
> grub2-install changed from input
> /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1

It's the udev device name (for a partition).

> to
> /dev/mapper/0QEMU_QEMU_HARDDISK_mpath1

It's the kernel's "canonical" device name (for a disk).

The udev names are translated into "canonical" names under /dev/... and then processed internally, because those name patterns are "known" to grub2. Otherwise you would have to teach grub every different name pattern, and these change all the time as user-land tools apply different rules/policies.

Device mapper is probably the only exception, using names under /dev/mapper/... because they are more human-readable than /dev/dm-[0-9]+ and the pattern is also understood.

> when calling ofpathname script.

Why does the name matter here?
Thanks.
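
The first half of that translation, from a udev name to the node it points at, can be illustrated from the shell. This is only a sketch under stated assumptions: grub2-install does this work internally in its own code, the scratch tree below merely imitates the symlinks listed in comment #6, and the further step from the partition to its parent disk is not shown.

```shell
#!/bin/bash
# Scratch imitation of the /dev symlinks from comment #6 (hypothetical tree).
tmp=$(mktemp -d)
mkdir -p "$tmp/dev/disk/by-id" "$tmp/dev/mapper"
touch "$tmp/dev/dm-1"
ln -s ../../dm-1 "$tmp/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1"
ln -s ../dm-1 "$tmp/dev/mapper/0QEMU_QEMU_HARDDISK_mpath1-part1"

# Follow the udev symlink to the node it points at.
canonical=$(readlink -f "$tmp/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_mpath1-part1")
node=$(basename "$canonical")

# Look for a /dev/mapper alias that resolves to the same node.
alias_name=""
for link in "$tmp"/dev/mapper/*; do
    if [ "$(readlink -f "$link")" = "$canonical" ]; then
        alias_name=$(basename "$link")
        break
    fi
done

echo "node: $node"
echo "mapper alias: $alias_name"
rm -rf "$tmp"
```

On a real system the same readlink -f call can be pointed at the entries under /dev/disk/by-id to see which dm-N node a udev name resolves to.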
Comment 12 Michael Chang 2016-11-29 07:26:39 UTC
(In reply to Michal Suchanek from comment #10)
> Maybe someone who is familiar with the grub scripts could tell if all the
> slaves are needed or picking one suffices?

Not really. That is for powerpc-utils, of which I am not the maintainer. Nevertheless, it looks to me that for multipath it probably suffices to use only one slave, as all slaves only provide different routes to the same "device", so the data should be identical. But I couldn't tell whether that is the only case to consider, e.g. when a disk fails, or for other device mapper devices (e.g. dmraid or dmcrypt, though the firmware may not support booting them directly, so they may not be valid).
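
To make the "use only one slave" idea concrete, here is a minimal sketch of picking the first entry under slaves/. It is an illustration only and is not claimed to be what the RFC patch from comment #8 does; the function name first_slave and the scratch directory are hypothetical.

```shell
#!/bin/bash
# Scratch stand-in for a multipath device with two paths (hypothetical layout).
tmp=$(mktemp -d)
mkdir -p "$tmp/dm-0/slaves/sda" "$tmp/dm-0/slaves/sdb"

# Print the name of the first entry under <dir>/slaves, or fail if there is none.
first_slave() {
    local entry
    for entry in "$1"/slaves/*; do
        [ -e "$entry" ] || continue   # an unmatched glob stays literal; skip it
        basename "$entry"
        return 0
    done
    return 1
}

picked=$(first_slave "$tmp/dm-0")
echo "picked: $picked"
rm -rf "$tmp"
```

Because the glob is iterated rather than handed to cd as a whole, the number of slaves no longer matters. Whether the first path is always an acceptable choice is exactly the open question raised in this comment.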
Comment 13 Michal Suchanek 2016-11-29 16:39:36 UTC
'firmware' here is Linux most likely

The problem is with ofpathname and turned up by a change to grub scripts that started calling it afaik.

So what does the grub script expect to get if there are multiple ways to reach the disk (provided it's not a deficiency of ofpath and there are in fact multiple equally canonical OF names of the disk)?
Comment 14 Michael Chang 2016-11-30 07:15:16 UTC
(In reply to Michal Suchanek from comment #13)
> 'firmware' here is Linux most likely

Sorry, but I am confused here. I don't really get your idea from such a short comment. (Please be more verbose to help me understand your thoughts :))

Anyway, in comment #12, by 'firmware' I was referring specifically to Open Firmware, which can only understand the IEEE1275 device tree path for the disk (containing the PReP partition) as translated by ofpathname.

ofpathname also takes care of translating the Linux logical device to an OFW path, and that is where it has trouble (Michel Normand created an RFC patch for it). If the firmware really can't find a way to deal with the logical device, it has to report something like "LVM Disk /dev/system/root is not supported by "direct" firmware booting." or such.

> The problem is with ofpathname and turned up by a change to grub scripts
> that started calling it afaik.

Again, I am confused by 'grub script'. :( Here are my candidates:

1. grub2-install, but it is not a script
2. scripts under /etc/grub.d/, but they have nothing to do with ofpathname (i.e. they do not call it)
3. perl bootloader and/or YaST scripts, but they are not grub scripts
4. something else ..

I presume 'grub2-install' is what you were talking about, but it has been calling ofpathname since day one for booting powerpc-ieee1275.

Btw, grub2 also has grub2_ofpathname, but it is not used here. Is that the cause of the confusion?

> So what does the grub script expect to get if there are multiple ways to
> reach the disk (provided it's not a deficiency of ofpath and there are in
> fact multiple equally canonical OF names of the disk)?

In this case, it boots from the disk from which the firmware loaded it (aka the boot disk). That is done by setting $prefix to

'(,msdos3)/boot/grub2'

The msdos3 is set during grub2-install. As cross-disk installation (i.e. the PReP and /boot partitions being on different disks) is not allowed in grub2, the result will be identical even if you swap the disk order.

You can see this line in attachment #3 [details].

grub-mkimage --directory '/usr/lib/grub2/powerpc-ieee1275' --prefix '(,msdos3)/boot/grub2' --output '/boot/grub2/powerpc-ieee1275/core.elf' --format 'powerpc-ieee1275' --compression 'auto'  --config '/boot/grub2/powerpc-ieee1275/load.cfg' 'btrfs' 'part_msdos'

Thanks.
Comment 15 Michal Suchanek 2016-12-05 11:14:20 UTC
OK, so grub does not support searching for the boot device on multiple physical disks (multipath), and whichever disk is presented is fine.

Then the presented patch for ofpathname resolves the lockup and cannot degrade grub multipath functionality, since there is none.

Thanks for the clarification.
Comment 16 Michal Suchanek 2016-12-05 14:18:08 UTC
Assigning maintainer.

Please integrate patch from comment #8 or reassign to me.
Comment 17 Michel Normand 2016-12-05 17:08:51 UTC
(In reply to Michal Suchanek from comment #16)
> Assigning maintainer.
> 
> Please integrate patch from comment #8 or reassign to me.

It is already in OBS with these SRs:
  https://build.opensuse.org/request/show/442438
  https://build.opensuse.org/request/show/442808
Comment 18 Michel Normand 2017-01-04 10:27:35 UTC
The latest snapshot, 20161229, no longer shows this problem (when testing with a worker that has the boo#1009472 bypass), so I am closing this bug as fixed.