Bugzilla – Bug 466484
on shutdown the home-partition does not clearly unmount.
Last modified: 2010-01-29 16:13:59 UTC
Sorry for my English! Often some applications access the /home partition (if it exists), and so unmounting on shutdown fails. A solution seems to be moving the following block

    echo "Sending all processes the TERM signal..."
    killall5 -15
    echo -e "$rc_done_up"

    # wait between last SIGTERM and the next SIGKILL
    rc_wait /sbin/blogd /sbin/splash

    echo "Sending all processes the KILL signal..."
    killall5 -9
    echo -e "$rc_done_up"

between

    test -s /etc/init.d/.depend.halt || RUN_PARALLEL="no"
    type -p startpar &> /dev/null    || RUN_PARALLEL="no"
    startpar -v      &> /dev/null    || RUN_PARALLEL="no"

and

    #
    # set back system boot configuration
    #
    if test "$RUN_PARALLEL" = "yes" ; then
        startopt="-p4 -t 30 -T 3"
        eval $(startpar $startopt -M halt)
        unset failed_service skipped_service
Oh, I forgot: the file to edit is /etc/init.d/halt. Here is the original thread in German: http://www.linux-club.de/viewtopic.php?f=4&t=99992&start=20&st=0&sk=t&sd=a&sid=9b6315c409a8ed6b549ff96166609ab9
The major problem with this approach is that udevd is now killed as well, so some of the boot scripts lose udevd's event handling for the kernel events they cause. This is why we now have another solution. It requires a new sysvinit package which includes two new tools:

    /sbin/mkill   - Send a signal to processes that keep an active mount point busy
    /sbin/vhangup - Cause a virtual hangup on the specified terminals

which are used in a changed /etc/init.d/boot.localfs of the package aaa_base.
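To illustrate the idea of signalling only the processes that keep a mount point busy (instead of everything, as killall5 does), here is a minimal sketch. It is not the mkill implementation: fuser(1) from psmisc stands in for mkill, and the names free_mount_point, run and DRY_RUN are made up for this example. By default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real mkill: fuser(1) from psmisc is used as a
# stand-in to signal only the processes that keep a mount point busy.
# With DRY_RUN=yes (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-yes}

# Print or execute a command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = yes ]; then
        echo "$*"
    else
        "$@"
    fi
}

# free_mount_point MOUNTPOINT [GRACE_SECONDS]
# SIGTERM first, wait a little, then SIGKILL, then try to unmount.
free_mount_point() {
    mp=$1
    grace=${2:-2}
    run fuser -k -TERM -m "$mp"
    run sleep "$grace"
    run fuser -k -KILL -m "$mp"
    run umount "$mp"
}
```

Calling `free_mount_point /home` in dry-run mode lists the four commands in order; udevd is left alone unless it really holds a file on that mount point.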
Created attachment 265643 [details] sysvinit.rpm for i586 and higher New sysvinit with the /sbin/mkill and /sbin/vhangup binary. Please install this *before* installing the next attachment.
Created attachment 265645 [details] aaa_base.rpm for i586 and higher the aaa_base with the /etc/init.d/boot.localfs script using mkill(8) and /etc/init.d/halt using vhangup(8). Please test out if this works for you.
Please try out the above attachments of comment #3 and comment #4 by installing first attachment #265643 [details] and then attachment #265645 [details]. Does this work for you?
I could not install the package sysvinit:

    sudo rpm -ivh sys*.rpm
    Preparing... ########################################### [100%]
    file /bin/fsync from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /bin/mountpoint from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /bin/usleep from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/blogd from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/blogger from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/checkproc from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/detectups from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/halt from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/init from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/isserial from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/killall5 from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/killproc from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/powerd from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/runlevel from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/showconsole from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/shutdown from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/start-stop-daemon from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/startpar from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/startproc from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /sbin/sulogin from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /usr/bin/last from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
    file /usr/bin/utmpdump from install of sysvinit-2.86-189.11.i586 conflicts with file from package sysvinit-2.86-186.13.x86_64
Please do *not* use `-ihv' but `-Uhv' ... -i is for install and -U is for update, and you want the latter.
System does not boot with these packages. Maybe wrong architecture? (i586 instead of x86_64) /sys/class seems to be missing on boot.
Created attachment 265689 [details] sysvinit.rpm for x86_64 this is for x86_64
Created attachment 265691 [details] aaa_base.rpm for x86_64 this is for x86_64
Please retry with the correct architecture ... does the /sys/class error happen again?
Created attachment 265705 [details] Screenshot System still does not boot; it hangs (see screenshot), but I could restart with Ctrl+Alt+Del.
I installed attachment #265643 [details] and then attachment #265645 [details]. (I read afterwards that for the x86_64 architecture the packages from comment 9 and comment 10 have to be used, but they also seem to be for x86_64.) I added this comment for tracking purposes. Next I'll see whether it solves the shutdown hang. Here is my session dump:

    rpm -Uhv /home/diego/Desktop/sysvinit.rpm
    Preparing... ########################################### [100%]
    1:sysvinit ########################################### [100%]
    Scanning scripts ...
    Resolve dependencies ...
    Install symlinks in /lib/mkinitrd/setup ...
    Install symlinks in /lib/mkinitrd/boot ...
    Scanning scripts ...
    Resolve dependencies ...
    Install symlinks in /lib/mkinitrd/setup ...
    Install symlinks in /lib/mkinitrd/boot ...
    casaregno:~ # rpm -Uhv /home/diego/Desktop/aaa_base.rpm
    Preparing... ########################################### [100%]
    1:aaa_base ########################################### [100%]
    insserv: Script jexec is broken: incomplete LSB comment.
    insserv: missing `Required-Stop:' entry: please add even if empty.
    Updating etc/sysconfig/language...
    Updating etc/sysconfig/backup...
    Updating etc/sysconfig/boot...
    Updating etc/sysconfig/kernel...
    Updating etc/sysconfig/suseconfig...
    Updating etc/sysconfig/clock...
    Updating etc/sysconfig/proxy...
    Updating etc/sysconfig/windowmanager...
    Updating etc/sysconfig/sysctl...
    Updating etc/sysconfig/cron...
    Updating etc/sysconfig/news...
    Updating etc/sysconfig/shutdown...
    Updating etc/passwd...unchanged
    Updating etc/group...unchanged
    Updating etc/shadow...unchanged
    insserv: Script jexec is broken: incomplete LSB comment.
    insserv: missing `Required-Stop:' entry: please add even if empty.
    casaregno:~ # uname -a
    Linux casaregno 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux
    casaregno:~ # rpm -Uhv /home/diego/Desktop/sysvinit
    sysvinit(2).rpm sysvinit.rpm
    casaregno:~ # rpm -Uhv /home/diego/Desktop/sysvinit\(2\).rpm
    Preparing... ########################################### [100%]
    package sysvinit-2.86-189.11.x86_64 is already installed
To check which architecture an rpm is built for, please type e.g.

    rpm --queryformat '%{NAME} for %{ARCH}\n' -qp sysvinit.rpm

This gives for the first two packages:

    /suse/werner> rpm --queryformat '%{NAME} for %{ARCH}\n' -qp sysvinit.rpm
    sysvinit for i586
    /suse/werner> rpm --queryformat '%{NAME} for %{ARCH}\n' -qp aaa_base.rpm
    aaa_base for i586

and for the second two packages:

    /suse/werner> rpm --queryformat '%{NAME} for %{ARCH}\n' -qp sysvinit.rpm
    sysvinit for x86_64
    /suse/werner> rpm --queryformat '%{NAME} for %{ARCH}\n' -qp aaa_base.rpm
    aaa_base for x86_64

The next point: to overwrite an existing package that has the same version and release numbers, the option --force can be used:

    rpm -Uhv sysvinit.rpm --force
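The architecture query can be wrapped in a small check that runs before installing. This is only a sketch: arch_matches and check_package are hypothetical helper names, and the rule that an i586 core package must not go onto an x86_64 system is an assumption drawn from the conflicts reported in this thread, not an rpm rule.

```shell
#!/bin/sh
# Hypothetical helpers, a sketch only.

# arch_matches PKG_ARCH MACHINE_ARCH
# Succeeds for noarch packages, for an exact match, and for i586 on i686.
# Assumption: an i586 core package on x86_64 counts as a mismatch here.
arch_matches() {
    pkg_arch=$1
    machine=$2
    case "$pkg_arch" in
        noarch)     return 0 ;;
        "$machine") return 0 ;;
        i586)       test "$machine" = i686 ;;
        *)          return 1 ;;
    esac
}

# check_package FILE.rpm
# Uses the rpm query from the comment above; needs rpm to be installed.
check_package() {
    pkg=$1
    arch=$(rpm --queryformat '%{ARCH}' -qp "$pkg") || return 2
    if arch_matches "$arch" "$(uname -m)"; then
        echo "$pkg: $arch ok"
    else
        echo "$pkg: $arch does not match $(uname -m)"
        return 1
    fi
}
```

For example, `check_package sysvinit.rpm && rpm -Uhv sysvinit.rpm` would refuse to update when the architectures differ.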
I have to report that the new sysvinit and aaa_base don't solve the shutdown problem. I'm not sure the problem is related to some process that blocks the umount; could it be an unclean cleanup of some kernel module?
(In reply to comment #12) This is very strange: to me this looks like the mingetties are respawning too fast, but I'm not able to read the text on attachment #265705 [details]. You should check whether you have really installed the correct sysvinit package (x86_64). You may use single user mode for this (be aware that you then do not have a virtual console but the system console, that is, Ctrl-C does not work and you have to mount the partitions by hand) or the openSuSE DVD with the repair menu entry.
(In reply to comment #15) Diego? Does this mean that your system boots without problems after installing the two packages? Do you have the mkill binary around? That is, type -p mkill should print /sbin/mkill, and mkill should be used within /etc/init.d/boot.localfs to stop all processes that keep the mount points busy.
Okay, I tried with a fresh installation in VirtualBox. The system is booting. The system seems to unmount cleanly. The output of type -p mkill is nothing!
You have to be root for type -p mkill, otherwise you will not see mkill at /sbin/mkill.
Created attachment 265992 [details] Booting fails on x86_64 My system still does not boot with these packages. I made sure that I installed the correct architecture. It seems that sysfs could not be mounted on boot, look at the attached log.
What does your /etc/init.d/boot.local script do? AFAICS from your log the /sys file system seems to be mounted. But the message

    mount: /sys not mounted already, or bad option

leads me to guess that there is an error. Or it could be that you're missing a module or are running a wrong kernel, as /sys/kernel/security can not be mounted. The only difference in mounting /sys between the old /etc/init.d/boot

    [...]
    echo -n "Mounting procfs at /proc"
    mount -n -t proc proc /proc
    rc_status -v -r

    echo -n "Mounting sysfs at /sys"
    mount -n -t sysfs sysfs /sys
    rc_status -v -r
    [...]

and the new /etc/init.d/boot is

    [...]
    if test ! -d /proc/1 ; then
        echo -n "Mounting procfs at /proc"
        mount -n -t proc proc /proc
        rc_status -v -r
    fi
    if test ! -d /sys/block ; then
        echo -n "Mounting sysfs at /sys"
        mount -n -t sysfs sysfs /sys
        rc_status -v -r
    fi
    [...]

... this may fail if you have a mount point /sys with a directory named block in it.
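The failure mode of the directory test can be reproduced without touching the real /sys. The sketch below applies the same `-d .../block` check to a scratch directory; sysfs_looks_mounted is a hypothetical name used only here.

```shell
#!/bin/sh
# Hypothetical helper: the directory test from the new /etc/init.d/boot,
# applied to an arbitrary root so it can be tried on a scratch directory.
sysfs_looks_mounted() {
    test -d "$1/block"
}

demo=$(mktemp -d)      # stands in for an unmounted /sys mount point
mkdir "$demo/block"    # stray empty directory left on the bare mount point

if sysfs_looks_mounted "$demo"; then
    echo "mount skipped"       # the stray directory fools the check
else
    echo "mount attempted"
fi
rm -rf "$demo"
```

With the stray directory present the check reports the file system as already mounted, so the mount step would be skipped.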
cat /etc/init.d/boot.local

    #! /bin/sh
    #
    # Copyright (c) 2002 SuSE Linux AG Nuernberg, Germany. All rights reserved.
    #
    # Author: Werner Fink <werner@suse.de>, 1996
    #         Burchard Steinbild, 1996
    #
    # /etc/init.d/boot.local
    #
    # script with local commands to be executed from init on system startup
    #
    # Here you should add things, that should happen directly after booting
    # before we're going to the first run level.
    #
    /bin/echo min_power > /sys/class/scsi_host/host0/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host1/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host2/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host3/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host4/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host5/link_power_management_policy
    /bin/echo min_power > /sys/class/scsi_host/host6/link_power_management_policy
    /bin/echo 1500 > /proc/sys/vm/dirty_writeback_centisecs
    /bin/echo 1 > /sys/module/snd_ac97_codec/parameters/power_save
    /sbin/modprobe saa7134-alsa
    /sbin/modprobe lirc_dev

Nothing very special, some options from powertop and some modules... Okay, I could start from the Live-CD and have a look whether there is a /sys/block.
cat /etc/fstab

    /dev/disk/by-id/scsi-SATA_WDC_WD6400AAKS-_WD-WMASY2641335-part6 swap swap defaults 0 0
    /dev/disk/by-id/scsi-SATA_WDC_WD6400AAKS-_WD-WMASY2641335-part5 / ext3 defaults 1 1
    /dev/disk/by-id/scsi-SATA_WDC_WD6400AAKS-_WD-WMASY2641335-part7 /home ext3 defaults 1 2
    /media/sda9 ext3 defaults 1 2
    /dev/disk/by-id/scsi-SATA_WDC_WD6400AAKS-_WD-WMASY2641335-part1 /windows/C ntfs-3g uid=1000,exec,users,gid=users,fmask=133,dmask=022,locale=de_DE.UTF-8 0 0
    proc /proc proc defaults 0 0
    sysfs /sys sysfs noauto 0 0
    debugfs /sys/kernel/debug debugfs noauto 0 0
    devpts /dev/pts devpts mode=0620,gid=5 0 0
    /dev/fd0 /media/floppy auto noauto,user,sync 0 0
    /dev/disk/by-id/scsi-SATA_SAMSUNG_HD501LJS0MUJ1EQ164247-part1 /media/data ext3 defaults 1 2
Indeed there was an empty folder called /sys/block. I deleted it and now it works. Partitions were cleanly unmounted on reboot.

    type -p mkill
    /sbin/mkill

Maybe the init script should check whether the /sys/block directory is empty or not. There is also a directory called "kernel", but this seems to be needed by aaa_base?
Hmmm ... the file systems /proc and /sys are virtual file systems; they exist only in memory, and only when a directory or file is opened by a user space application. If /proc and /sys are not mounted, the mount points should be empty ... we could replace the simple test for the directories by something like

    test $(stat -f -c '%T' /proc) = proc  || mount -n -t proc proc /proc
    test $(stat -f -c '%T' /sys)  = sysfs || mount -n -t sysfs sysfs /sys

as this would not be fooled by buggy mount points. Rudi? What do you think? AFAICS on openSuSE 11.1 and SLES11 we have /bin/stat, and with this we could do it very simply.
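A slightly more defensive variant of the stat-based check might look like the sketch below. It assumes GNU stat (the `-f -c '%T'` form quoted above). The function names fs_type_of and mount_if_missing are made up for this example, and the function only prints what it would do instead of calling mount.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU stat as shipped in /bin/stat.

# Print the file system type name of the file system holding PATH.
fs_type_of() {
    stat -f -c '%T' "$1" 2>/dev/null
}

# mount_if_missing FSTYPE DIR
# Dry-run only: reports whether DIR already sits on FSTYPE.
# Quoting the command substitution keeps the test valid if stat fails.
mount_if_missing() {
    want=$1
    dir=$2
    if [ "$(fs_type_of "$dir")" = "$want" ]; then
        echo "$dir: already $want"
    else
        echo "$dir: would run mount -n -t $want $want $dir"
    fi
}
```

A stray directory such as /sys/block on the bare mount point does not affect the result, because the decision depends on the file system type and not on the directory contents.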
Created attachment 266171 [details] aaa_base-11.1-10007.12.i586.rpm
Created attachment 266172 [details] sysvinit-2.86-186.14.i586.rpm
Created attachment 266173 [details] aaa_base-11.1-10007.12.x86_64.rpm
Created attachment 266174 [details] sysvinit-2.86-186.14.x86_64.rpm
Diego? Do those packages work for you?
It seems to work, but I have to do some more tests.
*** Bug 462585 has been marked as a duplicate of this bug. ***
Anja? A SWAMPID is required for both packages, sysvinit and aaa_base.
My brother tested the new packages on 3 i586-installations and it seems to work there, too!
Hello. The problem seems to persist even with the new packages. I think we need a sort of "magic key" to press when we are in freeze mode, to get a sort of machine status dump... would that be possible?
Hello Diego, you could edit the file /etc/init.d/boot.localfs and look for the line

    echo "Unmounting file systems"

Add the following two lines:

    date >> /var/log/boot.fail.msg
    lsof | grep /home >> /var/log/boot.fail.msg
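Since `grep /home` matches the string anywhere in a line, a stricter filter on the last lsof column can cut down the noise. The following is only a sketch: mount_users is a hypothetical helper, and it assumes the NAME column contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read lsof output on stdin and keep only the rows
# whose last field (NAME) is the mount point itself or a path below it.
# Assumption: NAME contains no whitespace.
mount_users() {
    awk -v mp="$1" '$NF == mp || index($NF, mp "/") == 1'
}
```

In boot.localfs it could be used as `lsof | mount_users /home >> /var/log/boot.fail.msg`. A path like /homework would not be reported, while /home/diego would.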
You may read /usr/src/linux/Documentation/sysrq.txt from kernel-source rpm.
Created attachment 267063 [details] This is the "debug" patch I added to boot.localfs to try the backtrace as suggested by Sven Zielke
This is the output that the "patch" produced when boot.localfs didn't unmount the directories: the shutdown procedure froze (as reported by set -x on the console) at a line beginning with "mkill -TERM" followed by several mounted paths (honestly, I didn't write down the screen dump).

-----------boot.fail.msg----------------
Thu Jan 22 23:38:46 CET 2009
------ BACKTRACE -----
Traceback: 0 Functions:
------ mtab ----
/dev/hda11 on / type reiserfs (rw)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/hda12 on /home type xfs (rw)
/dev/hda8 on /data1 type reiserfs (rw)
/dev/hda7 on /suse10.2 type reiserfs (rw)
/dev/hda1 on /windows/C type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda2 on /windows/D type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda3 on /windows/E type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda5 on /windows/F type fuseblk (rw,noexec,nosuid,nodev,allow_other,default_permissions,blksize=4096)
/dev/hda9 on /windows/G type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda10 on /windows/H type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/mapper/dati-multimedia on /mnt/hdb1 type xfs (rw,noexec,nosuid,nodev)
/dev/mapper/dati-distribuzione on /mnt/hdb2 type reiserfs (rw,noexec,nosuid,nodev)
securityfs on /sys/kernel/security type securityfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
------ lsof ----
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
init 1 root cwd DIR 3,11 688 2 /
init 1 root rtd DIR 3,11 688 2 /
init 1 root txt REG 3,11 838176 68779 /sbin/init
init 1 root 10u FIFO 0,14 0t0 1763 /dev/initctl
kthreadd 2 root cwd DIR 3,11 688 2 /
kthreadd 2 root rtd DIR 3,11 688 2 /
kthreadd 2 root txt unknown /proc/2/exe migration 3 root cwd DIR 3,11 688 2 / migration 3 root rtd DIR 3,11 688 2 / migration 3 root txt unknown /proc/3/exe ksoftirqd 4 root cwd DIR 3,11 688 2 / ksoftirqd 4 root rtd DIR 3,11 688 2 / ksoftirqd 4 root txt unknown /proc/4/exe events/0 5 root cwd DIR 3,11 688 2 / events/0 5 root rtd DIR 3,11 688 2 / events/0 5 root txt unknown /proc/5/exe khelper 6 root cwd DIR 3,11 688 2 / khelper 6 root rtd DIR 3,11 688 2 / khelper 6 root txt unknown /proc/6/exe kintegrit 7 root cwd DIR 3,11 688 2 / kintegrit 7 root rtd DIR 3,11 688 2 / kintegrit 7 root txt unknown /proc/7/exe kblockd/0 8 root cwd DIR 3,11 688 2 / kblockd/0 8 root rtd DIR 3,11 688 2 / kblockd/0 8 root txt unknown /proc/8/exe kacpid 9 root cwd DIR 3,11 688 2 / kacpid 9 root rtd DIR 3,11 688 2 / kacpid 9 root txt unknown /proc/9/exe kacpi_not 10 root cwd DIR 3,11 688 2 / kacpi_not 10 root rtd DIR 3,11 688 2 / kacpi_not 10 root txt unknown /proc/10/exe cqueue 11 root cwd DIR 3,11 688 2 / cqueue 11 root rtd DIR 3,11 688 2 / cqueue 11 root txt unknown /proc/11/exe kseriod 12 root cwd DIR 3,11 688 2 / kseriod 12 root rtd DIR 3,11 688 2 / kseriod 12 root txt unknown /proc/12/exe kondemand 13 root cwd DIR 3,11 688 2 / kondemand 13 root rtd DIR 3,11 688 2 / kondemand 13 root txt unknown /proc/13/exe pdflush 14 root cwd DIR 3,11 688 2 / pdflush 14 root rtd DIR 3,11 688 2 / pdflush 14 root txt unknown /proc/14/exe pdflush 15 root cwd DIR 3,11 688 2 / pdflush 15 root rtd DIR 3,11 688 2 / pdflush 15 root txt unknown /proc/15/exe kswapd0 16 root cwd DIR 3,11 688 2 / kswapd0 16 root rtd DIR 3,11 688 2 / kswapd0 16 root txt unknown /proc/16/exe aio/0 17 root cwd DIR 3,11 688 2 / aio/0 17 root rtd DIR 3,11 688 2 / aio/0 17 root txt unknown /proc/17/exe kpsmoused 18 root cwd DIR 3,11 688 2 / kpsmoused 18 root rtd DIR 3,11 688 2 / kpsmoused 18 root txt unknown /proc/18/exe ata/0 57 root cwd DIR 3,11 688 2 / ata/0 57 root rtd DIR 3,11 688 2 / ata/0 57 root txt unknown /proc/57/exe 
ata_aux 58 root cwd DIR 3,11 688 2 / ata_aux 58 root rtd DIR 3,11 688 2 / ata_aux 58 root txt unknown /proc/58/exe scsi_eh_0 60 root cwd DIR 3,11 688 2 / scsi_eh_0 60 root rtd DIR 3,11 688 2 / scsi_eh_0 60 root txt unknown /proc/60/exe scsi_eh_1 61 root cwd DIR 3,11 688 2 / scsi_eh_1 61 root rtd DIR 3,11 688 2 / scsi_eh_1 61 root txt unknown /proc/61/exe scsi_eh_2 76 root cwd DIR 3,11 688 2 / scsi_eh_2 76 root rtd DIR 3,11 688 2 / scsi_eh_2 76 root txt unknown /proc/76/exe ksuspend_ 187 root cwd DIR 3,11 688 2 / ksuspend_ 187 root rtd DIR 3,11 688 2 / ksuspend_ 187 root txt unknown /proc/187/exe khubd 188 root cwd DIR 3,11 688 2 / khubd 188 root rtd DIR 3,11 688 2 / khubd 188 root txt unknown /proc/188/exe reiserfs/ 554 root cwd DIR 3,11 688 2 / reiserfs/ 554 root rtd DIR 3,11 688 2 / reiserfs/ 554 root txt unknown /proc/554/exe udevd 627 root cwd DIR 3,11 688 2 / udevd 627 root rtd DIR 3,11 688 2 / udevd 627 root txt REG 3,11 101544 52774 /sbin/udevd udevd 627 root mem REG 3,11 47784 220323 /lib64/libnss_files-2.9.so udevd 627 root mem REG 3,11 43744 23292 /lib64/libnss_nis-2.9.so udevd 627 root mem REG 3,11 89232 211603 /lib64/libnsl-2.9.so udevd 627 root mem REG 3,11 31792 220058 /lib64/libnss_compat-2.9.so udevd 627 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so udevd 627 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so udevd 627 root mem REG 3,11 113904 38855 /lib64/libselinux.so.1 udevd 627 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so udevd 627 root 0u CHR 1,3 0t0 1830 /dev/null udevd 627 root 1u CHR 1,3 0t0 1830 /dev/null udevd 627 root 2u CHR 1,3 0t0 1830 /dev/null udevd 627 root 3r DIR 3,11 1440 3099 /etc/init.d/boot.d udevd 627 root 4r DIR 0,10 0 1 inotify udevd 627 root 5u unix 0xffff88003787fc00 0t0 1884 socket udevd 627 root 6u sock 0,4 0t0 1885 can't identify protocol udevd 627 root 7r FIFO 0,6 0t0 1886 pipe udevd 627 root 8w FIFO 0,6 0t0 1886 pipe kgameport 1021 root cwd DIR 3,11 688 2 / kgameport 1021 root rtd DIR 3,11 688 2 / kgameport 
1021 root txt unknown /proc/1021/exe khpsbpkt 1293 root cwd DIR 3,11 688 2 / khpsbpkt 1293 root rtd DIR 3,11 688 2 / khpsbpkt 1293 root txt unknown /proc/1293/exe knodemgrd 1310 root cwd DIR 3,11 688 2 / knodemgrd 1310 root rtd DIR 3,11 688 2 / knodemgrd 1310 root txt unknown /proc/1310/exe saa7133[0 1380 root cwd DIR 3,11 688 2 / saa7133[0 1380 root rtd DIR 3,11 688 2 / saa7133[0 1380 root txt unknown /proc/1380/exe kstriped 1464 root cwd DIR 3,11 688 2 / kstriped 1464 root rtd DIR 3,11 688 2 / kstriped 1464 root txt unknown /proc/1464/exe kdmflush 1479 root cwd DIR 3,11 688 2 / kdmflush 1479 root rtd DIR 3,11 688 2 / kdmflush 1479 root txt unknown /proc/1479/exe kdmflush 1488 root cwd DIR 3,11 688 2 / kdmflush 1488 root rtd DIR 3,11 688 2 / kdmflush 1488 root txt unknown /proc/1488/exe xfs_mru_c 1547 root cwd DIR 3,11 688 2 / xfs_mru_c 1547 root rtd DIR 3,11 688 2 / xfs_mru_c 1547 root txt unknown /proc/1547/exe xfslogd/0 1548 root cwd DIR 3,11 688 2 / xfslogd/0 1548 root rtd DIR 3,11 688 2 / xfslogd/0 1548 root txt unknown /proc/1548/exe xfsdatad/ 1549 root cwd DIR 3,11 688 2 / xfsdatad/ 1549 root rtd DIR 3,11 688 2 / xfsdatad/ 1549 root txt unknown /proc/1549/exe xfsbufd 1550 root cwd DIR 3,11 688 2 / xfsbufd 1550 root rtd DIR 3,11 688 2 / xfsbufd 1550 root txt unknown /proc/1550/exe xfsaild 1552 root cwd DIR 3,11 688 2 / xfsaild 1552 root rtd DIR 3,11 688 2 / xfsaild 1552 root txt unknown /proc/1552/exe xfssyncd 1553 root cwd DIR 3,11 688 2 / xfssyncd 1553 root rtd DIR 3,11 688 2 / xfssyncd 1553 root txt unknown /proc/1553/exe mount.ntf 1575 root cwd DIR 3,11 688 2 / mount.ntf 1575 root rtd DIR 3,11 688 2 / mount.ntf 1575 root txt REG 3,11 40400 41621 /bin/ntfs-3g mount.ntf 1575 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so mount.ntf 1575 root mem REG 3,11 130284 23297 /lib64/libpthread-2.9.so mount.ntf 1575 root mem REG 3,11 273120 56571 /lib64/libntfs-3g.so.40.0.0 mount.ntf 1575 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so mount.ntf 1575 root mem REG 
3,11 256444 233499 /usr/lib/locale/it_IT.utf8/LC_CTYPE mount.ntf 1575 root mem REG 3,11 952254 233500 /usr/lib/locale/it_IT.utf8/LC_COLLATE mount.ntf 1575 root mem REG 3,11 54 31176 /usr/lib/locale/it_IT.utf8/LC_NUMERIC mount.ntf 1575 root mem REG 3,11 2426 199831 /usr/lib/locale/it_IT.utf8/LC_TIME mount.ntf 1575 root mem REG 3,11 294 31077 /usr/lib/locale/it_IT.utf8/LC_MONETARY mount.ntf 1575 root mem REG 3,11 54 233494 /usr/lib/locale/it_IT.utf8/LC_MESSAGES/SYS_LC_MESSAGES mount.ntf 1575 root mem REG 3,11 34 31219 /usr/lib/locale/it_IT.utf8/LC_PAPER mount.ntf 1575 root mem REG 3,11 62 233491 /usr/lib/locale/it_IT.utf8/LC_NAME mount.ntf 1575 root mem REG 3,11 127 220321 /usr/lib/locale/it_IT.utf8/LC_ADDRESS mount.ntf 1575 root mem REG 3,11 49 31066 /usr/lib/locale/it_IT.utf8/LC_TELEPHONE mount.ntf 1575 root mem REG 3,11 23 31223 /usr/lib/locale/it_IT.utf8/LC_MEASUREMENT mount.ntf 1575 root mem REG 3,11 26050 30680 /usr/lib64/gconv/gconv-modules.cache mount.ntf 1575 root mem REG 3,11 343 29123 /usr/lib/locale/it_IT.utf8/LC_IDENTIFICATION mount.ntf 1575 root 0u CHR 1,3 0t0 1830 /dev/null mount.ntf 1575 root 1u CHR 1,3 0t0 1830 /dev/null mount.ntf 1575 root 2u CHR 1,3 0t0 1830 /dev/null mount.ntf 1575 root 3r DIR 3,11 1440 3099 /etc/init.d/boot.d mount.ntf 1575 root 4u BLK 3,5 0x27115f400 1487 /dev/hda5 mount.ntf 1575 root 5u CHR 10,229 0t0 5152 /dev/fuse xfsbufd 1576 root cwd DIR 3,11 688 2 / xfsbufd 1576 root rtd DIR 3,11 688 2 / xfsbufd 1576 root txt unknown /proc/1576/exe xfsaild 1577 root cwd DIR 3,11 688 2 / xfsaild 1577 root rtd DIR 3,11 688 2 / xfsaild 1577 root txt unknown /proc/1577/exe xfssyncd 1578 root cwd DIR 3,11 688 2 / xfssyncd 1578 root rtd DIR 3,11 688 2 / xfssyncd 1578 root txt unknown /proc/1578/exe console-k 2062 root cwd DIR 3,11 688 2 / console-k 2062 root rtd DIR 3,11 688 2 / console-k 2062 root txt REG 3,11 140224 154371 /usr/sbin/console-kit-daemon console-k 2062 root mem REG 3,11 96744 263920 /lib64/libgcc_s.so.1 console-k 2062 root mem 
REG 3,11 170240 220093 /lib64/libexpat.so.1.5.2 console-k 2062 root mem REG 3,11 194816 39927 /usr/lib64/libpcre.so.0.0.1 console-k 2062 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so console-k 2062 root mem REG 3,11 130284 23297 /lib64/libpthread-2.9.so console-k 2062 root mem REG 3,11 106040 243740 /usr/lib64/libpolkit.so.2.0.0 console-k 2062 root mem REG 3,11 803112 186719 /usr/lib64/libglib-2.0.so.0.1800.2 console-k 2062 root mem REG 3,11 36008 220367 /lib64/librt-2.9.so console-k 2062 root mem REG 3,11 18984 38389 /usr/lib64/libgthread-2.0.so.0.1800.2 console-k 2062 root mem REG 3,11 277928 186965 /usr/lib64/libgobject-2.0.so.0.1800.2 console-k 2062 root mem REG 3,11 253488 138799 /lib64/libdbus-1.so.3.4.0 console-k 2062 root mem REG 3,11 89232 211603 /lib64/libnsl-2.9.so console-k 2062 root mem REG 3,11 135848 105580 /usr/lib64/libdbus-glib-1.so.2.1.0 console-k 2062 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so console-k 2062 root 0u CHR 1,3 0t0 1830 /dev/null console-k 2062 root 1u CHR 1,3 0t0 1830 /dev/null console-k 2062 root 2u CHR 1,3 0t0 1830 /dev/null console-k 2062 root 3r DIR 3,11 2520 3105 /etc/init.d/rc5.d console-k 2062 root 4r FIFO 0,6 0t0 5888 pipe console-k 2062 root 5u CHR 1,3 0t0 1830 /dev/null console-k 2062 root 6r DIR 0,10 0 1 inotify console-k 2062 root 7w FIFO 0,6 0t0 5888 pipe console-k 2062 root 8r FIFO 0,6 0t0 5889 pipe console-k 2062 root 9w FIFO 0,6 0t0 5889 pipe console-k 2062 root 12r DIR 0,10 0 1 inotify console-k 2062 root 14r DIR 3,11 88 76582 /etc/ConsoleKit/run-session.d console-k 2062 root 15r FIFO 0,6 0t0 11086 pipe console-k 2062 root 16w FIFO 0,6 0t0 11086 pipe console-k 2062 root 17r DIR 3,11 48 76592 /usr/lib/ConsoleKit/run-session.d console-k 2062 root 18r DIR 3,11 88 76582 /etc/ConsoleKit/run-session.d console-k 2062 root 19r DIR 3,11 48 76592 /usr/lib/ConsoleKit/run-session.d console-k 2062 root 20r DIR 3,11 88 76582 /etc/ConsoleKit/run-session.d console-k 2062 root 21r DIR 3,11 48 76592 
/usr/lib/ConsoleKit/run-session.d console-k 2062 root 22r DIR 3,11 88 76582 /etc/ConsoleKit/run-session.d console-k 2062 root 23r DIR 3,11 48 76592 /usr/lib/ConsoleKit/run-session.d kauditd 2720 root cwd DIR 3,11 688 2 / kauditd 2720 root rtd DIR 3,11 688 2 / kauditd 2720 root txt unknown /proc/2720/exe gam_serve 3978 diego cwd DIR 3,12 12288 131 /home/diego gam_serve 3978 diego rtd DIR 3,11 688 2 / gam_serve 3978 diego txt REG 3,11 343918 92261 /usr/lib64/gam_server gam_serve 3978 diego mem REG 3,11 194816 39927 /usr/lib64/libpcre.so.0.0.1 gam_serve 3978 diego mem REG 3,11 1406248 23271 /lib64/libc-2.9.so gam_serve 3978 diego mem REG 3,11 803112 186719 /usr/lib64/libglib-2.0.so.0.1800.2 gam_serve 3978 diego mem REG 3,11 127896 23264 /lib64/ld-2.9.so gam_serve 3978 diego mem REG 3,11 217016 199483 /var/run/nscd/passwd gam_serve 3978 diego mem REG 3,11 26050 30680 /usr/lib64/gconv/gconv-modules.cache gam_serve 3978 diego 0r CHR 1,3 0t0 1830 /dev/null gam_serve 3978 diego 1w CHR 1,3 0t0 1830 /dev/null gam_serve 3978 diego 2w CHR 1,3 0t0 1830 /dev/null gam_serve 3978 diego 3r DIR 0,10 0 1 inotify gam_serve 3978 diego 4u unix 0xffff88006b0759c0 0t0 12136 socket gam_serve 3978 diego 5r FIFO 0,6 0t0 12137 pipe gam_serve 3978 diego 6w FIFO 0,6 0t0 12137 pipe em28xx-wo 4036 root cwd DIR 3,11 688 2 / em28xx-wo 4036 root rtd DIR 3,11 688 2 / em28xx-wo 4036 root txt unknown /proc/4036/exe rc 4885 root cwd DIR 3,11 688 2 / rc 4885 root rtd DIR 3,11 688 2 / rc 4885 root txt REG 3,11 715072 215022 /bin/bash rc 4885 root mem REG 3,11 293936 188701 /lib64/libncurses.so.5.6 rc 4885 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so rc 4885 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so rc 4885 root mem REG 3,11 263568 35119 /lib64/libreadline.so.5.2 rc 4885 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so rc 4885 root 0u CHR 5,1 0t0 1790 /dev/console rc 4885 root 1u CHR 5,1 0t0 1790 /dev/console rc 4885 root 2u CHR 5,1 0t0 1790 /dev/console rc 4885 root 255r REG 3,11 9374 131764 
/etc/init.d/rc
S01halt 5796 root cwd DIR 3,11 688 2 /
S01halt 5796 root rtd DIR 3,11 688 2 /
S01halt 5796 root txt REG 3,11 715072 215022 /bin/bash
S01halt 5796 root mem REG 3,11 293936 188701 /lib64/libncurses.so.5.6
S01halt 5796 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so
S01halt 5796 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so
S01halt 5796 root mem REG 3,11 263568 35119 /lib64/libreadline.so.5.2
S01halt 5796 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so
S01halt 5796 root 0u CHR 5,1 0t0 1790 /dev/console
S01halt 5796 root 1u CHR 5,1 0t0 1790 /dev/console
S01halt 5796 root 2u CHR 5,1 0t0 1790 /dev/console
S01halt 5796 root 3r FIFO 0,6 0t0 19698 pipe
S01halt 5796 root 255r REG 3,11 5994 131759 /etc/init.d/halt
startpar 5829 root cwd DIR 3,11 688 2 /
startpar 5829 root rtd DIR 3,11 688 2 /
startpar 5829 root txt REG 3,11 27464 47214 /sbin/startpar
startpar 5829 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so
startpar 5829 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so
startpar 5829 root 0u CHR 5,1 0t0 1790 /dev/console
startpar 5829 root 1w FIFO 0,6 0t0 19698 pipe
startpar 5829 root 2u CHR 5,1 0t0 1790 /dev/console
startpar 5829 root 3r DIR 3,11 1440 3099 /etc/init.d/boot.d
startpar 5829 root 4r FIFO 0,6 0t0 19699 pipe
startpar 5829 root 5w FIFO 0,6 0t0 19699 pipe
startpar 5829 root 6u CHR 5,2 0t0 1832 /dev/ptmx
boot.loca 5970 root cwd DIR 3,11 688 2 /
boot.loca 5970 root rtd DIR 3,11 688 2 /
boot.loca 5970 root txt REG 3,11 715072 215022 /bin/bash
boot.loca 5970 root mem REG 3,11 293936 188701 /lib64/libncurses.so.5.6
boot.loca 5970 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so
boot.loca 5970 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so
boot.loca 5970 root mem REG 3,11 263568 35119 /lib64/libreadline.so.5.2
boot.loca 5970 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so
boot.loca 5970 root 0u CHR 5,1 0t0 1790 /dev/console
boot.loca 5970 root 1u CHR 136,0 0t0 2 /dev/pts/0
boot.loca 5970 root 2u CHR 136,0 0t0 2 /dev/pts/0
boot.loca 5970 root 3r DIR 3,11 1440 3099 /etc/init.d/boot.d
boot.loca 5970 root 255r REG 3,11 9778 230893 /etc/init.d/boot.localfs
lsof 5980 root cwd DIR 3,11 688 2 /
lsof 5980 root rtd DIR 3,11 688 2 /
lsof 5980 root txt REG 3,11 127416 35202 /usr/bin/lsof
lsof 5980 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so
lsof 5980 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so
lsof 5980 root mem REG 3,11 113904 38855 /lib64/libselinux.so.1
lsof 5980 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so
lsof 5980 root 0u CHR 5,1 0t0 1790 /dev/console
lsof 5980 root 1w REG 3,11 33932 88408 /var/log/boot.fail.msg
lsof 5980 root 2u CHR 136,0 0t0 2 /dev/pts/0
lsof 5980 root 3r DIR 0,3 0 1 /proc
lsof 5980 root 4r DIR 0,3 0 19868 /proc/5980/fd
lsof 5980 root 5w FIFO 0,6 0t0 19872 pipe
lsof 5980 root 6r FIFO 0,6 0t0 19873 pipe
lsof 5981 root cwd DIR 3,11 688 2 /
lsof 5981 root rtd DIR 3,11 688 2 /
lsof 5981 root txt REG 3,11 127416 35202 /usr/bin/lsof
lsof 5981 root mem REG 3,11 14872 211599 /lib64/libdl-2.9.so
lsof 5981 root mem REG 3,11 1406248 23271 /lib64/libc-2.9.so
lsof 5981 root mem REG 3,11 113904 38855 /lib64/libselinux.so.1
lsof 5981 root mem REG 3,11 127896 23264 /lib64/ld-2.9.so
lsof 5981 root 4r FIFO 0,6 0t0 19872 pipe
lsof 5981 root 7w FIFO 0,6 0t0 19873 pipe
------ lsmod ----
Module Size Used by
zl10353 8368 1
em28xx_dvb 20092 0
dvb_core 87948 1 em28xx_dvb
em28xx_audio 9036 0
tuner_xc3028 6264 1
tvp5150 18712 0
em28xx 413988 2 em28xx_dvb,em28xx_audio
ip6t_LOG 7180 7
xt_tcpudp 3608 2
xt_pkttype 2152 3
ipt_LOG 6812 11
xt_limit 3180 18
binfmt_misc 10260 1
ip6t_REJECT 6024 3
nf_conntrack_ipv6 24840 4
ip6table_raw 2456 1
xt_NOTRACK 2152 4
ipt_REJECT 3480 3
xt_state 2568 14
iptable_raw 2760 1
iptable_filter 3400 1
ip6table_mangle 3128 0
nf_conntrack_netbios_ns 2840 0
nf_conntrack_ipv4 12792 10
nf_conntrack 80480 5 nf_conntrack_ipv6,xt_NOTRACK,xt_state,nf_conntrack_netbios_ns,nf_conntrack_ipv4
ip_tables 19464 2 iptable_raw,iptable_filter
ip6table_filter 3240 1
ip6_tables 21048 4 ip6t_LOG,ip6table_raw,ip6table_mangle,ip6table_filter
x_tables 23376 11 ip6t_LOG,xt_tcpudp,xt_pkttype,ipt_LOG,xt_limit,ip6t_REJECT,xt_NOTRACK,ipt_REJECT,xt_state,ip_tables,ip6_tables
ipv6 293608 11 ip6t_REJECT,nf_conntrack_ipv6,ip6table_mangle
cpufreq_conservative 8272 0
cpufreq_userspace 4204 0
cpufreq_powersave 2248 0
powernow_k8 15580 0
fuse 61088 2
nls_iso8859_1 5352 5
nls_cp437 7064 5
vfat 11864 5
fat 54376 1 vfat
xfs 545312 2
loop 17924 0
dm_mod 73952 5
saa7134_alsa 14464 0
tda827x 10892 1
tda8290 14956 1
tuner 26220 0
saa7134 158020 1 saa7134_alsa
sg 35344 0
osst 52928 0
ir_common 43340 1 saa7134
compat_ioctl32 8536 1 saa7134
videodev 35328 4 em28xx,tuner,saa7134,compat_ioctl32
st 38892 2
v4l1_compat 14220 2 em28xx,videodev
ohci1394 31380 0
v4l2_common 12600 2 tuner,saa7134
videobuf_dma_sg 14332 2 saa7134_alsa,saa7134
videobuf_core 20748 2 saa7134,videobuf_dma_sg
ieee1394 98880 1 ohci1394
tveeprom 13708 1 saa7134
nvidia 5662024 0
rtc_cmos 13960 0
snd_pcm 95440 2 em28xx_audio,saa7134_alsa
ppdev 8208 0
isp1760 20776 0
shpchp 32244 0
rtc_core 22420 1 rtc_cmos
snd_timer 26664 1 snd_pcm
ide_cd_mod 33984 0
pci_hotplug 31864 1 shpchp
button 8328 0
rtc_lib 3560 1 rtc_core
parport_pc 40392 0
ns558 6264 0
snd 74632 4 em28xx_audio,saa7134_alsa,snd_pcm,snd_timer
gameport 13640 2 ns558
cdrom 36200 1 ide_cd_mod
i2c_nforce2 8624 0
parport 41568 2 ppdev,parport_pc
forcedeth 60312 0
k8temp 5352 0
pcspkr 3064 0
snd_page_alloc 9816 1 snd_pcm
i2c_core 35280 12 zl10353,tuner_xc3028,tvp5150,em28xx,tda827x,tda8290,tuner,saa7134,v4l2_common,tveeprom,nvidia,i2c_nforce2
soundcore 8816 1 snd
floppy 63240 0
ide_disk 14872 14
ehci_hcd 55348 0
ohci_hcd 36548 0
usbcore 198656 7 em28xx_dvb,em28xx_audio,em28xx,isp1760,ehci_hcd,ohci_hcd
advansys 79600 0
edd 10272 0
reiserfs 241392 4
fan 6016 0
ide_pci_generic 4652 0
ata_generic 6044 0
pata_amd 13692 0
sata_nv 26480 0
libata 183376 3 ata_generic,pata_amd,sata_nv
scsi_mod 179144 5 sg,osst,st,advansys,libata
dock 14564 1 libata
amd74xx 7152 12
ide_core 118012 4 ide_cd_mod,ide_disk,ide_pci_generic,amd74xx
thermal 24232 0
processor 49904 2 powernow_k8,thermal
thermal_sys 14336 3 fan,thermal,processor
hwmon 4040 2 k8temp,thermal_sys
------------END-------
Hello Diego, it should be "lsof | grep /home" instead of plain "lsof", to shorten the log to the most important lines. Nevertheless, the gamin server is blocking your /home on unmount. That was my problem too, before I installed these packages:
gam_serve 3978 diego cwd DIR 3,12 12288 131 /home/diego
I cannot say why this still happens to you; maybe Dr. Werner Fink has an idea.
IMHO this has nothing to do with the HOME partition. It looks like mkill -TERM stops a daemon or service process which shouldn't be stopped on Diego's system. Diego? Could you please change the above line into
strace mkill -TERM
then we may see which user space process is the problem. Besides this, it seems that you're using NTFS file systems together with the ntfs-3g tools from gnome, and that mount.ntfs hangs around as a daemon? As a possible solution you could add ntfs,ntfs-3g to the line
typeset -r tmpfs=tmpfs,ramfs,hugetlbfs,mqueue
of /etc/init.d/boot.localfs
Created attachment 267289 [details] "debug" patch for boot.localfs I rewrote the patch to collect more information; here it is.
Created attachment 267291 [details] dump generated by attachment #267289 [details] This is the log file generated by #267289. Next to each date I wrote "ok" if the dump refers to a successful shutdown and "failed" if it refers to a frozen shutdown. The dump is from another machine with a fresh 11.1 install, with #266174 and #266173.
Have you tried adding ntfs,ntfs-3g to tmpfs, like
typeset -r tmpfs=tmpfs,ramfs,hugetlbfs,mqueue,ntfs,ntfs-3g
in /etc/init.d/boot.localfs?
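For illustration only: the sketch below shows how a comma-separated type list like the one above could be used to decide which mounted file systems a script leaves alone. `mounts_not_of_type` is a hypothetical helper, not code from boot.localfs, and how boot.localfs itself consumes the `tmpfs` variable is not shown in this report. Note that in the mount listings quoted in this bug, the ntfs-3g mount appears with type fuseblk, so such a list only matches if it names the type exactly as the kernel reports it.

```shell
# Hypothetical helper (not from boot.localfs): print the mount points in a
# /proc/mounts-style file ($1) whose type (3rd field) is NOT in the
# comma-separated list $2.
mounts_not_of_type() {
    awk -v skip="$2" '
        BEGIN {
            n = split(skip, types, ",")
            for (i = 1; i <= n; i++) excluded[types[i]] = 1
        }
        !($3 in excluded) { print $2 }
    ' "$1"
}

# Example call:
#   mounts_not_of_type /proc/mounts tmpfs,ramfs,hugetlbfs,mqueue,ntfs,ntfs-3g
```

With the list from the comment above, a fuseblk mount would still be printed, because "fuseblk" is not one of the listed types.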
I'll do it this weekend. This is another PC.
I forgot to say two things about my last attachment. As you can see in my patch, I mark the end of the strace with a tag "------------END 2st phase-------". For the successful shutdown it correctly appears in the log file (strangely, because of mkill itself), and it doesn't appear for the failed shutdown. So it seems that mkill is the thing that freezes, as the debug line that "set -x" should show doesn't appear on the console either.
Nothing strange ... if mkill terminates a user space daemon which serves an NTFS file system as a user space driver, the system hangs.
But, as you see, on this computer (attachment #267291 [details]) I don't have any ntfs filesystem.
In comment #37 I see a running user space daemon /bin/ntfs-3g .. maybe this is not for a NTFS but for a vfat. Nevertheless it is a user space daemon which provides a driver for windows file systems. And this is killed by mkill (which should not happen here).
Yes, on that machine I added the modifications you asked for in comment #39, but regarding my comment #41 I should repeat that on that machine (comment #41) I don't have any vfat/ntfs partition.
There is also a user space daemon gvfs-fuse which provides a file system driver in user space.
Please provide the full content of the pid directory of such a gvfs-fuse and ntfs-3g daemon (you have to be root to do this), e.g. with this
for p in $(fuser /dev/fuse 2>/dev/null); do
    find /proc/$p -maxdepth 2 -type l -ls
done
This should find all processes which provide user space file system drivers.
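As a rough alternative to the fuser call above, the same question (which processes hold /dev/fuse open) can be answered by reading the fd symlinks under /proc directly. This is only a sketch: `fuse_pids` is a hypothetical helper name, it is not taken from mkill or from any SUSE script, and it has to run as root to see the fd directories of other users' processes.

```shell
# Hypothetical helper: print the PIDs found under a proc root ($1, default
# /proc) that have a file descriptor pointing at /dev/fuse.
fuse_pids() {
    proc=${1:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # Skip unreadable entries and the unexpanded glob pattern.
        [ -L "$fd" ] || continue
        if [ "$(readlink "$fd")" = "/dev/fuse" ]; then
            pid=${fd#"$proc"/}      # strip the proc root, e.g. 4906/fd/3
            echo "${pid%%/*}"       # keep only the PID component
        fi
    done | sort -un
}

# Example call (as root):
#   fuse_pids
```

The optional argument exists only so the function can be tried against a scratch directory instead of the live /proc.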
Yes, gvfs uses fuse... but honestly I don't know who installed gvfs, as I use KDE
gvfs-fuse-daemon on /home/diego/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=diego)
70931 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/0 -> /dev/null
70932 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/1 -> /dev/null
70933 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/2 -> /dev/null
70934 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/3 -> /dev/fuse
70935 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/4 -> socket:[14916]
70936 0 lr-x------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/5 -> pipe:[14918]
70937 0 l-wx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/6 -> pipe:[14918]
70938 0 lrwx------ 1 diego users 64 Jan 26 12:30 /proc/4906/fd/7 -> socket:[14919]
70929 0 lrwxrwxrwx 1 diego users 0 Jan 26 12:30 /proc/4906/cwd -> /
70928 0 lrwxrwxrwx 1 diego users 0 Jan 26 12:30 /proc/4906/root -> /
22139 0 lrwxrwxrwx 1 diego users 0 Jan 26 09:15 /proc/4906/exe -> /usr/lib64/gvfs/gvfs-fuse-daemon
Created attachment 267579 [details] sysvinit-2.86-186.14.i586.rpm sysvinit with mkill which skips all processes using /dev/fuse
Created attachment 267581 [details] sysvinit-2.86-186.14.x86_64.rpm sysvinit with mkill which skips all processes using /dev/fuse
I've modified the mkill(8) utility and also verified that the version from the sysvinit of attachment #267579 and attachment #267581 will not touch the user space daemons providing a file system driver. Please test whether this helps in your case.
Shall I remove the comment #42 modifications after applying new sysvinit?
yes
I think we caught the problem. It seems the workstations are shutting down correctly.
OK strike. Anja, we need a SWAMPID for this to get out updates for aaa_base and sysvinit for openSuSE 11.1. This is because all users of file systems which are driven by user space daemons are affected by this problem.
I agree that we want to fix this for 11.1. I read the comment#57 as "submit to factory first", as an additional measure for protecting against regressions.
Both aaa_base and sysvinit are submitted to SLES11 and Factory ;)
Added to the planned updates. If there are no regressions, let's revisit this and push out the update in the middle of next month.
The SWAMPID for this issue is 22283. Please submit the patch and patchinfo file using this ID. (https://swamp.suse.de/webswamp/wf/22283)
I've submitted both aaa_base *and* sysvinit as *both* packages are required. This includes also the fixes for several bugs for aaa_base bug #426270, bug #463477, bug #466718, bug #458940, bug #463175, bug #457093, bug #457984, bug #422010, bug #445646, bug #441053, and bug #442753 ...
When will these patches land on online_update? My system, upgraded from 11.0 to 11.1, is still affected on every reboot cycle: loads of fsck on boot. Thanks and regards.
Created attachment 269096 [details] shutdown freeze log I'm sorry to report that we have the issue again... I left the log capture enabled during shutdown and again mkill froze. Unfortunately I don't have the whole log of the failed shutdown, only the "mount", "lsmod" and "lsof" output from before the stop procedure in /etc/init.d/boot.localfs. On the console, the procedure stopped again during the mkill invocation.
*** Bug 465029 has been marked as a duplicate of this bug. ***
Update released for: aaa_base, sysvinit Products: openSUSE 11.1 (debug, i586, ppc, x86_64)
The SWAMPID for this issue is 22528. Please submit the patch and patchinfo file using this ID. (https://swamp.suse.de/webswamp/wf/22528)
Diego: Please check if the packages sysvinit and aaa_base are up to date. The last changelog entries of sysvinit are:
Fri Feb 6 00:36:27 CET 2009 - ro@suse.de
- fix build (move static int loop before first usage)
Tue Jan 27 16:00:03 CET 2009 - werner@suse.de
- Do not terminate udevd with mkill(8)
- Do not terminate udevd with killall5(8)
- Avoid chrashing startpar due recursion caused by loops
Mon Jan 26 12:02:43 CET 2009 - werner@suse.de
- Do not kill fuse user space processes with mkill(8) (bnc#466484)
- Minimize fuse patch for killall5(8) by using readlinkat(2)
and those of aaa_base
Mon Jan 26 11:25:36 CET 2009 - coolo@suse.de
- removing the timeout, there is no good timeout value (bnc#426270)
Fri Jan 23 12:19:31 CET 2009 - coolo@suse.de
- wait for udev to settle the modprobe events (bnc#426270)
...
AFAICS these are the latest changes for openSuSE 11.1. If mkill still hangs with these changes, then the order of the mounts could be wrong. With the current mkill the udev daemon process and all processes using /dev/fuse for user space driven file systems are not touched anymore. In your log this is process 1685, which serves the file system for /dev/hda5 aka /windows/F
My 11.1 x86 with the latest online_update patches applied doesn't suffer from unclean shutdowns and fsck sessions during startup any more. The bug seems to be gone now. Thank you. At last. P.S. How about better quality control, learning from past mistakes, and trying to avoid such new bugs that get introduced with each new openSUSE release or even with intermediate patches during the lifecycle? Thanks.
Andreas (Jaeger)? ... the last question is for you. Andreas (Bittner)? ... have you been a beta tester? Such bugs can only be detected with the help of beta testers with various and exotic system setups. Diego? Have you verified which versions of aaa_base and sysvinit are installed on your system? Besides this, the command line
mkill -0 /windows/* | xargs ps -x
will show you all processes which make your windows mounts busy.
Werner, andreas: Please discuss the quality of openSUSE on the opensuse-testing or opensuse-factory mailing lists. Ideas how the openSUSE community can do better are appreciated.
Diego? Please read comment #73
Sorry: sysvinit-2.86-186.15.1 aaa_base-11.1-10007.12.1 Out of about 20 shutdowns I had a couple of freezes. As for the mkill -0 command, I ran it on the fuse-mounted filesystems (~/.gvfs and /sys/fs/fuse/connections), but it seems no process is using them on this machine (which froze once during the shutdown procedure today). I don't have any windows partition.
Then please add the boot fail messages of this system ... this is because the log in attachment #269096 [details] definitely shows windows partitions:
/dev/hda11 on / type reiserfs (rw)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/hda12 on /home type xfs (rw)
/dev/hda8 on /data1 type reiserfs (rw)
/dev/hda7 on /suse10.2 type reiserfs (rw)
/dev/hda1 on /windows/C type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda2 on /windows/D type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda3 on /windows/E type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda5 on /windows/F type fuseblk (rw,noexec,nosuid,nodev,allow_other,default_permissions,blksize=4096)
/dev/hda9 on /windows/G type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda10 on /windows/H type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/mapper/dati-multimedia on /mnt/hdb1 type xfs (rw,noexec,nosuid,nodev)
/dev/mapper/dati-distribuzione on /mnt/hdb2 type reiserfs (rw,noexec,nosuid,nodev)
securityfs on /sys/kernel/security type securityfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
and the process 1685 (mount.ntf) is providing /windows/F.
Yes, the dump you've shown is from another machine, where vfat and ntfs filesystems are mounted (that system is also an openSUSE 11.1 x86_64 with the same patch level for sysvinit and aaa_base):
/dev/hda1 on /windows/C type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda2 on /windows/D type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda3 on /windows/E type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda5 on /windows/F type fuseblk (rw,noexec,nosuid,nodev,allow_other,default_permissions,blksize=4096)
/dev/hda9 on /windows/G type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
/dev/hda10 on /windows/H type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
Even on that system, mkill -0 /windows/*, mkill -0 ~/.gvfs and mkill -0 /sys/fs/fuse/connections don't return anything.
Strange .. then please add strace before the mkill which freezes on your system: the line
mkill -TERM $ulist
in /etc/init.d/boot.localfs becomes
strace -s 80 mkill -TERM $ulist
I'd like to see the famous last few lines of strace.
I re-applied the "debug patch" #267289 with the strace addition and the mkill -0 on top. The next time a dirty shutdown happens I'll post the dump.
What does `dirty shutdown' exactly mean?
Sorry, it only means that, as the shutdown procedure hangs, I have to force the machine off by pressing and holding the PC power button.
Created attachment 274659 [details] "debug" patch for boot.localfs
Created attachment 274963 [details] last shutdown hangup This is the log of the last shutdown hang. I have a small question about the shutdown procedure:
1. What happens if I mount (outside the fstab) a windows share?
mount -t cifs -o username=name //cifsserver.domain/share /mnt/sambamount
2. What happens if I mount a device over another device?
mount /dev/mapper/dati-samba /mnt/samba
mount /dev/mapper/anotherthing /mnt/samba
Are these exceptions to what is handled by boot.localfs?
In the normal case, 1) should be handled, as /etc/init.d/boot.localfs knows about cifs: it is part of the netfs variable. To avoid deadlocks, such file systems are ignored. The second case 2) does not seem to be the problem, but it could be a problem if you mount local devices into a remote file system ... I've no clue what happens here ... but the mtab shows:
/dev/sda5 on / type reiserfs (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/mapper/dati-home on /home type xfs (rw)
/dev/mapper/dati-samba on /mnt/samba type xfs (rw)
/dev/mapper/dati-vmware on /mnt/vmware type reiserfs (rw)
securityfs on /sys/kernel/security type securityfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
gvfs-fuse-daemon on /home/SSIS/diego.ercolani/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=SSIS+diego.ercolani)
... it seems that terminating pid 4472 is the reason why your system hangs. The pid belongs to the process /usr/lib64/gam_server, which seems to be connected to /usr/lib64/gvfs/gvfs-fuse-daemon as it uses the same user id 10000. Please run
rpm -qf /usr/lib64/gam_server
to determine which package this daemon belongs to.
Yes:
rpm -qf /usr/lib64/gam_server -> gamin-0.1.10-0.pm.1
rpm -qi gamin
Name : gamin Relocations: (not relocatable)
Version : 0.1.10 Vendor: packman.links2linux.de
Release : 0.pm.1 Build Date: Sat Jan 3 12:24:50 2009
Install Date: Mon Jan 5 09:19:47 2009 Build Host: pmbs
Group : Development/Libraries Source RPM: gamin-0.1.10-0.pm.1.src.rpm
Size : 777547 License: LGPL
Signature : DSA/SHA1, Sat Jan 3 12:25:45 2009, Key ID f899f20d9a795806
Packager : Detlef Reichelt <detlef@links2linux.de>
URL : http://www.gnome.org/~veillard/gamin/
Summary : Library providing the FAM File Alteration Monitor API
Description :
This C library provides an API and ABI compatible file alteration monitor mechanism compatible with FAM but not dependent on a system wide daemon.
Distribution: openSUSE 11.1 (x86_64)
I do not find this package on opensuse.org ... does this daemon have a signal handler for SIGTERM that calls inotify_rm_watch(2) for its inotify file descriptor, which ... I guess ... is set on its cwd, /home/SSIS/diego.ercolani/Documents? Or do you know any other reason why your system becomes (temporarily) deadlocked if gam_server is terminated? Could it be that /home/SSIS/diego.ercolani/Documents is accessed through samba, that is, that at this point we no longer have any network and terminating gam_server triggers a (long) network timeout? If so, it would be better to terminate gam_server *before* shutting down the samba connection and the network.
You don't find it because it is from the packman repository, I guess. I think it was installed through dependencies. From a konsole I killed it
kill `pgrep gam_server`
The system doesn't hang, but in the process table I find a new instance of gam_server. I issued a "ps afuxwwww" to find which process launches gam_server, but it seems to be launched at the same level as init. It isn't possible to erase gamin, as it is needed by other packages:
rpm -e gamin
error: Failed dependencies:
libfam.so.0()(64bit) is needed by (installed) libgio-fam-2.18.2-5.1.x86_64
libfam.so.0()(64bit) is needed by (installed) kdelibs3-3.5.10-21.11.x86_64
libfam.so.0()(64bit) is needed by (installed) gnome-vfs2-2.24.0-4.1.x86_64
libfam.so.0()(64bit) is needed by (installed) libkipi5-4.1.3-4.6.x86_64
libfam.so.0()(64bit) is needed by (installed) libkde4-4.2.0-102.1.x86_64
On openSuSE the libfam.so.0 is provided by fam-2.7.0-130.1
Name : fam Relocations: (not relocatable)
Version : 2.7.0 Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany
Release : 130.29 Build Date: Fri Feb 20 17:22:58 2009
Install Date: Sun Feb 22 13:51:26 2009 Build Host: eisler
Group : System/Daemons Source RPM: fam-2.7.0-130.29.src.rpm
Size : 84562 License: GPL v2 or later; LGPL v2.1 or later
Signature : RSA/8, Fri Feb 20 17:23:21 2009, Key ID e3a5c360307e3d54
URL : http://oss.sgi.com/projects/fam/
Summary : File Alteration Monitoring Daemon
Description :
Fam is a file alteration monitoring service. With it, you can receive signals when files are created or changed. This package provides libfam, which is used by KDE and GNOME. It also provides a tool for the console called fileschanged. To use fam notifications (it can reduce the network load on NFS servers, especially if they host user home directories) you need to run the fam daemon, which can be found in the fam-server package.
Authors:
--------
Bruce Karsh
Bob Miller
SGI corp.
Author of fileschanged command line tool:
Ben Asselstine <bda@panix.com>
Distribution: SUSE:Factory:Head
In other words try
rpm -e gamin --force
killall -9 gam_server
zypper install --name fam
... does this work for you?
rpm --nodeps -e gamin
kill `pgrep gam_server` (I'm more graceful)
zypper in --name fam -y (fam-2.7.0-130.1)
Work done. Let's see what happens now, but I think we have to investigate the gamin problem.... I sent a mail to the gamin packager...
Hi Matthias
Hi, I'm the packager of gamin, and I couldn't reproduce it on my systems. I've heard that sometimes it is not gam_server that hangs, but pulseaudio or artsd. So it sounds like a general problem of openSUSE 11.1. The shutdown process seems too fast... ;) I don't want to use fam, because it often hangs and kills thunar/pcmanfm. Gamin is absolutely stable in use, so I decided to switch. I'm back at home in July (!); could somebody else help to fix the gamin.rpm?
For the record: I can reproduce the hang at home, even with a patched mkill from Werner. Will try to debug this at home. This is not necessarily the same issue, but we won't know until we analyze it.
Created attachment 278280 [details] Modified mkill that documents its work If something like this still happens on another machine - this is the instrumented version of mkill I used for debugging.
Created attachment 278281 [details] lsof output during shutdown
Created attachment 278282 [details] mkill log output
Created attachment 278284 [details] Bug fix for mkill This fixes the issues for me. Basically, the mount point comparison function was dead wrong. On my system, everything that had a file open *starting* with /d or /u (that includes /dev and /usr) was killed. Including "/bin/bash /etc/init.d/halt", and "/bin/bash /etc/init.d/boot.localfs" itself.
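To make the failure mode concrete: a mount point check has to match whole path components, not just leading characters. The sketch below is not the code from mkill (which is not shown in this report) and `path_under_mount` is a hypothetical name; it only illustrates, in shell, a comparison that accepts /home/diego for the mount point /home but rejects /homework, and that never treats /usr/bin/lsof as belonging to a mount point called /u. It assumes mount points are given without a trailing slash, as they appear in /proc/mounts.

```shell
# Hypothetical helper: succeed if path $1 is the mount point $2 itself or
# lies below it. A bare prefix comparison is not enough: "/homework" starts
# with "/home" but is not inside it.
path_under_mount() {
    path=$1
    mnt=$2
    # Every absolute path lives below the root mount point.
    [ "$mnt" = "/" ] && return 0
    case "$path" in
        "$mnt"|"$mnt"/*) return 0 ;;   # exact match, or match up to a "/"
        *) return 1 ;;
    esac
}

# Example calls:
#   path_under_mount /home/diego /home   -> success
#   path_under_mount /homework/x /home   -> failure
```

The quoted "$mnt" in the case patterns is matched literally, so a mount point containing glob characters is not misinterpreted.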
This is a severe issue. Also affects SLED11. Bug 467906 might be a dup. Though I somewhat doubt that in the meantime.
Created attachment 278423 [details] sysvinit-2.86-186.16.i586.rpm sysvinit with updated mkill utility
Created attachment 278425 [details] sysvinit-2.86-186.16.x86_64.rpm sysvinit with updated mkill
Diego? Does the new sysvinit rpm with the fixed mkill help for you?
Hello, my last change was to replace gamin with the standard fam (as in my comment #91). With this change the shutdown process doesn't "freeze"; as Werner supposed, gamin (gam_server) probably causes a sort of deadlock when killed .... I don't know. BUT even though the shutdown process no longer hangs, the new problem is that during shutdown mkill leaves some file descriptor open, and when the system comes up the next time it complains about a dirty filesystem and fsck is run.... This issue continues with the new (#101) sysvinit too. What about you, are you seeing these issues as well?
mkill only opens /proc/mounts to get all active mount points and /proc to read the directory, then it uses readlink(2), open(2) and opendir(3) to determine which running program makes a mount point busy. Running
strace -e open,readlink,close mkill -0 /dev
does not show a file descriptor leak. Besides this, mkill stops neither /sbin/udevd nor programs which have /dev/fuse open; the latter because, if those were terminated, the underlying fuse file system would become dirty. Could it be that there is a program which opens its own fuse device, which the program itself creates with makedev(3) and mknod(2)? Or could it be that there is a program which uses a /dev/fuse within a chroot environment ... maybe in combination with a network based file system (samba, NFS) and a local file system? Nevertheless, the new mkill sorts the kill(pid,SIGTERM) calls in the reverse order of the mount points found in /proc/mounts. Or does the order of your mounts imply that some of the mount points remain busy? To see this you should compare /etc/mtab and /proc/mounts. I'll add Magnus, who may be able to explain what happens with this gvfs-fuse-daemon which seems to keep /home/SSIS/diego.ercolani/.gvfs or /home busy or dirty.
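The reverse ordering mentioned above can be pictured with a few lines of shell. This is a sketch only: mkill is a compiled tool and its source is not part of this report, and `reverse_mounts` is a hypothetical helper that merely prints the mount points of a /proc/mounts-style file with the most recently mounted entry first, which is the order in which nested mounts have to be released.

```shell
# Hypothetical helper: print the mount points (2nd field) of a
# /proc/mounts-style file ($1) in reverse order, last mounted first.
reverse_mounts() {
    awk '
        { mnt[NR] = $2 }
        END { for (i = NR; i >= 1; i--) print mnt[i] }
    ' "$1"
}

# Example calls, to compare the kernel's view with /etc/mtab by eye:
#   reverse_mounts /proc/mounts
#   reverse_mounts /etc/mtab
```

Note that /proc/mounts has one line per mount, so a mount point that is mounted over (such as / in the listings in this report) shows up more than once.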
I'm actually just an external contributor helping out with package updates (which, I suppose, is where you found my name). I'm adding HPJ instead, who's the real maintainer of gvfs.
Seems the latest update broke the shutdown process. At shutdown the system stops at
Turning off swap files
Sending all processes the TERM signal ...
/etc/init.d/rc: line 317: 5907 killed $link start
Master Resource Control: runlevel 0 has been reached
Failed services in runlevel 0: smartd lm_sensors
Skipped services in runlevel 0: SuSEfirewall2_setup
INIT: no processes left in this runlevel
Here the system stops. It fails to unmount the drives and to shut down. Could you please increase the priority of this bug, since at the moment the situation is quite problematic?
Michael, please see: http://en.opensuse.org/Bugs/Definitions This is not a BLOCKER according to our definition - and please do not change priority. Werner is on vacation this week and will answer this once he's back for sure - and change the priority himself.
Diego? Could you please add the line
killproc -TERM /usr/sbin/console-kit-daemon
before
killproc -p $DBUS_DAEMON_PID -TERM $DBUS_DAEMON_BIN
in /etc/init.d/dbus ... compare with bug #491063
Then please make sure that you have really installed aaa_base-11.2-1.6 and sysvinit-2.86-186.17.1 ... then you may add
mkill -0 $ulist | xargs -r ps u
mkill -0 $ulist | xargs -n 1 -r | while read p; do ls -Gl /proc/$p/fd; done
sleep 10
before
mkill -TERM $ulist
in /etc/init.d/boot.localfs ... with this we may see what exactly happens, compare with bug #486710
@Eberhard ... Are you using RUN_PARALLEL=no in /etc/sysconfig/boot or are you using PROMPT_FOR_CONFIRM=yes ??
Uhm.... in my current situation (sysvinit-2.86-186.17.1, aaa_base-11.1-10007.15.1) the shutdown doesn't hang, but the root filesystem ("/") isn't cleanly unmounted, so an fsck is done when the system starts up.
... OK ... then I'd like to know which process or which forgotten mount makes ``/'' busy. Please attach /var/log/boot.omsg; maybe this helps to see what happens in the famous last seconds. Also a
cat /proc/mounts
fuser -m /
after the umount in /etc/init.d/boot.localfs would show what's going on there.
Created attachment 283526 [details] boot.omsg
Created attachment 283527 [details] kernel session log for a session without hang but without umount of rootfs Here is the boot.omsg that you requested. cat /proc/mounts returned:
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,mode=755 0 0
/dev/hda11 / reiserfs rw 0 0
/proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
securityfs /sys/kernel/security securityfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
fuser returned:
1rce 2rc 3rc 4rc 5rc 6rc 7rc 8rc 9rc 10rc 11rc 12rc 13rc 14rc 16rc 17rc 18rc 57rc 58rc 60rc 61rc 76rc 187rc 188rc 567rc 638rce 891rc 1165rc 1297rc 1300rc 1486rc 1579rc 1592rc 1601rc 1660rc 1661rc 1662rc 2173rce 2132rc 4288rc 4289rc 4767rc 4898rce 5803rce 5821rce 5977rce
PROMPT_FOR_CONFIRM=no
RUN_PARALLEL was yes, changed to no, but that changed nothing
aaa_base 11.1-100007.15.1-x86_64
sysvinit 2.86-186.17.1-x86_64
from the update repository. I also attached my boot.omsg. Maybe it matters that I run xfs.
@Eberhard I also use xfs here and do not have any problems. Please run as root
rpm -V aaa_base sysvinit
and report the result.
@Diego The root file system is remounted read-only in /etc/init.d/halt, that is, it is *not* touched in /etc/init.d/boot.localfs ... AFAICS from your comment #113 there is nothing which makes the file system busy after the killall5 is done in /etc/init.d/halt
Werner: the problem is that quite often, when I start up the machine, it complains about a "dirty shutdown". This doesn't happen every time; it happens about one time in three.
Hmmm ... strange, please have a look into /etc/init.d/halt at lines 163 up to 184 ... was this piece of code reached when your system complains about a "dirty shutdown" on the next boot? You may remove the `2> /dev/null' from the remount to see what happens, and you may add a `usleep 100000' or `sleep 1' before the remount. Or maybe sometimes a daemon is in D state during the remount; you may also add
fuser -a -m /
before the remount.
... or use
fuser -a -m / 2>/dev/null | xargs -r ps u
and/or
fuser -a -m / 2>/dev/null | \
    xargs -n 1 -r | while read p; do ls -Gl /proc/$p/fd; done
we may then see what is going on there.
# rpm -V aaa_base sysvinit
S.5....T c /etc/inittab
S.5....T c /etc/mailcap
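If a short delay before the remount turns out to help, one way to experiment is to retry the remount a few times with a pause in between, rather than sleeping once. The following is a sketch under that assumption, not part of /etc/init.d/halt: `retry` is a hypothetical helper, and the attempt count and delay in the example call are arbitrary values chosen for illustration.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Succeeds as soon as the command succeeds.
retry() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        # Give up once the allowed number of attempts is used up.
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example call (debugging aid only):
#   retry 3 1 mount -o remount,ro / || fuser -a -m /
```

The fuser call after the `||` only runs if every attempt failed, so the diagnostic output is produced exactly in the case that needs explaining.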
Created attachment 283820 [details] modification of boot.localfs and halt scripts
Created attachment 283822 [details] kernel session log for a session without hang but without umount of rootfs (halt complainted about "/" is busy during remount,ro)
Created attachment 283824 [details] log generated for the same session as (id=283822) by script modifications as (id=283820)
AFAICS from attachment #283824 [details] there is nothing busy :(( Neither /dev/console nor /dev/initctl belongs to the root fs.
Besides the problem reported by Diego, is there anyone who still has the reported problem *after* updating aaa_base (10007.15.1) and sysvinit (186.17.1), and applying the ConsoleKit changes from bug #491063?
Confirmed! I still have the problem although I applied the fix on consolekit.
Created attachment 284364 [details] kernel session log for a session without hang but without umount of rootfs (halt complainted about "/" is busy during remount,ro) Today I recorded the same issue I reported some days ago.
Created attachment 284365 [details] log generated for the same session as (id=284364) by script modifications as (id=283820)
@Eberhard: Are you running AOE+squashfs+aufs ... that is KIWI's AOE/NBD feature? If yes, this is highly experimental and does not belong to this bug, this problem is covered by bug #491890. @Diego: Please attach your boot.localfs and your halt script.
No, I'm not running it.
Created attachment 284642 [details] /etc/init.d/{boot.localfs,halt} /var/log/boot.{omsg,faill.msg} as requested in comment #128 As requested, here is a new "session" dump. The included halt script is my latest modification: if "mount -o remount,ro /" fails, control is handed to a shell. In the attached boot.fail.msg the remount failed, so I made some dumps:
lsof
fuser -va /
ps axuwwwwww
and finally a
strace -f mount -o remount,ro / >>/var/log/boot.fail.msg 2>&1
I know the failure of the remount,ro in that last strace may be caused by the strace itself, but I don't know how to avoid that. After the last strace I issued a logout, and this time mount -o remount,ro / succeeded, so the session shut down gracefully. A small note: after exiting to the shell, I first issued a "ps auxwwww" without redirecting its output to /var/log, and I noticed that there was a zombie process (it was bacula-sd). After that I tried to remount the root read-only without success, then I issued some diagnostic commands (ps, lsof, fuser) redirecting their output to /var/log/boot.fail.msg, and then I closed the session with a logout, and the halt script successfully retried the remount,ro. Examining the log in the next session, I noticed that no bacula-sd process appears in the output of the ps axuwwwww command that I redirected to /var/log/boot.fail.msg.... Is it possible that after a while the zombie process exited and thereby left the / filesystem free to be remounted read-only?
Add maintainer of bacula to CC list (hmmm ... seems to be dropped in factory).
@Eberhard: AFAICS from comment #107 your system s loosing the root file system ... the question rises: *why* does this happen on your system. There must be a difference between your system and the system e.g. Diego or my own systems here around. Do you have a own kernel or are you using the standard kernel of 11.1. Next point are your mount points, please show us the output of `cat /proc/mounts'.
@Diego: Do the bacula daemons have threads and are real daemons? Please run ps -Leo pid,ppid,sid,lwp,stat,comm | grep bacula to see more about. What happens if you disable baluca that is insserv -r bacula-dir bacula-fd bacula-sd /etc/init.d/bacula-dir stop /etc/init.d/bacula-fd stop /etc/init.d/bacula-sd stop after this please check if there is any zombie process around. Now does shutdown work?
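For the zombie check requested above, a small filter over ps output is enough. The sketch below is an illustration, not something taken from the SUSE init scripts: `zombies` is a hypothetical helper that reads "pid stat command" lines on standard input and prints the entries whose state starts with Z, which is how ps marks defunct processes.

```shell
# Hypothetical helper: read "ps -eo pid,stat,comm" style output on stdin and
# print "PID COMMAND" for every process whose state field starts with Z.
zombies() {
    # The header line is skipped automatically: "STAT" does not start with Z.
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# Example call:
#   ps -eo pid,stat,comm | zombies
```

An empty result means that no defunct process was present at the moment ps took its snapshot.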
@Diego: Which package provides /etc/init.d/spindown? Please run rpm -qf /etc/init.d/spindown this because the boot.omsg shows: Shutting down the Bacula Storage daemonShutting down java.binfmt_misc done Shutting down irqbalance done Shutting down service kdmdone /etc/init.d/spindown: line 48: log_daemon_msg: command not found /etc/init.d/spindown: line 50: log_end_msg: command not found
I use the default kernel, now version 2.6.27.21-0.1.2-x86_64 from the update repository. Before it was 2.6.27.19 and nothing changed with the update.
# cat /proc/mounts
rootfs / rootfs rw 0 0
udev /dev tmpfs rw,mode=755 0 0
/dev/sda6 / xfs rw,noquota 0 0
/proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/sdb5 /usr/lib xfs rw,noquota 0 0
/dev/sdb6 /usr/lib64 xfs rw,attr2,noquota 0 0
/dev/sdb7 /opt xfs rw,attr2,noquota 0 0
/dev/sdb8 /home xfs rw,attr2,noquota 0 0
/dev/sda7 /windows/L vfat rw,nosuid,nodev,noexec,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,utf8 0 0
/dev/sdb9 /windows/N vfat rw,nosuid,nodev,noexec,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,utf8 0 0
/dev/sdb10 /windows/O vfat rw,nosuid,nodev,noexec,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,utf8 0 0
/dev/sda5 /windows/D vfat rw,nosuid,nodev,noexec,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,utf8 0 0
/dev/sda1 /windows/C fuseblk rw,nosuid,nodev,noexec,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096 0 0
/dev/sda9 /windows/M vfat rw,nosuid,nodev,noexec,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=cp437,iocharset=iso8859-1,utf8 0 0
/dev/sda8 /windows/P fuseblk rw,nosuid,nodev,noexec,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096 0 0
/dev/sdb11 /windows/Q fuseblk rw,nosuid,nodev,noexec,user_id=0,group_id=0,default_permissions,allow_other,blksize=4096 0 0
fusectl /sys/fs/fuse/connections fusectl rw 0 0
securityfs /sys/kernel/security securityfs rw 0 0
/proc /var/lib/ntp/proc proc ro 0 0
Could it be that files from /usr/lib or /usr/lib64 are still needed when these file systems are already unmounted? That would be funny, since I have been running this fs layout for a long time and I never had these problems before.
Regarding comment #133:
  ps -Leo pid,ppid,sid,lwp,stat,comm | grep bacula
  3187 1 3187 3187 Ssl bacula-fd
  3187 1 3187 3190 Ssl bacula-fd
  3188 1 3188 3188 Ssl bacula-sd
  3188 1 3188 3201 Ssl bacula-sd
  3383 1 3383 3383 Ssl bacula-dir
  3383 1 3383 3518 Ssl bacula-dir
  3383 1 3383 3519 Ssl bacula-dir
As for the zombie:
  rcbacula-dir stop
  rcbacula-fd stop
  rcbacula-sd stop
  ps axuwww | grep bacula
  root 3188 0.0 0.0 0 0 ? Zsl 17:58 0:02 [bacula-sd] <defunct>
It seems that bacula-sd remains a zombie until the tape drive (DDS3-SCSI) ejects its tape, so it could be a sort of I/O freeze. But I can tell you that the shutdown procedure always takes less time than the tape drive needs to rewind and eject the tape, and the shutdown procedure often remounts the root filesystem read-only successfully. My 2 ¢...
I read that bacula has been removed from Factory. I think it is one of the best pieces of software ever written, and I think it is a bad idea to remove it from the distribution.
Regarding comment #134: Yes, spindownd also comes from the Packman repository. From a short search via Google I understand that "log_daemon_msg" and "log_end_msg" are logging functions that belong to Linux Standard Base 3.0-3 (functions that are defined in /lib/lsb/init-functions). The error messages you refer to do not seem to leave the daemon in a dirty state after rcspindaemon stop. The init script also refers to another function, "status_of_proc", which is not defined in the openSUSE 11.1 /lib/lsb/init-functions. Extract of /etc/init.d/spindown:
  [...]
  . /lib/lsb/init-functions
  [...]
  case "$1" in
    "start")
      log_daemon_msg "Starting disk spindown daemon" "spindownd"
      start_daemon -p $PIDFILE $DAEMON -d -s $STATUSPATH -c $CONFPATH -p $PIDFILE
      log_end_msg $?
      exit $?
      ;;
    "stop")
      log_daemon_msg "Stopping disk spindown daemon" "spindownd"
      killproc -p $PIDFILE $DAEMON
      log_end_msg $?
      exit $?
      ;;
    "status")
      if status_of_proc -p $PIDFILE $DAEMON spindown; then
        echo -n
      else
        exit 1
      fi
      killproc -p $PIDFILE $DAEMON -PIPE status
      exit 0
      ;;
  [...]
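One way to silence the "command not found" errors from the script extract above would be to define fallbacks for the missing logging functions. The following is only a sketch under assumptions: the output format is made up for illustration, and the functions are defined only if the sourced init-functions file has not already provided them.

```shell
# Fallback definitions (illustrative only) for logging helpers that the
# script expects but that the sourced init-functions file may not provide.
# The output format here is an assumption, not that of any distribution.
if ! command -v log_daemon_msg >/dev/null 2>&1; then
    log_daemon_msg() {
        # Print "message: daemon-name" without a trailing newline.
        printf '%s: %s' "$1" "$2"
    }
fi

if ! command -v log_end_msg >/dev/null 2>&1; then
    log_end_msg() {
        # Print the result and hand the status code back to the caller.
        if [ "$1" -eq 0 ]; then
            printf ' done\n'
        else
            printf ' failed\n'
        fi
        return "$1"
    }
fi
```

Placed right after the line that sources /lib/lsb/init-functions, such definitions would only take effect where the real functions are absent. status_of_proc would need a similar fallback.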
@Anna: Do you know why bacula was dropped from Factory?
@Diego: You should file a feature request on openSuSE.org to re-enable bacula; maybe Anna knows more about bacula ... nevertheless you should make sure that /etc/init.d/spindown does not spin down the disks on stop. Besides this, have you seen the problem with the bacula services disabled? IMHO a busy tape should cause a `D' state but not a `Z' state (uninterruptible, not defunct process). A zombie is a terminated process which has not been reaped by its parent. The default parent of a real daemon is process 1 aka /sbin/init ... the question is why init takes so long to reap this specific process.
@Eberhard, for comment #136 ... AFAIK there is no process which requires /usr or any other mount point below /usr after boot.localfs has unmounted it. If you find one, please report it. Please also attach the files
  /etc/init.d/.depend.boot
  /etc/init.d/.depend.halt
  /etc/init.d/.depend.start
  /etc/init.d/.depend.stop
and the file /etc/inittab. You may also add a simple line
  bash
after the line which remounts the root file system read-only in /etc/init.d/halt to get a shell for debugging.
Created attachment 285112 [details] /etc/init.d/.depend.* /etc/inittab
Created attachment 285113 [details] ps axu
In /etc/init.d/halt it gets exactly as far as line 168
  rc_wait /sbin/blogd /sbin/splash
and there it stops. It does not get beyond this line. If I start bash before this line I get a shell. All volumes are still mounted and mount -no remount,ro / works without an error message. In the ps axu output I can't see any offending process, but since my eye isn't very trained, I attach the output of ps axu.
Werner: Bacula was not completely dropped from the distribution, I just moved it to Contrib (and AFAIK, Contrib repo will be a part of default installation in the next openSUSE release). I will try to update it here and there and I will fix the bugs, but I hope we find some external maintainer sooner or later. And why? First, I am tired of maintaining our huge FORTIFY_SOURCE patch (see #354872) and upstream is not interested to really fix the issue. Second, I have never found time to package it properly (ie. make it work with all the databases) - I have created the package several years ago as a quick hack for our internal IT and I do not think it really deserves to be a part of Factory. If anyone wishes to do better, I will gladly let him. As I have lot of work I consider much more important, I do not think I will do better in a reasonable time.
@Werner: Yes, it seems that "sometimes", while bacula-sd is a zombie, the rootfs is locked in some manner and the remount fails. In this case, once the tape is ejected, bacula-sd leaves the zombie state and it is then possible to remount the rootfs. A workaround could be something like this:
  i=0; while ! mount -no remount,ro / && [ $i -lt 50 ]; do sync; i=$((i+1)); sleep 1; done
@Werner: my problem seems unrelated to Diego's. Maybe I had better open a separate bug report?
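The retry idea can also be written as a small reusable function. This is a sketch only: the name retry_until and its argument order are made up for illustration and are not part of the halt script.

```shell
# retry_until MAX DELAY CMD...  (hypothetical helper)
# Run CMD up to MAX times, calling sync and sleeping DELAY seconds after
# each failed attempt. Returns 0 as soon as CMD succeeds, 1 if it never did.
retry_until() {
    max=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$max" ]; do
        "$@" && return 0
        i=$((i + 1))
        sync
        sleep "$delay"
    done
    return 1
}

# Possible use in /etc/init.d/halt:
#   retry_until 50 1 mount -no remount,ro / || echo "root fs still busy"
```

Returning a status lets the caller decide what to do when the root file system stays busy, instead of silently continuing.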
Yes that would be fine.
(In reply to comment #142) Diego? What happens if you force an eject within the boot script of bacula-sd? That is, add a line
  eject /dev/tape
before the line
  killproc -TERM $BACULA_SD_BIN
within /etc/init.d/bacula-sd ... and if this does not work, try moving this line after the terminating killproc line. Clearly you should check that /dev/tape exists and points to the real physical device like /dev/st0 or /dev/nst0 ...
(In reply to comment #140) Eberhard? Maybe you could try, before the line with rc_wait, to do an
  echo $BASH_VERSION
  echo $SECONDS
and then
  set -x
  rc_wait
to see whether the bash has a problem with increasing $SECONDS on your system or with the rc_wait() shell function. One question I have: why do the processes mounting /windows/C, /windows/P, and /windows/Q still exist at this point? IMHO these processes should be gone after boot.localfs has done its job, even if the file systems are fuse based.
I solved the problem with bacula-sd by setting: Offline On Unmount = no in bacula-sd.conf. With this line bacula does not send an offline to the tape when it shuts down.
But my question is: the halt process is not problem-proof, because if some process has not released the rootfs, the system does not shut down correctly.
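The check that /dev/tape exists and points to a real device could look like the sketch below. The function name resolve_tape is made up for illustration, and it assumes readlink -f (GNU coreutils) is available.

```shell
# resolve_tape [LINK]  (hypothetical helper)
# Print the real device node behind LINK (default /dev/tape).
# Fails with status 1 if LINK does not exist or is a dangling symlink.
resolve_tape() {
    link=${1:-/dev/tape}
    [ -e "$link" ] || return 1
    readlink -f "$link"
}

# Possible use in /etc/init.d/bacula-sd, before the killproc line:
#   dev=$(resolve_tape /dev/tape) && eject "$dev"
```

With this guard the eject is simply skipped on machines without a tape drive.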
I found that my problem may be related to https://bugzilla.novell.com/show_bug.cgi?id=486710 , so I will append my answers there
This problem seems to be solved on openSuSE 11.2 ...