Bugzilla – Bug 672450
sysvinit+systemd: too many repeated messages on ttyS0 console
Last modified: 2011-07-26 13:00:40 UTC
Created attachment 414366 [details] screenlog Systemd repeats some messages thousand times while the system is booting. Screenlog attached. For example: ... ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ^MBoot logging started on /dev/ttyS0(/dev/console) at Wed Feb 16 10:29:08 2011 ...
Ah, sorry I booted with sysvinit this time. So that's its fault.
Created attachment 414379 [details] systemd-screenlog (In reply to comment #1) > Ah, sorry I booted with sysvinit this time. So that's its fault. Hmm, not really, happens also with systemd. But not that much. Only some messages are repeated.
I don't know whose fault is this then... Cmdline: BOOT_IMAGE=/boot/vmlinuz-2.6.37-22-default root=UUID=dceb6665-da77-45bd- 9744-06dfaa3811ea 3 console=tty0 console=ttyS0,115200
Hmmm .... I'm pretty sure that the blogd does send its messages only once. You may try to disable the blogd by adding the key word `quiet' to the kernels command line
The safe_out() routine of blogd retries to write out messages if and only if the system call write(2) does return -1 and the errno is set to EINTR or EAGAIN or EIO ... for EIO a reconnect to the console is tried ... for EPIPE blogd simply exits. The question is: does the kernel set the return value to less than zero and errno to EINTR or even if write(2) had written out its buffer?
Adding maintainer of mkinitrd to carbon copy list. Hi Michal!
Jiri? You may remove the files /lib/mkinitrd/scripts/boot-blogd.sh /lib/mkinitrd/scripts/boot-killblogd.sh /lib/mkinitrd/scripts/boot-killblogd2.sh and change within /lib/mkinitrd/scripts/boot-start.s the line exec < /dev/console > /dev/console 2>&1 to exec < /dev/console > /dev/kmsg 2>&1 to use the kernels message ring buffer instead the ring buffer of the blogd. OK we loose the escape sequence filter as well as the filter for the rotator/progress sequences of fsck. ... Nevertheless the serial driver in the kernel should not fool the usage of the system call write(2).
To make this active the command mkinitrd should be executed otherwise the initrd will not be changed :)
Created attachment 414407 [details] dmesg from kernel with vfs_write intercepted (In reply to comment #7) > Jiri? You may remove the files > > /lib/mkinitrd/scripts/boot-blogd.sh > /lib/mkinitrd/scripts/boot-killblogd.sh > /lib/mkinitrd/scripts/boot-killblogd2.sh > > and change within > > /lib/mkinitrd/scripts/boot-start.s > > the line > > exec < /dev/console > /dev/console 2>&1 > > to > > exec < /dev/console > /dev/kmsg 2>&1 Yes, this stops blogd to flood the console. > ... Nevertheless the serial driver in the kernel should not fool > the usage of the system call write(2). Nope, it's blogd who cripples. With this change (write(2) interception in the kernel): diff --git a/fs/read_write.c b/fs/read_write.c index 5520f8a..ecae48a 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -384,6 +384,14 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_ inc_syscw(current); } + if (!strcmp(current->comm, "blogd")) { + int a; + printk("%s: cnt=%lu ret=%ld str=", __func__, count, ret); + for (a = 0; a < min(count, 16LU); a++) + printk(KERN_CONT "%c", buf[a] == '\n' ? 'X' : buf[a]); + printk(KERN_CONT "\n"); + } + return ret; } The output looks like (i.e. blogd write the same messages several times even though the retval equals to what it wanted kernel to write...): [ 2.612154] vfs_write: cnt=78 ret=78 str=^MBoot logging st [ 2.616610] vfs_write: cnt=78 ret=78 str=^MBoot logging st [ 2.621521] vfs_write: cnt=5 ret=5 str=1095X [ 2.634171] vfs_write: cnt=118 ret=118 str=^MBoot logging st [ 2.640975] vfs_write: cnt=118 ret=118 str=^MBoot logging st [ 2.646736] vfs_write: cnt=118 ret=118 str=^MBoot logging st
Created attachment 414412 [details] dmesg from kernel with vfs_write intercepted updated Well, I print also fd and open file. It makes more sense: fd=1 (/dev/console) cnt=78 ret=78 redirected_tty_write str=^MBoot logging st fd=5 (/var/run/blogd.pid) cnt=5 ret=5 do_sync_write str=1092X fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st fd=7 (/dev/tty0) cnt=118 ret=118 redirected_tty_write str=^MBoot logging st fd=4 (/dev/shm/initrd.msg) cnt=118 ret=118 do_sync_write str=^MBoot logging st fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st fd=7 (/dev/tty0) cnt=118 ret=118 redirected_tty_write str=^MBoot logging st fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st fd=7 (/dev/tty0) cnt=118 ret=118 redirected_tty_write str=^MBoot logging st fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st fd=7 (/dev/tty0) cnt=118 ret=118 redirected_tty_write str=^MBoot logging st fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st
Hmmm ... I'm a bit puzzled as I do not see how this should happen. The blogd writes out each message to the physical devices only once. This would lead to the assumption that those messages will be read by blogd from the new /dev/console (a tty from pty/tty pair) again and again?
Which kernel version is used and does this happen with a prior version (2.6.37)?
I can follow the lines fd=1 (/dev/console) cnt=78 ret=78 redirected_tty_write str=^MBoot logging st fd=5 (/var/run/blogd.pid) cnt=5 ret=5 do_sync_write str=1092X fd=1 (/dev/ttyS0) cnt=118 ret=118 tty_write str=^MBoot logging st fd=7 (/dev/tty0) cnt=118 ret=118 redirected_tty_write str=^MBoot logging st fd=4 (/dev/shm/initrd.msg) cnt=118 ret=118 do_sync_write str=^MBoot logging st but after this the code of blogd does not show how this should go on: parselog(trans, cnt); /* Parse and make copy of the input */ for (c = cons; c; c = c->next) { if (c->fd < 0) continue; safeout(c->fd, (char*)trans, cnt, c->max_canon); (void)tcdrain(c->fd); /* Write copy of input to real tty */ } flushlog(); the file descriptor fd=4 is used by the thread triggered by flushlog() this shows that this is no an endless loop, that is we have two physical devices with file descriptors fd=1 and fd=7. Please could you determine if the blogd rereads its own input even from the new /dev/console?
In other words has there been cahnges at this area in the kernel?
The code can not do this ... even if the blogd function safein() would be fooled by ioctl FIONREAD or by select(), the outer loop would require that the log file of file descriptor 4 would be hit after fd 1 and 7 as the character buffer trans[] would be filled with the input from /dev/console. Adding gcc maintainer as he may now what is going on here
(In reply to comment #12) > Which kernel version is used and does this happen with a prior version > (2.6.37)? This is my testing kernel where I can very easily inject code. Yes, it happens with 11.4 2.6.37 kernel. (In reply to comment #14) > In other words has there been cahnges at this area in the kernel? It might be possible.
Created attachment 414691 [details] read+write syscalls with injected code $ grep preping.91 dmesg|sed 's/.*sys_//' read: fd=0 (/dev/ptmx) cnt=1103 ret=1103 tty_read str=preping 91-killu write: fd=1 (/dev/ttyS0) cnt=255 ret=255 tty_write str=preping 91-killu write: fd=4 (/dev/shm/initrd.msg) cnt=1103 ret=1103 do_sync_write str=preping 91-killu write: fd=7 (/dev/tty0) cnt=255 ret=255 redirected_tty_write str=preping 91-killu read: fd=0 (/dev/ptmx) cnt=1103 ret=1103 tty_read str=preping 91-killu write: fd=1 (/dev/ttyS0) cnt=255 ret=255 tty_write str=preping 91-killu write: fd=4 (/dev/shm/initrd.msg) cnt=1103 ret=1103 do_sync_write str=preping 91-killu write: fd=7 (/dev/tty0) cnt=255 ret=255 redirected_tty_write str=preping 91-killu read: fd=0 (/dev/ptmx) cnt=1103 ret=1103 tty_read str=preping 91-killu write: fd=1 (/dev/ttyS0) cnt=255 ret=255 tty_write str=preping 91-killu write: fd=4 (/dev/shm/initrd.msg) cnt=1103 ret=1103 do_sync_write str=preping 91-killu write: fd=7 (/dev/tty0) cnt=255 ret=255 redirected_tty_write str=preping 91-killu It looks like (for example) this message is read from /dev/ptmx multiple times.
(In reply to comment #16) My guess was that the ioctl FIONREAD and/or the system call select(2) could be fooled, but then the read(2) system routine would fill the transfer buffer with the same line and this should written out successively in all three file descriptors 1, 7, *and* 4. But your log shows me that only the two file descriptors 1 and 7 will be used. And as the buffer will be filled with new informations the for-loop on the list of consoles is not an endless loop. That together with the chaning content of the transer buffer (beside the first line) had lead me to the assumption that something goes wrong during compile time (gcc). Maybe this is a problem with my used buffer static unsigned char trans[TRANS_BUFFER_SIZE]; and the cast used below const ssize_t cnt = safein(fdread, (char*)trans, sizeof(trans), noerr); [...] parselog(trans, cnt); for (c = cons; c; c = c->next) { if (c->fd < 0) continue; safeout(c->fd, (char*)trans, cnt, c->max_canon); (void)tcdrain(c->fd); } flushlog(); [...] this is the reason why I've put Michael into the carbon copy list.
(In reply to comment #17) Why does it happen that read(2) can read the same string several times from the master of the slave pty used as system console? This is new in comparison to older kernels.
Hmmm ... during blogd the original console devies are switched into raw tty mode whereas the new pty as temporary system console(s) are switched into canonical mode.
(FWIW, right now I'm assuming this has nothing to do with GCC. It seems clear to me that the cause is that messages written to /dev/console are sometimes reflected back to /dev/ptmx by the kernel, and that that is the cause for blogd sometimes repeating the same message)
OK, I'll remove you from CC Jiri, please tell me your archtitecture, as I'd like to attach a blogd without switching any terminal into raw mode.
(In reply to comment #22) > OK, I'll remove you from CC > > Jiri, please tell me your archtitecture, as I'd like to attach > a blogd without switching any terminal into raw mode. x86_64 It's like it opens /dev/ptmx for reading messages. It writes the messages to /dev/console which is /dev/ptm0 which gets to /dev/ptmx: tty_insert_flip_string_fixed_flag: blogd idx=0 name=ptm0 str=^MBoot logging st sys_write: fd=1 (/dev/console) cnt=78 ret=78 redirected_tty_write str=^MBoot logging st sys_read: fd=0 file=ffff880039d57000 (/dev/ptmx) cnt=78 ret=78 tty_read str=^MBoot logging st sys_write: fd=1 (/dev/ttyS0) cnt=78 ret=78 tty_write str=^MBoot logging st tty_insert_flip_string_fixed_flag: blogd idx=0 name=ptm0 str=^MBoot logging st sys_write: fd=7 (/dev/tty0) cnt=78 ret=78 redirected_tty_write str=^MBoot logging st sys_read: fd=0 file=ffff880039d57000 (/dev/ptmx) cnt=118 ret=118 tty_read str=sys_write: fd=4 (/dev/shm/initrd.msg) cnt=78 ret=78 do_sync_write str=XBoot logging st preping 11-block
Created attachment 414956 [details] /sbin/blogd for 11.4-x86_64 Please install this executable binary as /sbin/blogd and run mkinitrd to become an updated binary as well in the initrd. Thank you for testing!
Created attachment 415233 [details] screenlog with updated blogd (In reply to comment #24) > Please install this executable binary as /sbin/blogd and It doesn't spam the output. But it doesn't log anything either. There is no "Boot log started" etc. on the console...
(In reply to comment #25) Hmmm ... that is strange, are you sure that you have removed the workaround mentioned in comment #7 ... just run lsinitrd /boot/initrd | grep blogd grep -E 'quiet|fastboot' /proc/cmdline to see if there is any blogd support therein and if so, if it is used (that is no quiet nor fastboot should be part of the kernels command line).
Created attachment 415610 [details] screenlog with updated blogd (In reply to comment #26) > Hmmm ... that is strange, are you sure that you have removed the > workaround mentioned in comment #7 ... just run > > lsinitrd /boot/initrd | grep blogd Yeah, my bad. chmod a+x /sbin/blogd helped. The issue persists. I fed the output through s/[\r\n][\r\n]*/\r/ to get rid of 600k of crap.
... The only visible change at tty kernel API is CONFIG_UNIX98_PTYS=y CONFIG_DEVPTS_MULTIPLE_INSTANCES=y <<<<<<< CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=0 that is what does devpts multiple instances do? I do not understand why blogd is able to reread its own messages send to both /dev/tty0 and /dev/ttyS0 even if the console is explicit set to /dev/pts/0 and blogd only reads from the corresponding /dev/ptm0. Does this mean that with the actual kernel blogd has to close all file descriptors for /dev/console (0, 1, and 2) to reopen them after the ioctl TIOCCONS??
Created attachment 417071 [details] /sbin/blogd for 11.4-x86_64 Yet an other blogd, now closing old console and dup pts to stdout and stderr.
i'm confused, what do you need from me here?
(In reply to comment #30) I'd like to know what difference between the kernel of 11.3 and 11.4 causes that blogd reads its own output to the psysical devices of a serial console from the system console even after the ioctl TIOCCONS for redirecting console outout to a pts device.
I've added Hannes to carbon copy list. Hi Hannes, you'll find in attchment #417071 an blogd binary for x86_64 which explicit closes file descriptor 0 aka /dev/console to open the pts device after the ioctl TIOCCONS as new system console. The question is: does this work around the new (mis)behaviour of the kernel of 11.4
Yes, the blogd from comment #29 fixes the problem for me.
OK ... then this change should go to 11.4
(In reply to comment #34) I've submitted a fixed sysvinit package for this bug #672450 and for bug ##664941
the sysvinit submission currentöly in 11.4 is the same as we shipped with 11.4. (obs request 63912) top of changes is still: ------------------------------------------------------------------- Tue Feb 15 15:44:33 CET 2011 - werner@suse.de - New killproc version 2.19: bug fix update and there is no diff to the 11.4 shipping one.
See obs equest id 64128
seems sensible +1
ok, i suvbnmitted a patchinfo already
fixed
Does not work in virtual systems -> race condition -> kernel bug!
The fixed version does its work for console=ttyS0,38400n8 console=tty0 in /boot/grub/menu.lst but *not* for the reverse order console=tty0 console=ttyS0,38400n8 something wents wrong in the kernel. IMHO this is a regression in the kernel code for serial console handling as this had worked flawless with 2.6.34. See also bug #642289 ... I've tried to debug this bug on a virtual system, therefore I've detected the behaviour above.
Yep, there was a change in ttyname(3) which affects only in initrd.
dup of 642289 *** This bug has been marked as a duplicate of bug 642289 ***
This is an autogenerated message for OBS integration: This bug (672450) was mentioned in https://build.opensuse.org/request/show/77067 Factory / sysvinit