Bug 817778

Summary: journalctl can not read (broken?) journal.
Product: [openSUSE] openSUSE Tumbleweed Reporter: Stefan Seyfried <seife>
Component: BasesystemAssignee: Frederic Crozat <fcrozat>
Status: RESOLVED FIXED QA Contact: E-mail List <qa-bugs>
Severity: Major    
Priority: P5 - None CC: crrodriguez, fcrozat
Version: 13.1 Milestone 0   
Target Milestone: ---   
Hardware: Other   
OS: Other   
Whiteboard:
Found By: Third Party Developer/Partner Services Priority:
Business Priority: Blocker: ---
Marketing QA Status: --- IT Deployment: ---
Attachments: two journal files which trigger the described behaviour
minimal set of broken journals
another set of broken journal files

Description Stefan Seyfried 2013-04-30 06:32:56 UTC
journalctl can not read my journal, I have no way of getting my syslog messages out of it.

journalctl --full -a --since 2013-04-17
=> 
-- Logs begin at Fri 2012-11-09 15:09:50 CET, end at Tue 2013-04-30 08:22:29 CEST. --
Apr 17 00:51:27 susi.home.s3e.de dhclient[26947]: DHCPREQUEST on air to 192.168.200.1 port 67
...
scroll down, scroll down, scroll down....
...
Apr 17 08:47:14 susi.home.s3e.de goa[2224]: goa-daemon version 3.8.0 starting [main.c:113, main()]
-- Reboot --
Apr 12 09:45:29 susi.home.s3e.de systemd[1]: NetworkManager-wait-online.service: main process exited, code=exitecode=exited, status=1/FAILURE

....oops. Time warp?....
scroll down, scroll down, scroll down...
...Apr 12 09:45:31 susi.home.s3e.de gnome-session[1611]: WARNING: Could not parse desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop or it references a not found TryExec binary
-- Reboot --
Apr 08 14:09:08 susi.home.s3e.de polkitd[1268]: /usr/share/polkit-1/rules.d/50-lightdm.rules:3: action=[Action id='org.freedesktop.consolekit.system.stop']

...oops, even more time warp=...
scroll down...
...

Apr 11 14:23:11 susi.home.s3e.de dhclient[1861]: DHCPREQUEST on air to 192.168.1.1 port 67
-- Reboot --
Apr 12 09:45:28 susi.home.s3e.de /usr/bin/dbus-launch[1611]: ** (gnome-session-check-accelerated:1616): WARNING **: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
-- Reboot --
Apr 08 14:08:33 susi.home.s3e.de gnome-session[1629]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Die Ressource ist zur Zeit nicht verfügbar) on X server :0.


...and there it goes into an endless loop.

"journalctl --full -a --no-pager --since 2013-04-17 > /tmp/logfile" never finishes but chews away with 100% CPU...

This is terminally broken and exactly what people opposing binary log formats expected to happen: no way to get the logs out of the heap of data.

journalctl --verify did complain about some of the journal files (without a way to fix them), I have removed or renamed the ones it complained on but this did not help.

My machine crashed a few times on april 11, but logfiles other than the journal survived that without permanent damage.
Comment 1 Frederic Crozat 2013-05-21 15:42:28 UTC
could you attach the guilty journals here (or send them to me, if you don't want them hosted) ?
Comment 2 Stefan Seyfried 2013-05-22 07:47:15 UTC
Created attachment 540592 [details]
two journal files which trigger the described behaviour

those two trigger the problem for me, I had to move them out of /var/log/journal.
Comment 3 Frederic Crozat 2013-05-22 09:47:15 UTC
Hmm, this is strange, I can't reproduce the loop behavior you described, either with journalctl on 12.3 or Factory.

Could you try to copy each of those journal files individually an empty directory and run "journalctl --full -a --no-pager -D <director_where_you_unpack_the_journal_file>"  ?
Comment 4 Stefan Seyfried 2013-05-22 13:33:24 UTC
Created attachment 540651 [details]
minimal set of broken journals

I verified that this is the minimum set of files that allows to reproduce the behaviour:

Apr 12 09:53:21 susi.home.s3e.de kernel: lightdm[2102]: segfault at 1d ip 0000000000410ba6 sp 00007fffaf053d10 error 6 in lightdm[400000+29000]
Apr 12 09:53:32 susi.home.s3e.de gnome-keyring-daemon[2162]: Gkm: using old keyring directory: /home/seife/.gnome2/keyrings
-- Reboot --
Apr 11 14:23:11 susi.home.s3e.de dhclient[1861]: DHCPACK from 192.168.1.1
Apr 11 14:23:11 susi.home.s3e.de dhclient[1861]: bound to 192.168.1.147 -- renewal in 1676 seconds.
Comment 5 Stefan Seyfried 2013-05-31 08:09:26 UTC
Created attachment 542067 [details]
another set of broken journal files

...even though they verify just fine:

susi:/dev/shm/journal # journalctl --verify -D .
PASS: ./user-108.journal                                                                 
PASS: ./system@0004dd66298a6e59-c8c8c32698a7d94a.journal~                                
PASS: ./user-10329.journal                                                               
PASS: ./system.journal                                                                   
susi:/dev/shm/journal # journalctl -D . --since 2013-05-23\ 19:39:23|head
-- Logs begin at Tue 2013-05-21 21:54:26 CEST, end at Fri 2013-05-31 09:38:19 CEST. --
May 23 19:39:23 susi.home.s3e.de /usr/bin/dbus-launch[1559]: ** (gnome-session-check-accelerated:1564): WARNING **: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
-- Reboot --
May 23 09:14:57 susi.home.s3e.de /usr/bin/dbus-launch[1569]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
-- Reboot --
May 23 19:38:39 susi.home.s3e.de systemd-journal[239]: Allowing runtime journal files to grow to 394.8M.
May 23 19:38:39 susi.home.s3e.de kernel: Initializing cgroup subsys cpuset
May 23 19:38:39 susi.home.s3e.de kernel: Initializing cgroup subsys cpu
May 23 19:38:39 susi.home.s3e.de kernel: Linux version 3.9.0-1-default (geeko@buildhost) (gcc version 4.7.3 (SUSE Linux) ) #1 SMP Tue May 7 08:14:56 UTC 2013 (d6e99fd)
May 23 19:38:39 susi.home.s3e.de kernel: Command line: root=/dev/disk/by-label/root-FACTORY resume=/dev/sda5 splash=silent quiet init=/bin/systemd vga=0x367
Comment 6 Stefan Seyfried 2013-05-31 08:19:22 UTC
user108.journal seems to be the broken piece this time...
Comment 7 Frederic Crozat 2013-05-31 09:57:30 UTC
(In reply to comment #6)
> user108.journal seems to be the broken piece this time...

not exactly.

If you check each journal individually with journalctl -D <directory_containing one entry> --no-pager, you'll notice you read them correctly.

However, when 3 of the 4 journals from your second tarball (user-10329.journal doesn't change anything) are parsed at the same time, there is an infinite loop during parsing (it can be seen when using --no-pager) which output again and again part of the same logs.

It looks like a bug in the parsing and not in the journals themselves.

I'll digg further into the code.
Comment 8 Frederic Crozat 2013-05-31 13:06:51 UTC
systemd / journalctl from 12.3 are able to correctly parse the journal files.

It is a regression in journalctl (there is nothing wrong in the journal themselves), caused by a bugfix from https://bugs.freedesktop.org/show_bug.cgi?id=63672

I'm discussing with upstream how to fix it.
Comment 9 Stefan Seyfried 2013-05-31 14:06:46 UTC
bad wording: "user018.journal seems to be the piece that triggers the brokenness" is probably better.

In all cases, the individual files are fine, only in combination they are broken.

Maybe a cronjob that just "rm -f"'s all non-"system@*" files in /var/log/journal would be enough.
Comment 10 Frederic Crozat 2013-06-13 03:01:24 UTC
fix has landed upstream. I'll backport it soon.
Comment 11 Cristian Rodríguez 2013-06-18 01:56:45 UTC
Fixed in request 179368.
Comment 12 Cristian Rodríguez 2013-06-18 01:58:47 UTC
.