Bugzilla – Full Text Bug Listing
| Summary: | journald frequently crashes with coredump of the journal daemon itself | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Siegbert Sadowski <ssadowski> |
| Component: | Basesystem | Assignee: | David Sterba <dsterba> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Critical | | |
| Priority: | P5 - None | CC: | dsterba, fbui, forgotten_0Iyw_XgfeX, hare, jack, jeffm, mgorman, ssadowski, tiwai |
| Version: | Leap 42.2 | | |
| Target Milestone: | --- | | |
| Hardware: | x86-64 | | |
| OS: | Other | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Journal logfile; journald log 2017-01-28 till the crash; Journal log after the crash reboot; 'perf' output before the crash occured; Perf output as of 11-02-2017; Datasheet of Samsung SSD; Output of iostat -x 5; Terminal output copies of 'top' and 'journalctl -ef' around crash time; Corrected terminal outputs 'top' and 'journald-ef'; /etc/fstab; Another system lockup, journal excerpt; Output of lsattr for journal files | | |
Created attachment 711149 [details]
Journal logfile
Sounds more like an issue of journald itself. Reassigned to the systemd guys. Feel free to reassign back once it turns out to really be a kernel issue.

Hi,

It seems that journald is so busy writing logs that it forgets to send the keep-alive ping back to PID 1. As a consequence systemd thought that journald was down and killed it with SIGABRT.

Could you attach the whole content of 'journalctl -b', preferably after the first crash of journald? Could you also send the same output with debug logs enabled? To do so, please boot with the kernel options "debug printk.devkmsg=on".

BTW do you use BTRFS?

Thanks.

Created attachment 711994 [details]
journald log 2017-01-28 till the crash
Created attachment 711995 [details]
Journal log after the crash reboot
I tried to upload the compressed journal log file up to the last crash. However, the file is 12 MB, which exceeds the upload limit of 10 MB. Therefore please delete comment 4. Answer to your question w.r.t. the filesystem: yes, the system partition is BTRFS.

Hi, many thanks for your quick response. As recommended, I restarted the system with the boot option "debug printk.devkmsg=on". The journal output became very verbose and generated a huge log file (88 MB) up to the last crash. Unfortunately the upload of the compressed file failed due to the 10 MB size limit, as the compressed file is 12 MB. Please delete my duplicate comment 4.

However, I noticed two things with the debug option turned on:
1. Although the log output is much larger, the system stayed stable for at least twice the usual time between crashes.
2. A commonality: the crashes often occur around 21:30, +/- 15 minutes. So far I haven't found a cron job in this time frame.

My thoughts: Franck Bui's analysis sounds very plausible to me. The short response time of my SSD system disk keeps journald busy while writing the log to 'persistent storage'.

BTW: I will be out of the house for the next 7 days and will be back on 06.02.2017.

Best regards, Siegbert Sadowski

@Siegbert: I probably should have started with this, but is there any process/service doing heavy logging or heavy disk IO while this is happening?

Currently my suspicion is that syncing the journal to your (SSD) disk (formatted with BTRFS) takes a while (more than 3 min) for some reason.

@Mel, do you have an idea how this could be traced without interfering with the timings of journald? It might be possible with perf, or do you think strace would be enough for this case? Maybe there are other tools.
Basically journald mmaps the journal file for appending new records and calls fsync() to write the data out to disk every 5 min or so.

Thanks.

(In reply to Franck Bui from comment #9)
> @Mel, do you have an idea how this could be traced without interfering with
> the timings of journald ?

sudo perf trace -a -e fdatasync,fsync,msync,sync,sync_file_range,syncfs

While that's running it'll track all sync-related syscalls globally. If you want to target journald, you'll need to specify the pid with -p. Hit ctrl-c when you're done and there should be a breakdown of the sync calls. Crude example of tracing like this:

# dd if=/dev/zero of=zerofile ibs=1048576 count=500; sudo perf trace -e fdatasync,fsync,msync,sync,sync_file_range,syncfs sync; rm zerofile
500+0 records in
1024000+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.772471 s, 679 MB/s
1023.708 (1023.708 ms): sync/24234 sync( ) = 0

That is showing that the sync command after an async dd took 1023.708 ms to complete. You can monitor indefinitely by specifying no command, or record for a duration by running "sleep X", where X is the number of seconds you want to record data for.

> It might be possible by using perf or do you think strace should be enough
> for this case ?

perf can do it with lower interference. If there is a high number of sync calls then you may need to increase the mmap space with -m to avoid lost events. The same can be achieved with strace, but you asked that the interference with journald be minimised and strace is very heavy.

Thanks Mel for the hint.
@Siegbert, once you'll be back and ready to debug, could you install perf and then try:
> $ perf trace --duration=1000 -e fdatasync,fsync,msync,sync,sync_file_range,syncfs --pid=$(pgrep systemd-journal)
and then reproduce the issue.
Once reproduced, please post the output of perf.
Thanks.
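As a side note, 'perf trace' prints its report to stderr, so its output can in fact be saved by redirecting file descriptor 2. A hedged sketch of how the command above could be captured to a file (the function name and output path are illustrative, not from this thread; the guards simply skip the trace when perf or journald is unavailable):

```shell
#!/bin/sh
# Hedged sketch: capture the 'perf trace' report in a file. perf trace
# writes to stderr, so '2> file' is needed; '> file' captures nothing.
trace_journald_sync() {
    out=${1:-/tmp/journald-sync-trace.txt}
    pid=$(pgrep -o systemd-journal) || { echo "systemd-journald not running"; return 0; }
    command -v perf >/dev/null 2>&1 || { echo "perf not installed"; return 0; }
    # Runs until interrupted with Ctrl-C; the trace then lands in $out.
    sudo perf trace --duration 1000 \
        -e fdatasync,fsync,msync,sync,sync_file_range,syncfs \
        --pid "$pid" 2> "$out"
}
trace_journald_sync
```

This may help avoid the copy-from-terminal workaround used below.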
Created attachment 713752 [details]
'perf' output before the crash occured
Hello, I'm back again since 06.02.2017. Since then I have tried to catch the output of the recommended perf tool. In the last days the systemd crash occurred very soon after system start, just before I was able to start the perf tool; after that, no crash occurred for the whole evening. This evening I caught the crash event.

Unfortunately the perf tool did not let me redirect its output to a file, therefore I copied the screen output into the appended file. Unexpectedly, the output of perf terminated with '-----', although at this point in time nobody was at the computer/keyboard. In addition I append the output of a parallel running 'journalctl -ef'.

Thanks for your help. Regards, Siegbert.

Created attachment 713777 [details]
Perf output as of 11-02-2017
(In reply to Siegbert Sadowski from comment #14)
> Created attachment 713777 [details]
> Perf output as of 11-02-2017

From your attachment:

> 21978396.110 (303864.579 ms): fsync(fd: 20</var/log/journal/565b8fb65ab16a395a2a210a586a86d8/system.journal>) = 0

This means journald was blocked for more than 5 min when calling fsync().

@Mel, could you (or someone from the kernel team) explain why Siegbert is facing such latencies?

Hello, my two cents to add; maybe the most recent observations help. Since I always have 'journalctl -ef' running in a terminal window to see what's going on in the system, it became evident that the journald core dump was almost always preceded by logrotate. At other locations on the internet I found hints that e.g. in Ubuntu the compression of the logfiles caused similar trouble. Many thanks for your help, Siegbert.

(In reply to Franck Bui from comment #15)
> This means journald was blocked for more than 5 min when calling fsync().
>
> @Mel, could you (or someone from the kernel team) explain why Siegbert is
> facing such latencies ?

I'm adding Jan Kara to the cc in case there are follow-up questions, as I'm travelling a lot over the next three weeks and for one of those weeks I'll be completely unavailable.

The specs of the SSD are not included, but it's reasonable to assume that it can write roughly 300MB/sec under ideal conditions. Taking 5 minutes to sync data would then imply that it's syncing approximately 90GB of data. Is it possible that it's syncing that much data? Check the size of the system.journal file being synced. However, there are other factors that could delay it.

If CFQ is used and there is a large number of parallel reads, then the writes and sync can be starved for quite some time. Are there heavy reads going on in parallel? While it's possible to get an in-depth view of what is going on, it may be unnecessary. A first hint can be got by running iostat -x 5 in the background and looking at the stats around the time of the stall. Check whether there is a large number of reads/writes happening at the same time and what the r_await and w_await stats are, which will indicate whether large amounts of IO are going on at the same time.

While I mention CFQ, using the noop scheduler could also be a problem, as all other pending reads and writes must complete before the sync. If starvation is an issue and CFQ or noop is used, then the obvious recommendation would be to try deadline. This may delay some reads when there is intensive writing, but fsync times are likely to be reduced.

It is possible to identify exactly what is doing the writing, but I suspect it'll be a distraction. Regardless of how long it takes journald to sync a file when the system is busy, it should not be segfaulting.

As Mel says, gathering 'iostat -x 5' and seeing the numbers around stall time is a good idea. We'll get an idea of how busy the disk is around that time. You can check which IO scheduler you are using in /sys/block/<device>/queue/scheduler; however, on 42.2 it should already be set to 'deadline' for a non-rotating disk. But it doesn't hurt to verify it on your machine. Finally, I'm also adding David to CC as a btrfs guy. Maybe he has an idea why fsync(2) can take 5 minutes on the root btrfs filesystem.

Thanks Mel for answering !

(In reply to Mel Gorman from comment #17)
> The specs of the SSD are not included, but it's reasonable to assume that it
> can write roughly 300MB/sec under ideal conditions. Taking 5 minutes to sync
> data would then imply that it's syncing approximately 90GB of data. Is it
> possible that it's syncing that much data? Check the size of the
> system.journal file being synced.

I can't see how this could realistically happen: since the journal file is mmap'ed, that would mean that the huge amount of data that needs to be synced lives in RAM, no?

Also, normally the journal files are limited in size, and the default values should limit the size of one journal file to less than 1G.

> It is possible to identify exactly what is doing the writing, but I suspect
> it'll be a distraction. Regardless of how long it takes journald to sync a
> file when the system is busy, it should not be segfaulting.

Actually journald is not segfaulting: it's rather being terminated with SIGABRT by systemd, because the latter noticed that journald was stuck for too long (see the WatchdogSec= option in the systemd.service(5) man page for details).

@Siegbert, could you show the information asked for by Mel and Jara (see comment #18)?

(In reply to Franck Bui from comment #20)
> @Siegbert, could you show the information asked for by Mel and Jara (see
> comment #18)?

Sorry, I meant "Mel and Jan".

(In reply to Franck Bui from comment #19)
> I can't see how this could realistically happen: since the journal file is
> mmap'ed, that would mean that the huge amount of data that needs to be
> synced lives in RAM, no?

The dirty data is in RAM. To fsync it, the pages have to be cleaned by writing all the data to disk.

> Also, normally the journal files are limited in size, and the default values
> should limit the size of one journal file to less than 1G.

Ok.

> Actually journald is not segfaulting: it's rather being terminated with
> SIGABRT by systemd, because the latter noticed that journald was stuck for
> too long (see the WatchdogSec= option in the systemd.service(5) man page for
> details).

Then that may need to be re-examined. There are a number of reasons why an fsync can be delayed, such as read or write starvation. It is somewhat unexpected that journald would get terminated because a large number of random readers/writers was starving IO in general.

Created attachment 715475 [details]
Datasheet of Samsung SSD
The sda device is a Samsung SSD 850 PRO with 126GB capacity
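As a convenience, the scheduler check Jan suggested in comment #18 can be scripted. A hedged sketch assuming the standard sysfs layout (the function name is illustrative):

```shell
#!/bin/sh
# Hedged sketch: print the active IO scheduler (the bracketed entry in
# sysfs) for every block device; on Leap 42.2 an SSD is expected to show
# [deadline] rather than [cfq] or [noop].
show_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip if sysfs entry is absent
        dev=${f#/sys/block/}
        printf '%s: %s\n' "${dev%%/*}" "$(cat "$f")"
    done
}
show_schedulers
```

Output would look like `sda: noop [deadline] cfq` for each device.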
Hello, as described in the beginning, the crashdump occurs 1-2 times per day. Yesterday it came before I was able to read my email. As recommended, I started 'iostat -x 5 > /root/2017-02-24_iostat-log.txt &'. Please find the compressed log file attached. For better understanding: sda is the Samsung SSD system disk, sdb+sdc form a raid1 md127. The CPU is a Celeron G1820 @2.7GHz with 2 cores and 8GB main memory.

In addition I opened several terminal windows on a remote machine for monitoring. One of the monitoring windows was 'top', in order to see the processes at coredump time. To my surprise, disk I/O was very low, but one of the cores was completely busy with the process 'btrfs-transacti'. In file '2017-02-24_log.txt' you will find terminal window copies of this behaviour. The first 'top' was taken just before the coredump occurred in 'journalctl -ef'. The 2nd 'top' shows that 'btrfs-transacti' was still very busy about 15 minutes after the journald core dump. In file '2017-02-24_directories.txt' you will find the 'ls -alt' of /var/log/journal. This looks reasonable to me, no excessive file size.

One other thing I noticed: yesterday and today the crash occurred at 19:45 when cron started. Crontab contains only the standard jobs, no own extensions. I hope this helps. Best regards, Siegbert.

Created attachment 715481 [details]
Output of iostat -x 5
Created attachment 715482 [details]
Terminal output copies of 'top' and 'journalctl -ef' around crash time
The first 'top' shows the situation before the core dump, the 2nd after ~15 minutes. In between is the screen shot of the journalctl -ef output at/around the crash time.
Created attachment 715483 [details]
Corrected terminal outputs 'top' and 'journald-ef'
I'm sorry, but the title of the previous attachment 715482 should be 2017-02-24_directories.txt.
(In reply to Siegbert Sadowski from comment #24)
> Hello, as described in the beginning, the crashdump occurs 1-2 times per
> day. Yesterday it came before I was able to read my email. As recommended,
> I started 'iostat -x 5 > /root/2017-02-24_iostat-log.txt &'. Please find
> the compressed log file attached. For better understanding: sda is the
> Samsung SSD system disk, sdb+sdc form a raid1 md127. The CPU is a Celeron
> G1820 @2.7GHz with 2 cores and 8GB main memory.

Ok, thanks for that. Looking at it, there was some moderate activity for most of the log. After a bit over an hour, there is a major spike in read activity, followed by a large spike in write activity, with short bursts of reads/writes after that. Nothing unusual about that, but crucially the average wait times for IO completion are very low. There is no indication from this data that the disk is saturated in any way. The disk in question also has decent performance characteristics; it should be able to write 1G of data in under 2 seconds.

> In addition I opened several terminal windows on a remote machine for
> monitoring. One of the monitoring windows was 'top', in order to see the
> processes at coredump time. To my surprise, disk I/O was very low, but one
> of the cores was completely busy with the process 'btrfs-transacti'. In
> file '2017-02-24_log.txt' you will find terminal window copies of this
> behaviour. The first 'top' was taken just before the coredump occurred in
> 'journalctl -ef'. The 2nd 'top' shows that 'btrfs-transacti' was still very
> busy about 15 minutes after the journald core dump.

I think the issue here is that btrfs-transaction is completely locked up and stalls all IO while it figures out what to do. IIRC, that process going to 100% implies it's a fragmentation issue, but btrfs is far from my strong suit. David, I'm afraid you'll need to take a look.

The history is in the comments, but the short summary is that an fsync on a btrfs filesystem takes minutes to complete, and systemd responds by killing journald as it's "not responding". During the stall there is no indication from iostat that the SSD is saturated, but btrfs-transaction is pegged at 100% CPU usage. It's Leap 42.2 with kernel version 4.4.49-16-default (according to iostat). Is this something you're already familiar with? If not, any suggestions on how to debug and fix this?

Siegbert, I should have added that David will probably want to see the output of the mount command, to see what mount options btrfs is using.

Created attachment 715500 [details]
/etc/fstab
Hello,
Mel, did you mean the fstab?
(In reply to Siegbert Sadowski from comment #30)
> Created attachment 715500 [details]
> /etc/fstab
>
> Hello,
>
> Mel, did you mean the fstab?

It's helpful, but I really did mean the mount command, to see what default options are set for btrfs. I expect they are standard, but I've learned not to make assumptions.

Hello Mel, sorry for my misunderstanding. In the past I've used mount only for actively mounting a device. Here is the output of mount:

----
serv2:/home/siegbert # mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=4018380k,nr_inodes=1004595,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
/dev/sda3 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
/dev/sda3 on /var/log type btrfs (rw,relatime,ssd,space_cache,subvolid=275,subvol=/@/var/log)
/dev/sda3 on /var/cache type btrfs (rw,relatime,ssd,space_cache,subvolid=266,subvol=/@/var/cache)
/dev/sda3 on /var/lib/mysql type btrfs (rw,relatime,ssd,space_cache,subvolid=272,subvol=/@/var/lib/mysql)
/dev/sda2 on /home type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda3 on /usr/local type btrfs (rw,relatime,ssd,space_cache,subvolid=265,subvol=/@/usr/local)
/dev/sda3 on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=263,subvol=/@/srv)
/dev/sda3 on /var/crash type btrfs (rw,relatime,ssd,space_cache,subvolid=267,subvol=/@/var/crash)
/dev/sda3 on /var/lib/libvirt/images type btrfs (rw,relatime,ssd,space_cache,subvolid=268,subvol=/@/var/lib/libvirt/images)
/dev/sda3 on /.snapshots type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/@/.snapshots)
/dev/sda3 on /var/lib/machines type btrfs (rw,relatime,ssd,space_cache,subvolid=269,subvol=/@/var/lib/machines)
/dev/sda3 on /tmp type btrfs (rw,relatime,ssd,space_cache,subvolid=264,subvol=/@/tmp)
/dev/sda3 on /boot/grub2/x86_64-efi type btrfs (rw,relatime,ssd,space_cache,subvolid=261,subvol=/@/boot/grub2/x86_64-efi)
/dev/sda3 on /var/spool type btrfs (rw,relatime,ssd,space_cache,subvolid=277,subvol=/@/var/spool)
/dev/sda3 on /var/lib/mariadb type btrfs (rw,relatime,ssd,space_cache,subvolid=271,subvol=/@/var/lib/mariadb)
/dev/sda3 on /var/tmp type btrfs (rw,relatime,ssd,space_cache,subvolid=278,subvol=/@/var/tmp)
/dev/sda3 on /boot/grub2/i386-pc type btrfs (rw,relatime,ssd,space_cache,subvolid=260,subvol=/@/boot/grub2/i386-pc)
/dev/sda3 on /opt type btrfs (rw,relatime,ssd,space_cache,subvolid=262,subvol=/@/opt)
/dev/sda3 on /var/lib/pgsql type btrfs (rw,relatime,ssd,space_cache,subvolid=274,subvol=/@/var/lib/pgsql)
/dev/sda3 on /var/opt type btrfs (rw,relatime,ssd,space_cache,subvolid=276,subvol=/@/var/opt)
/dev/sda3 on /var/lib/named type btrfs (rw,relatime,ssd,space_cache,subvolid=273,subvol=/@/var/lib/named)
/dev/sda3 on /var/lib/mailman type btrfs (rw,relatime,ssd,space_cache,subvolid=270,subvol=/@/var/lib/mailman)
/dev/md127 on /local type ext3 (rw,relatime,data=ordered)
proc on /var/lib/named/proc type proc (ro,nosuid,nodev,noexec,relatime)
proc on /var/lib/dhcp/proc type proc (ro,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=805184k,mode=700)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=805184k,mode=700,uid=1000,gid=100)
serv2:/home/siegbert #
-----

BTW: like in the last days, the core dump occurred today shortly after the 19:45 cron started. I didn't make any changes to the standard Leap 42.2 cron. Siegbert.

@David, I'm reassigning this bug to you as some clues suggest an issue in btrfs.

Hello, since my BTRFS problem has now been open for more than 2 months, it looks like this is a more serious problem with no quick solution. Therefore I have decided to change my system from BTRFS to the stable EXT4 next weekend (01.04.2017). Until then, if you need more logs for debugging, please let me know. The journald coredump crashes still occur reproducibly with the daily cron. Maybe BTRFS needs some more development time to become mature enough for real usage. Siegbert Sadowski

The "issue" is that journald has a pathological workload for btrfs.
Tiny writes plus fsync plus a relatively constant stream means a few things happen: we're constantly CoWing for each tiny write due to the fsync, we're getting very fragmented, and we're getting slowed down in the transaction path. The fix we've deployed elsewhere for this exact problem is to turn off data CoW on the journal files. This is a well known interaction; the only issue is that it wasn't identified as such sooner.

systemd has been supposed to set the FS_NOCOW file flag on its journal files for some time now. At least that is true for v228, the version this bug was reported against (Leap 42.2). Siegbert, did you disable that somehow? Could you check that the flag is set on the journal files?

I didn't make any changes to system configuration files. I'm sorry, but I'm not that deep into the internals of either systemd or journald. Can you help me find this FS_NOCOW flag? At least in the /etc/systemd configuration folder I couldn't find any hint. Thanks, Siegbert

I had a similar issue shortly after the kernel update to 4.4.49-16.1. In the end my FS (btrfs) became unmountable. I failed to fix it with the btrfs utils and decided to reinstall the system on ext4. Surprisingly, random crashes and freezes continued, this time without FS corruption though. Then I rolled back the kernel (to 4.4.27-2.1 from the installer) and since then there has not been a single glitch. PS: Before the kernel rollback I checked RAM and HD; everything was OK.

(In reply to Siegbert Sadowski from comment #38)
> Can you help me to find this FS_NOCOW flag?

$ lsattr -R /var/log/journal

See the lsattr(1) man page.

Created attachment 719297 [details]
Another system lockup, journal excerpt
Update: it just happened again. The system became unresponsive during a photorec session. After a reset, that's what I've found in the journal.
Created attachment 719368 [details]
Output of lsattr for journal files
Thanks for your replies.
The 'C' attribute is set for all journal files.
Siegbert.
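For anyone landing here later, the attribute check above can be scripted. A hedged sketch (the helper name is illustrative; on btrfs, chattr +C only takes effect on empty files, so it must be set before the file is first written, which journald is supposed to do itself for new journal files):

```shell
#!/bin/sh
# Hedged sketch: No_COW shows up as 'C' in the flags column of lsattr.
check_nocow() {
    lsattr -d "$1" 2>/dev/null | awk '{print $1}' | grep -q C
}
# On the affected system one would run, e.g.:
#   lsattr /var/log/journal/*/*.journal          # look for 'C' in the flags
#   touch new.journal && chattr +C new.journal   # btrfs only, file must be empty
check_nocow /var/log/journal 2>/dev/null || echo "No_COW not set (or not btrfs)"
```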
(In reply to Siegbert Sadowski from comment #42)
> The 'C' attribute is set for all journal files.

Then it seems that the kernel guys have some homework ;)

Hello all, it looks like the recent SuSE updates (week of 10.Apr.2017) solved the problem. I have monitored the system for 1 week now and no new crashes occurred. Thanks a lot to all of you for your help. Siegbert.

This is automated batch bugzilla cleanup. openSUSE 42.2 has changed to end-of-life (EOL [1]) status. As such it is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of openSUSE, or you can still observe it under openSUSE Leap 15.0, please feel free to reopen this bug against that version (see the "Version" field in the bug), or alternatively open a new ticket. Thank you for reporting this bug, and we are sorry it could not be fixed during the lifetime of the release.

[1] https://en.opensuse.org/Lifetime
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Build Identifier:

Hello community,

I installed Leap 42.2 fresh on a new SSD. After configuration (with YaST) the system hangs 1-2 times per day. The journal log file always shows the same pattern:

Jan 19 22:02:35 serv2 systemd-journald[15033]: System journal (/var/log/journal/) is currently using 784.0M. Maximum allowed usage is set to 4.0G. Leaving at least 4.0G free (of currently available 44.7G of space). Enforced usage limit is thus 4.0G, of which 3.2G are still available.
Jan 19 22:02:35 serv2 systemd-journald[15033]: Journal started
Jan 19 22:00:01 serv2 cron[15019]: pam_unix(crond:session): session opened for user root by (uid=0)
Jan 19 22:00:01 serv2 systemd[1]: Started Session 17 of user root.
Jan 19 22:02:35 serv2 CRON[15019]: pam_unix(crond:session): session closed for user root
Jan 19 22:02:17 serv2 systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Jan 19 22:02:35 serv2 systemd[1]: Starting Flush Journal to Persistent Storage...
Jan 19 22:02:35 serv2 systemd[1]: Started Flush Journal to Persistent Storage.
Jan 19 22:02:35 serv2 systemd-coredump[15025]: Detected coredump of the journal daemon itself, diverted to /var/lib/systemd/coredump/core.systemd-journal.0.0c14f

Then, when I try to reboot, it takes 30-45 minutes(!) for the system to shut down. There are no other error messages in the log. Can you help me? Many thanks in advance.

Reproducible: Couldn't Reproduce

Steps to Reproduce:
1. Monitored the system with journalctl -ef
2. After a couple of hours the above described log entry occurs
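The "Watchdog timeout (limit 3min)!" line in the log above is what ultimately gets journald SIGABRT'ed (see the WatchdogSec= discussion in comment #19). A hedged workaround sketch, not a fix for the underlying btrfs stall: widen journald's watchdog window via a systemd drop-in. The drop-in path follows the usual systemd convention, the function name is illustrative, and 10min is an arbitrary example value:

```shell
#!/bin/sh
# Hedged sketch: write a drop-in so a slow fsync() no longer trips the
# journald watchdog. WatchdogSec= is documented in systemd.service(5).
write_watchdog_dropin() {
    dir=${1:-/etc/systemd/system/systemd-journald.service.d}
    mkdir -p "$dir"
    printf '[Service]\nWatchdogSec=10min\n' > "$dir/watchdog.conf"
}
# As root, then apply it:
#   write_watchdog_dropin
#   systemctl daemon-reload && systemctl restart systemd-journald
```

This only hides the symptom; the minutes-long fsync() itself would still need the NOCOW/btrfs fix discussed above.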