Bugzilla – Full Text Bug Listing
| Summary: | no space left on device when upgrading | ||
|---|---|---|---|
| Product: | [openSUSE] openSUSE Distribution | Reporter: | Forgotten User SpTvqxsYZX <forgotten_SpTvqxsYZX> |
| Component: | Basesystem | Assignee: | David Sterba <dsterba> |
| Status: | RESOLVED WONTFIX | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Major | ||
| Priority: | P2 - High | CC: | aschnell, bwiedemann, chcao, dsterba, forgotten_eE2c9JI9lH, jeffm, kolAflash, ma, mge, sebix+novell.com |
| Version: | 13.2 | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Other | ||
| Whiteboard: | |||
| Found By: | --- | Services Priority: | |
| Business Priority: | Blocker: | --- | |
| Marketing QA Status: | --- | IT Deployment: | --- |
| Attachments: | Screenshot from openSUSE 13.2 installation with the switch to disable snapshotting | ||
|
Description
Forgotten User SpTvqxsYZX
2015-05-19 23:44:31 UTC
likely btrfs+snapper-related

try

    btrfs filesystem df /

and if you want the cure-all:

    for s in /.snapshots/*/snapshot ; do btrfs subvolume delete $s ; done

*** Bug 931725 has been marked as a duplicate of this bug. ***

    $ btrfs filesystem df /
    Data, single: total=20.01GiB, used=8.48GiB
    System, single: total=4.00MiB, used=16.00KiB
    Metadata, single: total=1.01GiB, used=320.94MiB
    GlobalReserve, single: total=112.00MiB, used=0.00B

The df information seems wrong as well. It says the total for data single is 20GB, when the root partition is 30GB. That was after I removed all but two of the snapper snapshots.

Setting P0 is not allowed for users.

Only after running

    btrfs fi balance /mountpoint -dusage=50

after the snapshots command did some of the missing data appear to come back.

Btrfs deletes snapshots asynchronously, so after the delete command (both btrfs and snapper) you still have to wait to get the disk space back (or use btrfs commands that wait for the cleanup to finish). That might explain why the disk space appeared after the balance command. If a balance command really is needed, it looks like a kernel problem to me.

I had the same problem on a production server with openSUSE 13.2. I solved it by deleting several snapshots, but the server was down until I found the problem. I suggest reducing the number of old snapshots stored by default. 10 snapshots consume too much space. Maybe 2 or 3 would be enough.

(In reply to Juan Antonio Solis from comment #9)
> I had the same problem on a production server with openSUSE 13.2. I solved
> it by deleting several snapshots, but the server was down until I found the
> problem. I suggest reducing the number of old snapshots stored by default.
> 10 snapshots consume too much space. Maybe 2 or 3 would be enough.

In my experience, keeping 20-30 snapshots is no problem, if your disk size follows our recommendation (at least 16 GiB for "/"), and if you are doing regular semi-cumulative updates (i.e. weekly).
I agree, though, that for Factory or for smaller "/" partitions the world looks different. In addition, I strongly suggest installing the "btrfsmaintenance" package and tweaking it to your needs and environment: that way you get balancing and scrubbing regularly, and you can really enjoy(!:-) using btrfs, ...

MgE

> In my experience, keeping 20-30 snapshots is no problem, if your disk size follows our recommendation (at least 16 GiB for "/"),

/ has over 30GB and my system was basically hosed from only 20 snapshots in a couple of weeks.

> and you can really enjoy(!:-) using btrfs,

It is hard to enjoy btrfs given how openSUSE implements it, with system-breaking bugs like this one from a new install. If btrfsmaintenance will get rid of snapshots before they cause the system to stop working from lack of free space, then why wouldn't the openSUSE people put that in by default?

First of all, thank you for your answer and suggestions. My root partition has 40 GB and btrfsmaintenance was installed, although I had not made any changes in any configuration tool. I will try to study more about btrfs. It seems it is useful for a server which is maintained and updated in an unattended way through cron. But I think this is a very serious bug. If someone does not tweak it and do "regular semi-cumulative updates" (I still have to find out how to do that) on their desktop or laptop computer, will it run out of space in a few months? I have installed openSUSE 13.2 on some computers of friends and colleagues of mine, and now I am afraid they will suffer this problem soon. On my own personal computer I did an upgrade from 13.1 and my root partition is still ext4, so I cannot check how btrfs works on a laptop computer with KDE.

Let's keep openSUSE as a Linux for open minds, not only for geeks ;-) Thank you for your effort!
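A note on the `btrfs filesystem df /` output quoted earlier in this thread: the `total=` figure is the space btrfs has allocated to chunks of that type, not the size of the partition, which is why it can read 20 GiB on a 30 GB root. The following is a minimal Python sketch, not part of any btrfs tool, that parses the sample posted above and reports how much allocated data space is unused. The function names `parse_btrfs_df` and `slack_gib` are made up for this illustration.

```python
import re

# Sample copied from the `btrfs filesystem df /` output quoted in this thread.
SAMPLE = """\
Data, single: total=20.01GiB, used=8.48GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=1.01GiB, used=320.94MiB
GlobalReserve, single: total=112.00MiB, used=0.00B
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# One line looks like: "<kind>, <profile>: total=<n><unit>, used=<n><unit>"
LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>(?:[KMGT]i)?B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>(?:[KMGT]i)?B)$"
)


def parse_btrfs_df(text):
    """Return {kind: (allocated_bytes, used_bytes)}; hypothetical helper."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a usage line
        total = float(m["total"]) * UNITS[m["tunit"]]
        used = float(m["used"]) * UNITS[m["uunit"]]
        result[m["kind"]] = (int(total), int(used))
    return result


def slack_gib(parsed, kind="Data"):
    """Allocated-but-unused space of one chunk type, in GiB."""
    total, used = parsed[kind]
    return (total - used) / UNITS["GiB"]


if __name__ == "__main__":
    parsed = parse_btrfs_df(SAMPLE)
    print(f"Data chunks allocated but unused: {slack_gib(parsed):.2f} GiB")
```

For the sample above this reports roughly 11.5 GiB of data chunks that are allocated but not filled. A balance with a usage filter, such as the `-dusage=50` run mentioned in the thread, rewrites mostly empty chunks so that this space returns to the unallocated pool.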
While thinking about this and checking on my system (which is SUSE Linux Enterprise 12, but the difference should be minimal), I realized that there might be one additional issue on typical one-user machines (compared to servers): the cleanup process (/etc/cron.daily/suse.de-snapper) is not started often enough, thus the limits defined in /etc/snapper/configs/root are not enforced early enough.

@Arvin: shouldn't we trigger the snapshot cleanup also on an hourly basis (instead of daily)?

One more question: should we reconsider the default limits for "/"? They should be rather low, and people may raise them if there is a real need. For example, I have this:

    # grep -Ev "^#|^$" /etc/snapper/configs/root
    SUBVOLUME="/"
    FSTYPE="btrfs"
    ALLOW_USERS=""
    ALLOW_GROUPS=""
    SYNC_ACL="no"
    BACKGROUND_COMPARISON="yes"
    NUMBER_CLEANUP="yes"
    NUMBER_MIN_AGE="1800"
    NUMBER_LIMIT="8"
    NUMBER_LIMIT_IMPORTANT="8"
    TIMELINE_CREATE="no"
    TIMELINE_CLEANUP="yes"
    TIMELINE_MIN_AGE="1800"
    TIMELINE_LIMIT_HOURLY="0"
    TIMELINE_LIMIT_DAILY="0"
    TIMELINE_LIMIT_MONTHLY="0"
    TIMELINE_LIMIT_YEARLY="0"
    EMPTY_PRE_POST_CLEANUP="yes"
    EMPTY_PRE_POST_MIN_AGE="1800"

Instead of downsizing snapper to a state where we might as well not install it any longer, we should fix our base tools:

1. 'df' does not work for btrfs.
2. The output of 'btrfs fi df' is incomprehensible.
3. 'zypper' does not take into account that on btrfs with snapshots, overwriting files does not give back the disk space of the old file. So it should not report "additional 7.9 KiB will be used" when likely several hundred MiB will be used. (AFAIK this has been fixed by now.)

What if snapper cleanup is called after every install/upgrade action, and then if there is less than, say 5GB left on the device, it will start removing old snapshots and then doing the balance/dusage command to free up the space? And make whether or not snapper is installed on the system an option at installation? Along with fixing btrfs fi show and df.
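To make the `NUMBER_*` settings in the configuration quoted above more concrete, here is a simplified Python sketch of a number-based cleanup rule: keep the newest `NUMBER_LIMIT` snapshots and never delete one younger than `NUMBER_MIN_AGE` seconds. This is an illustration under assumptions, not snapper's actual code. It ignores `NUMBER_LIMIT_IMPORTANT` and pre/post pairs, and `parse_config` and `number_cleanup` are hypothetical names.

```python
import shlex

# Number-related subset of the /etc/snapper/configs/root example in this thread.
CONFIG_TEXT = '''
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="8"
'''


def parse_config(text):
    """Parse simple KEY="value" lines into a dict (hypothetical helper)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips the shell-style quotes
        config[key] = parts[0] if parts else ""
    return config


def number_cleanup(snapshots, config, now):
    """Return the snapshot numbers a simplified number cleanup would delete.

    snapshots: list of (number, created_epoch_seconds) tuples.
    Assumes NUMBER_LIMIT is a plain integer, as in the quoted config.
    """
    if config.get("NUMBER_CLEANUP") != "yes":
        return []
    limit = int(config["NUMBER_LIMIT"])
    min_age = int(config["NUMBER_MIN_AGE"])
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    candidates = newest_first[limit:]  # everything beyond the newest `limit`
    # Snapshots younger than min_age are protected even if over the limit.
    return sorted(n for n, created in candidates if now - created >= min_age)


if __name__ == "__main__":
    config = parse_config(CONFIG_TEXT)
    hourly = [(i, 1000 + i * 3600) for i in range(1, 11)]  # ten hourly snapshots
    print(number_cleanup(hourly, config, now=1000 + 11 * 3600))
```

The sketch also shows why the frequency of the cleanup job matters: the rule only takes effect when something runs it, so between two daily runs the number of snapshots can grow well past the limit.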
(In reply to ill lume from comment #16)
> What if snapper cleanup is called after every install/upgrade action,

As a default, this seems a bit too heavy to me, but as an option it sounds useful, indeed. We actually have the framework already, cf.: /usr/lib/zypp/plugins/commit/btrfs-defrag-plugin.py (package "btrfsmaintenance").

> and then if there is less than, say 5GB left on the device, it will start
> removing old snapshots and then doing the balance/dusage command to free up
> the space?

Free-space-based decisions are being worked on; this needs btrfs subvolume quotas and snapper working together.

> And make whether or not snapper is installed on the system an
> option at installation?

This is already there in SUSE Linux Enterprise 12 and also openSUSE 13.2 (screenshot attached).

Created attachment 636673
Screenshot from openSUSE 13.2 installation with the switch to disable snapshotting
Since my server failure I have been discovering all the benefits Snapper has, and it is an amazing tool. I admit that I should have studied it before installing it, but like almost everybody in the IT world, I have much work and not much time, and I had not done it yet.

When I did the fresh 13.2 installation and saw that the suggested file system was btrfs, I thought that if openSUSE was suggesting it, it was because it had been thoroughly tested (I would not have thought the same with Ubuntu, seriously) and was safe to install, so I accepted the default configuration offered. If I did it this way, probably most standard users who install openSUSE on their desktops would do the same and accept the openSUSE installer's proposals, even without knowing what they mean, expecting that the default configuration will work fine.

I installed openSUSE for friends of mine who do not want to know anything about snapshots or anything beyond how to browse the Internet, use office software, manage photos, etc. So the system should be able to manage the space automatically and adapt the number of snapshots stored to the actual space remaining.

As Matthias said, maybe a low number of stored snapshots could be enough for desktop computers. And IT people will be able to configure their servers if they need more (IT people who study more than I do, I am afraid).

Or maybe even an openSUSE server version and a desktop one? It is the only thing I liked about Ubuntu (I do not know if they still have it).

P.S.: Sorry for writing so much, but maybe it is worthwhile for you to know about some users' experience.

Changed the component, as it's no direct kernel bug.

Hi David, I'm not quite sure whether it is right to assign it to you, please feel free to reassign, thank you!

I guess the cleanest solution for this would be a possibility to set a maximum disk-usage. Think of setting a limit of xxx MB or yyy % of disk-space for snapshots.
I guess the technical problem for this would be how to check this limit dynamically. E.g. someone deletes some large, no longer needed files to free enough space for storing other large files at the same time. So at that moment it has to be realised that old snapshots have to be deleted.

I think this is pretty much the behaviour needed for a usual end user (who maybe doesn't know anything about snapshots). So such a limit should be set by default (e.g. 10% of disk-space). An experienced user who's counting on snapshots can always disable this limit and use the already existing ways to limit snapshots and keep an eye on the free space himself.

Btw. Maybe related/duplicate?
https://bugzilla.opensuse.org/show_bug.cgi?id=930893

(In reply to kolA flash from comment #22)
> I guess the cleanest solution for this would be a possibility to set a
> maximum disk-usage.
> Think of setting a limit of xxx MB or yyy % of disk-space for snapshots.

Yes, indeed. See my comment #17 above: this is called "subvolume quota", and it is being worked on.

> I guess the technical problem for this would be how to check this limit
> dynamically.

Well, snapper could just do it, when creating or removing snapshots, ...

(In reply to Matthias Eckermann from comment #23)
> (In reply to kolA flash from comment #22)
> > [...]
> > I guess the technical problem for this would be how to check this limit
> > dynamically.
>
> Well, snapper could just do it, when creating or removing snapshots, ...

But what happens if the user's deleting files? Won't the snapshots implicitly grow at that moment, without snapper being currently active? I guess that's a problem. And if the user's deleting some GB files to free necessary space for other GB files he copies onto disk in the next moment, it won't be enough to check the snapshots hourly.

I don't have a lot of experience with quotas, but maybe they are a possible solution for this...
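The "limit snapshots to a share of the disk" idea discussed above can be sketched as a small decision function. This is only an illustration of the proposal, not an existing snapper feature: the per-snapshot sizes are assumed to come from somewhere like btrfs quota groups, and `snapshots_to_delete` and its parameters are hypothetical names. It also assumes that deleting a snapshot frees exactly its exclusive size, which is a simplification because data shared between neighbouring snapshots changes ownership when one of them goes away.

```python
GIB = 1024**3


def snapshots_to_delete(snapshots, disk_bytes, limit_percent=10):
    """Pick the oldest snapshots to drop until the budget is met.

    snapshots: list of (number, exclusive_bytes), ordered oldest first.
    disk_bytes: size of the whole file system.
    limit_percent: share of the disk that snapshots may occupy (assumed default).
    """
    budget = disk_bytes * limit_percent // 100
    total = sum(size for _, size in snapshots)
    doomed = []
    for number, size in snapshots:
        if total <= budget:
            break  # already within the budget, keep the rest
        doomed.append(number)
        total -= size
    return doomed


if __name__ == "__main__":
    # Hypothetical numbers: a 30 GiB root with 5 GiB held by four snapshots.
    snaps = [(1, 2 * GIB), (2, 1 * GIB), (3, 1 * GIB), (4, 1 * GIB)]
    print(snapshots_to_delete(snaps, disk_bytes=30 * GIB))
```

The open question raised in the thread remains: the function has to be triggered by something, and the space held by snapshots grows whenever snapshotted files are deleted or overwritten, not only when snapper itself runs.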
(In reply to kolA flash from comment #24)
> (In reply to Matthias Eckermann from comment #23)
> > (In reply to kolA flash from comment #22)
> > > [...]
> > > I guess the technical problem for this would be how to check this limit
> > > dynamically.
> >
> > Well, snapper could just do it, when creating or removing snapshots, ...
>
> But what happens if the user's deleting files? Won't the snapshots
> implicitly grow at that moment, without snapper being currently active?

Well, I would put it this way: when deleting a file which is CoWed (Copy-on-Write-ed), the total space required in the "pool" (this is the whole btrfs partition) will not be reduced, because references to the blocks are still held. Important: this is only relevant for files which are CoWed (i.e. either snapshotted or copied as a RefLink). For "normal" files which are just single, a delete is a delete.

> I guess that's a problem. And if the user's deleting some GB files to free
> necessary space for other GB files he copies onto disk in the next moment,
> it won't be enough to check the snapshots hourly.

As said, files which are not CoWed are not affected. However, if you are discussing a CoWed file, then the space indeed is not freed when you delete the file itself, but only once you call "snapper rm ..." -- and then we are back in good shape, because snapper can take control.

Admittedly that's kind of "surprising" behaviour for people who come from "normal" unix filesystems. But for people who have worked with CoW filesystems before, it's kind of "business as usual" :)

That "surprising" factor is btw. one of the reasons why for SUSE Linux Enterprise 12 we have chosen to use xfs as the default for DATA and btrfs for the OS: administrators who know the specifics of CoW filesystems can do a lot with it (snapshotting of data, deduplication, ...), and are supported; those who don't know them yet have a more relaxed life with xfs, but they can't enjoy the benefits of btrfs either.
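The copy-on-write behaviour described above can be illustrated with a toy reference-counting model in Python. It is a deliberately simplified sketch, not how btrfs is implemented: each file is a single extent, and the class `ToyCowPool` and its methods are invented for this example. An extent stays allocated as long as the live file system or any snapshot still refers to it.

```python
class ToyCowPool:
    """Toy model: extents use space; files and snapshots only hold references."""

    def __init__(self):
        self.extent_sizes = {}  # extent id -> size in bytes
        self.live_files = {}    # file name -> extent id
        self.snapshots = {}     # snapshot name -> {file name: extent id}
        self._next_id = 0

    def write_file(self, name, size):
        # New or overwritten data always lands in a fresh extent.
        self._next_id += 1
        self.extent_sizes[self._next_id] = size
        self.live_files[name] = self._next_id
        self._collect()

    def delete_file(self, name):
        del self.live_files[name]
        self._collect()

    def snapshot(self, name):
        # A snapshot copies references only, never data.
        self.snapshots[name] = dict(self.live_files)

    def delete_snapshot(self, name):
        del self.snapshots[name]
        self._collect()

    def _collect(self):
        # Free every extent that nothing refers to any more.
        referenced = set(self.live_files.values())
        for files in self.snapshots.values():
            referenced.update(files.values())
        for extent in list(self.extent_sizes):
            if extent not in referenced:
                del self.extent_sizes[extent]

    def used(self):
        return sum(self.extent_sizes.values())


if __name__ == "__main__":
    pool = ToyCowPool()
    pool.write_file("big.iso", 1000)
    pool.snapshot("pre-update")
    pool.delete_file("big.iso")
    print("after deleting the file:", pool.used())
    pool.delete_snapshot("pre-update")
    print("after deleting the snapshot:", pool.used())
```

In this model, deleting a snapshotted file frees nothing until the snapshot itself is removed, while a file that was never snapshotted is freed immediately. Overwriting a snapshotted file, as a package upgrade does, keeps both the old and the new extent allocated, which matches the zypper accounting complaint made earlier in the thread.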
I have personally been using btrfs for "/" for more than 5 years now, and for "/home" for about 4 (including snapshots for /home using pam_snapper).

> I don't have a lot of experience with quotas, but maybe they are a possible
> solution for this...

Well, proper quota monitoring will definitely help, going forward.

*** Bug 967147 has been marked as a duplicate of this bug. ***

This version of openSUSE changed to end-of-life (EOL [1]) status. As such it is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of openSUSE, or consider the bug still valid, please feel free to reopen this bug against that version, or open a new ticket.

Thank you for reporting this bug and we are sorry it could not be fixed during the lifetime of the release.

[1] https://en.opensuse.org/Lifetime

So just ignore the bug without fixing it and hope it goes away?

As the comment told you, "please feel free to reopen this bug against that version, or open a new ticket." Do not keep it open against an EOLed product and just carry on. The closing is done automatically, so it closed all bugs opened against products we no longer support. Just change the version to wherever you can reproduce it.