Bugzilla – Full Text Bug Listing

| Summary: | bash takes too long on "command not found" | | |
|---|---|---|---|
| Product: | [openSUSE] openSUSE 11.1 | Reporter: | Stephane Delcroix <sdelcroix> |
| Component: | Other | Assignee: | Pavol Rusnak <prusnak> |
| Status: | RESOLVED FIXED | QA Contact: | E-mail List <qa-bugs> |
| Severity: | Normal | | |
| Priority: | P3 - Medium | CC: | aj, dmacvicar, froh, kkaempf, ma, matz, mls, werner |
| Version: | Beta 2 | | |
| Target Milestone: | --- | | |
| Hardware: | Other | | |
| OS: | Other | | |
| Whiteboard: | | | |
| Found By: | --- | Services Priority: | |
| Business Priority: | | Blocker: | --- |
| Marketing QA Status: | --- | IT Deployment: | --- |
Description
Stephane Delcroix
2008-10-14 11:01:42 UTC
IMHO this was introduced by the support for scout. Pavol? Such reports may cause me to skip this support.

Indeed, it is scout, particularly the python-satsolver module, which is out of my scope. But Klaus Kaempf made some significant improvements during the last weeks. Stephane: please add the repository http://download.opensuse.org/repositories/home:/prusnak:/scout/openSUSE_Factory/ and issue "zypper install scout command-not-found python-satsolver". This should install newer versions of the packages (they will be available in beta 3). According to my measurements these are approximately 4 times faster than the ones you are testing. Could you please test and report back? Werner: no need for skipping the support. In case of more serious trouble we might exclude the command-not-found package from the distribution so that no handler gets called by default, but users wanting this feature could install it later.

It's indeed ~3 times faster, topping out at 700 ms... it's way better, but still far from being acceptable (the acceptance threshold should be at ~50 ms).

50 ms is IMO not possible. Adding Duncan and Klaus, as they are responsible for satsolver/python-satsolver and the search spends nearly all of its time in them (according to the profiler). Maybe they could provide more info on whether this time is achievable.

Hmmm... instead of using Python, what about a hash table used with C code? Compare with spell checkers like aspell or the original ispell for speed. With this, the hash table is loadable once, e.g. by a user-based daemon started by bash only once on each system, providing a socket to scout for the communication.

Python is not used for lookups. Scout just calls repo.search( self.pathre % term, satsolver.SEARCH_REGEX | satsolver.SEARCH_FILES, None, 'solvable:filelist' ) from python-satsolver, which is a wrapped C function from the satsolver library. Most (95-99%) of the time is spent in this C code. The search time heavily depends on the number and the size of the enabled repositories.
Scout is a Python application using the satsolver-python bindings. The search is a direct call to satsolver's 'dataiterator' interface. All internal satsolver data is hashed. The most time-consuming function is the evaluation of the regular expression.

Hmm... regular expressions and a UTF-8 locale could cause slowdowns, as wide-character strings are then used. Just wait till the library fetches the complete file list from the repositories some day in the future, then this is going to be much slower... If the regexp implementation really is to blame, don't specify a match string but do custom matching instead.

We could still use a statically built, non-updating SQLite database as in pre-zypp times (10.3 and older). This approach could meet the desired times in tens of ms. But we would miss the files in packages from user-added repos (as the .db files contain only distro packages), which is the biggest advantage of using .solv files.

First you should find out if it's really the regexp library that's to blame.

Um, you search in the filelists? That will become extremely slow in the future. An SQLite DB is overkill, I think. What's the nature of the lookup? What does the regexp look like? Aren't you just looking for commands, instead of arbitrary files?

I suggest everyone try this feature before commenting. We are looking for commands and their full path. We are also able to find commands in user-added repositories. The regexp for filelists is
^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$
which matches files in
{ /bin, /sbin, /usr/bin, /usr/sbin, /usr/games, /opt/kde3/bin, /opt/kde3/sbin, /opt/gnome/bin, /opt/gnome/sbin }
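The pattern above can be sanity-checked with a short, self-contained Python snippet (the `%s` placeholder is filled with the command name; `vim` here is just an illustrative example, not taken from the report):

```python
import re

# Path pattern used for the command-not-found lookup; %s is the command name.
pathre = r'^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$'

pattern = re.compile(pathre % re.escape('vim'))

# Paths in any of the listed binary directories match.
assert pattern.match('/usr/bin/vim')
assert pattern.match('/opt/kde3/sbin/vim')

# Wrong directory or wrong basename does not match.
assert not pattern.match('/usr/local/bin/vim')
assert not pattern.match('/usr/bin/vimdiff')
```

The anchors `^` and `$` are what restrict the match to exact full paths, so `vimdiff` is not reported when the user typed `vim`.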
Normally the command names in these paths are plain ASCII. What difference in execution time is seen if the locale is set hard to POSIX within scout?

If the problem is in parsing a UTF-8 regexp, this should be fixed on the Python side. I expect that almost all binaries available in openSUSE packages contain ASCII characters only, so you could try to convert the unicode search term to ASCII. If python-satsolver handles its arguments properly (detecting unicode vs. plain strings), this should help.
diff --git a/scout/bin.py b/scout/bin.py
index 642c104..ded881f 100644
--- a/scout/bin.py
+++ b/scout/bin.py
@@ -17,7 +17,7 @@ class SolvParser(object):
etcpath = '/etc/zypp/repos.d'
solvfile = '/var/cache/zypp/solv/%s/solv'
# path regular expression for { /bin, /sbin, /usr/bin, /usr/sbin, /usr/games, /opt/kde3/bin, /opt/kde3/sbin, /opt/gnome/bin, /opt/gnome/sbin }
- pathre = '^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$'
+ pathre = '^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$'.encode("ascii")
def __init__(self):
self.pool = satsolver.Pool()
@@ -35,6 +35,10 @@ class SolvParser(object):
def search(self, term):
pkgmatch = []
+ try:
+ term = term.encode("ascii")
+ except UnicodeEncodeError:
+ pass
for repo in self.pool.repos():
for d in repo.search( self.pathre % term, satsolver.SEARCH_REGEX | satsolver.SEARCH_FILES, None, 'solvable:filelist' ):
path = d.value()
Werner: the # -*- coding: utf-8 -*- in __init__.py switches[1] all string literals in Python to unicode. [1] http://www.python.org/dev/peps/pep-0263/

Thanks.

The problem with using sat-solver for matching the filelist is that it is targeted at queries resulting from interactive use (a user searching for something via a GUI). It wasn't designed for this kind of automatic query; searching in filelists in particular isn't especially fast (it needs to construct each entry from a directory list and a basename just to run the regexp on it). We could optimize this a bit further as a special query type (where directory and basename are given separately), but I wouldn't hold my breath. For the time being I would either make this feature optional (and I think it's basically a nice feature), or implement a cache just for the programs so that they could be looked up faster.

Michal: I really do not see any difference when using your patch.

(In reply to comment #19 from Michael Matz) > basically a nice feature), or implement a cache just for the programs that > could be looked up faster. Users do not usually type the same missing command into the console over and over. They install the package providing it as soon as they see the instructions. I don't know if caching is going to help there, because of the nature of the usage.

Regarding #19: come on, you can do better. As the basename is known, you can do this: repo.search( basename, satsolver.SEARCH_STRING, None, 'solvable:filelist' ) Note the missing SEARCH_FILES, which means that we're just matching basenames. If there's a hit, the iterator value will be the complete filename and you can do your regexp match. This should also speed things up a lot, as the costly regexp match is only done when the basename matches.

Oh right. Heh, funny, considering I implemented the iterator :-)

Argh, we have to make a tiny change in the code for this.

mls, ma: Klaus mentioned today's commits to satsolver SVN.
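The basename-first strategy proposed in comment #22 can be illustrated without satsolver at all. In the sketch below, `filelist` is a plain Python list standing in for the repository file list (an assumption for the example); in the real code the cheap first pass would be the `repo.search(term, satsolver.SEARCH_STRING, ...)` query and only the surviving hits would see the regexp:

```python
import re

# Hypothetical stand-in for a repository file list (in scout this data
# comes from satsolver's 'solvable:filelist' attribute).
filelist = [
    '/usr/bin/vim',
    '/usr/share/vim/vimrc',
    '/opt/kde3/bin/vim',
    '/usr/bin/emacs',
]

pathre = r'^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$'

def search(term):
    """Two-stage lookup: cheap exact basename comparison first,
    expensive regexp match only on the few candidates that survive."""
    pattern = re.compile(pathre % re.escape(term))
    hits = []
    for path in filelist:
        # Stage 1: exact basename comparison (SEARCH_STRING in satsolver terms).
        if path.rsplit('/', 1)[-1] != term:
            continue
        # Stage 2: confirm the directory with the full regexp.
        if pattern.match(path):
            hits.append(path)
    return hits
```

The point of the design is that the costly regexp runs only on basename hits, which for a typical repository is a handful of paths rather than the whole filelist.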
I'm digging through the logs, but could you give me a pointer on how to utilize the optimizations (if there were any)?

Do it as proposed in comment #22.

mls: I converted the SOLV files to the new format, built satsolver from SVN, did the modifications mentioned in comment #22, and the improvements are minimal :( (e.g. 850 ms vs. 750 ms)

That's surprising. Where's all the time spent?

Here is the hotshot Python profiler output:
ncalls tottime percall cumtime percall filename:lineno(function)
6 0.552 0.092 0.552 0.092 satsolver.py:93(add_solv)
4 0.066 0.017 0.066 0.017 satsolver.py:972(step)
1 0.020 0.020 0.020 0.020 satsolver.py:946(__init__)
1 0.003 0.003 0.647 0.647 bin.py:82(main)
7 0.002 0.000 0.002 0.000 ConfigParser.py:434(_read)
12/3 0.001 0.000 0.002 0.001 sre_parse.py:385(_parse)
1 0.001 0.001 0.555 0.555 bin.py:25(__init__)
15/3 0.001 0.000 0.001 0.000 sre_compile.py:38(_compile)
116 0.000 0.000 0.001 0.000 sre_parse.py:207(get)
128 0.000 0.000 0.000 0.000 sre_parse.py:188(__next)
17/5 0.000 0.000 0.000 0.000 sre_parse.py:146(getwidth)
1 0.000 0.000 0.651 0.651 scout-cmd.py:10(runscout)
7 0.000 0.000 0.002 0.000 ConfigParser.py:266(read)
108 0.000 0.000 0.000 0.000 sre_parse.py:136(__getitem__)
1 0.000 0.000 0.089 0.089 bin.py:39(search)
109 0.000 0.000 0.000 0.000 sre_parse.py:144(append)
6 0.000 0.000 0.000 0.000 optparse.py:554(__init__)
64 0.000 0.000 0.000 0.000 ConfigParser.py:354(optionxform)
1 0.000 0.000 0.000 0.000 __init__.py:498(format)
4/3 0.000 0.000 0.002 0.001 sre_parse.py:307(_parse_sub)
It seems that most of the time is spent adding solvables.
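The table above came from Python's old hotshot module, which only exists in Python 2. A comparable ncalls/tottime/cumtime table can be produced with the standard-library cProfile/pstats pair (cProfile is my substitution here, not what the reporter used; the `search` function is a toy workload, not scout's):

```python
import cProfile
import io
import pstats
import re

def search(term):
    # Toy workload standing in for scout's repository search.
    pattern = re.compile(r'^/(s?bin|usr/s?bin)/%s$' % term)
    return [p for p in ['/usr/bin/ls', '/bin/ls'] if pattern.match(p)]

profiler = cProfile.Profile()
profiler.enable()
search('ls')
profiler.disable()

# Print the same kind of ncalls/tottime/cumtime table as in the report,
# sorted by cumulative time, limited to the top 5 entries.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```

Sorting by `tottime` instead of `cumulative` is what makes a hotspot like `add_solv` float to the top, as it did in the report.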
Are you by chance loading all the .solv files again and again? That would be very slow, of course. You have to keep them in memory; the format is designed to be loaded into memory once and then worked on.

That's probably adding a complete repository, not a single solvable.

So it takes half a second to read all 6 repositories? That's pretty slow. Is 'zypper se bash' also that slow?

(In reply to comment #30 from Michael Matz) No, you can see that add_solv is called only 6 times (6 repositories).

(In reply to comment #31 from Michael Schröder)

$ zypper se | wc -l
20534
$ time zypper se bash > /dev/null
real 0m1.771s
user 0m1.412s
sys 0m0.296s

I don't think it makes sense to compare this to adding solvables.

Why not? zypper does pretty much the same, except that it also does some gpg-key handling. We can probably make solv file reading a bit faster, but I don't think we can do more than double the speed.

re comment #32: well, then your measurement doesn't reveal much. This is about the slowness of completion. If most of the time taken is for reading in the repositories _and_ this happens only once (for the first tried completion), then this time, though large, is not relevant. If OTOH you read in the repos for each completion, then of course it is going to be slow. Your profile run doesn't indicate either way, as it seems to be from just one completion attempt; that's why I asked, because depending on the answer I have to give different advice. So: if you are reading in all repos each time a completion is attempted, this won't ever fly; reading in the .solv files is fast, but not _interactively_ fast. If you are _not_ reading in all repos each time, you need to do some other measurement (i.e. attempt multiple completions so that the first-time hit for loading the repos becomes irrelevant).

SOLV files are read for each completion. Is a daemon holding all solv data in memory (I assume at most 20 MB for a typical user) an option? It would also need to detect changes in the solv files and reload the data. I could implement it, but I don't know if this approach is reasonable.

No, daemons suck. Avoid them if you can. Let's first see how much faster we can make the solv read. Do you have any "monster repositories" in your repo list, i.e. repos with more than 30000 solvables? How big are your solv files?

I have 20000 solvables in total, according to comment #32. I have these repositories:

http://download.opensuse.org/factory/repo/oss/
http://download.opensuse.org/repositories/home:/matejcik/openSUSE_Factory/
http://ftp.gwdg.de/pub/linux/misc/packman/suse/11.1/
http://download.opensuse.org/repositories/devel:/languages:/python:/Factory/openSUSE_Factory/
http://download.opensuse.org/repositories/home:/prusnak:/scout/openSUSE_Factory/
http://download.opensuse.org/repositories/home:/prusnak:/scout/data/

The SOLV files total ~10 MB:

-rw-r--r-- 1 root root 7626858 2008-10-21 15:14 ./solv/factory/solv
-rw-r--r-- 1 root root 11121 2008-10-21 15:14 ./solv/home-matejcik/solv
-rw-r--r-- 1 root root 2164751 2008-10-21 15:14 ./solv/packman/solv
-rw-r--r-- 1 root root 15773 2008-10-21 15:14 ./solv/python/solv
-rw-r--r-- 1 root root 8800 2008-10-21 15:14 ./solv/scout-data/solv
-rw-r--r-- 1 root root 10578 2008-10-21 15:14 ./solv/scout/solv

Ah, found the reason why it is so slow. I'm to blame for it: the new solv format parser has a bug that makes it move some data much too often. A fixed version is committed to svn...

It seems a little bit faster, but now it also prints to stderr:

data_read_idarray: id too large (9/6) unexpected EOF, depth = 2
data_read_idarray: id too large (17/8) unexpected EOF, depth = 2

For which solv file? Can you make one of them available to me, please?

*** Bug 438004 has been marked as a duplicate of this bug. ***

This regressed in beta3; I'm now down to 1.6 sec. What about not activating this by default? Simply do

unset command_not_found_handle

and be fine; you may add this to your personal ~/.bashrc file.
(In reply to comment #44 from Werner Fink) > Simply do > unset command_not_found_handle > and be fine, you may add this to your personal ~/.bashrc file

Oh, I am fine. I was speaking of the distribution default, not my egotistic one.

Disabling this by default would make the scout package useless, because it is especially meant for novices, also called newbies or greenhorns ;)

(In reply to comment #15 from Pavol Rusnak) > Regexp for filelists is > > ^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$ > > which matches files in > > { /bin, /sbin, /usr/bin, /usr/sbin, /usr/games, /opt/kde3/bin, /opt/kde3/sbin, > /opt/gnome/bin, /opt/gnome/sbin }

With openSUSE 11.1 there should be no /opt/gnome/ anymore, and there is no /opt/kde3/sbin either. So let's simplify the regex to look only in { /bin, /sbin, /usr/bin, /usr/sbin, /usr/games, /opt/kde3/bin }. What about the following (untested):

^/(s?bin|usr/(s?bin|games)|opt/kde3/bin)/%s$

The regex match is now performed only on a few strings, so it really does not matter. Closing this as fixed. We did almost everything to squeeze down the "lag" and will try to make some other improvements in the future, but the current state is certainly not a bug. I've filed a feature request to discuss an alternative, faster approach: https://features.opensuse.org/305803
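The simplified pattern proposed in the closing comment is marked "(untested)", so here is a quick check of exactly what it drops relative to the original, using the regex strings verbatim from the thread (the command name `foo` is an arbitrary placeholder):

```python
import re

old = re.compile(r'^/(s?bin|usr/(s?bin|games)|opt/(kde3|gnome)/s?bin)/%s$' % 'foo')
new = re.compile(r'^/(s?bin|usr/(s?bin|games)|opt/kde3/bin)/%s$' % 'foo')

# Directories both patterns accept.
for path in ['/bin/foo', '/sbin/foo', '/usr/bin/foo',
             '/usr/sbin/foo', '/usr/games/foo', '/opt/kde3/bin/foo']:
    assert old.match(path) and new.match(path)

# Directories only the old pattern accepted, dropped for openSUSE 11.1.
for path in ['/opt/kde3/sbin/foo', '/opt/gnome/bin/foo', '/opt/gnome/sbin/foo']:
    assert old.match(path) and not new.match(path)
```

So the simplification removes exactly the /opt/gnome/* directories and /opt/kde3/sbin, matching the reasoning in the comment.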