Bugzilla – Bug 388088
squid sometimes fails to rebind to its ports on log rotation
Last modified: 2008-05-20 11:54:30 UTC
Squid _sometimes_ fails to rebind to its ports on log rotation.

OS: openSUSE 10.2
Squid version: openSUSE squid-2.6.STABLE5-7
Note: we have 60 redirectors attached to squid.

/var/log/squid/cache.log:

2008/05/06 03:32:48| commBind: Cannot bind socket FD 72 to 202.75.242.254:3128: (98) Address already in use
2008/05/06 03:32:48| commBind: Cannot bind socket FD 72 to 202.75.242.254:8080: (98) Address already in use

I suspect this happens when squid is somehow still busy with something (redirectors or some pending connection?), so the sockets are not unbound by squid before squid tries to reattach. This might be because openSUSE's /etc/init.d/squid reload does squid -k rotate, sleep 2, squid -k reconfigure, and /etc/logrotate.d/squid has /etc/init.d/squid reload as postrotate for the various squid log files. What possibly happens is that 2 seconds after squid -k rotate, squid is still busy restarting the 60 redirectors; then squid -k reconfigure is called, and it is not possible to release the previous squid socket binds in time.

We will be trying a workaround of just doing squid -k rotate for log rotation.

Log follows (note: the HB: DEAD: <PID> messages are from the redirector processes when they die; squid is configured to bind to 127.0.0.1:3128, 127.0.0.1:8080, 202.75.242.254:3128 and 202.75.242.254:8080):

2008/05/06 03:32:38| storeDirWriteCleanLogs: Starting...
HB: DEAD: 24962 HB: DEAD: 24964 HB: DEAD: 24960 HB: DEAD: 24976 HB: DEAD: 24977 HB: DEAD: 24981 HB: DEAD: 24959 HB: DEAD: 24967 HB: DEAD: 24970 HB: DEAD: 24975 HB: DEAD: 24982 HB: DEAD: 24941 HB: DEAD: 24916 HB: DEAD: 24979 HB: DEAD: 24917 HB: DEAD: 24958 HB: DEAD: 24921
2008/05/06 03:32:38| 65536 entries written so far.
HB: DEAD: 24926 HB: DEAD: 24929 HB: DEAD: 24966 HB: DEAD: 24908 HB: DEAD: 24947 HB: DEAD: 24915 HB: DEAD: 24920 HB: DEAD: 24923 HB: DEAD: 24925 HB: DEAD: 24928 HB: DEAD: 24930 HB: DEAD: 24961 HB: DEAD: 24969 HB: DEAD: 24938 HB: DEAD: 24957 HB: DEAD: 24942 HB: DEAD: 24974 HB: DEAD: 24948
2008/05/06 03:32:38| 131072 entries written so far.
2008/05/06 03:32:38| Finished. Wrote 138190 entries.
2008/05/06 03:32:38| Took 0.2 seconds (599582.6 entries/sec).
2008/05/06 03:32:38| aioSync: flushing pending I/O operations
2008/05/06 03:32:38| aioSync: done
2008/05/06 03:32:38| logfileRotate: /var/log/squid/access.log
2008/05/06 03:32:38| Pinger socket opened on FD 9
2008/05/06 03:32:38| helperOpenServers: Starting 60 'redirector.pl' processes
2008/05/06 03:32:44| Reconfiguring Squid Cache (version 2.6.STABLE5)...
2008/05/06 03:32:44| FD 72 Closing HTTP connection
2008/05/06 03:32:44| FD 73 Closing HTTP connection
2008/05/06 03:32:44| FD 74 Closing HTTP connection
2008/05/06 03:32:44| FD 75 Closing HTTP connection
2008/05/06 03:32:44| Closing Pinger socket on FD 9
2008/05/06 03:32:44| Cache dir '/var/cache/squid' size remains unchanged at 4096000 KB
2008/05/06 03:32:44| Logformat for 'itt' is ' %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %Sh/%<A %mt "%{Referer}>h" "%{User-Agent}>h"'
2008/05/06 03:32:44| accessLogParseLogFormat: got definition ' %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %Sh/%<A %mt "%{Referer}>h" "%{User-Agent}>h"'
2008/05/06 03:32:44| Initialising SSL.
2008/05/06 03:32:44| Store logging disabled
2008/05/06 03:32:44| User-Agent logging is disabled.
2008/05/06 03:32:44| Referer logging is disabled.
2008/05/06 03:32:44| DNS Socket created at 0.0.0.0, port 42512, FD 7
2008/05/06 03:32:44| Adding nameserver 127.0.0.1 from squid.conf
2008/05/06 03:32:44| helperOpenServers: Starting 60 'redirector.pl' processes
HB: DEAD: 28224 HB: DEAD: 28225 HB: DEAD: 28227 HB: DEAD: 28229 HB: DEAD: 28230 HB: DEAD: 28231 HB: DEAD: 28232 HB: DEAD: 28233 HB: DEAD: 28236 HB: DEAD: 28238 HB: DEAD: 28240 HB: DEAD: 28241 HB: DEAD: 28242 HB: DEAD: 28243 HB: DEAD: 28248 HB: DEAD: 28244 HB: DEAD: 28260
2008/05/06 03:32:48| commBind: Cannot bind socket FD 72 to 202.75.242.254:3128: (98) Address already in use
HB: DEAD: 28262
2008/05/06 03:32:48| commBind: Cannot bind socket FD 72 to 202.75.242.254:8080: (98) Address already in use
HB: DEAD: 28263 HB: DEAD: 28291 HB: DEAD: 28269 HB: DEAD: 28270
2008/05/06 03:32:48| Accepting proxy HTTP connections at 127.0.0.1, port 3128, FD 72.
2008/05/06 03:32:48| Accepting proxy HTTP connections at 127.0.0.1, port 8080, FD 73.
This can cause significant confusion and problems: although squid ends up running, it is not listening on all the required ports. So it's up, but not quite up.
I also notice that /etc/logrotate.d/squid does not have the sharedscripts directive covering all three logs.
By the way, won't doing "/etc/init.d/squid reload" for a log rotation break existing downloads? It seems a really bad idea to restart services for log rotation rather than just rotating the logs. Doing a reload during rotation also means that if someone messed up the squid.conf file by mistake, the users notice it at log rotation time, which is usually when you don't want to be called.
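To make the two strategies concrete, here is a minimal sketch, assuming the openSUSE 10.2 paths quoted in this report; the reload sequence is reconstructed from the description in comment #0, not copied from the actual init script:

```shell
# What "/etc/init.d/squid reload" reportedly runs, per rotated log:
/usr/sbin/squid -k rotate        # reopen log files
sleep 2                          # fixed grace period; may be too short
                                 # while 60 redirectors are restarting
/usr/sbin/squid -k reconfigure   # re-reads squid.conf, closes and
                                 # rebinds every listening socket

# The workaround adopted in this thread: rotate the logs without
# ever touching the listening sockets:
/usr/sbin/squid -k rotate
```

The reconfigure step is the only one that unbinds and rebinds ports, so dropping it removes the rebind race entirely.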
If there is no reload signal, then squid continues writing into the rotated logfiles, as the file handle is not changed. Will have a look...
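The file-handle point can be demonstrated without squid at all: after a rename (which is what logrotate does), a process that keeps its descriptor open continues writing to the old inode under its new name, and the freshly created replacement file stays empty. A generic shell sketch (the filenames are made up for the demonstration):

```shell
# Open a log file and keep descriptor 3 pointing at it.
cd "$(mktemp -d)"
exec 3>>app.log
echo "before rotation" >&3

# Simulate logrotate: rename the file and create an empty successor.
mv app.log app.log.1
: > app.log

# The writer still holds the old inode, so this line lands in app.log.1.
echo "after rotation" >&3

# Close the descriptor; only now would a reopen pick up the new file.
exec 3>&-
```

This is why a reopen signal (squid -k rotate) is needed at all: it tells squid to close and reopen its log descriptors so that new writes go to the new file.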
Please attach (via the attachment mechanism - do NOT paste) /etc/squid/squid.conf and /var/log/squid/cache.log. The information and assumptions you have provided are not enough for me to debug this.
We have changed /etc/logrotate.d/squid to:

/var/log/squid/cache.log {
    compress
    dateext
    maxage 365
    rotate 99
    size=+1024k
    notifempty
    missingok
    create 640 squid root
    sharedscripts
    postrotate
        /usr/sbin/squid -k rotate
    endscript
}

/var/log/squid/access.log {
    compress
    dateext
    maxage 365
    rotate 99
    size=+4096k
    notifempty
    missingok
    create 640 squid root
    sharedscripts
    postrotate
        /usr/sbin/squid -k rotate
    endscript
}

/var/log/squid/store.log {
    compress
    dateext
    maxage 365
    rotate 99
    size=+4096k
    notifempty
    missingok
    create 640 squid root
    sharedscripts
    postrotate
        /usr/sbin/squid -k rotate
    endscript
}

This appears to rotate logs successfully for us while avoiding the squid rebind, and thus hopefully the problem. Not sure if this problem would also occur for a /etc/init.d/squid restart.
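A hand-edited policy like this can be exercised before it is trusted; a sketch, assuming the file lives at /etc/logrotate.d/squid as above (-d is logrotate's debug/dry-run flag, -f forces a rotation regardless of size and date conditions):

```shell
# Show what logrotate *would* do, without touching any files:
logrotate -d /etc/logrotate.d/squid

# Force one real rotation out of schedule, then confirm a running
# squid is still happy with its configuration:
logrotate -f /etc/logrotate.d/squid
/usr/sbin/squid -k check
```

Note that invoking the per-service file directly skips the defaults from /etc/logrotate.conf, so the dry run may differ slightly from what the nightly cron job does.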
Created attachment 214198 [details]
squid.conf
Created attachment 214199 [details]
cache.log when stuff went wrong

2008/05/04 03:50:19| commBind: Cannot bind socket FD 72 to 202.75.242.254:3128: (98) Address already in use
2008/05/04 03:50:19| commBind: Cannot bind socket FD 72 to 202.75.242.254:8080: (98) Address already in use
Created attachment 214200 [details]
cache.log when stuff started ok

This is when stuff was ok. Not sure if it's relevant, but for some reason the Pinger socket was closed in both OK logs and not closed in the bad log.
Created attachment 214201 [details]
much earlier cache.log, when the rotate and reconfigure went ok
I now think the Pinger bit is not relevant: when squid didn't rebind properly, it stopped receiving user requests, so the pinger never got around to closing.
Sorry, but at the moment I'm unable to reproduce it. Currently I have the feeling that the old squid isn't dying fast enough and is therefore lingering a bit too long in the system, locking the port. So you'll have to do some more debugging for me. Next time this happens, run the following (as root) in a command line terminal. This might be time critical, so please don't wait.

- Identify the processes which are listening on port 3128:
  lsof -i | egrep '(ndl-aas|3128).*LISTEN'
  ==> This will report the process name (1st column) and the process id (2nd column).
  ==> If there is only one process reported, you're too late. Try again.
- Check whether both processes have the same process ids (pids), or at least the same parent pids.
  ==> read the output.
- If not, identify the process ids and check when both processes were started:
  ps -Flp <<pid1>> <<pid2>>

Tell me the result.
==> still NEEDINFO
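The steps above, collected into one copy-pasteable sequence (same commands as requested; pid1/pid2 stand for whatever the lsof step reports and must be filled in by hand):

```shell
#!/bin/sh
# Run as root, as soon as the bind failure appears in cache.log.

# 1. Who is listening on 3128?  Two processes at once would confirm
#    that the old squid is still holding the port.
lsof -i | egrep '(ndl-aas|3128).*LISTEN'

# 2. For two distinct PIDs from step 1, compare parent pids and
#    start times (substitute the real pids):
#    ps -Flp <pid1> <pid2>
```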
It's probably as you say: perhaps it is a race condition that only happens under load and is related to the size of the logs being rotated (since SUSE 10.2 runs /etc/init.d/squid reload for each log file - 3 times). As far as I know it doesn't happen that often on our live sites, but it has definitely happened on more than two sites on different occasions - and given that squid does continue running (albeit only on 127.0.0.1), I do not think that somebody did something strange directly to the site. The other explanation would be the 202.75.242.254 interface going down around the time of log rotation _and_ somehow coming back up again (since it was obviously not down afterwards). I think this is very unlikely to be the cause.

For what it's worth, I have also noticed that at some sites squid is restarting the redirectors as root instead of squid!

Anyway, as per comment #6, we are taking measures to avoid requiring squid to release and rebind its ports in the first place (which appears to happen with SUSE's /etc/init.d/squid reload). So instead of the SUSE 10.2 style of squid -k rotate; sleep 2; squid -k reconfigure, potentially for _each_ squid log to be rotated, we now just do squid -k rotate once for all 3 squid logs. This should hopefully avoid the following undesirables (estimate in brackets):
1) squid not rebinding on all required ports (very likely to work)
2) squid reloading its config every day when admins might not be around (100%)
3) squid restarting redirector processes as root (might help, but not a big issue)
However, this is also likely to reduce the odds of us reproducing the problem and doing your suggested debugging.
Ok, I've added "sharedscripts" to the logrotate configuration. Please note that we don't fix this for older SUSE Linux versions, as it is not a security issue (policy). Please also note that the logrotate config file is a config file and therefore will not be replaced by the next security update at your site. Fixed versions submitted.
I'd just like to point out that, as per comment #6, we didn't just add the sharedscripts directive; we also changed /etc/init.d/squid reload to /usr/sbin/squid -k rotate. Thanks. We'll be using our version of the config file anyway.
Sorry, but a long time ago there was an issue because we didn't do that. The bad news is that I cannot find the reference anymore.