Bug 492439

Summary: NTP Pattern never fails
Product: [NTS - Systems] Support Advisor Feedback Service
Reporter: Aaron Burgemeister <ab>
Component: SupportPattern - Perl
Assignee: Jason Record <jason.record>
Status: RESOLVED FIXED
QA Contact: Mike Francis <mike.francis>
Severity: Enhancement
Priority: P4 - Low
Version: 1.0
Target Milestone: 1.1
Hardware: Other
OS: Other
Whiteboard:
Found By: SUSE Technical Services
Services Priority: 100
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Aaron Burgemeister 2009-04-06 14:51:35 UTC
#META-DESCRIPTION    = Time drifting when running a Linux guest under VMware

ntp-00002.pl

Running this under NSA 1.0.0, I cannot get this pattern to show anything besides "Good", even though the section it presumably checks looks like this:

#==[ Command ]======================================#
# /usr/sbin/ntpq -p
/usr/sbin/ntpq: read: Connection refused

#==[ Configuration File ]===========================#

I'm showing the last line only so you can see that there is nothing else in there. It looks like the script does not properly check for output like the above, and does not fail when ntpq returns it.
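To make the reported condition concrete, here is a minimal sketch in Python (not the Perl used by the real patterns) of how a check could pull the ntpq section out of ntp.txt and notice that the command failed. The function names `command_output` and `ntpq_failed` are hypothetical, and the section layout is assumed from the excerpt quoted above.

```python
# Excerpt of ntp.txt as quoted in this report.
SAMPLE = """\
#==[ Command ]======================================#
# /usr/sbin/ntpq -p
/usr/sbin/ntpq: read: Connection refused

#==[ Configuration File ]===========================#
"""


def command_output(text, command):
    """Return the non-blank lines after '# <command>' up to the next section header.

    Returns None when the command line is not present at all.
    """
    lines = text.splitlines()
    try:
        start = lines.index("# " + command) + 1
    except ValueError:
        return None
    output = []
    for line in lines[start:]:
        if line.startswith("#==["):  # next section header (assumed layout)
            break
        if line.strip():
            output.append(line)
    return output


def ntpq_failed(output):
    """True when the section is missing, empty, or contains the refused error."""
    if not output:
        return True
    return any("Connection refused" in line for line in output)


if __name__ == "__main__":
    out = command_output(SAMPLE, "/usr/sbin/ntpq -p")
    print(out, ntpq_failed(out))
```

A pattern built this way would have something other than "Good" to report for the archive above, because the failure is detected before any offset is looked for.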
Comment 1 Aaron Burgemeister 2009-04-06 14:52:05 UTC
This is also in the sles9ppc archive that I sent Jason a few weeks ago, if you would like to test against it.
Comment 2 Jason Record 2009-04-07 14:11:19 UTC
ntp-00002.pl is specific to a time drift problem under VMware, so sles9ppc would never get a hit because it is not VMware. However, this condition should be checked in a new pattern that looks specifically for "Connection refused" problems, with an associated TID. So ntp-00002.pl is working as designed, and the "Connection refused" check belongs in a new pattern. I will add a proposal to http://nsa.lab.novell.com/sdp/ and work on the TID for the new pattern.
Comment 3 Jason Record 2009-04-07 14:16:57 UTC
Actually, no new pattern is needed as far as I can tell. ntp-00001.pl correctly marks the sles9ppc archive as critical, noting that NTP is either down or dead. Please confirm that ntp-00001.pl is meeting this need. A TID can still be written about the reasons you can get a connection refused even when the ntp daemon is running. Maybe the specified time source is insufficient, but testing would be required to confirm that this produces the Connection refused error.
Comment 4 Aaron Burgemeister 2009-04-07 14:23:25 UTC
Yes, 00001 does show as Critical, so as long as that one is used along with the other, I think adding a TID (or modifying the current TID if appropriate) will probably work.

With that in mind, regardless of why the Connection refused happens and whether it is valid, the pattern is reporting that the offset is zero, even though there is no way the script can get that value from the ntp.txt file as it exists. It seems to me that other invalid values could also slip through. We should be checking for an actual offset and not letting things "work" just because the patterns do not match perfectly. If the patterns fail to match, then fail the script (with one of those errors that is not Good, Warning, or Critical) and let the creator know so things can be enhanced. Does that make sense, or am I missing some other philosophy behind patterns?
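The "check for an actual offset" idea can be sketched as follows, in Python rather than the patterns' Perl. It assumes the usual ten-column `ntpq -p` peer table, where the offset (in milliseconds) is the ninth column. The status names, the `evaluate` function, and the thresholds are hypothetical placeholders, not values taken from ntp-00002.pl.

```python
def parse_offsets(output_lines):
    """Collect offset values (ms) from ntpq -p peer rows; skip anything else."""
    offsets = []
    for line in output_lines:
        fields = line.split()
        if len(fields) != 10:  # not a peer row (separator, error text, ...)
            continue
        try:
            offsets.append(float(fields[8]))
        except ValueError:
            continue  # the column header row ends up here
    return offsets


def evaluate(output_lines, warn_ms=100.0, crit_ms=1000.0):
    """Return (status, message); thresholds are illustrative assumptions."""
    offsets = parse_offsets(output_lines)
    if not offsets:
        # Nothing matched: report an error instead of assuming an offset of zero.
        return "ERROR", "no offset could be parsed from ntpq output"
    worst = max(abs(value) for value in offsets)
    message = f"largest offset {worst:.3f} ms"
    if worst >= crit_ms:
        return "CRITICAL", message
    if worst >= warn_ms:
        return "WARNING", message
    return "GOOD", message


if __name__ == "__main__":
    print(evaluate(["/usr/sbin/ntpq: read: Connection refused"]))
```

The point of the design is that "no match" is its own outcome: the refused-connection output yields an error status rather than silently falling through to "Good".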
Comment 5 Jason Record 2009-04-07 14:34:37 UTC
I agree. This is a good case where, if ntp-00001.pl fails, ntp-00002.pl should not even be executed. I believe this is in the works for the NSA client. I'm not sure how it is going to be implemented; I think a new "pre-flight" script will be run for each product group. Once that functionality is added, this will be a moot point.
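Since the "pre-flight" design is not described here, the following is only one way such gating could look, sketched in Python: a dependent pattern is skipped unless its prerequisite reported "GOOD". The runner `run_patterns`, the "SKIPPED" status, and both check functions are hypothetical stand-ins and do not reflect the NSA client's actual implementation.

```python
def run_patterns(patterns, archive):
    """Run checks in order; skip any check whose prerequisite was not GOOD.

    patterns: list of (name, prerequisite_name_or_None, check_function).
    """
    results = {}
    for name, prerequisite, check in patterns:
        if prerequisite is not None and results.get(prerequisite) != "GOOD":
            results[name] = "SKIPPED"
            continue
        results[name] = check(archive)
    return results


def ntp_running(archive):
    # Stand-in for ntp-00001.pl: NTP is treated as down if ntpq was refused.
    return "CRITICAL" if "Connection refused" in archive else "GOOD"


def vmware_drift(archive):
    # Stand-in for ntp-00002.pl; the real drift logic is not reproduced here.
    return "GOOD"


PATTERNS = [
    ("ntp-00001.pl", None, ntp_running),
    ("ntp-00002.pl", "ntp-00001.pl", vmware_drift),
]

if __name__ == "__main__":
    print(run_patterns(PATTERNS, "/usr/sbin/ntpq: read: Connection refused"))
```

With the refused-connection archive, the first check reports Critical and the second is never executed, which is the behavior described in this comment.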
Comment 6 Jason Record 2009-04-07 16:02:29 UTC
Added dependency on "pre-flight" check feature.
Comment 7 Jason Record 2009-07-04 00:41:22 UTC
Removing dependency.
Comment 8 Jason Record 2009-07-04 00:43:23 UTC
Fixed ntp-00002.pl to account for ntpq failure and to apply to VMware VMs only. Fix checked into revision 548.