Feature #21458

Provide alert if configured NTP server can not be contacted

Added by Sam Fourman about 2 years ago. Updated 9 months ago.

Status: Done
Priority: Expected
Assignee: Timur Bakeyev
Category: Middleware
Target version:
Estimated time:
Severity: Medium
Reason for Closing:
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:

Description

Since clock skew of even a handful of seconds can break AD, configure the Alert system to check whether the NTP server can be contacted.

Something like:

nc -vzu 10.212.0.2 123
Connection to 10.212.0.2 123 port [udp/ntp] succeeded!
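
One caveat with the nc probe: UDP is connectionless, so a port probe can report success even when no NTP daemon answers on the other side. A minimal sketch of a stricter check, using only the Python standard library, sends a real client request and waits for a server-mode reply. The function names (build_request, parse_reply, ntp_reachable) are made up for illustration and are not taken from the FreeNAS alert code; the sketch is IPv4-only and ignores NTP authentication.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request(version=3):
    """Return a 48-byte NTP client request (LI=0, VN=version, Mode=3)."""
    first_byte = (0 << 6) | (version << 3) | 3
    return bytes([first_byte]) + bytes(47)


def parse_reply(data):
    """Extract mode, stratum and transmit time (Unix seconds) from a reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet: %d bytes" % len(data))
    first_byte, stratum = data[0], data[1]
    # The transmit timestamp occupies the last 8 bytes of the 48-byte header.
    tx_seconds, tx_fraction = struct.unpack("!II", data[40:48])
    return {
        "mode": first_byte & 0x7,
        "stratum": stratum,
        "tx_time": tx_seconds - NTP_EPOCH_OFFSET + tx_fraction / 2**32,
    }


def ntp_reachable(host, timeout=2.0):
    """Return True only if host answers with a server-mode (4) NTP packet."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_request(), (host, NTP_PORT))
            data, _ = sock.recvfrom(512)
        return parse_reply(data)["mode"] == 4
    except (OSError, ValueError):
        # Covers timeouts, DNS failures, ICMP errors and malformed replies.
        return False
```

Calling ntp_reachable("10.212.0.2") would then return False on a timeout instead of reporting a "succeeded" connection.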

tests.tar.gz (4.99 KB) Timur Bakeyev, 07/18/2018 01:58 PM

Related issues

Related to FreeNAS - Bug #18362: Add alert to indicate when AD is out of sync with NTP (Closed, 2016-10-19)

Associated revisions

Revision e2ffdacf (diff)
Added by Timur I. Bakeyev 10 months ago

Add NTP alert, in case peer is not reachable, as well as some other
related sanity checks. More thorough ones can be derived.

Ticket: #21458

Revision 87cd47f2 (diff)
Added by Timur Bakeyev 9 months ago

Add NTP alert, in case peer is not reachable (#1540)

  • Add NTP alert, in case peer is not reachable, as well as some other
    related sanity checks. More thorough ones can be derived.

Ticket: #21458

History

#1 Updated by Ash Gokhale about 2 years ago

Perhaps parse the output of ntptime and raise an error when the skew is more than 3 sec:
#ntptime :root:/z/case:14:04:12:272badger
ntp_gettime() returns code 0 (OK)
time dc5b2c61.ac711d7c Fri, Feb 24 2017 14:04:17.673, (.673601196),
maximum error 83202885 us, estimated error 2836 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 27.608 ppm, interval 1 s,
maximum error 83202885 us, estimated error 2836 us,
status 0x2001 (PLL,NANO),
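
A rough sketch of that parsing idea, assuming the ntptime output keeps the layout shown above. Which field should count as "skew" is an open question; this sketch assumes the ntp_adjtime() offset and takes the 3-second limit from the suggestion above. The names parse_ntptime and skew_exceeded are hypothetical.

```python
import re

# Matches the microsecond figures that ntptime prints, e.g. "offset 0.000 us".
_FIELD_RE = re.compile(
    r"(offset|maximum error|estimated error)\s+(-?\d+(?:\.\d+)?)\s+us"
)


def parse_ntptime(text):
    """Return the ntp_adjtime() offset and error figures, in seconds."""
    # Only look at the part after the ntp_adjtime() line, so the error
    # figures reported by ntp_gettime() are not picked up instead.
    _, _, adjtime = text.partition("ntp_adjtime()")
    values = {}
    for name, number in _FIELD_RE.findall(adjtime):
        values.setdefault(name.replace(" ", "_"), float(number) / 1e6)
    return values


def skew_exceeded(text, limit=3.0):
    """Return True if the reported offset is larger than limit seconds."""
    values = parse_ntptime(text)
    if "offset" not in values:
        raise ValueError("no offset found in ntptime output")
    return abs(values["offset"]) > limit
```

Note that in the output above the offset is 0, while the maximum error is about 83 seconds, so the choice of field changes the verdict considerably.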

#2 Updated by Ash Gokhale about 2 years ago

Also raise an error if the NTP servers do not have a locked peer relationship:

#3 Updated by William Grzybowski about 2 years ago

  • Status changed from Unscreened to Screened
  • Target version changed from TrueNAS-9.10.2-U2 to TrueNAS-9.10.3

#4 Updated by Kris Moore about 2 years ago

  • Target version changed from TrueNAS-9.10.3 to 11.0

#5 Updated by William Grzybowski almost 2 years ago

  • Project changed from TrueNAS to FreeNAS
  • Category changed from 224 to 53
  • Priority changed from Important to Expected
  • Target version changed from 11.0 to 11.1

Moving to FreeNAS, since it's not TrueNAS-specific, to make sure it makes it into the next feature-set release.

#6 Updated by William Grzybowski almost 2 years ago

  • Status changed from Screened to Unscreened
  • Assignee changed from William Grzybowski to Kris Moore

Load-balancing

#7 Updated by Kris Moore almost 2 years ago

  • Assignee changed from Kris Moore to Timur Bakeyev

Timur - This one will greatly help with AD troubleshooting. Can you take a look and work with William/Marcelo/Suraj on the implementation?

#8 Updated by Timur Bakeyev almost 2 years ago

  • Status changed from Unscreened to Screened

#9 Updated by Timur Bakeyev almost 2 years ago

  • Status changed from Screened to Investigation

There is a Python library, ntplib, that performs basic NTP operations.

#!/usr/local/bin/python
#
import time
import ntplib
from ntplib import NTPClient, NTPException

#NTP_SERVER = "0.freebsd.pool.ntp.org" 
NTP_SERVER = "10.0.255.250" 
NTP_SERVER_BAD = "www.microsoft.com" 

client = NTPClient()

info = client.request(NTP_SERVER)

# ntpdate -q 10.0.255.250
#server 10.0.255.250, stratum 3, offset 0.001706, delay 0.02589
# 7 Jul 04:19:18 ntpdate[44792]: adjust time server 10.0.255.250 offset 0.001706 sec
print("server %s, stratum %d, offset %f, delay %f" %(NTP_SERVER, info.stratum, info.offset, info.root_delay))

# check response
#self.assertTrue(isinstance(info, ntplib.NTPStats))
print("Version: %d" % info.version)
print("Offset: %f" % info.offset)
print("Stratum: %x" % info.stratum)
print("Precision: %x" % info.precision)
print("Root delay: %f" % info.root_delay)
print("Root dispersion: %f" % info.root_dispersion)
print("Delay: %d" % info.delay)
print("Leap: %d" % info.leap)
#self.assertIn(info.leap, ntplib.NTP.LEAP_TABLE)
print("Poll: %x" % info.poll)
print("Mode: %d" % info.mode)
#self.assertIn(info.mode, ntplib.NTP.MODE_TABLE)
print("Refid: %x" % info.ref_id)
print("TX time: %f" % info.tx_time)
print("Ref time: %f" % info.ref_time)
print("Orig time: %f" % info.orig_time)
print("Dest time: %f" % info.dest_time)
print("Recv time: %f" % info.recv_time)

print("%ld %ld" % ( int(info.tx_time), ntplib.ntp_to_system_time(ntplib.system_to_ntp_time(int(info.tx_time)))) )
print("Leap: %s" % ntplib.leap_to_text(info.leap))
print("Mode: %s" % ntplib.mode_to_text(info.mode))
print("Stratum: %s" % ntplib.stratum_to_text(info.stratum))
print("RefID: %s" % ntplib.ref_id_to_text(info.ref_id, info.stratum))

try:
    new_info = client.request(NTP_SERVER_BAD, timeout=1)
except NTPException as ne:
    # str(ne) works on both Python 2 and 3; the .message attribute is gone in Python 3
    print("FATAL: " + str(ne))

]# ntpdate -q 10.0.255.250
server 10.0.255.250, stratum 3, offset 0.003098, delay 0.02585
10 Jul 01:30:42 ntpdate[74562]: adjust time server 10.0.255.250 offset 0.003098 sec
server 10.0.255.250, stratum 3, offset 0.003227, delay 0.005920
Version: 2
Offset: 0.003227
Stratum: 3
Precision: -17
Root delay: 0.005920
Root dispersion: 0.056061
Delay: 0
Leap: 0
Poll: 3
Mode: 4
Refid: d5ef9a0c
TX time: 1499643047.698076
Ref time: 1499641950.401897
Orig time: 1499643047.694415
Dest time: 1499643047.695232
Recv time: 1499643047.698025
1499643047 1499643047
Leap: no warning
Mode: server
Stratum: secondary reference (3)
RefID: 213.239.154.12
FATAL: No response received from www.microsoft.com.

The output and information obtained via the library are comparable to the results of ntpdate.

The library doesn't seem to support NTP authentication, though; I'm not sure how critical that is.

Haven't tested it on IPv6 peers either.

#10 Updated by Timur Bakeyev almost 2 years ago

Implemented a nice, simple, easy-to-understand, but wrong check. Keep in mind that by default we use pool.ntp.org addresses, which are not bound to particular hosts but are picked at random by DNS round-robin. So, for meaningful results, we need to parse the output of ntpq.
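
For illustration, parsing the ntpq peers billboard could look roughly like the sketch below. It assumes the usual ten-column layout of `ntpq -pn` with a one-character tally code in front of each peer ('*' marks the selected system peer, and reach is an octal register). The sample input is invented, uses documentation addresses, and is not output from any real system; parse_ntpq_peers and peer_problems are hypothetical names, not the code written for this ticket.

```python
# Invented sample in the shape of `ntpq -pn` output (not real data).
SAMPLE_NTPQ = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      198.51.100.1     2 u   33   64  377    1.234    0.456   0.078
+192.0.2.11      198.51.100.2     2 u   40   64  377    2.345   -1.234   0.101
 192.0.2.12      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""


def parse_ntpq_peers(text):
    """Parse the peers billboard printed by `ntpq -pn` into a list of dicts."""
    peers = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("="):
            continue
        # The first character is the tally code; the rest is whitespace-separated.
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue
        remote, refid, st, _type, _when, _poll, reach, delay, offset, jitter = fields
        try:
            peers.append({
                "tally": tally,
                "remote": remote,
                "refid": refid,
                "stratum": int(st),
                "reach": int(reach, 8),  # reach is printed in octal
                "delay_ms": float(delay),
                "offset_ms": float(offset),
                "jitter_ms": float(jitter),
            })
        except ValueError:
            continue  # header line or anything else non-numeric
    return peers


def peer_problems(peers):
    """Return human-readable problems found in the parsed peer list."""
    problems = []
    if not any(p["tally"] == "*" for p in peers):
        problems.append("no system peer selected")
    for p in peers:
        if p["reach"] == 0:
            problems.append("peer %s is unreachable" % p["remote"])
    return problems
```

Because the peer list comes from the running daemon, this looks at the hosts that ntpd actually resolved rather than whatever the pool name resolves to at check time.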

#11 Updated by Timur Bakeyev almost 2 years ago

OK, implemented parsing and analysis of the ntpq output in combination with ntplib. Now I only need to wrap that into the Alerts class and find a way to test it somehow...

#12 Updated by Dru Lavigne over 1 year ago

  • Status changed from Investigation to 46

Timur: is this feature still on track for 11.1?

#13 Updated by Timur Bakeyev over 1 year ago

  • Status changed from 46 to Fix In Progress

I have some working code and need to integrate it into the alerts sub-system; it should be ready before 11.1.

#14 Updated by Kris Moore over 1 year ago

Timur - Ping! Is this going to be in for 11.1?

#15 Updated by Timur Bakeyev over 1 year ago

  • Target version changed from 11.1 to 11.1-U1

Hi, Kris!

I forgot about this one :( The code is somewhat working, but I want to test it in tougher conditions. Also, there is a ticket where a user got NTP not working in an interesting way; I need to check that the new code can detect such an issue.

#16 Updated by Dru Lavigne over 1 year ago

  • Target version changed from 11.1-U1 to 11.2-BETA1

Moving to next feature release.

#17 Updated by Erin Clark over 1 year ago

  • Related to Bug #18362: Add alert to indicate when AD is out of sync with NTP added

#18 Updated by Dru Lavigne about 1 year ago

  • Status changed from Fix In Progress to In Progress

#19 Updated by Dru Lavigne about 1 year ago

  • Target version changed from 11.2-BETA1 to 11.2-RC2

#20 Updated by Timur Bakeyev 12 months ago

  • Severity set to Medium

The Alerts API has changed, so the code needs to be updated.

#21 Updated by Timur Bakeyev 9 months ago

#22 Updated by Timur Bakeyev 9 months ago

user#5268: while the tests are there, the thresholds are quite arbitrary and may require adjustment. Feel free to propose your own values, as well as further checks that could be done.
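
To make the threshold discussion concrete, here is a minimal sketch of how such checks might be expressed. The numbers are placeholders, not the values used in the merged code: the 3-second critical limit is borrowed from the ntptime suggestion earlier in this ticket, and the 1-second warning limit is purely an assumption. The function name ntp_alerts is hypothetical.

```python
def ntp_alerts(offset, stratum, warn_offset=1.0, crit_offset=3.0):
    """Return a list of (level, message) tuples for one NTP peer.

    offset is in seconds (as reported by ntplib), stratum is an integer.
    """
    alerts = []
    # Stratum 16 means "unsynchronized"; 0 is unspecified or invalid.
    if stratum == 0 or stratum >= 16:
        alerts.append(
            ("CRITICAL", "peer is not synchronized (stratum %d)" % stratum)
        )
    skew = abs(offset)
    if skew >= crit_offset:
        alerts.append(
            ("CRITICAL", "clock offset %.3f s exceeds %.1f s" % (skew, crit_offset))
        )
    elif skew >= warn_offset:
        alerts.append(
            ("WARNING", "clock offset %.3f s exceeds %.1f s" % (skew, warn_offset))
        )
    return alerts
```

With the values from the ntplib run above (offset 0.003227, stratum 3), ntp_alerts returns an empty list.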

#23 Updated by Timur Bakeyev 9 months ago

  • Status changed from In Progress to Ready for Testing

#24 Updated by Dru Lavigne 9 months ago

  • Subject changed from Provide Alert if configured ntpserver can not be contacted to Provide Alert if configured NTP server can not be contacted
  • Target version changed from 11.2-RC2 to 11.2-BETA2
  • Needs Merging changed from Yes to No

#25 Updated by Dru Lavigne 9 months ago

  • Subject changed from Provide Alert if configured NTP server can not be contacted to Provide alert if configured NTP server can not be contacted

#28 Updated by Michael Reynolds 9 months ago

  • Status changed from Ready for Testing to Passed Testing
  • Needs QA changed from Yes to No

Timur tested this using the attached files

#29 Updated by Timothy Moore II 9 months ago

  • Needs Doc changed from Yes to No

#30 Updated by Dru Lavigne 9 months ago

  • Status changed from Passed Testing to Done
