Bug #40512

Put threshold of 8 sequential failures for NTP connection alert

Added by Greg Stengel about 2 years ago. Updated about 2 years ago.

Status: Done
Priority: No priority
Assignee: Timur Bakeyev
Category: Services
Target version:
Seen in:
Severity: Low
Reason for Closing:
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

After upgrading to FreeNAS-11.2-BETA2 I am getting a constant alert about NTP status. It's very odd, as the IP mentioned, 23.239.26.89, resolves back to hadb1.smatwebdesign.com. That looks like some web hosting company, and I am not sure how it relates to NTP or FreeNAS.


Related issues

Related to FreeNAS - Bug #40432: [BETA2] Email alerts now coming in for "* NTP status: 2 out of 8 probes failed" Closed

Associated revisions

Revision fc5bab25 (diff)
Added by Timur Bakeyev about 2 years ago

tkt-40512: Put threshold of 8 sequential failures for NTP connection alert. (#1698)

* Put threshold of 8 sequential failures for NTP connection alert.

Ticket: #40512

History

#1 Updated by Greg Stengel about 2 years ago

  • File debug-freenas-20180802155603.txz added
  • Private changed from No to Yes

#2 Updated by Dru Lavigne about 2 years ago

  • Category changed from Services to Middleware
  • Assignee changed from Release Council to William Grzybowski

#4 Updated by William Grzybowski about 2 years ago

  • Assignee changed from William Grzybowski to Timur Bakeyev
  • Target version changed from Backlog to 11.2-BETA3

Timur, seems like NTP is very flaky. Do we really need to alert unless something terrible is going on?

#5 Updated by Dru Lavigne about 2 years ago

  • Related to Bug #40432: [BETA2] Email alerts now coming in for "* NTP status: 2 out of 8 probes failed" added

#6 Updated by Timur Bakeyev about 2 years ago

  • Status changed from Unscreened to Screened

William Grzybowski wrote:

Timur, seems like NTP is very flaky. Do we really need to alert unless something terrible is going on?

That's a good question. In an ideal world those errors shouldn't happen, but unless this is a persistent issue they can probably be ignored.

#7 Updated by Timur Bakeyev about 2 years ago

  • Severity changed from New to Low
  • Needs QA changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

Hi, Greg!

By default FreeNAS picks 2 (3) random remote NTP servers, which are provided by the 0.freebsd.pool.ntp.org alias. This list is built on a purely voluntary basis, and basically anyone who participates in the effort to provide free NTP sources can join (if their server qualifies).

So one of those participants is the server 23.239.26.89, and NTP selected it as the best fit of the 3 servers picked:

+--------------------------------------------------------------------------------+
+                             ntpq -c rv @1533225343                             +
+--------------------------------------------------------------------------------+
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p11-a (1)", processor="amd64",
system="FreeBSD/11.2-STABLE", leap=00, stratum=3, precision=-23,
rootdelay=84.865, rootdisp=48.316, refid=23.239.26.89,
reftime=df0da6c2.25e8bb8a  Thu, Aug  2 2018 11:50:26.148,
clock=df0da7ff.79387b47  Thu, Aug  2 2018 11:55:43.473, peer=4182, tc=9,
mintc=3, offset=1.499756, frequency=40.714, sys_jitter=5.553191,
clk_jitter=0.339, clk_wander=0.013

+--------------------------------------------------------------------------------+
+                             ntpq -pwn @1533225343                              +
+--------------------------------------------------------------------------------+
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+12.167.151.2    198.148.79.210   3 u   47  512  377   42.398    2.739   0.901
+38.229.71.1     204.9.54.119     2 u  140  512  377   56.140   -1.840   1.135
*23.239.26.89    216.218.254.202  2 u  317  512  377   47.557    4.016   0.314
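
For readers unfamiliar with the reach column above: it is an 8-bit shift register printed in octal, with one bit per recent poll, so 377 means the last eight polls were all answered. A minimal Python sketch of decoding it follows. The function names are hypothetical, and whether the FreeNAS alert derives its "N out of 8 probes failed" count from this register is an assumption, not something this ticket confirms.

```python
# Hypothetical helpers, not FreeNAS code: decode the "reach" column of `ntpq -pn`.

def failed_probes(reach_octal):
    """Return how many of the last 8 polls got no reply."""
    reach = int(reach_octal, 8) & 0xFF   # reach is printed in octal
    return 8 - bin(reach).count("1")     # each set bit is one answered poll


def parse_peers(ntpq_output):
    """Yield (remote, failed) for each peer line of `ntpq -pn` output."""
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Peer lines have 10 columns; the header and "====" ruler are skipped
        # because their third column (stratum) is not a number.
        if len(fields) != 10 or not fields[2].isdigit():
            continue
        remote = fields[0].lstrip("*+-#xo.")  # drop the tally code, if any
        yield remote, failed_probes(fields[6])


sample = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+12.167.151.2    198.148.79.210   3 u   47  512  377   42.398    2.739   0.901
*23.239.26.89    216.218.254.202  2 u  317  512  377   47.557    4.016   0.314
"""

for remote, failed in parse_peers(sample):
    print(f"{remote}: {failed} out of 8 probes failed")
```

A reach value of 3 (binary 00000011) would decode to 6 failed polls out of 8, and a value of 0 to 8 out of 8.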

In general, you can reshuffle the list of the servers by just restarting NTPD:

# service ntpd restart

But what exact warning message do you get? It could be that your internet connection is actually flaky and the NTP server isn't reachable.

Can you quote that message?

#8 Updated by William Grzybowski about 2 years ago

Timur Bakeyev wrote:

William Grzybowski wrote:

Timur, seems like NTP is very flaky. Do we really need to alert unless something terrible is going on?

That's a good question. In an ideal world those errors shouldn't happen, but unless this is a persistent issue they can probably be ignored.

Several users are getting these errors; telling them to ignore them is not going to fly.

We need to figure out a way to alert only when absolutely necessary (perhaps when the issues have persisted for a good amount of time?)

#9 Updated by Timur Bakeyev about 2 years ago

William Grzybowski wrote:

Several users are getting these errors; telling them to ignore them is not going to fly.

We need to figure out a way to alert only when absolutely necessary (perhaps when the issues have persisted for a good amount of time?)

I can't argue with you on that; the question is how to calibrate those warnings. I guess it would be nice to collect the users' complaints and draw a line where a flaky service starts to become a problem.

#10 Updated by Greg Stengel about 2 years ago

I get a couple of different related messages. Here are some specific examples:

New alerts:
  • NTP status: No response received from 96.126.105.86.
Alerts:
  • NTP status: No response received from 96.126.105.86.
New alerts:
  • NTP status: 6 out of 8 probes failed
Alerts:
  • NTP status: 6 out of 8 probes failed

#11 Updated by Greg Stengel about 2 years ago

Also, I can confirm it's not caused by a flaky internet connection. I work on the same internet connection and it's been up all day. Let me know if there are any other details I can provide.

#12 Updated by Timur Bakeyev about 2 years ago

Greg, thanks for the update.

I guess this check is too sensitive for the real-life Internet. I'll disable it in the next version of the FN update.

How frequently do you get those alerts? If you restart ntpd, does anything change with regard to the alerts? Another remote peer will probably be picked for synchronization. It could be that only your connectivity to 96.126.105.86 is flaky, or that there is some minor but noticeable UDP packet loss in your network. That would go unnoticed by TCP connections (like ssh) but may affect UDP protocols like NTP. Another good example of a UDP protocol is DNS: is your FN box always able to resolve hostnames on the first attempt?

#13 Updated by John Hixson about 2 years ago

As stated in the other NTP alert ticket, how about we only alert if the clock is drifting by more than 5 minutes? That's the standard for Kerberos. Even at that point, we should have a backoff algorithm for the alerts so we aren't spamming.
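
A rough Python sketch of that suggestion, for illustration only: alert when the absolute clock offset exceeds 5 minutes, and double the quiet period after each alert. The class name, the 30-minute base delay and the one-day cap are assumptions, not values from this ticket. Note that ntpq reports offset in milliseconds, while this sketch takes seconds.

```python
# Hypothetical sketch, not FreeNAS code: alert on large clock offset with backoff.

MAX_OFFSET_SECONDS = 300.0  # 5 minutes, the usual Kerberos clock-skew tolerance


class BackoffAlerter:
    def __init__(self, base_delay=1800.0, max_delay=86400.0):
        self.base_delay = base_delay    # assumed: first repeat after 30 minutes
        self.max_delay = max_delay      # assumed: never wait more than a day
        self.delay = base_delay
        self.next_allowed = 0.0         # earliest time the next alert may fire

    def should_alert(self, offset_seconds, now):
        """Return True if an alert should be sent at time `now` (seconds)."""
        if abs(offset_seconds) <= MAX_OFFSET_SECONDS:
            # Clock is within tolerance: reset the backoff state.
            self.delay = self.base_delay
            self.next_allowed = 0.0
            return False
        if now < self.next_allowed:
            return False                # still inside the quiet period
        self.next_allowed = now + self.delay
        self.delay = min(self.delay * 2, self.max_delay)  # double the quiet period
        return True
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the logic easy to test without waiting for real time to pass.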

#14 Updated by Greg Stengel about 2 years ago

I would say they average every 30 minutes. It's not consistent, but often. Also, I have received alerts regarding 4 endpoints so far: 45.56.118.161, 45.79.111.167, 96.126.105.86, 23.239.26.89. I restarted ntpd yesterday after your previous suggestion and the alerts continue. I will run some tests for DNS resolution today and update later.

#15 Updated by Greg Stengel about 2 years ago

Tested against a list of 1757 random domains: those that had proper DNS resolved instantly.

#16 Updated by Timur Bakeyev about 2 years ago

Hi, Greg!

I couldn't find any UDP issues in the output of netstat -s -p udp in your debug log. You can check again yourself whether anything looks wrong there now.

I don't immediately see what else could be wrong with NTP traffic, so I think I'll just set a higher threshold for the alerts, so that only long-lasting connectivity issues are reported.
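
One way such a threshold could look, as a minimal Python sketch: raise the alert only after a run of consecutive failed probes, and reset the count on any success. The value 8 comes from this ticket's subject; the class and method names are hypothetical, and this is not the code from revision fc5bab25.

```python
# Hypothetical sketch, not the code from revision fc5bab25.

class ConsecutiveFailureGate:
    def __init__(self, threshold=8):
        self.threshold = threshold  # failures in a row needed before alerting
        self.failures = 0           # length of the current failure streak

    def record(self, probe_ok):
        """Record one probe result; return True while the alert should be raised."""
        if probe_ok:
            self.failures = 0       # any success ends the streak
            return False
        self.failures += 1
        return self.failures >= self.threshold
```

With this shape, isolated packet loss to a pool server never reaches the threshold, while an unreachable server keeps the alert raised until a probe succeeds again.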

#17 Updated by Timur Bakeyev about 2 years ago

  • File deleted (debug-freenas-20180802155603.txz)

#18 Updated by Timur Bakeyev about 2 years ago

  • Status changed from Screened to In Progress
  • Private changed from Yes to No

#19 Updated by Dru Lavigne about 2 years ago

  • Category changed from Middleware to Services

#20 Updated by Timur Bakeyev about 2 years ago

#21 Updated by Dru Lavigne about 2 years ago

  • Subject changed from NTP status: No response received from 23.239.26.89 | FreeNAS-11.2-BETA2 to Put threshold of 8 sequential failures for NTP connection alert
  • Needs QA changed from No to Yes
  • Needs Merging changed from No to Yes

#22 Updated by Bug Clerk about 2 years ago

  • Status changed from In Progress to Ready for Testing

#23 Updated by Timur Bakeyev about 2 years ago

  • Needs Merging changed from Yes to No

#24 Updated by Bonnie Follweiler about 2 years ago

  • Status changed from Ready for Testing to Passed Testing

Test Passed in FreeNAS-11.2-MASTER-201809050856

#25 Updated by Dru Lavigne about 2 years ago

  • Status changed from Passed Testing to Done
  • Needs QA changed from Yes to No
