Bug #40432

[BETA2] Email alerts now coming in for "* NTP status: 2 out of 8 probes failed"

Added by Disk Didler about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: No priority
Assignee: Timur Bakeyev
Category: Services
Target version:
Seen in:
Severity: Low
Reason for Closing: Duplicate Issue
Reason for Blocked:
Needs QA: No
Needs Doc: No
Needs Merging: No
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

Didn't get this under BETA1.

As my previous logs show, I have a very weak CPU, so this may be my own doing on my system. However, it's not entirely clear what this alert means.

"New alerts:
  • NTP status: 2 out of 8 probes failed
Alerts:
  • NTP status: 2 out of 8 probes failed"

5 or 10 minutes later a new email.

"Gone alerts:"


Related issues

Related to FreeNAS - Bug #40512: Put threshold of 8 sequential failures for NTP connection alert (Done)
Related to FreeNAS - Bug #40792: NTP crashing/restarting (Closed)

History

#1 Updated by Dru Lavigne about 2 years ago

  • Related to Bug #40512: Put threshold of 8 sequential failures for NTP connection alert added

#2 Updated by Dru Lavigne about 2 years ago

  • Category changed from OS to Services
  • Assignee changed from Release Council to Timur Bakeyev
  • Target version changed from Backlog to 11.2-BETA3

#3 Updated by Arne Klein about 2 years ago

Same here on a Xeon E3-1220 v5 (4 Cores @ 3.00GHz), which is fairly idle most of the time.

#4 Updated by Timur Bakeyev about 2 years ago

  • Status changed from Unscreened to In Progress
  • Needs QA changed from Yes to No
  • Needs Doc changed from Yes to No
  • Needs Merging changed from Yes to No

This message means that the connection with the remote NTP server isn't stable: out of the last 8 attempts to synchronize, 2 (or more) failed.

In general, you can ignore it, unless you get other warnings about NTP being out of sync.
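
For readers wondering where "2 out of 8" can come from: ntpd keeps an 8-bit reachability register per peer, shown in octal in the reach column of ntpq -p, with one bit per recent poll. The sketch below counts the zero bits in such a value. It assumes the alert is derived from this register, which this ticket does not confirm, and failed_probes is a hypothetical helper name, not FreeNAS code.

```python
def failed_probes(reach_octal: str) -> int:
    """Count failed polls among the last 8, given ntpq's octal reach value."""
    reach = int(reach_octal, 8) & 0xFF  # keep only the 8 register bits
    return 8 - bin(reach).count("1")    # each 0 bit is a poll with no reply


print(failed_probes("377"))  # every one of the last 8 polls was answered
print(failed_probes("333"))  # two of the last 8 polls were missed
```

A reach value of 377 is the healthy steady state; anything else means at least one of the last 8 polls went unanswered.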

#5 Updated by Disk Didler about 2 years ago

Timur Bakeyev wrote:

This message means that the connection with the remote NTP server isn't stable: out of the last 8 attempts to synchronize, 2 (or more) failed.

In general, you can ignore it, unless you get other warnings about NTP being out of sync.

So that's just time sync?

Maybe consider the message only if it fails like 10 times in a row or over the course of 3 hours. I'm not sure. Seems odd that it's just randomly cropped up.

#6 Updated by Timur Bakeyev about 2 years ago

Yes, it's the time synchronization, which is essential in certain configurations, hence the alert. But it seems it's too sensitive for the real world :(

Calibrating this sort of alert is difficult in lab environments, where everything works fine, so now we need to collect real-life experience :)

#7 Updated by John Hixson about 2 years ago

Timur Bakeyev wrote:

Yes, it's the time synchronization, which is essential in certain configurations, hence the alert. But it seems it's too sensitive for the real world :(

Calibrating this sort of alert is difficult in lab environments, where everything works fine, so now we need to collect real-life experience :)

How about only alerting if drifting more than 5 minutes?
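
A drift-based check along these lines could be sketched as follows. This is only an illustration of the suggestion, not the FreeNAS alert code: it assumes the offset is read from the offset column of ntpq -p (reported in milliseconds), and drift_exceeds, offset_from_peer_line, and the sample peer line are all hypothetical.

```python
def drift_exceeds(offset_ms: float, limit_seconds: float = 300.0) -> bool:
    """Return True when the absolute clock offset is larger than the limit."""
    return abs(offset_ms) / 1000.0 > limit_seconds


def offset_from_peer_line(line: str) -> float:
    """Extract the offset column (milliseconds) from one ntpq -p peer line."""
    fields = line.split()
    # Columns: remote refid st t when poll reach delay offset jitter
    return float(fields[8])


# Hypothetical peer line, made up for illustration only.
sample = "*192.0.2.10  .GPS.  1 u  33  64  377  0.512  -2.345  0.101"
print(drift_exceeds(offset_from_peer_line(sample)))
```

The 300-second default mirrors the 5 minutes proposed above; a real check would also need to decide which peer's offset to trust.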

#8 Updated by Timur Bakeyev about 2 years ago

John Hixson wrote:

Timur Bakeyev wrote:

Yes, it's the time synchronization, which is essential in certain configurations, hence the alert. But it seems it's too sensitive for the real world :(

Calibrating this sort of alert is difficult in lab environments, where everything works fine, so now we need to collect real-life experience :)

How about only alerting if drifting more than 5 minutes?

That sounds logical, but I'm not sure that ntpd will work in such a case. Although, in the case of AD, if the FN synchronizes with public NTP servers (a pretty common situation), it may drift away from AD by 5 minutes.

We can spot that, but I think such a check is better implemented as a separate alert. In fact, I believe Andrew has such a check in his AD plugin.

#9 Updated by Scott Finlon about 2 years ago

Timur Bakeyev wrote:

This message means that the connection with the remote NTP server isn't stable: out of the last 8 attempts to synchronize, 2 (or more) failed.

In general, you can ignore it, unless you get other warnings about NTP being out of sync.

Mine isn’t just time sync; ntpd is constantly crashing/restarting.

Aug  3 20:27:31 nas ntpd[2736]: ntpd exiting on signal 15 (Terminated)
Aug  3 20:27:31 nas ntpd[42969]: ntpd 4.2.8p11-a (1): Starting
Aug  3 20:28:54 nas ntpd[42970]: ntpd exiting on signal 15 (Terminated)
Aug  3 20:28:56 nas ntpd[43367]: ntpd 4.2.8p11-a (1): Starting

#10 Updated by Timur Bakeyev about 2 years ago

OK, then the alert was actually right: you have problems with NTP on your system. That also explains why the original message occurs: after a restart, ntpd resets the attempts counter.

Signal 15, on the other hand, means normal termination of the process. Could it be that you changed the IP address of the box at that time?

Can you attach a full debug log of the system to this ticket?
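
The effect of a restart on the counter can be illustrated with a small simulation. It assumes the counter in question is ntpd's 8-bit reach register (shifted left on every poll, with the low bit set when a reply arrives) and that the alert counts its zero bits; neither assumption is confirmed in this ticket, and poll is a hypothetical helper.

```python
def poll(reach: int, replied: bool) -> int:
    """Shift the 8-bit reach register left and record the newest poll."""
    return ((reach << 1) | int(replied)) & 0xFF


reach = 0  # the register is empty right after ntpd starts
for _ in range(6):  # six successful polls since the restart
    reach = poll(reach, True)

missing = 8 - bin(reach).count("1")
print(oct(reach), missing)
```

Under these assumptions, a freshly restarted ntpd that has answered only six polls still shows two empty slots, which would read as "2 out of 8 probes failed" even though no poll actually failed.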

#11 Updated by Dru Lavigne about 2 years ago

  • Related to Bug #40792: NTP crashing/restarting added

#12 Updated by Timur Bakeyev about 2 years ago

[Attached screenshot: 23816]

Scott sent this screenshot to another ticket, but it belongs here.

#13 Updated by Timur Bakeyev about 2 years ago

  • Status changed from In Progress to Closed
  • Target version changed from 11.2-BETA3 to N/A

Hi, Scott!

For 11.2-BETA3 we'll try to find better thresholds, so that temporary network issues won't produce annoying alerts.

At the moment I see that basically two checks are too sensitive: the number of failed attempts and the periodic connection to the remote NTP server, which can temporarily fail. Well, one is a result of the other, to be precise.

I'll close this ticket, as there is another one duplicating it for the same problem.

#14 Updated by Dru Lavigne about 2 years ago

  • Reason for Closing set to Duplicate Issue
