Fix unnecessary AD restarts caused by enabling service monitor
This issue affects multiple customers. The AD monitoring feature can stop or restart Active Directory unnecessarily. This is usually caused by DNS misconfiguration or by domain controllers being temporarily taken down for maintenance. However, not every problem with AD monitoring is obviously connected to DNS: I have one ticket in which "service ix-kinit status" is failing, and at this point it is unclear whether that is caused by a change in the customer's environment or by a bug in AD monitoring.
Regardless, the result is service outages that might not have happened if monitoring were disabled.
I think we need to either:
(1) Improve the tests we're performing for "connected" and "started". For example:
--(a) Use DNS SRV records to build a list of domain controllers for the domain, then try to connect to all of them and only fail if every one of them fails.
--(b) Review the tests that we're using to determine whether AD is "started". Do we really need to run "service ix-kinit status" and "service ix-activedirectory status"? Perhaps "wbinfo -p" and "wbinfo -t" are sufficient in this case. Is a 1 second sleep between tests too short?
(2) Validate the state of the domain prior to enabling monitoring. For instance:
--(a) Don't allow it to turn on if DNS is obviously misconfigured.
(3) Introduce an easy way to temporarily disable AD monitoring without restarting the AD service, so that administrators can take steps to ensure they don't experience an outage while performing maintenance on DCs.
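
To make (1)(a) concrete, here is a rough Python sketch of a "connected" test that only reports failure when every domain controller is unreachable on every attempt. It is not the existing monitoring code. The names domain_reachable and tcp_probe, the retry count, and the delay are all assumptions for illustration, and the list of controllers is assumed to have been resolved already from the domain's SRV records (for AD, typically _ldap._tcp.dc._msdcs.<domain>).

```python
import socket
import time
from typing import Callable, Iterable, Tuple

# (hostname, port) pairs, assumed to come from the domain's DNS SRV records.
DomainController = Tuple[str, int]


def tcp_probe(dc: DomainController, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the domain controller succeeds."""
    host, port = dc
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def domain_reachable(
    controllers: Iterable[DomainController],
    probe: Callable[[DomainController], bool] = tcp_probe,
    attempts: int = 3,   # hypothetical default
    delay: float = 5.0,  # hypothetical default, longer than the current 1 second
) -> bool:
    """Report the domain as down only if every DC fails on every attempt."""
    controllers = list(controllers)
    if not controllers:
        # No SRV records at all points at a DNS problem; nothing to probe.
        return False
    for attempt in range(attempts):
        # One reachable DC is enough to consider the domain connected.
        if any(probe(dc) for dc in controllers):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With a check shaped like this, a single DC taken down for maintenance would not trigger a restart as long as another DC answers. The probe is passed in as a parameter so that a TCP connect could be swapped for something closer to what we already run (for example a wrapper around "wbinfo -t") without changing the retry logic.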