Reconnecting Active Directory kills Samba connections
I'm not sure if a recent Windows Update causes this, but FreeNAS occasionally cannot connect to the domain controller after periods of inactivity. This causes the Service Monitor middleware to (rightly) assume the directory is unavailable, and when it can eventually make the connection again, it runs the re-join routine. All is well.
Unfortunately, during this time there may have been existing connections to the samba shares (I don't know if this happens on other sharing services, I don't have them enabled) that shouldn't be interrupted. When the AD rejoin finishes, one of the last things it does is stop and start the samba server, effectively killing all connections. This is undesired: those who had legitimate connections, and who continue to have legitimate connections after reconnecting, should not have been disconnected.
I believe the AD service restarts samba to pick up potential reconfiguration, but in this case nothing (as far as I know) has been reconfigured.
I think if the status of the samba service is detected as "started", the proper action would simply be to run service samba_server reload.
Obviously, fixing the base problem of the AD controller disappearing would be better in the end, but it has been shown that samba can (mostly) handle that on its own for existing connections until it needs to authenticate a new connection, so this solution could be a good stop-gap until I resolve that.
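To make the proposal concrete, here is a minimal sketch of the decision in Python. It is not the middleware's actual code: the function name samba_rc_command and the config_changed flag are hypothetical, and it assumes the rc script accepts the usual start, restart and reload actions.

```python
def samba_rc_command(is_running, config_changed):
    """Pick the least disruptive rc action for samba_server.

    Hypothetical helper: is_running and config_changed are assumed to be
    supplied by the caller (e.g. the AD rejoin routine).
    """
    if not is_running:
        # Nothing to preserve, so a plain start is enough.
        action = "start"
    elif config_changed:
        # A full restart drops client connections; use it only when
        # the configuration really changed.
        action = "restart"
    else:
        # Existing connections survive a reload.
        action = "reload"
    return ["service", "samba_server", action]


if __name__ == "__main__":
    # Samba already running, nothing reconfigured: reload only.
    print(" ".join(samba_rc_command(is_running=True, config_changed=False)))
```

The returned list could be passed to subprocess.run(cmd, check=True); how the rejoin routine would know whether anything was reconfigured is left open here.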
#5 Updated by Anthony Takata over 2 years ago
Anthony Takata wrote:
I notice that the lack of connection might be a DNS issue? For reasons unknown it seems that failover isn't working right, so when the first domain controller FreeNAS sees goes down (I just restarted it to test) it doesn't try the next one... :/
Since DNS responses are apparently not cached, I just added a 1-second sleep on failure so that, with luck, the next attempt randomly picks a domain controller that's working. I still get the "Cannot connect" logs, but it seems to pick up the right one after a few tries, well enough that the service monitor doesn't assume it's gone and kill Active Directory.
The relevant line is line 102 in /usr/local/lib/python3.6/site-packages/middlewared/plugins/service_monitor.py
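For illustration, the sleep-on-fail workaround described above could look roughly like the sketch below. This is not the code in service_monitor.py: check_with_retry and its parameters are hypothetical names, and the check callable stands in for whatever connection test the monitor performs.

```python
import time


def check_with_retry(check, attempts=3, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, pausing `delay` seconds after a failure.

    Returns True as soon as one call succeeds, False if every attempt fails.
    The `sleep` parameter exists only so the pause can be replaced in tests.
    """
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            # Connection errors count as a failed attempt, not a fatal error.
            pass
        if attempt < attempts - 1:
            # Wait before retrying; skip the pause after the final attempt.
            sleep(delay)
    return False
```

If check opens a fresh connection by hostname on each call (for example with socket.create_connection), every retry performs a new lookup, so a later attempt may land on a different domain controller. That matches the "with luck" behaviour described above, not real failover.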
#7 Updated by Timur Bakeyev over 2 years ago
- Status changed from Not Started to Closed
- Target version changed from 11.2-RC2 to N/A
- Needs QA changed from Yes to No
- Needs Doc changed from Yes to No
- Needs Merging changed from Yes to No
Unfortunately, the service monitor can in certain cases do more harm than good, so if it causes too much trouble I'd recommend disabling it.
Recent versions of FreeNAS tried to address some of the AD/DC availability detection problems, so please try 11.1-U5 or 11.2b1.
Samba, winbindd in particular, may sometimes get stuck accessing a dead DC over and over again, and a simple reload doesn't help in such a case (or did you manage to get consistent recovery with reload?).
In future versions we'll try to restart only winbindd in such situations, which should reduce the impact caused by AD monitoring. In general, we learned that AD/DC can break in very sophisticated ways, and hence the monitoring has to be more sophisticated as well.
So, we keep improving the monitoring. Try a recent enough FN version; if that still doesn't work as desired, disable the monitoring and open a ticket describing the situation that led to the failure.