Bug #25363

Active Directory fails to join domain after upgrade from 11.0-RELEASE to 11.0-U2

Added by Trevor Hennessy about 3 years ago. Updated almost 3 years ago.

Status: Closed: Insufficient Info
Priority: No priority
Assignee: Timur Bakeyev
Category: OS
Target version:
Seen in:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:
ChangeLog Required: No

Description

I upgraded to 11.0-U2 last week, and AD will not rejoin the domain. Below is the output I get when following the troubleshooting section of the user guide.

root@tython:~ # sqlite3 /data/freenas-v1.db "update directoryservice_activedirectory set ad_enable=1;"
root@tython:~ # echo $?
0
root@tython:~ # service ix-kerberos start
root@tython:~ # service ix-nsswitch start
root@tython:~ # service ix-kinit start
root@tython:~ # service ix-kinit status
root@tython:~ # echo $?
0
root@tython:~ # klist
Credentials cache: FILE:/tmp/krb5cc_0
Principal:

Issued                Expires               Principal
Jul 31 07:32:37 2017 Jul 31 17:32:37 2017
root@tython:~ # python /usr/local/www/freenasUI/middleware/notifier.py start cifs
True
root@tython:~ # service ix-activedirectory start
Join is OK
root@tython:~ # service ix-activedirectory status
gse_get_client_auth_token: gss_init_sec_context failed with [ Miscellaneous failure (see text): TGT has been revoked](2529638932)
gse_get_client_auth_token: gss_init_sec_context failed with [ Miscellaneous failure (see text): TGT has been revoked](2529638932)
root@tython:~ # echo $?
1

Related issues

Related to FreeNAS - Bug #25264: Active Directory Recovery Attempts (Closed, 2017-07-22)

History

#1 Updated by Trevor Hennessy about 3 years ago

  • File debug-tython-20170731081337.txz added

#2 Updated by Dru Lavigne about 3 years ago

  • Status changed from Unscreened to 15
  • Seen in changed from Unspecified to 11.0-U2

Trevor: that sounds like a timing issue. What is the output of ntpdate -q AD?

#3 Updated by Trevor Hennessy about 3 years ago

Dru Lavigne wrote:

Trevor: that sounds like a timing issue. What is the output of ntpdate -q AD?

See below. I assumed I needed to replace "AD" with the domain FQDN.

root@tython:~ # ntpdate -q corusent.intra
server 10.2.17.204, stratum 3, offset -0.011783, delay 0.05312
server 172.19.28.17, stratum 4, offset -0.043233, delay 0.06006
server 192.168.100.247, stratum 4, offset -0.019380, delay 0.05646
server 10.3.148.55, stratum 3, offset -0.013825, delay 0.08900
server 10.3.225.15, stratum 16, offset 3.632399, delay 0.05341
server 192.168.24.113, stratum 16, offset 0.188069, delay 0.10042
server 10.3.135.55, stratum 3, offset -0.024774, delay 0.11362
server 10.2.17.200, stratum 3, offset -0.032769, delay 0.05312
server 10.17.26.55, stratum 3, offset -0.031893, delay 0.06021
server 10.2.19.200, stratum 2, offset -0.020880, delay 0.05327
server 172.20.44.10, stratum 3, offset -0.020784, delay 0.05566
server 10.3.143.55, stratum 4, offset -0.028744, delay 0.09140
server 172.19.0.130, stratum 3, offset -0.029462, delay 0.05731
server 10.2.17.203, stratum 3, offset -0.021420, delay 0.05315
server 10.4.17.11, stratum 3, offset -0.017665, delay 0.05325
server 10.3.225.17, stratum 16, offset 5.392967, delay 0.05371
server 172.27.0.50, stratum 3, offset -0.020069, delay 0.07491
server 172.24.0.25, stratum 3, offset -0.028727, delay 0.08856
server 10.17.34.55, stratum 3, offset -0.024888, delay 0.06210
server 10.4.24.11, stratum 3, offset -0.058224, delay 0.05325
31 Jul 10:34:36 ntpdate: adjust time server 10.2.19.200 offset -0.020880 sec

#4 Updated by Dru Lavigne about 3 years ago

  • Status changed from 15 to Unscreened
  • Assignee changed from Release Council to Timur Bakeyev

#5 Updated by Timur Bakeyev about 3 years ago

By "AD" I meant the address of the Domain Controller that your FreeNAS box joins. Can you please send the output of:

# ntpdate -q AD/DC.Server.FQDN
# ntpq -c rv

From the output you provided, most of the servers in your network do have synchronized time, but to be sure, let's check how the FreeNAS (FN) clock relates to the DC clock.

Also, a couple of servers are out of sync (they report stratum 16); I'm not sure how they participate in the network:

server 10.3.225.15, stratum 16, offset 3.632399, delay 0.05341
server 192.168.24.113, stratum 16, offset 0.188069, delay 0.10042
server 10.3.225.17, stratum 16, offset 5.392967, delay 0.05371
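To make the check above repeatable, here is a minimal Python sketch (an illustration, not a FreeNAS tool) that parses `ntpdate -q` output in the format shown in this thread and flags servers reporting stratum 16, which NTP uses to mean "unsynchronized", or an offset larger than a threshold. The helper names `parse_ntpdate` and `suspicious` are hypothetical, and the 1-second `max_offset` default is an arbitrary illustrative value, not a FreeNAS, Samba, or Kerberos setting.

```python
import re
from typing import NamedTuple

# Matches lines such as:
#   server 10.3.225.15, stratum 16, offset 3.632399, delay 0.05341
# The trailing "adjust time server ..." summary line does not match.
LINE_RE = re.compile(
    r"server\s+(?P<addr>\S+),\s+stratum\s+(?P<stratum>\d+),\s+"
    r"offset\s+(?P<offset>-?\d+(?:\.\d+)?),\s+delay\s+(?P<delay>\d+(?:\.\d+)?)"
)

UNSYNCHRONIZED_STRATUM = 16  # NTP reserves stratum 16 for "not synchronized"


class Server(NamedTuple):
    addr: str
    stratum: int
    offset: float  # seconds, as printed by ntpdate -q
    delay: float   # seconds


def parse_ntpdate(output: str) -> list[Server]:
    """Return one Server record per 'server ...' line of ntpdate -q output."""
    servers = []
    for line in output.splitlines():
        m = LINE_RE.search(line)
        if m:
            servers.append(Server(m["addr"], int(m["stratum"]),
                                  float(m["offset"]), float(m["delay"])))
    return servers


def suspicious(servers: list[Server], max_offset: float = 1.0) -> list[Server]:
    """Servers that are unsynchronized or whose |offset| exceeds max_offset seconds."""
    return [s for s in servers
            if s.stratum >= UNSYNCHRONIZED_STRATUM or abs(s.offset) > max_offset]


if __name__ == "__main__":
    # A few lines copied from the ntpdate -q output earlier in this thread.
    sample = """\
server 10.2.17.204, stratum 3, offset -0.011783, delay 0.05312
server 10.3.225.15, stratum 16, offset 3.632399, delay 0.05341
server 192.168.24.113, stratum 16, offset 0.188069, delay 0.10042
server 10.3.225.17, stratum 16, offset 5.392967, delay 0.05371
31 Jul 10:34:36 ntpdate: adjust time server 10.2.19.200 offset -0.020880 sec
"""
    for s in suspicious(parse_ntpdate(sample)):
        print(s.addr, s.stratum, s.offset)
```

Note that 192.168.24.113 is flagged even though its offset is small: a stratum 16 server is not tied to any reference clock, so its current offset says little about where it will drift next.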

#6 Updated by Trevor Hennessy about 3 years ago

All of those machines are our domain controllers, so any one of them can handle a join to our domain. In this case, however, I ran ntpdate against the primary one.

root@tython:~ # ntpdate -q cptordomain001.corusent.intra
server 10.2.17.200, stratum 3, offset -0.003422, delay 0.05312
31 Jul 15:28:10 ntpdate[49791]: adjust time server 10.2.17.200 offset -0.003422 sec

root@tython:~ # ntpq -c rv
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p10-a (1)", processor="amd64",
system="FreeBSD/11.0-STABLE", leap=00, stratum=2, precision=-24,
rootdelay=12.702, rootdisp=528.805, refid=206.108.0.133,
reftime=dd2a029b.0ceda8aa Mon, Jul 31 2017 15:25:15.050,
clock=dd2a038c.a7cf2dd5 Mon, Jul 31 2017 15:29:16.655, peer=25455,
tc=10, mintc=3, offset=0.957962, frequency=22.642, sys_jitter=4.261878,
clk_jitter=0.599, clk_wander=0.116
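`ntpq -c rv` reports its timing variables (offset, sys_jitter, rootdelay, rootdisp) in milliseconds, so offset=0.957962 above is under one millisecond. A minimal Python sketch (an illustration, not a FreeNAS tool) that extracts the key=value pairs from this output, converts the timing fields to seconds, and checks the offset against an assumed 300-second Kerberos clock-skew window. The 300 seconds is the commonly used default tolerance and is an assumption about this realm; `parse_rv`, `ms_to_s`, and `within_kerberos_skew` are hypothetical helper names.

```python
import re

# key=value pairs as printed by `ntpq -c rv`; a value is either a "quoted string"
# or runs up to the next comma or whitespace. Bare flags such as sync_ntp are skipped.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')

# Assumption: the common default Kerberos clock-skew tolerance of 5 minutes.
KERBEROS_MAX_SKEW_S = 300.0
UNSYNCHRONIZED_STRATUM = 16


def parse_rv(output: str) -> dict[str, str]:
    """Map each system variable to its raw string value (quotes stripped)."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(output)}


def ms_to_s(rv: dict[str, str], name: str) -> float:
    """Convert a millisecond-valued ntpq variable to seconds."""
    return float(rv[name]) / 1000.0


def within_kerberos_skew(rv: dict[str, str],
                         max_skew_s: float = KERBEROS_MAX_SKEW_S) -> bool:
    """True if ntpd reports a synchronized stratum and |offset| < max_skew_s."""
    synchronized = int(rv.get("stratum", "16")) < UNSYNCHRONIZED_STRATUM
    return synchronized and abs(ms_to_s(rv, "offset")) < max_skew_s


if __name__ == "__main__":
    # The ntpq -c rv output from this thread.
    sample = """\
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p10-a (1)", processor="amd64",
system="FreeBSD/11.0-STABLE", leap=00, stratum=2, precision=-24,
rootdelay=12.702, rootdisp=528.805, refid=206.108.0.133,
reftime=dd2a029b.0ceda8aa Mon, Jul 31 2017 15:25:15.050,
clock=dd2a038c.a7cf2dd5 Mon, Jul 31 2017 15:29:16.655, peer=25455,
tc=10, mintc=3, offset=0.957962, frequency=22.642, sys_jitter=4.261878,
clk_jitter=0.599, clk_wander=0.116
"""
    rv = parse_rv(sample)
    for name in ("offset", "sys_jitter", "rootdelay", "rootdisp"):
        print(f"{name}: {ms_to_s(rv, name):.6f} s")
    print("within Kerberos skew:", within_kerberos_skew(rv))
```

The skew check alone passes easily here; the fields worth watching for an unstable time source are sys_jitter and rootdisp, which this sketch only prints rather than judges.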

#7 Updated by Timur Bakeyev about 3 years ago

  • Status changed from Unscreened to Screened
  • Target version set to 11.0-U3

#8 Updated by Timur Bakeyev about 3 years ago

  • Status changed from Screened to 15

Sorry, somehow I forgot to reply to this ticket.

To be honest, I'm a bit confused by the topology of your AD network. Can you describe which servers serve which domains and how you connect your FN box to them?

The NTP question is still valid: at least 3 servers in your domain are completely out of sync, and the ntpq output also suggests that the time source is quite unstable.

In general, although it is not mandatory (why?), AD members are recommended to synchronize their clocks with the AD DC rather than with third-party servers. So, try removing the predefined NTP servers under System -> General -> NTP servers and adding the IP addresses or hostnames of your DCs instead.

Also, even though your server has 32 GB of memory, I see:

[2017/07/31 08:12:26.283021,  1] ../source3/winbindd/winbindd_ads.c:729(lookup_usergroups_memberof)
  lookup_usergroups_memberof ads_search member=CN=PromoB101,OU=CRB - Service and Generic Accounts,OU=CRB - Corus Radio Barrie,DC=corusent,DC=intra: Out of memory
[2017/07/31 08:12:26.286340,  1] ../source3/libads/ldap_utils.c:334(ads_ranged_search_internal)
  could not pull first usnChanged!
[2017/07/31 08:12:26.286363,  1] ../source3/winbindd/winbindd_ads.c:729(lookup_usergroups_memberof)
  lookup_usergroups_memberof ads_search member=CN=PeakContests,OU=CRB - Service and Generic Accounts,OU=CRB - Corus Radio Barrie,DC=corusent,DC=intra: Out of memory
[2017/07/31 08:12:26.289532,  1] ../source3/libads/ldap_utils.c:334(ads_ranged_search_internal)
  could not pull first usnChanged!
[2017/07/31 08:12:26.289556,  1] ../source3/winbindd/winbindd_ads.c:729(lookup_usergroups_memberof)
  lookup_usergroups_memberof ads_search member=CN=CC Test,OU=Users,OU=Toronto,DC=corusent,DC=intra: Out of memory

I'm curious: how many user entries does your AD have? Lack of memory could also be the cause of the strange Kerberos behaviour.

#9 Updated by Trevor Hennessy about 3 years ago

We only have one AD domain. The servers shown by ntpdate -q corusent.intra are the main group of domain controllers. As for the topology, we have over 20 offices around the world, and each location has at least one AD server in case of WAN disruptions.

As for the number of entries in AD, I would ballpark it at around 15,000, including computers, users and security groups. I did disable the user/group cache, so shouldn't that get around the memory limitations?

Related to this, I tried our other FreeNAS box that we're testing. It has exactly the same hardware and FreeNAS version as this one but had never been joined to AD before. I joined it to our domain and that went off without a hitch. However, it keeps complaining about failing to recover the AD service. I can't find anything in the logs indicating why it needs to recover anything; it looks like it's working, since I can see the machine in the AD admin console and set AD file share permissions from a Windows machine.

#10 Updated by Dru Lavigne about 3 years ago

  • Status changed from 15 to Investigation
  • Target version changed from 11.0-U3 to 11.1

#11 Updated by Timur Bakeyev about 3 years ago

  • Related to Bug #25264: Active Directory Recovery Attempts added

#12 Updated by Timur Bakeyev about 3 years ago

I believe the false alert about failing to recover AD has been fixed in the nightly builds, and the fix should be in 11.0-U3, which I hope will be released next week. So, if that is the only trouble you have with FreeNAS now, we can wait for the upcoming release and verify that everything works as intended.

Recently I've seen a deployment with over 60K users in the domain, and apart from being a bit slow it worked OK. So something else must be causing this OOM message. If you still see it in the logs, we can try running SMB with a higher debug level and see where it runs out of memory.

The caching you referred to is a different sort of caching; it only plays a role in speeding up getpwnam/getgrnam lookups. So it must be some unusual configuration in your AD, such as multi-level nested groups or just very wide groups.

#13 Updated by Kris Moore almost 3 years ago

  • Target version changed from 11.1 to 11.1-U1

#14 Updated by Timur Bakeyev almost 3 years ago

Hi, Trevor!

Do you have any updates on this issue?

#15 Updated by Dru Lavigne almost 3 years ago

  • Status changed from Investigation to Closed: Insufficient Info
  • Target version changed from 11.1-U1 to N/A
  • Private changed from Yes to No

Closing out.

#16 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-tython-20170731081337.txz)
