Bug #9589

AD timeout not honored when booting and DC is not available

Added by Thomas Stather over 4 years ago. Updated about 3 years ago.

Status:
Closed: Not To Be Fixed
Priority:
Nice to have
Assignee:
Erin Clark
Category:
OS
Target version:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

Hi

This is a follow-up to ticket #8013, which is marked as resolved, although I think it is not.

There it was stated that when the timeout is set in the directory settings (in my case 10 seconds) and during boot the DC is not available, the FreeNAS system continues to boot without the AD join (which can be done later manually as soon as the GUI comes up).

This is, however, not the case: the FreeNAS box hangs on boot at "generating host.conf" and resets itself (I think due to a watchdog) after 1 minute or so.

John Hixson said I should try the latest nightly (one month ago). I tested it yesterday without success.

"kinit" in state "[connect]"

when i press ctrl+c it boots until

starting ntpd

and gets stuck again. When I then press ctrl+t, I see

python2.7 2441 [select]

ctrl+c again it gets stuck at

...ix-pre-samba running

ctrl+c again it gets stuck at

...ix-kinit running

ix-pre-samba

ctrl+c again it boots

I don't think it matters that it is unusual to have the DC virtualized (as in my case); the FreeNAS box should boot in any case if it cannot find a DC (this situation can also occur due to networking problems that are resolved after the FreeNAS box boots).

Best,
Thomas

storage-test.jpg (162 KB), Thomas Stather, 06/13/2015 02:05 PM

Related issues

Related to FreeNAS - Bug #8013: please help latest 9.3 won't reboot, stuck at "generating host.conf" (Resolved, 2015-02-13)

History

#1 Updated by Thomas Stather over 4 years ago

  • File debug-storage-20150505093805.tgz added

attached the debug info if needed

#2 Updated by Jordan Hubbard over 4 years ago

  • Category set to 36
  • Assignee set to John Hixson
  • Target version set to Unspecified

#3 Updated by Jordan Hubbard over 4 years ago

  • Related to Bug #8013: please help latest 9.3 won't reboot, stuck at "generating host.conf" added

#4 Updated by John Hixson over 4 years ago

  • Status changed from Unscreened to Screened

I agree with what you are saying here. I resolved it because I am unable to reproduce your problem. I can boot here just fine while FreeNAS is configured for AD and no DCs are available. Perhaps you can show me this personally over a TeamViewer session?

#5 Updated by John Hixson over 4 years ago

  • Status changed from Screened to 15

#6 Updated by Anthony Takata over 4 years ago

I'm not sure what's the difference here as well. I'll try to grab some logs later, but in my experience the boot continues, though with odd errors (like random "[ is not a number" and "False" lines). After boot, some weird finagling needs to be done to properly rejoin the domain, but it doesn't get stuck.

#7 Updated by Thomas Stather over 4 years ago

Hi

Yes i can do this, no problem. Please send me an email once you are available (i live in germany so what was the time difference?).

#8 Updated by John Hixson over 4 years ago

Thomas Stather wrote:

Hi

Yes i can do this, no problem. Please send me an email once you are available (i live in germany so what was the time difference?).

Thomas,

I need your email address ;-) Or you can send me an email:

Let me know what works for you and I'll try to schedule something.

#9 Updated by Thomas Stather over 4 years ago

Hi

I am available right now, if you want to connect, my Teamviewer ID is 301 366 106

:)

#10 Updated by Thomas Stather over 4 years ago

i sent you an email with the password

#11 Updated by Thomas Stather over 4 years ago

Hi

I am available until midnight here (it's currently 9:05 pm), so just try to connect; you have my email with the password.
What is the time difference between your time zone and mine (Germany)? :)

#12 Updated by Thomas Stather over 4 years ago

I'm offline for today. Please tell me what the time difference is, then I can tell you when I will be available tomorrow to hopefully get this sorted out.

#13 Updated by John Hixson over 4 years ago

Thomas Stather wrote:

I'm offline for today, please tell me whats the time difference, then i can tell you when i will be available tomorrow to get this sorted out hopefully.

I am in California (PDT). I'm pretty flexible as long as I know when you're available and we schedule ahead of time.

#14 Updated by Thomas Stather over 4 years ago

Hi

So you are 9 hours behind the time in germany. Can we do a teamviewer session today?
I'd propose 11pm which is 2pm in your timezone, ok?

Best,
Thomas

#15 Updated by John Hixson over 4 years ago

Thomas Stather wrote:

Hi

So you are 9 hours behind the time in germany. Can we do a teamviewer session today?
I'd propose 11pm which is 2pm in your timezone, ok?

Yes, that works for me. Send me your info:

Best,
Thomas

#16 Updated by Thomas Stather over 4 years ago

Sorry, I meant yesterday evening at 11pm. From today on, I'm not available until Sunday. I'll contact you again then so we can fix this :)

#17 Updated by Jordan Hubbard over 4 years ago

Please give all times in GMT when attempting to coordinate teamviewer sessions. It's too confusing otherwise.

#18 Updated by John Hixson over 4 years ago

Jordan Hubbard wrote:

Please give all times in GMT when attempting to coordinate teamviewer sessions. It's too confusing otherwise.

And on that note, can you please tell me when you are available in GMT? ;-)

#19 Updated by John Hixson over 4 years ago

Waiting to hear back from Thomas on this.

#20 Updated by John Hixson over 4 years ago

John Hixson wrote:

Waiting to hear back from Thomas on this.

Still waiting to hear from Thomas.

#21 Updated by Thomas Stather over 4 years ago

Hi

It would be ok for me today at 7pm GMT (which is 9pm in my local time).

#22 Updated by John Hixson over 4 years ago

Thomas Stather wrote:

Hi

It would be ok for me today at 7pm GMT (which is 9pm in my local time).

Hi Thomas,

We will need to schedule something tomorrow. How about 7pm GMT tomorrow ?

#23 Updated by Thomas Stather over 4 years ago

7pm GMT for me (9pm at my local time) is fine.

I'll send you my TeamViewer ID then.

Best,

Thomas

#24 Updated by John Hixson over 4 years ago

  • Status changed from 15 to Investigation

After spending time on this issue, I've found multiple problems that need addressing:

We try to determine the best DC to use based on latency. We look up the SRV records, then attempt a connection to each one while summing up the latency for each. This generally works fine, and I'm not exactly clear why it is not working here. I didn't take much time to figure that out, only that it was part of the problem. So using non-blocking I/O on the connect or, in Python's case, setting a timeout on the socket is necessary here. Also, if only a single SRV record is returned, it isn't necessary to take this code path at all. In the AsyncConnect class, there is an extra parameter (count) that should also be set. Again, I'm not clear (yet) on why this doesn't work correctly in Thomas's environment. In a regular configuration with a DC down, I'm unable to reproduce this. While tracing through the various code paths involved, I also found places where better checks are necessary where get_best_host() is used. The last part of this, and ultimately the ideal goal, is for the AD code to back off if DCs aren't available, periodically attempt a connection, and resume.
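
For readers following along, here is a minimal sketch of the idea described in this comment: probe each candidate DC with a bounded connect timeout, skip the probing when only one SRV record came back, and tolerate the case where nothing answers. It is written for Python 3 and is not the FreeNAS code; `measure_latency` and `pick_best_host` are hypothetical names, and the real `get_best_host()` and `AsyncConnect` may work quite differently.

```python
import socket
import time


def measure_latency(host, port, timeout=2.0):
    """Return the TCP connect time to (host, port) in seconds, or None on failure."""
    start = time.monotonic()
    try:
        # `timeout` bounds the connect attempt, so an unreachable host
        # cannot block the caller indefinitely.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # covers timeouts, refused connections, DNS failures
        return None
    sock.close()
    return time.monotonic() - start


def pick_best_host(candidates, timeout=2.0, probe=measure_latency):
    """Pick the lowest-latency (host, port) pair, or None if none respond."""
    candidates = list(candidates)
    if len(candidates) == 1:
        # Only one SRV record: nothing to compare, so skip probing.
        return candidates[0]
    best, best_latency = None, None
    for host, port in candidates:
        latency = probe(host, port, timeout)
        if latency is None:
            continue  # host did not answer within the timeout
        if best_latency is None or latency < best_latency:
            best, best_latency = (host, port), latency
    return best
```

Callers would need to handle a `None` result, for example by continuing the boot without the AD join, which is the kind of check the comment says is missing around `get_best_host()`.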

#25 Updated by John Hixson over 4 years ago

This is going to take a bit of time to do correctly. I haven't had the time yet. When more time becomes available (less tickets..), I'll start hammering away at this.

#26 Updated by Anthony Takata over 4 years ago

In the meantime at least, the workaround I posted in #8013 seems to be working fairly well, with a few hiccups.

#27 Updated by John Hixson over 4 years ago

Yeah, the workaround will have to do for a while. I'm still unsure when I'll have the time to do this. I look pretty booked for at least the next few weeks.

#28 Updated by John Hixson over 4 years ago

on hold, but want to pursue soon

#29 Updated by John Hixson over 4 years ago

I made a commit that is probably relevant to this, see f06fb2f557e8d5b3ceeee7fd75d2ca1a6fb2d411.

#30 Updated by Thomas Stather over 4 years ago

Should i now try the latest nightly? :)

#31 Updated by John Hixson over 4 years ago

  • Status changed from Investigation to 15

Thomas Stather wrote:

Should i now try the latest nightly? :)

Yes, please do. Report back what happens and if it's satisfactory for you. We still need to support this properly, but hopefully this does what you need for now instead of you doing it manually.

#32 Updated by John Hixson over 4 years ago

Thomas,

Have you had a chance to test out a nightly yet?

#33 Updated by Thomas Stather over 4 years ago

Hi

To be honest, not yet; I haven't had time, as the company is very busy. I'll try tomorrow and report back right away :)

#34 Updated by Thomas Stather over 4 years ago

Hi

No luck, see attached screenshot. The system halts if the DC is not available.
When do you have time for a Teamviewer session, if you want to take a look?

#35 Updated by John Hixson over 4 years ago

Hi Thomas,

Send me your availability again ;-)

#36 Updated by Thomas Stather over 4 years ago

Hi

Today 7pm GMT for me (9pm at my local time) would be fine.

#37 Updated by John Hixson over 4 years ago

  • Status changed from 15 to Investigation

Did a teamviewer with Thomas and found pretty much what I had previously documented in this ticket is still a problem. I told Thomas I will make some more fixes and ping him when I do so that he can verify them.

#38 Updated by Thomas Stather over 4 years ago

OK, I'll wait. Just tell me when I can test again (I know it takes some time) :)

#39 Updated by John Hixson over 4 years ago

The need for these fixes is starting to rear its ugly head. I'm hoping to have time for this soon. I just need to clear out some tickets first.

#40 Updated by John Hixson over 4 years ago

This is still pushed back

#41 Updated by John Hixson over 4 years ago

This is still pushed back

#42 Updated by John Hixson over 4 years ago

still on hold

#43 Updated by John Hixson over 4 years ago

still on hold

#44 Updated by John Hixson over 4 years ago

This is still on hold

#45 Updated by John Hixson over 4 years ago

This is still on hold

#46 Updated by John Hixson over 4 years ago

I haven't had time to look at this yet.

#47 Updated by John Hixson over 4 years ago

I haven't had time to look at this yet.

#48 Updated by John Hixson over 4 years ago

I haven't had time to look at this yet.

#49 Updated by John Hixson over 4 years ago

Still haven't had time for this

#50 Updated by John Hixson over 4 years ago

I will probably be exploring this soon

#51 Updated by John Hixson over 4 years ago

Closing in on this one soon, next week maybe?

#52 Updated by John Hixson over 4 years ago

It's looking like I'll probably start this sometime this week. One or two tickets ahead of this, and if I can clear those up then it's a go.

#53 Updated by John Hixson over 4 years ago

still on hold

#54 Updated by John Hixson over 4 years ago

Well, every time I think I'm getting close to having time for this (and some others), more tickets appear, so, still on hold ;-/

#55 Updated by John Hixson over 4 years ago

John Hixson wrote:

Well, every time I think I'm getting close to having time for this (and some others), more tickets appear, so, still on hold ;-/

This is exactly where I'm at with this ticket still.

#56 Updated by Thomas Stather over 4 years ago

Any new progress yet? :)

#57 Updated by John Hixson over 4 years ago

Thomas Stather wrote:

Any new progress yet? :)

Thomas, unfortunately, no. Higher priority items are still in my queue. This is still on my wish list, but not a priority.

#58 Updated by John Hixson over 4 years ago

This is still on hold

#59 Updated by Rex Wheeler over 4 years ago

I have hit this issue as well, but I suspect it may be more of a DNS thing than an AD thing.

In my scenario I have a FreeNAS box serving iSCSI for ESXi, NFS for Linux, and CIFS for Windows shares. Like the original poster, I have a circular dependency: FreeNAS serves iSCSI for ESXi, which hosts my domain controllers, and FreeNAS is a member of that domain. I know this is not a good configuration, but this is a home lab and I am cheap. That being said, when I was on Nexenta recently, before moving to FreeNAS, I didn't have this problem, as Nexenta starts serving iSCSI before needing any of the CIFS or AD domain stuff. Since iSCSI is not authenticated by Active Directory, it starts up first (under Nexenta). ESXi is pretty tolerant of iSCSI targets going away, and the VMs just pause until they come back.

Enough background. When I figured out what was happening to me, my first thought was that maybe FreeNAS needed a DNS server up and running in order to boot, so I moved one of my domain controllers to local disk on the ESXi box and tried again. With this configuration I was still hanging on boot (even though an operating DC and DNS were available on the network). It turns out that the domain controller (DNS server) that I made sure would be available to FreeNAS on boot was the second configured DNS server. When I moved both DNS servers to local ESXi storage, the boot worked properly. Any chance that the hang during boot is because something trying to use DNS isn't properly trying the secondary DNS server if the primary DNS server is offline or times out? Note that in my case the primary DNS server may have been "sort of" there: the VM was running but didn't have any disk I/O available to it. Perhaps it accepted the DNS TCP/IP connection, but wouldn't actually answer a query? If this is the case, detecting DNS timeouts and rolling over to a secondary DNS server may be difficult.
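
To make the failover question concrete, here is a minimal sketch in Python 3 of what "try the secondary if the primary times out" could look like. It is only an illustration, not how FreeNAS or the system resolver behaves: `resolve_with_failover` is a hypothetical helper, and `query` stands in for whatever actually sends the DNS request. The point is that a per-server timeout has to cover the reply as well as the connection, so a server that accepts the connection but never answers is still skipped.

```python
def resolve_with_failover(name, servers, query, timeout=2.0):
    """Try each DNS server in order and return the first answer obtained.

    `query(server, name, timeout)` is a caller-supplied (hypothetical)
    function that returns an answer, or raises OSError (which includes
    TimeoutError) when the server is unreachable or never replies.
    """
    errors = []
    for server in servers:
        try:
            return query(server, name, timeout)
        except OSError as exc:  # timeout, refused connection, unreachable host
            errors.append((server, exc))
    raise LookupError("no DNS server answered for %s: %r" % (name, errors))
```

With two configured servers, a half-alive primary costs at most one `timeout` interval before the secondary is asked.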

#60 Updated by Anthony Takata over 4 years ago

You may be on to something with the failover. In my situation, since my VMs are hosted in a FreeNAS jail, it is literally impossible to have the DC available during boot, no matter what the DNS configuration is.
Specific to me, I just need the services to keep trying past boot in order to enable (with a possible UI notification that services were delayed).

#61 Updated by John Hixson about 4 years ago

I think all of this comes down to having some kind of mechanism in the AD system that tries to periodically join for a certain interval. This is definitely on my TODO list, it just isn't a priority right now. I'll keep you posted ;-)
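
A minimal sketch of such a mechanism, assuming Python 3: keep calling a join function with a growing delay until it succeeds or an overall interval runs out. `retry_join` and its parameters are hypothetical names chosen for illustration, and `join` stands in for whatever performs the actual AD join; nothing here is taken from the FreeNAS code.

```python
import time


def retry_join(join, max_wait=600.0, first_delay=5.0, max_delay=60.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call join() until it returns True or max_wait seconds have elapsed.

    Returns True on success and False if the interval ran out.
    """
    deadline = clock() + max_wait
    delay = first_delay
    while True:
        if join():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Back off: wait, then double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Run from a background thread or a separate service, a loop like this would let the boot finish first and attempt the join afterwards, instead of blocking at "generating host.conf".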

#62 Updated by Jordan Hubbard almost 4 years ago

  • Assignee changed from John Hixson to Wojciech Kloska

#63 Updated by Wojciech Kloska almost 4 years ago

  • Assignee changed from Wojciech Kloska to Erin Clark

#64 Updated by Kris Moore over 3 years ago

  • Assignee changed from Erin Clark to John Hixson
  • Priority changed from No priority to Nice to have
  • Target version changed from Unspecified to 9.10.2

John,

Is this something you could still implement?

#65 Updated by Kris Moore about 3 years ago

  • Assignee changed from John Hixson to Erin Clark

#66 Updated by Kris Moore about 3 years ago

  • Status changed from Investigation to Closed: Not To Be Fixed

At the moment there just aren't the resources to take a look at this, so I'm closing it as NTBF.

#67 Updated by Dru Lavigne about 2 years ago

  • File deleted (debug-storage-20150505093805.tgz)
