Bug #32043

Fix stuck process on TrueNAS shutdown

Added by Joshua Sirrine over 1 year ago. Updated 6 months ago.

Status:
Closed
Priority:
Nice to have
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
TrueNAS - TrueNAS 11.1-U4
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
No
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

I've seen on several systems (Z20-HA, X10-HA, and various customer boxes) that a shutdown gets stuck on a given process. This applies to the passive node on HA systems, or the only node on non-HA systems, since the active node on HA will sysctl panic instead. The wait takes a few minutes, but it eventually times out and the system shuts down.

This seems to have begun on 11.1-U4 and was not present on 11.0-U6. Note that I also saw this on other internal 11.1-U(x) builds of TrueNAS, and apparently nobody filed a bug or fixed it.


Related issues

Related to FreeNAS - Bug #74695: Active HA controller shutdown/reboot handling (Closed)

History

#1 Updated by Caleb St. John over 1 year ago

I've looked into this as well, and I believe the service in question is "consul_alerts".

It doesn't seem to want to honor the SIGTERM signal in the event of a shutdown/reboot.

#3 Updated by Ben Gadd over 1 year ago

  • Assignee changed from Release Council to Joe Maloney

#4 Updated by Dru Lavigne 10 months ago

  • Assignee changed from Joe Maloney to Alexander Motin
  • Target version changed from N/A to Backlog
  • Severity set to New

#5 Updated by Ryan Moeller 10 months ago

In case it is useful info:

The documented ways of stopping Consul are SIGINT and SIGKILL:
https://www.consul.io/docs/agent/basics.html#stopping-an-agent

This is due to change soon to SIGINT and SIGTERM:
https://github.com/hashicorp/consul/commit/491826ddbce16a08be135aaca53b0e814d888bd7
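For illustration only, here is a minimal sketch of the usual workaround for a daemon that does not exit on the signal the shutdown path sends it: deliver a signal it does document (SIGINT here), wait a bounded time, then fall back to SIGKILL. This is not the TrueNAS rc script or anything from the Consul code base. The function name `stop_gracefully` and the 10-second default are assumptions made for the example.

```python
import signal
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=10.0):
    """Ask a child process to stop with SIGINT, then escalate to SIGKILL.

    `proc` is a subprocess.Popen object. `grace_seconds` is a hypothetical
    default; pick a value that fits the service being stopped.
    Returns the name of the signal after which the process exited.
    """
    proc.send_signal(signal.SIGINT)
    try:
        # Bounded wait, so shutdown cannot hang for minutes on one service.
        proc.wait(timeout=grace_seconds)
        return "SIGINT"
    except subprocess.TimeoutExpired:
        # Popen.kill() sends SIGKILL on POSIX, which cannot be caught or ignored.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

A short, explicit grace period keeps a single unresponsive service from stalling the whole shutdown, at the cost of skipping that service's own cleanup when the fallback fires.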

#6 Updated by Caleb St. John 10 months ago

This ticket can be closed because consul-alerts was removed in commit: https://github.com/freenas/build/commit/914d29437f0d34364ba3abd9c9e0aa08903682b1

#7 Updated by Alexander Motin 10 months ago

Yes, in the narrow sense this problem was fixed. But in the wider sense there is still uncertainty about how an active controller reboot is handled: what happens if some services shut down while others get stuck, and what the client will see as a result instead of a clean failover. We should explicitly trigger failover on a reboot/shutdown request, which I guess may not be the case now. I need to check it.

#8 Updated by Alexander Motin 10 months ago

  • Status changed from Unscreened to Screened

#9 Updated by Kris Moore 10 months ago

  • Project changed from TrueNAS to FreeNAS
  • Category changed from OS to OS
  • Private changed from No to Yes
  • Migration Needed deleted (No)
  • Hide from ChangeLog deleted (No)
  • Support Department Priority deleted (0)

#10 Updated by Dru Lavigne 10 months ago

  • Target version changed from Backlog to 12.0

#11 Updated by Dru Lavigne 10 months ago

  • Target version changed from 12.0 to Backlog

#12 Updated by Alexander Motin 6 months ago

  • Related to Bug #74695: Active HA controller shutdown/reboot handling added

#13 Updated by Alexander Motin 6 months ago

  • Status changed from Screened to Closed
  • Private changed from Yes to No
  • Seen in changed from TrueNAS 11.1-U4 to N/A

I've created another ticket for the wider sense of the problem.

#14 Updated by Dru Lavigne 6 months ago

  • Target version changed from Backlog to N/A
  • Seen in changed from N/A to TrueNAS 11.1-U4
