Bug #16028

IPMI module bug with ASRock boards

Added by jared wechsler about 4 years ago. Updated over 3 years ago.

Status: Closed: Third party to resolve
Priority: Critical
Assignee: Kris Moore
Category: OS
Target version:
Severity: New
Reason for Closing:
Reason for Blocked:
Needs QA: Yes
Needs Doc: Yes
Needs Merging: Yes
Needs Automation: No
Support Suite Ticket: n/a
Hardware Configuration:

CPU: E3-1231v3
MB: ASRock E3C224D4I-14S
Ram: 32GB ECC UDIMM (2x Crucial CT2KIT102472BD160B)
HDD: 10x 4TB WD SE WD4000F9YZ (single vdev raid-z2)
Boot: 2x 16GB Innodisk SataDOM DESML-16GD06RC1DC (mirrored)
Network: Intel X710-DA2
Case: Lian-Li PC-Q26B
Fans: 5x 120mm COUGAR CF-V12HPB & 1x 80mm Noctua NF-R8
PSU: SeaSonic SSR-550RM
UPS: CyberPower CP1000PFCLCD

ChangeLog Required: No

Description

The IPMI module is confirmed to cause overheating on 3 different ASRock boards. At some point in the late evening, the fan speeds drop to the slowest setting, regardless of the speed set in the BIOS. I have woken up several times to a flood of alert emails about drives at extremely high temperatures, possibly voiding my warranty. I searched the forums and found this thread: https://forums.freenas.org/index.php?threads/disable-ipmi-kernel-module-in-9-3-resolved.25827/. Unloading the IPMI module does prevent this from occurring, but it isn't exactly a fix. The E3C224D2I, E3C224D4I-14S, and C2750D4I (MiniXL) are the 3 boards confirmed to be affected, but there may be other ASRock boards.

FreeNAS Version: FreeNAS-9.10-STABLE-201606072003
BIOS Version: 3.20
Firmware Revision: 0.16.0
Firmware Build Time: Dec 3 2015 21:34:14 CST


Related issues

Related to FreeNAS - Bug #20968: Add alert for end-user to update BMC firmware on FreeNAS mini (Resolved, 2017-02-08)

History

#1 Updated by Kris Moore about 4 years ago

  • Category changed from 32 to 76
  • Assignee changed from Jordan Hubbard to Josh Paetzel

#2 Updated by Josh Paetzel about 4 years ago

  • Status changed from Unscreened to Screened

For what it's worth, I have a Mini and it doesn't overheat.

#3 Updated by Sean Fagan about 4 years ago

Right now, interestingly enough, my Mini has a CPU temperature of 96°C, while the MiniXL right next to it has a CPU temperature of 37°C.

#4 Updated by Josh Paetzel about 4 years ago

  • Status changed from Screened to Fix In Progress

Ok, the latest BMC firmware for this board resolves this. I'll be attaching instructions to this ticket soon.

#5 Updated by Josh Paetzel about 4 years ago

  • File Mini_Bios_IPMI.zip added
  • Status changed from Fix In Progress to Closed: Third party to resolve

Attached is a PDF and firmware update for the C2750 board in the Mini which resolves this issue. I'm guessing similar updates are available for other ASRock boards; they were pretty up front about the fact that this is a firmware bug.

#6 Updated by jared wechsler about 4 years ago

Josh Paetzel wrote:

Attached is a PDF and firmware update for the C2750 board in the Mini which resolves this issue. I'm guessing similar updates are available for other ASRock boards; they were pretty up front about the fact that this is a firmware bug.

Did they confirm it was fixed in the BMC firmware version 00.27.00? If so I will reach out to them for an update for my board specifically.

#7 Updated by Josh Paetzel about 4 years ago

The BMC firmware version is 0.27.0, but they didn't disambiguate between BMC and BIOS updates. They made it seem like you needed both updates.

#8 Updated by jared wechsler about 4 years ago

Josh Paetzel wrote:

The BMC firmware version is 0.27.0, but they didn't disambiguate between BMC and BIOS updates. They made it seem like you needed both updates.

They usually release together. I will reach out to my contact there. Thanks for the update on this.

#9 Updated by Grzegorz Krzystek about 4 years ago

Josh Paetzel wrote:

Ok, the latest BMC firmware for this board resolves this. I'll be attaching instructions to this ticket soon.

I can hereby confirm that the combination of BIOS + BMC firmware updates does not solve the problem.
The IPMI on my Mini becomes unstable after some uptime (sometimes a couple of weeks, sometimes a couple of days).
The kernel complains:
Jun 30 08:22:52 ATUIN ipmi0: KCS: Failed to read address
Jun 30 08:22:52 ATUIN ipmi0: KCS error: 5f
Jun 30 08:22:53 ATUIN ipmi0: KCS: Failed to read command
Jun 30 08:22:53 ATUIN ipmi0: KCS error: 5f
Jun 30 08:24:33 ATUIN ipmi0: KCS: Failed to read address
Jun 30 08:24:33 ATUIN ipmi0: KCS error: 5f
Jun 30 08:24:39 ATUIN ipmi0: KCS: Failed to read address
Jun 30 08:24:39 ATUIN ipmi0: KCS error: 5f
Jun 30 08:24:45 ATUIN ipmi0: KCS: Failed to read address
Jun 30 08:24:45 ATUIN ipmi0: KCS error: 5f
Jun 30 08:24:45 ATUIN ipmi0: Failed to reset watchdog

This repeats over and over again.
The fans are at a lower speed and the board is overheating.
I am unable to reach the BMC via telnet/SSH/web, ipmitool is not responding, and the BMC activity LED is not blinking.
Only a full power reset solves the problem.
I have an open ticket with ASRock Rack support, but so far there has been no response.
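To spot this state quickly, the kernel messages quoted above can be tallied with a short script. This is only a sketch: the line format is assumed from the log excerpt in this comment, the function names are made up for illustration, and treating "Failed to reset watchdog" as a sign of a hung BMC is a heuristic, not something ASRock or the FreeBSD driver documents.

```python
import re
from collections import Counter

# Matches the ipmi driver prefix seen in the excerpt above ("ipmi0: ...").
# The unit number is optional because other reports show plain "ipmi: ...".
IPMI_LINE = re.compile(r"ipmi\d*: (?P<msg>.+)$")


def summarize_ipmi_errors(lines):
    """Count each distinct ipmi driver message in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = IPMI_LINE.search(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


def bmc_looks_hung(counts):
    """Heuristic (an assumption): a failed watchdog reset suggests a hung BMC."""
    return counts.get("Failed to reset watchdog", 0) > 0


# Sample lines copied from the log excerpt above.
sample = [
    "Jun 30 08:24:45 ATUIN ipmi0: KCS: Failed to read address",
    "Jun 30 08:24:45 ATUIN ipmi0: KCS error: 5f",
    "Jun 30 08:24:45 ATUIN ipmi0: Failed to reset watchdog",
]

if __name__ == "__main__":
    counts = summarize_ipmi_errors(sample)
    for message, count in counts.items():
        print(f"{count:4d}  {message}")
    print("BMC looks hung:", bmc_looks_hung(counts))
```

In practice the lines would come from the system log file rather than a hard-coded list.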

#10 Updated by Grzegorz Krzystek about 4 years ago

My board is an ASRock Rack C2750D4I, so you will have similar reports from all Mini and Mini XL users.

#11 Updated by Josh Paetzel about 4 years ago

It might be something specific to your system. That BIOS/BMC firmware has been available for months and hasn't posed a problem in general.

#12 Updated by Rosa Box about 4 years ago

Josh Paetzel wrote:

It might be something specific to your system. That BIOS/BMC firmware has been available for months and hasn't posed a problem in general.

I'm experiencing the same problem on my ASRock C2550D4I, BIOS 2.30, BMC 00.27.00.
I have my fans set to 800 rpm to keep my HDDs cool, but they randomly reset to 400 rpm, and I have this in the FreeNAS log:
ipmi: KCS: Failed to read address
ipmi: KCS error: 5f

#13 Updated by Frank Riley about 4 years ago

I also experience this problem regularly with the latest firmwares installed. I have the 2750 board. I have a cron job running once per hour that sets the fans to max speed to get around the issue.

#14 Updated by Grzegorz Krzystek about 4 years ago

Frank Riley wrote:

I also experience this problem regularly with the latest firmwares installed. I have the 2750 board. I have a cron job running once per hour that sets the fans to max speed to get around the issue.

The workaround for that problem is:

Disable the watchdog in the BIOS, and add a tunable:
watchdogd_enable, value = NO, type: rc.conf

The problem is in the BMC itself. Each time the watchdog timer is reset by watchdogd, the IPMIMain process in the BMC writes its configuration to flash. This wears out the flash, and when a flash write fails, the BMC's Linux kernel panics and the BMC stops working.
I have been running for 3 months now with the watchdog disabled and have had no more problems with my ASRock Rack boards (I have 3 of them).
I reported the BMC bug to ASRock Rack support, but the only information I got was that they are working on the problem.
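To double-check that the tunable took effect, the resulting rc.conf-style text can be inspected. The sketch below is an illustration only: the function names are hypothetical, the parser handles just simple `name="value"` lines, and where FreeNAS actually writes the tunable is not confirmed in this ticket.

```python
import re

# Simple rc.conf-style assignment: name="value" (quotes optional).
# Commented-out lines do not match because the name must start the line.
ASSIGNMENT = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)=["\']?([^"\'#\s]*)["\']?')


def rc_conf_value(text, name):
    """Return the last value assigned to `name` in rc.conf-style text, or None."""
    value = None
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1) == name:
            value = match.group(2)  # later assignments override earlier ones
    return value


def watchdogd_disabled(text):
    """True when watchdogd_enable is explicitly set to NO (case-insensitive)."""
    value = rc_conf_value(text, "watchdogd_enable")
    return value is not None and value.upper() == "NO"


if __name__ == "__main__":
    example = 'hostname="nas"\nwatchdogd_enable="NO"\n'
    print("watchdogd disabled:", watchdogd_disabled(example))
```

The check only reports an explicit NO; a missing entry returns False here, so the result should be read as "not explicitly disabled" rather than "enabled".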

#15 Updated by Grzegorz Krzystek about 4 years ago

AsrockRack support confirmed that they are working on this issue

#16 Updated by Josh Paetzel about 4 years ago

  • Status changed from Closed: Third party to resolve to Investigation
  • Priority changed from No priority to Critical

Do you have a ticket or anything that I can reference with asrock?

#17 Updated by Grzegorz Krzystek about 4 years ago

Josh Paetzel wrote:

Do you have a ticket or anything that I can reference with asrock?

They don't have a "normal" ticket tracking system.
Their support is based on mail threads.
My thread is called: $C2750D4I$ BMC Unstable - many issues (Poland)
I can forward you all the messages from the thread; I reported a couple of issues with the BMC firmware in it...

#18 Updated by Kris Moore about 4 years ago

  • Target version set to 9.10.1-U1

Setting target version. I realize this depends on upstream, but this is just to make sure it doesn't get lost in the shuffle when 9.10.1-U1 is rolled.

#19 Updated by Kris Moore about 4 years ago

  • Due date set to 09/19/2016

#20 Updated by Vaibhav Chauhan about 4 years ago

BRB: QA to confirm ASROCK fix is in for this problem.

#21 Updated by Vaibhav Chauhan about 4 years ago

  • Target version changed from 9.10.1-U1 to 9.10.1-U2

#22 Updated by Kris Moore almost 4 years ago

  • Target version changed from 9.10.1-U2 to 9.10.1-U3

#23 Updated by Josh Paetzel almost 4 years ago

For what it's worth, I am running watchdogd on my FreeNAS Mini with the fix from ticket 16190

The fan in my mini is at 1400RPM, CPU temp and motherboard temp are both at 65C (which is a tad high, but in spec)

Drives are at 35-40C.

I'll continue to monitor.

#24 Updated by Dru Lavigne almost 4 years ago

For changelog purposes, should this be punted to .2 or to future?

#25 Updated by Kris Moore almost 4 years ago

  • Target version changed from 9.10.1-U3 to 9.10.2

#26 Updated by Kris Moore almost 4 years ago

Josh - Anything else we can do with this prior to 9.10.2? Do we want to keep this ticket open just for tracking?

#27 Updated by Josh Paetzel almost 4 years ago

  • Target version changed from 9.10.2 to 9.10.2-U1

#28 Updated by Kris Moore over 3 years ago

  • Target version changed from 9.10.2-U1 to 9.10.2-U2

#29 Updated by Josh Paetzel over 3 years ago

  • Status changed from Investigation to Unscreened
  • Assignee changed from Josh Paetzel to Kris Moore

#30 Updated by Kris Moore over 3 years ago

  • Status changed from Unscreened to 15

Grzegorz / Jared, just asking for an update on this. Has ASRock provided any more information to you since the last updates here?

#31 Updated by Grzegorz Krzystek over 3 years ago

Kris Moore wrote:

Grzegorz / Jared, just asking for an update on this. Has ASRock provided any more information to you since the last updates here?

Nope. I pinged them, and the reply was:

Sorry nothing yet. Seeing the forum the best solution for now is to unload the ipmi kernel in freenas. ( strange thing is seems that this issue is only happening with freenas and not with other OS) https://forums.freenas.org/index.php?threads/disable-ipmi-kernel-module-in-9-3-resolved.25827/

So it looks like they will ignore the problem :S

#32 Updated by Marcus Paul over 3 years ago

I can confirm the problem and the workaround (unloading the IPMI module) for the ASROCK E3C226D2I. Got the same reaction from ASROCK btw.

#33 Updated by Josh Paetzel over 3 years ago

The CEO of Asrock was at our office this week. The feedback we received is there should be a firmware release that fixes this in a couple weeks.

#34 Updated by Grzegorz Krzystek over 3 years ago

Marcus Paul wrote:

I can confirm the problem and the workaround (unloading the IPMI module) for the ASROCK E3C226D2I. Got the same reaction from ASROCK btw

A better solution is to disable watchdogd on the system, as I mentioned above.
You keep all IPMI management available from the system, with no problems, since it was a watchdog-triggered problem in the IPMI.

Josh Paetzel wrote:

The CEO of Asrock was at our office this week. The feedback we received is there should be a firmware release that fixes this in a couple weeks.

That's actually great news. Thanks for the update :D

#35 Updated by Kris Moore over 3 years ago

  • Related to Bug #20968: Add alert for end-user to update BMC firmware on FreeNAS mini added

#36 Updated by Kris Moore over 3 years ago

  • Status changed from 15 to Closed: Third party to resolve

Looks like the firmware is about to drop. Marking this as third-party and also as related to the bug for adding an alert to update.

#37 Updated by Kris Moore over 3 years ago

  • Target version changed from 9.10.2-U2 to N/A

#38 Updated by Chris Casanova over 3 years ago

This is still an issue. I have an E3C226D4I board myself and have been having issues for almost a year. Stumbled across this post and [https://forums.freenas.org/index.php?threads/disable-ipmi-kernel-module-in-9-3-resolved.25827/]

Disabling ipmi and watchdogd seems to fix it. My symptoms involved IPMI claiming the CPU temp was 300C. Yes... Celsius. I ran a command within FreeNAS to get hardware temperatures, and the CPU was stable around a normal temp of 30C. This would happen every couple of days, requiring me to completely pull power from the server and let the BMC reinitialize. I have not seen this recur since disabling IPMI and watchdogd within FreeNAS; it's been about 8 days now, so I am ready to deem it "fixed" via workaround. ASRock was not much help here, other than blaming me for hardware misconfiguration or bending my CPU pins. I sent this bug report and the link above to ASRock for their reference for future occurrences. FreeNAS folks: I would recommend a toggle switch for IPMI/watchdogd within the web GUI, with a small note, to make this easier in the future. I had no clue the OS hooked back into the IPMI by default.
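The mismatch described above (IPMI reporting 300C while the OS reports about 30C) is the kind of thing a simple plausibility check can catch. The following is a sketch under stated assumptions: the function name and both thresholds are illustrative choices, not values from ASRock or FreeNAS, and how the two readings are obtained is left out.

```python
def bmc_reading_suspect(bmc_temp_c, os_temp_c,
                        max_plausible_c=110.0, max_delta_c=20.0):
    """Flag a BMC temperature that is implausible or far from the OS reading.

    Both thresholds are illustrative assumptions, not vendor limits.
    """
    if bmc_temp_c > max_plausible_c:
        return True  # beyond anything a working CPU sensor should report
    # Otherwise compare against the temperature the OS itself reports.
    return abs(bmc_temp_c - os_temp_c) > max_delta_c


if __name__ == "__main__":
    # Values from the report above: BMC claims 300C, OS reports about 30C.
    print(bmc_reading_suspect(300.0, 30.0))
```

A flagged reading points at the BMC rather than at real overheating, which matches the symptom of needing a full power pull to reinitialize it.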

#39 Updated by Grzegorz Krzystek over 3 years ago

Chris Casanova wrote:

This is still an issue. I have an E3C226D4I board myself and have been having issues for almost a year. Stumbled across this post and [https://forums.freenas.org/index.php?threads/disable-ipmi-kernel-module-in-9-3-resolved.25827/]

Disabling ipmi and watchdogd seems to fix it. My symptoms involved IPMI claiming the CPU temp was 300C. Yes... Celsius. I ran a command within FreeNAS to get hardware temperatures, and the CPU was stable around a normal temp of 30C. This would happen every couple of days, requiring me to completely pull power from the server and let the BMC reinitialize. I have not seen this recur since disabling IPMI and watchdogd within FreeNAS; it's been about 8 days now, so I am ready to deem it "fixed" via workaround. ASRock was not much help here, other than blaming me for hardware misconfiguration or bending my CPU pins. I sent this bug report and the link above to ASRock for their reference for future occurrences. FreeNAS folks: I would recommend a toggle switch for IPMI/watchdogd within the web GUI, with a small note, to make this easier in the future. I had no clue the OS hooked back into the IPMI by default.

Press ASRock Rack support for a BMC firmware update. This isn't a FreeNAS/FreeBSD bug; the watchdog implementation in the IPMIMain process running in the BMC wears out the internal flash, and the BMC becomes unstable.
The ASRock Rack C2750D4I and C2250D4I are no longer a problem, as the issue was fixed in BMC firmware version 0.30.0.

#40 Updated by Dru Lavigne almost 3 years ago

  • File deleted (Mini_Bios_IPMI.zip)
