Bug #28235
Bump default number of chain frames for mps(4) and mpr(4)
Description
I've been working with Michael Dexter and he suggested I put in a bug ticket.
We have a few FreeNAS boxes here at the school district. Since upgrading to version 11, we've been seeing the following errors:
Device: /dev/da38, failed to read SMART values
Device: /dev/da44, failed to read SMART values
Device: /dev/da40, failed to read SMART values
Device: /dev/da53, failed to read SMART values
We're also seeing these in the GUI.
But as you can see here, when I run smartctl -a /dev/da40, for example, SMART says everything is OK.
https://pastebin.com/raw/xb5vdeyf
We've updated to FreeNAS-11.1-U1 but are still having the same issue.
This is happening on all 3 of our FreeNAS-11.1-U1 machines, but not on any of our 9.x machines.
After rebooting, the error clears out a lot of the time, but it comes back complaining about a number of different drives.
Lastly, I saw someone on the forums having the same issue.
https://forums.freenas.org/index.php?threads/read-smart-self-test-log-failed.61359/#post-436173
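Since the set of drives named in these errors changes between reboots, it can help to pull the device names out of the messages so the lists can be compared. The following is a minimal sketch, assuming the messages have exactly the format quoted above; the function name failed_devices is hypothetical and not part of FreeNAS or smartd.

```python
import re

# Matches the smartd message format quoted in this ticket, e.g.
# "Device: /dev/da38, failed to read SMART values"
SMART_RE = re.compile(r"Device: (?P<dev>/dev/\w+), failed to read SMART values")


def failed_devices(lines):
    """Return the device nodes named in SMART read failures, sorted as strings."""
    found = set()
    for line in lines:
        m = SMART_RE.search(line)
        if m:
            found.add(m.group("dev"))
    return sorted(found)


# The four messages from the description above.
sample = [
    "Device: /dev/da38, failed to read SMART values",
    "Device: /dev/da44, failed to read SMART values",
    "Device: /dev/da40, failed to read SMART values",
    "Device: /dev/da53, failed to read SMART values",
]

for dev in failed_devices(sample):
    print(dev)
```

Running the same function over the messages collected after each reboot would give lists that can be compared directly.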
Related issues
Associated revisions
History
#1
Updated by Dru Lavigne almost 3 years ago
- Private changed from No to Yes
- Seen in changed from 11.1-U1 to 11.1-U1
Jamie: please attach a debug (System -> Advanced -> Save Debug).
#2
Updated by Jamie McParland almost 3 years ago
- File debug-san3-20180207115251.tgz added
- File debug-san2-20180207115130.tgz added
I'm not sure if it's related, but on two of our three 11.x systems I'm getting this error as well:
mps0: Out of chain frames, consider increasing hw.mps.max_chains.
I've worked with Michael Dexter and we've raised the chain count on SAN3 a little at a time. We're currently at 8096 but are still getting the error.
I've left that setting alone on SAN2.
In looking through our syslog, I noticed these errors started happening within a day or so of installing 11.x.
Installed - 11.1 Release 12-22-2017
2017-12-24T01:56:27-08:00 san2 mps0: Out of chain frames, consider increasing hw.mps.max_chains.
Installed - 11.1 Release 12-22-2017
2017-12-24T03:34:41-08:00 san3 mps0: Out of chain frames, consider increasing hw.mps.max_chains.
On SAN3 I installed 11.1-U1 on 01/23/2018, but the hw.mps.max_chains warning is still appearing.
We have another box called ipcamsan, which is having the same "failed to read SMART values" issue, but we're NOT seeing the hw.mps.max_chains warning on that box.
The only real difference between san2, san3, and ipcamsan is that ipcamsan has only one HBA. The other two boxes have more than one HBA.
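To compare how often each box and controller logs the chain-frame warning, the syslog lines can be tallied per host. This is a rough sketch, assuming the "timestamp host controller: message" layout of the two lines quoted in this comment; the helper name count_chain_warnings is hypothetical, and matching mpr devices with the same pattern is an assumption.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this comment, e.g.
# "2017-12-24T01:56:27-08:00 san2 mps0: Out of chain frames, consider ..."
# Accepting mprN as well as mpsN is an assumption, not taken from the logs.
CHAIN_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<dev>mp[sr]\d+): Out of chain frames"
)


def count_chain_warnings(lines):
    """Count chain-frame warnings, keyed by (host, controller)."""
    counts = Counter()
    for line in lines:
        m = CHAIN_RE.match(line.strip())
        if m:
            counts[(m.group("host"), m.group("dev"))] += 1
    return counts


# The two syslog lines from this comment, plus one unrelated message.
sample = [
    "2017-12-24T01:56:27-08:00 san2 mps0: Out of chain frames, consider increasing hw.mps.max_chains.",
    "2017-12-24T03:34:41-08:00 san3 mps0: Out of chain frames, consider increasing hw.mps.max_chains.",
    "Device: /dev/da38, failed to read SMART values",
]

for (host, dev), n in sorted(count_chain_warnings(sample).items()):
    print(f"{host} {dev}: {n}")
```

Feeding it a full syslog export instead of the sample list would show whether the warnings cluster on particular controllers.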
#3
Updated by Dru Lavigne almost 3 years ago
- Assignee changed from Release Council to Alexander Motin
- Reason for Blocked set to Need verification
Starting with Alexander to see if it is driver related (which was fixed for U2) or different than any of the several open SMART tickets.
#4
Updated by Alexander Motin almost 3 years ago
- Related to Bug #28201: Fix queue length reporting in mps(4) and mpr(4) added
#5
Updated by Alexander Motin almost 3 years ago
- Category changed from Middleware to OS
- Status changed from Not Started to In Progress
- Priority changed from No priority to Important
- Target version set to 11.1-U2
- Severity changed from High to Medium
- Needs Doc changed from Yes to No
This problem does sound like it is caused by transient I/O errors, and there is a good chance it is triggered by the issue in #28201. "Out of chain frames" may also be a cause, and I see an obvious issue there too, but I'm still trying to work out what people were thinking when they tuned it the way it is now.
#6
Updated by Alexander Motin almost 3 years ago
- Subject changed from failed to read SMART values to Bump default number of chain frames for mps(4) and mpr(4)
- Status changed from In Progress to Done
- Needs QA changed from Yes to No
- Needs Merging changed from Yes to No
I've pushed a quick fix; a proper, more universal one will probably come at some point later.
#7
Updated by Dru Lavigne almost 3 years ago
- File deleted (debug-san2-20180207115130.tgz)
#8
Updated by Dru Lavigne almost 3 years ago
- File deleted (debug-san3-20180207115251.tgz)
#9
Updated by Dru Lavigne almost 3 years ago
- Private changed from Yes to No
#10
Updated by Michael Dexter almost 3 years ago
Related forum post:
https://forums.freenas.org/index.php?threads/mps-lsi-hw-mps-max_chains.23067/#post-139146
Related FreeBSD mailing list post:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-March/084316.html
Both systems with this symptom have had multiple pools.
#11
Updated by Jamie McParland almost 3 years ago
Michael Dexter wrote:
Related forum post:
https://forums.freenas.org/index.php?threads/mps-lsi-hw-mps-max_chains.23067/#post-139146
Related FreeBSD mailing list post:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-March/084316.html
Both systems with this symptom have had multiple pools.
Alexander Motin wrote:
I've pushed a quick fix; a proper, more universal one will probably come at some point later.
It's been 24 hours since I updated to FreeNAS-11.1-U2 and I haven't seen any more warnings, so I think this is solved. Thanks so much!