Bug #24412

ctl_datamove: tag 0x3ae61 on (0:4:0) aborted after "downgrade" to 9.10.2 from corral

Added by Philip Philip over 3 years ago. Updated almost 3 years ago.

Status:
Closed: User Config Issue
Priority:
No priority
Assignee:
Alexander Motin
Category:
OS
Target version:
Seen in:
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

Supermicro motherboard X8SIE-LN4F
Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
12GB RAM DDR3
IBM M1015 in IT-mode
2 zpools:
1: 1 vdev with 4x 3TB SATA disks in RAIDZ
2: 1 vdev with 8x 300GB SAS disks in RAID10, in an HP MSA70.

ChangeLog Required:
No

Description

Getting a lot of these "ctl_datamove: tag xx on xx aborted" messages, about every couple of hours.
As far as I can see, it started happening after I "downgraded" from Corral. However, I've also tried upgrading to FreeNAS 11.0-RC4, and the same problem occurs there.
What can I check to see where the problem lies and how to fix it?

FreeNasDiskReports.png (226 KB) Philip Philip, 06/07/2017 12:07 PM
FreeNasDiskReports2.png (114 KB) Philip Philip, 06/07/2017 12:23 PM

History

#1 Updated by Philip Philip over 3 years ago

(screenshot: FreeNasDiskReports.png)

I have checked the I/Os, busy, and latency, but they seem pretty normal before and during the time those errors occurred. Attaching screenshots of half of the disks.

How can I tell from these errors which pool is the problem, so I know where to look? How should I interpret the 0:4:0 or 1:4:0?
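
For reference, a console-level equivalent of those GUI disk reports (a minimal sketch, assuming a standard FreeNAS/FreeBSD shell; "tank" is a placeholder pool name):

    # Live per-disk I/O, busy %, and latency; -p limits output to physical disks
    gstat -p

    # Per-vdev bandwidth and IOPS for one pool, sampled every 5 seconds
    zpool iostat -v tank 5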

#2 Updated by Philip Philip over 3 years ago

  • Subject changed from ctl_datamove: tag 0x3ae61 on (0:4:0) aborted after "downgrade" to corral to ctl_datamove: tag 0x3ae61 on (0:4:0) aborted after "downgrade" to 9.10.2 from corral

#3 Updated by Philip Philip over 3 years ago

(screenshot: FreeNasDiskReports2.png)

#4 Updated by Philip Philip over 3 years ago

To add: in the first pictures I looked at 20:15, as I had errors at that time. In the second picture I looked at 19:45 and 20:15.
Jun 7 20:15:08 mattsson-nas-13 ctl_datamove: tag 0x3b9d8 on (0:4:0) aborted
Jun 7 20:15:10 mattsson-nas-13 ctl_datamove: tag 0x3b9dc on (0:4:0) aborted

#5 Updated by Alexander Motin over 3 years ago

  • Status changed from Unscreened to Closed: User Config Issue

Usually errors like that are caused by congestion on a ZFS pool that can't keep up with the incoming request rate, either due to hardware problems or by design. I don't think there is any relation to Corral in this case. I see that at least one of your pools used with iSCSI is RAIDZ, which is not recommended for block storage, and the last number in "0:4:0" or "1:4:0" is actually CTL's internal LUN number, which, as far as I can see, is the one on the RAIDZ pool.
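
To confirm which backing device a given CTL LUN number maps to, something along these lines can be run from the FreeNAS shell (a sketch; ctladm is part of the base system, though output details vary between versions):

    # List CTL LUNs with their LUN numbers; -v adds backend details such as the backing zvol path
    ctladm devlist -v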

#6 Updated by Philip Philip over 3 years ago

I still think it's weird that it just came up, as I only have one disk shared over iSCSI on the RAIDZ pool. It's a secondary disk (storage for pictures) for one VM, so it shouldn't be used very often, but I guess some NFS and SMB shares might load the pool heavily. It had worked for a couple of months, but weirdly this only started after the version change.
But I guess if the errors point to that iSCSI share, I'll just have to figure out another way to store those pictures on that pool.

#7 Updated by Alexander Motin over 3 years ago

Maybe some time passed while pool/data fragmentation gradually grew, before it reached a critical mass and started triggering those errors.
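
A quick way to see how far pool fill and fragmentation have progressed (a sketch; "tank" is a placeholder pool name):

    # Per-pool size, allocated space, fragmentation, and capacity
    zpool list -o name,size,alloc,frag,cap

    # Or query the fragmentation property of a single pool
    zpool get fragmentation tank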

#8 Updated by Dru Lavigne almost 3 years ago

  • File deleted (debug-mattsson-nas-13-20170607195615.tgz)

#9 Updated by Dru Lavigne almost 3 years ago

  • File deleted (FreeNasmessages.txt)

#10 Updated by Dru Lavigne almost 3 years ago

  • Target version set to N/A
