
Bug #19880

Failed to add the device with handle 0x000e to persistent table because there is no free space available

Added by Anthony Portee about 4 years ago. Updated over 3 years ago.

Status:
Closed: Duplicate
Priority:
Important
Assignee:
Alexander Motin
Category:
OS
Target version:
N/A
Seen in:
9.10.2-U2
Severity:
New
Reason for Closing:
Reason for Blocked:
Needs QA:
Yes
Needs Doc:
Yes
Needs Merging:
Yes
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:

2 - Intel XEON E5-2643 v2 @ 3.50GHz (Hex Core)
128 - ECC RAM (GB)
2 - JBOD - SC847E26-RJBOD1
http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
Server - 6037R-TXRF
http://www.supermicro.com/products/system/3U/6037/SYS-6037R-TXRF.cfm
Motherboard - X9DRX+-F
https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRX_-F.cfm
2 - LSI SAS9207-8e (chipset LSI SAS 2308)
85 - WD4001FYYG - 4TB SAS HDD (functional)
5 - WD4001FYYG - 4TB SAS HDD (non-functional, issue discussed here)
1 - SanDisk Cruzer Fit 16 GB USB Flash Drive (boot device)
1 - X520-SR2 10GbE Dual-Port Server Adapter (82599ES)

ChangeLog Required:
No

Description

I'm having what I believe to be a driver issue with my FreeNAS build [FreeNAS-9.10.1-U4 (ec9a7d3)], and I'm hoping that someone out there can help point me in the right direction or otherwise clarify whether this is a bug to be addressed here. The problem I'm facing is that all new HDDs installed in my server (disks 86 through 90) fail to install or be recognized by FreeNAS, and seemingly by the OS as well. All old or previously installed HDDs continue to function without issue. The problem persists after numerous reboots.

After installing a new SAS HDD, the DMESG output states the following:


mps1: _mapping_add_new_device: failed to add the device with handle 0x000e to persistent table because there is no free space available

The error message repeats for each new drive, with the actual handle being unique to each drive. I will post the pertinent DMESG and SAS2FLASH output below:

DMESG output relevant to my Avago/LSI adapters


…
mps0: <Avago Technologies (LSI) SAS2308> port 0x6000-0x60ff mem 0xdf640000-0xdf64ffff,0xdf600000-0xdf63ffff irq 42 at device 0.0 on pci6
mps0: Firmware: 20.00.07.00, Driver: 21.00.00.00-fbsd
mps0: IOCCapabilities: 5285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: <Avago Technologies (LSI) SAS2308> port 0xf000-0xf0ff mem 0xfbe40000-0xfbe4ffff,0xfbe00000-0xfbe3ffff irq 58 at device 0.0 on pci132
mps1: Firmware: 20.00.07.00, Driver: 21.00.00.00-fbsd
mps1: IOCCapabilities: 5285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

mps1: _mapping_add_new_device: failed to add the device with handle 0x000e to persistent table because there is no free space available
mps1: _mapping_add_new_device: failed to add the device with handle 0x0017 to persistent table because there is no free space available
mps1: _mapping_add_new_device: failed to add the device with handle 0x001a to persistent table because there is no free space available
…

SAS2FLASH output


[root@storeserv01] ~# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved
Adapter Selected is a LSI SAS: SAS2308_2(D1)

Num  Ctlr  FW Ver  NVDATA  x86-BIOS  PCI Addr
----------------------------------------------------------------------------

0  SAS2308_2(D1)  20.00.07.00  14.01.00.06  07.39.02.00  00:06:00:00
1  SAS2308_2(D1)  20.00.07.00  14.01.00.06  07.39.02.00  00:84:00:00

Finished Processing Commands Successfully.
Exiting SAS2Flash.
[root@storeserv01] ~#

After opening a ticket with Broadcom, I've been instructed to replace the currently installed kernel driver with the manufacturer's variant, which can be found on their site and is currently P20.00.00.00. I currently have the BIOS disabled within the LSI HBA configuration, as well as a very low INT13 device count, possibly zero. I appreciate any insight or experience you might be able to share on this matter.

Server specs

2 - Intel XEON E5-2643 v2 @ 3.50GHz (Hex Core)
128 - ECC RAM (GB)
2 - JBOD - SC847E26-RJBOD1
http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
Server - 6037R-TXRF
http://www.supermicro.com/products/system/3U/6037/SYS-6037R-TXRF.cfm
Motherboard - X9DRX+-F
https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRX_-F.cfm
2 - LSI SAS9207-8e (chipset LSI SAS 2308)
85 - WD4001FYYG - 4TB SAS HDD (functional)
5 - WD4001FYYG - 4TB SAS HDD (non-functional, issue discussed here)
1 - SanDisk Cruzer Fit 16 GB USB Flash Drive (boot device)
1 - X520-SR2 10GbE Dual-Port Server Adapter (82599ES)


Related issues

Has duplicate: FreeNAS - Bug #25709: Update mps driver (Resolved, 2017-08-28)

History

#1 Updated by Bonnie Follweiler about 4 years ago

  • Assignee set to Kris Moore

#2 Updated by Kris Moore about 4 years ago

  • Assignee changed from Kris Moore to Alexander Motin

Over to mav for investigation

#3 Updated by Alexander Motin about 4 years ago

  • Status changed from Unscreened to Screened

This is a long-known problem; I'm just not sure whether it's in the BIOS/firmware or the driver. But AFAIK it should not cause real problems -- it makes device numbers non-persistent across reboots, but that is not a problem for FreeNAS, since it identifies disks by stored metadata.

#4 Updated by Anthony Portee about 4 years ago

Alexander Motin wrote:

This is a long-known problem; I'm just not sure whether it's in the BIOS/firmware or the driver. But AFAIK it should not cause real problems -- it makes device numbers non-persistent across reboots, but that is not a problem for FreeNAS, since it identifies disks by stored metadata.

So the noted errors do not correlate with the referenced drives being inaccessible to FreeNAS? The ~85 drives that are not reported this way are accessible.

#5 Updated by Alexander Motin about 4 years ago

Anthony Portee wrote:

So the noted errors do not correlate with the referenced drives being inaccessible to FreeNAS? The ~85 drives that are not reported this way are accessible.

I don't have such correlation information. In the cases where I've seen such errors, I think all drives were accessible.

#6 Updated by Alexander Motin almost 4 years ago

  • Status changed from Screened to Fix In Progress
  • Priority changed from No priority to Important
  • Target version set to 49
  • Seen in changed from 9.10.1-U4 to 9.10.2-U2

I tried to diagnose this, and I believe this is a driver problem affecting the mps(4) and possibly mpr(4) drivers. I don't believe the driver from the LSI web site is any different from the driver bundled with FreeBSD with respect to this problem. I reported my observations to Stephen Mcconnell from LSI, and he promised to take a look at this problem.

#7 Updated by Kyle Gilgan almost 4 years ago

Looking through the MPS driver code pulled from https://github.com/freebsd/freebsd/tree/master/sys/dev/mps on 5/1/17, I can see that the function

static void _mapping_add_new_device(struct mps_softc *sc, struct _map_topology_change *topo_change)

from mps_mapping.c is throwing the error

"%s: " 
"failed to add the device " 
"with handle 0x%04x to " 
"persistent table because " 
"there is no free space " 
"available\n" 

This looks to be occurring because sc->max_dpm_entries is being reached.

max_dpm_entries is defined in mps_mapping.c as sc->ioc_pg8.MaxPersistentEntries

MaxPersistentEntries can be seen to be set to 128 (in firmware?):

[root@storeserv01] ~# mpsutil show iocfacts
       MaxChainDepth: 128
             WhoInit: 0x4
       NumberOfPorts: 1
      MaxMSIxVectors: 16
       RequestCredit: 10240
           ProductID: 0x2214
     IOCCapabilities: 0x5a85c
           FWVersion: 0x14000700
 IOCRequestFrameSize: 32
       MaxInitiators: 32
          MaxTargets: 1024
     MaxSasExpanders: 64
       MaxEnclosures: 65
       ProtocolFlags: 0x3
  HighPriorityCredit: 128
MaxRepDescPostQDepth: 65504
      ReplyFrameSize: 32
          MaxVolumes: 0
        MaxDevHandle: 1128
MaxPersistentEntries: 128

Now I see that the driver allows for enclosure/slot mapping. Is this something that could be used to get around the MaxPersistentEntries limitation?
If so, how could it be enabled? (It looks like this is a flag in the firmware.)
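
For illustration, here is a minimal standalone sketch of the overflow path described above. All names below are hypothetical stand-ins rather than the driver's real identifiers; only the 128-entry limit (IOC Page 8 MaxPersistentEntries) and the error text mirror mps_mapping.c.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real driver takes this limit from
 * IOC Page 8 (MaxPersistentEntries = 128 on this firmware). */
#define MAX_DPM_ENTRIES 128
#define DPM_BAD_IDX     0xFFFF

static bool dpm_entry_used[MAX_DPM_ENTRIES];

/* Scan the device persistent map (DPM) for an unused slot. */
static uint16_t get_free_dpm_idx(void)
{
    for (uint16_t i = 0; i < MAX_DPM_ENTRIES; i++)
        if (!dpm_entry_used[i])
            return i;
    return DPM_BAD_IDX;         /* table full */
}

static void mapping_add_new_device(uint16_t dev_handle)
{
    uint16_t idx = get_free_dpm_idx();

    if (idx == DPM_BAD_IDX) {   /* the error path seen in dmesg */
        printf("failed to add the device with handle 0x%04x to persistent "
               "table because there is no free space available\n", dev_handle);
        return;
    }
    dpm_entry_used[idx] = true;
}

int main(void)
{
    /* With all 128 slots already occupied by old entries, any newly
     * hot-plugged device takes the error path. */
    for (int i = 0; i < MAX_DPM_ENTRIES; i++)
        dpm_entry_used[i] = true;
    mapping_add_new_device(0x000e);
    return 0;
}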

#8 Updated by Alexander Motin almost 4 years ago

The real problem is that the driver has a broken mechanism for expiring old devices. It considers all devices new and so never removes them from the table, so table overflow is only a question of time if you connect different disks from time to time. Switching to slot mapping could indeed help a bit, but I am not sure how it is supposed to be officially switched.
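
To make that failure mode concrete, here is a toy model of the missing expiration, under the assumption that a departed device's entry is never reclaimed; every name here is invented for illustration and none of it is the driver's code.

#include <stdint.h>
#include <stdio.h>

#define TABLE_SIZE 128

/* Toy persistent table: slot i holds a device serial, 0 = never used.
 * Mimicking the bug: departures never free their slots. */
static uint64_t table[TABLE_SIZE];
static int next_free;   /* only ever grows */

static int add_device(uint64_t serial)
{
    if (next_free == TABLE_SIZE)
        return -1;      /* "no free space available" */
    table[next_free] = serial;
    return next_free++;
}

int main(void)
{
    /* Replace the disk in a single bay 130 times: each replacement is
     * treated as brand new, so the table overflows even though only one
     * drive is ever present at a time. */
    int failures = 0;
    for (uint64_t serial = 1; serial <= 130; serial++) {
        /* (the previous disk departs here, but its slot is not freed) */
        if (add_device(serial) < 0)
            failures++;
    }
    printf("failed additions after 130 swaps: %d\n", failures);  /* prints 2 */
    return 0;
}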

#9 Updated by Kyle Gilgan almost 4 years ago

I found that the mapping type is associated with a flag set in firmware. You are correct: old devices are not removed, and this persists between reboots.
The controller stores persistent mapping information in its own memory region ('persistent configuration pages'). I would assume that upon initialization the driver polls this device map from the controller for consistency. The problem, as you stated, is that the old drives are never removed from the table. sas2flash can clear this section on the controller, and a reboot should then clear the mappings in the driver, but sas2flash erase does not work under FreeBSD.

The thing I find strange is how this controller is supposed to be able to handle 1024 drives while the mapping here cannot go over 128 entries.

#10 Updated by Alexander Motin almost 4 years ago

FreeNAS, and often FreeBSD in general, does not care much about persistent device names. I don't know why that table is so small or what LSI was thinking in doing that, but it should probably not create many real problems.

#11 Updated by Kyle Gilgan almost 4 years ago

True, for the average FreeNAS user it is not much of a problem unless you are close to 128 drives per logical controller. In this case there are 90, with many that have been swapped out and replaced. I will boot to an alternative OS tomorrow and clear the persistent data on the controller to verify that this frees up dpm_entries for the additional, currently unusable drives.

#12 Updated by Kyle Gilgan almost 4 years ago

A prepared USB disk and the EFI shell were used to run

sas2flash.efi -c 0 -o -e 3
sas2flash.efi -c 1 -o -e 3

on the controllers. This cleared out the persistent data, including the mapping table. It also clears out any BIOS configurations that are in place, and they need to be recreated.

After the initial boot all current drives were rediscovered.

sysctl -a dev.mps.0.mapping_table_dump
sysctl -a dev.mps.1.mapping_table_dump

The displayed indexes now correctly range from 8 to 98 (85 for drives and a few more for controllers/enclosures?). (Not sure why it starts at 8, though.)

All LUNs came up properly and were accessible.

When inserting a new drive into an enclosure (#5), the error '_mapping_add_new_device: failed to add the device with handle 0x000e to persistent table because there is no free space available' is no longer received, and I can confirm that the index in dev.mps.X.mapping_table_dump is incremented.

However, the following error is now received:

(probe0:mps1:0:94:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 532 terminated ioc 804b scsi 0 state c xfer 0
(probe0:mps1:0:94:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mps1:0:94:0): CAM status: CCB request completed with an error
(probe0:mps1:0:94:0): Retrying command

This CDB request is a basic 'who are you' command. I would normally blame this on a bad drive, but that does not seem to be the cause.
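
For reference, those CDB bytes decode as a standard 36-byte INQUIRY; the annotation below follows the SCSI spec and is mine, not pulled from the driver.

#include <stdint.h>

/* SCSI INQUIRY CDB as logged: 12 00 00 00 24 00 */
static const uint8_t inquiry_cdb[6] = {
    0x12,        /* opcode: INQUIRY */
    0x00,        /* EVPD bit clear: request standard inquiry data */
    0x00,        /* page code (unused when EVPD = 0) */
    0x00, 0x24,  /* allocation length (big-endian): 0x0024 = 36 bytes */
    0x00,        /* control byte */
};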

I have tried multiple (8) new drives in enclosure #5, including known good ones. All return the above error.

I have moved a spare working drive from Enclosure #2 to Enclosure #5 to check for slot problems and the drive is recognized.

If I remove a spare from Enclosure #2 and place a new drive there (one I used to test in enclosure #5), the new drive is detected fine and is available to FreeNAS.

These steps confirm that there is not a problem with the enclosure, slots, or drives.

It looks to me like there is some other hard limit of 85 active drives at one time; I am unsure how to proceed with troubleshooting this issue further.

#13 Updated by Alexander Motin almost 4 years ago

Status update: Patches for both mps(4) and mpr(4) are in progress:
https://reviews.freebsd.org/D10878
https://reviews.freebsd.org/D10861

#14 Updated by Alexander Motin almost 4 years ago

  • Target version changed from 49 to 11.1

The driver fixes were merged into FreeBSD stable/11, so at the least they should become part of the 11.1 release.

#15 Updated by Alexander Motin over 3 years ago

  • Status changed from Fix In Progress to 19

I updated the freenas/11-stable branch to a recent FreeBSD stable/11, so tomorrow's nightly build should include the fix.

#16 Updated by Alexander Motin over 3 years ago

  • Status changed from 19 to Ready For Release

Let's assume it is fixed.

#17 Updated by Dru Lavigne over 3 years ago

  • Has duplicate Bug #25709: Update mps driver added

#18 Updated by Dru Lavigne over 3 years ago

  • Status changed from Ready For Release to Closed: Duplicate
  • Target version changed from 11.1 to N/A

#19 Updated by Dru Lavigne about 3 years ago

  • File deleted (dmesg.txt)
