Bug 154299 - [arcmsr] arcmsr fails to detect all attached drives
Summary: [arcmsr] arcmsr fails to detect all attached drives
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-26 07:10 UTC by Rich Ercolani
Modified: 2017-12-31 22:29 UTC
CC List: 0 users

See Also:


Attachments

Description Rich Ercolani 2011-01-26 07:10:10 UTC
arcmsr fails to detect all attached drives. It may or may not have something to do with having a failed device attached, and possibly with e.g. PR 148502 or 150390.

cf.:

[root@manticore ~]# areca-cli disk info;ls /dev/da* /dev/ad*;
  # Ch# ModelName                       Capacity  Usage
===============================================================================
  1  1  N.A.                               0.0GB  N.A.
  2  2  N.A.                               0.0GB  N.A.
  3  3  N.A.                               0.0GB  N.A.
  4  4  N.A.                               0.0GB  N.A.
  5  5  N.A.                               0.0GB  N.A.
  6  6  N.A.                               0.0GB  N.A.
  7  7  N.A.                               0.0GB  N.A.
  8  8  N.A.                               0.0GB  N.A.
  9  9  ST31500341AS                    1500.3GB  JBOD
 10 10  N.A.                               0.0GB  N.A.
 11 11  ST31500341AS                    1500.3GB  JBOD
 12 12  ST31500341AS                    1500.3GB  JBOD
 13 13  ST31500341AS                    1500.3GB  JBOD
 14 14  N.A.                               0.0GB  N.A.
 15 15  ST31500341AS                    1500.3GB  JBOD
 16 16  ST31500341AS                    1500.3GB  JBOD
 17 17  N.A.                               0.0GB  N.A.
 18 18  N.A.                               0.0GB  N.A.
 19 19  ST31500341AS                    1500.3GB  JBOD
 20 20  ST31500341AS                    1500.3GB  JBOD
 21 21  ST31500341AS                    1500.3GB  JBOD
 22 22                                     0.0GB  Failed
 23 23  ST31500341AS                    1500.3GB  JBOD
 24 24  ST31500341AS                    1500.3GB  JBOD
===============================================================================
GuiErrMsg<0x00>: Success.
/dev/ad4    /dev/ad4s1  /dev/ad4s1a /dev/ad4s1b /dev/ad4s1d /dev/da0    /dev/da1    /dev/da1p1  /dev/da1p9  /dev/da2    /dev/da3    /dev/da4    /dev/da5

I count 11 drives attached via the arc1280ml, not including the failed drive, and I see 6 appearing.

camcontrol rescan all and reboots do not help the issue. I am running firmware 1.49.

How-To-Repeat: Presumably have a failed drive on the controller.
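
A quick way to compare the controller's view with what CAM actually
attached is to list the units together with their bus/target/LUN
coordinates.  This is only a suggested check using stock FreeBSD tools,
not part of the original report:

  areca-cli disk info      # the controller's view of all 24 ports, as above
  camcontrol devlist -v    # CAM's view; each attached unit is listed with its "scbusN target T lun L" address
  camcontrol rescan all    # reported above as not helping

Counting the da(4) entries from camcontrol devlist against the JBOD rows
in areca-cli shows which target/LUN combinations arcmsr never attached.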
Comment 1 jsirrine 2013-01-24 01:59:23 UTC
First, I'd like to apologize right away if I am sending this email and
it is not being routed correctly.  This is not the same as the ticket
system FreeNAS uses, so I'm in new territory.  I've been using
FreeNAS (FreeBSD) for about a year, but I am a quick learner.  If I need
to provide this information in a form other than email to fix this
issue, please let me know.

I believe I have found the cause of disks not being usable, as seen in
kern/154299 <http://www.freebsd.org/cgi/query-pr.cgi?pr=154299>.  Here's
what I see on my system.  My system uses an Areca 1280ML-24 with
firmware 1.49 (latest) and runs FreeNAS 8.3.0 x64 (based on FreeBSD 8.3)
with areca-cli Version 1.84, Arclib: 300, Date: Nov 9 2010 (FreeBSD).
I found this issue when swapping out backplanes for my hard drives.

I had drives populating RAID controller ports 1 through 14.  Due to a
failed backplane, I switched the 2 drives that were connected to ports
13 and 14 to ports 21 and 22, respectively.  All of these disks are in
a ZFS RAIDZ3 zpool.  Note that I have not had any problems with ZFS
scrubs or SMART long tests on these drives, and they have been running
for more than a year, so infant mortality is not an issue.  Also, the
RAID controller is in Non-RAID mode, so all disks are JBOD by default.

Physical Drive Information
   # Ch# ModelName                       Capacity  Usage
===============================================================================
   1  1  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   2  2  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   3  3  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   4  4  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   5  5  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   6  6  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   7  7  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   8  8  WDC WD20EARS-00S8B1             2000.4GB  JBOD
   9  9  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  10 10  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  11 11  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  12 12  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  13 13  N.A.                               0.0GB  N.A.
  14 14  N.A.                               0.0GB  N.A.
  15 15  N.A.                               0.0GB  N.A.
  16 16  N.A.                               0.0GB  N.A.
  17 17  N.A.                               0.0GB  N.A.
  18 18  N.A.                               0.0GB  N.A.
  19 19  N.A.                               0.0GB  N.A.
  20 20  N.A.                               0.0GB  N.A.
  21 21  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  22 22  WDC WD20EARS-00S8B1             2000.4GB  JBOD
  23 23  N.A.                               0.0GB  N.A.
  24 24  N.A.                               0.0GB  N.A.
===============================================================================

With this configuration, disks 21 and 22 were not available to me (only
12 of the 14 disks were available).  I was using a ZFS RAIDZ3 across all
of these disks, so I immediately lost 2 disks' worth of redundancy.  The
disks showed up in the RAID controller BIOS as well as in areca-cli (as
you can see above), but /dev was missing 2 disks and 'zpool status'
showed I had 2 missing drives.  As soon as I swapped cables so that the
disks were back in ports 13 and 14 on the RAID controller, everything
went back to normal.

Knowing that something was wrong, I grabbed some spare drives and
started experimenting.  I wanted to know what was actually wrong because
I am trusting this system with my data for production use.  Please
examine the following VolumeSet Information:

VolumeSet Information
   # Name             Raid Name       Level   Capacity Ch/Id/Lun State
===============================================================================
   1 WD20EARS-00S8B1  Raid Set # 00   JBOD    2000.4GB 00/00/00 Normal
   2 WD20EARS-00S8B1  Raid Set # 01   JBOD    2000.4GB 00/00/01 Normal
   3 WD20EARS-00S8B1  Raid Set # 02   JBOD    2000.4GB 00/00/02 Normal
   4 WD20EARS-00S8B1  Raid Set # 03   JBOD    2000.4GB 00/00/03 Normal
   5 WD20EARS-00S8B1  Raid Set # 04   JBOD    2000.4GB 00/00/04 Normal
   6 WD20EARS-00S8B1  Raid Set # 05   JBOD    2000.4GB 00/00/05 Normal
   7 WD20EARS-00S8B1  Raid Set # 06   JBOD    2000.4GB 00/00/06 Normal
   8 WD20EARS-00S8B1  Raid Set # 07   JBOD    2000.4GB 00/00/07 Normal
   9 WD20EARS-00S8B1  Raid Set # 08   JBOD    2000.4GB 00/01/00 Normal
  10 WD20EARS-00S8B1  Raid Set # 09   JBOD    2000.4GB 00/01/01 Normal
  11 WD20EARS-00S8B1  Raid Set # 10   JBOD    2000.4GB 00/01/02 Normal
  12 WD20EARS-00S8B1  Raid Set # 11   JBOD    2000.4GB 00/01/03 Normal
  13 WD20EARS-00S8B1  Raid Set # 12   JBOD    2000.4GB 00/01/04 Normal
  14 WD20EARS-00S8B1  Raid Set # 13   JBOD    2000.4GB 00/01/05 Normal
===============================================================================
GuiErrMsg<0x00>: Success.

This is my normal configuration and all disks work.  After
experimenting, it turns out that if I want to use ports 1 through 8,
I MUST have a disk in port 1.  For ports 9 through 16, I MUST have a
disk in port 9.  For ports 17 through 24, I MUST have a disk in port 17.
It appears there may be something special about Ch/Id/Lun = XX/XX/00.
If there is no disk at LUN 00, then that entire ID is not available for
use by FreeBSD, despite areca-cli properly identifying the disk.
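
In other words, the VolumeSet table above suggests the mapping
ID = (port - 1) / 8 and LUN = (port - 1) % 8, which is why ports 1, 9,
and 17 are special: they land on LUN 0 of targets 0, 1, and 2.  As a
sketch only (assuming the ARC-1280ML is CAM bus 0, which is not
confirmed by either report), a disk in port 21 would be expected at
target 2, LUN 4, and a rescan of just that address would be:

  camcontrol rescan 0:2:4    # bus 0 assumed; target 2 = ports 17-24, lun 4 = port 21

Whether such a targeted rescan can attach a unit while LUN 0 of its
target is empty is exactly the question this report raises; 'camcontrol
rescan all' was already reported above as not helping.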

Refer back to the original report in kern/154299, which is the
Description of this bug, above.

If you take what I observed and apply it to his post, you will see that
only disks 9, 11, 12, 13, 15, and 16 would be available to the system.
So this is in line with the poster, who says he has only 6 disks
available.  I am writing this email in hopes that someone can find and
fix the issue.  I do not have any failed disks to experiment with, but
I am convinced, based on 4 hours of experimenting last night, that the
issue may only involve failed disks if a disk fails in port 1, 9, or 17.
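
Working that rule through against the original report's table, under the
assumed mapping ID = (port - 1) / 8 and LUN = (port - 1) % 8 (an
inference from the VolumeSet table above, not something stated in either
report):

  Target 0 (ports  1-8):  port 1 empty, so LUN 0 is missing and nothing
                          on the target attaches
  Target 1 (ports  9-16): port 9 populated, so ports 9, 11, 12, 13, 15,
                          and 16 attach (6 units)
  Target 2 (ports 17-24): port 17 empty, so ports 19, 20, 21, 23, and 24
                          never attach even though areca-cli sees them

That totals 6 attached units, matching the six da(4) devices (da0
through da5) in the original 'ls /dev/da*' output.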

Comment 2 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:38 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped