I'm using geom_mirror/zfs on IBM System X 3250 servers, which have LSI 1064e controller. When drive dies or when it's removed from the server the system freezes on disk operations, until reboot or until same (or new) drive is inserted. After that the system runs normally. This is reproduceable and I encountered this on i386/amd64. This cannot be helped by upgrading the controller firmware (I downloaded and upgraded to the latest available from IBM support site). ps in debugger shows a great amount of processes in D state. How-To-Repeat: Get an IBM System X server. Install FreeBSD onto a geom_mirror or zfs mirrored pool. Pull out one drive. Issue some disk i/o related command.
Fix: use integrated mirroring on controller. When using IR all is fine when disk is pulled out.
Seems like its more zfs-related freeze. geom_mirror(4) seems to be able to detect that one of the providers is gone in about a minute. When dealing with a zfs mirrored pool, the console is constantly updated with these messages: [...] mpt0: completing timedout/aborted req 0xffffff80002be9c0:2484 mpt0: completing timedout/aborted req 0xffffff80002be6f0:2483 mpt0: completing timedout/aborted req 0xffffff80002be810:2482 mpt0: completing timedout/aborted req 0xffffff80002bde80:2481 mpt0: abort of req 0xffffff80002bde80:0 completed mpt0: request 0xffffff80002be660:2486 timed out for ccb 0xffffff00035ea800 (req>ccb 0xffffff00035ea800) mpt0: request 0xffffff80002be930:2487 timed out for ccb 0xffffff00035da000 (req>ccb 0xffffff00035da000) mpt0: completing timedout/aborted req 0xffffff80002aff30:2582 mpt0: abort of req 0xffffff80002aff30:0 completed2486 function 0 mpt0: request 0xffffff80002afb40:2584 timed out for ccb 0xffffff00035c0000 (req>ccb 0xffffff00035c0000) mpt0: attempting to abort req 0xffffff80002afb40:2584 function 0 mpt0: completing timedout/aborted req 0xffffff80002afb40:2584 [...] This is 100% reproduceable. Unfortunately, I got like 11 of these servers. I can provide root ssh and local console (vie IP KVM) to one of them, along with my help in any destructive testing.
Having enough time on panicbox, I can say that 8.2-RELEASE zfs mirror pool can detect that one of the disks from LSI 1064e controller is gone in about an hour. Without issuing 'camcontrol rescan' or other commands.
Persists on today's STABLE with v28.
The thing is, that after disk removal the controller sends two types of events: 0x12 and 0x16. According to the mpi_ioc.h the first is MPI_EVENT_SAS_PHY_LINK_STATUS and the second is MPI_EVENT_SAS_DISCOVERY. Furthermore, according to the kernel messages on the console during the drive removal/attaching, and the code in mpt_cam.c, mpt_cam_event() does nothing to handle these events (they both are handled by 'default:' section). I think this leads to freezing. Comparing to the linux mpt code, I can say that Linux kernel does nothing about MPI_EVENT_SAS_PHY_LINK_STATUS, but it definitely does something (which my skills are to low to understand to) about MPI_EVENT_SAS_DISCOVERY. Anyway, my skills are to low to correct this. IPKVM screenshots of drive removal and insertion (shot 1 - removal, shot 3 - insertion): http://unix.zhegan.in/files/mpt_cam_event01.jpeg http://unix.zhegan.in/files/mpt_cam_event02.jpeg http://unix.zhegan.in/files/mpt_cam_event03.jpeg
Reflashing the IT firmware doesn't help either - controller behaves identically.
Could you please give the following patch a try? http://people.freebsd.org/~marius/mpt_MPI_EVENT_QUEUE_FULL_non_member_MPI_EVENT_SAS_DEVICE_STATUS_CHANGE.diff It should make mpt(4) report the SAS device discovery and departure events to cam(4) but I don't know whether higher levels like geom_mirror(4) or zfs(4) will handle these properly. Marius
I tested it on today's STABLE (looks like the patch is against CURRENT, but I managed to apply it to STABLE, there was just some extra spaces in a couple of places). Seems to be working (do you need screens ?). At least I can say - the freeze timeout is now around 5 minutes (unfortunately, I didn't measure the exact amount of time), against one hour before the patch. Hop I will be able to diminish it even more by tuning the kern.cam.da.default_timeout. Thanks for the great work.
On Mon, Jul 25, 2011 at 08:44:24PM +0600, Eugene M. Zheganin wrote: > I tested it on today's STABLE (looks like the patch is against CURRENT, > but I managed to apply it to STABLE, there was just some extra spaces in > a couple of places). Seems to be working (do you need screens ?). At > least I can say - the freeze timeout is now around 5 minutes > (unfortunately, I didn't measure the exact amount of time), against one > hour before the patch. Hop I will be able to diminish it even more by > tuning the kern.cam.da.default_timeout. > Thanks for the great work. Do you get log entries regarding the removal of the disk in question in a timely manor, i.e. some seconds rather than minutes, like the following: Jul 24 21:13:49 flak kernel: (da0:mpt0:0:0:0): lost device Jul 24 21:13:49 flak kernel: (da0:mpt0:0:0:0): removing device entry and is it automatically detected as daX again when re-plugged? Marius
Exactly, I received these messages in seconds after disk removal, then I got freeze around 4-5 minutes (during which I thought that this was no success, and went to my office). When I came there I saw messages like 'Invalidating pack' and 'Removing device entry' and the system was unfrozen. I didn't test the device reinsertion; but I can, if you like.
On Mon, Jul 25, 2011 at 09:05:14PM +0600, Eugene M. Zheganin wrote: > Exactly, I received these messages in seconds after disk removal, then I > got freeze around 4-5 minutes (during which I thought that this was no > success, and went to my office). Okay, that's basically all that can be done form the SIM, i.e. mpt(4), driver point of view. Maybe there's also room for improvement in da(4). The following patch may also help here but I'm not sure about that: http://people.freebsd.org/~mav/periph_noretry.patch > When I came there I saw messages like 'Invalidating pack' and 'Removing > device entry' and the system was unfrozen. I didn't test the device > reinsertion; but I can, if you like. Yes, it would be great if you could test that. Marius
Reinsertion is also working just fine. Drive was detected in seconds after pushing in, without any additional iteraction from me. Thanks again for the great work.
On Wed, Jul 27, 2011 at 05:09:16PM +0600, Eugene M. Zheganin wrote: > Reinsertion is also working just fine. Drive was detected in seconds > after pushing in, without any additional iteraction from me. > > Thanks again for the great work. Ok, thanks for testing. I've sent an approval request to the Release- Engineers regarding the inclusion of these changes into 9.0. Marius
Author: marius Date: Fri Jul 29 18:38:31 2011 New Revision: 224494 URL: http://svn.freebsd.org/changeset/base/224494 Log: - Send the RELSIM_ADJUST_OPENINGS in response to a MPI_EVENT_QUEUE_FULL using the right SIM in case the HBA is RAID-capable but the target in question is not a hot spare or member of a RAID volume. - Report the loss and addition of SAS and SATA targets detected via PHY link status changes and signalled by MPI_EVENT_SAS_DEVICE_STATUS_CHANGE to cam(4) as lost devices and trigger rescans as appropriate. Without this it can take quite some time until a lost device actually is no longer tried to be used, if it ever stops. [1] - Handle MPI_EVENT_IR2, MPI_EVENT_LOG_ENTRY_ADDED, MPI_EVENT_SAS_DISCOVERY and MPI_EVENT_SAS_PHY_LINK_STATUS silently as these serve no additional purpose beyond adding cryptic entries to logs. Thanks to Hans-Joerg Sirtl for providing one of the HBAs these changes were developed with and RIP to the mainboard that didn't survive testing them. PR: 157534 [1] Approved by: re (kib) MFC after: 2 weeks Modified: head/sys/dev/mpt/mpt_cam.c head/sys/dev/mpt/mpt_raid.c head/sys/dev/mpt/mpt_raid.h Modified: head/sys/dev/mpt/mpt_cam.c ============================================================================== --- head/sys/dev/mpt/mpt_cam.c Fri Jul 29 18:35:10 2011 (r224493) +++ head/sys/dev/mpt/mpt_cam.c Fri Jul 29 18:38:31 2011 (r224494) @@ -2538,7 +2538,8 @@ mpt_cam_event(struct mpt_softc *mpt, req pqf->CurrentDepth = le16toh(pqf->CurrentDepth); mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth " "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth); - if (mpt->phydisk_sim) { + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + pqf->TargetID) != 0) { sim = mpt->phydisk_sim; } else { sim = mpt->sim; @@ -2570,9 +2571,72 @@ mpt_cam_event(struct mpt_softc *mpt, req mpt_prt(mpt, "IR resync update %d completed\n", (data0 >> 16) & 0xff); break; + case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + { + union ccb *ccb; + struct cam_sim *sim; + struct cam_path *tmppath; + PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE psdsc; + + psdsc = (PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE)msg->Data; + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + psdsc->TargetID) != 0) + sim = mpt->phydisk_sim; + else + sim = mpt->sim; + switch(psdsc->ReasonCode) { + case MPI_EVENT_SAS_DEV_STAT_RC_ADDED: + MPTLOCK_2_CAMLOCK(mpt); + ccb = xpt_alloc_ccb_nowait(); + if (ccb == NULL) { + mpt_prt(mpt, + "unable to alloc CCB for rescan\n"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + if (xpt_create_path(&ccb->ccb_h.path, xpt_periph, + cam_sim_path(sim), psdsc->TargetID, + CAM_LUN_WILDCARD) != CAM_REQ_CMP) { + CAMLOCK_2_MPTLOCK(mpt); + mpt_prt(mpt, + "unable to create path for rescan\n"); + xpt_free_ccb(ccb); + break; + } + xpt_rescan(ccb); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_NOT_RESPONDING: + MPTLOCK_2_CAMLOCK(mpt); + if (xpt_create_path(&tmppath, NULL, cam_sim_path(sim), + psdsc->TargetID, CAM_LUN_WILDCARD) != + CAM_REQ_CMP) { + mpt_prt(mpt, + "unable to create path for async event"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + xpt_async(AC_LOST_DEVICE, tmppath, NULL); + xpt_free_path(tmppath); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET: + break; + default: + mpt_lprt(mpt, MPT_PRT_WARN, + "SAS device status change: Bus: 0x%02x TargetID: " + "0x%02x ReasonCode: 0x%02x\n", psdsc->TargetID, + psdsc->Bus, psdsc->ReasonCode); + break; + } + break; + } case MPI_EVENT_EVENT_CHANGE: case MPI_EVENT_INTEGRATED_RAID: - case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + case MPI_EVENT_IR2: + case MPI_EVENT_LOG_ENTRY_ADDED: + case MPI_EVENT_SAS_DISCOVERY: + case MPI_EVENT_SAS_PHY_LINK_STATUS: case MPI_EVENT_SAS_SES: break; default: Modified: head/sys/dev/mpt/mpt_raid.c ============================================================================== --- head/sys/dev/mpt/mpt_raid.c Fri Jul 29 18:35:10 2011 (r224493) +++ head/sys/dev/mpt/mpt_raid.c Fri Jul 29 18:38:31 2011 (r224494) @@ -812,6 +812,25 @@ mpt_map_physdisk(struct mpt_softc *mpt, /* XXX Ignores that there may be multiple busses/IOCs involved. */ int +mpt_is_raid_member(struct mpt_softc *mpt, target_id_t tgt) +{ + struct mpt_raid_disk *mpt_disk; + int i; + + if (mpt->ioc_page2 == NULL || mpt->ioc_page2->MaxPhysDisks == 0) + return (0); + for (i = 0; i < mpt->ioc_page2->MaxPhysDisks; i++) { + mpt_disk = &mpt->raid_disks[i]; + if ((mpt_disk->flags & MPT_RDF_ACTIVE) != 0 && + mpt_disk->config_page.PhysDiskID == tgt) + return (1); + } + return (0); + +} + +/* XXX Ignores that there may be multiple busses/IOCs involved. */ +int mpt_is_raid_volume(struct mpt_softc *mpt, target_id_t tgt) { CONFIG_PAGE_IOC_2_RAID_VOL *ioc_vol; Modified: head/sys/dev/mpt/mpt_raid.h ============================================================================== --- head/sys/dev/mpt/mpt_raid.h Fri Jul 29 18:35:10 2011 (r224493) +++ head/sys/dev/mpt/mpt_raid.h Fri Jul 29 18:38:31 2011 (r224494) @@ -54,6 +54,7 @@ typedef enum { } mpt_raid_mwce_t; cam_status mpt_map_physdisk(struct mpt_softc *, union ccb *, target_id_t *); +int mpt_is_raid_member(struct mpt_softc *, target_id_t); int mpt_is_raid_volume(struct mpt_softc *, target_id_t); #if 0 cam_status _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: marius Date: Sat Aug 13 12:37:22 2011 New Revision: 224820 URL: http://svn.freebsd.org/changeset/base/224820 Log: MFC: r224494, r224761 - Send the RELSIM_ADJUST_OPENINGS in response to a MPI_EVENT_QUEUE_FULL using the right SIM in case the HBA is RAID-capable but the target in question is not a hot spare or member of a RAID volume. - Report the loss and addition of SAS and SATA targets detected via PHY link status changes and signalled by MPI_EVENT_SAS_DEVICE_STATUS_CHANGE to cam(4) as lost devices and trigger rescans as appropriate. Without this it can take quite some time until a lost device actually is no longer tried to be used, if it ever stops. [1] - Handle MPI_EVENT_IR2, MPI_EVENT_LOG_ENTRY_ADDED, MPI_EVENT_SAS_DISCOVERY and MPI_EVENT_SAS_PHY_LINK_STATUS silently as these serve no additional purpose beyond adding cryptic entries to logs. - Add a warning for MPI_EVENT_SAS_DISCOVERY_ERROR events, which can help identifying broken disks. [2] Thanks to Hans-Joerg Sirtl for providing one of the HBAs these changes were developed with and RIP to the mainboard that didn't survive testing them. PR: 157534 [1] Submitted by: Andrew Boyer [2] Modified: stable/8/sys/dev/mpt/mpilib/mpi_ioc.h stable/8/sys/dev/mpt/mpt_cam.c stable/8/sys/dev/mpt/mpt_raid.c stable/8/sys/dev/mpt/mpt_raid.h Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) Modified: stable/8/sys/dev/mpt/mpilib/mpi_ioc.h ============================================================================== --- stable/8/sys/dev/mpt/mpilib/mpi_ioc.h Sat Aug 13 12:33:41 2011 (r224819) +++ stable/8/sys/dev/mpt/mpilib/mpi_ioc.h Sat Aug 13 12:37:22 2011 (r224820) @@ -33,7 +33,7 @@ * Title: MPI IOC, Port, Event, FW Download, and FW Upload messages * Creation Date: August 11, 2000 * - * mpi_ioc.h Version: 01.05.14 + * mpi_ioc.h Version: 01.05.16 * * Version History * --------------- @@ -140,6 +140,16 @@ * added _MULTI_PORT_DOMAIN. * 05-24-07 01.05.14 Added Common Boot Block type to FWDownload Request. * Added Common Boot Block type to FWUpload Request. + * 08-07-07 01.05.15 Added MPI_EVENT_SAS_INIT_RC_REMOVED define. + * Added MPI_EVENT_IR2_RC_DUAL_PORT_ADDED and + * MPI_EVENT_IR2_RC_DUAL_PORT_REMOVED for IR2 event data. + * Added SASAddress field to SAS Initiator Device Table + * Overflow event data structure. + * 03-28-08 01.05.16 Added two new ReasonCode values to SAS Device Status + * Change Event data to indicate completion of internally + * generated task management. + * Added MPI_EVENT_DSCVRY_ERR_DS_SATA_INIT_FAILURE define. + * Added MPI_EVENT_SAS_INIT_RC_INACCESSIBLE define. * -------------------------------------------------------------------------- */ @@ -639,6 +649,8 @@ typedef struct _EVENT_DATA_SAS_DEVICE_ST #define MPI_EVENT_SAS_DEV_STAT_RC_CLEAR_TASK_SET_INTERNAL (0x0B) #define MPI_EVENT_SAS_DEV_STAT_RC_QUERY_TASK_INTERNAL (0x0C) #define MPI_EVENT_SAS_DEV_STAT_RC_ASYNC_NOTIFICATION (0x0D) +#define MPI_EVENT_SAS_DEV_STAT_RC_CMPL_INTERNAL_DEV_RESET (0x0E) +#define MPI_EVENT_SAS_DEV_STAT_RC_CMPL_TASK_ABORT_INTERNAL (0x0F) /* SCSI Event data for Queue Full event */ @@ -735,6 +747,8 @@ typedef struct _MPI_EVENT_DATA_IR2 #define MPI_EVENT_IR2_RC_PD_REMOVED (0x05) #define MPI_EVENT_IR2_RC_FOREIGN_CFG_DETECTED (0x06) #define MPI_EVENT_IR2_RC_REBUILD_MEDIUM_ERROR (0x07) +#define MPI_EVENT_IR2_RC_DUAL_PORT_ADDED (0x08) +#define MPI_EVENT_IR2_RC_DUAL_PORT_REMOVED (0x09) /* defines for logical disk states */ #define MPI_LD_STATE_OPTIMAL (0x00) @@ -894,6 +908,7 @@ typedef struct _EVENT_DATA_DISCOVERY_ERR #define MPI_EVENT_DSCVRY_ERR_DS_UNSUPPORTED_DEVICE (0x00000800) #define MPI_EVENT_DSCVRY_ERR_DS_MAX_SATA_TARGETS (0x00001000) #define MPI_EVENT_DSCVRY_ERR_DS_MULTI_PORT_DOMAIN (0x00002000) +#define MPI_EVENT_DSCVRY_ERR_DS_SATA_INIT_FAILURE (0x00004000) /* SAS SMP Error Event data */ @@ -929,6 +944,8 @@ typedef struct _EVENT_DATA_SAS_INIT_DEV_ /* defines for the ReasonCode field of the SAS Initiator Device Status Change event */ #define MPI_EVENT_SAS_INIT_RC_ADDED (0x01) +#define MPI_EVENT_SAS_INIT_RC_REMOVED (0x02) +#define MPI_EVENT_SAS_INIT_RC_INACCESSIBLE (0x03) /* SAS Initiator Device Table Overflow Event data */ @@ -937,6 +954,7 @@ typedef struct _EVENT_DATA_SAS_INIT_TABL U8 MaxInit; /* 00h */ U8 CurrentInit; /* 01h */ U16 Reserved1; /* 02h */ + U64 SASAddress; /* 04h */ } EVENT_DATA_SAS_INIT_TABLE_OVERFLOW, MPI_POINTER PTR_EVENT_DATA_SAS_INIT_TABLE_OVERFLOW, MpiEventDataSasInitTableOverflow_t, Modified: stable/8/sys/dev/mpt/mpt_cam.c ============================================================================== --- stable/8/sys/dev/mpt/mpt_cam.c Sat Aug 13 12:33:41 2011 (r224819) +++ stable/8/sys/dev/mpt/mpt_cam.c Sat Aug 13 12:37:22 2011 (r224820) @@ -2538,7 +2538,8 @@ mpt_cam_event(struct mpt_softc *mpt, req pqf->CurrentDepth = le16toh(pqf->CurrentDepth); mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth " "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth); - if (mpt->phydisk_sim) { + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + pqf->TargetID) != 0) { sim = mpt->phydisk_sim; } else { sim = mpt->sim; @@ -2570,9 +2571,85 @@ mpt_cam_event(struct mpt_softc *mpt, req mpt_prt(mpt, "IR resync update %d completed\n", (data0 >> 16) & 0xff); break; + case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + { + union ccb *ccb; + struct cam_sim *sim; + struct cam_path *tmppath; + PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE psdsc; + + psdsc = (PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE)msg->Data; + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + psdsc->TargetID) != 0) + sim = mpt->phydisk_sim; + else + sim = mpt->sim; + switch(psdsc->ReasonCode) { + case MPI_EVENT_SAS_DEV_STAT_RC_ADDED: + MPTLOCK_2_CAMLOCK(mpt); + ccb = xpt_alloc_ccb_nowait(); + if (ccb == NULL) { + mpt_prt(mpt, + "unable to alloc CCB for rescan\n"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + if (xpt_create_path(&ccb->ccb_h.path, xpt_periph, + cam_sim_path(sim), psdsc->TargetID, + CAM_LUN_WILDCARD) != CAM_REQ_CMP) { + CAMLOCK_2_MPTLOCK(mpt); + mpt_prt(mpt, + "unable to create path for rescan\n"); + xpt_free_ccb(ccb); + break; + } + xpt_rescan(ccb); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_NOT_RESPONDING: + MPTLOCK_2_CAMLOCK(mpt); + if (xpt_create_path(&tmppath, NULL, cam_sim_path(sim), + psdsc->TargetID, CAM_LUN_WILDCARD) != + CAM_REQ_CMP) { + mpt_prt(mpt, + "unable to create path for async event"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + xpt_async(AC_LOST_DEVICE, tmppath, NULL); + xpt_free_path(tmppath); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_CMPL_INTERNAL_DEV_RESET: + case MPI_EVENT_SAS_DEV_STAT_RC_CMPL_TASK_ABORT_INTERNAL: + case MPI_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET: + break; + default: + mpt_lprt(mpt, MPT_PRT_WARN, + "SAS device status change: Bus: 0x%02x TargetID: " + "0x%02x ReasonCode: 0x%02x\n", psdsc->Bus, + psdsc->TargetID, psdsc->ReasonCode); + break; + } + break; + } + case MPI_EVENT_SAS_DISCOVERY_ERROR: + { + PTR_EVENT_DATA_DISCOVERY_ERROR pde; + + pde = (PTR_EVENT_DATA_DISCOVERY_ERROR)msg->Data; + pde->DiscoveryStatus = le32toh(pde->DiscoveryStatus); + mpt_lprt(mpt, MPT_PRT_WARN, + "SAS discovery error: Port: 0x%02x Status: 0x%08x\n", + pde->Port, pde->DiscoveryStatus); + break; + } case MPI_EVENT_EVENT_CHANGE: case MPI_EVENT_INTEGRATED_RAID: - case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + case MPI_EVENT_IR2: + case MPI_EVENT_LOG_ENTRY_ADDED: + case MPI_EVENT_SAS_DISCOVERY: + case MPI_EVENT_SAS_PHY_LINK_STATUS: case MPI_EVENT_SAS_SES: break; default: Modified: stable/8/sys/dev/mpt/mpt_raid.c ============================================================================== --- stable/8/sys/dev/mpt/mpt_raid.c Sat Aug 13 12:33:41 2011 (r224819) +++ stable/8/sys/dev/mpt/mpt_raid.c Sat Aug 13 12:37:22 2011 (r224820) @@ -812,6 +812,25 @@ mpt_map_physdisk(struct mpt_softc *mpt, /* XXX Ignores that there may be multiple busses/IOCs involved. */ int +mpt_is_raid_member(struct mpt_softc *mpt, target_id_t tgt) +{ + struct mpt_raid_disk *mpt_disk; + int i; + + if (mpt->ioc_page2 == NULL || mpt->ioc_page2->MaxPhysDisks == 0) + return (0); + for (i = 0; i < mpt->ioc_page2->MaxPhysDisks; i++) { + mpt_disk = &mpt->raid_disks[i]; + if ((mpt_disk->flags & MPT_RDF_ACTIVE) != 0 && + mpt_disk->config_page.PhysDiskID == tgt) + return (1); + } + return (0); + +} + +/* XXX Ignores that there may be multiple busses/IOCs involved. */ +int mpt_is_raid_volume(struct mpt_softc *mpt, target_id_t tgt) { CONFIG_PAGE_IOC_2_RAID_VOL *ioc_vol; Modified: stable/8/sys/dev/mpt/mpt_raid.h ============================================================================== --- stable/8/sys/dev/mpt/mpt_raid.h Sat Aug 13 12:33:41 2011 (r224819) +++ stable/8/sys/dev/mpt/mpt_raid.h Sat Aug 13 12:37:22 2011 (r224820) @@ -54,6 +54,7 @@ typedef enum { } mpt_raid_mwce_t; cam_status mpt_map_physdisk(struct mpt_softc *, union ccb *, target_id_t *); +int mpt_is_raid_member(struct mpt_softc *, target_id_t); int mpt_is_raid_volume(struct mpt_softc *, target_id_t); #if 0 cam_status _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Author: marius Date: Sat Aug 13 12:37:23 2011 New Revision: 224821 URL: http://svn.freebsd.org/changeset/base/224821 Log: MFC: r224494, r224761 - Send the RELSIM_ADJUST_OPENINGS in response to a MPI_EVENT_QUEUE_FULL using the right SIM in case the HBA is RAID-capable but the target in question is not a hot spare or member of a RAID volume. - Report the loss and addition of SAS and SATA targets detected via PHY link status changes and signalled by MPI_EVENT_SAS_DEVICE_STATUS_CHANGE to cam(4) as lost devices and trigger rescans as appropriate. Without this it can take quite some time until a lost device actually is no longer tried to be used, if it ever stops. [1] - Handle MPI_EVENT_IR2, MPI_EVENT_LOG_ENTRY_ADDED, MPI_EVENT_SAS_DISCOVERY and MPI_EVENT_SAS_PHY_LINK_STATUS silently as these serve no additional purpose beyond adding cryptic entries to logs. - Add a warning for MPI_EVENT_SAS_DISCOVERY_ERROR events, which can help identifying broken disks. [2] Thanks to Hans-Joerg Sirtl for providing one of the HBAs these changes were developed with and RIP to the mainboard that didn't survive testing them. PR: 157534 [1] Submitted by: Andrew Boyer [2] Modified: stable/7/sys/dev/mpt/mpilib/mpi_ioc.h stable/7/sys/dev/mpt/mpt_cam.c stable/7/sys/dev/mpt/mpt_raid.c stable/7/sys/dev/mpt/mpt_raid.h Directory Properties: stable/7/sys/ (props changed) stable/7/sys/cddl/contrib/opensolaris/ (props changed) stable/7/sys/contrib/dev/acpica/ (props changed) stable/7/sys/contrib/pf/ (props changed) Modified: stable/7/sys/dev/mpt/mpilib/mpi_ioc.h ============================================================================== --- stable/7/sys/dev/mpt/mpilib/mpi_ioc.h Sat Aug 13 12:37:22 2011 (r224820) +++ stable/7/sys/dev/mpt/mpilib/mpi_ioc.h Sat Aug 13 12:37:23 2011 (r224821) @@ -33,7 +33,7 @@ * Title: MPI IOC, Port, Event, FW Download, and FW Upload messages * Creation Date: August 11, 2000 * - * mpi_ioc.h Version: 01.05.14 + * mpi_ioc.h Version: 01.05.16 * * Version History * --------------- @@ -140,6 +140,16 @@ * added _MULTI_PORT_DOMAIN. * 05-24-07 01.05.14 Added Common Boot Block type to FWDownload Request. * Added Common Boot Block type to FWUpload Request. + * 08-07-07 01.05.15 Added MPI_EVENT_SAS_INIT_RC_REMOVED define. + * Added MPI_EVENT_IR2_RC_DUAL_PORT_ADDED and + * MPI_EVENT_IR2_RC_DUAL_PORT_REMOVED for IR2 event data. + * Added SASAddress field to SAS Initiator Device Table + * Overflow event data structure. + * 03-28-08 01.05.16 Added two new ReasonCode values to SAS Device Status + * Change Event data to indicate completion of internally + * generated task management. + * Added MPI_EVENT_DSCVRY_ERR_DS_SATA_INIT_FAILURE define. + * Added MPI_EVENT_SAS_INIT_RC_INACCESSIBLE define. * -------------------------------------------------------------------------- */ @@ -639,6 +649,8 @@ typedef struct _EVENT_DATA_SAS_DEVICE_ST #define MPI_EVENT_SAS_DEV_STAT_RC_CLEAR_TASK_SET_INTERNAL (0x0B) #define MPI_EVENT_SAS_DEV_STAT_RC_QUERY_TASK_INTERNAL (0x0C) #define MPI_EVENT_SAS_DEV_STAT_RC_ASYNC_NOTIFICATION (0x0D) +#define MPI_EVENT_SAS_DEV_STAT_RC_CMPL_INTERNAL_DEV_RESET (0x0E) +#define MPI_EVENT_SAS_DEV_STAT_RC_CMPL_TASK_ABORT_INTERNAL (0x0F) /* SCSI Event data for Queue Full event */ @@ -735,6 +747,8 @@ typedef struct _MPI_EVENT_DATA_IR2 #define MPI_EVENT_IR2_RC_PD_REMOVED (0x05) #define MPI_EVENT_IR2_RC_FOREIGN_CFG_DETECTED (0x06) #define MPI_EVENT_IR2_RC_REBUILD_MEDIUM_ERROR (0x07) +#define MPI_EVENT_IR2_RC_DUAL_PORT_ADDED (0x08) +#define MPI_EVENT_IR2_RC_DUAL_PORT_REMOVED (0x09) /* defines for logical disk states */ #define MPI_LD_STATE_OPTIMAL (0x00) @@ -894,6 +908,7 @@ typedef struct _EVENT_DATA_DISCOVERY_ERR #define MPI_EVENT_DSCVRY_ERR_DS_UNSUPPORTED_DEVICE (0x00000800) #define MPI_EVENT_DSCVRY_ERR_DS_MAX_SATA_TARGETS (0x00001000) #define MPI_EVENT_DSCVRY_ERR_DS_MULTI_PORT_DOMAIN (0x00002000) +#define MPI_EVENT_DSCVRY_ERR_DS_SATA_INIT_FAILURE (0x00004000) /* SAS SMP Error Event data */ @@ -929,6 +944,8 @@ typedef struct _EVENT_DATA_SAS_INIT_DEV_ /* defines for the ReasonCode field of the SAS Initiator Device Status Change event */ #define MPI_EVENT_SAS_INIT_RC_ADDED (0x01) +#define MPI_EVENT_SAS_INIT_RC_REMOVED (0x02) +#define MPI_EVENT_SAS_INIT_RC_INACCESSIBLE (0x03) /* SAS Initiator Device Table Overflow Event data */ @@ -937,6 +954,7 @@ typedef struct _EVENT_DATA_SAS_INIT_TABL U8 MaxInit; /* 00h */ U8 CurrentInit; /* 01h */ U16 Reserved1; /* 02h */ + U64 SASAddress; /* 04h */ } EVENT_DATA_SAS_INIT_TABLE_OVERFLOW, MPI_POINTER PTR_EVENT_DATA_SAS_INIT_TABLE_OVERFLOW, MpiEventDataSasInitTableOverflow_t, Modified: stable/7/sys/dev/mpt/mpt_cam.c ============================================================================== --- stable/7/sys/dev/mpt/mpt_cam.c Sat Aug 13 12:37:22 2011 (r224820) +++ stable/7/sys/dev/mpt/mpt_cam.c Sat Aug 13 12:37:23 2011 (r224821) @@ -2538,7 +2538,8 @@ mpt_cam_event(struct mpt_softc *mpt, req pqf->CurrentDepth = le16toh(pqf->CurrentDepth); mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth " "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth); - if (mpt->phydisk_sim) { + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + pqf->TargetID) != 0) { sim = mpt->phydisk_sim; } else { sim = mpt->sim; @@ -2570,9 +2571,85 @@ mpt_cam_event(struct mpt_softc *mpt, req mpt_prt(mpt, "IR resync update %d completed\n", (data0 >> 16) & 0xff); break; + case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + { + union ccb *ccb; + struct cam_sim *sim; + struct cam_path *tmppath; + PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE psdsc; + + psdsc = (PTR_EVENT_DATA_SAS_DEVICE_STATUS_CHANGE)msg->Data; + if (mpt->phydisk_sim && mpt_is_raid_member(mpt, + psdsc->TargetID) != 0) + sim = mpt->phydisk_sim; + else + sim = mpt->sim; + switch(psdsc->ReasonCode) { + case MPI_EVENT_SAS_DEV_STAT_RC_ADDED: + MPTLOCK_2_CAMLOCK(mpt); + ccb = xpt_alloc_ccb_nowait(); + if (ccb == NULL) { + mpt_prt(mpt, + "unable to alloc CCB for rescan\n"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + if (xpt_create_path(&ccb->ccb_h.path, xpt_periph, + cam_sim_path(sim), psdsc->TargetID, + CAM_LUN_WILDCARD) != CAM_REQ_CMP) { + CAMLOCK_2_MPTLOCK(mpt); + mpt_prt(mpt, + "unable to create path for rescan\n"); + xpt_free_ccb(ccb); + break; + } + xpt_rescan(ccb); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_NOT_RESPONDING: + MPTLOCK_2_CAMLOCK(mpt); + if (xpt_create_path(&tmppath, NULL, cam_sim_path(sim), + psdsc->TargetID, CAM_LUN_WILDCARD) != + CAM_REQ_CMP) { + mpt_prt(mpt, + "unable to create path for async event"); + CAMLOCK_2_MPTLOCK(mpt); + break; + } + xpt_async(AC_LOST_DEVICE, tmppath, NULL); + xpt_free_path(tmppath); + CAMLOCK_2_MPTLOCK(mpt); + break; + case MPI_EVENT_SAS_DEV_STAT_RC_CMPL_INTERNAL_DEV_RESET: + case MPI_EVENT_SAS_DEV_STAT_RC_CMPL_TASK_ABORT_INTERNAL: + case MPI_EVENT_SAS_DEV_STAT_RC_INTERNAL_DEVICE_RESET: + break; + default: + mpt_lprt(mpt, MPT_PRT_WARN, + "SAS device status change: Bus: 0x%02x TargetID: " + "0x%02x ReasonCode: 0x%02x\n", psdsc->Bus, + psdsc->TargetID, psdsc->ReasonCode); + break; + } + break; + } + case MPI_EVENT_SAS_DISCOVERY_ERROR: + { + PTR_EVENT_DATA_DISCOVERY_ERROR pde; + + pde = (PTR_EVENT_DATA_DISCOVERY_ERROR)msg->Data; + pde->DiscoveryStatus = le32toh(pde->DiscoveryStatus); + mpt_lprt(mpt, MPT_PRT_WARN, + "SAS discovery error: Port: 0x%02x Status: 0x%08x\n", + pde->Port, pde->DiscoveryStatus); + break; + } case MPI_EVENT_EVENT_CHANGE: case MPI_EVENT_INTEGRATED_RAID: - case MPI_EVENT_SAS_DEVICE_STATUS_CHANGE: + case MPI_EVENT_IR2: + case MPI_EVENT_LOG_ENTRY_ADDED: + case MPI_EVENT_SAS_DISCOVERY: + case MPI_EVENT_SAS_PHY_LINK_STATUS: case MPI_EVENT_SAS_SES: break; default: Modified: stable/7/sys/dev/mpt/mpt_raid.c ============================================================================== --- stable/7/sys/dev/mpt/mpt_raid.c Sat Aug 13 12:37:22 2011 (r224820) +++ stable/7/sys/dev/mpt/mpt_raid.c Sat Aug 13 12:37:23 2011 (r224821) @@ -828,6 +828,25 @@ mpt_map_physdisk(struct mpt_softc *mpt, /* XXX Ignores that there may be multiple busses/IOCs involved. */ int +mpt_is_raid_member(struct mpt_softc *mpt, target_id_t tgt) +{ + struct mpt_raid_disk *mpt_disk; + int i; + + if (mpt->ioc_page2 == NULL || mpt->ioc_page2->MaxPhysDisks == 0) + return (0); + for (i = 0; i < mpt->ioc_page2->MaxPhysDisks; i++) { + mpt_disk = &mpt->raid_disks[i]; + if ((mpt_disk->flags & MPT_RDF_ACTIVE) != 0 && + mpt_disk->config_page.PhysDiskID == tgt) + return (1); + } + return (0); + +} + +/* XXX Ignores that there may be multiple busses/IOCs involved. */ +int mpt_is_raid_volume(struct mpt_softc *mpt, target_id_t tgt) { CONFIG_PAGE_IOC_2_RAID_VOL *ioc_vol; Modified: stable/7/sys/dev/mpt/mpt_raid.h ============================================================================== --- stable/7/sys/dev/mpt/mpt_raid.h Sat Aug 13 12:37:22 2011 (r224820) +++ stable/7/sys/dev/mpt/mpt_raid.h Sat Aug 13 12:37:23 2011 (r224821) @@ -54,6 +54,7 @@ typedef enum { } mpt_raid_mwce_t; cam_status mpt_map_physdisk(struct mpt_softc *, union ccb *, target_id_t *); +int mpt_is_raid_member(struct mpt_softc *, target_id_t); int mpt_is_raid_volume(struct mpt_softc *, target_id_t); #if 0 cam_status _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: open->closed Close; patches have been committed and MFC'ed down to stable/7.