Monitorizar/reconstruir hardware de RAID (Linux)

Controladores de hardware incorporados

En los servidores root de IONOS se utilizan dos tipos de controladores de hardware: 3ware y Areca.

Así puede consultar qué controlador está instalado en su servidor:

# lspci|grep RAID
01:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)

bzw.

# lspci|grep RAID
02:0e.0 RAID bus controller: Areca Technology Corp. ARC-1110 4-Port PCI-X to SATA RAID Controller

3ware RAID

Así recibe detalles sobre el controlador:

# dmesg|grep 3ware
3ware Storage Controller device driver for Linux v1.26.02.002.
scsi0 : 3ware Storage Controller
3w-xxxx: scsi0: Found a 3ware Storage Controller at 0xd800, IRQ: 18.
scsi 0:0:0:0: Direct-Access 3ware Logical Disk 0 1.2 PQ: 0 ANSI: 0
3ware 9000 Storage Controller device driver for Linux v2.26.02.010.

tw_cli abre la conexión con la consola del controlador.
El comando help devuelve todos los comandos disponibles.

# tw_cli
//XXX> help

Copyright(c) 2004-2006 Applied Micro Circuits Corporation(AMCC). All rights reserved.

AMCC/3ware CLI (version 2.00.06.007)


Commands Description
-------------------------------------------------------------------
focus Changes from one object to another. For Interactive Mode Only!
show Displays information about controller(s), unit(s) and port(s).
flush Flush write cache data to units in the system.
rescan Rescan all empty ports for new unit(s) and disk(s).
update Update controller firmware from an image file.
commit Commit dirty DCB to storage on controller(s). (Windows only)
/cx Controller specific commands.
/cx/ux Unit specific commands.
/cx/px Port specific commands.
/cx/bbu BBU specific commands. (9000 only)
/ex Enclosure specific commands. (9KSX/SE only)
/ex/slotx Enclosure Slot specific commands.
/ex/fanx Enclosure Fan specific commands.
/ex/tempx Enclosure Temperature Sensor specific commands.

Certain commands are qualified with constraints of controller type/model support.
Please consult the twi_cli documentation for explanation of the controller-qualifiers.

The controller-qualifiers of the Enclosure commands (/ex) also apply to Enclosure
Element specific commands (e.g., /ex/elementx).

Type help <command> to get more details about a particular command.
For more detail information see twi_cli's documentation.

//XXX>

Consultar información y estado del RAID:

//XXXX> info

Ctl Model Ports Drives Units NotOpt RRate VRate BBU
------------------------------------------------------------------------
c0 8006-2LP 2 2 1 0 2 - -

//XXXX> info c0

Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 OK - - - 232.885 ON -

Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 232.88 GB 488397168 4ND0XYFE
p1 OK u0 232.88 GB 488397168 4ND0YH77

Visualización de alarmas:

//XXXX> show alarms

Ctl Date Severity Alarm Message
------------------------------------------------------------------------------
c0 - INFO (0x0F:0x0007): Initialization complete: Unit #0
c0 - INFO (0x0F:0x000C): Initialization started: Unit #0

En caso de error, la salida tiene el siguiente aspecto:

//XXXX> show alarms

Ctl Date Severity Alarm Message
------------------------------------------------------------------------------
c0 - INFO (0x0F:0x000B): Rebuild started: Unit #0
c0 - ERROR (0x0F:0x0002): Unit degraded: Unit #0

Remover el disco duro defectuoso del RAID en el segundo puerto:

//XXXX> maint remove c0 p1
Removing port /c0/p1 ... Done.

Una vez sustituido el disco duro defectuoso, debe reconocerse el nuevo disco duro:

//XXXX> maint rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p1].

El disco se conecta al segundo puerto y se reconstruye:

//XXXX> maint rebuild c0 u0 p1
Sending rebuild start request to /c0/u0 on 1 disk(s) [1] ... Done.

Ver el estado de la reconstrucción:

//XXXX> info c0

Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 REBUILDING 0 - - 232.885 ON -

Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 232.88 GB 488397168 4ND0XYFE
p1 DEGRADED u0 232.88 GB 488397168 4ND0YH77

3dm2

Para fines de monitorización, 3ware ofrece el software 3dm2, que puede descargarse aquí.

Consulte la documentación de 3ware para obtener más información sobre la instalación, configuración y aplicación.

Areca RAID

Compruebe si se utiliza un controlador Areca:

# lspci|grep RAID
02:0e.0 RAID bus controller: Areca Technology Corp. ARC-1110 4-Port PCI-X to SATA RAID Controller

# dmesg|grep -i areca
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
scsi0 : Areca SATA Host Adapter RAID Controller
scsi 0:0:0:0: Direct-Access Areca ARC-1110-VOL#00 R001 PQ: 0 ANSI: 5
scsi 0:0:16:0: Processor Areca RAID controller R001 PQ: 0 ANSI: 0

Puede descargar el manual completo de CLI de Areca aquí.

En el siguiente ejemplo se enumeran algunos comandos. Puede acceder al controlador en el sistema de rescate:

arcmsr_cli64
Copyright (c) 2004 Areca, Inc. All Rights Reserved.
Areca CLI, Version: 1.71.240( Linux )


Controllers List
----------------------------------------
Controller#01(PCI): ARC-1110
Current Controller: Controller#01
----------------------------------------

CMD Description
==========================================================
main Show Command Categories.
set General Settings.
rsf RaidSet Functions.
vsf VolumeSet Functions.
disk Physical Drive Functions.
sys System Functions.
net Ethernet Functions.
event Event Functions.
hw Hardware Monitor Information.
exit Exit CLI.
==========================================================
Command Format: <CMD> [Sub-Command] [Parameters].
Note: Use <CMD> -h or -help to get details.
CLI>

Con el comando <cmd> info se puede consultar la información del sistema, por ejemplo, la información del monitor de hardware (temperatura):

CLI> hw info
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : 2673
HDD #1 Temp. : 48
HDD #2 Temp. : 47
HDD #3 Temp. : 51
HDD #4 Temp. : 0
===========================================
GuiErrMsg<0x00>: Success.

CLI>

Información sobre los discos duros se obtiene así:

CLI> disk info
# ModelName Serial# FirmRev Capacity State
===============================================================================
1 ST3750640AS 5QD5G7Z1 3.AAK 750.2GB RaidSet Member(1)
2 ST3750640AS 5QD5G6JR 3.AAK 750.2GB RaidSet Member(1)
3 ST3750640AS 5QD5G7XQ 3.AAK 750.2GB RaidSet Member(1)
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>

Información sobre el propio controlador:

CLI> sys info
The System Information
===========================================
Main Processor : 500MHz
CPU ICache Size : 32KB
CPU DCache Size : 32KB
System Memory : 256MB/333MHz
Firmware Version : V1.43 2007-4-17
BOOT ROM Version : V1.43 2007-4-17
Serial Number : Y813CAAAAR101890
Controller Name : ARC-1110
===========================================
GuiErrMsg<0x00>: Success.

CLI>

Mostrar eventos actuales:

CLI> event info
Date-Time Device Event Type
===============================================================================
2009-07-09 07:23:14 H/W MONITOR Raid Powered On
2008-09-29 08:06:24 H/W MONITOR Raid Powered On
2008-09-29 07:51:37 H/W MONITOR Raid Powered On
...

Obtener información sobre el conjunto de RAID actual (en este ejemplo, 3*750 GB están instalados):

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 123 Normal
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>

Información sobre los volúmenes lógicos RAID:

CLI> vsf info
# Name Raid# Level Capacity Ch/Id/Lun State
===============================================================================
1 ARC-1110-VOL#00 1 Raid5 1500.3GB 00/00/00 Normal
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>

Si desea realizar cambios en un raid de hardware con Areca Controler, se requiere una contraseña. La contraseña por defecto es "0000". Aquí puede ver un comando de ejemplo:

<CLI> set password=0000. 

RAID y reconstrucción defectuosos

Un RAID defectuoso puede tener el siguiente aspecto:

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 1x3 Degrade
2 Raid Set # 00 3 2250.5GB 2250.5GB x2x Incompleted
===============================================================================
GuiErrMsg<0x00>: Success.

El raid con el estado Incompleted debe borrarse:

CLI> rsf delete raid=2
GuiErrMsg<0x00>: Success.
CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 1x3 Degrade
===============================================================================
GuiErrMsg<0x00>: Success.

A continuación, vuelva a integrar el disco como Hot Spare:

CLI> rsf createhs drv=2
GuiErrMsg<0x00>: Success.

El controlador Areca detecta automáticamente un nuevo disco. Por lo tanto, no es necesario iniciar una reconstrucción.

La reconstrucción se inicia automáticamente y puede ser monitoreada:

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 123 Rebuilding
===============================================================================

GuiErrMsg<0x00>: Success.