[Monitoring] Monitoring a RAID controller and hard disks

 


 

 

This is for my old Dell PowerEdge 1950, but it also applies to any server fitted with a PERC/LSI card whose controller and disks use SMART (Self-Monitoring, Analysis and Reporting Technology) to report health information.

 

There is megacli, a very complete tool, and its companion megaclisas-status, which gives a quick overview of the health of a disk or RAID array.

Here is the list of the available tools: http://hwraid.le-vert.net/wiki/DebianPackages#Packageslist

To install them, add the hwraid repository. I am on Debian 9.5 Stretch.

# echo "deb http://hwraid.le-vert.net/debian stretch main" >> /etc/apt/sources.list
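
The echo above appends the line again every time it is re-run. A minimal sketch of an idempotent variant, assuming a hypothetical helper named add_repo_line and a dedicated file hwraid.list (both names are my own choice, not something the repository requires):

```sh
#!/bin/sh
# add_repo_line FILE LINE (hypothetical helper):
# append LINE to FILE only if FILE does not already contain that exact line.
add_repo_line() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example, to run as root (the file name is an assumption):
# add_repo_line /etc/apt/sources.list.d/hwraid.list \
#     "deb http://hwraid.le-vert.net/debian stretch main"
```

Keeping the entry in its own file under /etc/apt/sources.list.d/ also makes it easy to remove later.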

Then add the repository's signing key

# wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -

Then refresh the package index and install the packages

# apt update
# apt install megacli megaclisas-status

The packages can also be downloaded directly

http://hwraid.le-vert.net/debian/pool-stretch/

 

To quickly check the state of a controller and its disks, run

# megaclisas-status
-- Controller information --
-- ID | H/W Model           | RAM    | Temp | BBU    | Firmware     
c0    | PERC 5/i Integrated | 256MB  | N/A  | Good   | FW: 5.1.1-0040 

-- Array information --
-- ID | Type   |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress   
c0u0  | RAID-1 |     68G |   64 KB | RA,WT |  Default |  Optimal | /dev/sda | None      |None         
c0u1  | RAID-1 |    136G |   64 KB | RA,WT |  Default |  Optimal | /dev/sdb | None      |None         

-- Disk information --
-- ID  | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u0p0 | HDD  | SEAGATE ST973402SS S2073NP1YEQ6  | 67. Gb   | Online, Spun Up | Unknown  | 24C  | [8:2]    | 2       
c0u0p1 | HDD  | SEAGATE ST973402SS S2073NP1YVB1  | 67. Gb   | Online, Spun Up | Unknown  | 24C  | [8:3]    | 3       
c0u1p0 | HDD  | SEAGATE ST9146852SS HT043TB10J0G | 136.1 Gb | Online, Spun Up | Unknown  | 24C  | [8:0]    | 0       
c0u1p1 | HDD  | SEAGATE ST9146852SS HT043TB10C2P | 136.1 Gb | Online, Spun Up | Unknown  | 25C  | [8:1]    | 1
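
For monitoring, this table can be filtered so that only the problems remain. A minimal sketch, assuming the column layout shown above (Status is the 7th column for arrays and the 5th for disks) and a hypothetical function name raid_problems:

```sh
#!/bin/sh
# raid_problems (hypothetical helper): read megaclisas-status output on stdin.
# Prints every array whose Status is not "Optimal" and every disk whose
# Status does not start with "Online". Returns 0 if nothing was printed, else 1.
raid_problems() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Disk rows start with an ID such as c0u0p0; Status is column 5.
        /^c[0-9]+u[0-9]+p[0-9]+ / {
            if (trim($5) !~ /^Online/) { print "DISK " trim($1) ": " trim($5); bad = 1 }
            next
        }
        # Array rows start with an ID such as c0u0; Status is column 7.
        /^c[0-9]+u[0-9]+ / {
            if (trim($7) != "Optimal") { print "ARRAY " trim($1) ": " trim($7); bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Example:
# megaclisas-status | raid_problems || echo "RAID needs attention"
```

If the column order differs in another version of megaclisas-status, the column numbers need adjusting.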

 


Check the number of physical disks

# megacli -PDGetNum -a0
Number of Physical Drives on Adapter 0: 4

Check the number of RAID arrays (logical drives)

# megacli -LDGetNum -a0
Number of Virtual Drives Configured on Adapter 0: 2
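
These two counts are handy for a quick sanity check: if a disk disappears, the number drops. A small sketch, assuming the output format shown above; adapter_count and expect_count are hypothetical helper names:

```sh
#!/bin/sh
# adapter_count: read megacli -PDGetNum / -LDGetNum output on stdin and
# print the number found after the colon.
adapter_count() {
    awk -F': *' '/^Number of .* on Adapter/ { print $NF + 0; exit }'
}

# expect_count LABEL EXPECTED ACTUAL: compare a count with the expected value.
expect_count() {
    if [ "$3" -ne "$2" ]; then
        echo "WARNING: $1: expected $2, found $3"
        return 1
    fi
    echo "OK: $1: $3"
}

# Example (4 disks and 2 arrays are the values of my server):
# expect_count "physical drives" 4 "$(megacli -PDGetNum -a0 | adapter_count)"
# expect_count "virtual drives"  2 "$(megacli -LDGetNum -a0 | adapter_count)"
```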

Details of both arrays (use -L0 or -L1 for a single logical drive)

# megacli -LDInfo -Lall -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 67.75 GB
Sector Size : 512
Mirror Data : 67.75 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 136.125 GB
Sector Size : 512
Mirror Data : 136.125 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
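
This output is long, so here is a sketch that condenses it to one line per virtual drive: its state, and whether the current cache policy matches the default one (in the output above they differ, WriteThrough versus WriteBack). The function name ld_summary is hypothetical and the parsing assumes the field labels shown above:

```sh
#!/bin/sh
# ld_summary (hypothetical helper): read megacli -LDInfo output on stdin and
# print one summary line per virtual drive.
ld_summary() {
    awk '
        function emit() {
            if (vd != "")
                print "VD " vd ": state=" state ", cache=" (def == cur ? "default" : "differs from default")
        }
        # Strip trailing whitespace so comparisons are exact.
        { sub(/[ \t\r]+$/, "") }
        /^Virtual Drive:/        { emit(); vd = $3; state = def = cur = "" }
        /^State/                 { sub(/^State *: */, ""); state = $0 }
        /^Default Cache Policy:/ { sub(/^[^:]*: */, ""); def = $0 }
        /^Current Cache Policy:/ { sub(/^[^:]*: */, ""); cur = $0 }
        END { emit() }
    '
}

# Example:
# megacli -LDInfo -Lall -a0 | ld_summary
```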

List the disks present with their details (only the details of disk 4 are shown here)

# megacli -pdlist -a0
Enclosure Device ID: 8
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 3
WWN: 
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 68.366 GB [0x88bb93a Sectors]
Non Coerced Size: 67.866 GB [0x87bb93a Sectors]
Coerced Size: 67.75 GB [0x8780000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: S207
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50006f5a051
SAS Address(1): 0x0
Connected Port Number: 3 
Inquiry Data: SEAGATE ST973402SS S2073NP1YVB1 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device
Drive Temperature :23C (73.40 F)
PI Eligibility: No 
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: Unknown 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No
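
Since -pdlist prints a block like this for every disk, a short filter helps to pull out the Enclosure:Slot pair and the error counters. A minimal sketch, assuming each disk's block starts with the "Enclosure Device ID" line as above; pd_summary is a hypothetical name:

```sh
#!/bin/sh
# pd_summary (hypothetical helper): read megacli -pdlist output on stdin and
# print one line per disk: [Enclosure:Slot] followed by its error counters.
pd_summary() {
    awk -F': *' '
        function emit() {
            if (slot != "")
                printf "[%s:%s] media=%d other=%d predictive=%d\n", encl, slot, media, other, pred
        }
        # Strip trailing whitespace before the fields are used.
        { sub(/[ \t\r]+$/, "") }
        # A new "Enclosure Device ID" line starts the next disk.
        /^Enclosure Device ID:/      { emit(); encl = $2; slot = ""; media = other = pred = 0 }
        /^Slot Number:/              { slot = $2 }
        /^Media Error Count:/        { media = $2 }
        /^Other Error Count:/        { other = $2 }
        /^Predictive Failure Count:/ { pred = $2 }
        END { emit() }
    '
}

# Example:
# megacli -pdlist -a0 | pd_summary
```

The value between brackets is the [Enclosure:Slot] pair that the -PhysDrv argument expects.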

/!\ The commands below need the Enclosure and Slot IDs; to find them, refer to the output of -pdlist.
In my example, for disk 4:
Enclosure Device ID: 8
Slot Number: 3

Bring a disk online

# megacli -PDOnline -PhysDrv [Enclosure:Slot] -a0
EnclId-8 SlotId-3 state changed to OnLine.

Take a disk offline

# megacli -PDOffline -PhysDrv [Enclosure:Slot] -a0
Adapter: 0: EnclId-8 SlotId-3 state changed to OffLine.

Show the rebuild progress of a disk that is a member of an array

# megacli -pdrbld -showprog -physdrv[Enclosure:Slot] -a0
Rebuild Progress on Device at Enclosure 8, Slot 3 Completed 47% in 9 Minutes.
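
To follow a rebuild without retyping the command, the percentage can be extracted from that line. A sketch, assuming the "Completed N%" wording shown above and assuming that the line no longer contains it once the rebuild is over (I have not checked the exact message); rebuild_percent is a hypothetical name:

```sh
#!/bin/sh
# rebuild_percent (hypothetical helper): read the -pdrbld -showprog output on
# stdin and print the completion percentage, or nothing if none is reported.
rebuild_percent() {
    sed -n 's/.*Completed \([0-9][0-9]*\)%.*/\1/p'
}

# Example polling loop for the disk at [8:3]. The argument is quoted so the
# shell does not treat the brackets as a glob pattern.
# while :; do
#     pct=$(megacli -pdrbld -showprog "-physdrv[8:3]" -a0 | rebuild_percent)
#     [ -n "$pct" ] || break
#     echo "rebuild: ${pct}%"
#     sleep 60
# done
```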

Check the health of a disk with smartctl (the number after megaraid, is the disk's Device Id on the controller, here 1)

# smartctl -d megaraid,1 -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-8-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST9146852SS
Revision: HT04
User Capacity: 146 815 733 760 bytes [146 GB]
Logical block size: 512 bytes
Rotation Rate: 15015 rpm
Logical Unit id: 0x5000c5001cb9ba2f
Serial number: 3TB10C2P
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Sep 8 23:02:50 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature: 25 C
Drive Trip Temperature: 68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
Blocks sent to initiator = 463450516
Blocks received from initiator = 3200521919
Blocks read from cache and sent to initiator = 137178771
Number of read and write commands whose size <= segment size = 77707674
Number of read and write commands whose size > segment size = 189140

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 3533,52
number of minutes until next internal SMART test = 13

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   45652222        0         0  45652222   45652222       1460,114           0
write:         0        0         0         0          0       1658,786           0
verify: 19271244        0         0  19271244   19271244       3127,975           0

Non-medium error count: 1

No self-tests have been logged
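
Most of this report is only useful when something goes wrong. A minimal sketch that keeps three values (health status, grown defect list, temperature), assuming the SAS/SCSI report format shown above; smart_summary is a hypothetical name, and a non-zero grown defect list is treated as a warning by my own choice:

```sh
#!/bin/sh
# smart_summary (hypothetical helper): read "smartctl -a" output for a SAS
# disk on stdin and print a one-line summary.
# Returns 0 if the health status is OK and the grown defect list is empty.
smart_summary() {
    awk -F': *' '
        # Strip trailing whitespace before the fields are used.
        { sub(/[ \t\r]+$/, "") }
        /^SMART Health Status:/           { health = $2 }
        /^Elements in grown defect list:/ { defects = $2 + 0 }
        /^Current Drive Temperature:/     { temp = $2 + 0 }
        END {
            printf "health=%s defects=%d temp=%dC\n", health, defects, temp
            if (health == "OK" && defects == 0) exit 0
            exit 1
        }
    '
}

# Example: loop over the four Device Ids of my server (0 to 3).
# for id in 0 1 2 3; do
#     smartctl -d megaraid,$id -a /dev/sda | smart_summary
# done
```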

RAID controller details

# megacli -adpallinfo -aALL
Adapter #0
===================================

Versions
================
Product Name : PERC 5/i Integrated
Serial No : 12345
FW Package Build: 5.1.1-0040

Image Versions in Flash:
================
Boot Block Version : R.2.3.12
BIOS Version : MT28
MPT Version : MPTFW-00.10.47.00-IT
FW Version : 1.03.10-0216
WebBIOS Version : 1.03-04
Ctrl-R Version : 1.04-017A

Settings
================
Current Time : 16:3:17 9/9, 2018
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 2
Delay Among Spinup Groups : 12s
Physical Drive Coercion Mode : 128MB
Cluster Mode : Disabled
Alarm : Disabled
Auto Rebuild : Enabled
Battery Warning : Enabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Disabled
Maintain PD Fail History : Disabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : No
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 0
Auto Enhanced Import : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds
BIOS Error Handling : Stop On Errors
Current Boot Mode :Normal

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, SRL 3 supported
Supported Drives : SAS, SATA

Device Present
================
Virtual Drives : 2
Degraded : 0
Offline : 0
Physical Devices : 5
Disks : 4
Critical Disks : 0
Failed Disks : 0
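
The "Device Present" section at the end is the part worth watching. A sketch that reports any of its problem counters that is above zero, assuming the labels shown above; adapter_health is a hypothetical name:

```sh
#!/bin/sh
# adapter_health (hypothetical helper): read megacli -adpallinfo output on
# stdin and print the Degraded / Offline / Critical Disks / Failed Disks
# counters that are greater than zero. Returns 0 if all of them are zero.
adapter_health() {
    awk -F': *' '
        # Strip trailing whitespace before the fields are used.
        { sub(/[ \t\r]+$/, "") }
        /^(Degraded|Offline|Critical Disks|Failed Disks) *:/ {
            name = $1
            sub(/ +$/, "", name)
            if ($2 + 0 > 0) { print name ": " $2; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Example:
# megacli -adpallinfo -aALL | adapter_health || echo "controller reports a problem"
```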

Check the controller logs

# megacli -AdpAlILog -aALL

 

This article is a rough draft based on the excellent article from admin-linux.

I tested the various commands on my own server, which helped me understand these tools better.


I took a screenshot of that page, here, in case the site ever disappears!

 

 


2 thoughts on “[Monitoring] Monitoring a RAID controller and hard disks”

  1. At work, the check_cciss plugin gives very good results on our HP servers.

    I run it through check_by_ssh for the servers that are a bit more locked down.

    https://github.com/lokeshreddy4u/nagios/blob/master/libexec/check_cciss.sh

    1. Hello,

      Thanks for the tip, even though at work I only have Dell :/
      I have an old HP ProLiant to get back into shape, so that will be a chance to try the script.
