[Monitoring] Monitoring a RAID controller and hard drives


I wrote this for my old Dell PowerEdge 1950, but it applies to any server equipped with a PERC/LSI card whose controller and disks expose their health data through SMART (Self-Monitoring, Analysis and Reporting Technology).

Two tools are involved: megacli, which is very complete, and its companion megaclisas-status, which gives a quick overview of the health of a disk or RAID array.

The full list of available tools is here: http://hwraid.le-vert.net/wiki/DebianPackages#Packageslist

To install them, add the hwraid repository. I am on Debian 9.5 Stretch:

# echo "deb http://hwraid.le-vert.net/debian stretch main" >> /etc/apt/sources.list

Then add the repository's signing key

# wget -O - https://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | apt-key add -

Refresh the package index, then install the packages

# apt update
# apt install megacli megaclisas-status

The packages are also available for direct download

http://hwraid.le-vert.net/debian/pool-stretch/

To quickly check the state of a controller and its disks, run

# megaclisas-status
-- Controller information --
-- ID | H/W Model           | RAM    | Temp | BBU    | Firmware     
c0    | PERC 5/i Integrated | 256MB  | N/A  | Good   | FW: 5.1.1-0040 

-- Array information --
-- ID | Type   |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress   
c0u0  | RAID-1 |     68G |   64 KB | RA,WT |  Default |  Optimal | /dev/sda | None      |None         
c0u1  | RAID-1 |    136G |   64 KB | RA,WT |  Default |  Optimal | /dev/sdb | None      |None         

-- Disk information --
-- ID  | Type | Drive Model                      | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u0p0 | HDD  | SEAGATE ST973402SS S2073NP1YEQ6  | 67. Gb   | Online, Spun Up | Unknown  | 24C  | [8:2]    | 2       
c0u0p1 | HDD  | SEAGATE ST973402SS S2073NP1YVB1  | 67. Gb   | Online, Spun Up | Unknown  | 24C  | [8:3]    | 3       
c0u1p0 | HDD  | SEAGATE ST9146852SS HT043TB10J0G | 136.1 Gb | Online, Spun Up | Unknown  | 24C  | [8:0]    | 0       
c0u1p1 | HDD  | SEAGATE ST9146852SS HT043TB10C2P | 136.1 Gb | Online, Spun Up | Unknown  | 25C  | [8:1]    | 1
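
For unattended checks, the table above can be filtered down to the rows that need attention. The following is a minimal sketch, not part of megaclisas-status itself: the function name `unhealthy_rows` is hypothetical, and the column positions (Status in column 7 for arrays, column 5 for disks) are assumed from the sample output above and may differ with other versions of the tool.

```shell
# Print every array or disk row whose Status column does not look healthy.
# Reads megaclisas-status output on stdin. Column positions are an
# assumption taken from the sample output shown in this article.
unhealthy_rows() {
    awk -F'|' '
        function trim(s) { gsub(/^ +| +$/, "", s); return s }
        # Array rows start with an ID such as "c0u0"; Status is column 7.
        $1 ~ /^c[0-9]+u[0-9]+ *$/ && $7 !~ /Optimal/ {
            print "array " trim($1) ": " trim($7)
        }
        # Disk rows start with an ID such as "c0u0p0"; Status is column 5.
        $1 ~ /^c[0-9]+u[0-9]+p[0-9]+ *$/ && $5 !~ /^ *Online/ {
            print "disk " trim($1) ": " trim($5)
        }
    '
}

# Usage (as root): megaclisas-status | unhealthy_rows
```

An empty result means every array is Optimal and every disk is Online, which is what the sample output above would give.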

Check the number of physical disks

# megacli -PDGetNum -a0
Number of Physical Drives on Adapter 0: 4

Check the number of logical drives (RAID arrays)

# megacli -LDGetNum -a0
Number of Virtual Drives Configured on Adapter 0: 2
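
Both counts are easy to compare against what the server is supposed to contain. Here is a small sketch under the assumption that the output line keeps the "Number of ... on Adapter N: X" shape shown above; `count_from_line` and `check_drive_count` are hypothetical helper names, not megacli features.

```shell
# Extract the trailing number from a "Number of ... on Adapter N: X" line.
# Reads megacli output on stdin; other lines are ignored.
count_from_line() {
    sed -n 's/^Number of .* on Adapter [0-9][0-9]*: *\([0-9][0-9]*\) *$/\1/p'
}

# Compare the count read on stdin with the expected value given as $1.
check_drive_count() {
    local expected=$1 actual
    actual=$(count_from_line)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual"
    else
        echo "WARNING: expected $expected, found ${actual:-nothing}"
    fi
}

# Usage (as root), with the values from my server:
#   megacli -PDGetNum -a0 | check_drive_count 4
#   megacli -LDGetNum -a0 | check_drive_count 2
```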

Details of both RAID arrays (use -L0 or -L1 instead of -Lall for a single logical drive)

# megacli -LDInfo -Lall -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 67.75 GB
Sector Size : 512
Mirror Data : 67.75 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 136.125 GB
Sector Size : 512
Mirror Data : 136.125 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 2
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
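
The -LDInfo output is long, so a one-line summary per virtual drive can help. The sketch below is only an illustration: `ld_summary` is a hypothetical name, and the field labels are assumed to match the output above. It prints each drive's state and adds a note when the active write policy differs from the configured one, a difference the sample output above happens to show and that is probably worth a look.

```shell
# Summarise "megacli -LDInfo" output read on stdin: one line per virtual
# drive with its state, plus a note when the first cache policy keyword
# (the write policy) differs between the Default and Current lines.
ld_summary() {
    awk '
        /^Virtual Drive:/        { vd = $3 }
        /^State/                 { sub(/^State *: */, ""); state = $0 }
        /^Default Cache Policy:/ { sub(/^[^:]*: */, ""); split($0, d, ", "); def = d[1] }
        /^Current Cache Policy:/ {
            sub(/^[^:]*: */, ""); split($0, c, ", ")
            note = (c[1] == def) ? "" : " (cache: " def " configured, " c[1] " active)"
            print "VD " vd ": " state note
        }
    '
}

# Usage (as root): megacli -LDInfo -Lall -a0 | ld_summary
```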

List the physical disks with their details (only the entry for the fourth disk is shown here)

# megacli -pdlist -a0
Enclosure Device ID: 8
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 3
WWN: 
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 68.366 GB [0x88bb93a Sectors]
Non Coerced Size: 67.866 GB [0x87bb93a Sectors]
Coerced Size: 67.75 GB [0x8780000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: S207
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50006f5a051
SAS Address(1): 0x0
Connected Port Number: 3 
Inquiry Data: SEAGATE ST973402SS S2073NP1YVB1 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: Unknown 
Link Speed: Unknown 
Media Type: Hard Disk Device
Drive Temperature :23C (73.40 F)
PI Eligibility: No 
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: Unknown 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No
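
Most of these fields rarely matter day to day; the error counters, the firmware state and the S.M.A.R.T flag are the ones to watch. As a rough sketch, the hypothetical `pd_summary` function below condenses each disk record into one line. It assumes the field labels shown above and that the S.M.A.R.T line ends each record, as it does in this output.

```shell
# Condense "megacli -pdlist" output read on stdin into one line per disk.
# Assumes the "Drive has flagged a S.M.A.R.T alert" line closes each record.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID:/      { encl = $2 }
        /^Slot Number:/              { slot = $2 }
        /^Media Error Count:/        { media = $2 }
        /^Predictive Failure Count:/ { pred = $2 }
        /^Firmware state:/           { state = $2 }
        /^Drive has flagged a S.M.A.R.T alert/ {
            printf "[%s:%s] %s media_errors=%s predictive_failures=%s smart_alert=%s\n",
                   encl, slot, state, media, pred, $2
        }
    '
}

# Usage (as root): megacli -pdlist -a0 | pd_summary
```

The `[Enclosure:Slot]` prefix is printed in the form that megacli expects for its -PhysDrv argument.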

/!\ The following commands need the Enclosure and Slot IDs, which you can find in the -pdlist output.
In my example, for disk 4:
Enclosure Device ID: 8
Slot Number: 3

Bring a disk online

# megacli -PDOnline -PhysDrv [Enclosure:Slot] -a0
EnclId-8 SlotId-3 state changed to OnLine.

Take a disk offline

# megacli -PDOffline -PhysDrv [Enclosure:Slot] -a0
Adapter: 0: EnclId-8 SlotId-3 state changed to OffLine.

Show the rebuild progress of a disk that is a member of a RAID array

# megacli -pdrbld -showprog -physdrv [Enclosure:Slot] -a0
Rebuild Progress on Device at Enclosure 8, Slot 3 Completed 47% in 9 Minutes.
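
One detail to keep in mind when scripting these commands: an unquoted `[8:3]` is also a shell glob pattern, so it would be replaced by a file named `8`, `:` or `3` if one exists in the current directory. The sketch below uses two hypothetical helpers, `physdrv_arg` to build the argument safely and `rebuild_percent` to pull the percentage out of the progress line, assuming that line keeps the wording shown above.

```shell
# Print the "[Enclosure:Slot]" argument; both IDs must be plain numbers.
physdrv_arg() {
    local id
    for id in "$1" "$2"; do
        case "$id" in
            ''|*[!0-9]*)
                echo "usage: physdrv_arg ENCLOSURE SLOT" >&2
                return 1 ;;
        esac
    done
    printf '[%s:%s]\n' "$1" "$2"
}

# Extract the percentage from the rebuild progress line read on stdin.
rebuild_percent() {
    sed -n 's/^Rebuild Progress on Device at Enclosure [0-9]*, Slot [0-9]* Completed \([0-9]*\)% .*/\1/p'
}

# Usage (as root); the double quotes keep the shell from globbing "[8:3]":
#   megacli -pdrbld -showprog -physdrv "$(physdrv_arg 8 3)" -a0 | rebuild_percent
```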

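Check the health of a single disk with smartctl. In -d megaraid,N, N is the disk's LSI Device ID as reported by megaclisas-status or -pdlist (here disk 1)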

# smartctl -d megaraid,1 -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-8-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST9146852SS
Revision: HT04
User Capacity: 146 815 733 760 bytes [146 GB]
Logical block size: 512 bytes
Rotation Rate: 15015 rpm
Logical Unit id: 0x5000c5001cb9ba2f
Serial number: 3TB10C2P
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Sep 8 23:02:50 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature: 25 C
Drive Trip Temperature: 68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
Blocks sent to initiator = 463450516
Blocks received from initiator = 3200521919
Blocks read from cache and sent to initiator = 137178771
Number of read and write commands whose size <= segment size = 77707674
Number of read and write commands whose size > segment size = 189140

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 3533,52
number of minutes until next internal SMART test = 13

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   45652222        0         0  45652222   45652222       1460,114           0
write:         0        0         0         0          0       1658,786           0
verify: 19271244        0         0  19271244   19271244       3127,975           0

Non-medium error count: 1

No self-tests have been logged
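
To check every disk behind the controller, the same command can be repeated for each Device ID and reduced to the few values that matter. This is a sketch only: `smart_health` is a hypothetical helper, and the labels it looks for are those of the SAS output above (SATA disks report their health with a differently worded line, so they would need another pattern).

```shell
# Reduce "smartctl -a" output for a SAS disk (read on stdin) to one line.
# The three labels are taken from the sample output in this article.
smart_health() {
    awk -F': *' '
        /^SMART Health Status:/           { health = $2 }
        /^Elements in grown defect list:/ { defects = $2 }
        /^Non-medium error count:/        { nonmedium = $2 }
        END {
            printf "health=%s grown_defects=%s non_medium_errors=%s\n",
                   health, defects, nonmedium
        }
    '
}

# Usage (as root), looping over the four Device IDs of my server:
#   for id in 0 1 2 3; do
#       smartctl -d megaraid,"$id" -a /dev/sda | smart_health
#   done
```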

RAID controller details

# megacli -adpallinfo -aALL
Adapter #0
===================================

Versions
================
Product Name : PERC 5/i Integrated
Serial No : 12345
FW Package Build: 5.1.1-0040

Image Versions in Flash:
================
Boot Block Version : R.2.3.12
BIOS Version : MT28
MPT Version : MPTFW-00.10.47.00-IT
FW Version : 1.03.10-0216
WebBIOS Version : 1.03-04
Ctrl-R Version : 1.04-017A

Settings
================
Current Time : 16:3:17 9/9, 2018
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 2
Delay Among Spinup Groups : 12s
Physical Drive Coercion Mode : 128MB
Cluster Mode : Disabled
Alarm : Disabled
Auto Rebuild : Enabled
Battery Warning : Enabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Disabled
Maintain PD Fail History : Disabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : No
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 0
Auto Enhanced Import : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds
BIOS Error Handling : Stop On Errors
Current Boot Mode :Normal

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, SRL 3 supported
Supported Drives : SAS, SATA

Device Present
================
Virtual Drives : 2
Degraded : 0
Offline : 0
Physical Devices : 5
Disks : 4
Critical Disks : 0
Failed Disks : 0
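
The "Device Present" block at the end is the part worth watching in a script. As a hedged example, the hypothetical `adapter_problems` function below prints only the counters that are above zero; it assumes the four labels appear exactly as in the output above.

```shell
# Print the "Device Present" counters that are greater than zero.
# Reads "megacli -adpallinfo" output on stdin; prints nothing when healthy.
adapter_problems() {
    awk -F' *: *' '
        /^ *(Degraded|Offline|Critical Disks|Failed Disks) *:/ {
            name = $1
            sub(/^ +/, "", name)            # drop any leading indentation
            if ($2 + 0 > 0) print name "=" ($2 + 0)
        }
    '
}

# Usage (as root): megacli -adpallinfo -aALL | adapter_problems
```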

Check the controller logs

# megacli -AdpAlILog -aALL

This article is a rough draft based on the excellent article from admin-linux.

I tested the various commands on my own server, which helped me understand these tools better.

I also took a screenshot of that page, available here, in case the site ever disappears!


2 thoughts on “[Monitoring] Monitoring a RAID controller and hard drives”

  1. At work, the check_cciss plugin gives very good results on our HP servers.

    I run it through check_by_ssh on the servers that are locked down a bit more tightly.

    https://github.com/lokeshreddy4u/nagios/blob/master/libexec/check_cciss.sh

    1. Hello,

      Thanks for the pointer, even though I only have Dell hardware at work :/
      I have an old HP ProLiant to bring back to life, so that will be a good opportunity to try the script.
