Original article. Published: 6 December 2021. Updated: –
Context:
The architecture in place is based on a Telegraf agent that pushes to InfluxDB, while Prometheus pulls from that same Telegraf agent. Grafana is used for the metrics dashboards.
After the code was integrated into the Telegraf client, the tests were conclusive, but in the end there is no need to keep the metrics in InfluxDB: they are only used for monitoring.
So Telegraf is dropped. Prometheus calls a blackbox-style exporter directly, which gives the same result: collecting the metrics for RAID state, physical disks, battery …
The code added to the existing playbook
The project
https://github.com/prometheus/snmp_exporter
Default variable

$ vim main.yml

    snmp_exporter_version: 0.20.0
Tasks

$ vim main.yml

    - include_tasks: install.yml
      tags:
        - snmp_exporter

    - include_tasks: configure.yml
      tags:
        - snmp_exporter

$ vim configure.yml

    - name: Create systemd service unit
      template:
        src: snmp_exporter.service.j2
        dest: /etc/systemd/system/snmp_exporter.service
        owner: root
        group: root
        mode: "0640"
      notify:
        - restart snmp_exporter

    - name: snmp_exporter on /etc
      file:
        path: /etc/snmp_exporter
        state: directory
        owner: root
        group: snmp_exporter
        mode: "0750"

    - name: Copy file snmp.yml
      copy:
        src: snmp.yml
        dest: /etc/snmp_exporter/
        owner: snmp_exporter
        group: snmp_exporter
        mode: '0644'

    - name: Ensure snmp_exporter service is started and enabled
      systemd:
        daemon_reload: true
        name: snmp_exporter
        state: started
        enabled: true
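
The snmp_exporter.service.j2 template referenced by the first task is not shown in the article. As a rough stand-in, the sketch below writes an equivalent unit inline with the copy module. The unit contents are an assumption, not the author's template: they only reuse the paths, user and group that appear in the tasks, plus the exporter's --config.file flag. The exporter's default listen port (9116) is left unchanged.

```yaml
# Hypothetical stand-in for snmp_exporter.service.j2 (the real template is not shown)
- name: Create systemd service unit (inline sketch)
  copy:
    dest: /etc/systemd/system/snmp_exporter.service
    owner: root
    group: root
    mode: "0640"
    content: |
      [Unit]
      Description=Prometheus SNMP exporter
      After=network-online.target

      [Service]
      # User and group created by install.yml
      User=snmp_exporter
      Group=snmp_exporter
      # Binary and configuration paths used by the role
      ExecStart=/usr/local/bin/snmp_exporter --config.file=/etc/snmp_exporter/snmp.yml
      Restart=on-failure

      [Install]
      WantedBy=multi-user.target
  notify:
    - restart snmp_exporter
```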

$ vim install.yml

    - name: Create snmp_exporter system group
      group:
        name: snmp_exporter
        system: true
        state: present

    - name: Create snmp_exporter system user
      user:
        name: snmp_exporter
        system: true
        shell: "/usr/sbin/nologin"
        group: snmp_exporter
        createhome: false

    - name: Download snmp_exporter
      get_url:
        url: "https://github.com/prometheus/snmp_exporter/releases/download/v{{ snmp_exporter_version }}/snmp_exporter-{{ snmp_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp/
        checksum: "sha256:https://github.com/prometheus/snmp_exporter/releases/download/v{{ snmp_exporter_version }}/sha256sums.txt"

    - name: Unarchive snmp_exporter
      unarchive:
        src: "/tmp/snmp_exporter-{{ snmp_exporter_version }}.linux-amd64.tar.gz"
        dest: /tmp
        remote_src: yes

    - name: copy snmp_exporter binary
      copy:
        remote_src: yes
        src: "/tmp/snmp_exporter-{{ snmp_exporter_version }}.linux-amd64/snmp_exporter"
        dest: /usr/local/bin/snmp_exporter
        owner: snmp_exporter
        group: snmp_exporter
        mode: "0550"
Handlers

$ vim main.yml

    - name: restart snmp_exporter
      become: true
      systemd:
        daemon_reload: true
        name: snmp_exporter
        state: restarted
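
In the tasks above, only the systemd unit task notifies this handler, so a change to snmp.yml alone would not restart the exporter. If that is wanted, the "Copy file snmp.yml" task could notify the same handler. This is a possible variant, not something the original role does:

```yaml
# Variant of the existing copy task: restart the exporter when snmp.yml changes
- name: Copy file snmp.yml
  copy:
    src: snmp.yml
    dest: /etc/snmp_exporter/
    owner: snmp_exporter
    group: snmp_exporter
    mode: '0644'
  notify:
    - restart snmp_exporter
```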
iDRAC metrics
This is the bare minimum; remember to add probes for memory, CPU, power supply …

$ vim snmp.yml

    dell_idrac:
      walk:
      - 1.3.6.1.4.1.674.10892.5.5.1.20.140.1.1.20
      - 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.24
      - 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4
      - 1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4
      - 1.3.6.1.4.1.674.10892.5.2.1
      - 1.3.6.1.4.1.674.10892.5.2.3
      - 1.3.6.1.4.1.674.10892.5.2.4
      metrics:
      #OID for your raid components status
      - name: idrac_virtualDiskComponentStatus
        oid: 1.3.6.1.4.1.674.10892.5.5.1.20.140.1.1.20
        type: gauge
        help: The status of the virtual disk itself without the propagation of any contained component status - 1.3.6.1.4.1.674.10892.5.5.1.20.140.1.1.20
        indexes:
        - labelname: virtualDiskNumber
          type: gauge

      #OID for your physical disk status
      - name: idrac_physicalDiskComponentStatus
        oid: 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.24
        type: gauge
        help: The status of the physical disk itself without the propagation of any contained component status - 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.24
        indexes:
        - labelname: physicalDiskNumber
          type: gauge

      #OID for your physical disk status 2
      - name: idrac_physicalDiskState
        oid: 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4
        type: gauge
        help: The status of the physical disk itself without the propagation of any contained 2 component status - 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4
        indexes:
        - labelname: physicalDiskState
          type: gauge

      #OID for battery raid status
      - name: idrac_batteryState
        oid: 1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4
        type: gauge
        help: The status of the battery raid component status - 1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4
        indexes:
        - labelname: batteryState
          type: gauge

      #Global System Health
      - name: idrac_globalSystemStatus
        oid: 1.3.6.1.4.1.674.10892.5.2.1
        type: gauge
        help: This attribute defines the overall rollup status of all components in the system being monitored by the remote access card - 1.3.6.1.4.1.674.10892.5.2.1

      #Global Storage Health
      - name: idrac_globalStorageStatus
        oid: 1.3.6.1.4.1.674.10892.5.2.3
        type: gauge
        help: This attribute defines the overall storage status being monitored by the remote access card. - 1.3.6.1.4.1.674.10892.5.2.3

      #Global Power System Health
      - name: idrac_systemPowerState
        oid: 1.3.6.1.4.1.674.10892.5.2.4
        type: gauge
        help: This attribute defines the power state of the system. - 1.3.6.1.4.1.674.10892.5.2.4
The alerting template

$ vim alert.rules.j2

    ### Raid disk running status
    - alert: Raid_disk_running_status
      expr: idrac_virtualDiskComponentStatus != 3
      for: 1m
      labels:
        resourceName: "Raid_disk_running_status"
        severity: testing
        service: "system"
        instanceName: "{{ $labels.instance }}"

    ### Physical disk running status
    - alert: Physical_disk_running_status
      expr: idrac_physicalDiskComponentStatus != 3
      for: 1m
      labels:
        resourceName: "Physical_disk_running_status"
        severity: testing
        service: "system"
        instanceName: "{{ $labels.instance }}"

    ### Battery running status
    - alert: Battery_running_status
      expr: idrac_batteryState != 2
      for: 1m
      labels:
        resourceName: "Battery_running_status"
        severity: testing
        service: "system"
        instanceName: "{{ $labels.instance }}"
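
The module also collects idrac_globalSystemStatus and idrac_globalStorageStatus, which none of the rules above use. Below is a minimal sketch of two extra rules in the same style. It assumes these rollup statuses use the same enumeration as the component statuses (3 meaning OK), which should be checked against the Dell MIB. The alert names are hypothetical.

The file is a Jinja2 template, so the sketch wraps the Prometheus expression `{{ $labels.instance }}` in `{% raw %}` to stop Ansible from trying to evaluate it. Drop the wrapper if the surrounding file already takes care of this.

```yaml
### Global system health (hypothetical rule, assumes 3 = OK)
- alert: Global_system_status
  expr: idrac_globalSystemStatus != 3
  for: 1m
  labels:
    resourceName: "Global_system_status"
    severity: testing
    service: "system"
    instanceName: "{% raw %}{{ $labels.instance }}{% endraw %}"

### Global storage health (hypothetical rule, assumes 3 = OK)
- alert: Global_storage_status
  expr: idrac_globalStorageStatus != 3
  for: 1m
  labels:
    resourceName: "Global_storage_status"
    severity: testing
    service: "system"
    instanceName: "{% raw %}{{ $labels.instance }}{% endraw %}"
```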
The Prometheus job

$ vim prometheus.yml.j2

    - job_name: 'snmp'
      static_configs:
        - targets:
          - 10.0.0.11
          - 10.0.0.12
      metrics_path: /snmp
      params:
        module: [dell_idrac]
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: localhost:9116
A sample of the metrics in Prometheus, showing status code 5 for two failed disks on srv2

    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="16"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="17"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="18"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="2"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="3"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="4"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="5"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="6"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="7"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="8"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv1", instance="10.0.0.11", job="snmp_exporter", physicalDiskNumber="9"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="1"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="10"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="11"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="12"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="13"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="14"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="15"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="16"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="17"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="2"} 3
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="3"} 5
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="4"} 5
    idrac_physicalDiskComponentStatus{ansible_hostname="srv2", instance="10.0.0.12", job="snmp_exporter", physicalDiskNumber="5"} 3
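
The sample carries job="snmp_exporter" and an ansible_hostname label, neither of which the job shown earlier would produce as written. How the author's real configuration sets them is not shown. One way to obtain such labels, offered purely as an assumption, is to rename the job and attach a static label to each target, using the srv1/srv2 to IP mapping visible in the sample:

```yaml
# Sketch only: job name and per-target labels chosen to match the sample output
- job_name: 'snmp_exporter'
  static_configs:
    - targets:
      - 10.0.0.11
      labels:
        ansible_hostname: srv1
    - targets:
      - 10.0.0.12
      labels:
        ansible_hostname: srv2
  metrics_path: /snmp
  params:
    module: [dell_idrac]
  relabel_configs:
    # Pass the iDRAC address to the exporter as the ?target= parameter
    - source_labels: [__address__]
      target_label: __param_target
    # Keep the iDRAC address as the instance label
    - source_labels: [__param_target]
      target_label: instance
    # Scrape the exporter itself rather than the iDRAC
    - target_label: __address__
      replacement: localhost:9116
```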
Telegraf
For Telegraf, the snippet is similar; only the format differs.

    [[inputs.snmp]]
      agents = [ "udp://10.0.0.11:161" , "udp://10.0.0.12:161" ]
      version = 1
      community = "public"
      name = "idrac-hosts"

      [[inputs.snmp.field]]
        name = "system-name"
        oid = ".1.3.6.1.2.1.1.5.0"
        is_tag = true

      [[inputs.snmp.field]]
        name = "idrac-url"
        oid = ".1.3.6.1.4.1.674.10892.5.1.1.6.0"

      [[inputs.snmp.field]]
        name = "system-servicetag"
        oid = ".1.3.6.1.4.1.674.10892.5.1.3.2.0"

      [[inputs.snmp.table]]
        name = "idrac-hosts"
        inherit_tags = [ "system-name" , "disks-name" ]

        [[inputs.snmp.table.field]]
          name = "raid-batterystate"
          oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4"

        [[inputs.snmp.table.field]]
          name = "disks-state"
          oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4"

        [[inputs.snmp.table.field]]
          name = "disks-name"
          oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.2"
          is_tag = true
Source
https://www.dell.com/support/manuals/fr-fr/idrac7-8-lifecycle-controller-v2.40.40.40/snmp%20idracandcmc8.5/physical-disktable?guid=guid-bbcc3e20-a879-40d2-94c8-05bbdc7bd086&lang=en-us
https://github.com/prometheus/snmp_exporter
https://github.com/billykwooten/idrac_promethus_snmp_module