Docker-based Prometheus + Grafana + Alertmanager + prometheus-webhook-dingtalk DingTalk alerting

帅气的搬砖工 2024-08-20 16:33:02

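1. Overview of the components

prometheus: Prometheus (written in Go) is an open-source combination of monitoring, alerting, and a time-series database. Main advantages: few external dependencies, very simple to install and use, and many integrations with other systems.

grafana: Grafana is an open-source application written in Go, used mainly to visualize large volumes of metric data. It is one of the most widely used tools for displaying time-series data in infrastructure and application analysis, and it supports most common time-series databases. Main advantages: convenient dashboards, many data source types, built-in notifications.

alertmanager: Alertmanager is the alert-handling component of the Prometheus project. It receives the alerts that Prometheus fires, then deduplicates, groups, and routes them to receivers such as email or a webhook. Main advantage: many notification channels.

prometheus-webhook-dingtalk: prometheus-webhook-dingtalk is a webhook service that forwards Prometheus alert notifications to a DingTalk group. It lets the monitoring team receive alerts in DingTalk promptly and handle them together.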

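2. Preparation

docker installation: this is simple enough that I will skip it here; I may write a separate post on installing docker later.

Server: since this is only a test, I used a single CentOS 7 machine, IP 10.10.30.34 (this address is needed in the configuration files later).

Install directory: create it with mkdir -p /data/prometheus, then switch into it with cd /data/prometheus/. It holds the configuration and data files for all of the software, and every relative path below is relative to it.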

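3. Prometheus preparation

3.1 Prometheus configuration and data files

Data files: create the directory with mkdir -p prometheus/data and open up its permissions with chmod 777 prometheus/data, because the Prometheus process inside the container normally does not run as root and must be able to write there. In a real deployment, set tighter permissions that fit your situation.

Configuration file: vim prometheus/prometheus.yml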

global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules
  scrape_timeout: 10s       # timeout for each scrape

scrape_configs:             # list of scrape jobs
  - job_name: prometheus    # required and unique; attached to every scraped series as the job label
    static_configs:
      - targets: ['10.10.30.34:9090']   # Prometheus IP and port
        labels:
          instance: prometheus          # label
  - job_name: ehospital-exploit-database   # the monitored client (node-exporter)
    static_configs:
      - targets: ['10.10.30.34:9100']
        labels:
          instance: ehospital-exploit-database

alerting:                   # Alertmanager settings
  alertmanagers:
    - static_configs:
        - targets:
            - 10.10.30.34:9093   # the Alertmanager instance

rule_files:                 # rule files; wildcards are allowed
  - "/etc/prometheus/rules/*.yml"
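Once Prometheus is running with this file, it can be handy to confirm that both scrape jobs were picked up. The following is a minimal sketch, assuming the server is reachable at the address used above and using the standard GET /api/v1/targets endpoint of the Prometheus HTTP API. The helper names summarize_targets and fetch_targets are made up for this example.

```python
import json
import urllib.request
from typing import List, Tuple

PROMETHEUS_URL = "http://10.10.30.34:9090"  # address taken from prometheus.yml


def summarize_targets(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (job, scrape URL, health) for every active target in the payload."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", ""),
            t.get("scrapeUrl", ""),
            t.get("health", "unknown"),
        )
        for t in targets
    ]


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Call GET /api/v1/targets on a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

Calling summarize_targets(fetch_targets()) on the server should list one entry per target in scrape_configs; a health of "down" for the 9100 target simply means node-exporter is not running yet.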

3.2 Prometheus alerting rules

Create the directory: mkdir rules. It holds the alerting rule file and the recording rule file.

General (alerting) rules: vim rules/alert-rules.yml

groups:
  - name: prometheus-alert
    rules:
      - alert: prometheus-down
        expr: prometheus:up == 0
        for: 1m
        labels:
          severity: 'critical'
        annotations:
          summary: "instance: {{ $labels.instance }} is down"
          description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} has been down for 1 minute."
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-cpu-high
        expr: prometheus:cpu:total:percent > 80
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} cpu usage is high: {{ $value }}"
          description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} CPU usage has been above 80% for three minutes."
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-cpu-iowait-high
        expr: prometheus:cpu:iowait:percent >= 12
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} cpu iowait is high: {{ $value }}"
          description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} cpu iowait has been above 12% for three minutes."
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      # NOTE: prometheus:cpu:count is not defined in record-rules.yml; add a
      # recording rule for it, otherwise this alert never fires.
      - alert: prometheus-load-load1-high
        expr: (prometheus:load:load1) > (prometheus:cpu:count) * 1.2
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} load1 is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-memory-high
        expr: prometheus:memory:used:percent > 85
        for: 3m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} memory usage is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-disk-high
        expr: prometheus:disk:used:percent > 80
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk usage is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-disk-read-count-high
        expr: prometheus:disk:read:count:rate > 2000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} read IOPS is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-disk-write-count-high
        expr: prometheus:disk:write:count:rate > 2000
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} write IOPS is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-disk-read-mb-high
        expr: prometheus:disk:read:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk read throughput is high: {{ $value }}"
          description: ""
          instance: "{{ $labels.instance }}"
          value: "{{ $value }}"
      - alert: prometheus-disk-write-mb-high
        expr: prometheus:disk:write:mb:rate > 60
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} disk write throughput is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-filefd-allocated-percent-high
        expr: prometheus:filefd_allocated:percent > 80
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} open file descriptors are high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-network-netin-error-rate-high
        expr: prometheus:network:netin:error:rate > 4
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet error rate is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-network-netin-packet-rate-high
        expr: prometheus:network:netin:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} inbound packet rate is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-network-netout-packet-rate-high
        expr: prometheus:network:netout:packet:rate > 35000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} outbound packet rate is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      - alert: prometheus-network-tcp-total-count-high
        expr: prometheus:network:tcp:total:count > 40000
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} tcp connection count is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      # NOTE: prometheus:process:zoom:total:count is not defined in
      # record-rules.yml; add a recording rule for it, otherwise this alert never fires.
      - alert: prometheus-process-zoom-total-count-high
        expr: prometheus:process:zoom:total:count > 10
        for: 10m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} zombie process count is high: {{ $value }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
      # NOTE: prometheus:time:offset is not defined in record-rules.yml; add a
      # recording rule for it, otherwise this alert never fires.
      - alert: prometheus-time-offset-high
        expr: prometheus:time:offset > 0.03
        for: 2m
        labels:
          severity: info
        annotations:
          summary: "instance: {{ $labels.instance }} {{ $labels.desc }} {{ $value }} {{ $labels.unit }}"
          description: ""
          value: "{{ $value }}"
          instance: "{{ $labels.instance }}"
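Each rule above combines a threshold with a for: duration. As a rough illustration of how that duration behaves, here is a small sketch that models it: the alert is pending from the first evaluation where the expression matches and starts firing once the match has held for the whole duration, and a single non-matching evaluation resets it. This is a simplified model with a made-up function name, not Prometheus code.

```python
from typing import List, Optional


def first_firing_index(condition: List[bool], interval_s: int, for_s: int) -> Optional[int]:
    """Return the index of the evaluation at which the alert starts firing.

    condition[i] says whether the rule expression matched at evaluation i;
    evaluations are interval_s seconds apart (evaluation_interval).
    Returns None if the alert never leaves the pending state.
    """
    pending_since = None  # index of the evaluation where the alert became pending
    for i, matched in enumerate(condition):
        if not matched:
            pending_since = None  # condition cleared: the pending timer resets
            continue
        if pending_since is None:
            pending_since = i
        if (i - pending_since) * interval_s >= for_s:
            return i
    return None
```

With evaluation_interval: 15s and for: 3m, first_firing_index([True] * 13, 15, 180) returns 12: the alert fires at the thirteenth consecutive matching evaluation, three minutes after it became pending.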

Detailed (recording) rules: vim rules/record-rules.yml

groups:
  - name: prometheus-record
    rules:
      - expr: up{job!="prometheus"}
        record: prometheus:up
        labels:
          desc: "whether the node is up: 1 = up, 0 = down"
          unit: " "
          job: "prometheus"
      - expr: time() - node_boot_time_seconds
        record: prometheus:node_uptime
        labels:
          desc: "node uptime"
          unit: "s"
          job: "prometheus"
      - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="idle"}[5m]))) * 100
        record: prometheus:cpu:total:percent
        labels:
          desc: "total CPU usage percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="idle"}[5m]))) * 100
        record: prometheus:cpu:idle:percent
        labels:
          desc: "CPU idle percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="iowait"}[5m]))) * 100
        record: prometheus:cpu:iowait:percent
        labels:
          desc: "CPU iowait percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="system"}[5m]))) * 100
        record: prometheus:cpu:system:percent
        labels:
          desc: "CPU system percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="user"}[5m]))) * 100
        record: prometheus:cpu:user:percent
        labels:
          desc: "CPU user percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode=~"softirq|nice|irq|steal"}[5m]))) * 100
        record: prometheus:cpu:other:percent
        labels:
          desc: "CPU percentage of the node spent in other modes"
          unit: "%"
          job: "prometheus"
      - expr: node_memory_MemTotal_bytes{job!="prometheus"}
        record: prometheus:memory:total
        labels:
          desc: "total memory of the node"
          unit: byte
          job: "prometheus"
      - expr: node_memory_MemFree_bytes{job!="prometheus"}
        record: prometheus:memory:free
        labels:
          desc: "free memory of the node"
          unit: byte
          job: "prometheus"
      - expr: node_memory_MemTotal_bytes{job!="prometheus"} - node_memory_MemFree_bytes{job!="prometheus"}
        record: prometheus:memory:used
        labels:
          desc: "used memory of the node"
          unit: byte
          job: "prometheus"
      - expr: node_memory_MemTotal_bytes{job!="prometheus"} - node_memory_MemAvailable_bytes{job!="prometheus"}
        record: prometheus:memory:actualused
        labels:
          desc: "memory actually used by user processes on the node"
          unit: byte
          job: "prometheus"
      - expr: (1 - (node_memory_MemAvailable_bytes{job!="prometheus"} / (node_memory_MemTotal_bytes{job!="prometheus"}))) * 100
        record: prometheus:memory:used:percent
        labels:
          desc: "memory usage percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: ((node_memory_MemAvailable_bytes{job!="prometheus"} / (node_memory_MemTotal_bytes{job!="prometheus"}))) * 100
        record: prometheus:memory:free:percent
        labels:
          desc: "free memory percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: sum by (instance) (node_load1{job!="prometheus"})
        record: prometheus:load:load1
        labels:
          desc: "1-minute system load"
          unit: " "
          job: "prometheus"
      - expr: sum by (instance) (node_load5{job!="prometheus"})
        record: prometheus:load:load5
        labels:
          desc: "5-minute system load"
          unit: " "
          job: "prometheus"
      - expr: sum by (instance) (node_load15{job!="prometheus"})
        record: prometheus:load:load15
        labels:
          desc: "15-minute system load"
          unit: " "
          job: "prometheus"
      - expr: node_filesystem_size_bytes{job!="prometheus",fstype=~"ext4|xfs"}
        record: prometheus:disk:usage:total
        labels:
          desc: "total disk size of the node"
          unit: byte
          job: "prometheus"
      - expr: node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"}
        record: prometheus:disk:usage:free
        labels:
          desc: "free disk space of the node"
          unit: byte
          job: "prometheus"
      - expr: node_filesystem_size_bytes{job!="prometheus",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"}
        record: prometheus:disk:usage:used
        labels:
          desc: "used disk space of the node"
          unit: byte
          job: "prometheus"
      - expr: (1 - node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job!="prometheus",fstype=~"ext4|xfs"}) * 100
        record: prometheus:disk:used:percent
        labels:
          desc: "disk usage percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: irate(node_disk_reads_completed_total{job!="prometheus"}[1m])
        record: prometheus:disk:read:count:rate
        labels:
          desc: "disk read operations per second"
          unit: "ops/s"
          job: "prometheus"
      - expr: irate(node_disk_writes_completed_total{job!="prometheus"}[1m])
        record: prometheus:disk:write:count:rate
        labels:
          desc: "disk write operations per second"
          unit: "ops/s"
          job: "prometheus"
      # read rate must come from node_disk_read_bytes_total (read and written were swapped before)
      - expr: (irate(node_disk_read_bytes_total{job!="prometheus"}[1m]))/1024/1024
        record: prometheus:disk:read:mb:rate
        labels:
          desc: "device read rate in MB"
          unit: "MB/s"
          job: "prometheus"
      - expr: (irate(node_disk_written_bytes_total{job!="prometheus"}[1m]))/1024/1024
        record: prometheus:disk:write:mb:rate
        labels:
          desc: "device write rate in MB"
          unit: "MB/s"
          job: "prometheus"
      - expr: (1 - node_filesystem_files_free{job!="prometheus",fstype=~"ext4|xfs"} / node_filesystem_files{job!="prometheus",fstype=~"ext4|xfs"}) * 100
        record: prometheus:filesystem:used:percent
        labels:
          desc: "inode usage percentage of the node"
          unit: "%"
          job: "prometheus"
      - expr: node_filefd_allocated{job!="prometheus"}
        record: prometheus:filefd_allocated:count
        labels:
          desc: "number of allocated file descriptors on the node"
          unit: "count"
          job: "prometheus"
      - expr: node_filefd_allocated{job!="prometheus"}/node_filefd_maximum{job!="prometheus"} * 100
        record: prometheus:filefd_allocated:percent
        labels:
          desc: "percentage of file descriptors allocated on the node"
          unit: "%"
          job: "prometheus"
      # the source counters are in bytes; multiply by 8 so the value matches the bit/s unit
      - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
        record: prometheus:network:netin:bit:rate
        labels:
          desc: "bits received per second per network interface"
          unit: "bit/s"
          job: "prometheus"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m])) * 8
        record: prometheus:network:netout:bit:rate
        labels:
          desc: "bits sent per second per network interface"
          unit: "bit/s"
          job: "prometheus"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: prometheus:network:netin:packet:rate
        labels:
          desc: "packets received per second per network interface"
          unit: "packets/s"
          job: "prometheus"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: prometheus:network:netout:packet:rate
        labels:
          desc: "packets sent per second per network interface"
          unit: "packets/s"
          job: "prometheus"
      - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: prometheus:network:netin:error:rate
        labels:
          desc: "receive errors per second detected by the device driver"
          unit: "packets/s"
          job: "prometheus"
      - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
        record: prometheus:network:netout:error:rate
        labels:
          desc: "transmit errors per second detected by the device driver"
          unit: "packets/s"
          job: "prometheus"
      - expr: node_tcp_connection_states{job!="prometheus", state="established"}
        record: prometheus:network:tcp:established:count
        labels:
          desc: "current number of established connections on the node"
          unit: "count"
          job: "prometheus"
      - expr: node_tcp_connection_states{job!="prometheus", state="time_wait"}
        record: prometheus:network:tcp:timewait:count
        labels:
          desc: "number of time_wait connections on the node"
          unit: "count"
          job: "prometheus"
      - expr: sum by (environment,instance) (node_tcp_connection_states{job!="prometheus"})
        record: prometheus:network:tcp:total:count
        labels:
          desc: "total number of tcp connections on the node"
          unit: "count"
          job: "prometheus"
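To make the prometheus:cpu:total:percent expression more concrete, the arithmetic can be reproduced by hand. The sketch below assumes that irate is the per-second increase between the last two samples of a counter (counter resets are ignored here), averages the idle rate over all cores, and converts it to a usage percentage. The function names and the sample numbers are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(last_two: Tuple[Sample, Sample]) -> float:
    """Per-second rate between the last two samples of a counter."""
    (t1, v1), (t2, v2) = last_two
    return (v2 - v1) / (t2 - t1)


def cpu_total_percent(idle_per_core: List[Tuple[Sample, Sample]]) -> float:
    """(1 - avg(irate(idle seconds))) * 100, as in the recording rule."""
    rates = [irate(pair) for pair in idle_per_core]
    return (1 - sum(rates) / len(rates)) * 100


# Two cores scraped 15 s apart: one was idle for 12 s, the other for 9 s.
cores = [((0.0, 100.0), (15.0, 112.0)), ((0.0, 200.0), (15.0, 209.0))]
usage = cpu_total_percent(cores)  # idle rates 0.8 and 0.6, so usage is about 30%
```

The prometheus-cpu-high alert compares exactly this kind of value against 80.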

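4. Grafana configuration

Create the directory: mkdir -p grafana/grafana-storage

Set permissions: chmod 777 grafana/grafana-storage

Preparing grafana.ini: first start a temporary grafana container with docker run -d --name=grafana -p 3000:3000 grafana/grafana, then copy the file out with docker cp 93ac9e93e97a:/etc/grafana/grafana.ini ./grafana/ (93ac9e93e97a is the container ID on my machine; use your own ID or the container name grafana). Afterwards remove the temporary container with docker rm -f grafana, otherwise its name and port 3000 will clash with the grafana service created by docker-compose later.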

5. Alertmanager configuration

Create the directory: mkdir alert

Configuration file: vim alert/alertmanager.yml

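route:
  group_by: ['dingding']   # label names to group by; no alert here carries a 'dingding' label, so all alerts land in one group (use ['alertname'] to group by rule name)
  group_wait: 30s          # wait 30s after a new group appears, so alerts arriving within that window go out in one notification
  group_interval: 1h       # how long to wait before notifying about changes (new or resolved alerts) in a group that was already notified
  repeat_interval: 1h      # if an alert is still unresolved after 1h, send the notification again
  receiver: 'dingding.webhook1'
  routes:
    - receiver: 'dingding.webhook1'
      match_re:
        alertname: ".*"
receivers:
  - name: 'dingding.webhook1'   # several receivers can be defined
    webhook_configs:
      - url: 'http://10.10.30.34:8060/dingtalk/webhook1/send'
        send_resolved: true     # also send a notification when the alert is resolved
inhibit_rules:
  - source_match:               # inhibition: a critical alert suppresses matching warning alerts
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']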
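To try out this routing without waiting for a real rule to fire, one option is to push a hand-made alert straight into Alertmanager. The sketch below assumes Alertmanager is reachable at the address used in this post and uses its v2 HTTP API (POST /api/v2/alerts, which takes a JSON list of alerts). The helper names and the label values are made up for the example.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List

ALERTMANAGER_URL = "http://10.10.30.34:9093"  # address used in this post


def build_test_alert(name: str, instance: str, severity: str = "info",
                     minutes: int = 5) -> List[Dict]:
    """Build a one-element alert list that expires after the given minutes."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "instance": instance, "severity": severity},
        "annotations": {
            "summary": f"manual test alert for {instance}",
            "description": "sent by hand to check the DingTalk route",
        },
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: List[Dict], base_url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Calling post_alerts(build_test_alert("manual-test", "10.10.30.34:9100")) should produce a DingTalk message after group_wait has passed, provided the webhook service is up.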

6. Webhook configuration

Create the directory: mkdir webhook

Configuration file: vim webhook/config.yml

## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
  # - contrib/templates/legacy/template.tmpl
  - /etc/prometheus-webhook-dingtalk/templates/default.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx   # DingTalk robot webhook URL
    # secret for signature
    secret: SEC74939daa62xxx.xxxxxx   # DingTalk robot signing secret
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
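The secret under webhook1 is the signing key of a DingTalk robot that has signature verification enabled. As far as I know, prometheus-webhook-dingtalk computes the signature itself when secret is set, so nothing extra is needed in this setup. For debugging a robot by hand, though, the scheme described in the DingTalk robot documentation can be sketched as follows: HMAC-SHA256 over "timestamp\nsecret", keyed with the secret, then Base64 and URL encoding. The function names and the example secret in the usage note are placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a timestamp in milliseconds."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook_url: str, secret: str, timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign query parameters to a robot webhook URL."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    return f"{webhook_url}&timestamp={ts}&sign={dingtalk_sign(secret, ts)}"
```

For example, signed_url("https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx", "SECexample") yields a URL that can be used with a plain HTTP POST while the timestamp is still fresh.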

prometheus-webhook-dingtalk template

Create the directory: mkdir webhook/template

Create the template: vim webhook/template/default.tmpl

{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}

{{ define "__alert_list" }}{{ range . }}
---

{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Alert subject**: {{ .Annotations.summary }}

**Alert type**: {{ .Labels.alertname }}

**Alert severity**: {{ .Labels.severity }}

**Alert host**: {{ .Labels.instance }}

**Alert details**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

{{ end }}{{ end }}

{{ define "__resolved_list" }}{{ range . }}
---

{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Alert subject**: {{ .Annotations.summary }}

**Alert type**: {{ .Labels.alertname }}

**Alert severity**: {{ .Labels.severity }}

**Alert host**: {{ .Labels.instance }}

**Alert details**: {{ index .Annotations "description" }}

**Alert time**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Recovery time**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}

{{ end }}{{ end }}

{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}

{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}

**====Detected {{ .Alerts.Firing | len }} fault(s)====**

{{ template "__alert_list" .Alerts.Firing }}

---

{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}

**====Recovered {{ .Alerts.Resolved | len }} fault(s)====**

{{ template "__resolved_list" .Alerts.Resolved }}

{{ end }}
{{ end }}

{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}

{{ template "default.title" . }}
{{ template "default.content" . }}
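The dateInZone calls in the template take a Go layout string ("2006.01.02 15:04:05" means year.month.day hour:minute:second) and a time zone. To preview what a given alert timestamp will look like in the DingTalk message, the conversion can be mimicked in a few lines. This sketch assumes the input is a UTC timestamp without fractional seconds and models Asia/Shanghai as a fixed UTC+8 offset, which holds today because China does not use daylight saving time. The function name is made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai modelled as a fixed UTC+8 offset (no DST).
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def format_alert_time(rfc3339_utc: str) -> str:
    """Render a UTC timestamp like the template's dateInZone call would."""
    parsed = datetime.strptime(rfc3339_utc, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(SHANGHAI).strftime("%Y.%m.%d %H:%M:%S")


print(format_alert_time("2024-08-20T08:33:02Z"))  # prints 2024.08.20 16:33:02
```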

7. Creating the docker containers

Create the yml file: vim docker-compose.yml

version: '3.2'
services:
  prometheus:
    image: prom/prometheus
    restart: "always"
    ports:
      - 9090:9090
    container_name: "prometheus"
    volumes:
      - "./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml"
      - "./rules:/etc/prometheus/rules"
      - "./prometheus/data:/prometheus"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'   # config file path; must match the mount above
      - '--storage.tsdb.path=/prometheus'                # data path; must match the mount above
  alertmanager:
    image: prom/alertmanager:latest
    restart: "always"
    ports:
      - 9093:9093
    container_name: "alertmanager"
    volumes:
      - "./alert/alertmanager.yml:/etc/alertmanager/alertmanager.yml"
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'   # config file path; must match the mount above
  webhook:
    image: timonwong/prometheus-webhook-dingtalk
    restart: "always"
    ports:
      - 8060:8060
    container_name: "webhook"   # the DingTalk token is set in webhook/config.yml
    volumes:
      - "./webhook/config.yml:/etc/prometheus-webhook-dingtalk/config.yml"
      - "./webhook/template/default.tmpl:/etc/prometheus-webhook-dingtalk/templates/default.tmpl"
    command:
      - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'   # config file path; must match the mount above
  grafana:
    image: grafana/grafana
    restart: "always"
    ports:
      - 3000:3000
    container_name: "grafana"
    volumes:
      - "./grafana/grafana.ini:/etc/grafana/grafana.ini"   # copy this config file out of a container yourself first
      - "./grafana/grafana-storage:/var/lib/grafana"

Create the containers: docker-compose -f docker-compose.yml up -d
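After the stack comes up, a quick way to see whether all four services are listening is to try a TCP connection to each published port. The sketch below does only that: it says nothing about whether the service behind the port is healthy. The port numbers come from the compose file, while the helper names are made up.

```python
import socket
from typing import Dict

# Ports published by docker-compose.yml
SERVICES = {"prometheus": 9090, "alertmanager": 9093, "webhook": 8060, "grafana": 3000}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host: str, services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in services.items()}
```

On the test server, check_stack("10.10.30.34") should return True for every entry; a False usually means the container exited, which docker logs followed by the container name will explain.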

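8. Adding a robot in DingTalk

Create a group yourself (a group needs at least two people). Then follow the screenshots below; it is only a few clicks, so I will not go into detail.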

(Six screenshots of the DingTalk robot setup steps appeared here.)

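The remaining steps are just the robot name, the webhook URL, the signing secret, and so on. I did not feel like blurring those out, so there are no screenshots for them. The webhook URL and the signing secret are the values that go into url and secret under webhook1 in webhook/config.yml. If you are still stuck after that, well, good luck.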

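9. Verification

Now that everything is installed, it has to be verified. Readers who know this stack will have noticed that I configured a scrape job for the server's metrics but never started node-exporter, so an alert is bound to fire. Sure enough, here is the alert message: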

(Screenshot: the alert message received in DingTalk.)

Install node-exporter: vim node-exporter-compose.yml

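version: '3.2'
services:
  node-exporter:
    image: prom/node-exporter
    restart: "always"
    ports:
      - 9100:9100
    container_name: "node-exporter"
    volumes:
      - "/proc:/host/proc:ro"
      - "/sys:/host/sys:ro"
      - "/:/rootfs:ro"
    command:   # point node-exporter at the host mounts above; without these flags the mounts are not used
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'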

Start node-exporter: docker-compose -f node-exporter-compose.yml up -d
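Before waiting for Prometheus to scrape the new target, the exporter can be checked directly, since it serves plain text at /metrics on port 9100. Here is a small sketch, assuming the standard Prometheus text exposition format, that pulls out the unlabelled series such as node_load1. The helper names are made up, and labelled series are skipped on purpose to keep the parser tiny.

```python
import urllib.request
from typing import Dict

NODE_EXPORTER_URL = "http://10.10.30.34:9100/metrics"  # target from prometheus.yml


def parse_simple_metrics(text: str) -> Dict[str, float]:
    """Parse unlabelled samples ("name value") from the text exposition format."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip blank lines, HELP/TYPE comments, and labelled series
        name, _, rest = line.partition(" ")
        values[name] = float(rest.split()[0])
    return values


def fetch_metrics(url: str = NODE_EXPORTER_URL) -> Dict[str, float]:
    """Download /metrics from a running node-exporter and parse it."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_simple_metrics(resp.read().decode("utf-8"))
```

If fetch_metrics() returns a dictionary containing node_load1 and node_filefd_allocated, the exporter is serving the series that the recording rules rely on.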

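With the node now up, there should be a resolved notification. Sure enough, it arrived, only a bit slowly, because I set a 1h interval before the next notification for a group is sent. Adjust the intervals to suit your own needs.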
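The delay comes from group_interval: once a group has been notified, Alertmanager only looks at it again once per interval, so a resolved alert waits for the next round. The following is a deliberately simplified model of that timing (it ignores how long Prometheus itself takes to notice the recovery), with a made-up function name and times in seconds.

```python
import math


def next_group_flush(first_notification_s: float, group_interval_s: float,
                     event_s: float) -> float:
    """Earliest group flush at or after event_s.

    Flushes are modelled as happening at first_notification_s and then
    every group_interval_s seconds afterwards.
    """
    if event_s <= first_notification_s:
        return first_notification_s
    rounds = math.ceil((event_s - first_notification_s) / group_interval_s)
    return first_notification_s + rounds * group_interval_s


# First notification 30 s after the alert appeared (group_wait: 30s),
# node-exporter started 10 minutes in, group_interval: 1h.
resolved_sent_at = next_group_flush(30, 3600, 600)  # 3630 s, about 50 minutes later
```

With group_interval set to 5m instead, the same call with 300 gives 630, so the resolved message would follow the recovery within half a minute in this model.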

(Screenshot: the resolved notification received in DingTalk.)

PS: experts can work straight from the official documentation. This post is only meant as a reference. Good luck!


