跳到主要内容

搭建基础监控系统

前言

  • 适用场景:内网离线部署,只需要了解服务器基本情况,不需要告警系统。
  • 需要准备的安装包:

环境信息

系统版本IP应用应用版本说明
Debian 12 x86-64192.168.0.11prometheus2.45.2服务端
Debian 12 x86-64192.168.0.11node_exporter1.7.0系统监控客户端
Debian 12 x86-64192.168.0.11grafana10.2.3监控数据可视化

安装prometheus

  1. 解压和创建目录
mkdir -p /home/apps
tar xf prometheus-2.37.1.linux-amd64.tar.gz -C /home/apps
cd /home/apps
mv prometheus-2.37.1.linux-amd64 prometheus
rm -f prometheus-2.37.1.linux-amd64.tar.gz
cd prometheus
mkdir sd_configs data
  1. 编辑服务配置文件:vim /home/apps/prometheus/prometheus.yml
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# 修改以下内容
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
file_sd_configs:
- files: ['/home/apps/prometheus/sd_configs/*.yml']
refresh_interval: 10s
  1. 启动。(启动脚本可参考"附录 - prometheus启动脚本")
nohup /home/apps/prometheus/prometheus \
--storage.tsdb.path=/home/apps/prometheus/data \
--config.file=/home/apps/prometheus/prometheus.yml \
--web.listen-address=:19090 \
--storage.tsdb.retention=15d > /dev/null 2>&1 &

参数说明:

  • -storage.tsdb.path:数据存储路径
  • -config.file:配置文件路径
  • -web.listen-address:服务监听端口
  • -storage.tsdb.retention:数据存储期限。这里设置为15天
  1. 编辑文件服务发现的配置文件:vim /home/web/prometheus/prometheus/sd_configs/nodes.yml
- targets: ['192.168.0.11:19091']
labels:
instance: 192.168.0.11
  1. Prometheus服务端配置完成

安装node_exporter

  1. 解压和修改目录名
tar xf node_exporter-1.3.1.linux-amd64.tar.gz -C /home/apps
cd /home/apps
mv node_exporter-1.3.1.linux-amd64 node_exporter
  1. 启动。(启动脚本可参考“附录 - node_exporter启动脚本”)
nohup /home/apps/node_exporter/node_exporter \
--collector.processes \
--web.listen-address 0.0.0.0:19091 > /dev/null 2>&1 &

参数说明:

  • -web.listen-address:监听19091端口
  • --collector.processes:收集进程相关指标

配置grafana

docker方式

  1. 使用docker load -i指令加载事先准备好的docker镜像
  2. 创建容器并运行:
docker run -d --name=grafa -p 3000:3000 grafana/grafana:latest
  1. 浏览器打开192.168.0.11:3000,默认账户名为admin,默认密码为admin

  2. 添加Prometheus数据源,地址为:192.168.0.11:19090

  3. 导入事先准备好的json

  4. grafana配置完成

二进制文件方式

  1. 官网下载压缩包并解压
tar xf grafana-10.2.3.linux-amd64.tar.gz
  1. 进入解压后的目录,如需修改端口号,可编辑conf/defaults.ini文件,找到参数http_port,修改其值,默认为3000。还有数据库连接、提交分析数据等配置可根据实际需求进行修改。
  2. 启动(启动脚本可参考“附录 - grafana启动脚本)
./bin/grafana server --config conf/defaults.ini

附录

prometheus启动脚本

#!/bin/bash

set -u

script_dir=$(cd $(dirname $0) && pwd)
port=19090
app_name="prometheus"

check_env() {
[ -d ${script_dir}/data ] || mkdir -p ${script_dir}/data
[ -d ${script_dir}/sd_configs ] || mkdir -p ${script_dir}/sd_configs
[ -d ${script_dir}/alertrules ] || mkdir -p ${script_dir}/alertrules

timeout 1 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/${port}" > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "port ${port} has been used"
exit 1
fi
}

start_app() {
check_env
nohup ${script_dir}/${app_name} \
--storage.tsdb.path=${script_dir}/data \
--config.file=${script_dir}/prometheus.yml \
--web.listen-address=:${port} \
--storage.tsdb.retention=7d > /dev/null 2>&1 &
echo "start ${app_name} called"
}

stop_app() {
local pid=$(ps -ef | grep -v grep | grep "${script_dir}/${app_name}" | awk '{print $2}')
if [ x"${pid}" == x ]; then
echo "${script_dir}/${app_name} is not runnning"
exit 0
fi
kill ${pid}
echo "stop ${app_name} called"
}

reload_app() {
local pid=$(ps -ef | grep -v grep | grep "${script_dir}/${app_name}" | awk '{print $2}')
if [ x"${pid}" == x ]; then
echo "${script_dir}/${app_name} is not runnning"
exit 0
fi
kill -HUP ${pid}
echo "reload ${app_name} called"
}

status_app() {
ps -ef | grep -v grep | grep "${script_dir}/${app_name}"
if [ $? -ne 0 ]; then
echo "${script_dir}/${app_name} is not runnning"
exit 0
fi
echo "status ${app_name} called"
}

main() {
case $1 in
start)
start_app
;;
stop)
stop_app
;;
reload)
reload_app
;;
status)
status_app
;;
*)
echo "Available options: start | stop | reload | status"
esac
}

main $@

node_exporter启动脚本

#!/bin/bash

set -u

script_dir=$(cd $(dirname $0) && pwd)
port=19091
app_name="node_exporter"

check_env() {
timeout 1 bash -c "cat < /dev/null > /dev/tcp/127.0.0.1/${port}" > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "port ${port} has been used"
exit 1
fi
}

start_app() {
check_env
nohup ${script_dir}/${app_name} \
--collector.systemd \
--collector.tcpstat \
--collector.sysctl \
--collector.processes \
--web.listen-address 0.0.0.0:${port} > /dev/null 2>&1 &
echo "start ${app_name} called"
}

stop_app() {
local pid=$(ps -ef | grep -v grep | grep "${script_dir}/${app_name}" | awk '{print $2}')
if [ x"${pid}" == x ]; then
echo "${script_dir}/${app_name} is not runnning"
exit 0
fi
kill ${pid}
echo "stop ${app_name} called"
}

status_app() {
ps -ef | grep -v grep | grep "${script_dir}/${app_name}"
if [ $? -ne 0 ]; then
echo "${script_dir}/${app_name} is not runnning"
exit 0
fi
echo "status ${app_name} called"
}

main() {
case $1 in
start)
start_app
;;
stop)
stop_app
;;
status)
status_app
;;
*)
echo "Available options: start | stop | status"
esac
}

main $@

grafana启动脚本

仅在oss-10.2.3版本测试可用,其它版本的启动命令可能不同。

#!/bin/bash

set -u
script_dir=$(cd $(dirname $0) && pwd)
app_name="grafana"

check_app() {
local pid=$(ps -ef | grep -v grep | grep "${script_dir}/bin/${app_name}" | awk '{print $2}')
if [ x"${pid}" == x ]; then
return 0
else
return ${pid}
fi
}

start_app(){
check_app
if [ $? -ne 0 ]; then
echo "${app_name} is running, pid is $?"
exit 1
fi
cd ${script_dir}
nohup ${script_dir}/bin/grafana server --config ${script_dir}/conf/defaults.ini > /dev/null 2>&1 &
echo "start ${app_name} called"
}

stop_app() {
pid=$(check_app)
if [ ${pid} -eq 0 ]; then
echo "${app_name} is not running"
exit 1
fi
kill ${pid}
echo "stop ${app_name} called"
}

status_app() {
ps -ef | grep -v grep | grep "${script_dir}/bin/${app_name}"
if [ $? -ne 0 ]; then
echo "${script_dir}/bin/${app_name} is not running"
fi
echo "status ${app_name} called"
}

main() {
case $1 in
start)
start_app
;;
stop)
stop_app
;;
status)
status_app
;;
*)
echo "Available options: start | stop | status"
esac
}

main $@