How to Write Prometheus Monitoring with Python and Flask

Date: 2022-07-04 13:48:55

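Introduction

Prometheus works by periodically scraping the state of monitored components over HTTP.

Any component can be monitored by Prometheus as long as it exposes an HTTP endpoint that returns data in the format Prometheus defines.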
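To make "the format Prometheus defines" concrete, here is a minimal sketch that renders one unlabeled metric by hand in the text exposition format. The function name `render_sample` and the metric `queue_length` are made up for illustration; in practice the `prometheus_client` library generates this text for you.

```python
def render_sample(name: str, help_text: str, metric_type: str, value: float) -> str:
    """Render one unlabeled metric as HELP, TYPE and sample lines."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
        f"{name} {float(value)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical metric, used only to show the shape of the output.
text = render_sample("queue_length", "Jobs waiting in the queue", "gauge", 3)
print(text, end="")
```

Each metric is described by a HELP line, a TYPE line, and one or more sample lines; the outputs shown later in this article follow the same shape.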

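Prometheus Server periodically scrapes metrics from its targets and saves them to local storage. It collects data with a pull model, which keeps clients simple: a client only has to expose its own data and needs to know nothing about the server. The pull model also makes the server side easier to scale horizontally.

When monitoring data crosses an alert threshold, Prometheus Server sends the alert over HTTP to the alerting component, Alertmanager, which applies alert inhibition and then triggers an email or a Webhook. Prometheus offers a multi-dimensional data model and flexible queries through PromQL: because each metric can carry multiple labels, monitoring data can be combined and aggregated along any dimension.

With Python, we implement the server side and expose an HTTP endpoint. Once its URL is configured in Prometheus as a scrape target, Prometheus periodically requests that URL to fetch the data you choose to return.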
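The pull model can be imitated with the standard library alone. The sketch below starts a tiny HTTP server that exposes one metric and then fetches it once, which is roughly what Prometheus does on every scrape interval. The names `MetricsHandler`, `scrape_once`, and `demo_up` are assumptions made for this illustration and are not part of Prometheus or `prometheus_client`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve a fixed metrics payload on every GET request."""

    def do_GET(self):
        body = b"# TYPE demo_up gauge\ndemo_up 1.0\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape_once(url: str) -> str:
    """Fetch the metrics text once, ignoring any proxy settings."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

scraped = scrape_once(f"http://127.0.0.1:{port}/metrics")
server.shutdown()
server.server_close()
print(scraped, end="")
```

The client side never needs to know who is scraping it; it only answers GET requests.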

Prometheus provides four metric types: Counter, Gauge, Summary, and Histogram.

Preparation

pip install flask
pip install prometheus_client

Counter

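A Counter can only go up, and it is reset to 0 when the program restarts. It is typically used for metrics that only increase, such as request counts, number of tasks, total processing time, and error counts.

Defining one takes two arguments: the first is the metric name and the second is the metric description:

c = Counter('c1', 'A counter')

A counter can only be incremented, so it has a single method:

def inc(self, amount=1):
    '''Increment counter by the given amount.'''
    if amount < 0:
        raise ValueError('Counters can only be incremented by non-negative amounts.')
    self._value.inc(amount)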
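The behaviour of `inc` can be checked without a web server. This is a small sketch that assumes `prometheus_client` is installed; the metric name `jobs_done` is hypothetical, and a private `CollectorRegistry` is used so the snippet can be run repeatedly without clashing with the default registry.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
jobs_done = Counter("jobs_done", "Finished jobs", registry=registry)  # hypothetical metric

jobs_done.inc()      # add 1
jobs_done.inc(2.5)   # add an arbitrary non-negative amount

try:
    jobs_done.inc(-1)  # negative increments are rejected
    rejected = False
except ValueError:
    rejected = True

# The sample is exposed with the _total suffix.
total = registry.get_sample_value("jobs_done_total")
print(total, rejected)
```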

Test example:

import prometheus_client
from prometheus_client import Counter
from flask import Response, Flask

app = Flask(__name__)
requests_total = Counter('c1', 'A counter')

@app.route('/api/metrics/count/')
def requests_count():
    requests_total.inc(1)
    # requests_total.inc(2)
    return Response(prometheus_client.generate_latest(requests_total), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

Visit http://127.0.0.1:8081/api/metrics/count/:

# HELP c1_total A counter
# TYPE c1_total counter
c1_total 1.0
# HELP c1_created A counter
# TYPE c1_created gauge
c1_created 1.6053265493727107e+09

HELP is the description of c1, taken from the description given when the Counter was defined.

TYPE states the metric type of c1.

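c1_total is the output for the metric we defined. Notice the added _total suffix: it is there for compatibility between OpenMetrics and the Prometheus text format, because OpenMetrics requires the _total suffix on counters.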
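The test example exports a single metric from its own URL. A common alternative is one `/metrics` route that serves every metric in a registry. The sketch below assumes Flask and `prometheus_client` are installed; the `page_views` metric and the `/` route are hypothetical, and Flask's test client stands in for a real browser or Prometheus scrape.

```python
from flask import Flask, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Counter,
    generate_latest,
)

registry = CollectorRegistry()
page_views = Counter("page_views", "Page views", registry=registry)  # hypothetical metric

app = Flask(__name__)


@app.route("/")
def index():
    page_views.inc()  # instrument the business route
    return "ok"


@app.route("/metrics")
def metrics():
    # Serialize every metric in the registry in one response.
    return Response(generate_latest(registry), content_type=CONTENT_TYPE_LATEST)


# Exercise the app in-process instead of starting a server.
client = app.test_client()
client.get("/")
body = client.get("/metrics").get_data(as_text=True)
print(body)
```

To serve it for real, call `app.run(host='127.0.0.1', port=8081)` as in the test example and point Prometheus at the `/metrics` path.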

Gauge

A Gauge can go up or down and can be set to any value.

For example, it can hold the current CPU temperature, memory usage, or disk and network throughput.

It is defined in much the same way as a Counter:

from prometheus_client import Gauge

g = Gauge('my_inprogress_requests', 'Description of gauge')
g.inc()      # Increment by 1
g.dec(10)    # Decrement by given value
g.set(4.2)   # Set to a given value

Methods:

def inc(self, amount=1):
    '''Increment gauge by the given amount.'''
    self._value.inc(amount)

def dec(self, amount=1):
    '''Decrement gauge by the given amount.'''
    self._value.inc(-amount)

def set(self, value):
    '''Set gauge to the given value.'''
    self._value.set(float(value))
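A typical use of `inc` and `dec` together is tracking work in progress. This is a minimal sketch assuming `prometheus_client` is installed; `inprogress_jobs` and `run_job` are hypothetical names chosen for the example.

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()
in_progress = Gauge("inprogress_jobs", "Jobs currently running", registry=registry)


def run_job(work):
    """Run work() while the gauge reflects that one more job is active."""
    in_progress.inc()
    try:
        return work()
    finally:
        in_progress.dec()  # always undo the increment, even on errors


# Read the gauge from inside the job, then again after it finishes.
during = run_job(lambda: registry.get_sample_value("inprogress_jobs"))
after = registry.get_sample_value("inprogress_jobs")
print(during, after)
```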

Test example:

import random
import prometheus_client
from prometheus_client import Gauge
from flask import Response, Flask

app = Flask(__name__)
random_value = Gauge('g1', 'A gauge')

@app.route('/api/metrics/gauge/')
def r_value():
    random_value.set(random.randint(0, 10))
    return Response(prometheus_client.generate_latest(random_value), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

Visit http://127.0.0.1:8081/api/metrics/gauge/:

# HELP g1 A gauge
# TYPE g1 gauge
g1 5.0

Using labels

Labels distinguish the characteristics of a metric. A metric can have a single label or several.

from prometheus_client import Counter

c = Counter('requests_total', 'HTTP requests total', ['method', 'clientip'])
c.labels('get', '127.0.0.1').inc()
c.labels('post', '192.168.0.1').inc(3)
c.labels(method='get', clientip='192.168.0.1').inc()

import random
import prometheus_client
from prometheus_client import Gauge
from flask import Response, Flask

app = Flask(__name__)
# Note: this metric is a Gauge, so the output below reports TYPE gauge.
c = Gauge('c1', 'A counter', ['method', 'clientip'])

@app.route('/api/metrics/counter/')
def r_value():
    c.labels(method='get', clientip='192.168.0.%d' % random.randint(1, 10)).inc()
    return Response(prometheus_client.generate_latest(c), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

After visiting http://127.0.0.1:8081/api/metrics/counter/ nine times in a row:

# HELP c1 A counter
# TYPE c1 gauge
c1{clientip="192.168.0.7",method="get"} 2.0
c1{clientip="192.168.0.1",method="get"} 1.0
c1{clientip="192.168.0.8",method="get"} 1.0
c1{clientip="192.168.0.5",method="get"} 2.0
c1{clientip="192.168.0.4",method="get"} 1.0
c1{clientip="192.168.0.10",method="get"} 1.0
c1{clientip="192.168.0.2",method="get"} 1.0
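As the output suggests, every distinct combination of label values is tracked as its own series. The sketch below shows this with a private registry, assuming `prometheus_client` is installed; the metric name `http_requests` is hypothetical.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()
http_requests = Counter(
    "http_requests", "HTTP requests", ["method", "clientip"], registry=registry
)

# Positional and keyword label values address the same child series.
http_requests.labels("get", "127.0.0.1").inc()
http_requests.labels(method="get", clientip="127.0.0.1").inc()
http_requests.labels(method="post", clientip="192.168.0.1").inc(3)

get_local = registry.get_sample_value(
    "http_requests_total", {"method": "get", "clientip": "127.0.0.1"}
)
post_remote = registry.get_sample_value(
    "http_requests_total", {"method": "post", "clientip": "192.168.0.1"}
)
print(get_local, post_remote)
```

Because each combination creates a new series, labels with many possible values (a client IP is one) can make the number of series grow quickly, so they are worth using with care.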

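Histogram

Histograms are mainly used to compute percentiles, also known as quantiles. What is a percentile?

Suppose you have the latencies of 100 requests and sort them in ascending order. If the 90th value is 200 ms, then 90% of requests took no more than 200 ms, which is expressed as "the 90th percentile is 200 ms". This gives a basic picture of service quality. The 91st value might well be 2000 ms, though, and the 90th percentile says nothing about that.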
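The sort-and-pick procedure described here can be written directly. This is a sketch of the exact nearest-rank method on a small list, not what Prometheus does internally; the function name and the latency values are made up to match the 200 ms and 2000 ms figures in the text.

```python
import math


def nearest_rank_percentile(values, percent):
    """Return the value at the given percentile using the nearest-rank method."""
    ordered = sorted(values)
    # 1-based rank; integer inputs keep the arithmetic exact.
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 100 hypothetical latencies in ms: the 90th is 200, the 91st is 2000.
latencies_ms = list(range(1, 90)) + [200] + [2000 + i for i in range(10)]

p90 = nearest_rank_percentile(latencies_ms, 90)
p91 = nearest_rank_percentile(latencies_ms, 91)
print(p90, p91)
```

The jump between the two results shows why a single percentile should not be read as a bound on the slowest requests.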

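In practice, a service may receive hundreds of millions of requests a day, and it is not feasible to store every observation and sort them to find the 90th percentile. Problems like this are therefore handled with estimation algorithms that do not need to keep all the data. The underlying math is fairly involved, so we will go straight to how Prometheus is used.

First, define a histogram:

h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))

The first argument is the metric name, the second is the description, and the third is the bucket configuration. The buckets deserve a closer look.

Here (-5, 0, 5) divides the value range into four intervals: (-∞, -5], (-5, 0], (0, 5], and (5, +∞).

If we feed it the value 8: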

h.observe(8)

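the metrics output looks like this:

# HELP hh A histogram
# TYPE hh histogram
hh_bucket{le="-5.0"} 0.0
hh_bucket{le="0.0"} 0.0
hh_bucket{le="5.0"} 0.0
hh_bucket{le="+Inf"} 1.0
hh_count 1.0
hh_sum 8.0

hh_sum records the sum of the observed values, hh_count records the number of observations, and the hh_bucket lines are the buckets, where le means "less than or equal to" the given value.

The value 8 is greater than 5, so it satisfies only le=+Inf and only the last bucket counts it once. Note that a bucket is just a counter: each le bucket counts the observations less than or equal to its bound, so the counts are cumulative and together describe how the samples are distributed across the intervals.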
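The cumulative counting can be reproduced in a few lines of plain Python. This is an illustrative sketch, not the library's implementation; `cumulative_buckets` is a hypothetical helper.

```python
def cumulative_buckets(bounds, observations):
    """Count, for each upper bound, the observations less than or equal to it."""
    uppers = list(bounds) + [float("inf")]  # +Inf is always the last bucket
    return {upper: sum(1 for v in observations if v <= upper) for upper in uppers}


single = cumulative_buckets((-5, 0, 5), [8])
several = cumulative_buckets((-5, 0, 5), [-8, -2, 3, 8])
print(single)
print(several)
```

With the single observation 8, only the +Inf bucket is 1, matching the output above. With four observations spread across the range, each bucket includes everything counted by the buckets below it.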

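Bucket boundaries have to be chosen by judgment based on the expected distribution of the data; a sensible choice makes PromQL's percentile estimates more accurate. When using a histogram, all we need to do is define the buckets up front and then keep recording observations; percentiles can later be computed from the histogram's raw data.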
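To see why bucket choice affects accuracy, it helps to look at how a percentile can be estimated from cumulative bucket counts. The sketch below uses linear interpolation inside the bucket that contains the target rank, which is the general idea behind PromQL's `histogram_quantile` function. It is a simplified illustration under that assumption, not a reimplementation of PromQL, and `estimate_quantile` is a hypothetical name.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0..1) from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = None, 0
    for upper, count in buckets:
        if count >= rank:
            if upper == float("inf"):
                # Cannot interpolate into the open-ended bucket.
                return prev_bound if prev_bound is not None else float("nan")
            if prev_bound is None:
                return upper  # simplification for the lowest bucket
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return float("nan")


# Cumulative counts for buckets (-5, 0, 5): 0, 5, 10 and 10 observations in total.
data = [(-5, 0), (0, 5), (5, 10), (float("inf"), 10)]
p90 = estimate_quantile(0.9, data)
p50 = estimate_quantile(0.5, data)
print(p90, p50)
```

The estimate can never be more precise than the width of the bucket it falls in, which is why boundaries should be dense where the interesting values are expected.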

Test example:

import random
import prometheus_client
from prometheus_client import Histogram
from flask import Response, Flask

app = Flask(__name__)
h = Histogram('h1', 'A Histogram', buckets=(-5, 0, 5))

@app.route('/api/metrics/histogram/')
def r_value():
    h.observe(random.randint(-5, 5))
    return Response(prometheus_client.generate_latest(h), mimetype='text/plain')

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8081)

After visiting http://127.0.0.1:8081/api/metrics/histogram/ several times in a row:

# HELP h1 A Histogram
# TYPE h1 histogram
h1_bucket{le="-5.0"} 0.0
h1_bucket{le="0.0"} 5.0
h1_bucket{le="5.0"} 10.0
h1_bucket{le="+Inf"} 10.0
h1_count 10.0
# HELP h1_created A Histogram
# TYPE h1_created gauge
h1_created 1.6053319432993534e+09
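Histograms are most often fed with durations rather than random numbers. The following sketch assumes `prometheus_client` is installed and uses the `time()` helper of `Histogram` both as a decorator and as a context manager; the metric name `task_seconds`, the bucket bounds, and the `task` function are hypothetical.

```python
import time

from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()
task_seconds = Histogram(
    "task_seconds",
    "Time spent in task",
    buckets=(0.1, 0.5, 1.0),  # hypothetical bounds in seconds
    registry=registry,
)


@task_seconds.time()  # observe the duration of every call
def task():
    time.sleep(0.01)


task()

with task_seconds.time():  # observe the duration of a block
    time.sleep(0.01)

count = registry.get_sample_value("task_seconds_count")
total = registry.get_sample_value("task_seconds_sum")
print(count, total)
```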

Summary

The Python client does not fully implement the summary algorithm, so it is not covered here.
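For reference, a `Summary` in the Python client still records the count and the sum of observations, which is enough to derive an average. This is a brief sketch assuming `prometheus_client` is installed; the metric name `payload_bytes` and the sizes are made up.

```python
from prometheus_client import CollectorRegistry, Summary

registry = CollectorRegistry()
payload_bytes = Summary("payload_bytes", "Request payload size", registry=registry)

for size in (100, 250, 50):  # hypothetical payload sizes
    payload_bytes.observe(size)

count = registry.get_sample_value("payload_bytes_count")
total = registry.get_sample_value("payload_bytes_sum")
mean = total / count
print(count, total, mean)
```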

