回目录 《Monitor》
Options:
grafna prometheus zabbix netbox
Data Visualize https://www.elastic.co/
grafana https://grafana.com/docs/features/panels/graph/ X-axis type: time, series, histogram https://grafana.com/docs/features/datasources/ https://grafana.com/docs/features/datasources/prometheus/
Grafana vs kibana https://logz.io/blog/grafana-vs-kibana/ https://grafana.com/docs/features/datasources/elasticsearch/
Grafana with nginx Set up domain Running Grafana behind a reverse proxy http://docs.grafana.org/installation/behind_proxy/
tail -f /var/log/grafana/grafana.log
?#issues Template variables are not supported in alert queries Create new chart/graph without using any template variables /etc/grafana/grafana.ini
?#grafana mail: expected single address, got Resolved by comment out from_name
?#could not send email 1: 452 4.3.1 Out of memory
If you have a lot of messages queued up it could go over the max number of messages per connection. To see if this is the case you can try submitting only a few messages to that domain at a time and then keep increasing the number until you find the maximum number accepted by the server. https://social.msdn.microsoft.com/Forums/sqlserver/en-US/a1821fec-5109-46e0-8b18-36b96646a5c7/failure-sending-mail?forum=sqlreportingservices
grafana permission Add permission, and remember remove all default viewer and editor permission
Elastic / prometheus/ influxdb/ opentsdb /…… https://prometheus.io/docs/introduction/comparison/ event logging and metrics recording
Prometheus vs influxdb InfluxDB supports timestamps with up to nanosecond resolution Prometheus, by contrast, supports the float64 data type with limited support for strings, and millisecond resolution timestamps Where InfluxDB is better: ● If you’re doing event logging. ● Commercial option offers clustering for InfluxDB, which is also better for long term data storage. ● Eventually consistent view of data between replicas. Where Prometheus is better: ● If you’re primarily doing metrics. ● More powerful query language, alerting, and notification functionality. ● Higher availability and uptime for graphing and alerting.
InfluxDB is maintained by a single commercial company following the open-core model, offering premium features like closed-source clustering, hosting and support. Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support.
Prometheus vs. OpenTSDB OpenTSDB is a distributed time series database based on Hadoop and HBase.
Prometheus VS Zabbix Zabbix 和 Prometheus 都是流行的开源监控工具,它们各自具有独特的优势。Zabbix 主要用于网络和系统监控,而 Prometheus 则专注于开源的分布式时间序列数据库。在某些场景下,将这两个工具整合在一起可以更好地发挥它们的优势,提高监控的灵活性和效率。 Zabbix 和 Prometheus 到底怎么选? Zabbix 整合 Prometheus
https://prometheus.io/docs/introduction/overview/
Pull or push? Scrape config: scrape interval Reload config(not work?) https://www.robustperception.io/reloading-prometheus-configuration
Prometheus http://localhost:9090 Monitor your applications with Prometheus https://blog.alexellis.io/prometheus-monitoring/
https://prometheus.io/docs/concepts/metric_types/ Samples: https://gist.github.com/lyhistory/998448f439594d3eec073d7849d713a8 counter, gauge, histogram, summary Counter increase all the time Gauge can be negative, decrease or increase
Summary vs histogram
Summary直接统计每个百分位的“数值”【客户端计算】,比如0.5即50%的请求相应时间是多少 (全范围统计,over all time):
*****_duration_seconds{quantile="0.5"}
*****_duration_seconds_sum
*****_duration_seconds_count
*****_duration_seconds_sum/*****_duration_seconds_count 每个请求的平均耗时
而histogram直接反应了不同区间内样本的个数【服务端计算】(后面的区间是覆盖前面的区间)(全范围统计,over all time)
*****_range_bucket{le="100"} 小于100秒的请求个数有几个
*****_range_sum
*****_range_count
*****_range_sum/*****_range_count 每个请求的耗时
histogram计算百分位函数histogram_quantile
histogram_quantile(0.5, *****_duration_seconds_bucket) 占50%的请求的平均耗时(需要注意的是通过histogram_quantile计算的分位数,并非为精确值,而是通过*****_duration_seconds_bucket和*****_duration_seconds_sum近似计算的结果)
进一步, Over 10m统计:
Summary
rate(*****_duration_seconds_sum[10m]) / rate(*****_duration_seconds_count[10])
Histogram
rate(*****_range_sum[10m]) / rate(*****_range_count[10m])
histogram_quantile(0.5, rate(*****_duration_seconds_bucket[10m])) 这个结果可能因为还有其他label比如responsecode=200/404产生多条记录:过去10分钟,返回200的50%的请求的平均响应时间以及返回404的50%的请求的平均响应时间,如果想获取过去10分钟,返回所有请求之中的50%的请求的平均响应时间,则需要用
histogram_quantile(0.5, sum(rate(*****_duration_seconds_bucket[10m])) by (le))
https://groups.google.com/forum/#!topic/prometheus-developers/VYaiXJCsHxQ https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/promql/prometheus-metrics-types https://www.yangcs.net/prometheus/3-prometheus/functions.html
Sample - Summary
The Python client doesn’t store or expose quantile information at this time request_processing_seconds_sum request_processing_seconds_count rate(request_processing_seconds_sum[25m])/rate(request_processing_seconds_count[25m])
from prometheus_client.core import GaugeMetricFamily, CounterMetricFamily, REGISTRY
from prometheus_client import start_http_server, Summary
import random
import time
# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
"""A dummy function that takes some time."""
time.sleep(t)
if __name__ == '__main__':
# Start up the server to expose the metrics.
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--port', help="The port the metrics is serving from.", required=True)
args = parser.parse_args()
start_http_server(int(args.port))
#Test Summary
while True: process_request(random.randint(60,600))
Sample - histogram
request_latency_seconds_bucket
request_latency_seconds_bucket[10m]
histogram_quantile(0.99, request_latency_seconds_bucket)
histogram_quantile(0.99, rate(request_latency_seconds_bucket[10m]))
histogram_quantile(0.99, sum(rate(request_latency_seconds_bucket[10m])) by (le))
##DEFAULT_BUCKETS = (.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF)
INF = float("inf")
# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
"""A dummy function that takes some time."""
print(str(t))
time.sleep(t)
if __name__ == '__main__':
# Start up the server to expose the metrics.
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--port', help="The port the metrics is serving from.", required=True)
args = parser.parse_args()
start_http_server(int(args.port))
#Test Summary
MY_BUCKETS = (30.0,40.0,50.0,60.0,70.0,80.0,90.0,100.0, INF)
h = Histogram(name='request_latency_seconds', documentation='Description of histogram',buckets=MY_BUCKETS)
while True:
t = random.randint(20,110)
h.observe(t)
process_request(t)
Visualize on grafana https://yunlzheng.gitbook.io/prometheus-book/part-ii-prometheus-jin-jie/grafana/grafana-panels/use_graph_panel
histogram graph: Be careful, grafana x axis ‘histogram’ mode is meaningless for prometheus, so we need to use “Series” mode instead count==**_count[pre-le,le] based on separated buckets Total == **_count[,le] based on le buckets Demonstrated above with x axis in “Time” Mode
Format as “Heatmap” so that it will separate le buckets into [] individual buckets le[,10] le[,20] …. ====> le[,10] le[10,20] ……
heatmap graph: https://grafana.com/docs/features/panels/heatmap/ Prometheus Histograms & Grafana Heatmaps https://vimeo.com/289620891 Because each **buckets accumulated over time(buckets here has already been separated into individual buckets), so use increase function to show the ‘rate’ along the time
Literals: String Float/scala Time series selector: Instant vector selector Range vector selector Basic time series https://prometheus.io/docs/prometheus/latest/querying/examples/
Example: scrape every 15s, 4 request every 1m http_requests_total http_requests_total[1m] rate(http_requests_total[1m]) sum(rate(http_requests_total[1m])) by (instance) sum(rate(http_requests_total[1m])) by job) topk(5, sum(rate(http_requests_total[5m])) by (instance))
Histogram and summary https://prometheus.io/docs/practices/histograms/ https://timber.io//blog/promql-for-humans/ https://povilasv.me/prometheus-tracking-request-duration/
Prometheus 监控 Nginx 流量 https://www.cnblogs.com/vovlie/p/Nginx_monitoring.html http://vearne.cc/archives/11085 https://www.cnblogs.com/SleepDragon/p/10642955.html
Exporter EXPORTERS AND INTEGRATIONS https://prometheus.io/docs/instrumenting/exporters/ MONITORING LINUX HOST METRICS WITH THE NODE EXPORTER https://prometheus.io/docs/guides/node-exporter/
Example http://michaeljones.tech/writing-exporters-for-prometheus/ Coinmarketcap: https://blog.billyc.io/2017/12/02/a-prometheus-exporter-for-cryptocurrency-values-using-the-coinmarketcap-api/ https://www.robustperception.io/writing-a-jenkins-exporter-in-python
Prometheus deafult admin dashboard: Status -> Targets : check status
c= GaugeMetricFamily(‘btc_gauge’, ‘btc statics’,labels=[‘menu’], value=1000) File “/usr/lib/python2.7/site-packages/prometheus_client/core.py”, line 212, in init raise ValueError(‘Can only specify at most one of value and labels.’) ValueError: Can only specify at most one of value and labels.
No token found This usually means that the output is not valid Prometheus text format. Look for hyphens in metric or label names, or either of those starting with numbers - those are the most common errors.
Scalability
Thanos - a Scalable Prometheus with Unlimited Storage https://www.infoq.com/news/2018/06/thanos-scalable-prometheus
V1.7 https://docs.influxdata.com/influxdb/v1.7/introduction/getting-started/ https://github.com/influxdata/influxdb-python https://influxdb-python.readthedocs.io/en/latest/api-documentation.html
Change port /etc/influxdb/influxdb.conf Run: influxdb or service start influxdb Cli: influx -precision rfc3339 -host 127.0.0.1 -port 8086
Visualize-table panel https://grafana.com/docs/features/panels/table_panel/
?#data with same timestamp and tags gets overwritten By design https://github.com/influxdata/influxdb/issues/4150
v2 https://v2.docs.influxdata.com/v2.0/get-started/ https://community.influxdata.com/c/getting-started https://community.influxdata.com/c/influxdb2 https://www.influxdata.com/blog/getting-started-with-influxdb-2-0-scraping-metrics-running-telegraf-querying-data-and-writing-data/ flux rpel https://docs.influxdata.com/flux/v0.24/
Key concepts: buckets https://docs.influxdata.com/flux/v0.24/introduction/getting-started
New release https://www.influxdata.com/blog/introducing-the-next-generation-influxdb-2-0-platform/
(Optional) Copy the influx and influxd binary to your $PATH sudo cp influxdb_2.0.0-alpha.8_darwin_amd64/{influx,influxd} /usr/local/bin/
influxd –http-bind-address=127.0.0.1:9999
Influxdb query last row of all series in a measurement https://community.influxdata.com/t/influxdb-query-last-row-of-all-series-in-a-measurement/4915/3https://community.influxdata.com/t/influxdb-query-last-row-of-all-series-in-a-measurement/4915 https://stackoverflow.com/questions/29193898/influxdb-getting-only-last-value-in-query/56141349#56141349
from prometheus_client.core import Metric,GaugeMetricFamily, CounterMetricFamily, REGISTRY
from prometheus_client import Gauge
from prometheus_client import start_http_server
import time
import argparse
import requests
from datetime import datetime,timedelta
from influxdb import InfluxDBClient
class CustomCollector(object):
def __init__(self):
self.client = InfluxDBClient(host='localhost', port=8087, database="TEST")
def collect(self):
r = requests.get("http://localhost/api/GetBankAccount")
#print(r)
data = r.json()
#print(data)
print(len(data["List"]))
metric = Metric("banks","bank list","gauge")
index = 0
current_time = real_time = datetime.utcnow()
points = []
for item in data["List"]:
index+=1
group='NA'
if item["group"] is not None:
group = item["group"]
metric.add_sample("bank_"+str(index), value=item["BALANCE"],labels={"type":"bank",'bankname': item["BANKNAME"],"code":item["CODE"],"group":group})
json_body = {
"measurement": "banklist",
"time": real_time.strftime('%Y-%m-%dT%H:%M:%SZ'),
"tags":{
"code":item["CODE"]
},
"fields": {
"timestamp": real_time.strftime('%Y-%m-%dT%H:%M:%SZ'),
"balance": item["BALANCE"],
"bankname": item["BANKNAME"],
"bankcode":item["CODE"],
"bankgroup":group
}
}
print("Write points: {0}".format(json_body))
points.append(json_body)
current_time = current_time+timedelta(seconds=1)
if len(points)>0:
self.client.write_points(points)
yield metric
if __name__ == '__main__':
# Start up the server to expose the metrics.
parser = argparse.ArgumentParser()
parser.add_argument('-p', '--port', help="The port the metrics is serving from.", required=True)
args = parser.parse_args()
start_http_server(int(args.port))
REGISTRY.register(CustomCollector())
while True: time.sleep(60)
https://thingsmatic.com/2017/03/02/influxdb-and-grafana-for-sensor-time-series/
Why pgwatch2? https://www.cybertec-postgresql.com/en/announcing-pgwatch2-a-simple-but-versatile-postgresql-monitoring-tool/
https://github.com/cybertec-postgresql/pgwatch2
1)Project background For more background on the project motivations and design goals see the original series of blogposts announcing the project: ● Project announcement ● Implementation details ● Feature pack 1 ● Feature pack 2 ● Feature pack 3
2)Source Code explain pgwatch2/Dockerfile
pgwatch2/docker-launcher.sh https://github.com/cybertec-postgresql/pgwatch2/blob/9eb5ac699df9873d2138f94dc35e1ba509dd82a6/docker-launcher.sh
Supervisord http://supervisord.org/
pgwatch2/supervisord.conf
http://blog.51cto.com/youerning/1714627 https://blog.csdn.net/vbaspdelphi/article/details/53324673 Auto start on system start /etc/init.d/supervisor https://serverfault.com/questions/96499/how-to-automatically-start-supervisord-on-linux-ubuntu
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2-1.x86_64.rpm sudo yum localinstall grafana-5.2.2-1.x86_64.rpm http://docs.grafana.org/installation/upgrading/
sudo yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2-1.x86_64.rpm /etc/grafna/grafana.ini
1) run docker pull https://hub.docker.com/r/cybertec/pgwatch2/tags/ sudo docker run -d -p 8080:8080 -p 8086:8086 -p 9001:9001 -p 3001:3000 –name pw2 cybertec/pgwatch2 sudo docker exec -ti pw2 /bin/bash Note: don’t follow instructions given by pgwatch2, failed if using the cmd below, need to check the reason: sudo docker run -d -p 8080:8080 -p 9001:9001 -p 3001:3000 -e PW2_GRAFANA_BASEURL=’http://10.20.70.205:3000’ –name pw2
In order to avoid conflicts with existing grafana, we use host port 3001 mapping to docker grafana 3000, 9001 is for supervisor UI, 8080 is for the web ui, 8086 is for influx db
2) Postgresql server config At pgsql server: pgsql -p 6432 /c oureadb create role pgwatch2 with login password ‘secret’; /i pgwatch2/sql/metric_fetching_helpers/stat_activity_wrapper.sql CREATE EXTENSION pg_stat_statements; CREATE EXTENSION plpythonu; /i pgwatch2/sql/metric_fetching_helpers/stat_statements_wrapper.sql /i pgwatch2/sql/metric_fetching_helpers/cpu_load_plpythonu.sql
Add user to pgbouncer
3) config supervisor http server
root@28daac38a798:/# supervisor
supervisorctl supervisord
root@28daac38a798:/# supervisorctl stop
Error: stop requires a process name
stop <name> Stop a process
stop <gname>:* Stop all processes in a group
stop <name> <name> Stop multiple processes or groups
stop all Stop all processes
root@28daac38a798:/# ps -lef|grep "super"
4 S root 1 0 0 80 0 - 12459 poll_s 08:55 ? 00:00:00 /usr/bin/python /usr/bin/supervisord --configuration=/etc/supervisor/supervisord.conf --nodaemon
root@28daac38a798:/# netstat -anp|grep :9001
tcp 0 0 0.0.0.0:9001 0.0.0.0:* LISTEN 1/python
root@28daac38a798:/#
Modify /etc/supervisor/supervisord.conf add: [inet_http_server] port = 9001 username = user password = 123 But how to restart with configuration changes or reload? http://www.onurguzel.com/supervisord-restarting-and-reloading/
?# pgwatch2 fatal error check /var/log/supervisor/
4) Add postgresql db config at WebUI http://hostip:8080/dbs Name: dbname Host: port: db: Username: pgwatch2, password: secret
Note: by default there is a test database, removing from here is useless, because I find the test database still in influxdb, so from grafana you still can see test in the list, the way to get rid of it is change the Variables config in 4) from default SHOW TAG VALUES WITH KEY = “dbname” To SHOW TAG VALUES WITH KEY = “dbname” where “dbname” !~ /(test)+/
5) grafana config http://hostip:3000/datasources Add influxdb datasource: Influx InfluxDB http://localhost:8086 Database: pgwatch2, username: root, password: root Make it default source, otherwise you have to make changes to all the dashboards to specify the datasource from default to influx
User and password is required here, because by default, influxdb disable authentication, pls refer to below how to enable and config with auth enabled
Add Monitor script: https://github.com/cybertec-postgresql/pgwatch2/tree/master/grafana_dashboards/v5
Change dashboard setting: Variables -> edit Datasource: select influxdb Query: change SHOW TAG VALUES WITH KEY = “dbname” To SHOW TAG VALUES WITH KEY = “dbname” where “dbname” !~ /(test)+/
Available measurements (InfluxDB "tables" with metric info) overview
NB! Metrics that are actually gathered need to be configured for every DB separately - for that open the Web UI config page or modify the "pgwatch2.monitored_host" table directly in the "pgwatch2" database
backends - active, total, waiting sessions
pgbouncer_stats - pgbouncer (1.8+) statistics
bgwriter - pg_stat_bgwriter snapshots
blocking_locks - detailed info on sessions that are waiting
cpu_load - CPU load info acquired via a plpython sproc (/pgwatch2/sql/metric_fetching_helpers/)
db_stats - pg_stat_database snapshots + DB size info
index_stats - pg_stat_user_indexes snapshots
kpi - most important high level metrics
locks - different locktype (page, tuple, ...) counts. NB! for usable data one should set the polling interval very low
locks_mode - different lock-mode (exclusive, share) counts. NB! for usable data one should set the polling interval very low
replication - pg_stat_replication info (including replica lag)
sproc_stats - pg_stat_user_functions snapshots
table_io_stats - pg_statio_user_tables snapshots
stat_statements - pg_stat_statements snapshots (requires the extension)
stat_statements_calls - total query count according to pg_stat_statements
table_bloat_approx_summary - bloat summary for the whole DB (needs pgstattuple extension)
table_stats - pg_stat_user_tables snapshots
wal - pg_current_(xlog_location|wal_lsn) values
For getting started with Grafana in general start here
For learning InfluxDB query language InfluxQL start here
When stuck then additional support and consultations are available from Cybertec here
So many auth in so many places!!! Grafana - add datasource - Auth
This is different for varaires datasource, need to specifiy accoridingly
For example, for prometheus, it has a default web portal, basic auth is to protect it from access by non authenticated user, And also protect from http api call Securing Prometheus with Basic Auth for Grafana https://www.youtube.com/watch?time_continue=431&v=oPlk0GHYmrE
In the influxdb example here, we didn’t user basic auth, because:
Create user admin with password ‘123456’ with all priveleges; /etc/influxdb/influxdb.conf After restart, influx -precision rfc3339 -username admin -password 123456
https://github.com/cybertec-postgresql/pgwatch2/blob/master/Dockerfile https://dzone.com/articles/demystifying-the-data-volume-storage-in-docker https://docs.docker.com/engine/reference/commandline/volume_inspect/
https://docs.influxdata.com/influxdb/v1.6/query_language/schema_exploration/ #retention policy https://docs.influxdata.com/influxdb/v1.2/query_language/database_management/#create-retention-policies-with-create-retention-policy,https://community.influxdata.com/t/what-is-the-retention-policy-and-how-exactly-it-work/1080/3 Querying data in a non-DEFAULT retention policy https://www.influxdata.com/blog/tldr-influxdb-tech-tips-april-27-2017/ Retention policies not dropping old data https://community.influxdata.com/t/retention-policies-not-dropping-old-data/4538
1) PW2_GRAFANA_BASEURL doesn’t work Need to check scripts
sudo docker run -d -p 8080:8080 -p 8086:8086 -p 9001:9001 -p 3001:3000 -e PW2_GRAFANA_BASEURL='http://10.20.70.205:3000' --name pw2 cybertec/pgwatch2
2) instructions not clear check section “To use an existing Grafana installation” in https://github.com/cybertec-postgresql/pgwatch2
Need to explore and refer to “Updating to a newer Docker version” (https://github.com/cybertec-postgresql/pgwatch2)
cd /tmp/
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2-1.x86_64.rpm
sudo yum localinstall grafana-5.2.2-1.x86_64.rpm
git clone https://github.com/cybertec-postgresql/pgwatch2.git
sudo git clone https://github.com/cybertec-postgresql/pgwatch2.git
sudo docker pull cybertec/pgwatch2
sudo docker images
sudo netstat -anp|grep :8080
##sudo docker run -d -p 8080:8080 -p 9001:9001 -p 3001:3000 --name pw2 cybertec/pgwatch2
##sudo docker run -d -p 8080:8080 -p 9001:9001 -p 3001:3000 -e PW2_GRAFANA_BASEURL='http://10.20.70.205:3000' --name pw2 cybertec/pgwatch2
sudo docker run -d -p 8080:8080 -p 8086:8086 -p 9001:9001 -p 3001:3000 --name pw2 cybertec/pgwatch2
sudo docker ps -a
## sudo vi /etc/grafana/grafana.ini
## ;type = postgres
## ;host = 10.20.70.168:6432
## ;name = oureadb
## ;user = pgwatch2
## # If the password contains # or ; you have to wrap it with trippel quotes. Ex """#password;"""
## ;password = secret
sudo docker exec -ti pw2 /bin/bash
apt-get install net-tools
service supervisor status
ps -lef|grep "super"
/usr/bin/supervisord --configuration=/etc/supervisor/supervisord.conf --nodaemon
vi /etc/supervisor/supervisord.conf
[inet_http_server]
port = 9001
username = peter
password = 123456
service supervisor restart
influx -precision rfc3339
influx -precision rfc3339 -username admin -password 123456
show databases
show grants for root;
grant all on pgwatch2 to root;
use pgwatch2
show measurements
select * from backends limit 1
SHOW TAG VALUES WITH KEY = "dbname" where "dbname" !~ /(test)+/
vi /pgwatch2/webpy/pgwatch2_influx.py
modify influx_connect_params
sudo docker start pw2
netstat -anp|grep :9001
netstat -anp|egrep :8080
ps -lef|egrep "35|PID"
ls -l /proc/35/exe
cat /proc/35/cmdline | xargs -0 echo
ps -p 35 -o cmd
vi /etc/grafana/grafana.ini
[server]
protocol = http
cert_file = /pgwatch2/persistent-config/self-signed-ssl.pem
cert_key = /pgwatch2/persistent-config/self-signed-ssl.key
[database]
type = postgres
host = 127.0.0.1:5432
name = pgwatch2_grafana
user = pgwatch2
password = pgwatch2admin
[security]
admin_user = admin
admin_password = pgwatch2admin
[auth.anonymous]
enabled = true
[metrics]
enabled = false
sudo docker top 0c7180296ad6
sudo systemctl status grafana-server
sudo systemctl start grafana-server
journalctl -xe
telnet 10.20.70.168 6432
journalctl -xe
http://10.20.70.205:8080/dbs
oureadb
10.20.70.168 6432 oureadb pgwatch2 secret
?# postgresql connection timeout
Test on monitor server(running docker), try to connect use pgql
yum localinstall https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-7-x86_64/pgdg-centos11-11-2.noarch.rpm
yum list postgres*
yum install postgresql11.x86_64
psql -h
?# prometheus not work => caused by server diskspace used up => caused by pgwatch2 influxdb use up space=>caused by pgwatch metrics preset config(full will scrape too much data) Server date ahead of real datetime Root disk space used up Influxdb used up space vim /etc/influxdb/influxdb.conf modify check-interval du -h /var/lib/influxdb/data
?# Grafana did not display the records in the expected order in table very simple solution : click on max/average whatever column you want to sort, click it once or twice until the result is sorted in your expected order, then save, so next time when you open it, it will display in expected sort order, actually what happens is the grafana json file will change to for example: “legend”: { “alignAsTable”: true, “avg”: true, “current”: true, “max”: true, “min”: false, “rightSide”: true, “show”: true, “sideWidth”: 300, “sort”: “max”, “sortDesc”: true, “total”: true, “values”: true }, ppl asking and discussing it here https://community.grafana.com/t/grafana-did-not-display-the-records-in-the-expected-order-in-table/4564 and https://groups.io/g/grafana/topic/prometheus_data_not_sorted/4314229?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,0,0,4314229
Install node_exporter released for prometheus and config in prometheus https://prometheus.io/docs/guides/node-exporter/#installing-and-running-the-node-exporter Grafana dashboard
Check http://...:9100/metrics
OR
building running from source:
go get github.com/prometheus/node_exporter
cd ${GOPATH-$HOME/go}/src/github.com/prometheus/node_exporter
make
./node_exporter
Add alias, never change original variables,
?# node_exporter(latest Version 0.16.0) not showing on dashbaord Because of naming changed https://www.robustperception.io/new-features-in-node-exporter-0-16-0 Use the latest grafana dashboard https://grafana.com/dashboards/1860 And check from prometheus web portal to confirm the query statement
?# # multiple series error Fixed by itself after a while
Not verified link: http://d-prototype.com/archives/9672
Windows support is removed by node exporter, the wmi_exporter is recommended as a replacement. https://github.com/martinlindhe/wmi_exporter https://grafana.com/dashboards/2129
Endpoint prob - Blackbox-exporter https://github.com/prometheus/blackbox_exporter
https://cloudprober.org/how-to/external-probe/
https://grafana.com/dashboards/5345
Alert http://docs.grafana.org/alerting/notifications/#all-supported-notifier
Btc exporter https://grafana.com/dashboards/6973 https://github.com/hunterlong/btcexporter
docker run -it -d -p 9019:9019 -v /opt/btc/btcexporter/addresses.txt:/app/addresses.txt hunterlong/btcexporter
ps -lef|grep “prome”
ls /proc/<PID>/exe
vi /path/prometheus.yml
kill -HUP <PID>
http://10.20.70.205:9019/metrics
how to Customize exporter?
promethus lib https://prometheus.io/docs/instrumenting/writing_exporters/ python https://github.com/prometheus/client_python#custom-collectors go https://redbyte.eu/en/blog/real-time-metrics-using-prometheus-and-grafana/
Real Time performance monitor for .NET CORE .Net Core 2.0+ InfluxDB+Grafana+App Metrics 实现跨平台的实时性能监控 https://www.cnblogs.com/landonzeng/p/7904402.html
http request/traffic | web api monitor nodejs http://swaggerstats.io/docs.html
java perform monitor https://github.com/stagemonitor/stagemonitor/wiki/Installation
CI-Jenkins https://jenkins.io/doc/tutorials/
Docker mode:
docker pull jenkinsci/blueocean
docker run \
--rm \
-u root \
-p 8080:8080 \
-v jenkins-data:/var/jenkins_home \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$HOME":/home \
jenkinsci/blueocean
docker exec -ti 0dc7a2730c43 bash
Find initial password: docker log 0dc7a2730c43 UI: http://192.168.56.101:8080/ Work path: /var/jenkins_home/workspace
Use Maven:: ?#maven not found https://my.oschina.net/u/2450666/blog/844170 https://stackoverflow.com/questions/45777031/maven-not-found-in-jenkins/56833922#56833922
export MAVEN_HOME=/opt/maven
export PATH=$PATH:$MAVEN_HOME/bin
mvn --version
mvn clean compile
mvn clean install
mvn clean package -P dev
Use Sonarqube:: https://docs.sonarqube.org/latest/setup/get-started-2-minutes/ Java version requirements: https://docs.sonarqube.org/7.8/requirements/requirements/ java.io.IOException: Cannot run program “sonar-scanner” error=2, No such file or directory https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-jenkins/ Jenkins+sonar+sonar-scanner https://www.jianshu.com/p/27e6ed4f6dbc
Start sonar.sh on host machine: cd /home/test/workspace/sonarqube ./sonarqube-7.8/bin/linux-x86-64/sonar.sh console
Install sonarscanner on docker/jenkins sudo docker cp /home/test/workspace/test/sourcecode/ 0dc7a2730c43:/home/workspace/
Pipline https://jenkins.io/doc/tutorials/build-a-java-app-with-maven/
ref
实战 Prometheus 搭建监控系统 https://www.codercto.com/a/35819.html