Setting up Node Monitoring

Hi, my dear reader! In this tutorial we will set up the infrastructure to monitor an instance of AvalancheGo, using Prometheus, Grafana and node_exporter.
Start

First we need to add a system user account and create directories (you will need superuser credentials):

sudo useradd -M -r -s /bin/false prometheus
sudo mkdir /etc/prometheus /var/lib/prometheus
Next, get the link to the latest version of Prometheus from the downloads page (make sure you select the appropriate processor architecture), then use wget to download it and tar to unpack the archive:

mkdir -p /tmp/prometheus && cd /tmp/prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.21.0/prometheus-2.21.0.linux-amd64.tar.gz
tar xvf prometheus-2.21.0.linux-amd64.tar.gz
cd prometheus-2.21.0.linux-amd64
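The URL above is hard-coded for amd64. As a rough sketch, the download URL could instead be derived from the machine architecture. The function name prometheus_url is made up for illustration, the mapping covers only a few common `uname -m` values, and it assumes release archives keep the naming pattern seen in the URL above.

```shell
# Hypothetical helper: build the Prometheus download URL for a version and
# a machine architecture as reported by `uname -m`.
prometheus_url() {
  version="$1"
  machine="${2:-$(uname -m)}"
  case "$machine" in
    x86_64)  arch="amd64" ;;
    aarch64) arch="arm64" ;;
    armv7l)  arch="armv7" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/prometheus/prometheus/releases/download/v${version}/prometheus-${version}.linux-${arch}.tar.gz"
}

# Usage (same version as in this tutorial):
#   wget "$(prometheus_url 2.21.0)"
```

Passing the architecture as a second argument overrides the detected one, which is handy when downloading on one machine for another.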
Next, we need to copy the binaries and config files to the appropriate locations, and then set ownership (the files are copied first so that the ownership change also covers them):

sudo cp {prometheus,promtool} /usr/local/bin/
sudo cp -r {consoles,console_libraries} /etc/prometheus/
sudo cp prometheus.yml /etc/prometheus/
sudo chown prometheus:prometheus /usr/local/bin/{prometheus,promtool}
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
/etc/prometheus is used for configuration, and /var/lib/prometheus for data.
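If you want to double-check the ownership before moving on, a small sketch like the following could help. The helper name owned_by is hypothetical, and it relies on GNU stat (`stat -c %U`), which is what Ubuntu ships.

```shell
# Hypothetical helper: succeed only if every given path is owned by the user.
# Relies on GNU stat; runs in a subshell so its variables do not leak.
owned_by() (
  user="$1"; shift
  for path in "$@"; do
    owner="$(stat -c %U "$path")" || exit 1
    if [ "$owner" != "$user" ]; then
      echo "$path is owned by $owner, expected $user" >&2
      exit 1
    fi
  done
)

# Usage:
#   owned_by prometheus /etc/prometheus /var/lib/prometheus
```

The check only looks at the paths you pass in, not at the files inside them.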

Let’s set up Prometheus to run as a system service. Do sudo nano /etc/systemd/system/prometheus.service (or open that file in the text editor of your choice) and enter the following configuration:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --web.listen-address=0.0.0.0:9090 --web.external-url=

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
Save the file. Now we can run Prometheus as a system service:

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
Prometheus should now be running. To make sure, we can check with:

systemctl status prometheus
which should produce something like:

● prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-04-01 19:23:53 CEST; 5 months 12 days ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 1767 (prometheus)
Tasks: 12 (limit: 9255)
CGroup: /system.slice/prometheus.service
└─1767 /usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --web.console.templa>

Sep 13 13:00:04 ubuntu prometheus[1767]: level=info ts=2020-09-13T11:00:04.744Z caller=head.go:792 component=tsdb msg="Head GC completed" duration=13.6>
Sep 13 13:00:05 ubuntu prometheus[1767]: level=info ts=2020-09-13T11:00:05.263Z caller=head.go:869 component=tsdb msg="WAL checkpoint complete" first=9>
Sep 13 15:00:04 ubuntu prometheus[1767]: level=info ts=2020-09-13T13:00:04.776Z caller=compact.go:495 component=tsdb msg="write block" mint=15999912000>
...
You can also check the Prometheus web interface, available at http://your-node-host-ip:9090/ (you may need to do sudo ufw allow 9090/tcp if the firewall is on).
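The same check can be done from the command line. Prometheus serves a health endpoint at /-/healthy, so a minimal sketch, assuming curl is installed, might look like this. The function name prometheus_healthy is hypothetical.

```shell
# Hypothetical helper: return 0 if Prometheus answers on its health endpoint.
# Assumes curl is installed; the base URL defaults to the local instance.
prometheus_healthy() {
  base_url="${1:-http://localhost:9090}"
  curl --fail --silent --show-error --max-time 5 "${base_url}/-/healthy" >/dev/null
}

# Usage:
#   prometheus_healthy && echo "Prometheus is up"
```

Because of --fail, the function returns non-zero both when the port is closed and when the server answers with an HTTP error.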

Install Grafana
To set up the Grafana project repositories on Ubuntu:

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
To install Grafana:

sudo apt-get update
sudo apt-get install grafana
To configure it as a service:

sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server.service
To make sure it’s running properly:

sudo systemctl status grafana-server
which should show grafana as active. Grafana should now be available at http://your-node-host-ip:3000/ (again, open the port if needed). Log in with username/password admin/admin and set up a new, secure password. Now we need to connect Grafana to our data source, Prometheus.

On Grafana’s web interface:

Go to Configuration on the left-side menu and select Data Sources.
Click Add Data Source
Select Prometheus.
In the form, enter the name (Prometheus will do), and http://localhost:9090 as the URL.
Click Save & Test
Check for “Data source is working” green message.
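As an alternative to clicking through the form, Grafana can read data sources from provisioning files at startup. The sketch below writes such a file with the same values as the steps above. The helper name write_datasource and the file name are hypothetical, and the target directory assumes the Debian/Ubuntu package layout.

```shell
# Hypothetical helper: write a Grafana data source provisioning file that
# mirrors the form values used above (name "Prometheus", URL localhost:9090).
write_datasource() {
  target="$1"
  cat > "$target" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Usage (directory assumes the Debian/Ubuntu package layout):
#   write_datasource /tmp/prometheus.yaml
#   sudo cp /tmp/prometheus.yaml /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```

This is handy if you rebuild the machine often. For a one-off setup the web form is just as good.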
Set up node_exporter
In addition to metrics from AvalancheGo, let’s set up monitoring of the machine itself, so we can check CPU, memory, network and disk usage and be aware of any anomalies. For that, we will use node_exporter, a Prometheus exporter for machine metrics.

Get the latest version with:

curl -s https://api.github.com/repos/prometheus/node_exporter/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -
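Change linux-amd64 if you have a different architecture (Raspberry Pi is linux-arm64, for example). Untar and move the executable. The file names below assume version 1.0.1, so adjust them to match the version you actually downloaded: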

tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
sudo mv node_exporter-1.0.1.linux-amd64/node_exporter /usr/local/bin
node_exporter --version
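Since the download command fetches whatever the latest release is, the hard-coded 1.0.1 in the commands above may not match your file. A possible sketch that avoids typing the version: the helper archive_dir is hypothetical, and it assumes the archive unpacks into a directory named after the file, as it does in the commands above.

```shell
# Hypothetical helper: print the directory name a node_exporter archive
# unpacks to, by stripping any leading path and the .tar.gz suffix.
archive_dir() {
  basename "$1" .tar.gz
}

# Usage: pick up whichever version was downloaded into the current directory.
#   archive="$(ls node_exporter-*.linux-*.tar.gz | head -n 1)"
#   tar xvf "$archive"
#   sudo mv "$(archive_dir "$archive")/node_exporter" /usr/local/bin
```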
Then we add node_exporter as a service. Do sudo nano /etc/systemd/system/node_exporter.service (or open that file in the text editor of your choice) and populate it with:

[Unit]
Description=Node exporter
Documentation=https://github.com/prometheus/node_exporter
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/node_exporter \
    --collector.cpu \
    --collector.diskstats \
    --collector.filesystem \
    --collector.loadavg \
    --collector.meminfo \
    --collector.filefd \
    --collector.netdev \
    --collector.stat \
    --collector.netstat \
    --collector.systemd \
    --collector.uname \
    --collector.vmstat \
    --collector.time \
    --collector.mdadm \
    --collector.zfs \
    --collector.tcpstat \
    --collector.bonding \
    --collector.hwmon \
    --collector.arp \
    --web.listen-address=:9100 \
    --web.telemetry-path="/metrics"

[Install]
WantedBy=multi-user.target
This configures node_exporter to collect various data we might find interesting. Start the service, and enable it on boot:

sudo systemctl start node_exporter
sudo systemctl enable node_exporter
sudo systemctl status node_exporter
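To see that node_exporter is really serving data, you can fetch its metrics page and pull out a single value. The helper below is a hypothetical sketch that reads the Prometheus text format on stdin. It only matches samples without labels, such as node_load1 (the 1-minute load average from the loadavg collector).

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose name matches exactly (no labels).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

Comment lines (# HELP, # TYPE) are skipped automatically, because their first field is "#" rather than the metric name.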
Now we’re ready to tie it all together.

Configure AvalancheGo and node_exporter Prometheus jobs
Make sure that your AvalancheGo node is running with appropriate command line arguments. The metrics API must be enabled (by default, it is). If you use CLI argument --http-host to make API calls from outside of the host machine, make note of the address at which APIs listen.
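Before pointing Prometheus at the node, it can help to confirm that the metrics endpoint actually returns data. A minimal sketch, assuming the node listens on localhost:9650 and serves metrics at /ext/metrics as configured further down. The helper name count_samples is hypothetical.

```shell
# Hypothetical helper: count sample lines in Prometheus text-format output
# read from stdin, ignoring comments (# HELP, # TYPE) and blank lines.
count_samples() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Usage (adjust the host if your node listens elsewhere):
#   curl -s http://localhost:9650/ext/metrics | count_samples
```

A result of 0 suggests the metrics API is disabled or the address is wrong.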

We now need to define the appropriate Prometheus jobs. Let’s edit the Prometheus configuration:

Do sudo nano /etc/prometheus/prometheus.yml (or open that file in the text editor of your choice) and append to the end:

  - job_name: 'avalanchego'
    metrics_path: '/ext/metrics'
    static_configs:
      - targets: ['<your-host-ip>:9650']

  - job_name: 'avalanchego-machine'
    static_configs:
      - targets: ['<your-host-ip>:9100']
        labels:
          alias: 'machine'

Indentation is important. Make sure - job_name is aligned with the existing - job_name entry, and that the other lines are also indented properly. Make sure you use the correct host IP, or localhost, depending on how your node is configured.
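If you would rather not type the jobs by hand, the following sketch prints them for a given host with the indentation already in place. The helper name render_jobs is hypothetical. It assumes, as the instructions above do, that scrape_configs is the last section of your prometheus.yml, so appending to the end of the file is safe.

```shell
# Hypothetical helper: print the two scrape jobs for a given host,
# indented to sit under an existing `scrape_configs:` key.
render_jobs() {
  host="${1:-localhost}"
  cat <<EOF
  - job_name: 'avalanchego'
    metrics_path: '/ext/metrics'
    static_configs:
      - targets: ['${host}:9650']

  - job_name: 'avalanchego-machine'
    static_configs:
      - targets: ['${host}:9100']
        labels:
          alias: 'machine'
EOF
}

# Usage: append the jobs, then let promtool check the file before restarting.
#   render_jobs localhost | sudo tee -a /etc/prometheus/prometheus.yml
#   promtool check config /etc/prometheus/prometheus.yml
```

promtool was installed alongside Prometheus earlier, and its check config command reports syntax and indentation errors before they can stop the service from starting.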

Save the config file and restart Prometheus:

sudo systemctl restart prometheus

Check the Prometheus web interface at http://your-node-host-ip:9090/targets. You should see three targets enabled:

Prometheus
avalanchego
avalanchego-machine
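The same information is available from the Prometheus HTTP API at /api/v1/targets, where each active target carries a "health" field. As a rough sketch, the healthy targets can be counted like this. The helper name count_up_targets is hypothetical, and the grep pattern assumes compact JSON without spaces around the colon, so a proper JSON tool such as jq would be more robust.

```shell
# Hypothetical helper: count targets reported as healthy in the JSON returned
# by /api/v1/targets (read from stdin). Assumes compact JSON output.
count_up_targets() {
  { grep -o '"health":"up"' || true; } | wc -l | tr -d ' '
}

# Usage:
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```

With the three jobs configured as described, the count should reach 3 once all targets have been scraped successfully.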
Open Grafana; you can now create a dashboard using any of those sources. You can also use the preconfigured dashboards available at https://github.com/ava-labs/node-monitoring/tree/master/dashboards

To import the preconfigured dashboard:

Open Grafana’s web interface
Click + on the left toolbar
Select Import JSON and then upload the JSON file
That’s it! You may now marvel at all the things your node does. Woohoo!

Caveat: Security
The system as described here should not be opened to the public internet. Neither Prometheus nor Grafana, as set up here, is hardened against unauthorized access. Make sure that both of them are accessible only over a secured proxy, local network, or VPN. Setting that up is beyond the scope of this tutorial, but exercise caution. Bad security practices could lead to attackers gaining control over your node! It is your responsibility to follow proper security practices.
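One low-effort option, if you already reach the node over SSH, is to leave ports 3000 and 9090 closed in the firewall and forward them through an SSH tunnel instead. This is only a sketch of that idea, not a full hardening guide. The helper name tunnel_command is hypothetical, and it merely prints the command so you can inspect it first.

```shell
# Hypothetical helper: print an ssh command that forwards the Grafana (3000)
# and Prometheus (9090) ports from the node to the same ports locally.
tunnel_command() {
  echo "ssh -N -L 3000:localhost:3000 -L 9090:localhost:9090 $1"
}

# Usage (run on your own computer, not on the node):
#   tunnel_command user@your-node-host-ip
#   # then run the printed command and browse to http://localhost:3000/
```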

Contributions
The basis for the Grafana dashboard was taken from the good guys at ColmenaLabs, whose original is apparently not available any more. If you have ideas and suggestions on how to improve this tutorial, please say so, post an issue, or make a pull request.
Good luck!
