Prometheus-Slurm-Exporter Integration

Omnivector’s slurm charms can export cluster metrics to prometheus through the prometheus relation endpoint on slurmctld. The walkthrough below shows how this works.

Deploy Slurm on LXD

Get a local slurm deployment up and running:

sudo snap install lxd
sudo snap install juju --classic

git clone https://github.com/omnivector-solutions/slurm-charms -b add_prometheus_exporter && \
    cd slurm-charms
make charms
make deploy-focal-bundle-on-lxd-from-edge-with-snap-prometheus

The deployment should take a few moments to complete. Once everything is ready, juju status should resemble:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  2.8.1    unsupported  00:51:26Z

App              Version  Status  Scale  Charm            Store       Rev  OS      Notes
percona-cluster  5.7.20   active      1  percona-cluster  jujucharms  291  ubuntu
slurmctld        20.02.1  active      1  slurmctld        local         8  ubuntu
slurmd           20.02.1  active      1  slurmd           local         3  ubuntu
slurmdbd         20.02.1  active      1  slurmdbd         local         3  ubuntu

Unit                Workload  Agent  Machine  Public address  Ports               Message
percona-cluster/3*  active    idle   16       10.232.132.14   3306/tcp            Unit is ready
slurmctld/7*        active    idle   25       10.232.132.28                       Slurmctld Available
slurmd/3*           active    idle   18       10.232.132.20                       Slurmd Available
slurmdbd/3*         active    idle   19       10.232.132.27                       Slurmdbd Available

Machine  State    DNS             Inst id         Series  AZ  Message
16       started  10.232.132.14   juju-58dca4-16  bionic      Running
18       started  10.232.132.20   juju-58dca4-18  focal       Running
19       started  10.232.132.27   juju-58dca4-19  focal       Running
25       started  10.232.132.28   juju-58dca4-25  focal       Running
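One way to tell when the deployment has settled is to check that every row of the Unit table reports an active workload and an idle agent. The following is a minimal sketch, not something shipped with the slurm charms: all_units_ready is a hypothetical helper name, and it assumes the tabular juju status layout shown above.

```shell
# all_units_ready (hypothetical helper): reads tabular `juju status` output
# on stdin and succeeds only if every row of the Unit table shows
# workload "active" and agent "idle".
all_units_ready() {
    awk '
        $1 == "Unit"        { in_units = 1; next }  # header row of the Unit table
        in_units && NF == 0 { in_units = 0 }        # a blank line ends the table
        in_units {
            seen++
            if ($2 != "active" || $3 != "idle") bad++
        }
        END {
            # Succeed only if at least one unit was seen and none was unsettled.
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Example: poll every ten seconds until all units are ready.
#   until juju status | all_units_ready; do sleep 10; done
```

Parsing the table keeps the sketch free of extra tools. A stricter check could read juju status --format=json instead.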

After the slurm deployment settles, add the prometheus2 and grafana charms to the model.

juju deploy prometheus2
juju deploy grafana
juju relate prometheus2:grafana-source grafana

Now relate slurmctld to prometheus2.

juju relate slurmctld prometheus2

Following these operations, you should have a juju model that resembles the following:

juju status --relations
Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  2.8.1    unsupported  00:51:26Z

App              Version  Status  Scale  Charm            Store       Rev  OS      Notes
grafana                   active      1  grafana          jujucharms   37  ubuntu
percona-cluster  5.7.20   active      1  percona-cluster  jujucharms  291  ubuntu
prometheus2               active      1  prometheus2      jujucharms   19  ubuntu
slurmctld        20.02.1  active      1  slurmctld        local         8  ubuntu
slurmd           20.02.1  active      1  slurmd           local         3  ubuntu
slurmdbd         20.02.1  active      1  slurmdbd         local         3  ubuntu

Unit                Workload  Agent  Machine  Public address  Ports               Message
grafana/0*          active    idle   26       10.232.132.214  3000/tcp            Started grafana-server
percona-cluster/3*  active    idle   16       10.232.132.14   3306/tcp            Unit is ready
prometheus2/1*      active    idle   21       10.232.132.110  9090/tcp,12321/tcp  Ready
slurmctld/7*        active    idle   25       10.232.132.28                       Slurmctld Available
slurmd/3*           active    idle   18       10.232.132.20                       Slurmd Available
slurmdbd/3*         active    idle   19       10.232.132.27                       Slurmdbd Available

Machine  State    DNS             Inst id         Series  AZ  Message
16       started  10.232.132.14   juju-58dca4-16  bionic      Running
18       started  10.232.132.20   juju-58dca4-18  focal       Running
19       started  10.232.132.27   juju-58dca4-19  focal       Running
21       started  10.232.132.110  juju-58dca4-21  focal       Running
25       started  10.232.132.28   juju-58dca4-25  focal       Running
26       started  10.232.132.214  juju-58dca4-26  focal       Running

Relation provider           Requirer                 Interface        Type     Message
percona-cluster:cluster     percona-cluster:cluster  percona-cluster  peer
percona-cluster:db          slurmdbd:db              mysql            regular
prometheus2:grafana-source  grafana:grafana-source   grafana-source   regular
slurmctld:prometheus        prometheus2:scrape       prometheus       regular
slurmd:slurmd               slurmctld:slurmd         slurmd           regular
slurmdbd:slurmdbd           slurmctld:slurmdbd       slurmdbd         regular

Notice the relation:

slurmctld:prometheus        prometheus2:scrape       prometheus       regular

This is what enables slurmctld to inform prometheus of its scrape endpoint.
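To confirm this from the command line, you can check that the relation row is present and then ask prometheus whether it sees a healthy scrape target. This is a rough sketch under a few assumptions: has_relation and count_up_targets are hypothetical helper names, the relation check assumes the table layout shown above, and the target count is a plain-text match against the response of the Prometheus HTTP API endpoint /api/v1/targets rather than a real JSON parse. The address in the usage comment is the prometheus2 unit from the status output above and will differ in your model.

```shell
# has_relation PROVIDER REQUIRER (hypothetical helper): reads
# `juju status --relations` output on stdin and succeeds if a row with
# that provider and requirer endpoint is present.
has_relation() {
    awk -v p="$1" -v r="$2" '
        $1 == p && $2 == r { found = 1 }
        END { if (found) exit 0; exit 1 }
    '
}

# count_up_targets (hypothetical helper): reads the JSON body of
# /api/v1/targets on stdin and prints how many times "health":"up"
# occurs. gsub returns the number of matches it replaced on each line.
count_up_targets() {
    awk '{ n += gsub(/"health" *: *"up"/, "&") } END { print n + 0 }'
}

# Example usage against a live model:
#   juju status --relations | has_relation slurmctld:prometheus prometheus2:scrape
#   curl -s http://10.232.132.110:9090/api/v1/targets | count_up_targets
```

A count of zero shortly after relating the applications may only mean that prometheus has not completed its first scrape yet.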

Now that the juju environment has been configured, log in to the grafana dashboard and import the prometheus-slurm-exporter dashboard to get started.
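The grafana address and port appear in the Unit table of juju status (10.232.132.214 and 3000/tcp in the output above). As a small sketch, the hypothetical helper unit_url below pulls the public address of an application's first unit out of that table and prints a URL. It assumes the column layout shown above and takes the port as an argument instead of parsing the Ports column.

```shell
# unit_url APP PORT (hypothetical helper): reads tabular `juju status`
# output on stdin and prints http://<public address>:<PORT> for the
# first unit row whose name starts with "APP/".
unit_url() {
    awk -v app="$1" -v port="$2" '
        index($1, app "/") == 1 {
            # Column 5 of a unit row is the public address.
            printf "http://%s:%s\n", $5, port
            exit
        }
    '
}

# Example usage against a live model:
#   juju status | unit_url grafana 3000
```

With the status output shown earlier, this would print http://10.232.132.214:3000. For the admin password, it is worth running juju actions grafana to see what the deployed charm offers. If it lists a get-admin-password action (an assumption about the charm revision in use), juju run-action grafana/0 get-admin-password --wait should return it.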

Import the prometheus-slurm-exporter dashboard

From the grafana UI, click the import dashboard button.

Import the grafana dashboard with ID 4323 and choose the prometheus2 data source.

After the import, you should be redirected to a page of pre-made visualizations populated with data from the prometheus-slurm-exporter running on the slurmctld node.

You can start creating your own visualizations right away: add another panel to the dashboard and select a metric from those scraped from the prometheus-slurm-exporter.
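To see which metrics are available before building a panel, you can ask prometheus for its list of metric names. The sketch below filters the response of the Prometheus HTTP API endpoint /api/v1/label/__name__/values. slurm_metric_names is a hypothetical helper name, and the slurm_ prefix is an assumption about how the exporter names its metrics, so adjust the pattern if your exporter uses a different one. The address in the usage comment is the prometheus2 unit from the status output above.

```shell
# slurm_metric_names (hypothetical helper): reads the JSON body of
# /api/v1/label/__name__/values on stdin and prints every metric name
# beginning with "slurm_" (assumed prefix), one per line, sorted.
slurm_metric_names() {
    # Split the JSON array on commas, then keep only the quoted slurm_ names.
    tr ',' '\n' | sed -n 's/.*"\(slurm_[A-Za-z0-9_:]*\)".*/\1/p' | sort -u
}

# Example usage against a live model:
#   curl -s http://10.232.132.110:9090/api/v1/label/__name__/values | slurm_metric_names
```

Any name it prints can be entered as the query of a new panel that uses the prometheus2 data source.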

We are excited to hear about user experiences, so don’t be afraid to reach out and say hello!

Thank you!