Deploy OmniVector's Slurm Core

Over the past few months we have been working on a rewrite of the slurm-charms and an initial release of the slurm snap. We have assembled a beta release of the new stack. Follow the directions below to get started.

The following demo uses LXD as the underlying cloud on which the slurm charms are deployed.

Install and Bootstrap Juju

Install the juju snap and use it to bootstrap an LXD cloud.

sudo snap install juju --classic

juju bootstrap localhost
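
Before moving on, it can help to confirm that the tools this walkthrough relies on are on your PATH. The following is a minimal sketch and is not part of the slurm-charms repo: the `preflight` function name is hypothetical, and the tool list (juju, lxc, git, make) is an assumption based on the commands used in this post.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named commands are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
preflight() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list, taken from the commands used in this walkthrough.
preflight juju lxc git make || echo "install the missing tools before continuing"
```

If nothing is reported missing and the bootstrap succeeded, `juju controllers` should list the new localhost controller.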

Deploy the Slurm Charms

Clone the slurm-charms repo and run the make command below. It pulls down the slurm charms (and snap) and deploys them in LXD containers on your local machine, using the localhost/lxd cloud we just bootstrapped.

git clone https://github.com/omnivector-solutions/slurm-charms && cd slurm-charms
make deploy-focal-bundle-on-lxd-from-edge-with-snap

What’s under the hood?

When you run the make command above, the provided script pulls the slurm charms from our public S3 bucket, pulls the slurm snap from GitHub, and deploys them to the local cloud using the juju charm bundle contained within the slurm-charms codebase.
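
If you would like to see the individual steps before they run, make can print a target's recipe without executing it. The sketch below wraps that in a small function; `show_recipe` is a hypothetical name, and what it prints depends entirely on the Makefile in your checkout.

```shell
#!/bin/sh
# Hypothetical helper: print the commands a make target would run,
# without executing them (-n is make's dry-run flag).
show_recipe() {
    make -n "$@"
}

# From inside the slurm-charms checkout:
# show_recipe deploy-focal-bundle-on-lxd-from-edge-with-snap
```

Note that a dry run only shows the recipe lines; anything a called script does internally will not be expanded.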

Run juju status to find out what is going on

juju status

The above command provides a high-level view of what is happening in our juju model.

To watch your slurm deployment from juju's perspective, you can use the following command.

watch -n 0.5 -c juju status --color

This should provide “real time” information about the applications as they come up.

For example, a few moments after running the make command, my juju status resembles the following:

Model       Controller           Cloud/Region         Version  SLA          Timestamp
slurm-core  localhost-localhost  localhost/localhost  2.8.1    unsupported  18:52:32Z

App              Version  Status   Scale  Charm            Store       Rev  OS      Notes
percona-cluster  5.7.20   waiting      1  percona-cluster  jujucharms  290  ubuntu
slurmctld                 waiting    0/1  slurmctld        local         0  ubuntu
slurmd                    waiting    0/1  slurmd           local         0  ubuntu
slurmdbd                  waiting    0/1  slurmdbd         local         0  ubuntu

Unit                Workload  Agent       Machine  Public address  Ports  Message
percona-cluster/0*  waiting   executing   0        10.232.132.249         (config-changed) Unit waiting to bootstrap
slurmctld/0         waiting   allocating  1        10.232.132.74          installing agent
slurmd/0            waiting   allocating  2        10.232.132.50          waiting for machine
slurmdbd/0          waiting   allocating  3        10.232.132.8           installing agent

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.232.132.249  juju-5b78e4-0  bionic      Running
1        started  10.232.132.74   juju-5b78e4-1  focal       Running
2        pending  10.232.132.50   juju-5b78e4-2  focal       Running
3        started  10.232.132.8    juju-5b78e4-3  focal       Running

After a few more moments, as the services come up and configure themselves, juju status will show each application with an active status and each unit with an active workload and an idle agent.

Model       Controller           Cloud/Region         Version  SLA          Timestamp
slurm-core  localhost-localhost  localhost/localhost  2.8.1    unsupported  19:00:03Z

App              Version  Status  Scale  Charm            Store       Rev  OS      Notes
percona-cluster  5.7.20   active      1  percona-cluster  jujucharms  290  ubuntu
slurmctld        20.02.1  active      1  slurmctld        local         0  ubuntu
slurmd           20.02.1  active      1  slurmd           local         0  ubuntu
slurmdbd         20.02.1  active      1  slurmdbd         local         0  ubuntu

Unit                Workload  Agent  Machine  Public address  Ports     Message
percona-cluster/0*  active    idle   0        10.232.132.249  3306/tcp  Unit is ready
slurmctld/0*        active    idle   1        10.232.132.74             Slurmctld Available
slurmd/0*           active    idle   2        10.232.132.50             Slurmd Available
slurmdbd/0*         active    idle   3        10.232.132.8              Slurmdbd Available

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.232.132.249  juju-5b78e4-0  bionic      Running
1        started  10.232.132.74   juju-5b78e4-1  focal       Running
2        started  10.232.132.50   juju-5b78e4-2  focal       Running
3        started  10.232.132.8    juju-5b78e4-3  focal       Running
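
Rather than watching the status by eye, the "active workload, idle agent" condition can be checked in a script. The following is a minimal sketch that parses the tabular output shown above; `units_not_ready` and `wait_for_ready` are hypothetical names, and the parsing assumes the Unit section keeps the column order seen in this post (unit, workload, agent).

```shell
#!/bin/sh
# Hypothetical helper: read tabular `juju status` output on stdin and print
# every unit whose workload is not "active" or whose agent is not "idle".
# Exits 0 only when a Unit section was seen and every unit in it is ready.
units_not_ready() {
    awk '
        $1 == "Unit"        { in_units = 1; next }  # header row of the Unit section
        in_units && NF == 0 { in_units = 0 }        # a blank line ends the section
        in_units {
            seen++
            if ($2 != "active" || $3 != "idle") {
                name = $1
                sub(/\*$/, "", name)                # drop the leader marker
                print name
                bad++
            }
        }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    '
}

# Poll every 5 seconds until all units are ready.
# Assumes juju is installed and the slurm-core model is the current model.
wait_for_ready() {
    until juju status | units_not_ready; do
        sleep 5
    done
}
```

Calling `wait_for_ready` after the make command blocks until the deployment settles. This sketch has no timeout, so it will loop indefinitely if a unit ends up in an error state or if juju status itself fails.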