Category Archives: CoreOS

Creating CoreOS Services with Cross Node Dependency using etcd

When I was putting together an architecture for deploying PasTmon sensors across a CoreOS cluster for a previous blog post, PasTmon Passive Application Response Time Monitoring a CoreOS Cluster, I wanted the Fleet service units coded so that the pastmon-sensors would have pastmon-web as a cross-node dependency.  The plan was for the sensors to start only once the web/database service had started, and this dependency needed to hold across all nodes in the cluster.

At first I thought I could achieve this using unit directives such as Requires/Wants, simply following the examples shown in the CoreOS documentation.

The unit called pastmon-web-discovery@1.service is a sidekick unit that BindsTo the actual pastmon-web service, pastmon-web@%i.service, and registers its hostname and database port in etcd.

Firing up pastmon-web with its sidekick, followed by the sensors across the rest of the nodes in the cluster, all worked fine. However, if the CoreOS cluster failed or was rebooted, the services came back up out of order and required manual intervention.  It was clear that the [Unit] After and Requires directives only applied to the node the unit was started on, and not across the whole cluster.

Actually, this made sense when I thought about it. The [Unit] directives are interpreted by systemd on whichever node the unit lands on, so they can only order units on that node; the cluster-level options that fleet understands live in the [X-Fleet] section.  At the time of writing this blog, there does not appear to be any support in that section for cross-cluster unit dependencies (though I did find a few discussions requesting this feature in the CoreOS forums).

To resolve this I realised I could leverage the existing etcd web service registration as a pre-start condition in the sensor units.  The etcd key has a time-to-live (--ttl) of 60 seconds and is re-registered every 45 seconds, for as long as the pastmon-web service it is bound to is running.
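As a rough sketch of that registration loop (not the actual sidekick unit), the shell below re-publishes a key with a 60 second TTL every 45 seconds. The key name /services/pastmon-web and the "host:port" value format are assumptions made for illustration; the `etcdctl set ... --ttl` form is the etcd v2 command line.

```shell
# Hypothetical etcd key; the real sidekick unit may use a different one.
KEY=/services/pastmon-web

# Publish "host:port" under KEY with a 60 second time-to-live (etcdctl v2 syntax).
announce_once() {
    etcdctl set "$KEY" "$1" --ttl 60
}

# Re-publish every 45 seconds so the key stays alive while this loop runs.
# When the loop stops, the key expires on its own within 60 seconds.
announce_forever() {
    while true; do
        announce_once "$1"
        sleep 45
    done
}

# Example (not run here): announce_forever "$(hostname):5432"
```

Because the refresh interval is shorter than the TTL, the key is only ever absent when the web service (and therefore its sidekick) has been down for more than a minute or so.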

So here is my fixed pastmon-sensor unit using the etcd Pre-Start test:

The etcdctl get command fails with a non-zero return code if the key is not present.  Writing ExecStartPre= without the '-' prefix (= instead of =-) means that this failure is not ignored, so the unit fails to start.

The second highlighted section, above, sets the unit to automatically restart on failure, after a delay of 10 seconds, and to retry forever.
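For readers without the original listing to hand, here is a minimal sketch of what such a unit could look like, wrapped in a shell function that writes it to a file. It is not the unit from the repository: the etcd key /services/pastmon-web, the image name example/pastmonsensor and the docker options are placeholders I have assumed. Only the ExecStartPre gate and the Restart/RestartSec settings follow the description above.

```shell
# Write a sketch of a sensor unit template to the path given as $1.
# Hypothetical: the etcd key and the docker image name/options.
write_sensor_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=PasTmon sensor %i (sketch)
After=docker.service
Requires=docker.service

[Service]
# No "-" prefix: a non-zero exit from etcdctl fails this start attempt.
ExecStartPre=/usr/bin/etcdctl get /services/pastmon-web
ExecStart=/usr/bin/docker run --rm --name pastmon-sensor-%i --net=host example/pastmonsensor:latest
# Try again 10 seconds after each failed start.
Restart=on-failure
RestartSec=10
EOF
}

# Example (not run here): write_sensor_unit ./pastmon-sensor@.service
```

The effect is a polling loop driven by systemd: each start attempt checks the key, and a missing key simply schedules another attempt 10 seconds later.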

I tested these again, crashing and rebooting the cluster, and they restarted in the correct order every time – perfect.

All of the code above is available in gbevan/pastmon on GitHub.

PasTmon Passive Application Response Time Monitoring a CoreOS Cluster

The PasTmon Passive Application Response Time Monitor project (which I run) has just released pre-built docker images of pastmonweb front-end and pastmonsensor builds.  These make deploying a PasTmon response time monitoring solution a whole lot easier.

Here’s how I deployed PasTmon to my development CoreOS cluster.

PasTmon deployed into a CoreOS cluster


The following instructions are available on the pastmonweb information page. Clone the project from GitHub – this contains all of the services unit files – onto the frontend cluster node:

Edit the unit files, pastmon-web@.service and pastmon-sensor@.service, to select the version of the docker image you want (currently “latest” and “0.16”):

You can instead create a local.conf file to override the selected version, but this applies only to the node that the service will run on.  Editing the version, as above, before submitting the unit file sets this version for the whole cluster.

Next edit the pastmon-web@.service file to bind it to the frontend node of the cluster:

You can do this either using the MachineMetadata or MachineID from /etc/machine-id.
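To illustrate the two options, the helper functions below print the kind of [X-Fleet] section each one would need. They are only a sketch: the function names are mine, and the metadata pair passed to the second one (for example role=frontend) is hypothetical and has to match whatever metadata your nodes were given.

```shell
# Print an [X-Fleet] section pinning a unit to one machine.
# $1 is the path to a machine-id file (defaults to /etc/machine-id).
xfleet_by_machine_id() {
    printf '[X-Fleet]\nMachineID=%s\n' "$(cat "${1:-/etc/machine-id}")"
}

# Print an [X-Fleet] section selecting machines by metadata.
# $1 is a key=value pair such as role=frontend (hypothetical).
xfleet_by_metadata() {
    printf '[X-Fleet]\nMachineMetadata=%s\n' "$1"
}

# Example (not run here), on the frontend node:
#   xfleet_by_machine_id >> pastmon-web@.service
```

MachineID ties the unit to one specific machine, whereas MachineMetadata lets fleet pick any machine carrying the matching metadata.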

Submit all of the unit files to fleet:

Start the pastmonweb services:
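The submit and start steps could look something like the sketch below. The unit file names and instance number 1 are taken from this post; wrapping the two commands in a function called deploy_web is just my way of keeping them together, and you should check the names against your clone of the repository.

```shell
# Submit the three unit templates, then start instance 1 of the web service
# together with its discovery sidekick. Run from the directory holding the unit files.
deploy_web() {
    fleetctl submit pastmon-web@.service pastmon-web-discovery@.service pastmon-sensor@.service &&
    fleetctl start pastmon-web@1.service pastmon-web-discovery@1.service
}

# Example (not run here): deploy_web && fleetctl list-units
```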

The pastmon-web-discovery@.service is actually a “sidekick” that registers the pastmonweb service as active in etcd, which provides host and port details to the pastmonsensors running on the other nodes in the cluster.

Once the web service is running (the first time will take a few minutes to download the docker image) you can point your browser at http://your-front-end-floating-ip:8080.  You should see a login screen for the PasTmon web app, like this:

[Screenshot: login]

You can log in with the default credentials – user: “admin”, password: “admin”.

Next we can start the pastmon-sensor services on the remaining nodes in the cluster (the pastmonweb service also contains its own sensor) by running:

The “1..6” here means to start 6 instances numbered 1 through 6.
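To make the numbering concrete, this small sketch lists the instance names that a range of 1 to N produces. The function name sensor_units is mine; in bash, the brace expansion pastmon-sensor@{1..6}.service yields the same six names, which is presumably what the range in the command refers to.

```shell
# Print the sensor unit instance names for instances 1..$1, one per line.
sensor_units() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'pastmon-sensor@%s.service\n' "$i"
        i=$((i + 1))
    done
}

# Example (not run here): fleetctl start $(sensor_units 6)
```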

These should automatically discover the web service and connect to its PostgreSQL database on port 5432.  After a while you should start to see measurement data in the web UI.

Here are a couple of screenshots of what to expect:

[Screenshot: summary]

[Screenshot: rtt_avg]

This one shows the 5-minute average of network round-trip times for the PostgreSQL server running in the pastmonweb container.

The pastmon sensor containers are configured to share the network namespace of the CoreOS node they run on, so each sensor can see the traffic of all of the containers running on that node.

Experimental Script to create a CoreOS Cluster in OpenStack

This is an experimental CoreOS cluster creator script for OpenStack Nova with Cinder: