The Observer Daemon

If you're still with me, it's because you have the other components up and running and you want to be notified automatically of the goings-on in the Arborist node tree. This is where the Observer daemon comes into the mix. Its job is to bundle up your selections and frequency timers and subscribe to events as they happen at the Manager. When a matching event takes place, the Manager publishes it to all interested observing parties.

It has a few specific roles:

Let's set up a playground file for examples again; this should all be familiar by now! Assuming you're still in /usr/local/arborist...

mkdir observers
touch observers/example.rb

Then start it up with arborist start <component> <source>.

arborist -c config.yml -l info start observers observers
[2016-10-06 17:05:15.563959 12721/main]  info {} -- Loading config from # with defaults for sections: [:logging, :arborist].
[2016-10-06 17:05:15.621350 12721/main]  info {Arborist::Observer} -- Loading observer file observers/example.rb...
[2016-10-06 17:05:15.621572 12721/main]  info {Arborist::Client:0x2fc2d88} -- Connecting to the event socket "ipc:///tmp/arborist_events.sock"
[2016-10-06 17:05:15.621622 12721/main]  info {} -- Using ZeroMQ 4.0.3/CZMQ 2.0.1

As with the other daemons, restart the Observer after each change to see it take effect. We'll start by demonstrating a simple case.

Observers 101

The Arborist Manager has events bubbling around all over the place. When nodes have their attributes updated and their states change, they broadcast to their children, propagate to their parents, and publish to their subscribers. Because Arborist stores state as a tree, if you want to listen to all events, you can simply subscribe to the root node and catch everything that propagates upward.
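To make the "subscribe to the root and catch everything" idea concrete, here is a toy model of upward propagation. It is a sketch under simplifying assumptions, not Arborist's internals: ToyNode and its propagate method are hypothetical, and each node simply records an event and hands it to its parent.

```ruby
# Toy model of upward event propagation (hypothetical; not Arborist's classes).
# Every node records the event, then forwards it to its parent, so the
# root ends up seeing events from all of its descendants.
class ToyNode
  attr_reader :identifier, :parent, :received

  def initialize( identifier, parent = nil )
    @identifier = identifier
    @parent     = parent
    @received   = []
  end

  # Record the event locally, then pass it up the tree (if there is a parent).
  def propagate( event )
    @received << event
    @parent&.propagate( event )
  end
end

root    = ToyNode.new( 'root' )
host    = ToyNode.new( 'webserver', root )
service = ToyNode.new( 'webserver-http', host )

service.propagate( { 'type' => 'node.down', 'identifier' => 'webserver-http' } )
host.propagate( { 'type' => 'node.up', 'identifier' => 'webserver' } )

root.received.map { |e| e['identifier'] }  # => ["webserver-http", "webserver"]
```

A subscriber attached to the host would only see the host's own events and its service's events, while one attached to the root sees both branches.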

As it so happens, subscribing to the root node is the default behavior. It's as if someone thought about the normal, optimal case. Weird, huh? Here's how to subscribe to any node that transitions to a down state.

Arborist::Observer "Notify on downed nodes" do
    subscribe to: 'node.down'

    action do |node, event|
        # do something amazing
    end
end

And that's it. The node.down event is only emitted when a node's state changes, not every time it is checked. The action block is executed only once per state transition to down.
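The "once per transition, not once per check" rule can be sketched like this. TransitionTracker is hypothetical and assumes a node starts in the unknown state; it only emits an event name when a check result differs from the previous status.

```ruby
# Sketch of "emit once per state transition" (hypothetical helper class).
# Repeated checks with the same result produce no new events.
class TransitionTracker
  attr_reader :events

  def initialize
    @status = 'unknown'   # assumed initial state for this sketch
    @events = []
  end

  # Record one check result; emit "node.<status>" only when the status changes.
  def update( new_status )
    return if new_status == @status
    @status = new_status
    @events << "node.#{new_status}"
  end
end

tracker = TransitionTracker.new
%w[up up down down down up].each { |status| tracker.update( status ) }
tracker.events  # => ["node.up", "node.down", "node.up"]
```

Three consecutive down checks still yield a single node.down, which is why the action above fires once per outage rather than once per check.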

I should pause for a moment to discuss some Arborist internals. This won't be a departure from Observers, I promise. In fact, a little knowledge about how events move around will probably help you figure out how to observe exactly what you're after. Gather round everyone, let's go over...

Node Events

Yes, node events! What fun!

When a node's state changes, various events are generated within the Manager and ferried about. Here's a list of those events -- note that the primary events match the possible states of any one particular node. This is not a coincidence. When a node transitions from state X to state N, a single N event is propagated upwards through the node tree, ripe for capture by an eager Observer.

Those events are issued once and only once per state transition.

In addition to the state transition events, there are two informational events, node.update and node.delta, that are fired on any and every change to a node, not just on state changes.

Using these informational events, you can trigger actions on all events (not just transitions), or extremely specific changes to any arbitrary attribute -- this is great for relaying metrics data to external time-series databases, if you're building an Arborist user interface, or just otherwise want to drink from the FIRE HOSE.
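Here is a rough sketch of the difference between the two informational shapes: the full attribute set versus only what changed. compute_delta is a hypothetical helper; the [old, new] pair ordering is assumed from the `where: { delta: { status: [old, new] } }` criteria used later in this section, and the sample attributes are made up.

```ruby
# Sketch: reduce an update to only the changed attributes, each as an
# [old, new] pair (hypothetical helper; pair order assumed from the
# delta criteria used in the subscriptions later on).
def compute_delta( old_attrs, new_attrs )
  ( old_attrs.keys | new_attrs.keys ).each_with_object( {} ) do |key, delta|
    old_val, new_val = old_attrs[key], new_attrs[key]
    delta[key] = [ old_val, new_val ] unless old_val == new_val
  end
end

before = { status: 'down', error: 'timeout', addresses: ['10.3.0.10'] }
after  = { status: 'up',   error: nil,       addresses: ['10.3.0.10'] }

delta = compute_delta( before, after )
# status and error changed, so both appear as [old, new] pairs;
# addresses did not change, so it is left out.
```

An update-style consumer would look at `after` in full, while a delta-style consumer only cares about `delta`, which is what makes it handy for watching one specific transition.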

Status Transitions

Status transitions are internally managed via a state machine. Here are the possible status transitions -- useful to have if you want to observe a specific transition from one state to another, using the node.delta event.

Observer Pragmas

Okay, sidebar over. Armed with that info, we can continue. An Observer has three basic pragmas: subscribe catches the event, while action and summarize actually do something with it.

subscribe

Subscriptions require the event that you're interested in seeing. If you want to match on more than one event, add multiple subscribe lines. They'll all be tested as events flow in, and any one of them that matches is sufficient to trigger an action. A subscription accepts the following arguments:

# Subscribe to events where nodes transition into a down state
subscribe to: 'node.down'

# Subscribe to every event where a node is detected as down
subscribe to: 'node.update', where: { status: 'down' }

# Subscribe to events that are generated by humans
subscribe to: 'node.acked'    # acknowledging a downed node
subscribe to: 'node.disabled' # pre-emptive acknowledgement (maintenance windows)
subscribe to: 'node.delta', where: { delta: {status: [ 'disabled', 'unknown' ]}}  # re-enabling a disabled node
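The "any one matching subscription is enough" rule, plus nested `where:` criteria, can be approximated as follows. This is a sketch that mimics the behavior described above, not Arborist's matching code: the Subscription struct is hypothetical, and events are plain hashes with symbol keys for brevity.

```ruby
# Sketch of subscription matching (hypothetical; not Arborist's implementation).
# An event matches a subscription when the type is equal and every
# criterion in +where+ appears in the event data, comparing nested hashes
# recursively.
Subscription = Struct.new( :type, :where ) do
  def match?( event )
    event[:type] == type && subset?( where || {}, event[:data] || {} )
  end

  private

  # True when every key/value pair in +criteria+ is also present in +data+.
  def subset?( criteria, data )
    criteria.all? do |key, value|
      if value.is_a?( Hash ) && data[key].is_a?( Hash )
        subset?( value, data[key] )
      else
        data[key] == value
      end
    end
  end
end

subscriptions = [
  Subscription.new( 'node.down' ),
  Subscription.new( 'node.delta', { delta: { status: %w[disabled unknown] } } )
]

reenabled = { type: 'node.delta', data: { delta: { status: %w[disabled unknown] } } }
updated   = { type: 'node.update', data: { status: 'down' } }

subscriptions.any? { |sub| sub.match?( reenabled ) }  # => true
subscriptions.any? { |sub| sub.match?( updated ) }    # => false
```

The node.update event is ignored here because no subscription names that type, even though its data says the node is down.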

action

An action block is executed on an incoming event. You may have any number of action blocks within an Observer, each with their own settings.

When executed, the block is sent the node that generated the event, along with a hash of events, keyed on the time each event was generated.

# Perform the action as soon as an event comes in
action { ... }

# Perform the action if 3 events happen within a minute
action( after: 3, within: 60 ) { ... }

# Only trigger the action during core work hours
action( during: 'wd {Mon-Fri} hr {9am-4pm}' ) { ... }
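One way to picture the `after: 3, within: 60` option is a sliding window over a hash of events keyed by time. ThresholdAction is hypothetical, takes explicit numeric timestamps so it can run without a real clock, and resets its count after firing; that reset is a design choice of this sketch, not necessarily what Arborist does.

```ruby
# Sketch of an "after N events within T seconds" action (hypothetical class).
# Events are kept in a hash keyed by timestamp; entries older than the
# window are dropped, and the block fires once the threshold is reached.
class ThresholdAction
  def initialize( after:, within:, &block )
    @after  = after
    @within = within
    @block  = block
    @events = {}
  end

  def handle( event, time )
    @events[time] = event
    @events.delete_if { |t, _| t <= time - @within }   # expire old events
    return unless @events.size >= @after
    @block.call( @events.dup )
    @events.clear   # start counting afresh after firing (sketch's choice)
  end
end

fired  = []
action = ThresholdAction.new( after: 3, within: 60 ) { |events| fired << events.keys }

action.handle( 'node.down', 0 )
action.handle( 'node.down', 30 )
action.handle( 'node.down', 100 )   # the events at 0 and 30 have aged out
action.handle( 'node.down', 120 )
action.handle( 'node.down', 150 )   # third event within 60 seconds: fires
```

After these calls, `fired` holds a single entry with the timestamps 100, 120 and 150; the first two events never count toward the threshold because they fell out of the window.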

If what goes inside of an action block has seemed nebulous so far... well, you're right. That's because you can do anything that is possible within the bounds of Ruby, which practically means anything at all. Send email, SMS, Pushover notifications, InfluxDB metrics, Splunk events, or ASCII log files, relay to AMQP brokers, break out shared behavior to modules... really, the sky is the limit. Hack something together and use your imagination.

summarize

A summarize block is executed not as an event comes in, but rather when its options are satisfied. You may have any number of summarize blocks within an Observer, each with their own settings.

Either the count or the every option is required. If you use both, whichever option is satisfied first wins.

When executed, the block is sent a hash of events, keyed on the time each event was generated.

# Trigger the block after 20 events
summarize( count: 20 ) { ... }

# Trigger the block once an hour, but only after work hours
summarize( every: 3600, during: 'hr {6pm-7am}' ) { ... }
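The "count or every, whichever comes first" rule can be sketched as a buffer that flushes on either condition. Summarizer, its explicit `now` arguments, and the tick method are hypothetical stand-ins for a real timer; skipping the block when the buffer is empty is also an assumption of this sketch.

```ruby
# Sketch of summarize( count:, every: ) semantics (hypothetical class).
# Events are buffered in a hash keyed by time; the block runs when the
# buffer reaches +count+ or when +every+ seconds have passed since the
# last run, whichever happens first.
class Summarizer
  def initialize( count: nil, every: nil, start: 0, &block )
    raise ArgumentError, "count or every is required" unless count || every
    @count, @every, @block = count, every, block
    @last_run = start
    @events   = {}
  end

  def add( event, now )
    @events[now] = event
    flush( now ) if @count && @events.size >= @count
  end

  # Would be driven periodically by a timer to honor the +every+ interval.
  def tick( now )
    flush( now ) if @every && now - @last_run >= @every
  end

  private

  # Hand the buffered events (sorted by time) to the block, then reset.
  def flush( now )
    @block.call( @events.sort.to_h ) unless @events.empty?
    @events   = {}
    @last_run = now
  end
end

runs    = []
summary = Summarizer.new( count: 3, every: 3600 ) { |events| runs << events.keys }

summary.add( 'node.down', 10 )
summary.add( 'node.down', 20 )
summary.add( 'node.down', 30 )     # count reached first
summary.add( 'node.down', 4000 )
summary.tick( 4000 )               # an hour has passed since the last run
```

The first summary is triggered by the count and the second by the interval, so `runs` ends up as `[[10, 20, 30], [4000]]`.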

Putting it All Together

Whoof, finally! This example is clearly large, and has plenty of opportunities to reduce code duplication, use variables, and so on. I'm leaving it expanded so the core ideas of Observers aren't abstracted away. Here we go!

require 'mail'

using Arborist::TimeRefinements

NOT_WORK_HOURS = 'wd {Mon-Fri} hr {5pm-8am}, wd { Sat-Sun }'

Arborist::Observer "Notify on down" do
    subscribe to: 'node.down'

    action do |event|
        data = event[ 'data' ]
        mailer = Mail::Message.new
        mailer.delivery_method :sendmail
        mailer.from = Mail::Address.new( 'alerting@example.com' )
        mailer.to = Mail::Address.new( 'recipients@example.com' )
        mailer.subject = "Node down: %s" % [ event['identifier'] ]
        mailer.body = "%s (%s) is down: %s" % [
            event['identifier'],
            data['addresses'].first,
            data['error']
        ]
        mailer.deliver
    end

    action( during: NOT_WORK_HOURS ) do |event|
        send_sms_to_night_crew( event )  # placeholder for your own SMS helper (not defined here)
    end

    summarize( every: 1.hour ) do |events|
        mailer = Mail::Message.new
        mailer.delivery_method :sendmail
        mailer.from = Mail::Address.new( 'alerting@example.com' )
        mailer.to = Mail::Address.new( 'recipients@example.com' )
        mailer.subject = "Down events for the past hour"
        body = ""
        events.sort_by{|k, v| k }.each do |time, event|
            body <<  " - [%s] %s\n" % [
                time, event['identifier']
            ]
        end
        mailer.body = body
        mailer.deliver
    end
end


Arborist::Observer "Notify on recovery" do
    subscribe to: 'node.delta',
        where: { delta: {status: ['down', 'up']} }
    subscribe to: 'node.delta',
        where: { delta: {status: ['acked', 'up']} }

    action do |event|
        mailer = Mail::Message.new
        mailer.delivery_method :sendmail
        mailer.from = Mail::Address.new( 'alerting@example.com' )
        mailer.to = Mail::Address.new( 'recipients@example.com' )
        mailer.subject = "Node recovered: %s" % [ event['identifier'] ]
        mailer.body = "%s has recovered" % [ event['identifier'] ]
        mailer.deliver
    end
end


Arborist::Observer "Notify on ack/disabled/enabled" do
    subscribe to: 'node.acked'
    subscribe to: 'node.disabled'
    subscribe to: 'node.delta',
        where: { delta: {status: ['disabled', 'unknown']} }

    action do |event|
        data = event[ 'data' ]
        verb = data[ 'status' ]
        verb = 're-enabled' if event['type'] == 'node.delta'

        mailer = Mail::Message.new
        mailer.delivery_method :sendmail
        mailer.from = Mail::Address.new( 'alerting@example.com' )
        mailer.to = Mail::Address.new( 'recipients@example.com' )
        mailer.subject = "Node %s %s" % [ event['identifier'], verb ]
        if verb == 'acked' or verb == 'disabled'
            mailer.body = "%s says: \"%s\"" % [
                data['ack']['sender'],
                data['ack']['message']
            ]
        end
        mailer.deliver
    end
end


Arborist::Observer "Relay every event elsewhere" do
    subscribe to: 'node.update'

    action do |event|
        send_elsewhere( event['data'] )  # placeholder for your own relay helper (not defined here)
    end
end

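First observer: emails an alert whenever a node goes down, additionally calls send_sms_to_night_crew when the event arrives outside of work hours (NOT_WORK_HOURS), and once an hour emails a digest of the down events it has collected, sorted by time.

Second observer: emails a recovery notice when a node transitions to up from either down or acked.

Third observer: emails a notice when a node is acknowledged, disabled, or re-enabled (a disabled to unknown transition); for acks and disables, the body quotes the sender and message of the acknowledgement.

Fourth observer: passes the data of every node.update event to send_elsewhere, for relaying to an external system.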

If you want more detail within an action than the event provides, you can query the Manager directly using a Client object. One is always in scope via the client variable:

Arborist::Observer "Report on VM status changes" do
    subscribe to: 'node.down', where: { tags: 'vm' }
    subscribe to: 'node.up',   where: { tags: 'vm' }

    action do |event|
        vms = client.search tag: 'vm'
        down, up = vms.partition{|id, vm| vm['status'] == 'down' }

        puts "%d virtual machines are down, %d are operational!" % [
            down.length, up.length
        ]
    end
end
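It helps to know what that partition call actually produces. The sketch below uses invented sample data shaped like the search result the example assumes (node identifiers mapped to attribute hashes that include a status). Note that everything not down lands in the "operational" bucket, including acked and disabled nodes; tallying by status keeps them apart.

```ruby
# Invented sample data standing in for the result of client.search:
# node identifiers mapped to attribute hashes.
vms = {
  'vm-db1'  => { 'status' => 'down',     'tags' => ['vm'] },
  'vm-web1' => { 'status' => 'up',       'tags' => ['vm'] },
  'vm-web2' => { 'status' => 'acked',    'tags' => ['vm'] },
  'vm-web3' => { 'status' => 'disabled', 'tags' => ['vm'] }
}

# partition on a Hash returns two arrays of [identifier, attributes] pairs.
down, up = vms.partition { |_id, vm| vm['status'] == 'down' }
down.map( &:first )  # => ["vm-db1"]
up.length            # => 3 (acked and disabled VMs count as "operational")

# Counting each status separately avoids lumping them together.
by_status = vms.values.map { |vm| vm['status'] }.tally
```

`tally` needs Ruby 2.7 or newer; on older versions, `group_by` followed by counting each group does the same job.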

Multiple Observer daemons

Just like Monitors, you can scale out observing across any number of machines. The only configuration change needed is to the event_api_url key: instead of the local socket default, give it the IP and port the Manager's event API is listening on:

---
arborist:
  event_api_url: tcp://10.3.0.75:5012