Thursday, February 26, 2015

Monitoring - Jenkins, Fabric, Python

So... there are lots of monitoring tools out there.
But there always seem to be things that just aren't covered.
We use Zabbix, StatsD, collectd and MMS, but we still find things that aren't handled the way we want.

Currently working on an API monitoring solution that can:
  1. do chains of calls that need to happen in order
  2. call into various levels of our stack (we have a lot of redundancy which can mask individual component failures... but we'd obviously still like to know about them)
Planning to have Jenkins call a Fabric task that will do the real work. We already have a bunch of Fabric modules for getting/creating/destroying infrastructure components, so that code can be reused to dynamically look up the infrastructure components that we want to monitor.
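
A rough sketch of what the "chain of calls" piece might look like in plain Python. Nothing here is our actual code: `run_chain` and the two placeholder steps are hypothetical names, and real steps would make API calls instead of returning canned values. The idea is just that each step gets the previous step's result, and the chain stops at the first failure so we know exactly which call broke.

```python
def run_chain(steps):
    """Run (name, callable) steps in order, feeding each result to the next.

    Stops at the first step that raises. Returns (ok, results), where
    results is a list of (name, status) tuples for the steps that ran.
    """
    results = []
    value = None
    for name, step in steps:
        try:
            value = step(value)
        except Exception as exc:
            # Record which step broke and skip the rest of the chain.
            results.append((name, "FAILED: %s" % exc))
            return False, results
        results.append((name, "ok"))
    return True, results


# Hypothetical placeholder steps; real ones would call the API under test.
def create_session(_previous):
    return {"token": "placeholder-token"}


def fetch_profile(session):
    if "token" not in session:
        raise ValueError("no token from previous step")
    return {"user": "placeholder-user"}


if __name__ == "__main__":
    ok, results = run_chain([
        ("create_session", create_session),
        ("fetch_profile", fetch_profile),
    ])
    for name, status in results:
        print("%s: %s" % (name, status))
```

Wrapped in a Fabric task (for example with the `@task` decorator from `fabric.api` in Fabric 1.x), a failed chain could exit non-zero so that the Jenkins build goes red.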

The general solution should also be able to support 'app specific' monitoring.
For example:
  1. use PyMongo to query various MongoDB values... like whether balancing is enabled.
  2. use the Requests Python module to query RESTful endpoints on our HAProxy instances & ELBs to confirm they are up and healthy.
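
Here is a minimal sketch of those two checks, under some assumptions. `balancer_enabled` and `endpoint_healthy` are made-up names. The balancer check assumes `client` is a `pymongo.MongoClient` connected to a mongos, and that the balancer state lives in the `balancer` document of `config.settings` (a missing document or a missing `stopped` field is treated as "enabled"). The health check assumes that a plain HTTP 200 means "healthy", which may not match how your endpoints report status.

```python
def balancer_enabled(client):
    """Return True if the sharded-cluster balancer looks enabled.

    `client` is assumed to be a pymongo.MongoClient connected to a mongos.
    """
    doc = client["config"]["settings"].find_one({"_id": "balancer"})
    # No document, or no "stopped" flag, is treated as "balancer enabled".
    return not (doc or {}).get("stopped", False)


def endpoint_healthy(url, get=None, timeout=5):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds.

    `get` defaults to requests.get; it can be swapped out for testing.
    """
    if get is None:
        import requests  # imported lazily so the function is testable without it
        get = requests.get
    try:
        return get(url, timeout=timeout).status_code == 200
    except Exception:
        # Any connection error or timeout counts as "not healthy".
        return False
```

Usage would be along the lines of `balancer_enabled(MongoClient("mongos-host", 27017))` and `endpoint_healthy("http://haproxy-host/health")`, where both host names and the `/health` path are placeholders rather than real endpoints.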

And cuz this feels like an exceedingly verbose and not visually appealing post, here is a link to something on devops reactions:


 "Before diving into Legacy Code"
