Sismology: Iguana Solutions’ Monitoring System

Time Series, Long-Term Storage, Multi-Tenancy & High Availability

 


This article is a retrospective on several months of continuous improvement since we created our current monitoring system: the challenges we faced, how we overcame them, and how we finally switched to Victoria Metrics.

How it started

At Iguana Solutions, we created Sismology, a multi-tenant system based on Prometheus, for our alerting and metrology needs. It began as a project to replace our monolithic Naemon and Graphite (with collectd) setup with a single system that merges metrology and alerting and is built on the current standard: Prometheus.

While Prometheus gave us a solid metrology and alerting core, we faced three challenges:

  • Multi-tenancy: because we planned to let our customers access their own data, we had to overcome the single tenancy of Prometheus.
  • Long-term storage: retention of up to several years, since it is not uncommon for our customers (or ourselves) to compare a specific time of the year against year N-1 or N-2.
  • High availability: a zero-downtime target, while still being able to take some nodes offline for maintenance.


In this article written by Edouard Hur, VP Engineering at Iguana Solutions, you’ll find all the details about:

  • The fine-tuning of the technologies used
  • Our custom development: disk usage and the remote read proxy, plus RAM usage, cardinality, and why these led us to build our own agent
  • Victoria Metrics and why it replaced InfluxDB