Timeseries, Long Term Storage, Multi tenancy & High Availability
This article is a retrospective on several months of continuous improvement since the creation of our current monitoring system: the challenges we faced, how we overcame them, and how we finally switched to Victoria Metrics.
How it started
At Iguana Solutions, we have built Sismology, a multi-tenant system based on Prometheus for our alerting and metrology needs. It began as a project to replace our monolithic Naemon and Graphite (with collectd) setup with a single system merging metrology and alerting, based on the current standard: Prometheus.
While Prometheus gave us a solid metrology and alerting core, we faced three challenges:
- Multi-tenancy: as we planned to let our customers access their own data, we had to overcome the single-tenant design of Prometheus.
- Long-term storage: as long as several years. It is not uncommon for our customers (or ourselves) to compare a specific time of the year against year N-1 or N-2.
- High availability: a zero-downtime target, while still being able to take some nodes offline for maintenance.
In this article written by Edouard Hur, VP Engineering at Iguana Solutions, you’ll find all the details about:
- The fine tuning of the technologies used
- The custom development regarding: disk usage & remote read proxy; RAM usage, cardinality and why it gave birth to our own agent
- Victoria Metrics and why it replaced InfluxDB