Timeseries, Long-Term Storage, Multi-Tenancy & High Availability
This article is a retrospective on several months of continuous improvement since we built our current monitoring system: which challenges we faced, how we overcame them, and how we eventually switched to Victoria Metrics.
How it started
At Iguana Solutions we built Sismology, a multi-tenant system based on Prometheus, for our alerting and metrology needs. It began as a project to replace our monolithic Naemon and Graphite (with collectd) setup with a single system that merges metrology and alerting and builds on the current standard: Prometheus.
While Prometheus gave us a solid metrology and alerting core, we faced three challenges:
- Multi-tenancy: since we planned to let our customers access their own data, we had to overcome the single-tenant design of Prometheus
- Long-term storage: retention of several years, since it is not uncommon for our customers (or ourselves) to compare a specific time of the year against year N-1 or N-2
- High availability: a zero-downtime target, while still being able to take some nodes offline for maintenance
In this article written by Edouard Hur, VP Engineering at Iguana Solutions, you’ll find all the details about:
- The fine tuning of the technologies used
- Our custom development: disk usage and the remote read proxy; RAM usage, cardinality, and why they led us to build our own agent
- Victoria Metrics and why it replaced InfluxDB