Prometheus notes
I’ve been working on $current_job’s monitoring stack for a few months now. It’s my first time working with Prometheus, so it has been a learning experience.
Here is a list of thoughts I have about the product in general:
- The fact that all the config is in plain text makes it easy to build incrementally.
- There is no good way to monitor information that is string-based and changes regularly, since sample values are always numeric. There isn’t a lot of information of this type that we’d like to monitor, but when we do run into it, it’s not a good time.
- SNMP was hard to figure out, as it kinda always is. But we got there.
- The built-in way of looking at alerts is by alert type, whereas I’m more used to looking at them by host (or instance, in Prometheus terms). I am currently thinking about making a page of all alerts that resembles the way Zabbix does it, as I can wrap my head around that a lot more easily.
- It is fast and not CPU hungry, really well optimized, and robust.
- You cannot choose different retention times per metric: retention is a single dial (`--storage.tsdb.retention.time`) for the whole database. Maybe there is a way to cheat this with a second instance dedicated to long-term storage, kept in a separate location with its own retention setting.
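
For the string-based case in the list above, the usual workaround is an "info"-style metric: the string goes into a label and the sample value is a constant 1. Here is a minimal sketch of what that looks like in the text exposition format. The metric name `device_firmware_info` and the `version` label are made up for illustration, and the helper functions are not part of any Prometheus library.

```python
def escape_label_value(value: str) -> str:
    # The text exposition format requires escaping backslashes,
    # double quotes and newlines inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def info_metric_line(name: str, labels: dict) -> str:
    # Render one sample line: the strings live in the labels, the value is always 1.
    rendered = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{rendered}}} 1"


if __name__ == "__main__":
    # Hypothetical metric and label names, for illustration only.
    print(info_metric_line("device_firmware_info", {"version": "1.2.3"}))
```

Every new label value creates a new time series, so this works for strings that change rarely (a firmware version) and gets painful for strings that change all the time, which matches the experience described above.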
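
As for the per-host alert page mentioned in the list above, a rough sketch of the grouping step could look like the following. It assumes the alerts have already been fetched as JSON from Prometheus's `/api/v1/alerts` endpoint and that each alert carries an `instance` label. The sample data, host names and alert names are invented, and `group_alerts_by_instance` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up sample shaped like the "data.alerts" list from /api/v1/alerts.
SAMPLE_ALERTS = [
    {"labels": {"alertname": "HighLoad", "instance": "web1:9100"}, "state": "firing"},
    {"labels": {"alertname": "DiskFull", "instance": "db1:9100"}, "state": "firing"},
    {"labels": {"alertname": "DiskFull", "instance": "web1:9100"}, "state": "pending"},
    {"labels": {"alertname": "ConfigReloadFailed"}, "state": "firing"},
]


def group_alerts_by_instance(alerts):
    """Return {instance: [(alertname, state), ...]}, sorted for stable output."""
    grouped = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        # Alerts from aggregated rules may not carry an instance label at all.
        instance = labels.get("instance", "(no instance)")
        name = labels.get("alertname", "(unnamed)")
        grouped[instance].append((name, alert.get("state", "unknown")))
    return {inst: sorted(items) for inst, items in sorted(grouped.items())}


if __name__ == "__main__":
    for instance, items in group_alerts_by_instance(SAMPLE_ALERTS).items():
        print(instance)
        for name, state in items:
            print(f"  {name}: {state}")
```

The result is one entry per host with its alerts listed underneath, which is closer to the Zabbix-style view than the built-in grouping by alert type.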
Maybe I’ll edit this with more info as I go on.