Fix remote logging #138
What's the current state of remote logging? We do have a loki instance on ildkule, but it's not doing anything interesting at the moment?
(Title changed from "Fix proper remote logging" to "Fix remote logging".)
It was too slow to be useful the times I tried using it, anyway.
Right now it's broken, since we moved all our logging/graphs/uptime tracking to openstack, which doesn't work (yet?). Felix is working on it, I think.
Loki ingests logs just fine (when the server is up), but the grafana search and filtering can be quite slow and heavy, as it is naturally a resource-intensive task. My main issue with loki, however, is that configuring alerts and rules in Grafana is awful, both for prometheus and loki data.
I think rsyslog would be a great system for us. Alternatively, we could use something like https://www.elastic.co/blog/get-system-logs-and-metrics-into-elasticsearch-with-beats-system-modules if we want to try something entirely different.
rsyslog is performant and cool, and has a mature and sane ecosystem of filtering and alerting (output modules like ommail and omhttp).
Whatever system we use will probably both require a performant server, and still be sluggish, as logs are heavy.
Filebeat, rsyslog and promtail are all, however, surprisingly lightweight on each of the clients.
Earlier today I had a look into the state of journald-remote, and it was in surprisingly good shape.
It has these continuous https streams (which we can authenticate with client certs and CA if we'd like, or just authenticate by IP), and it continuously streams all journald logs to another machine, with good systemd watchdog integration and everything. It allows us to run
journalctl --merge
on the receiving machine, and see all logs for all hosts, tagged by hostname. We can also easily pump the logs into something like signoz (free elasticsearch alternative), loki or meilisearch (with a frontend) to get better search possibilities. There's also journalwatch if we'd like to add alerts directly, but there might be better options after ingestion?
There's a branch with most of the required setup at https://git.pvv.ntnu.no/Drift/pvv-nixos-config/src/branch/systemd-journald-remote, but it's kinda blocked by #146
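As a rough illustration of what "alerts after ingestion" could look like on the receiving machine, here is a minimal Python sketch that reads the merged journal as JSON and counts error-level entries per host. It assumes the standard `journalctl --output=json` format (one JSON object per line, with `_HOSTNAME` and `PRIORITY` fields). The function name `count_errors_by_host` and the one-hour window are made up for the example and are not part of the branch above.

```python
import json
import subprocess
from collections import Counter
from typing import Iterable

# syslog priority 3 is "err"; lower numbers are more severe.
ERROR_PRIORITY = 3


def count_errors_by_host(lines: Iterable[str], max_priority: int = ERROR_PRIORITY) -> Counter:
    """Count journal entries at or above error severity, grouped by _HOSTNAME.

    `lines` is expected to hold one JSON object per line, as produced by
    `journalctl --output=json`. Unparseable lines and entries without a
    usable PRIORITY field are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        try:
            level = int(entry.get("PRIORITY"))
        except (TypeError, ValueError):
            continue
        if level <= max_priority:
            host = entry.get("_HOSTNAME")
            if not isinstance(host, str):
                host = "unknown"
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: look at the last hour of the merged journal.
    proc = subprocess.run(
        ["journalctl", "--merge", "--output=json", "--since", "1 hour ago"],
        capture_output=True,
        text=True,
        check=True,
    )
    for host, n in count_errors_by_host(proc.stdout.splitlines()).most_common():
        print(f"{host}: {n} error-level entries")
```

Something like this could run from a timer and feed a mail or chat notification, though a proper alerting tool would likely be a better fit once the logs are ingested somewhere searchable.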
FWIW I've heard journald-remote breaks a lot, I would have suggested it earlier otherwise.
I think getting a faster machine for loki or ELK is what we want, and if we have that then the pushing of the logs is a non-issue. (unless the point was mostly to get a text-based greppable thing, which could be nice!)
I've heard you say it before, but I do think this information could be outdated. I skimmed through the issue list when I tested it out, and it seems like they have been actively working to improve it lately, fixing stuff like https://github.com/systemd/systemd/issues/9858, and with PRs like https://github.com/systemd/systemd/pull/31313 (although not yet merged). I have not tested it for long, but I'm currently using it in my homelab.