Monitor HDD/SSD health #92
Reference: Drift/issues#92
All hosts should do filesystem scrubbing, SMART-testing or similar tests periodically to detect failing disks, potentially even before they cause any data loss.
Any failing drive or filesystem should raise an alert to Drift (via e-mail, Matrix, etc.).
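As a rough illustration of the kind of periodic check meant here, the sketch below asks `smartctl` for its overall health verdict in JSON form and treats anything other than an explicit pass as a failure. It assumes smartmontools 7.0 or newer (for `--json`). The function names are hypothetical and not part of any existing Drift tooling.

```python
import json
import subprocess


def drive_is_healthy(report: dict) -> bool:
    """Return True only if the report explicitly says the SMART health check passed."""
    status = report.get("smart_status")
    # A missing or malformed status is treated as unhealthy, so that we
    # alert instead of silently ignoring a drive we could not assess.
    return isinstance(status, dict) and status.get("passed") is True


def read_smart_report(device: str) -> dict:
    """Run `smartctl --json --health` on a device and parse its output."""
    # smartctl signals problems through non-zero exit status bits, so the
    # exit code is deliberately not treated as a hard error here.
    result = subprocess.run(
        ["smartctl", "--json", "--health", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A cron job or systemd timer could call `drive_is_healthy(read_smart_report("/dev/sda"))` for each disk and raise an alert when it returns `False`. This only covers the overall health verdict, not filesystem scrubbing or individual SMART attributes.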
I've been having a look at smartd in nixpkgs, and it seems to be relatively straightforward to set up. However, it's currently built without its systemd integration upstream. Maybe we should dogfood a patch until we've been able to upstream it?
nixpkgs has a custom script for notifying via any combination of email, systembus-notify, wall and xmessage, but it's probably most reasonable to just use email. However, that means we need to set up MTAs on our NixOS machines.
Not sure about the state of smartd on Debian/FreeBSD, but we seem to already have some kind of cron jobs running there?
Alternatively, something like https://github.com/matusnovak/prometheus-smartctl could also be considered. It doesn't need to be mutually exclusive with the email notifications, but if we set up alerts in Grafana, the email notifications might become redundant.
The NixOS machines are now fine, but the Debian machines still need to implement the mail notification system.
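For the Debian side, a minimal sketch of what the mail notification could look like, assuming an MTA is listening on `localhost` and that a separate check has already decided the drive is unhealthy. The addresses, hostnames and function names below are placeholders, not actual Drift configuration.

```python
import smtplib
from email.message import EmailMessage


def build_alert(host: str, device: str, detail: str,
                recipient: str, sender: str) -> EmailMessage:
    """Compose a plain-text alert mail about one device on one host."""
    msg = EmailMessage()
    msg["Subject"] = f"[{host}] disk health warning: {device}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(detail)
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to the local MTA (assumed to accept mail on port 25)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

If smartd itself ends up being used on Debian, its own mail support would make a script like this unnecessary. The sketch is only meant for the case where the existing cron jobs are kept.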