Monitor HDD/SSD health #92

Open
opened 2024-07-27 02:27:19 +02:00 by felixalbrigtsen · 2 comments
felixalbrigtsen commented 2024-07-27 02:27:19 +02:00 (Migrated from github.com)

All hosts should periodically run filesystem scrubs, SMART tests, or similar checks to detect failing disks, potentially even before they cause any data loss.
Any failing drive or filesystem should raise an alert to Drift (via e-mail, Matrix, etc.).
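
As a rough illustration of what such a periodic check could look like, here is a minimal Python sketch that inspects the JSON output of `smartctl` (available in smartmontools 7.0 and later). The function names and the set of watched attribute IDs are assumptions made for this example, not something decided in this issue, and the sketch only covers ATA-style attribute tables.

```python
import json
import subprocess

# Attribute IDs often watched for early failure signs (an assumption for this
# sketch): reallocated, pending, and offline-uncorrectable sectors.
WATCHED_ATTRIBUTE_IDS = {5, 197, 198}


def find_problems(report: dict) -> list[str]:
    """Return human-readable problems found in a parsed `smartctl --json` report."""
    problems = []
    # A missing smart_status (e.g. unsupported device) is not flagged here.
    if report.get("smart_status", {}).get("passed") is False:
        problems.append("overall SMART health check failed")
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED_ATTRIBUTE_IDS and raw > 0:
            problems.append(f"{attr.get('name', attr['id'])} raw value is {raw}")
    return problems


def check_device(device: str) -> list[str]:
    """Run smartctl on one device (typically needs root) and report problems."""
    result = subprocess.run(
        ["smartctl", "--json", "--health", "--attributes", device],
        capture_output=True, text=True, check=False,
    )
    return find_problems(json.loads(result.stdout))
```

Something like `check_device("/dev/sda")` could then be run from a timer or cron job, with a non-empty result triggering the alert.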

oysteikt added the
salt
nixos
labels 2024-08-03 21:10:59 +02:00
oysteikt added this to the Kanban project 2024-08-03 22:14:24 +02:00
oysteikt added the
servers n' hardware
label 2024-08-06 18:13:44 +02:00
oysteikt added the
logging
label 2024-08-06 18:30:12 +02:00
oysteikt self-assigned this 2024-08-24 11:37:23 +02:00

I've been having a look at smartd in nixpkgs, and it seems relatively straightforward to set up. However, it's currently built without its systemd integration upstream. Maybe we should dogfood a patch until we've been able to upstream it?

nixpkgs has a custom script for notifying via any combination of email, systembus-notify, wall and xmessage, but it's probably most reasonable to just use email. That means we need to set up MTAs on our NixOS machines, however.
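
To make the MTA dependency concrete, here is a small Python sketch of an email alert handed to a local MTA. It is not the nixpkgs notification script; the function names, subject format, and the default of `localhost` port 25 are assumptions for illustration only.

```python
import smtplib
from email.message import EmailMessage


def build_alert(hostname: str, device: str, problems: list[str],
                recipient: str, sender: str) -> EmailMessage:
    """Build a plain-text alert mail listing the problems found on one device."""
    msg = EmailMessage()
    msg["Subject"] = f"[disk-health] {hostname}: {device} reports {len(problems)} problem(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an MTA; assumes one is listening on host:port."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Without a working MTA on the machine, `send_alert` fails at connection time, which is why the mail setup has to come first.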

Not sure about the state of smartd on Debian/FreeBSD, but we seem to already have some kind of cron jobs running there?

Alternatively, something like https://github.com/matusnovak/prometheus-smartctl could also be considered. It doesn't need to be mutually exclusive with the email notifications, but if we set up alerts in Grafana, the email notifications might become redundant.

The NixOS machines are now fine, but the Debian machines still need to implement the mail notification system.
Reference: Drift/issues#92