Service monitoring
from Shimitar@feddit.it to selfhosted@lemmy.world on 13 Jan 10:29
https://feddit.it/post/13974645
Hi all!
I have a nice setup with some containers (rootless Podman) and bare-metal services (anything I can install on bare metal usually goes on bare metal).
I used Monit in the past to keep an eye on my services and automatically restart anything that goes down for any reason. I stopped using Monit because it doesn’t scale well on mobile browsers and it’s frankly clumsy to configure.
I could go back to Monit, I guess, but I am wondering if there is anything better out there to try.
A few requirements (not necessarily mandatory, but preferable):
- Open Source (ideally: true open source, not just commercial solutions with dumbed-down free versions)
- Not limited to, or focused on, containers (no Watchtower and similar)
- For containers, it can just support “works” or “restart”
- For containers, if it goes beyond the minimum “works” and “restart”, it must support Podman
- Must support bare metal services (status, start, stop)
- Must send email or other kinds of notifications (IM notifications are OK, but email is preferred)
- Should additionally monitor external machines (e.g. other servers on the LAN), or generic IP addresses
- Should detect if a web service is alive but blocked
- No need for fancy GUIs or a Web GUI (it’s a plus, but not required)
- No need for data reporting, graphs, and similar amenities. They are a plus, but 100% not required.
What do you guys use?
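To make the “alive but blocked” requirement above concrete, here is a rough Python sketch of the kind of check I mean. It assumes the service runs as a systemd unit and exposes an HTTP endpoint; the function names, the unit name, and the URL are all made up for illustration and are not taken from any particular tool.

```python
import subprocess
import urllib.request


def unit_is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def http_responds(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers without an error status before the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def classify(unit, url, run=subprocess.run, opener=urllib.request.urlopen):
    """Classify a service as 'ok', 'blocked' (running but not answering) or 'down'."""
    if not unit_is_active(unit, run=run):
        return "down"
    return "ok" if http_responds(url, opener=opener) else "blocked"


if __name__ == "__main__":
    # Hypothetical unit name and URL, replace with your own.
    print(classify("myservice.service", "http://localhost:8080/"))
```

A “blocked” result is the case where a restart would be worth trying even though the process is still running.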
I think Beszel fits most of those. Not sure if it can restart containers/services.
Nice tool! Simple to set up and pretty lightweight. It seems it cannot restart services though, nor monitor them specifically…
For Podman you don’t need anything else other than Podman to monitor and restart failed containers:
For anything else I use healthchecks.io
Thanks for the podman restart suggestion!
healthchecks.io doesn’t seem to be free, or at least not very open?
It’s just what I use, as I’m specifically looking for something which only notifies when things aren’t able to report due to failure. Free for 20 checks which is more than enough for me.
If I were hosting it myself I wouldn’t know if my own notification system had failed (since it wouldn’t be able to report due to failure.)
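For anyone unfamiliar with the pattern: the job pings a URL when it runs, and the external service alerts when the pings stop arriving. A minimal Python sketch, assuming a healthchecks.io-style ping URL with a `/fail` suffix for reporting failures; the UUID below is a placeholder and `send_heartbeat` is just an illustrative name.

```python
import urllib.request


def send_heartbeat(ping_url, succeeded=True, timeout=10.0, opener=urllib.request.urlopen):
    """Ping the check URL; append /fail to report a failed run explicitly."""
    url = ping_url if succeeded else ping_url.rstrip("/") + "/fail"
    try:
        with opener(url, timeout=timeout):
            return True
    except OSError:
        # A failed ping must not crash the job; the missing ping is itself the alert.
        return False


if __name__ == "__main__":
    # Placeholder URL, use the ping URL of your own check.
    send_heartbeat("https://hc-ping.com/your-uuid-here")
```

Because the alert fires on silence, it still works when the monitored machine is too broken to send anything at all.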
If you’re using Podman quadlets, this config in the systemd service file does the same:
[Service]
Restart=always
I think I’m a step behind you. I use Uptime Kuma for monitoring and it worked really well. Just have it running on a pi separate from my main machine.
I worked out how to get it sending me emails when things are down and up, and now my email inbox is a fucking hot mess of notifications.
So I’ve just this weekend integrated it into Home Assistant and set it to notify me when things are down for 5 minutes or more.
My next step was going to be finding some way of integrating Portainer into Home Assistant so I can restart stopped containers, and maybe Proxmox so I can reboot VMs from HA. Not sure it’s possible yet though.
Ultimately I want to have HA send me a notification with actionable buttons with “reboot container” and “reboot VM” which, when pressed, will sort the issue out.
However, this will not help when one of my drives goes down. They’re HDDs plugged in via USB3, which isn’t great, and my server is behind the coat rack, so sometimes the kids throw their coats on and one falls onto my server, which then heats up and goes silly.
Can you share the Home Assistant automation / setup that you have for Uptime Kuma notifications? I’m in the same boat as you. I just got a webhook set up but I’m getting flooded with notifications, especially after services update.
I just want to be notified when a particular service is down for, say, 5 minutes, and all I care about is knowing the node name. I don’t necessarily care to get notified if the service comes back up.
I did it all in Node-RED so unfortunately I can’t share the automation, but I can point you at this HACS integration: github.com/meichthys/uptime_kuma
Set that up and all your nodes will be visible in HA. Then it’s just a case of “if node X is off for X minutes” - “notify”.
This is what the ‘retries’ setting in each monitor is for. It will only be considered down if it’s failed its heartbeat check <retries> number of times in a row.
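The general idea, sketched in Python: count consecutive failures and only flip to “down” once the count reaches a threshold, so a single missed check never triggers a notification. This is an illustration of the technique, not Uptime Kuma’s actual code, and the exact off-by-one behaviour of its setting may differ; `DownDetector` and `threshold` are names I made up.

```python
class DownDetector:
    """Report 'down' only after `threshold` consecutive failed checks."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0
        self.is_down = False

    def record(self, check_passed):
        """Feed one check result; return 'down', 'up', or None if nothing changed."""
        if check_passed:
            self.failures = 0
            if self.is_down:
                self.is_down = False
                return "up"
            return None
        self.failures += 1
        if not self.is_down and self.failures >= self.threshold:
            self.is_down = True
            return "down"
        return None


if __name__ == "__main__":
    detector = DownDetector(threshold=3)
    for passed in [False, False, True, False, False, False, True]:
        event = detector.record(passed)
        if event is not None:
            print(event)
```

The brief blip that recovers after two failures produces no notification at all, which is what cuts down the flood after service updates.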
Do not use “bare metal” in this way. “Outside containers” is sufficient.
Yeah, I got a little confused there!
Maybe Icinga?