Service monitoring
from Shimitar@feddit.it to selfhosted@lemmy.world on 13 Jan 10:29
https://feddit.it/post/13974645

Hi all!

I have a nice setup with some containers (rootless Podman) and bare metal services (anything I can install bare metal usually goes bare metal).

In the past I used Monit to keep an eye on my services and automatically restart anything that goes down for any reason. I stopped using Monit because it doesn’t scale well on mobile browsers and it’s frankly clumsy to configure.

I could go back to Monit I guess, but I am wondering if there is anything better out there to try.

A few requirements (not necessarily mandatory, but preferable):

What do you guys use?

#selfhosted


Strit@lemmy.linuxuserspace.show on 13 Jan 10:47

I think Beszel fits most of those. Not sure if it can restart containers/services.

Shimitar@feddit.it on 13 Jan 12:19

Nice tool! Simple to set up and pretty lightweight. It seems it cannot restart services though, nor monitor them specifically…

asap@lemmy.world on 13 Jan 11:02

For Podman you don’t need anything other than Podman itself to monitor and restart failed containers:

podman-compose --podman-run-args='--health-on-failure=restart' up -d
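A minimal sketch of the same idea with plain `podman run`, assuming the container (or its image) defines a health check, since the restart action only fires when a health check fails. The container name, image, port, and health command are hypothetical placeholders, and the `PODMAN` variable is only a convenience for dry runs.

```shell
# Start a container whose failed health checks trigger a restart.
# Usage: run_with_healthcheck NAME IMAGE
# NAME, IMAGE, the port and the health command are hypothetical placeholders.
run_with_healthcheck() {
  name=$1
  image=$2
  # PODMAN can be overridden (e.g. PODMAN=echo) to print the command
  # instead of running it.
  "${PODMAN:-podman}" run -d --name "$name" \
    --health-cmd 'curl -fsS http://localhost:8080/ || exit 1' \
    --health-interval 30s \
    --health-retries 3 \
    --health-on-failure restart \
    "$image"
}
```

Something like `run_with_healthcheck myapp localhost/myapp:latest` would start it; the health command assumes `curl` exists inside the image.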

For anything else I use healthchecks.io

Shimitar@feddit.it on 13 Jan 12:20

Thanks for the podman restart suggestion!

healthchecks.io doesn’t seem to be free, or at least not very open?

asap@lemmy.world on 13 Jan 12:44

It’s just what I use, as I’m specifically looking for something which only notifies when things aren’t able to report due to failure. Free for 20 checks which is more than enough for me.

If I were hosting it myself I wouldn’t know if my own notification system had failed (since it wouldn’t be able to report due to failure.)
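This "notify when nothing reports in" pattern is often called a dead man's switch. A rough sketch of the reporting side, assuming a check that expects a periodic ping: the URL follows the hosted service's `hc-ping.com` format, but the UUID is a placeholder and `run_and_ping` is a made-up helper name.

```shell
# Placeholder ping URL; a real check has its own unique URL.
PING_URL="${PING_URL:-https://hc-ping.com/your-uuid-here}"

# Usage: run_and_ping COMMAND [ARGS...]
# Runs the command and pings the monitor only if it succeeded, so a crash,
# hang, or dead host shows up as a missed check-in on the monitoring side.
run_and_ping() {
  if "$@"; then
    curl -fsS -m 10 --retry 3 -o /dev/null "$PING_URL"
  else
    return 1
  fi
}
```

From cron this could wrap any job, for example `run_and_ping /usr/local/bin/backup.sh` (a hypothetical script path). If the host itself dies, no ping arrives and the external service raises the alert.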

mosiacmango@lemm.ee on 13 Jan 19:18

If you’re using Podman quadlets, this config in the systemd service file does the same:

[Service]
Restart=always

Lifebandit666@feddit.uk on 13 Jan 11:26

I think I’m a step behind you. I use Uptime Kuma for monitoring and it works really well. I just have it running on a Pi, separate from my main machine.

I worked out how to get it sending me emails when things are down and up, and now my email inbox is a fucking hot mess of notifications.

So I’ve just this weekend integrated it into Home Assistant and set it to notify me when things are down for 5 minutes or more.

My next step was going to be finding some way of integrating Portainer into Home Assistant so I can restart stopped containers, and maybe Proxmox so I can reboot VMs from HA. Not sure it’s possible yet though.

Ultimately I want to have HA send me a notification with actionable buttons for “reboot container” and “reboot VM” which, when pressed, will sort the issue out.

However, this will not help when one of my drives goes down. They’re HDDs plugged in via USB3, which isn’t great, and my server is behind the coat rack, so sometimes the kids just throw their coats on and one falls onto my server, which then heats up and goes silly.

marsara9@lemmy.world on 13 Jan 14:32

Can you share the Home Assistant automation / setup that you have for Uptime Kuma notifications? I’m in the same boat as you. I just got a webhook set up, but I’m getting flooded with notifications, especially after services update.

I just want to be notified when a particular service is down for, say, 5 minutes, and all I care about is knowing the node name. I don’t necessarily care to get notified when the service comes back up.

Lifebandit666@feddit.uk on 13 Jan 17:54

I did it all in Node-RED, so unfortunately I can’t share the automation, but I can point you at this HACS integration: github.com/meichthys/uptime_kuma

Set that up and all your nodes will be visible in HA. Then it’s just a case of “if node X is off for X minutes” - “notify”.
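The “down for X minutes” rule boils down to comparing timestamps. A rough shell sketch of that logic, independent of Home Assistant or Node-RED; the function name and the 300-second default are assumptions for illustration only.

```shell
# Usage: should_notify DOWN_SINCE NOW [THRESHOLD_SECONDS]
# DOWN_SINCE and NOW are Unix timestamps in seconds.
# Succeeds (exit 0) once the outage has lasted at least the threshold.
should_notify() {
  down_since=$1
  now=$2
  threshold=${3:-300}   # default: 5 minutes
  [ $((now - down_since)) -ge "$threshold" ]
}

# Example: a node that went down 6 minutes ago.
if should_notify "$(( $(date +%s) - 360 ))" "$(date +%s)"; then
  echo "notify: node has been down for 5+ minutes"
fi
```

Short blips, such as a service restarting after an update, never reach the threshold and so stay silent.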

Darkassassin07@lemmy.ca on 14 Jan 06:06

This is what the ‘retries’ setting in each monitor is for. It will only be considered down if it has failed its heartbeat check <retries> number of times in a row.
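To illustrate the consecutive-failure rule as described here (a sketch of the idea, not Uptime Kuma’s actual implementation), the hypothetical helper below takes a retry count of at least 1 and a history of check results, and reports “down” only when the latest streak of failures reaches that count.

```shell
# Usage: status_after_checks RETRIES RESULT...
# Each RESULT is "ok" or "fail", oldest first.
# Prints "down" if the most recent RETRIES results are all failures,
# otherwise "up". A single "ok" resets the failure streak.
status_after_checks() {
  retries=$1
  shift
  streak=0
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
  done
  if [ "$streak" -ge "$retries" ]; then
    echo down
  else
    echo up
  fi
}
```

With a retry count of 3, a history of `ok fail fail` still counts as up, while `fail fail fail` counts as down.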

atzanteol@sh.itjust.works on 13 Jan 13:28

Do not use “bare metal” in this way. “Outside containers” is sufficient.

d00phy@lemmy.world on 13 Jan 14:00

Yeah, I got a little confused there!

0x0@programming.dev on 13 Jan 14:21

Maybe Icinga?