from nachitima@lemmy.ml to selfhosted@lemmy.world on 31 Mar 14:11
https://lemmy.ml/post/45272461
I’m considering starting a Lemmy instance with a limited federation model, and one of the things I’m thinking about from the start is how to support and maintain it as it grows, while spending as little attention as possible on the technical side of infrastructure management itself.
Because of that, I’m especially interested in hearing from admins who host Lemmy instances, particularly larger ones. I’d like to understand what your actual workflow looks like in practice: how you organize administration, what methodologies you use, how you handle backups, data recovery, upgrades, monitoring, and infrastructure maintenance in general. I’m also interested in whether there are any best practices or operational patterns that have proven reliable over time.
From what I’ve found so far, the official Lemmy documentation on backup and restore seems reasonably good for small instances, but as the instance grows, more nuances and complications appear. So ideally, I’d like to find or assemble something closer to a real guideline or runbook based on practices that are actually used by admins running larger instances.
If you run or have run a Lemmy instance, especially one that had to scale beyond a small personal or experimental setup, I’d really appreciate hearing about your experience. Even brief notes, links to documentation, internal checklists, or descriptions of what has and hasn’t worked for you would be very useful.
I have heard image pruning didn’t work well and disks filled up pretty fast
This is the thumbnail generation functionality, which creates a local thumbnail for every image post and keeps it around forever. You can disable this by setting `image_mode` to `None` in `lemmy.hjson`, and you'll only store what users actively upload.

I run a modest Lemmy instance (lemmy.blehiscool.com). It's not on the scale of lemmy.world or anything, but it's been around long enough that I've had to deal with some real growth and scaling issues. I'll try to focus on what actually matters in practice rather than theory.
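To pin that down: in recent Lemmy versions this setting sits under the `pictrs` block of `lemmy.hjson` (valid values include "None", "StoreLinkPreviews", and "ProxyAllImages", but check the config reference for your release):

```hjson
{
  pictrs: {
    # "None" stops thumbnail generation/storage for federated posts;
    # only files your own users upload are kept locally.
    image_mode: "None"
  }
}
```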
Infrastructure
I’m running everything via Docker Compose on a single VPS (22GB RAM, 8 vCPU). That includes Postgres, Pictrs, and the Lemmy services.
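For concreteness, a stripped-down sketch of that Compose stack might look like the following. Image tags, paths, and env values here are illustrative assumptions; use the official Lemmy Docker guide for the real thing:

```yaml
# Sketch only -- pin versions and secrets properly in a real deployment.
services:
  postgres:
    image: postgres:16-alpine
    volumes: ["./volumes/postgres:/var/lib/postgresql/data"]
    environment:
      POSTGRES_USER: lemmy
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: lemmy
  pictrs:
    image: asonix/pictrs:0.5
    volumes: ["./volumes/pictrs:/mnt"]
  lemmy:
    image: dessalines/lemmy:0.19
    volumes: ["./lemmy.hjson:/config/config.hjson:ro"]
    depends_on: [postgres, pictrs]
  lemmy-ui:
    image: dessalines/lemmy-ui:0.19
    depends_on: [lemmy]
```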
This setup is great right up until it suddenly isn’t.
The main scaling issue I hit was federation backlog. At one point, the queue started piling up badly, and the fix was increasing federation worker threads (I’m currently at 128).
If you run into this, check your `lemmy_federate` logs: queue-backlog warnings there are your early warning sign.
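For reference, the worker counts live in `lemmy.hjson`, but the exact key names have shifted between releases (0.19 moved sending into the separate `lemmy_federate` process), so treat this as a sketch and verify against the `defaults.hjson` shipped with your version:

```hjson
{
  # Assumption: key names and placement vary across Lemmy releases.
  # Concurrent outgoing federation workers (the value raised to 128 above).
  worker_count: 128
  # Workers retrying deliveries to slow or dead instances (value is a guess).
  retry_count: 8
}
```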
What Actually Takes Time
Once your infrastructure is stable, the technical side becomes pretty low-effort.
The real time sink is moderation and community management. Easily 90% of the work.
On the technical side, my setup is pretty straightforward:
- `pg_dump` + VPS-level backups

Backups are boring right up until they aren't. Test your restores. Seriously.
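As a starting point, a nightly dump for a dockerized setup can be as simple as the sketch below. The service name, DB/user names, and paths are assumptions to adjust; it defaults to a dry run so you can see what it would do before wiring it into cron with `DRY_RUN=0`:

```shell
#!/bin/sh
# Nightly Postgres dump sketch for a dockerized Lemmy instance.
# Assumptions: compose service is "postgres", DB and user are both "lemmy",
# and BACKUP_DIR already exists.
BACKUP_DIR="${BACKUP_DIR:-/srv/backups/lemmy}"
DRY_RUN="${DRY_RUN:-1}"
STAMP=$(date +%Y%m%d-%H%M%S)
DUMP_FILE="$BACKUP_DIR/lemmy-$STAMP.sql.gz"
DUMP_CMD="docker compose exec -T postgres pg_dump -U lemmy lemmy"

if [ "$DRY_RUN" = "1" ]; then
  # Show the plan without touching the database.
  echo "would run: $DUMP_CMD | gzip > $DUMP_FILE"
else
  $DUMP_CMD | gzip > "$DUMP_FILE"
  # Keep the 14 newest dumps, prune the rest.
  ls -1t "$BACKUP_DIR"/lemmy-*.sql.gz | tail -n +15 | xargs -r rm --
fi
```

And per the point above, pair this with a periodic restore into a scratch database; a backup you've never restored is a hope, not a backup.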
Where the Gaps Are
The main gaps I’ve run into:
- Pictrs storage growth: images from federated content add up fast. Keep an eye on disk usage.
- Postgres tuning: as tables grow, default configs start to fall behind.
- Federation queue visibility: there's no great built-in "at a glance" view; you end up relying on logs.
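On the Postgres point: for a host in the ~22GB RAM range shared with the other services, starting points like these go into `postgresql.conf` (or via `ALTER SYSTEM`). The values are assumptions; run them past pgtune or your own measurements rather than copying blindly:

```
# Illustrative starting points, not a recommendation.
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 16MB
maintenance_work_mem = 512MB
max_connections = 100
```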
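On the pictrs point: a dumb cron-able check is enough to catch disk growth before it bites. The volume path and threshold below are assumptions; adjust them to your compose layout:

```shell
#!/bin/sh
# Warn when the pictrs volume crosses a disk-usage threshold.
# PICTRS_DIR is an assumed path -- point it at your actual volume.
PICTRS_DIR="${PICTRS_DIR:-./volumes/pictrs}"
THRESHOLD_GB="${THRESHOLD_GB:-50}"

if [ -d "$PICTRS_DIR" ]; then
  used_kb=$(du -sk "$PICTRS_DIR" | awk '{print $1}')
  used_gb=$((used_kb / 1024 / 1024))
  if [ "$used_gb" -ge "$THRESHOLD_GB" ]; then
    status="WARNING: pictrs using ${used_gb}GB (>= ${THRESHOLD_GB}GB)"
  else
    status="OK: pictrs using ${used_gb}GB"
  fi
else
  status="SKIP: $PICTRS_DIR not found"
fi
echo "$status"
```

Pipe the WARNING line into whatever alerting you already have (mail, ntfy, etc.).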
My Actual Workflow
Nothing fancy, just consistent habits:
Daily (quick check):
Weekly:
Monthly:
As needed:
What I’d Do Differently
If I were starting over:
TL;DR
Happy to answer specifics if you’re planning a setup—there’s a lot of small gotchas that only show up once you’ve been running things for a while.
Single user instance. I’m not doing anything besides updating the docker images.