from nachitima@lemmy.ml to selfhosted@lemmy.world on 31 Mar 14:11
https://lemmy.ml/post/45272461
I’m considering starting a Lemmy instance with a limited federation model, and one of the things I’m thinking about from the start is how to support and maintain it as it grows, while spending as little attention as possible on the technical side of infrastructure management itself.
Because of that, I’m especially interested in hearing from admins who host Lemmy instances, particularly larger ones. I’d like to understand what your actual workflow looks like in practice: how you organize administration, what methodologies you use, how you handle backups, data recovery, upgrades, monitoring, and infrastructure maintenance in general. I’m also interested in whether there are any best practices or operational patterns that have proven reliable over time.
From what I’ve found so far, the official Lemmy documentation on backup and restore seems reasonably good for small instances, but as the instance grows, more nuances and complications appear. So ideally, I’d like to find or assemble something closer to a real guideline or runbook based on practices that are actually used by admins running larger instances.
If you run or have run a Lemmy instance, especially one that had to scale beyond a small personal or experimental setup, I’d really appreciate hearing about your experience. Even brief notes, links to documentation, internal checklists, or descriptions of what has and hasn’t worked for you would be very useful.
I have heard image pruning didn’t work well and disks filled up pretty fast
This is the thumbnail generation functionality, which creates a local thumbnail for every image post and keeps it around forever. You can disable this by setting `image_mode` to `None` in `lemmy.hjson`, and you'll only store what users actively upload.

I run a modest Lemmy instance (lemmy.blehiscool.com). It's not on the scale of lemmy.world or anything, but it's been around long enough that I've had to deal with some real growth and scaling issues. I'll try to focus on what actually matters in practice rather than theory.
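To pin that down: in recent Lemmy versions this setting sits under the `pictrs` block of `lemmy.hjson` (valid values include "None", "StoreLinkPreviews", and "ProxyAllImages", but check the config reference for your release):

```hjson
{
  pictrs: {
    # "None" stops thumbnail generation/storage for federated posts;
    # only files your own users upload are kept locally.
    image_mode: "None"
  }
}
```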
Infrastructure
I’m running everything via Docker Compose on a single VPS (22GB RAM, 8 vCPU). That includes Postgres, Pictrs, and the Lemmy services.
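For concreteness, a stripped-down sketch of that Compose stack might look like the following. Image tags, paths, and env values here are illustrative assumptions; use the official Lemmy Docker guide for the real thing:

```yaml
# Sketch only -- pin versions and secrets properly in a real deployment.
services:
  postgres:
    image: postgres:16-alpine
    volumes: ["./volumes/postgres:/var/lib/postgresql/data"]
    environment:
      POSTGRES_USER: lemmy
      POSTGRES_PASSWORD: changeme
      POSTGRES_DB: lemmy
  pictrs:
    image: asonix/pictrs:0.5
    volumes: ["./volumes/pictrs:/mnt"]
  lemmy:
    image: dessalines/lemmy:0.19
    volumes: ["./lemmy.hjson:/config/config.hjson:ro"]
    depends_on: [postgres, pictrs]
  lemmy-ui:
    image: dessalines/lemmy-ui:0.19
    depends_on: [lemmy]
```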
This setup is great right up until it suddenly isn’t.
The main scaling issue I hit was federation backlog. At one point, the queue started piling up badly, and the fix was increasing federation worker threads (I’m currently at 128).
If you run into this, check your `lemmy_federate` logs: queue-backlog warnings there are your early warning sign.
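For reference, the worker counts live in `lemmy.hjson`, but the exact key names have shifted between releases (0.19 moved sending into the separate `lemmy_federate` process), so treat this as a sketch and verify against the `defaults.hjson` shipped with your version:

```hjson
{
  # Assumption: key names and placement vary across Lemmy releases.
  # Concurrent outgoing federation workers (the value raised to 128 above).
  worker_count: 128
  # Workers retrying deliveries to slow or dead instances (value is a guess).
  retry_count: 8
}
```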
What Actually Takes Time
Once your infrastructure is stable, the technical side becomes pretty low-effort.
The real time sink is moderation and community management. Easily 90% of the work.
On the technical side, my setup is pretty straightforward:
- `pg_dump` + VPS-level backups

Backups are boring right up until they aren't. Test your restores. Seriously.
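As a starting point, a nightly dump for a dockerized setup can be as simple as the sketch below. The service name, DB/user names, and paths are assumptions to adjust; it defaults to a dry run so you can see what it would do before wiring it into cron with `DRY_RUN=0`:

```shell
#!/bin/sh
# Nightly Postgres dump sketch for a dockerized Lemmy instance.
# Assumptions: compose service is "postgres", DB and user are both "lemmy",
# and BACKUP_DIR already exists.
BACKUP_DIR="${BACKUP_DIR:-/srv/backups/lemmy}"
DRY_RUN="${DRY_RUN:-1}"
STAMP=$(date +%Y%m%d-%H%M%S)
DUMP_FILE="$BACKUP_DIR/lemmy-$STAMP.sql.gz"
DUMP_CMD="docker compose exec -T postgres pg_dump -U lemmy lemmy"

if [ "$DRY_RUN" = "1" ]; then
  # Show the plan without touching the database.
  echo "would run: $DUMP_CMD | gzip > $DUMP_FILE"
else
  $DUMP_CMD | gzip > "$DUMP_FILE"
  # Keep the 14 newest dumps, prune the rest.
  ls -1t "$BACKUP_DIR"/lemmy-*.sql.gz | tail -n +15 | xargs -r rm --
fi
```

And per the point above, pair this with a periodic restore into a scratch database; a backup you've never restored is a hope, not a backup.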
Where the Gaps Are
The main gaps I’ve run into:
- Pictrs storage growth: images from federated content add up fast. Keep an eye on disk usage.
- Postgres tuning: as tables grow, default configs start to fall behind.
- Federation queue visibility: there's no great built-in "at a glance" view; you end up relying on logs.
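On the Postgres point: for a host in the ~22GB RAM range shared with the other services, starting points like these go into `postgresql.conf` (or via `ALTER SYSTEM`). The values are assumptions; run them past pgtune or your own measurements rather than copying blindly:

```
# Illustrative starting points, not a recommendation.
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 16MB
maintenance_work_mem = 512MB
max_connections = 100
```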
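On the pictrs point: a dumb cron-able check is enough to catch disk growth before it bites. The volume path and threshold below are assumptions; adjust them to your compose layout:

```shell
#!/bin/sh
# Warn when the pictrs volume crosses a disk-usage threshold.
# PICTRS_DIR is an assumed path -- point it at your actual volume.
PICTRS_DIR="${PICTRS_DIR:-./volumes/pictrs}"
THRESHOLD_GB="${THRESHOLD_GB:-50}"

if [ -d "$PICTRS_DIR" ]; then
  used_kb=$(du -sk "$PICTRS_DIR" | awk '{print $1}')
  used_gb=$((used_kb / 1024 / 1024))
  if [ "$used_gb" -ge "$THRESHOLD_GB" ]; then
    status="WARNING: pictrs using ${used_gb}GB (>= ${THRESHOLD_GB}GB)"
  else
    status="OK: pictrs using ${used_gb}GB"
  fi
else
  status="SKIP: $PICTRS_DIR not found"
fi
echo "$status"
```

Pipe the WARNING line into whatever alerting you already have (mail, ntfy, etc.).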
My Actual Workflow
Nothing fancy, just consistent habits:
Daily (quick check):
Weekly:
Monthly:
As needed:
What I’d Do Differently
If I were starting over:
TL;DR
Happy to answer specifics if you’re planning a setup—there’s a lot of small gotchas that only show up once you’ve been running things for a while.
Single user instance. I’m not doing anything besides updating the docker images.