In search of a new self-hosted LLM
from tanka@lemmy.ml to selfhosted@lemmy.world on 11 Apr 04:30
https://lemmy.ml/post/45766694

Hey :) For a while now I've been using gpt-oss-20b on my home lab for lightweight coding tasks and some automation. I'm not so up to date with the current self-hosted LLMs, and since the model I'm using was released at the beginning of August 2025 (from an LLM development perspective, that feels like an eternity to me), I just wanted to tap the collective wisdom of Lemmy to maybe replace my model with something better out there.

Edit:

Specs:

GPU: RTX 3060 (12 GB VRAM)

RAM: 64 GB

gpt-oss-20b does not fit into VRAM completely, but it runs partially offloaded and is reasonably fast (fast enough for me).
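Partial offload means only some of the model's transformer layers live on the GPU while the rest run from system RAM. As a rough back-of-the-envelope sketch (all sizes below are illustrative assumptions, not measured values for gpt-oss-20b), you can estimate how many layers fit:

```python
# Rough sketch: estimate how many transformer layers fit in VRAM when
# partially offloading a quantized model (the rest stays in system RAM).
# All numbers here are illustrative assumptions, not measured values.

def layers_on_gpu(vram_gb: float, model_gb: float, n_layers: int,
                  overhead_gb: float = 1.5) -> int:
    """Return how many of n_layers fit in VRAM, reserving overhead_gb
    for the KV cache, CUDA context, and activation buffers."""
    usable = max(vram_gb - overhead_gb, 0.0)
    per_layer = model_gb / n_layers  # assume roughly equal-sized layers
    return min(n_layers, int(usable // per_layer))

# Example: a ~12 GB quantized model with 24 layers on a 12 GB card.
print(layers_on_gpu(vram_gb=12, model_gb=12, n_layers=24))
```

In llama.cpp this number is what you'd pass to the `-ngl` / `--n-gpu-layers` flag; runners like Ollama pick it automatically.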

#selfhosted

threaded - newest

carzian@lemmy.ml on 11 Apr 04:33 next collapse

I’m in the same boat. You’ll get better responses if you post your machine specs.

Gumus@lemmy.dbzer0.com on 11 Apr 04:46 next collapse

I’d say Qwen 3.5 and Gemma 4 beat GPT OSS in every aspect.

tal@lemmy.today on 11 Apr 05:07 next collapse

I’m not on there, but you might have more luck in !localllama@sh.itjust.works

You might also want to list the hardware that you plan to use, since that’ll constrain what you can reasonably run.

theunknownmuncher@lemmy.world on 11 Apr 05:16 next collapse

How much VRAM?

Jozzo@lemmy.world on 11 Apr 05:17 next collapse

I find Qwen3.5 is the best at tool calling and agent use; otherwise Gemma4 is a very solid all-rounder and should be the first you try. Tbh, gpt-oss is still good to this day. Are you running into any problems with it?

tanka@lemmy.ml on 11 Apr 07:23 collapse

No problems per se. I just realized I hadn’t checked for anything newer in a while.

cron@feddit.org on 11 Apr 05:19 next collapse

The latest open-weights model from Google might be a good fit for you. The 26B model works pretty well on my machine, though performance isn’t great (6 tokens per second, CPU-only).

ejs@piefed.social on 11 Apr 07:03 next collapse

I suggest looking at LLM arena leaderboards filtered by open-weight models. They offer complete, statistically detailed benchmarks and are usually quite up to date when new models come out. The new Gemma that just came out might be the best for a single GPU, and if you have a lot of VRAM, check out the larger Chinese models.

sompreno@lemmy.zip on 11 Apr 07:23 next collapse

What are your computer specs?

tanka@lemmy.ml on 11 Apr 07:25 collapse

I did just update my post with the specs. Maybe it takes a while to federate?

sompreno@lemmy.zip on 11 Apr 09:34 collapse

I must not have refreshed; ignore my comment.

jaschen306@sh.itjust.works on 11 Apr 10:55 next collapse

I’m running Gemma4 26B MoE for most of my agent calls. I use glm5:cloud for my development agent because the 26B struggles when the context window gets too big.

Kirk@startrek.website on 11 Apr 12:14 next collapse

Just curious, what does “some automation” entail? I thought LLMs could only work with text, like summarize documents and that sort of thing.

a1studmuffin@aussie.zone on 11 Apr 13:03 next collapse

These days they can also chain together tools, keep a working memory etc. Look at Claude Code if you’re curious. It’s come very far very quickly in the last 12 months.

Kirk@startrek.website on 11 Apr 16:08 collapse

OP said coding AND “some automation”, what is being automated?

SuspciousCarrot78@lemmy.world on 11 Apr 16:56 collapse

Some examples

  • Tell Home Assistant to adjust lights/thermostat/locks in plain English based on certain conditions being met
  • Ask Jellyfin/Plex to play something based on a vague description like "something like Interstellar but lighter"
  • Morning briefing that pulls calendar, weather, emails and traffic into a 60-second summary automatically. Or get it to read it to you out loud while you shave.
  • Schedule the robot mower or vacuum based on weather forecast via API
  • Fetch information for you off net at set intervals and update you (email, SMS etc)
  • CCTV uses (classification etc)
  • Batch rename files, sort downloads, resize images - stuff you’d normally write a one-off script for
  • Parse a booking reply email, confirm the time, add it to your calendar, set reminders
  • Tag and name your own pictures based on metadata

That’s probably just the basics. People have some clever uses for these things. It’s not just “summarize this document.”
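Most of the examples above follow the same pattern: the model is given a list of tool definitions and, instead of plain text, replies with a structured call that your code executes. A minimal sketch of that dispatch loop (the tool name and the fake model reply here are hypothetical; a real setup would get the JSON from a local model's tool-calling API, e.g. via Ollama):

```python
# Minimal sketch of the tool-call loop behind automations like these.
# The tool and the canned model reply are hypothetical stand-ins.
import json

def set_thermostat(temp_c: float) -> str:
    # Stand-in for a real Home Assistant service call.
    return f"thermostat set to {temp_c}C"

TOOLS = {"set_thermostat": set_thermostat}

def dispatch(model_reply: str) -> str:
    """Parse a JSON tool call emitted by the model and run it."""
    call = json.loads(model_reply)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Pretend the model answered "make it 21 degrees" with this tool call:
reply = '{"name": "set_thermostat", "arguments": {"temp_c": 21}}'
print(dispatch(reply))
```

The glue layer (Home Assistant, n8n, a cron script) just loops: send the user request plus tool schemas to the model, run whatever call comes back, and feed the result back in.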

Kirk@startrek.website on 11 Apr 17:47 collapse

That’s cool, it just… does those things? How does it connect to those apps? I can’t even get Gemini to set a reminder and that’s on a Google device.

nutbutter@discuss.tchncs.de on 11 Apr 16:29 next collapse

Have you tried the new Gemma4 models? The E4B fits in the 12 GB of memory and is pretty good. Or you can use the 31B too, if you’re okay with offloading to CPU.

SuspciousCarrot78@lemmy.world on 11 Apr 16:43 next collapse

What sort of coding and what sort of automation tasks? The latter is an easier ticket to fill than the former, though I might have an idea for you on that end if coding is a must

iceberg314@slrpnk.net on 11 Apr 17:30 collapse

I also recommend gemma4 or qwen3.5. Both super solid in my experience for how lightweight they are