Raidz2 or btrfs for important document storage?
from CorrectAlias@piefed.blahaj.zone to selfhosted@lemmy.world on 25 May 09:41
https://piefed.blahaj.zone/c/selfhosted/p/781372/raidz2-or-btrfs-for-important-document-storage

I currently have a secondary pool (with raidz2) that I was originally going to use for my important documents, such as storage for Paperless-ngx, as raidz offers corruption detection and repair. The pool is encrypted.

However, I’m concerned about rebuild times (it’s a pool of four 22 TB drives). Is btrfs a better choice for this use case, or should I just go with raidz like I originally planned?
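
For a rough sense of scale, the rebuild-time worry can be put into numbers with a back-of-the-envelope sketch. The 200 MB/s sustained throughput and the fill levels are assumptions for illustration, not measurements of this pool. ZFS resilvers only allocated data, so a partly filled pool should finish sooner than the full-drive figure; real resilvers on a busy or fragmented pool can take considerably longer.

```python
def rebuild_hours(drive_tb: float, fill_fraction: float, mb_per_s: float) -> float:
    """Lower-bound hours to rewrite the used portion of one replaced drive."""
    bytes_to_copy = drive_tb * 1e12 * fill_fraction
    seconds = bytes_to_copy / (mb_per_s * 1e6)
    return seconds / 3600


ASSUMED_MB_PER_S = 200.0  # assumption: sustained sequential throughput of one drive

for fill in (0.25, 0.5, 1.0):
    hours = rebuild_hours(22, fill, ASSUMED_MB_PER_S)
    print(f"{fill:.0%} full: roughly {hours:.1f} h at best")
```

Treat the result as a floor rather than a prediction: it ignores other I/O, fragmentation, and parity reconstruction overhead.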

#selfhosted


exu@feditown.com on 25 May 09:47

Maybe you could switch to a raid10 layout (striped mirror vdevs) for faster rebuild times.
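
To make the trade-off concrete, here is a small sketch comparing the two layouts for four 22 TB drives. It ignores filesystem overhead, and the mirror pairing (drives 0+1 and 2+3) is an assumption for illustration. Usable space comes out the same; what differs is which two-drive failures the pool survives.

```python
from itertools import combinations

DRIVES = 4
DRIVE_TB = 22
MIRROR_PAIRS = [(0, 1), (2, 3)]  # assumed pairing of drives into two mirror vdevs


def raidz2_survives(failed: set) -> bool:
    # raidz2 keeps two drives' worth of parity, so any two failures are survivable.
    return len(failed) <= 2


def striped_mirrors_survive(failed: set) -> bool:
    # The pool is lost only when both members of the same mirror fail.
    return not any(a in failed and b in failed for a, b in MIRROR_PAIRS)


def surviving_double_failures(survives) -> int:
    # Count how many of the possible two-drive failures leave the pool intact.
    return sum(survives(set(pair)) for pair in combinations(range(DRIVES), 2))


raidz2_usable_tb = (DRIVES - 2) * DRIVE_TB
mirrors_usable_tb = len(MIRROR_PAIRS) * DRIVE_TB
total_pairs = len(list(combinations(range(DRIVES), 2)))

print(f"raidz2:  {raidz2_usable_tb} TB usable, survives "
      f"{surviving_double_failures(raidz2_survives)}/{total_pairs} double failures")
print(f"mirrors: {mirrors_usable_tb} TB usable, survives "
      f"{surviving_double_failures(striped_mirrors_survive)}/{total_pairs} double failures")
```

So with four drives the mirror layout trades some double-failure tolerance for a rebuild that only has to copy from the surviving half of one mirror.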

Btrfs’s RAID implementation is fairly similar to ZFS’s, though its raid5 and raid6 modes come with some caveats.

felbane@lemmy.world on 25 May 11:04

I would absolutely not trust BTRFS’s implementation. Maybe things are better now but it earned the backronym Bro The RAID Fuckin Sucks for a reason.

tal@lemmy.today on 25 May 10:03

I was originally going to use for my important documents

Not quite what you’re asking, but if your concern is avoiding data loss and you don’t already have one, I’d set up a backup before setting up RAID or anything similar.

i078@europe.pub on 25 May 10:47

While redundancy in a drive setup helps, it’s not really a backup and thus not a “safe” way to store important information on its own.

That said, how you set up a RAID system depends on risk and utility. I have a raid1 with a hot spare for important files, and use raid5 with 3 and 4 drives for less important stuff. You can also optimise for read speed, for example (as the same file can be read from multiple drives).

chris@l.roofo.cc on 25 May 10:54

Like you said: RAID is not a backup. If it’s important, follow at least the 3-2-1 rule: 3 copies on at least 2 different media, 1 of them off site.
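
The rule is simple enough to write down as a check. This is only a sketch: the `Copy` type and the example entries are hypothetical, made up to illustrate the rule rather than taken from anyone's actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    label: str     # hypothetical description of where the copy lives
    medium: str    # e.g. "hdd", "cloud", "tape"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list) -> bool:
    """3 copies, on at least 2 different media, at least 1 of them off site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical example plans
raid_only = [Copy("raidz2 pool", "hdd", offsite=False)]
with_backups = [
    Copy("raidz2 pool", "hdd", offsite=False),
    Copy("second local pool", "hdd", offsite=False),
    Copy("remote object storage", "cloud", offsite=True),
]

print("RAID only:", satisfies_3_2_1(raid_only))
print("With backups:", satisfies_3_2_1(with_backups))
```

Note that a RAID pool counts as one copy no matter how many drives are in it.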

felbane@lemmy.world on 25 May 11:01

RAID is not a backup.

RAID is not for data safety.

RAID is for:

  1. Ensuring availability of data in the face of hardware failure. That means your files don’t disappear when a drive dies and you have some time to swap out for functional hardware and restore redundancy.
  2. Presenting multiple drives as one larger unit. This is what striping does, and to a lesser extent the parity-mode levels.
  3. Improving performance (sometimes). A RAID mirror is generally much faster to read from than any individual drive because reads can be interleaved across drive members. A stripe can be much faster because writes are distributed across drive members. This is less of a bonus today with solid state/nvme drives, but it’s still applicable to spinning rust.

If your concern is protecting your data, set up a 3-2-1 backup strategy.

Decronym@lemmy.decronym.xyz on 25 May 11:10

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

Fewer Letters   More Letters
HTTP            Hypertext Transfer Protocol, the Web
IP              Internet Protocol
RAID            Redundant Array of Independent Disks for mass storage
SSL             Secure Sockets Layer, for transparent encryption
TLS             Transport Layer Security, supersedes SSL
VPN             Virtual Private Network
ZFS             Solaris/Linux filesystem focusing on data integrity
nginx           Popular HTTP server

6 acronyms in this thread; the most compressed thread commented on today has 15 acronyms.

[Thread #314 for this comm, first seen 25th May 2026, 11:10] [FAQ] [Full list] [Contact] [Source code]

Blue_Morpho@lemmy.world on 25 May 13:00

Why was this upvoted? It’s AI slop giving definitions for acronyms that aren’t in this thread and not even related to backups.

Zeoic@lemmy.world on 25 May 13:27

It isn’t AI; you can look at its source code via the URL it provides. Obviously the detection needs some tweaking, but extra acronyms in the list don’t really hurt anything when the other half are relevant.

Blue_Morpho@lemmy.world on 25 May 16:18

Detection is completely broken because it finds terms that aren’t anywhere in the thread, even as substrings.

AI isn’t just LLM.

tychosmoose@piefed.social on 25 May 11:18

For your situation I would be more likely to go with a single drive with btrfs and dup for metadata redundancy. Regular snapshots and scrubs.

Use a second drive in the same system with btrfs to store snapshots at wider scheduled intervals. These will be bigger, since there is no CoW sharing with the separate file system. Schedule scrubs here too.

Use a third drive with ext4 as a backup target using a separate backup mechanism.

Use the fourth drive as a spare, or in a separate location as a target to send the backups if you don’t already have an off-site solution.

xrun_detected@programming.dev on 25 May 12:42

I’d stay on zfs, I simply don’t trust btrfs’ raid implementation. For very important documents I also set copies=2 (or 3) on that dataset, just in case.
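
For reference, a minimal sketch of setting that property from Python. The dataset name `tank/documents` is hypothetical, and the script only builds and prints the command; actually running it needs a real pool and suitable privileges. Keep in mind that `copies` only affects data written after the property is set, and that extra copies on the same pool do not replace a backup.

```python
def zfs_set_copies_cmd(dataset: str, copies: int) -> list:
    """Build the argument list for `zfs set copies=N <dataset>`."""
    if copies not in (1, 2, 3):
        raise ValueError("copies must be 1, 2, or 3")
    return ["zfs", "set", f"copies={copies}", dataset]


# "tank/documents" is a made-up dataset name for illustration.
cmd = zfs_set_copies_cmd("tank/documents", 2)
print(" ".join(cmd))

# To apply it on a real system, something like:
#   import subprocess
#   subprocess.run(cmd, check=True)
```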

And as others already said: 3-2-1 backups ;)

oats@piefed.zip on 25 May 12:52

Filesystem doesn’t really matter once you have a reliable, redundant off-site backup and recovery plan set up and tested.

Really, use what fs feels best for you. And do your backups.

Did I mention backups are important?

MangoPenguin@piefed.social on 25 May 14:27

What about 2 mirrored pools of 2 drives each, backing up the main pool to the other with either ZFS snapshots or a tool like Restic?

Ideally you also need an offsite backup of important files, but that gets you part way to a robust system that can handle corruption or accidental deletions.