Zpool scrub taking days? And HDD issues... Am I cooked?
from Imaginary_Stand4909@lemmy.blahaj.zone to selfhosted@lemmy.world on 06 May 06:03
https://lemmy.blahaj.zone/post/42307518

So I was trying to download a torrent (while seeding about 5 others) when I noticed my rates kept gradually falling to 0 B/s upload/download, spiking back up to 1-2 MB/s, and then falling again. I checked the SMART results for my drives in Proxmox, and one disk showed as degraded. When I tried to view the overall “Disks” tab in Proxmox, it just timed out and showed an error: [communication failure (0)]

So I ran zpool scrub tank_name, which started Monday May 4 22:02:21 2026…

While scrubbing, the checksum errors on the online, repairing disk (wwn-0x5000c5004d033fc1) just kept climbing… I took the degraded disk offline. Here’s the current output of zpool status tank_name:

root@nova:~# zpool status Orico2tera4
  pool: Orico2tera4
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon May  4 22:02:21 2026
        3.53G / 378G scanned at 36.9K/s, 3.47G / 378G issued at 36.3K/s
        9.61M repaired, 0.92% done, no estimated completion time
config:

        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       DEGRADED     0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0    13     1
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0     3  (repairing)
          mirror-1                                        DEGRADED     0     1     0
            wwn-0x5000c500357c0b91                        OFFLINE      0     0    21
            wwn-0x5000c5004d033fc1                        ONLINE       0     1 2.00K  (repairing)

errors: 49 data errors, use '-v' for a list
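
For a rough sense of how long this scrub would take, here is a minimal Python sketch that estimates the remaining time from the "issued" figures in the scan line above. It assumes zpool's K/M/G suffixes are binary (1024-based) and that the reported rate stays constant, which it almost certainly won't on a struggling pool. The helper names `to_bytes` and `scrub_days_remaining` are made up for illustration.

```python
import re

# Assumption: zpool size suffixes are binary multiples (K = 1024 bytes, etc.).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text: str) -> float:
    """Convert a zpool-style size such as '3.47G' or '36.3K' to bytes."""
    match = re.fullmatch(r"([\d.]+)([BKMGT])", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def scrub_days_remaining(scan_line: str) -> float:
    """Estimate days left from the 'issued' figures of a scrub progress line."""
    match = re.search(
        r"([\d.]+[BKMGT]) / ([\d.]+[BKMGT]) issued at ([\d.]+[BKMGT])/s",
        scan_line,
    )
    if match is None:
        raise ValueError("no 'issued' figures found in the line")
    issued, total, rate = (to_bytes(group) for group in match.groups())
    # Remaining bytes divided by bytes per second, converted to days.
    return (total - issued) / rate / 86400


# Progress line copied from the zpool status output above.
line = "3.53G / 378G scanned at 36.9K/s, 3.47G / 378G issued at 36.3K/s"
print(f"{scrub_days_remaining(line):.0f} days remaining at the current rate")
```

At 36.3K/s the remaining ~375G works out to roughly 125 days, so a rate this low points at a hardware or connection problem rather than a scrub that is merely slow.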

I haven’t used these disks for very long: my homelab has only been in real use for about 5 months, and I wasn’t torrenting constantly until February. The disks are refurbished, 2TB each, and they sit in a USB-connected drive bay. My usage is pretty low, just 432.80 GB of 4TB (11.13%).

I’ve looked at my snapshots with zfs list -t snapshot, but I’m not sure when I should try to restore from a snapshot, and I’ve never done it before. I’ll make sure to take backups more seriously from now on, don’t be me…

#selfhosted

androidul@lemmy.world on 06 May 06:21

yikes!

How often were you running scrub & trim?

mlfh@lm.mlfh.org on 06 May 07:19

You have enough failures on each disk to make me suspect an issue with the USB-connected drive bay. I ran into similar issues with a cheap PCIe SATA adapter, where little hiccups and latency in the communication layer would cause ZFS to take disks offline randomly. Read, write, and checksum errors would slowly accumulate across all of the disks. I switched that machine to a proper enterprise HBA, the issues vanished, and the disks are all still healthy 3-4 years later.
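
To make the "failures on each disk" point concrete, here is a small Python sketch that reads the config table from the post and counts how many leaf disks report any read, write, or checksum errors. It is only an illustration: it assumes all disks sit at the same indentation depth (true for this two-mirror layout) and that a counter suffix like K means a multiple of 1024. The names `parse_count` and `leaf_errors` are hypothetical.

```python
# Config table copied from the zpool status output in the post.
STATUS = """\
        NAME                                              STATE     READ WRITE CKSUM
        Orico2tera4                                       DEGRADED     0     0     0
          mirror-0                                        ONLINE       0     0     0
            ata-ST2000NM0011_Z1P2D6SC                     ONLINE       0    13     1
            usb-External_USB3.0_DISK01_20170331000C3-0:1  ONLINE       0     0     3  (repairing)
          mirror-1                                        DEGRADED     0     1     0
            wwn-0x5000c500357c0b91                        OFFLINE      0     0    21
            wwn-0x5000c5004d033fc1                        ONLINE       0     1 2.00K  (repairing)
"""


def parse_count(text: str) -> int:
    """Convert a counter such as '13' or '2.00K' to an integer."""
    # Assumption: abbreviated counters use 1024-based suffixes.
    multipliers = {"K": 1024, "M": 1024**2, "G": 1024**3}
    if text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def leaf_errors(status: str) -> dict:
    """Return {device: (read, write, cksum)} for the most deeply indented rows."""
    rows = [row for row in status.splitlines()[1:] if row.strip()]
    depths = [len(row) - len(row.lstrip()) for row in rows]
    leaf_depth = max(depths)  # disks are nested below the pool and mirror rows
    result = {}
    for row, depth in zip(rows, depths):
        if depth == leaf_depth:
            fields = row.split()
            result[fields[0]] = tuple(parse_count(f) for f in fields[2:5])
    return result


errors = leaf_errors(STATUS)
affected = [name for name, counts in errors.items() if any(counts)]
print(f"{len(affected)} of {len(errors)} disks report errors")
```

When every disk in the pool shows errors at once, a shared component such as the enclosure, cable, or controller is a more likely culprit than several drives failing independently.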

xep@discuss.online on 06 May 07:52

As another poster mentioned, I’d suspect the drive bay. Those things aren’t known for being reliable.