Recertified HDD experience (happy ending for now)
from eskuero@lemmy.fromshado.ws to homelab@lemmy.ml on 07 Jan 00:55
https://lemmy.fromshado.ws/post/318619

There is only one small question at the end, but I figured writing down my experience might be useful for anyone planning to do the same.

I grabbed two 20TB Seagate Exos (ST20000NM002C) as Manufacturer Recertified for 260€ each (they are now listed at 290€ lol)

My plan was to use one in my homelab to replace the current 8TB drive, which was already at 60% capacity, since I planned to give new family members access to it. The other one would go into a NAS at a separate location to mirror the Jellyfin media and back up the Immich library.

They arrived quickly, well packaged, no apparent damage, no issues so far.

I plugged in the first drive. SMART data showed 280 hours of power on and one “short offline selftest” at 232 hours without errors. I just accepted that it had probably been reset by Seagate and that I would never know the true numbers.

So I started an extended test, which claimed it would take roughly 40 hours to finish. While it was running I formatted the new drive and periodically ran rsync to copy the data from the existing 8TB drive and get ahead on the migration. The long test finished without any reported issues, so I was a bit relieved. I shut down the services, ran one more sync, replaced the mountpoint with the new drive and rebooted. Everything worked without hiccups.

I plugged the second drive into the same machine to do the first sync locally before moving it to the offsite NAS. This one’s SMART data showed 70 hours of power on. No tests logged. I started an extended test. This time it claimed it would only take 26 hours. What? The drives are theoretically identical. Is this something hardcoded/configured differently in the firmware? Or is the drive just faster? For the first time I had the idea of running a quick benchmark.
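For a rough sanity check, the two estimates can be turned into an implied average read speed. This is only a sketch. It assumes the extended test reads the full 20 TB (decimal units) exactly once and that the reported duration is a real estimate rather than a fixed value baked into the firmware, neither of which is confirmed here. The function name is made up for illustration.

```python
def implied_speed_mb_s(capacity_tb: float, hours: float) -> float:
    """Average MB/s needed to read capacity_tb (decimal TB) once in `hours`."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    seconds = hours * 3600
    return capacity_mb / seconds


# Durations are the two extended-test estimates quoted in the post.
for label, hours in (("first drive, 40 h", 40), ("second drive, 26 h", 26)):
    print(f"{label}: {implied_speed_mb_s(20, hours):.0f} MB/s")
```

Under those assumptions the estimates work out to about 139 MB/s and 214 MB/s averaged over the whole surface. An average below the peak sequential speed is expected, since the inner tracks of a spinning disk read slower than the outer ones.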

# hdparm -t --direct /dev/disk/by-id/ata-ST20000NM002C-3X6103_******** /dev/disk/by-id/ata-ST20000NM002C-3X6103_********E /dev/disk/by-id/ata-ST8000VN002-2ZM188_********

/dev/disk/by-id/ata-ST20000NM002C-3X6103_********:
 Timing O_DIRECT disk reads: 786 MB in  3.01 seconds = 261.46 MB/sec

/dev/disk/by-id/ata-ST20000NM002C-3X6103_********:
 Timing O_DIRECT disk reads: 606 MB in  3.01 seconds = 201.34 MB/sec

/dev/disk/by-id/ata-ST8000VN002-2ZM188_********:
 Timing O_DIRECT disk reads: 620 MB in  3.01 seconds = 206.24 MB/sec

The first 20TB drive turns out to be the slowest of all at 201MB/s, slightly behind the existing 8TB at 206MB/s, but insanely behind the supposedly identical second 20TB drive at 261MB/s.
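To put numbers on the gap, here is a small sketch that compares the hdparm figures above and sets them next to the ratio of the two extended-test estimates (40 h vs 26 h). The helper name is hypothetical, and keep in mind that each hdparm figure is a single 3-second sample of sequential reads, so it is a rough indicator at best.

```python
def percent_faster(fast: float, slow: float) -> float:
    """How much faster `fast` is than `slow`, as a percentage."""
    return (fast / slow - 1) * 100


# MB/s values copied from the hdparm output in the post.
second_20tb = 261.46
first_20tb = 201.34
old_8tb = 206.24

print(f"second 20TB vs first 20TB: {percent_faster(second_20tb, first_20tb):.1f}% faster")
print(f"old 8TB vs first 20TB:     {percent_faster(old_8tb, first_20tb):.1f}% faster")

# Ratio of the extended-test estimates: a 40 h test vs a 26 h test.
print(f"test duration ratio:       {percent_faster(40, 26):.1f}% longer")
```

The benchmark puts the second drive roughly 30% ahead of the first, while the test estimates differ by roughly 54%, so the two measurements point the same way but do not agree on the size of the gap.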

Wait, I’m an idiot, of course the first drive must be slower. It’s actually mounted and might be in use. So I stopped all the services that read from it. The benchmark stayed consistent.

The only remaining difference I can control is the SATA port/cable the second drive is connected to. I don’t actually expect an issue with any of them, but the second 20TB drive is connected to an HDD bay built into the case because I didn’t want to bother removing the screws again just to temporarily put it on an internal tray. This bay doesn’t have a cooling fan, so while the first 20TB drive and the existing 8TB one are stable at around 30 degrees, the second one on the outside is hitting 45 degrees. Does running it warmer make it go faster?

So my only educated guess now is that the second drive is faster because it was recertified with barely any usage hours (maybe it lived as a spare?) and was kept on stock firmware, while the first one was recertified after already running for thousands of hours and they kneecapped the firmware to safer values to make sure it gets through the warranty period without dying.

They seem to be good drives, so I can’t really complain. It could have been worse, since apparently I got lucky and landed an unused one.

Or maybe it’s the opposite and I’m lucky that I at least got a battle-tested one that will run for years at safer speeds.

Or maybe I’m entirely unlucky because I got a coin toss of one unused drive and one with so much usage that it’s slower not because of firmware limits but because it’s running on fumes and will die in two months.

Thanks for listening to my TED talk.

#homelab
