Ah, I see you have 990 Pros, so that would rule out an internal buffering issue. What is the temperature of the target SSD when this happens? Also, I assume it is not nearly full?
Which release of Windows 11? Some versions did have issues with copying. I haven’t seen this kind of issue myself, and I make large copies.
Hi all....thanks for your ideas and suggestions! I finally had a few hours this weekend to chase this mystery issue. Unfortunately, I still haven't figured it out.
In case it helps, here are a few additional data points:
I'm using Win 11 Pro for Workstations 23H2 and it's patched monthly
I'm not using BitLocker, so there's no reason why the CPU would spike so much
I've optimized (retrimmed) all drives and that didn't change anything
I haven't been able to figure out what triggers the change from fast performance mode to slow performance mode
I've watched the drive temps and haven't seen any high temps that would indicate thermal throttling
Most other CPU tools (e.g., Sysinternals Process Explorer) do not show the CPU spike that Task Manager shows, which is really weird
The Resource Monitor graph shows the CPU spike but none of the listed processes shows anything over 1%
Task Manager's "Processes" tab sometimes spreads the spike across a bunch of random processes and sometimes attributes all of it to "System"
All drives are internal drives (no external ones)
Once the slow performance starts:
It continues until the next reboot
It also impacts SATA SSDs on the same system (also limited to around 150 MB/s during slow mode)
It doesn't matter if I'm copying a file to the same drive or another drive
The NVMe drives do not appear to be the bottleneck as they show active times around 4-5%
Copying files I've recently copied (likely in cache) and those that I haven't touched lately (likely not in cache) see the same performance
I don't believe MS copy algorithms or caching are so bad that 150 MB/s is the max they can push once the problem starts. SSDs capable of 500 MB/s have been around for what, a decade now?
It also doesn't make sense that I can see 4-5 GB/s reliably for dozens of large file copies for a day or two and then the system suddenly can't hold those speeds for more than a second or two before degrading to 150 MB/s.
Also keep in mind that this system has 256 GB of memory and rarely has more than 20% in use. I assume the Windows cache is in memory and memory is much faster than even NVMe SSDs.
This system does have VMware Workstation on it so my best guess is that the virtualization stack is somehow getting involved. I usually have 1 very lightweight VM running on this box. However, I've tried a bunch of tests with/without VMs running and haven't been able to find a reproducible scenario.
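For anyone who wants to pin down exactly when the fast-to-slow transition happens, here is a minimal sketch of a copy loop that logs throughput per interval. It is only an approximation: it uses plain Python buffered I/O rather than whatever code path Explorer uses, and the function names, the 8 MB chunk size, and the 200 MB/s threshold are all assumptions made for illustration.

```python
import os
import time


def copy_with_throughput_log(src, dst, chunk_size=8 * 1024 * 1024, interval=1.0):
    """Copy src to dst in chunks and return (elapsed_seconds, MB_per_s) samples.

    Hypothetical helper: one sample is recorded roughly every `interval` seconds.
    """
    samples = []
    start = time.perf_counter()
    window_start = start
    window_bytes = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            window_bytes += len(chunk)
            now = time.perf_counter()
            if now - window_start >= interval:
                elapsed = max(now - window_start, 1e-9)  # guard against a zero-length window
                samples.append((now - start, window_bytes / elapsed / 1e6))
                window_start, window_bytes = now, 0
        # Push buffered data to the device so the last sample is not purely cache speed.
        fout.flush()
        os.fsync(fout.fileno())
    if window_bytes:
        # Record the final, partial window.
        now = time.perf_counter()
        elapsed = max(now - window_start, 1e-9)
        samples.append((now - start, window_bytes / elapsed / 1e6))
    return samples


def first_slow_sample(samples, threshold_mb_s=200.0):
    """Return the first (elapsed_seconds, MB_per_s) sample below the threshold, or None."""
    for elapsed, rate in samples:
        if rate < threshold_mb_s:
            return elapsed, rate
    return None
```

Running this periodically against a large test file and saving the samples with a timestamp would at least show whether the drop to roughly 150 MB/s lines up with something else on the system, such as a VM starting or a scheduled task.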
I created an account here to say that I'm struggling with this exact issue on multiple systems. In most cases, I'm lucky to get 40-50 MB/s when the problem starts, so I believe this is a bug and not some sort of caching issue.
I also want to chime in with a bit of an extreme data point. One of the systems I use is an AI workstation with an AMD Epyc 9654 (96 cores/192 threads) with 768 GB of RAM. It's running Windows 11 Pro for Workstations build 22631.4602. Below is a screenshot showing almost no CPU usage while the file copy is performing well. Then CPU usage spikes ACROSS ALL 192 threads and copy performance tanks to 50 MB/s. You can't see it well in the graph, but all of the CPU time is 'kernel' time.
This particular copy is going from/to the same internal NVMe drive. It's a Corsair MP600 Pro LPX 8TB. HWiNFO reports temps of 52-53C. I get similar behavior when I write files to that computer over a 10Gb network, but not when I read files from that computer to a network system. I also have this problem when copying from the NVMe drive to a USB drive (also 40-50 MB/s copy speed).
I also see very similar behavior on an AMD 7950X platform using a WDC SN850X NVMe drive on Windows 11 Enterprise build 22631.4602 (same build as above).
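Since Task Manager and other tools seem to disagree about the spike, one way to cross-check the kernel-time observation is to read the system-wide counters directly. The sketch below calls the Win32 `GetSystemTimes` function through `ctypes` and splits a sampling interval into kernel, user, and idle shares. The helper names are made up for illustration, the API call only works on Windows, and I have not verified how it behaves on machines with more than 64 logical processors, so treat the numbers as a rough sanity check.

```python
import ctypes
import time


class FILETIME(ctypes.Structure):
    # Layout of the Win32 FILETIME struct: a 64-bit count of 100 ns ticks in two halves.
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]


def read_system_times():
    """Return (idle, kernel, user) tick counts from GetSystemTimes. Windows only."""
    idle, kernel, user = FILETIME(), FILETIME(), FILETIME()
    ok = ctypes.windll.kernel32.GetSystemTimes(
        ctypes.byref(idle), ctypes.byref(kernel), ctypes.byref(user))
    if not ok:
        raise ctypes.WinError()

    def to_int(ft):
        return (ft.dwHighDateTime << 32) | ft.dwLowDateTime

    return to_int(idle), to_int(kernel), to_int(user)


def cpu_shares(before, after):
    """Split the interval between two (idle, kernel, user) snapshots into percentages.

    GetSystemTimes reports kernel time *including* idle time, so idle is
    subtracted to get the time actually spent executing kernel code.
    """
    idle = after[0] - before[0]
    kernel = after[1] - before[1]
    user = after[2] - before[2]
    total = kernel + user
    if total <= 0:
        return {"kernel_pct": 0.0, "user_pct": 0.0, "idle_pct": 0.0}
    return {
        "kernel_pct": 100.0 * (kernel - idle) / total,
        "user_pct": 100.0 * user / total,
        "idle_pct": 100.0 * idle / total,
    }


def sample(interval=1.0):
    """Hypothetical convenience wrapper: measure shares over `interval` seconds."""
    before = read_system_times()
    time.sleep(interval)
    return cpu_shares(before, read_system_times())
```

Calling `sample()` in a loop while a copy is running should show `kernel_pct` jump when the slowdown starts, if the spike really is all kernel time as the graph suggests.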