I am starting to think that, while loading it all into RAM obviously has tremendous benefits (and I will continue to use this method since I have so much of it), there is some other underlying issue, as they mentioned above. It could be my PCIe controller card, its driver, and the additional latency that path adds for the CPU compared with the RAM bus.
There are so many aspects to the entire chain between DAW, slave, drives, etc. that I feel I have somehow luckily hit an amazing sweet spot for my system that I have never hit before. I already have enterprise-grade Samsung NVMe drives, so the drives themselves are not the issue. I have 4 of them and they all perform at the same level.
Not to mention that, while I do have 64 physical cores in my slave, they run at a slower clock speed (and everyone seems to agree that higher clock speeds per core are better), so the additional speed boost from loading everything into RAM may be helping my slower cores (just a theory).
I am super happy right now with this performance! In 4-5 years of using it, I have never had such incredible performance (I had never tried a full RAM load before)!