# NVMe vs SATA: Will it make Kontakt faster?



## tack

On the subject of the benefit of NVMe over a modest SATA SSD, specifically as it relates to Kontakt, I've commented here and elsewhere that:



> At least with Kontakt's compressed samples, NVMe is completely wasted. Decent SATA SSDs are too for that matter: I bottleneck my CPU decompressing the samples as they're being loaded in long before I bottleneck my storage.



Having now investigated this a bit more, I need to moderate my position. The truth is it _can_ help, depending on what you're measuring.

There are of course a lot of aspects to Kontakt performance (another big one being DFD streaming with various preload buffers), but I wanted to answer the one I personally found most annoying: can an NVMe make my project load times faster?

So I've done a bit of benchmarking and technical analysis (you know, for fun) trying to suss out what's going on (as best as I can given Kontakt's a black box).

https://docs.google.com/document/d/1wL8XYGgd_O9fomMrK1EpSnZJeQwhVOAn91e82byj8s4/edit?usp=sharing


The doc is a bit eye-glazing so I'll just cut to the chase:

- Where "project load" is measured as "I can start using my DAW now," the answer is no.
- Where "project load" is measured as "all samples are fully loaded into memory and the system is idle," the answer is yes. Unless you run everything purged.


Note that this is all quite specific to Kontakt. The ability to fully take advantage of system resources depends quite a lot on software design, so these findings aren't going to transfer to other VIs (e.g. UVI, Omnisphere, etc.).

I've posed some questions in the doc that I'd be keen to hear ideas on. Notably the fact that disk utilization % doesn't seem to map predictably onto actual disk usage metrics (throughput and IOPS) relative to the benchmarked capacity at equivalent block sizes. I proposed a possible explanation as to why that might be (the section titled "Kontakt's File Access Pathology") but I'm a bit skeptical that's the real explanation.


Cheers!


----------



## babylonwaves

hey @tack -

interesting, thanks for sharing. some time ago I compared loading times on a Mac Pro between an 840 EVO and a RAID 0 of four EVOs. in the end, I didn't see any difference that would justify creating a RAID to make a Kontakt instrument load faster - when you already have an SSD. from what I understand that was different for you? which version of Kontakt did you test with?


----------



## EvilDragon

babylonwaves said:


> which version of kontakt did you test with?



Somebody didn't read the document all the way through.


----------



## babylonwaves

EvilDragon said:


> Somebody didn't read the document all the way through.


thanks. for those who don't scroll to the last page as i did: 5.7.3


----------



## jamwerks

It'll be interesting to see how Spitfire's new sampler performs/compares. We might be using that more than Kontakt in the future!


----------



## tack

babylonwaves said:


> in the end, I didn't see any difference that would justify creating a RAID to make a Kontakt instrument load faster - when you already have an SSD. from what I understand that was different for you?


Not really that different for me as I run everything purged. In this case all my painful wait times are entirely CPU bound and NVMe was no help. But when it comes to loading those samples into memory, NVMe did much better. (Though the data suggests Kontakt could do better on slower SSDs if it made a bit more effort.)



jamwerks said:


> It'll be interesting to see how Spitfire's new sampler performs/compares. We might be using that more than Kontakt in the future!


Yeah, I'm interested as well. I'd be even more interested if they were going to open it up to third parties a la Kontakt. And I'd be _ecstatic_ if Spitfire decided that building a sampler isn't their core business and released it as open source.


----------



## jamwerks

I wonder, though, if opening it up to third parties is synonymous with the platform being hacked and cracked copies appearing?


----------



## tack

jamwerks said:


> I wonder, though, if opening it up to third parties is synonymous with the platform being hacked and cracked copies appearing?


If the security of their copyright protection hinges on the secrecy of the software's source code then it is already doomed to fail.

It's analogous to developing a cryptosystem (such as TLS, which secures most of the Internet): if the security of the encryption depends on the secrecy of the algorithm, as opposed to the key material being fed into the algorithm, then it's intrinsically insecure.


----------



## EvilDragon

Somehow I'm not sure Spitfire is going to be interested in opening up their player...


----------



## tack

EvilDragon said:


> Somehow I'm not sure Spitfire is going to be interested in opening up their player...


Yeah, it'd be a nice outcome, but I absolutely don't expect it either.


----------



## jules

Thanks for this real-life report, tack! Very interesting reading.


----------



## khollister

Nice work - good technical depth and test methodology. 

I was obsessed with maximizing sample streaming and minimizing load times when I built my last slave and then the Cubase DAW. I first RAIDed everything, then tried allocating libraries across JBOD based on likely use & streaming density. It was a PIA, and when I went back to a Mac with my current iMac Pro, I just took the four 850 EVOs and organized everything by type of library so I could actually remember where shit is - Strings, Winds & Perc, Brass & Hybrid, Ensembles & Keys. Guess what? I have never noticed any significant difference regardless of what I did.

I did put the Spectrasonics STEAM folder and my Garritan CFX lib on the internal NVMe SSD in the iMP. The big Keyscape patches load a little quicker and the CFX is a LOT faster - the only library that really made a difference.

I have come to the conclusion that the big cost difference for NVMe isn't remotely worth it unless you make music by a stopwatch.


----------



## chimuelo

I noticed the Phison E8 controller specs/benches like most M.2s but loads Keyscape faster than my Samsungs that bench higher.
But that's the Magician software, so no telling if it's from the company they bought that uses caching or just faster transaction times.
I'm interested in such details because I use Dual Live Mode during performances.
Until I got the Phison-based M.2 I was limited to loading one Multi per song.
Now I've got 2- and 4-way splits instantly.
The new CC # sure helps too.

Anyone looking for new NVMe M.2s should check out the Phison Devices.
MyDigitalSSD is the cheapest, others are coming.
But this is only in reference for STEAM and one shot samples from my Host Bidule.
Every extra second counts.

Omnisphere really gave us a fantastic update.
CCs and resizing were my only 2 complaints.
Both were addressed, and then some.

Nice review on Kontakt and Storage.


----------



## TNM

ok but technically for the streaming, not the loading, it would help, right? AFAIK, it takes multiple processes to start really maxing out the bandwidth of those NVMe drives.. for example, just streaming one patch from Kontakt would never get close to using the possible speed. But what about if Kontakt was set up to use little RAM and stream as much as possible? if there were, like, 16 Kontakt instances with 16 patches each - a huge orchestration, all streaming off the one drive - wouldn't NVMe *then* make a difference? Or am I in la la land and SATA would still have headroom?

I am trying to work out for myself now, as I am finally getting an iMac Pro, whether to bother investing in something like an OWC ThunderBlade as an external TBolt3 drive, which is roughly 2-2.5GB/s transfer speeds, or whether to just get a couple of cheap Samsung 2TB T5s. I will not write to the drives often - these will be purely for streaming samples, so endurance should be fine...


----------



## jcrosby

TNM said:


> ok but technically for the streaming, not the loading, it would help, right? AFAIK, it takes multiple processes to start really maxing out the bandwidth of those NVMe drives.. for example, just streaming one patch from kontakt would never get close to using the possible speed. But what about if kontakt was set up to use little ram and stream as much as possible.. if there were like, 16 kontakt instances, with 16 patches each, a huge orchestration, all streaming off the one drive.. wouldn't NVME *then* make a difference? Or am i in la la land and sata would still have headroom? I am trying to work out for myself now, as i am finally getting an imac pro, whether to bother investing in something like an OWC thunderblade as an external TBolt3 drive, which is roughly 2-2.5gb/s transfer speeds..or whether to just get a couple of cheap samsung 2Tb T5's.. I will not write to the drives often, these will be purely for streaming samples so endurance should be fine...



This was always my assumption... I run Kontakt with a very low preload size and it sure seems like I get better polyphony and performance than on SATA III... Could be wishful thinking, I suppose...


----------



## TNM

There is a huge sale on the Samsung T3 like I've never seen at an Aussie store.. he has already sold over 600..

the 2TB for $825 AUD, minus 20%, so minus $165.. therefore $660.. I can't even see them that cheap in the US.. that's about 450 USD for the 2TB.
I think I will get 2, simply connect one each to the USB 3 ports on my CalDigit Thunderbolt 3 hub, and RAID 0 them...
The T3 sustains 450 read even over USB 3.. that should definitely take it to 700.. Believe it or not, no one else has done this on video, anywhere. all people have done is RAID some flash drives for fun, like a bunch of 8GB ones.

I think 4 terabytes of bus-powered SSD for $1300 AUD is dead cheap, and 4 terabytes is enough to hold every Kontakt library I possess and then some.. So.. yeah.. Surely that will be fast enough.. the write won't increase all that much with the soft RAID but that is not relevant here.. even in all the flash drive tests, read speeds doubled in every case...

I really think I could get even 800 MB/s out of T3s!

Like that "Glyph" 2TB portable RAID SSD which gets about 700 write and 800 read... but the 2TB costs more in AUD than what I'll be paying for 4TB with the T3s.
Anyone who has T3s and is happy with them, please let me know! Sale ends the 15th, as in midnight tonight - 12 hours exactly!


----------



## chimuelo

jcrosby said:


> This was always my assumption... I run Kontakt with a very low preload size and it sure seems like I get better polyphony and performance than on SATA III... Could be wishful thinking, I suppose...



Sounds like the transaction benefits of NVMe help out.
Prior to M.2 there were OCZ, HGST and Intel PCIe devices, and while they weren't benching like the NVMe speeds we're seeing now, folks I know that used these said they had significant performance boosts.
Some of these devices were RAID 0 too, but transaction times could be an additional way to get higher polyphony, along with the CPU.
I noticed super low latency, which I don't really need, from above-4GHz CPU speeds:
.07 msec @ 96k / 32 samples.
I run my rigs @ 44.1k / 64 samples and 4 msec.
Fine, considering 6 msec is about the time a drummer's ride cymbal takes to reach my ear.

Good that NVMe helps some of us.
I've liked it ever since the Asus PS10S WS C236/Xeon 1U.
The NVMe M.2 sits right in front of my triple barrel 22k rpm fans.


----------



## jcrosby

TNM said:


> There is a huge sale on samsung T3 like I never seen at an aussie store.. he has already sold over 600..
> 
> the 2TB for 825 AU, minus 20%, so minus 165$.. therefore $660.. I can't even see them that cheap in the US.. that's about 450 USD for the 2TB.
> I think i will get 2, simply connect one each to usb 3 ports on my caldigit thunderbolt 3 hub, and raid 0 them...
> T3 sustains 450 read even over USB 3.. that should definitely take it to 700.. Believe it or not no one else has done this on video, anywhere. all people have done is raid some flash drives for fun, like a bunch of 8GB ones.
> 
> I think 4 terabytes of bus powered SSD for 1300$ AUD is dead cheap, and 4 terabytes is enough to hold every kontakt library i possess and then some.. So.. yeah.. Surely that will be fast enough..the write won't increase all that much with the soft raid but that is not relevant here.. even in all the flash drive tests, read speeds doubled in every case...
> 
> I really think i could get even 800 mb/s out of T3's!
> 
> Like that "Glyph" 2TB portable raid ssd which gets about 700 write and 800 read... but the 2TB is more in AUD than what i'll be paying for 4TB with the T3's.
> Anyone with T3's and are happy with them, please let me know! Sale ends 15th, as in midnight tonight, 12 hours exactly!



Unless they're in the same enclosure, I could see RAID 0-ing them being problematic. (If one came loose (or you have a cat) it seems like it'd be a little too easy to break the striping... Not a RAID ninja though, so curious what others say...)


----------



## EvilDragon

TNM said:


> if there were like, 16 kontakt instances, with 16 patches each, a huge orchestration, all streaming off the one drive.. wouldn't NVME *then* make a difference?



You'd probably max out on your CPU before even a regular SATA SSD would choke up...


----------



## TNM

Thanks ED, hence why I asked.. However, perhaps something like a 2950X overclocked to 4 GHz would be able to do it and make use of the speeds.. I plan to build such a system down the road so will find out then.. but that's 6 months away as I've now ordered an iMac Pro.


----------



## TNM

jcrosby said:


> Unless they're in the same enclosure, I could see RAID 0-ing them being problematic. (If one came loose (or you have a cat) it seems like it'd be a little too easy to break the striping... Not a RAID ninja though, so curious what others say...)



Hmmm.. when I ordered my iMac Pro on Wednesday, the lady on the phone at Apple surprised me and took 500 bucks off to thank me for being a loyal customer. So... since I had the 10 grand interest free and had enough headroom, I added two of the USB-C G-Drive 1TB SSD-R series. I have a CalDigit TS3 Plus Thunderbolt 3 dock which has a 10Gb/s USB-C port as well as a TB3 passthrough, so I will attach each drive to either of those and try the RAID 0 for myself.. The Mac is custom so isn't coming till Aug 30th, but I'll let you guys know then how it worked.. I'll specifically do tests using a bunch of different VSTis - HALion, Kontakt, Play, etc. - streaming off the one drive, and we'll see if the RAID 0 makes any difference.. I'll do the standard Blackmagic and AJA disk bench results too, of course.


----------



## EvilDragon

TNM said:


> Thanks ED, hence why I asked.. However perhaps something like a 2950X overclocked to 4GHZ would be able to do it and make use of the speeds.. I plan to build such a system down the road so will find out then.. but that's 6 months away as I now ordered an imac pro.



My CPU runs at 4.5 GHz, and it still craps out sooner than any of my 850 Evos get to their streaming ceiling. It very much depends on the complexity of libraries you're running.


----------



## TNM

EvilDragon said:


> My CPU runs at 4.5 GHz, and it still craps out sooner than any of my 850 Evos get to their streaming ceiling. It very much depends on the complexity of libraries you're running.


just curious then, what cpu?


----------



## EvilDragon

i7-6700K.


----------



## TNM

EvilDragon said:


> i7-6700K.


Hi, I mentioned a 2950X.. there is no comparison.. But, in case you are anti-Ryzen, let's say a 7960X running at 3.8 GHz, which is very easy to do even on air cooling. You'll get better single-core performance, sure, a bit more polyphony, but I am talking about multiple Kontakts and multiple instruments within them all streaming off of one drive and totally maxing out a modern CPU.. OK, so your 6700K craps out prior to SATA 850 EVOs being saturated, but my point is, what about a 16-core CPU dedicated to Kontakt? Even as a standalone slave/VE Pro slave, for example. This is why I plan to build a PC myself early next year, purely for Kontakt as a slave, hence my interest in whether NVMe is worth it for that.


----------



## EvilDragon

Still think it would probably be fine.

Let's just go with the numbers. An 850 Evo 1 TB has a 4k random read speed (which is what is most used when DFD streaming) of around ~40 MB/s. That means it can process a LOT of 6 KB DFD buffers in one second (40 MB is 40000 KB; divide by 6 KB and you get a magical number of 6666.6... buffers). That is plenty of voices, I would say - and that's only from a single drive. (Note that "a voice" can be mono or stereo, or in rare cases multichannel, in which case one DFD burst would consist of _DFD buffer size_ x _number of audio channels_, and that the rate of DFD requests to the drive also depends on the sample rate of the samples and any resampling/repitching going on.) If you spread out your heavy sample libraries over multiple such SSDs, you've really got nothing to worry about. I'd just use NVMe for the OS and keep sample libs on regular SATA III SSDs, at least until we get 5 GHz 32-core monsters.

I'd hope my math is right, but it'd be good to hear @tack's take on it.


EDIT: My above numbers are based on a 4k QD1 benchmark, which is probably *not* what's happening when streaming a lot of voices. You would be getting a number of simultaneous read requests to the drive, which results in _higher_ I/O queue depths, which for SSDs means *even faster performance* (because multiple NAND flash channels are attached to an SSD controller, so they're heavily multitasking by design). To put things into perspective, at 4k QD1 there's nearly 10000 IOPS on the same above-mentioned drive. At 4k QD64 (that is, 64 simultaneous I/O requests), there's *10x the IOPS*, which means it can process EVEN MORE 6 KB buffers _in one second._







You literally don't need to worry about this, I think. I can hardly think of a scenario which would saturate these drives, other than heavy server activity. DFD audio streaming? Piece of cake.
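The arithmetic above can be sanity-checked in a few lines. The figures (40 MB/s 4k QD1 random read, 6 KB DFD buffer, ~10x IOPS at QD64) are taken from the post; this is a rough estimate, not a measurement.

```python
# Back-of-envelope: how many DFD-buffer-sized reads per second a drive can
# serve at a given random-read throughput. Decimal units (1 MB = 1000 KB),
# matching the "40 MB is 40000 KB" reading above.

def dfd_buffers_per_sec(random_read_mb_s: float, dfd_buffer_kb: float) -> float:
    """Upper bound on DFD bursts per second at this throughput."""
    return random_read_mb_s * 1000 / dfd_buffer_kb

qd1 = dfd_buffers_per_sec(40, 6)        # ~6666.6 buffers/sec at QD1
qd64 = dfd_buffers_per_sec(40 * 10, 6)  # ~10x the IOPS at QD64
```

A stereo voice burns two of those buffers per burst, so even halved this is far more concurrent voices than a typical project asks for.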


----------



## TNM

wow, thanks for the detail.. Really I had only planned to build this machine with one 2TB drive simply hosting a barebones OS, Kontakt/VE Pro, and sample libraries. That's it. I wasn't planning on multi-drive setups and was hoping to get away with 32GB RAM by setting Kontakt to use maximum streaming rather than RAM. I might indeed spring for a 2950X as in 6 months they'll be priced even better and still a killer chip (personally I think they are the best sweet-spot bang-for-buck chip on the market right now).. I think I'll just spend the tiny bit extra to get an NVMe drive anyway, it certainly can't hurt.. Hopefully an Intel 760p 2TB is adequate for this task. Cheers


----------



## EvilDragon

I would DEFINITELY suggest separating the OS drive and your samples drive. Don't spend a fortune on a 2 TB NVMe unnecessarily. Get a 256 GB NVMe for OS and get a 2 TB 860 Evo for samples - should be cheaper than a 2 TB NVMe alone (didn't google any prices but it makes sense to me). Job done.


----------



## tack

EvilDragon said:


> I'd hope my math is right, but it'd be good to hear @tack's take on it.


I really haven't taken a proper look at the streaming case yet mainly because it hasn't really been a pain point for me. 

I think it stands to reason that greater parallelism (more Kontakt instances and more cores) is going to be able to drive more I/O.

It didn't take me very much effort to build a project with SCS V1/V2 Core/Decorative patches layering gobs of articulations over a number of Kontakt instances to plateau my 850 EVO with DFD streaming while maintaining plenty of CPU headroom (40-50% on all cores with my 8700K at 4.7GHz).

I say plateau and not saturate because I reached the point where adding more Kontakt instances (across more tracks) failed to drive greater read ops and throughput, even though those metrics were less than I previously benchmarked at 128KB block size (which seems to be much closer to what Kontakt actually does than 4KB -- see the Drive Benchmarks section in the doc for an explanation of how I determined that). And it was clearly dropping samples because it failed a null test. Specifically, it was peaking at 2300 read ops/sec and 160MB/s.

Meanwhile, moving SCS from my SATA 850 EVO to the NVMe 960 PRO, it peaked at 3900 read ops/sec and 260MB/s for pretty similar CPU load. Fewer dropouts than SATA, though not perfect, as I still heard some blips through the null test.

So, sure, with that very cursory look, an NVMe can help you push a higher voice count with DFD streaming. And I suspect it would also let you get away with a smaller preload buffer for an equivalent voice count. But as ED said it really depends on the libraries, and also your projects. Throw in the usual DSP plugins (eq, compression, reverb, etc.) or CPU-hungry synths and the real world difference between NVMe and a good SATA SSD quickly shrinks away.

If you've got the money to burn and you want to maximize your voice count _potential_ at any cost, yeah, NVMe drives are great. 

But another, more cost-effective way to do this is what ED mentioned earlier: spread your libraries across multiple SSDs.


----------



## EvilDragon

tack said:


> spread your libraries across multiple SSDs.



This is really the best way to go about it, indeed. I have sample libraries across five 1 TB SSDs now.


Although, I must say it's a bit weird that Kontakt almost never goes above QD1. I'd expect with concurrent voices being triggered that read requests would be heavily stacked on top of each other, resulting in higher QD... But I suppose a higher QD is what _will_ happen when using multiple Kontakt instances, right?


----------



## D Halgren

EvilDragon said:


> This is really the best way to go about it, indeed. I have sample libraries across 5 1 TB SSDs now.
> 
> 
> Although, I must say it's a bit weird Kontakt doesn't go above QD1 almost at all. I'd expect with concurrent voices being triggered that read requests would be heavily stacked on top of each other, resulting in higher QD... But I suppose a higher QD is what _will_ happen when using multiple Kontakt instances, right?


Sorry for being daft, but are you guys saying that it's better to spread your libraries across multiple SSDs versus a RAID 0 speed increase? I.e., if I RAID 4 drives versus those 4 drives separate, but slower?

@tack @EvilDragon


----------



## tack

EvilDragon said:


> Although, I must say it's a bit weird Kontakt doesn't go above QD1 almost at all.


Yeah, that's why I hypothesized Kontakt was doing synchronous disk I/O and sample decompression in the same thread.



EvilDragon said:


> But I suppose a higher QD is what _will_ happen when using multiple Kontakt instances, right?


Ok well now _this_ is interesting: no, no it doesn't happen, it still tops out at 1. But it _does_ happen when I tell Reaper to spawn each Kontakt instance as a dedicated process. Obvious conclusion: that single thread doing disk reads is shared between instances in the same DAW process space. (Edit: confirmed with Process Monitor.)

This kind of design made sense back when spinning rust was the primary storage technology and parallelizing your I/O could be catastrophic for performance, but on modern systems with SSDs, it's crazy.

And with this change to run Kontakt instances as dedicated processes, the previous test on my SATA 850 EVO now peaks at 480MB/s and 6700 read ops/sec, and _that_ looks properly saturated. And the NVMe at 605MB/s and 7800 read ops/sec -- not saturated. 

Really interesting discovery there. Kontakt desperately needs some architectural tweaks to bring it into the 2010s.
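The pattern behind this can be illustrated with a toy sketch: a single thread issuing blocking reads keeps at most one request in flight (QD ~1), while a pool of readers keeps several outstanding at once. This is only an illustration of the I/O pattern, not Kontakt's actual code; the 128 KB block size echoes the observed average.

```python
# Toy illustration of QD1 vs. parallel reads. One thread doing blocking
# reads keeps queue depth at ~1; a thread pool lets the OS (and an SSD's
# many NAND channels) service several requests concurrently.

from concurrent.futures import ThreadPoolExecutor

BLOCK = 128 * 1024  # close to Kontakt's observed average block size

def read_block(path: str, offset: int) -> bytes:
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(BLOCK)

def read_serial(path: str, offsets):
    # One outstanding request at a time -> queue depth ~1
    return [read_block(path, off) for off in offsets]

def read_parallel(path: str, offsets, workers: int = 8):
    # Many outstanding requests -> queue depth > 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda off: read_block(path, off), offsets))
```

Both functions return identical data; only the number of in-flight requests (and thus how hard the drive can multitask) differs.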


----------



## tack

D Halgren said:


> Sorry for being daft, but are you guys saying that it's better to spread your libraries across multiple SSD's versus a raid 0 speed increase? IE, if I raid 4 drives versus those 4 drives separate, but slower?


RAID 0 is just another way to spread your libraries across multiple drives. 

It should perform pretty similar to placing them directly on separate drives. Personally I would avoid RAID-0 just so as not to have the headache of losing the entire volume when a drive fails (or having to deal with specialized recovery tools).

(I still centralize all my libraries in one place by way of junction points.)


----------



## Nick Batzdorf

EvilDragon said:


> Note that "a voice" can be mono or stereo, or in rare cases multichannel



Now that's interesting. So it makes no difference whether it's multiple mono files or one stereo/surround one?


----------



## EvilDragon

Nick Batzdorf said:


> So it makes no difference whether it's multiple mono files or one stereo/surround one?



One zone in Kontakt = one voice, no matter how many audio channels the sample loaded into that zone has. So if you overlap 2 zones with mono files, you will see 2 voices played, but if you have one zone with a stereo file, you will see 1 voice played. They will do the same amount of disk reads, though (because math).

In Kontakt, the DFD buffer is stated in KB, not in milliseconds (as it is in, for example, HALion and Falcon). This means that for a sample with more channels and/or a higher sample rate/bit depth, Kontakt will just make more DFD calls at the same DFD buffer size. OTOH, Falcon and HALion load samples in x-millisecond chunks, so in a way it's a sort of "dynamic" DFD buffer size.
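The KB-vs-milliseconds distinction can be made concrete with a small sketch (the 6 KB buffer and 24-bit/44.1k figures are illustrative, not Kontakt's actual defaults):

```python
# Fixed-size buffer (Kontakt-style): a higher data rate means MORE requests
# of the SAME size. Fixed-time chunk (HALion/Falcon-style): a higher data
# rate means the SAME request rate with BIGGER reads.

def data_rate(sr_hz: int, bit_depth: int, channels: int) -> int:
    """Bytes per second a streaming voice consumes."""
    return sr_hz * (bit_depth // 8) * channels

def requests_per_sec_fixed_kb(rate_bytes: int, buffer_kb: float) -> float:
    return rate_bytes / (buffer_kb * 1024)

def bytes_per_request_fixed_ms(rate_bytes: int, chunk_ms: float) -> float:
    return rate_bytes * chunk_ms / 1000

mono = data_rate(44100, 24, 1)    # 132300 B/s
stereo = data_rate(44100, 24, 2)  # 264600 B/s
```

With a fixed-KB buffer, the stereo zone issues exactly twice the calls of the mono one; with a fixed 20 ms chunk, both issue 50 calls/sec and only the read size doubles.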



tack said:


> Ok well now _this_ is interesting: no, no it doesn't happen, it still tops out at 1. But it _does_ happen when I tell Reaper to spawn each Kontakt instance as a dedicated process. Obvious conclusion: that single thread doing disk reads is shared between instances in the same DAW process space. (Edit: confirmed with Process Monitor.)
> 
> This kind of design makes sense back in the days when spinning rust was the primary storage technology, when parallelizing your I/O could be catastrophic for performance, but on modern systems with SSDs, it's crazy.
> 
> And with this change to run Kontakt instances as dedicated processes, the previous test on my SATA 850 EVO now peaks at 480MB/s and 6700 read ops/sec, and _that_ looks properly saturated. And the NVMe at 605MB/s and 7800 read ops/sec -- not saturated.
> 
> Really interesting discovery there. Kontakt desperately needs some architectural tweaks to bring it into the 2010s.



Wow, that's really interesting. I posted a new feature request to NI about this. Who knows if it'll happen, but a man can hope, I guess? A 15-year-old codebase doesn't sound like a very fun place to make such large, deep architectural changes, though...



@tack If you have Falcon, it'd be interesting to see if they're using higher QD towards SSDs, since they have an SSD optimized DFD mode.


----------



## Guy Rowland

Looking ahead, assuming NVMe prices drop significantly (which is a pretty risky assumption), how would you feel about dropping all sample libraries onto a single 4TB NVMe?


----------



## EvilDragon

Methinks I'd still wanna spread them out...


----------



## Guy Rowland

EvilDragon said:


> Methinks I'd still wanna spread them out...


For throughput reasons?


----------



## EvilDragon

That and in practice you don't want your NVMe to be fully saturated or the thing gets hot and starts thermal throttling...


----------



## tack

EvilDragon said:


> 15 years old codebase doesn't sound like a very fun ride to do this large deep architectural changes in it, though...


Only NI can say for sure, but I actually think there are signs of reasonable abstractions here that could make this type of improvement realistic. If they do have a worker thread responsible for I/O (and probably sample decompression), then there are probably clear interfaces in and out of that worker. If so, it'd be possible to change it to use async I/O, and/or the producer/consumer thread pool approach I suggested in the doc, without the other layers needing to know about that change.

Whether they'll think the performance increase is worth the hassle is another matter entirely. 

But in the meantime, Reaper users have a nice workaround here by running Kontakt as a dedicated process. This does unfortunately mean that you also forego sample pool sharing, so the same patch loaded in different instances will duplicate the samples in memory. (I actually wish Reaper provided more ability here to create host process groups, and choose which instances should be loaded into which groups.)
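The producer/consumer split mentioned above can be sketched in a few lines: one thread keeps the disk busy reading compressed chunks while a pool of workers handles the CPU-bound decompression, so neither stage stalls the other. zlib stands in for Kontakt's (proprietary, unknown) lossless codec, and every name here is hypothetical.

```python
# Sketch of a producer/consumer load pipeline: one reader feeds a bounded
# queue; several workers decompress in parallel. zlib is a stand-in codec.

import zlib
from queue import Queue
from concurrent.futures import ThreadPoolExecutor

def load_samples(chunks, workers: int = 4):
    """chunks: iterable of zlib-compressed byte strings (stand-in for disk reads)."""
    q: Queue = Queue(maxsize=workers * 2)  # bounded so the reader can't run away

    def producer():
        for c in chunks:          # in real life: sequential/async disk reads
            q.put(c)
        for _ in range(workers):  # one poison pill per consumer
            q.put(None)

    def consumer():
        out = []
        while (c := q.get()) is not None:
            out.append(zlib.decompress(c))  # CPU-bound work, off the I/O thread
        return out

    with ThreadPoolExecutor(max_workers=workers + 1) as pool:
        pool.submit(producer)
        results = [pool.submit(consumer) for _ in range(workers)]
        return [block for f in results for block in f.result()]
```

The bounded queue is the key design choice: it lets the reader run ahead just enough to hide decompression latency without unbounded memory growth.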



EvilDragon said:


> @tack If you have Falcon, it'd be interesting to see if they're using higher QD towards SSDs, since they have an SSD optimized DFD mode.


I don't have Falcon. I do have the Bohemian though which uses UVI Workstation. Disk type is set to SSD.

I generally find the Bohemian's load times to be pretty brutal compared to similar sized Kontakt instruments. I never really looked into it though, preferring instead to take the tried-and-true approach of whining about it on the internet.

After a quick peek, I do see queue depths > 1, and procmon shows two threads performing heavy reads on the .ufs file. So UVI Workstation is doing multithreaded reads here (I assume only when you set the type to SSD). But the overall throughput is really poor because the average block size is quite low compared to Kontakt. (Average in UVI is around 7KB while with Kontakt it's 112KB.) According to procmon, most of the reads are 8KB and a few 4KB that are bringing the average block size down. (_Lots_ of tiny 4 and 8 byte reads as well [probably metadata] but these won't generate physical I/Os and will instead come out of cache.)

Anyway, the net of it is the painful load time of the Bohemian is from the small block size saturating I/O while producing low throughput and being harder on CPU. Ultimately this one looks CPU bottlenecked:






These show two runs loading the Bohemian. First run is NVMe 960 PRO, second one is SATA 850 EVO. Green line is read operations, red line is throughput (where 1 unit on the y-axis == 10MB/s), blue line is total processor idle (across all cores), and the thick brown line at the bottom is queue depth.

So I'm topping out at around 40-50MB/s and faster storage isn't helping. Deserves a bit of a closer look to see what's actually going on with the threads given the CPU headroom I still have -- I assume this, as with many things, comes down to single core performance caused by single-threaded work.

I've been focusing on load times here because I wasn't able to generate enough disk activity for the DFD streaming case, seeing as I only have the one instrument (Bohemian).

Anyway, even with its single-thread I/O limitation Kontakt is doing a lot better just because of its much larger block size, and IMO this is reflected in how things _feel_. It'd be great to see Kontakt push through this limitation.
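The block-size effect described above is just multiplication: at the same request rate, throughput scales linearly with average block size. The 7 KB and 112 KB averages are from the measurements above; the IOPS figure is a made-up common baseline.

```python
# Throughput = IOPS x average block size. At an identical request rate, the
# 16x block-size gap between UVI (~7 KB avg) and Kontakt (~112 KB avg)
# translates directly into a 16x throughput gap.

def throughput_mb_s(iops: float, avg_block_kb: float) -> float:
    return iops * avg_block_kb / 1024

IOPS = 6000                           # hypothetical, same for both engines
uvi = throughput_mb_s(IOPS, 7)        # ~41 MB/s -- near the observed 40-50 MB/s
kontakt = throughput_mb_s(IOPS, 112)  # ~656 MB/s at the same request rate
```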


----------



## meradium

While we are at the disk debate, how crucial is it really to keep the OS drive separated from samples if you use that particular machine exclusively as a sample slave?

Does it make that much of a difference?


----------



## EvilDragon

Yeah, there's no background loading in the UVI engine (and it doesn't even support multicore voice processing like Kontakt does), and the Bohemian has quite a lot of samples to load...



meradium said:


> While we are at the disk debate, how crucial is it really to keep the OS drive separated from samples if you use that particular machine exclusively as a sample slave?
> 
> Does it make that much of a difference?



IMHO it's a "peace of mind" kind of thing. If something happens to the OS, you only reinstall it on that drive; you don't have to touch the sample drives. Plus, the OS will constantly want to read or write to its drive, so why should that interfere with DFD performance (no matter how fast our SSDs are)?


----------



## MartinH.

meradium said:


> While we are at the disk debate, how crucial is it really to keep the OS drive separated from samples if you use that particular machine exclusively as a sample slave?
> 
> Does it make that much of a difference?



If the machine is just a sample slave with lots of RAM, I'd assume there isn't much that the OS would even need to read/write on the system disk all the time. I doubt you'd feel the difference between one shared and two separate SSDs here. EvilDragon has a point about system reinstalls, but how often do you really do that? My rig has a roughly 8-year-old Windows installation right now.




tack said:


> RAID 0 is just another way to spread your libraries across multiple drives.
> 
> It should perform pretty similar to placing them directly on separate drives. Personally I would avoid RAID-0 just so as not to have the headache of losing the entire volume when a drive fails (or having to deal with specialized recovery tools).
> 
> (I still centralize all my libraries in one place by way of junction points.)



I use junctions as well, very handy for organizing stuff across a growing number of drives, but easy to lose track of over the years.
I would expect RAID 0 to be _slower_ than two separate drives in certain high-usage scenarios, at least on HDDs, where you're capped on seek time instead of bandwidth, because IMHO RAID 0 should roughly double the number of reads needed for the same amount of data. I would stay away from RAID 0 for the reasons you mention.


----------



## EvilDragon

MartinH. said:


> If the machine is just a sample slave with lots of RAM, I'd assume there isn't much that the OS would even need to read/write onto the system disk all the time.



Oh but it does happen all the time, regardless of what the machine is supposed to do. Check it out with Process Monitor.


----------



## tack

MartinH. said:


> I would expect RAID 0 to be _slower_ than two separate drives in certain high-usage scenarios, at least on HDDs, where you're capped on seek time instead of bandwidth, because IMHO RAID 0 should roughly double the number of reads needed for the same amount of data.


That depends on your stripe size, but even with a relatively small stripe size those read operations are distributed over multiple drives in parallel, which after all is kinda the point of RAID 0's performance benefit.

I think it might be slower for different reasons. For example, if you had lopsided access times between the drives in the array, you'd end up with a lowest-common-denominator effect. Another scenario where RAID 0 with spinning rust would suck pretty badly is two parallel sets of sequential reads (e.g. preloading two different libraries in parallel): with the libraries on separate drives the heads would advance in one direction, while with RAID 0 they'd be all over the place.
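For what it's worth, here's a little toy model of RAID-0 striping (drive count and stripe size are arbitrary assumptions) showing how a sequential read gets split into per-drive requests rather than doubled up on one drive:

```python
# Toy model of RAID-0 striping: a logical read is split into stripe-sized
# requests that alternate across drives, so reads are distributed in
# parallel rather than duplicated. Stripe size and drive count are
# arbitrary assumptions for illustration.

STRIPE = 128 * 1024  # bytes per stripe
DRIVES = 2

def stripe_requests(offset, length):
    """Map a logical read onto (drive, drive_offset, size) requests."""
    reqs = []
    end = offset + length
    while offset < end:
        stripe_idx = offset // STRIPE
        within = offset % STRIPE
        size = min(STRIPE - within, end - offset)
        drive = stripe_idx % DRIVES
        drive_off = (stripe_idx // DRIVES) * STRIPE + within
        reqs.append((drive, drive_off, size))
        offset += size
    return reqs

# A 512 KiB sequential read becomes four requests, two per drive:
for req in stripe_requests(0, 512 * 1024):
    print(req)
```

Each drive only does half the reads for a single stream, which is where the parallelism comes from; the seek-thrashing problem only shows up once two independent streams contend for the same heads.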


----------



## ThomasS

TNM said:


> There is a huge sale on samsung T3 like I never seen at an aussie store.. he has already sold over 600..
> 
> the 2TB for 825 AU, minus 20%, so minus 165$.. therefore $660.. I can't even see them that cheap in the US.. that's about 450 USD for the 2TB.
> I think i will get 2, simply connect one each to usb 3 ports on my caldigit thunderbolt 3 hub, and raid 0 them...
> T3 sustains 450 read even over USB 3.. that should definitely take it to 700.. Believe it or not no one else has done this on video, anywhere. all people have done is raid some flash drives for fun, like a bunch of 8GB ones.



Thanks for this heads up TNM! I just ordered one T3 for 660 AUD to ship here to Adelaide, coming in a day or two. I'll let you know how it works. Did you get them?


----------



## D Halgren

tack said:


> That depends on your stripe size, but even with a relatively small stripe size those read operations are distributed over multiple drives in parallel, which after all is kinda the point of RAID 0's performance benefit.
> 
> I think it might be slower for different reasons. For example, if you had lopsided access times between the drives in the array, you'd end up with a lowest-common-denominator effect. Another scenario where RAID 0 with spinning rust would suck pretty badly is two parallel sets of sequential reads (e.g. preloading two different libraries in parallel): with the libraries on separate drives the heads would advance in one direction, while with RAID 0 they'd be all over the place.


Well, I was talking about four 1TB 860 EVOs over Thunderbolt 3. Still a little confused: should I RAID 0 them, or leave them as separate drives? Any guidance would be appreciated, as I am setting up a new storage/streaming solution.


----------



## EvilDragon

Leave as separate drives.


----------



## tack

Maybe if a key use-case was doing heavy streaming of a single library I could see an argument for RAID 0, but otherwise it's not worth the hassle of losing the array when a drive fails. And even in the single library case there's nothing preventing you from manually spreading it across multiple drives with the junction trick.


----------



## D Halgren

Thanks guys


----------



## TNM

ThomasS said:


> Thanks for this heads up TNM! I just ordered one T3 for 660 AUD to ship here to Adelaide, coming in a day or two. I'll let you know how it works. Did you get them?


I can't believe I didn't..

When I ordered my iMac Pro and she gave me the 500 discount, I then added the G-Tech 1TB SSD-R. I have 18 months interest-free, so I thought it was worth the extra over the T3 because it's about 100MB/s faster.
However, in hindsight I should have gotten at least the T3 1TB. One can never have enough storage. Too late now!

I do have a brand new Crucial MX500 1TB SATA, so I'll just get a nice USB 3.1 fanless enclosure for it and use that instead. I was originally planning to put it in a Windows laptop, but I decided against getting one for now (for gaming).


----------



## ThomasS

TNM said:


> I can't believe I didn't..



Well, if they go back down in price, I recommend it. I got the 2TB Samsung T3 and ran it through a test, and it reads and writes at 450, which is the same as my internal SSD. What surprised me the most (happily) is how _small_ it is. Its length is about the width of a normal external, and it is much thinner too. I'd say overall you could fit four of them in the same space as one normal compact external drive.


----------



## Yigal Navon

Hi, I would like to ask about your NVMe test. You write that with purged patches there is NO difference from SATA SSD in project loading times, but what about latency (for real-time MIDI playing)? NVMe latency is lower, so with purged instruments wouldn't you get an advantage when playing MIDI in real time? I am planning to build a template where all the samples/tracks are purged and all the tracks are ready to play on the fly. (Very heavy libraries.)


----------



## tack

Yigal Navon said:


> But what about latency (for real-time MIDI playing)? NVMe latency is lower, so with purged instruments wouldn't you get an advantage when playing MIDI in real time?


Yes, streaming is going to be better with NVMe. However I didn't specifically test and measure DFD, so I can't quantify by how much (and therefore whether the cost differential is warranted).


----------



## Yigal Navon

Well, my friend, your test document will never be complete without the DFD measurements. If you can do a quick observation on this matter, it would be super helpful.


----------



## tack

Yigal Navon said:


> Well my friend your test document will never be complete without the DFD measurements


It was complete enough for me.  I had a very specific question: would NVMe help with the part of loading projects I found most annoying.



Yigal Navon said:


> if you can do a quick observation on this matter it would be super helpful.


That's kind of the rub. To do it properly is definitely not a quick observation. Or at least, the quick observation is this: NVMe _will_ help DFD streaming. I just can't tell you by how much.


----------



## Dewdman42

My opinion about NVMe is: if I were building a new system and had to buy either SATA SSD or NVMe, I would absolutely get NVMe, even if it's maybe up to 30% more expensive. I don't expect to see much difference in load times or even streaming, frankly... there are just too many people on the internet saying that the difference in performance is not that great in real-life practical terms due to other bottlenecks. But still, might as well get the faster stuff in case other bottlenecks change down the road.

But like for my existing system that has $600 worth of SSD's in it now, should I consider changing over to NVMe? Absolutely not worth the cost. IMHO.


----------



## tack

Dewdman42 said:


> there are just too many people on the internet saying that the difference in performance is not that great in real-life practical terms due to other bottlenecks.


The outcome from my testing was this: while initial load times (where Kontakt blocks the UI) were equal, unpurged patches preloaded all samples into memory 2x faster with NVMe in the documented configuration. Objectively, NVMe will handle more voices for DFD before dropouts. With the caveat that I haven't measured it, my WAG is about 2-3x more voices in the tested configuration, but possibly up to 4-5x more, since DFD is sensitive to latency and the drives I benchmarked had about a 4.5x disparity in latency at Kontakt's relevant block size.
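To make that back-of-envelope reasoning explicit (every number below is an illustrative assumption, not a measurement): at low queue depth, random-read IOPS is roughly 1/latency, and each streaming voice needs some fixed number of block reads per second to keep its DFD buffer fed, so the latency disparity gives a theoretical upper bound on the voice-count ratio:

```python
# Back-of-envelope for the voice-headroom guess. All numbers are
# illustrative assumptions: at queue depth 1, random-read IOPS ~= 1/latency,
# and each streaming voice consumes a fixed read rate.

sata_latency_us = 90   # assumed latency at Kontakt's block size
nvme_latency_us = 20   # ~4.5x lower, matching the disparity mentioned

def max_voices(latency_us, reads_per_voice_per_sec=10):
    iops = 1_000_000 / latency_us       # reads the drive can service per second
    return iops / reads_per_voice_per_sec

ratio = max_voices(nvme_latency_us) / max_voices(sata_latency_us)
print(f"theoretical voice-headroom ratio: {ratio:.1f}x")
```

The per-voice read rate cancels out of the ratio, so in this simple model the headroom tracks the latency disparity directly; real-world overheads are why the practical guess is lower, at 2-3x.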

Whether or not _your_ projects need that extra headroom in voices is another question entirely. And probably the most relevant question too.


----------



## Dewdman42

And those are the reasons why, if I were building a brand new system with no spare SSDs lying around, I would definitely use NVMe. But there's no way I can justify changing from my SSDs to that.


----------



## Karnob

Hello guys,
I recently bought a Sabrent Rocket Pro 2TB with a USB-C connection to my 2019 iMac (i9 with 64GB RAM). I got the Sabrent to have faster load times for my VI libraries in Kontakt, Omnisphere and other VIs. However, I am noticing that the load times in some cases are super fast, and in other cases are far slower than the spinning disks (8TB G-Tech on Thunderbolt 2). I think with big VSTs the NVMe drive is loading much slower, sometimes taking 8-10 minutes, and the same happens when I want to close the project: sometimes it takes 10 minutes until the project closes. I did not have this issue before with the spinning disks. I understood from Sabrent that the SSD cache might be a cause, but I want to ask: do you guys face a similar issue? And what solutions do you suggest? Do you think that if I get a Thunderbolt 3 enclosure and put the Sabrent drive inside it, instead of the USB-C enclosure that comes with the Sabrent, it will solve the issue?


----------



## EvilDragon

How is your drive formatted? If it's ExFAT, that's not good. Reformat it as macOS Journaled.
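If you want to check how the drive is currently formatted before wiping anything, something along these lines works in Terminal (the volume name `Samples` and the identifier `disk2` are placeholders for your own drive, and `eraseDisk` destroys everything on it, so move the libraries off first):

```shell
# Show the current filesystem of the mounted volume (placeholder name):
diskutil info /Volumes/Samples | grep "File System Personality"

# Reformat the whole drive as Mac OS Extended (Journaled).
# WARNING: this erases the drive -- back up the libraries first.
diskutil eraseDisk JHFS+ Samples disk2
```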


----------



## Karnob

Hello my friend. Yes, it is ExFAT. What is the difference, and how will it affect performance?


----------



## EvilDragon

Kontakt doesn't like ExFAT at all. Just reformat it as suggested.


----------



## Karnob

I will. Just, can you tell me the reason? I want to understand why it is so, just for curiosity and knowledge. Do you suggest getting a 4M2 from OWC with 4 NVMe drives in RAID 0 (I know it's not the safest) and connecting it with Thunderbolt 3 for all the libraries? 8TB (4 x 2TB Samsung EVO Plus).


----------



## Kent

re: streaming DFD samples on NVMe drives, is there a functional bottleneck on bus speeds? In other words, is there a definitive "better/worse" for using, say, one 4TB NVMe drive vs four 1TB NVMe drives?


----------



## Pictus

No bottleneck for audio workloads.


----------



## Kent

Pictus said:


> No bottleneck for audio workloads.


All audio workloads, though? That's a broad statement.


----------



## Dr.Quest

Karnob said:


> I will. Just, can you tell me the reason? I want to understand why it is so, just for curiosity and knowledge. Do you suggest getting a 4M2 from OWC with 4 NVMe drives in RAID 0 (I know it's not the safest) and connecting it with Thunderbolt 3 for all the libraries? 8TB (4 x 2TB Samsung EVO Plus).


ExFAT was made for flash drives (but not fast ones like SSDs, unfortunately). This is why it's slower and not recommended to use on a regular hard drive you work from! [_*From an Evil Dragon post on NI Forum*_]


----------



## Guy Rowland

Karnob said:


> I will just can you tell me the reason? i want to understand why is it so?



exFAT performance is notoriously poor on everything, not just Kontakt, not just music stuff. I was using portable drives for years thinking they were rubbish before I realised this. It's a shame, because the beauty is that both Windows and Mac support exFAT natively. But it's always best to avoid it if you need any kind of performance.


----------



## Karnob

Thank you guys. I will delete everything and re-install while I format the drive to macOS Journaled. Let me see.


----------



## Pictus

kmaster said:


> All audio workloads, though? That's a broad statement.



With NVMe TLC/MLC drives, the bottleneck will be your CPU and the software.
Editing 4K RAW video is another story...


----------



## Karnob

Hello guys.
I reformatted the drive from ExFAT to Mac OS Journaled, and I can confirm the issue is solved. VSTs now load super fast, in a matter of seconds. Thank you very much for the help, I appreciate it a lot.


----------



## Karnob

EvilDragon said:


> How is your drive formatted? If it's ExFAT, that's not good. Reformat it as macOS Journaled.


THANK YOU


----------

