# i7 6700k Slave Machine Sample Streaming Benchmarks



## rgames (Jun 22, 2016)

I've been setting up a new slave machine and ran it through its paces on a bunch of libraries - data below. It's overclocked to 4.6 GHz and all samples were run from a Samsung 850 Pro SSD in a 44.1 kHz project. It was run as a VE Pro network slave connected to an HP zBook laptop master running Cubase 7.5.

The benchmark runs major scales up and down in 16th notes at 120 BPM, modulating up a half step each time it starts back up. The numbers reported are the max number of voices that did not produce audio dropouts. For Omnisphere the "voices" are the "notes" readout on the system page divided by 10 so I can show them on the same plot (in other words, you need to multiply the values on the chart below by 10 to get the readout).
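
For anyone who wants to try something similar, here is a minimal Python sketch of the note pattern described above. It is an assumption-laden reconstruction, not the actual benchmark MIDI file: the start note (C4 = MIDI 60), the one-octave range and the number of passes are guesses, and only note onsets are generated. How many voices the pattern triggers will still depend on each library's release tails, layers and mic positions. The `omnisphere_chart_value` helper just encodes the divide-by-10 scaling used for the chart.

```python
# Hypothetical reconstruction of the benchmark's note pattern: major scales
# up and down in 16th notes at 120 BPM, rising a half step on each new pass.
# The start note, one-octave range and pass count are assumptions.
from typing import List, Tuple

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]  # semitone offsets, root to octave


def sixteenth_seconds(bpm: float) -> float:
    """Length of a 16th note in seconds (a quarter note lasts 60/bpm s)."""
    return 60.0 / bpm / 4


def scale_benchmark(start_note: int = 60, passes: int = 3,
                    bpm: float = 120.0) -> List[Tuple[float, int]]:
    """Return (onset_seconds, midi_note) pairs for the up/down scale pattern."""
    step = sixteenth_seconds(bpm)
    events: List[Tuple[float, int]] = []
    t = 0.0
    for p in range(passes):
        root = start_note + p  # modulate up a half step on each pass
        up = [root + s for s in MAJOR_STEPS]
        # Come back down, skipping the top note (just played) and the root
        # (the next pass starts a half step higher).
        down = up[-2:0:-1]
        for note in up + down:
            events.append((t, note))
            t += step
    return events


def omnisphere_chart_value(notes_readout: int) -> float:
    """The chart plots Omnisphere's 'notes' readout divided by 10."""
    return notes_readout / 10
```

At 120 BPM each 16th note lasts 0.125 s, so the pattern fires eight new notes per second per instrument.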

The four bars for each library represent the different buffer sizes noted in the Omnisphere data. Note that "128+128" means the sound card was set to 128 and the VE Pro network buffer was set to one buffer. The darker segments at the end represent my estimate for the range of maximum values - it can be tricky to pick out an exact number, so there's some uncertainty there.
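
The "N+N" notation maps onto a simple buffering budget. This is a rough sketch, assuming (as described above) that one VE Pro network buffer is the same size as the sound-card buffer, and ignoring converter, driver and safety-buffer latency, so real round-trip figures will be higher:

```python
SAMPLE_RATE_HZ = 44_100  # the benchmark project ran at 44.1 kHz


def buffer_ms(samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one audio buffer in milliseconds."""
    return samples / rate_hz * 1000


def ve_pro_setting_ms(card_buffer: int, network_buffers: int = 1,
                      rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Approximate buffering for an 'N+N' setting: the sound-card buffer plus
    VE Pro network buffers of the same size (simplified assumption)."""
    return buffer_ms(card_buffer * (1 + network_buffers), rate_hz)


for size in (96, 128, 256, 512):
    print(f"{size}+{size}: ~{ve_pro_setting_ms(size):.1f} ms of buffering")
```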

The 256+256 and 512+512 values for LASS are 4000+ and 8000+ respectively. I limited the max to 2000 to better show the differences in the other libraries. LASS is just showing off - no need for that.

I looked at a bunch of other factors like clock speed and number of cores and will be putting a video together. I think this is a good baseline, though, for anyone wondering what kind of performance to expect from a new slave machine.

Enjoy!


----------



## Suganthan (Jun 23, 2016)

Thank you for this.
So if I am writing a full orchestral (tutti) track with EW Gold instruments, would it easily hit the 500-voice threshold? What would a typical PLAY voice count be when using EWHO?


----------



## chimuelo (Jun 23, 2016)

Richard this is most appreciated.
4.6 GHz is the max on many boards.
Which ASRock are you using?


----------



## Jackles (Jun 23, 2016)

Suganthan said:


> Thank you for this.
> So if I am writing a full orchestral (tutti) track with EW Gold instruments, would it easily hit the 500-voice threshold? What would a typical PLAY voice count be when using EWHO?



+1 

I do have some difficulties transposing your results into a real life scenario. 
Thanks for taking the time to do this benchmark though ! I'm planning to buy this processor, so I'm particularly interested in what you came up with.


----------



## EvilDragon (Jun 23, 2016)

chimuelo said:


> 4.6 GHz is the max on many boards.



But not all of them... The 6700K can OC very nicely with some careful voltage tweaking (even past 4.7 GHz on some), and I would say you don't really need an extra-special mobo for that...


----------



## chimuelo (Jun 23, 2016)

Very true, ED.
I meant max sustained OC.

I can hit 4.8 GHz on an H170 after undervolting the RAM, but even with server-quality electronics the voltage fluctuations from heat make that risky.

I'm thinking OC Fatality and ROG boards are better suited.


----------



## rgames (Jun 23, 2016)

Suganthan said:


> Thank you for this.
> So if I am writing a full orchestral (tutti) track with EW Gold instruments, would it easily hit the 500-voice threshold? What would a typical PLAY voice count be when using EWHO?


It depends on what kind of music you're writing. If you're using a lot of fast lines with modulations all over the place and multiple mic positions then you'll eat up 500 voices pretty quickly. If you're writing a basic repeating spicc pattern then you probably won't.

As an example, the major scale benchmark (described above) across all five string sections (Vln I, Vln II, Vla, Cello, Bass) eats up around 500 voices using only one mic position. With two mic positions you'll need twice as many voices.
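
The rule of thumb here (voices scale with mic positions) can be turned into a quick check against a machine's measured ceiling. A minimal sketch: it assumes strictly linear scaling, and the `measured_max` value you pass in would come from a benchmark like the one in the first post, not from anything computed here.

```python
def estimated_voices(one_mic_voices: int, mic_positions: int) -> int:
    """Linear rule of thumb: each extra mic position streams another copy of every note."""
    return one_mic_voices * mic_positions


def headroom_used(needed_voices: int, measured_max: int) -> float:
    """Fraction of a machine's measured dropout-free ceiling that a passage would use."""
    return needed_voices / measured_max


# Five string sections playing the scale benchmark used ~500 voices on one mic.
strings_two_mics = estimated_voices(500, 2)
```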

I can tell you how many you get, but I can't tell you how many you need.

rgames


----------



## rgames (Jun 23, 2016)

Jackles said:


> I do have some difficulties transposing your results into a real life scenario.


Just play back one of your more stressing tracks and check the voice counts. That'll tell you what you need.

rgames


----------



## rgames (Jun 23, 2016)

chimuelo said:


> Which ASRock are you using?


It's the Z170 OC Formula. I can run it at 4.7 but it's not completely stable in the really stressing benchmarks (e.g. Prime95). So I just run it at 4.6. The core voltage sits around 1.35 V. RAM is DDR4 2400.

Now's a great time to buy that setup - I got the mobo/CPU/64 GB RAM for about $770. Most of the components are in the Newegg sales pretty regularly.

rgames


----------



## chimuelo (Jun 23, 2016)

Good to know.
Prime95 & Cinebench are good benches.

Are you going to get an NVMe device for PLAY at some point?

Thanks


----------



## rgames (Jun 23, 2016)

chimuelo said:


> Are you going to get an NVMe device for PLAY at some point?


I got one of the Samsung 950 Pro NVMe drives and tested it out. I saw some improvement at the lowest buffer setting (96+96), but the rest were a wash in terms of streaming. The NVMe drive did cut load times by about 15%, though.

That was using the onboard M.2 connections - I have an add-in card on the way to see if it makes a difference. I read some reviews that said they got better performance with a PCIe card vs. the onboard connections. For $22 I thought I'd give it a shot...

rgames


----------



## dbudde (Jun 23, 2016)

rgames, 

I'd be interested to know if the VSL Directory manager preload size makes any difference on the VSL strings results. I assume you have the preload size set to 1536 for this test since your samples come from a fast SSD, correct?


----------



## tack (Jun 23, 2016)

Thanks for sharing these results, Richard. How did the voice count compare to your previous CPU? (And what was your previous CPU?)


----------



## Gerhard Westphalen (Jun 23, 2016)

I know that decreasing the Kontakt preload buffer increases CPU usage. Any idea how much that would affect these results? Also curious to know where Spitfire libraries would sit on the graph since that's what I mostly use.


----------



## Jaredf920 (Jun 23, 2016)

Thanks for this!!

You mentioned that the NVMe drive loaded faster, but you didn't see much improvement in terms of voice count?
I am curious to see if the add-on card will make any difference as well! This is really good info; it makes me a little hesitant to spend more $$ and put all samples on NVMe drives.

I'm currently looking into the i7-6800K for my slave build, planning on overclocking it to 4.0-4.4 GHz. I'm really curious whether the extra two cores will make a noticeable difference or not...


----------



## jamwerks (Jun 23, 2016)

@rgames, how many M.2 drives can you have in total? And what is the load time, and for how many GB? Thanks!


----------



## rgames (Jun 23, 2016)

dbudde said:


> rgames,
> 
> I'd be interested to know if the VSL Directory manager preload size makes any difference on the VSL strings results. I assume you have the preload size set to 1536 for this test since your samples come from a fast SSD, correct?


Yes - pre-load size can have an effect. I used 4096 for VSL and 60 kB for Kontakt. Those are both really high, of course, but I wanted to look at it from a "What's the best you can do?" standpoint. So I set the pre-loads really high to limit their effect. In the past I've gone as low as 2048 in VSL and 18-30 kB in Kontakt (depending on library, most do fine at 18 kB from SSD) with no significant reduction in voice counts.
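
Lowering the preload mostly trades RAM for streaming work. As a rough sketch, assuming one preload buffer per sample file, 1024-based units and a hypothetical template of 100,000 samples (not a figure from this thread), the RAM held by Kontakt's preload buffers alone scales like this:

```python
def preload_ram_gb(sample_count: int, preload_kb: float) -> float:
    """Approximate RAM held by preload buffers, assuming one buffer per sample
    file and 1024-based units. Instrument overhead is not included."""
    return sample_count * preload_kb * 1024 / 1024**3


samples = 100_000  # hypothetical template size
for kb in (60, 30, 18):
    print(f"{kb} kB preload: ~{preload_ram_gb(samples, kb):.1f} GB")
```

Because the relationship is linear, dropping from 60 kB to 18 kB cuts that part of the RAM footprint by 70% regardless of template size.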

Regarding CPU - my previous slave was an i5 2500k at 4.4 GHz. The 6700k definitely produces more voices - sometimes 2x-10x as many - but it's hard to directly compare the CPUs since the chipsets are from very different generations (the i5 2500k is 5-6 years old at this point). My guess is that a current-generation i5 overclocked to 4.6 GHz would produce much better performance than the i5 2500k but it's hard to say if it would match the i7 6700k.

I did compare against my i7 4930k, a more recent 6-core chip at 4.4 GHz, and found mixed results. For libraries that rely mostly on sample streaming, the 4930k tended to provide more voices (as many as 30% more) at the lowest buffer setting (96+96). Those same libraries, however, were a wash at the higher buffer settings. For the libraries that really don't depend on streaming (Omnisphere and Samplemodeling) the opposite was true: the 6700k provided more voices, by about the same margin (30%). So, unfortunately, the answer is "it depends".

But here's a very important point: both of those CPUs were overclocked about the same - 4.4 and 4.6 GHz. Many of the higher-core-count chips (e.g. Xeons) are clocked around 3.0 GHz, so I under-clocked the 6700k and 4930k to 3.0 GHz to see the effect of clock speed. It was much larger than the effect of number of cores - the voice counts dropped by 50% - 80% in many instances. I seriously doubt that any number of additional cores would make up that difference.
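
To put the clock-speed result in perspective, here is a naive comparison using only the numbers in this post. The `cores_to_compensate` helper assumes, optimistically and hypothetically, that voice count scales perfectly linearly with core count at a fixed clock, which real systems rarely achieve; the starting point is the 6700k's four physical cores.

```python
def clock_drop_pct(new_ghz: float, old_ghz: float) -> float:
    """Percentage reduction in clock speed."""
    return (1 - new_ghz / old_ghz) * 100


def cores_to_compensate(base_cores: int, voice_drop_pct: float) -> float:
    """Cores needed at the lower clock to win back the lost voices, assuming
    (hypothetically, and optimistically) perfectly linear scaling with cores."""
    return base_cores / (1 - voice_drop_pct / 100)


print(f"6700k, 4.6 -> 3.0 GHz: {clock_drop_pct(3.0, 4.6):.0f}% less clock")
for drop in (50, 80):  # the 50-80% voice-count drop reported above
    print(f"{drop}% fewer voices -> ~{cores_to_compensate(4, drop):.0f} cores at 3.0 GHz")
```

Going from 4.6 GHz to 3.0 GHz is roughly a 35% clock reduction, yet voice counts fell 50-80%; even with perfect scaling, recovering that would take something like 8 to 20 cores at 3.0 GHz.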

So that's something to consider: speed seems to help more than number of cores.

rgames


----------



## rgames (Jun 23, 2016)

jamwerks said:


> @rgames, how many M.2 drives can you have in total? And what is the load time, and for how many GB? Thanks!


That board has three M.2 slots at x4, but you can add more by using up PCIe slots. Total load times for a 26 GB template were 336 and 289 seconds for the 850 Pro and the 950 Pro NVMe, respectively. That includes about 12 GB of PLAY using PLAY 4.3.5 (the slow one...). PLAY 4.2.2 was much quicker, but I didn't record the data.

Without PLAY, the 14 GB template loaded in about 72 seconds from the 850 Pro.
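
The load-time numbers above can be turned into a percentage and rough throughput figures. A minimal sketch; the PLAY estimate assumes that load times simply add up, which is only an approximation, since the 72 s run was a separate template without PLAY:

```python
def pct_reduction(before_s: float, after_s: float) -> float:
    """Percentage reduction from before_s to after_s."""
    return (before_s - after_s) / before_s * 100


def gb_per_s(size_gb: float, seconds: float) -> float:
    """Average load throughput in GB per second."""
    return size_gb / seconds


full_850 = 336.0     # 26 GB template incl. ~12 GB of PLAY, 850 Pro
full_950 = 289.0     # same template, 950 Pro NVMe
no_play_850 = 72.0   # 14 GB template without PLAY, 850 Pro

print(f"NVMe load-time reduction: {pct_reduction(full_850, full_950):.1f}%")
print(f"Non-PLAY content: ~{gb_per_s(14, no_play_850):.2f} GB/s")
# Assumption: load times add up, so PLAY accounts for the remaining time.
print(f"PLAY content (inferred): ~{gb_per_s(12, full_850 - no_play_850):.3f} GB/s")
```

The ~14% reduction lines up with the "about 15%" quoted earlier, and under the additivity assumption PLAY content loads several times more slowly than everything else.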

rgames


----------



## rgames (Jun 23, 2016)

Gerhard Westphalen said:


> Also curious to know where Spitfire libraries would sit on the graph since that's what I mostly use.


I don't use Spitfire much (only some of the section patches in Albion 1), but my gut feeling is that it falls between Cinebrass and LASS.


----------



## JohnG (Jun 23, 2016)

The most demanding string libraries really seem to "like" a fairly high CPU clock speed, 4.0 GHz or higher. I use quite a bit of Spitfire and Hollywood string patches, some with multiple mic positions, and for those a faster CPU has helped noticeably compared with the 3.0 GHz range of my previous string computer.

I'm afraid I don't have as much detail as Richard on numbers of voices, but the gist is that a fairly complex bunch of string writing was straining at the edge on the older computer, fast though it was. By contrast, I have yet to come up with anything on the new one that it can't accommodate.


----------



## Jackles (Jun 23, 2016)

JohnG said:


> The most demanding string libraries really seem to "like" a fairly high CPU clock speed, 4.0 GHz or higher. I use quite a bit of Spitfire and Hollywood string patches, some with multiple mic positions, and for those a faster CPU has helped noticeably compared with the 3.0 GHz range of my previous string computer.
> 
> I'm afraid I don't have as much detail as Richard on numbers of voices, but the gist is that a fairly complex bunch of string writing was straining at the edge on the older computer, fast though it was. By contrast, I have yet to come up with anything on the new one that it can't accommodate.



Could you tell us what that machine is, the one that can accommodate anything? I'm looking for the right parts for a new computer, and I would like not to make the same mistakes I've made in the past. I constantly run into the complex-string-parts scenario.


----------



## JohnG (Jun 24, 2016)

Hi Jackles,

I also love strings and keep multiple libraries loaded. However, my computer is far from "the latest" so I wouldn't ape it. I'd look for chimuelo's posts on the forum here -- he's perhaps the most intense researcher about all this.

My specs:
Windows 10
32 GB RAM
4.0 GHz i7 CPU
PCIe card SSD array (hardware RAID) for most libraries
SSDs for other libraries
buffer on RME card is 512 (I tried and tried to get this down to 256 but never managed it -- maybe you can with a more powerful CPU).

I also went through the operating system and BIOS and turned off a ton of functions that I don't need. While inherently risky -- don't do this if you don't know what you are doing -- it made a big difference in performance.

Here's the link to the discussion here about "tuning" Windows 10: http://vi-control.net/community/threads/windows-10-settings-stuff-you-can-turn-off-for-music.49446/


----------



## Vin (Jun 24, 2016)

Jackles said:


> Could you tell us what that machine is, the one that can accommodate anything? I'm looking for the right parts for a new computer, and I would like not to make the same mistakes I've made in the past. I constantly run into the complex-string-parts scenario.



What's your budget?


----------



## Jackles (Jun 24, 2016)

Vin said:


> What's your budget?



Around $2000.


----------



## Jackles (Jun 24, 2016)

JohnG said:


> Hi Jackles,
> 
> I also love strings and keep multiple libraries loaded. However, my computer is far from "the latest" so I wouldn't ape it. I'd look for chimuelo's posts on the forum here -- he's perhaps the most intense researcher about all this.
> 
> ...



Thanks, any information I can gather is welcome. I previously purchased a very expensive and particularly inappropriate computer for sample-based music; I really don't want to go down that road again. So now I'm extra careful.

My current template takes more than 32 GB of RAM (I have 126 GB, but I also have two 1.8 GHz Xeon processors, so nothing powerful enough to take advantage of all that RAM). So from what I'm seeing around here, maybe my template is a little bit too much for someone who isn't Hans Zimmer or Tom Holkenborg...


----------



## JohnG (Jun 24, 2016)

Another member, GSilbers, has 128 GB on a single machine and it's working for him. My 32 GB machine is only for strings.


----------



## Vin (Jun 24, 2016)

Jackles said:


> Around $2000.



You can build a kick-ass PC for that money.

My first choice would be something like this:

*http://pcpartpicker.com/list/7MhgLD*

High-end, cool, all-SSD and completely silent machine. You could add another 64 GB of RAM, but I think it would be overkill, since with SSDs you can lower your preload buffer, at least in Kontakt.

I have a single machine with 5820K OC'd @4.4 GHz, working with lots of plugins and samples in real time and it rarely goes north of 50-60%. No clicks, no pops.

You can upgrade later to Broadwell-E CPUs (like that i7-6950X behemoth which now goes for ~$1700) when the prices go down.

If you want 6700K, then your limit for RAM is going to be 64 GB and I'd get something like this:

*http://pcpartpicker.com/list/Xx3tJV*

But I prefer LGA 2011-v3 to LGA 1151; 1151 is only better in single-core performance.


----------



## EvilDragon (Jun 24, 2016)

Vin said:


> 1151 is only better in single core performance



Which is quite important for a lot of plugins - Reaktor, Falcon, etc...


----------



## Jackles (Jun 24, 2016)

JohnG said:


> another member, GSilbers, has 128 GB on a single machine and it's working for him. My 32 GB machine is only for strings.



Wow... OK, 32 GB of strings! You do love strings, don't you? XD
I feel better already. Although, I'm starting to realize that, if I want to use the template that I build, I have to go with slave machines. And this is something I wasn't thinking about until now.

How many slave machines do you work with ?



Vin said:


> You can build a kick-ass PC for that money.
> 
> My first choice would be something like this:
> 
> ...



Wow, thanks a lot for this. 
I was pretty much aiming in that direction, especially with the 6700K. Now I know I don't need more than 48 GB of RAM, so the limitation doesn't bother me.
Thanks again !


----------



## JohnG (Jun 24, 2016)

Jackles said:


> How many slave machines do you work with ?



I use four but if they were up to date, maxed out machines I probably could do it with fewer; I just can't face reconfiguring everything. And besides I like to have plenty of extra room on them so that I can add libraries in a hurry.


----------



## JohnG (Jun 24, 2016)

I'd definitely get a 4.0 GHz CPU, for what it's worth.


----------



## TintoL (Jun 24, 2016)

Thank you all for your time researching all this. Could someone please tell me how to check the voice count in Cubase and Kontakt?

Also Richard, I wanted to ask you, when you said:



rgames said:


> I got one of the Samsung 950 Pro NVMe drives and tested it out. I saw some improvement at the lowest buffer setting (96+96), but the rest were a wash in terms of streaming. The NVMe drive did cut load times by about 15%, though.
> rgames



Did you mean that, compared with your results from a 2.5" SATA SSD, the M.2 drive's improvement was only 15 percent? That would surprise me.

Thanks in advance.


----------



## Jackles (Jun 25, 2016)

JohnG said:


> I use four but if they were up to date, maxed out machines I probably could do it with fewer; I just can't face reconfiguring everything. And besides I like to have plenty of extra room on them so that I can add libraries in a hurry.



I would love to ask you how you set it up, but that's a bit too much OT I suppose. Do you mind if I ask you this in PM ?


----------



## JohnG (Jun 25, 2016)

Jackles said:


> I would love to ask you how you set it up



I use hardware but if I were starting over I'd use VE Pro to get going. 

Hardware has the advantage that you can add as many slave computers as you want and arguably other advantages as well, but VE Pro is much less expensive and, I would guess, better for where you appear to be with a likely 2-3 computer setup.


----------



## rgames (Jun 25, 2016)

TintoL said:


> Did you mean that, compared with your results from a 2.5" SATA SSD, the M.2 drive's improvement was only 15 percent? That would surprise me.


That's correct - there was a little improvement in voice count but only at the lowest buffer setting (96+96). Overall there's really no difference for streaming voice count. The 15% improvement was on the load times.

That's consistent with what I've seen in the past - I've had SSDs that bench between 400 and 550 MB/s sequential read and they performed basically the same in terms of streaming. The NVMe drive benches at over 2000 MB/s sequential read, but it performs about the same as well. So the bottleneck no longer appears to be read speed. The big jump was from HDDs to SSDs, but since then, meh...

As for how to check voice counts: there's a readout at the top of each Kontakt instance, directly above the memory usage. Each Kontakt instance has its own voice count, so if you have multiple instances open you need to add them up to get the total. It's hard to read sometimes, so it helps to take a video of it while playing back a track, then move slowly through the video and find the max.

In PLAY you go to the "Settings" page and look under the "Streaming" tab; there's a readout there. With PLAY it's tricky, though, because the readout updates slowly and often misses the peak voice usage, so you have to play back a high-voice-count section a bunch of times (maybe 10) and note the maximum.

In VSL there's a readout just above the keyboard when you have the advanced view open. For PLAY and VSL the readouts cover all instances, so you don't need to add up individual instances like you do for Kontakt.
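
If you transcribe readouts (for example by stepping through a screen recording), the Kontakt aggregation is easy to get wrong: instances must be summed at the same moment before taking the peak, rather than adding up each instance's own peak. A minimal sketch with hypothetical, hand-transcribed readings; the instance names are placeholders:

```python
from typing import Dict, Sequence


def peak_total_kontakt(frames: Sequence[Dict[str, int]]) -> int:
    """Each frame maps a Kontakt instance name to the voice count read at one
    moment. Sum the instances per frame, then take the peak over frames."""
    return max(sum(frame.values()) for frame in frames)


def peak_over_passes(readouts_per_pass: Sequence[Sequence[int]]) -> int:
    """For a single global readout (as in PLAY), take the max across repeated
    playbacks, since a slow-updating display can miss the peak on any one pass."""
    return max(max(readings) for readings in readouts_per_pass)
```

For example, with frames `{"a": 100, "b": 50}` and `{"a": 80, "b": 90}` the true simultaneous peak is 170 voices; adding each instance's own maximum would overstate it as 190.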

rgames


----------



## rgames (Jun 25, 2016)

JohnG said:


> I use hardware but if I were starting over I'd use VE Pro to get going.


Have you ever tried VE Pro over ethernet with your setup? I found a pretty significant drop in latency (512 to 128 if I recall) when I switched from audio hardware to ethernet. How low you can go will depend, of course, on what else is in your master project.

One thing I didn't mention in my original post is that I was running with all of those libraries connected in my full template. So the master machine had 20-30 plug-ins, four IR reverbs, a bunch of group channels, etc. The raw power of the slave is probably higher than what's shown in the numbers but I wanted to give an idea of performance as part of a typical setup. My desktop runs at 128+128 and my laptop runs at 256+256 with my full template (about 375 MIDI tracks and another 140 or so audio tracks for VE Pro audio returns, group tracks, FX tracks, etc.).

rgames


----------



## JohnG (Jun 25, 2016)

rgames said:


> Have you ever tried VE Pro over ethernet with your setup?



If I could reduce the buffer to 128 I would fly to your studio and give you a bag of gold. Or at least potato chips or licorice or something Very Valuable.

I read a while ago that four slaves plus a DAW was too much for VE Pro, plus my audio path is idiosyncratic so I don't know if it would work:

Midi path: DAW => slave computers
Audio path: DAW+slave_computers => another Mac running Pro Tools (via PT interfaces)


----------



## rgames (Jun 25, 2016)

Yeah - I could see that. I've never had more than two slaves and going into a Pro Tools rig might be better with dedicated audio.


----------



## JohnG (Jun 25, 2016)

well, thanks Richard. I appreciate it. Maybe my next iteration I'll bag PT, but it's a tough call. I'm really used to it, all my engineers use it, and that zero latency thing is tough to emulate. There's a way to do the latter in Digital Performer but I've never gotten around to it.


----------



## TintoL (Jun 25, 2016)

Thanks Richard. I am surprised the improvement is so small. And the CPU cannot be the bottleneck. It has to be something in the QPI. But apparently the QPI speed should be around 25 GB/s. So what is it...? Who knows.

In practice, the point for me is that it is more cost-efficient and pragmatic to stick with 2.5" SATA SSDs.

Thanks for explaining to me the voice count.


----------



## rgames (Jun 25, 2016)

TintoL said:


> I am surprised the improvement is so small.


Yeah - I've never been able to explain why read speeds and IOPS don't make much difference. When I saw no difference as a function of SSD read speeds 3-4 years ago I thought maybe IOPS would be the key, but as I've started using NVMe drives (which have much higher IOPS) I still haven't seen much improvement.

And that's not just for sample streaming - I do a lot of time-lapse photography and wind up with thousands of RAW camera images that need preview images generated and cached before I can manipulate them. I've tried putting my cache drive on NVMe, but doing so didn't make the process any faster. It's the same kind of thing as streaming samples, but with much larger files. Neither one seems to benefit from the NVMe drive.

The NVMe drive did show a 15% improvement in load times, though, so that's something. In truth, load times aren't an issue for me because I run from a template.

rgames

EDIT: Also, you are correct that CPU is not the bottleneck - I watched it while doing the benchmarks and it was only an issue at the really high buffer settings (512+512) with LASS. Max with everything else was around 50% - 60%.


----------



## tack (Jun 25, 2016)

To me, something has always smelled off about Kontakt performance, especially on loading patches. When there's no apparent bottleneck (all cores have plenty of headroom, filesystem cache is warm and disks are basically idle, no network I/O involved), one has to wonder just where the problem is. If this were Linux, I'd be able to more conclusively blame it on shitty software. On Windows, I feel entirely helpless. All I can do is stare at perfmon plots.

Is it like that on the Mac too? I've always just answered the "why do Kontakt patches load so slowly when every resource on my system has plenty of headroom?" with "Because Windows."


----------



## EvilDragon (Jun 26, 2016)

tack said:


> "why do Kontakt patches load so slowly when every resource on my system has plenty of headroom?"



I think this might depend on what the instrument actually contains BESIDES samples... like instantiating all those effects, filters, modulators, IR samples, reserving memory for groups, zones, modulators, etc... Sure, that should be fast, but I would assume this part of Kontakt might not have received any important updates since K4.2, when the binary format was introduced, or perhaps even before...


----------



## tack (Jun 26, 2016)

EvilDragon said:


> I think this might depend on what the instrument actually contains BESIDES samples... like instantiating all those effects, filters, modulators, IR samples, reserving memory for groups, zones, modulators, etc... Sure that should be fast


Or at the very least all that should be compute intensive, and I should see that in the graphs in the form of at least one core flirting with the 100% line.

I'd be much happier with that. At least I'd know I could always throw more hardware at it if I want it faster. But I can't actually find any apparent bottleneck in any system resource to suggest beefier hardware should make any difference. Even in the case of a cache stall (and so the CPU is waiting on RAM) this should manifest as utilization -- unless Windows counters don't work like Linux in this regard. That's worth looking into -- it could be a bad assumption.

What I see is most of the time cores are below 50% utilization during a heavy project load, at least on some of the heavier instruments like Mural and Sable. With some of the lighter ones (say woodwinds), not only do they load faster (unsurprisingly) but the CPU graphs show much higher utilization (surprisingly) than the heavier instances with multiple patches.


----------



## chimuelo (Jun 28, 2016)

I concur that once you hit 110-135k IOPS you've topped out.
NVMe has benefits like embedded slots, and it's a great OS device too.
Two decent SSDs in RAID 0 using Intel RST perform the same.
I use the 105k Phison controllers now.
8 months of trouble-free gigs.

Try putting your Steam folder plus the OS on an M.2.
PCM files love NVMe.


----------



## Elephant (Jul 15, 2016)

@rgames It strikes me this slave of yours could do pretty well as a single box with the DAW and VE Pro/samples all on it. What do you reckon? (Specifically using that motherboard, 64 GB RAM and a decent i7.) Also, have you by any chance measured the power draw when you are giving it a hammering?


----------



## rgames (Jul 15, 2016)

Elephant said:


> @rgames It strikes me this slave of yours could do pretty well as a single box with the DAW and VE Pro/samples all on it. What do you reckon? (Specifically using that motherboard, 64 GB RAM and a decent i7.) Also, have you by any chance measured the power draw when you are giving it a hammering?


You certainly could use it as a single machine if your voice counts never exceed its limits. In general, I do exceed those limits, so I need a slave. Actually, I picked up a second one with an even cheaper motherboard (ASUS Z170-E, $105 with a $20 rebate, so an $85 mobo!).

Max power draw in Prime95 is around 150 W but you'll never hit that for DAW use. Max I've seen while actually using it is around 80 W.

The i7 6700k on the ASUS Z170-E hits 4.6 GHz and performs the same as the ASRock in all respects. That system is under $650 after the rebate.

With the two slaves and using my laptop as master, the setup is running all of EW Diamond along with a ton of VSL, Kontakt and other libraries at a buffer of 128+128. When I get back to my desktop master I bet it'll do 96+96. Without PLAY I bet I could run everything at the same buffer (or better) with only one slave.

I haven't seen anything else that performs anywhere close to those two slaves and the total cost is under $1500 (assuming you re-use a case, power supply and drives).

So, as ever, you don't need to spend a huge sum of money on computers for running VIs. You just need a couple of cheap slaves.

rgames


----------



## Elephant (Jul 15, 2016)

@rgames Thanks very much, very useful. As a matter of interest, what is the voice count for one of those boxes when used as a single integrated box, before you have to go to a slave? As a starter system, I could kick off with one single box, then add a slave if and when needed. BTW, what is the idle power draw for one box? So far, this is looking like a great system. Now I just have to find a case and physical build that is relatively quiet and relatively light.


----------



## rgames (Jul 16, 2016)

The voice count when running the sequencer and samples on the same machine is always a bit lower - how much lower will depend on the project. My standard template has a few hundred tracks, a few dozen plug-ins (mostly EQs), four IR reverbs and 15 or so group tracks for stems, and running the samples from the same machine drops the voice count by 10% or so.

Re: idle power draw, it's about 50 W.

rgames


----------



## tack (Jul 16, 2016)

rgames said:


> Re: idle power draw, it's about 50 W.


That's quite good. My 6700k system draws 130 W while idle. Although I do have a monstrous GTX 980 Ti.


----------



## rgames (Jul 16, 2016)

tack said:


> That's quite good. My 6700k system draws 130W while idle. Although I do have a monstrous GTX 980 Ti.


That's certainly part of it - I use a $10 fanless video card. Fanless ensures that it won't draw much power. You can also use the onboard GPU on the 6700k but I think you get better overclock with the onboard GPU disabled - less heat generation on the die.

Do you disable the power-saving features? I have all the power-saving features enabled (e.g. all C-states enabled, SpeedStep enabled, etc.), so my idle CPU speed is something like 1.3 GHz.

rgames


----------



## tack (Jul 16, 2016)

rgames said:


> Do you disable the power-saving features? I have all the power-saving features enabled (e.g. all C-states enabled, SpeedStep enabled, etc.), so my idle CPU speed is something like 1.3 GHz.


I don't disable them, no. I have a couple of icons on my desktop that make it easy to switch between the High performance and Balanced power profiles, so when I'm in my DAW I run in High performance, and otherwise in Balanced, which lets the CPU clock down while idle. No C-states are disabled in the BIOS.

In the Balanced profile, I idle at 800 MHz. In High performance, I'm clocked at 4.5 GHz. The difference between these is about 6 W.
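
One possible way to build desktop icons like these (an assumption, not necessarily what is used here) is a small Python script launched from two shortcuts. It relies on Windows' built-in `powercfg /setactive` command with its `SCHEME_MIN` (High performance) and `SCHEME_BALANCED` (Balanced) aliases; the `daw`/`idle` mode names are hypothetical.

```python
import subprocess
import sys
from typing import List

# Built-in powercfg aliases: SCHEME_MIN = High performance (minimum power
# saving), SCHEME_BALANCED = Balanced. The mode names are hypothetical.
PROFILES = {"daw": "SCHEME_MIN", "idle": "SCHEME_BALANCED"}


def powercfg_command(mode: str) -> List[str]:
    """Build the powercfg call that activates the profile for a mode."""
    return ["powercfg", "/setactive", PROFILES[mode]]


def switch_profile(mode: str) -> None:
    """Activate the power profile for 'daw' or 'idle' (Windows only)."""
    if sys.platform != "win32":
        raise OSError("powercfg is only available on Windows")
    subprocess.run(powercfg_command(mode), check=True)


if __name__ == "__main__" and sys.platform == "win32":
    switch_profile(sys.argv[1] if len(sys.argv) > 1 else "daw")
```

One shortcut would pass `daw` before a session and the other `idle` afterwards; `check=True` makes a failed switch raise an error instead of passing silently.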


----------



## Elephant (Jul 17, 2016)

rgames said:


> The voice count when running the sequencer and samples on the same machine is always a bit lower - how much lower will depend on the project. My standard template has a few hundred tracks, a few dozen plug-ins (mostly EQs), four IR reverbs and 15 or so group tracks for stems, and running the samples from the same machine drops the voice count by 10% or so.
> 
> Re: idle power draw, it's about 50 W.
> 
> rgames



Thanks ! Very encouraging.


----------



## Phryq (May 2, 2017)

rgames said:


> Yeah - I've never been able to explain why read speeds and IOPS don't make much difference. When I saw no difference as a function of SSD read speeds 3-4 years ago I thought maybe IOPS would be the key, but as I've started using NVMe drives (which have much higher IOPS) I still haven't seen much improvement.
> 
> And that's not just for sample streaming - I do a lot of time-lapse photography and wind up with thousands of RAW camera images that need preview images generated and cached before I can manipulate them. I've tried putting my cache drive on NVMe, but doing so didn't make the process any faster. It's the same kind of thing as streaming samples, but with much larger files. Neither one seems to benefit from the NVMe drive.
> 
> ...





rgames said:


> That's certainly part of it - I use a $10 fanless video card. Fanless ensures that it won't draw much power. You can also use the onboard GPU on the 6700k but I think you get better overclock with the onboard GPU disabled - less heat generation on the die.
> 
> Do you disable the power-saving features? I also have all the power-saving features enabled (e.g. all C states enabled, speedstep enabled, etc). So my idle CPU speed is something like 1.3 GHz.
> 
> rgames



So would you recommend getting a simple graphics card, even for pure audio work, simply to take some load off the CPU? E.g. can I buy a $10 GPU that will fit into my Mini ITX?


----------



## JohnG (May 2, 2017)

Phryq said:


> would you recommend getting a simple graphics card, even for pure audio work, simply to take some load off the CPU?



Hi Phryq,

I used to add a cheap graphics card in the olden days for exactly this reason but, today, CPUs are so insanely more powerful that I would think the voice count impact of having or not having that card is negligible on a PC slave. I have no data to support this hypothesis.

That's if you're using it for music; for games it's another story.

For the main DAW computer, a lot of people use multiple screens. Most mobos don't seem to support more than two without a separate graphics card.

Kind regards,

John


----------



## Phryq (May 3, 2017)

Ok, thanks. I'm only on 1 screen. I wonder if a GPU would at least help by taking care of GUIs etc?

Anyhow, I'm hoping not, because it would just be 1 more layer of complexity building the machine.


----------



## shmimptone (Jun 5, 2018)

rgames said:


> ...as I've started using NVME drives I still haven't seen much improvement.



So, it's been two years since this was posted. Are people still really finding this to be true? It seems so strange...

Just to make sure - we're talking about NVMe drives running on the PCIe bus - not M.2 SATA drives (which are no faster than a regular SSD).

These two pages helped me better understand the difference:

goo.gl/ttxVCF

goo.gl/BuZaUs

(* see below...)

I'm about to build a new PC for VEP6 usage. I really want to believe that NVMe drives will help, but I'd also be happy to save some money!

Thanks...


* The forum software wouldn't let me post a link because I'm new to the forum. To get around this, links are shortened and unclickable - copy and paste them into address bar...


----------



## EvilDragon (Jun 5, 2018)

Might wanna read this:

https://docs.google.com/document/d/1wL8XYGgd_O9fomMrK1EpSnZJeQwhVOAn91e82byj8s4/edit


----------



## shmimptone (Jun 5, 2018)

EvilDragon said:


> Might wanna read this:
> 
> https://docs.google.com/document/d/1wL8XYGgd_O9fomMrK1EpSnZJeQwhVOAn91e82byj8s4/edit



Wow! Thank you!


----------



## shmimptone (Jun 5, 2018)

shmimptone said:


> Wow! Thank you!



I just have to say wow again. That is a fantastic analysis of some very complex phenomena. 

Regarding your last comment in the document:

"DFD streaming could use a closer look in terms of the performance benefits at large scale, across a project consisting of 100+ tracks, and at aggressively lower preload buffer sizes."

I think that question is going to become more important. Although SATA drives perform well enough now, that's probably because sample libraries were developed for usage with them, no? Now that faster drives exist, libraries will push the envelope again...

As for who's best qualified to answer those questions, I'd love to help, but I don't have your technical acumen. I hope you pursue it! Anyone thinking about rebuilding their rig has to decide whether to go NVMe or not, so I'm guessing this topic will continue to come up.


----------



## tack (Jun 5, 2018)

shmimptone said:


> I just have to say wow again. That is a fantastic analysis of some very complex phenomena.


Well thanks 



shmimptone said:


> "DFD streaming could use a closer look in terms of the performance benefits at large scale, across a project consisting of 100+ tracks, and at aggressively lower preload buffer sizes." I think that question is going to become more important. Although SATA drives perform well enough now, that's probably because sample libraries were developed for usage with them, no? Now that faster drives exist, libraries will push the envelope again...


Yeah, I do plan to do a similar analysis with DFD at some point. Like I wrote in the paper, DFD streaming hasn't been a pain point for me with Kontakt, so the itch isn't serious enough to scratch yet, but I'm certainly interested. I do think this will be quite library-dependent, so a proper analysis would require a sufficient number of libraries, which just makes it all the more time consuming.


----------



## edgar_hsu (Jun 6, 2018)

Has anyone tested NVMe PCIe/SATA3 SSDs with EastWest's Play (especially Hollywood Strings Diamond)?


----------



## Stevie (Jun 6, 2018)

tack said:


> I don't disable them, no. I have a couple of icons on my desktop that make it easy to switch between high performance and balanced power profiles, so when I'm in my DAW I run in high performance, and otherwise in balanced, which lets the CPU clock down while idle. No C states are disabled in BIOS.
> 
> In the balanced profile, I idle at 800MHz. In high performance, I'm clocked at 4.5GHz. The difference between these is about 6W.



Can you elaborate on these icons? Sounds very handy!
I have all power saving options disabled in the BIOS, because the high performance power scheme still seems to throttle down, which results in bad audio performance. What settings did you use to keep the machine at full speed?


----------



## shmimptone (Jun 9, 2018)

tack said:


> Well thanks
> 
> 
> Yeah, I do plan to do a similar analysis with DFD at some point. Like I wrote in the paper, DFD streaming hasn't been a pain point for me with Kontakt, so the itch isn't serious enough to scratch yet, but I'm certainly interested. I do think this will be quite library-dependent, so a proper analysis would require a sufficient number of libraries, which just makes it all the more time consuming.



The DFD analysis, especially if done with a CPU hungry library, would help answer the question of what kind of chip to use in a new build, and whether one should err on the side of faster CPU or more cores. VEP support says that Vienna likes more cores (I asked them...) But sample loading, at least, is CPU bound. Threadripper, for example, is tempting. But I worry that the base clock speed is too low...


----------



## tack (Jun 9, 2018)

Stevie said:


> Can you elaborate on these icons? Sounds very handy!


I basically just set them up like this.



Stevie said:


> I have all power saving options disabled in the BIOS, because the high performance power scheme still seems to throttle down, which results in bad audio performance. What settings did you use to keep the machine at full speed?


Like this. 

CPU frequency scaling is definitely a killer for audio latency. I'm surprised you're still observing it in high performance mode, because I _thought_ that (i.e. "Minimum processor state" set to 100%) was the default. I'm more dubious about turning off core parking (as explained in the above link) -- I haven't tested its effect directly -- but disabling it fits with the high performance theme, so I leave it in.
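The exact shortcut setup behind the link above didn't survive, so the following is only a hypothetical sketch of how such one-click profile switching might be scripted on Windows. It assumes the shortcuts call the built-in `powercfg` tool, using its stock aliases `SCHEME_MIN` (High performance), `SCHEME_BALANCED`, `SUB_PROCESSOR` and `PROCTHROTTLEMIN` ("Minimum processor state"). The names `PLANS`, `switch_commands` and `switch` are made up for illustration.

```python
import subprocess

# Built-in powercfg aliases (list them with `powercfg /aliases`):
# SCHEME_MIN is the stock "High performance" plan, SCHEME_BALANCED is "Balanced".
PLANS = {"daw": "SCHEME_MIN", "idle": "SCHEME_BALANCED"}


def switch_commands(mode: str) -> list[list[str]]:
    """Build (but do not run) the powercfg calls for a mode ("daw" or "idle")."""
    plan = PLANS[mode]
    cmds = []
    if mode == "daw":
        # Pin "Minimum processor state" to 100% while on AC power, so the
        # plan should not ask the CPU to clock down during playback.
        cmds.append(["powercfg", "/setacvalueindex", plan,
                     "SUB_PROCESSOR", "PROCTHROTTLEMIN", "100"])
    # Activating the plan also applies any value just changed on it.
    cmds.append(["powercfg", "/setactive", plan])
    return cmds


def switch(mode: str) -> None:
    """Actually run the commands (Windows only)."""
    for cmd in switch_commands(mode):
        subprocess.run(cmd, check=True)
```

A plain desktop shortcut whose target is `powercfg /setactive SCHEME_MIN` (or `SCHEME_BALANCED`) would give the same one-click switch without Python; the script form just makes the 100% minimum processor state explicit instead of relying on the plan's default.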

How are you determining if the CPUs are throttling down? I use something like HWMonitor.




shmimptone said:


> But sample loading, at least, is CPU bound. Threadripper, for example, is tempting. But I worry that the base clock speed is too low...


And in Kontakt, that initial sample loading is single threaded, and decompression isn't a cheap operation. So indeed, from the perspective of a single Kontakt instance, DFD streaming is going to do better with an 8700K than with a 1950X. But from a project perspective with many Kontakt instances, where decompression can be parallelized, the equation starts to shift. All those nuances complicate testing.
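A toy model of that shift, assuming each instance decompresses on exactly one thread and instances are spread evenly over cores. The core counts match an 8700K-class (6 cores) and a 1950X-class (16 cores) chip, but the 1.25 per-core speed ratio and the work units are made-up numbers, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    per_core_speed: float  # relative single-thread speed (made-up units)


def load_time(cpu: Cpu, instances: int, work_per_instance: float) -> float:
    """Wall-clock load time if each instance decompresses on one thread and
    instances run in rounds across the cores. Ignores I/O, hyperthreading,
    turbo behaviour and scheduler overhead."""
    rounds = math.ceil(instances / cpu.cores)
    return rounds * work_per_instance / cpu.per_core_speed


few_fast = Cpu("6 fast cores", cores=6, per_core_speed=1.25)
many_slow = Cpu("16 slower cores", cores=16, per_core_speed=1.0)

for n in (1, 6, 16):
    print(n, load_time(few_fast, n, 10.0), load_time(many_slow, n, 10.0))
# prints:
# 1 8.0 10.0
# 6 8.0 10.0
# 16 24.0 10.0
```

With one or a handful of instances the faster cores win; once there are more instances than the smaller chip has cores, the wider chip pulls ahead, which is the "equation starts to shift" point in numbers.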


----------



## shmimptone (Jun 9, 2018)

tack said:


> But from a project perspective with many Kontakt instances, where decompression can be parallelized, the equation starts to shift. All those nuances complicate testing.



re: use case, I'm looking to have multiple Kontakt instances running in VEP at the same time.

Kontakt sample loading is single threaded. Kontakt sample streaming, I guess, can use multiple threads if you allow Kontakt to use multiple threads via its preferences. But I'm not sure if that means that each instance is multi-threaded...

Given these parameters:

- I've enabled Kontakt to use multiple threads
- I've loaded multiple instances of Kontakt in VEP and I want lots of them (for some reason!) to play back at once

...would Kontakt sample decompression _for a given Kontakt instance_ then be able to be parallelized across multiple cores? Or, does each Kontakt instance stick to only one core at a time while decompressing samples?

If it's really the latter, then the Threadripper probably is a no go...


----------



## Stevie (Jun 9, 2018)

tack said:


> I basically just set them up like this.
> 
> 
> Like this.



Ah thanks man! 


tack said:


> CPU frequency scaling is definitely a killer for audio latency. I'm surprised you're still observing it in high performance mode, because I _thought_ that (i.e. "Minimum processor state" set to 100%) was the default. I'm more dubious about turning off core parking (as explained in the above link) -- I haven't tested its effect directly -- but disabling it fits with the high performance theme, so I leave it in.
> 
> How are you determining if the CPUs are throttling down? I use something like HWMonitor.



Yes, it's definitely clocking down to around 1000 MHz for a short moment and then going up to 4000MHz again.
The minimum processor state doesn't do anything here, apparently. That's why I turned off all APM (C-States). I'm using CPU-Z (same developer as HWMonitor).
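Short dips like the one described (about 1000 MHz, then back to about 4000 MHz) are easy to miss when just watching a monitor window. Here is a hypothetical helper that flags them in a list of logged clock readings. How the readings are collected (a monitoring tool's log or anything else) is an assumption left out of the sketch, and the 90% threshold is arbitrary.

```python
def find_dips(samples_mhz, nominal_mhz, tolerance=0.9):
    """Return (start, end_exclusive, lowest_mhz) for each run of consecutive
    samples below tolerance * nominal_mhz."""
    threshold = tolerance * nominal_mhz
    dips = []
    start = None
    for i, mhz in enumerate(samples_mhz):
        if mhz < threshold:
            if start is None:
                start = i  # a dip begins
        elif start is not None:
            dips.append((start, i, min(samples_mhz[start:i])))  # dip ended
            start = None
    if start is not None:  # log ended mid-dip
        dips.append((start, len(samples_mhz), min(samples_mhz[start:])))
    return dips


print(find_dips([4000, 4000, 1000, 1200, 4000, 4000], nominal_mhz=4000))
# prints: [(2, 4, 1000)]
```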


----------



## tack (Jun 9, 2018)

shmimptone said:


> Kontakt sample streaming, I guess, can use multiple threads if you allow Kontakt to use multiple threads via its preferences. But I'm not sure if that means that each instance is multi-threaded...


Each instance is multithreaded. It's hard to say exactly what those threads are being used for. Without debug symbols for Kontakt (something NI would never publicly distribute), the best I can do is infer design from very high level thread activity. Or I could try to wade through a disassembly, but ain't nobody got time for that!

I had a bit of a poke at the DFD case. It looks like, unlike the patch loading scenario I tested previously, DFD streaming has two threads doing I/O. This is corroborated by the fact that the disk queue depth pretty much caps at 2 (even when streaming off rust). So there's a bit of parallelism here. Interestingly, I observe the same thing even when multiprocessor support is disabled.

I believe voices are being processed across the thread pool. Whether sample decompression is as well isn't clear. But the low disk queue depth during DFD does suggest to me that NVMe would help for the same reasons it did in the initial sample loading case: the poor parallelism means that even run-of-the-mill SSDs would be underutilized, but the very low latency of NVMe will improve overall throughput, because its shorter round-trip times compensate for that design limitation.

Now as to whether or not more cores will help the DFD case more than fewer/faster cores on a single Kontakt instance, my intuition right now is still no, because in the testing I described in that Kontakt Patch Load Performance doc, I didn't see any evidence that sample decompression was multi-threaded.

On the other hand, in that testing for the patch loading case, I/O was clearly single threaded, whereas in the DFD case I'm seeing two threads, so I could be wrong here. There's just no substitute for testing the actual thing.

But once you have multiple instances of Kontakt, you get parallelism "for free." So through multiple Kontakt instances you'll be able to squeeze more life out of your flash storage (without needing to resort to the super low latencies of NVMe) and distribute decompression over more cores (even if it actually is single threaded within a single instance).
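The queue-depth argument in this post can be put in rough numbers with Little's law: requests completed per second ≈ requests in flight ÷ per-request latency, capped by the drive's bandwidth. The following is a sketch under loudly stated assumptions. Every latency, bandwidth cap and the 4 KB read size are hypothetical round numbers. Treating each Kontakt instance as adding about 2 outstanding reads (the queue depth observed above) is also an assumption.

```python
def streaming_mb_s(queue_depth: int, latency_us: float,
                   block_kb: float, max_mb_s: float) -> float:
    """Little's law: completed reads per second ~= reads in flight divided
    by per-read latency, capped by the drive's bandwidth."""
    iops = queue_depth * 1_000_000 / latency_us
    return min(iops * block_kb / 1024, max_mb_s)


# Hypothetical drives: (per-read latency in microseconds, bandwidth cap in MB/s)
SATA_SSD = (100, 550.0)
NVME_SSD = (25, 3000.0)
BLOCK_KB = 4  # assumed read size

for instances in (1, 8):
    qd = 2 * instances  # assumes ~2 outstanding reads per Kontakt instance
    for name, (lat, cap) in (("SATA", SATA_SSD), ("NVMe", NVME_SSD)):
        print(instances, name, streaming_mb_s(qd, lat, BLOCK_KB, cap))
```

With one instance, the low queue depth leaves both drives far below their caps, and the NVMe drive's advantage comes entirely from its shorter latency (78.125 vs 312.5 MB/s here). With eight instances, the SATA drive reaches its 550 MB/s cap. That matches the point about multiple instances squeezing more out of ordinary flash storage.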

All this collapses down to conventional wisdom when it comes to the question of more, slower cores vs fewer, faster cores: as long as the CPU's single-core performance is good enough to run your most demanding single instance (whatever it is), then these days adding cores gives the better price-performance ratio.

The really hard part is knowing whether the 1950X (or whatever you were looking at) truly is fast enough for your single most demanding instrument without actually trying it.
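That rule of thumb can be written as a two-step filter. First, drop any CPU whose single-core performance can't handle the most demanding instance. Then pick the best total throughput per dollar from what's left. This is a minimal sketch: the scores and prices are invented, and approximating total throughput as single-core score × cores assumes perfect scaling across instances. The hard part mentioned above, knowing `required_single_core`, still has to come from testing.

```python
from dataclasses import dataclass


@dataclass
class CpuOption:
    name: str
    single_core: float  # single-thread benchmark score (hypothetical units)
    cores: int
    price: float


def pick_cpu(options, required_single_core):
    """Among CPUs fast enough for the most demanding single instance,
    return the one with the most aggregate throughput per dollar."""
    eligible = [c for c in options if c.single_core >= required_single_core]
    if not eligible:
        return None
    # Assumes throughput scales perfectly with cores across many instances.
    return max(eligible, key=lambda c: c.single_core * c.cores / c.price)


options = [
    CpuOption("few_fast", single_core=100, cores=6, price=350),
    CpuOption("many_slow", single_core=80, cores=16, price=700),
]
print(pick_cpu(options, 75).name)  # prints: many_slow
print(pick_cpu(options, 90).name)  # prints: few_fast
```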


----------



## Sami (Jun 9, 2018)

tack said:


> Each instance is multithreaded. It's hard to say exactly what those threads are being used for. Without debug symbols for Kontakt (something NI would never publicly distribute), the best I can do is infer design from very high level thread activity. Or I could try to wade through a disassembly, but ain't nobody got time for that!
> 
> I had a bit of a poke at the DFD case. It looks like, unlike the patch loading scenario I tested previously, DFD streaming has two threads doing I/O. This is corroborated by the fact that the disk queue depth pretty much caps at 2 (even when streaming off rust). So there's a bit of parallelism here. Interestingly, I observe the same thing even when multiprocessor support is disabled.
> 
> ...




So let's say one had unlimited funds and wanted to buy a system that could run an entire huge template without slaves: would they go for something like the 1950X or the 7980XE, supposing both could run at the same clock speed? Also, if we scale the question down and take a CPU that has fewer cores than the 18-core SKUs but can run faster, say the 7940X at 4.7 GHz instead of the 7960X at 4.2 GHz, which do you suppose will win?

Then there is the question of storage. Will all-NVMe make a difference here (probably for load times, but what about the scenario where you want everything on one system, so you want to keep DFD buffers low to avoid filling up more RAM)? There's also the question of latency: DAWbench testing puts the Intel parts ahead at low latencies, which is where we want to write. And what about running the DAW and VE Pro locally? I guess Reaper is more efficient than, say, Cubase, but how about the iMac Pro/Hackintosh folks and Logic?

Thanks for your estimation, this is obviously all hypothetical.


----------



## shmimptone (Jun 9, 2018)

tack said:


> The really hard part is knowing whether the 1950X (or whatever you were looking at) truly is fast enough for your single most demanding instrument without actually trying it.



Also, regarding NVMe drives, wouldn't they actually increase CPU load because they're faster? In your current test report, you say "the lower the disk access latency while reading a block, the busier we will be able to keep the CPU."

So, the Threadripper might hypothetically be ok if I use SATA drives, and too slow if I use NVMe drives? 

(I'm hoping that I've got something wrong about that...)
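One way to read the quoted sentence is with a simplified single-stream model in which each block is read and then decompressed, one after the other. Under this assumption, a lower-latency drive raises the CPU's *busy fraction* but not the *total* decompression work. The same work is simply packed into less wall-clock time. The per-block costs below are hypothetical. This models a read-as-fast-as-possible case such as patch loading. During real-time playback, the data needed per second is set by the voices playing, so under this model the drive wouldn't change the decompression work per second.

```python
def stream_stats(blocks, cpu_us_per_block, io_us_per_block):
    """Serialized read-then-decompress loop for one stream.
    Returns (total_cpu_us, wall_us, cpu_busy_fraction)."""
    cpu_us = blocks * cpu_us_per_block
    wall_us = blocks * (cpu_us_per_block + io_us_per_block)
    return cpu_us, wall_us, cpu_us / wall_us


sata = stream_stats(1000, cpu_us_per_block=50, io_us_per_block=100)
nvme = stream_stats(1000, cpu_us_per_block=50, io_us_per_block=25)
print(sata)  # same 50000 us of CPU work, spread over 150000 us
print(nvme)  # same 50000 us of CPU work, done in 75000 us
```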


----------



## shmimptone (Jul 1, 2018)

Just wanted to add two very useful links that I've found. If you (like me) were considering a Threadripper, or some other multi-core beast to run a VEP6 setup, these two articles will probably talk you out of it:

https://techreport.com/review/32390...and-ryzen-threadripper-1950x-cpus-reviewed/14

http://www.scanproaudio.info/2017/08/14/first-look-at-the-amd-threadripper-1920x-1950x/

The first article provides real-world tests; the second provides explanations.


----------

