Blind testing with really interesting results...[POLL]

Garry · May 29, 2018

On another thread, I had proposed the value of blind testing of sample libraries. There are many caveats, but we the purchasing community, could really help ourselves by having blind, randomised tests of short (5-10 seconds) melodic lines, to compare different libraries, and the supposed differences between them. When we compare in open testing, we bring so many biases (whether or not we've already bought the library, whether we like the company, how good the company's marketing of the library was, the price of the library ostensibly relating to its quality, how old the library is, etc etc).

Some disagreed on the value of such an exercise, and I wouldn't have re-opened up the discussion... But then... who knows, perhaps inspired by that discussion, Christian Henson posted not one, but TWO blind tests, first for microphones, then for reverbs. The results were FASCINATING! Some of the reverbs that cost thousands lost out to much lower priced offerings, and similarly, Christian's expectation that the U67 mic would be identified as the best microphone, was not borne out on blind testing and voting.

So, turning the power of the blind, randomised test now back on sample libraries - what could we learn? This community has precisely the right resources to do this test well. Between us, we have access to every library on the planet, we have skilled musicians to perform the lines, we have skilled listeners to evaluate them, using the best equipment. And just watch Christian and Jake: how much fun they had doing it, and how insightful they found it, watching their biases melt away before their eyes!

So, I propose a competition. First we would need to agree the rules:

we need to agree a simple melodic line that would be capable of demonstrating the prowess of a library
we need to agree an instrument - I suggest violins would be a good starting point
we need to agree a procedure. I suggest the following:
1. Anyone is free to submit 1 or more versions, each using a different library. All features within the library (different mic positions, reverbs, EQ, articulations) can be used, but nothing must be contributed that contains anything outside the library (no third party plugins etc).
2. The audio file is submitted to the collator - an independent person (I'm happy to do it, but anyone else can) - the file should be labelled unblinded (ie with the library name in the title), and sent via PM.
3. The collator retains the names of the libraries, but then posts all libraries together at the same time, with blinded labels in a randomised order. The collator is not allowed to vote.
4. Anyone is free to vote, and sends their votes not to the forum, but to a second collator (so that others are not influenced by seeing what others are voting for).
5. Collator 2 then posts the results, and collator 1 unblinds the libraries

So, what do you think? If we the purchasing community don't do something like this, then our ONLY alternative is to buy the libraries ourselves, compare them after purchase, after which you're stuck with it, and for the most part, can't sell it whether your purchase was good or not. There has to be a better way!

What say you VI-C community? If the polling suggests we are an active, data-led, community that wants to make informed choices, and not be held to vicarious demos, then let's go ahead. If instead, we vote against our own interests, and decide that it's too hard or too much trouble, then I'll accept this is the consensus view.

d.healey · May 29, 2018

Most people don't base their purchases on the sound, they base it on price (higher price = better product of course

) and popularity ( if everyone else is using it then it must be good, right... ) and marketing/hype.

Playability and features are what are important to me, more so than the sound - although I expect a certain level of sound quality.

I'm sure people will find this interesting but I don't think it will affect the majority of purchasing decisions.

Saxer · May 29, 2018

"I don't think it's valid, but it would be fun."

Good answer to choose.

I heard a lot of good libraries used so badly that I would never have bought it after hearing that. I also heard great stuff from libraries I have and that are not useful for me at all.
All I know: if I hear a really good sounding example I know that there's someone who has the right library for his/her needs. Might be fun to listen to.
But a 5-10 second line says nothing. And it isn't even fun. Make something that this libraries are created for. For example: music.

Garry · May 29, 2018

Saxer said:
"I don't think it's valid, but it would be fun."

Good answer to choose.

I heard a lot of good libraries used so badly that I would never have bought it after hearing that. I also heard great stuff from libraries I have and that are not useful for me at all.
All I know: if I hear a really good sounding example I know that there's someone who has the right library for his/her needs. Might be fun to listen to.
But a 5-10 second line says nothing. And it isn't even fun. Make something that this libraries are created for. For example: music.

Were you similarly unconvinced by the approx 10 seconds Christian used for both the mic and reverb tests? Did you learn nothing from these that would help inform your opinion of them? Christian and Jake were both impressed by the results of the testing - so why would this not work for sample libraries? Particularly when compared to what is our status quo.

Garry · May 29, 2018

A great example of the bias we bring to open comparisons was when Jake offered his opinion about Altiverb before the testing. Remember, this guy is the sound engineer at Spitfire, and frequently credited for the quality of their libraries. Before the test, he said he knew which one was the Altiverb, such that it almost invalidated the test. Then he did the test, and rejected it on round 1!

Blind testing is the only way if you don’t want to get suckered into all the biases that are play, most of which we are completely unaware of at the time.

tack · May 29, 2018

Well constructed blinded testing is invaluable. The reverb test was fun to watch but the methodology was flawed. (To start with there weren't nearly enough trials to conclude anything with statistical significance. Also it wasn't double blinded.)

So this means I really want to vote twice. Proper controlled testing is extremely useful. But doing it right is terribly difficult and tedious, so the ways that most people would end up doing blinded testing could be fun (in the way that exposing us to and reminding us of our biases is always a great lesson) but likely wouldn't produce useful results.

Saxer · May 29, 2018

I really trust Christians and Jakes opinion. They mixed and recorded extremely more and better than I did (and probably ever will do), they have the tools avaliable and are able to compare. There are also a lot of other people I respect and trust their opinions too. But watching them playing the different reverb tails on a YT video didn't help me at all. For a while I bought a lot of reverbs and tested and tried and waited for the wonder to happen... it didn't. At some point it's a decision to choose something and then just use it. Sure, that has to be good tools. But I think it's the same with sample libraries and DAWs. Or the instrument you play.

But the main thing is: I don't know how able the players behind a forum blind test are. Playing different libraries by one player is risky enough because everyone has different playing styles (which includes the use of CC, key switches etc). It could match one library and doesn't work with another. Happens to me a lot.
I would trust a test made by Thomas Bergersen and Andy Blaney, but only if both use all tested libraries. Wouldn't help if Andy plays Spitfire only and Thomas only EastWest. But vice versa would be really interesting!

gyprock · May 29, 2018

How do you then measure the pleasure aspect of using one library over another. Give a $100 Strat guitar copy to Eric Clapton and he'll make it sing although he might not get pleasure out of playing it. Likewise if a library sounds great but is a dog to work with, then how will this be judged. Of course if this is purely a test of sound quality and realism then my comments can be ignored.

Garry · May 29, 2018

tack said:
Well constructed blinded testing is invaluable. The reverb test was fun to watch but the methodology was flawed. (To start with there weren't nearly enough trials to conclude anything with statistical significance. Also it wasn't double blinded.)

.

Statistical significance is not the goal here. With the data in hand, each person, n=1, can make their own decision about whether any individual library stands out, but do so in an unbiased way. That’s not a statistical question, and a p-value is not meaningful in this context. Regarding double-blinding - yes, that’s fair: they could have divulged information in their voice as they announced each library, without meaning to. Watching it, I don’t feel that’s the case, but it’s a fair point. In our testing, we could introduce a double-blind: collator 1 sends blinded files to collator 2, who then reallocates new, random labels, but I think there’s a balance to be struck between purity and practicality. In my view, as long as people are listening to files not knowing which library they came from, that’s probably good enough for our purposes.

tack said:
So this means I really want to vote twice. Proper controlled testing is extremely useful.

Yes, repeated measures is the gold standard! But let’s run before we can walk. Imagine if we ran this test between six libraries, and library D won. My expectation that library D would win the 2nd time is actually quite low: I think the differences between the libraries is nowhere near what the market men would have us believe. So, I think people would be forced to make lots of random and arbitrary choices, such that the result would change on the 2nd test. That would be interesting, indicating that the differences between the libraries is small and the preference unstable. But let’s get test no1 done first!

tack said:
But doing it right is terribly difficult, so the ways that most people would end up doing blinded testing could be fun (in the way that exposing us to and reminding us of our biases is always a great lesson) but likely wouldn't produce useful results.

Again, it depends what your goal is. Mine is simple: I want to have an objective way to compare different libraries; secondly, I would be interested to hear what a community of experts thinks, relative to my own preference. With these limited goals in mind, it’s entirely feasible. Ask the homeopaths about randomised testing, and they will run a mile, because they’re not confident in the value of their products. There is no reason for us to fear randomised testing: we want to know the truth.

Garry · May 29, 2018

gyprock said:
How do you then measure the pleasure aspect of using one library over another. Give a $100 Strat guitar copy to Eric Clapton and he'll make it sing although he might not get pleasure out of playing it. Likewise if a library sounds great but is a dog to work with, then how will this be judged. Of course if this is purely a test of sound quality and realism then my comments can be ignored.

One test can’t achieve everything: the proposed test will tell you this: which sound do YOU prefer, given objective data on which to base your decision, and how does this compare to what other people prefer (who will use their own criteria). It can’t tell you about the playability - it’s not designed to do so. But presumably sound quality is ONE factor (not the only one of course) in deciding which library to buy. If so, then this would help you on this one aspect. You would have to seek out other information for other questions to inform your decision.

Ethos · May 29, 2018

I'm not only interested in how a library sounds. I also care about the UI, the programming possibilities, the number of articulations, and all assortment of other things that wouldn't be determined in a blind listening test. Also, I've heard amazing midi mockup composers make libraries generally regarded as bad sounds incredible.

Garry · May 29, 2018

Ethos said:
I'm not only interested in how a library sounds. I also care about the UI, the programming possibilities, the number of articulations, and all assortment of other things that wouldn't be determined in a blind listening test. Also, I've heard amazing midi mockup composers make libraries generally regarded as bad sounds incredible.

Of course, who isnt’? But presumably, you’re ALSO interested in how it sounds? If so, presumably unbiased data would be useful to you in making that determination? Again, one test cannot answer all questions, but it doesn’t invalidate its utility for the one question it does answer.

Garry · May 29, 2018

Saxer said:
I really trust Christians and Jakes opinion. They mixed and recorded extremely more and better than I did (and probably ever will do), they have the tools avaliable and are able to compare. There are also a lot of other people I respect and trust their opinions too. But watching them playing the different reverb tails on a YT video didn't help me at all. For a while I bought a lot of reverbs and tested and tried and waited for the wonder to happen... it didn't. At some point it's a decision to choose something and then just use it. Sure, that has to be good tools. But I think it's the same with sample libraries and DAWs. Or the instrument you play.

But the main thing is: I don't know how able the players behind a forum blind test are. Playing different libraries by one player is risky enough because everyone has different playing styles (which includes the use of CC, key switches etc). It could match one library and doesn't work with another. Happens to me a lot.
I would trust a test made by Thomas Bergersen and Andy Blaney, but only if both use all tested libraries. Wouldn't help if Andy plays Spitfire only and Thomas only EastWest. But vice versa would be really interesting!

I think there could be value in the proposed design to answer your question. Let’s say we test 3 libraries, and there are 5 entries for each. Players 1, 2 and 3 are wonderful, but 4 and 5 are lousy. One would assume that votes for the files submitted by players 1, 2 and 3 would rank high if it’s a good library (combination of good library, good player), and votes for 4 and 5 would be lower. That would tell you that a good player can get a lot out of this library. Of course, you can’t know which are good and which are lousy players. But you will get a sense of this from the sample they contribute, and can weight your own view of their contribution accordingly. But the point is, with more than 1 contributor per library, it will be less dependent on the ability of any one individual, and outliers for a given library, based on poor playing ability, will stand out, and be voted down. Also, now imagine that despite numerous contributions, library B is consistently ranked low. The likelihood that this is due to poor ability from any one player is mitigated (unless there’s an interaction, ie, poor players tend to buy and submit contributions from this poor library - that too would be useful information to know!).

Garry · May 29, 2018

Thinking about the value of blind testing, consider this: the Bricasti model 7 costs $4000. I think I remember Christian saying that he bought it, because he was convinced by Jake of its quality (again, remember that Jake is Spitfire’s sound engineer). If you were considering spending 4 grand on the Bricasti, you can only rely on online demos and you could not test it before you buy it, and once you buy it you cannot sell it, would it be useful information that NEITHER Christian nor Jake picked out the Bricasti under blind testing conditions? Would it answer ALL questions you have about your potential purchase? Almost certainly not. Would that undermine the value of having this information? Presumably not? Would you have wanted to at least know this before you spent your 4 grand, if you were at least partly going off Christian's recommendation? I'm guessing so! As I mentioned before, I wouldn’t have reopened this discussion, if it were not for the irony that we have Christian/Jake doing reverb & mic blind comparisons, and forum members arguing that Spitfire’s own products (and those of the rest of the industry) cannot be subjected to precisely the same tests.

We as VI library purchasers often spend this and much more, just usually over a longer period. We as consumers should demand the information we need; forums like this enable us to do so. We just have to band together, and make the value of the information the best it can be. Use the power of the crowd, and it will influence developers: if efforts such as this begin to drive sales, they will have to show that their tools truly differentiate from the competition. They have it all their own way right now: you can’t listen to it in retail outlets (mostly), you can’t return it (mostly), you can’t sell it (mostly), and they don’t publish their own comparisons. Take control fellow musicians! Special pleading, that sample libraries are somehow unique and different and not amenable to gold-standard, comparative testing, will keep us all in exactly the same situation we’re all in right now.

“They may take away our resale rights, but they’ll never take.... our FREEDOM!!!” (or something like that!)

fretti · May 29, 2018

It would definitely be fun. But it would also definitely not shatter any opinion of a certain library I have/had beforehand or directly change any buying decision of mine.
The thing though, that might be triggered by this test, is that I dig deeper into the library I felt sounded best. Searching for more videos, walkthroughs, demos, test, reviews or whatever to see if it really can add something, fits my needs, workflow etc.

Also: when doing such a test, you (or anyone doing it) would have to write a big disclaimer of what is the targeted goal of this test. Otherwise people might see it as something it actually isn't, because even with the "same" midi and no external effects, the possibilities within the libraries vary very much, and just with midi programming, mic positions, and internal effects you can change and ("enhance") the sound of every library.

So it obviously isn't a "here is what the first key you press after loading the patch sounds" comparison. :thumbsup:

Garry · May 29, 2018

fretti said:
It would definitely be fun. But it would also definitely not shatter any opinion of a certain library I have/had beforehand or directly change any buying decision of mine.

Are you sure though?

Yes, it doesn't really matter what order everyone else ranks the library as (that bit is mainly just for fun, to compare YOUR ratings to the community - perhaps it might just make you rethink if there's a disparity, then perhaps there's something you're missing?). But what about your OWN ratings: e.g. I haven't yet bought SCS, but I'm strongly leaning towards it. If when I compare it with 6 other libraries that have been well performed, and I can't pick it out (as was the case in a recent shoot out), that would certainly make me question whether I'm being distracted by the marketing.

fretti said:
Also: when doing such a test, you (or anyone doing it) would have to write a big disclaimer of what is the targeted goal of this test. Otherwise people might see it as something it actually isn't

Yes, good point. But I think the community needs to decide this (assuming the community decides we do it!). Indeed, people might have their own personal disclaimers and their own goals: e.g., I don't really care what the ratings of the community are, that's just for fun, but I care whether my presumptions of which library is better, are confirmed when I compare the libraries blinded, and if they're not, I need to re-evaluate. But perhaps others may have other goals.

fretti said:
because even with the "same" midi and no external effects, the possibilities within the libraries vary very much, and just with midi programming, mic positions, and internal effects you can change and ("enhance") the sound of every library.
So it obviously isn't a "here is what the first key you press after loading the patch sounds" comparison.

Yes, exactly. It's for this reason, I don't think we should limit what people are allowed to do to produce the agreed melodic line, with the exception that they must use only the resources within the library. If mic positions, internal effects are what make the difference, this should come out in the comparison.

gsilbers · May 29, 2018

this is how every purchase should be made imo. dont rely on marketing at all. if u are running a commercial studio yes, a neumann u87 o u67 might be more beneficial to have and show but at a home studio if you hear the warm audio or stam audio u87 sounds about the same or even sometimes not even a difference, then why pay so much.

ive been seeing these types of videos for years now. its been 20 years ive been doing pro audio and now with not only the internet where you can compare, but also cheaper production techniques have open my mind a lot and cannot believe poeple who cannot tell a difference between a neve preamp vs a art channel strip2 get so emotional invested into the neve brand. its a realy old technology that i cant believe we are still trying to make it seem its better than others.
there are barely any difference between mic preamps at all. every company in the last 10 years has reverse engineer the neve, the u87 etc etc and keep churning out prodcuts that mimic the sound. and the competition wants it to be closer and closer. there used to be "cheap chinese mics" but thats not the case anymore. even audio interfaces have come a long long way since the mbox1 days. and new mics, preamps, interfaces if compared in a blind test will all come out around the same. its not going to be better or bad, just a tad different.

there should be a lot more blind test. i think it should be double blinded though. not even knowing at the end what exactly you listen to. or that they show an image of a neve or u87 but place the audio of a behringer or art and vice versa. then all of those youtube comments will go nuts. and specially the high end crowd at gearslutz always trying to fight this low end world that has come a logn way.

as for sample libraries i thinkn videos like this are very usefull

i was surprise how much i liked cinematic studio strings. i know each library probably has the veolicty range place differently etc etc but in general i can hear the difference and what i inmediatly like or dont. and feel confident i can make a sound choice (at least for spicc)

Garry · May 29, 2018

gsilbers said:
as for sample libraries i thinkn videos like this are very usefull

i was surprise how much i liked cinematic studio strings. i know each library probably has the veolicty range place differently etc etc but in general i can hear the difference and what i inmediatly like or dont. and feel confident i can make a sound choice (at least for spicc)

I agree completely. Daniel's video was a perfect example of a melodic line that would be short enough to allow comparison between libraries, but long enough to enable evaluation of each library individually; also, it was played skillfully, so the comparison of libraries is truly about the libraries, and not the player. I only wish he'd done it blindly, but he can't do that on his own - we need a community to do that. Hey, I know of one!

RiffWraith · May 29, 2018

Garry said:
so why would this not work for sample libraries?

B/c sample libs have much more to them than reverb. Between different velocity/dynamic layers, reaction to the modwheel, stereo imaging, how one lib fits in/mixes with others (or even other sections from the same lib) and a whole host of other things, you can't really compare a reverb test to a sample lib test - two completely different animals.

Getting reverb to sound good takes little to no effort, while getting samples to sound good take a good deal of effort. I too voted "I don't think it's valid, but it would be fun."

Cheers.

Garry · May 29, 2018

RiffWraith said:
B/c sample libs have much more to them than reverb. Between different velocity/dynamic layers, reaction to the modwheel, stereo imaging, how one lib fits in/mixes with others (or even other sections from the same lib) and a whole host of other things, you can't really compare a reverb test to a sample lib test - two completely different animals.

Getting reverb to sound good takes little to no effort, while getting samples to sound good take a good deal of effort. I too voted "I don't think it's valid, but it would be fun."

Cheers.

The special pleading of the homeopaths

Scientist: why can't homeopathy [SAMPLE LIBRARIES] be subjected to conventional, randomised, double-blind, placebo-controlled testing, just like we do for other comparisons we want to make [REVERB]? Then we would know how it compares to conventional medicine [OTHER LIBRARIES].

Homeopath: oh, you can't just reduce homeopathy to such a simple test! What are you thinking? Don't you realise, we have to holistically evaluate and interview the patient; each patient [SAMPLE LIBRARY] is different, and requires a unique, individually-specific remedy. This isn't just a pill [REVERB] you know.

Scientist: ok, you evaluate the patient [PRODUCE THE MELODIC LINE], and include any/all assessments you want [ANY MIC POSITIONS, ARTICULATIONS, ETC]. Send your patient to the homepath [SAMPLE COLLATOR], and at that point we'll randomly assign placebo/homeopathic remedy [BLINDED LABELS TO SAMPLE FILES]. After this, we will determine whether those given placebo get better relative to those given a homeopathic remedy [WE WILL DO A BLINDED COMPARISON OF EACH LIBRARY, AND EACH SAY WHICH WE PREFER]. That way, you can incorporate all of the things you feel are necessary to fully account for the superior effects of your remedy [THE LIBRARY CAN BE USED TO THE FULL TO GET THE BEST FROM IT]. At the end of the day, the remedy either works, or it doesn't [A LIBRARY EITHER STANDS OUT ABOVE THE OTHERS, OR IT DOESN'T], but this would be a true test of the claims you make for your remedy [SAMPLE LIBRARY].

Homeopath: ah, bollocks :sad:

Scientist: yup, it is. :sneaky:

Special pleading is in the interest of those who want to avoid knowing the truth. The scientists [BEST SAMPLE DEVELOPERS] will encourage such unbiased comparisons, because they're confident in their treatments [LIBRARIES], and are not afraid of comparison. The homeopaths [INFERIOR DEVELOPERS], who know they are making claims beyond that which their data can support, will undertake special pleading every time a meaningful test of their claims is proposed.

Blind testing with really interesting results...[POLL]

Should we conduct a blind test of sample libraries?

Yes - I think it would help me choose between different libraries

Yes - I don't think it's valid, but it would be fun.

No - I don't think it's valid - too many caveats

No - I think's it's valid, but I'm not interested in the results.

Senior Member

This is a custom title

Senior Member

Senior Member

Senior Member

Damned Dirty Ape

Senior Member

Active Member

Senior Member

Senior Member

Active Member

Senior Member

Senior Member

Senior Member

Senior Member

Senior Member

Part of Pulsesetter-Sounds.com

Senior Member

Senior Member

Senior Member