# The future of virtual instruments



## Dima Lanski (Mar 11, 2020)

Hi, vi-control members and fellow composers!

This is my first post here, so I'd like to introduce myself. My name is Dmitry, and I'm a former software engineer and now an aspiring composer.

For the last couple of years, among other things, I've been thinking about the future of virtual instruments, and have come up with a few ideas. I want to share one of them and hear your thoughts, especially from people who are making sample libraries. I'm sharing it because I'm probably never going to implement it myself: I don't have access to a decent hall or studio, or musicians, or recording equipment, or engineers. But I want my virtual instruments to improve. And the sooner, the better.

The basic idea is to record dry samples with a semisphere of mics to capture the sound from all directions. Then put the same semisphere in a real room, in the spot where the musicians would be seated, and capture an IR for each mic point on the sphere by putting a speaker there. Then at playback time, apply the corresponding IR to each mic recording. This should hypothetically simulate the propagation of sound in a room in all directions from an ensemble in a specific spot.

So, let me outline the motivation for doing it this way.

As most of you know, there are a couple of approaches to recording samples. The first is to record with wet room mics. Most modern libraries do that, and they sound great and super realistic. Until you move the mod wheel too fast. Then the reverb plays at the wrong volume, or is just missing, and it sounds like the ensemble is moving all over the room, getting farther or closer as it plays. Or until you play a fast legato line, and the notes starting from the second one don't have the proper reverb of the previous notes. In your favorite library, play a fast legato line with a tree mic enabled, then compare it to the same line with a close mic and an IR, and you'll hear what I mean. This leads to the necessity of moving the mod wheel and playing legato lines slowly, or putting a lot of extra reverb on top. Or both. Which in turn limits the styles of music that can be played with these VIs.

And so there's another, older approach: dry samples. With them, you're supposed to create your own space using IRs. But right now most IRs are stereo, and they work exactly as the word suggests. They're NOT capturing the sound of a room as a whole, whatever that would mean; they specifically capture the propagation of sound from a pair of stereo speakers to a pair of room mics. And this is literal: put any sound through such an IR and it sounds exactly as if it's playing through a wide PA in a room. Try it with a closely recorded human voice and you'll hear exactly what I mean. It's like you're at a wedding or a corporate party and someone took the mic to tell a joke. In this sense, IRs are a wonderful and realistic technology. But put a close mic recording of a string section, or especially a brass section, through one, and it sounds flat. Again, exactly like it's going through a PA in the room. You can massage the signal with EQs and delays to make it sound decent. But I've never heard anyone achieve the realistic sound of an ensemble in a room this way.

So here's the idea: let's not capture the sound of a PA in a room, but instead capture the propagation of sound in all directions from a specific spot of a room where an ensemble would be seated. And then record an ensemble in a dry room from all directions. Put it together and it should give the best of both worlds: the responsive performance of dry samples, with the ability to move the mod wheel at any speed, seamless stitching of legato or any other samples, but also (hopefully) a realistic room sound with proper depth and positioning.

So, here's how you do it. First, capture the IRs. There are no musicians involved yet. Find a good-sounding room: Sony, Teldex, AIR (in no particular order). Pick a spot in the room you want to capture; you probably want to start with the usual seating for each orchestral section. Now, imagine a semisphere around the spot you picked, just big enough to cover an ensemble. Pick a few evenly spaced points on that semisphere; 9 seems like a good starting number, but it's hard to predict how many you'll actually need for good results. Maybe it's 2, and then it's not really a semisphere. 3-8 might be good numbers. Or maybe it's a hundred, and then we're screwed. But hopefully it's not more than 16.

Now, take a speaker that only sends sound in a cone in front of it, with all other directions isolated by soundproof material. Put it at one of the points on the imagined semisphere, positioned perpendicular to the sphere's surface and facing away from the center. Then record an IR with a room mic. Do this for all the points on the semisphere.

Now you have a set of IRs, each representing the propagation of sound in a single direction. Together they capture the propagation of sound in all directions from that spot in the room. All that is left is to record an ensemble.

For this, go to a dry room. Then, again, imagine a semisphere of exactly the same size with the same points on it. Put a mic at each point on the semisphere, exactly where you put the speaker before relative to the center, but facing the opposite way, i.e. towards the ensemble. Finally, invite your musicians in and record the samples with all the mics at the same time (writing to separate files). You probably want to give the musicians headphones running the IRs of the simulated room, so that they can hear the final sound while playing. They say it gives a better final result, and that's probably true.

Put it all together and program it so that when you play a note, it plays the samples from all the mics of the semisphere at the same time, separately applies the corresponding direction's IR to each of them, and then mixes it all together. Now you have a VI that simulates how the sound propagates from the ensemble in its seated position to the room mics, without the need to stitch reverb tails. And yes, this wouldn't be a perfect simulation, but the hope is that it would be good enough.
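For the programmer types here, the playback step might look something like this minimal offline sketch (plain NumPy/SciPy, mono channels, one IR per semisphere direction; the function and variable names are just illustrative, not from any real product):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_note(dry_mics, direction_irs):
    """Convolve each semisphere mic recording with its direction's IR and mix.

    dry_mics:      list of mono arrays, one per semisphere mic
    direction_irs: list of mono IRs, one per direction (same order)
    Returns one room-mic signal: the sum of all per-direction wet signals.
    """
    assert len(dry_mics) == len(direction_irs)
    out_len = max(len(d) + len(h) - 1 for d, h in zip(dry_mics, direction_irs))
    out = np.zeros(out_len)
    for dry, ir in zip(dry_mics, direction_irs):
        wet = fftconvolve(dry, ir)   # this direction's contribution to the room mic
        out[:len(wet)] += wet        # mix all directions together
    return out
```

A real VI would run this per note and per room-mic channel, with low-latency partitioned convolution instead of whole-file FFT convolution; the sketch only shows the mix-down structure.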

There's one caveat though, and this is probably why no one has attempted to make a sample library this way yet. You'll have to run an IR for each recording mic on the sphere and for each room mic. So, say you recorded 16 directions, and have a single close mic channel, two channels for a Decca tree, and two channels for surround mics. That's 5 channels, and as a result 16 times 5, or 80, simultaneous convolutions. And that's for a single room spot, of which you will have a few. That's a lot of computation for the CPU. Modern Threadrippers could probably handle it anyway, but I have a couple of other possible solutions.

First, use the GPU to run the convolutions. I worked with GPUs for graphics applications, and they are seriously fast. The task of convolution is probably one of the easiest ones for a GPU to handle, especially when it's doing a lot of them at the same time, which is exactly what we'd do in the proposed VI. With proper optimizations and correct data handling, I'd estimate 10-100 times more performance on a modern GPU than on a CPU. Though you'll probably have to fit the samples into the GPU's memory for this to work.
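To give a feel for why this maps so well to GPUs: all 80 convolutions have the same shape, so they can be expressed as a single batched FFT multiply. Here's a rough sketch in NumPy (an offline toy, not a low-latency partitioned convolver; with CuPy the `np` calls could in principle be swapped for their GPU equivalents unchanged):

```python
import numpy as np

def batched_convolve(dry, irs):
    """FFT-convolve a batch of B signals with B matching IRs in one shot.

    dry: (B, L) array of dry mic blocks
    irs: (B, K) array of corresponding direction IRs
    Returns a (B, L + K - 1) array of wet signals.
    """
    B, L = dry.shape
    K = irs.shape[1]
    n = L + K - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= full linear length
    # One batched forward FFT per operand, one pointwise multiply, one inverse FFT:
    wet = np.fft.irfft(np.fft.rfft(dry, nfft) * np.fft.rfft(irs, nfft), nfft)
    return wet[:, :n]                 # trim zero-padding back to linear length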

By the way, IR convolution has been tried on GPUs before and worked fine, but it was quite pointless at the time, since nobody was running lots of reverbs simultaneously. On a side note, I think all audio processing can be moved to the GPU at some point. And then we're talking about an immense performance boost and completely new technologies, like ray-traced room and instrument acoustic simulation. But that's a topic for another post.

Another solution might be to scale down the number of semisphere mics when playing in real time, and then use all of them when bouncing. You could record extra IRs with fewer points on the sphere and a speaker with a wider cone, and use only a subset of the mic recordings from the full sphere.

Yet another solution would be to not space the mics evenly on the sphere, but to find the few directions that influence the final sound the most, and use only those.

So, ladies and gentlemen, what do you think? Do you think it'll work and sound good? Is there a problem with this approach that I didn't consider, like maybe the way mics respond when they're close to the source? Is this just overkill and not worth it? Has anybody tried doing something like this before? Is anybody secretly working on a new VI using this approach, preparing an audio revolution?

All in all, I would love to see someone try this approach and hear the result.

Anyway, that's a long post. Thanks for reading through.


----------



## EgM (Mar 12, 2020)

Interesting! I'd love to try crazy experiments like this myself if I had a decent free room to record in, and more time, money...


----------



## Ben (Mar 12, 2020)

Hi Dima,

Your idea sounds really interesting. We used a similar approach to create our *M*ulti *I*mpulse *R*esponse software: *MIR PRO*.
You can place your sound sources freely in the captured rooms (5 rooms of the _*Vienna Konzerthaus*_, _*Teldex*_ and 5 other *studio venues*, 3 *churches*, 3 rooms of _*The Sage Gateshead*_, and our scoring stage: _*Synchron Stage Vienna*_).

You can read more about how MIR PRO works and its features here: https://www.vsl.info/manuals/mir-pro/introduction
Detailed information about the rooms we have recorded, with demo tracks, is here: https://www.vsl.info/manuals/mir-roompacks/roompack1

Not only can you _*position*_ the instruments, but you can also influence the _*stereo width of the instrument/ensemble*_ in the room and its _*orientation*_. If you like, MIR will calculate the _*distance volume drop-off*_ as well as _*simulate air absorption*_. We also captured the _*room tone*_ of each venue, to add even more realism. A _*room EQ*_ lets you shape the IR, so you can filter unwanted frequencies of the room out of the wet signal only.
You have full control over the _*two microphone positions*_ and the _*number, orientation and shape of the microphone capsules*_.
And everything is in an intuitive, easy-to-use interface. On top of that, there is an _*algorithmic reverb plugin included*_ to add a sweet, long reverb tail.

You can get a free 30-day trial on the product page (requires a physical USB eLicenser or ViennaKey).
Stay tuned, exciting times here in Vienna 

Best, Ben


----------



## Quantum Leap (Mar 13, 2020)

I gotta say that I admire all the thought that went into this, but a lot of what you are talking about is what Spaces is all about. In Spaces, we just give you what we feel are the most useful angles of sound projection, using an array of speakers. Each one was created to emulate the sound projection patterns and direction of an instrument or section in a musical setting. The reason we didn't go farther is that you enter a world of phasing and cancellation weirdness if you mix too many impulses from a single location going in all directions. You are also putting a big burden on the user to make it sound good. In any case, when multiple impulses are used on multiple instruments placed in different positions in a 3D soundstage, a 3D sound space is created. And Spaces has surround samples as well, by the way.

For sampling instruments, multiple mic setups that pick up a lot of the character of an instrument (the back of a violin, for instance) have been going on for over 20 years at QL. Once again, it is problematic to try to mix 12 mics surrounding an instrument, so engineers avoid that sort of thing.


----------



## ProfoundSilence (Mar 13, 2020)

Similar to @Dietz's work with VSL to make MIR.

Audio Ease has a few products using similar technology as well.


----------



## Dietz (Mar 13, 2020)

Quantum Leap said:


> it is problematic to try and mix 12 mics surrounding an instrument, so engineers avoid that sort of thing.


But that's not what MIR is about. It's about multiple positions _for instruments_ on a stage (and the instruments' specific way to emanate sound from that position), not about multiple mics (unless you count the underlying Ambisonics array as four mics).

Engineers don't "avoid" recordings that are based on main microphone setups plus supporting spot mics, that's for sure.
_*EDIT*: Re-reading the whole thread, this line seems to be aimed at the OP's concept for re-inventing MIR, not at MIR itself. Sorry!_


----------



## Buz (Mar 13, 2020)

I'm surprised the MIR approach hasn't taken over everything. I guess non-acoustic genres don't care. Is there even a downside though?


----------



## Lindon (Mar 13, 2020)

You're playing the audio back through a speaker (or a set of speakers, in your case), which has a response profile, so it "colours" the results... and more than just capturing it with a mic would (which itself has a response curve and colours, etc., etc.).


----------



## Lee Blaske (Mar 13, 2020)

Buz said:


> I'm surprised the MIR approach hasn't taken over everything. I guess non-acoustic genres don't care. Is there even a downside though?



MIR is a brilliant piece of software. I've had it for some time. I probably should be using it more than I do. For me, and I suspect for a lot of other people, though, it might be too mind-boggling. It's not that it's impossible to understand and start using, but when you do, you're immediately faced with a universe of options and possibilities. It's hard to make decisions, and you sort of get paralyzed. You wonder if you're doing things correctly, or if there is a much better way of doing things that you haven't chosen. I imagine that's why they came up with MIRx. Contrast that with using a typical sample library that has a nice start on room sound, and then adding more reverb to taste. Your end result will sound good, and you won't lie awake at night worrying if there was a much better solution.


----------



## Dietz (Mar 13, 2020)

Lindon said:


> you're playing the audio back through a speaker(or a set of speaker in your case) - which has a response profile


True. When we developed the original MIR concept, we were very aware of that fact. That's why all IRs undergo some nifty inverse filtering which zeroes out (or at least minimizes) the effects of the speakers, microphones and pre-amps used.


----------



## Wallander (Mar 14, 2020)

I think the OP suggests using not only many different microphone perspectives and IRs, but a different IR on each mic, prior to mixing. This prevents the destructive coloration of sound, and phasing, that you get if you mix dry/close channels, prior to applying reverberation.

Theoretically, you would get a much smoother instrument spectrum in the reverb/ER portion of the sound, more akin to how an instrument sounds in a real hall. As a bonus, the multitude of close microphones could be used to arbitrarily pick the angle for a dry portion of sound.

I don’t _think_ you need to bother with capturing the IRs at exact spots. It’s more important that the IR reflection patterns of any two microphones don’t correlate. 

There are much more efficient ways to do this with an algorithmic reverb. For example, if you have an old Lexicon-type "ring buffer" reverb, you can simply make the loop/ring larger and feed the microphones in at different points, just like you'd normally feed the L/R channels in at different points, but scaled up to many more microphones.

Actually, that’s a pretty neat idea. I’d love to hear someone try that.
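To make the scaled-up ring idea concrete, here's a toy sketch of my own (hypothetical code, not any real Lexicon topology: no allpass diffusers or damping filters, just one injection offset per microphone feeding a single recirculating delay ring):

```python
import numpy as np

def ring_reverb(inputs, taps, ring_len=4801, feedback=0.7):
    """Toy recirculating delay ring with one injection point per microphone.

    inputs: (M, T) array -- M mic signals of length T samples
    taps:   M injection offsets into the ring, one per mic
    Returns the mono signal read from the ring's single read head.
    """
    M, T = inputs.shape
    ring = np.zeros(ring_len)
    out = np.zeros(T)
    pos = 0
    for t in range(T):
        out[t] = ring[pos]                 # read head output
        ring[pos] *= feedback              # recirculate with decay each round trip
        for m in range(M):                 # each mic feeds the ring at its own offset
            ring[(pos + taps[m]) % ring_len] += inputs[m, t]
        pos = (pos + 1) % ring_len
    return out
```

Each mic's signal first reaches the read head after its own tap delay, then recirculates around the ring with exponential decay, so different mics naturally get decorrelated reflection patterns from the same loop.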


----------



## Rilla (Mar 14, 2020)

Dima Lanski said:


> Hi, vi-control members and fellow composers!
> 
> This is my first post here, so I'd like to introduce myself. My name is Dmitry, and I'm a former software engineer and now an aspiring composer.
> 
> ...



MIR Pro and Sample Modeling Instruments used together already sorta do what you're talking about.

I personally appreciate bone-dry samples recorded in an anechoic chamber. I think anechoic is the way to record instruments. That way, I can apply IR early reflections and late reflections to the EXACT degree that I like. 

I like what you're saying about the semisphere. Sample Modeling instruments have a "distance" feature that backs the instrument away from the mic (Z), but not one that simulates horizontal or vertical angles (X, Y). I don't know if that's really needed, but I bet it would be interesting to simulate exact seating positions, and in different configurations; orchestra, big band, marching band, concert band, etc. Hell, they do off-axis miking simulations with amps these days so why not orchestral instruments? Hey, if we're going for ultra realism why not? 

Also,

it will be amazing when AI can generate a realistic human performance based on a musical score by manipulating the MIDI/CC data of a particular VI, namely Sample/Audio Modeling and other modeled VIs. Some might call that lazy, but whatever! lol!

I want to be able to come up with a musical score, let the computer generate the performances of those parts and make a few edits to the data.


----------



## ProfoundSilence (Mar 14, 2020)

I don't think it's the future, really.

I think more or less clever scripting is a possibility, as well as truly modelled instruments over anechoic recordings.

Also worth noting that reproducing an actual instrument is not really a sound we know and like; we've grown to like the sound of a MICROPHONE capturing an instrument.


----------



## Wallander (Mar 15, 2020)

ProfoundSilence said:


> I dont think it's the future really.
> 
> I think more or less clever scripting is a possibility - as well as truly modelled instruments over anachoic recordings.
> 
> also worth noting that reproducing an actual instrument is not really a sound we know and like - we've grown to like the sound of a MICROPHONE capturing an instrument.


Having spent the better part of a decade trying to turn anechoic samples into wet samples, I can say with confidence that _wet samples are here to stay_. The theory behind this holds up, and I'm happy to take you through the physics if you're willing to take the time.

When you record an instrument from close-up, the angle from which you record the instrument greatly affects the timbre. This is why microphone technique is an artform.

If you then add digital reverb to that dry sample, you create ten thousand reflections of _that same sound_. The reverb preserves all the inconsistencies of the original sample, because it's just a recording of _one_ angle of the instrument, repeated over and over.

Let's compare that to a real concert hall. When you record natural reverberation in a concert hall, you're also recording ten thousand reflections. But _each reflection is a different angle of the instrument_.

*This concept is very important to understand.* The natural room sound that you hear in a wet sample is _not_ ten thousand copies of the same sound, but it's _ten thousand microphone perspectives_, where _every angle of the instrument is represented_, assembled into a smooth body of sound. 

Even in a suboptimal concert hall, or an auditorium or church, the natural room sound is a consistent mix of all angles of the instrument, and contains the full body of sound of that instrument.

So if you record an IR in Musikverein or Boston Concert Hall, for use in a convolution reverb to apply on a dry sample, that's just a gimmick. It's not even close to the real thing. Because no matter what you do, you're only reflecting the _same_ angle of sound ten thousand times over. The IR _can't_ reconstruct the sound from the various angles of the instrument, because that sound information doesn't even exist in the samples.

A convolution reverb is no better or worse than an algorithmic reverb (technically speaking) in this regard. An IR is just a shortcut to capture the reflection pattern and color of the room, but it doesn't mirror reality, where each reflection doesn't just occur at a certain time, but it also represents a different perspective of the instrument.

So in short, wet samples are here to stay. You can get away with recording samples with a short reverb/ER (long enough to represent most angles of the instrument) and using your choice of high-quality digital reverb to further extend the reverb tail. That is what a lot of developers are doing these days, recording on stages rather than in concert halls. But to algorithmically go from a dry or anechoic sound to a high-quality wet sound is a flawed concept.


----------



## Quantum Leap (Mar 15, 2020)

Yes I was not talking about MIR.


----------



## Dietz (Mar 15, 2020)

Wallander said:


> *This concept is very important to understand.* The natural reverb that you hear in a wet sample is _not_ ten thousand copies of the same sound, but it's _ten thousand microphones_, where _every angle of the instrument is represented_, assembled into a smooth tail of sound.


I understand what you're trying to tell us, but then this is true for any kind of sound recording in this space. Or in other words: If you can record it, you can also create an impulse response from it.



Wallander said:


> The IR _can't_ reconstruct the sound from the various angles of the instrument, because that sound information doesn't even exist.


This information in fact _does_ exist in case of Vienna Instruments and MIR Pro. Back then, we researched the so-called Directivity Profiles of more or less all important instruments of a typical symphony orchestra. We even developed a new method to capture this data and weight it properly, as we found out that most of the publicly available frequency profiles were a good estimation at best, or simply wrong due to conceptual flaws (... but that's a different story).

The IRs which make up a complete MIR "Venue" take these Profiles into account, because the impulses used for their creation were directed in those very same directions. You can even use a "dumbed-down" version called the "General Purpose Profile" for any arbitrary input signal which isn't covered by the detailed Directivity Profiles the way Vienna Instruments are.



Wallander said:


> But any IR will also act like a giant equalizer, colouring the sound.


That could be considered a strength of the concept, too. This colour has been called "the voice of the hall" by people more poetically talented than me  and might or might not be what you're after - but it's not a flaw _per se_.

Kind regards,


----------



## chrisr (Mar 15, 2020)

Ooh, great thread! I own a few of each of your products (EWQL, Wallander, VSL...) and I get pretty great use out of all of them, so thanks!

Arne - particularly interested to hear your views in this thread as you're essentially discussing the limitations of your own products - and always interested to hear what direction you might be going in next.

For what it's worth, the way I push the "realism factor" of your instruments (WIVI) has little to do with the space and everything to do with creating an imperfect source sound: messing with the intonation/tuning and introducing pleasant imperfections in my performances, breaths running out, etc.

Peter Roos (I think?) posted some anechoic orchestra samples online some years ago. Even those bone-dry recordings sounded great to me - it was at that point that I started to think I might be worrying too much about the room.

Anyway, I remain hugely appreciative of all of your efforts - your products have helped to keep me in work for a few years now.


----------



## Quantum Leap (Mar 15, 2020)

Wallander, I’m sorry, but I don’t think you know what Spaces does. Some of you guys should do your homework before writing all of that. Most of what you say is true, but there are also limits to what the human ear can appreciate. So if you get too technical, you lose sight of reality. If you want to create an orchestral soundstage, there is no better reverb than impulses done correctly. Algorithmic reverb can’t touch it. What Spaces is doing is 1000 times more complex. If you compare all the impulses in Reynolds Hall in Spaces 2, taken at different positions on the stage, each position utilizing a different projection array and angle or series of angles, you will notice that they all sound very different from each other. Very different. Then if you use the appropriate impulse with various instruments and sections, you create an incredibly complex soundstage. It’s as good as it needs to be, if you are listening in stereo. It’s more than the brain can process.
However, if the original samples are close mic’d from a single position, the effect of good reverb will be less successful. But if the samples have a stage position or something a bit farther back, you then have probably captured the essence of the instrument, so good reverb can work wonders.
Algorithmic reverb for orchestral work became popular because film-score engineers were adding it to a soundstage recording that already sounded great. The digital verb just added to what was already a balanced, big sound. It’s not that the reverb is doing wondrous things.
Imagine you have 8 Meyer Sound live audio speakers. You want to create an impulse for violins. You place them in the front left section of the stage where the violins would be. You angle them to fire diagonally upwards across the hall, but alter the angle on each pair of speakers as you approach the center. Then you fire 2 speakers backwards with the tweeters covered. You have created the projection array of a violin section. You then capture the whole thing with the same kind of mics, preamps and converters that Sean Murphy uses to record Star Wars, etc. You get the beautiful color and you get all the intricacies of the materials and geometries of the room, and you capture it in surround. That’s what Spaces does. Algorithms can’t touch that. But of course they can be really useful and sound great in many instances.
Back to IRs: you do the impulses all at very high volume so the room is excited. Basically, you go obscenely loud and back it off when the room starts making horrible vibration noises. This gives you a good low-noise impulse that works for all volumes of music.
In a real setting, there is very little reverb when the orchestra plays quietly. People generally don’t like that, and add reverb to hide it in orchestral recordings and film scores.
To summarize, Spaces 2 is a proven tool that really works. If you use it with live audio or any of the great samples out there, you can create a mix sonically indistinguishable from the real thing. The performance might not be there, but that’s because sampling is still not perfect, and never will be. Reverb is much, much easier to tackle than sampling. We need to do more locations someday.
One last thing: the biggest problem in turning dry samples into realistic wet samples is the recording. If you don’t capture a true picture of the instrument and simply record it from the front in a close position, you get a completely wrong balance. But the second biggest problem is early reflections. The right amount of early reflections can do wonders. It can draw the instrument into the hall by bridging the gap between dry and wet. Orchestral sound is really close, plus stage, plus hall, plus surround hall. Spaces shines in this area because the arrays were often firing up into the stage, across the stage, and backwards to the back of the stage. The impulses meld the stage reverb with the hall reverb. So in theory, samples only have to be a bit wet and nicely balanced, and a good reverb would be able to do the trick.


----------



## Wallander (Mar 15, 2020)

Quantum Leap said:


> Wallander. I’m sorry, but I don’t think you know what Spaces does. Some of you guys should do your homework before writing all of that. Most of what you say is true, but there are also limits to what the human ear can appreciate. So if you get too technical, you lose sight of reality. If you want to create an orchestral soundstage, there is no better reverb than impulses done correctly. Algorithmic reverb can’t touch it. What Spaces is doing is 1000 times more complex. If you compare all the impulses in Reynolds Hall in Spaces 2, taken at different positions on the stage, each position utilizing a different projection array and angle or series of angles, you will notice that they all sound very different from eachother. Very different. Then if you use the appropriate impulse with various instruments and sections, you create an incredibly complex soundstage. It’s as good as it needs to be, if you are listening in stereo. It’s more than the brain can process.
> However, if the original samples are close mic’d from a single position, the effect of good reverb will be less successful. But if the samples have a stage position or something a bit farther back, you then have probably captured the essence of the instrument, so good reverb can work wonders.
> Algorithmic reverb for orchestral work became popular because filmscore engineers were adding it to a soundstage recording that already sounded great. The digital verb just added to what was already a balanced, big sound. It’s not the reverb is doing wonderous things.
> Imagine you have 8 Meyer Sound live audio speakers. You want to create an impulse for violins. You place them in the front left section of the stage where the violins would be. You angle them to fire diagonally upwards across the hall, but alter the angle on each pair of speakers as you approach the center. Then you fire 2 speakers backwards with the tweeters covered. You have created the projection array of a violin section. You then capture the whole thing with the same kind of mics, preamps and converters that Sean Murphy uses to record Star Wars Etc... You get the beautiful color and you get all the intricacies of the materials and geometries of the room. and you capture it in surround. That’s what Spaces does. Algorithms can’t touch that. But of course they can be really useful and sound great in many instances.
> ...


I agree with everything you say. I think we’re discussing apples vs. oranges.

When I talk about dry, I mean anechoic, or close to it. I could have been more clear about that. From my perspective, Hollywood Orchestra is a "wet" sample library.

Spaces 2 is an exquisite reverb, and the artistic knowledge that went into it is awe-inspiring. But even Spaces 2 can't make an _anechoic_ stereo string section (regardless of sample quality) sound like Hollywood Strings. I’m sure we can agree on that.

The point I’m trying to make is that the ER doesn't just bridge the gap between dry and wet; each reflection in that ER also contains a different angle/perspective of the instrument. The direct sound is rarely balanced and could be missing overtones, but these overtones are filled in by the ERs which do have them. The direct sound could also have a very strong fundamental sticking out on a certain pitch (due to directivity), but the natural ERs won't have that same problem, as they’re snapshots of other angles of the instrument. So the problem is attenuated when you mix the direct sound with the natural ER. The spectrum of an instrument is less uneven if you record it with some ambience. I’m not sure how to explain it better.


----------



## Wallander (Mar 15, 2020)

Dietz said:


> I understand what you're trying to tell us, but then this is true for any kind of sound recording in this space. Or in other words: If you can record it, you can also create an impulse response from it.
> 
> 
> This information in fact _does_ exist in case of Vienna Instruments and MIR Pro. Back then, we researched the so-called Directivity Profiles of more or less all important instruments of a typical symphony orchestra. We even developed a new method to capture this data and weight it properly, as we found out that most of the publicly available frequency profiles were a good estimation at best, or simply wrong due to conceptual flaws (... but that's a different story).
> ...


I truly admire the work you put into MIR, and I don't know the details of it, but you can't make up for this with Directivity Profiles. A Directivity Profile may give you a general idea of the directivity of an instrument (particularly in the low/low-mid frequencies), but it can't bring back the finer details with any accuracy.

When you record an instrument close-up, moving a microphone just an inch in any direction can make a big difference on the scale of individual overtones. There's also a lot of variation in directivity between two different violins, even from the same maker. Also, if the players move or angle their instruments just slightly during the recording (and the microphone isn't attached to the instrument) it's all in vain. 

And even if you managed to filter the dry samples so that they were completely neutral, you're not going to get a sense of liveliness and space if you repeat that sound 10,000 times. A wet sample, by contrast, starts with many strongly coloured reflections full of character, which gradually build up into something denser and more neutral as the reverb progresses, finishing in a perfectly smooth, uncoloured "white" instrument sound in the late reverb.

And just so there's no confusion, I'm doing anechoic instruments myself. I'm certainly not here to review others' products; I'm only trying to speak candidly about the drawbacks of a technique I myself employ. There are certainly also a lot of benefits to using dry sounds, such as flexibility and easier editing.


----------



## ProfoundSilence (Mar 15, 2020)

Well Arne, I agree that Impulses are a gimmick, but the reality is that what sounds good sounds good. And I've found that you can almost always give the impression of a larger space - but it can't make a molehill into a mountain so we agree on that. 

I think algos could eventually create spaces that are realistic for modelled instruments... the problem will be the awkward period when we can virtually simulate these things: we'll have this blind spot where we forget that it's actually just the microphone-colored signal we're expecting. Kind of like how we made getting rid of all the coloration an engineering GOAL, and immediately found we were adding fake saturation and distortion from "crappier" equipment, because that's the sound we've established is enjoyable.

You made quite a valiant effort with WIVI, and WIVI proves that performance can certainly make a convincing argument against static samples.

Dietz, on the other hand, gave us quite an interesting tool for coloring tone, as well as a way to approximate depth/position on a make-believe stage. Loading up SWAM woodwinds with MIR impulses certainly gave them some width and pop over sending them all to the same reverb, yet the coloration still felt correct despite the reverbs being different.


----------



## Wallander (Mar 15, 2020)

chrisr said:


> ooh, great thread! -I own a few of each of your products (ewql, wallander, vsl...) and I get pretty great use out of all of them, so thanks!
> 
> Arne - particularly interested to hear your views in this thread as you're essentially discussing the limitations of your own products - and always interested to hear what direction you might be going in next.
> 
> ...


You're right. Imperfections and the performance aspect are also important. But even if you had that covered, natural reverberation has a quality of sound that you just can't reproduce with artificial reverberation on top of a dry sample. It's a tradeoff you make.

Those anechoic live orchestra recordings actually sound pretty great even without reverb. It's refreshing to hear all the small details. With that being said, no amount of artificial reverb would make those recordings sound like the Boston Symphony Hall (acoustically).


----------



## Wallander (Mar 15, 2020)

ProfoundSilence said:


> Well Arne, I agree that Impulses are a gimmick, but the reality is that what sounds good sounds good. And I've found that you can almost always give the impression of a larger space - but it can't make a molehill into a mountain so we agree on that.
> 
> I think algos could eventually create spaces that are realistic for modelled instruments... the problem will be the awkward period when we can virtually simulate these things: we'll have this blind spot where we forget that it's actually just the microphone-colored signal we're expecting. Kind of like how we made getting rid of all the coloration an engineering GOAL, and immediately found we were adding fake saturation and distortion from "crappier" equipment, because that's the sound we've established is enjoyable.
> 
> ...


I shouldn't have written "gimmick". That was unnecessarily harsh. My intention was to say that the IR is only a very limited approximation of the room. An IR can make a great reverb, but if you record a world-class violin player in an anechoic chamber and run it through an IR, it doesn't sound anything like a world-class violin player in a concert hall. IRs have an artistic use, but their technical merits are overrated.

There are major peaks and notches in the spectrum of many acoustic instruments recorded up-front, sometimes occurring only a few Hz apart. If you record an "E4" sample and an "F4" sample close-up, and look at the spectrum, the 2nd overtone of E4 could be 10 dB louder than the 2nd overtone of F4. But in the ambient microphone, those overtones would (usually) have a similar amplitude. Reason being, most acoustic instruments produce _more or less_ the same sound spectrum for two consecutive pitches, but there's a difference in what direction that energy goes.

Other instruments, like brass instruments, are not as sensitive to what pitch is being played. But a brass instrument instead has the problem that high frequencies go straight out of the bell, but most of the body sound _energy_ goes to the sides. You would usually put the microphones at an angle, but it's pretty sensitive and there's always a risk that you get either too little body sound, or too little brightness at ff. And the overall balance of the close perspective will never exactly match the sum of sound energy projected in all directions. Not to mention horns, being turned backwards, where brightness typically is reflected energy.

The sum of ambience and ER catches it all, but the direct sound only gets a directional snapshot.


----------



## ProfoundSilence (Mar 15, 2020)

You're not wrong to use the word gimmick, because companies like Waves make batshit claims like "*The Abbey Road Studio 3 plugin brings the acoustic environment of the legendary Abbey Road Studio 3 control room to your headphones, so you can have better reference for your mixes and productions wherever you are.*" That said, MIR Pro has a pretty modest product page that simply states it's thousands of impulse responses, and from my memory it never marketed itself with larger-than-life claims (although users might have hyped it up to that level).

I just personally think that IRs on samples can be used to make the room feel bigger - and while it's technically not doing anything close to what a real room does - the end result can be pleasing and function as an illusion to help temporarily suspend disbelief and let a listener focus on the music. 

Working with Samplemodeling so much for the longest time, I own a silly amount of reverbs, and I'd throw most of them away at this point. But it was certainly a masterclass in trial and error, and in how ridiculous spatialization ACTUALLY IS, while simultaneously showing me that you can really half-ass it and still make something sound good, even if it's not realistic.

I've thought a lot about it and planned on doing an in depth video on ways of dealing with products like pianoteq.


----------



## muk (Mar 15, 2020)

From a user perspective I can relate to Arne's findings. Short sounds with prominent transients: if they were recorded dry, I could never make them sound like they had been recorded in a nice room/hall. In a concert hall these sounds get a nice 'bloom' around them. I never could replicate that by adding reverb to dry samples.

Here is an example, a pizzicato ditty with Dimension Strings. Dimension Strings have been recorded with close mics only in a relatively dry room:

[Audio: Encore Dimension Strings more reverb.mp3 | app.box.com]
The strings are pushed back in the virtual room, and there is reverb there. But to me it does not sound coherent. There is no bloom around the notes, and the reverb does not sound like an organic room to me.

Compared to this, the same piece with a few instances of Berlin Strings First Chairs layered on top of each other (so strings recorded with close mic, decca etc. in Teldex):

[Audio: Encore BFC Strings.mp3 | app.box.com]
Hearing the second example the problems of the first one become glaringly obvious.

I used neither MIR nor Spaces in the first example by the way. I don't own them. I don't doubt that they could have come much closer to a believable result. But, like Arne, I don't think they can get you all the way there with really dry/anechoic samples.


----------



## Wallander (Mar 15, 2020)

muk said:


> From a user perspective I can relate to Arne's findings. Short sounds with prominent transients, if they were recorded dry I could never make them sound like they had been recorded in a nice room/hall. In a concert hall these sounds get a nice 'bloom' around them. I never could replicate that with adding reverb to dry samples.
> 
> Here is an example, a pizzicato ditty with Dimension Strings. Dimension Strings have been recorded with close mics only in a relatively dry room:
> 
> ...


This is a stellar example.

First of all, let me be clear by saying that Dimension Strings is expertly recorded, to a level of refinement I could never achieve myself.

The difference between these two examples isn't about microphone positioning or choice of reverb. The reason Berlin Strings stands out is that it has lots of mid-high/high energy in the ER and tail. You can even hear the sound bloom into something brighter as it progresses through the reverb tail. The late portion of the reverb is almost like white noise, because it's perfectly uncoloured and all-encompassing, as opposed to the direct sound, which has a lot of body character.

This particular transient sound further shows the problem with directivity. These frequencies aren't projected from the resonating body of the violin, and don't obey the rules of a general Directivity Profile for a violin. It's a snapping transient sound, which emanates from the bridge and strings themselves, omnidirectionally, like a separate percussion instrument attached on top of the violin. The only way to collect that sound information is through having a reflective room. If you record a pizzicato dry, you're probably going to have a lack of high frequencies, regardless of where you put the microphone.


----------



## muk (Mar 15, 2020)

Arne, this sounds like a plausible explanation for what I encountered in practice. No matter what I tried with reverb, for transient-heavy short sounds I could never make dry samples sound like they'd been recorded in an ambient space. For long notes I found it to be less difficult. Also for woodwinds. I don't know why that's the case. Maybe woodwind sounds have fewer transients than other instruments? In any case, thank you for the explanation. It's a truly fascinating topic.


----------



## ProfoundSilence (Mar 15, 2020)

I made a (probably useless) video comparing an actual close + room mic vs. a close mic + IR.

My primary points are:
- it's passable, albeit limited - but it's not the real thing
- reverbs lack punch, which is important
- you don't need to be a rocket scientist to use reverb to simply add "more" room to something that already has room

Also, before anyone asks, I'm not making a tutorial on how to customize IRs to load into Kontakt.


----------



## Dima Lanski (Mar 15, 2020)

Thank you guys, for all the wonderful replies! Some of your comments really gave me some interesting insights. What a wonderful community!



Ben said:


> We also used a similar approach to create our *M*ulti *I*mpulse *R*esponse software: *MIR PRO*


I've been pointed toward MIR after discussing the idea with a couple of people, so I definitely will check it out when I get the opportunity. Thanks!



Quantum Leap said:


> a lot of what you are talking about is what Spaces is all about.


A friend of mine just got Spaces 2 recently. I'll check out your software next time I pay him a visit, and maybe get a copy for myself too.



Wallander said:


> I think the OP suggests using not only many different microphone perspectives and IRs, but a different IR on each mic, prior to mixing. This prevents the destructive coloration of sound, and phasing, that you get if you mix dry/close channels, prior to applying reverberation.


Yes, that's exactly what I'm proposing! Glad to hear someone gets it. Is that what MIR does, @Dietz? And how about Spaces 2, @Quantum Leap? Though to be more precise, there are bound to be at least some phase cancellations and buildups in a real sound propagating through a room. I just want to keep the real ones and eliminate the ones arising from the method of capturing.
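To make the distinction concrete, here's a minimal numpy sketch of the two mixing orders being discussed. Everything here is a placeholder: random noise stands in for the dry recordings, and random decaying noise stands in for the measured IRs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, sig_len, ir_len = 4, 2048, 512

# Placeholders: one dry recording per mic on the sphere, and a separate
# measured IR for each of those mic positions.
dry = rng.standard_normal((n_mics, sig_len))
irs = rng.standard_normal((n_mics, ir_len)) * np.exp(-np.arange(ir_len) / 64.0)

# Proposed method: convolve EACH mic channel with ITS OWN IR, then sum.
per_mic = sum(np.convolve(dry[i], irs[i]) for i in range(n_mics))

# Naive method: mix the dry channels first, then apply a single IR.
mixed = np.convolve(dry.sum(axis=0), irs[0])

# The two results differ: per-mic convolution preserves the distinct
# reflection pattern of every perspective instead of smearing one IR
# over an already phase-combined mix.
print(per_mic.shape, mixed.shape)
```

The point of the sketch is only the ordering of the sum and the convolutions; a real implementation would use FFT-based (partitioned) convolution for IRs of any useful length.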



Wallander said:


> I don’t _think_ you need to bother with capturing the IRs at exact spots. It’s more important that the IR reflection patterns of any two microphones don’t correlate.
> 
> There are much more efficient ways you can do this with an algorithmic reverb. For example, if you have an old Lexicon-type ”ring buffer” reverb, you can simply make the loop/ring larger, and feed the microphones at different endpoints, just like you’d normally feed L/R channels at different endpoints, but scaled up to many more microphones.


Well, they're bound to be correlated to some degree, as they're happening in the same room, especially the IRs recorded from neighboring spots on the sphere. And while the idea of generating decorrelated algorithmic IRs is valid on its own, it's not what I have in mind.
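For readers unfamiliar with the "ring buffer" idea from the quote, here is a toy Python sketch of one shared circular delay line with a different injection/read point per channel. All numbers are invented and a real Lexicon-style algorithm is far more elaborate; this only illustrates the "feed the microphones at different endpoints" concept.

```python
import numpy as np

def ring_reverb(inputs, ring_len=4801, feedback=0.7):
    """Toy 'ring buffer' reverb: one circular delay line shared by all
    channels. Each channel reads and writes at its own offset into the
    ring, so the channels pick up each other's energy at different,
    non-aligned delays."""
    n_ch, n = inputs.shape
    ring = np.zeros(ring_len)
    # Arbitrary choice: spread the tap points evenly around the ring.
    offsets = [(i * ring_len) // n_ch for i in range(n_ch)]
    out = np.zeros_like(inputs)
    pos = 0
    for t in range(n):
        for ch in range(n_ch):
            tap = (pos + offsets[ch]) % ring_len
            out[ch, t] = ring[tap]                       # read this channel's tap
            ring[tap] = ring[tap] * feedback + inputs[ch, t]  # feed back + inject
        pos = (pos + 1) % ring_len
    return out

x = np.zeros((3, 2000))
x[:, 0] = 1.0          # an impulse into every channel
y = ring_reverb(x)
# Each channel's output is a different slice of the shared ring, so the
# echo patterns of any two channels don't line up.
```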



Dietz said:


> That's why all IRs undergo some nifty inverse filtering which zeroes out (or at least minimizes) the effects of the used speaker, microphones and pre-amps.


This is kinda beyond the scope of the post, but the speaker has not just a frequency response profile, but also an impulse response profile. I wonder if MIR corrects for that somehow? By the way, I view an impulse response as a frequency response spread out in time; is that a correct way of thinking about it?
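That intuition is basically the convolution theorem: convolving with an IR in the time domain is the same as multiplying by its spectrum (the frequency response, including phase) in the frequency domain. A quick numpy check with toy signals:

```python
import numpy as np

rng = np.random.default_rng(1)
ir = rng.standard_normal(256) * np.exp(-np.arange(256) / 32.0)  # toy decaying IR
x = rng.standard_normal(1024)                                    # toy dry signal

# Time domain: convolution with the impulse response.
y_time = np.convolve(x, ir)

# Frequency domain: multiply by the IR's spectrum (its frequency response).
# Zero-pad both signals to the full output length first.
n = len(x) + len(ir) - 1
y_freq = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(ir, n), n)

# The two agree to numerical precision: an IR is the frequency response
# "spread out in time", including the phase/arrival-time information
# that a magnitude-only response throws away.
print(np.allclose(y_time, y_freq))
```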



Rilla said:


> it will be amazing when ai can generate a realistic human performance based off a musical score, through manipulating midi/cc data of a particular VI, namely Sample/Audio Modeling and other modeled vi's. Some might call that lazy, but whatever! lol!
> 
> I want to be able to come up with a musical score, let the computer generate the performances of those parts and make a few edits to the data.


Again, unrelated to the point of the post, but that's the same idea I had too, and it's the next step after generating realistic sound from dry samples. Glad to see someone thinks the same things! It's actually doable already, all the AI needs is lots of data. So contact great midi performers, of which there are a lot on this forum, and gather their midi scores. Then notate the scores in detail, adding an overall mood and all the necessary markings for each phrase, and let the neural net learn to predict the great performance from the notation. As an alternative, you can replace the notation with a rough midi performance. Should take just a year or two of R&D)

The next step would be to capture a real artist's performance this way. But I haven't figured out an exact way to do it, yet. Maybe learning on performances in a dry room would be enough?



Dietz said:


> This information in fact _does_ exist in case of Vienna Instruments and MIR Pro. Back then, we researched the so-called Directivity Profiles of more or less all important instruments of a typical symphony orchestra. We even developed a new method to capture this data and weight it properly, as we found out that most of the publicly available frequency profiles were a good estimation at best, or simply wrong due to conceptual flaws (... but that's a different story).
> 
> The IRs which make up for a complete MIR "Venue" take these Profiles into account, because the impulses used for their creation were directed towards the very same directions. You could even use a "dumbed-down" version called "General Purpose Profile" for any arbitrary input signal which isn't covered by the detailed Directivity Profiles like Vienna Instruments.


The more I hear, the more MIR sounds similar to what I had in mind. I went ahead and checked out this demo: . Unfortunately, it has the same problem I hear with regular IRs: the voice sounds like it's coming from a speaker, not a real person. Too boomy and compressed. Maybe it's not representative of how MIR handles orchestral sections? BTW, the human voice is really the litmus test for all acoustic systems, because all of us are used to hearing it in real spaces, unlike orchestral instruments. So I would love to hear the human voice recorded the way I proposed: a multitude of mics in an anechoic chamber, then an IR for each combination of recording mic + real room mic.



chrisr said:


> Peter Roos (I think?) posted some anechoic orchestra samples online some years ago - even those bone dry recordings sounded great to me - it was at that point that I started to think that I might be worrying too much about the room.


I think I heard those recordings, and I agree, they're great on their own. That's probably because sound is not the most important aspect, as much as many of us want to believe... Performance, orchestration and composition come first. The greatest performances were captured on a potato a century ago, and that still didn't prevent most people from enjoying them.

I've hit the character limit of a post, so I'll split my response in 2. Hopefully, you'll see the second part right next to this one.


----------



## Dima Lanski (Mar 15, 2020)

Quantum Leap said:


> Once again, it is problematic to try and mix 12 mics surrounding an instrument, so engineers avoid that sort of thing.


That sentence, I feel, represents a common misunderstanding of the intent of the proposed solution and what it actually tries to do... It's not just you, @Quantum Leap, who interprets it this way; most of the responders do, it's just that this quote captures that interpretation best. I'll try to make myself clearer, though to be honest, it's hard to do with words only. Pictures would be best, but, alas, I'm only good at hand waving, not drawing) Anyway...

So I come from a graphics engineering background, among other things, and the solution is actually inspired by the way things are done there. The closest analogy would be raytracing. But instead of rays, we have cones of sound, bits of a wavefront if you will. And instead of modeling it in virtual space, I'm trying to think of a way to capture how they behave in a real space. A big set of IRs seems to be a perfect fit for this.

As a side note, the way most current solutions in audio work is akin to the way movie compositing software works (After Effects, Nuke, etc.), or 2D game engines. You have a set of captured videos, images and effects, and you're trying to compose and process them in a way that looks best. This is an inevitable and invaluable part of the process, since in the end the viewer will be looking at a flat picture anyway. But it's just one part of the equation; the other part is capturing, recreating, generating and manipulating a 3D space and how light interacts with a camera in it. That's the realm of CG and corrective post-production. And that's what I'm proposing to do with my solution.

To make it really clear, let's imagine a scaled-up, super-HD version. Let's build an actual anechoic semisphere with a MILLION pressure-sensitive sensors on its surface, like a giant spherical camera looking inwards, but with a tiny mic for each pixel. The footage from this camera would be similar in size to the footage of a real high-speed camera with, say, 48,000 fps and 24 bits per pixel. If you record with such an audio camera, you'll get a "movie" that represents the sound wavefront detected at its surface.

What we need, then, is a way to simulate how that spherical wavefront would travel further and arrive at each of the real room mics, if it were in a real wet room. For that, we'll put this semisphere in a real room, turn it inside out, and record an IR for each pixel, one by one. After that process we'll have a million IRs. Again, it will be like a short movie clip from a high-speed camera, but this time capturing how a sound wavefront travels from the surface of the semisphere to a room mic.

And then combine the 2 movies with the usual IR algorithm and you'll get the final result: the simulation of the recorded ensemble sitting in a real room. Perfect!
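The combination step above can be sketched in a few lines of numpy. This is a scaled-down toy (16 "pixels", 2 room mics, random placeholders for the recordings and the measured IRs), not the million-sensor version, but the data flow is the same.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pixels, n_room_mics, sig_len, ir_len = 16, 2, 1024, 256

# Placeholder data: one dry "pixel" recording per point on the semisphere,
# and one measured IR per (pixel, room mic) pair.
pixels = rng.standard_normal((n_pixels, sig_len))
irs = rng.standard_normal((n_pixels, n_room_mics, ir_len))

# Each room mic hears the sum over all pixels, each pixel convolved with
# the IR measured from that pixel's position to that room mic.
out_len = sig_len + ir_len - 1
room = np.zeros((n_room_mics, out_len))
for m in range(n_room_mics):
    for p in range(n_pixels):
        room[m] += np.convolve(pixels[p], irs[p, m])

print(room.shape)
```

So the whole "movie combination" reduces to an N×M bank of convolutions followed by per-mic summation, which is also why the processing cost scales linearly with the number of pixels.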

To record the IRs, however, each pixel would need to send a really narrow, directed wave of sound. And here lies a potential problem with the overall idea. You probably can't send sound in a cone and make it behave as if it were part of a wavefront. It's similar to smoke rings: the air in the cone will interact with the air outside it, creating friction and therefore eddies, and therefore losing energy. And the smaller the cone, the bigger the effect. I'm actually wondering whether it even makes sense to split a wavefront into cones from a physics perspective...

If this problem has a solution in a real room, it would require the expertise of an audio engineer with a deep understanding of acoustics, and probably of the underlying math and physics down to differential equations. If there are people like that on this forum, I would love to hear their opinions and advice on the topic.

One potential solution would be to just model the room and produce the required mega-IR. The wave in a simulation can be as narrow as we want; we just need an algorithm that ignores the interaction with the air outside the cone. We can measure the acoustic properties of the materials, build a virtual room with them, and maybe even capture individual responses for different elements of the room. Then run a simulation to produce the final spherical IR movie. Not to run the simulation with the source sound, mind you, that would be realtime raytracing, but just to create the IRs, which represent a snapshot of the results of the physical process. I honestly believe the final result produced this way would already be very realistic, but using a modeled room doesn't have the same ring to it from the end-user perspective. Pun intended, sorry.

Another solution would be to scale back the ambition and make just a few pixels instead of a million. Say 16. This increases the size of the cones required to produce each pixel of the mega-IR, reducing the smoke-ring effect. And hopefully it will still be good enough for our ears to sound realistic. This is actually what the system proposed in the original post was.

So did this text clear up your understanding of my proposition? What do you think, @Dietz, @Quantum Leap, @Wallander?

One final note I wanted to add is that today's IR solutions tend to produce good results for higher-pitched instruments. But put a cello or a horn ensemble through them, and it's a mess. No wonder: low-mids and bass are the frequencies most affected by the room. The higher frequencies just tend to be filtered out faster. And players of those instruments talk a lot about how the room is always part of their sound, like the body of a guitar. That's one reason people use Sample Modeling trumpets a lot, but horns or trombones not so often.

Oh Horns! I recently discovered CineBrass and their 6-horn ensemble is just a wonder. But I want it to play like Sample Modeling or Aaron Venture stuff! Guys, please, do something!

Again, this was a long post, so for those of you who made it to the end -- thank you!


----------



## muk (Mar 15, 2020)

Dima Lanski said:


> Again, unrelated to the point of the post, but that's the same idea I had too, and it's the next step after generating realistic sound from dry samples. Glad to see someone thinks the same things! It's actually doable already, all the AI needs is lots of data. So contact great midi performers, of which there are a lot on this forum, and gather their midi scores. Then notate the scores in detail, adding an overall mood and all the necessary markings for each phrase, and let the neural net learn to predict the great performance from the notation. As an alternative, you can replace the notation with a rough midi performance. Should take just a year or two of R&D)



Arne will probably have some thoughts about this too. His 'NotePerformer' software tries to interpret a score the way that musicians would:

[NotePerformer 3: the Artificial Intelligence-based orchestral playback engine for Sibelius, Finale & Dorico | www.noteperformer.com]
The algorithm was 'trained' with actual recordings, if I remember correctly. It's a great help when orchestrating in a notation program; an essential tool, really. And if I understood you correctly, it tries to achieve exactly what you described above.


----------



## Dima Lanski (Mar 15, 2020)

muk said:


> Arne will probably have some thoughts about this too. His 'Noteperformer' software tries to interpret a score the way that musicians would:
> 
> 
> 
> ...


Yep, NotePerformer is on my radar. It's quite good, and I'm actually considering buying it at some point in the future. Didn't know @Wallander was the one responsible for the software, though the avatar might have given me a hint. Nice to meet such great people here! The same goes for you, @Quantum Leap, @Dietz and the other people responding to my post.


----------



## johngrant (Mar 15, 2020)

Reverbs are my pet peeve. Wallander's post rings very true for me. I would love not to have to add reverb, in my particular case, to solo piano classical material. But my ears have become very used to hearing it in virtually ALL solo classical piano recordings. So I like it. (Not everyone does.)

Problem is, as folks have suggested, even the driest piano VSTs contain mic perspectives that introduce _some_ natural reverb. That makes the process of adding MORE artificial reverb inherently difficult. On the other hand, I've found that the remote mic perspectives of most piano VSTs don't really capture anything close to the verb you hear in well-recorded piano music either. So how do you mimic good live solo piano recordings with great, super-realistic (natural?) reverb? Bricasti?

A big part of the issue might be the fact that (commercially available) piano samples seem to have the high end, say 10-16 kHz, of all their velocity layers damped down. You really need all that top end, don't you, to make a recording sound like it's in a real space?


----------



## Dima Lanski (Mar 15, 2020)

ProfoundSilence said:


> I made a (probably useless) video comparing an actual close + room mic vs a close + IR.
> 
> my primary points are :
> it's passable, albeit limited - but it's not the real thing
> ...



Interesting video, thanks! That's exactly my experience with IRs. Though woodwinds are kinda OK with this method. But if you get to brass or low strings, the sound becomes a mess. Especially on the punch side of things, as you mentioned.


----------



## Dima Lanski (Mar 15, 2020)

Dima Lanski said:


> To record the IRs, however, would require each pixel to send a really narrow and directed wave of sound. And here lies a potential problem with the overall idea. You probably can't send a sound in a cone and make it behave as if it's a part of a wavefront. Similar to the smoke rings: the air in the cone will interact with the air outside it, creating friction and therefore eddies, and therefore losing energy. And the smaller the cone, the bigger the effect. I'm actually wondering if it even makes sense to split a wavefront into cones from a physics perspective...
> 
> If this problem has a solution in a real room, it would require an expertise of an audio engineer with deep understanding of acoustics and probably the underlying math and physics down to differential equations. If there are people like that on this forum, I would love to hear their opinion and advice on the topic


Hate to quote myself, but I thought about it a bit more and decided that maybe you don't need to isolate a direction when recording an IR for each pixel of the semisphere; that wouldn't represent the real physical process anyway. Since sound propagates in all directions naturally, maybe it's more correct to just play the sweeps normally, with a regular omnidirectional speaker at each pixel's spot. The audio will naturally combine itself into a realistic wavefront when you apply all the IRs together and then sum the results. Need some math to confirm that.

But that same line of reasoning makes me believe we probably can't scale the solution down to 16 pixels. Can't say for sure without calculations. The hope is that it doesn't take a lot of pixels, otherwise the amount of information and processing will be infeasible. Not to mention the hardware. But then again, maybe someone will come up with a good compression and processing algorithm for such a system, like they did for video...

What do you think, guys? @Wallander, @Dietz, @Quantum Leap?


----------



## Wallander (Mar 15, 2020)

Dima Lanski said:


> Hate to quote myself, but I thought about it a bit more, and decided that maybe you don't need to isolate a direction when recording an IR for each pixel of the semisphere. That it won't represent the real physical process anyway. Since sound propagates in all directions naturally, maybe it's more correct to just play the sweeps normally, with a regular speaker in all directions from each pixel's spot. And the audio will naturally combine itself into a realistic wavefront, when applying all the IRs together and then summing the results. Need some math to confirm that.
> 
> But that same line of reasoning makes me believe that we probably can't scale the solution down to 16 pixels. Can't say for sure without calculations. The hope is that it's not a lot of pixels, otherwise the amount of information and processing will be infeasible. Not to mention the hardware. But then again, maybe someone comes up with a good compression and processing algorithm for such a system, like they did for video...
> 
> What do you think, guys? @Wallander, @Dietz, @Quantum Leap?


It sounds like the CPU and memory streaming requirements could be over the top, by a very large margin, for realtime audio processing. 

I’m not qualified to say if it would sound good or not. In my own experience, new ideas usually sound great until I try them. I throw away at least 9 out of 10 lines of code I write.

So even if I vouched for the idea, that would at best translate into a 10% chance of success.


----------



## Dima Lanski (Mar 15, 2020)

Wallander said:


> It sounds like the CPU and memory streaming requirements could be over the top, by a very large margin, for realtime audio processing.


That's why I also proposed migrating audio processing to the GPU in my original post.

Information- and memory-wise, modern 24-bit color HD video at 30 fps has (1920*1080 pixels)*(30 fps)*(24 bits) = 1,492,992,000 bits per second. A raw audio stream is, say, (44,100 Hz)*(16 bits) = 705,600 bps. Divide the first number by the second and we get about 2,116. That's the number of mics we can have on our sphere and still match the data rate of regular HD video. That seems like a good upper limit, though the hope is that it will sound realistic with much less.
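The data-rate arithmetic checks out; here it is in Python, using the same figures as above:

```python
# Bits per second of 24-bit color HD video at 30 fps.
video_bps = 1920 * 1080 * 30 * 24       # 1,492,992,000 bps

# Bits per second of one raw 16-bit / 44.1 kHz audio channel.
audio_bps = 44_100 * 16                  # 705,600 bps

# Number of mic channels that fit in the same bandwidth as the video.
mic_budget = round(video_bps / audio_bps)
print(video_bps, audio_bps, mic_budget)  # 1492992000 705600 2116
```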

Then onto processing. A modern GeForce RTX 2080 Ti has 13.4 TFLOPS of processing power, that is 13,400,000,000,000 operations per second, and we have (2,116 mics)*(44,100 Hz) = 93,315,600 samples per second, which leaves 143,598 operations per sample for our stream, which should be more than enough to apply an IR algorithm. That is of course a theoretical limit; we'd need to use the GPU efficiently, keeping the caches full and all the cores working in parallel. But it seems feasible, given that the task itself is highly parallel, apart from the final summing.

All of it is a bit handwavy without the actual algorithms, but I just hope to show that it's within the realm of possibility on modern hardware, with a very good margin. Well, the actual hope is that someone picks up the idea and implements a solution with it, or stumbles upon a much better solution in the process. And then in 5 years I'll be composing with NotePerformer playing ultra-realistic and punchy-sounding virtual instruments.


----------



## Wallander (Mar 15, 2020)

muk said:


> Arne, this sounds like a plausible explanation for what I encountered in practice. No matter what I tried with reverb, for transient heavy short sounds I could never make dry samples sound like they'd been recorded in an ambient space. For long notes I found it to be less difficult. Also for woodwinds. I don't know why that's the case. Maybe woodwind sounds have less transients than other instruments? In any case thank you for the explanation. It's a truly fascinating topic.


Woodwinds _generally_ have their dominating partials in a relatively narrow frequency region, near the fundamental. A lot of the sound energy emanates through the tone holes, so a woodwind is very strongly directional. When you change fingering, the directivity of the instrument changes drastically.

If you record a woodwind instrument with a single mono mic, you’ll see that the amplitude varies a lot between different pitches, even when the musician thinks he/she is playing all notes at the same dynamic. It’s because of directivity.

Now, if you move the microphone to a different position relative to the woodwind, the amplitude still changes with pitch, but with a different pattern, i.e. other pitches are loud/soft.

Using a properly placed stereo pair does wonders in this case, because the variations in amplitudes cancel out, as far as your brain is concerned (although the energy moves somewhat between the L and R channels, with different pitches).

In a sample library, errors like these are also comparably straightforward to fix in post-processing.


----------



## robgb (Mar 15, 2020)

Rilla said:


> I personally appreciate bone-dry samples recorded in an anechoic chamber. I think anechoic is the way to record instruments. That way, I can apply IR early reflections and late reflections to the EXACT degree that I like.


Yes. I've been preaching this for a long, long time.


----------



## Rilla (Mar 16, 2020)

Wallander said:


> Having spent the better part of a decade trying to turn anechoic samples into wet samples, I can say with confidence that _wet samples are here to stay_. The theory behind this hold up, and I'm happy to take you through the physics if you're willing to take the time.
> 
> When you record an instrument from close-up, the angle from which you record the instrument greatly affects the timbre. This is why microphone technique is an artform.
> 
> ...




When recording an IR in a concert hall, a speaker plays a sine-wave sweep, and the natural reverb that you hear *in the mic* is _not_ ten thousand copies of the same *sine sweep*; it's _ten thousand microphone perspectives_, where _every angle of the *speaker* is represented_, assembled into a smooth tail of sound.

A trumpet is not a speaker, so is this where you're saying the discrepancy lies? That the only way you're going to get an authentic "ten thousand microphone perspectives" of a trumpet reverb is to record an actual trumpet rather than a speaker/IR?

I understand IRs won't be exactly the same, but it's close enough for me. It's practical (especially with something like Spaces), because I can get close to *what I want* and make adjustments, rather than trying to get it exactly like the Boston Concert Hall. After all, it's not like I can publicly announce that I recorded in the Boston Concert Hall if I really programmed something with sample-modeled instruments and used a Boston Concert Hall IR, lol!
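For what it's worth, the core of what a convolution reverb such as Spaces does is a single FFT convolution plus a wet/dry blend. A minimal numpy sketch, using a toy two-tap "room" in place of a real hall IR (both signals here are made up for illustration):

```python
import numpy as np

def apply_ir(dry: np.ndarray, ir: np.ndarray, mix: float = 0.3) -> np.ndarray:
    """Convolve a dry mono sample with a room impulse response via FFT,
    then blend the wet result against the untouched dry signal."""
    n = len(dry) + len(ir) - 1                      # full convolution length
    wet = np.fft.irfft(np.fft.rfft(dry, n) * np.fft.rfft(ir, n), n)
    out = wet * mix
    out[: len(dry)] += dry * (1 - mix)              # dry path stays un-delayed
    return out

# toy example: a unit click through "direct sound + one reflection"
click = np.array([1.0, 0.0, 0.0])
room = np.array([1.0, 0.0, 0.5])
print(apply_ir(click, room, mix=1.0))               # click followed by its echo
```

The point of the thread's argument is exactly that this math is fine; what it operates on (a single flattened perspective of the instrument) is the limitation.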

And plus, only guys like us are going to split hairs over IR versus the real thing!  

Cued at 1:59, this is close enough for me:


----------



## Rilla (Mar 16, 2020)

Wallander said:


> I agree with everything you say. I think we’re discussing apples vs. oranges.
> 
> When I talk about dry, I mean anechoic, or close to. I could have been more clear about that. From my perspective, Hollywood Orchestra is a ”wet” sample library.
> 
> ...



What in particular are the natural ER's you are talking about? Do you mean the stage floor/back wall of the stage in a concert hall setting?


----------



## Rilla (Mar 16, 2020)

Wallander said:


> There are major peaks and notches in the spectrum of many acoustic instruments recorded up-front, sometimes occurring only a few Hz apart. If you record an "E4" sample and an "F4" sample close-up, and look at the spectrum, the 2nd overtone of E4 could be 10 dB louder than the 2nd overtone of F4. But in the ambient microphone, those overtones would (usually) have a similar amplitude. Reason being, most acoustic instruments produce _more or less_ the same sound spectrum for two consecutive pitches, but there's a difference in what direction that energy goes.
> 
> Other instruments, like brass instruments, are not as sensitive to what pitch is being played. But a brass instrument instead has the problem that high frequencies go straight out of the bell, but most of the body sound _energy_ goes to the sides. You would usually put the microphones at an angle, but it's pretty sensitive and there's always a risk that you get either too little body sound, or too little brightness at ff. And the overall balance of the close perspective will never exactly match the sum of sound energy projected in all directions. Not to mention horns, being turned backwards, where brightness typically is reflected energy.
> 
> The sum of ambience and ER catches it all, but the direct sound only gets a directional snapshot.



An "ambient" mic or a "distanced" mic in an anechoic chamber is _still_ going to capture overtones at a similar amplitude whereas a close-mic won't, correct or incorrect? And if incorrect, wouldn't a distanced stereo pair capture them more evenly?


----------



## Rilla (Mar 16, 2020)

muk said:


> From a user perspective I can relate to Arne's findings. Short sounds with prominent transients, if they were recorded dry I could never make them sound like they had been recorded in a nice room/hall. In a concert hall these sounds get a nice 'bloom' around them. I never could replicate that with adding reverb to dry samples.





Wallander said:


> This particular transient sound further shows the problem with directivity. These frequencies aren't projected from the resonating body of the violin, and don't obey the rules of a general Directivity Profile for a violin. It's a snapping transient sound, which emanates from the bridge and strings themselves, omnidirectionally, like a separate percussion instrument attached on top of the violin. The only way to collect that sound information is through having a reflective room. If you record a pizzicato dry, you're probably going to have a lack of high frequencies, regardless of where you put the microphone.



Cued at the 12:00 mark




In this video, Ethan Winer demonstrates how a guitar sounds against a backdrop of absorption vs. a backdrop of QRD diffusion. The guitar with absorption sounds dull compared to the diffuser. However, if I took an IR of the QRD diffuser in that exact position and applied it to the dry (absorption) guitar, wouldn't it produce similar results to the real QRD diffuser?


----------



## Wallander (Mar 17, 2020)

Rilla said:


> When recording an IR in a concert hall, a speaker plays a sine-wave sweep, and the natural reverb that you hear *in the mic* is _not_ ten thousand copies of the same *sine sweep*; it's _ten thousand microphone perspectives_, where _every angle of the *speaker* is represented_, assembled into a smooth tail of sound.
> 
> A trumpet is not a speaker, so is this where you're saying the discrepancy lies? That the only way you're going to get an authentic "ten thousand microphone perspectives" of a trumpet reverb is to record an actual trumpet rather than a speaker/IR?
> 
> ...



The problem isn’t (only) with the reverb tail, but it’s the overall sound. The instrument that’s recorded with natural reflections will have a more pleasant timbre, compared to an instrument recorded without reflections. Regardless of reverb treatment.

I don’t want to come off as saying IR reverbs have no place. They’re great. What I’m saying is that _no type of digital reverb_, including IR, can replace the natural reflections you get from any type of reflective room. The problem isn’t with the reverb; it’s with the samples.


----------



## Wallander (Mar 17, 2020)

Rilla said:


> What in particular are the natural ER's you are talking about? Do you mean the stage floor/back wall of the stage in a concert hall setting?


Yes. If you record an instrument in a reflective room, the reflection from the back wall is the backside sound of the instrument. The reflection from the left wall is the ”leftside” sound of the instrument. The reflection from the right wall is the ”rightside” sound of the instrument.

With an IR, you have all these reflections at the appropriate times. But they’re all reflections of the frontside sound of the instrument, or of wherever you put the microphone when you recorded the samples.


----------



## Wallander (Mar 17, 2020)

Rilla said:


> Cued at the 12:00 mark
> 
> 
> 
> ...



No, it would _not_ produce the same sound.

The IR will sound _similar_, in terms of reverb amount, length and echo density. But only the natural diffuser will reflect _different perspectives_ of the guitar into the microphone. The recording with natural reflections is going to sound a lot more ”airy”, and pleasant.


----------



## Wallander (Mar 17, 2020)

Rilla said:


> An "ambient" mic or a "distanced" mic in an anechoic chamber is _still_ going to capture overtones at a similar amplitude whereas a close-mic won't, correct or incorrect? And if incorrect, wouldn't a distanced stereo pair capture them more evenly?


Yes, in an anechoic chamber, the distance has no (or little) meaning.

I meant the ambient mic in a reflective room. Sorry for not being clear about that.


----------



## Dima Lanski (Mar 17, 2020)

Wallander said:


> Yes. If you record an instrument in a reflective room, the reflection from the back wall is the backside sound of the instrument. The reflection from the left wall is the ”leftside” sound of the instrument. The reflection from the right wall is the ”rightside” sound of the instrument.
> 
> With an IR, you have all these reflections at the appropriate times. But they’re all reflections of the frontside sound of the instrument, or of wherever you put the microphone when you recorded the samples.


I wonder if you've ever tried recording an instrument in an anechoic chamber from a lot of different perspectives at the same time and then just mixing them equally? I know it'll produce phase cancellation and buildup, but that's exactly what happens to sound in a real room. Maybe you just need a lot of perspectives, like hundreds, for it to sound natural?


----------



## Wallander (Mar 17, 2020)

Dima Lanski said:


> I wonder if you've ever tried recording an instrument in an anechoic chamber from a lot of different perspectives at the same time and then just mixing them equally? I know it'll produce phase cancellation and buildup, but that's exactly what happens to sound in a real room. Maybe you just need a lot of perspectives, like hundreds, for it to sound natural?


Yes, I've tried that, but if you have just a few different anechoic perspectives and mix them together, they're more likely to just sound "boxy" and be even more coloured than the individual perspective. Because then you'll have lots of phase issues instead.

If you like math, it's like a converging series. If you add just a few microphone perspectives, it won't help your case, but if you add thousands of them randomly, at a random delay, the sound will eventually converge into the average spectrum of those perspectives (but it will no longer be dry).

The obvious step up from an anechoic mono recording is to have a stereo recording, feeding one channel to each ear. Your brain will happily interpret that as the average spectrum of those two channels, and the same phase differences that would cause issues if you collapsed them into a single channel, instead produces a beautiful stereo image.

Even if you could produce the "perfect" dry sample with the right combination of techniques, it's not the same as a wet sample. The natural wet sample starts off as coloured, and then gradually morphs into something whiter and whiter as more reflections are added. Having a distinct direct sound with a smooth tail is part of the character and detail of the sound.
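The converging-series intuition is easy to simulate. Below is a rough numpy sketch under a deliberately crude assumption: each mic perspective is modelled as the direct sound plus a single reflection at a random delay (a stand-in for a mic position), and perspectives are mixed equally. The spectral ripple (the "boxiness") shrinks as more perspectives are added:

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.linspace(20, 20_000, 2_000)        # analysis frequencies, Hz

def mix_ripple_db(n_mics: int) -> float:
    """Std dev (in dB) of the spectral ripple when n_mics perspectives,
    each modelled as direct sound plus one 0.9-gain reflection at a
    random 0.1-2 ms delay, are mixed equally."""
    delays = rng.uniform(0.1e-3, 2e-3, size=n_mics)
    # combined transfer function: mean over perspectives of (1 + 0.9*e^{-j*2*pi*f*tau})
    h = 1 + 0.9 * np.exp(-2j * np.pi * freqs[:, None] * delays[None, :]).mean(axis=1)
    return float(np.std(20 * np.log10(np.abs(h))))

print(mix_ripple_db(4), mix_ripple_db(4_000))  # ripple shrinks with more perspectives
```

With 4 perspectives the comb-filter notches survive and the spectrum swings by several dB; with thousands, the random phasors average out and the response flattens toward the mean spectrum — the same 1/sqrt(N) convergence Wallander describes.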


----------



## Dima Lanski (Mar 17, 2020)

Wallander said:


> If you like math, it's like a converging series. If you add just a few microphone perspectives, it won't help your case, but if you add thousands of them randomly, at a random delay, the sound will eventually converge into the average spectrum of those perspectives (but it will no longer be dry).


Yes, that's what I'm thinking as well. Though maybe hundreds will be enough? That depends on how much the shape of the sound changes with perspective. And we might be able to apply some denoising algorithms, similar to those used in raytracing.

I'm not sure applying random delays produces realistic results, though, since that way you lose some of the natural correlations in the signal. The delays should probably follow the same pattern as delays in a room.



Wallander said:


> Even if you could produce the "perfect" dry sample with the right combination of techniques, it's not the same as a wet sample. The natural wet sample starts off as coloured, and then gradually morphs into something whiter and whiter as more reflections are added. Having a distinct direct sound with a smooth tail is part of the character and detail of the sound.


Yeah, I see what you mean. And the coloration's spread in time changes as the musicians move their instruments, so that's another part of the equation. Then we can't just mix the signals prior to simulating the room for each perspective.

Considering all that, I'm convinced that to create a realistic wet sound from dry samples, we'll need to record from all directions with quite a few mics. Then it's a question of the cost of an anechoic chamber with hundreds (or thousands) of mics in it. And considering that demand for such tech is almost nonexistent, the cost is very high, despite the fact that in principle the tech is much simpler than, say, a 4K camera in a phone.

There's hope though. Microsoft announced the specs for its next gaming console, and the audio chip there has a built-in 3D acoustic simulation algorithm. So we might get some advancements in audio tech from the gaming industry, as we did with graphics.

As a side note, the whole tech might seem like overkill, and for single instruments it probably is. But when trying to put multiple instruments in an ensemble, the benefits should be obvious, especially for horns and even strings. I'm sure that when the tech arrives, the difference will be the same as between VR in the 90s and VR right now. They're kind of the same in theory, but the latter just feels right in practice.


----------



## Wallander (Mar 17, 2020)

Dima Lanski said:


> Yes, that's what I'm thinking as well. Though maybe hundreds will be enough? This depends on how much the shape of the sound changes with perspective. And we might be able to apply some denoising algorithms, similar to raytracing.
> 
> I'm not sure applying random delays produces realistic results though, since this way you loose some natural correlations in the signal. They should probably have the same pattern as delays in a room.
> 
> ...


I certainly don’t want to discourage you on your voyage, but if I may offer some advice.

Even when you’re faced with a big problem, the best solution is usually the simplest solution which also does the job.

Wet sample libraries are getting incrementally better. The limitation is not with the technology. Questions such as ”what makes a great legato program” or ”what makes a great sounding trombone” are still not easily answered. 

In a recording session, you need to know what you want, and be able to communicate that to a session musician, keep motivation up, keep consistency, and make sure those notes are performed with the kind of natural inspiration they’d put into a concert piece. Then you need to stitch these sounds together in a natural-sounding fashion, in realtime.

None of these challenges are resolved by replacing the recording technique with something much more complex, with hundreds of channels of audio. You’re still going to face the same challenges everyone else battles when creating sample libraries. So you may be just as well off using the same techniques as everyone else, and simply doing it better, if you have the know-how.


----------



## shawnsingh (Mar 17, 2020)

I think it's a worthy idea to discuss and brainstorm. For one, imagine being able to take the same wonderfully 3D-dry-sampled instrument and place it in different acoustic spaces. On the other hand, I completely agree that it requires so much data and processing power that it's going to be a long time before it seems worthwhile to even bother.

Anyway, I think I have a more rigorous way to describe the OP's original idea.

First, I think that convolution and IRs are not really to blame here. Technically, these are just part of the math of linear systems, and sound propagation in acoustic spaces is predominantly a linear system. The nonlinear effects are, I believe, negligible (see the Wikipedia page on nonlinear acoustics, near the end of the intro; I don't know enough of the math to show real proofs, but I believe it).

It seems like a lot of people in this thread don't have faith in the math behind convolution/IR reverbs. I think their arguments are absolutely valid, given the current state of reverbs. But the math is fully capable of producing something completely realistic, in my opinion.

The real thing to blame is the representation we use for the audio samples fed into the convolution reverb. We use a mono or stereo audio signal, i.e. we measure sound pressure at only a few specific positions, which means we've already thrown away a LOT of 3D information about how sound propagates from the source in all directions. What we really want to record and sample is the entire 3D wavefront of sound emanating from the virtual instrument: something like capturing the sound pressure on the surface of a sphere around it.

On the ideas of "sampling" and "ray tracing": I once superficially peeked at acoustics research and found that they do use ray tracing sometimes. I think they also have an algorithm analogous to "radiosity", the other major approach to realistic rendering in graphics. I even saw something called "sonel mapping", which I think was the acoustics version of photon mapping in the graphics world! But I don't think this works for taking a dry-sampled virtual instrument and putting a room reverb on it, because this approach doesn't decouple the virtual instrument from the acoustic space.

I think there's a better, more practical way to represent the 3d wavefront: *spherical harmonics*. The reason it's more practical is that ultimately it can be represented by a bunch of individual single-channel audio signals. These audio signals can be the samples for the virtual instrument, and then we could define a new kind of convolution reverb that receives this rich 3d information, instead of the flattened stereo samples.

0th-order spherical harmonics would be just a mono omnidirectional audio signal. 1st-order spherical harmonics add 3 directional components, which along with the omni part can be represented with 4 single-channel audio signals. As you get into higher-order harmonics, the effect is to increase the resolution of the entire 3D sound wave, so the approximation becomes more accurate. At some point there will be some nth order that could be considered "enough approximation".
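To make the first-order case concrete, here's a sketch of encoding a mono signal into the four channels (B-format WXYZ) and rotating the resulting sound field. The exact channel weights are one common ambisonics convention; normalisation schemes differ between libraries, so treat the scale factors as an assumption:

```python
import numpy as np

def encode_first_order(signal: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
    """Encode a mono signal arriving from (azimuth, elevation), in radians,
    into first-order spherical-harmonic channels: one omni + three directional."""
    w = signal / np.sqrt(2)                          # omnidirectional (W)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])

def rotate_z(bformat: np.ndarray, angle: float) -> np.ndarray:
    """Rotate the whole sound field around the vertical axis:
    W and Z are unchanged, X/Y transform like a 2-D rotation."""
    w, x, y, z = bformat
    c, s = np.cos(angle), np.sin(angle)
    return np.stack([w, c * x - s * y, s * x + c * y, z])

sig = np.ones(4)                                     # trivial test signal
front = encode_first_order(sig, 0.0, 0.0)            # source dead ahead
left = rotate_z(front, np.pi / 2)                    # field turned 90° left
```

Rotating the encoded field by 90° gives the same four channels as encoding the source at 90° in the first place, which is exactly the "orient the instrument without resampling" property described below.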

There are a few properties of the spherical harmonics approach which I think are better than the "sampling the hemisphere approach":
(1) For ray tracing, you'd have to sample a LOT of directions, then do way too much complex thinking about how to prioritize some samples over others to reduce their number, and how to show that the approximation has converged and is accurate enough. With spherical harmonics, the approximation is more clearly defined: the higher the order, the more accurately you can represent the specific spatial directivity pattern of the 3D sound wave, and people can research how many spherical harmonics are needed for an accurate approximation.
(2) Spherical harmonics can be rotated before the reverb is applied. So you can sample a virtual instrument's 3D wavefront once, and then orient the instrument forwards, backwards, sideways, etc., without resampling it. MIR does something like this, but it still has to "up-convert" the stereo dry samples into an approximate 3D wavefront using its directivity profiles.

Now, after saying ALL that stuff... I have to admit I don't think this is going to happen any time soon. DAWs, plugin inserts, and the whole ecosystem are baked around the concept of mono/stereo/quad/5.1 channels and buses. Changing that to spherical harmonics is a tough sell. And object-based audio workflows in DAWs will probably arrive sooner (e.g. ATSC 3.0) and seem to go in a totally different direction for now. For our world of virtual instruments and reverbs, I think there are probably other tricks that can be done with wet samples to improve realism, which would further mean it's not so necessary to deal with dozens of spherical-harmonics channels.

So despite how fun it is to brainstorm it, I don't think it'll happen for a long time =/

Thoughts?


----------



## Rilla (Mar 17, 2020)

Wallander said:


> Yes, in an anechoic chamber, the distance has no (or little) meaning.
> 
> I meant the ambient mic in a reflective room. Sorry for not being clear about that.



Thanks Arne for answering those questions!


----------



## Vik (Mar 18, 2020)

Warning: this will be a long post.

I hope and think there will be a shift in how virtual instruments are being produced and marketed.

There are already a few main approaches to making string libraries. Approach #1: make a lot of different libraries. SF has their symphonic strings, chamber strings, Hans Zimmer strings, London/contemporary strings, the strings in the Bernard Herrmann, Albion and British Drama libraries, the BBC strings, their studio strings, various evo/Arnalds strings and so on. 

Approach #2: spend a lot of time on creating one, well planned string library: Berlin Strings and Cinematic Studio Strings are both examples of that approach (BS also has expansion kits, planned from day 1). 

Approach #3: Create expandable libraries. Spitfires original Sable Strings and Mural are both examples of this approach.

Approach #4: string libraries that specialise on a limited segment within all the possible string library options, like eg. Performance Samples Con Moto and Musical Sampling Soaring Strings. 

All these approaches have their pros and cons. Looking at e.g. *this* poll, it's clear that Spitfire Chamber Strings (based on Sable 1, Sable 2, Sable 3, etc.) and Cinematic Studio Strings are both very well received by their users. Berlin Strings didn't get as many votes as CSS and SCS in that poll, probably because BS is more expensive and therefore has fewer users.

The good thing about CSS is that it's really good at what it does, it's easy to use, and it's not too expensive. The 'bad' thing about CSS is that, unlike SCS/Sable, there hasn't been any CSS 2 or CSS 3 release (yet). The good thing about SCS is that it, like CSS, sounds great (both have really good legatos); SCS was based on several other products and therefore has a lot of work put into it, as well as a wide variety of articulations. But it's also less accessible for students and others, since it's more expensive than if it were still sold as several separate products. Berlin Strings obviously has its strengths too, but in its original form it couldn't be bought the way Mural and Sable could, and was therefore limited to those who could afford a pricey library (this will change when BS is available in the Sine format and one can buy one instrument at a time). Again, there are BS expansions, but even without them BS isn't affordable for many potential users.

I don't think SCS would have been that good if it had been released as one product; it took several years to make SCS (the various Sable products). The Mural situation was a little different: the first Mural library was quite far from being usable as a main and only library (too many limitations, combined with a high price), but the situation improved many Euros/$/GBPs later.

I like that there are loads of different libraries available, but my feeling is that the most fruitful approach (both for us users and for those who create these libraries) is a modular one (the "Sable" approach), especially if the various single products aren't discontinued, and companies instead plan products so they can be bought in portions and expanded upon in the future.

And maybe there actually will be a CSS 2 and CSS 3 in the future. Maybe Afflatus will become more complete and be expanded upon (it's already very good, and is in a way several libraries, but has some limitations compared with some of the other main libraries). And things will change to some degree when Berlin Strings can be bought one instrument at a time. I don't know if Sable 2/3 and SCS were planned before Sable 1 was created, but I believe that a library like SCS, which was planned to become a full library _from before the first module was sold_, is very close to an ideal way of making string libraries. It also solves the manufacturer's dilemma of choosing between an affordable library and an excellent, complete library.

The other change we could see is that even for one-product libraries, there may be fewer products from each company, but more focus on making products that can end up among the relatively few that users keep buying for many years. I'm certainly happy that limited products like e.g. Con Moto and CSS exist, but again, the future of products like that would be even brighter if there were a CSS 2 or a Con Moto 2 (with many other articulations).

From a profit perspective, maybe 8dio, SF and others make more money making many different libraries than they would with the somewhat modular approach I describe above – I wouldn't know. But then, maybe both are large enough to plan a major new modular library as well. I think most future users would probably end up wanting a single main string library, and be happy to buy expansions for it later on, probably combined with some niche libraries for special articulations, or because they offer something which one cannot get in their main library. But the number of users who want and can afford to buy a large number of 'main libraries' will always be much lower than the number who want one really great main library and keep investing in it over several years.


----------



## Dima Lanski (Mar 18, 2020)

Wallander said:


> I certainly don’t want to disencourage you on your voyage, but if I may put in some advice.


Thanks for the advice! And for all the bits of wisdom you shared in your previous posts. They really did clarify my perspective on the subject. You are an insightful and wise person, and your reputation here is well deserved.



Wallander said:


> None of these challenges are resolved by replacing the recording technique for something much more complex, with hundreds of channels of audio. You’re still going to face the same challenges as everyone else battles, creating sample libraries. So you may be just as well off using the same techniques as everyone else, and just do it better, if you have the know-how.


Yeah, I see what you mean. Library creation is complicated enough as it is, even with a much simpler recording technique. From a business perspective, it's a risky endeavor already, especially if you've never done it before, and you'd be making it even riskier with innovative tech. Then again, for a business person that would be a question of ROI, which would depend on how long you could maintain the uniqueness of your approach before everybody else copies it. That's assuming the tech works and sounds much better than other solutions.



shawnsingh said:


> I think there's a better, more practical way to represent the 3d wavefront: *spherical harmonics*. The reason it's more practical is that ultimately it can be represented by a bunch of individual single-channel audio signals. These audio signals can be the samples for the virtual instrument, and then we could define a new kind of convolution reverb that receives this rich 3d information, instead of the flattened stereo samples.


That's an interesting idea, thanks for sharing. From your description it sounds like it'll still require a lot of mics to capture the sound, won't it? But the approach allows cutting down on the bandwidth in a smart way. And is there an efficient algorithm for simulating a room with this representation?



Vik said:


> Maybe Afflatus will become more complete and expanded upon (it's already very good, and is in a way several libraries, but has some limitations compared with some of the other main libraries)


Afflatus for me is one of the main string libraries. I just love the sound, and the fact that it's not too wet, so I can safely put additional reverb on top. Also, the poly-legato is a very useful tool for writing. The only thing I wish they'd done is make ALL the patches into separate sections, and preferably divisi as well.


----------



## Wallander (Mar 18, 2020)

shawnsingh said:


> I think it's a worthy ideal to discuss and brainstorm . For one, imagine being able to take the same wonderfully 3d-dry-sampled instrument, and placing it in different acoustic spaces. On the other hand, I completely agree that it requires so much data and processing power, that it's going to be a long time before it seems worthwhile to even bother.
> 
> Anyway, I think I have a more rigorous way to describe the OP's original idea.
> 
> ...


I’m not going to pretend like I have the faintest idea, but from my brief encounter with spherical harmonics in quantum mechanics class, I’m not sure how that could be utilised for the purpose of storing three-dimensional audio recordings. It could simply be over my head.

The data transfer rate between a musical instrument and a reflective room is almost unmeasurably large. On the scale of terabytes of audio data for a single sampled note. The idea that you could store this data, and process it in realtime, for an entire orchestra with all its articulations, and generate three-dimensional reverberation, goes beyond the scope of my imagination. 

The more straightforward method is to let the room do its thing, and to sample the finished results. There are good reasons why modern sample libraries not only use a high-quality stage, but also that particular selection of close and main/ambient microphones. It mirrors the established signal chain of orchestral recordings made in that same room, but the instruments were simply recorded on a per-note basis. You essentially have the same tools at your disposal as the audio engineer recording a score on that stage, and the abundance of artistic knowledge that went into that setup goes beyond theoretical math and physics. 

But it’s inspiring to see so many people thinking outside the box.


----------



## Rilla (Mar 18, 2020)

Wallander said:


> Yes. If you record an instrument in a reflective room, the reflection from the back wall is the backside sound of the instrument. The reflection from the left wall is the ”leftside” sound of the instrument. The reflection from the right wall is the ”rightside” sound of the instrument.
> 
> With an IR, you have all these reflections at the appropriate times. But they’re all reflections of the frontside side sound of the instrument, or wherever you put the microphone when you record the samples.



So a speaker playing a sine sweep can have backside, leftside, and rightside reflections, but those same reflections from a flute are non-uniform in terms of harmonics on individual notes, so the only way to get 100% accuracy is to record the instrument, not the speaker. The speaker will get you close, but a speaker is simply not a flute.


----------



## Wallander (Mar 18, 2020)

Rilla said:


> So a speaker playing a sine sweep can have backside, leftside, and rightside reflections, but those same reflections from a flute are non-uniform in terms of harmonics, so the only way to get 100% accuracy is to record the instrument, not the speaker. The speaker will get you close, but a speaker is simply not a flute.


Exactly.

That's a great way of putting it.

And it's not only harmonics, but it's also other components of the sound, such as breath noise, or chiff noise. The breath noise of a flute has its own unique pattern of radiation, which is different from the body of the flute, even for the same frequency. So you can't easily make up for these things with equalization or a body IR. If you amplify a partial that's missing, you're going to get too much breath noise or chiff noise in that frequency region instead.


----------



## Dima Lanski (Mar 18, 2020)

Wallander said:


> I’m not going to pretend like I have the faintest idea, but from my brief encounter with spherical harmonics in quantum mechanics class, I’m not sure how that could be utilised for the purpose of storing three-dimensional audio recordings. It could simply be over my head.


From what I understand it's a way of storing (and then processing) 3d spatial audio. The point is to store submixes of all the surround mics. 0th harmonic is a mix of all mics, 1st is 3 mixes, left/right, top/bottom, front/back, 2nd is 8 mixes of the corners, and so on. You can probably even think of non symmetrical submixes, to accomodate for "instrument profile". The question is how much of these mixes you need for realistic result. And also, how to process them in this form to simulate a room, or capture and apply a real one.


----------



## Wallander (Mar 18, 2020)

Dima Lanski said:


> From what I understand it's a way of storing (and then processing) 3d spatial audio. The point is to store submixes of all the surround mics. 0th harmonic is a mix of all mics, 1st is 3 mixes, left/right, top/bottom, front/back, 2nd is 8 mixes of the corners, and so on. You can probably even think of non symmetrical submixes, to accomodate for "instrument profile". The question is how much of these mixes you need for realistic result. And also, how to process them in this form to simulate a room, or capture and apply a real one.


Admittedly, I've been trying my best to forget about spherical harmonics now for more than 15 years, so I'm not quite up-to-date on the subject. However, I don't _think_ it would actually compress the data size, but it would just be a different way of representing it. I could be wrong, however. 

But if you're happy with an approximation like that, you would perhaps be better off pre-rendering your 100 microphones into something more manageable? Such as a stereo channel representing early reflections, or a "white" stereo channel designed only for being fed into the reverb. But exactly how those reflections would be arranged in order to sound good is, well, black magic.


----------



## shawnsingh (Mar 18, 2020)

Ambisonics is a good example of spherical harmonics in spatial audio. It's similar to the idea we're discussing here to sample a virtual instrument, but for ambisonics it's capturing 3d audio arriving into a listening location, while here we're talking about capturing 3d audio emanating from a source virtual instrument.

From what I know, people consider 5th order ambisonics to be as accurate as humans can perceived. That would be only 36 channels of audio. But this is specifically for representing the 3d audio arriving towards the listener.

For this idea of spherical harmonics used to sample a virtual instrument, or to sample IRs for a location in a room, I don't know how high order would be necessary. It would of course depend on the instrument or venue being sampled. I think there would often be specific directions that are very sensitive, like the bell of a brass instrument, and the rest could probably be modeled at low resolution. So if there's some way to model specific directions separately, then maybe the order doesn't need to be more than third order (wild guessing here) which would be only 16 audio channels.

if my guesses are not to far off, then this really could be on the edge of practical. One instrument like this would be roughly equivalent to putting 10 traditional stereo sampled instruments through convolution reverb...


----------



## Wallander (Mar 19, 2020)

shawnsingh said:


> Ambisonics is a good example of spherical harmonics in spatial audio. It's similar to the idea we're discussing here to sample a virtual instrument, but for ambisonics it's capturing 3d audio arriving into a listening location, while here we're talking about capturing 3d audio emanating from a source virtual instrument.
> 
> From what I know, people consider 5th order ambisonics to be as accurate as humans can perceived. That would be only 36 channels of audio. But this is specifically for representing the 3d audio arriving towards the listener.
> 
> ...


Ok, so looking at the theory of ambisonics, I _think_ I understand where "spherical harmonics" come in. Again, it may be over my head, but I'm not sure how it could be used for the purpose of directivity in a musical instrument, as it seems to deal with the direction by which sound hits your ear. As opposed to the directivity of a musical instrument, and the fundamental problem that natural reverberation is actually an agglomeration of all angles of the instrument, rather than just basic reflections of the direct sound.


----------



## Dietz (Mar 19, 2020)

Wallander said:


> I'm not sure how it could be used for the purpose of directivity in a musical instrument



Of course you can.  Lots of research has been done in that realm by the super-smart people at IEM Graz (Institute of Electronic Music and Acoustics - VSL's development partner for certain aspects of the upcoming MIR 3D). As part of their famous IEM Ambisonics Suite they even offer freeware plug-ins which are able to do this:

-> https://plugins.iem.at/docs/directivityshaper/

Spectacular stuff in up to 7th order Ambisonics!


----------



## Wallander (Mar 19, 2020)

Dietz said:


> Of course you can.  Lots of research has been done in that realm by the super-smart people at IEM Graz (Institute of Electronic Music and Acoustics - VSL's development partner for certain aspects of the upcoming MIR 3D). As part of their famous IEM Ambisonics Suite they even offer freeware plug-ins which are able to do this:
> 
> -> https://plugins.iem.at/docs/directivityshaper/
> 
> Spectacular stuff in up to 7th order Ambisonics!


It looks very powerful, but I can’t get my head around how that would be used in the context of instrument directivity.

If you record a guitar from the left, I don’t see how one could take that recording, and process it to become a recording made from the right. Except for a very crude approximation based on filters or an IR, where all the transients in the sound are smeared out.


----------



## Dietz (Mar 19, 2020)

Wallander said:


> If you record a guitar from the left, I don’t see how one could take that recording, and process it to become a recording made from the right.


That's not the idea.


----------



## Wallander (Mar 19, 2020)

Dietz said:


> That's not the idea.


Understood. I think we may be talking about two different things.

I’m sure there are all sorts of amazing things that you’re capable of doing with MIR, on the spatialization end of things.

My concern isn’t actually with the spatialization part, but the timbre of the instrument. Natural reverberation improves the timbre with each reflection, but digital reverberation doesn’t.

It’s not a flaw in a particular reverb algorithm, but the entire signal chain and concept of digital reverberation was always a simplified model of reality. The fact that each reflection of a natural reverb is a different angle of the instrument, is an aspect that’s either been constantly overlooked, or it’s a known tradeoff.

I’ve never seen it being referenced anywhere, but I would be surprised if I was the first one having this thought. The physics is pretty intuitive.


----------



## shawnsingh (Mar 19, 2020)

Wallander said:


> Ok, so looking at the theory of ambisonics, I _think_ I understand where "spherical harmonics" come in. Again, it may be over my head, but I'm not sure how it could be used for the purpose of directivity in a musical instrument, as it seems to deal with the direction by which sound hits your ear. As opposed to the directivity of a musical instrument, and the fundamental problem that natural reverberation is actually an agglomeration of all angles of the instrument, rather than just basic reflections of the direct sound.



Apologies for the essay. The concepts are not hard, but somehow it's just tricky to explain. But if you do have the patience to go through it, I hope this fully explains how it would work. 

(1) Directivity of sound arriving to an ear is the same as directivity of sound emanating from an instrument, the only difference is whether the sound is going inwards or outwards. The math of sound waves propagating is linear, so it doesn't matter whether the sound is going "in" or "out'. Just like you can trace rays of light starting from light sources or starting from a camera position.

(2) Choose any sphere with the virtual instrument in the center. Then the real 3d sound propagation can be snapshotted on the surface of that sphere by measuring sound pressure. Actually it doesn't matter the size of the sphere you chose... it's the **resolution** of the sound pressure snapshot that matters (i.e. solid angles), so really does represent directivity because when you snapshot any sphere, you can just trace straight lines from the center and the sound pressure will travel along that line until it hits an object. Take 44100 snapshots per second of the sound pressure on the sphere, and suddenly you have a 44.1 kHz 3d audio representation.

(3) This 3d sound pressure can be approximated with spherical harmonics. Higher order spherical harmonics will provide more accurate estimate of the sound pressure in any direction - but no matter how good the resolution is, you can still always estimate the sound pressure in any specific direction (i.e. any specific point on the surface of the sphere). This is a key point - with a 3d sound wave, you can extract a mono audio signal that has only the sound traveling in one direction.

(4) You're right that natural reverberation is an agglomeration of all angles of the instrument, reflecting everywhere, before it arrives at the microphones. But actually, all that information is already contained in a single impulse response. So think of a convolution reverb plugin - for one channel of audio, an impulse response hard codes ALL the following: (a) the directivity pattern and location and orientation of the sound source, (b) all of the sound waves propagating in all directions, reflecting around the acoustic space, arriving to the microphones, and (c) the polarity pattern and location and orientation of the microphones. All this information inside one impulse response is truly capturing reality - the entire process of sound propagating and arriving into microphones is just a linear system, and the IR captures that system exactly. EXCEPT... the hardcoded directivity pattern. It doesn't make physical sense to take a stereo virtual instrument, which already had it's own directivity pattern and "flattened" it into the stereo signal, and then simulate another (mismatched) directivity pattern on top of that.

The way to solve this is to capture multiple impulse responses for the same source location - just like the OP had proposed, and each IR uses a sound source that has a carefully controlled directivity pattern, focused in a narrow direction. As people already mentioned, this is what MIR does, but MIR is still fundamentally limited by receiving only stereo audio inputs (EDIT: they do use Directivity Profiles to up-convert the stereo with specialized knowledge of their original recording setup, etc.)

So finally we can tie these four parts together:

So if we have captured multiple IRs for the instrument's location, and each IR is intended to receive sound that is traveling in a specific direction from the virtual instrument... then using spherical harmonics, we can compute the sound pressure for those directions, i.e. extract audio signals that are traveling in those directions, and then we can send those directional audio signals to the respective impulse responses. We can "rotate" the virtual instrument by extracting different directions that were rotated. We can "move" the instrument by capturing these IRs from a different location in the venue. After the directional IRs are applied to those directional audio signals, they just sum up to represent all the sound of the virtual instrument propagating in all directions, reflecting everywhere, and arriving at the microphones.


----------



## Wallander (Mar 20, 2020)

shawnsingh said:


> Apologies for the essay. The concepts are not hard, but somehow it's just tricky to explain. But if you do have the patience to go through it, I hope this fully explains how it would work.
> 
> (1) Directivity of sound arriving to an ear is the same as directivity of sound emanating from an instrument, the only difference is whether the sound is going inwards or outwards. The math of sound waves propagating is linear, so it doesn't matter whether the sound is going "in" or "out'. Just like you can trace rays of light starting from light sources or starting from a camera position.
> 
> ...


Great post. I have a few things to ask, or add.

1) You can’t encode a source (instrument) the same way you encode ambisonics, unless it’s a _point source_, and acoustic instruments are not. If you record the instrument from many different angles, the phases won’t line up.

2) As far as I can determine, ambisonics wouldn’t provide any benefit, compared to storing the individual channels, other than saving some memory. The process of getting an ”intermediate” angle is still just the process of mixing a number of channels together, at various proportions. 

3) Today’s IRs don’t store the directivity. Even if it did, there’s no 1:1 relationship between the incidence of reflection at the point of the microphone, and the emittance angle for the instrument. Even if there was, sound bouncing around the hall are quickly dispersed by objects and diffusors in the room. 

My guess is that you would be sonically (and practically) better off to just record a number of mono channels from different angles of the instrument, and then process each channel with its own subset of the IR.


----------



## Dietz (Mar 20, 2020)

Wallander said:


> he entire signal chain and concept of digital reverberation was always a simplified model of reality.


The whole process of recording is a simplified model of reality, or at least a very peculiar approach to its representation. Like watching a movie is not the same as watching the play at a theatre.

(... but heck with it, theatres as well as cinemas are closed anyway. :-/ ...)


----------



## Dietz (Mar 20, 2020)

shawnsingh said:


> this is what MIR does, but MIR is still fundamentally limited by receiving only stereo audio inputs.


No, not quite. This is where MIR brings in the so-called Directivity Profiles, at least in case of Vienna Instruments. Due to the fact that we know the recording circumstances of these stereo sources, we are also able to mimic their typical sound emanation - at least in eight directions. Not as good as 5rd order Ambisonics, but still better then anything else on the market.


----------



## José Herring (Mar 20, 2020)

There is something to OP approach. It would give the control and flexibility of the old VSL libraries with the sound and ambience of a wet library. I like the idea. 

Maybe perhaps VSL can do some preliminary experiments with their already recorded dry libraries into their new Synchron Stage. I'd like to find out if this could work.


----------



## shawnsingh (Mar 20, 2020)

Wallander said:


> 1) You can’t encode a source (instrument) the same way you encode ambisonics, unless it’s a _point source_, and acoustic instruments are not. If you record the instrument from many different angles, the phases won’t line up.



Yeah, great point!





Wallander said:


> 2) As far as I can determine, ambisonics wouldn’t provide any benefit, compared to storing the individual channels, other than saving some memory. The process of getting an ”intermediate” angle is still just the process of mixing a number of channels together, at various proportions.



Another fair point, but I don't know the math well enough to say.




Wallander said:


> 3) Today’s IRs don’t store the directivity. Even if it did, there’s no 1:1 relationship between the incidence of reflection at the point of the microphone, and the emittance angle for the instrument. Even if there was, sound bouncing around the hall are quickly dispersed by objects and diffusors in the room.



This doesn't seem correct to me. Or maybe I'm misunderstanding your point. I would say that an IR does indeed capture the directivity of sound source and microphones that were used when recording the frequency sweeps. The same IR also captures all the different ways that sound bounds around the room after leaving the source and before arriving to the mic.




Dietz said:


> No, not quite. This is where MIR brings in the so-called Directivity Profiles, at least in case of Vienna Instruments. Due to the fact that we know the recording circumstances of these stereo sources, we are also able to mimic their typical sound emanation - at least in eight directions. Not as good as 5rd order Ambisonics, but still better then anything else on the market.



Apologies for over-simplifying ... it is accidentally misleading. For what it's worth I did acknowledge this in a previous post . Will edit to correct that.


----------



## Dietz (Mar 20, 2020)

shawnsingh said:


> Apologies for over-simplifying ... it is accidentally misleading. For what it's worth I did acknowledge this in a previous post . Will edit to correct that.


No need to apologise! I just tried to spread little-known facts. All is good!


----------



## Dima Lanski (Mar 20, 2020)

Wallander said:


> You can’t encode a source (instrument) the same way you encode ambisonics, unless it’s a _point source_, and acoustic instruments are not. If you record the instrument from many different angles, the phases won’t line up.


I don't see why not, you just need to measure and account for the position of each microphone precisely enough. There bound to be some error in any model, but we should be able to calculate it and compensate for it.



josejherring said:


> Maybe perhaps VSL can do some preliminary experiments with their already recorded dry libraries into their new Synchron Stage. I'd like to find out if this could work.


We'll need a 3d recorded dry samples for it to work. As @Dietz said, MIR has directivity profiles to accommodate for that, but it's still an approximation.



Dietz said:


> No, not quite. This is where MIR brings in the so-called Directivity Profiles, at least in case of Vienna Instruments. Due to the fact that we know the recording circumstances of these stereo sources, we are also able to mimic their typical sound emanation - at least in eight directions. Not as good as 5rd order Ambisonics, but still better then anything else on the market.


Speaking of Directivity Profiles. I see a couple of flaws with them, apart from it being an approximation.

First is that, as @Wallender said, an instrument might have a few completely different ways of producing sound, which means that you'll need a different directivity profile for some articulations. Does MIR provide this?

And second, I think they're not very effective with ensembles like a string section. If you record it from a single direction the players closer to the mic will have more prominence in the sound. This, it seems, can only be solved with a 3d recording, or ambisonics. Or you'll have to have a separate recording and an IR for each player, or at least a subsection.


Finally I want to bring up another problem with 3d dry sampling, that nobody mentioned yet. And that is, how stable are the IRs depending on the source position, i.e. how they change? And as a consequence, can you blend two neighboring IRs to produce an intermediate one? The answer is probably no, not directly. If you just mix two IRs, even produced from close points in space, you'll get twice as many reflections slightly offset from each other. So you'll need a smart way to blend between them.

I've been thinking of neural net approach for solving this problem, it should be easy to train one to predict a middle point IR from two others. But maybe there's a more analytical solution to this? The similar problem should be present in HRTF ambisonics, so I wonder how they solve it there?


----------



## Wallander (Mar 20, 2020)

Dietz said:


> No, not quite. This is where MIR brings in the so-called Directivity Profiles, at least in case of Vienna Instruments. Due to the fact that we know the recording circumstances of these stereo sources, we are also able to mimic their typical sound emanation - at least in eight directions. Not as good as 5rd order Ambisonics, but still better then anything else on the market.


If Vienna Instruments was indeed recorded in eight (or sixteen, or more) directions, you can do a lot better than Directivity Profiles.

You should supply your VI with a large number of raw microphone perspectives. 

1) The user should be given the freedom to _pick_ their choice of close perspectives for L/R (or surround speakers). This in itself would go a long way for many people. When you're fatigued with a sound, just change the microphone perspective. If a sound is too aggressive (or not aggressive enough) you could change the microphone perspective. That option would be endlessly better than filters.

2) You should send them to different endpoints in a multiple-endpoint reverb, so that different microphone perspectives are mixed together in unedited form by (and only by) the reverb, as reflections.

3) I'm not sure how you made your IRs, but if you have the means to divide them by microphone pickup direction (e.g. ambisonics), you can use that to split a recorded IR into e.g. eight sparse IR tails, which sums up to the full IR. You would feed a different instrument perspective into each directional IR.

4) This is perfectly manageable from a CPU point-of-view. You don't need eight reverbs per instrument, but you can reuse those eight reverbs for lots of instruments (the entire section, or the entire orchestra) as a multi-channel send effect.

5) If you want some movement in the reverb, on a per-instrument basis, you could randomly modulate the signal level over time of these inputs at almost no CPU cost.

6) Your programmers are excellent. Forget about ambisonics and figure out a way of compression to store many different microphone perspectives, without messing up the phase. For example, you could possibly store only two channels at full-quality, calculate the signal difference between those channels and various other microphones, and store only the residual as an 8-bit signal.

7) The direction isn't important, but the point of all this is to achieve a timbre that's more in line with a natural wet recording, where you don't have overtones sticking out, missing frequency regions, or an unbalanced level of tone vs. breath/bow noise and transients, or unevenness in velocity layers.

8) You can achieve a variation in spatiality for different instruments by modifying the relative input levels for each IR, and/or change the order. E.g. the "left" bassoon sound goes to the "left" IR, but the clarinet sends the "right" sound to the "left" IR. None of this is real physics, anyways, so there's no reason not being creative about it.

9) Again, direction or spatial placement shouldn't be the focus. That's not the issue with dry samples, but the issue people have with dry samples is the timbre. Most people who use wet samples are happy to just pan them left/right, or possibly use a time offset between the channels. It's completely unnatural, but it sounds great, if the timbre is great to begin with.

10) Never do FIR processing for color, and never mix two channels prior to reverberation. That only diffuses the sound and/or colours the sound in unwanted ways.

I'm not sure why I made this a bullet list, but I like bullet list. And I apologise if this comes off as a rant, I mean it with the best of intentions.


----------



## Wallander (Mar 20, 2020)

shawnsingh said:


> This doesn't seem correct to me. Or maybe I'm misunderstanding your point. I would say that an IR does indeed capture the directivity of sound source and microphones that were used when recording the frequency sweeps. The same IR also captures all the different ways that sound bounds around the room after leaving the source and before arriving to the mic.


What I meant was, if you have a single monophonic IR as a .wav file, you can't say what direction an individual reflection came from.


----------



## Wallander (Mar 20, 2020)

Dima Lanski said:


> I don't see why not, you just need to measure and account for the position of each microphone precisely enough. There bound to be some error in any model, but we should be able to calculate it and compensate for it.
> 
> Finally I want to bring up another problem with 3d dry sampling, that nobody mentioned yet. And that is, how stable are the IRs depending on the source position, i.e. how they change? And as a consequence, can you blend two neighboring IRs to produce an intermediate one? The answer is probably no, not directly. If you just mix two IRs, even produced from close points in space, you'll get twice as many reflections slightly offset from each other. So you'll need a smart way to blend between them.
> 
> I've been thinking of neural net approach for solving this problem, it should be easy to train one to predict a middle point IR from two others. But maybe there's a more analytical solution to this? The similar problem should be present in HRTF ambisonics, so I wonder how they solve it there?


Ambisonics is (very simplified) based on the idea that there's a center omnidirectional microphone that's "white", and contains all the audio. Then there's a number of _directional side-channels_, where the frequency components of the sound still have the _same phase_ as the center microphone. So you can _subtract_ the side-channels from the center channel, to isolate only a particular direction of sound.

If the microphones in the ambisonics array don't occupy the _exact_ same point in space (which they never do) there are going to be phase discrepancies between the microphones, producing audio artefacts when you subtract the side-channels from the white center channel. This is why ambisonics microphones are designed such that the distance between the microphones is very small. Even then, there are going to be audible artefacts in the audio at high frequencies, because when you approach 20.000 Hz the wavelengths are even smaller than an inch.

For example, if you have a transient click that's just one sample wide, at 44100 Hz sample rate, and the distance between your microphones is one inch, the transient is going to hit the second microphone three samples later. So if you mix these two microphones together, you no longer have one transient, but two closely spaced transients at a lower volume. The audio has been smeared out. If you do this with a lot of channels, you no longer have a crispy-sounding transient, but a digital noise. It may look similar on an FFT, but it doesn't sound even remotely the same.

If you put microphones in a sphere around an instrument, none of the frequency components of the instrument's timbre line up, and there's no white center microphone. The white center microphone also cannot be constructed mathematically, because the side-channels no longer serve as an orthogonal basis for the center microphone, i.e. you can't use the MID+LEFT microphone to get the RIGHT microphone, which is how you save channels with ambisonics in the first place. I'm not quite sure how to explain this any better.


----------



## Dima Lanski (Mar 20, 2020)

Wallander said:


> Ambisonics is (very simplified) based on the idea that there's a center omnidirectional microphone that's "white", and contains all the audio. Then there's a number of _directional side-channels_, where the frequency components of the sound still have the _same phase_ as the center microphone. So you can _subtract_ the side-channels from the center channel, to isolate only a particular direction of sound.
> 
> If the microphones in the ambisonics array don't occupy the _exact_ same point in space (which they never do) there are going to be phase discrepancies between the microphones, producing audio artefacts when you subtract the side-channels from the white center channel. This is why ambisonics microphones are designed such that the distance between the microphones is very small. Even then, there are going to be audible artefacts in the audio at high frequencies, because when you approach 20.000 Hz the wavelengths are even smaller than an inch.
> 
> ...


I think I now roughly understand what you're saying, and completely agree with it. There's definitely a problem there. I believe it's similar to the problem with omnidirectional IRs, that I mentioned above and that is, we can't just blend two neighboring mics together and get a signal similar to what a real mic between them would pickup. We need some transient preserving morphing to achieve that. And as with the IRs, we can try and solve it with a neural net, or look at how the similar problem is solved in HRTF ambisonics.

Come to think of it, it is somewhat similar to morphing between dynamic layers of a recorded instrument, and that it sometimes sounds like two instruments playing at the same time instead of a single instrument playing an average dynamic.

So, yeah, if we are to use ambisonics, we need some special form of signal morphing to counter the problem.


----------



## Dima Lanski (Mar 20, 2020)

On a related note, Sony just revealed the hardware specs for the PlayStation 5, and they too have developed a dedicated chip for processing 3D audio. They claim it matches 8 CPU cores at the task and can process around 5,000 simultaneous audio sources, which is pretty good, especially for a chip that is just a small part of an already inexpensive $500 machine. It also seems very scalable, since it's similar to a GPU but even simpler in its overall architecture.

They're not doing full 3D audio sources yet, but they are doing 3D sound simulation and HRTF ambisonics. The 3D part might be offloaded to the GPU's ray-tracing hardware; it wasn't completely clear to me.

Anyway, there's hope that we'll get a new type of powerful hardware from the gaming industry, and that full 3D audio simulation in music production is much closer than we think.


----------



## Rilla (Aug 21, 2022)

@Wallander

I have a question.

First of all, I want to thank you again for your input in this thread and for enlightening me about IRs. Since then, there has been so much I've noticed when focusing on the way rooms respond to different instruments.

In this recording, for instance, it seems that out of the first 3 kick hits, the second had a very different-sounding reflection than the first and third (the reflection is strong in the right channel). I'm assuming this is an example of how the kick naturally radiates differently per note? Is that correct?



Also, here at 5:55 (cued), the kick drum harmonics play 3 different tones (Ab, F, G) in succession, based on the velocity of the strikes. That's just the direction picked up by the mic, though; a mic in a different position might not pick up those overtones so prominently?




I've been reading this paper trying to get a full grasp of directivities.


https://users.aalto.fi/~ktlokki/Publs/patynen_aaua_2010.pdf



Do you have any favorite sources I might study on the subject?


----------



## Wallander (Aug 22, 2022)

Rilla said:


> @Wallander
> 
> I have a question.
> 
> ...



These recordings use artificial studio reverb. I don't believe the room is prominent enough to do anything for the sound.

I know very little about classic recording studio techniques, so you may want to get a second opinion, but your first example could be the effect of sound compression applied after the reverb; the second kick could be louder, ducking the volume and taking the reverb down with it. In your second example, I think the kick drum is doubled by toms, making the toms the source of the pitched tone instead of the kick drum. At least that's what I _think_.

At any rate, a room is linear. It should respond the same every time, and for any input volume.
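
The linearity claim can be sanity-checked with a toy convolution (all numbers below are made up):

```python
# A room, as a linear time-invariant system, is fully described by its IR:
# scaling the input scales the output identically, so the room never
# "responds differently" to a louder hit. Toy direct-form convolution:

def convolve(x, h):
    """Convolve signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

ir    = [1.0, 0.6, 0.3, 0.1]        # made-up 4-tap "room"
quiet = [0.2, 0.0, -0.1]            # a quiet hit
loud  = [s * 10 for s in quiet]     # the same hit, 20 dB louder

# The loud response is exactly the quiet response scaled by 10.
scaled = [y * 10 for y in convolve(quiet, ir)]
assert all(abs(a - b) < 1e-9 for a, b in zip(convolve(loud, ir), scaled))
```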

Instrument directivity is very complex. I would suggest not spending too much time on the theory. Researchers can teach you that brass instruments emanate sound from the bell, woodwinds from the holes, and string instruments from the bridge and top plate, and that's pretty much it. Look for books or articles on studio techniques. The people working on the studio floor are the true experts on how to best record an instrument.

In my opinion, no angles are good for orchestral instruments. Arrange one hundred microphones around a violin in an anechoic chamber, and you'll have one hundred channels that all sound terrible. That's why an orchestral instrument must be recorded in situ, with natural reverberation being a superposition of all angles. The ambiance becomes the primary carrier of the timbre; the upfront sound is just for detail and imaging, so you're not as sensitive about the angle.


----------



## sumskilz (Aug 22, 2022)

Rilla said:


> In this recording for instance, it seems that out of the first 3 kick hits, the second hit had a very different sounding reflection than the first and third (the reflection is strong in the right channel). I'm assuming this is an example of how the kick naturally radiates differently per note? Is that correct?
> 
> Also, here at 5:55 (cued), the kick drum harmonics are playing 3 different tones (Ab, F, G) in succession, based on the velocity of the strikes. That's just the direction heard from the pickup of the mic, but a mic in a different position might not pick up those overtones so prominently?


I think most of what you're hearing in both cases is sympathetic vibration from the toms, but a drum hit while it's already resonating does sound different from one hit when it isn't. This is also affected by how much it's still resonating, which depends on how long it's been since the last hit and, in the case of the kick drum, whether or not the beater is left against the head, and if so, with how much pressure.


----------



## Rilla (Aug 22, 2022)

Wallander said:


> These recordings use artificial studio reverb. I don't believe the room is prominent enough to do anything for the sound.
> 
> I know very little about classic recording studio techniques, so you may want to get a second opinion, but your first example could be the effect of sound compression applied after the reverb; the second kick could be louder, ducking the volume and taking the reverb down with it. In your second example, I think the kick drum is doubled by toms, making the toms the source of the pitched tone instead of the kick drum. At least that's what I _think_.
> 
> ...



Thanks for your input!

So in other words, with orchestral instruments the room/hall is an extension of the instruments, in a sense, to the point that it almost overpowers the direct sound? Whereas studio recordings like the ones I posted are a totally different convention, where the direct sound is far more dominant and the reverb doesn't influence the timbre of the instruments so much?


----------



## Wallander (Aug 22, 2022)

Rilla said:


> Thanks for your input!
> 
> So in other words, with orchestral instruments the room/hall is an extension of the instruments, in a sense, to the point that it almost overpowers the direct sound? Whereas studio recordings like the ones I posted are a totally different convention, where the direct sound is far more dominant and the reverb doesn't influence the timbre of the instruments so much?


Yes, that’s correct. 

The direct sound has a significant overtone variety. Some overtones are strong, some are weak, almost randomly. This is an unwanted quality that makes the timbre strident and uneven.

The artificial reverb can add a tail of sound, but it cannot make the timbre smooth like a symphony hall because an artificial reverb can only increase the overtone variety.


----------



## zigzag (Aug 24, 2022)

Wallander said:


> The direct sound has a significant overtone variety. Some overtones are strong, some are weak, almost randomly. This is an unwanted quality that makes the timbre strident and uneven.
> 
> The artificial reverb can add a tail of sound, but it cannot make the timbre smooth like a symphony hall because an artificial reverb can only increase the overtone variety.


What about using plugin like Soothe to reduce strong overtones and resonances, would that help achieve better approximation?


----------



## Wallander (Aug 24, 2022)

zigzag said:


> What about using plugin like Soothe to reduce strong overtones and resonances, would that help achieve better approximation?


It could take the edge off the worst problem frequencies, but it won't fill in what's missing.

Moreover, a real room will make the sound bloom, but an IR can't do that. The tail of an IR sounds the same after 50 milliseconds, 500 milliseconds, and 2000 milliseconds (apart from some damping). It's a one-dimensional sound that sounds... suffocating, for lack of a better word. 

The natural reverb tail grows smoother and more all-encompassing (spanning all frequencies) as you progress towards the end of the tail. Like the room opens up to the sound. The feeling is open and airy. 

Anechoic+IR may work quite well with brass, somewhat less well with woodwinds, and very poorly with strings. With that being said, music is art, and everything is an approximation. If anechoic+IR works for you musically, you should use it. I'm only saying that wet and dry samples are not interchangeable.


----------



## Wallander (Aug 24, 2022)

zigzag said:


> What about using plugin like Soothe to reduce strong overtones and resonances, would that help achieve better approximation?


To illustrate, here's a comparison of a natural tail and a state-of-the-art IR of the same room. Each is played four times.

I'm sure you can appreciate the blooming of timbre in the natural tail. It opens up as the tail progresses, blooming into something smoother, brighter, and airier. The natural tail introduces more frequency components to the sound as time passes.

The IR doesn't evolve. The reverb time and energy are the same, but it's just a prolonging of the original sound. That's how all artificial reverberation works, convolution or algorithmic.
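
The "prolonging, never inventing" point has a precise form: convolution multiplies the two spectra, so any frequency bin that is zero in the dry signal stays zero no matter what IR is applied. A small sketch, with made-up toy signals and circular convolution to keep the bins aligned:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for tiny toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(x, h):
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

n = 16
dry = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]  # energy in bins 2 and 14 only
ir  = [0.9 ** t for t in range(n)]                           # made-up decaying "IR"

wet = circular_convolve(dry, ir)
dry_spec, wet_spec = dft(dry), dft(wet)

# Every bin that is (numerically) zero in the dry signal is still zero in the
# wet one: convolution reshapes existing components, it never adds new ones.
for k in range(n):
    if abs(dry_spec[k]) < 1e-9:
        assert abs(wet_spec[k]) < 1e-6
```

Algorithmic reverbs aren't literally convolutions, but as (essentially) linear processes they face the same no-new-frequencies constraint, which is the limitation described above.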


----------



## EanS (Aug 24, 2022)

This whole thread is like reading an NBA All-star only players discussion. ❤️


----------



## Rilla (Aug 24, 2022)

Wallander said:


> To illustrate, here's a comparison of a natural tail and a state-of-the-art IR of the same room. Each is played four times.
> 
> I'm sure you can appreciate the blooming of timbre in the natural tail. It opens up as the tail progresses, blooming into something smoother, brighter, and airier. The natural tail introduces more frequency components to the sound as time passes.
> 
> The IR doesn't evolve. The reverb time and energy are the same, but it's just a prolonging of the original sound. That's how all artificial reverberation works, convolution or algorithmic.



Man that blows my mind!
And it bugs me that the only current way to produce that is with a real instrument in a real room.
My DAW was supposed to solve all my problems! lol

There's no kind of AI or deep learning or anything in the works that will eventually allow us to recreate this phenomenon in the box??


----------



## Wallander (Aug 24, 2022)

Rilla said:


> Man that blows my mind!
> And it bugs me that the only current way to produce that is with a real instrument in a real room.
> My DAW was supposed to solve all my problems! lol
> 
> There's no kind of AI or deep learning or anything in the works that will eventually allow us to recreate this phenomenon in the box??


Sure. Machine learning could be a candidate for something like this. I don't know if that's actually in the works, but you're right that it's the right approach.

Meanwhile, recording the sample in situ is a perfectly fine solution that's also very CPU friendly. It's only a matter of time before all these libraries can be streamed from the hard drive without any preload.


----------



## Vlzmusic (Aug 25, 2022)

Wallander said:


> Meanwhile, recording the sample in situ is a perfectly fine solution that's also very CPU friendly.


Arne...I'll be damned... NP4 then?


----------



## zigzag (Aug 25, 2022)

Wallander said:


> Meanwhile, recording the sample in situ is a perfectly fine solution that's also very CPU friendly.


The issue with wet samples is that any transition between samples also transitions the baked-in reverb, which is far from perfect. It's most evident in fast legato lines and in transitions between dynamic layers. Any pitch bending also bends the baked-in reverb.
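
The pitch-bend point follows from simple arithmetic: shifting a wet sample by resampling rescales its whole time axis, baked-in tail included. A sketch with a hypothetical 2-second tail:

```python
# Resampling a wet sample by a pitch factor rescales time, so the baked-in
# reverb tail shortens (bend up) or lengthens (bend down) along with the note.

SEMITONE = 2 ** (1 / 12)  # equal-temperament frequency ratio per semitone

def shifted_tail(tail_seconds, semitones):
    """Tail length after a resampling pitch shift of the given interval."""
    return tail_seconds / (SEMITONE ** semitones)

tail = 2.0  # hypothetical 2-second baked-in hall tail
assert abs(shifted_tail(tail, 2) - 1.782) < 1e-3    # +2 st bend: ~1.782 s
assert abs(shifted_tail(tail, -2) - 2.245) < 1e-3   # -2 st bend: ~2.245 s
```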


----------



## zigzag (Aug 25, 2022)

Wallander said:


> To illustrate, here's a comparison of a natural tail and a state-of-the-art IR of the same room. Each is played four times.
> 
> I'm sure you can appreciate the blooming of timbre in the natural tail. It opens up as the tail progresses, blooming into something smoother, brighter, and airier. The natural tail introduces more frequency components to the sound as time passes.
> 
> The IR doesn't evolve. The reverb time and energy are the same, but it's just a prolonging of the original sound. That's how all artificial reverberation works, convolution or algorithmic.


Really good explanation and example. I learned a lot in this thread and can now identify the issue much better. 

I recreated the test on some percussion (convolution reverb on the close mic vs. the tree mic). No matter what I do, the pitch stays much more prominent the whole time in the IR version, while in the tree mic version the sound transforms into a kind of noise-like tail.


----------



## cpoessnicker (Aug 25, 2022)

First of all: thank you for this fantastic thread! It's quite abstract and, honestly, I don't think I understood everything, but it's really informative and I learned a lot! Since game engines, the Sony PlayStation, and ray tracing came up at one point, I thought I'd share this video I made a couple of months ago, where I explored this exact thing for orchestral music production:  
Now, after reading this, what's missing are the instrument-specific omnidirectional emissions that reflect into the room (well, they are there, but they're not set up as directional sound sources, so they sound the same in all directions). If these “inconsistencies” were taken into account on the emitters, it could possibly get quite close to an “instrument in a space”, right? In this example I used wet samples, and the rooms might still need some work in terms of materials and shapes to sound “real” in a simulation. *And* the IR conversion needed to use it in Nuendo doesn't help either, but one problem at a time…


----------



## justthere (Aug 25, 2022)

A thought or two about the importance of early reflections from all sides of an instrument: it seems to me that some approximation would be acceptable here. If one were to record an instrument throughout its range from the front, sides, and back, and then determine the difference between those recordings, one might approximate the variable response of a filter to be applied to an impulse recorded from the instrument's particular position, so that its reflections from the rear wall, say, would contain the appropriate frequencies. These captures would likely be best synthesized rather than just used as IRs, because that way they could be combined more flexibly. Wouldn't it be easier to say of a violin position, “let anything striking this particular angle of reflection have a certain incoming filter contour”, with an algorithmic but IR-derived reverb?


----------



## Wallander (Aug 25, 2022)

cpoessnicker said:


> First of all: thank you for this fantastic thread! It's quite abstract and, honestly, I don't think I understood everything, but it's really informative and I learned a lot! Since game engines, the Sony PlayStation, and ray tracing came up at one point, I thought I'd share this video I made a couple of months ago, where I explored this exact thing for orchestral music production:
> Now, after reading this, what's missing are the instrument-specific omnidirectional emissions that reflect into the room (well, they are there, but they're not set up as directional sound sources, so they sound the same in all directions). If these “inconsistencies” were taken into account on the emitters, it could possibly get quite close to an “instrument in a space”, right? In this example I used wet samples, and the rooms might still need some work in terms of materials and shapes to sound “real” in a simulation. *And* the IR conversion needed to use it in Nuendo doesn't help either, but one problem at a time…



It’s impossible to correct in the IR.

The problem is that the reverb tail should be sonically richer than the direct sound. It’s the type of process you need A.I. for. You need to invent things that are not in the dry sound, improving it.


----------



## José Herring (Aug 25, 2022)

I'm late to this party and I only skimmed the 5 pages. The idea of the OP is great, and though there are a few products with a similar approach, there's always room for more. Competition drives the market. 
The only real thing that needs to be considered is that the room is not just a good room: it becomes part of the instrument's sound. So by removing the player from a good room, you knock out part of what makes an instrument sound the way it does. This can't be replicated by adding an IR to the sound. 
Brass suffers most from the lack of room reflections, then percussion, then strings, then woodwinds. It's why the original VSL had a rough time sounding good and why, imo, in spite of being more difficult to program, Synchron sounds way better. 

Also, players even holding long tones will use the room to create a fuller more expressive sound. They can't do that in a bone dry room. 

It's always a trade-off, and it always should be: either great-sounding samples that are less flexible, or dry samples that don't sound as good but are more nimble. Adding IRs to the dry sound does do wonders, but imo it will never reach the level of sounding like it was recorded in a great space; it always sounds like a dry sample placed in a great-sounding space.

A great trade-off, and one that I used when recording The Herring clarinet for Embertone: I used some good mics relatively close, but my room has high ceilings and a lot of reflections. The result was that I got the benefit of the room blending with my sound, making the instrument fuller and more resonant, without too much of the sound of the room itself getting recorded. So it's dry enough that you can use IRs to place it anywhere, but I was still able to capture the full sound of the clarinet, if that makes any sense at all. It's so hard to explain this stuff in words.


----------



## José Herring (Aug 25, 2022)

Wallander said:


> Anechoic+IR may work quite well with brass, somewhat less well with woodwinds, and very poorly with strings. With that being said, music is art, and everything is an approximation. If anechoic+IR works for you musically, you should use it. I'm only saying that wet and dry samples are not interchangeable.


I would argue almost the exact opposite: anechoic+IR doesn't work as well with brass in an orchestral context. The instruments that seem to suffer least are actually the woodwinds. Solo strings could do well, but not sections.


----------



## Wallander (Aug 25, 2022)

José Herring said:


> I would argue almost the exact opposite: anechoic+IR doesn't work as well with brass in an orchestral context. The instruments that seem to suffer least are actually the woodwinds. Solo strings could do well, but not sections.


Are you sure about that? I would agree if we talked a very short tail like the Vienna Silent Stage, but not anechoic as in Sample Modeling or WIVI, where the brass was always more successful. A handful of diffuse reflections goes a long way for the timbre of woodwinds or solo strings.


----------



## José Herring (Aug 25, 2022)

Wallander said:


> Are you sure about that? I would agree if we talked a very short tail like the Vienna Silent Stage, but not anechoic as in Sample Modeling or WIVI, where the brass was always more successful. A handful of diffuse reflections goes a long way for the timbre of woodwinds or solo strings.


No, I'm not sure. I know very little about it other than what I've heard. 
The problem with Sample Modeling woodwinds, imo, is that it sounds like they assumed they could use the same recording techniques they used on the brass. 

I know nothing about WIVI other than that you own it. I'd actually like to have an in-depth discussion on how woodwinds get recorded for libraries like yours. In truth, no instrument should fare better with close-miking techniques than woodwinds, if done correctly. With woodwinds recorded well, you could eliminate the room influence altogether and not suffer too much, but the mics need to be set up in the right way, and which mics you use makes a huge difference.


----------



## justthere (Aug 25, 2022)

José Herring said:


> Sample Modeling woodwinds


Do you mean AudioModeling woodwinds? SampleModeling only has the saxes from a while back. 

For my part I have very much enjoyed using modeled instruments with IR’s - for me, the only way to make modeled tubas sound like anything is by using a lot of a great impulse, so the sound can develop a bit - in the way that one would use tree mics or the like for the body of the instrument and close mics for definition. But it is all about the sound of the room the IR captures. The challenge is that many of the great rooms for score (not concert halls, but score) are not available as IR’s because they would lose business - and I can’t fault that at all. It’s only because Audio Ease got in early that we have the Fox stage etc. And there are rooms in Spaces as well. I like the character of the captures in some of those - they give weight to modeled instruments on a par with, and sometimes surpassing, traditional instruments, and you don’t lose the agility or continuously variable dynamics and pitch as has been mentioned previously.

Wondering now about impulse capture: what of a full-range (or instrument full range) multi-speaker array pointed in 6 (top, bottom, left, right, front, back) directions (or more) with each speaker filtered according to the response of an instrument from those directions, outputting the impulse; and captures of IR’s from all mics that one would have up? That way you would have the sound of a close mic (not just a dry instrument, but the sound/response of a smaller condenser), and the section mics, the tree, and the outriggers and rears also. It wouldn’t be reproducing the room from any angle someone might choose, but rather from the perspective that reproduces a film orchestra. Alternatively, filtering within the software on the source before hitting the generated IR’s (or modeled reflections from the room surfaces, with some feedback mechanism so they could all interact) might do. 

I’m truly not a fan of most immersive formats. I find them to be of limited use in an orchestral setting - having worked extensively on surround projects and supervising the multichannel mixes of material that has previously been stereo (and we only have two ears), and having witnessed some truly awful surround mixes (a theatrical release of Disney’s Fantasia where the French Horns flew around the room being a notable example), my thought is that though it’s great to be free to place things anywhere and move things and all that for effects and dialogue, most people don’t respond so well to it being done with music, especially when music isn’t the focal point in film and tv - dialogue and story, and not being distracted by clever musical prestidigitations, are always the key. Unless it’s Tomita, most of us need a good orchestra sound that’s relatively static; and if one is a composer rather than a mix engineer, one likely needs these tools to be efficient enough to be used in real-time with very little added latency. Maybe building from what we need to hear (and do) will be more efficient than 100% accurate 360 models of a space, especially if the room doesn’t have the added character one wants. That’s not a criticism of any existing product at all - but I would like a simpler alternative.


----------



## Wallander (Aug 26, 2022)

José Herring said:


> No, I'm not sure. I know very little about it other than what I've heard.
> The problem with Sample Modeling woodwinds, imo, is that it sounds like they assumed they could use the same recording techniques they used on the brass.
> 
> I know nothing about WIVI other than that you own it. I'd actually like to have an in-depth discussion on how woodwinds get recorded for libraries like yours. In truth, no instrument should fare better with close-miking techniques than woodwinds, if done correctly. With woodwinds recorded well, you could eliminate the room influence altogether and not suffer too much, but the mics need to be set up in the right way, and which mics you use makes a huge difference.


I happily concede that it's debatable what instrument family sounds better with an IR. 

My point is that the natural tail starts with a direct sound and progressively grows richer. The convolution reverb starts with a direct sound and progressively grows duller. The former sounds airy while the latter sounds choked. 

You can always improve the recording technique, but it won't help against this inverted relationship.


----------



## cpoessnicker (Aug 26, 2022)

Wallander said:


> It’s impossible to correct in the IR.
> 
> The problem is that the reverb tail should be sonically richer than the direct sound. It’s the type of process you need A.I. for. You need to invent things that are not in the dry sound, improving it.


I'm sorry that my video title is a bit misleading, but I think you misunderstood my post. I'm not using IRs or algorithms. I'm using *wavetracing*, which is the audio-wave equivalent of ray tracing. There doesn't need to be any correction, as the audio and its reflections are calculated on a per-ray basis. The omnidirectional emission parameters can be adjusted, which is something you'll find in games quite often. So, if you get a good emission approximation, it should be pretty accurate. It's even more accurate, and a lot simpler, than doing it in real life, since the need to fiddle around with speakers, noise, etc. falls away. It also works quite well with immersive formats, as you can create your own microphone positions inside the room, with each getting position-, room-, and instrument-specific reflections.
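
For readers unfamiliar with geometric acoustics: the core computation a wavetracing-style engine performs per ray is small. Here is a minimal first-order reflection via the image-source method, in 2-D, with illustrative geometry (a real engine adds frequency-dependent absorption, many reflection orders, and diffraction):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def first_reflection(source, listener, wall_x):
    """Delay (s) and 1/r gain of one reflection off a rigid wall at x = wall_x."""
    # Mirror the source across the wall: the reflected path has the same
    # length as the straight line from this image source to the listener.
    image = (2 * wall_x - source[0], source[1])
    dist = math.dist(image, listener)
    return dist / SPEED_OF_SOUND, 1.0 / dist

# Source 2 m from the wall, listener 6 m from it, both at the same height.
delay, gain = first_reflection(source=(2.0, 3.0), listener=(6.0, 3.0), wall_x=0.0)
assert abs(delay - 8.0 / SPEED_OF_SOUND) < 1e-12   # 8 m reflected path
```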


----------



## Wallander (Aug 26, 2022)

cpoessnicker said:


> I am sorry that my video title is a bit misleading, but I think you misunderstood my post. I am not using IR nor algorithms. I am using *wavetracing* which is the audio wave equivalent of raytracing. There does not need to be any correction as the audio and its reflections are calculated on a single ray basis. The omnidirectional emission parameters can be adjusted, which is something you’ll find in games quite often. So, if you get a good emission approximation it should be pretty accurate. It is even a lot more accurate and a lot simpler than doing it in real life as the need to fiddle around with speakers, noise, etc. falls away. It also works quite well with immersive formats as you can create your own microphone positions inside the room with each getting position, room and instrument specific reflections.


Right, but aren't you feeding it an orchestral recording that only captured one direction of the sound?


----------



## cpoessnicker (Aug 26, 2022)

Wallander said:


> Right, but aren't you feeding it an orchestral recording that only captured one direction of the sound?


Yes, you're right, and I get your point! Since the samples weren't completely dry to begin with, you could argue the room additions are already in there (with all their disadvantages, of course, making my point pretty unusable in this specific discussion), but it's a rabbit hole. Thank you for pushing 😊 Multiple close microphone positions, like someone wrote earlier, could help with this, and I would bet you could get a convincing and aesthetically pleasing sound without going full 360-degree simulation/recording.


----------



## Wallander (Aug 26, 2022)

cpoessnicker said:


> Yes, you are right and I get your point! As the samples weren't completely dry to begin with you could argue that the room additions are already in there (with all their disadvantages ofc. making my point pretty unusable in this specific discussion) but it’s a rabbit hole. Thank you for pushing 😊 multiple close microphone positions like someone wrote earlier could help with this and I would bet that you could get a convincing and aesthetically pleasing sound without going full 360 degree simulation/ recording



I'm sure your model is perfect. 

All room models (and IR reverbs) suffer from this limitation. The physical model is not that of an acoustic instrument playing in a big room; it's a physical model of a loudspeaker playing music in a big room. It's a model of a nightclub, for lack of a better analogy.


----------



## Rilla (Aug 26, 2022)

Wallander said:


> I'm sure your model is perfect.
> 
> All room models (and IR reverbs) suffer from this limitation. The physical model is not that of an acoustic instrument playing in a big room, but it's a physical model of a loudspeaker playing music in a big room. It's a model of a nightclub, lacking a better analogy.


What if you built 3d modeled instruments _inside_ of unreal engine that used real-world physics??


----------



## ModalRealist (Aug 26, 2022)

If I was going to get into this business (which I’m not), I’d be looking for the most efficient way to specifically address the resulting sonic differences between real room reverberation from an acoustic sound source (as excellently outlined by Arne) and existing approaches to artificial reverberation.

My basic understanding is that there are two features of real rooms that make them so rich. Firstly, the room itself adds to the sound, presumably because it oscillates in response. Secondly, both the instrument and the room are not single objects, but are really composed of millions of facets: the many different specific pressure waves emanated by the instrument (which we call directivity, etc.), and the innumerable specific ways in which individual points of the room will add to that sound when it reaches them.

Now, simulating a real room seems like folly (as does simulating the entire physical output of a real instrument). But what could be more promising is finding out how much of this parallel complexity must be simulated in order to start getting the desired end effect: human ears cannot hear all the wonderful sonic complexity of these real spaces anyway.

So you need some measure of multiple outputs for a single source (directivity) and you need sufficient parallel reverberation and you need that “reverberation” to be such that it reacts to the sound, and does so over time (I.E. that it “speaks” back into the sound, and that like other instruments, this reaction is not uniform and static but alive).

I have to say that I’ve been blown away by how “real” Mir 3D can sound. I don’t think(?) it addresses the points above, but it does show that more parallel processing even using traditional IR techniques (I don’t pretend to understand how the Ambisonics stuff allows all those IRs to be coordinated) does get us closer to a feeling of a real space.

I don’t think for a moment that in the box stuff will ever sound as good as real recordings, for a huge number of reasons. However I do have an analogy:

There’s a reason Tom Cruise does all that real stunt work in Mission Impossible etc. It’s just more alive. Doesn’t matter how good the CG is. It’s living, breathing, blood-pumping stuff. That’s what real recordings do - in all genres!

Personally - and this is off-topic - I think a lot of orchestras are missing out on a big market in commissioning and recording new works in genres that are currently catered for largely by dodgy samples and synths. I guess the numbers just don’t stack up. But is another recording of Beethoven really that profitable anymore?!


----------



## zigzag (Aug 26, 2022)

Rilla said:


> What if you built 3d modeled instruments _inside_ of unreal engine that used real-world physics??


In that case, you can't use close mic samples as these samples already contain a recorded single perspective of an instrument's body producing sound. You can't undo what's already captured in samples. 3D modeled instruments would fall under physical modeling of instruments and that is a whole other topic compared to just reverberation.


----------



## Wallander (Aug 26, 2022)

Rilla said:


> What if you built 3d modeled instruments _inside_ of unreal engine that used real-world physics??


Unfortunately, that problem is much too challenging to solve in any musically meaningful manner.

It’s difficult enough to do vanilla physical modeling.


----------



## shawnsingh (Aug 26, 2022)

Disclaimer - I don't have real-world experience with sampling instruments or IRs. I'm sure there are pain points in real world compared to the theory.

But still, in theory - impulse responses _*can be*_ a perfect mathematical characterization of reverb. In the signal processing math, an IR is an exact representation of a linear system, and an acoustics scenario like this is a linear system.

The real underlying problem is that all reverbs today take input audio that has already been "flattened" to mono or stereo. The 3d sound field information is completely lost before the sound even reaches the reverb. VSL MIR is the closest thing we have to a solution - using directivity profiles to "up-convert" stereo samples into an approximation of the 3d sound field.

I think it would be very interesting if a developer had the resources to experiment - sampling the 3d sound field of an instrument at a very high order (e.g. 6th-order spherical harmonics would be 49 channels), and capturing correspondingly faithful IRs to represent a room. I have a feeling that would have the richness and bloom which Wallander says is missing.

Of course, editing 50 channels per sample and doing 50 channels of convolution per voice would be a difficult sales pitch for real software. Maybe some innovations will bring this holy grail to reality some day.
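To put a rough number on that cost, here's a toy numpy sketch (the channel count, note length, and IR length are all made-up illustrative values, not from any real product) of what per-voice rendering would involve: convolve each of the 49 sound-field channels of a dry note with its matching room IR and sum the result.

```python
import numpy as np

rng = np.random.default_rng(0)

sr = 48_000
n_ch = 49            # 6th-order ambisonics: (6 + 1) ** 2 channels
note_len = sr // 2   # half a second of dry sample per channel
ir_len = 2 * sr      # a two-second room tail

# Hypothetical data: one dry note captured as a 49-channel sound field,
# and a 49-channel room IR measured from the matching directions.
dry = rng.standard_normal((n_ch, note_len))
irs = rng.standard_normal((n_ch, ir_len))

# FFT-based linear convolution per channel, then sum: each directional
# component of the instrument excites the room through its own IR.
n_fft = note_len + ir_len - 1
wet = np.fft.irfft(
    np.fft.rfft(dry, n_fft) * np.fft.rfft(irs, n_fft), n_fft
).sum(axis=0)

print(wet.shape)  # one summed "room return" - and this runs per voice, per note
```

FFT-based convolution keeps this feasible offline, but doing it for every voice in real time is exactly where the sales pitch gets hard.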


----------



## fan455 (Aug 26, 2022)

shawnsingh said:


> Disclaimer - I don't have real-world experience with sampling instruments or IRs. I'm sure there are pain points in real world compared to the theory.
> 
> But still, in theory - impulse responses _*can be*_ a perfect mathematical characterization of reverb. In the signal processing math, an IR is an exact representation of a linear system, and an acoustics scenario like this is a linear system.
> 
> ...


Maybe it's more practical to sample in mono as dry as possible, pan and delay the different samples differently in 50 channels, and then convolve with the 50-channel impulse response?


----------



## justthere (Aug 26, 2022)

I still doubt the absolute necessity of a huge number of angles here. The instruments still have to be playable through it or it’s only mixing tech. So I ask the knowledgeable - why wouldn’t it work to model (algorithmically, based on IR captures and detailed measurements) the reflections of six surfaces individually, with pre filtering for the (dry or preferably anechoically recorded or rendered) source instrument based upon how the instrument sounds from that direction, and have a cross feed matrix so that sounds that hit the back wall can also bounce off of the side walls, variably according to position?
@Dietz @Wallander


----------



## shawnsingh (Aug 26, 2022)

fan455 said:


> Maybe it's more practical to sample in mono as dry as possible, pan and delay the different samples differently in 50 channels, and then convolve with the 50-channel impulse response?


But isn't this just the idea of dry sampling? I don't think 50 channels of IR will make much difference if each instrument is modeled as only mono - this would already flatten the 3d sound information from the instrument. The change in sound of 50 different directions from an instrument would be much more than just delay - the frequencies present would be completely different, too


----------



## fan455 (Aug 26, 2022)

It's confusing to me that convolution reverb using an impulse response can be explained differently in two domains. In the time domain, it creates an echoing effect. And in the frequency domain, it changes the balance of frequencies. The Fourier transform reveals that time-domain convolution is frequency-domain multiplication.
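The two views really are the same operation - here's a quick numpy check (a random "dry" signal and a random "IR", purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)  # a "dry" signal
h = rng.standard_normal(64)   # an "impulse response"

# Time-domain view: direct convolution (the echo picture).
time_domain = np.convolve(x, h)

# Frequency-domain view: multiply the spectra (the EQ picture),
# zero-padding both signals to the full output length first.
n = len(x) + len(h) - 1
freq_domain = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

print(np.allclose(time_domain, freq_domain))  # True
```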


----------



## fan455 (Aug 26, 2022)

shawnsingh said:


> But isn't this just the idea of dry sampling? I don't think 50 channels of IR will make much difference if each instrument is modeled as only mono - this would already flatten the 3d sound information from the instrument. The change in sound of 50 different directions from an instrument would be much more than just delay - the frequencies present would be completely different, too


Maybe panning and delay could make the mono audio halfway-3D in a controllable way? And convolution reverb does the rest.


----------



## fan455 (Aug 26, 2022)

shawnsingh said:


> But isn't this just the idea of dry sampling? I don't think 50 channels of IR will make much difference if each instrument is modeled as only mono - this would already flatten the 3d sound information from the instrument. The change in sound of 50 different directions from an instrument would be much more than just delay - the frequencies present would be completely different, too


I understand now! An instrument is not a spot. It sounds different in different directions.


----------



## shawnsingh (Aug 26, 2022)

fan455 said:


> It's confusing to me that impulse response can be explained differently in 2 domains. In time domain, it creates echoing effect. And in frequency domain, it affects the frequencies' balance. Fourier transform reveals that time-domain convolution is frequency-domain multiplication.


Yes, it is confusing - it takes some thinking time to get a feel for it. A linear system affects both (a) the gain of individual frequencies and (b) the delay/phase of individual frequencies. It's just convenient for people to demonstrate the time effect by showing the time domain, and the frequency effect by showing the frequency domain. But with the right examples, you can also see the frequency effect in the time domain - for example, a low-pass filter is a linear system. Open an oscilloscope in your DAW and you can easily see a square or sawtooth wave getting smoothed out by a low-pass filter.

It's a little harder to see the effect of phase/delay in the frequency domain, because audio software doesn't usually show the phase part of the frequency response, only the amplitude response. The phase response is also unintuitive to interpret; personally, it's on my to-do list to learn more about it =)
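For anyone who wants to try this without opening a DAW, here's a small numpy sketch: a 64-tap moving average (a plain linear system) applied to a square wave. The time-domain "smoothing" and the frequency-domain attenuation of the upper harmonics are the same fact; the specific numbers (100 Hz, 64 taps) are just illustrative.

```python
import numpy as np

# A 100 Hz square wave, one second at 48 kHz (so FFT bin k sits at k Hz).
sr, f = 48_000, 100
t = np.arange(sr) / sr
square = np.sign(np.sin(2 * np.pi * f * t))

# A crude low-pass: a 64-tap moving average. Its impulse response is
# just 64 equal taps.
kernel = np.ones(64) / 64
smoothed = np.convolve(square, kernel, mode="same")  # edges get rounded off

# The frequency-domain view of the same smoothing: upper harmonics of
# the square wave are strongly attenuated.
spec_in = np.abs(np.fft.rfft(square))
spec_out = np.abs(np.fft.rfft(smoothed))
bin_15th = 15 * f  # the 15th harmonic, 1500 Hz
print(spec_out[bin_15th] / spec_in[bin_15th] < 0.2)  # True
```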


----------



## justthere (Aug 27, 2022)

justthere said:


> I still doubt the absolute necessity of a huge number of angles here. The instruments still have to be playable through it or it’s only mixing tech. So I ask the knowledgeable - why wouldn’t it work to model (algorithmically, based on IR captures and detailed measurements) the reflections of six surfaces individually, with pre filtering for the (dry or preferably anechoically recorded or rendered) source instrument based upon how the instrument sounds from that direction, and have a cross feed matrix so that sounds that hit the back wall can also bounce off of the side walls, variably according to position?
> @Dietz @Wallander


As an addendum - I get that when you put up a mic in a room and record an instrument, you are getting not just the instrument but reflections from 360 degrees around, even if it all ends up in a single microphone; so even emulating a close mic on an anechoic source means having all of that information available. But when reproducing the instrument in a room from multiple perspectives, isn’t each perspective essentially receiving the same information with varying degrees of filtering, phase shift and delay? And thus would it be more or less DSP-intensive to do the “differences” per microphone position based on a single room map, as opposed to multiple IR’s? I’m trying to figure out ways that the computations could be reduced and thus real-time use would be more of a thing.


----------



## Wallander (Aug 27, 2022)

justthere said:


> I still doubt the absolute necessity of a huge number of angles here. The instruments still have to be playable through it or it’s only mixing tech. So I ask the knowledgeable - why wouldn’t it work to model (algorithmically, based on IR captures and detailed measurements) the reflections of six surfaces individually, with pre filtering for the (dry or preferably anechoically recorded or rendered) source instrument based upon how the instrument sounds from that direction, and have a cross feed matrix so that sounds that hit the back wall can also bounce off of the side walls, variably according to position?
> @Dietz @Wallander


It's not the incoming angles of the IR that are missing. It's the outgoing angles of the instrument.


----------



## justthere (Aug 27, 2022)

Wallander said:


> It's the outgoing angles of the instrument.


Yes, so (as I referenced before) can one create filter contours for the sides of the (anechoic) instrument facing each surface and apply them so that each surface is getting that sound and reflecting it before interacting? Might that be close enough? Record a violin on all sides to determine the differences between them, and apply those differences to the anechoic instrument before hitting the ambience?


----------



## shawnsingh (Aug 27, 2022)

justthere said:


> Record a violin on all sides to determine the differences between them, and apply those differences to the anechoic instrument before hitting the ambience?


Seems very much like the directivity profiles in MIR.

One thing is that basic filtering and delaying wouldn't be enough to model all the differences in sound coming from different directions. There would probably be new frequencies popping in and out too, which is something that linear operations like delay and filtering can't do.

In my unqualified opinion, MIR already strikes the balance between practical software and high-fidelity, realistic reverb while avoiding diminishing returns. It is built on well-thought-out approximations to the math of room acoustics.


----------



## justthere (Aug 27, 2022)

shawnsingh said:


> Seems very much like the directivity profiles in MIR.
> 
> One thing is that basic filtering and delaying wouldn't be right to model all the differences of sound coming from different directions. There would probably be new frequencies popping in and out too, which is something that linear operations like delay and filtering can't do.
> 
> In my unqualified opinion, MIR is already that balance between practical software and high fidelity realistic reverb and avoiding diminishing returns. It is built on well-thought-out approximations to the math of room acoustics.



It has amazing functionality; but it’s not so much a solution for real-time work as far as I can tell - at least nobody has said any differently. And tbh the rooms are not what I’m looking for. 

I’m just hoping for an algorithmic solution that’s less DSP-hungry than all the IR’s, and trying to think of a way that’s doable. 

I would also guess - as I’m not a developer in this area - that a model of a room based on convolution or any other map of reflections will absolutely boost and cut frequencies in a similar way to a room - though surely somewhat more simply. I mean, since applying a delay to a sound as a plug-in will reinforce and cancel, why would a reverb not? They already do.


----------



## Knomes (Aug 27, 2022)

justthere said:


> and thus real-time use would be more of a thing.


I don't get why real-time is a necessity.

If a piece of software (a reverb, in this case) achieves results that are much better than the competition, at the cost of having to render the audio (even for hours), I would buy it.


----------



## justthere (Aug 27, 2022)

Knomes said:


> I don't get why real-time is a necessity.
> 
> If a software (reverb in this case) obtains results that are much better than the competition at the cost of having to render the audio (even for hours), I would buy it.


For my use-case, it's important because I play expressive modeled instruments as part of my composing process, and I mix as I go. And since I work on a tight deadline and have to get through quite a bit of music, I don't want to have to hear something differently while I'm performing it than the way it's going to sound in the final. I genuinely don't have those hours you are talking about. Beyond all of those realities - I just want to hear things like they are supposed to sound when I play them. To a degree it would be like plugging your electric guitar in and playing the part you want dry and direct, and then having to go back and hear how it sounds coming out of an amp simulator. Having to deal with lots of latency while performing is unpleasant, and even subtly leaning away from performance in composition because of that is something I'm going to avoid at all costs.


----------



## Wallander (Aug 27, 2022)

justthere said:


> Yes, so (as I referenced before) can one create filter contours for the sides of the (anechoic) instrument facing each surface and apply them so that each surface is getting that sound and reflecting it before interacting? Might that be close enough? Record a violin on all sides to determine the differences between them, and apply those differences to the anechoic instrument before hitting the ambience?


Unfortunately, no. The side sound isn't a basic filtered version of the front sound. The sound at different angles varies greatly, in a complex and unpredictable manner that changes with every note and with the tilt of the instrument.

In practice, it’s a quasi random error that changes from note to note. You're not simply missing highs or lows. It's more like your note is missing some 554 Hz, 1385 Hz, and 2493 Hz, while you have too much of 832 Hz and 1939 Hz. Every angle has a different pattern of error. The natural room tail has the advantage of being the _average_ of all the angles, which effectively eliminates the error. The natural tail doesn’t have too much or too little of any frequency. That’s why the natural tail sounds richer and smoother than the direct sound from any given angle.


----------



## Saxer (Aug 28, 2022)

What I like to do is using modeled instruments stacked with the room mike of sampled instruments. Though it's not the "real" verb of this specific instrument, it's at least a real space around the modeled dry source. It works quite well with instruments that behave similarly (like Caspian with Samplemodeling Brass). It certainly has its limits, as both instruments aren't made for that purpose, but it's a nice way of cheating.


----------



## Rilla (Aug 28, 2022)

Saxer said:


> What I like to do is using modeled instruments stacked with the room mike of sampled instruments. Though it's not the "real" verb of this specific instrument it's at least a real space around the modeled dry source. It works quite well with instruments that behave similar (like Caspian with Samplemodeling Brass). It certainly has it's limits as both instruments aren't made for that purpose but it's a nice way of cheating.



What limits have you run into? This certainly sounds like a promising compromise.


----------



## mybadmemory (Aug 28, 2022)

Saxer said:


> What I like to do is using modeled instruments stacked with the room mike of sampled instruments. Though it's not the "real" verb of this specific instrument it's at least a real space around the modeled dry source. It works quite well with instruments that behave similar (like Caspian with Samplemodeling Brass). It certainly has it's limits as both instruments aren't made for that purpose but it's a nice way of cheating.


I’ve been doing the same on occasion. Use a wet library as a reverb for a dry library. Works surprisingly well.


----------



## Wallander (Aug 28, 2022)

Saxer said:


> What I like to do is using modeled instruments stacked with the room mike of sampled instruments. Though it's not the "real" verb of this specific instrument it's at least a real space around the modeled dry source. It works quite well with instruments that behave similar (like Caspian with Samplemodeling Brass). It certainly has it's limits as both instruments aren't made for that purpose but it's a nice way of cheating.






I like this way of thinking. That’s very clever and also doable.


----------



## justthere (Aug 28, 2022)

Wallander said:


> Unfortunately, no. The side sound isn't a basic filtered version of the front sound. The sound in different angles varies greatly, and in a complex and unpredictable manner that changes with every note and the tilt of the instrument.
> 
> In practice, it’s a quasi random error that changes from note to note. You're not simply missing highs or lows. It's more like your note is missing some 554 Hz, 1385 Hz, and 2493 Hz, while you have too much of 832 Hz and 1939 Hz. Every angle has a different pattern of error. The natural room tail has the advantage of being the _average_ of all the angles, which effectively eliminates the error. The natural tail doesn’t have too much or too little of any frequency. That’s why the natural tail sounds richer and smoother than the direct sound from any given angle.


Quasi-random? You are saying that something might be down 30 dB at 1100 Hz for one note but only down 5 dB on another? I would think that isn't random, but just the interaction of modes in the instrument based upon its dimensions, and thus quantifiable. But I wonder if this goes too far. It's true that in an absolute sense, when you put up a microphone, all sides of an instrument end up in the microphone, as they would in the ear - some as direct sound, some as first reflections from the floor, and so on. What I wonder is: how much of that has to be absolutely recreated in order to make a good sound that feels spatially authentic? Maybe not all of it. Maybe a different directional-mode technique would make MIR better for you. I don't know, but it's worth wondering about.

Remember ages ago, when Apple made the PowerPCs? MOTU came out with the Masterworks compressor, which was cool - the only problem was that it brought almost any computer that ran it to its knees. A lot of things like MIR and the others are great for mixers but less so for composers who work in real time - unless I'm wrong and Cubase can give an instance of MIR on a fast computer negligible latency on the record track only (and "negligible" is somewhat subjective). I keep watching computers get more powerful and developers making things that are just a little too hungry for the current crop of machines. Would love to find a way around that. I very much appreciate your insights.


----------



## justthere (Aug 28, 2022)

Saxer said:


> What I like to do is using modeled instruments stacked with the room mike of sampled instruments. Though it's not the "real" verb of this specific instrument it's at least a real space around the modeled dry instrument.


So that works as long as the sampled library is exactly in tune with the other - otherwise it will sound like more voices, or like, say, Spitfire's libraries do when crossfading single instruments with vibrato - or as long as that extra girth is okay with you. I'm not judging - I can see it could be a cool sound. But we are getting away from accuracy, if you like the sound of the modeled instrument and that's what you want.


----------



## Rilla (Aug 28, 2022)

justthere said:


> So that works either: as long as the sampled library is exactly in tune with the other, or it will sound like more voices, or like, say, spitfire's libraries do when crossfading single instruments with vibrato; or as long as that extra girth is okay with you. I'm not judging - I can see it could be a cool sound. But we are getting away from accuracy, if you like the sound of the modeled instrument and if that's what you want.


It seems the best workflow would be to program with the sample library first, utilizing the available articulations, and then replace the close mics with modelled instruments. That way you don't venture outside the confines of the sample library.

Geez, this is a fascinating thread! I should be asleep right now.


----------



## Wallander (Aug 28, 2022)

justthere said:


> Quasi-random? You are saying that something might be down 30dB at 1100hz for one note but only down 5 on another? I would think that isn't random, but just the interaction of modes in the instrument based upon its dimensions, and thus quantifiable. But I wonder if this goes too far. I mean, it's true that in an absolute sense, when one puts up a microphone, all sides of an instrument will end up in the microphone, as they would in the ear - some in the form of direct sound, some as first reflections from the floor, and so on. What I wonder is, how much of that has to be absolutely recreated in order to make a good sound that feels authentic spatially? Maybe it all doesn't. Maybe a different directional mode technique would make MIR better for you. I don't know, but it's worth wondering about.
> 
> Remember ages ago when Apple made the PowerPCs? MOTU came out with the Masterworks compressor, which was cool - the only problem was it brought almost any computer that would run it to its knees. A lot of things like MIR and the others are great for mixers but less so for composers who do realtime - unless I'm wrong and Cubase would make an instance of MIR on a fast computer have negligible latency on the record track only. (and "negligible" is to a degree subjective.) I keep watching computers get more powerful and developers making things that are just a little too hungry for the current crop of machines. Would love to find a way around that. I very much appreciate your insights.


The directional error is not random, but it’s unpredictable and granular enough to appear random to an observer.

High-quality musical instruments produce a smooth timbre. It’s just chopped up into highly granular fragmented pieces (on the frequency axis) and projected in different directions. If you don’t have a room to collect the sum of sound energy from the instrument, and project it to you as an observer, the sideways information is lost in space. The crude directional sound is just a slice of the cake, and it's burdened by spectral errors.

Yes, there could be a -30 dB dip at some frequency in a blind spot, as the FFT will show. You may not hear it outside of an A/B test, because your brain adapts and fills in the blanks, but the sensation of smoothness comes when your brain doesn’t have to make all these micro-adjustments.
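The averaging argument is easy to demonstrate with a toy model in numpy (the ±12 dB per-bin error and the 500 angles are made-up numbers, purely for illustration): each single angle deviates wildly from the instrument's smooth spectrum, but the power average over all angles comes out nearly flat again.

```python
import numpy as np

rng = np.random.default_rng(42)
n_angles, n_bins = 500, 128

# Toy model: the instrument has a smooth underlying spectrum (a gentle
# tilt here), but each radiation angle hears it with a quasi-random
# error of up to +/- 12 dB in every frequency bin.
smooth_db = -6.0 * np.linspace(0, 1, n_bins)
errors_db = rng.uniform(-12, 12, size=(n_angles, n_bins))
per_angle_power = 10 ** ((smooth_db + errors_db) / 10)

# Deviation from the smooth spectrum, in dB: one angle on its own vs.
# the power average over all angles (the "room tail").
one_dev = 10 * np.log10(per_angle_power[0]) - smooth_db
avg_dev = 10 * np.log10(per_angle_power.mean(axis=0)) - smooth_db

# The single angle is several dB rough from bin to bin; the average is
# nearly flat apart from a constant level offset.
print(np.std(one_dev) > 10 * np.std(avg_dev))  # True
```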


----------



## Saxer (Aug 28, 2022)

Rilla said:


> What limits have you run into? This certainly sounds like a promising compromise.


The limits are in the different behavior of unrelated libraries. E.g. SM brass plays different legato speeds, up to portamento, controlled by velocity; Caspian doesn't follow there. I need three SM solo trumpet tracks but have only one Caspian trumpet reverb. Audiomodeling solo woodwinds sound good with the angry woodwinds room, but angry woodwinds don't have vibrato, so it sounds like a section when you add vibrato. Things like that. The BBCSO performance legato patches work great with Audiomodeling or Samplemodeling strings. But for short-note passages, pure samples are easier to use.

For a developer of both modeled and sampled instruments (which I am not), it should be possible to bring the two worlds closer together. But it's a big investment, as it means twice the development time and cost for a single library.


----------



## Knomes (Aug 28, 2022)

justthere said:


> For my use-case, it's important because I play expressive modeled instruments as part of my composing process, and I mix as I go. And since I work on a tight deadline and have to get through quite a bit of music, I don't want to have to hear something differently while I'm performing it than the way it's going to sound in the final. I genuinely don't have those hours you are talking about. Beyond all of those realities - I just want to hear things like they are supposed to sound when I play them. To a degree it would be like plugging your electric guitar in and playing the part you want dry and direct, and then having to go back and hear how it sounds coming out of an amp simulator. Having to deal with lots of latency while performing is unpleasant, and even subtly leaning away from performance in composition because of that is something I'm going to avoid at all costs.


I get that it's not useful for everyone. But the lack of software working this way completely puzzles me.

I am thinking of software that has a real-time mode that is just an approximation, and a full-quality mode that takes even hours to render. Modeled instruments seem like the perfect example, in my opinion.

Pianoteq, for example, is very good, but I can only imagine the constraints that real-time computation has put on its physical modeling. Now imagine that when you are done with the performance, you can switch to a "Pianoteq computationally-hard mode": the same 2-minute piece of music takes 4 hours to render, but the result is so much better than the real-time mode. I think a lot of people would like something like this; I can't believe I'm the only one who would find it useful.


----------



## zigzag (Aug 28, 2022)

Knomes said:


> I get that is not something useful for everyone. But the lack of software working this way completely puzzles me.
> 
> I am thinking about software that has a real-time mode that is just an approximation and the real mode that takes even hours to render. Modeled instruments seem exactly the perfect example, I'm my opinion.
> 
> Pianoteq, for example, is very good, but I cannot imagine the constraints in the physical modeling that real-time computation has put into its development. Now imagine that when you are done with the performance you can switch to "Pianoteq computationally-hard mode", the same 2-minutes piece of music gets rendered in 4 hours, but the result is so much better than "Pianoteq real-time mode". I think a lot of people would like something like this, I cannot believe that I am the only one that would like or find useful something like this.


For instruments/effects with a real-time mode and a non-real-time high-quality mode, DAWs could automatically render the HQ version in the background. If a track is armed or being edited, it would automatically switch to real-time mode; otherwise it would use the cached HQ version, if available. Like an automatic freeze mode.

PS: But then, suddenly, I woke up...


----------



## justthere (Aug 28, 2022)

Wallander said:


> High-quality musical instruments produce a smooth timbre. It’s just chopped up into highly granular fragmented pieces (on the frequency axis) and projected in different directions. If you don’t have a room to collect the sum of sound energy from the instrument, and project it to you as an observer, the sideways information is lost in space. The crude directional sound is just a slice of the cake, and it's burdened by spectral errors.


I was thinking that one would create a six-sided "tone map" from a real instrument and then apply it to a modeled one (based upon the relationship between the front mic of the real instrument and the model: the modeled instrument would be matched to the front-mic tone map, with the differences between the other sides and the front applied to the matched model to derive the other side-perspectives). Using it with a sampled instrument would likely be better served by first having some baseline "ideal" profile of the instrument from the front, plus a map of the differences from that on six sides - that would be its directional profile - and then applying the difference map to the single perspective of the sampled instrument, thus deriving five more angles from the sampled source. Then the software would "bus" those angles appropriately to the sides of the virtual room. So the other sides wouldn't disappear into nowhere - they would go out into the "room" and come back off the walls, floor, and ceiling, full of added reflections and the tonal contour that imparts.

Now I'm clearly no expert in this, but I'm thinking that even though an instrument sounds different from each side, this is not something that can't be quantified - that the dips and peaks are directly derived from the instrument itself, and thus measurable and able to be reproduced. Complex to me, yes, but not inscrutable. Maybe it's not a 100% 1-to-1 model of an instrument in a room but it might be pretty good.

And I love the idea of realtime draft-quality vs. HQ renders at faster than real-time, if the draft-quality is listenable.


----------



## Wallander (Aug 28, 2022)

justthere said:


> I was thinking that one would create a six-sided "tone map" from a real instrument, and then apply that to a modeled one (based upon the relationship between the front mic of the real instrument and the model: the modeled instrument would be matched with the front mic tone map, with the differences between the other sides and the front map applied to the matched model to derive the other side-perspectives). To use it with a sampled one would likely better be served by: having first some baseline "ideal" profile of an instrument from the front, and a map of the differences from that on six sides, and that would be its directional profile - and then applying the difference map to the single perspective of the sampled instrument, thus deriving five more angles from the sampled source. Then the software would "bus" those angles appropriately to the sides of the virtual room. So the other sides wouldn't disappear into nowhere - they would go out into the "room" and come back off of the walls and floor and ceiling, full of added reflections and the tonal contour that imparts.
> 
> Now I'm clearly no expert in this, but I'm thinking that even though an instrument sounds different from each side, this is not something that can't be quantified - that the dips and peaks are directly derived from the instrument itself, and thus measurable and able to be reproduced. Complex to me, yes, but not inscrutable. Maybe it's not a 100% 1-to-1 model of an instrument in a room but it might be pretty good.
> 
> And I love the idea of realtime draft-quality vs. HQ renders at faster than real-time, if the draft-quality is listenable.


There's no simple way to generate the side sound from the front sound. It’s the kind of thing for which you need A.I.

Whatever problem a wet sample library has, it’s easier to solve. The complexity level of modeling very quickly escalates to a point where it’s easier to work with wet samples, and overcome their limitations. It could be easier to make an A.I. generate missing dynamics or articulations in a wet library.

Look at the development of computer graphics. It’s a huge industry. They developed complicated modeling tools for 3D images, games, and movies. Then, out of nowhere, A.I. technologies like Midjourney make educated guesses directly in the 2D plane, from a simple text prompt, producing images that look as good as or better than a Pixar movie. We’re in a technology shift where A.I. is taking over many roles from physical modeling.


----------



## justthere (Aug 29, 2022)

Wallander said:


> There's no simple way to generate the side sound from the front sound. It’s the kind of thing for which you need A.I.
> 
> Whatever problem a wet sample library has, it’s easier to solve. The complexity level of modeling very quickly escalates to a point where it’s easier to work with wet samples, and overcome their limitations. It could be easier to make an A.I. generate missing dynamics or articulations in a wet library.
> 
> Look at the development of computer graphics. It’s a huge industry. They developed complicated modeling tools for 3D images, games, and movies. Then, seemingly out of nowhere, A.I. technologies like Midjourney appeared that make educated guesses directly in the 2D plane, from a simple text prompt, producing images that look as good as or better than a Pixar movie. We’re in a technology shift where A.I. is taking over many roles from physical modeling.


Machine learning would certainly be useful. 

I don’t know about it being easier - I admit that’s based on what I know, and it’s a broad world and field - but I’m not convinced of the necessity for an absolutely accurate reproduction just because it’s possible - because it would not be usable in real-time for me.


----------



## glyster (Aug 29, 2022)

A lot of great discussions. I'm surprised to see some fairly technical ideas being brought up. I have done a good amount of computer graphics, so I find many similarities between the two art forms (graphics and music).

*How realistic is realistic enough?*
For computer graphics, the bar is set at photorealism, i.e., the computer-generated image is indistinguishable to a human observer from a photo taken by a camera. This bar was very high 30 years ago. But many would agree that CG has gotten so good today that this is commonly achievable. (The remaining challenge is CG humans, because we are so good at recognizing real humans.)

So, the question for virtual instruments is: if we take a note played by a real instrument and recorded on a stage at various directions/locations, and compare it to the same instrument sampled and processed with the IR of the same room using something like Space 2 or MIR, can we tell which one was recorded on location and which was processed with a reverb?

I'd love to see such a comparison. I actually expect VSL and EW have already done similar studies to verify the quality of their products. If we can't tell in a blind test, I'd say we are already good enough!

*Realism vs. Artistry*
I think no one here is really aiming for perfect realism, right? That is simply unrealistic. Even for CG today, we shoot billions of rays and photons through the computer model and use complex material modeling to accomplish photorealism. But did you know we are still far from modeling the real physics? Even our RGB color space is a much-simplified representation of the color spectrum. In this ray/particle approximation, we cannot model dispersion of light, fluorescence, or polarization. These are doable, but super expensive to compute. If it's really important to see that rainbow after the rain, we have much cheaper ways to accomplish it.

The point is, whether it's CG or music, artists are never limited by the physical world. In fact, for many years, physical modeling in CG was not well accepted by artists because of the constraints enforced by physical rules. They make artists' lives harder. Imagine your paint brush simply couldn't draw a shadow because it wasn't physically correct! Our imagination and creativity go beyond what is physically possible. If we really wanted to listen to something super realistic, we should go to a live concert. But often the recording sounds better than live! The artists who can outdo themselves live are truly talented, and often that's not due to the realism of the instruments.


----------



## zigzag (Aug 30, 2022)

glyster said:


> So, the question for virtual instruments is: if we take a note played by a real instrument and recorded on a stage at various directions/locations, and compare it to the same instrument sampled and processed with the IR of the same room using something like Space 2 or MIR, *can we tell which one was recorded on location and which was processed with a reverb*?


This approach doesn't really produce the desired information and may lead to wrong conclusions. I see this mistake repeated in many industries, especially with comparisons of compression formats. If you ask which one was recorded on location and which one was processed, or, for example, which picture is the original and which one is compressed, you are asking a much harder question than simply whether a difference is audible/visible. What these questions are really asking is whether a person is familiar with the characteristics that a specific process produces and whether he/she is able to identify them. Many may not be able to identify which is which, but still hear or see a big difference. For example, people may prefer less noisy images and therefore identify the compressed picture with the noise removed as the original.

Here is one test that was posted not long ago: https://vi-control.net/community/threads/the-future-of-virtual-instruments.90836/post-5167977

Sampled instruments with the IR applied usually sound harsher. Try it on a percussive sample to easily hear the difference. Choose a library with multiple mic positions, so you have a dry (close mic) recording and a wet (tree mic) recording. The dry recording with the IR applied will have more defined resonances than the wet recording.


----------



## glyster (Aug 30, 2022)

zigzag said:


> This approach doesn't really produce the desired information and may lead to wrong conclusions. I see this mistake repeated in many industries, especially with comparisons of compression formats. If you ask which one was recorded on location and which one was processed, or, for example, which picture is the original and which one is compressed, you are asking a much harder question than simply whether a difference is audible/visible. What these questions are really asking is whether a person is familiar with the characteristics that a specific process produces and whether he/she is able to identify them. Many may not be able to identify which is which, but still hear or see a big difference. For example, people may prefer less noisy images and therefore identify the compressed picture with the noise removed as the original.
> 
> Here is one test that was posted not long ago: https://vi-control.net/community/threads/the-future-of-virtual-instruments.90836/post-5167977
> 
> Sampled instruments with the IR applied usually sound harsher. Try it on a percussive sample to easily hear the difference. Choose a library with multiple mic positions, so you have a dry (close mic) recording and a wet (tree mic) recording. The dry recording with the IR applied will have more defined resonances than the wet recording.


Even just try to determine whether they are similar or not, and by how much. The test is easy to set up:

1. Capture the room IR.
2. Set up mics: close mics to capture the dry sound, plus another set of mics in 3 different locations to capture the wet sound as a baseline.
3. Record all mics while the instrument plays.
4. Apply the IR (with something like MIR 3D) to the dry sound at the 3 locations, and compare with the actual recorded baseline.

I'd be surprised if VSL didn't do something like this, as it's super easy to do, and it would also give them some confidence in how close their MIR 3D is to the real thing.

Even if the difference can't be heard, since we are capturing the exact same sound source, this method can tell you exactly what differences there are, and how large they are, via sound analysis tools.
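For step 4, the comparison can be quantified rather than just auditioned. Here's a minimal sketch, assuming numpy/scipy, single-channel float arrays, and that loading the close-mic take, the measured room IR, and the baseline wet take from disk is handled elsewhere (the arrays below are synthetic stand-ins):

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_wet(dry, ir):
    """Step 4: convolve the dry (close mic) recording with the room IR."""
    return fftconvolve(dry, ir)

def spectral_difference_db(a, b, n_fft=8192):
    """Mean absolute magnitude-spectrum difference, in dB, between two takes."""
    n = min(len(a), len(b))
    A = np.abs(np.fft.rfft(a[:n], n_fft)) + 1e-12
    B = np.abs(np.fft.rfft(b[:n], n_fft)) + 1e-12
    return float(np.mean(np.abs(20.0 * np.log10(A / B))))

# Toy demo with synthetic signals; in the real test you would load the
# close-mic WAV, the measured room IR, and the baseline wet-mic WAV.
rng = np.random.default_rng(0)
dry = rng.standard_normal(48_000)                        # stand-in dry take
ir = np.exp(-np.arange(24_000) / 4_000.0) * rng.standard_normal(24_000)
simulated = simulate_wet(dry, ir)
baseline = simulate_wet(dry, ir)  # identical source here, so the difference is ~0 dB
print(f"{spectral_difference_db(simulated, baseline):.2f} dB")
```

A real analysis would also look at decay times and time-varying spectra, but even this crude number makes "how close is dry + IR to the real room" concrete.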


----------



## Wallander (Aug 30, 2022)

I posted a comparison of a natural tail and an IR of the same hall in this post:






("The future of virtual instruments" thread on vi-control.net)


----------



## justthere (Aug 30, 2022)

glyster said:


> artists are never limited by the physical world. In fact, for many years, physical modeling in CG was not well accepted by artists because of the constraints enforced by physical rules. They make artists' lives harder.


I disagree entirely - this sounds like philosophy, which I'm good with, but less so here in a discussion about tech. Not saying you shouldn't say it, just offering my opinion as well. Artists are almost always limited by the physical world. The world tells an artist what can be accomplished with what's at hand. The very essence of writing for picture is being constrained by what the picture is. If you were doing early CG and didn't like it, it's because you were constrained by it. But even then, you weren't just constrained by that - you were constrained by time and money and the state of the tech. I think I know what you are getting at, if it's that we should be less concerned with the absolute accuracy of models and more with whether it sounds great - but really, the imagination is almost always constrained by the world. That's not the point here. The point is whether or not something sounds enough like the desired effect to move forward and be successful as an endeavor. Considering that people have written tons of music we like with gear we now consider outdated, it's possible to keep writing; all I'm trying to get to is "what gets me to the point where things sound like I want them to - which more or less is how things sound when they are recorded on a great-sounding scoring stage - without killing my computer, in realtime, with a transparent workflow that doesn't drag me out of composing to deal with configuration and doesn't demand that I write some other way so I can use a tool." So maybe we mostly agree on that.


----------



## justthere (Aug 30, 2022)

Wallander said:


> I posted a comparison of a natural tail and an IR of the same hall in this post:
> 
> 
> 
> ...


You live this as a developer, which I utterly respect. But the two variables here for me are: I don't much like hearing that source sound, and I don't know anything about the reverb capture that was done or how it's integrated with the instrument sound. I have heard captures I very much like and some I really, really don't. Can you describe further what we are hearing - how it was done?


----------



## Wallander (Aug 30, 2022)

justthere said:


> You live this as a developer, which I utterly respect. But the two variables here for me are: I don't much like hearing that source sound, and I don't know anything about the reverb capture that was done or how it's integrated with the instrument sound. I have heard captures I very much like and some I really, really don't. Can you describe further what we are hearing - how it was done?


The natural tail is a wet sample. 

The IR tail is the exact same sample but truncated at the time of release, and routed through an IR of the same hall. 

What you should hear is that the natural tail is smooth and more noise-like in quality, a quality which is the manifestation of all overtones being balanced in the sound. The tail is brighter than the direct sound. 

The IR tail is dull and with strong tonal components inherited from the dry sound. The tail is darker than the direct sound.
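The construction described above can be sketched in a few lines, and a spectral centroid is one crude way to put a number on "brighter" vs. "darker" tails. This is my own sketch, assuming numpy/scipy, single-channel float arrays, and a known `release_idx` (the sample index of the note release):

```python
import numpy as np
from scipy.signal import fftconvolve

def ir_tail(wet_sample, release_idx, hall_ir):
    """Truncate the wet sample at the release point, route it through an
    IR of the same hall, and keep everything after the release point."""
    truncated = wet_sample[:release_idx]
    return fftconvolve(truncated, hall_ir)[release_idx:]

def spectral_centroid(x, sr):
    """Amplitude-weighted mean frequency of the spectrum; higher = brighter."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
```

Comparing `spectral_centroid(wet_sample[release_idx:], sr)` (the natural tail) against `spectral_centroid(ir_tail(wet_sample, release_idx, hall_ir), sr)` would quantify the airy-vs-dull difference described in the A/B comparison.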


----------



## justthere (Aug 30, 2022)

Thanks! So the IR was generated by a sweep from the same position as the sample’s room perspective was?


----------



## Wallander (Aug 30, 2022)

justthere said:


> Thanks! So the IR was generated by a sweep from the same position as the sample’s room perspective was?


I believe so, yes.

With that being said, the ambience of a room is almost position-agnostic. When you're paying for a good seat, you're only paying for the first 100 milliseconds.
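If the ambience really is position-agnostic after the first ~100 milliseconds, one practical consequence is a hybrid IR: keep position-specific early reflections and splice on a single shared tail. A minimal sketch assuming numpy, single-channel IRs at the same sample rate, and a hard splice (a real implementation would crossfade at the seam to avoid a click):

```python
import numpy as np

def split_ir(ir, sr, early_ms=100.0):
    """Split an impulse response at ~100 ms into early reflections
    (position-specific) and late tail (largely position-agnostic)."""
    cut = int(round(sr * early_ms / 1000.0))
    return ir[:cut], ir[cut:]

def hybrid_ir(position_ir, shared_ir, sr, early_ms=100.0):
    """Keep the early reflections of `position_ir` and replace its tail
    with the tail of `shared_ir` (one tail reused for every position)."""
    early, _ = split_ir(position_ir, sr, early_ms)
    _, tail = split_ir(shared_ir, sr, early_ms)
    return np.concatenate([early, tail])
```

The appeal is storage and convenience: one long tail shared across seats, with only the short early-reflection part measured per position.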


----------



## justthere (Aug 30, 2022)

Wallander said:


> I believe so, yes.
> 
> With that being said, the ambience of a room is almost position-agnostic. When you're paying for a good seat, you're only paying for the first 100 milliseconds.



Well, it’s apples to apples if the position is exactly the same [EDIT - and if the mics are the same and the preamps are the same etc.] - I guess we are all tolerant of some approximation. But in the past I have used one program for ER’s and one for tail, and as long as they aren’t strikingly different that works pretty well.


----------



## Wallander (Aug 30, 2022)

justthere said:


> Well, it’s apples to apples if the position is exactly the same [EDIT - and if the mics are the same and the preamps are the same etc.] - I guess we are all tolerant of some approximation. But in the past I have used one program for ER’s and one for tail, and as long as they aren’t strikingly different that works pretty well.


It works pretty well, yes, but in an A/B test the difference is striking. You can take the first 100 milliseconds from my example and put it through any IR of your choice. The tail will usually be dark, and never airy.


----------



## Rilla (Sep 4, 2022)

Wallander said:


> It works pretty well, yes, but in an A/B test the difference is striking. You can take the first 100 milliseconds from my example and put it through any IR of your choice. The tail will usually be dark, and never airy.



Have you checked this out? 









Chameleon - Intelligent Reverb Matching (www.accentize.com)





"Chameleon is an intelligent audio plugin which uses artificial neural networks to estimate and model the exact reverb content of any source recording. You can build a reverb profile in seconds and easily apply it to dry studio recordings.

-create unlimited different unique reverbs with a single click
-automatic parameterisation of dry/wet-mixing, stereo-width and pre-delay
-the ideal tool for realistic ADR and foley matching
-useful for creative sound-design or music-production
-extract the natural room-impulse-response of any recording and export as a wav-file"


----------



## glyster (Sep 8, 2022)

I have done some additional testing with VSL libraries. A Synchron instrument with the close mic + MIR Pro does not sound good at all. SYzd woodwind instruments + MIR sound good. Maybe because the tones are simpler?

Others, like horns and strings, using SY close + mid mics + MIR are still not as good as the SY tree mic with or without surrounds. While they sound pretty good with MIR, the tree mics just sound fuller. One thing I'm thinking of is to use a transient designer to remove the reverb from the tree mics and then use MIR to place them in a different room.
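The transient-designer idea can be approximated with two envelope followers: a fast one that hugs the direct sound and a slow one that lingers through the reverb; attenuating the signal wherever the fast envelope falls below the slow one ducks the sustained/reverberant portions. A crude single-channel sketch (the function names and time constants here are my own choices, not any particular plugin's):

```python
import numpy as np

def peak_env(x, sr, release_ms):
    """Peak follower: jumps up instantly, decays with an exponential release."""
    coeff = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = np.empty(len(x))
    e = 0.0
    for i, v in enumerate(np.abs(x)):
        e = v if v > e else coeff * e
        env[i] = e
    return env

def reduce_sustain(x, sr, amount=0.4):
    """Duck decaying portions (sustain/reverb) toward `amount` while leaving
    attacks at unity gain -- a crude transient-designer-style sustain cut."""
    fast = peak_env(x, sr, 20.0)   # hugs the direct sound
    slow = peak_env(x, sr, 200.0)  # lingers through the reverb
    ratio = np.clip(fast / (slow + 1e-12), 0.0, 1.0)  # ~1 on attacks, <1 in tails
    gain = amount + (1.0 - amount) * ratio
    return x * gain
```

A dedicated plugin does this per band and with smoother detection, but even this sketch shows why the trick only partially "dries" a tree mic: the reverb under the sustained note is attenuated, not removed.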


----------



## justthere (Sep 8, 2022)

The Synchron-ized instruments were initially recorded to reproduce the full range of the instrument frequency-spectrum-wise - they are not intended to sound merely close-mic’ed, because the spot mic in an orchestral recording serves a different purpose - they are meant to sound dry. The Synchron library’s close mic is a spot mic intended to add extra clarity to a tree-mic or the like. It’s not recorded with the same kind of mic as a tree mic.


----------



## glyster (Sep 8, 2022)

justthere said:


> The Synchron-ized instruments were initially recorded to reproduce the full range of the instrument frequency-spectrum-wise - they are not intended to sound merely close-mic’ed, because the spot mic in an orchestral recording serves a different purpose - they are meant to sound dry. The Synchron library’s close mic is a spot mic intended to add extra clarity to a tree-mic or the like. It’s not recorded with the same kind of mic as a tree mic.


Yes, this makes sense and also aligns with my simple experiments. Anyone (including myself before trying this) should know we can’t just use the close mic of any wet library (VSL Synchron and other brands) with MIR and expect good results. I just tried using a transient designer to remove some of the sustain from the Synchron tree mic and then fed it into MIR in a different room; the result sounds promising.


----------



## Wallander (Sep 8, 2022)

Rilla said:


> Have you checked this out?
> 
> 
> 
> ...


I didn't try it, no, but I understand it as creating traditional impulse responses, but from musical recordings.


----------



## TonalDynamics (Sep 8, 2022)

Dima Lanski said:


> But put a close mic recording of a string section, or especially a brass section, and it sounds flat


In theory shouldn't the IRs themselves be 'close', 'mid', 'far', etc., based on how far the mics were from the speaker (ostensibly emitting pink noise) at the time of recording, regardless of the space?

Actually now that I think about it, most IRs don't even describe the mic distances, they'll just say 'such and such hall' or 'cathedral', or something -- so this is perhaps a curious disconnect you've happened across here.

But yes, as one of the engineers at EW said earlier, the problem is that all those mics so close together in the semi-sphere construction will inevitably involve many phasing issues -- _but_ I'm going to be optimistic here and go a step further, and say that _if_ you can find some way to phase-align all those IRs, it could in theory result in some kind of sonic breakthrough.
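The simplest form of phase alignment is removing the bulk time offset between each mic's IR and a reference, estimated from the peak of their cross-correlation. This is only a sketch of that first step (true inter-mic alignment across frequency is much harder), assuming numpy and single-channel IRs:

```python
import numpy as np

def align_to_reference(ir, ref):
    """Time-align one IR to a reference by the lag of their peak
    cross-correlation, shifting (with zero padding) to remove the offset."""
    corr = np.correlate(ir, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)  # lag > 0: ir arrives later than ref
    if lag > 0:
        return np.concatenate([ir[lag:], np.zeros(lag)])
    return np.concatenate([np.zeros(-lag), ir[:len(ir) + lag]])
```

This only removes a constant delay per mic; frequency-dependent phase differences between closely spaced capsules would remain and need more sophisticated treatment.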

Again, tedious, but if you can make it work you're an innovator.

Best of luck, keep us informed!


----------



## Rilla (Sep 8, 2022)

Wallander said:


> I didn't try it, no, but I understand it as creating traditional impulse responses, but from musical recordings.



I was thinking that this technology, not necessarily the plugin itself, could be used to copy the reverb of each note of an instrument from a sample library, and then whenever each note (dry instrument) is played the reverb for that specific note could be triggered (which would contain the bloom and air). If the reverb can be derived from a musical source (an instrument) rather than a sine sweep from a speaker, then it would sound less like a trumpet being blown through a loudspeaker. A dynamic reverb that responds differently to each note of the instrument.

ehhh, it would be easier to just use the distance mics of a sample library layered with the dry modeled instrument. But I'm excited about the AI exploration because I can see it getting closer to the kind of realism that IR's don't currently provide.


----------



## Wallander (Sep 9, 2022)

Rilla said:


> I was thinking that this technology, not necessarily the plugin itself, could be used to copy the reverb of each note of an instrument from a sample library, and then whenever each note (dry instrument) is played the reverb for that specific note could be triggered (which would contain the bloom and air). If the reverb can be derived from a musical source (an instrument) rather than a sine sweep from a speaker, then it would sound less like a trumpet being blown through a loudspeaker. A dynamic reverb that responds differently to each note of the instrument.
> 
> ehhh, it would be easier to just use the distance mics of a sample library layered with the dry modeled instrument. But I'm excited about the AI exploration because I can see it getting closer to the kind of realism that IR's don't currently provide.


A problem with that approach is that the reverberation (including the bloom and air) is also present during the sustained note. In a concert hall, the ambiance supports the sustained notes by filling in what's missing from the direct sound. So it's not just about tails but the overall timbre. Artificial reverbs don't improve timbre. They just prolong sounds and filter out some highs and lows.


----------



## Dietz (Sep 9, 2022)

glyster said:


> I'd be surprised if VSL didn't do something like this, as it's super easy to do, and it would also give them some confidence in how close their MIR 3D is to the real thing.


This little example comes quite close to your suggested scenario, I'd say:

-> https://www.synchronstage.com/en/technology#!mir_synchron_tech

... it's not quite what you were asking for, as the recordings in the hall are individual performances, not the ones used for MIR. At least it's the same instrument and the same performer. 

_There is also a short, very exposed and obvious proof-of-concept from 2002 or '03 that can be found somewhere on YouTube. It's a trumpet being played in the Mozart Hall of the Vienna Konzerthaus, and the same melody played with samples, positioned and reverberated by MIR (... actually MIR-like impulse responses applied "by hand", since there was no MIR engine back then). Over the years I took a lot of flak from the usual know-it-alls for this example, yet it was the last piece of the puzzle that convinced us that the concept was worth trying. 8-)_


----------



## glyster (Sep 9, 2022)

Dietz said:


> This little example comes quite close to your suggested scenario, I'd say:
> 
> -> https://www.synchronstage.com/en/technology#!mir_synchron_tech
> 
> ...


Thank you for confirming this. I listened to the harp examples and they sound great. I think the type of instrument makes a big difference. As you know, and as shown in MIR, each instrument has a different sound dispersion profile. The MIR part of the equation is fairly good. The sound source can be a challenge.

While I understand MIR is intended to be used with the VSL VI type of sound sources, asking MIR to work with third-party libraries is probably too much for VSL. But it'd be a lot more impactful if there were some recommendations or premade mixes for the Synchron libraries to use with MIR when the user wants to move the instruments to a different room than the Synchron stage. I have a combination of Synchron and Synchron-ized instruments and all 6 room packs. It seems wasteful not to use them together. Maybe something to consider for a future Synchron update.

As I mentioned in some posts above, I have tried the tree mics with the reverb removed (via a transient designer). This produces a pretty good sound in MIR. If I don't remove the wet reverb, I think I can hear some ringing in the tail when the mics are used directly in MIR. But I'm no expert and really not sure if I'm butchering my sound.


----------



## Dietz (Sep 9, 2022)

glyster said:


> While I understand MIR is intended to be used with the VSL VI type of sound sources, asking MIR to work with third-party libraries is probably too much for VSL.


As a music mixer, I use MIR for non-VSL sources in all my mixes - "real" recordings, most of the time. It's true that there are no detailed presets for 3rd party sources, but nothing prevents you from creating your own settings. The so-called "General Purpose" instrument profiles were made exactly with this task in mind. In the end it's just about finding a nice-sounding spot on a stage of your choice, and determining the balance between dry and wet signal.


----------



## glyster (Sep 9, 2022)

Dietz said:


> As a music mixer, I use MIR for non-VSL sources in all my mixes - "real" recordings, most of the time. It's true that there are no detailed presets for 3rd party sources, but nothing prevents you from creating your own settings. The so-called "General Purpose" instrument profiles were made exactly with this task in mind. In the end it's just about finding a nice-sounding spot on a stage of your choice, and determining the balance between dry and wet signal.


Good to know how it's used in a pro setting with real recordings. This is definitely a path I will explore more. The improvements on the Synchron side are more about ease of use with the Synchron libraries. Many of us aren't mixing experts. It could be as simple as a filtered tree mic preset (with reduced wetness). This could be a feature request for VSL; it would be really good advertising for Synchron + MIR, and it could motivate more people to get MIR or both. cc: @Ben


----------

