
My process for building realistic instrumental sections with SWAM and Sample Modeling instruments.

Damn... This is the best tutorial I've read in a while. Thank you kindly, Rohan. Truly awesome work. The SWAM Engine guys should hire you for this stuff.
 
This is great! Thanks :)
It prompted me to order a Leap Motion controller, haha!
I have been using SWAM strings for a while, layered with Dimension strings, which works very well. I'm certainly interested in trying this method, though :)
Is it difficult setting up the Leap Motion with SWAM strings?
Would you be willing to share a basic preset?

Cheers,
Jon
 
My Leap Motion controller arrived. It was easy to set up and start playing more expressively. I am using it with a LinnStrument, which it plays well with.
It makes dynamic control fast and seamless, and is great for legato playing. A great purchase at $30 used!
The biggest disappointment for me is using it in 'bowing' mode in the SWAM instruments. It seems impossible to get anything other than pp dynamics when trying to bow sustained notes. I have found this to be the case with other expression controllers as well, so I think it is more an issue with the way the SWAM bowing algorithm works than with the controllers. In order to get even moderate volume you have to bow extremely fast, so currently it only seems suitable for tremolo playing.
The controller is excellent for more 'conductor' style expression, though :)
 

The problem is the limited resolution of current MIDI. The "Bowing" gesture computes the derivative of the input expression (i.e. the position of the bow), which has just 128 values (0 to 127). With such limited resolution, it is like having a bow 5 cm long or even less.
We could increase the sensitivity, which would be like having a longer bow, but then the lowest dynamics sound so bad! It is like having a saw-bow!
There is no solution with the current standard MIDI, which uses a single 7-bit value. We are still working on the next major release, which will support high-resolution MIDI (i.e. 14-bit: 16,384 values).
The problem is that the majority of controllers out there do not support it.
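To put the resolution problem in rough numbers, here is a small illustrative Python sketch (not SWAM code; the quantize and bow_speed helpers are made up) showing how a slow, sustained stroke disappears entirely when the bow position only has 128 steps, but is still resolvable at 14-bit:

```python
# Illustrative sketch only (not SWAM code): why deriving bow speed from a
# 7-bit bow position is so coarse compared to 14-bit.

def quantize(position, levels):
    """Map a 0.0-1.0 bow position onto an integer scale with `levels` steps."""
    return round(position * (levels - 1))

def bow_speed(prev, curr, levels):
    """Speed = change in quantized position per control tick, normalized to 0-1."""
    return abs(curr - prev) / (levels - 1)

# A slow, sustained stroke: the bow position moves only 0.3% per control tick.
prev_pos, curr_pos = 0.500, 0.503

speed_7bit = bow_speed(quantize(prev_pos, 128), quantize(curr_pos, 128), 128)
speed_14bit = bow_speed(quantize(prev_pos, 16384), quantize(curr_pos, 16384), 16384)

print(speed_7bit)    # 0.0    -> the slow stroke is invisible at 7-bit
print(speed_14bit)   # ~0.003 -> still resolvable at 14-bit
```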

That's why I've pointed to Pen2Bow: finally a controller that exploits the "Bipolar" bowing gesture and overcomes the limited length of the "Bowing" gesture.

Best,
Emanuele
 
I haven’t seen much chat here on using SWAM or Sample Modeling instruments in sections, and thought I’d pipe in with my process for anyone who might find it interesting.

Late to this. Thanks for sharing with us! Super interesting. You got the strings to sound so good, at pp especially! I don't have any SWAM strings, but do have the saxes (+ SM brass). It's hard to consistently get the attack right on the saxes (I need a lot more practice!). Not so easy to get the attack to not stick out in an ugly way. Your BC input curve has to be right, but even then sometimes it's like driving a car with overly sensitive power steering; or piloting a helicopter :laugh:. I wonder if it's similar on hard string attacks?
 

Thanks for the information, Emanuele, that makes sense. I think that Gecko sends 14-bit MIDI, so I'm looking forward to your next release :) Keep up the good work!

Pen2Bow does look cool. I don't think I would use an iPad for anything else, though!
 
Thanks for this, @rohandelivera! I found your Tchaikovsky video a couple of months ago when I found out about SWAM instruments, and you're one of the very few people sharing how they combine these instruments to recreate an orchestra.

I'm thinking of diving into doing the same. I'll probably set up some automation to humanize MIDI parameters (like velocity, length and CCs) in order to avoid recording a lot of parts over and over again. I use Reaper, so creating a button that executes a couple of scripts in a certain order should do the trick, and a little tweaking of the code could change the amount of "humanization" applied to each copy of the original MIDI recording.

I'm even thinking of having multiple buttons to apply different amounts of MIDI humanization to the different subsections of the orchestra (e.g. apply 2% to 1st violins and 3.5% to 2nd violins). Have you tried something like this? I would love to develop this idea even further.
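As a rough illustration of that idea, here is a minimal plain-Python sketch (not an actual ReaScript; the note format and the humanize_notes helper are hypothetical) of applying a different humanization percentage to each section's copy of a recorded part:

```python
import random

# Hypothetical sketch (plain Python, not a ReaScript): nudge start, length and
# velocity of each note copy by up to +/- pct percent of a sensible range.

def humanize_notes(notes, pct, ppq=960):
    out = []
    for n in notes:
        def jitter(value, scale):
            return value + random.uniform(-1, 1) * (pct / 100.0) * scale
        out.append({
            "start":    max(0, jitter(n["start"], ppq / 4)),          # fraction of a 16th note
            "length":   max(1, jitter(n["length"], n["length"])),
            "velocity": int(min(127, max(1, jitter(n["velocity"], 127)))),
            "pitch":    n["pitch"],
        })
    return out

recorded = [{"start": 0, "length": 480, "velocity": 80, "pitch": 60}]

violins_1 = humanize_notes(recorded, pct=2.0)   # subtler copy for the 1st violins
violins_2 = humanize_notes(recorded, pct=3.5)   # slightly looser copy for the 2nds
print(violins_1, violins_2)
```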
 
The script-based approach to this is definitely something I've thought about, too, though I haven't had time to actually experiment with it yet. A couple of the thoughts I've had, in no particular order, that might be of interest to you:

You need to be able to generate CC data that resembles an original input but is different enough from it to produce a distinct performance. This means you should probably be looking for ways to abstract important pieces of general info about notes' CC data. For example, with expression data, you'd be looking at things such as average level, level at the start of the note, level at the end of the note, amount of change between the levels at the start and end, min/max values, location of the fastest change in level, etc. Knowing characteristics like that about a note should allow you to tweak those characteristics and recombine them to get a performance that both works for the musical context but also differs from the original. It would also be helpful to have a script to generate data about how multiple performances compare in terms of these characteristics, which would help you find the ideal randomization ranges for these characteristics by looking at how real performances of the same material differ.
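As a rough sketch of what that abstraction step might look like (plain Python, with an assumed list of (time, value) CC points per note; the cc_characteristics helper is made up for illustration):

```python
# Sketch: abstracting the per-note expression characteristics listed above
# from a note's CC curve, given as (time, value) pairs (format assumed).

def cc_characteristics(points):
    values = [v for _, v in points]
    # Biggest jump between consecutive points marks the fastest change in level.
    deltas = [(abs(points[i + 1][1] - points[i][1]), points[i + 1][0])
              for i in range(len(points) - 1)]
    fastest_change_at = max(deltas)[1] if deltas else points[0][0]
    return {
        "average":           sum(values) / len(values),
        "start_level":       values[0],
        "end_level":         values[-1],
        "start_end_change":  values[-1] - values[0],
        "minimum":           min(values),
        "maximum":           max(values),
        "fastest_change_at": fastest_change_at,
    }

note_cc = [(0.00, 20), (0.05, 55), (0.10, 90), (0.50, 95), (0.90, 70)]
print(cc_characteristics(note_cc))
```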

You should also be able to abstract a general playing style with regard to how the original follows the tempo -- for example, on loud, short notes, does it generally anticipate the beat? Does this differ from long, quiet notes? How does the average grid adherence change throughout the piece? Like with the CCs, if you can abstract this sort of data from the performance, you should be able to tweak it then reconstruct it all in a way that still makes musical sense.
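A similarly hedged sketch of the timing side, assuming notes are simple dicts with start/length/velocity in ticks (the grid_offsets and average_offset helpers are hypothetical):

```python
# Sketch (note format assumed): measuring how the performance sits against the
# grid, split by the note categories mentioned above.

def grid_offsets(notes, grid=480):
    """Signed distance of each note start from the nearest grid line, in ticks."""
    offsets = []
    for n in notes:
        nearest = round(n["start"] / grid) * grid
        offsets.append(n["start"] - nearest)     # negative = anticipates the beat
    return offsets

def average_offset(notes, predicate, grid=480):
    subset = [n for n in notes if predicate(n)]
    offs = grid_offsets(subset, grid)
    return sum(offs) / len(offs) if offs else 0.0

performance = [
    {"start": 470, "length": 120, "velocity": 110},   # loud, short, a touch early
    {"start": 965, "length": 900, "velocity": 45},    # quiet, long, a touch late
]

loud_short = average_offset(performance, lambda n: n["velocity"] > 90 and n["length"] < 240)
quiet_long = average_offset(performance, lambda n: n["velocity"] < 60 and n["length"] > 480)
print(loud_short, quiet_long)   # -10.0 and 5.0 with this toy data
```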

If you're doing large sections, it might be useful to set the scripts up to accept multiple tracks as inputs then record a couple of takes instead of just one so the scripts have a bit more variety to work with. I expect that even just having a script that generates an average of multiple performances would go a long way in speeding up the whole process even if it didn't do any actual humanization.
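The averaging idea on its own is tiny to sketch, assuming the takes have already been resampled onto a common time step so the CC lists line up point for point:

```python
# Sketch: averaging the same CC lane across several takes. Assumes each take has
# already been resampled onto a common time step so the value lists line up.

def average_takes(takes):
    """takes: list of equal-length CC value lists, one list per recorded take."""
    return [round(sum(frame) / len(frame)) for frame in zip(*takes)]

take_1 = [20, 40, 70, 95, 90, 60]
take_2 = [25, 45, 65, 90, 92, 55]
take_3 = [18, 38, 72, 99, 88, 58]

print(average_takes([take_1, take_2, take_3]))   # [21, 41, 69, 95, 90, 58]
```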

If you wanted to get really into it, it might be possible to come up with a quasi-machine learning system that can be trained -- it generates a bunch of possible versions, then you mark the ones that work best, and the system remembers the parameters used in those versions and tries to apply them to other similar situations.

Again, the key to all of this is that you need to be able to abstract meaningful data about the performance rather than just randomizing arbitrary parameters, and I think the way to go about determining what is meaningful in this context is to ask "what am I hearing when I listen critically to a performance?" and not "what parameters does MIDI make it easy to adjust?"
 

I think your thoughts on the matter are well intended, but I personally differ, as I think they could produce results different from the ones that, at least in my case, I'm after.

What you describe is a common data-analysis and machine-learning scenario, often used in software development to mimic certain behaviors to the point of accurate replication. That's very useful if what you want is to define a "performance" of your own and then build an automation system that prints that same "performance style" onto each of your orchestral productions. On its own that sounds like a cool idea, to be honest, but any instrument performer will probably tell you that you never really have a fixed "performance style" in anything you do unless you spend a vast amount of time playing in similar scenarios over and over again (your performance when practicing an instrument probably won't be the same in a live situation, because of multiple variables). That's why people often debate which live performance by the same musician they like the most (or even which they disliked the most).

I think what I'm trying to say is that, in my experience, people make mistakes in their own way, but under pressure they probably won't make similar mistakes at the same rate, or within the same "randomization range" as you put it.

Coming back to the script topic: I actually tried this script the other day and it seems to be just what I was looking for. I tested it and even duplicated it with a different percentage just to see the changes, and it works great.

rather than just randomizing arbitrary parameters

what is meaningful in this context is to ask "what am I hearing when I listen critically to a performance?" and not "what parameters does MIDI make it easy to adjust?"

That's actually the part I need to test now, as my idea was not to randomize arbitrary parameters (MIDI doesn't make any parameter easier to adjust than others; your DAW does, and in Reaper they all seem easily editable) but specific ones, which means checking out all the parameters the folks at Audio Modeling have made available on their instruments. I would like to create a dedicated version of the script so that, instead of humanizing the selected CC values, it humanizes all the values on specific CCs, just to save some more time.

I wish I could not only theorize about this topic but also experiment and simply say whether it works or not, but until customs in my country and my local post office shake hands and decide to give me the breath controller I was supposed to receive over a month ago, I can't put my money where my mouth is. For now, I find this exchange of ideas very appealing and would love to hear more like it.


Edit: Forgot to add the script link, my bad.
 

To me, it looks like the results from that script are likely to be either too similar to the input (if using a low randomization percentage) or too spiky and disjointed to be usable (if using a higher percentage). I'd definitely be interested in hearing anything you come up with using it, though. It may well be that the approach I'm thinking of unnecessarily overcomplicates things.

I think we're talking about two different things when we say parameters, and that's probably my fault for choosing a word that already has a pretty clearly defined meaning in terms of virtual instruments, which is the way you're using it. What I'm getting at when talking about parameters -- characteristics might be a better word -- is that the MIDI spec broadly represents notes by defining certain characteristics about them: start time, pitch, end time, velocity, and a collection of CC data points being the most important. But the MIDI spec has no inherent representation of things like attack time. If you're using an instrument whose level is primarily controlled by, say, CC11, then understanding the attack characteristics of a MIDI note requires abstracting the early part of the note's CC11 data and interpreting its effect on the instrument.

So if we want to randomize the start point of a note, which is directly characterized by the MIDI spec, we look up the start point of the note in the MIDI data and randomize a single number. But if we want to randomize the level or length of the attack portion of a note, we first have to decide what collection of MIDI data represents the attack, then come up with an algorithm able to locate that data, then come up with an algorithm that can process the data as a collection rather than as individual numbers. If the MIDI spec attached ADSR envelopes to notes rather than relying on individual CC data points, modifying a note's attack characteristics would simply be a matter of editing a couple of easy-to-find numbers, just as editing the note start is. That's what I'm getting at when I talk about the things that MIDI makes it easy to adjust vs. the things about notes that we actually hear.
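To make that contrast concrete, here's an illustrative Python sketch (not tied to any real DAW API; the "first local peak" definition of the attack is just one possible assumption):

```python
import random

# Illustrative sketch (not tied to any real DAW API): nudging the note start
# is a single number, while reshaping the attack means first locating a span
# of CC data and processing it as a collection.

def randomize_start(note, max_ticks=15):
    note = dict(note)
    note["start"] += random.randint(-max_ticks, max_ticks)   # one easy-to-find number
    return note

def attack_region(cc_points, note_start, note_end):
    """CC points from the note start up to the first local peak -- one possible
    (assumed) definition of 'the attack portion'."""
    span = [(t, v) for t, v in cc_points if note_start <= t <= note_end]
    for i in range(1, len(span) - 1):
        if span[i][1] >= span[i + 1][1]:          # curve stops rising
            return span[:i + 1]
    return span

def scale_attack(cc_points, note_start, note_end, level_scale):
    """Rescale only the attack portion of the curve, leaving the rest alone."""
    attack_times = {t for t, _ in attack_region(cc_points, note_start, note_end)}
    return [(t, min(127, round(v * level_scale)) if t in attack_times else v)
            for t, v in cc_points]

cc11 = [(0, 10), (10, 40), (20, 85), (30, 80), (200, 75), (400, 60)]
print(randomize_start({"start": 0, "length": 400, "pitch": 60}))
print(scale_attack(cc11, note_start=0, note_end=400, level_scale=1.2))
```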

I guess that's why I'm skeptical about the usefulness of the above script -- it randomizes the CC points as individual things, but we don't hear the CC points as individual things; we hear broader note characteristics that are built out of the CC points, and I expect that randomizing the characteristics that we hear will give better results than randomizing the individual data points that those characteristics are made from. It's like how changing an image of a face by randomizing the shapes of its features will work better than changing it by randomizing the colors of individual pixels.
 

Yes, I think I'm getting a better sense of what you're getting at, and sure, we should be aiming to "humanize" what we hear, which is kind of tricky, since we're not trying to humanize audio but MIDI CC values.

Correct me if I'm wrong, but it seems like what you want to do is control the transients with the algorithm you're describing, is that right? If so, I'm completely in the dark, as I can't really tell how the Audio Modeling instruments shape their audio output based on the MIDI CC values they receive from the user; this is again because I haven't had the chance to play with their instruments yet, which is unfortunate.

From what I've seen in some tutorials online, it seems like the transients are mainly controlled by velocity and, in some Sample Modeling instruments, there were keyswitches to provide some flexibility, but that was about it. Other than that I guess you could use external help like compressors, but were you thinking of something else besides that? Or were you mainly pointing out that, besides humanizing MIDI CC values, we should also aim to modify each transient differently as part of the process, to create even more differences between the instrument lines?
 
Hello. Yes, that’s how my Escapades mock-up happened.

I copied and pasted the MIDI note data, which I randomised using Logic's MIDI Transform, and then re-performed all the controller data for every track. Much faster than playing everything multiple times.



 
Mostly I'm using the attack characteristics of a note as an example of the sort of thing that I think ought to be focused on by a script that auto-generates additional takes of a performance. Attack isn't uniquely important in a broader context, but it's a good example of something that substantially influences how you perceive the note but is difficult to modify via script without first building some fairly robust MIDI parsing tools.

The reason, I think, that no one has yet come up with a script to generate alternate takes is that it's going to be difficult to do it well. Not impossible -- I'm confident it can be done -- just difficult.
 
Very interesting post, and the mock-ups are just incredible. One point I'm not entirely in agreement with, though, is this:



My main problem with sampled orchestras is that they are rhythmically not as tight as real players, in direct contradiction to your point above. The problem is that in a sampling session the players have no musical context in which to time their playing, so the attacks vary without any musical context, making them sound rocky and a bit arrhythmic. I often have to bounce out and tighten the audio, but it rarely works. Some libraries are worse than others.

Actually what happens is that a good band locks onto a groove or feel that they all experience together, so rather than the cascading effect you describe, you get a coherent performance in a way that is almost impossible to achieve with samples, except by accident.

Which isn't to say that you aren't correct that those individual variations in timbre and performance create the ensemble effect. It's amazing what you have done with these modelled instruments, and your hand modulator is incredible! Truly fantastic work and an extremely interesting and meaty post... I note that you are also a "Rohan". Correct spelling too, I'm glad to see. :)
Surprised this post hasn't gotten more love. I think you're spot on.
 
The way to humanise a synthetic performance is to permute the data from an existing real performance rather than randomise using an arbitrary distribution. I've been using that method for years and it gives better results. If you do want to use a random distribution, a uniform one is probably the worst choice, yet it seems to be what everyone uses. I imagine that's because the people writing the code are coders rather than psychologists and musicians.
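For what it's worth, a minimal Python sketch of that permutation idea (the data here is made up; the point is only that the deviations are drawn from a real take rather than from a uniform random range):

```python
import random

# Sketch of the idea above: draw timing deviations from a real performance
# (here simply by shuffling them between notes) instead of from a uniform range.

def permute_offsets(real_offsets, quantized_starts):
    """Reassign measured human offsets (in ticks) to a quantized copy of the part."""
    pool = real_offsets[:]
    random.shuffle(pool)                       # permute the real deviations, don't invent new ones
    return [start + off for start, off in zip(quantized_starts, pool)]

measured = [-12, 4, -3, 9, -7, 2]              # offsets taken from a played take
quantized = [0, 480, 960, 1440, 1920, 2400]    # a dead-on-the-grid copy

print(permute_offsets(measured, quantized))
```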
 
Hi Rohan, are you using the TEC Breath Controller, or the Breath and Bite Controller? Any thoughts on whether it's worthwhile shelling out the extra cash for the breath and bite model? I'll probably pick up a Leap Motion controller first since it's inexpensive, but I wouldn't mind grabbing a TEC controller later on.
 