# Dynamic Time Warping Tool



## Darius (Dec 29, 2016)

Sample developers such as Sample Modeling and Chris Hein seem to use some sort of implementation of Dynamic Time Warping.

The intention of DTW is to remove phase (and therefore pitch) differences between crossfading samples, which almost entirely eliminates the perceptual difference between the samples.
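(For anyone unfamiliar with the technique: below is a minimal, textbook DTW sketch in Python. It's purely illustrative - nothing to do with any developer's actual implementation - and just aligns two 1-D signals by finding the minimum-cost warp path through their pairwise distance matrix.)

```python
import numpy as np

def dtw_path(a, b):
    """Textbook dynamic time warping between two 1-D signals.

    Returns the accumulated-cost matrix and the minimum-cost warp
    path as (i, j) index pairs pairing samples of `a` with `b`.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # step in a only
                                 cost[i, j - 1],       # step in b only
                                 cost[i - 1, j - 1])   # step in both
    # Walk back from the far corner to recover the optimal path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[1:, 1:], path[::-1]
```

Resampling each signal along its half of the warp path lines the waveforms up in time, which is what makes the crossfade phase-coherent.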

Here's my own tool for constructing phase aligned samples.

(Edit:) Nine trombone samples being crossfaded, post-warping. (Excuse that my soft notes are a little wobbly - my trombone playing is rusty!)

(Edit 2:) Tuba example:


----------



## EvilDragon (Dec 29, 2016)

And it can also be done in Reaper, or most any DAW that has audio warping, too...


----------



## Darius (Dec 29, 2016)

EvilDragon said:


> And it can also be done in Reaper, or most any DAW that has audio warping, too...



Yes. Yes they can. You can try it on tape or vinyl if you're so inclined. You could play it back on a monitor and wave a microphone back and forth to affect the phase!

For your consideration, as an addition to those existing methods, here is a custom-made script for precise analysis and reconstruction.


----------



## EvilDragon (Dec 29, 2016)

OK, so something similar to what Elan Hickler already did for Reaper with some Lua scripts...


----------



## Darius (Dec 29, 2016)

I haven't come across Hickler's scripts before, although I can see from his website that he's got some phase-locked flutes.


----------



## d.healey (Dec 29, 2016)

EvilDragon said:


> OK, so something similar to what Elan Hickler already did for Reaper with some Lua scripts...


Scripts you say... Publicly available?


----------



## Elan Hickler (Dec 29, 2016)

Hey guys, no, they're not Lua scripts - it's embedded in a REAPER plugin. I currently do phaselocking for my clients. It works well for instruments in the cello range and higher. For anything lower than a cello (the lowest notes of the cello already start to get iffy, by the way, and trombone/lower brass is pretty iffy all around), I run into a problem where the individual cycles of the waveform show up as impulses rather than discrete frequencies/harmonics (when it comes to FFT analysis). Because of this I've lost my inspiration to continue development; there's a lack of funding, and of knowledge of how to deal with it. I know someone has the answer somewhere.
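(A quick back-of-the-envelope check makes the low-register problem concrete: once the fundamental drops near or below the bin width of the analysis FFT, neighbouring harmonics can no longer be resolved and each frame looks more like a string of impulses than a set of partials. The 40 Hz fundamental and 1024-sample window below are illustrative assumptions, not Elan's actual settings.)

```python
SR = 44100          # sample rate
F0 = 40.0           # illustrative tuba-range fundamental
N_FFT = 1024        # a typical analysis window length

# Harmonics of a 40 Hz tone sit 40 Hz apart, but the FFT can only
# resolve features wider than one bin:
bin_hz = SR / N_FFT
print(f"harmonic spacing {F0:.1f} Hz vs FFT bin width {bin_hz:.1f} Hz")

# The bin width (~43 Hz) exceeds the harmonic spacing, so neighbouring
# harmonics smear into each other in every analysis frame.
```

Longer windows buy frequency resolution, but at the cost of smearing exactly the transients a sampled instrument depends on - the classic time-frequency trade-off.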

I would like to get phaselocking technology out to the masses, but because of problems above, it's not happening at the moment. (Edit: Development will continue if I start to make a profit with the audio plugins I am working on)

If you need your instruments phaselocked, contact me.


----------



## Joe_D (Dec 29, 2016)

Thanks, Darius and Elan,

It's great to see progress in this area. It would be excellent if either or both of these efforts eventually make their way into the hands of a wider pool of developers and musicians, including end users who might make their own instruments. Phase locking and similar tools are a significant step forward for sampling, IMO.


----------



## Darius (Dec 30, 2016)

I started off in my early tests with frequency-domain processes, although I encountered problems similar to the ones Elan described. The tool you see here works entirely in the time domain, which has the benefit of not losing any data (especially transients) to a frequency-domain representation, but it has its own caveats. I also have promising early results from an implementation of frequency-domain analysis I've not seen before, which is likely to significantly speed up the 'brute force' calculations I currently use.
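(For illustration only - this is a generic sketch, not Darius's tool - the kind of brute-force time-domain search alluded to above can be as simple as sliding one sample against another and scoring every candidate lag by normalised cross-correlation:)

```python
import numpy as np

def best_lag(ref, sig, max_lag):
    """Return the integer lag (in samples, within +/-max_lag) by which
    `sig` is delayed relative to `ref`, found by brute-force search
    over normalised cross-correlation scores."""
    def score(lag):
        a = ref[max_lag : len(ref) - max_lag]
        b = sig[max_lag + lag : len(sig) - max_lag + lag]
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(range(-max_lag, max_lag + 1), key=score)

# Demo: a sine delayed by 7 samples is recovered exactly
t = np.arange(1000)
ref = np.sin(2 * np.pi * t / 200.0)
print(best_lag(ref, np.roll(ref, 7), 20))  # → 7
```

A frequency-domain approach (e.g. FFT-based correlation) evaluates all lags at once rather than one at a time, which is presumably the sort of speed-up being described.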

I have similar concerns about funding continued development of these applications. I suspect this kind of technology is actually almost exclusively part of the knowledge economy: it may not be financially viable to develop a 'general purpose' application on its own, but it can find market stability in bespoke applications for clients.


----------



## d.healey (Dec 30, 2016)

Darius said:


> I have similar concerns about funding continued development of these applications. I suspect this kind of technology is actually almost exclusively part of the knowledge economy. It may not be financially viable developing a 'general purpose' application on its own, but instead it finds market stability in bespoke applications for clients.


Donations, sell it to developers, kickstarters, make it GNU or open source, just use it privately - lots of options


----------



## Elan Hickler (Dec 30, 2016)

Darius said:


> The tool you can see here works entirely in the time domain which benefits from not losing any data (especially transients) to freq domain representation, but has its caveats, too.


My method deals with this via a simple separation of harmonic and inharmonic content. The result is that the phaselocked sample is nearly indistinguishable from the original. If you work solely in the time domain at some point you will run into phasing issues when you attempt to crossfade two or more samples; the phases between the individual harmonics will be out of alignment. This is especially problematic when you want to do pitch crossfading (crossfade from one pitch/sample to another). So, my method allows for this.
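(The phasing issue described here is easy to reproduce synthetically. The sketch below - illustrative numbers, not anyone's real samples - crossfades two tones that are identical except that the second harmonic of one is phase-inverted; that one partial cancels in the mix even though the fundamentals stay perfectly aligned:)

```python
import numpy as np

SR = 44100
t = np.arange(SR) / SR        # exactly one second
f0 = 220.0

def tone(phases):
    """Three harmonics of f0 with the given per-harmonic phase offsets."""
    return sum(np.sin(2 * np.pi * f0 * (k + 1) * t + p) / (k + 1)
               for k, p in enumerate(phases))

a = tone([0.0, 0.0, 0.0])
b = tone([0.0, np.pi, 0.0])   # identical, except 2nd harmonic inverted

mix = 0.5 * a + 0.5 * b       # midpoint of a linear crossfade

def rms(x):
    return np.sqrt(np.mean(x ** 2))

# The 2nd harmonic cancels entirely in the mix, hollowing the timbre:
print(rms(a), rms(mix))
```

Aligning the fundamental alone is not enough: each partial's phase relationship decides whether it survives the crossfade.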

I think the solution for low brass is to go further into the idea of harmonic/inharmonic separation. If I had the software to do this, I would have the client record, for every pitch, a sample from loud to soft. That sample would be the representative sample for the impulses of every amplitude. I would separate the impulses from all other samples and apply the representative sample's impulses to all samples after I perfect the phases and frequencies of all harmonics... eh... maybe it wouldn't work.

I think SampleModeling is laughing at us. They have clearly solved this problem. Their brass instruments sound like perfection to me.


----------



## Elan Hickler (Dec 30, 2016)

Having said all that, let me say something in a different vein. We sample library developers need to start moving in the direction of phaselocking/resynthesis/granular synthesis/time-locked crossfading (every sample crossfaded based on its phase, e.g. legato samples)/etc. We should move in a direction where far fewer samples are recorded, in favor of recording "representative samples" that can be extrapolated and reconstructed in real time to create infinite variation and expression.

Christoph Hart is moving in this direction with HISE and wants to help other developers move in this direction as well. http://hise.audio

I predict the future of sampling will be as follows:

1. Leave behind the old method of sampling (recording everything) and transition to representational sampling.
2. Leave behind recording samples altogether and transition to mathematical models / 100% synthesis.
3. Leave behind the strict/lacking-character mathematical models and transition to artificial-intelligence-based synthesis which will allow greater degree of nuance and expression.
4. Quantum computers are the norm and we can now simulate an orchestra in complete detail as if it existed in the real world.

...I think I'm missing a few steps between 3 and 4.


----------



## Darius (Dec 31, 2016)

Elan Hickler said:


> My method deals with this via a simple separation of harmonic and inharmonic content. The result is that the phaselocked sample is nearly indistinguishable from the original. If you work solely in the time domain at some point you will run into phasing issues when you attempt to crossfade two or more samples; the phases between the individual harmonics will be out of alignment. This is especially problematic when you want to do pitch crossfading (crossfade from one pitch/sample to another). So, my method allows for this.



I've not yet encountered this problem, having warped different instruments together over an octave (but usually only in pairs of samples) - I'll have to investigate this further and get back to you!



Elan Hickler said:


> I think the solution for low brass is to go further into the idea of harmonic/inharmonic separation. If I had the software to do this, I would have the client record, for every pitch, a sample from loud to soft. That sample would be the representative sample for the impulses of every amplitude. I would separate the impulses from all other samples and apply the representative sample's impulses to all samples after I perfect the phases and frequencies of all harmonics... eh... maybe it wouldn't work.



Like http://hartinstruments.net/technology.php (Christoph's example)? I think your method of DTW is more like my timbral modification tool, while my method has its roots in non-audio science and statistics. It would be possible to take multiple IRs from a single crescendo and imprint them onto another waveform with a few tweaks. My timbral mod tool really is a way to produce samples of this kind while we're limited by Kontakt's capabilities - I'll have to revisit HISE and look more carefully at what it's capable of.

I had a little look at the SM trumpet (Kontakt) source samples a while back; it seemed they had phase-locked sustains with the dynamic attacks kept separate... a good approach if your attacks are going to be blurred by filtering.

Indeed, in the future, we're going to need methods of producing sound that is not only highly flexible, but has the right means of control.


----------



## Elan Hickler (Dec 31, 2016)

Darius said:


> Like http://hartinstruments.net/technology.php (Christoph's example?)


Yes and no. My clients seem to want to record many, many samples rather than a tiny set of representative samples. Therefore we need a way of making all samples work together. That's why I am suggesting a representative sample be used as a "fingerprint" for the rest, somehow.


----------



## Darius (Jan 1, 2017)

Elan Hickler said:


> Yes and no. My clients seem to want to record many many samples rather than a tiny set of representative samples. Therefore we need a way of making all samples work together. That's why I am suggesting a representative sample be used as a "fingerprint" for the rest, somehow.



I see... I think it could be done by comparing the IRs the way I have been, but using many windows (I currently analyse a whole sample as a single window). I suspect there are a few other things this approach can achieve...



Elan Hickler said:


> If you work solely in the time domain at some point you will run into phasing issues when you attempt to crossfade two or more samples



I don't seem to be encountering the issues you've had with phasing partials. Are you suggesting the harmonics' locked phases mean they interfere destructively on crossfading? Or perhaps that the partials change phase over time and cause varying amounts of constructive/destructive interference?

Here are some trombone samples I slapped together. My _piano_ sustains are somewhat wobbly due to my rusty lip skills, but my DTW program managed to (just about) keep them in phase while still leaving them somewhat wobbly. In the spirit of "it's not a bug, it's a feature!", perhaps I could say it retains some of the original "human variation".


----------



## Elan Hickler (Jan 1, 2017)

Darius said:


> Are you suggesting the harmonics' locked phases mean they interfere destructively on crossfading? Or perhaps that the partials change phase over time and cause varying amounts of constructive/destructive interference?



A musician will not be able to record an instrument over several hours from the exact same spot. The musician will be at slightly different distances from the mic and the walls of the room, creating many differences in phase per partial. Different pitches produce different wavelengths, which return to the microphone differently, again causing the phases of the partials to differ. If the musician sways a little during a sustain recording, again the phase is affected. Yes, the phase of partials definitely changes over time for all instruments recorded in the real world. Just take any recording, separate out the harmonics (1st, 2nd, 4th, 8th, 16th) and watch them drift around in relation to each other. Not to mention the problem with time-domain pitch detection: pitch detection of a signal is only as reliable as the stability of the phase of all its partials.
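(That drift can be measured directly. The sketch below - with a made-up 0.7 Hz detuning standing in for player and room movement - demodulates two partials of a synthetic tone and tracks the phase of the second harmonic relative to the fundamental:)

```python
import numpy as np

SR = 44100
t = np.arange(SR * 2) / SR
f0 = 220.0
# Second partial sits 0.7 Hz sharp of 2*f0, mimicking slow drift:
sig = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * (2 * f0 + 0.7) * t)

def partial_phase(x, freq, start, win=4096):
    """Phase of the partial nearest `freq` over one analysis window."""
    n = np.arange(start, start + win)
    demod = x[start:start + win] * np.hanning(win) * np.exp(-2j * np.pi * freq * n / SR)
    return np.angle(np.sum(demod))

# Relative phase of the 2nd harmonic vs the fundamental, ten frames
# spaced 0.1 s apart (for a true harmonic it would stay constant):
drift = np.unwrap([partial_phase(sig, 2 * f0, s) - 2 * partial_phase(sig, f0, s)
                   for s in range(0, SR, 4410)])
print(drift[-1] - drift[0])   # several radians of drift in one second
```

Even a sub-hertz detuning walks the partial several radians out of alignment within a second, which is exactly what makes naive whole-waveform alignment fragile.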

There are many examples of why working purely in the time domain is a bad idea. Try this: tune a sample based only on its fundamental. Now tune it based on its 4th, 8th, 16th or 32nd harmonic. You get different results. No single harmonic is a representation of the frequency of a piece of audio, and the time-domain waveform is a sum of all the partials - not a good enough representation of frequency for phaselocking. You can't phaselock a sample based on one partial, a few partials, or all partials combined; phaselocking requires that every partial be individually manipulated and aligned.
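(The same point in code: build a stiff-string-style inharmonic tone - the coefficients below are illustrative - and ask different harmonics what the "fundamental" is. They disagree:)

```python
import numpy as np

SR, N = 44100, 1 << 16
t = np.arange(N) / SR
f0, B = 110.0, 2e-4   # nominal fundamental and an inharmonicity coefficient

# Stiff-string-style partials: f_k = k * f0 * sqrt(1 + B * k^2)
sig = sum(np.sin(2 * np.pi * k * f0 * np.sqrt(1 + B * k * k) * t) / k
          for k in range(1, 17))

spec = np.abs(np.fft.rfft(sig * np.hanning(N)))
freqs = np.fft.rfftfreq(N, 1 / SR)

def f0_from_harmonic(k, search_hz=60):
    """Estimate the fundamental by finding the spectral peak near
    k * f0 and dividing its frequency by k."""
    lo = np.searchsorted(freqs, k * f0 - search_hz)
    hi = np.searchsorted(freqs, k * f0 + search_hz)
    return freqs[lo + np.argmax(spec[lo:hi])] / k

# Each harmonic implies a different "fundamental":
print(f0_from_harmonic(1), f0_from_harmonic(8), f0_from_harmonic(16))
```

The higher the harmonic used as the tuning reference, the sharper the implied fundamental - so any single-reference time-domain tuning is necessarily a compromise.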

You'll run into these problems eventually if you work with hundreds of samples all from different people, places, and recording techniques.


----------



## Darius (Jan 1, 2017)

Ah... I've realised I haven't mentioned: this method is exclusively for dry samples! If only because otherwise I'd be retuning the reverb tails of historic impulses... (yuck)



Elan Hickler said:


> Just take any recording, separate out the harmonics (1st,2nd,4th,8th,16th) and watch them drift around in relation to each other.


I did with a few samples; the dry samples' harmonics remained in phase throughout, while you're right that the wet samples do change in phase over time.
If you've got some dry samples that show this is not the case, I'll be happy to hear them. But I'm _wildly_ speculating that the phase of harmonics from a consistently resonating source is stable, and is part of the signature of that individual sound... when dry.

Even if this could happen when wet, the fact that you'd have to play back the wet sample with _changing_ gain and _changing_ pitch in the sample engine means that all the impulses within the audible perceptual history suddenly change their reverberation characteristics (i.e. bending old reverb). Which is why I'm reckoning dry is the way to go!


----------



## Elan Hickler (Jan 1, 2017)

mmm, ok. When you can phaselock low brass, let me know!


----------



## Mike Greene (Jan 2, 2017)

I've played with this a bit myself, and my experiences have been along the lines of what Elan is saying. But... Darius's trombone example sounds very convincing to me. Way, way better than I would have expected from a time-domain lock.


----------



## Darius (Jan 2, 2017)

Elan Hickler said:


> mmm, ok. When you can phaselock low brass, let me know!


Found a tuba player. Kidnapped him.



----------



## Elan Hickler (Jan 2, 2017)

Hmmm... but... those samples aren't demonstrating the problem.

Ok I just did some tests on some very spiky brass waveforms, the results are surprisingly GOOD (using only time domain manipulation):

*Dynamic crossfade*
Trombone_spiky_phaselocked_soft-loud-soft.wav

*Pitch crossfade*
trombone_spiky_phaselocked_loud_D#1_to_F#1.wav
I am not using any frequency-domain stuff for these tests. I am realizing something: the problems I am describing and trying to solve are not yet ready to be solved... in other words, I'm getting ahead of myself, trying to phaselock samples that weren't recorded specifically for phaselocking. I have some ideas that could _help_ in phaselocking less-than-ideal samples so that the end result is more likely to sound correct (using some combination of time- and frequency-domain processes)... The more effort you have to put into getting the perfect recording, the less room there is for nuance/human touches.

So my conclusion is: the time domain is the right way to go for low brass samples at this time (and the samples must be recorded with phaselocking in mind), but further innovations can be made to allow for less-perfect, more-expressive material - maybe somehow use the expressive material as a fingerprint to somehow modulate the "perfectly-recorded" samples in terms of timbre, frequency drift, and amplitude.


----------



## Darius (Jan 2, 2017)

Spot on. I think until we have AI instruments with the capabilities of https://deepmind.com/blog/wavenet-generative-model-raw-audio/ (WaveNet), we're not going to get much beyond the features you've already been achieving with reasonable success!



Elan Hickler said:


> maybe somehow use the expressive material as a fingerprint to somehow modulate the "perfectly-recorded" samples in terms of timbre, frequency drift, and amplitude.


I'll be posting, hopefully in a few days to months, several methods for doing exactly this.


----------

