Sneak Preview of the Upcoming WarpIV PRO 3D Monster Drum Kit Sample Library

WarpIV · May 8, 2021

Ok, as I answer the question, "What is the loudest part of a sin wave?", let me begin by dispelling the myth of RMS as being the way to measure audio loudness. RMS stands for Root Mean Squared. It is computed by the following math equation.

RMS = sqrt[(s1 x s1 + s2 x s2 + s3 x s3, ..., + sN x sN) / N]

In other words, add up the square of each sample, average them by dividing by the number of samples, and then take the square root of the result. Note that RMS can be transformed to db by taking the logarithm of RMS (base 10) and then multiplying that by 10.

In this math equation, sqrt is the square root, s1 is the first sample in a set of N samples, s2 is the second one, s3 is the third one, ..., and sN is the last sample in the set. RMS is always computed over a collection of N samples where N could be directly specified or determined by a time window (i.e., N = sample rate x time window, so a 48,000 sample rate x 0.01 second time window would mean computing RMS with 480 samples over 10 milliseconds).

Now imagine we have a wav file where every sample is maxed out (not the sin wav example). It would be like a long, positive, square wave that never comes down. Remember what actually happens at the speaker when we play the wav file. Positive samples push the speaker out and negative samples pull the speaker in. The speaker's movement essentially mimics what's in the wav file. In our maxed-out sample case, RMS would be maximized to its largest value possible, the speaker would be pushed out (ultimately controlled by the volume at the amplifier), but the speaker is not moving, which means no sound would be produced. This should alarm anyone who has thought of RMS as being a good measure of loudness. Clearly, this example describes a case where RMS is maximized, yet no sound would be produced...

RMS cannot possibly be the correct measure of audio loudness!

So why does everybody in the industry use RMS as the standard measure of loudness? It took me a while to figure this out. I had to go back to basic physics involving electromagnetic circuits. It turns out that RMS describes average power in electrical circuits involving voltage, currents, and resistors. Think of it this way. Your stereo system would be doing a lot of work to push and hold a speaker out at its maximum position even though no sound is being produced. It would be like holding your arm out motionless and parallel to the ground. Your body would get very tired (just like your stereo would be doing a lot of work holding the speaker out) even though your arm is motionless (i.e., doing no kinetic work).

As it turns out, RMS measures electrical power (think powering a radio station), which can be a useful measure. But, it does not measure audio loudness. Does anybody care how much electricity their compositions use when mixing (answer: no), or is audio loudness the real focus (answer: yes)...

So, with this concept in mind (i.e., that the speaker's out/in position, amplified by your amplifier, mimics the wav file), go back to my original question, "What is the loudest part of a sin wave?" If you haven't thought of audio in the way I just laid out, the answer might completely stun you...

Again, I hope to hear your responses. I'll provide the actual loudness equation in my next post. And guess what... It explains why compression and limiting work so well...

Lindon · May 9, 2021

WarpIV said:
So why does everybody in the industry use RMS as the standard measure of loudness?

They dont, well not any more, they use LUFS/LKFS https://en.wikipedia.org/wiki/LKFS

You may find this useful:

https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-3-201208-S!!PDF-E.pdf

WarpIV · May 9, 2021

Lindon said:
They dont, well not any more, they use LUFS/LKFS https://en.wikipedia.org/wiki/LKFS

You may find this useful:

https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-3-201208-S!!PDF-E.pdf

Hi Lindon, thanks for the link. I did my best to go through the math in the document and I think I have a basic understanding of the algorithm. What I am going to lay out in my next post is the actual audio energy, not perceived audio energy based on human perception (i.e., Fletcher Munson curves, head shapes, multichannel perception, etc.). What I am going to lay out is far simpler to understand and it is computationally trivial to implement as a plug-in or software algorithm. It does not require any filtering, special hardware, curve-fitted constants, or FFTs. It's just a measure of the raw energy of audio from first principles. The audio power can be instantaneous (i.e., the audio energy between two samples) or averaged over a specified time window... It is also (basically) independent of what digital bit rate or sample rate you use. I believe the Leq(RLB) algorithm (equation 3 on page 9 in the document) falls out of my much simpler equation if you wanted to include human perception to the loudness measure (but at a great computational cost)...

WarpIV · May 9, 2021

Ok, here is the answer...

Sorry for the long post and for all the math. But I will do my best to make it as simple as possible.

If we go back to the basic principle that the speaker mimics what is in the wav file (i.e., it pushes out for positive values and pulls in for negative values), then it is not too hard to also realize that the air molecules close to the speaker essentially move with the speaker. We know from basic physics that the kinetic energy of a particle is 1/2 x mass x (velocity x velocity). So all we need to do is calculate the velocity (speed) of the speaker at any time to get the speed of the air molecules being pushed out and pulled in by the speaker. We can ignore scale factors such as constants, amplification, speaker size, and the mass of air molecules because we want our measure of loudness to be independent of any particular audio hardware. The only actual important thing is the speed of the speaker at any point in time.

Well, that is easy to compute...

Speed is defined as distance traveled divided by time (e.g., a car moving 60 miles in an hour has a speed of 60 miles per hour). In the case of a wav file, the distance the speaker moves between two samples is simply the difference between the two samples. The time between the two samples is 1.0 divided by the sample rate (a constant for a given wav file). So here we go - it's super easy...

Here is the equation for the instantaneous power between two samples denoted by s(i+1) and s(i).

Define ds(i) = s(i+1) - s(i)
Define T = 1.0 / Sample Rate

Instantaneous power (ignoring constants, etc.) between two samples is.

P = ds x ds / T

To get an average power, all you have to do is add up the instantaneous power values over a time window and then divide by the number of samples in the time window. The big difference between this and RMS (other than the square root) is that instead of adding up the squared samples, you add up the squared difference between successive samples. It makes all the difference in the world...

One thing that is really cool about this calculation is that you can easily compute a sliding average power without having to recompute the whole window each time from scratch. All you have to do is subtract the instantaneous power of the previous first sample from the sliding average and then add the new sample's instantaneous power. You don't need any special hardware to do this because it is super fast to do these computations. I wrote software to do all of this...

For anyone who is really into math, an alternative representation of signals in the time domain (e.g., wav files) is the frequency domain. This is often done digitally using FFTs or filters. So, a function of time can be represented as a sum of sin wave frequencies, each with their own amplitude and phase shift. If you know basic calculus, the speed is simply the time derivative of the function. So, the derivative of a function (i.e., its speed) that is described as a sum of sin waves turns out to be a sum of cosine waves with each cosine term multiplied by its frequency. Squaring that and averaging over the time window used to compute the FFT has the orthogonal property that all cross terms go to zero. So what you end up with is essentially the original sin wave amplitudes multiplied by their frequency squared. What this tells us is that the audio power for a given frequency is proportional to its frequency squared. For example, doubling a frequency (i.e., an octave) but keeping the amplitude constant has four times the power. Or conversely, the same power would be achieved by having 1/4 the amplitude. This is why when you look at frequency spectrums, the levels go down in a nearly linear way (note, it's linear because the scale is in db, which is a log scale). This also explains the linearly ramped down frequency spectrum of pink noise so that all frequencies have the same power.

So, with all this, what is the loudest part of a perfect sin wave?

The answer is easy. The speaker moves fastest as the wav file crosses 0. It slows down and actually stops at the high and low peaks where it has no power. The peaks are actually where the least amount of power occurs (not intuitive for most people). This is why compression and limiting work so well. At the upper and lower peaks of the wave file, you are compressing the part of the waveform where the speaker is moving its slowest, which means compression and limiting affect the parts of the audio with the least amount of power. That's a very good thing...

Putting this all back together, the WarpIV PRO 3D Drum Kit analyzes the power during the attack in each sample to normalize all dynamic layers and (Top/Bot, Left, Right) channels. The attack is different for each type of drum, cymbal, or percussion so appropriate time windows are used to capture this. The audio power approach is far better than normalizing samples based on peaks or RMS. Peaks can be wildly off (think about cymbals vibrating and contorting in very unpredictable ways) and RMS is simply not a correct measure of loudness. Basing normalization on physics-based audio power works great. It made a big difference in the library.

By the way, somebody ought to develop a plug-in with a limiter and/or compressor that shows audio power before and after compression or limiting. Such a plug-in would provide feedback to users on how much distortion (based on power) is occurring from the compression or limiting effect. If anyone is interested in working on this with me down the road, let me know...

Lindon · May 10, 2021

Yeah, but no. Ignoring perception means that any non-frequency dependent approach is no better (or worse) than RMS. As we know perceived volume does depend upon the frequency of the source, different (high and low) pitches with the same RMS (or I would imagine with whatever value your approach calculates, as again I assume its not accounting for frequency) are perceived as different volumes. That's the point in LUFS.

Also I think you should be careful attempting to take a theory that seems to work when applied to a sine waves and extrapolation it to the real world..... where audio almost never emulates a sine wave...

davidanthony · May 10, 2021

Lindon said:
Ignoring perception means that any non-frequency dependent approach is no better (or worse) than RMS.

I'm confused as to why frequencies matter when it comes to evaluating an equation for normalizing samples in a library? Unless I'm understanding it incorrectly the equal loudness contours of the ear are basically a filter that our brains apply to the input, so I don't understand why the math used to produce the signal feeding them needs to account for them.

Makes sense when using something like LUFS to appreciate loudness across different program material, but for setting dynamic layers?

WarpIV · May 10, 2021

Lindon said:
Yeah, but no. Ignoring perception means that any non-frequency dependent approach is no better (or worse) than RMS. As we know perceived volume does depend upon the frequency of the source, different (high and low) pitches with the same RMS (or I would imagine with whatever value your approach calculates, as again I assume its not accounting for frequency) are perceived as different volumes. That's the point in LUFS.

Also I think you should be careful attempting to take a theory that seems to work when applied to a sine waves and extrapolation it to the real world..... where audio almost never emulates a sine wave...

Lindon,

First, thank you for the dialog. I enjoy talking about audio to people... I hope this is interesting to you (and others) too.

You are not correct in saying that "any non-frequency dependent approach is no better than RMS." I just showed why RMS is not a valid measure of audio power, but rather, a measure of electric power. I then derived the physically correct measure of audio power from first principles. What my equation does not account for is the fact that human ears are subjective, which means that we hear frequencies with the same power differently. This is one of the reasons why people good at mixing have "great ears." I also showed why two frequencies with the same amplitude, but an octave apart, would have 4 times the power for the higher frequency. Power as a function of frequency and amplitude explains why frequency spectrums of mixes decrease as frequencies increase. It also explains how pink noise is computed (each frequency is generated with the same power, which means their amplitude decreases linearly in a db scale).

Everybody's hearing is different, which means that frequency-corrected measures of loudness are subjective. It's why they curve-fitted frequency-corrected audio power in the document you posted. But what makes this approach even worse is the fact that audio perception by humans is not linear in terms of volume. So as we increase the volume, different frequencies (e.g., low low and very high frequencies) begin to pop out in our human perception as being too loud and as we lower the volume, the same frequencies disappear from the mix (e.g., low and very high frequencies). This is why a lot of music alarm clocks have a loudness button that increases low and high frequencies. People like to wake up to soft and soothing music, but when you play music really soft, you lose the bass and highs, and when you turn up the volume, the bass and highs are dominating.

This is why mixing is really hard...

So, any frequency corrected definition of loudness is subjective and a function of the volume. Keep in mind that a wav file has no volume setting. That is controlled by your amplifier. The LUFS measure might be a good measure of perceived loudness at a fixed (medium) volume, but it is not a general representation of the actual power of the audio. It is also not a simple computation, which means that it drains CPU resources when you use it. My metric is simply a true measure of the audio power across all frequencies. It works great for normalizing my drum samples, and I believe it would work great for other sampled instruments with different dynamic layers or multiple mic setups...

I'm not saying that LUFS has no use or is wrong. But, it has major problems. I am saying that RMS is junk and should not be used by anybody ever again. We need a true audio power plug-in for metering to replace RMS. My equation is super simple to implement, very efficient, and extremely accurate. I actually did an analysis of a composition I wrote some years ago where I plotted RMS over time (over the whole composition) and then the actual audio power using my true audio power approach. The two plots are very different. For example, I had some xylophone hits in the composition and they popped right out in my audio power plot, but were buried in the RMS plot.

I hope this helps.

Best,
Jeff Steinman

WarpIV · May 10, 2021

davidanthony said:
I'm confused as to why frequencies matter when it comes to evaluating an equation for normalizing samples in a library? Unless I'm understanding it incorrectly the equal loudness contours of the ear are basically a filter that our brains apply to the input, so I don't understand why the math used to produce the signal feeding them needs to account for them.

Makes sense when using something like LUFS to appreciate loudness across different program material, but for setting dynamic layers?

You are right. For normalizing velocity layers, audio power is the right metric to use because the samples essentially are made of the same frequencies (cymbals sound like cymbals, snare drums sound like snare drums, etc.). So, my audio power equation treats them all the same way and is a true measure even with human perception of frequencies being different. What the audio power approach does (vs just normalizing the samples) is eliminate spikes in the audio that occur by chance. Again, think of a cymbal vibrating in a crazy way where the frequencies picked up by each mic could randomly spike. When you look at cymbals, the peak in the wav file often comes much later than the initial hit of the cymbal. It is too random to rely on normalization (the approach most libraries use). The audio power approach over a meaningful time window eliminates these random spikes in the amplitude and gives a much truer representation of the audio loudness.

synthesizerwriter · May 10, 2021

WarpIV said:
I hear you... It is funny to me how composers using virtual instruments work hard to include rings and rattles to make tracks sound real, while recording engineers do everything they can to cut out those things to make tracks sound good. There is a balance between these two things, but I definitely lean towards making tracks sound good. One of the examples of this sort of thing that I sometimes do on drums is have parts going that really could not be played by a real drummer, at least not by one drummer...

I'm intrigued by the thought of taking 'rings and rattles' and deliberately recording those, to the extent of as much exclusion of the parent drum sound as possible. The resulting 'nurnies' (the word that 3D CGI people use for little bits of added detail) would seem to be a very interesting way of extending the creative breadth/scope of what already sounds like an amazingly detailed piece of work!

WarpIV · May 10, 2021

synthesizerwriter said:
I'm intrigued by the thought of taking 'rings and rattles' and deliberately recording those, to the extent of as much exclusion of the parent drum sound as possible. The resulting 'nurnies' (the word that 3D CGI people use for little bits of added detail) would seem to be a very interesting way of extending the creative breadth/scope of what already sounds like an amazingly detailed piece of work!

I am going to add adjustable snare noise to toms and kicks as an option. So, that will get much of what you are looking for. Also, even though I took great care to mount everything as clean as possible, there still is some rattle, etc., just from the recordings.

WarpIV · May 10, 2021

Lindon said:
Also I think you should be careful attempting to take a theory that seems to work when applied to a sine waves and extrapolation it to the real world..... where audio almost never emulates a sine wave...

The sin wave was just to get the discussion started with something extremely simple. I wanted to point out that the loudness does not come from the peaks, which is when the speaker stops and then moves in the opposite direction, but rather the zero crossing. It explains why compression and limiting work so well (you are cutting out the part of the waveform that has the least amount of audio power).

My audio power equation works across any audio mix, not just sin waves.

Jeff

synthesizerwriter · May 10, 2021

WarpIV said:
I am going to add adjustable snare noise to toms and kicks as an option. So, that will get much of what you are looking for. Also, even though I took great care to mount everything as clean as possible, there still is some rattle, etc., just from the recordings.

Wow!

Lindon · May 11, 2021

davidanthony said:
I'm confused as to why frequencies matter when it comes to evaluating an equation for normalizing samples in a library? Unless I'm understanding it incorrectly the equal loudness contours of the ear are basically a filter that our brains apply to the input, so I don't understand why the math used to produce the signal feeding them needs to account for them.

Makes sense when using something like LUFS to appreciate loudness across different program material, but for setting dynamic layers?

Frequencies DONT matter when normalizing samples - but that's not what we are discussing - we are discussing loudness, and for my part perceived loudness (which is quite different I will grant). Nothing I've said is about approaches to normalizing.

Lindon · May 11, 2021

WarpIV said:
Lindon,

First, thank you for the dialog. I enjoy talking about audio to people... I hope this is interesting to you (and others) too.

You are not correct in saying that "any non-frequency dependent approach is no better than RMS." I just showed why RMS is not a valid measure of audio power, but rather, a measure of electric power. I then derived the physically correct measure of audio power from first principles. What my equation does not account for is the fact that human ears are subjective, which means that we hear frequencies with the same power differently. This is one of the reasons why people good at mixing have "great ears." I also showed why two frequencies with the same amplitude, but an octave apart, would have 4 times the power for the higher frequency. Power as a function of frequency and amplitude explains why frequency spectrums of mixes decrease as frequencies increase. It also explains how pink noise is computed (each frequency is generated with the same power, which means their amplitude decreases linearly in a db scale).

Everybody's hearing is different, which means that frequency-corrected measures of loudness are subjective. It's why they curve-fitted frequency-corrected audio power in the document you posted. But what makes this approach even worse is the fact that audio perception by humans is not linear in terms of volume. So as we increase the volume, different frequencies (e.g., low low and very high frequencies) begin to pop out in our human perception as being too loud and as we lower the volume, the same frequencies disappear from the mix (e.g., low and very high frequencies). This is why a lot of music alarm clocks have a loudness button that increases low and high frequencies. People like to wake up to soft and soothing music, but when you play music really soft, you lose the bass and highs, and when you turn up the volume, the bass and highs are dominating.

This is why mixing is really hard...

So, any frequency corrected definition of loudness is subjective and a function of the volume. Keep in mind that a wav file has no volume setting. That is controlled by your amplifier. The LUFS measure might be a good measure of perceived loudness at a fixed (medium) volume, but it is not a general representation of the actual power of the audio. It is also not a simple computation, which means that it drains CPU resources when you use it. My metric is simply a true measure of the audio power across all frequencies. It works great for normalizing my drum samples, and I believe it would work great for other sampled instruments with different dynamic layers or multiple mic setups...

I'm not saying that LUFS has no use or is wrong. But, it has major problems. I am saying that RMS is junk and should not be used by anybody ever again. We need a true audio power plug-in for metering to replace RMS. My equation is super simple to implement, very efficient, and extremely accurate. I actually did an analysis of a composition I wrote some years ago where I plotted RMS over time (over the whole composition) and then the actual audio power using my true audio power approach. The two plots are very different. For example, I had some xylophone hits in the composition and they popped right out in my audio power plot, but were buried in the RMS plot.

I hope this helps.

Best,
Jeff Steinman

Ok well , *I* think you are confusing "audio power", for which RMS may be a rubbish approach, with "loudness" - which you explain is subjective, and frequency dependent. But all of this stems from your original question "which is the loudest part of a sine wave", which you then use to extrapolate into an approach for "audio power". I guess all I am saying is that these two are not the same, but you seem to be saying the same thing as me above - so now we are disappearing down a pointless rabbit hole of minutiae, so lets stop now, because I think we are furiously agreeing about loudness.

So if your algorithm works - then great - build a plugin that uses it and demonstrate its usefulness beyond the normal gain reduction display in several existing software compressors.

WarpIV · May 11, 2021

Hi Everybody,

We've had a lot of interesting discussion on the subject of audio loudness and I am guessing that some of you following this thread are confused. So, I would like to summarize and clarify everything one last time. I am hoping that this thread (within my main thread on the WarpIV PRO 3D Monster Drum Kit) will be helpful, not just for sample library developers but for anybody mixing and mastering their own compositions and wondering what is actually going on with audio in their mixes.

First, it is important to understand that each speaker moves in and out in a manner that mimics the audio channels in the wav file. As a speaker moves, it pushes and pulls the air molecules that are in front of it, which gives the air molecules kinetic energy. It is the energy of the moving air molecules that translates to audio power.

Second, I showed that RMS is junk. It is a measure of electrical power that your audio hardware consumes when it plays your composition. It is not a measure of audio power. For reference, here is the equation for RMS, where s(i) represents sample i in a time window.

RMS = sqrt[(s(1) x s(1) + s(2) x s(2) + ..., + s(N) x s(N)) / N]

The real measure of audio power from physics is derived by recognizing that the air molecules in front of each speaker essentially move with the speaker and that their energy is 1/2 x Mass x (Velocity x Velocity). The velocity (or speed) of the speaker is easily derived from the wav file as we recognize that the distance the speaker moves between two successive samples is proportional to the difference between the two samples (i.e., from one sample to the next, here is how much the speaker moves). The time (T) between two successive samples is simply one divided by the sample rate. So, the measure of kinetic energy between two successive samples (denoted as sample i and sample i+1) in a wav file is represented by the following equation.

ds(i) = (s(i+1) - s(i)) / T
Energy per air molecule is therefore proportional to ds(i) * ds(i).

So, the key is that instead of simply squaring samples and averaging them over a time window (the RMS approach), we should really be taking the difference between samples, squaring them, and averaging them over a time window.

Of course, the total energy is proportional to speaker size (which impacts how many air molecules are moved by the speaker), speaker design, amplification, mass of air molecules, etc... We can ignore all of these constants and just focus on the speed derived from the wav file.

Power is defined as average energy over time. So, the way to compute audio power is to add up a bunch of squared ds(i) terms over a time window and then divide by the number of terms (N) to get their average. That is the equation of audio power from a purely phyics perspective. It does not involve human factors such as people hearing frequencies differently. It is simply the total raw power of the audio.

Audio Power = (ds(1) x ds(1) + ds(2) x ds(2) + ..., + ds(N) x ds(N)) / N

When it comes to perceived loudness, it is important to recognize that everybody's hearing is different. For example, we do not generally hear frequencies above 20KHz very well (although some people apparently do). Also, perceived loudness of different frequencies is highly non-linear in terms of their amplitude. This is why many audio alarm clocks have a loudness switch (so you can hear the lows and highs even when music is played softly - but if you turned up the volume, the lows and highs would be too loud).

Now, switching gears...

From a mathematical perspective, you can represent any function of time as a sum of sin waves, each with a different amplitude and phase shift. In the continuous (analog) world, this is done mathematically by something called a Fourier Transform. In the discrete (digitally sampled) world, it is done through a Fourier Series. A special algorithm, known as the Fast Fourier Transform (FFT), was developed to super-efficiently take a series of samples over time and turn them into a sum of sin waves. What you essentially get is the following.

F(t) = A1 x sin(w1 x t) + A2 x sin(w2 x t) + A3 x sin(w3 x t) + ... AN x sin(wN x t)

In this equation, wi represents a frequency derived from the FFT.

The idea is that F(t) perfectly replicates the samples over the window used to compute the FFT. From physics and calculus, we know that the velocity (speed) is the time derivative of this function. And we know that the derivative of a A x sin(w x t) = A x w x cos(w x t). The important thing is this. The power at any time t, (determined by squaring F(t)) has terms such as (Ai x Ai) x (wi x wi) x cos(wi x t) x cos(wi x t) and then there are all of the cross terms. It turns out that when averaged over the same time window that was used to compute the FFT, the cross terms all cancel out and the cos (wi x t) x cos(wi x t)( terms all average to 1/2. Ignoring all of the constants, all that is left is the following.

Audio Power = (A1 x A1) x (w1 x w1) + (A2 x A2) x (w2 x w2) + ..., A(N x An) x (wn x wN)

This is where we get the audio power of each frequency being proportional to the square of the frequency. It explains why high frequencies sound louder than low frequencies, why it is so hard to mix bass, pink noise, etc.

These two Audio Power equations (mine and the one derived from the FFT) are identical (or at least proportional). What LUFS does is multiply each frequency by an empirically derived weight to form a new "perceived" definition of Audio Power that would more accurately reflect what an average human would perceive at some reference level of amplification. Of course it does not match each individual perfectly and it could be wildly off as the volume of the audio changes (i.e., music played loud, soft, medium, etc.).

So, LUFS attempts to relate Audio Power to some reference human perception, while my simple equation is the actual audio power from a pure physics perspective. In any case, RMS is junk and should not be used because it is not a measure of audio power at all. My physics-based equation for audio power is much better and is not subjective to human perception.

Here is why using physics-based audio power works great for sample libraries. When you hit a drum or cymbal, what is really happening is that the kinetic energy from the stick is imparted onto the drum or cymbal. We want the audio power to represent that energy. This is exactly what my equation for power captures. It is the true energy imparted from the stick to the drum or cymbal, not some distorted representation of what humans would hear.

The other reason why the physics-based audio power approach works so well for drums and cymbals, and why it would be better than RMS, Normalization based on peak sample, or even LUFS, is to remember that human perception of frequencies is non-linear in terms of their amplitude. Use of drums in musical compositions generally have a huge dynamic range, which could cause LUFS to break down.

Finally, I want to go back to my original question, "Which part of a perfect sin wave is the loudest?). This was just a (admittedly loaded) teaser to get us to think about audio power using a simple example. My goal was to provide an intuitive perspective about audio and to generate lively discussion... Remembering that the air molecules in front of the speaker move with the speaker, we get a very unintuitive result. The speaker stops and reverses direction at the peaks of the sin wave, which means that right at the peaks, the air molecules have no energy. The speaker (and therefore the air molecules) actually move fastest at the zero crossings. Most people naturally think that the peaks in their wav files (not just sin waves) have the most power, but actually it is at the peaks where the speaker stops and changes direction. In other words, it is at the peaks of our wav files where the audio energy is minimal. This is why compression and limiting work so well. What these effects do is compress the audio right where the audio power is smallest. So the resulting distortion from compression and limiting (up to a point) is often imperceptible. We get louder mixes (at least up to a point) almost for free.

Anyway, I hope all of this is helpful to people reading this post. I tried to provide a solid foundation for how we normalize our drum samples across dynamic boundaries and how we accurately mix multiple mics. But, I also hope that this discussion has been helpful for anyone wanting to better understand the physics and mathematics of audio. These concepts have really helped me formulate my strategy for mixing and mastering in my own compositions. I just wish I had better ears...

Best,
Jeff Steinman

davidanthony · May 11, 2021

WarpIV said:
for anybody mixing and mastering their own compositions and wondering what is actually going on with audio in their mixes.

Thanks for this, Jeff. I have to admit I've been using LUFS metering (standard for most broadcast deliverables and parts of the music industry as well) without having a clue about how it actually works under the hood, and this thread has prompted me to try to address that ignorance.

If you have some time, could you expand on this paragraph?

WarpIV said:
What LUFS does is multiply each frequency by an empirically derived weight to form a new "perceived" definition of Audio Power that would more accurately reflect what an average human would perceive at some reference level of amplification. Of course it does not match each individual perfectly and it could be wildly off as the volume of the audio changes (i.e., music played loud, soft, medium, etc.).

Specifically the last part -- are you saying that something in the weighting of the equation allows for
"disproportionate" results in terms of actual perception (recognizing there's no universal perception, but just to an "average" listener with decent hearing) vs. the LUFS number?

The reason this is important is some deliverables actually require you to hit a certain LUFS measurement as measured over the duration of a piece (not just peak levels), so I'm wondering if strategically boosting or cutting certain frequencies at certain volumes could make sense as a strategy in order to "hit" loudness targets. E.g. if something like a 30hz bump measuring 65 dbC at reference volume is almost perceptually undetectable but accounts for a disproportionate bump in the LUFS measurement, it may make sense to dynamically high pass content in order to bring the average into range..

Unfortunately given how resistant to change the industry is, it's doubtful the standard will change for some time, even in the face of a superior alternative (I won't pretend to be able to evaluate the mathematically quality of your equation but the logic strikes me as correct!) So if you can't beat em, join em...

WarpIV · Aug 20, 2021

Ok, after being swamped with work from my software company (which is a good thing), I'm back on the drum library, at least for a while!

Just so everybody knows, we have completed all of the recordings and most of the editing, so the remaining work is getting the samples into Kontakt and upgrading some of the programming.

We completed the other two hi hats in the collection and I put a demo together featuring these two new sets. All of the demos I previously posted were the Zildjian K Constantinople 14" hi hats, so you have already heard them. These new demos that I am posting feature the 14" Paiste Signature Series and the 14" Meinl Byzanth. I have attached 5 demos to help you hear what everything sounds like. Two demos are the full composition, two are just the drums, cymbals, and percussion, and one is everything except drums. All of the tracks are WarpIV sampled instruments (trumpets, trombones, saxes, flute, bass, guitars).

In these demos, most of the cymbal sounds are hi hats at various openings, but I added three Zildjian K splash cymbals (8", 10". and 12") and either a matching 22" Meinl or 22" Paiste Ride cymbal for the cymbal swells. The Paiste hi hat demo used a 13x3 Pearl Piccolo Maple Snare and the Meinl hi hat demo used a 13x3 Pearl Piccolo Brass Snare that is tuned a little higher than the Maple snare. The two snares are actually pretty similar and I think they sound great. There are two kicks, one a tight kick with pillow dampening and the other is an open kick that is boomier and with more reverb. There are also finger snaps, 3 cowbells, and Meinl red/black woodblocks. I did not use any toms in these demos.

The next thing we will focus on is getting all of the snare drums into Kontakt. I think that will take a few weeks. Once that is done, I will put some demos together so you can hear what the snares sound like. I'll probably just do some solo snares first because the type of snare used in a composition really depends on the style of the composition. So, great snares for some genres will sound really bad in others. My hope is that you will all get to hear what each of them sounds like and what they can do.

Meanwhile, let me know what you think of these hi hat demos...

Thanks,
Jeff Steinman

davidanthony · Aug 20, 2021

WarpIV said:
let me know what you think of these hi hat demos...

The way the hats decay is impressive. I'm not sure I would have picked them out as samples without the benefit of knowing! The only "giveaway" to me was the open/close but again I think that was just because I was being hyper vigilant, doubt I'd notice in a mix. Well done!

Markastellor · Aug 21, 2021

Sounds promising.

WarpIV · Aug 21, 2021

davidanthony said:
The way the hats decay is impressive. I'm not sure I would have picked them out as samples without the benefit of knowing! The only "giveaway" to me was the open/close but again I think that was just because I was being hyper vigilant, doubt I'd notice in a mix. Well done!

The reason why the decays sound good is that when a new note is played, the old note fades out with a reasonable tail. This allows the hi hat (and cymbals) to build. You might also notice the cymbal swells that were created on the 22" rides. They were created with single hits.

I think when you said open/close, you were referring to what I call mutes (the open hi hat is hit on its edge and then quickly closed). I have mutes for all the cymbals as well, where they are crashed and then quickly muted by hand. I tried to get a lot of the articulations in, so perhaps I overdid those...

One more thing that I thought I would mention. I use Ozone 9 for mastering. One of the things I like to do is compare the frequency response of the entire mix with a reference. I use Gordon Goodwin's Big Phat Band for this. Those mixes are incredible. Anyway, with other drum libraries, I would always get a bump in the bass frequencies and a loss in the very high frequencies. With my drum libraries, I do not get a loss in the very high frequencies, but actually a little rise, and the bump in the bass frequecies is also gone. I think this is because the microphones I used to record everything are ridiculously good. The top mic has a flat (within 1 db) frequency response of 3 Hz to 50 KHz and the two side mics are also flat )within 1 db) with a frequency response of 7 Hz to 30 KHz. I think capturing the really high frequencies does not really add treble to the mix, but rather, it gives everything more clarity, which makes it easier to listen to without tiring your ears. I'm really happy with the tone, clarity, expression (57 articulations), and naturalness of the hi hats.

Thanks for the feedback,
Jeff Steinman

Sneak Preview of the Upcoming WarpIV PRO 3D Monster Drum Kit Sample Library

Active Member

VST/AU Developer

Active Member

Active Member

VST/AU Developer

Active Member

Active Member

Active Member

Bits and bites of sound and silence...

Active Member

Active Member

Bits and bites of sound and silence...

VST/AU Developer

VST/AU Developer

Active Member

Active Member

Active Member

Attachments

Active Member

Active Member

Active Member