# Physical Modeling for the Human Voice?



## muziksculp (Nov 28, 2020)

Hi,

I was wondering if it would be possible to create a Physically Modelled Instrument for the Human Voice.

Basically, to create Choral, and Solo Voices, with different vowels, and maybe some syllables, ..etc.

I don't think something of this nature exists, if it does, please let me know, if it doesn't, developers... take note. This could be something you want to think about developing.


Cheers,
Muziksculp


----------



## DSmolken (Nov 28, 2020)

Any synth with a formant filter is, to a degree, modeling the human voice. PPG Phonem is probably the most complex take on that - not easy to use!

Also, it doesn't even have to be software, and people have been working on such models since the XVIII century.


----------



## AllanH (Nov 28, 2020)

Back in the 80s, a company (SoftVoice, if I recall) developed a physical model of the vocal cords and used that to synthesize speech. It was initially included in the Amiga OS as the "narrator" device. It was remarkably good, especially in view of the CPU and memory footprint.

I remember a few interesting conversations with one of the founders, and I'm pretty sure he told me that it was a physical model built around synthesizing phonemes. Anyway, I don't think they ever tried to make it sing


----------



## nolotrippen (Nov 28, 2020)

muziksculp said:


> Hi,
> 
> I was wondering if it would be possible to create a Physically Modelled Instrument for the Human Voice.
> 
> ...


Would love to be able to model Sinatra, Karen Carpenter, Mama Cass, King Cole, etc.


----------



## Rob (Nov 28, 2020)

I wish Myriad, developer of Virtual Singer, had the will and the power to bring it from a useful but ugly sounding device to a full, multisampled voice synth. The ability to sing in every language is already there, they just need to nail the tone... wonder if that's possible


----------



## mgnoatto (Nov 28, 2020)

Vocaloid

VOCALOID - the modern singing synthesizer -

This is the official VOCALOID site run by Yamaha Corporation. You can purchase the downloadable versions of singing synthesizer software and Voice Banks, such as VOCALOID5. You can also find all of the latest information on VOCALOID here, such as TIPS, tutorials on creating music with VOCALOID...

www.vocaloid.com


----------



## David Cuny (Nov 28, 2020)

I think Synthesizer V is the most promising vocal synthesizer on the market. Typically, performances need to be "tuned" by hand. The latest version is adding AI driven vocal tuning:



Unfortunately, the English versions seem to be lagging behind the Asian language versions - there's no English male voice, and the AI tuning seems to be currently limited to the "Saki" voice.

Here's a hand-tuned vocal. There are a number of places where the vocal is overly synthetic, so it won't pass for an actual human to a discerning listener. But it's a very good performance:


----------



## RogiervG (Nov 28, 2020)

Emvoice can also sound very realistic when applied correctly. (There aren't many good demos on the web yet, unfortunately, but I heard a few in the past that were very good.)


----------



## DSmolken (Nov 28, 2020)

Yeah, there are interesting things brewing, and if you want vowels and some syllables then full English isn't necessary. XiaoiceSing and Neutrino are capable of great realism and fluidity, though not great expression. If you want to tweak the legato and vibrato and have a decent selection of voices, Synthesizer V is probably the top choice right now. It's gotten ahead of Vocaloid in R&D, I think.

I've said this before: there'd be money in researching and modeling the differences between choir members' tuning and timing, and making a synthesized choir that emulates those.
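The per-singer differences described above can be pictured with a minimal sketch. It assumes each choir member gets a small random pitch offset (in cents) and a small onset offset (in milliseconds) around the written note. The function name `choir_voices`, the Gaussian spreads, and the default values are illustrative assumptions, not measured data.

```python
import random


def choir_voices(midi_note, n_singers=8, cents_sd=12.0, onset_sd_ms=25.0, seed=0):
    """Return one (frequency_hz, onset_offset_ms) pair per singer.

    cents_sd and onset_sd_ms are assumed spreads, chosen only for illustration.
    A negative onset offset means the singer enters slightly early.
    """
    rng = random.Random(seed)
    # Equal-tempered frequency of the written note (A4 = MIDI 69 = 440 Hz).
    base_hz = 440.0 * 2 ** ((midi_note - 69) / 12)
    voices = []
    for _ in range(n_singers):
        detune_cents = rng.gauss(0.0, cents_sd)
        onset_ms = rng.gauss(0.0, onset_sd_ms)
        voices.append((base_hz * 2 ** (detune_cents / 1200), onset_ms))
    return voices


for freq, onset in choir_voices(69, n_singers=4):
    print(f"{freq:7.2f} Hz, onset {onset:+6.1f} ms")
```

A real choir model would presumably need measured distributions (and singers who react to each other) rather than independent random draws.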


----------



## RogiervG (Nov 28, 2020)

The problem is the language: most modelling is based on Japanese, not English. It seems hard to emulate English realistically.


----------



## Dirtgrain (Nov 28, 2020)

There was a thread a few months ago about how to add scratchiness/rasp to a voice--maybe relates a bit: https://vi-control.net/community/threads/creating-a-raspy-voice.96996/


----------



## DSmolken (Nov 28, 2020)

If you want rasp, I once made a virtual vocalist from recordings of a girl doing death metal vocals, heh. A lot is possible in this realm. It's just that it takes a lot of data, especially for languages with a significant number of possible consonant clusters, and, well, the human competition is often cheap and easy to work with.


----------



## Leandro Gardini (Nov 28, 2020)

I was looking at this one the other day. It looks like software that can sound amazing with the right programming.



E_CANTOR


----------



## Leandro Gardini (Nov 28, 2020)

I've used this one before and recommend it. The official demos don't do justice to the quality that you can get out of this synth.

Plogue | Alter/Ego :: real-time singing synthesizer

A free real-time singing software (VST/AU/AAX plugin) synthesizer based on the technology featured in chipspeech but focusing on more modern techniques. Windows and macOS.

www.plogue.com


----------



## Scamper (Nov 28, 2020)

Is this not good enough for you?


Pink Trombone


----------



## mgnoatto (Nov 28, 2020)

David Cuny said:


> I think Synthesizer V is the most promising vocal synthesizer on the market. Typically, performances need to be "tuned" by hand. The latest version is adding AI driven vocal tuning:
> 
> 
> 
> ...



The basic version is free, it's a lot of fun. Eleonor sounds good!


----------



## timbit2006 (Nov 28, 2020)

Look up Vocaloids; Plogue also has a pretty good one. There is one original voice that is harder to find and sounds the best. Using the Wayback Machine you can still get it, if I remember correctly.
iZotope's Biovox is really good for adding subtle realism, as is MCharacter for emulating the overtones that a vocalist normally generates. You can automate the overtones for a more believable effect.

I just also remembered that Antares makes a voicebox emulator called Throat, I have not tried it yet though and I've been waiting for a sale.


----------



## Wally Garten (Nov 28, 2020)

This is the only human voice synth you will ever need:

Audio Plug-ins

Klevgrand is a creative studio and software company in Stockholm run by film makers, musicians, software developers, producers and sound designers.

klevgrand.se

If you can't make a Top 10 banger with a modeled Swedish opera singer, I don't even know what to say.


----------



## timbit2006 (Nov 28, 2020)




----------



## Wally Garten (Nov 28, 2020)

leogardini said:


> I've used this one before and recommend it. The official demos don't do justice to the quality that you can get out of this synth.
> 
> 
> 
> ...



On a more serious tip, I agree with this. I've had a lot of success with it. It won't do _perfect_ words (there's some occasional distortion) but if you imagine it's just somebody with a weird accent, it sounds pretty good. If you just need something to add word-textures (especially if blending with a sampled choir or something), and it doesn't have to be perfectly intelligible, it's very good indeed.


----------



## muddle (Nov 28, 2020)

These sound great, but limited in style. I would like a more classical timbre of voice. I've tried for years to do this with a French notation program which has virtual voice capability. But I can see that this would eventually be the way to go. For your interest, the attachment is an Italian language version of a MIDI file using that notation program.
I actually use it mostly for word-singing choirs (en masse soloists) but sometimes a solo voice works well.
David L


----------



## David Cuny (Nov 28, 2020)

Pink Trombone is an excellent example of "real" physical modeling.

The difficulty with that approach is implementing the articulations. Microsoft went that path, and vastly underestimated the difficulty of the approach. They went in full of optimism, and the project disappeared without a trace.

Part of that might have been also because the results of machine learning approaches were giving much better results.

If you listen to the online version of SAM (SoftVoice's first product), you'll notice that the voice is very synthetic, but also quite expressive. The synthesis _technology_ behind it is ridiculously simple, but the implementation is clever enough to make it work well. But it's the pitch contours that sell the vocals, and if you turn that directly into singing, the result is dry and without expression:




The example above was from my own re-implementation of SAM, since the original doesn't let you specify exact pitches and durations.

Replacing the 40 year old synthesis method with 20 year old technology will give you a better sound, but it still has the same issues with expression. Here is one entirely synthesized using a rule-based formant synthesizer I wrote, based on SAM's vocal data:



I've built a concatenative synthesis vocal singing engine, which means I need to capture all the combinations of transitions. This is more complicated in English than in Asian languages, because there are a lot more phonemes.

It's better than SAM - especially the articulation - but the performance is still flat. And that's with quite a bit of performance logic coded into it, including a bit of overshoot, and delayed progressive vibrato:



The way I'm doing synthesis is decades behind the state of the art vocal synthesis technology. I'm using these examples because I think they help make the point that _expressive_ synthesis is really where the technology needs to catch up.
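As a rough illustration of the kind of performance logic mentioned above (a bit of overshoot, plus delayed progressive vibrato), here is a minimal sketch of a per-note pitch contour. It is not the engine described in this post: the function name `pitch_contour`, the curve shapes, and every default value are assumptions picked for readability.

```python
import math


def pitch_contour(prev_midi, target_midi, duration_s, rate_hz=100,
                  glide_s=0.08, overshoot_semitones=0.3,
                  vibrato_delay_s=0.25, vibrato_ramp_s=0.4,
                  vibrato_depth_semitones=0.5, vibrato_hz=5.5):
    """Return MIDI pitch values (floats) sampled rate_hz times per second."""
    n = int(duration_s * rate_hz)
    direction = 1.0 if target_midi >= prev_midi else -1.0
    contour = []
    for i in range(n):
        t = i / rate_hz
        # Glide: exponential approach from the previous note to the target.
        glide = (prev_midi - target_midi) * math.exp(-3.0 * t / glide_s)
        # Overshoot: a bump past the target that peaks at t == glide_s, then decays.
        x = t / glide_s
        bump = direction * overshoot_semitones * x * math.exp(1.0 - x)
        # Vibrato: silent until the delay has passed, then depth ramps up linearly.
        ramp = min(max((t - vibrato_delay_s) / vibrato_ramp_s, 0.0), 1.0)
        vib = ramp * vibrato_depth_semitones * math.sin(
            2 * math.pi * vibrato_hz * (t - vibrato_delay_s))
        contour.append(target_midi + glide + bump + vib)
    return contour


curve = pitch_contour(prev_midi=60, target_midi=64, duration_s=1.0)
print(len(curve), round(curve[0], 2), round(max(curve[:25]), 2))
```

Even with rules like these the result is only one fixed rendering of the note, which is the limitation the post is pointing at.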

If you listen to a musical performance, you'll hear a host of vocal techniques that are typically not captured at all in notation.

Even when AI is applied to create a vocal performance (as _SynthesizerV_ is doing), it's still only one possible performance of many plausible performances.

So yes - developers have certainly taken note. But it's a much harder nut to crack than it initially appears, partly because we can do it so effortlessly that we assume it must be easy.


----------



## muddle (Nov 28, 2020)

Of course this French notation program does many languages and styles. How about opera?

David L


----------



## muddle (Nov 28, 2020)

Or a more serious voice . Just a note that these voices are hand created. I believe that no one else in the world has ever taken the time to get these results with the program ... hence why these new synthetic voices are the likely future.


----------



## dflood (Nov 28, 2020)

I think the so-called *deepfake* technologies may soon offer a shortcut to creating convincing artificial vocal performances that could one day prove a lot easier to use than physical modelling as I understand it. 

The concept is simple enough: using machine learning algorithms you would simply sing a part into your DAW and the technology would transform it into the vocal signature of a reference voice. The performance would retain the nuances of your own performance, but it would sound like somebody else. So, presumably, if I wanted Frank Sinatra to sing ‘Mary had a Little Lamb’, I would just record it with my own voice. I would then point the machine learning algorithm at enough Frank Sinatra vocal recordings for it to develop a vocal signature target. It would then grind through a sufficient number of iterations using typical machine learning trial and error techniques until it got it right. 

Of course it’s a lot more complicated than that, and there are some major ethical and legal ramifications to be considered as well. The main difference I see between this approach and typical ‘physical modelling’ is that the developer would not need to painstakingly model all of the underlying timbral parameters that give a voice its unique characteristics. The user input would also be greatly simplified as long as you can reasonably carry a tune, since all you would need to do is sing in the part rather than painstakingly hand assembling phonemes with a keyboard or a DAW user interface.

Anyway, good or bad, I think this is where we are headed. More *here*.


----------



## Noeticus (Nov 28, 2020)

This is the LSD (drug) thread right now on VI-Control.



That's right, LSD is not the "London Symphony Drama" toolkit.


----------



## Kevinside (Nov 28, 2020)

Don't forget the German company VirSyn with their product called Cantor...
I made a demo for them...


----------



## ShidoStrife (Nov 28, 2020)

RogiervG said:


> problem is the language.. most modelling is based on Japanese, not English. It seems hard to emulate english realistically.



Because Japanese (and many Asian languages) don't have the abomination that is English pronunciation. The spelling <-> pronunciation is pretty much 1:1. Unlike English, where read rhymes with lead and read rhymes with lead, but read doesn't rhyme with lead and read doesn't rhyme with lead.


----------



## muziksculp (Nov 28, 2020)

What about a simpler Physical Modeling Instrument that's good for creating male, and female vowels, Oooohs, Aaaahs, Eeeee. ... ? Not words.


----------



## David Cuny (Nov 28, 2020)

ShidoStrife said:


> Because Japanese (and many asian languages) don't have the abomination that is english pronunciation. The spelling <-> pronunciation is pretty much 1:1. Unlike english where read rhymes with lead and read rhymes with lead, but read doesn't rhyme with lead and read doesn't rhyme with lead.


No, the issue is the number of core sounds (phonemes).

English has _more_ phonemes than Japanese or Chinese.

It's a combinatorial problem. More sounds means your synthesizer needs to produce more transitions from phoneme _x_ to phoneme _y_. Transitions have to exist for every phoneme to every other phoneme.

You can't just paste the sounds together - the transition from one phoneme to the next is important.
And for some phonemes (/B/ and /D/, for example), the placement of the articulators differs greatly depending on the phoneme that follows.

If you're working with a language like French, you need to deal with how vowels are nasalized. So there are even _more_ phonemes to deal with.
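The combinatorial point above can be put into numbers with a small sketch. The inventory sizes used here (about 44 phonemes for English, about 22 for Japanese) are rough assumed figures for illustration only; real counts depend on the dialect and on how the phonemes are analyzed. The function name `transition_count` is also just illustrative.

```python
def transition_count(n_phonemes):
    """Ordered transitions from every phoneme to every *other* phoneme."""
    return n_phonemes * (n_phonemes - 1)


# Assumed, approximate inventory sizes (not authoritative figures).
for language, size in [("English", 44), ("Japanese", 22)]:
    print(f"{language}: {size} phonemes -> {transition_count(size)} transitions")
```

Because the count grows with the square of the inventory size, doubling the number of phonemes roughly quadruples the number of transitions that have to be captured.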


----------



## DSmolken (Nov 28, 2020)

muziksculp said:


> What about a simpler Physical Modeling Instrument that's good for creating male, and female vowels, Oooohs, Aaaahs, Eeeee. ... ? Not words.


In theory, that's any synth with a formant filter, including plenty of older hardware synths and free VSTs like Scrooo. I'm not sure if there is one which would have more singer-like legato and vibrato, but there might be. That would probably help a lot, like something with the flexibility of Scrooo and humanity on the level of Jussi. With some different settings for the formants and timing, might be able to pull off a decent choir, too.
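For anyone wondering what "a synth with a formant filter" boils down to, here is a minimal source-filter sketch: a crude sawtooth stands in for the glottal source, and three two-pole resonators in parallel stand in for the formants. The formant frequencies are approximate textbook averages for an adult male voice, and the fixed 80 Hz bandwidth, the function names, and the parallel topology are all simplifying assumptions. This is not how Scrooo or any other named product works internally.

```python
import math

# Approximate first three formant frequencies in Hz (assumed textbook averages).
FORMANTS_HZ = {
    "a": (730, 1090, 2440),   # as in "father"
    "i": (270, 2290, 3010),   # as in "beet"
    "u": (300, 870, 2240),    # as in "boot"
}


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = (1-r)*x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1 - r) * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def sing_vowel(vowel, pitch_hz=110.0, seconds=0.5, sample_rate=16000):
    """Return raw float samples of a static vowel at a fixed pitch."""
    n = int(seconds * sample_rate)
    # Naive (aliasing) sawtooth as a stand-in for the glottal pulse train.
    source = [2 * ((i * pitch_hz / sample_rate) % 1.0) - 1 for i in range(n)]
    bands = [resonator(source, f, 80.0, sample_rate) for f in FORMANTS_HZ[vowel]]
    return [sum(samples) for samples in zip(*bands)]


samples = sing_vowel("a", seconds=0.1)
print(len(samples), round(max(abs(s) for s in samples), 3))
```

The static result is exactly the "Aaaah" pad sound; the singer-like legato and vibrato discussed above would have to come from how pitch and formants move over time.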


Noeticus said:


> This is the LSD (drug) thread right now on VI-Control.


If you really want to go down that path, I could dig out some of the weirder stuff people have used Marie Ork in.


----------



## givemenoughrope (Nov 28, 2020)

Not Jordan Peterson

www.notjordanpeterson.com

This AI lets you generate eerily realistic Jordan Peterson sound bites

NotJordanPeterson is a mind-boggling voice AI that lets you say anything you'd like in the voice of Canadian psychologist Jordan Peterson.

thenextweb.com


----------



## David Cuny (Nov 28, 2020)

givemenoughrope said:


> Not Jordan Peterson
> 
> 
> 
> ...


Although neural networks (NNs) can be trained to speak, getting them to sing is a problem of a higher level of complexity.

NNs can be trained by entering in lots of tagged data. By "tagged data", I mean metadata describing what the phoneme is, what the prior and following phoneme is, the part of speech, the syllable, the relative position in the sentence, and so on.

NNs do their magic and generate statistical models of this data.

You can then input a sequence of tags, and the NN will generate the most statistically likely output.

With singing, the amount of data increases by a _huge_ amount because you need to include pitch and duration information. Getting coverage of every phoneme transition for every pitch for any given duration is a nightmare.

Most NN based programs end up cheating this by generating the vocals using standard NN processes, and then applying rules to convert the spoken text to "sung" text.
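To make the "tagged data" idea and the coverage problem a little more concrete, here is a minimal sketch. The field names in `PhonemeTag`, the helper `coverage_cells`, and all the numbers are hypothetical; they only mirror the kinds of metadata listed above and are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeTag:
    """One training label; the last two fields only matter for singing."""
    phoneme: str
    prev_phoneme: str
    next_phoneme: str
    syllable_index: int
    position_in_sentence: float  # 0.0 = start, 1.0 = end
    midi_pitch: int
    duration_ms: int


def coverage_cells(n_transitions, n_pitches, n_duration_bins):
    """How many (transition, pitch, duration) combinations exist to be covered."""
    return n_transitions * n_pitches * n_duration_bins


tag = PhonemeTag("AA", "L", "M", syllable_index=0, position_in_sentence=0.1,
                 midi_pitch=64, duration_ms=350)
print(tag)
# Assumed figures: 1892 transitions, a two-octave range, five duration bins.
print(coverage_cells(1892, 24, 5))
```

With those assumed figures, speech needs on the order of 1,892 transition examples, while full singing coverage would need 227,040 cells.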


----------



## David Cuny (Nov 28, 2020)

dflood said:


> The concept is simple enough: using machine learning algorithms you would simply sing a part into your DAW and the technology would transform it into the vocal signature of a reference voice. The performance would retain the nuances of your own performance, but it would sound like somebody else. So, presumably, if I wanted Frank Sinatra to sing ‘Mary had a Little Lamb’, I would just record it with my own voice.


You'd then have Frank Sinatra's voice with _your_ vocal technique.

Which - unless you're a professional singer - is probably not your desired result.

It's not the voice, it's what you _do_ with your voice. The goal would be to apply Frank Sinatra's nuances, not your own.

Otherwise, Melodyne would make us all great singers. I can attest from personal experience that's not the case.


----------



## dflood (Nov 28, 2020)

David Cuny said:


> You'd then have Frank Sinatra's voice with _your_ vocal technique.
> 
> Which - unless you're a professional singer - is probably not your desired result.
> 
> ...


I don’t think it’s either/or. You can improve your singing technique with practice, and you can learn to mimic other people’s singing styles, but you can’t do much about the baked in timbral qualities of your voice beyond some basic parametric pitch and formant shifting in programs like Melodyne.

I can sing passably well, but I always sound like myself. There are times when I’d like to include harmony vocal parts in my compositions. Although that can now be done with some degree of success with programs like melodyne or outboard effects units like the Voicelive Extreme, I envision a day when I can just sing in those parts separately and process that track from a selection of target singing voices. It might sound mad, but I don’t think it is all that far off.

I should also mention that of course it’s always better to have access to live vocalists but that’s not what this thread is about.


----------



## DSmolken (Nov 28, 2020)

David Cuny said:


> With singing, the amount of data increases by a _huge_ amount because you need to include pitch and duration information. Getting coverage of every phoneme transition for every pitch for any given duration is a nightmare.


Yup. XiaoiceSing used a database of one singer singing literally thousands of pop songs in Mandarin. Perhaps the tagging was also automated... but even recording that much stuff is a ton of work.


----------



## rodyrode (Nov 29, 2020)

Physical modeling is not practical for human voices, because you need to model a dynamic 3D object that constantly changes its shape, and is made of soft, wet tissues. It's exponentially more complex than modeling, say, a piano. That's why it has never been done in a way that gives realistic results. 

And, since we've been listening to human voices all day, every day since before our birth, we're all experts at telling a real voice from an artificial one. That means the smallest mistake in the model will "break" the simulation. The result can be pleasant, if you're looking for artificial-sounding voices, or unpleasant otherwise.

If your goal is to create realistic choirs, you may want to try Emvoice One. Here is a basic example of what you can do with a single voice: https://soundcloud.com/madduxdavid/love-has-tricked-you-well
There are two voices and a third one is coming. Full disclosure: CEO of Emvoice here


----------



## David Cuny (Nov 29, 2020)

rodyrode said:


> Physical modeling is not practical for human voices, because you need to model a dynamic 3D object that constantly changes its shape, and is made of soft, wet tissues. It's exponentially more complex than modeling, say, a piano. That's why it has never been done in a way that gives realistic results.


Any physical modeling simulation is going to be a simplification at some point.

There are plenty of PM simulations of the human vocal tract, ranging from one dimension to three dimensions. Even the 1D versions - basically built from delay lines - are capable of capturing the effect of losses from soft tissues.

Brad Story's _TubeTalker_ was a 3D model that accounted for this as well.
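The 1D delay-line approach mentioned above can be sketched as a Kelly-Lochbaum-style tube: the tract is cut into short sections, each holding a forward and a backward travelling pressure wave, with partial reflections wherever the cross-sectional area changes. This is a minimal sketch under stated assumptions, not any published model: the end reflection values, the single `loss` factor standing in for soft-tissue damping, and the function names are all illustrative.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection at each junction between adjacent sections."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def tube_step(fwd, bwd, refl, glottal_in,
              loss=0.999, glottis_refl=0.75, lip_refl=-0.85):
    """Advance every section by one sample; return (fwd, bwd, lip_output)."""
    n = len(fwd)
    new_fwd = [0.0] * n
    new_bwd = [0.0] * n
    # Part of the wave reaching the (open) lips radiates out, the rest reflects.
    lip_out = (1 + lip_refl) * fwd[n - 1]
    new_bwd[n - 1] = lip_refl * fwd[n - 1]
    # Glottis end: inject the source and reflect the returning wave.
    new_fwd[0] = glottal_in + glottis_refl * bwd[0]
    # Interior junctions: scatter the waves according to the area change.
    for i in range(n - 1):
        k = refl[i]
        new_fwd[i + 1] = (1 + k) * fwd[i] - k * bwd[i + 1]
        new_bwd[i] = k * fwd[i] + (1 - k) * bwd[i + 1]
    # Assumed uniform damping per section, a stand-in for wall losses.
    return [loss * x for x in new_fwd], [loss * x for x in new_bwd], lip_out


def simulate(areas, source):
    """Run the source signal through a tube with the given section areas."""
    refl = reflection_coefficients(areas)
    fwd = [0.0] * len(areas)
    bwd = [0.0] * len(areas)
    out = []
    for x in source:
        fwd, bwd, y = tube_step(fwd, bwd, refl, x)
        out.append(y)
    return out


impulse = [1.0] + [0.0] * 199
response = simulate([1.0, 1.5, 2.0, 1.2, 0.8, 1.5, 2.5, 3.0], impulse)
print(round(max(abs(y) for y in response), 4))
```

Changing the `areas` list changes the resonances, which is the basic mechanism behind vowel shaping in this family of models; the hard part discussed in this thread is moving those areas convincingly over time.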

That said, I think the many advantages of neural networks have made them by far the preferred technology for vocal synthesis.



> The result can be pleasant, if you're looking for artificial-sounding voices, or unpleasant otherwise.


I'm not sure it's a binary "pleasant" _vs._ "unpleasant".

But it _is_ a binary "useful for my use case", which varies from person to person.



> Here is a basic example of what you can do with a single voice:


I guess it depends on what you mean by "realistic".

It's musical, well articulated and pleasant. And - dare I add - a much more compelling example of what Emvoice One can do than what I heard on your main page! 

But... I'm not sure how many people would think it's not synthetic.


----------



## YaniDee (Nov 30, 2020)

Phonem. This was an amazing attempt at voice synthesis. Wolfgang Palm is the main inventor of the PPG Wave synth. For some reason this product is not currently for sale.


----------



## Tom Ferguson (Nov 30, 2020)

This song is an unironic banger (for me at least) and I had no idea this was an AI vocal synth for a long time.


This song isn't so awesome, but maybe even more convincing. Same vocal synth, AI Kiritan ( https://n3utrino.work/ )! Check it out: it's free to use and pretty fun. Only Japanese, though.

Both of these were made with the alpha version too. There have been a few updates since then, and it's now considered to be in 'beta'.


----------



## Tremendouz (Nov 30, 2020)

Synthesizer V can be really good, but it needs a lot of tweaking to make the timing of everything sound natural, and it also struggles with pronouncing some words.

I made this example long ago when the first version came out. Ignore the cheesy fan-translated lyrics haha


----------



## DSmolken (Nov 30, 2020)

YaniDee said:


> Phonem. This was an amazing attempt at voice synthesis. Wolfgang Palm is the main inventor of the PPG wave synth..For some reason this product is not currently for sale.



Yeah, Wolfgang Palm closed his software business in March... still, very interesting software. Incredibly hard to use, I honestly think only one person on this planet really really knew how to use it and released some patches for it. Great for Krautrock soundscapes. For singing a pop song, seems possible but impractically labor-intensive.


----------



## rodyrode (Nov 30, 2020)

David Cuny said:


> But... I'm not sure how many people would think it's not synthetic.



Most people don't realize it's synthetic, and I don't see how they could without knowing about Emvoice in the first place. Lucy is already the main singer in a small number of "bands", and people seem to like the vocals without realizing their true nature. Obviously, the producers don't disclose their little secret, so you won't find them with a simple search on "Emvoice", and I can't share examples here. I only know about some of them because they got in touch with me.


----------



## Leandro Gardini (Nov 30, 2020)

muddle said:


> These sound great, but limited in style. I would like a more classical timbre of voice. I've tried for years to do this with a French notation program which has virtual voice capability. But I can see that this would eventually be the way to go. For your interest, the attachment is an Italian language version of a MIDI file using that notation program.
> I actually use it mostly for word-singing choirs (en masse soloists) but sometimes a solo voice works well.
> David L


The timbres sound good. Which notation program is this?


----------



## Rob (Nov 30, 2020)

muddle said:


> These sound great, but limited in style. I would like a more classical timbre of voice. I've tried for years to do this with a French notation program which has virtual voice capability. But I can see that this would eventually be the way to go. For your interest, the attachment is an Italian language version of a MIDI file using that notation program.
> I actually use it mostly for word-singing choirs (en masse soloists) but sometimes a solo voice works well.
> David L


yeah, this is what I'm using too... pretty incredible program really


----------



## Rob (Nov 30, 2020)

leogardini said:


> The timbres sound good. Which notation program is this?


I think it's Harmony Assistant, by Myriad, with its Virtual Singer


----------



## Werty (Nov 30, 2020)

muddle said:


> Or a more serious voice . Just a note that these voices are hand created. I believe that no one else in the world has ever taken the time to get these results with the program ... hence why these new synthetic voices are the likely future.



which program is it?


----------



## Rob (Nov 30, 2020)

Werty said:


> which program is it?


see my reply above


----------



## Quasar (Nov 30, 2020)

The biggest single barrier to ever physically modeling the human voice in a "realistic", playable way is the simple fact that we're hardwired to perceive human faces and voices far more finely (by many orders of magnitude) than just about anything else in the world. Facsimiles won't be able to fool us, and even if they do, they won't have the depth or "soul" that we're so extremely sensitive to.

It would be easy to sample or model a cow's moo that sounds entirely realistic to us, because from our perspective all cows sound pretty much alike and we're not all that sensitive about it. But if every single cow had an utterly unique moo (which they probably do to other cows) and each intonation could induce a virtually infinite range of psycho-emotional responses, it likely wouldn't work.

IOW, us imitating us is different than us imitating anything else.


----------



## DSmolken (Nov 30, 2020)

Sure, and that's why it makes sense to first pick the low-hanging fruit, where we'll be easier to fool. Death metal vocals, obviously, but also choral vocals are a relatively easy target compared to solo pop vocals.


----------



## José Herring (Nov 30, 2020)

DSmolken said:


> Any synth with a formant filter is, to a degree, modeling the human voice. PPG Phonem is probably the most complex take on that - not easy to use!
> 
> Also, it doesn't even have to be software, and people have been working on such models since the XVIII century.



Is it me or is that just creepy?


----------



## Leandro Gardini (Nov 30, 2020)

Rob said:


> I think it's Harmony Assistant, by Myriad, with its Virtual Singer


Interesting software. I only wish we could use it as a plugin in our DAWs.


----------



## Marcus Millfield (Nov 30, 2020)

If you have a Windows computer and are familiar with PowerShell, you can use the free SpeechSynthesizer class to speak any word or sentence you want. There are male and female voices you can use. Phonetics work best. They don't sing, but the speech can be accurate.


----------



## Rob (Nov 30, 2020)

leogardini said:


> Interesting software. I only wish we could use it as a plugin in our DAWs.


That would be nice... still, with its ability to import and export XML, even for complex orchestral scores, it can communicate with DAWs and notation software. My wish is for better voice timbres.


----------



## Nick Batzdorf (Nov 30, 2020)

Sampled voices are one thing, in fact the Take 6 voices in Omnisphere are probably my favorite thing to play of everything.

But these things are totally creepy. You can't manufacture a human soul. Westworld is sci-fi.


----------



## Quasar (Nov 30, 2020)

Nick Batzdorf said:


> Sampled voices are one thing, in fact the Take 6 voices in Omnisphere are probably my favorite thing to play of everything.
> 
> *But these things are totally creepy*. You can't manufacture a human soul. Westworld is sci-fi.



Yes, the uncanny valley effect comes into play with this in a way it just wouldn't with a modeled trumpet.


----------



## dflood (Nov 30, 2020)

Voiceful - Give voice to your ideas

We want to help people communicate better so we have developed Voiceful, a toolkit that uses voice to create new ways of expressing yourself.

www.voiceful.io





This company doesn’t offer a DAW plugin. They seem to be licensing their SDK and cloud service for a variety of computer generated vocal applications including singing. Seems mainly geared toward advertising. Some of the demos are pretty impressive, and there’s one where you can alter the lyrics of a song and have it sing back the results. They also offer some insight into the technologies they are using.


----------



## DSmolken (Nov 30, 2020)

dflood said:


> Voiceful - Give voice to your ideas
> 
> 
> We want to help people communicate better so we have developed Voiceful, a toolkit that uses voice to create new ways of expressing yourself.
> ...


Also, Voctro Labs - the people behind much of the R&D which went into the original Vocaloid.


----------



## pondinthestream (Nov 30, 2020)

David Cuny said:


> So yes - developers have certainly taken note. But it's a much harder nut to crack than it initially appears, partly because we can do it so effortlessly that we assume it must be easy.


 cue reference to Marvin Minsky and Seymour Papert setting computer vision as a summer project for masters (?) students back in 1966


----------



## lychee (Dec 3, 2020)

Using Friktion (a physical string modeling synth and more), I noticed that among the IRs offered, there was the IR voice.
So I went on an experiment trying to create a voice instrument:


----------



## lychee (Dec 3, 2020)

After that I had the same idea as @muziksculp: why is everyone struggling with tons of samples to make a choir when we could maybe use a more malleable synthetic base to create an instrument that speaks?
It may not be necessary to reproduce a physical model of the mouth, I think simple basic materials can do the trick.
But this thread reminded me that there were solutions such as Vocaloid ...
But all of these solutions certainly require a lot of research to imitate speech, so another idea occurred to me that I don't know if it has ever been exploited (if it hasn't, someone will surely steal my idea and I will miss my mountain of dollars). 

So, why not create some sort of evolved vocoder, for which we would choose a voice base (solo male, female or M/F choir)?
It would be controlled by our own voice, with our own words, and it might need a vowel detector that would change the source according to our sounds.
It could avoid the need for a complicated word builder, and also solve the compatibility problem with different languages.

What do you think of this idea?
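For reference, the classic channel vocoder that this idea would build on can be sketched in a few lines: split the voice (modulator) into frequency bands, follow the loudness of each band, and use those envelopes to shape the same bands of a carrier such as a synthetic choir pad. This is only a baseline sketch; the band centers, bandwidths, smoothing rate, and function names are assumptions, and the vowel detector and voice-base selection proposed above are not implemented here.

```python
import math


def bandpass(signal, freq_hz, bandwidth_hz, sample_rate):
    """Crude two-pole band-pass filter with zeros at DC and Nyquist."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1 - r) * (x - x2) + a1 * y1 + a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def envelope(signal, sample_rate, smooth_hz=30.0):
    """Envelope follower: rectify, then smooth with a one-pole low-pass."""
    alpha = 1 - math.exp(-2 * math.pi * smooth_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level += alpha * (abs(x) - level)
        out.append(level)
    return out


def vocode(modulator, carrier, sample_rate, bands_hz=(300, 600, 1200, 2400, 4800)):
    """Impose the band-by-band loudness of the modulator onto the carrier."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for f in bands_hz:
        bw = f / 3
        env = envelope(bandpass(modulator[:n], f, bw, sample_rate), sample_rate)
        car = bandpass(carrier[:n], f, bw, sample_rate)
        for i in range(n):
            out[i] += env[i] * car[i]
    return out


sr = 16000
carrier = [2 * ((i * 110.0 / sr) % 1.0) - 1 for i in range(1600)]       # sawtooth "pad"
voice = [math.sin(2 * math.pi * 600.0 * i / sr) for i in range(1600)]  # stand-in for a voice
print(round(max(abs(y) for y in vocode(voice, carrier, sr)), 4))
```

With only five bands the words would be barely intelligible; practical vocoders use many more bands, and making the carrier itself sound like a convincing choir is the open part of the idea.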


----------



## muziksculp (Dec 3, 2020)

lychee said:


> After that I had the same idea as @muziksculp, why everyone is struggling with tons of samples to make choir when we could maybe use a more malleable synthetic base to create an instrument that speaks?
> It may not be necessary to reproduce a physical model of the mouth, I think simple basic materials can do the trick.
> But this thread reminded me that there were solutions such as Vocaloid ...
> But all of these solutions certainly require a lot of research to imitate speech, so another idea occurred to me that I don't know if it has ever been exploited (if it hasn't, someone will surely steal my idea and I will miss my mountain of dollars).
> ...



Yes @lychee . That's a great idea ! 

We need a real choir vocoder ... we just sing or say some words, and we can play them as a choir. Now that would be awesome. No word builders, or limitations to a few Latin words (dominos, etc.). The choir vocoder could emulate sopranos, altos, children's choirs, both male and female, etc.

Cheers,
Muziksculp


----------



## rodyrode (Dec 4, 2020)

muziksculp said:


> Yes @lychee . That's a great idea !
> 
> We need a real choir vocoder ... we just sing or say some words, and we can play them as a choir. Now that would be awesome. No word builders, or limitations to a few Latin words (dominos, etc.). The choir vocoder could emulate sopranos, altos, children's choirs, both male and female, etc.
> 
> ...



That's basically the plan for Emvoice One. Since playing a voice with a keyboard doesn't make much sense, we figured the best way to control it would be with your own voice. It may not be real time in the first iterations though. So you'd sing, for example, into your laptop microphone (quality doesn't matter), then get the result back immediately, sung by whatever voice you chose. Once it becomes real-time, the tech can be used for karaoke too.


----------



## muziksculp (Dec 4, 2020)

rodyrode said:


> That's basically the plan for Emvoice One. Since playing a voice with a keyboard doesn't make much sense, we figured the best way to control it would be with your own voice. It may not be real time in the first iterations though. So you'd sing, for example, into your laptop microphone (quality doesn't matter), then get the result back immediately, sung by whatever voice you chose. Once it becomes real-time, the tech can be used for karaoke too.



Hi @rodyrode ,

That would be fantastic.

I'm guessing *Emvoice One* is still in the development phase.

Yes, this type of tool would make a lot of sense, given all the restrictions of traditional sample libraries.

Looking forward to knowing more about your product.

Thanks for giving us a heads up on it.

Cheers,
Muziksculp


----------



## bvaughn0402 (Dec 4, 2020)

What does the one guy here that has a computer voice speaking for him use? I’ll have to look him up. He does mostly reviews.


----------



## muziksculp (Dec 4, 2020)

bvaughn0402 said:


> What does the one guy here that has a computer voice speaking for him use? I’ll have to look him up. He does mostly reviews.



It's a secret.


----------



## pmcrockett (Dec 4, 2020)

lychee said:


> After that I had the same idea as @muziksculp: why is everyone struggling with tons of samples to make a choir when we could maybe use a more malleable synthetic base to create an instrument that speaks?
> It may not be necessary to reproduce a physical model of the mouth; I think simple basic materials can do the trick.
> But this thread reminded me that there are already solutions such as Vocaloid ...
> But all of these solutions certainly require a lot of research to imitate speech, so another idea occurred to me, and I don't know if it has ever been exploited (if it hasn't, someone will surely steal my idea and I will miss out on my mountain of dollars).
> ...


I've played around with vocoding my voice with a Vocaloid. The results maybe improved the timbre slightly, but it still sounded pretty synthetic. But yes, I think it's a strong idea for a virtual instrument.


----------



## muziksculp (Dec 4, 2020)

I can imagine recording my voice singing something, or saying a word, or whatever... Then use a plug-in that can transform it to multiple voices, singing as a choir, and be able to adjust the harmony, sonic characteristics, range, formant, choir size, ..etc. No more crappy word-builders, or being stuck with samples singing the same words a million times.


----------



## David Cuny (Dec 4, 2020)

lychee said:


> So, why not create some sort of evolved vocoder, for which we would choose a voice base (solo male, female or M/F choir)?
> It would be controlled by our own voice, with our own words, and it might need a vowel detector that would change the source according to our sounds.
> It could avoid having to create a complicated word builder, and also solve the compatibility problem with different languages.
> 
> What do you think of this idea?


Well, consider what a vocoder is. It's basically a bank of filters that listens to slices of frequencies, and drives a matching bank of bandpass filters that lets a different source signal through.

There's no "detection" involved, other than detecting whether the input signal is voiced or unvoiced.

For musical purposes, a vocoder can replace a voice with a different source - say, cellos. If you use a signal similar to a glottal pulse, you'll get something quite similar to the original vocal signal. Change the pitch of the carrier signal, and you'll have shifted the pitch of the voice.

But replacing the glottal pulse won't change the position of the formants - that's essentially baked into the harmonic information of the voice. So the voice will sound pretty much the same.

To make it sound like a different person (for the purposes of a choir), you'll need to shift the position of the formants. That's not especially difficult - just create a frequency offset between the input and output filter banks.

But the voice will now be _exactly_ the same in terms of timing and articulation, and choirs don't work like that. So you'll need to stagger the signal in time as well.

I suspect that's basically how Clone Ensemble works, and you can get some surprisingly good results with it. No synthesis, no complicated physical modeling.
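The steps above (analysis bank, shifted synthesis bank, staggered copies) can be sketched in a few lines. This is only a rough illustration, not how Clone Ensemble or any commercial vocoder is actually built. The function names, the band centers, the Q value and the 30 Hz envelope smoothing are all assumptions, and the "frequency offset" is modeled here as a simple ratio (`shift`) applied to each output band.

```python
import math

def bandpass(signal, freq, q, sr):
    """RBJ-cookbook biquad band-pass with 0 dB gain at its centre frequency."""
    w0 = 2.0 * math.pi * freq / sr
    alpha = math.sin(w0) / (2.0 * q)
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    b0, b2 = alpha, -alpha  # b1 is zero for this filter
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sr, cutoff=30.0):
    """Rectify, then smooth with a one-pole low-pass: a basic envelope follower."""
    coeff = math.exp(-2.0 * math.pi * cutoff / sr)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def band_levels(modulator, sr, centers, q=8.0):
    """Average envelope per analysis band (what the input filter bank 'hears')."""
    return [sum(envelope(bandpass(modulator, fc, q, sr), sr)) / len(modulator)
            for fc in centers]

def vocode(modulator, carrier, sr, centers, shift=1.0, q=8.0):
    """Each analysis band's envelope scales the carrier band at fc * shift."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for fc in centers:
        env = envelope(bandpass(modulator[:n], fc, q, sr), sr)
        band = bandpass(carrier[:n], fc * shift, q, sr)  # keep fc * shift below sr / 2
        for i in range(n):
            out[i] += env[i] * band[i]
    return out

def choir(modulator, carrier, sr, centers, voices):
    """Sum several vocoded copies; voices is a list of (shift, delay_in_samples)."""
    n = min(len(modulator), len(carrier))
    longest = max(delay for _, delay in voices)
    mix = [0.0] * (n + longest)
    for shift, delay in voices:
        voice = vocode(modulator, carrier, sr, centers, shift)
        for i, sample in enumerate(voice):
            mix[i + delay] += sample / len(voices)  # staggered in time
    return mix
```

Something like `choir(voice, saw, sr, centers, [(1.0, 0), (1.1, 80), (0.92, 160)])` gives three "singers" with slightly different formant positions and entry times. A real tool would use many more bands and would vary the delays over time rather than keeping them fixed.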


----------



## lychee (Dec 7, 2020)

I did a little experiment with the Waves Ovox vocoder and Choir, from Kontakt's base library.
The result is still synthetic, but it's not that awful.
I first tried with my own voice, but it was even more synthetic, so I took (or rather stole) an extract of a female voice as a guide for Choir, through Ovox.
So that it wouldn't sound weird, I still had to play the vowel keyswitches in Choir corresponding to the song.
Maybe with a more advanced choir VST the result could be better? But even if it's far from perfect, I think it's still an avenue worth exploring.


----------



## pmcrockett (Dec 7, 2020)

lychee said:


> I did a little experiment with the Waves Ovox vocoder and Choir, from Kontakt's base library.
> The result is still synthetic, but it's not that awful.
> I first tried with my own voice, but it was even more synthetic, so I took (or rather stole) an extract of a female voice as a guide for Choir, through Ovox.
> So that it wouldn't sound weird, I still had to play the vowel keyswitches in Choir corresponding to the song.
> Maybe with a more advanced choir VST the result could be better? But even if it's far from perfect, I think it's still an avenue worth exploring.


This is really impressive, especially given that none of the stuff you used for it was ever originally intended to produce a result like this. I'll have to check out Ovox.


----------



## Vardaro (Dec 11, 2020)

I wondered about _whispering_ the text, providing the formants over a white(ish) noise?


----------



## David Cuny (Dec 11, 2020)

Vardaro said:


> I wondered about _whispering_ the text, providing the formants over a white(ish) noise?


If you want to extract formants, check out Praat for an extremely cool free tool for vocal analysis. It's an _amazing_ tool.


----------



## Vardaro (Dec 12, 2020)

Could the extracted formants and consonants be applied to a sampled "ah" patch?
(Classical singers' vowels often _tend_ _towards_ "ah" for resonance and equal volume within a phrase.)


----------



## DSmolken (Dec 12, 2020)

If you first rolled back that singer's "a" formants, you'd end up with a non-formanted voice... that'd be a very interesting sound by itself... the unfiltered human voice oscillator.


----------



## Rob (Dec 12, 2020)

Did a lot of experimentation on this, and had to conclude that the modulator formants are always too present in the output (basically it always sounds like you singing), and the sound always has a whispery quality that isn't what you want...


----------



## Vardaro (Dec 12, 2020)

One detail which affects realism (to my ears, at least): the transition from consonant to vowel and back.
Since the mouth closes partially for consonants, a _very_ brief insertion of an "o" or "oo" before or after the consonant might be sufficient?


----------



## lychee (Dec 12, 2020)

Vardaro said:


> One detail which affects realism (to my ears, at least): the transition from consonant to vowel and back.
> Since the mouth closes partially for consonants, a _very_ brief insertion of an "o" or "oo" before or after the consonant might be sufficient?



If you're talking about my experiment above, there are several things that make the whole thing sound synthetic.

First of all, the vocoder does not differentiate between singing, breathing, and "shhhhhh" or "ssssssh" sounds; everything is treated like a note.
I tried to reduce this effect in Ovox with the "sibilance" button, but that doesn't make everything go away.

Secondly, I use Kontakt's Choir, which is quite basic: there is no legato to tie the notes together, and everything is done with crossfades, which is not great.

Finally, I only have basic vowels (Aa, Ee, Ih, Oh, Uh, Mm), and sometimes the sounds of the voice fall somewhere between these vowels (Eu, Ei ...). That is why I said the experiment would work better with a choir plugin with better vowels.


----------



## David Cuny (Dec 12, 2020)

DSmolken said:


> If you first rolled back that singer's "a" formants, you'd end up with a non-formanted voice... that'd be a very interesting sound by itself... the unfiltered human voice oscillator.


A plain glottal pulse sounds a _lot_ like a kazoo.

Obtaining the glottal pulse directly from a vocal via inverse filtering turns out to be non-trivial. Instead of deriving the pulse purely from inverse filtering, most people choose a glottal pulse model, and then try to apply the results of inverse filtering to best fit that model. 

If you want to play around with a simple formant synthesizer, have a look at this formant synthesis demo. The "sampled" option will load a sampled glottal pulse. The quality of the underlying waveforms can be altered by moving the "Shape" slider, but has no effect on the "sampled" waveform.


----------



## DSmolken (Dec 12, 2020)

David Cuny said:


> A plain glottal pulse sounds a _lot_ like a kazoo.


Heh. Not surprising. I would have guessed it'd be more like bagpipes.

A bunch of different glottal pulses might make an interesting wavetable for a "regular" non-voice synth, too.


----------



## Vardaro (Dec 12, 2020)

As the AH is a kind of medium vowel, could we just "colour" it with EQ to push it towards EE, OO etc?
Or even French vowels and nasals?


----------



## David Cuny (Dec 12, 2020)

Vardaro said:


> As the AH is a kind of medium vowel, could we just "colour" it with EQ to push it towards EE, OO etc?
> Or even French vowels and nasals?


Yes and no.

Your insight that the vocal sound "colouring" can be simulated with some sort of EQ is correct.

But the idea that the /AH/ is neutral, and therefore amenable to manipulation, is a bit off the mark.


You can think of the simplified vocal process as something like:

*glottal pulse -> vocal tract -> vowel*

The glottal pulse is a harmonically rich sound that sounds a lot like a kazoo. It's fairly simple to generate a glottal pulse waveform that works well.

The *vocal tract* portion is also pretty easy to simulate. The tongue divides the mouth into several resonating chambers. Each chamber - depending on the length - emphasizes a particular frequency. The specific frequencies determine which vowel is produced. Each resonating frequency is called a "formant", and these formants are the "colouring" that changes the buzz of the glottal pulse into a vowel.

It turns out that you can effectively simulate these resonating chambers by using bandpass filters. For most vowels, only two formants are really required, although 4 to 6 are generally preferred.

So the *vocal tract* can be simulated by a chain of bandpass filters. A super-simple simulation would look like this:

*pulse generator -> bandpass filter 1 -> bandpass filter 2 -> vowel*

Now, here's the problem with your suggestion: the neutral /AH/ vowel has already been "coloured", which is how you identify it as an /AH/ vowel. The specific frequencies and bandwidths vary from individual to individual, but for me the first four formant frequencies for /AH/ are {689, 1033, 2641, 3445}.
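For anyone who wants to try the pulse -> filter -> filter chain, here is a minimal sketch. It uses the first two /AH/ formant frequencies quoted above (689 and 1033 Hz). Everything else is an assumption made for illustration: the bandwidths (80 and 90 Hz), the bare impulse train standing in for a proper glottal pulse, and the Klatt-style two-pole resonator used as the "bandpass filter".

```python
import math

def pulse_train(f0, seconds, sr):
    """Crude glottal stand-in: one unit impulse per pitch period."""
    n = int(seconds * sr)
    out = [0.0] * n
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += sr / f0
    return out

def resonator(signal, freq, bandwidth, sr):
    """Two-pole resonator (one formant), normalised to unity gain at DC."""
    r = math.exp(-math.pi * bandwidth / sr)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    c = -r * r
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        y2, y1 = y1, y
        out.append(y)
    return out

def synth_vowel(f0, seconds, sr, formants=((689.0, 80.0), (1033.0, 90.0))):
    """Run the pulse train through the formant resonators in series."""
    signal = pulse_train(f0, seconds, sr)
    for freq, bandwidth in formants:  # (centre frequency, assumed bandwidth) in Hz
        signal = resonator(signal, freq, bandwidth, sr)
    return signal
```

Calling `synth_vowel(100.0, 0.5, 16000)` returns half a second of samples. Swapping in a different `formants` tuple moves the vowel, which is much easier than trying to reshape an already-coloured /AH/ recording.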

By the way, the _frequency_ of a formant doesn't necessarily correspond to the order of its resonating chamber in the vocal tract.

So, as you can imagine, the process of "uncoloring" the vowel to make it sound like the original glottal pulse - essentially a kazoo - isn't trivial.


Now, if you wanted to modify the /AH/ through clever _harmonic_ manipulation so the formants were placed where those of other vowels are, that _would_ work.

But only somewhat, because the formants in different vowels have different bandwidths (even though the ear is relatively insensitive to it). More problematic would be the "stretching" you'd have to do to the waveform.

There are just simpler ways to accomplish the same end, which yield better results.


----------



## dflood (Dec 12, 2020)

lychee said:


> So, why not create some sort of evolved vocoder, for which we would choose a voice base (solo male, female or M/F choir)?
> It would be controlled by our own voice, with our own words, and it might need a vowel detector that would change the source according to our sounds.
> It could avoid having to create a complicated word builder, and also solve the compatibility problem with different languages.
> 
> What do you think of this idea?



For choirs and backing harmonies I think it’s the way to go and the *TC Helicon Voicelive* hardware line is already pretty good at this in real time. Some of the choir patches on my Voicelive3 Extreme are very nice. Unfortunately, their products are not very DAW friendly since they are aimed at the live performer.

The Voicelive is also wonderful for tuning and polishing and altering any vocal performance, but for creating convincing solo performances sung with a completely different vocal signature than the recorded vocal input, I think it’s going to take a different approach.


----------



## timbit2006 (Dec 13, 2020)

Well if we're now talking about emulating choirs using the human voice...

Can't beat MUnison


----------



## muziksculp (Dec 13, 2020)

timbit2006 said:


> Well if we're now talking about emulating choirs using the human voice...
> 
> Can't beat MUnison




Interesting ! 

Thanks for sharing this. 

I will check it out.


----------



## DSmolken (Dec 13, 2020)

So, to put it in simple terms: if your head was cut off above the vocal cords, and you still could scream, you'd sound like a kazoo.

I guess if you're doing sound design for a horror movie with headless monsters, you might want to disregard that and not insist on using kazoos for realism.


----------



## Vardaro (Dec 13, 2020)

"But the idea that the /AH/ is neutral, and therefore amenable to manipulation, is a bit off the mark."

My thoughts (albeit mistaken) on the AH sound (or the A as in "bat") come from the fact that it is produced with an open mouth and flat tongue; all other vowels involve arching the tongue in different ways, and closing or shaping the lips. And it is readily available in vocal libraries!
But I can see that it already carries the individual singer's formants.


----------



## Vardaro (Dec 14, 2020)

...And his/her very individual "kazoo"!


----------



## eli0s (Dec 14, 2020)

I can say without any doubt that Synthesizer V is leaps and bounds ahead of the competition today. Perhaps the full version of Emvoice is close enough; however, after trying the demo of Synthesizer V, I was blown away and bought the pro version for the VST support.

Here is a short piece that I would have never dared to write using any other vst lead vocal.

Destilled

Having used Vocaloid, EW Symphonic and Hollywood Choirs, I am telling you that working with Synthesizer V is a liberating joy!
Now, it doesn't come without flaws! There is only one English singing voicebank. Also, the VST plugin has some bugs (at least for me within Cakewalk), and weird limitations (no automatic tempo/meter change sync).

I am really excited to see where this technology will lead. There seems to be some AI implementation that they are about to release (for the non-English voicebanks) which improves the realism a lot.


----------



## servandus (Dec 14, 2020)

eli0s said:


> Here is a short piece that I would have never dared to write using any other vst lead vocal.
> 
> Destilled



Wow! Fantastic use of SynthV!

The improvements AI brings to the table are really impressive. I love how the breath adds some soul to the second example.





I really hope the time will come when we can hear a choir using this technology.


----------



## eli0s (Dec 14, 2020)

servandus said:


> I really hope the time will come when we can hear a choir using this technology.


Oh man... That would be something!!!


----------



## DSmolken (Dec 14, 2020)

It will be money for the developer who pulls it off.


----------



## fabrizio (Dec 14, 2020)

... I very much liked eli0s' work with Synthesizer V, too! Actually, I checked their page and I think "Destilled" sounds much better than their own demos. Very well done, very interesting.

Quite some time ago I had a go at making my own patch of a singing choir. Just for fun, for the sake of it. The original plan was to build two choirs, female and male... I quickly decided that the female version was enough. It is sample-based and requires some programming on the go to blend consonants and vowels... not overly tough, but not exactly a relaxing experience. It did not take me more than a few days to build the program and try it on a test rendering. I chose "Bist du bei mir" (Stölzel/Bach), in German. Given the way it was built, I guess it would have no problem whatsoever "singing" in any language.

Bist du bei mir

Bist du bei mir lyrics - for following along - are:

"Bist du bei mir, geh ich mit Freuden
zum Sterben und zu meiner Ruh
zum Sterben und zu meiner Ruh
Bist du bei mir, geh ich mit Freuden
zum Sterben und zu meiner Ruh
zum Sterben und zu meiner Ruh

Ach, wie vergnügt wär so mein Ende,
es drückten deine schönen Hände
mir die getreuen Augen zu!

Ach, wie vergnügt wär so mein Ende,
es drückten deine schönen Hände
mir die getreuen Augen zu!
Bist du bei mir, geh ich mit Freuden
zum Sterben und zu meiner Ruh
zum Sterben und zu meiner Ruh"

This was a fun project!

Fab


----------



## DSmolken (Dec 15, 2020)

Whoa. Speaking of different approaches to the problem...

Audio Plug-ins: Klevgrand is a creative studio and software company in Stockholm run by film makers, musicians, software developers, producers and sound designers. (klevgrand.se)


----------



## Rob (Dec 15, 2020)

Still I think Myriad's approach to voice simulation has the best potential... just listen to this rough rendering of the famous "nessun dorma" aria, and try to imagine it with a sweet tone, perfect consonants and vowel transitions.


----------



## fabrizio (Dec 16, 2020)

Rob said:


> Still I think Myriad's approach to voice simulation has the best potential... just listen to this rough rendering of the famous "nessun dorma" aria, and try to imagine it with a sweet tone, perfect consonants and vowel transitions.


Roberto, first of all let me congratulate you for your artistry, I do love your ways with the ivory keys (S.O.O.N. is really, really top notch). I am perfectly aware that you are a musician with remarkable sensitivity and taste but... Myriad (as any other mentioned solution, to tell the truth...) is still very far away from a decent result. Music has to be pleasant, first of all. You said it yourself: sweet tone, nice transitions and so on.
Nonetheless, I do agree with you: this issue needs to be solved by synthesis or maybe by a hybrid approach. Samples alone will never do it, too many parameters involved. The above-mentioned Synthesizer V, for example, chose the hybrid path, if I got it right.
Well... the issue, actually, doesn't "need" to be solved. I am perfectly comfortable with the idea of singing being performed by ... well, singers. My very humble opinion, this goes without saying.


----------



## Rob (Dec 16, 2020)

fabrizio said:


> Roberto, first of all let me congratulate you for your artistry, I do love your ways with the ivory keys (S.O.O.N. is really, really top notch). I am perfectly aware that you are a musician with remarkable sensitivity and taste but... Myriad (as any other mentioned solution, to tell the truth...) is still very far away from a decent result. Music has to be pleasant, first of all. You said it yourself: sweet tone, nice transitions and so on.
> Nonetheless, I do agree with you: this issue needs to be solved by synthesis or maybe by a hybrid approach. Samples alone will never do it, too many parameters involved. The above-mentioned Synthesizer V, for example, chose the hybrid path, if I got it right.
> Well... the issue, actually, doesn't "need" to be solved. I am perfectly comfortable with the idea of singing being performed by ... well, singers. My very humble opinion, this goes without saying.


Thanks a lot Fabrizio, and of course I agree with you. But of all the singing software I have tried, VS is the one that best approximates the behavior of the voice, especially in Italian (well, maybe all Latin languages)... the editor allows for very deep customization, including recording one's own phonemes, so it's also fun. I'm optimistic, even if I don't want for a moment to imagine a world of virtual singers.
But at the same time I'm not scared by technology. For the work that I do, having the ability to deliver vocal tracks for real singers to rehearse with, or to give them an idea of the opera they'll be singing in, is of great help. A big thank you for your words about my trio!


----------



## Leandro Gardini (Dec 16, 2020)

Rob said:


> Thanks a lot Fabrizio, and of course I agree with you. But of all the singing software I have tried, VS is the one that best approximates the behavior of the voice, especially in Italian (well, maybe all Latin languages)... the editor allows for very deep customization, including recording one's own phonemes, so it's also fun. I'm optimistic, even if I don't want for a moment to imagine a world of virtual singers.
> But at the same time I'm not scared by technology. For the work that I do, having the ability to deliver vocal tracks for real singers to rehearse with, or to give them an idea of the opera they'll be singing in, is of great help. A big thank you for your words about my trio!


It would be great if you could record a tutorial of VS. There's nothing on the internet.


----------



## Rob (Dec 16, 2020)

leogardini said:


> It would be great if you could record a tutorial of VS. There's nothing on the internet.


Leandro, I'll try and find the time...


----------



## Vardaro (Dec 17, 2020)

Myriad: a right click on the tutorial pages will offer Print > Save As PDF
Is "VS" the Virtual Singer plugin to Melody Assistant?

I love that Nessun Dorma sung by a light lyric tenor!


----------



## Leandro Gardini (Dec 17, 2020)

Seriously, this experiment sounds better than all the expensive virtual choirs we have on the market.

The best vocal quartet yet? (vi-control.net)
https://artsandculture.google.com/experiment/blob-opera/AAHWrq360NcGbw?cp=e30. Have any of you seen this? It's pretty fun haha. It actually sounds like you'd be able to get away with it in a mix!


----------



## dsy (Dec 30, 2020)

eli0s said:


> I can say without any doubt that Synthesizer V is leaps and bounds ahead of the competition today. Perhaps the full version of Emvoice is close enough; however, after trying the demo of Synthesizer V, I was blown away and bought the pro version for the VST support.
> 
> Here is a short piece that I would have never dared to write using any other vst lead vocal.
> 
> ...



Hi, 
very good demo, you should send it to Dreamtonics.
Is it the Eleanor voice? I don't recognize it.


----------



## eli0s (Dec 30, 2020)

@dsy , thanks!

I've posted a link at Dreamtonics' forum, so the piece exists there also.

Yes, it's Eleanor with some parameters shifted from their default values (gender towards the "male" side, less tension, less voice...). Also some EQ, compression and reverb/delay...


----------



## eli0s (Jan 15, 2021)

A new piece using Synthesizer V, this time in a more... eastern style. I've used Saki's Lite voice bank, which I find has an inherently deeper voice than Eleanor.

Asia Minor

Try to do something like this with any other virtual voice library... Not happening! Now I wish for a native Greek sound-bank!


----------



## dsy (Jan 16, 2021)

Nice eli0s, I like this voice too and the song is rich: harmonies, sounds, percussions. Nice job!

Quiz: which physical modeling plugin was used for the song Smooth?
More a demo than a song... Very easy if you know all these plugins.


----------



## timbit2006 (Jan 16, 2021)

Speechelo - The Best Text To Speech Software (speechelo-offer.com)

What's everyone's thought on this? I think I will get it for any sort of spoken word effect in songs but it would be interesting to see how it can be shifted to be more vocal-like


----------



## lychee (Mar 21, 2022)

It seems that someone is working on an idea similar to what I described a year ago, based on a sample controlled by a theremin:


----------



## nolotrippen (Mar 21, 2022)

lychee said:


> It seems that someone is working on an idea similar to what I described a year ago, based on a sample controlled by a theremin:



I want it!!!!!


----------



## lychee (Mar 21, 2022)

It's sad that it's not in real time apparently.


----------



## Markrs (May 17, 2022)

I know a few on here use Emvoice, and I came across a tutorial on using it with a soprano voice to sing 16th century music with Dorico.

Scoring a 16th century ayre with Dorico and Emvoice One - Scoring Notes (www.scoringnotes.com)
How to make the Emvoice One voice synth emulate the voice of a trained singer, adding more realism to a score produced in Dorico Pro.


----------



## lychee (Oct 29, 2022)

Oh my god, this guy is a f... genius!
At each of his achievements I fall to the ground with my jaw dropped.

Rather than being saturated again and again with yet another sample library that doesn't bring any more than the others, I think we could go much further with synthesis in areas such as expressiveness, or even in word builders.


----------



## David Cuny (Oct 29, 2022)

Markrs said:


> I know a few on here use Emvoice and I came across a tutorial on using it with a soprano voice to sing 16th century music with Dorico.


Emvoice is certainly useful in mocking up songs. For this, absolute realism isn't always the most important feature, especially if you're planning on replacing the vocals in the end.

But creating the vibrato using individual notes is problematic. 

First, the pitch change is like a sine wave - it goes above _and_ below the target pitch. This is different from vibrato on a violin, where you'll roll your finger towards the body of the instrument, and then back into position, which results in a vibrato that doesn't go under the pitch.

There's also an amplitude change, although the amplitude doesn't go in the negative direction.
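Those two points (pitch swinging on both sides of the target, amplitude changing without ever going negative) can be sketched as a pair of control curves. This is only an illustration under assumed values: the 5.5 Hz rate, 50 cent depth, 20% amplitude dip, the linear fade-in of the depth, and the amplitude dipping in step with the pitch are all guesses, and the function names are hypothetical.

```python
import math

def progressive_vibrato(seconds, control_rate=100, rate_hz=5.5,
                        max_cents=50.0, max_dip=0.2, onset=0.3):
    """Return (pitch_cents, amplitude) curves whose depth fades in over the note."""
    n = int(seconds * control_rate)
    ramp_time = max(onset * seconds, 1e-9)  # time taken to reach full depth
    pitch_cents, amplitude = [], []
    for i in range(n):
        t = i / control_rate
        depth = min(1.0, t / ramp_time)  # 0 at note start, 1 after the ramp
        lfo = math.sin(2.0 * math.pi * rate_hz * t)
        # Pitch deviates symmetrically above and below the target.
        pitch_cents.append(max_cents * depth * lfo)
        # Amplitude only dips below 1.0; it never reaches zero or goes negative.
        amplitude.append(1.0 - max_dip * depth * (0.5 + 0.5 * lfo))
    return pitch_cents, amplitude

def cents_to_hz(target_hz, cents):
    """Convert a deviation in cents into an absolute frequency."""
    return target_hz * 2.0 ** (cents / 1200.0)
```

The resulting curves could be drawn into a DAW as pitch-bend and volume automation, instead of building the vibrato out of individual notes.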

Here's an algorithmic example of progressive vibrato - amplitude on top, pitch below:






As he notes, volume automation also has to be done in the DAW. Hopefully they've added these features to Emvoice by now, because they're really helpful in getting an expressive performance.

Emvoice also sounds like a male voice pitched up on the higher notes, with an unnatural purity in the tone. Voice timbre changes as you change pitch, so if you interpolate a note from a different, lower one, you'll end up with a note sung at the correct pitch, but with the spectral qualities of the lower note.

The nasal anti-resonances (which dampen frequencies) sit at fixed positions. They're "baked in" to the spectral curve, so when you interpolate a higher pitch, you're shifting their position as well. That probably has some small effect, too.

I've heard a lot of vocal synthesis tools sound really good when using copy synthesis - transferring the parameters of a live performance onto the synthetic vocal. It's the same thing with any VI - the demo can sound very good when copying parameters, but getting a good performance "out of the box" is a different sort of animal.

Still a good tool, but, at least at the point when the video was made, a bit tedious to get an expressive performance from.


----------



## David Cuny (Oct 29, 2022)

lychee said:


> Oh my god, this guy is a f... genius!
> At each of his achievements I fall to the ground with my jaw dropped.
> 
> Rather than being saturated again and again with yet another sample library that doesn't bring any more than the others, I think we could go much further with synthesis in areas such as expressiveness, or even in word builders.


Synthesizing a single vowel is relatively easy.

Controlling all those parameters to get a performance from it? Much more difficult. That's why breath controllers work so well with physical modeling synthesis.

And there's a huge gap between synthesizing a single vowel and synthesizing singing. Just look at Microsoft, which was convinced that they could make articulatory synthesis work by hiring the best researchers in the field and throwing lots of money at the problem. Nothing that I know of came out of _that_ money pit.


----------



## Lord Daknight (Oct 29, 2022)

muziksculp said:


> Hi,
> 
> I was wondering if it would be possible to create a Physically Modelled Instrument for the Human Voice.
> 
> ...


Infinite Choirs!!!


----------

