
AI Orchestral Music

Lazeez

New Member
OK, I'm really not sure how I feel about this. I knew of AI's capability to generate non-orchestral music, but I didn't think the nuances and complexities of orchestral music could be handled by AI just yet. I was wrong.

I spent the last few hours prompting AI (Udio.com) to generate this orchestral film music track:


I didn't compose a note of it. I didn't arrange it, mix it, or master it, but I did prompt and re-prompt many times (almost a hundred times) and created many remixes and edits. It's not perfect or great by any means. It's a little generic-sounding. There are glitches, questionable bow changes, and weird musical choices, among other things, but overall this blew my freaking mind. I have no idea how this works under the hood. But one thing I kept thinking about is that it's just a matter of time before developers figure out how to create sample/AI hybrid orchestral virtual instruments that can be molded exactly how the composer wants. And they will sound like the real deal, if properly composed, arranged, and balanced, of course.

This is wild. At this point I'm not sure yet whether AI will replace composers in the future (that's a whole other intense topic), but I know it will provide them with amazing new tools to compose music with (not prompt, but actually compose). It's like when full-frame digital cameras made high-quality filmmaking accessible to a lot of people. People still needed to learn how to use them, and they needed to be creative and know what they were doing to produce good results. New, more sophisticated, and cheaper tools are around the corner! Maybe there will even be AI-assisted composing tools (like AI-assisted industrial design, AI-assisted learning, etc.). Cubase co-pilot?

But I could be wrong. What do I know.

Even the image I used for this track is AI-generated.
 
Insane! Very curious what this will do in an already tough market. I would love to see this approach applied to individual instruments / MIDI data like you said, but with the speed of development, I wonder how long that would remain useful. It requires a lot of input audio/data to do that (I think?), and I honestly wonder if musicians are willing to collaborate in that effort.
 
but I did prompt and re-prompt many times (almost a hundred times) and created many remixes and edits. It's not perfect or great by any means. It's a little generic-sounding. There are glitches, questionable bow changes, and weird musical choices, among other things, but overall this blew my freaking mind. I have no idea how this works under the hood.
This may prove to be one of the Achilles heels of AI in the short term. When it becomes too hard to knock the results into shape and there's no easy way to apply small manual edits without continually regenerating the material through prompts (which calls on the user to second-guess what the AI will produce), more experienced people will use it for a bit and then abandon it. Even Rick Rubin types, who may find the interface a reasonable one and cheaper than interacting with human musicians, may just find it too onerous to deal with.

There will be a "that's close enough" crowd, and for underscore-type work maybe that's all a lot of clients will pay for.

But the way these things are implemented is very characteristic of computer scientists bolting stuff together rather than finding out how a practical FenbyAI should work. I know of one or two startups who have consulted with composers and musicians, but they are very much in the minority, and I suspect they will run out of cash quite quickly, if not already.
 
I think we should probably say "I prompted this composition" instead of "I generated it," which sounds like more work has been done by "the generator."
"I gave some extremely vague indications and the thingy came up with this at random" would be better, but probably too long.

A bit pedantic, I guess, but language has to move with the tech and describe things more accurately so we don't end up with a next generation of proud "generators" who think they have actually created something.
 
I think we should probably say "I prompted this composition" instead of "I generated it," which sounds like more work has been done by "the generator."
"I gave some extremely vague indications and the thingy came up with this at random" would be better, but probably too long.

A bit pedantic, I guess, but language has to move with the tech and describe things more accurately so we don't end up with a next generation of proud "generators" who think they have actually created something.
100%. Those who fool themselves into believing that they create something when they use AI should be mocked and shamed!
 
I think we should probably say "I prompted this composition" instead of "I generated it," which sounds like more work has been done by "the generator."
"I gave some extremely vague indications and the thingy came up with this at random" would be better, but probably too long.

A bit pedantic, I guess, but language has to move with the tech and describe things more accurately so we don't end up with a next generation of proud "generators" who think they have actually created something.
It may eventually be a non-issue, because nobody will want to listen to other people's music.

Also, once the tools get better and offer more control, "prompters" may get a promotion to "music director" or some title like that.

Right now, it's too random, too much AI. I played with Udio to create a song, and it was a fun yet frustrating experience. I could not get it to go in the direction I wanted, and I could not get it to repeat the chorus. Ultimately, all the songs I created lacked structure and direction.

Once it's better, I can see myself having fun creating songs, because I cannot make those the old way. For instrumentals, I will continue to learn to do it the old way.
 
Wow, that's pretty impressive!
I hope the next generation of VIs can sound this good and use some kind of AI "sorcery" to create more realistic performances.
Overall, in the coming weeks/months, the music business, like any (creative) field, is gonna be massively impacted by this tech. Our main advantage for now is that we use our brains to create stuff and we know what we're doing. We can precisely change things based on client feedback. This thing is not yet able to do that.
It certainly might at some point, and then we are screwed.
 
I think that the line "I did prompt and re-prompt many times (almost a hundred times)" is very important in this case. I mean, this was not generated by a simple "make a string orchestra piece with solo violin and some horn lines." The AI was guided by a composer's hand, searching for a specific result, wasn't it? Did you get some fragments and then piece them together?
 
This may prove to be one of the Achilles heels of AI in the short term. When it becomes too hard to knock the results into shape and there's no easy way to apply small manual edits without continually regenerating the material through prompts (which calls on the user to second-guess what the AI will produce), more experienced people will use it for a bit and then abandon it.
I read in an article that some of the developers wanted to offer lower-level control over the generated music, but it was left out because it didn't match the needs of the target audience.

So it's not that it can't happen today; it's just not cost-effective.

Plus, text tags for music often come for free, while something more complex would require additional work, and that would cost money.

Which, come to think of it, is another copyright violation, since the tags containing the descriptions of the music are also being scraped off of websites.

But it's not hard to imagine training an AI to classify music based on control settings (texture density, intensity, harmonic change, etc.) on some small training set, and then having that AI set the labels for the remaining data.
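For the curious, that bootstrap-labelling idea is easy to sketch. Here's a minimal pseudo-labelling example in Python with scikit-learn; the feature vectors, the three texture classes, and the 0.9 confidence cutoff are all illustrative assumptions, not anything these companies have disclosed.

```python
# Pseudo-labelling sketch: train on a small hand-labelled set, then let the
# model label the much larger unlabelled remainder, keeping only confident tags.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: 500 hand-labelled clips vs. 50,000 unlabelled ones.
# Each row stands in for a per-clip feature vector (e.g. extracted with librosa).
X_labelled = np.random.rand(500, 32)
y_labelled = np.random.randint(0, 3, 500)   # e.g. 0=sparse, 1=medium, 2=dense texture
X_unlabelled = np.random.rand(50_000, 32)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_labelled, y_labelled)

probs = clf.predict_proba(X_unlabelled)
confident = probs.max(axis=1) > 0.9          # only trust high-confidence predictions
pseudo_labels = probs.argmax(axis=1)[confident]
print(f"Auto-labelled {confident.sum()} of {len(X_unlabelled)} clips")
```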

If there's money to be made in a niche market, someone will eventually provide it.
 
This may prove to be one of the Achilles heels of AI in the short term. When it becomes too hard to knock the results into shape and there's no easy way to apply small manual edits without continually regenerating the material through prompts (which calls on the user to second-guess what the AI will produce), more experienced people will use it for a bit and then abandon it. Even Rick Rubin types, who may find the interface a reasonable one and cheaper than interacting with human musicians, may just find it too onerous to deal with.

There will be a "that's close enough" crowd, and for underscore-type work maybe that's all a lot of clients will pay for.

But the way these things are implemented is very characteristic of computer scientists bolting stuff together rather than finding out how a practical FenbyAI should work. I know of one or two startups who have consulted with composers and musicians, but they are very much in the minority, and I suspect they will run out of cash quite quickly, if not already.
Agreed. Besides the high-level prompting, there is no control. Even the same prompts generate different outputs. If you don't like the result, you ask it to try again. Rinse and repeat tens of times until you get something that sounds good but still likely isn't exactly what you're looking for.

As a composer, I see this technology evolving to allow some generic level of control that would be good enough for some music directors' requirements (when they just need something quick that vaguely sounds like "this or that"). For requirements that need specificity, this mode of interacting with AI will not work. Frankly, I don't want it to evolve to the level where you could be extremely specific. It's a scary thought.
 
I think we should probably say "I prompted this composition" instead of "I generated it," which sounds like more work has been done by "the generator."
"I gave some extremely vague indications and the thingy came up with this at random" would be better, but probably too long.

A bit pedantic, I guess, but language has to move with the tech and describe things more accurately so we don't end up with a next generation of proud "generators" who think they have actually created something.
Agreed. It's the AI that's generating the music. There should be no illusions about that. I shudder at this thought: when this type of tool evolves to allow more specificity, customizability, and control, new job positions like "AI Prompt Engineer (Music Specialty)" will open up, and music directors might hire them to provide the music they want. Imagine a director sitting down with an AI Prompt Engineer to create the score of a movie instead of sitting down with a composer. Yikes.
 
I think that the line "I did prompt and re-prompt many times (almost a hundred times)" is very important in this case. I mean, this was not generated by a simple "make a string orchestra piece with solo violin and some horn lines." The AI was guided by a composer's hand, searching for a specific result, wasn't it? Did you get some fragments and then piece them together?
That's correct. It was actually three generations/fragments (intro, middle, and outro). I spent some time prompting the AI to generate the middle first. Once that sounded good to me, I then did the same with an intro and finally with the outro. The AI has knowledge of what an intro and outro should sound like, so I didn't have to be specific about that.

It's important to note that specificity in my prompts really didn't matter that much. It felt like the AI generated what it wanted to generate based on a very high-level interpretation of my prompts. The reason I had to rinse and repeat so many times wasn't to get the prompt just right (to get the result I was looking for), but simply to get a lot of different versions of what I was asking for. It's an arduous process, devoid of any significant thought or creativity on my part. To be clear, I didn't quite like the experience. But I did it anyway as an experiment.
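(For anyone wondering what piecing fragments together can look like outside the generator, here's a minimal sketch with pydub; the file names and crossfade length are hypothetical, not what Udio does internally.)

```python
# Stitch three generated fragments with short crossfades to hide the seams.
from pydub import AudioSegment  # pip install pydub (needs ffmpeg installed)

intro = AudioSegment.from_file("intro.wav")    # hypothetical file names
middle = AudioSegment.from_file("middle.wav")
outro = AudioSegment.from_file("outro.wav")

# append() overlaps the end of one segment with the start of the next;
# the crossfade duration is given in milliseconds.
track = intro.append(middle, crossfade=1500).append(outro, crossfade=1500)
track.export("full_track.wav", format="wav")
```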
 
That's correct. It was actually three generations/fragments (intro, middle, and outro). I spent some time prompting the AI to generate the middle first. Once that sounded good to me, I then did the same with an intro and finally with the outro. The AI has knowledge of what an intro and outro should sound like, so I didn't have to be specific about that.

It's important to note that specificity in my prompts really didn't matter that much. It felt like the AI generated what it wanted to generate based on a very high-level interpretation of my prompts. The reason I had to rinse and repeat so many times wasn't to get the prompt just right (to get the result I was looking for), but simply to get a lot of different versions of what I was asking for. It's an arduous process, devoid of any significant thought or creativity on my part. To be clear, I didn't quite like the experience. But I did it anyway as an experiment.

My experience mirrors yours. Having to audition random results was very time-consuming (but fun)!
Some of the generated music was just hilarious, incoherent, and eclectic.
But I am truly astounded by AI's ability to source from (in this case) Udio's different models just by entering prompts.
It generates not only realistic music but lyrics as well.
And this is very much still in its infancy!
 
My experience mirrors yours. Having to audition random results was very time-consuming (but fun)!
Some of the generated music was just hilarious, incoherent, and eclectic.
But I am truly astounded by AI's ability to source from (in this case) Udio's different models just by entering prompts.
It generates not only realistic music but lyrics as well.
And this is very much still in its infancy!
That's the scary part: it's in its infancy. What will it look like when it's mature? I'm much more interested in this type of technology getting into sample-modeling VSTs than in simply prompting my way to a "composition."
 
Agreed. It's the AI that's generating the music. There should be no illusions about that. I shudder at this thought: when this type of tool evolves to allow more specificity, customizability, and control, new job positions like "AI Prompt Engineer (Music Specialty)" will open up, and music directors might hire them to provide the music they want. Imagine a director sitting down with an AI Prompt Engineer to create the score of a movie instead of sitting down with a composer. Yikes.
Job description: applicants must be humble enough to admit that they are infinitely inferior to ones and zeros and must not gripe when they are eventually fired after being replaced with another AI. Applicants should be able to type at 200 WPM and should have little to no knowledge of anything music-related.

Minimum job requirements: must be fluent in literally any language spoken by more than one person on planet Earth. Must be able to sit for long periods of time and generally not add any intellectual value to the music-making process.

Education: must have completed at least one year of formal school (grade school, high school, or college level) and must have a GPA greater than 0.

Salary expectations: don't have expectations and you'll be good.
 
I'm curious if, like Stable Diffusion, you'll in the end be able to upload audio, like a mock-up, and have it re-rendered; that seems more likely based on the current model than having it interpret MIDI or score notation. I know folks who have Stable Diffusion working natively on their own computers, so it wouldn't necessarily need to use a remote service, but audio may be too demanding, or the trained models may not be open source.

I don't really want this, but I'm guessing it might be a development.
 
I'm curious if, like Stable Diffusion, you'll in the end be able to upload audio, like a mock-up, and have it re-rendered; that seems more likely based on the current model than having it interpret MIDI or score notation. I know folks who have Stable Diffusion working natively on their own computers, so it wouldn't necessarily need to use a remote service, but audio may be too demanding, or the trained models may not be open source.

I don't really want this, but I'm guessing it might be a development.
That's a really interesting use case that I hadn't considered: uploading audio of a mock-up, or a mock-up plus its associated MIDI file, and the AI makes it sound as if it's played by a real orchestra. No more fighting VSTs to make them sound realistic. It's akin to handing the notation over to an orchestrator and a real orchestra to play and record. You could choose the size of the orchestra, the type of hall, etc. None of this would make up for a poorly composed, arranged, or balanced composition, though. Nor should it :)
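Nothing like this exists for full orchestral re-rendering as far as I know, but the Stable Diffusion analogy can be made concrete: Riffusion fine-tuned Stable Diffusion on spectrogram images, so a standard img2img pass sketches what "upload a mock-up and re-render it" might look like. Everything below (file name, prompt, strength) is an illustrative assumption, not a proven workflow.

```python
# Sketch: use the mock-up's mel spectrogram as the init image for an img2img
# pass, in the spirit of Riffusion; 'strength' controls how far the result
# may drift from the original mock-up.
import numpy as np
import librosa
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# 1. Mock-up audio -> mel-spectrogram image (hypothetical file name).
audio, sr = librosa.load("mockup.wav", sr=44100, duration=5.0)
mel_db = librosa.power_to_db(
    librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=512), ref=np.max)
scaled = (255 * (mel_db - mel_db.min()) / (np.ptp(mel_db) + 1e-9)).astype(np.uint8)
init = Image.fromarray(scaled).convert("RGB").resize((512, 512))

# 2. Re-render the spectrogram, guided by a text prompt.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("riffusion/riffusion-model-v1")
out = pipe(prompt="lush live orchestra, large concert hall",
           image=init, strength=0.5).images[0]

# 3. New spectrogram -> audio (Griffin-Lim estimates the discarded phase).
out_db = np.array(out.convert("L"), dtype=np.float32) / 255.0 * 80.0 - 80.0
rendered = librosa.feature.inverse.mel_to_audio(librosa.db_to_power(out_db), sr=sr)
```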
 