Musico - AI generated music

patrick76

Senior Member
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
Good question. I was wondering the same thing. Perhaps the AI used for visual art is more sophisticated than that used for music… I’ll stop speculating though before I start to ramble.
 

Voider

From the future
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
I guess it's because more people need pictures (for any kind of product, websites, novels, games and so on), and therefore there are more projects for AI imagery in development.

AI can be amazing when it comes to music too though, do you know about the Beethoven X project?
It's his 10th symphony, completed by AI. The Beethoven Orchestra of Bonn performed it:

 
Pier

Senior Member
Thread starter
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
My hypothesis... it's just a matter of interest.

There are probably more people working on AI imaging/vision for a number of reasons:

- There's a lot of interest in AI vision for a number of applications (self driving cars, robots collecting vegetables at hydroponic farms, recognition of diseases from scans, etc, etc)

- AI images are easier to share and like on social media so that boosts interest. See Dall-E for example.

- Other applications in design/video/gaming (content aware resizing, deleting objects, AI upscaling to 4K, Nvidia DLSS, etc)
 

NekujaK

Procrastination is my superpower!
Part of it has to do with AI art programs being trained on a vastly greater catalog of images than the number of musical examples AI music programs have been trained on so far.

Here's a very thoughtful and, IMHO, reasonable analysis of AI music. This video was originally brought to my attention by @Tim_Wells in another AI music thread. It's a good watch:

 

JJP

Why do they still let me post here?
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
AI art tends to be static. Music is dynamic and constantly changing. AI animation may be a better comparison. I bet we could have AI do a very convincing single chord.
 

Tim_Wells

Tim Wells
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
I agree with the reasons mentioned above. But also, because it's much simpler for a computer to analyze and reassemble digital pixels than audio files. Look at how complex a program like Melodyne is. And even Melodyne produces crude results when you heavily manipulate an audio file.

That's why a lot of these AI music tools use MIDI. MIDI data is much easier for a computer to analyze and generate. But the results (as demonstrated above) are pretty lame. Think about all the time we spend finessing our MIDI tracks.
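To put rough numbers on that (just an illustration, not any particular tool's format):

```python
# Rough comparison of how a computer "sees" one second of a held note:
# as MIDI-style event data versus as raw audio samples.

# MIDI-style representation: a handful of numbers per note.
# (pitch, start_time_s, duration_s, velocity) -- four values total.
note = (60, 0.0, 1.0, 96)  # middle C, held for one second

# Raw audio at CD quality: 44,100 samples per second per channel.
SAMPLE_RATE = 44_100
audio_values = SAMPLE_RATE * 1  # mono, one second

print(len(note))       # 4 values to analyze
print(audio_values)    # 44100 values to analyze
```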

But it certainly could get better over time. Imagine if someone like Google teamed up with Native Instruments or such. It could be scary.

I've experimented with some of these AI tools. Notably AIVA and Soundful. The results were pretty sh!t.

I personally see these .... and future tools as less of a threat and more of a potential aid, or a personal assistant. I could see using one to generate a basic rhythm track, or an ostinato, or even as a template or idea generator for a certain theme you're going for.
 
Pier

Senior Member
Thread starter
AI art tends to be static. Music is dynamic and constantly changing. AI animation may be a better comparison. I bet we could have AI do a very convincing single chord.
Movement in music is also a pattern that can be analyzed and reproduced by a human, so it will be reproducible by AI at some point too.

Anything our brain can do will eventually be done by AI. I don't know if we will live to see that, but it's pretty much inevitable that this will happen. For better or worse.

The Moderna COVID vaccine was developed in two days by an AI. No human(s) could have done that. It then took 10 months to test and confirm it worked as expected.

But also, because it's much simpler for a computer to analyze and reassemble digital pixels than audio files.
That's really just superficial. You can feed MIDI files to an AI. Even scanned sheet music.

As I wrote before, there's not much going on in this area probably because there's not that much interest. There aren't many applications other than generating AI music which doesn't really solve any big issue humans have right now.
 

Tim_Wells

Tim Wells
AI needs tens of millions of examples to learn from. For music, that means audio, because the pool of available MIDI for AI to learn from is extremely tiny. Plus basic MIDI data lacks expressiveness. Sheet music might be a bit better, but it's pretty much the same situation.

So again, that means deconstructing millions of audio files, analyzing them, and then somehow reconstructing all the parts and pieces into something new. I haven't seen any technology that can do that. It's probably coming... someday.

But as it stands, it's currently much easier to do with pixels.

By the way, most of these concepts are discussed in the video @NekujaK linked above. So I can't claim them as original thoughts. :)
 
Pier

Senior Member
Thread starter
But as it stands, it's currently much easier to do with pixels.
I've worked a bit with computer vision (face recognition, etc) and I don't think that's even remotely true. Both are very hard.

It's really unfair to compare the results of, say, Dall-E and Melodyne. The difference in investment into imaging and vision over the last 20-30 years has been multiple orders of magnitude above sound/music.

It's true pitch recognition with Melodyne is not perfect, but look how Tesla self-driving tech still fucks up after billions in investment.

Honestly I'm convinced Melodyne would have close to perfect results if they had invested what Google or Tesla have invested (and are still investing) into their image recognition tech. The reason that hasn't happened is, again, lack of interest.
 

gsilbers

Part of Pulsesetter-Sounds.com
I find it extremely interesting that AI-generated visual art is vastly more palatable than AI-generated music. And I think that's true for most people, not just musicians.

Why is that?
Might be that the programmers behind it are aiming too high too fast.
If instead of trying to do styles/genres and commercial music they focused on royalty-free social media stuff, they'd be much better off. Simple kids music. Simple ambient. Simple corporate music.
How many times have you grabbed a Kontakt/synth patch that generates a sequence from a chord you press and felt, hey, just add a drum loop and done?

The AI guys are, imo, going about this the wrong way. They are trying to take the visual AI approach and replicate it in music, trying to capture millions of pieces of music and reproduce them.
It could be a different affair altogether by just having common chord progressions randomized in key and rhythm, with different drum loop beds and melodies that follow the chords and bass. Yes, much of it will sound similar, but if you listen to all of those short social media videos with text, the music is all very similar.

Trying to do jazz is already difficult for a human sequencing it. Or film score, where there are just too many variations.
Or rock music, where samples don't sound good. Classical music is also hard even for humans.
If they focused on kids, corporate, trap, and simpler genres with far fewer variations, they'd get where they want faster: you keep the rights to whatever you create, and they get the royalties from your video. Plus the sub fee.
 

NekujaK

Procrastination is my superpower!
Might be that the programmers behind it are aiming too high too fast. If instead of trying to do styles/genres and commercial music they focused on royalty-free social media stuff, they'd be much better off.
I think we need to be careful to not conflate algorithmically generated art/music with AI generated art/music.

ALGORITHMS VS. AI

Algorithmic art and music have existed for quite some time. On the music side, programs and plugins that can algorithmically generate music have existed for decades. They use pre-programmed patterns and rules, augmented with some randomization, to generate a limited musical result. We can check this box as done!
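As a rough illustration of what "pre-programmed patterns and rules, augmented with some randomization" means (this is a toy sketch, not any real product's code):

```python
import random

# Toy "algorithmic composer": stock chord patterns plus randomization.
# All the rules and numbers here are illustrative.

# Diatonic major scale as semitone offsets from the key root.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]

# A few pre-programmed progressions (scale degrees, 1-based).
PROGRESSIONS = [
    [1, 5, 6, 4],   # I-V-vi-IV
    [1, 6, 4, 5],   # I-vi-IV-V
    [2, 5, 1, 1],   # ii-V-I
]

def triad(degree, key_root=60):
    """Build the diatonic triad on a scale degree, rooted at key_root (MIDI)."""
    i = degree - 1
    return [key_root + MAJOR_SCALE[(i + step) % 7] + 12 * ((i + step) // 7)
            for step in (0, 2, 4)]

def generate(seed=None):
    """Pick a random key and a random stock progression, return its triads."""
    rng = random.Random(seed)
    key = rng.choice(range(55, 67))      # random key root
    prog = rng.choice(PROGRESSIONS)      # random pre-programmed pattern
    return [triad(d, key) for d in prog]

chords = generate(seed=42)
print(chords)  # four triads as MIDI note numbers
```

No "learning" happens anywhere in there: every possible output was designed in by the programmer, which is exactly the distinction being drawn with AI below.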

But AI music is different. With AI, the engine "learns" about music by ingesting and analyzing millions of real-world examples, categorizing all the relevant elements that make up a musical work, effectively building a vast body of knowledge about music.

The AI doesn't care how simple or complex a piece of music is. A one-chord ambient background and a symphony both require the same effort to ingest and analyze. The trick is to have a large enough body of examples, and the tools to effectively break down the music into its essential components to create meaningful classifications.

What has muddied the distinction between algorithmic and AI is that many algorithmic engines inaccurately label themselves as "AI" when, in fact, they are not.

EXAMPLES ARE KING

The ultimate goal with AI music is for us to be able to issue a prompt like "Write a dirge in the style of Eminem with Brazilian percussion" and have it come up with several reasonable choices of music based on the knowledge it has accumulated about the different elements in the prompt. The AI does this by connecting and combining all the relevant bits of categorized data it has accumulated up to this point.

But if the AI was never fed any examples of Eminem's music, it won't be able to satisfy the request properly. Complexity is not the issue. The AI simply needs to know what is meant by "dirge", "Eminem", and "Brazilian percussion", and the only way it can do that is through exhaustive analysis of lots and lots and lots of existing examples.

BUT WE'RE EXPECTING TOO MUCH

Beyond the challenge of effectively analyzing and classifying massive volumes of musical works, I believe one of the reasons AI music has not succeeded yet is because our expectations go beyond composition. With AI art, what the computer produces is effectively a finished product: a visual image. Yes, you can tweak it further in Photoshop, but in many cases, what the AI generates is usable as a finished piece of art.

But with AI music, achieving a finished product involves far more than just composition. We're also expecting the AI to arrange and produce a final recording. That's a much bigger ask. Composing, arranging, and production are each worthy of their own AI applications.

In the video I posted above, Huawei's smartphone AI was able to successfully complete Schubert's Unfinished Symphony, but it took a human to orchestrate and arrange it properly. In terms of pure composing, though, the AI actually succeeded and did its job quite well.

I think if we only expected AI to "compose" music, so the final output was either a lead sheet or a MIDI file, then we'd probably be quite impressed with the results so far, and we might even say it's on par with AI art. But the bar has been set much higher for AI music - everyone is expecting a finished piece of recorded music.

With AI art, we simply expect the AI to replace the painter. But with AI music, we expect the AI to replace the composer, the arranger, and the recording engineer/producer. And that requires much greater knowledge and expertise than just pure composing.
 

David Cuny

Grand Poobah, Royal Order of WordBuilders
I think Tim Wells hit the nail on the head by pointing out that images are composed of pixels. It's an atomic data element that can be analyzed, scaled and otherwise manipulated.

You can take an image and scale it down. Sure, you'll lose detail, but the proportions and color will remain.

To do something similar with music, you have to convert the audio into a spectrogram. That's exactly what happens with most speech synthesis. With the data converted into a 2D space (frequency vs. time), approaches that worked in the visual domain can be applied to the problem.

The problem then is to convert the spectrogram back into audio data, because phase has been lost in the process, so that has to be guessed at.
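A minimal sketch of that round trip, assuming SciPy's STFT utilities (the Griffin-Lim loop below is the classic way to estimate the lost phase):

```python
import numpy as np
from scipy.signal import stft, istft

# Audio -> magnitude spectrogram (a 2D frequency-vs-time "image") -> audio
# again. Phase is discarded on the way in, so it has to be guessed at on
# the way out; Griffin-Lim does that by iterating.

fs = 16_000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440 * t)          # one second of A440

_, _, Z = stft(audio, fs=fs, nperseg=512)
magnitude = np.abs(Z)                        # the "image"; phase in Z is dropped

# Griffin-Lim: start from zero phase and alternate between the time and
# frequency domains, keeping the known magnitudes on every round.
phase = np.zeros_like(magnitude)
for _ in range(32):
    _, rebuilt = istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=512)
    rebuilt = rebuilt[:len(audio)]           # keep frame counts aligned
    _, _, Z2 = stft(rebuilt, fs=fs, nperseg=512)
    phase = np.angle(Z2)

_, reconstructed = istft(magnitude * np.exp(1j * phase), fs=fs, nperseg=512)
reconstructed = reconstructed[:fs]           # back to one second of audio
print(reconstructed.shape)                   # (16000,)
```

Even on a pure sine wave the result is only an estimate; on dense polyphonic music the phase errors are part of why spectrogram-based generation can sound smeared.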

Another problem is labeling the data. If you want to type in "Short peaceful orchestral movement with strings and flutes", then there better be lots and lots of audio data marked up with similar descriptions for the network to learn from.

While there are plenty of examples of marked-up images, there aren't as many for music. And once networks are trained to classify images, they can be used to classify other images (and thus create more labeled data to learn from); there's no comparably mature bootstrap for labeling music.

Then there's the question of the size of the frame, so to speak. While images can be coerced into similar sizes for the sake of normalizing data, that's not necessarily the case for audio. How does a symphony get sliced up?

It's not an unsolvable problem. But I suspect that painting a spectrogram of a string quartet playing a sad song is many degrees more difficult than painting an abstract fish bowl on the moon.
 

gsilbers

Part of Pulsesetter-Sounds.com
I think we need to be careful to not conflate algorithmically generated art/music with AI generated art/music.

Some of the courses I took in college were about AI and its philosophy, so it's a very interesting and deep subject. It has always fascinated me.

ALGORITHMS VS. AI

This deals with the notion in AI philosophy of what exactly intelligence and Artificial Intelligence are.

This is basically the old Turing Test vs. Chinese Room argument. The main question: when someone interacts with "it", does it really show signs of intelligence, or is it an algorithm performing a mechanical function?

Here is a summary:

The Turing test and Searle's Chinese Room argument represent two alternative definitions of intelligence.

The Turing test is based on the assumption that intelligence is difficult to formally define but can be easily recognised by behavior. So if a computer program behaves and interacts in a way that is practically indistinguishable from the behavior of a human being (who we assume is "intelligent"), then, based on this assumption, we should say that the computer program is also intelligent. In the Turing test, the particular behavior that is tested is holding a conversation in natural language.

Searle's Chinese Room argument is a challenge to the validity of the Turing test. It posits a complex system that behaves as if it were intelligent (in this case, holding a conversation in Chinese), but in which each component of the system follows an algorithm, and so no part of the system can be said to "understand" Chinese. The Chinese Room argument is based on the assumption that the system as a whole cannot be said to be intelligent unless some of its individual components are intelligent - it is essentially a reductionist argument.

In summary, the Chinese Room argument says that it is possible for a system to simulate intelligence without actually being intelligent, whereas the Turing test says that if a system can simulate intelligence then it actually is intelligent.

We can illustrate the difference between the two points of view with an analogy as follows. The Turing test would say that an aeroplane flies because it travels through the air from one place to another - it exhibits "flying" behavior. The Chinese room argument would say that an aeroplane only simulates flight because it does not flap its wings.



On the music AI side, then, what exactly do we call intelligence, and what don't we call intelligence, when creating music?
Take the idea of creating a composition in a classical style using amalgamations of melody, counterpoint, etc. from the classic composers, but not rendered into record form. That could indeed be a tough argument, since humans do about the same, following certain standards and rules. And there was an article a while back about an algorithm that could detect whether a song will be successful just by checking that it matches 80% of these rules.

But if we come to genre-specific recorded music, then what exactly is AI music, what exactly is "good", and what is the difference from algorithm-based music making?
If it's like asking a producer to create music and we judge the end result, then it might depend heavily on the actual style and the producers used as a target. Trap producers, corporate music, kids music, etc. churn out a lot of very similar stuff, all usable, all very human. And all of it can be based on an "algorithm" where they just keep changing a few patterns, chord progressions, etc. So the same could be said about artificial intelligence, if the intelligence the algorithm is trying to emulate just stays in this realm.

The key word is intelligence. Intelligence will come from an algorithm no matter what. The question is whether that algorithm fools humans into sensing intelligence, even when the music seems more "algorithm" based. If the standard is trap, corporate, and kids productions, where a human would follow similar styles, designs, and outputs, then matching that would in itself be intelligent.
And if we expand a bit and mix corporate music with trap, then the algorithm is just mixing the way humans do: find a pattern they like, mix it with another chord progression they like, change a few things, and done. You have something that resembles intelligence if it's done with AI.

Mostly, the idea of intelligence depends on what the definition of intelligence is. Because by most accounts the thermostat in your house is intelligent: it interacts with the outside world, it adjusts itself (a kind of self-awareness), and it performs a job. And there are countless examples of varying degrees of what's considered intelligent.
Does an AI have to do something that sounds like John Williams before we finally say, wow, that's intelligent? Or can it be simple trap music or kids music, where it somehow matches the intelligence of a human producer of those genres, regardless of how it got there?
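To make the thermostat example concrete, here is that whole "intelligence" as a few lines of Python:

```python
# A trivial algorithm that nonetheless "senses" the world, "decides",
# and acts on it. Whether we call this intelligence is exactly the
# definitional question above. The numbers are illustrative.

def thermostat(current_temp, target=21.0, deadband=0.5):
    """Return the action a simple thermostat takes for a given reading."""
    if current_temp < target - deadband:
        return "heat on"
    if current_temp > target + deadband:
        return "heat off"
    return "hold"

print(thermostat(18.0))  # heat on
print(thermostat(23.0))  # heat off
print(thermostat(21.2))  # hold
```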

So that's my argument: what we perceive as intelligence in music creation is complexity. And within this complexity, we can still call an algorithm artificial intelligence even if it didn't have to take in millions of classical, pop, etc. pieces, analyze them, and output something original. If we lower the bar on our definition of "intelligence", then by all accounts AI could very well make music as good as human producers, still be called original, and still be called intelligent, regardless of how it got there.
 

NekujaK

Procrastination is my superpower!
Some of the courses I took in college were about AI and its philosophy, so it's a very interesting and deep subject.
All fair and valid points - I appreciate the discourse.

I wasn't referring to the "I" in AI as exhibiting perceived intelligence. Yes, back in college, when I took some basic AI programming courses (when LISP was all the rage), the holy grail of AI was to convincingly emulate human thought and behavior. But since then, practical AI has focused on more mundane applications, where perceived intelligence is not the ultimate goal, but rather the competent execution of narrowly scoped tasks with large amounts of variability.

When I use MidJourney to generate an image, never have I thought the system was intelligent. And I don't think it's trying to fool people into believing it has actual "intelligence". Everyone knows it's a software program and not a goblin sitting in a dark room painting pictures on request (although that would be much more interesting!).

For me, the label "AI" represents the ability of a system (yes indeed, at its heart nothing but a collection of algorithms) to analyze, evaluate, and learn largely on its own, and to evolve and refine that knowledge over time. And it was in that context that I was comparing algorithmic versus AI music generation: not assessing their output, but rather their underlying architecture.

In both cases, the technology is able to compose music - absolutely. But the typical "algorithmic" music-generating programs and plugins that have been prevalent up to now haven't been designed to acquire knowledge, self-correct, and flexibly expand their scope.

So in that context, I don't believe musical complexity is a significant variable. If a system is designed to effectively "learn" about music, it can learn about any kind of music. Yes, certain pop styles are simpler to break down and quantify, in that they have less variability and fewer "moving parts", but the added complexity of a classical symphony merely offers the AI more data to analyze. In the end, it still goes through the same internal process to generate either type of music. And that process is entirely data-driven.

The key lies in providing a system with the tools to effectively break down and analyze all of the necessary variables and characteristics of a piece of music, so that it has enough good data to turn around and competently create its own original works. That's currently where the greatest challenge lies. (That, and as I mentioned previously, the unreasonable expectation for AI to create a fully produced finished recording... which has nothing to do with composing.)
 