Something that an AI analysis will never get right (not at the moment, at least) is human performance and expression. The "why" of it. Why play it exactly that way, when you could have played it a thousand different ways? The answer is something only a human can give you, and each answer will differ even when the overall message is mostly the same. Everybody will answer that "why" with their own tone, volume and speed.
If you sample instruments conventionally, with separated articulations, you'll never capture enough of them to fit a recorded performance to every expression and every line.
What you're looking for already exists, and has for centuries. Put written notes in front of an experienced player and they'll analyze them in real time and play them right out on their instrument. If your writing is good and your notation is clear, then, drawing on their years of experience reading scores, they'll probably get very close to what you want within the first couple of passes.
And if you know how to play an instrument yourself, you don't need to write down any notes at all. You just play it straight out of your head.
Since the whole idea here is using virtual instruments ourselves, this is what you should look for instead: virtual instruments that let you truly express yourself. Then you can churn out tunes in real time. No AI will ever get it right for you.