I'm totally ignorant about sound engineering, but the way I look at it, audio bits are like visual pixels. So the more your production is aimed at big pictures (louder sound), the higher the pixel rate (bit rate) you'd need to avoid getting a distorted image (sound).
Could be total bullshit and definitely not an answer to the OP, but I'm interested to know the answer as well.
Your example is pretty much the opposite of what's true.
If you're always going to capture loud sounds - that is, sounds that always hit near the top few dB on your meters - then it's more acceptable to use only 16 bits.
But you want to use 24 bits if you're going to capture quiet sounds - those that barely flicker the bottom few blocks on your meters. Or, to put it more correctly: sounds that have a very wide dynamic range between their quietest and loudest extremes.
Say you want to record bashing on trash cans, and you will not be playing any quiet or soft passages - it's all going to be triple-forte kabooms. In that case, you might not notice a difference between 16-bit and 24-bit signals.
But if you want to record brushes on a frame drum, and you want to play the lightest little scrapes followed by the loudest smacks, and you set your levels so the loudest smacks are near zero - then the lightest little scrapes will be a zillion dB below those loud levels, and will only be lighting the bottom couple of blocks on your meters.
That's an example of a source with a wide dynamic range, and which would benefit from being captured (and stored) at 24 bits.
So a typical orchestral library, which might have triple-pianissimo and triple-forte string section samples in the same patch, will definitely benefit from being stored and played back at 24 bits.
Think of it this way - each additional bit added to a digital signal doubles the number of vertical steps in the waveform that can be represented. That translates to about 6 dB of "vertical" resolution per bit, so a 16-bit signal can have 96 dB of range between the loudest full-level signal and zero, while a 24-bit signal can have 144 dB of range.
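The arithmetic behind that can be sketched in a few lines of Python - the "6 dB per bit" figure is really 20·log10(2) ≈ 6.02 dB, which is where the familiar 96 and 144 dB numbers come from:

```python
import math

# Each extra bit doubles the number of amplitude steps, which adds
# 20 * log10(2) ~= 6.02 dB of dynamic range.
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: ~{dynamic_range_db(bits):.0f} dB")
```

This prints roughly 48, 96, and 144 dB - the textbook figures are just this value rounded.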
So if you're recording/storing/playing back a 16-bit signal, and the quietest pp samples are 48 dB below the loudest ff samples (which is a realistic scenario), then those pp samples will be using up only half of the range - and will thus effectively be 8-bit samples. You would probably hear that as a grainy, noisy, or low-resolution sound.
But if you're recording/storing/playing back a 24-bit signal, that quietest sample, sitting 48 dB below the loudest, will still be using up 16 bits (24 bits minus 8 bits, just as 144 dB minus 48 dB leaves 96 dB), and will still have 96 dB of range. So even the quiet sounds in a 24-bit recording can have the same resolution as the loudest sounds in a 16-bit recording.
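You can see the "effectively 8-bit" effect in a quick simulation. This is just a sketch - the function names and the simple mid-tread quantizer are my own, not anything from a real DAW or codec - but it quantizes a sine wave sitting 48 dB below full scale at both bit depths and estimates the resulting signal-to-noise ratio:

```python
import math

def quantize(x, bits):
    # Simple uniform mid-tread quantizer over [-1.0, 1.0]
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps

def snr_db(bits, level_db, n=4096):
    # Estimate the SNR of a sine sitting `level_db` below full scale,
    # after quantizing it to `bits` bits.
    amp = 10 ** (level_db / 20)
    sig = [amp * math.sin(2 * math.pi * 7 * i / n) for i in range(n)]
    err = [quantize(s, bits) - s for s in sig]
    p_sig = sum(s * s for s in sig) / n
    p_err = sum(e * e for e in err) / n
    return 10 * math.log10(p_sig / p_err)

print(f"16-bit, -48 dBFS: ~{snr_db(16, -48):.0f} dB SNR")  # lands around 50 dB - 8-bit territory
print(f"24-bit, -48 dBFS: ~{snr_db(24, -48):.0f} dB SNR")  # lands around 98 dB - comparable to a full-scale 16-bit signal
```

The quiet signal loses roughly 48 dB of SNR at either bit depth; the point is that 24-bit has enough headroom left over that the quiet sound still measures like full-scale 16-bit audio.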
(Note that none of this has anything to do with sampling rate (48 kHz vs. 96 kHz, etc.) - it only refers to the "vertical" resolution of the audio being captured, which audibly translates into "signal-to-noise ratio" or "noise floor" issues.)
If you're recording loud, blasting sounds with very little range between the loudest and softest passages (heavy metal guitar etc.), and which are always peaking near the top of the meters, then you may never hear the difference between 16-bit and 24-bit recordings.
But orchestral sounds ain't like that. So 24-bit capture is a must, and 24-bit storage really does help.