In a perfect world, where the 2-track stereo or 5.1 mix was made from completely dry elements in the digital (or even analog) multitrack, you could line up the mix and the multitrack to sample accuracy in a Digital Audio Workstation and phase-invert either the dialog or everything else. Summing the inverted material with the mix should cancel it, leaving the dialog without the rest of the soundtrack (or the soundtrack without the dialog, if the dialog is what you inverted).
In theory.
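As a rough sketch of that ideal case, the snippet below builds a toy "mix" from two sine tones standing in for the dialog and the rest of the soundtrack, then phase-inverts the non-dialog part and sums it with the mix. The sample rate, the tone frequencies, and the helper names (`tone`, `invert_and_sum`, `peak`) are all assumptions made for illustration; a real mix is never this clean.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate in Hz

def tone(freq_hz, n_samples, amplitude=0.5):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def invert_and_sum(mix, stem):
    """Phase-invert `stem` and sum it with `mix`, sample by sample."""
    return [m + (-s) for m, s in zip(mix, stem)]

def peak(signal):
    """Largest absolute sample value in the signal."""
    return max(abs(x) for x in signal)

n = 4800
dialog = tone(220.0, n)    # stand-in for the dry dialog track
music = tone(1000.0, n)    # stand-in for everything else
mix = [d + m for d, m in zip(dialog, music)]  # a perfectly dry, aligned mix

# Cancel the music by adding its inverted copy to the mix.
recovered = invert_and_sum(mix, music)

# How far the recovered signal is from the original dialog.
residual_error = peak([r - d for r, d in zip(recovered, dialog)])
print(residual_error)
```

With perfectly aligned, unprocessed material the residual error is down at floating-point rounding level, which is the "in theory" case.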
There are lots of reasons why this would not work in reality. If some ambience (reverb, for example) was added at the audio mastering stage, it smears elements of the dialog across the stereo soundstage, both in stereo position and in time, so canceling out the dry vocal can leave a ghost image of the voice in the music/foley mix.
If the original audio was recorded in analog, even small variations in tape player and recorder speed (wow and flutter) can mean that a sample-accurate alignment with the multitrack is not possible.
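To get a feel for how sensitive cancellation is to alignment, the sketch below delays the inverted track by a single sample before summing. A constant one-sample offset is a deliberate simplification (real wow and flutter drift continuously), and the sample rate, test frequency, and helper name `tone` are assumptions for illustration only.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate in Hz

def tone(freq_hz, n_samples, amplitude=0.5):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

n = 4800
music = tone(1000.0, n)  # stand-in for the material we want to cancel

# Simulate a misaligned copy: the same track, one sample late.
delayed = [0.0] + music[:-1]

# Inverting the late copy and summing no longer gives silence.
leftover = [m - d for m, d in zip(music, delayed)]
leak = max(abs(x) for x in leftover)

print(leak)  # clearly non-zero: the tone only partly cancels
```

The higher the frequency, the more a given timing error leaves behind, so a drifting analog source tends to cancel the low end somewhat and the top end hardly at all.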
Audio restoration is a field in its own right, and companies like Cedar specialize in it completely.
Our ears are capable of discriminating elements of sound psycho-acoustically: we can simply 'ignore' some elements and 'enhance' others, in the same way that you can hear your own name across a crowded room, whereas a computer algorithm would just interpret that as noise.
That's not to say that processing won't be able to do this in the future; right now, though, it's quite difficult, given the sometimes complex nature of the composite soundtrack.