Sane and interesting enough to have been disproven, by Boaz Barak IIRC. Maybe not surprising, since simulated annealing never achieved the results of gradient descent + backprop.
What makes statistical mechanics so brilliant is that it takes first-principles ideas (particle energies + an ensemble) and derives the macroscopic rules of thermodynamics, rules that were originally established from observation.
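To make that concrete, here's a minimal sketch using a toy example of my own: a two-level system with made-up energies 0 and `eps`. Starting only from the microstate energies and the canonical ensemble, you get the macroscopic mean energy as U = -d(ln Z)/d(beta), and it matches the textbook closed form eps / (exp(beta * eps) + 1). The function names and the values of `eps` and `beta` are just illustrative assumptions.

```python
import math

def log_partition(beta, energies):
    """ln Z for a canonical ensemble over the given microstate energies."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def mean_energy(beta, energies, h=1e-6):
    """Macroscopic mean energy U = -d(ln Z)/d(beta), via a central difference."""
    up = log_partition(beta + h, energies)
    down = log_partition(beta - h, energies)
    return -(up - down) / (2 * h)

# Hypothetical two-level system: microstate energies 0 and eps (arbitrary units).
eps = 1.0
levels = [0.0, eps]
beta = 2.0  # inverse temperature 1/(k_B T), arbitrary illustrative value

u_numeric = mean_energy(beta, levels)
u_closed_form = eps / (math.exp(beta * eps) + 1.0)
print(u_numeric, u_closed_form)
```

The only inputs are the list of energies and the ensemble weighting exp(-beta * E); the "thermodynamic" quantity falls out of the math instead of being measured.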
What the OP is proposing is that a mathematical analysis of SGD + generic deep learning architectures might be able to derive the rules we have so far only found empirically, from model-training experiments.