This machine learning experiment combines the outputs of three different neural networks to produce the final results shown above. The original input sequence is a short clip of a dancer I filmed at a recent photoshoot.
The first neural network generates abstract/primitive displacement maps from the input dance sequence. A second network takes these displacement maps, renders them in 3D, and lights them with approximate global illumination, which it learned by training on an equivalent set of GI-rendered frames (the rendering glitches come from imperfect training of this network). Finally, a third network generates gold and silver specular reflections from the input sequence, and this output is composited over the rendered frames to give the final look.
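The three-stage pipeline above could be sketched roughly as below. This is only an illustrative toy, assuming a frames-in/frames-out structure: the three "network" functions are hypothetical stand-ins (the real ones are trained models), and the screen-blend compositing step is my assumption about how a bright specular pass might be layered over the render.

```python
import numpy as np

# Hypothetical stand-ins for the three trained networks; the real ones are
# learned image-to-image models, not these hand-written placeholders.

def displacement_net(frames):
    # Stage 1: dance frames -> abstract single-channel displacement maps.
    return frames.mean(axis=-1, keepdims=True)

def gi_render_net(disp_maps):
    # Stage 2: displacement maps -> frames "rendered" with approximate GI.
    # Placeholder: just expand back to RGB and clamp to keep the sketch runnable.
    rendered = np.repeat(disp_maps, 3, axis=-1)
    return np.clip(rendered, 0.0, 1.0)

def specular_net(frames):
    # Stage 3: input frames -> a gold-tinted specular highlight pass.
    gold = np.array([1.0, 0.85, 0.4])
    highlight = (frames.max(axis=-1, keepdims=True) > 0.8).astype(float)
    return highlight * gold

def composite(rendered, specular):
    # Screen blend: bright specular highlights sit on top of the render
    # without clipping, result stays in [0, 1].
    return 1.0 - (1.0 - rendered) * (1.0 - np.clip(specular, 0.0, 1.0))

# Fake input clip: 4 frames of 64x64 RGB in [0, 1].
clip = np.random.rand(4, 64, 64, 3)
disp = displacement_net(clip)       # stage 1
rendered = gi_render_net(disp)      # stage 2
spec = specular_net(clip)           # stage 3 (driven by the input, not stage 2)
final = composite(rendered, spec)   # final composite
print(final.shape)                  # (4, 64, 64, 3)
```

Note that, as in the write-up, the specular network is fed the original input sequence rather than the output of the renderer, so the three stages are not strictly sequential.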