Scanning through the paper, I see this: "We structure our policy into two subnetworks, one of which
receives only proprioceptive information, and the other which receives only exteroceptive information.
As explained in the previous paragraph with proprioceptive information we refer to information
that is independent of any task and local to the body while exteroceptive information includes a
representation of the terrain ahead. We compared this architecture to a simple fully connected neural
network and found that it greatly increased learning speed."
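To make the quoted description concrete, here is a minimal pure-Python sketch of a policy with two input branches. Everything beyond "two subnetworks with separate inputs" is my assumption, not taken from the paper: the names (`TwoBranchPolicy`, `make_layer`, `dense`), the layer sizes, the tanh activations, and in particular merging the branches by concatenation into a linear output head.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one dense layer: small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=None):
    """Apply a dense layer to input vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


class TwoBranchPolicy:
    """Hypothetical policy: one branch per input type, merged at the end."""

    def __init__(self, n_proprio, n_extero, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        # Each branch only ever sees its own kind of observation.
        self.proprio_layer = make_layer(n_proprio, n_hidden, rng)
        self.extero_layer = make_layer(n_extero, n_hidden, rng)
        # Assumption: features are concatenated and fed to a linear head.
        self.head = make_layer(2 * n_hidden, n_actions, rng)

    def act(self, proprio, extero):
        h_p = dense(self.proprio_layer, proprio, math.tanh)
        h_e = dense(self.extero_layer, extero, math.tanh)
        return dense(self.head, h_p + h_e)  # list concatenation


policy = TwoBranchPolicy(n_proprio=4, n_extero=6, n_hidden=8, n_actions=3)
action = policy.act([0.1, -0.2, 0.0, 0.3], [0.0] * 6)
```

The point is only that both branches are ordinary neural network layers; the split changes what each part of the network is allowed to see, not the kind of model being used.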
So it seems to me they do use neural nets. Proximal Policy Optimization (PPO) is not an alternative to neural nets; it is a relatively recent policy-gradient algorithm for training them.
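To show what "optimizing" means here, below is a rough sketch of the clipped surrogate loss that PPO is usually described with. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are my own choices for illustration, and I am not claiming this is the exact variant the paper uses.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss, averaged over samples (lower is better)."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old one.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two estimates.
        terms.append(min(ratio * adv, clipped * adv))
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -sum(terms) / len(terms)
```

The log-probabilities come from the policy network, and the gradient of this loss with respect to the network weights is what drives the update. So PPO sits on top of the neural net rather than replacing it.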