> When you use Camera, VoiceOver describes objects in the viewfinder. To take a photo or start, pause, or resume a video recording, double-tap the screen with two fingers.
They can also generate alt text for photos that don't already have it. Here's a video of the person in the original article describing this feature:
It does! It goes beyond simply enumerating objects and can describe their properties or context as well -- for example, it'll describe a husky as "a black and white dog lying on a wooden floor", or a soft drink as "a transparent cup with brown liquid in it".
VoiceOver also works with another accessibility tool called Magnifier, so the two together can serve as a general "what am I looking at" tool.