I stumbled on the most hilarious crosswalk encounter between one of these delivery bots and a Waymo in downtown Phoenix. It seems that neither was programmed (probably rightfully so) to take the initiative in the situation, so what ensued was a painfully drawn-out exchange of agentic deference.
Interestingly, humans solve this problem by having different individual aggressiveness levels, so one of the two usually just goes first. That works pretty well, but I would guess it won't be one of the first things that robot fleet operators try.
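The standard way to do it with machines is to use a bit of randomisation along with exponential backoff: each side waits a random amount of time before retrying, and the range that wait is drawn from doubles after every failed attempt. It's been used to resolve collisions in network protocols such as Ethernet for a long time.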
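For anyone who hasn't seen it, here is a rough sketch of what randomised exponential backoff could look like for two agents stuck at a crosswalk. Everything in it is made up for illustration: the names `backoff_delay` and `resolve_standoff`, the `base`, `cap` and `min_gap` values, and the idea that the standoff ends once the two waits differ by a clear margin. It is not how Waymo or any delivery bot actually works.

```python
import random


def backoff_delay(attempt, base=0.5, cap=8.0, rng=random):
    """Pick a random wait in [0, min(cap, base * 2**attempt)] seconds.

    `base` and `cap` are illustrative values, not taken from any real system.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def resolve_standoff(max_rounds=20, min_gap=0.25, seed=None):
    """Simulate two agents ("a" and "b") deferring to each other.

    Each round both agents draw a random wait. If the waits differ by at
    least `min_gap` seconds, the agent with the shorter wait goes first and
    the standoff is over. Otherwise both move at nearly the same moment,
    stop again, and retry with a doubled wait range.

    Returns (winner, rounds_used); winner is None if no round succeeded.
    """
    rng = random.Random(seed)
    for attempt in range(max_rounds):
        wait_a = backoff_delay(attempt, rng=rng)
        wait_b = backoff_delay(attempt, rng=rng)
        if abs(wait_a - wait_b) >= min_gap:
            winner = "a" if wait_a < wait_b else "b"
            return winner, attempt + 1
    return None, max_rounds


if __name__ == "__main__":
    winner, rounds = resolve_standoff()
    print(f"agent {winner} went first after {rounds} round(s)")
```

The point of doubling the range is that two near-identical waits become less and less likely with each retry, so the standoff tends to end quickly without either side needing a fixed "aggressiveness" setting.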