What I think you're trying to describe already exists as a product from Cisco called "telepresence". It is/was insanely expensive, was a permanent installation that only Cisco-contracted techs could install, and did what you describe: it is a series of large, curved HD displays with desks at an appropriate distance from the screens/cameras, and copious amounts of indirect lighting from behind the setup to make each party look good.
It seems like the imaging/rendering technology that Google is using is much more advanced.
I’ve used such a Cisco system. Compared to regular video calls, the latency and quality were light years ahead, and much more natural conversations were possible. By which I mean it was possible to laugh, interject, and generally have a realistic conversation with a colleague in another country without having to compensate for video lag in that very careful way I find necessary on Meet and Zoom.
That said, there was no “emotional connection” like the Google one is described as offering. It was still a video call. There was no forgetting that. I suspect the 3D and the apparent physical closeness to the display add a lot.
Wow, I forgot about Telepresence. I used it a decade ago at a Fortune 500 company. With all of the cameras and displays perfectly positioned, everyone was life-sized on video, and it felt like you were sitting around a roundtable. Now I'm imagining that with higher resolution and a 3D light field display, wow.
Low latency is more important than all the other bits. I worked at Bell Communications Research in the nineties and they had an experimental video conferencing system that used analog circuit-switched video and it worked really well, mainly because the latency was only a little more than the speed-of-light propagation delay.
I spent a year using telepresence a few times a week. It was genuinely amazing. Life-sized people 1200 miles away, with audio so crisp that when one of the guys was idly rubbing the edge of some papers with his thumb, I could hear it.