I would be interested to know which part you feel is implausible, to me it seems...

I would be interested to know which part you feel is implausible, to me it seems inevitable

You have a language model produce an outline with steps and then recursively set agents to consume and iterate on a task until another language model finds the results satisfies the specification.

This includes interactions with the real world (via instructions executed over an API) and using the success of those interactions for reinforcement learning on the model.