The fact that GitHub’s Copilot has an enterprise feature that matches model output against code under certain licenses, in order to stop you from using the matching output (with a notification), suggests the model outputs are at least potentially infringing.
If MS were compelled to reveal how these completions are generated, it might well turn out that they source text chunks directly from public repositories that their “model” flagged as relevant. (I put “model” in quotes because it could be more than just a model: vector or search databases, or some other orchestration across multiple workloads.)
> suggests the model outputs are at least potentially infringing.
The only thing it suggests is that they recognize that a subset of users worry about it. It doesn’t suggest anything about whether GitHub itself worries about it any further.
Don’t think about it from an actual “rights” perspective. Think about the entire copyright issue as a “too big to fail” issue.