The part that stands out is that it identified the text as an imitation rather than simply guessing James Mickens.
That suggests it is picking up not only on style, but on the gap between authentic style and performed style. Useful for detecting pastiche, but pretty unsettling for pseudonymous writing.
That framing helps. When people compare MoQ with WebRTC, is the main attraction lower-level control over transport/media semantics, or are there cases where MoQ is expected to be materially better for latency or reliability?
I’m trying to understand whether it’s mainly a replacement for specific WebRTC use cases, or more of a building block for new kinds of real-time systems.
There are a few cases where WebRTC falls apart that I think MoQ could help with.
It doesn't work so well for low-latency broadcast. Your choices right now are: use WebRTC and deploy selective forwarding units (SFUs), which will be something custom and likely involve spinning up a bunch of geographically distributed virtual machines, figuring out signalling, and so on. Or use HLS so you can lean on standard HTTP CDN tech, but at the cost of orders of magnitude more latency.
MoQ should allow for a standardized CDN stack, meaning we should be able to have a more abstract service (instead of spinning up VMs, you just employ some company's CDN service and tell it where to get media from).
There are a lot of other little issues with WebRTC for specific applications. For example, last I tried it, browsers will subtly speed up audio/video to keep everything in sync, and there are scenarios where you'd rather just let the viewer fall behind a bit and skip ahead later (if you're listening to music, speeding it up isn't ideal).
Or - say you want to have a group call and capture each participant's audio individually and edit it together later for something like a podcast. It's been a while since I've tried this, but I recall it being pretty difficult to do that with WebRTC. I remember all the mixing would happen in the browser's libwebrtc and I had really limited control over things.
> use WebRTC and deploy selective forwarding units, which are going to be something custom
Would you mind explaining more? If you are doing WHIP/WHEP you should be able to drop in Broadcast Box/MediaMTX etc... and switch out servers and no one should notice. You can use browser/mobile/ffmpeg/OBS etc... and get the same behavior. I care a lot about the broadcast space and want to learn about other problems.
> subtly speed up audio/video to keep everything in sync
Regarding SFUs - with something like HLS, I can really easily scale up using something like a caching CDN (not entirely sure if that's the right term). But the idea goes: I can distribute the HLS media playlist, and have my media segment entries prefixed with a caching/CDN service. The service is configured with the actual origin server, and when a segment isn't in the CDN's cache, the CDN fetches it from the origin on demand. That was a nice option when I was doing Owncast streaming, since I really only paid based on viewership and just had to make sure I had the correct cache-related headers on my media segments.
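To make the playlist trick concrete, here's a rough Python sketch of prefixing segment URIs with a CDN base URL while the playlist itself stays on the origin. The CDN host and segment names are placeholders, not a real service:

```python
# Sketch: rewrite an HLS media playlist so segment URIs go through a
# caching CDN. "cdn.example.com" and the segment names are made up.
def prefix_segments(playlist: str, cdn_base: str) -> str:
    out = []
    for line in playlist.splitlines():
        # Lines starting with '#' are tags; anything else is a segment URI.
        if line and not line.startswith("#"):
            out.append(f"{cdn_base}/{line}")
        else:
            out.append(line)
    return "\n".join(out)

playlist = """#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
seg001.ts
#EXTINF:4.0,
seg002.ts"""

rewritten = prefix_segments(playlist, "https://cdn.example.com/live")
print(rewritten)
```

The CDN then serves cache hits directly and only hits the origin on a miss, which is what makes the "pay based on viewership" model work.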
Or alternatively - I can push media segments up to a CDN and distribute that way, using an s3-compatible service, or just rsyncing to a server with better bandwidth, etc. One thing I didn't care for - again back when I was broadcasting with Owncast - was that I needed to make sure old media segments were expired, otherwise I would rack up an insane bill. I had a 24/7 owncast stream and if you're not on top of expiring media segments with your CDN, it gets expensive fast.
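The expiry problem above is basically a sliding-window cleanup. A minimal sketch (against a local directory; a real setup would use your CDN/S3 provider's lifecycle rules or API instead, and the window size here is arbitrary):

```python
import tempfile
from pathlib import Path

def prune_segments(seg_dir: Path, keep: int = 6) -> list[Path]:
    """Delete all but the newest `keep` segments so a 24/7 stream
    doesn't accumulate files (and storage bills) forever."""
    # Zero-padded names sort chronologically.
    segs = sorted(seg_dir.glob("seg*.ts"))
    stale = segs[:-keep] if len(segs) > keep else []
    for p in stale:
        p.unlink()
    return stale

# Demo against a throwaway directory with 10 fake segments.
with tempfile.TemporaryDirectory() as d:
    seg_dir = Path(d)
    for i in range(10):
        (seg_dir / f"seg{i:03d}.ts").touch()
    removed = prune_segments(seg_dir, keep=6)
    remaining = sorted(p.name for p in seg_dir.glob("seg*.ts"))
```

Run it as a cron job (or from the segmenter itself) and the window stays bounded.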
The overall idea is: serving HLS is ultimately serving files, and there's a good amount of tooling for that, right?
Now that you mention it, I think WHIP/WHEP can solve some of that. I just don't know of any service where I can have that same cache/CDN-like experience, of either having the CDN connect to the origin as needed and fan-out, or where I can push up and let the service distribute. (though - now I'm googling for "webrtc sfu as a service" and see that is a thing!).
Didn't know about the playout delay extension.
Whether capturing individual audio is easier with RtpTransport or insertable streams - I'm unsure. Possibly? I just figure since MoQ is going to rely on things like WebCodecs/WebAudio, there's hopefully a bit more control over what happens with audio as it comes in.
I'll admit though - I've started noticing how often podcasts are clearly recorded using something that doesn't allow per-participant recordings, and I'm guessing as long as the quality is good enough, most aren't worrying about it.
EDIT: feel like I should mention Pion rules, I used it a few years ago to put together an SRT-to-WebRTC thing and RTMP-to-WebRTC thing to use with Janus Gateway, it was so easy.
I think this is an important distinction. Documentation and automation can preserve artifacts, but not the actual capability.
A runbook can tell you what usually works, but it cannot tell you when the situation is no longer “usual.” That kind of judgment mostly comes from seeing real systems fail in messy ways over time.
Tools are still valuable, of course. But they work best when they help experienced people transfer knowledge, not when they are used as a reason to remove the people who understand the system.
Yeah this really looks like an encoding issue during migration.
I've run into similar problems when moving old content between systems, especially with MySQL and mixed encodings. It can get messy surprisingly quickly.
If you used UTF-8 then you can probably fix those pretty easily. UTF-8 is sufficiently structured that it’s extremely unlikely for a string that’s written in another encoding to be valid UTF-8 by accident. If you write some code to try decoding as UTF-8 and set that as the encoding when it succeeds, you’re unlikely to damage anything in the process.
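The "try UTF-8 first" repair is a few lines in practice. Sketch below; cp1252 as the fallback is a guess, your legacy data might be latin-1 or something else:

```python
def repair(raw: bytes) -> str:
    """If the bytes are valid UTF-8, keep that interpretation;
    otherwise fall back to a legacy single-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 is an assumption -- swap in whatever your old system used.
        return raw.decode("cp1252", errors="replace")

ok = repair("café".encode("utf-8"))       # valid UTF-8, kept as-is
legacy = repair("café".encode("cp1252"))  # lone 0xE9 is invalid UTF-8, so falls back
```

This works precisely because of the structural property mentioned above: multi-byte UTF-8 sequences have a strict lead-byte/continuation-byte pattern, so legacy-encoded text with accented characters almost never decodes cleanly as UTF-8 by accident.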
I like this kind of benchmark, especially since it uses problems that are harder to overfit to.
That said, single-attempt results are a bit hard to read into. For anything code-like, things like retries, test feedback, or just letting the model iterate tend to change the outcome quite a bit.
This is where stochastic approaches start to feel a bit uncomfortable.
Even small mistakes can make something dealing with sensitive data hard to trust. It seems useful as a first pass, but I’d probably still want some deterministic checks or a human in the loop to feel confident using it.
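One shape a deterministic check could take: after the stochastic redaction pass, scan the output for patterns that should never survive. The patterns below are illustrative, not a complete PII taxonomy:

```python
import re

# Sketch of a deterministic second pass over redacted output.
# These two patterns are just examples; a real check would cover
# whatever identifiers matter in your data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def leaked(text: str) -> list[str]:
    """Return the names of any patterns still present in the text."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

clean = leaked("Contact: [REDACTED]")
dirty = leaked("Reach me at alice@example.com")
```

If `leaked` returns anything, that's the point to block the output or escalate to a human rather than trust the model's pass.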
I built a community tool for exactly this, based on privacy-first principles but around the what. It's workflow-based, not "put your sensitive data into ChatGPT and hope it captures the right stuff". Mostly built for security folks, but anyone can use it.