I've been running BBR on both my web server and my own laptop for quite some time now with good results. It's included in the mainline Linux kernel - if you have kernel 4.9 or newer you can probably trial BBR yourself with these sysctls:
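These are the two settings commonly documented for enabling BBR (a drop-in file, assuming a distro that reads /etc/sysctl.d):

```shell
# /etc/sysctl.d/90-bbr.conf
# fq provides the pacing BBR wants; then select BBR itself.
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
```

Apply with `sudo sysctl --system` and verify with `sysctl net.ipv4.tcp_congestion_control`.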
I've been interested in BBR since the ACM Queue article and finally turned it on yesterday on my home workstation.
But slide 27 in the slide deck is incredibly troubling: it shows BBR completely destroying the throughput of a neighboring host using Cubic. (Cubic being the default on modern Linux.)
Even if everyone upgrades to BBR, slide 30 shows more trouble in paradise with BBR not getting to a stable bandwidth allocation. I have a hard time reconciling this with the ACM Queue article, since supposedly Google saw great results rolling it out at Youtube.
Youtube doesn't have neighbouring senders using Cubic, and doesn't try to fill the path's narrowest hop. It tries to send the video at exact playback speed and if possible at high quality, and its notions of success will reflect that.
IIRC (I read the paper too) the Youtube numbers assumed that the bandwidth available and needed are both largely stable, and that their success was in using that better.
I.e. they assumed that if 4 Mbps appeared to be available most of the time, then it really was available all of the time. If the capacity of the narrowest point is 10 Mbps and a 4 Mbps Youtube stream briefly competes against a bulk download from a Cubic sender, then BBR would hold the Cubic-using download to 60% of the link, which Youtube might count as success and you might or might not.
Delay based congestion control is the basis of LEDBAT (RFC6817) as used by BitTorrent over UDP. The claimed advantage of LEDBAT is that it is submissive to TCP, instead of dominating like BBR. This allows BitTorrent to run in the background without hurting your videocalls.
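RFC 6817's core window update makes the "submissive" behaviour concrete: LEDBAT shrinks its window as soon as measured queuing delay exceeds a target (at most 100 ms), with or without loss. A rough sketch of that update rule, simplified to a single delay sample instead of the RFC's filtered history (all numbers here are illustrative):

```python
# Simplified LEDBAT window update per RFC 6817.
TARGET = 0.100   # target queuing delay in seconds (the RFC's upper bound)
GAIN = 1.0       # must be <= 1 so LEDBAT ramps no faster than TCP
MSS = 1500       # bytes

def on_ack(cwnd, base_delay, current_delay, bytes_acked):
    """Return the new congestion window (bytes) after one ACK."""
    queuing_delay = current_delay - base_delay
    # off_target is positive while the queue is below TARGET and negative
    # above it: the window shrinks as soon as anyone's traffic (including
    # LEDBAT's own) pushes delay past TARGET. That is the "submissive" part.
    off_target = (TARGET - queuing_delay) / TARGET
    return cwnd + GAIN * off_target * bytes_acked * MSS / cwnd

# Queuing delay 50 ms (below target): window grows.
print(on_ack(15000, 0.040, 0.090, 1500))  # 15075.0
# Queuing delay 160 ms (above target): window shrinks with zero loss.
print(on_ack(15000, 0.040, 0.200, 1500))  # 14910.0
```

BBR, by contrast, deliberately tolerates some standing queue while probing for bandwidth, which is one reason it can crowd out delay-sensitive flows rather than yielding to them.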
What is different about BBR that makes it dominate while LEDBAT yields?
It's interesting to compare the work being done on BBR in (relatively) recent times with the kinds of results coming out of various implementations of SCPS [1]. In general, the biggest pain with conventional algorithms is their boom/bust cycle and (re)ramp-up speeds. Satellite characteristics provide some fascinating playgrounds.
An additive increase/multiplicative decrease congestion control algorithm has the property that the effect of a packet loss gets larger as the congestion window increases. There will be a point where the upward and downward forces on the congestion window balance out. The actual congestion window will oscillate around that.
Assuming a packet loss halves the congestion window while a packet delivery increases the congestion window by 1/cwnd, the break-even point for 1% packet loss is at a congestion window of 14. (1% chance of reducing the congestion window by 7, 99% chance of increasing it by 0.07.)
At 100ms RTT the average delivery rate will then be about 14 packets/rtt * 1500 bytes/packet * 10 rtts/second = 200kB/s.
What if we cut down the packet loss rate by a factor of 10, to 0.1%? It won't give a 10x speed increase; instead the break-even point for the congestion window will be about 45; only a 3x increase in speed. If RTT is constant, increasing speed by 10x would require reducing packet loss by 100x.
(The math here is simplified, but should suffice for illustrating this).
That configuration is already possible in Windows with the “DelayedAckFrequency” parameter and in Linux with the “quickACK” setting, but it’s unrelated to congestion control, and it will neither “solve” bottlenecks nor “turn TCP into UDP.”
Congestion control (the purpose of BBR) deals with the bottleneck problem. Delayed ACKs are an optimization to reduce protocol overhead. Indeed there are problems with delayed ACKs when combined with Nagle’s algorithm and an application that doesn’t send a steady stream of data, but it has nothing to do with the topic of this thread.
This has nothing to do with sending. On the receiving side, newer packets can't be delivered to the application while a packet with an earlier sequence number is still missing.
Ex:
4 -> 5 -> X -> 7 -> 8 -> 9
Until the client gets 6, packets 7, 8, and 9 will be held (and getting stale) while a retransmit of 6 makes the round trip. At that point 9 may even be useless, since it would be too stale, and you've just added extra latency that you don't want or need.
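A toy model of that in-order delivery rule shows the stall (hypothetical helper, not a real socket API):

```python
def deliver_in_order(next_expected, arrivals):
    """Deliver segments to the app only in sequence order, like TCP."""
    buffer = {}      # out-of-order segments parked here
    delivered = []
    for seq in arrivals:
        buffer[seq] = True
        # Drain everything that is now contiguous.
        while next_expected in buffer:
            delivered.append(next_expected)
            del buffer[next_expected]
            next_expected += 1
    return delivered

# 6 is lost: 7, 8, 9 arrive but sit in the buffer, undeliverable.
print(deliver_in_order(4, [4, 5, 7, 8, 9]))     # [4, 5]
# Only once the retransmitted 6 shows up does the rest drain at once.
print(deliver_in_order(4, [4, 5, 7, 8, 9, 6]))  # [4, 5, 6, 7, 8, 9]
```

With UDP each datagram would have been handed to the game as it arrived, stale predecessor or not.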
Look up how dead-reckoning is done in realtime games, this is one of many other cases where having timely data works better over UDP than TCP.
If you could set the ack frequency to 0 (no acks needed), you wouldn't want ordering either, just like UDP but over TCP. Why would you want that, you might ask: because then you could choose whether you want ordering over the same protocol, which is precisely what games need.
For context I built this multiplayer platform, not that that proves anything, but I have 20 years of working with multiplayer: http://fuse.rupy.se/about.html
The reason you can't do that is that you would no longer have a stream-based protocol, which is one of the core requirements of TCP.
Really there's no upside to doing it that way when UDP already exists, and tons of downsides to changing an existing protocol in such a fundamental way.