You keep a running document called "state of the world". On every turn you read this document in as context, use it to help compute what happens, and, based on what happens, write an updated "state of the world" document. You track important details so your LLM stays consistent from turn to turn.
If you're doing an RPG, which I guess is where this is most obvious, you track the player and enemy positions, their health, their moods and perhaps top thoughts, and the state of important inanimate objects. If you break down the door, you update the door's state in the document. This is in contrast to just giving the LLM the previous turns and hoping it realizes later that the door is broken down (just by statistical completion).
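A minimal sketch of that loop, assuming a hypothetical `call_llm` wrapper and a `<state>...</state>` output convention (both invented here for illustration, not any particular framework's API):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever chat-completion API you use."""
    raise NotImplementedError

def play_turn(state_path: str, player_action: str) -> str:
    # Read the current "state of the world" in as context.
    with open(state_path) as f:
        state = f.read()

    prompt = (
        "STATE OF THE WORLD:\n" + state + "\n\n"
        "PLAYER ACTION: " + player_action + "\n\n"
        "Narrate what happens. Then output the full updated state of the "
        "world between <state> and </state> tags, carrying over every "
        "detail that did not change (positions, health, moods, objects)."
    )
    reply = call_llm(prompt)

    # Persist the updated state so the next turn starts from it, rather
    # than hoping the model re-derives "the door is broken" from history.
    narration, rest = reply.split("<state>", 1)
    updated_state = rest.split("</state>", 1)[0].strip()
    with open(state_path, "w") as f:
        f.write(updated_state)

    return narration.strip()
```

The key design choice is that consistency lives in the document, not in the chat transcript; each turn is stateless apart from that file.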
You can trust that a model that scores 40% is indeed worse than a model that scores 90%.
You can’t trust that a model that scores 93% is better at software engineering than one that scores 90%, because at that point it’s impossible to distinguish recall from reasoning.
It’s honestly far better to just ignore SWEBench Verified in 2026. Multiple labs have noted contamination issues, and achieving high scores requires memorising what passes the prescriptive verifier, not what is a correct solution.
40% vs 90%? Sure.
70% vs 90%? _Absolutely meaningless_, as you are not measuring coding intelligence but “how well can the model exploit flaws in SWEBench Verified”; the former model can certainly be better at coding, even assuming no deliberate benchmaxxing / foul play.
> It's much like climate science today: any dissent at all, even just questioning the predictions of catastrophe, immediately brands you as a heretic.
I think this is not a great example, as there’s a huge group of people that, in fact, does not agree with the consensus and would happily fund research that (tries to) prove otherwise.
I fully agree with your point, though, just not the example.
Having worked in amyloid, and in an a-beta lab in the second half of the 2000s: we always said under our breath in group meetings that we were skeptical of the amyloid hypothesis, but our grant applications certainly did not say that (or if they did, it was a quick throwaway sentence). And I think the lab I landed in was one of the most honest scientific labs in biochemistry/chemical biology.
The comment you were replying to was talking about funding. If you could develop a scientifically plausible model to defend the "burning fossil fuels is not so bad, actually" thesis, your funders would include the oil companies and the greater petrochemical industry. There is a lot more money to fund projects there than... anywhere else in the world, really, by a wide margin.
Well, oil companies funded "leaded fuel is safe" research...
...and it really did backfire (in public relations, politics, etc)
now... I don't think they can actually fund 'research-for-their-profit' -- I mean, would you believe "petro is good for earth" research coming from oil companies, even after the 'lead is good or neutral' research?
Not uncritically, but if the research presents a logically consistent hypothesis and evidence supporting it, then it would be worth following up on with independent groups, and if it holds up under scrutiny, then it should be accepted.
There are so many counterexamples proving that your statement is just not true. I'll give you just one: the Berkeley physics professor Richard Muller, who took funding from the Koch foundation to attempt to "prove" that the satellite temperature data was "miscalibrated" and that estimates of actual warming were overblown. He started the project in 2010 and first published in 2011, showing that the warming was in fact real; using more advanced calibration techniques, he actually showed the warming was worse than we thought.
Expecting scientific rigor is not a bad bias: everyone who has been willing to do actual science agrees that climate change is real and significant. For example, Richard Muller was a climate skeptic who had a great job at one of the most prestigious universities in the world, got funding to establish a team to critically review climate science research … and concluded it was right:
“When we began our study, we felt that skeptics had raised legitimate issues, and we didn’t know what we’d find. Our results turned out to be close to those published by prior groups. We think that means that those groups had truly been very careful in their work, despite their inability to convince some skeptics of that.”
If you haven’t read up on both, it’s hard to appreciate how different climate science is from the beta amyloid theory. The latter has some evidence, but there were always alternate theories from serious researchers, because it involved multiple systems which scientists were still working to understand, and basic questions of causation versus correlation were under significant debate.
In contrast, climate scientists reached consensus about climate change four decades ago and by now have established many separate lines of evidence which all support what has been the consensus position. More importantly, since the 1970s they have been making predictions which were subsequently upheld by measured data from multiple sources. The ongoing research is in fine-tuning predictions, estimating efficacy of proposed interventions, etc. but nobody is seriously questioning the basic idea.
Almost all of the people you hear dismissing climate change are funded by a handful of companies like Exxon, whose own internal research showing climate change was a significant threat produced a chart in 1982 which has proven accurate:
Over the past decades, the group in the hard earth sciences crowd that is unhappy with the AGW consensus has principally funded FUD via think tanks, à la the pro-tobacco lobby back in the day, rather than research.
The few examples of research driven from the skeptic PoV (e.g. urban heat island skew) have landed on the side of the AGW consensus.
If anything, the current consensus on the scientific front lines is that the alarm is understated, and the real orthodoxy is astroturfed denial of the facts.
The global fossil industry is worth around $11 trillion a year. It supports some of the worst regimes in the world.
Of course they're going to try to FUD away the science, with the usual copy-paste narratives about how it's really scientists and academics who are corrupt.
It's all about money, power, and entitlement. Not about truth or responsibility.
But no amount of PR nonsense, astroturfing, and false accusation is going to make the slightest difference to climate reality.
In a world where it enables you to tell your place of work, “just get us an account there so we have access to all the models under a single billing account”.
In other words, it solves an organizational problem, not a technical one. That’s what the 5.5% is for.
Whether you prefer this or OpenRouter or one of the other LLM gateways is another discussion.
How does it handle “unredaction” in responses? E.g. let’s say the LLM does something with the document. You redacted its input, so it emits redacted content. Now what?
The proxy keeps a two-way mapping between identified PII and its redaction, e.g. Jane Doe <-> <PERSON_1>, so the process is reversible, i.e. redactions in the LLM response are replaced back with the originals, and it should feel transparent on the user's end. I'll add a more detailed example to the README to make it clear.
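For anyone wondering what that round trip looks like concretely, here's a minimal sketch (the class name and placeholder format are illustrative, not the proxy's actual code):

```python
import re

class RedactionMap:
    """Two-way mapping between PII values and placeholder tokens."""

    def __init__(self):
        self.to_token = {}  # "Jane Doe" -> "<PERSON_1>"
        self.to_pii = {}    # "<PERSON_1>" -> "Jane Doe"

    def redact(self, text: str, pii_values: list[str]) -> str:
        # Replace each detected PII value with a stable placeholder,
        # reusing the same token if the value appears again later.
        for value in pii_values:
            if value not in self.to_token:
                token = f"<PERSON_{len(self.to_token) + 1}>"
                self.to_token[value] = token
                self.to_pii[token] = value
            text = text.replace(value, self.to_token[value])
        return text

    def unredact(self, text: str) -> str:
        # Swap any placeholders in the LLM response back to the originals.
        return re.sub(
            r"<PERSON_\d+>",
            lambda m: self.to_pii.get(m.group(0), m.group(0)),
            text,
        )

m = RedactionMap()
prompt = m.redact("Email Jane Doe about the contract.", ["Jane Doe"])
# prompt == "Email <PERSON_1> about the contract."
print(m.unredact("<PERSON_1> has been emailed."))
# -> "Jane Doe has been emailed."
```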
And djb (the djb) also wrote djbdns.
There are plenty of examples, usually when it coincides with someone’s first project.