I write Rails as well, but comparing Rails with Erlang is kinda weird; they're for completely different domains. Both can co-exist in this world. These are only tools, not a cult or part of a personality.
What's crazier is that Codex is free. I thought I had to pay to even try it out, but nope: you can use the desktop app or CLI for free; it's apparently included in the free plan. You just have to sign in to your ChatGPT account.
Of course I'm aware that the caveat here is that all my interaction becomes training data, but I'm fine with that. Even Qwen CLI discontinued its free plan.
I stopped using my Claude subscription because the limits became so prohibitive. I'm back to ChatGPT and Codex full time and have been pretty happy. I miss Claude's tone/writing style, but I don't miss the frustration of being told I'd reached my plan limits in a comically short amount of time.
Using these prompts/steering[0] and setting Base style to Friendly, Warm to More, Enthusiastic to Default, and Headers, Lists, and Emoji to Less, I've found I can get gpt-5.5 about 80% of the way to writing as non-annoyingly as Claude. And it's so much faster and has such higher limits that the trade-off is worth it for me.
I also put together this ridiculous thing[1] because I missed the font and color scheme of Claude.
Some of it is in my custom instructions; some of it I fed in a piece at a time, saying "remember this please:" so it goes into Memories.
I'm not entirely clear on the mechanism by which memories make it into context, so it's possible some of it isn't all the time, but it does seem to be working reasonably.
Again, it's not as good as Claude when it comes to writing "not like an AI". But it's significantly better than it was.
FYI I'm actively working on aimpostor, so check back in a couple days for some quality improvements. (I'm definitely not going to bother with a Sparkle updater or anything like that.)
On Codex I ran into limits maybe 2 times in 3 months, after doing several "upgrade this experimental game to my latest shared framework" passes on 5.5 Extra High.
I can go through a 5-hour limit with a $20/mo Plus subscription in a few minutes with 5.5 Extra High. This causes me to reserve the latest/best rev for the harder problems.
5.5 really does seem clearly superior to 5.4, but it's also very expensive to run: the gas gauge moves fast. It's not clear whether 5.5 costs less to get a problem solved quickly, or whether a bunch of automatic iterations of 5.4 solves it more cheaply. Both are often frustrating to me on the $20 plan.
Most of the commits from the last few months are thanks to Codex reviews (though the code is not AI generated): 5.5 since it came out, and 5.4 etc. before that, almost always on Extra High, because it's for a framework that underlies the other stuff I do, so I want to make sure everything's correct.
Sometimes I have to run multiple passes on the same task: I rarely continue a session beyond 4-5 prompts, to avoid "bloat" and accumulating "stale context", so Codex sometimes finds different stuff in subsequent reviews of the same file/subsystem.
The project is modular enough that each file can be considered standalone with only 1-2 dependencies, and I've always written a lot of comments everywhere (something some people laughed at), so maybe that helps the AI along?
I'm taking this, along with my own experience, to mean that the GPTs are cheaper to use for refactors of an existing body of work than they are for creating a new one.
(And perhaps part of that is in the name? These "LLM" contraptions are very good at translation, after all. And tokens seem to relate more to concepts than to specific phrases or words.)
That's the current state of the $20 Claude plan, despite them twice this week announcing better usage: first "double 5-hour usage", then 50% more overall usage per week.
MAYBE the 50% overall is true, but I just don't see the doubled usage during a 5-hour window at all. I've maxed out three 5-hour windows since this happened, and there's a 0% chance it was double the normal amount: I ate up about 4-5% of my weekly total each time (this was ~10% each time pre-announcements). I wish I could give token numbers, but they're obscured; I just know it was around 120k of 4.6 with some delegation to Sonnet subagents.
So SURE, it's almost certainly more allotted weekly, but if those totals are consistent for 5-hour blocks, you'd have to split your daily usage into at least 3 sessions with 5 hours between them to even hit that weekly limit. It's unreal how much they've burned their good reputation in a 2-month stretch, and I'm positive it's also being astroturfed by bots more than happy to advance the narrative.
The internet is annoying and these tools are overall cool; I just wish Anthropic would go back to being semi-predictable.
5.5 is absolutely comparable to Opus 4.7 (both on highest effort), maybe even better. It generally seems less lazy, faster, and writes code closer to what I'd write. The only downside is that for very, very long tasks it can kind of lose track of the goal. For tasks under ten minutes I'll go with Codex every time.
The main difference is in the frontend skills. GPT produces terrible design. What I do these days is ask Opus to produce an HTML mockup, then feed it to Codex.
I have not had problems with long goals. I let it chomp for 40 minutes on a proof in my custom theorem prover (xhigh fast), and it got there. Very happy with Codex, I ditched Claude for it.
I switched some time after Anthropic bricked their models with adaptive thinking. It's a legit mystery to me how people are still using CC professionally.
Codex is far less frustrating and manages context better. It's also costing me about 1/3rd as much as Opus 4.7 on CC.
IME, based on an in-house bench, 4.6 and 4.7 are still good up to about 20% of the 1M context with a code base >50k LOC. The trick I used before switching providers was to have it write a handoff when it hit ~18% of context, then reset.
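The trick is roughly this, as a toy sketch. The 1M window and 18% threshold are from my workflow; the ~4-chars-per-token estimate and the handoff wording are just illustrative assumptions, not a feature of any real CLI:

```python
# Sketch of the "handoff at ~18% of context" workflow.
# ASSUMPTIONS: a 1M-token window, an 18% reset threshold, and a crude
# chars/4 token estimate -- none of this is a real harness API.

CONTEXT_WINDOW = 1_000_000   # tokens (the 1M window mentioned above)
HANDOFF_AT = 0.18            # reset just before quality degrades (~20%)

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose and code.
    return len(text) // 4

def needs_handoff(transcript: str) -> bool:
    return approx_tokens(transcript) >= HANDOFF_AT * CONTEXT_WINDOW

def handoff_prompt() -> str:
    return ("Write a handoff for a fresh session: the current goal, "
            "decisions made so far, files touched, and the next concrete step.")

# Illustrative usage: a long transcript trips the threshold.
transcript = "user: refactor the parser...\nassistant: done...\n" * 20_000
if needs_handoff(transcript):
    print(handoff_prompt())
```

The point is just to reset on a deliberate, summarized boundary instead of letting the harness auto-compact at an arbitrary one.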
There are also many people running 4.5 with specific parameters that claim to be having luck.
I stopped trying to use Claude for anything with 4.7 because it sucks up so many tokens so quickly. I still use the 4.6 model, and I've switched to Codex for larger tasks. Codex also works better than Claude at more complex coding tasks, for web apps with Python backends and TypeScript front ends.
I've been on the codex train for a few months now for personal stuff, but have Claude at work. I always tell people it's as good if not better than CC, but it has different strengths and weaknesses.
Claude was more autonomous and still is a little, but I think GPT 5.5 closed that gap a lot. Claude is far better at front end design. I think it's still better at big picture planning.
Codex is far better at code review and catching bugs that actually matter. I think it's better at following directions, although that regressed a bit with 5.5 (the flip side of the autonomy I mentioned earlier). A lot of CC users claim not to like Codex's personality (or lack thereof), but personally I prefer it.
I just like that OpenAI lets you use your Codex subscription with whatever harness you like. I prefer Pi, so that's what I use. GPT 5.5 xhigh feels equivalent to Opus to me, so there's no reason for me to be locked into the Claude Code CLI. I use it off and on throughout my workday and never even come close to the Pro limits.
Compaction is basically seamless, which is a major weak point of Claude. At effort=low, Claude is better than Codex but still slower. If you don't mind trading some upfront quality for extra micromanaging in exchange for speed, it's fine. And for that very reason, you end up absorbing more of the code yourself.
I was really unimpressed by the free Codex (for nodejs/react dev). I think it must be using a less powerful model or they’re limiting it in some other way.
Are you specifically pointing at a different experience between free + paid? Or just that the free version is unimpressive?
I'm using paid on TypeScript and it's genuinely terrific. Subjectively I think it has the edge over Opus.
I'd be surprised if OpenAI is hamstringing the free version. That would seem crazy from a GTM PoV. If anything the labs seem to throttle the heavy paid users.
> I was called in to explain "academic dishonesty" from apparently copying results from a former student who I had never met. I truly had no idea where Qwen got the results, I just cared that it was correct. I told them I found it on StackOverflow, which they believed and let me off with just one failed class to retake.
How did this happen? I assumed Qwen generated all its content? Even when it's using web search or sifting through a document.
> So what did I learn? Yes you can use AI, and you don't need to burn thousands of dollars on Claude, when GLM and Qwen are subsidized by the Chinese government and available for free.
This is so true. I've gotten tons of help from Qwen, Kimi and GLM (Z.ai) when I've gotten stuck, all for free! We're at a point where we have access to very powerful models today, especially the Qwen models, which have genuinely good vision models. You can upload graphs, diagrams and all kinds of images and let it analyze and discuss them. It's not like the other models that just perform OCR on the image; it truly analyzes it. I'm very thankful to have access to these three sites for free, especially Qwen and Z.ai.
> How did this happen? I assumed Qwen generated all its content? Even when it's using web search or sifting through a document.
Qwen does generate everything, so I'm not exactly sure. It wasn't anything too unique, so it could've easily learned and regurgitated it from any one of hundreds of sites.
+1 on Qwen Vision; it's far better than anything else I've seen, and it's open source, woo!
What are Western models for other than being the ethics police when you ask about something controversial, if the Chinese models are superior and lack the moral / ethical BS? Don't trust China? Self-host Chinese models or host in AWS and verify the output!
I've tried Claude Code with another LLM, and it's very good at doing tasks and figuring things out. This made me wonder: even though we know how good the Claude models are, maybe the true value is in the harness now?
> Seahorse emoji demonstrates this nicely, the LLM internally holds a semantic vector for seahorse+emoji but the output translation layer can't match it.
I am curious about this, how can the LLM hold the embedding for seahorse+emoji if it doesn’t exist? How did it end up like this? Perhaps the dataset had discussions from people about new potential emojis?
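My mental model of the quoted claim, as a toy sketch. Assuming concepts behave like roughly linear directions in the model's hidden state, the model can compose a "seahorse + emoji" vector internally even though the output (unembedding) layer has no matching token row, so decoding has to fall back to the nearest token that does exist. The vectors here are random stand-ins, not any real model's weights:

```python
import numpy as np

# Toy concept directions (assumption: concepts ~ linear directions).
rng = np.random.default_rng(0)
dim = 64
concepts = {name: rng.standard_normal(dim)
            for name in ["seahorse", "horse", "fish", "emoji_marker"]}

# The unembedding layer can only score tokens that actually exist;
# there is no "seahorse emoji" row to match against.
vocab = {
    "🐴": concepts["horse"] + concepts["emoji_marker"],
    "🐟": concepts["fish"] + concepts["emoji_marker"],
    "seahorse": concepts["seahorse"],
}

# The model can still form an internal vector for seahorse + emoji...
hidden = concepts["seahorse"] + concepts["emoji_marker"]

def decode(h):
    # ...but decoding must pick the closest *existing* token.
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda tok: cos(h, vocab[tok]))

print(decode(hidden))  # falls back to some nearby existing token
```

So the internal representation can be perfectly coherent while the output is wrong; the mismatch only appears at the translation step. Whether that's actually what happens inside a real LLM is exactly my question.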
> When artificial systems produce human-like language, people may draw a reverse inference: if LLMs can speak like humans, perhaps humans think like LLMs.
I think I experienced this when I learned about LLMs, chain of thought, thinking tokens, short-term memory context, and long-term memory context. I began applying these concepts to real life and reasoning about how our brains work as if these concepts described how our brains actually function. But maybe this is more akin to the Tetris effect?
People have been doing this since the invention of clockwork. Analogies are useful, even when they're utterly wrong, since they provide a perspective and that perspective is not necessarily wrong. Who knew?