I initially thought "Every Frame Perfect" meant a strict avoidance of any jank or stutter in motion, which I'm fully on board with. But speaking as a film, video and 3D technologist, I think you're spot on in calling out motion blur and similar temporal artifacts. In motion, they not only look 'most correct' to the human visual system, they are also the most interpretable.
Adding the correct blur to motion makes it appear clearer, but seen as a still, it's obviously not clearer. The nuance is that correct motion blur appears clearer while still being as clear as the human visual system can perceive moving details at that speed, so no perceptual detail is actually lost. It's a method that objectively improves perception, but it only works in motion. If frozen, the method breaks. Thus, evaluating motion-blurred stills for clarity or interpretability is incorrect.
The rest of the article focuses on details of proper implementation while missing the opportunity to question whether some of these animations should exist at all. IMHO, motion can be a valuable affordance in limited doses but it's reached a point of overuse and, in some cases, outright abuse of the user's visual field and cognitive load. Designers (and their PMs) see it as a badge of 'Refined Modern UX' but it's devolved into a trendy gimmick aping good design without being good design.
Regarding your last point, I think it's almost always wrong to move something discontinuously, but I do think designers should think a lot more about getting out of the way of the user. A 50-100 ms animation is more than enough for most motions and keeps the UI feeling snappy. Also, animation should be decoupled from input wherever possible. I hate it when I have to sit there waiting for an animation to complete before the app will start acknowledging my keystrokes.
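To make the "decoupled from input" point concrete, here's a rough sketch of what I mean. It's not any particular framework's API: `Slide`, `Panel`, `ANIM_MS` and the smoothstep easing are all names and choices I made up for illustration. The idea is that the animation is just a pure function of the clock, keystrokes are applied the instant they arrive, and a new move request retargets from wherever the element currently is instead of waiting for the old animation to finish.

```python
from dataclasses import dataclass

ANIM_MS = 80.0  # hypothetical duration, inside the 50-100 ms range suggested above


@dataclass(frozen=True)
class Slide:
    """A one-dimensional move whose position is a pure function of time."""
    start: float
    target: float
    t0_ms: float

    def position(self, now_ms: float) -> float:
        # Normalised progress, clamped to [0, 1].
        t = min(max((now_ms - self.t0_ms) / ANIM_MS, 0.0), 1.0)
        eased = t * t * (3.0 - 2.0 * t)  # smoothstep ease-in/out (my choice)
        return self.start + (self.target - self.start) * eased

    def retarget(self, new_target: float, now_ms: float) -> "Slide":
        # Start the new move from the current on-screen position,
        # so an in-flight animation never blocks the next one.
        return Slide(self.position(now_ms), new_target, now_ms)


class Panel:
    """Hypothetical widget: input state and animation state are independent."""

    def __init__(self) -> None:
        self.text = ""
        self.anim = Slide(0.0, 0.0, 0.0)

    def on_key(self, ch: str) -> None:
        # Applied immediately; never checks whether an animation is running.
        self.text += ch

    def move_to(self, x: float, now_ms: float) -> None:
        self.anim = self.anim.retarget(x, now_ms)
```

The renderer would just call `panel.anim.position(now)` every frame. Nothing in `on_key` knows or cares that an animation exists, which is the whole point.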
> I think it's almost always wrong to move something discontinuously
Yes, I think we agree. When a thing is becoming a larger/smaller form of itself in a different place, it can be useful to cue the relationship visually with motion. But there are times when the change or displacement is minor enough that I prefer 'just do it', even when the animation would be hyper-fast. Otherwise it's just more visual/cognitive clutter.
It's obviously situational, and if such motion is always very fast, consistent and well-motivated, it never rises to the level of annoying me. I might personally prefer just skipping it in some instances, such as when the positions overlap and the size change is minor, but it's not 'bad'. I think the key may be that, done properly, such motion should cognitively be a 'barely there' hint. The moment a state-change animation rises to having perceivable aesthetic value, like being 'pleasing', it's too much.
As the senior product owner, I once had a new designer argue that if an animation was as fast as I wanted, no one would be able to appreciate the excellent S-curve ease-in/out. :-) I had to explain that if a simple state-change animation was slow enough to be consciously 'appreciated', it had failed in its purpose.
> waiting for an animation to complete before the app will start acknowledging my keystrokes.
Or you find out you can input as the animation happens, but when the animation finishes, you’ve lost where your input ended up and don’t know if you can backspace/delete and retype.
... I wonder if we're seeing a downstream effect of Apple rejecting Flash on iPhone, triggering a slow collapse of Adobe empire. It seems that there are multiple concepts missing in conversations going on here.
Sorry, I'm not connecting how "collapse of Adobe empire" leads to "multiple concepts missing" in this UX discussion. Can you clarify?
Also: Apple dumped Flash ~15 yrs ago, so whatever it is... it's very slow. The larger, more recent suspects for any "collapse of Adobe empire" would be "Adobe forcing $ubscription model" and "rise of Figma."
I get your reasoning but I think you're misreading this. The Trump admin has had access to Mythos for a couple of months and certainly had access to pre-release Fable for more than a week, but they waited until 5:30 p.m. on a Friday to send a broad and unworkable demand that a company cut off access to its flagship products for anyone who is not a confirmed U.S. citizen, under severe penalties for any violation.
What penalties? Treason is still punishable by death in the U.S. I hate that I just felt compelled to write that as a serious possibility and, pre-Trump 2.0, I would have accused anyone citing that as scaremongering. But times have changed and this administration hates Anthropic vehemently. Anthropic is the only major AI company not "playing ball" with the DoW and donating to Trump's pet projects.
I truly believe if Mythos was an OAI or Google model, there would have been exactly the kind of discussion you imagine and this would have all been worked out. I deeply regret that, given recent facts, the most likely conclusion is that this late-Friday ban was planned for days (if not weeks). And there was no real attempt to work anything out about Mythos, because that's not really what the DoW wants.
The driver behind this is the still-unresolved dispute over Anthropic's Acceptable Use Policy regarding autonomous lethal weapons and mass domestic surveillance, which conflicted with the Pentagon's push for unrestricted model deployment ("all lawful uses"). This is the DoW's counter-attack. I fully expect that the DoW is going to hold Anthropic (and Ant's IPO) hostage by blocking any new model until Anthropic gives the DoW full access with no restrictions except "all lawful uses" (and the DoW's position is their in-house lawyers decide what's legal).
> <!-- EXACT setup from working simple-test.html -->
All LLM-kind would be vastly improved if the words "exact" and "brilliant" were nerfed to hell in their pre-training weights or even just removed from their training distributions entirely. Virtually nothing outside of mathematics is "exact", and virtually nothing outside of colors should be described as "brilliant".
Yeah, that was my guess too. Still a little disappointing to see fellow HNers reflexively fanboying a company that's the overwhelmingly dominant player in LLM coding.
I try to reserve my reflexive fanboy company/project votes for underdogs who need and deserve the help.
> But you know what my coworker asks? “Test Y theory.”
It still surprises me when I see people not prompting more specifically and clearly. It not only avoids problems, it's faster, costs less, and just works better.
I recently shared with a friend a multi-hour LLM chat session I'd done because it veered into a domain he's interested in. In the session I'd brainstormed and probed the feasibility of a novel concept for a new research direction. It traversed a half dozen domains diving into minute detail then zooming back out to survey an adjacent space, interspersed with intense skeptical probing of key assumptions, all while spewing tons of detailed citations, specific paragraph pulls, summarized data tables etc.
My friend is very experienced using LLMs for research so I was surprised when he called me shocked by the sheer velocity, precise targeting and signal/noise. I'd assumed everyone did it the same as I do. He attributed the different result solely to the way I crafted my prompts.
I used to write detailed prompts. Now I see the benefits of strategic ambiguity: rather than speaking imperatively, I emphasize my vision and then Claude can often figure out a method.
This doesn’t always work better. But often enough.
That's actually what I do too. What I was trying to say is that my prompts are precise in the sense that, whether they're vaguely ambiguous or hyper-detailed and highly directive, the choice is always very intentional, made to improve the response in the direction I want. The difference can have significant impact, as shown in research on how LLMs naturally mirror users' prompts.
I noticed this last year and started experimenting, which led to several realizations about how my prompt's tone, style, length, format, word choices and even punctuation can have a very counter-intuitive impact on model responses. It's not that one strategy always gets "better" results; they're just different in specific ways, which can make one input style better for one context but worse for another. I first noticed this effect when modding my user prompt so major topic headings would always be numbered. It's surprisingly difficult to get the model to reliably use the same simple scheme due to various potential ambiguities. So, I spent a little time word-smithing, lawyering and tuning the prompt, but I found the closer I got to full compliance on heading numbering, the more unrelated things would drift. Like it would just stop using bullets, even though I never mentioned anything about bullets.
Then I changed the prompt to "Change nothing about your default formatting, except headings." But just mentioning anything related to formatting could suddenly cause unintended effects on seemingly unrelated things. Then I tried being explicitly directive about all formatting to just lock it down. And this completely failed because once the formatting was perfect, I started noticing the model's output would get less intelligent much earlier in sessions. So I cleared my user prompt entirely as it wasn't worth the cognitive cost on the model or my time. A few days later in a long session I noticed it was numbering everything perfectly with no prompt at all. When I scrolled back through I saw it didn't start out numbering its responses. It started doing it because I was consistently numbering every major concept in my inputs, even though I never mentioned numbering or formatting.
So... yeah, subtle differences in prompts, which absolutely shouldn't matter, do impact model output in unexpected ways. And, as of now, these effects can only be fully suppressed with strong directive prompts for short periods, but doing so always impacts other unrelated things and has some cognitive impact on model performance. So, by paying a little attention, I've discovered ways to optimize a model's output in the direction I need by shifting not only my prompt's explicit directives but also the subliminal meta-elements like tone, style, length, structure, formatting, etc.
The counter-intuitive nature of LLMs is so simultaneously interesting and frustrating. Overloading a single prompt definitely can create challenges remarkably similar to human short-term memory limits and attentional drift.
LLMs gain so much knowledge and capability from absorbing the symbolic relationships embedded in human language but in doing so, inevitably absorb many of the human foibles, sensitivities and weaknesses reflected in our languages.
> we can't read even a tiny fraction of what gets posted here
I'll bet it's exhausting but your note did make me ponder: If a soul was condemned to the eternal torment of reading nothing but the user posts of one social media site, HN would be a pretty excellent choice. I shudder to think of the alternatives.
Re: AI OS integration: I recently retired so most of my LLM use is just implementing and fixing fairly mundane OS and networking things along with light scripting for OS automation (AHK) and Home Assistant. So far, I just use web chat and cut-paste to the OS which is fine for little things but it starts to suck after the 15th round back and forth. For example, debugging intermittent Windows crash logs on my wife's laptop by doing multi-line PowerShell incantations from browser chat window, paste into PowerShell window. Cut multi-line error messages back to browser. Rinse / Repeat.
I'm leery about just giving an LLM free run of my laptop, but with reasonable restrictions on which app(s) it can access and how many steps it can do before checking in, and maybe even a throttle on how fast it works, I'd be fine (I'm not in a hurry and I can learn by watching it work at double-speed). It doesn't have to be mil-spec locked down, it's not like I have production code accessible or millions in crypto keys, the biggest downside would be a few hours hosing out and restoring the laptop, which would be annoying but not the end of the world.
I get those that say, "just spin up a VM and run it there", but I 'spin up a VM' rarely enough that the versions have changed and UXs drifted enough that it's exactly the kind of thing I'd actually want the LLM's help to do without me being a cut-paste bot. I'm mostly Windows at the moment and I don't understand why MSFT insists on spamming LLM features everywhere except the one place I'd not only use it, but pay for it. The usage model could be as simple and intuitive as a Zoom remote desktop share with a collaborator. That's already constrained and users have a mental model for the interaction pattern.
I asked Gemini earlier today to search recent user reviews of the latest 'drive my Windows desktop for me' tools, and it reported that the capability is still slow, expensive, and prone to getting lost navigating the interface or interpreting window boundaries etc.
Anyone have any suggestions for my lightweight, casual use case?
Yeah unironically just let an agent harness rip with full admin access without monitoring anything it does or using a VM. It’ll be fine, probably. “How I Learned To Stop Worrying And Love the AgentDOS and Only Exfiltrate Secrets Occasionally”
Yep, I remember seeing the headline. I clicked in, read the sub-head and first few sentences, hit the back button, and moved on, having duly noted the passing of one more milestone in Google's long descent. The reasons why it sucks, why they eventually did it and the vague, implausible PR justifications for doing it were all self-derivable from the headline and sub-head.
I doubt I missed anything of significant substance, but didn't want to assert factual knowledge, so I just linked the Wikipedia article (which I also didn't read into). Don't interpret my skipping the rumination step as excusing or dismissing Google's decline. I don't need to rubber-neck every step of a slow-mo, multi-year train wreck to lament that it happened and update my priors regarding Google.
Sadly, I think it's all four at once.