Hacker News | carterschonwald's comments

it seems like with some care and disabling SIP, some pretty good workarounds using llm-assisted kext hackery would get pretty far

yeah shared “did you see this week's X” is lame, but it was social glue for a long time.

I think about this all the time.

The trend towards personalization in media and software comes at the cost of a shared social experience we can use to relate to each other.


yeah but do we really need some trash reality TV for a "shared social experience"? most of TV's programming was garbage anyway and contributed to a lot of what was/is wrong with society

this is literally just “leave a child at the work computer with a real doc open playing office”. otoh it is good to design benchmarks to ground these things.

on the flip side if you’re literally just using a bare bones harness on top of a stochastic parrot, of course stochastic errors accumulate.

there are a lot of ways to improve text faithfulness through harness tool design, and my incremental experiments seem promising.

but unless work is gated on shit like “the script used must be type-checked ghc haskell or lean4”, unsupervised stuff is gonna decay


It’s not a stochastic parrot.

It’s a stochastic goblin.

good one

i mean of course. ive been working on this the past few months and ive a bunch of tech towards this in flight, including some harness forks to layer my ideas in. eg my oh punkin pi test bed on my github.com/cartazio page. theres some shockingly obvious once-you-see-it tricks that i think i can stack into a really nice harness product for just doing hard real work with these models more easily

16 weeks plus a week or so per year of service is pretty good

the funny thing is once the llms got mostly good enough in november 2025 for me, it was mind boggling how much it helped me get stuff out of my head with ease.

its easier for me to code now, because its like i have a 24/7 insane intern that needs to be supervised via pair programming but also understands most topics enough to be useful/dangerous.

ironically ive been spending much of my time iterating on ways to improve model reasoning and reliability and aside from the challenge of benchmark design, ive had some pretty good success!!

my fork of omp: https://github.com/cartazio/oh-punkin-pi has a bunch of my ideas layered on top. ultimately its just a bridge till i’ve finished the build of the proper 2nd gen harness with some other really cool stuff folded in. not sure if theres a bizop in a hosted version of what ive got planned, but the changes ive done in my forks have made enough difference that i can see the difference in per-model reasoning


im def working on benchmarks for how my own general harness improves task performance vs same model in a commodity setup. its hard to do!

i will say that my current harness: https://github.com/cartazio/oh-punkin-pi is a testbed for a bunch of 2nd gen harness tech, largely optimized for reasoning llms only. the next one after this harness is gonna be epicccc


i might borrow the skills etc for good ideas sometime. thats a lot of integration surface

check out my pi forks.


Ummmmmm, how?


I searched his HackerNews username on Google.

[0] - https://github.com/cartazio/oh-punkin-pi


That (and oh-my-pi) seem like an excessive swing in the other direction. I'm all for the simplicity and minimalism of pi. There are just a few fundamental things that need updating (mainly subagent context and open-by-default security model).


yup thats mine. :) i actually had some stuff layered into mono pi, and i frankly hit my limit in terms of architecture issues in monopi. omp aka oh my pi is frankly better architected. if you pared back the feature set to be minimal, you would full stop have a better designed minimal harness.

i do have a proper next gen no slop harness in the works.

amusingly, dogfooding existing tools with my improvements layered in has repeatedly validated my design choices and if anything has reduced my tolerance for the errors that seem to happen in vanilla or first party harnesses


more than that, its pretty clear that there is an insane underinvestment in the harness layer. ive been iterating on my own ideas in that area through the lens of increasing reliability. and holy crap is there so much low hanging fruit. i literally can’t figure out a sustainable way to do the work without commercializing at that layer

