anthropic dropped a new model! i finally put it thru my normal paces, more or less, and have a few thoughts.
it’s a claude, toe to tip. most of the things you might hate (overly defensive programming around invariants, lots of comments, some vague reward hacking) are all still here…
but it’s notably better. we’ve come a long way from sonnet 3.7 and “return true //TODO: write test” shenanigans.
something i dislike, or at least find discomfiting: it tends to do a lot more successful work at a time. idk how much of this has to do with claude code’s context management getting better, but without steering, sonnet will take bigger swings and usually get them right the first time.
this is kinda annoying in practice if you’ve built up the habit of “work in small chunks and refine”. i think sonnet 4.5 benefits strongly from pre-planning and prompting (i suspect some of this is overfitting/hill climbing on SWE benchmarks).
i think it’s still a great workhorse, but it’s very unsurprising. maybe that’s fine, and that’s where we’re at with models these days; continually getting better but not really making huge leaps.
vibescore 3.8/5