We test every promising new model that comes out. Sometimes we find interesting things, like the one below!
Big fan of open-source models, and I'm impressed with the new Kimi model from Moonshot AI, particularly at its price point. At Fern Labs we test all promising new models on real-world use cases: working with engineers through Slack, writing software and raising GitHub PRs, deploying new apps, digging through infra logs, etc.

During testing, Oscar Chung noticed an interesting quirk: when using tools, Kimi thinks it's Claude. This could have a few different causes, but it's also bad for Moonshot, because Kimi keeps attributing its work to Claude. For example, it raises PRs under the username 'Claude Anthropic'!

Strong model though, and particularly exciting: frontier-level open-source models hosted by extremely fast inference providers like Groq mean agents can work 5x faster. Link below, where we deep-dive into the topic!