LinkedIn has published one of the best reports I’ve read on deploying LLM applications: what worked and what didn’t.

1. Structured outputs
They chose YAML over JSON as the output format because YAML uses fewer output tokens. Initially, only 90% of the outputs were correctly formatted YAML. They used re-prompting (asking the model to fix its YAML responses), which increased the number of API calls significantly. They then analyzed the common formatting errors, added hints about them to the original prompt, and wrote an error-fixing script. This reduced their error rate to 0.01%.

2. Sacrificing throughput for latency
Originally, they focused on TTFT (Time To First Token) but realized that TBT (Time Between Tokens) hurt them a lot more, especially with Chain-of-Thought queries where users don’t see the intermediate outputs. They found that TTFT and TBT inversely correlate with TPS (Tokens Per Second). To achieve good TTFT and TBT, they had to sacrifice TPS.

3. Automatic evaluation is hard
One core challenge of evaluation is coming up with guidelines for what a good response is. For example, for skill-fit assessment, the response “You’re not a good fit for this job” can be correct but not helpful. Originally, evaluation was ad hoc: everyone could chime in. That didn’t work. They then had linguists build tooling and processes to standardize annotation, evaluating up to 500 daily conversations, and these manual annotations guide their iteration. Their next goal is automatic evaluation, but it’s not easy.

4. Initial success with LLMs can be misleading
It took them 1 month to achieve 80% of the experience they wanted, and an additional 4 months to surpass 95%. The initial success made them underestimate how challenging it would be to improve the product, especially when dealing with hallucinations. They found it discouraging how slow it was to achieve each subsequent 1% gain.

#aiengineering #llms #aiapplication
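The report doesn’t include code, but the “analyze common errors, then fix them deterministically” step in point 1 can be sketched roughly like this. Everything here is hypothetical: the function name, the specific repairs (stripping prose preambles, markdown fences, and tab indentation), and the example input are my illustrations, not LinkedIn’s actual script or error list.

```python
import re

def repair_yaml(raw: str) -> str:
    """Apply deterministic fixes for formatting errors LLMs commonly make
    when asked for YAML output (illustrative, not LinkedIn's real script)."""
    lines = raw.strip().splitlines()
    # Drop any prose preamble (and an opening ``` fence) before the
    # first "key:" line, where the YAML document actually starts.
    for i, line in enumerate(lines):
        if re.match(r"^\s*[\w-]+\s*:", line):
            lines = lines[i:]
            break
    text = "\n".join(lines)
    # Strip a trailing markdown code fence if the model added one.
    text = re.sub(r"\n?```\s*$", "", text)
    # YAML forbids tabs for indentation; replace each with two spaces.
    return text.replace("\t", "  ")

raw = "Here is the result:\n```yaml\nrole: engineer\nfit:\tgood\n```"
print(repair_yaml(raw))  # prints "role: engineer" then "fit:  good"
```

The appeal of this approach over re-prompting is that it costs zero extra API calls; the model is only re-prompted for the small residue of outputs the script cannot salvage.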
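To see why TBT dominates in point 2, some back-of-the-envelope arithmetic helps: for a long generation, end-to-end latency is one TTFT plus hundreds of TBT gaps, so halving TBT matters far more than halving TTFT. The numbers below are hypothetical, chosen only to illustrate the shape of the trade-off; they are not from the report.

```python
def generation_latency(ttft_s: float, tbt_s: float, n_tokens: int) -> float:
    """End-to-end latency for a streamed generation: time to first token,
    plus one inter-token gap for each of the remaining tokens."""
    return ttft_s + tbt_s * (n_tokens - 1)

# A 500-token Chain-of-Thought generation with hypothetical timings.
# Doubling TBT nearly doubles total latency; doubling TTFT barely moves it.
print(generation_latency(0.5, 0.02, 500))  # 0.5 + 0.02 * 499 = 10.48 s
print(generation_latency(0.5, 0.04, 500))  # 20.46 s
print(generation_latency(1.0, 0.02, 500))  # 10.98 s
```

This is also why TBT trades off against TPS: larger serving batches raise aggregate tokens per second across requests, but stretch the inter-token gap seen by each individual request.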
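Point 3’s “correct but not helpful” problem is exactly what makes automatic evaluation hard: a naive rubric can be sketched, but it is brittle. The cue list and function below are entirely my invention, meant only to show what a crude first attempt at encoding such a guideline looks like, and why human annotators are still needed.

```python
# Hypothetical cue words suggesting a response contains actionable advice.
ACTIONABLE_CUES = ("consider", "you could", "try", "recommend", "improve")

def flags_unhelpful(response: str) -> bool:
    """True if a skill-fit response renders a verdict ("good fit" /
    "not a good fit") without offering any actionable advice."""
    lowered = response.lower()
    verdict = "good fit" in lowered
    actionable = any(cue in lowered for cue in ACTIONABLE_CUES)
    return verdict and not actionable

print(flags_unhelpful("You're not a good fit for this job."))  # True
print(flags_unhelpful("You're not a good fit yet; consider adding "
                      "cloud experience to your profile."))    # False
```

A keyword rubric like this misfires constantly on phrasing it has never seen, which is presumably why LinkedIn leaned on standardized human annotation before attempting automation.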
I tried the premium automatic evaluation feature, Chip Huyen. It’s not bad, but not amazing. I would’ve liked more concrete actions on how to be a better fit. It’s in beta, as stated.
The choice of going from JSON to YAML seems entirely personal. The performance gains are minimal, while you’re stuck operating in YAML, which isn’t the industry standard.
Good stuff, Chip Huyen. The Premium LinkedIn AI Beta had a very thoughtful question and response to my LinkedIn post today.
With no neural network, and thus much faster and much easier to improve, train, and understand, I get better results. See how I do it in my articles 36-40 at https://mltblog.com/3EQd2cA Currently under implementation with Fortune 100 companies. There is a lot more to it than just having no neural network; there is plenty of ground-breaking innovation. In short, it's a collection of dozens of specialized sub-LLMs, self-tuned, with an LLM router and contextual tokens, taxonomy, a concept graph, and more.
Jake Bowles - YAML ... interesting or no?
I'd highly recommend this report to anyone interested in building AI applications. Great write up Juan Pablo Bottaro and Karthik Ramgopal! https://www.linkedin.com/blog/engineering/generative-ai/musings-on-building-a-generative-ai-product