Alison Smith’s Post

Director of Generative AI

Sometimes it feels like we're kids in an LLM candy store, eyeing the latest shiny model. With #Claude3 and #DBRX topping the latest benchmark charts, I wonder what these accolades mean to the average person or organization.

When it's time to roll up our sleeves and dive into real-world applications, we face a conundrum: how do we sift through the mountain of options to find the golden ticket, the model that fits our unique business confection?

Unfortunately, evaluating AI isn't as straightforward as picking the best chocolate. Open-source(ish) and proprietary models come with different features, licenses, levels of openness, inference speeds, context windows, and so on. We can't just push all the buttons and hope for the best without considering the cost and time it takes to experiment with these models.

In an ideal world, we would assess them 'ceteris paribus', holding all else equal, but Generative AI doesn't play by these economic rules. Each model interaction can be as complex as a dynamic market economy!

At #BoozAllen, we are developing approaches to evaluating LLMs in the context of their use, but I'd love to see more insights, research papers, or tools making strides in this space. For example, I recently stumbled upon Superpipe (https://superpipe.ai/), an open-source framework that helps build, evaluate, and optimize LLM pipelines.

Join me in this discussion because, in the end, we want our AI choices to be less of a gamble and more of a calculated investment. Who wouldn't want the sweet taste of success? 🚀

#AI #GenerativeAI #OpenSource #ModelEvaluation

Superpipe - LLM experimentation platform

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

10mo

Navigating the landscape of Large Language Models (LLMs) indeed resembles being in a candy store, with a plethora of enticing options. You talked about the complexities of model evaluation, highlighting the need for robust assessment frameworks. In this vein, considering the dynamic nature of Generative AI, how do you envision integrating tools like Superpipe to address the evolving demands of model selection and optimization for specific use cases, such as real-time sentiment analysis in financial markets?
