Tuhin Srivastava, CEO of Baseten, breaks down the hardware-level metrics that determine latency, throughput, and real LLM performance. A sharp look at what actually matters when you're serving models at scale. You can catch the full episode in the comments below!
Weights & Biases
Software Development
San Francisco, California 86,654 followers
The AI developer platform.
About us
Weights & Biases: the AI developer platform. Build better models faster, fine-tune LLMs, develop GenAI applications with confidence, all in one system of record developers are excited to use. W&B Models is the MLOps solution used by foundation model builders and enterprises who are training, fine-tuning, and deploying models into production. W&B Weave is the LLMOps solution for software developers who want a lightweight but powerful toolset to help them track and evaluate LLM applications. Weights & Biases is trusted by over 1,000 companies to productionize AI at scale, including teams at OpenAI, Meta, NVIDIA, Cohere, Toyota, Square, Salesforce, and Microsoft. Sign up for a 30-day free trial today at http://wandb.me/trial.
- Website
- https://wandb.ai/site
- Industry
- Software Development
- Company size
- 201-500 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2017
- Specialties
- deep learning, developer tools, machine learning, MLOps, GenAI, LLMOps, Generative AI, Experiment Tracking, AI Governance, Model Monitoring, Inference, Open Source AI, Model Comparison, Evals & Scorers, Data Quality, AI Observability, Agentic Workflows, RAG (Retrieval-Augmented Generation), Prompt Engineering, Hyperparameter Tuning, Benchmarking, Large Language Models (LLMs), Reproducibility, Dataset Versioning, and Tracing
Products
Weights & Biases
Machine Learning Software
Weights & Biases helps AI developers build better models faster. Quickly track experiments, version and iterate on datasets, evaluate model performance, reproduce models, and manage your ML workflows end-to-end.
Locations
- Primary: 400 Alabama St, San Francisco, California 94110, US
Updates
-
You bring the LoRA, we bring the GPUs 🤝 We’ve unlocked a new way to serve custom models. Introducing Serverless LoRA Inference on Weights & Biases.

Fine-tuning is getting easier, but serving those custom models usually means provisioning dedicated infrastructure for every single version. It’s expensive and it slows down iteration. We fixed that. We now let you upload, version, and serve custom LoRAs instantly on fully managed CoreWeave GPUs (the ONLY SemiAnalysis Platinum-grade AI cloud).

Here is how it works:
1️⃣ Train a tiny LoRA adapter (it's cheap and efficient).
2️⃣ Upload it to W&B Artifacts.
3️⃣ Request inference using the adapter ID.

We swap the adapter layers in dynamically, in real time. That means no cold starts, no dedicated instances per user, and no infrastructure headaches. Just fast, efficient inference.

It is currently in Public Preview. Get started here: wandb.me/LoraLaunch
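For the curious, here is a rough sketch of that three-step flow in Python. The Artifacts calls use the public wandb SDK; the inference request assumes an OpenAI-compatible endpoint and an illustrative adapter ID format, so check the launch docs for the exact base URL and model string.

```python
# Sketch of the three-step flow above. Artifact calls are the standard wandb
# SDK; the endpoint and adapter ID format below are illustrative assumptions.
import wandb
from openai import OpenAI

# Steps 1 + 2: log a locally trained LoRA adapter as a versioned W&B Artifact.
run = wandb.init(project="lora-serving", job_type="upload-adapter")
adapter = wandb.Artifact("support-bot-lora", type="lora")
adapter.add_dir("./adapter_out")  # adapter_config.json + safetensors weights
run.log_artifact(adapter)
run.finish()

# Step 3: request inference, referencing the adapter by its artifact path.
client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed W&B endpoint
    api_key="<your-wandb-api-key>",
)
resp = client.chat.completions.create(
    model="my-team/lora-serving/support-bot-lora:v0",  # hypothetical adapter ID
    messages=[{"role": "user", "content": "Summarize my open tickets."}],
)
print(resp.choices[0].message.content)
```

Because the adapter layers are swapped in at request time, each versioned artifact behaves like its own model without needing its own dedicated instance.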
-
🎰 See you in Vegas! If you’ve been to Amazon Web Services (AWS) re:Invent before, you know the drill. It is massive. The Venetian is a maze, the expo hall is packed, and the amount of information flying at you is intense.

Amidst the noise, make sure you stop by the Weights & Biases booth (#1427). Why? Because while everyone else is talking about what GenAI can do, we’re showing you how to actually build it reliably. We are the developer platform that the teams building the world’s best models use every day. Whether you're training foundation models on Amazon SageMaker or debugging a tricky LLM app on Amazon Bedrock, we give you the tools to see exactly what's happening inside your models.

If you'll be attending, we would love for you to come grab some swag and say hi. Book a time to meet with our experts here: https://lnkd.in/gvSRWZ4c
-
Weights & Biases reposted this
CoreWeave delivers another industry first: end-to-end automated user provisioning for Slurm clusters. In this new explainer video, CoreWeave Product Managers Andrew Manoske and Deok Filho walk through how Automated User Provisioning (AUP) and SUNK User Provisioning (SUP) work — and how to set them up in minutes.

They break down:
▶️ How AUP connects enterprise identity providers to CoreWeave IAM
▶️ How SUP automatically provisions users and permissions in Slurm-on-Kubernetes
▶️ How teams go from manual setup to instant, secure access

Watch the walkthrough below to see end-to-end automation in action — from identity → infrastructure → innovation. Take a look 👇
End-to-end automated user provisioning for Slurm clusters
-
Baseten CEO Tuhin Srivastava reveals why scaling custom models turns inference into a completely different engineering problem. A quick look at the part of the AI stack most people underestimate. Watch the full episode via the links in the comments!
-
The conversation around Reinforcement Learning is polarized right now. Some claim it is the only path to AGI, while others dismiss it as a dead end. We wrote this Practitioner's Guide with CoreWeave to replace those slogans with evidence.

If you are building AI agents, the story of RL has shifted, and you need to pay attention. It is not just about teaching AI to play Chess or Go anymore. Today, RL post-training is the specific lever that determines whether your model stays a cool demo or actually ships to production.

When you look at the data, the results are counter-intuitive. RL often outperforms Supervised Fine-Tuning on agentic tasks. While fine-tuning relies on curated datasets, RL can deliver competitive gains with as few as 16 well-chosen examples. You do not need perfect answers, just a way to score better outcomes.

This guide breaks down exactly how to implement this using LoRA adapters. Instead of retraining the entire model, you update tiny adapter layers to fix specific failure modes. Think of it like a git diff that nudges the model toward reliability without breaking its general reasoning.

The biggest barrier used to be managing the GPU infrastructure. The guide introduces Serverless RL, which lets you define the environment while the infrastructure handles the scaling and crashes for you.

The goal is not perfection on day one. It is about building an agent that improves like a new hire: correct the misstep, reinforce the fix, and move on. Stop guessing how to make your agents reliable. The playbook is right here. Download the full eBook in the comments!
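To make the "score outcomes, update a small adapter" idea concrete, here is a minimal sketch of RL post-training with a LoRA adapter. It assumes Hugging Face TRL's GRPOTrainer and PEFT; the dataset, reward function, and model name are illustrative placeholders, not the guide's exact recipe.

```python
# Minimal sketch: RL post-training on a tiny dataset, updating only a LoRA
# adapter. Assumes a recent trl release with GRPOTrainer; all names below
# (model, reward, data) are illustrative.
import json

from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# A handful of prompts: the post notes RL can work with as few as 16
# well-chosen examples, as long as you can score the outcomes.
train_dataset = Dataset.from_list(
    [{"prompt": "Return the user's city as JSON: 'I live in Paris.'"}] * 16
)

def reward_valid_json(completions, **kwargs):
    """Score each sampled completion: 1.0 if it parses as JSON, else 0.0."""
    rewards = []
    for text in completions:
        try:
            json.loads(text)
            rewards.append(1.0)
        except (ValueError, TypeError):
            rewards.append(0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # any small causal LM works for a demo
    reward_funcs=reward_valid_json,        # no labels needed, just a scorer
    args=GRPOConfig(output_dir="grpo-lora", num_generations=4),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()  # only the LoRA adapter weights are updated
```

Note the shape of the reward function: it never says what the right answer is, it only scores sampled completions. That is the "way to score better outcomes" the post refers to.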
-
It’s not every day you get to discuss the future of AI inside a historic monument like the Grand Palais in Paris! 🇫🇷 That is exactly what is happening later this month at Adopt AI. The event is being billed as the "Davos of AI," and given the convergence of global heads of state and enterprise leaders, it is going to be a defining moment for the European tech ecosystem.

The Weights & Biases team will be on the ground, and we would love to connect with you in person. Here is where to find us:
📅 Dates: November 25-26, 2025
📍 Location: Grand Palais, Paris
🔎 Spot: Tech Demo Zone

We are bringing our experts to give hands-on demos of the platform and discuss the challenges you are solving right now. We also made sure to pack plenty of swag for the occasion. If you are planning to attend, stop by the booth and say hello to the team, or book a meeting here: https://lnkd.in/gZ6ftkJb
-
Weights & Biases reposted this
Just shipped: use marimo natively inside your favorite editors, including VS Code and Cursor. Version with Git, use interactive elements, and let reactive execution keep code and outputs in sync.

Our extension integrates deeply with @astral_sh uv for fast installs and isolated venvs. You can choose to let our extension manage packages for you, with uv: import a package and you're automatically prompted to install it if it's missing. Dependencies are recorded in script metadata, making your notebooks reproducible down to the packages.

See the extension in action in our YouTube video: https://lnkd.in/g8EK74Ca
Read more in our blog: https://lnkd.in/gzw7bQR3
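The "script metadata" mentioned here is PEP 723 inline metadata, which uv reads natively. Here is a minimal sketch of what a marimo notebook file looks like with its dependencies recorded that way; the polars dependency and the cell contents are just examples.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "polars",
# ]
# ///
# The PEP 723 block above is the recorded script metadata: uv reads it to
# build an isolated venv, so the notebook reproduces down to its packages.
import marimo

app = marimo.App()

@app.cell
def _():
    # Importing a missing package is what triggers the install prompt.
    import polars as pl

    df = pl.DataFrame({"x": [1, 2, 3]})
    df  # the last expression renders as the cell's output
    return (df,)

if __name__ == "__main__":
    app.run()
```

Because the notebook is a plain Python file, it diffs cleanly in Git, which is what makes the editor integration practical.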
-
This is the clearest explanation of modern inference we’ve heard all year. We brought Baseten CEO Tuhin Srivastava onto Gradient Dissent, and he breaks down what actually happens once teams move past demos and into real production workloads:
• Scaling across large GPU fleets without breaking workloads or spiking latency
• Pushing runtimes to their limits as you choose between vLLM, TensorRT-LLM, and SGLang on new hardware

He also shares why many teams eventually move from closed models to open source once cost and control become critical. If you want a clear snapshot of how large models run in the real world, this episode is worth your time. Link is in the comments.
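If you have not touched the runtimes the episode compares, here is the smallest possible taste of one of them: vLLM's offline Python API. The model name and sampling settings are illustrative; TensorRT-LLM and SGLang expose their own, different APIs.

```python
# Minimal vLLM example: load a model and generate a completion. vLLM handles
# continuous batching and KV-cache management under the hood, which is where
# much of the throughput story discussed in the episode comes from.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF causal LM
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain what continuous batching does for GPU utilization."],
    params,
)
print(outputs[0].outputs[0].text)
```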
-
Why your agent works in the demo but breaks in the wild. 🤯

Prototypes are easy. Production reliability is hard. We usually see agents fail when they hit "brittle corners" like vague inputs or schema changes that send the model into a loop. You can try to patch these holes with prompt engineering, but you’re essentially playing Whack-A-Mole.

We just released a Practitioner's Guide to Reinforcement Learning to solve exactly this. It focuses on using RL post-training to ground agents, moving them from a cool demo to a reliable product.

The simplest way to think about it (from page 2): think of it like training a dog. You don't physically move the dog's paws into a "sit" position (that’s SFT). Instead, you give a command, wait for the action, and reward the right behavior. Over time, the policy improves.

We apply this logic to LLMs using algorithms like GRPO to reward better reasoning paths without needing massive labeled datasets. If you want to lower latency and cost while fixing reliability, give this a read. Download the guide here: https://lnkd.in/gUBVBFTJ
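For a feel of what GRPO actually reinforces, here is a toy sketch of its core step: score a group of sampled completions for the same prompt, then compute group-relative advantages. This is a simplified illustration of the idea, not the guide's implementation.

```python
# Toy sketch of GRPO's core idea: sample a group of completions for one
# prompt, score each one, and normalize rewards within the group. No labeled
# "correct" answer is required, only a scorer.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Four sampled answers to the same prompt, scored by a reward function:
rewards = [1.0, 0.0, 1.0, 0.5]
print(group_relative_advantages(rewards))
# Completions above the group mean get positive advantage (reinforced);
# below-mean completions get negative advantage (discouraged).
```

This mirrors the dog-training analogy: nothing dictates the exact behavior, the policy just gets nudged toward whatever scored better than its peers.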