Please ignore the deluge of complete nonsense about Q*. One of the main challenges to improve LLM reliability is to replace Auto-Regressive token prediction with planning. Pretty much every top lab (FAIR, DeepMind, OpenAI etc) is working on that and some have already published ideas and results. It is likely that Q* is OpenAI attempts at planning. They pretty much hired Noam Brown (of Libratus/poker and Cicero/Diplomacy fame) to work on that. [Note: I've been advocating for deep learning architecture capable of planning since 2016].
Share this post