Monday, January 6, 2025

With year-over-year advances, large language models are becoming popular faster than ever. This is a summary of the advances made last year, along with the lessons learned.

First, the GPT-4 barrier was completely broken. Some seventy competing models overtook GPT-4, including Google’s Gemini and Anthropic’s Claude 3. LLMs now handle long inputs: entire books can be thrown at a model and it will answer questions about them with high precision and recall, and they can solve coding challenges from samples the user provides in the prompt.

Second, these models can now run on laptops, even though GPT-4 originally required datacenters with one or more $40,000-plus GPUs. On a laptop, 64 GB of RAM is enough to run them (a rough sketch of local inference appears after point ten below).

Third, this competition reduced LLM prices significantly and delivered more value through increased efficiency. GPT-4 used to cost $30 per million tokens on OpenAI, while GPT-4o is now $2.50 per million tokens. These price drops are driven by two factors: increased competition and increased efficiency. Since price tracks how much energy is used to run prompts, the efficiency gains matter a great deal to anyone concerned about the environmental impact of LLMs (a worked cost comparison appears after point ten below).

Fourth, LLMs are becoming multi-modal. In computer vision, vectorizing images as well as text and using them in queries with vector search has proved tremendously successful in many use cases, and audio and video are now starting to emerge. In November 2023, GPT-4 led in multimodal vision; many other models have since caught up. Amazon’s Nova, for example, supports both image and video and is available on their public cloud (a sketch of combined text-and-image vector search appears after point ten below).

Fifth, voice and live camera modes are science fiction come to life. Just a year earlier, audio and live video modes were an illusion: OpenAI’s Whisper speech-to-text model and a text-to-speech model enabled conversations with ChatGPT’s mobile apps, but the actual model only ever saw text. GPT-4o, released in 2024, is a true multi-modal model that accepts audio input and generates incredibly realistic-sounding speech without that separation.

Sixth, prompt-driven app generation is now mainstream and has permeated virtually every industry sector. We already knew these models were good at writing code, and entire static websites could be created from a single prompt, but Claude Artifacts took it up a notch by writing on-demand interactive applications you can use directly inside the Claude interface.

Seventh, universal access to the best models lasted just a few short months. OpenAI made GPT-4o free to all users, whereas previously free users only had access to earlier versions and never saw the latest. That has since changed again: keeping up with the latest models now requires a monthly subscription.

Eighth, “agents” still haven’t really happened. The term itself is extremely frustrating because it never clarifies its purpose: do agents only do low-level work, or can they also compose high-level tasks? Throwing “autonomous” into the mix made it more ambiguous still. Agents were never supposed to make meaningful decisions on their own, because doing so eventually hits a roadblock or leads to hallucinations, and prompt injection attacks exploited exactly this ambiguity, as if to prove the point.

Ninth, evals really matter. As one expert put it, the boring but crucial secret behind good system prompts is test-driven development: you don’t write down a system prompt and then find ways to test it; you write down tests and then find a system prompt that passes them (a minimal eval harness is sketched below).
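To make that test-first idea concrete, here is a minimal sketch of such an eval harness. Everything in it is illustrative: `call_model` stands in for whichever LLM client you actually use, and the test cases and candidate prompts are invented examples, not a real policy.

```python
# Minimal eval harness: the tests are written first, then the system prompt
# is iterated until every case passes. `call_model` is a placeholder for
# whatever LLM client you actually use.

def call_model(system_prompt: str, user_message: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with a real client."""
    raise NotImplementedError

# Each case pairs an input with a check the answer must satisfy.
EVAL_CASES = [
    ("Refund request outside the 30-day window",
     lambda answer: "cannot" in answer.lower() or "unable" in answer.lower()),
    ("What is your return policy?",
     lambda answer: "30 day" in answer.lower() or "30-day" in answer.lower()),
]

def score_system_prompt(system_prompt: str) -> float:
    """Return the fraction of eval cases this system prompt passes."""
    passed = 0
    for user_message, check in EVAL_CASES:
        answer = call_model(system_prompt, user_message)
        if check(answer):
            passed += 1
    return passed / len(EVAL_CASES)

# Candidate prompts are all compared against the same fixed test suite.
candidates = [
    "You are a support agent. Follow the 30-day return policy strictly.",
    "You are a helpful assistant.",
]
# best = max(candidates, key=score_system_prompt)
```

The point of the structure is that the test suite stays fixed while the system prompt changes, which is exactly the inversion the quote describes.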
Tenth, inference-scaling “reasoning” models are on the rise. They extend the chain-of-thought prompting trick: if you get a model to talk out loud about a problem it is solving, you often get a result the model would not have reached otherwise. The biggest innovation is that this opens a new way to scale, because models can take on harder problems by spending more compute on inference (see the chain-of-thought sketch below).
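Here is a minimal sketch of the chain-of-thought trick that these reasoning models build on. The `complete` function is a placeholder for a real LLM call; the difference between the two prompts is the point.

```python
# Chain-of-thought prompting: asking the model to reason step by step before
# answering often yields results it would not reach from a direct question.
# `complete` is a placeholder for a real single-shot LLM call.

def complete(prompt: str) -> str:
    """Hypothetical completion call; replace with a real client."""
    raise NotImplementedError

question = "A train leaves at 9:40 and the trip takes 2 h 35 min. When does it arrive?"

# Direct prompt: the model must answer immediately.
direct_prompt = f"{question}\nAnswer with just the arrival time."

# Chain-of-thought prompt: the model is told to work through the steps first.
cot_prompt = (
    f"{question}\n"
    "Think through the problem step by step, showing your reasoning, "
    "and finish with a line that starts 'Answer:'."
)

# direct = complete(direct_prompt)
# cot = complete(cot_prompt)

# Inference-scaling "reasoning" models bake this in: they spend extra tokens
# (i.e. extra inference-time compute) on the reasoning phase automatically.
```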
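As a follow-up to point two, here is a rough sketch of what running a model locally can look like, assuming the llama-cpp-python package and a quantized GGUF model file downloaded beforehand. The model path and parameter values are placeholders, not recommendations.

```python
# Running a local model on a laptop: a sketch using the llama-cpp-python
# package and a quantized GGUF model file. The file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-open-model-q4.gguf",  # placeholder path
    n_ctx=8192,        # context window; larger values need more RAM
    n_gpu_layers=-1,   # offload layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the plot of Dracula."}]
)
print(response["choices"][0]["message"]["content"])
```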
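Point three’s price drop is easiest to appreciate as arithmetic. The prices below are the per-million-input-token figures quoted above; the 5,000-token prompt size is just an illustrative assumption.

```python
# Worked cost comparison using the input-token prices quoted in point three.
GPT4_PRICE_PER_M = 30.00   # USD per million input tokens (original GPT-4)
GPT4O_PRICE_PER_M = 2.50   # USD per million input tokens (GPT-4o)

def prompt_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of a prompt of the given token count."""
    return tokens / 1_000_000 * price_per_million

tokens = 5_000  # illustrative prompt size
print(f"GPT-4:  ${prompt_cost(tokens, GPT4_PRICE_PER_M):.4f}")    # $0.1500
print(f"GPT-4o: ${prompt_cost(tokens, GPT4O_PRICE_PER_M):.4f}")   # $0.0125
print(f"Ratio:  {GPT4_PRICE_PER_M / GPT4O_PRICE_PER_M:.0f}x cheaper")  # 12x
```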
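And to illustrate point four, here is a minimal sketch of putting image and text embeddings into one searchable index. `embed_text` and `embed_image` stand in for whatever multimodal embedding model you use (the sketch assumes it maps both into the same vector space), and the search itself is plain cosine similarity in numpy.

```python
import numpy as np

# Placeholders for a multimodal embedding model that maps text and images
# into the same vector space; replace with a real model.
def embed_text(text: str) -> np.ndarray:
    raise NotImplementedError

def embed_image(path: str) -> np.ndarray:
    raise NotImplementedError

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, index: list[tuple[str, np.ndarray]], top_k: int = 3):
    """Rank indexed items (images or documents) against a text query."""
    q = embed_text(query)
    scored = [(name, cosine_similarity(q, vec)) for name, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Usage sketch: index a mix of images and text, then query with plain text.
# index = [("receipt.jpg", embed_image("receipt.jpg")),
#          ("policy.txt", embed_text(open("policy.txt").read()))]
# results = search("a scanned receipt from March", index)
```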

Reference: previous articles & simonwillison.net

