Generative AI in 2024 – A Preview
2023 was a blowout year for Generative AI systems. We saw huge advances in a range of systems – not only LLMs like ChatGPT, but also text-to-image models like DALL-E and Midjourney, text-to-speech and speech-to-text models, and even text-to-video models and tools that create 3D images from 2D ones. The future is wild, y'all.
So, naturally, this leads to the question of “what's next?”
The following is my best guess, based on my knowledge of the technology and observation of the ecosystem. My hit rate last year was pretty darn good – hopefully these predictions hold up as well. I look forward to your thoughts on it.
It may seem a bit tardy to be predicting the year in GenAI... in April. I'm a bit behind on a lot of posts at the moment, but figured better late than never. Hopefully, this means that the predictions are a bit better than they would be otherwise – and also more embarrassing if I'm off the mark. As Yogi Berra supposedly said, "Prediction is hard – especially about the future." :)
Better and More Efficient Models
First and foremost, we'll see more and better big commercial models, at prices at or below what we see today. The big players – OpenAI, Google, Anthropic, etc. – will continue to build bigger and more powerful models – more parameters, higher throughput, more functionality. We're already seeing this play out. Google's Gemini model recently premiered with a 1M-token context window, allowing users to upload and query multiple documents simultaneously. Anthropic's most recent Claude update already performs at or above the level of GPT-4, at prices below OpenAI's – on the order of a few dollars per million tokens (a token is roughly 0.5-1 words). Rumors already abound about OpenAI's next model, GPT-5. And that's all before April. It's going to be a fun year.
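To make that pricing concrete, here's a bit of back-of-envelope math. Both constants are illustrative assumptions on my part – not any provider's actual rate card:

```python
# Rough cost math for LLM API pricing. Both constants are illustrative
# assumptions, not any provider's actual rates.
WORDS_PER_TOKEN = 0.75           # tokens are roughly 0.5-1 words
PRICE_PER_MILLION_TOKENS = 3.00  # USD, "a few dollars per million tokens"

def estimate_cost(word_count: int) -> float:
    tokens = word_count / WORDS_PER_TOKEN
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# A 100,000-word book is ~133k tokens, or about $0.40 to process.
print(f"${estimate_cost(100_000):.2f}")
```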
We'll also likely see many more systems in the open-source space as well. The easy version of this prediction is: smarter models, approaching or surpassing GPT-4, that can run locally and fit on a single consumer-grade GPU (meaning 24GB of VRAM or less). However, I believe we can go further. The availability and ease of use of llama.cpp as an alternative to other deep learning frameworks (like PyTorch or TensorFlow) has been a major source of innovation in the community, bringing the efficiency and portability of C code to the world of LLMs. This means running generative AI systems at usable speed on just a CPU, not a GPU. Justine Tunney has also been doing miraculous work in the world of system-level optimizations – the llamafile project she's engineering at Mozilla lets you download a single file and run an LLM on any common computer. It doesn't matter what OS you run, what brand of CPU or GPU you have, or what languages or programs you have installed – it just runs, everywhere. Incredible work!
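If you want to try this yourself, here's a minimal sketch using the llama-cpp-python bindings to llama.cpp. The model path is a placeholder – you'd first download a quantized model in GGUF format:

```python
# Minimal local CPU inference via the llama-cpp-python bindings to llama.cpp.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder: any quantized GGUF model file
    n_ctx=4096,                 # context window size
    n_gpu_layers=0,             # 0 = pure CPU inference, no GPU required
)

output = llm("Q: What is llama.cpp? A:", max_tokens=128)
print(output["choices"][0]["text"])
```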
There are also many other avenues of development.
A major criticism of GenAI tools in recent months has been the sheer quantity of resources supposedly required for these models. Energy, water, chips – there is no shortage that these systems cannot be blamed for, evidence be damned. It's gotten to the point where Sam Altman, the head of OpenAI, is actively exploring investment in fusion technology, suggesting that energy is the major bottleneck to AI development going forward.
Of course, as with other resource systems (energy grids, water supplies, etc.), the biggest dividends usually come not from breakthrough new technologies, but from improved efficiency. So, along with advancements in task performance, I also expect this year will see major advances in the speed and efficiency of GenAI systems, especially LLMs.
Unsurprisingly, there have already been a number of great developments in this direction. Some have explored shrinking models, including Microsoft's 1.58-bit LLMs paper, along with studies showing that a huge portion of model parameters can be zeroed out with minimal performance impact. Google released a paper recently showing that some words are easier to predict than others – meaning that significant compute can be saved by simply focusing more effort on the hard words and less on the easy ones. These sorts of tricks accounted for major advances in basic computing efficiency at the chip level decades ago – I expect we'll see many more approaches before the year is out, potentially offering order-of-magnitude improvements in resource consumption in the near future.
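As a toy illustration of the "zero out parameters" idea, here's what simple magnitude pruning looks like. This is a generic sketch, not the method from any particular paper:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of a weight matrix."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # Find the k-th smallest absolute value; everything at or below it is dropped.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
pruned = magnitude_prune(w, sparsity=0.5)
print(f"{(pruned == 0).float().mean().item():.0%} of weights zeroed")
```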
Compositional LLMs
As people experimented with LLMs over 2023, a common theme kept arising – the more that the models were tuned, the dumber they became. GPT-4 became famous for half-assing programming tasks (framing out the effort, then suggesting the user fill in the details), avoiding details in knowledge tasks, and so on. Conversely, using open-source models that had not yet been aligned (meaning "tuned for human preferences") was eye-opening, both for the stark improvement in performance and for their readiness to create offensive content.
My theory is that we are simply asking too much of a single model. General purpose models have their place for general purpose tasks, in the same way that Wikipedia is a great starting place for learning about things, but rarely a final resource. Models trained on specific tasks, like programming or translation, tend to do much better at those tasks, but worse at others (like planning or question-answering).
Based on this, one promising direction for LLM development has been with Mixture-of-Expert models (or MoE for short) – rather than having a single super-smart expert on everything, designers blend together several smaller models, each expert in a specific area, to complete the user's task. These systems tend to require more memory overall, but surprisingly less than the sum of their parts would suggest, and their performance has been very promising so far.
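For intuition, here's a toy sketch of the routing idea at the heart of MoE: a learned router sends each token to its top-k experts and blends their outputs by router weight. Real systems (Mixtral, for instance) do this inside every transformer block; this standalone layer is just for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: a router picks top-k experts per token."""
    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                      # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)         # normalize the chosen scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE(dim=64)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Note that only k experts run per token – that's where the compute savings come from, while shared layers keep total memory below the sum of independent models.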
However, I don't think that goes quite far enough. In addition to answering a broad diversity of questions, we're also challenging these systems to detect harmful prompts, produce aligned (i.e. human-respecting) or structured outputs, and be less sensitive to specific prompt wording. Along with using multiple LLMs in parallel (as in the MoE approach), I expect we'll also start to see them used in series as well.
Imagine a multi-step LLM system, where a single prompt passes through multiple smaller LLMs in sequence, each accomplishing a specific task to serve the user (a minimal code sketch follows the list), such as:
- Security filtering (block harmful prompts)
- Prompt optimization (reduce sensitivity to specific prompt phrasing)
- Expert routing
- Expert solution
- Output alignment (keep it from saying something offensive)
- Structured formatting (output results as JSON, YAML, or other formats)
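Here's the minimal sketch promised above. Every stage function is a hypothetical placeholder – in a real system, each would wrap a call to a small, specialized model:

```python
import json
from typing import Callable

Stage = Callable[[str], str]

def security_filter(prompt: str) -> str:
    # Placeholder: a small classifier model would reject harmful prompts here.
    if "do something harmful" in prompt.lower():
        raise ValueError("prompt blocked by security filter")
    return prompt

def optimize_prompt(prompt: str) -> str:
    # Placeholder: a rewriter model would normalize phrasing here.
    return prompt.strip()

def route_and_solve(prompt: str) -> str:
    # Placeholder: a router would pick an expert model, which then answers.
    return f"[expert answer to: {prompt}]"

def format_as_json(answer: str) -> str:
    # Placeholder: a formatter (or a tool like Outlines) would enforce a schema.
    return json.dumps({"answer": answer})

def run_pipeline(prompt: str, stages: list[Stage]) -> str:
    # Pass the text through each stage in series.
    for stage in stages:
        prompt = stage(prompt)
    return prompt

print(run_pipeline("What is a token?",
                   [security_filter, optimize_prompt, route_and_solve, format_as_json]))
```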
There have been some developments in this regard, especially with respect to structured formatting (Anthropic is ahead of the game here, as are tools like Outlines), but the overall concept seems underexplored by the community. I expect to see more soon.
Agentic Systems
But let's take this a step further. One of the tantalizing dreams of LLMs has been the potential to extend them beyond chatbots, into fully agentic assistants. There are now a number of frameworks (most notably DSPy) that enable a single LLM to assume multiple personas and talk to other versions of itself. These are now being built into fully functional systems, such as OpenDevin, that can be given a goal and automatically complete the work required as a cohort of differently-prompted model instances, acting as a synthetic "team".
The best thing about this approach is that you don't need powerful models to make this happen. Rather than using one super-smart LLM as a jack-of-all-trades, you can use a virtual group of more-efficient AI models, where each member is more performant in their area.
This opens up a whole range of use cases and implementation approaches.
- For privacy, security, cost, or convenience reasons, you could run a model on your own local computer instead of relying on pre-defined or pre-tuned models hosted elsewhere.
- You're not locked into any one model. You could use one base model (but different prompts) for each agent, use different models for each agent, or some mix of the two. You're also not constrained by what's available today – as new models become available, simply swap them in as needed.
- This also potentially opens up use cases far beyond what could be achieved with a single-prompted LLM. In today's world, you're essentially finding one super-smart person, then asking them to consider a task from multiple points of view. You're carrying around the previous history of the conversation through a single personality lens, even if exploring different facets of a problem. In an agentic structure, you're starting from that same super-smart core person but, through unique prompts, essentially creating independent experts collaborating with each other. It enables a much broader range of exploration than a single LLM could handle on its own, staging out the work in different phases, and potentially using different LLMs for different parts of the work (a minimal sketch follows this list). It's a whole new ball game, and we're still in the first inning.
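Here's the minimal sketch promised above: two differently-prompted agents taking turns on a task. The generate function is a hypothetical stand-in for whatever backend you choose – a local llama.cpp model, a hosted API, or a different model per agent:

```python
def generate(system_prompt: str, conversation: list[str]) -> str:
    # Placeholder: swap in a real model call here (local llama.cpp,
    # a hosted API, or a different model per agent).
    return f"[{system_prompt.split('.')[0]}: reply to {len(conversation)} messages]"

def run_agents(task: str, rounds: int = 3) -> list[str]:
    # Two personas built from the same backend, differing only in their prompts.
    planner = "You are a planner. Break the task into concrete, ordered steps."
    critic = "You are a critic. Point out flaws and missing steps in the plan."
    transcript = [f"Task: {task}"]
    for _ in range(rounds):
        transcript.append(generate(planner, transcript))  # planner proposes
        transcript.append(generate(critic, transcript))   # critic reviews
    return transcript

for line in run_agents("Plan a small website launch"):
    print(line)
```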
Outside of LLMs
So far I've written primarily about LLM-based GenAI, and haven't touched much on other systems. This is in part because I know less about them, and I don't have as clear a vision for them as I do for LLMs. However, based on previous trends, I think we can safely expect them to continue to improve along multiple lines:
- Fidelity: higher accuracy and detail in their outputs, as well as larger outputs (longer songs, bigger pictures, etc.)
- Steering: staying closer to user expectations, and more consistent across a series of generations. (Think of generating a series of marketing pictures using a single consistent character, or an album of songs with consistent instrumentation and vocalists.)
- Cost: prices will hold steady or fall over time, where these tools aren't already free.
The one direction I'm not prepared to predict is the availability of open-source models. StabilityAI had been a major leader in this area across a range of models, but has recently fallen on difficult times. It's not clear that the OSS community has the same level of skill with non-text-to-text models as it does with LLMs, so I wouldn't be surprised to see open-source development of locally-deployable non-text-to-text models slow down.
The one wildcard here is multimodal models, which can operate with a wide range of input (and potentially output) modalities, such as text, images, video, and so on. I don't think these will fully supplant targeted models focusing on specific modalities, but they could be a great addition to the assortment.
Legal Resolution
A long-standing challenge with Generative AI is its troubled origin story – most major GenAI projects started with a scrape of the Internet, without asking for permission first. For publicly-owned or -committed data, like Wikipedia, that's not much of an issue. However, for other data – forum posts, artistic output, pirated media, etc. – whose creators have not granted explicit permission for such uses, it's been a grey area, a sticking point, and a source of both enmity and lawsuits that has cast a shadow over the whole undertaking.
It's clear that some of these discussions will reach a point of resolution this year. The New York Times lawsuit against OpenAI is clearly a gambit for a licensing arrangement, and both parties are motivated to resolve it quickly. The EU is moving forward with legislation to regulate AI model creation, including improved documentation of training sources and standards for release. US Representative Adam Schiff recently proposed legislation that would require disclosure of training data sources as well. This is something that AI researchers and companies should have been doing all along – now, it looks like the law is coming to force their hand.
What intrigues me is how this may impact GenAI perception and adoption going forward. Much of the reaction on social media has clung to this particular issue as a reason to scuttle the whole enterprise. But, as these tools become ever more present, easier to use, and proven in both commercial and personal terms, will people come to accept that these systems are actually beneficial if the whole process is consensual, or will the original sin tarnish these systems for a generation?
As with all other aspects of this ecosystem – only time will tell. :)
#llm #llms #GenerativeAI #GenAI #ChatGPT #OpenAI #Anthropic
Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.