The AI Revolution Is Yet To Come
“The future is already here – it's just not evenly distributed.” – William Gibson
It's been six months since OpenAI's ChatGPT system exploded onto the scene, enchanting the world with its incredible fluidity, range of knowledge, and potential for insight. The reactions have been predictably wide-ranging, from those sneering at the technology as a fancy parlor trick to those seeing it as the future of personal automation. I'm on the record as bullish on large language model (LLM) technology. However, given the frenetic pace of AI innovation and the proven profit motive for bringing it to business, I've been surprised at how slow this transition has been so far.
Let me explain.
A Road Taken
As an example of what could happen, consider text-to-image systems. You describe an image in words, and the AI generates pictures based on your description.
Throughout the 2010s, researchers hacked away at the problem, slowly growing the size of the images, the fidelity to the prompt, and the array of styles available. However, the results never went much beyond small, low-quality images.
Then, in the course of two years, models were released by OpenAI, then Midjourney, then StabilityAI, that created high-quality pictures at sizes that could actually be used commercially. Prompts went from a long, complicated paragraph to a simple sentence. Most recently, new tools allow you to take these techniques even further: making videos, using reference pictures, or guiding specific elements of the image. These systems are now standard plug-ins for Canva, Bing, Adobe products, and others. Once the underlying techniques were widely available, innovation exploded and business applications followed.
A Road To Be Explored
In the world of LLMs, there has been a low, steady drumbeat of progress. The transformer architecture that underpins today's models was published by Google in 2017. GitHub Copilot – a coding-focused LLM – became available as a subscription in June 2022. OpenAI released ChatGPT in November 2022, Meta released its open-source (but “research-only”) LLaMA model in February 2023, and OpenAI released GPT-4 (an even more advanced model) in March 2023. StabilityAI released a fully open-source, commercially-licensed model in April 2023, but it has a lot of room for improvement. There are a few other models available from Google, Anthropic, EleutherAI, and others, but they're relatively minor players in this field with decent, not great, models available via the Web.
Meanwhile, the hacker community has experimented extensively with different ways of using these tools, but no killer apps have emerged yet. And despite their enormous potential, LLMs have barely made a dent in the business world outside of coding and some basic writing applications.
There are a few reasons for this.
1) Most good models are proprietary, served by third parties, and can get expensive fast. Although providers say they don't keep your data, software companies have historically been less than honest here. You want to be very careful about putting confidential information into these systems – which significantly limits their utility to business.
2) The one good open-source model (LLaMA) has a very limited license, meaning that it's only useful for experimentation today – not commercial use.
3) Even the experiments based on LLaMA use bespoke frameworks and only run well on very specific hardware or operating systems (cough Mac cough) that are expensive or not widely used. Trying to port them outside of these constraints has so far yielded poor results. (Trust me – I've tried!)
(It should be noted that there are still some technical limitations, too. Training these models is not cheap (six to seven figures), so training is likely to be driven only by organizations with a profit motive. Some elements of the models – like the amount of text they can consider in one step – also have notable limits today, although techniques for overcoming these are being developed quickly.)
A Roadmap For The Future
Despite the lack of killer apps, people have been very clever at exploring a range of use cases for these systems. (Big shout out to Ethan Mollick at OneUsefulThing, who has started re-imagining both business and education through the use of LLMs.) Overall, they seem to be settling into three main application types:
1) Chatbots for interaction or text generation. Imagine using your LLM as a personalized tutor, a creative partner, or just a rubber ducky. Likewise, if you need to generate code, documentation, or rote marketing material, an LLM can take you quite far. The technology for this basically exists today, but the main problem to solve is democratization – enabling people to own their conversations and data by running the LLM on their local computer.
2) Data Oracles: LLMs trained on (or with access to) a wide variety of documents, which can then answer questions or summarize material. Imagine a law office using an LLM for discovery, or a scientist loading an LLM with the relevant papers on a subject and exploring the known and unknown. Along with privacy, this use case has a technical hurdle arising from how much data the LLM can keep “front of mind” – but multiple solutions are being actively explored.
3) Personal Assistants: agents with access to the Internet that can do work on our behalf, developing plans and executing them autonomously (or with minor oversight). Imagine JARVIS from Iron Man, who can be your travel agent, personal secretary, and project manager all in one. Today, the barriers here are both privacy and cost. Your personal secretary needs all of your passwords, plus your credit card number, and today every action it takes (big or small) costs $0.06. How far would you trust an automated system with this, and what would you let it do?
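The Data Oracle pattern above boils down to a retrieval step: score each document chunk against the question and hand only the best matches to the LLM. Here's a minimal sketch in plain Python using bag-of-words cosine similarity – the chunk list, the scoring approach, and the idea of prepending the winners to a prompt are all illustrative assumptions, not any product's actual pipeline (real systems typically use learned embeddings instead):

```python
import math
from collections import Counter

def tokenize(text):
    # Crude normalization: lowercase and strip trailing punctuation.
    return [w.lower().strip(".,?!") for w in text.split()]

def cosine(a, b):
    # Bag-of-words cosine similarity between two token lists.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def top_chunks(question, chunks, k=2):
    # Rank document chunks by similarity to the question; keep the top k.
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]

# Toy "document store" for illustration only.
docs = [
    "The contract renewal deadline is March 31.",
    "Lunch options near the office include tacos.",
    "Renewal terms require 60 days written notice.",
]

context = top_chunks("When is the contract renewal deadline?", docs)
print(context)  # The selected chunks would be prepended to the LLM prompt.
```

The retrieval step keeps the prompt small, which is exactly how these systems work around the “front of mind” limit: only the few chunks most relevant to the question ever reach the model.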
If these tools could be realized, the possibilities for their personal and commercial use are enormous. But, for businesses to adopt this technology, it must be private, affordable, and high-performing.
Based on this, what should the target for model builders be? Here are my thoughts:
- Can be run locally on either Windows desktop computers or Windows/Linux servers. Windows has 75% of market share for desktop and laptop computers, and 20% share for servers. (Linux has 80% of servers.) If businesses are to use it, it must be Windows-compatible.
- If it needs a GPU, it can use mid-to-high-end consumer GPUs (12GB-24GB of VRAM). 12GB GPUs are $400-500 today, and 24GB GPUs start at $1200. A big company could run a server farm, but a small company would likely aim for $3-5k in overall system cost, depending on its performance needs. That also puts it in the range of prosumer users.
- Can be accessed from outside programs (“server”-type construction vs “chat”-type construction). Chatbot architectures are great for today's common use cases, but Data Oracles and Personal Assistants will need to interface with outside systems to be useful. A chat interface just doesn't work for that.
- Can execute “fast enough” to meet user needs. Mac users are seeing full answers from LLaMA-based models in about 30 seconds, or roughly 10 words/sec. This (or perhaps down to 5 words/sec) seems to be the limit of utility for these systems – anything slower might as well be done another way. And that budget is per user – if a central LLM server serves a company of 10 users, it should generate results at a minimum of 50 words/sec.
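The throughput target in the last bullet is simple multiplication, but it's worth writing out, since it's the number a buyer would size hardware against. A quick sketch (the 5 words/sec floor and the 10-user company are the figures assumed above, not benchmarks):

```python
def required_throughput(users, words_per_sec_per_user=5):
    # Minimum aggregate generation rate for a shared LLM server,
    # assuming the worst case where every user is waiting on output at once.
    return users * words_per_sec_per_user

print(required_throughput(10))  # 10 users at the 5 words/sec floor -> 50
print(required_throughput(50))  # a 50-person office -> 250
```

Scaled up this way, even a modest office quickly outruns a single consumer GPU, which is why the per-user floor matters more than the single-user demo speed.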
LLMs have huge potential to transform the way we work with technology and each other. If we can cross the threshold of both easy deployment and easy use, the results will be incredible.
Tags: #ChatGPT #LLM #AI #ArtificialIntelligence
Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.