The AI Revolution Is At Our Doorstep
Back in April, I wrote a post called “The AI Revolution Is Yet To Come,” discussing the surprising lack of progress in LLM technology since OpenAI launched its ChatGPT product the previous November. With OpenAI DevDay featuring a number of new developments in OpenAI's offerings, this is a good time to reflect on the progress of LLMs, and of AI generally, since my last post.
2023 Has Seen Massive Progress in AI
In short, the AI world has come a long way since April. The biggest improvements in large language models (LLMs) have come from increasing the size of the context window – where previously, LLMs often started “forgetting” information after about 512 tokens (a token being roughly equivalent to a word), the floor for context windows is now in the thousands of tokens, with some new models even boasting the potential for effectively unlimited context. This makes use cases such as “chatting with documents” (also known as “retrieval-augmented generation,” or RAG – what I called “data oracles” in my previous post) a real possibility.
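To make the RAG idea concrete, here's a toy sketch of the retrieval step – plain word overlap stands in for the embedding-based similarity search a real system would use, and the documents are made up:

```python
# Minimal retrieval-augmented generation (RAG) sketch: pick the most
# relevant document chunk for a question, then stuff it into the prompt.
# A real system would use embeddings and a vector store; simple word
# overlap stands in for similarity scoring here.

def score(question: str, chunk: str) -> int:
    """Count question words that appear in the chunk (toy similarity)."""
    q_words = set(question.lower().split())
    return len(q_words & set(chunk.lower().split()))

def build_prompt(question: str, chunks: list[str]) -> str:
    """Retrieve the best-matching chunk and embed it in the prompt."""
    best = max(chunks, key=lambda c: score(question, c))
    return f"Answer using only this context:\n{best}\n\nQuestion: {question}"

chunks = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within 30 days with a receipt.",
]
prompt = build_prompt("How long does the warranty last?", chunks)
```

A production system would also chunk large documents and retrieve several candidates at once – but the shape is the same: find relevant text, then let the LLM answer with it in context.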
In the commercial world, OpenAI has continued to improve their models, releasing an array of newer versions with larger context windows, more recent information, and more capabilities. Their closest competitor today is Anthropic's Claude 2 system, which offers many similar features. Google has been weirdly absent from this space – despite having invented many of the techniques used in these systems, their offerings are considered marginal and underperforming. (I have to think Sundar Pichai's CEO days are numbered.)
In the open-source space, Meta released an updated version of their Llama model (creatively named “Llama 2”) that was finally licensed for commercial use – and which has since spawned a small Cambrian explosion of LLMs using it as a starting point.
A key distinction has emerged between models and weights. The model refers to the mathematical structure of the system – how it takes in data, what data gets multiplied by which variables at which points in the process, etc. The weights are the values of the variables within the model, and are determined by both the exact model and the data set used for training the system. This distinction is important because each new model requires its own implementation in the code libraries commonly used for running AI – and this implementation process both slows down adoption and is prone to error. Using new weights in an existing model, however, is virtually plug-and-play.
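A toy illustration of the model/weights split – a one-variable “model” standing in for billions of parameters:

```python
# The "model" is the fixed computation; the "weights" are the numbers
# plugged into it. Swapping weight sets reuses the same code unchanged,
# which is why fine-tunes of an existing architecture are nearly
# plug-and-play, while a brand-new architecture needs new code.

def tiny_model(x: float, weights: dict) -> float:
    """One-neuron 'architecture': the same structure for any weight set."""
    return weights["w"] * x + weights["b"]

base_weights = {"w": 2.0, "b": 1.0}        # e.g. the original release
finetuned_weights = {"w": 2.0, "b": -1.0}  # e.g. a specialized fine-tune

print(tiny_model(3.0, base_weights))       # 7.0
print(tiny_model(3.0, finetuned_weights))  # 5.0
```

Different weights, different behavior – but `tiny_model` itself never changed, which is the whole point.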
Because of this, a number of organizations have developed new weights for the Llama 2 model using different data sets, creating systems that specialize in specific activities (“writing code” being a popular one) or that perform better at more general ones. Most recently, models such as Mistral and Zephyr are competitive with GPT-3.5 across a variety of benchmarks.
The technology infrastructure for actually running LLMs locally has come a long way, as well. When new models and weights become available, running them directly usually requires substantial computing power – far outside the capacity of all but the biggest tech organizations. While a number of companies have popped up to offer GPU computing as a cloud service (for instance, Paperspace or Salad.com), a small ecosystem of alternative implementations (like ggml) and quantization systems (like GPTQ) has also grown up, allowing larger models to fit on smaller GPUs. The most incredible example I've seen so far has been the MLC system, which compiles models into formats targeted to specific hardware platforms. On a 12GB Nvidia GPU, I was able to serve Llama-based models at the speed of ChatGPT on my own computer. (The conversion process is a bear, but it only needs to happen once, and it's smooth like butter afterward.)
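To give a flavor of what quantization does – real systems like GPTQ and ggml's quant formats are far more sophisticated – here's a toy round-trip from floats to 4-bit integers and back:

```python
# Toy weight quantization: map 32-bit floats into a 4-bit integer range
# and back. The core trade-off is the same as in real systems: roughly
# 8x less memory per weight in exchange for a small precision loss.

def quantize(weights, bits=4):
    """Scale weights into the signed integer range for `bits` bits."""
    max_abs = max(abs(w) for w in weights)
    levels = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = max_abs / levels if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in q_weights]

weights = [0.12, -0.53, 0.94, -0.07]
q, scale = quantize(weights)
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

Each weight now fits in 4 bits plus one shared scale factor, and the reconstruction error stays below one quantization step – small enough, it turns out, that big models keep most of their quality.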
A number of frameworks have also been developed that enable software to leverage LLMs without committing to any particular LLM system. This frees users from reliance on any specific vendor, and allows them to quickly switch to better AIs as they become available. LangChain was an early framework on the scene, and many others have followed, including the independent LlamaIndex, Microsoft's AutoGen (which is more agent-focused), and Stanford's DSPy. As with any framework, they all have their strengths, weaknesses, and perspectives, and it's unclear which will see the greatest adoption going forward.
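The core idea behind all of these frameworks can be sketched in a few lines – program against an interface, not a vendor. The backend classes below are made-up stand-ins, not real SDK calls:

```python
# A hand-rolled version of what LLM frameworks provide: application code
# written against a small interface, with concrete backends swapped in.
# The two backends here are hypothetical placeholders for real clients.

from abc import ABC, abstractmethod

class LLMBackend(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class FakeOpenAIBackend(LLMBackend):
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeLocalLlamaBackend(LLMBackend):
    def complete(self, prompt: str) -> str:
        return f"[llama] {prompt}"

def summarize(text: str, llm: LLMBackend) -> str:
    """App logic depends only on the interface, never the vendor."""
    return llm.complete(f"Summarize: {text}")

# Switching vendors is a one-line change at the call site:
out = summarize("Q3 results", FakeLocalLlamaBackend())
```

Real frameworks add a lot on top of this (prompt templates, retrieval, agents, tracing), but vendor portability is the part that keeps you from being locked in.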
And this is just the text-based, LLM side of AI. In no particular order, there have also been some other remarkable upgrades in AI since April:
- OpenAI released DALL-E 3, an updated version of their DALL-E text-to-image system. Quality and consistency are significantly improved, hands are generally accurate, and it can get text right around 50% of the time. (With ChatGPT Plus, you get GPT-4 and unlimited DALL-E 3 generations for USD$20/month, making it one of the best deals on the planet today.)
- Stable Diffusion and Midjourney have also gotten incredible upgrades, making commercial application of text-to-image systems very feasible and affordable. Tools like ControlNet provide additional user control over elements within the final image – a great example being the capability to turn a QR code into a scene.
- OpenAI also implemented text-to-speech and speech-to-text within the ChatGPT system, and open-sourced their speech-to-text model, Whisper. It's astonishingly fast and accurate, able to transcribe hours of recordings in minutes. The ChatGPT mobile app is now effectively a voice chatbot.
- OpenAI also added image-to-text functionality to ChatGPT, allowing users to have images captioned directly from their phone apps.
- There have also been remarkable gains made in text-to-video technology, such as Runway, as well as image-to-3D model generation tools, although those are still a bit further from commercialization.
The Moat Is Implementation, Not Technology
While OpenAI announced some improvements in models at DevDay, its biggest announcements focused not on technology, but on user experience. All tools would be on by default. LLMs can now produce structured results that are easier to use programmatically. Customized models can now be built – and sold – directly through the ChatGPT website, allowing users to program their own personal LLMs with natural language.
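To see why structured results matter, compare parsing a JSON reply to scraping an answer out of prose. The reply string here is just a stand-in for what a structured-output model call might return:

```python
# With structured output, a model reply can be parsed directly into
# data your program can act on, instead of regex-scraped from prose.

import json

# Placeholder for a model reply requested in JSON form:
model_reply = '{"sentiment": "positive", "confidence": 0.92}'

result = json.loads(model_reply)  # raises ValueError if malformed
assert result["sentiment"] in {"positive", "negative", "neutral"}
```

The failure mode also improves: a malformed reply raises an error you can catch and retry, rather than silently producing a wrong scrape.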
Prices have also come down dramatically. GPT-4 Turbo now costs $0.01/1,000 tokens in and $0.03/1,000 tokens out. GPT-3.5-turbo-1106, a slightly older but still very capable model, now costs $0.001/1,000 tokens in and $0.002/1,000 tokens out. OpenAI has made the cost of using their models go from “expensive” to “middling-to-rounding-error” for most individual use cases.
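Some back-of-envelope math at those prices – the per-token rates below are just the ones quoted above:

```python
# Back-of-envelope cost check at the DevDay per-1,000-token prices.

PRICES = {  # USD per 1,000 tokens: (input, output)
    "gpt-4-turbo": (0.01, 0.03),
    "gpt-3.5-turbo-1106": (0.001, 0.002),
}

def cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    """Total cost of one request at the listed per-1,000-token rates."""
    p_in, p_out = PRICES[model]
    return (tokens_in / 1000) * p_in + (tokens_out / 1000) * p_out

# A fairly long exchange: 2,000 tokens in, 1,000 tokens out.
print(round(cost_usd("gpt-4-turbo", 2000, 1000), 6))        # 0.05
print(round(cost_usd("gpt-3.5-turbo-1106", 2000, 1000), 6)) # 0.004
```

Five cents for a long GPT-4 Turbo exchange, and well under a penny on GPT-3.5 – “rounding error” indeed for individual use.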
As a contrast, while there is now an abundance of free open-source models competitive with GPT-3.5, implementing them still carries the cost of the hardware (around $500 using a 12GB GPU, or $2,200 if using a 24GB GPU) plus a smart engineer to set up and maintain the system. OpenAI makes their system cheap and pay-as-you-go.
The other advantage OpenAI announced is their Copyright Shield service. There remains significant controversy around the use of freely-accessed text, images, and other data as training material for generative AI without consent, including multiple lawsuits seeking damages for exactly this. OpenAI's new Copyright Shield explicitly protects users from legal liability for copyright infringement arising from use of their services. Few open-source models, even if they're licensed for commercial use, can even speak to the copyright implications of their data sets, and they certainly aren't offering legal protection. It's a huge advantage in the commercial AI space.
AI Has A “White Man” Problem
Unfortunately, AI still has a major blind spot when it comes to use and implications: the primary developers of these tools are overwhelmingly white, male, or both. While the conversation around the future and moral use of AI is dominated by AI doomers, who argue that advanced AI itself may become an existential threat to humanity, less – much less – discussion is being given to the real harms it can cause to people now – especially non-males and people from disadvantaged backgrounds.*
For instance, male students in New Jersey recently circulated AI-generated nude photos of their female classmates without their consent, causing untold social harm both now and into the future. While DALL-E 3 puts in strong safeguards to prevent creating such images, tools such as Stable Diffusion do not (and cannot, given that they are fully open-source). AI image generation tools have also been actively adopted by the disinformation industry, with the latest example being the Israel-Hamas war. This is actively shaping public perceptions of a very challenging geopolitical situation, with lives hanging in the balance.
Data security with these systems is also a major concern. In my previous post, I noted that “You want to be very careful about putting confidential information into the systems – which significantly limits their utility to business.” To address this, OpenAI had explicitly stated that user data would not be used for system training. This has since been walked back – at DevDay, Sam Altman (CEO of OpenAI) said that “We do not train on your data from the API or ChatGPT Enterprise, ever,” a guarantee that notably excludes regular ChatGPT users. Free and paying users of ChatGPT can opt out of having their data used for training, but only at the cost of losing all of their chat history. The #enshittification of AI is already underway!
A strong argument for LLMs is also their potential to be used as agents, such as a virtual executive assistant that can automatically book meetings, answer emails, and make plans on your behalf. This requires a connection (and permission!) to some of the most sensitive data you possess, and authority to act on your behalf. OpenAI walking back their privacy guarantees reinforces that their privacy commitment is only as strong as their cybersecurity and their word. If you are a white male, a data breach would be bad. For anyone outside of that category, it could be life-threatening. The only true way to guarantee the privacy of your data or your company's data is still to keep it within your own network, on your own hardware.
The Road Ahead
In my previous post, I noted three main applications of LLMs: chatbots, data oracles, and assistants. In the last six months, advances in AI have moved all of them into the realm of real possibility, if not full commercial availability. This is huge progress, and worth celebrating.
I also noted a specific target for model builders going forward: systems that run on Windows, using price-accessible consumer GPUs, that can communicate through APIs fast enough to be useful. Thanks to systems like MLC, this is now also within reach of knowledgeable prosumers.
The main gap now is two-fold:
- First, the availability of off-the-shelf systems performant enough (large context windows, accurate and safe responses, etc.) to run on prosumer hardware, that are also accessible to non-techy business teams.
- Second, the availability of apps that can take advantage of these systems to deliver real business value.
OpenAI, as a tech incumbent (to the extent that a company with a mere one-year lead and a $10B war chest can be considered an incumbent), is intent on keeping these two things bundled within their platform as long as they possibly can.
Smart companies that want to serve businesses and consumers, though, will start tackling them as separate opportunities. As discussions of liability, privacy, and data security take greater precedence (and as OpenAI continues down its path of enshittification), smart companies will be well-positioned to step in and make AI tools useful and safe for everyone.
*To be clear, this is not an original observation, and many researchers have been sounding the alarm on this for years, including Timnit Gebru, Margaret Mitchell, Kate Crawford, Safiya Noble, and many, many others. I just include this here to reflect the state of the discussion I'm seeing in social media and other platforms as of today, in hopes that AI behavior and use will become safer and more equitable as time goes on.
#AI #artificialintelligence #generativeAI #LLM #LLMs #OpenAI
Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.