Pushing Electrons

AI

Back in April, I wrote a post called “The AI Revolution Is Yet To Come,” discussing the surprising lack of progress in LLM technology since OpenAI launched its ChatGPT product the previous November. On the occasion of OpenAI DevDay, featuring a number of new developments in OpenAI's offerings, this is a good time to reflect on the progress of LLM's, and AI generally, since my last post.

2023 Has Seen Massive Progress in AI

In short, the AI world has come a long way since April. The biggest improvements with large language models (LLM's) have come from increasing the size of the context window – where previously, LLM's often started “forgetting” information after about 512 tokens (where a token is roughly equivalent to a word), the floor for context windows is now in the thousands of tokens, with some new models even boasting the potential for infinite tokens. This makes use cases such as “chatting with documents” (also known as “retrieval augmented generation” or RAG – what I called “data oracles” in my previous post) a real possibility.
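To make the RAG idea concrete, here's a minimal sketch of the pattern: split your documents into chunks, retrieve the chunk most relevant to a question, and hand only that chunk to the LLM as context. TF-IDF (via scikit-learn) stands in for a real embedding model here, and `ask_llm` is a hypothetical call to whichever LLM you happen to be using.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Assumes scikit-learn; TF-IDF stands in for an embedding model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Q3 revenue grew 12% year over year, driven by the enterprise segment.",
    "The new privacy policy takes effect on January 1 and covers EU users.",
    "Server costs fell after moving batch workloads to spot instances.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    scores = cosine_similarity(
        vectorizer.transform([question]), vectorizer.transform(chunks)
    )[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

question = "When does the privacy policy start?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = ask_llm(prompt)  # hypothetical call to your LLM of choice
print(prompt)
```

The LLM only ever sees the retrieved context plus the question, which is how these systems work around the limits of the context window.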

In the commercial world, OpenAI has continued to improve their models, releasing an array of newer versions with larger context windows, more recent information, and more capabilities. Their closest competitor today is Anthropic's Claude 2 system, which offers many similar features. Google has been weirdly absent from this space – despite having invented many of the techniques used in these systems, their offerings are considered marginal and underperforming. (I have to think Sundar Pichai's CEO days are numbered.)

In the open-source space, Meta released an updated version of their open-source Llama model (creatively named “Llama 2”) that was finally licensed for commercial use – and which has since spawned a small Cambrian explosion of LLM's using it as a starting point.

A key distinction has emerged between models and weights. The model refers to the mathematical structure of the system – how it takes in data, what data gets multiplied by which variables at which points in the process, etc. The weights are the values of the variables within the model, and are determined by both the exact model and the data set used to train the system. This distinction is important because each new model requires its own implementation in the code libraries commonly used for running AI – and this implementation process both slows down adoption and is prone to error. Using new weights in an existing model, however, is virtually plug-and-play.
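To illustrate why that matters, here's a minimal sketch using the Hugging Face transformers library: because both checkpoints share the Llama 2 architecture, the exact same loading code handles either set of weights. The fine-tuned repository name below is a made-up placeholder, not a real model.

```python
# Sketch: swapping weights within one architecture (Hugging Face transformers).
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"            # Meta's original weights
FINE_TUNE = "example-org/llama-2-7b-coder"   # hypothetical code-specialized weights

# Same architecture, so the same code loads either checkpoint.
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(FINE_TUNE)

inputs = tokenizer("Write a function that reverses a string.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```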

Because of this, a number of organizations have developed new weights for the Llama 2 model using different data sets, creating systems that specialize in specific activities (“writing code” being a popular one) or that perform better at more general ones. Most recently, models such as Mistral and Zephyr are competitive with GPT-3.5 across a variety of benchmarks.

The technology infrastructure for actually running LLM's locally has come a long way as well. When new models and weights become available, running them directly usually requires substantial computing power – far outside the capacity of all but the biggest tech organizations. While a number of companies have popped up to offer GPU computing as a cloud service (for instance, Paperspace or Salad.com), a small ecosystem of alternative implementations (like ggml) and quantization systems (like GPTQ) has also grown up, allowing larger models to fit on smaller GPU's. The most incredible example I've seen so far has been the MLC system, which compiles models into formats targeted to specific hardware platforms. On a 12GB Nvidia GPU, I was able to serve Llama-based models at the speed of ChatGPT on my own computer. (The conversion process is a bear, but it only needs to happen once, and it's smooth like butter after the conversion is done.)
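To give a flavor of what quantization is doing under the hood, here's a self-contained sketch of simple symmetric round-to-nearest int8 quantization in NumPy. (GPTQ itself is far more sophisticated – it quantizes layer by layer while compensating for the error it introduces – but the basic trade of precision for memory is the same.)

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric round-to-nearest quantization of a weight tensor to int8."""
    scale = np.abs(weights).max() / 127.0               # one scale per tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)      # one layer's weights
q, scale = quantize_int8(w)

print(f"fp32: {w.nbytes / 1e6:.0f} MB, int8: {q.nbytes / 1e6:.0f} MB")   # ~4x smaller
print(f"mean abs error: {np.abs(w - dequantize(q, scale)).mean():.5f}")
```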

A number of frameworks have also been developed that enable software to leverage LLM's without committing to any particular LLM system. This frees the user from being reliant on any specific vendor, and allows them to quickly switch to better AI's as they become available. Langchain was an early framework on the scene, and many others have followed, including the independent LlamaIndex, Microsoft's Autogen (which is more agent-focused), and Stanford's DSPy. As with any framework, they all have their strengths, weaknesses, and perspectives, and it's unclear which will see the greatest adoption going forward.
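The core idea behind all of these frameworks can be sketched in a few lines: application code targets a small, vendor-neutral interface, and the backing LLM becomes a swappable detail. This is my own illustrative sketch, not any particular framework's API; the local server URL is a made-up placeholder.

```python
from typing import Protocol

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIBackend:
    """Hosted model via the OpenAI API (assumes the `openai` v1 Python package)."""
    def __init__(self, model: str = "gpt-3.5-turbo"):
        from openai import OpenAI
        self.client, self.model = OpenAI(), model

    def complete(self, prompt: str) -> str:
        resp = self.client.chat.completions.create(
            model=self.model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

class LocalBackend:
    """Locally served model behind a hypothetical HTTP endpoint."""
    def __init__(self, url: str = "http://localhost:8000/generate"):
        self.url = url

    def complete(self, prompt: str) -> str:
        import requests
        return requests.post(self.url, json={"prompt": prompt}).json()["text"]

def summarize(llm: LLM, document: str) -> str:
    # Application logic is written once, against the protocol, not a vendor.
    return llm.complete(f"Summarize in two sentences:\n\n{document}")
```

Switching from a hosted model to a local one then means changing one constructor call, not rewriting the application.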

And this is just the text-based, LLM AI technology. In no particular order, there have also been some other remarkable upgrades in AI since April:

  • OpenAI released DALL-E 3, an updated version of their DALL-E text-to-image system. Quality and consistency is significantly improved, hands are generally accurate, and it can get text right around 50% of the time. (With ChatGPT Plus, you get GPT-4 and unlimited DALL-E 3 generations for USD$20/month, making it one of the best deals on the planet today).
  • StableDiffusion and Midjourney have also gotten incredible upgrades, making commercial application of text-to-image systems very feasible and affordable. Tools like ControlNet provide additional user control over elements within the final image – a great example being the capability to turn a QR code into a scene.
  • OpenAI also implemented text-to-speech and speech-to-text within the ChatGPT system, as well as open-sourced their speech-to-text system, Whisper. It's astonishingly fast and accurate, able to transcribe hours of recordings in minutes. The ChatGPT mobile app is now effectively a voice chatbot.
  • OpenAI also added image-to-text functionality to ChatGPT, allowing users to have images and videos captioned directly from their phone apps.
  • There have also been remarkable gains made in text-to-video technology, such as Runway, as well as image-to-3D model generation tools, although those are still a bit further from commercialization.

The Moat Is Implementation, Not Technology

While OpenAI announced some improvements in models at DevDay, its biggest announcements focused not on technology but on user experience. All tools would be on by default. LLM's can now produce structured results that are easier to use programmatically. Customized models can now be built – and sold – directly through the ChatGPT website, allowing users to program their own personal LLM's with natural language.
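As an example of the structured-results feature, here's a rough sketch of JSON mode using the OpenAI Python client as it stood around DevDay (the model name and the requested keys are just illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={"type": "json_object"},  # ask for well-formed JSON back
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'title' and 'tags'."},
        {"role": "user", "content": "Summarize the OpenAI DevDay announcements."},
    ],
)
print(resp.choices[0].message.content)  # e.g. {"title": "...", "tags": ["..."]}
```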

Prices have also come down dramatically. GPT-4 Turbo now costs $.01/1000 tokens in, $.03/1000 tokens out. GPT-3.5-turbo-1106, an updated version of the older but still very capable GPT-3.5 model, now costs $.001/1000 tokens in, $.002/1000 tokens out. OpenAI has made the cost of using their models go from “expensive” to “middling-to-rounding-error” for most individual use cases.
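To put those prices in perspective, here's a quick back-of-the-envelope calculation for a hypothetical request with a 3,000-token prompt and a 500-token response:

```python
def cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Cost of one request, with prices in dollars per 1,000 tokens."""
    return tokens_in / 1000 * price_in + tokens_out / 1000 * price_out

# 3,000 tokens in, 500 tokens out:
print(f"GPT-4 Turbo:        ${cost(3000, 500, 0.01, 0.03):.4f}")   # $0.0450
print(f"GPT-3.5-turbo-1106: ${cost(3000, 500, 0.001, 0.002):.4f}") # $0.0040
```

At those rates, a few hundred GPT-3.5 requests of that size cost about a dollar.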

By contrast, while there is now an abundance of free open-source models competitive with GPT-3.5, implementing them still costs the price of the hardware (around $500 using a 12GB GPU, or $2200 if using a 24GB GPU) plus a smart engineer to set up and maintain the system. OpenAI makes their system cheap and pay-as-you-go.

The other advantage OpenAI announced is their Copyright Shield service. There remains significant controversy around the use of freely-accessed text, images, and other data as training material for generative AI without consent, including multiple lawsuits seeking damages for exactly this. OpenAI's new Copyright Shield explicitly protects users from legal liability from copyright infringement from using their services. Few open-source models, even if they're licensed for commercial use, can even speak to the copyright implications of their data set, and certainly aren't offering legal protection. It's a huge advantage in the commercial AI space.

AI Has A “White Man” Problem

Unfortunately, AI still has a major blind spot when it comes to use and implications: the primary developers of these tools are overwhelmingly white, male, or both. While the conversation around the future and moral use of AI is dominated by AI doomers, who argue that advanced AI itself may become an existential threat to humanity, less – much less – discussion is being given to the real harms it can cause to people now – especially non-males and people from disadvantaged backgrounds.*

For instance, male students in New Jersey recently circulated AI-generated nude photos of their female classmates without their consent, causing untold social harm both now and into the future. While DALL-E 3 has strong safeguards to prevent creating such images, tools such as StableDiffusion do not (and cannot, given that they are fully open-source). AI image generation tools have also been actively adopted by the disinformation industry, with the latest example being the war between Israel and Hamas. This is actively shaping public perceptions of a very challenging geopolitical situation, with lives hanging in the balance.

Data security with these systems is also a major concern. In my previous post, I noted that “You want to be very careful about putting confidential information into the systems – which significantly limits their utility to business.” To address this, OpenAI had explicitly stated that user data would not be used for system training. This has since been walked back – at DevDay, Sam Altman (CEO of OpenAI) said that “We do not train on your data from the API or ChatGPT Enterprise, ever.” Free and paying users of ChatGPT can opt out of having their data used for training, but only at the cost of losing all of their chat history. The #enshittification of AI is already underway!

Another strong argument for LLM's is their potential to be used as agents, such as a virtual executive assistant that can automatically book meetings, answer emails, and make plans on your behalf. This requires a connection (and permission!) to some of the most sensitive data you possess, and authority to act on your behalf. OpenAI walking back their privacy guarantees reinforces that their privacy commitment is only as strong as their cybersecurity and their word. If you are a white male, a data breach would be bad. For anyone outside of that category, it could be life-threatening. The only true way to guarantee the privacy of your data or your company's data is still to keep it within your own network, on your own hardware.

The Road Ahead

In my previous post, I noted three main applications of LLM's: chatbots, data oracles, and assistants. In the last six months, advances in AI have moved all of them into the realm of real possibility, if not full commercial availability. This is huge progress, and worth celebrating.

I also noted a specific target for model builders going forward: systems that run on Windows, using price-accessible consumer GPU's, that can communicate through API's fast enough to be useful. Thanks to systems like MLC, this is now also within reach of knowledgeable prosumers.

The main gap now is two-fold:

  • First, the availability of off-the-shelf systems that are performant enough (large context windows, accurate and safe responses, etc.) to run on prosumer hardware and that are also accessible to non-techy business teams.
  • Second, the availability of apps that can take advantage of these systems to deliver real business value.

OpenAI, as a tech incumbent (to the extent that a company with a mere one-year lead and a $10B war chest can be considered an incumbent), is intent on keeping these two things bundled within their platform as long as they possibly can.

Smart companies that want to serve businesses and consumers, though, will start tackling them as separate opportunities. As the discussion of liability, privacy, and data security starts to take greater precedence (and as OpenAI continues down its path of enshittification), smart companies will be well-positioned to step in and make AI tools useful and safe for everyone.

*To be clear, this is not an original observation, and many researchers have been sounding the alarm on this for years, including Timnit Gebru, Margaret Mitchell, Kate Crawford, Safiya Noble, and many, many others. I just include this here to reflect the state of the discussion I'm seeing in social media and other platforms as of today, in hopes that AI behavior and use will become safer and more equitable as time goes on.

#AI #artificialintelligence #generativeAI #LLM #LLMs #OpenAI

Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.

“The future is already here – it's just not evenly distributed.” -William Gibson

It's been six months since OpenAI's ChatGPT system exploded onto the scene, enchanting the world with its incredible fluidity, range of knowledge, and potential for insight. The reactions have been predictably wide-ranging, from those sneering at the technology as a fancy parlor trick to those seeing it as the future of personal automation. I'm on the record as bullish on large language model (LLM) technology. However, given the frenetic pace of AI innovation nowadays and the proven profit motive for bringing it to business, I've been surprised at how slow this transition has been so far.

Let me explain.

A Road Taken

As an example of what could happen, consider text-to-image systems. You describe an image in words, and the AI generates pictures based on your description.

Throughout the 2010's, researchers hacked away at the problem, slowly growing the size of the pictures, the fidelity to the prompt, and the array of styles available. Still, the results never went much beyond small, low-quality images.

Then, in the course of two years, models were released by OpenAI, then Midjourney, then StabilityAI, that created high-quality pictures of a size that could actually be used commercially. Prompts went from a long, complicated paragraph to a simple sentence. Most recently, new tools allow you to take these techniques even further, making videos, using reference pictures, or guiding specific elements of the image in certain ways. These systems are now standard plug-ins for Canva, Bing, Adobe products, and others. Once the underlying techniques were widely available, innovation exploded and business applications followed.

A Road To Be Explored

In the world of LLM's, there has been a low, steady drumbeat of progress. The standard architecture used today was published by Google in 2017. Github Copilot – a coding-focused LLM – became available as a subscription in June 2022. OpenAI released ChatGPT in November 2022, Meta released their open-source (but “research-only”) LLAMA model in February 2023, and OpenAI released GPT-4 (an even more advanced model) in March 2023. StabilityAI released a fully open-source, commercially-licensed model in April 2023, but it has a lot of room for improvement. There are a few other models available from Google, Anthropic, EleutherAI, etc., but they're relatively minor players in this field with decent, not great, models available via the Web.

Meanwhile, the hacker community has experimented extensively with different ways of using these tools, but no killer apps have evolved yet. And despite their enormous potential, LLM's have barely made a dent in the business world outside of coding and some basic writing applications.

There are a few reasons for this.

1) Most good models are proprietary, served by 3rd parties, and can get expensive fast. Although they say they don't keep your data, software companies have historically been less-than-honest here. You want to be very careful about putting confidential information into the systems – which significantly limits their utility to business.

2) The one good open-source model (LLAMA) has a very limited license, meaning that it's only useful for experimentation today – not commercial use.

3) Even the experiments based on LLAMA use bespoke frameworks and only run well on very specific hardware or operating systems (cough Mac cough) that are expensive or not widely-used. Trying to port them outside of these constraints has so far yielded poor results. (Trust me – I've tried!)

(It should be noted that there are still some technical limitations, too. Training these models is not cheap (six to seven figures), so training is likely to be driven only by organizations with some profit motive. Some elements of the model – like the amount of text it can consider in one step – also have notable limits today, although techniques for overcoming these are being developed quickly.)

A Roadmap For The Future

Despite the lack of killer apps, people have been very clever at exploring a range of use cases for these systems. (Big shout out to Ethan Mollick at OneUsefulThing, who has started re-imagining both business and education through the use of LLM's.) Overall, they seem to be settling into three main application types:

1) Chatbots for interaction or text generation. Imagine using your LLM as a personalized tutor, a creative partner, or just a rubber ducky. Likewise, if you need to generate code, documentation, or rote marketing material, an LLM can take you quite far. The technology for this basically exists today, but the main problem to solve is democratization – enabling people to own their conversations and data by running the LLM on their local computer.

2) Data Oracles: LLM's trained on (or with access to) a wide variety of documents which can then answer questions or summarize material. Imagine a law office using an LLM for discovery, or a scientist loading an LLM with relevant papers on a subject and exploring the known and unknown. Along with privacy, this use case has a technical hurdle arising from how much data the LLM can keep “front of mind” – but there are multiple solutions being actively explored.

3) Personal Assistants: agents with access to the Internet who can do work on our behalf, developing plans and executing them autonomously (or with minor oversight). Imagine JARVIS from Iron Man, who can be your travel agent, personal secretary, and project manager all in one. Today, the barrier to this mode is both privacy and cost. Your personal secretary needs all of your passwords, plus your credit card number, and, today, every action they take (big or small) costs $.06. How far would you trust an automated system with this, and what would you let them do?

If these tools could be realized, the possibilities for their personal and commercial use are enormous. But, for businesses to adopt this technology, it must be private, affordable, and high-performing.

Based on this, what should the target for model builders be? Here are my thoughts:

  • Can be run locally on either Windows desktop computers or Windows/Linux servers. Windows has 75% of market share for desktop and laptop computers, and 20% share for servers. (Linux has 80% of servers.) If businesses are to use it, it must be Windows-compatible.
  • If it needs a GPU, it can use mid-to-high-end consumer GPU's (12GB-24GB of memory). 12GB GPU's are $400-500 today, and 24GB GPU's start at $1200. A big company could run a server farm, but a small company would likely aim for $3-5k in overall system cost, depending on its performance and needs. That also puts it in the range of pro-sumer users.
  • Can be accessed from outside programs (“server”-type construction vs “chat”-type construction). Chatbot architectures are great for today's common use cases, but Data Oracles and Personal Assistants will need to interface with outside systems to be useful. A chat interface just doesn't work for that.
  • Can execute “fast enough” to meet user needs. Mac users are seeing full answers from LLAMA-based models in about 30 seconds, or roughly 10 words/sec. This (or perhaps down to 5 words/sec) seems to be the limit of utility for these systems – anything slower might as well be done another way. And that would be per user – if a central LLM server is being used for a company of 10 users, it should generate results at a minimum of 50 words/sec.

LLM's have huge potential to transform the way we work with technology and each other. If we can cross the threshold of both easy deployment and easy use – the results will be incredible.

Tags: #ChatGPT #LLM #AI #ArtificialIntelligence

Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.

Following the enthusiastic reception of OpenAI's public ChatGPT release last year, there is now a gold rush by Big Tech to capitalize on its capabilities. Microsoft recently announced the integration of ChatGPT with its Bing search engine. Google – who have published numerous papers on this tech previously, but never made the systems public – announced the introduction of their Bard system into Google Search. And, naturally, there's a host of start-ups building their own versions of ChatGPT, or leveraging integration with OpenAI's version to power a variety of activities.

What's fascinating about this explosion of applications is that the underlying tech – LLM's, or large language models – is conceptually simple. These systems take a block of input text (broken into “tokens”), run it through a neural network, and output the text that (according to the system) “best matches” what should come next. It's a language pattern-matcher. That's it.
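A toy sketch of that loop makes the point: the “model” below is just a lookup table of canned probabilities, but the generation procedure – pick a likely next token, append it, repeat – is the same one real systems use, with a neural network standing in for the table and a vocabulary of tens of thousands of tokens.

```python
import random

# Stand-in for the neural network: given the tokens so far, return a
# probability for each candidate next token.
def next_token_probs(context: list[str]) -> dict[str, float]:
    table = {
        ("the",): {"cat": 0.6, "dog": 0.4},
        ("the", "cat"): {"sat": 0.7, "ran": 0.3},
        ("the", "cat", "sat"): {".": 1.0},
    }
    return table.get(tuple(context), {".": 1.0})

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(token)
        if token == ".":   # stop token
            break
    return tokens

print(" ".join(generate(["the"])))  # e.g. "the cat sat ."
```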

And yet, its capabilities are surprisingly powerful. It can compose poems in a variety of styles over a range of subjects. It can summarize. It can assume personas. It can write computer code and (bad) jokes. It can offer advice. And the responses are tight, well-composed English. It's like chatting with another person.

Except, when it's not. As many have noted, it often returns incorrect, if confident, answers. It makes up data and citations. Its code is often buggy or just flat-out wrong. It has a gift for creating responses that sound correct, regardless of the actual truth.

It's language without intelligence.

Let's sit with that for a minute. Engineers have created a machine that can manipulate language with a gift rivalling great poets, and yet often fails simple math problems.

The implications of this are fascinating.

First, from a cognitive science perspective, it suggests that language skill and intelligence – definitely in a machine, possibly in humans, maybe as a general rule – are two completely separate things. Someone compared ChatGPT to “a confident white man” – which a) oof and b) may be more accurate than they realized. In an environment where performance is measured by verbal fluidity or writing skill, but not actual knowledge, ChatGPT would absolutely excel. There are many jobs in the world that fit this description (and unsurprisingly, they seem to be dominated by white men!) For these sorts of activities, an agent – human or machine – doesn't have to be good at any particular thing except for convincing others it is smart through verbal acuity and vague allusions to data, either actual or imagined. (Give it an opinion column in the New York Times!)

Second, technologically, it immediately suggests both the utility and the limits of the system. Need to write an email, an essay, a poem – any product that primarily requires high language skill? ChatGPT and its successors can now do that with ease. If the ultimate outcome of the activity is influencing a human's opinion (a teacher, a client, a loved one), you're all set. However, if you require a result that is actually right and factual, it requires human intervention. ChatGPT has the human gift for reverse-engineering justifications for its actions, no matter how outlandish, and so there's no circumstance where you should trust it, on its own, to do or say the right thing. A person's judgment is still required.

You might ask “how useful is its output if you still have to revise it?” To which you might also ask “what value is a writer to an editor?” You don't hammer with a chainsaw – not every tool needs to fit every purpose. But, if you need to quickly generate readable text with a certain style about a certain subject, it offers a great starting point with minimal labor. For knowledge workers, that offers incredible potential for time savings.

Finally, these systems do suggest a path toward artificial general intelligence. These models essentially solve the “challenge” of language, but lack both 1) real, truthful information, as well as 2) the ability to sort and assemble that information into knowledge. The first of those is easily answered – hook it up to the Internet, or to books or your email account, or any other source of meaningful reference data. Part of ChatGPT's limitations comes from the fact that it is deliberately not connected to the Internet, both constraining it and (at this stage) enhancing its safety.

And, as for the ability to manipulate knowledge – that is underway, with some working proofs-of-concept already developed. If engineers can develop a reasoning system to complement LLM's – enabling them to decompose questions into a connected set of simpler knowledge searches, and perhaps with the tools to integrate that data in various ways – these systems have the potential to facilitate a wide range of knowledge-based activities.

(In fact, some of the earliest AI systems were reasoning machines of exactly this genre, but based on discrete symbols instead of language. LLM's offer the potential to advance these systems by interpreting language-based information that's less clear-cut than mathematical symbols.)

Along with the technical aspects, we must also ask: what does this mean for society? From a business perspective, likely the same as what happens with all automation – the worst gets automated, the best gets accelerated, and humanity's relationship with production changes. Writers of low-quality or formulaic content may be out of a job. Better writers will no longer have to start from a blank page. The best writing will still be manual, bespoke, and rare. The tone of writing across all media will be homogenized, with the quality floor set to “confident white man” (potentially offering benefits toward diversity and inclusion). The quality of all professional communications will improve as LLM's are integrated into Word, Powerpoint, Outlook, and similar communication software. Knowledge management (think wiki's, CRM's, project management tools) becomes much faster and easier as it becomes more automated. Software comments will be automatically generated, letting programmers focus on system development. Sales becomes more effective as follow-ups become automated and messages are tailored to the customer. And that's just the beginning.

From a social standpoint, the outlook is more complex. Personalizing content becomes dramatically easier – one could imagine a system where the author just releases prompts for interaction, and an LLM interprets it uniquely for each reader in the way the reader finds most engaging. Video games, especially narrative video games, become deeper and richer. Social media may have more posts but be less interesting. Misinformation production becomes accelerated, and likely becomes more effective as the feedback cycle also accelerates. These new systems magnify many of society's existing challenges, while also opening up exciting new modes of interaction.

This has been a long time coming in the artificial intelligence community. After years of limited results, the availability of Big Computing has enabled revolutions in image processing, art creation – and now, language-based tasks. These are exciting times, with many more developments assuredly coming soon.

Tags: #ChatGPT #LLM #AI #ArtificialIntelligence

Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.