Generative AI in 2024 – A Preview
2023 was a blowout year for Generative AI systems. We saw huge advances in a range of systems – not only LLMs like ChatGPT, but also text-to-image models like DALL-E and Midjourney, text-to-speech and speech-to-text models, and even text-to-video models and tools that create 3D images from 2D ones. The future is wild, y'all.
So, naturally, this leads to the question of “what's next?”
The following is my best guess, based on my knowledge of the technology and observation of the ecosystem. My hit rate last year was pretty darn good – hopefully these predictions hold up as well. I look forward to your thoughts on it.
It may seem a bit tardy to be predicting the year in GenAI... in April. I'm a bit behind on a lot of posts at the moment, but figured better late than never. Hopefully, this means that the predictions are a bit better than they would be otherwise – and also more embarrassing if I'm off the mark. As Yogi Berra supposedly said, "Prediction is hard – especially about the future." :)
Better and More Efficient Models
First and foremost, we'll see more and better big commercial models, at prices at or below what we see today. The big players – OpenAI, Google, Anthropic, etc. – will continue to build bigger and more powerful models – more parameters, higher throughput, more functionality. We're already seeing this play out. Google's Gemini model recently premiered with a 1M-token context window, allowing users to upload and query multiple documents simultaneously. Anthropic's most recent Claude update already performs at or above the level of GPT-4, at prices below OpenAI's – on the order of a few dollars per million tokens (a token is roughly 0.5-1 words). Rumors already abound about OpenAI's next model, GPT-5. And that's all before April. It's going to be a fun year.
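To make that pricing concrete, here's a bit of back-of-envelope math. Both constants are illustrative assumptions on my part – not any provider's actual rate card:

```python
# Rough cost math for LLM API pricing. Both constants are illustrative
# assumptions, not any provider's actual rates.
WORDS_PER_TOKEN = 0.75           # tokens are roughly 0.5-1 words
PRICE_PER_MILLION_TOKENS = 3.00  # USD, "a few dollars per million tokens"

def estimate_cost(word_count: int) -> float:
    tokens = word_count / WORDS_PER_TOKEN
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# A 100,000-word book is ~133k tokens, or about $0.40 to process.
print(f"${estimate_cost(100_000):.2f}")
```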
We'll also likely see many more systems in the open-source space as well. The easy version of this prediction is: smarter models, approaching or surpassing GPT-4, that can run locally and fit on a single consumer-grade GPU (meaning 24GB of VRAM or less). However, I believe we can go further. The availability and ease of use of llama.cpp as an alternative to other deep learning frameworks (like PyTorch or TensorFlow) has been a major source of innovation in the community, bringing the efficiency and portability of C code to the world of LLMs. This means running generative AI systems at usable speed on just a CPU, not a GPU. Justine Tunney has also been doing miraculous work in the world of system-level optimizations – the llamafile project she's engineering at Mozilla lets you download a single file and run an LLM on any common computer. It doesn't matter what OS you run, what brand of CPU or GPU you have, or what languages or programs you have installed – it just runs, everywhere. Incredible work!
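If you want to try this yourself, here's a minimal sketch using the llama-cpp-python bindings to llama.cpp. The model path is a placeholder – you'd first download a quantized model in GGUF format:

```python
# Minimal local CPU inference via the llama-cpp-python bindings to llama.cpp.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder: any quantized GGUF model file
    n_ctx=4096,                 # context window size
    n_gpu_layers=0,             # 0 = pure CPU inference, no GPU required
)

output = llm("Q: What is llama.cpp? A:", max_tokens=128)
print(output["choices"][0]["text"])
```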
There are also many other avenues of development.
A major criticism of GenAI tools in recent months has been the sheer quantity of resources supposedly required for these models. Energy, water, chips – there is no shortage that these systems cannot be blamed for, evidence be damned. It's gotten to the point where Sam Altman, the head of OpenAI, is actively exploring investment in fusion technology, suggesting that energy is the major bottleneck to AI development going forward.
Of course, as with other resource systems (energy grids, water supplies, etc.), the biggest dividends usually come not from breakthrough new technologies, but from improved efficiency. So, along with advancements in task performance, I also expect this year will see major advances in the speed and efficiency of GenAI systems, especially LLMs.
Unsurprisingly, there have already been a number of great developments in this direction. Some have explored shrinking models, including Microsoft's 1.58-bit LLMs paper, along with studies showing that a huge portion of model parameters can be zeroed out with minimal performance impact. Google released a paper recently showing that some words are easier to predict than others – meaning that significant compute can be saved by simply focusing more effort on the hard words and less on the easy ones. These sorts of tricks accounted for major advances in basic computing efficiency at the chip level decades ago – I expect we'll see many more approaches before the year is out, potentially offering order-of-magnitude improvements in resource consumption in the near future.
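As a toy illustration of the "zero out parameters" idea, here's what simple magnitude pruning looks like. This is a generic sketch, not the method from any particular paper:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of a weight matrix."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # Find the k-th smallest absolute value; everything at or below it is dropped.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
pruned = magnitude_prune(w, sparsity=0.5)
print(f"{(pruned == 0).float().mean().item():.0%} of weights zeroed")
```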
Compositional LLMs
As people experimented with LLMs over 2023, a common theme kept arising – the more that the models were tuned, the dumber they became. GPT-4 became famous for half-assing programming tasks (framing out the effort, then suggesting the user fill in the details), avoiding details in knowledge tasks, and so on. Conversely, using open-source models that had not yet been aligned (meaning "tuned for human preferences") was eye-opening, both for the stark improvement in performance and for their readiness to create offensive content.
My theory is that we are simply asking too much of a single model. General purpose models have their place for general purpose tasks, in the same way that Wikipedia is a great starting place for learning about things, but rarely a final resource. Models trained on specific tasks, like programming or translation, tend to do much better at those tasks, but worse at others (like planning or question-answering).
Based on this, one promising direction for LLM development has been with Mixture-of-Expert models (or MoE for short) – rather than having a single super-smart expert on everything, designers blend together several smaller models, each expert in a specific area, to complete the user's task. These systems tend to require more memory overall, but surprisingly less than the sum of their parts would suggest, and their performance has been very promising so far.
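For intuition, here's a toy sketch of the routing idea at the heart of MoE: a learned router sends each token to its top-k experts and blends their outputs by router weight. Real systems (Mixtral, for instance) do this inside every transformer block; this standalone layer is just for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: a router picks top-k experts per token."""
    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                      # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)         # normalize the chosen scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE(dim=64)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Note that only k experts run per token – that's where the compute savings come from, while shared layers keep total memory below the sum of independent models.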
However, I don't think that goes quite far enough. In addition to answering a broad diversity of questions, we're also challenging these systems to detect harmful prompts, produce aligned (i.e. human-respecting) or structured outputs, and be less sensitive to specific prompt wording. Along with using multiple LLMs in parallel (as in the MoE approach), I expect we'll also start to see them used in series as well.
Imagine a multi-step LLM system, where a single prompt passes through multiple smaller LLMs in sequence, each accomplishing a specific task to serve the user (a minimal code sketch follows the list), such as:
- Security filtering (block harmful prompts)
- Prompt optimization (reduce sensitivity to specific prompt phrasing)
- Expert routing
- Expert solution
- Output alignment (keep it from saying something offensive)
- Structured formatting (output results as JSON, YAML, or other formats)
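Here's the minimal sketch promised above. Every stage function is a hypothetical placeholder – in a real system, each would wrap a call to a small, specialized model:

```python
import json
from typing import Callable

Stage = Callable[[str], str]

def security_filter(prompt: str) -> str:
    # Placeholder: a small classifier model would reject harmful prompts here.
    if "do something harmful" in prompt.lower():
        raise ValueError("prompt blocked by security filter")
    return prompt

def optimize_prompt(prompt: str) -> str:
    # Placeholder: a rewriter model would normalize phrasing here.
    return prompt.strip()

def route_and_solve(prompt: str) -> str:
    # Placeholder: a router would pick an expert model, which then answers.
    return f"[expert answer to: {prompt}]"

def format_as_json(answer: str) -> str:
    # Placeholder: a formatter (or a tool like Outlines) would enforce a schema.
    return json.dumps({"answer": answer})

def run_pipeline(prompt: str, stages: list[Stage]) -> str:
    # Pass the text through each stage in series.
    for stage in stages:
        prompt = stage(prompt)
    return prompt

print(run_pipeline("What is a token?",
                   [security_filter, optimize_prompt, route_and_solve, format_as_json]))
```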
There have been some developments in this regard, especially with respect to structured formatting (Anthropic is ahead of the game here, as are tools like Outlines), but the overall concept seems underexplored by the community. I expect to see more soon.
Agentic Systems
But let's take this a step further. One of the tantalizing dreams of LLMs has been the potential to extend them beyond chatbots, into fully agentic assistants. There are now a number of frameworks (most notably DSPy) that enable a single LLM to assume multiple personas and talk to other versions of itself. These are now being built into fully functional systems, such as OpenDevin, that can be given a goal and automatically complete the work required as a cohort of differently-prompted model instances, acting as a synthetic "team".
The best thing about this approach is that you don't need powerful models to make this happen. Rather than using one super-smart LLM as a jack-of-all-trades, you can use a virtual group of more-efficient AI models, where each member is more performant in their area.
This opens up a whole range of use cases and implementation approaches.
- For privacy, security, cost, or convenience reasons, you could run a model on your own local computer instead of relying on pre-defined or pre-tuned models hosted elsewhere.
- You're not locked into any one model. You could use one base model (but different prompts) for each agent, use different models for each agent, or some mix of the two. You're also not constrained by what's available today – as new models become available, simply swap them in as needed.
- This also potentially opens up use cases far beyond what could be achieved with a single-prompted LLM. In today's world, you're essentially finding one super-smart person, then asking them to consider a task from multiple points of view. You're carrying around the previous history of the conversation through a single personality lens, even if exploring different facets of a problem. In an agentic structure, you're starting from that same super-smart core person but, through unique prompts, essentially creating independent experts collaborating with each other. It enables a much broader range of exploration than a single LLM could handle on its own, staging out the work in different phases, and potentially using different LLMs for different parts of the work (a minimal sketch follows this list). It's a whole new ball game, and we're still in the first inning.
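Here's the minimal sketch promised above: two differently-prompted agents taking turns on a task. The generate function is a hypothetical stand-in for whatever backend you choose – a local llama.cpp model, a hosted API, or a different model per agent:

```python
def generate(system_prompt: str, conversation: list[str]) -> str:
    # Placeholder: swap in a real model call here (local llama.cpp,
    # a hosted API, or a different model per agent).
    return f"[{system_prompt.split('.')[0]}: reply to {len(conversation)} messages]"

def run_agents(task: str, rounds: int = 3) -> list[str]:
    # Two personas built from the same backend, differing only in their prompts.
    planner = "You are a planner. Break the task into concrete, ordered steps."
    critic = "You are a critic. Point out flaws and missing steps in the plan."
    transcript = [f"Task: {task}"]
    for _ in range(rounds):
        transcript.append(generate(planner, transcript))  # planner proposes
        transcript.append(generate(critic, transcript))   # critic reviews
    return transcript

for line in run_agents("Plan a small website launch"):
    print(line)
```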
Outside of LLMs
So far I've written primarily about LLM-based GenAI, and haven't touched much on other systems. This is in part because I know less about them, and I don't have as clear a vision for them as I do for LLMs. However, based on previous trends, I think we can safely expect them to continue to improve along multiple lines:
- Fidelity: higher accuracy and detail in their outputs, as well as larger outputs (longer songs, bigger pictures, etc.)
- Steering: staying closer to user expectations, and more consistent across a series of generations. (Think of generating a series of marketing pictures using a single consistent character, or an album of songs with consistent instrumentation and vocalists.)
- Cost: prices will hold steady or fall over time, where these tools aren't already free.
The one direction I'm not prepared to predict is the availability of open-source models. StabilityAI had been a major leader in this area across a range of models, but has recently fallen on difficult times. It's not clear that the OSS community has the same level of skill with non-text-to-text models as it does with LLMs, so I wouldn't be surprised to see open-source development of locally-deployable non-text-to-text models slow down.
The one wildcard here is multimodal models, which can operate with a wide range of input (and potentially output) modalities, such as text, images, video, and so on. I don't think these will fully supplant targeted models focusing on specific modalities, but they could be a great addition to the assortment.
Legal Resolution
A long-standing challenge with Generative AI is its troubled origin story – most major GenAI projects started with a scrape of the Internet, without asking for permission first. For publicly-owned or -committed data, like Wikipedia, that's not much of an issue. However, for other data – forum posts, artistic output, pirated media, etc. – whose creators have not granted explicit permission for such uses, it's been a grey area, a sticking point, and a source of both enmity and lawsuits that has cast a shadow over the whole undertaking.
It's clear that some of these discussions will reach a point of resolution this year. The New York Times lawsuit against OpenAI is clearly a gambit for a licensing arrangement, and both parties are motivated to resolve it quickly. The EU is moving forward with legislation to regulate AI model creation, including improved documentation of training sources and standards for release. US Representative Adam Schiff recently proposed legislation that would require disclosure of training data sources as well. This is something that AI researchers and companies should have been doing all along – now, it looks like the law is coming to force their hand.
What intrigues me is how this may impact GenAI perception and adoption going forward. Much of the reaction on social media has clung to this particular issue as a reason to scuttle the whole enterprise. But, as these tools become ever more present, easier to use, and proven in both commercial and personal terms, will people come to accept that these systems are actually beneficial if the whole process is consensual, or will the original sin tarnish these systems for a generation?
As with all other aspects of this ecosystem – only time will tell. :)
#llm #llms #GenerativeAI #GenAI #ChatGPT #OpenAI #Anthropic
Written by Dulany Weaver. Copyright 2022-2024. All rights reserved.