


On Stable Diffusion

A good friend made a Facebook post saying

“Sadly it turns out that the latest AI photo app y’awl using to look hot and sexy is built off the back of a training set full of work stolen from artists without payment.

How disappointing.

We sorted this shit years ago with Creative Commons licensing. It’s not hard to get right. #paytheartists”

It led to a heated debate! Here’s (with a few modifications) how I replied, which was sufficiently long that I felt I should pluck it out of the Facebookosphere and settle it here:

I understand that people worry that large models built on publicly-available data are basically corporations reselling the Web back to us, but out of all the examples to draw upon to make that point, Stable Diffusion isn’t the best. It’s one of the first examples of a model whose weights are open, and free to reproduce, modify and share. Like many people here in the comments, you can download it, inspect it, run it locally, and share it. You need a GPU to run it at a reasonable speed, which makes it a little pricey to run. Building these models is very pricey, around $600,000 or so, which means that there’s currently a power differential between large corporations who can afford to speculatively build and experiment with these models, and the rest of us. But the knowledge of how to do it is built on open science, and a number of orgs are doing it truly in the open. All of these things, as ever, will get cheaper, and spread in use and experimentation.

Most importantly, the tool itself is just data; SD 1.0 was about 4.2GiB of floating-point numbers, I believe. I’m currently using (literally, right now!) another open model, Whisper, which is 3GiB, and allows me to convert most spoken audio into text, and even translate it. I use it to securely and privately transcribe what I’m saying to myself through the day. I expect it will be encoded into hardware at some point very soon, so we will have open hardware that can do the kind of voice to text that you otherwise have to hand over to Google, Amazon, and co.
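To get a feel for what "4.2GiB of floating-point numbers" means, here is a rough back-of-envelope sketch in Python. It assumes each number is stored as a 32-bit (4-byte) float, which is my assumption and not something stated above; the function name `approx_param_count` is made up for illustration. If the numbers were stored at 16 bits instead, the same file would hold twice as many.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def approx_param_count(size_gib: float, bytes_per_param: int = 4) -> int:
    """Estimate how many numbers fit in a weights file of the given size.

    bytes_per_param=4 assumes 32-bit floats (an assumption, not a known
    fact about any particular model file); use 2 for 16-bit floats.
    """
    return int(size_gib * GIB // bytes_per_param)


# The 4.2GiB figure is the one quoted in the post; the precision is assumed.
sd_params = approx_param_count(4.2)
print(f"A 4.2GiB file holds roughly {sd_params / 1e9:.2f} billion 32-bit numbers")
```

Either way, the point stands: the whole tool is a single file holding on the order of a billion numbers, small enough to download and keep on a laptop.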

The ability to learn, condense knowledge, come to new conclusions, and empower people with that new knowledge, is what we do with the shared commonwealth of our creations every day. Copyright has not always been a feature of that process, but in many ways, it’s been an efficient adjunct to it: a way to compensate creators by taking a chunk from the costly act of copying itself. It’s a terrible fit to the modern digital world, though, just because the cost of making a copy is now practically zero. Attempts to update it have unfortunately revolved around trying to recreate the physical limits of previous copying equipment, and bolt them onto a system where that’s not where the revenue comes from.

It’s always been hard to stop these temporary monopolies from impeding the open commons that they all draw from, especially after we built an automatic copyright system following the Berne Convention, where everything was maximally locked down by default. That’s why Creative Commons was invented: without that work, it was costly and near impossible to grant back to the commons, with legal certainty, the way that the commons could exist by default before the 1970s.

Again, I understand if people are worried that, say, Google is going to build tools that only they use to extract money from our shared heritage. But the answer isn’t to make those tools illegal, along with anyone building or using them (like me, like EleutherAI, like anyone following the instructions spelled out by the growing, accelerating field of machine learning, and drawing on the things around them). It’s that the tools should be free, and open, and usable by everyone. Artists should get paid; and they shouldn’t have to pay for the privilege of building on our common heritage. They should be empowered to create amazing works from new tools, just as they did with the camera, the television, the sampler and the VHS recorder, the printer, the photocopier, Photoshop, and the Internet. A 4.2GiB file isn’t a heist of every single artwork on the Internet, and those who think it is are the ones undervaluing their own contributions and creativity. It’s an amazing summary of what we know about art, and everyone should be able to use it to learn, grow, and create.



petit disclaimer:
My employer has enough opinions of its own, without having to have mine too.