The 14 year old boy alignment problem, future shock, and AI microscopes
17.28, Thursday 4 May 2023
I have opinions!
Rather than have AI completely take over this blog, I am pasting three “working hunches” all at once, so skip this post if you’re not interested.
The AI alignment problem matters less than the 14 year old boy alignment problem
GPT-4 is capable of all manner of terrible things, including synthesising novel toxins and having them sent to your door – all detailed in OpenAI’s GPT-4 System Card as previously discussed.
It turns out that Large Language Models are pretty decent planners. As Auto-GPT (GitHub) shows, you can give an LLM a goal, and have it auto-expand that goal into a sequence of steps. And then, given some basic plug-ins, have it start executing those steps with external tools.
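To make the plan-then-execute shape concrete, here is a minimal sketch. It is not Auto-GPT's actual code: `run_agent`, `toy_plan` and `TOOLS` are all hypothetical names, and the "planner" is a hard-coded stand-in for where a real LLM call would go.

```python
from typing import Callable

# Hypothetical types: a planner turns a goal into (tool_name, argument) steps,
# and a tool turns an argument into a text result.
Planner = Callable[[str], list[tuple[str, str]]]
Tool = Callable[[str], str]


def run_agent(goal: str, plan: Planner, tools: dict[str, Tool],
              max_steps: int = 5) -> list[str]:
    """Expand a goal into steps, then run each step with the named tool."""
    log = []
    for tool_name, argument in plan(goal)[:max_steps]:
        tool = tools.get(tool_name)
        if tool is None:
            # The planner asked for a plug-in we don't have.
            log.append(f"skipped {tool_name}: no such tool")
            continue
        log.append(tool(argument))
    return log


def toy_plan(goal: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM: a real planner would generate these steps."""
    return [("search", goal), ("email", "boss"), ("write", "summary")]


# Stand-ins for plug-ins; real ones would hit the web, the file system, etc.
TOOLS: dict[str, Tool] = {
    "search": lambda query: f"searched: {query}",
    "write": lambda text: f"wrote: {text}",
}

if __name__ == "__main__":
    for line in run_agent("tidy my notes", toy_plan, TOOLS):
        print(line)
```

The `max_steps` cap is the only safety rail in this sketch, which is rather the point of the worry that follows.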
So that’s risky! People could use that for anything!
I guess it’s possible to monitor bad actors who use AI, because they show signs in the rest of their lives of being terrorists or whatever.
But (having been one) 14 year old boys are idiots, and perfectly capable of typing “let’s do this idiot nation-destroying thing” and leaving the AI running overnight - and you can’t monitor them all. The main hurdle for 14 year old boys doing idiotic things is simply lack of opportunity and not knowing where to start. AI “fixes” that.
AI alignment (Wikipedia) is the term of art for how to get AI not to assist in awful things, i.e. how to
steer AI systems towards humans’ intended goals, preferences, or ethical principles.
But the genie is out of the bottle, isn’t it? You can download an LLM and run it on your laptop, and those won’t go away.
So the challenge is not in aligning AI, but either (a) aligning 14 year old boys to not do idiotic things (impossible), or (b) adapting (necessary).
Now this reminds me of biotech: I was at a conference a decade or so ago where they talked about DNA synthesis. At the time the user interface to DNA synthesis was this: you go to a company’s website, and you paste in the sequence in a web form, then you add your credit card details, then you hit submit. The DNA arrives in a vial a few days later through the post.
I can’t remember the cost then, but today it’s about 10 cents per base pair. Well, smallpox was sequenced and published in the early 1990s. It’s about 186,000 base pairs, so that’s expensive to synthesise but only a dozen-laptops-expensive – within reach of a disgruntled Discord server.
The “fix” is that the DNA synthesis company looks at everything that people paste in their web form and their computer says: does that string of A, C, G, and Ts match the smallpox sequence? If so, don’t print it.
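Here is a toy sketch of that kind of check, assuming the simplest possible approach: flag an order if it shares any run of `k` letters with a blocklisted sequence. I don't know how real synthesis companies do it. The names `screen_order` and `BLOCKLIST`, the value of `k`, and the made-up sequence are all mine for illustration.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Every overlapping run of k letters in the sequence (empty if too short)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def screen_order(order: str, blocklist: dict[str, str], k: int = 8) -> list[str]:
    """Return the names of blocklisted sequences that share a k-mer with the order."""
    order_kmers = kmers(order, k)
    return [
        name
        for name, sequence in blocklist.items()
        if order_kmers & kmers(sequence, k)
    ]


# A made-up string of bases, not a real organism.
BLOCKLIST = {"toy_sequence": "ATGCGTACGTTAGCCGATAGGCTTAACGGATC"}

if __name__ == "__main__":
    # A fragment of the blocked sequence, padded at both ends.
    suspicious = "TTTT" + BLOCKLIST["toy_sequence"][5:25] + "AAAA"
    print(screen_order(suspicious, BLOCKLIST))                   # ['toy_sequence']
    print(screen_order("ACACACACACACACACACACACAC", BLOCKLIST))   # []
```

Matching on fragments rather than the whole string matters, because otherwise someone just orders the sequence in pieces. A real filter would also need to cope with near-matches, which this one does not.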
Ok so now we need that for everything. At scale. For anything AI tools might touch.
Our social adaptation will have to be Gmail-scale anti-spam filters for everything. For DNA synthesis, for e-commerce, for WhatsApp calls that appear to be from someone you know, for Airbnb rentals.
I don’t know what that looks like, it’s going to be a mess.
AI was a 10 year wormhole into the future, but not necessarily a fundamental acceleration beyond that
Here’s how I get to 10 years because that’s a pretty concrete figure:
- When I was building Braggoscope, the web-scraping task that would have taken me 4 days instead took 20 minutes (using GitHub Copilot to write code plus GPT-3 APIs for automation). I work 9 hour days; that’s a 108x speed-up
- That’s 6.75 Moore’s Law doublings (2 to the 6.75 = 108)
- Each turn around Moore’s Law is 18 months. 6.75 x 18 months = 10 years.
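The back-of-envelope sum above, written out so the numbers can be checked. The function name `wormhole_years` and its parameters are just labels I made up for the figures in the list.

```python
import math


def wormhole_years(manual_days: float, hours_per_day: float,
                   assisted_minutes: float,
                   months_per_doubling: float = 18) -> tuple[float, float, float]:
    """Return (speed-up, Moore's Law doublings, equivalent years)."""
    manual_minutes = manual_days * hours_per_day * 60
    speedup = manual_minutes / assisted_minutes
    doublings = math.log2(speedup)
    years = doublings * months_per_doubling / 12
    return speedup, doublings, years


if __name__ == "__main__":
    speedup, doublings, years = wormhole_years(4, 9, 20)
    print(f"{speedup:.0f}x speed-up")      # 108x speed-up
    print(f"{doublings:.2f} doublings")    # 6.75 doublings
    print(f"{years:.1f} years")            # 10.1 years
```

Change the 18 months to 24 and the same speed-up comes out nearer 13.5 years, so the "10" is only as solid as the doubling period you pick.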
10 years of progress is a lot in one go!
So we’re in this capability overhang. A lot is possible, but it takes time to digest.
Think of the web. It took years to realise that hitting a button could save content to a server (“user generated content”). Or to realise that “one click” e-commerce was possible (I was still selecting products online then emailing my credit card number in 1998). Or the end of boxed software and the rise of SaaS…
And faced with that overwhelming possibility space, we’re in some kind of collective future shock. Just saying WHOA and over-indexing on the recent rate of change.
However there are “problems” ahead. Look at, for example, prompt injection. It’s not a problem like a stop-the-world problem. But it’s a problem that will require some engineering and some breakthroughs. And I think these challenges will accumulate.
Which means we’ll be back to the regular ol’ rate of progress.
Or maybe it’ll speed up again, who knows.
But I feel like I can wrap my arms around 10 years.
We’re building apps to surround and harness AI, but we need microscopes to study it too
I always think of the Warp Core in Star Trek – this barely contained seething emanation of ENERGY, and the ship is built around it to sluice and direct that energy away to make it do useful things. (I don’t know if this is actually what the Warp Core is, my Star Trek lore is lacking. But bear with me.)
Or the drive shaft that ran down the centre of steam-powered factories, from which all power and motion was drawn.
When I look at the startups being built with large language models, they treat it like a Warp Core. The job is to surround the LLM, capture it, use it.
But I feel like we don’t understand LLMs as themselves enough.
Like: I write a prompt and generate a completion…
How stable is that completion? If we were to “fuzz” that prompt (by randomly changing single tokens, say), would the completion stay static or would it diverge? On the manifold of latent space, are we at a local energy minimum or a saddle? What are the load bearing tokens of the prompt?
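A rough sketch of what that fuzzing could look like, under some loud assumptions: "tokens" here are just whitespace-separated words rather than real model tokens, divergence is plain string similarity from Python's `difflib`, and `toy_complete` is a fake stand-in for a model call. All the function names are hypothetical.

```python
import difflib
import random
from typing import Callable

Complete = Callable[[str], str]  # prompt -> completion


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, lower as they diverge."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def prompt_stability(prompt: str, complete: Complete, trials: int = 20,
                     seed: int = 0, filler: str = "blah") -> float:
    """Average similarity to the baseline after swapping one random word."""
    rng = random.Random(seed)
    words = prompt.split()
    baseline = complete(prompt)
    scores = []
    for _ in range(trials):
        i = rng.randrange(len(words))
        mutated = " ".join(words[:i] + [filler] + words[i + 1:])
        scores.append(similarity(baseline, complete(mutated)))
    return sum(scores) / len(scores)


def load_bearing_tokens(prompt: str, complete: Complete) -> list[tuple[str, float]]:
    """Drop each word in turn; rank words by how far the completion moves."""
    words = prompt.split()
    baseline = complete(prompt)
    impact = []
    for i, word in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])
        impact.append((word, 1 - similarity(baseline, complete(ablated))))
    return sorted(impact, key=lambda pair: pair[1], reverse=True)


def toy_complete(prompt: str) -> str:
    """Fake model: only the word 'poem' changes what comes back."""
    if "poem" in prompt.split():
        return "Roses are red, violets are blue"
    return "Here is a short summary."


if __name__ == "__main__":
    prompt = "please write a poem today"
    print(prompt_stability(prompt, toy_complete))
    print(load_bearing_tokens(prompt, toy_complete)[0][0])  # poem
```

With a real model you would want to sample several completions per prompt, since the output varies even when the prompt doesn't, and string similarity is a crude proxy for "did the meaning change".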
You get a feel for this as you work.
How would you combine, tweak, version, and test prompts? I’m working with a startup right now and boshed together a “prompt construction kit” (screengrab here) so that the whole team can get that “feel” too.
So what would a tool for expert prompt engineers look like? An IDE that lets you look at the compiled code, and has tools to visualise the equivalent of a stack trace, or step through a run…
How do you visualise what is happening in the middle layers of a transformer model? How do you expose that and manipulate it?
I’m grasping because I barely know how to articulate what I want.
But I want microscopes! Not even tooling yet. Let’s understand LLMs as a material and develop words for qualities we don’t even know about yet.
Anyway if anybody wants to pay me to collab with their engineers to build microscopes and tooling - in the spirit of inquiry right now - then I am all ears.