Vernor Vinge’s sci-fi novel Rainbows End (2006) is so prescient about AI training data.
His short Fast Times at Fairmont High (2002) is set in the same universe, and was written in that era where we felt like we had line of sight to pervasive augmented reality and also 3D printers. I read it at the time and it’s a low-stakes high school drama (about augmented reality and 3D printers), but from today’s perspective it is more like a utopia (of a certain kind) – democratised tools of production, reality as consensus hallucinations, super empowered kids.
The spine of Rainbows End is something called the “Librareome Project.”
Ok SPOILERS – right? So stop here if you’re planning to read the book (which you should).
The Librareome Project, you find out about a third of the way through, is a giant digitisation project of the world’s knowledge, and they plan to scan the world’s libraries to do it.
But didn’t Google already do that?
Yes but this is more total; like the Human Genome Project the whole is more than the sum of its parts:
It’s not just the digitization. It goes beyond Google and company. Huertas intends to combine all classical knowledge into a single, object-situational database with a transparent fee structure.
(Oh yeah, micropayments, there’s a whole model here.)
We’re not told what an object-situational database is. But this singular thing makes possible correlations that will reveal new knowledge:
Who really ended the Intifada? Who is behind the London art forgeries? Where was the oil money really going in the latter part of the last century? Some answers will only interest obscure historical societies. But some will mean big bucks. And Huertas will have exclusive rights to this oracle for six months.
I mean, this is so Large Language Model. 2006!!
An oracle!
This promise is why the universities are allowing their libraries to be scanned.
Uh, “scanned.”
The books are shredded. Fed into the wood chipper and blasted into a tunnel and photographed at high resolution:
The pictures coming from the camera tunnel are analyzed and reformatted. It’s a simple matter of software to reorient the images, match the tear marks and reconstruct the original texts in proper order. In fact–besides the mechanical simplicity of it all–that’s the reason for the apparent violence. The tear marks come close to being unique. Really, it’s not a new thing. Shotgun reconstructions are classic in genomics.
The shredded fragments of books and magazines flew down the tunnel like leaves in tornado, twisting and tumbling.
– the image has stuck with me since I read it.
Anyway.
The libraries are being fed into the maw of the machines.
And it turns out that Chinese Informagical, which has dibs on the British Museum and the British Library, was going faster than Huertas so they don’t have their monopoly.
And the Chinese have nondestructive digitisation techniques, so none of it was necessary.
Well.
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles (Washington Post, paywall-busting link).
I’m not trying to make a point here like “AI is bad” (you know me well enough and I’m pleased that my own book lives in the weights of the god machine) but one story reminds me of the other, and there is a violence intrinsic to creation, in this case the creation of new knowledge, slamming together words in the particle collider of linear algebra, something is lost but new exotic shimmering sparks appear - grab them! - and I guess what I mean is let’s recognise the violence and be worthy of it: if we’re going to do this then let’s at least reach for oracles.