11.44, Saturday 7 Dec 2002

Power laws have been coming up a lot recently. They're a property of many of the complex systems around us and basically mean there's a kind of self-similarity across scales. For example, for a city of any given size in the USA, there are four cities half its size. Or to put it another way, if there's one city with four million people, there are four with two million, and sixteen with one million inhabitants. Ubiquity by Mark Buchanan is about this, but I've also heard power laws come up in community sizes, and in the popularity of web sites (for any given website there are ten half as popular as it, say).
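
The city arithmetic can be written out as a rough sketch. It takes the figures in the example at face value (one city of four million, four times as many cities at each halving); the function name and its parameters are made up for illustration, not taken from Ubiquity.

```python
import math


def count_at_size(size, reference_size=4_000_000, reference_count=1,
                  factor_per_halving=4):
    """Expected number of cities of `size` under the rule in the example.

    Assumption: every halving of size multiplies the count by
    `factor_per_halving`, so count is proportional to size ** -exponent.
    """
    exponent = math.log2(factor_per_halving)  # 4 per halving gives exponent 2
    return reference_count * (reference_size / size) ** exponent


for size in (4_000_000, 2_000_000, 1_000_000, 500_000):
    print(f"{size:>9,} people: about {count_at_size(size):.0f} cities")
```

The self-similarity shows up in the ratio: count_at_size(s / 2) / count_at_size(s) comes out the same whatever s you start from, which is another way of saying no size is special.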

This is important because it means there's no typical size of a city, no typical popularity of a website, no typical size of a community. And that means trying to automatically cluster a social space is really very hard.

A quick example: You may have some data about how closely different weblogs are related to one another, and be attempting to see what groups or social clusters they fall into. This is easy if you can say "weblogs related by at least a certain amount we'll class as friends" and everything falls out as neat clusters of friends. However, if the social clusters obey a power law, these clusters are never going to fall out. In fact, whatever scale you look at, there will be a mix of some tight clusters and some small ones you're not sure about. You're never going to be able to say "this is a social cluster, and this is a social cluster too" because there's no typical group archetype you can compare against.
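
Here is a minimal sketch of the "related by a certain amount" approach, to show how much the answer depends on the cutoff. The weblog names and relatedness scores are invented, and linking everything above a threshold into connected groups is just one simple way to cluster, chosen for illustration.

```python
from itertools import combinations


def clusters_at_threshold(names, relatedness, threshold):
    """Group names into connected components, linking pairs scoring >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group's representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        # Missing pairs count as unrelated.
        score = relatedness.get((a, b), relatedness.get((b, a), 0.0))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: five weblogs and made-up relatedness scores.
blogs = ["a", "b", "c", "d", "e"]
scores = {("a", "b"): 0.9, ("b", "c"): 0.6, ("c", "d"): 0.3, ("d", "e"): 0.8}

for threshold in (0.85, 0.5, 0.2):
    print(threshold, clusters_at_threshold(blogs, scores, threshold))
```

Each cutoff gives a different set of "friends", and nothing in the data says which cutoff is the right one. That is the problem in miniature.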

Well, if you were fairly sure the social clusters were obeying a power law, you'd find some other way of doing what you wanted to do. Something the book Ubiquity doesn't address is when such power laws occur, but I've done a little experiment and from a first cut it looks like power laws are characteristic of randomly distributed values that come from a scarce resource (raw results). That is, if you have a system where the values are unrelated (say, how tall people are), you won't get a power law distribution. But if the values are related (say, if a person chooses to live in San Francisco they can't live in New York), then you're going to get a power law.
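
The details of that experiment aren't given here, so the following is not a reconstruction of it. It's only a hypothetical way to put the two kinds of values side by side: heights drawn independently of one another, and shares carved out of one fixed total, where whatever one piece gets the others can't have. The splitting rule, the numbers, and the function names are all assumptions.

```python
import random


def independent_values(n, rng):
    """n values drawn independently of one another (a stand-in for heights, in cm)."""
    return [rng.gauss(170.0, 10.0) for _ in range(n)]


def shares_of_fixed_total(n, total, rng):
    """Split a fixed total into n pieces.

    Assumed rule: repeatedly pick one existing piece at random and break it
    at a random point, until there are n pieces. The pieces always add up
    to `total`, so they compete for the same scarce resource.
    """
    pieces = [total]
    while len(pieces) < n:
        piece = pieces.pop(rng.randrange(len(pieces)))
        cut = rng.random() * piece
        pieces.extend([cut, piece - cut])
    return pieces


rng = random.Random(0)  # fixed seed so reruns give the same numbers
heights = independent_values(1000, rng)
shares = shares_of_fixed_total(1000, 1_000_000.0, rng)

for label, values in (("independent", heights), ("fixed total", shares)):
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    print(label, "largest / median =", ordered[-1] / median)
```

The sketch makes no claim that this particular splitting rule yields a power law. Comparing the spread of the two lists, or plotting a histogram of each on log-log axes, would be the next step in checking the hunch.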

Weblog community sizes draw from a scarce resource -- time a person devotes to one social circle is time that can't be spent on another social circle. So I'd expect social clusters in weblog space to obey a power law.

That means there's no typical size or typical type of weblog community. They're all different.

What next? This is just a handy rule of thumb: if the values draw from a scarce resource, you'd be better off dropping the traditional ways of clustering and of finding typical communities or typical cities, and going elsewhere for an alternative. As to what the alternative is, I have no idea. I'm thinking about it.

I wish I'd known this when I spent two months of my physics dissertation attempting to find typical clusters across quasar spectra. I would've given up and tried something else.