
Oblomovka

Currently:

2016-03-10

go wild

I love watching the AlphaGo/이세돌 (Lee Sedol) games. I barely know anything about Go, so I’m essentially pursuing my favourite hobby of watching smarter people reach out beyond their comprehension. The little shortcuts of explanation between expert Go players: the flurry of hand movements, the little trial explanations of future moves, and Go’s beautiful vocabulary, the subcultural mix of deliberate ironic calm and background, barely concealed anxiety and excitement. A friend said it felt like “surrealist theater” sometimes. But what I love about games, about programs, about science is that even when it’s hidden and barely explicable, there’s always something there.

Nobody seems to understand AlphaGo’s wilder moves. In the second game, everyone commenting belatedly realised that it was doing something in the center when everyone thought it was losing the upper right to Lee. Opinions on who was winning swung wildly from side to side. AlphaGo itself has a metric of how it thinks it’s doing (it resigns if it perceives it has a less than 10% chance of winning). We don’t get to see what that is in the game, but the program’s British inventors said afterwards that AlphaGo thought it had a 50/50 chance in the mid-game, and that its confidence slowly and consistently increased towards the end. Were AlphaGo’s early moves madness or genius, someone asked. We’ll know from whether it wins or not, another human replied. It won.

And again, something of a zeitgeist event. The AI people, who’ve been kicking around in my box of interesting predictors for nearly a decade, I think they feel that this is their moment.

I spent a couple of hours last weekend talking to Benjamen Walker about Nathan Barley, and the psychic damage of the early 2000s. At one point, I talked about the terrible distortion for technologists in the dotcom years of having years of everything you want and predict turn out to be true. Then I more sadly talked about how the magic had ebbed away. How so many of us coasted along for a decade on glib predictions that the Internet was going to make things nicer and more exciting, and it worked, then suddenly every bet turned out wrong.

I hate actually predicting things, because as soon as you pre-commit, your perceived accuracy plummets (because now it’s your actual accuracy, which is never as much fun). As ever, I can just couch my predictions in woolly language here, so: I’m feeling myself be tugged along in the AI folks’ wake, because they’re going somewhere interesting for a few years, even if maybe the magic will fade from them before they reach home and the Singularity.

(Fun reading if you want it, in this vein: Crystal Society by Max Harms. My favourite book this year so far. And, just like my favourite book this decade, Constellation Games, indie/self-published.)

BTW, Constellation Games is the Book of Honor at the upcoming Potlatch science fiction conference. I’m mortified I’m missing it, but I think I’ll be ending up in the same city as the author (hi Leonard, are you going to be at LibrePlanet in Boston?), so maybe it’s not so bad. Who can predict?

2016-02-27

circling around

Yes, I’m increasingly excited (with an estimated excitement half-life of eight days) about reading lots of academic papers. I always enjoyed hanging out at paper-oriented conferences like SIGCHI; when I was a teenager I would read Nature in the public library and imagine what it would be like to understand a damn thing in it. I remember someone asking Kevin Kelly (pbuh) what he was reading, and he said “oh I only read scientific papers these days”, which is such a burn. Clearly it is my destiny to read random academic papers and stitch an unassailable theory of life from them. Or at least spend a week lowering my respect for academia as a whole.

Today, I read (which is to say skimmed), Cowgill, Bo, and Eric Zitzewitz. “Corporate Prediction Markets: Evidence from Google, Ford, and Firm X.” The Review of Economic Studies (2015): rdv014, and Rachel Cummings, David M. Pennock, Jennifer Wortman Vaughan. “The Possibilities and Limitations of Private Prediction Markets”, arXiv:1602.07362 [cs.GT] (2016). Look at me, I’m citing.

The main thing I learned is that Google’s internal prediction market worked by letting people turn their fake money won on the market into lottery tickets for a monthly prize (with another prize for most prolific speculator). Clever trick to incentivize people but not turn it into an underground NASDAQ or somesuch.
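The papers don’t spell out (at least in my skim) exactly how winnings turned into tickets. Assuming one ticket per unit of play money, which is my assumption and not Google’s documented rule, a monthly draw is just a weighted random choice. `draw_lottery_winner` and the trader names are hypothetical.

```python
import random


def draw_lottery_winner(balances, rng=None):
    """Pick one winner with odds proportional to play-money winnings.

    balances: dict of trader name -> play-money winnings.
    Assumption: one lottery ticket per unit of winnings; traders with
    zero or negative winnings get no tickets at all.
    """
    rng = rng or random.Random()
    eligible = {name: amount for name, amount in balances.items() if amount > 0}
    if not eligible:
        return None  # nobody holds a ticket this month
    names = list(eligible)
    weights = [eligible[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With `{"alice": 3, "bob": 1}`, alice should win roughly three draws in four. The appeal of the design is that real prizes scale with forecasting skill, while the play money itself never becomes anything you could trade.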

Meanwhile, I recalled last night the Enron Email Dataset, a publicly available pile of 500,000 emails from 1999-2004. Will it corroborate my evidence that subject lines get longer every year?

Ta-da:

This is a steeper trend over the time period than my own corpus: 2.63 extra characters a year! I’m fretting a bit that it’s an artifact of some rookie statistical mistake I’m making, or of there simply being more email over time. Someone who knows more than me on these matters, drop me a line, preferably a very long and descriptive one.

I’ve updated the code to include a function that can parse the Enron Depravities. You can get the latest Enron dataset here (423MB).
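A minimal sketch of what parsing the Enron pile involves, assuming the layout of the CMU tarball (one RFC 822 message per file, under `maildir/<user>/<folder>/`). It uses only Python’s standard `email` package; `iter_enron_subjects` is a hypothetical name, not the function in the updated code.

```python
import os
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def iter_enron_subjects(root):
    """Yield (datetime, subject) for every message file under root.

    Assumes each file holds a single RFC 822 message, as in the CMU
    Enron tarball. Files without a parseable Date header are skipped.
    """
    parser = BytesParser()  # compat32 policy: header values come back as plain strings
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            with open(os.path.join(dirpath, filename), "rb") as fh:
                msg = parser.parse(fh, headersonly=True)  # bodies aren't needed
            raw_date = msg["Date"]
            if raw_date is None:
                continue
            try:
                when = parsedate_to_datetime(str(raw_date))
            except (TypeError, ValueError):  # older Pythons raise TypeError on junk
                continue
            yield when, str(msg["Subject"] or "")
```

Each `(when, subject)` pair can then go through the same length-per-day averaging as a personal mailbox.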

2016-02-25

academic problem

I was wandering around my PGP key neighbourhood last night, and found Isis Agora Lovecruft’s distributed aggregating library, which I am immediately envious of (even though I suppose anything I could covet there I could just take with me). It is a library in the sense that it is a collection of books and papers (although the other usage, as in “code library”, might work literally as well as metaphorically).

Mostly it prompted me to see what it would take for me to develop an academic paper habit. I don’t have a guide here, so I immediately started uncovering mad evolutionary psychology papers that could so easily convince me of anything I wanted to believe was true. So in that corner at least, academia is less the sum of human knowledge and more another set of paths that takes you on a tour of the local ideas around your starting point. How do you get out? How do you see the shape of the whole thing? What happens when you bump into somebody coming from the exact opposite and contradictory direction?

It is also making me think about individual tools to manage vast personal data sets. We sort of faded out on this problem when the Great Centralisation began, and everything began ascending into the cloud. I think it might be where we should start, so when everything starts falling out again, our books and photos and films and songs and lives, we’ll know where to put it, and where to find it again later.

Isis is probably one of those few people who are close to the invariants of my personal politics, though I seem to remember that we had a blazing argument about basic ideological axioms within minutes of meeting (edit: I should note that my idea of blazing argument is most people’s idea of mild disagreement). Well, she signed my key regardless. You should sign my key too! To hell with all this passport and identity card waving. You know it’s me! It is! I’m in here! It’s me!

2016-02-24

a spectre is haunting internet

I am diving a little further out on the Net, now, and seeing a few patterns. I don’t really know how pervasive those patterns are. For most purposes (beyond my guilt), that doesn’t really matter. There are always going to be limits to how far culturally you can wander. I can’t just go to a random place on the Internet and wander around from there, because you can’t deduce the significance of that place just from turning up. You need to know something of the path to that place.

What I’m always looking for is cultures or ideas or places that are generative. Places that lead to other places; spreading ridges in earthquake zones, creating more land under your feet. I’m lucky, because where I start out from these days is almost always toward somewhere imminently popular, or famously unpopular, or universally-declared-as-interesting. And I get to be “lucky” in searching for these, because before and after I get to these places, a whole crowd of invisible people who are just like me, but richer and more powerful and influential, are also turning up, because we share a lot of common history and traits. And they’ll uplift what I find and suddenly it will be universally-declared-as-interesting. So you get to be an amazing prophet of trends.

You have to be aware of your cohort. You have to be aware that you are more-or-less identical with a huge subset of humanity, and when you like something, there’s a certain number of people who will not only like it when you show it them, but probably liked it before you got there. You are never the first, but you might be the first to talk about it among your friends.

Anyway, what I’d like to note here is the rise of communism.

I find that people are super-interested in communism, and that interest is permeating in a familiar way. Look at Reddit’s me_irl. Me_irl is one of the larger reddits, and it’s sort of broiling with strange memes, like 4chan used to. My aged instincts tell me the source of its generativity is offstage somewhere, and me_irl is actually the most boring, old receptacle for that output. I can definitely click around and swiftly find people who are pissed off with me_irl, complaining that it’s been taken over by social justice warriors or fascists and that you should go somewhere else for the real fun.

Nonetheless, me_irl is really interested in communism. Just to double-check I’m not on crack, I went there just now, and clicked on the first “me☭irl” link I found. It was this, with these comments.

Clearly, in those comments, bystanders are irritated that me_irl, which should just be a random meme palace for people’s metaphorical depiction of their sad but ironically funny lives, has somehow veered into a constant reposter of Marx and Engels jokes. They also get annoyed that me_irl becomes regularly obsessed with scary skellingtons.

I am, for some reason, not going to construct an elaborate theory about the scary skellingtons. But I do find, when it comes to communism, that the tiny overlords of me_irl are wallowing in hints of a broader generative trend.

Now whenever I look around elsewhere, I really see a lot of people fascinated by communism. This is not in the sense of selling Socialist Worker at street corners, but mostly making rather sophisticated in-jokes about the bourgeoisie and commodity fetishism and Hoxhaism, and having others riff on those jokes. You can make endless jokes using communism as a source material, and also kick off many 3AM conversations or shower thoughts. Generative!

This really isn’t that surprising: communism is a pretty deep subculture (a bit less than catholicism-level deep, perhaps?), its source material gets translated a lot, it speaks to the human condition, and it is explored in vivid amounts of detail in the further education that almost everyone has to attend these days. It is pretty fertile, alien but approachable, old but new. Also everyone is grumpy at capitalism right now.

This is notable to me, though, because I grew up in communism’s lowest ebb. From 1989 onwards, communism was really the least generative ideology around, just because it had taken a gut punch from history. I remember walking around New York with Mackay and Cait in the late nineties and finding a garbage pail full of old Marxist analysis, leading us to simultaneously cry out “look! the dustbin of history!”

You could certainly be into communism in the late 20th century, but I don’t think anyone was seriously expecting it to be the ur-source of new ideas right at that point. (And by “anyone”, of course, I mean “people less than a certain subcultural circumference away from me.”)

I’m thinking on a wider theory about what this means about subcultural flows across generational timescales, but unfortunately that idea needs a bit more javascript. So I’ll just leave this here and say that if in the next 5 years, we all start having more communist revolutions, you heard it here first. Well, here, and in_rl.

2016-02-22

new estonia

I spent some time last week with people slightly above my pay grade in the International Political Relations space talking about the future of the Internet. The event used the Chatham House Rule, which is like the Three Laws of Robotics, except for dignitaries, so I can’t say who said what. I can exclusively reveal that some people aren’t happy with what Apple has done in response to the San Bernardino court order, while a lot of other equally powerful people think Apple is exactly right. You heard it here first.

My less shocking (but not by much) observation was that politicians and diplomats who like the Internet (or, at least, understand the Internet) also like Estonia. A lot. You rarely find people of this ilk going on about the greatness of a country that is not their own, so that stood out to me. There may even have been a little Estonia envy going on. It is also possible that there was some patronising “plucky little country that I can acknowledge without any further ramifications”, but I think it was mostly genuine admiration. No-one was very specific about why Estonia was doing the right thing, and I think I will leave it at that.

Another theme was many people’s disappointment with their governments’ lack of a defensive posture regarding Internet security and privacy. That is, there was plenty of talk of the rights and wrongs of states hacking into endpoint devices, or requiring backdoors, or circumventing encryption — but many people were concerned that no state was doing enough to protect its citizens and organizations from attacks.

The criticism, just from its origins, seemed to center on the United States. But it occurred to me that the current budget for supporting basic infrastructural security work, such as ISC and OpenSSL, is so small that even a relatively small nation state could add an order of magnitude or two to it. In fact, given many technologists’ suspicions of the more heavily resourced states, it might be politically more acceptable for an Estonia-level state to be a benefactor.

I don’t have an opinion on whether this would be a good idea or a bad idea (for the record, I do not believe donations from Latveria should be accepted at this time). I’m just noting that if a small state wanted to be the new Estonia of the Internet before the old Estonia of the Internet had even got a chance to settle into the throne of cyberspace, this would be a fine way of doing it.

2016-02-21

home server

Well, that was an embarrassing amount of time having to engineer around forgetting a password. Nothing important lost, but an important lesson: if you write down a password (and you should write down a password), write it down correctly. Being clever and elliptical in the past is just frustrating in the present. Also true of secret societies. Lighten up!

Anyway, the password was for an empty Debian install on a chromebox which I’d set up but not actually populated with files and such, so no great loss. Except I had to learn how to install Debian on a chromebox again (shades of “Flowers for Debian”).

It remains a very promising base for a home server though. Asus Chromeboxen are still around $150, can be upgraded pretty easily, and installing a free operating system on them only costs you 3 sanity rolls, max. The machine is very quiet, tiny, and I think powerful enough if you stick some more RAM and SSD into it. My last home server has been happily doing its thing for a decade, and this, which I’m eying up as a replacement, has the same feel to it.

To be honest, the most exciting part of it was working out a way to encrypt its root hard drive but still ssh into it to type in the magic passphrase before the thing had even finished booting. This is a pretty good guide to doing that. It feels like magic to connect to a thing that hasn’t even booted into full Linux.

2016-02-20

the inhuman search engine

There was a time when you could parlay a decent understanding of Google search (or any search) into a journalistic career. Journalists were, on the whole, trained to collect information through contacts and telephone calls, but at that time they didn’t yet have a consistent grip on how to piece together stories from the Net. The majority of stories were built from legwork, not basic Internet skills. The pendulum is swinging the other way now, I think. Many, many articles are now spun from forwarded screenshots and searches. You can still get ahead a little from having advanced knowledge: there remains a benefit, I believe, for journalists who know a little coding or a little statistics. But with the home base of journalism moving online, there’s almost certainly an emerging premium now for people who can simultaneously talk to computers and humans in languages they understand. Or maybe for people who can use the Internet to peer into motivations and other intimacies, rather than uncover facts. A good example is Gwern and Andy Greenberg’s piece on the identity of Satoshi Nakamoto. There’s some serious understanding of a lot of tech in their research, but it was mostly undone by underestimating how strange human motivation can be. Why would someone try to plant a trail suggesting they were Nakamoto, with no obvious benefit? Strange motives sink plenty of research projects. But perhaps one conclusion for anyone who swims in the large-scale view of conspiracy theories and fraud that the Net offers is that, absent a permanent cost, motivations can be truly random.

I was thinking this today, just because I got caught up in an excursion into fact-checking. Someone said something on a forum; I was mildly curious who they were. The forum didn’t publish names or emails, and the username was neither unique nor led anywhere. But the forum used gravatars: those little icons that show either a generated pattern or a user-configured image next to your post. Gravatars are based on your email address, which you enter to get a confirmation note when you post to some forums. The icon image itself is served from gravatar.com, based on an MD5 hash of your email.
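For concreteness, the hashing step as Gravatar documented it at the time: trim surrounding whitespace, lowercase the address, MD5 it, and put the hex digest in the image URL. A minimal sketch; the function names are mine.

```python
import hashlib


def gravatar_hash(email):
    """Hex MD5 digest of a trimmed, lowercased email address."""
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def gravatar_url(email):
    """Image URL a forum would embed next to a post by this address."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)
```

Because of the normalisation, `" Someone@Example.com "` and `"someone@example.com"` produce the same icon, which is exactly what makes the hash a stable identifier across sites.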

There’s no known mathematical way to get from the hash back to the email (touch wood). But the hash still leaks information. You can generate hashes from a set of possible email addresses. You can confirm a person has used a particular email address by checking that email’s hash (note there’s no guarantee someone is using their own email address; strange motivations can lead you down wrong paths). In this case, though, I was able to just search for the hash itself. I quickly found another account on a separate site using that same hashed gravatar, where the user had used a more personal username. From the username I was able to try out an email address that matched the hash. And from that, I found a site that listed the person’s full name and address. All of this took me less than ten minutes.
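The confirmation step is easy to sketch: hash every candidate address and compare it with the hash seen on the page. The candidate usernames and domains are guesses you supply, and `find_matching_email` is a hypothetical helper, not an existing tool. A match only shows the icon was registered with that address, not that the address belongs to the poster.

```python
import hashlib
from itertools import product


def gravatar_hash(email):
    """Hex MD5 of the trimmed, lowercased address, as Gravatar hashes it."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def find_matching_email(target_hash, usernames, domains):
    """Return the first username@domain whose hash equals target_hash, else None."""
    target = target_hash.strip().lower()
    for user, domain in product(usernames, domains):
        candidate = f"{user}@{domain}"
        if gravatar_hash(candidate) == target:
            return candidate
    return None
```

The search space is just `len(usernames) * len(domains)` MD5 computations, which is why a single personal username plus a handful of common mail providers goes so quickly.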

I hadn’t really thought about using gravatars to expose identities before (others have). It would be a useful skill to have in a modern journalist’s toolkit, though. More intriguingly, perhaps, it might be a tool that one could provide to journalists. I keep thinking about the narrow subset of all possible character strings that the world’s email addresses, and indeed human names, inhabit. If you were to set about compiling and de-duping the world’s known spamming lists, how many of the world’s email addresses could you collect? How quickly could you brute-force everyone’s full name, or a reasonably high percentage? Over 90% of the US population are covered by 200,000 surnames: how quickly could we get high coverage by combining those with the most popular first names? (I admit to first considering this when thinking about how one could independently track the extent and use of the Right to be Forgotten in the EU. Programmatically generate a significant percentage of all the possible names in the European namespace, then check the affected and unaffected search engine results for each.)
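One way to put a number on “how quickly could we get high coverage” is to rank (first name, surname) pairs by estimated probability and add up the top N. This sketch assumes first names and surnames are statistically independent, which real populations are not, and it takes frequency tables you would have to source yourself; the toy numbers in the usage line are invented.

```python
import heapq


def top_name_pairs(first_freqs, last_freqs, n):
    """Return the n most probable (first, last) pairs and the population share they cover.

    first_freqs / last_freqs: dicts of name -> share of the population (0..1).
    Assumption: P(first, last) = P(first) * P(last), i.e. independence.
    """
    pairs = (
        (first, last, p_first * p_last)
        for first, p_first in first_freqs.items()
        for last, p_last in last_freqs.items()
    )
    best = heapq.nlargest(n, pairs, key=lambda t: t[2])  # streams, so memory stays O(n)
    coverage = sum(p for _, _, p in best)
    return [(first, last) for first, last, _ in best], coverage
```

With toy tables `{"ann": 0.5, "bob": 0.3}` and `{"smith": 0.4, "jones": 0.2}`, the top two pairs are ann smith and bob smith, covering about 0.32 of the imaginary population. At real scale (200,000 surnames times thousands of first names) the generator still works, but it is billions of multiplications.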

I would like journalism to be about creating new facts about the world, instead of reporting pre-existing facts or just propagating novel speculation.

2016-02-17

Lengthy subject matter — email subject lines do seem to be getting longer!

I spent a little time over the weekend pursuing my theory that email subject lines have grown longer over time, based on the surprising terseness of subjects I observed in an old inbox of messages from 1998.

I have two ginormous corpora of outgoing and incoming email: one from about 1999 which fades out into 2007, and one picking up the slack from 2007 onwards. In total, they contain 2,732,487 messages. I measured the subject line length of each of these messages, threw that into a database along with their date, and plotted the average length for every day in the corpus.

Ta da!

The trend line suggests that subject lines have indeed been lengthening, by an average of 1.2 characters a year since 1999.
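The post doesn’t say how the trend line was fitted. Assuming ordinary least squares over the per-day averages, the slope in characters per year needs no libraries at all. `trend_chars_per_year` is a hypothetical helper, not part of the linked code.

```python
from datetime import date, timedelta


def trend_chars_per_year(daily):
    """Least-squares slope of daily average subject length, in characters per year.

    daily: iterable of (datetime.date, average_length) pairs.
    Every day counts equally, however many messages it contains.
    """
    points = [(d.toordinal() / 365.25, avg) for d, avg in daily]  # x in years
    n = len(points)
    if n < 2:
        raise ValueError("need at least two days to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points fall on the same day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Weighting each day equally means busy years can’t dominate the fit; fitting every individual message instead would weight heavy-traffic days more, and comparing the two slopes is one cheap sanity check on the result.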

What’s going on here? Is it just me? Is it just my email correspondents who are getting more long-winded? Did I make a mistake? If you’re curious, you can check my working, and try it out on your own emails: here’s the code I used to create the graph above. If you have email archives of your own to measure (and speak a little Python), that code can slurp up your email in mbox or Maildir format, or from a notmuch database, and plot the results using matplotlib. You can also test my conclusions with different processing, using the 115MB sqlite database of my own subject line lengths, available for download or as a torrent.
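For a sense of the shape of the pipeline, here is a minimal standard-library sketch: iterate over a mailbox, record each message’s day and subject length in sqlite, then average per day. It is not the linked code; the table layout and function names are assumptions. It unfolds and decodes MIME encoded-words first, so a `=?utf-8?q?...?=` subject is measured by its visible characters.

```python
import sqlite3
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime


def decoded_subject(raw):
    """Unfold a Subject header and decode any MIME encoded-words in it."""
    text = str(raw).replace("\r\n", "").replace("\n", "")  # RFC 5322 unfolding
    try:
        return str(make_header(decode_header(text)))
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        return text  # fall back to the raw text for malformed or unknown charsets


def load_messages(messages, db):
    """Store (ISO day, subject length) for each message with a parseable Date."""
    db.execute("CREATE TABLE IF NOT EXISTS subjects (day TEXT, length INTEGER)")
    for msg in messages:
        raw_date = msg["Date"]
        if raw_date is None:
            continue
        try:
            when = parsedate_to_datetime(str(raw_date))
        except (TypeError, ValueError):
            continue
        subject = decoded_subject(msg["Subject"] or "")
        db.execute(
            "INSERT INTO subjects (day, length) VALUES (?, ?)",
            (when.date().isoformat(), len(subject)),
        )
    db.commit()


def daily_averages(db):
    """Return [(day, mean subject length)] in date order."""
    return db.execute(
        "SELECT day, AVG(length) FROM subjects GROUP BY day ORDER BY day"
    ).fetchall()
```

`load_messages` accepts any iterable of messages, so `mailbox.mbox(path)` and `mailbox.Maildir(path)` can both be passed straight in.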

Some potential explanations I’ve been mulling: it could be an artifact of a growth in mailing-list traffic, whose subject lines are prefixed by a mailing list name (i.e. “[mylist] hello everyone”). I could check that by stripping the bracketed prefixes, or by removing mails with mailing list headers.
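Both checks are short to sketch. The regular expression strips one leading `[list-name]` tag while keeping any `Re:`/`Fwd:` in front of it. `is_list_mail` treats a message as list traffic if it has a `List-Id` header (RFC 2919) or a `Precedence` of `list` or `bulk`; the latter is a widespread convention rather than a standard, so treat it as a heuristic. Both helpers are hypothetical.

```python
import re

# Optional reply/forward markers, then one "[tag]" and any following whitespace.
LIST_TAG = re.compile(r"^(?P<reply>(?:(?:re|fwd?):\s*)*)\[[^\]]*\]\s*", re.IGNORECASE)


def strip_list_prefix(subject):
    """Remove one leading [list-name] tag, keeping any Re:/Fwd: markers before it."""
    return LIST_TAG.sub(lambda m: m.group("reply"), subject, count=1)


def is_list_mail(msg):
    """Heuristic: does this email.message.Message look like mailing-list traffic?"""
    if msg.get("List-Id") is not None:
        return True
    precedence = str(msg.get("Precedence", "")).strip().lower()
    return precedence in ("list", "bulk")
```

Running the length analysis twice, once on stripped subjects and once with list mail excluded, would show whether list prefixes account for much of the slope.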

It could be a rise in marketing email (though probably not spam), which might have different characteristics from artisanally crafted individual emails. Literature survey time! People who send mass marketing email really care about subject line length (or at least the people who market to email marketers like to write about it, when they’ve run out of other things to write about). One of these studies, which analysed 9 million emails sent in February 2015, let drop that their average subject line length was 41-50 characters, which suggests that marketing mail at that moment in time matched my average, or was maybe slightly shorter. (The most conscientious of these marketing marketeers, incidentally, conclude that subject line length makes no difference to email open rates.)

It might be related to growing screen sizes. If you have more horizontal space to type a subject line, you might tend to stuff more into it. I should compare it with this browser screen resolution dataset from statcounter. It’d be hard to make a causal connection from them both rising at the same time, but there may be some discontinuities in average screen size that correlate with, for example, whatever weirdness is happening in the subject line data in 2009-2011 (could just be my weird data, though). If monitor size is a factor, it’s surprising that the rise of mobile hasn’t slowed the curve. Or maybe it has: I could filter out mobile messages and see if they’re shorter.
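No header reliably marks a message as written on a phone, so any mobile filter is a guess. This sketch assumes two heuristics: an `X-Mailer` header mentioning iPhone or Android, or a default “Sent from my …” signature in the first plain-text part. `looks_mobile` is hypothetical and will certainly miss plenty of mobile mail.

```python
import re

# Default mobile signatures such as "Sent from my iPhone" (an assumed heuristic).
MOBILE_SIGNATURE = re.compile(r"^\s*Sent from my \w+", re.MULTILINE)


def looks_mobile(msg):
    """Guess whether an email.message.Message was written on a phone."""
    mailer = str(msg.get("X-Mailer") or "").lower()
    if "iphone" in mailer or "android" in mailer:
        return True
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the header
                text = payload.decode("utf-8", errors="replace")
            return bool(MOBILE_SIGNATURE.search(text))  # only the first text part counts
    return False
```

Splitting the daily averages into mobile and non-mobile buckets would then show whether phone-written subjects are shorter.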

Finally, it could just be that people say more in email subject lines these days. Not sure how you’d check that specific factor: it would be good to confirm that, say, word length was going up also. Odd that it’s such a consistent process, though.
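A sketch for separating “more words” from “longer words”: compute both for a batch of subjects (one batch per year, say) and see which one moves. Splitting on whitespace is a crude stand-in for real tokenisation, and `subject_stats` is a hypothetical helper.

```python
def subject_stats(subjects):
    """Return (mean words per subject, mean characters per word), ignoring empty subjects."""
    word_counts = []
    word_lengths = []
    for subject in subjects:
        words = subject.split()
        if not words:
            continue
        word_counts.append(len(words))
        word_lengths.extend(len(word) for word in words)
    if not word_counts:
        return 0.0, 0.0
    return sum(word_counts) / len(word_counts), sum(word_lengths) / len(word_lengths)
```

For `["hello world", "hi", ""]` this gives 1.5 words per subject and 4.0 characters per word. If the first number climbs year on year while the second stays flat, people really are saying more.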

What are your theories?

2016-02-12

Contractually Required Blogpost

Well, I was hoping to present some stats about subject line lengths over the centuries, but this Python program seems to have a very conservative estimate of how many emails I wrote from 2001-2007. I’ll look at the code again tomorrow.

I just established with Ada and Milo’s help that modern American children know the “Baby Bumblebee” song. I wonder what it used to take for kids’ songs to cross continents (I didn’t learn this song in the UK)? I wonder if it’s easier now?

While I’m talking about old standbys, how Sesame Street successfully battled the gods of ancient Egypt for the soul of a small child is once again doing the rounds: Against Big Bird, The Gods Themselves Contend In Vain.

2016-02-11

Coding underwater

Part of my job is keeping up with a narrow subset of news. Being offline from Twitter has been strange for that: I hear news when people tell me. It’s a bit like when you come out of the swimming pool, and your ears are still full of water. I can still hear, but it’s muffled, at a distance. (“Now you have people to read Twitter for you,” says Liz consolingly.)

The lack of Facebook I haven’t noticed so much, but it was Twitter that was making me anxious. I’m already dealing with the consequences of a couple of minor Twitter skirmishes second-hand. I can’t work out whether it’s easier to be calming, or whether I’m just a hypocrite for giving advice from the sidelines. Oddly, my continuing Tumblr habit is still pretty calming. Tumblr can get red hot with internecine warfare (possibly because of the same porous private/public boundaries, contextless reblogging and hot-potato passing that Twitter enables), but I’ve adopted a somewhat lower level of people to follow, a distance away from my own circles. They’re not far away from the frontlines, and you occasionally hear a burst of gunfire, but in general it is quieter there.

I’m taking the time to continue to do digital maintenance. I moved a bunch of very ancient mailspools into somewhere less vulnerable. The earliest is from August 1997; I still remember my annoyance when I lost the rest of them by failing to pick up my backup CDs from Wired when I left.

Looking through them, I wasn’t surprised that the volume was smaller (despite feeling overwhelming at the time). But even the subject lines seem shorter, look:

(Apologies for any privacy squick for anyone listed. Hey, it’s all meta-data, right?)

I blame wider screens. Of course what I should do now is actually do some data-mining of subject lines (and email sizes) and see how they’ve grown over time. ACTUAL CODE AND DATA.

Talking of code, here’s something I did for yesterday’s post. My vision of writing online always had some element of code mixed with words. It was part of what fascinated me about the Dynabook. Back when it would sound funny rather than horrid, I would always say that I preferred my fiction with code examples.

So in yesterday’s blog post, there’s a tiny piece of code. It just randomly shuffles the multiple links to tone argument definitions, because I didn’t want to privilege one version of the story over another. If I’d had more time I would have worked out a way to make it a bit more visible, but as it is, it ate about an hour of my time, which is why I’m not eagerly diving headfirst into learning email parsing and MATLAB right now. But I do want to try and integrate code into my writing more. Paul Ford can’t have all the fun!

I was pleased that I could just stick the code into my blog post, like it was just so much more HTML. My Javascript is rusty, so it took me a while to make it sufficiently self-contained. Here’s the code:

The main function does something called a Fisher-Yates shuffle, which I’d never heard of until I googled for how to do a shuffle in Javascript and found Frank Mitchell’s only way to shuffle an array in Javascript. Like everyone else, I code by googling these days.
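The original snippet was Javascript; for illustration, here is the same Fisher-Yates idea sketched in Python (in real Python code, `random.shuffle` already does this). The detail that matters is that each swap partner is drawn only from the part of the list not yet fixed in place. Drawing it from the whole list instead looks equivalent but produces a subtly biased shuffle.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle a list in place with the Fisher-Yates (Knuth) algorithm and return it."""
    # Walk from the last index down to 1; position i is final after its swap.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both ends, so j may equal i
        items[i], items[j] = items[j], items[i]
    return items
```

For a list of link elements, every ordering comes out equally likely, which is what “not privileging one version of the story” needs.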