



home server

Well, that was an embarrassing amount of time having to engineer around forgetting a password. Nothing important lost, but an important lesson: if you write down a password (and you should write down a password), write it down correctly. Being clever and elliptical in the past is just frustrating in the present. Also true of secret societies. Lighten up!

Anyway, the password was for an empty Debian install onto a chromebox which I’d set up, but not actually populated with files and such, so no great loss. Except I had to learn how to install Debian on a chromebox again (shades of “Flowers for Debian”).

It remains a very promising base for a home server though. Asus Chromeboxen are still around $150, can be upgraded pretty easily, and installing a free operating system on them only costs you 3 sanity rolls, max. The machine is very quiet, tiny, and I think powerful enough if you stick some more RAM and SSD into it. My last home server has been happily doing its thing for a decade, and this, which I’m eying up as a replacement, has the same feel to it.

To be honest, the most exciting part of it was working out a way that I could encrypt its root hard drive, but somehow still ssh into it to type in the magic passphrase even before the thing had finished booting. This is a pretty good guide to doing that. Feels like magic to connect into a thing that hasn’t even booted into full Linux.


the inhuman search engine

There was a time when you could parlay a decent understanding of Google search (or any search) into a journalistic career. Journalists were, on the whole, trained to collect information through contacts and telephone calls, but at that time, they didn’t yet have a consistent grip on how to piece together stories from the Net. The majority of stories were built from legwork, not basic Internet skills. The pendulum is swinging the other way now I think. Many, many articles are now written that were spun from forwarded screenshots and searches. You can still get ahead a little from having advanced knowledge: there still remains a benefit, I believe, for journalists who know a little coding or a little statistics. But with the home base of journalism moving online, there’s almost certainly an emerging premium now for people who can simultaneously talk to computers and humans in languages they understand. Or maybe can use the Internet to peer into motivations and other intimacies, rather than uncover facts. A good example is Gwern and Andy Greenberg’s piece on the identity of Satoshi Nakamoto. There’s some serious understanding of a lot of tech in their research, but it was mostly undone by underestimating how strange human motivation can be. Why would someone try to plant a trail suggesting they were Nakamoto, with no obvious benefit? Strange motives sink plenty of research projects. But perhaps one of the conclusions of anyone who swims in the large scale view of conspiracy theories and fraud that the Net offers is that, absent a permanent cost, motivations can be truly random.

I was thinking this today, just because I got caught up in an excursion into fact-checking. Someone said something on a forum; I was mildly curious who they were. The forum didn’t publish names or emails, and the username was not unique and didn’t lead anywhere. But the forum used gravatars: those little icons that either show patterns or a user-configured image next to your post. Gravatars are based on your email address, which you enter to get a confirmation note when you post to some forums. The icon image itself is served from a URL based on an MD5 hash of your email.

There’s no known mathematical way to get from the hash to the email (touch wood). But the hash still leaks information. You can generate hashes from a set of possible email addresses. You can confirm a person has used a particular email address by checking that email’s hash (note there’s no guarantee someone is using their own email address; strange motivations can lead you down wrong paths). In this case, though, I was able to just search for the hash itself. I quickly found another account on a separate site using that same hashed gravatar, and where the user had used a more personal username. From the username I was able to try out an email address that matched the hash. And from that, I found a site that listed the person’s full name and address. All of this took me less than ten minutes.
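The candidate-checking step is easy to sketch in Python. This is only a minimal illustration, assuming the usual Gravatar convention of trimming and lowercasing the address before taking its MD5. The addresses and the helper names `gravatar_hash` and `match_candidates` are made up for the example.

```python
import hashlib

def gravatar_hash(email):
    """Return the MD5 hex digest of a trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def match_candidates(target_hash, candidates):
    """Return the candidate addresses whose hash equals target_hash."""
    target = target_hash.lower()
    return [c for c in candidates if gravatar_hash(c) == target]

# Hypothetical data: the "observed" hash is generated here so the sketch is
# self-contained; in practice it would be read off an avatar image URL.
observed = gravatar_hash("jane.doe@example.org")
guesses = ["jdoe@example.org", " Jane.Doe@Example.org ", "jane@example.com"]
matches = match_candidates(observed, guesses)
```

A match only confirms that somebody typed that address into the form, not that it belongs to them.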

I hadn’t really thought about using gravatars to expose identities before (others have). It would be a useful skill to have in a modern journalist’s toolkit though. I guess more intriguingly, it might be a tool that one could provide to journalists. I keep thinking about the narrow subset of all possible characters that the world’s email addresses, and indeed human names, inhabit. If you were to set about compiling and de-duping the world’s known spamming lists, how many of the world’s emails could you collect? How quickly could you brute force everyone’s full name, or a reasonably high percentage? Over 90% of the US population are covered by 200,000 surnames: how quickly could we get high coverage by combining those with the most popular first names? (I admit to first considering this when thinking about how one could independently track the extent and use of the Right to be Forgotten in the EU. Programmatically generate a significant percentage of all the possible names in the European namespace, then check the affected and unaffected search engine results for each.)

I would like journalism to be about creating new facts about the world, instead of reporting pre-existing facts or just propagating novel speculation.


Lengthy subject matter — email subject lines do seem to be getting longer!

I spent a little time over the weekend pursuing my theory that email subject lines have grown longer over time, based on the surprising terseness of subjects I observed in an old inbox of messages from 1998.

I have two ginormous corpora of outgoing and incoming email: one from about 1999 which fades out into 2007, and one picking up the slack from 2007 onwards. In total, they contain 2,732,487 messages. I measured the subject line length of each of these messages, threw that into a database along with their date, and plotted the average length for every day in the corpus.

Ta da!

The trend line suggests that subject lines have indeed been lengthening by an average of 1.2 characters a year since 1999.
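A trend line like that is just a least-squares fit, and its slope is the characters-per-year figure. Here is a minimal sketch of the arithmetic in plain Python. The function name `slope_per_year` is made up, and the data points are synthetic (placed exactly on a line rising 1.2 characters a year, with an arbitrary starting value), not taken from the real corpus.

```python
def slope_per_year(points):
    """Least-squares slope of (year, average_length) points, in chars/year."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance

# Synthetic points: an invented baseline of 30 characters in 1999,
# growing by exactly 1.2 characters each year.
synthetic = [(1999 + i, 30.0 + 1.2 * i) for i in range(17)]
slope = slope_per_year(synthetic)
```

On real, noisy daily averages the same calculation applies, with dates converted to fractional years first.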

What’s going on here? Is it just me? Is it just my email correspondents who are getting more long-winded? Did I make a mistake? If you’re curious, you can check my working, and try it out on your own emails: here’s the code I used to create the graph above. If you have email archives of your own to measure (and speak a little Python), that code can slurp up your email in mbox or Maildir formats, or in a notmuchmail database, and plot the results using matplotlib. You can also try out different processes, or check my conclusions, with the 115MB sqlite database of my own subject line lengths, available for download, or as a torrent.
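If you just want the gist of the measurement without the linked code, it boils down to something like the following. This is a rough sketch using only the standard library, not the script linked above; the function name `daily_average_subject_length` and the three sample messages are invented for illustration.

```python
import email
from collections import defaultdict
from email.utils import parsedate_to_datetime

def daily_average_subject_length(messages):
    """Map each date to the mean subject length of that day's messages."""
    totals = defaultdict(lambda: [0, 0])  # date -> [sum of lengths, count]
    for msg in messages:
        subject, date_header = msg["Subject"], msg["Date"]
        if subject is None or date_header is None:
            continue  # skip messages missing either header
        try:
            day = parsedate_to_datetime(str(date_header)).date()
        except (TypeError, ValueError):
            continue  # skip dates that cannot be parsed
        # Note: this measures the raw header, so RFC 2047 encoded
        # subjects are counted in their encoded form.
        totals[day][0] += len(str(subject))
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}

# Invented sample messages, standing in for a real archive.
raw = [
    "Date: Mon, 02 Feb 1998 10:00:00 +0000\nSubject: lunch?\n\nbody",
    "Date: Mon, 02 Feb 1998 15:30:00 +0000\nSubject: re: lunch?\n\nbody",
    "Date: Tue, 03 Feb 2015 09:00:00 +0000\nSubject: Your weekly digest is ready\n\nbody",
]
averages = daily_average_subject_length(email.message_from_string(r) for r in raw)
```

For a real archive, the standard library's `mailbox.mbox` or `mailbox.Maildir` objects can be passed straight in, since iterating over them yields messages.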

Some potential explanations I’ve been mulling. It could be an artifact of a growth in mailing list traffic whose subject lines are prefixed by a mailing list name (i.e. “[mylist] hello everyone”). I could check that by removing square brackets or mails with mailing list headers.
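Both of those checks are small. A minimal sketch, assuming list tags only appear at the very start of the subject and that list software sets a `List-Id` or `List-Unsubscribe` header; the names `strip_list_tags` and `is_list_mail` are made up:

```python
import re

# One or more "[something]" tags at the start of the subject line.
LIST_TAG = re.compile(r"^\s*(?:\[[^\]]*\]\s*)+")

def strip_list_tags(subject):
    """Remove leading "[listname]" style tags from a subject line."""
    return LIST_TAG.sub("", subject)

def is_list_mail(headers):
    """Guess whether a message came via a mailing list from its header names."""
    names = {name.lower() for name in headers}
    return "list-id" in names or "list-unsubscribe" in names

cleaned = strip_list_tags("[mylist] hello everyone")
```

This deliberately leaves replies like “Re: [mylist] hello” alone, so a real run would want to decide how to treat those.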

It could be a rise in marketing email (though probably not spam), which might have different characteristics from artisanally-crafted individual emails. Literature survey time! People who send mass marketing email really care about subject line length (or at least the people who market to email marketers like to write about it, when they’ve run out of other things to write about). One of these studies, which analysed 9 million emails sent in February 2015, let drop that the average subject line length was 41-50 characters, which seems to suggest that marketing mail at that moment in time matches my average, or is maybe slightly shorter. (The most conscientious of these marketing marketeers, incidentally, conclude that subject line length makes no difference to email open rates.)

It might be related to growing screen sizes. If you have more horizontal space to type a subject line, you might tend to stuff more into it. I should compare it with this browser screen resolution dataset from statcounter. It’d be hard to make a causal connection from them both rising at the same time, but there may be some discontinuities in average screen size that might correlate with, for example, whatever weirdness is happening in the subject line in 2009-2011 (could just be my weird data, though). If monitor size is a factor it’s surprising that the rise of mobile hasn’t slowed the curve though. Or maybe it has: I could filter out mobile messages and see if they’re shorter.

Finally, it could just be that people say more in email subject lines these days. Not sure how you’d check that specific factor: it would be good to confirm that, say, word length was going up also. Odd that it’s such a consistent process though.

What are your theories?


Contractually Required Blogpost

Well, I was hoping to present some stats about subject line lengths over the centuries, but this Python program seems to have a very conservative estimate of how many emails I wrote from 2001-2007. I’ll look at the code again tomorrow.

I just established with Ada and Milo’s help that modern American children know the “Baby Bumblebee” song. I wonder what it used to take for kids’ songs to cross continents (I didn’t learn this song in the UK)? I wonder if it’s easier now?

While I’m talking about old standbys, how Sesame Street successfully battled the gods of ancient Egypt for the soul of a small child is once again doing the rounds: Against Big Bird, The Gods Themselves Contend In Vain.


Coding underwater

Part of my job is keeping up with a narrow subset of news. Being offline from Twitter has been strange for that: I hear news when people tell me. It’s a bit like when you come out of the swimming pool, and your ears are still full of water. I can still hear, but it’s muffled, at a distance. (“Now you have people to read Twitter for you,” says Liz consolingly.)

The lack of Facebook I haven’t noticed so much, but it was Twitter that was making me anxious. I’m already dealing with the consequences of a couple of minor twitter skirmishes second-hand. I can’t work out whether it’s easier to be calming, or whether I’m just a hypocrite for giving advice from the sidelines. Oddly, my continuing Tumblr habit is still pretty calming. Tumblr can get red hot for internecine warfare — I think possibly for the same porous private/public boundaries, contextless reblogging and hot-potato passing that Twitter enables — but I’ve adopted a somewhat lower level of people to follow, a distance away from my own circles. They’re not far away from the frontlines, and you occasionally hear a burst of gunfire, but in general it is quieter there.

I’m taking the time to continue to do digital maintenance. I moved a bunch of very ancient mailspools into somewhere less vulnerable. The earliest is from August 1997; I still remember my annoyance when I lost the rest of them by failing to pick up my backup CDs from Wired when I left.

Looking through them, I wasn’t surprised that the volume was smaller (despite feeling overwhelming at the time). But even the subject lines seem shorter, look:

(Apologies for any privacy squick for anyone listed. Hey, it’s all meta-data, right?)

I blame wider screens. Of course what I should do now is actually do some data-mining of subject lines (and email sizes) and see how they’ve grown over time. ACTUAL CODE AND DATA.

Talking of code, here’s something I did for yesterday’s post. My vision of writing online always had some element of code mixed with words. It was part of what fascinated me about the Dynabook. Back when it would sound funny rather than horrid, I would always say that I preferred my fiction with code examples.

So in yesterday’s blog post, there’s a tiny piece of code. It just randomly shuffles the multiple links to tone argument definitions, because I didn’t want to privilege one version of the story over another. If I’d had more time I would have worked out a way to make it a bit more visible, but as it is it ate about an hour of my time, which is why I’m not eagerly diving headfirst into learning email parsing and MATLAB right now. But I do want to try and integrate code into my writing more. Paul Ford can’t have all the fun!

I was pleased that I could just stick the code into my blog post, like it was just so much more HTML. My Javascript is rusty, so it took me a while to make it sufficiently self-contained. Here’s the code:

The main function does something called a Fisher-Yates shuffle, which I’d never heard about until I’d googled for how to do a shuffle in Javascript and found Frank Mitchell’s only way to shuffle an array in Javascript. Like everyone else, I code by googling these days.
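For reference, a Fisher-Yates shuffle fits in a few lines. This is an illustrative sketch in Python, not the Javascript from the post; the function name and the sample list of links are made up.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Shuffle a list in place so that every ordering is equally likely."""
    # Walk backwards, swapping each position with a randomly chosen
    # position at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive, so j may equal i
        items[i], items[j] = items[j], items[i]
    return items

links = ["definition A", "definition B", "definition C", "definition D"]
shuffled = fisher_yates_shuffle(list(links))  # copy, so links keeps its order
```

The point of letting `j` equal `i` is that an element is allowed to stay where it is; leaving that case out biases the result.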

Emergent themes

Look! Another no-publicity big-star tv-imitating-but-not-actually-tv feature! One more, and we shall have a trend!

Looks like the Flirble Organization has finally sublimated. I must write a proper obit for it, and, which held together so much of the early British Internet scene. In the exodus, I’m temporarily stashing my decades-old home domain, on an Amazon instance until I can find it a better home.

It’s pretty hard to navigate AWS’s billing system, but when I did, I found that I’d been paying them 3 cents a month for … quite a while. Digging around, I found that I’d already used it as a potential escape route — I created a backup copy of oblomovka from the time of the Haystack Affair. I don’t know if I ever actually switched Oblomovka over to that after Oblomovka started getting a lot of hits, but it’s been patiently waiting to deal with the failover ever since.

I really can’t escape the distant past in this posting series, can I?

I’ve often wondered what I would have done differently with Haystack, if I had the opportunity to go back in time. It seems like it was one of the first of a general rise in the j’accuse mode of dealing with issues in public infosec projects. I don’t do that sort of activism any more, I think because it’s far too stressful on everyone involved, and had a lot of less than optimal outcomes. The hope is that you can get people out of a bad situation quickly with gentler strategies.

I think this may be another emergent theme, though: large explosions of public group emotional intensity may be suspicious. I am certainly suspicious of them, and these days I actively avoid such events, perhaps a little too much. They are contagious, and defining — and are often effective.

It feels to me that part of the current meta-debate online is how emotion should be moderated online. What emotions should you express? What are you allowed to do or say with emotion as your impetus? Who is showing emotion, and who is showing no emotion? (Think of the discussions about trolling and harassment, of civil behaviour and safe and trusted platforms.) Who is deploying emotion, who is authentically demonstrating their emotion, what emotions can you/should you/must you empathise with? Which ones can you/should you/must you reject?

When I am discussing something intensely online (yes that is a euphemism for “being in a flame-war”), I am very emotional. I pace around, am distracted, am twitchy. A few times I’ve asked the other person in the argument how they feel, and I’m surprised when people say that they’re not feeling any emotion at all. Even when they’re writing twenty replies in an hour. Can that be true? I assume good faith, even in an Internet fistfight, but I find it hard to imagine. I have also noted that I have had to explicitly say I’m feeling emotional, because my written style never indicates that, because I’m usually trying to maintain the form of a “correct” Internet discussion.

It feels like one of the shifts in the last few years has been the acceptability of expressing strong emotion in discussion, especially in public debate. The first time the tone argument (&c, &c, &c, &c, &c) was identified as a trope in online discussion was also the point where people realized that being angry didn’t always reduce your points to rubble. That anger might actually help emphasise and underline your point. That it might be dishonest and unbalancing to discredit it or put it to one side.

Yet when I say that, I am suffixing the description of this shift with “at least in one of the subcultures that might make a claim to define the broad parameters of Internet discussion.”

But what does *that* mean, in an Internet of billions?

I just spent a good 20 minutes attempting to eke out the first use of the phrase “tone argument.” I’m pretty sure most of my trails end just pre-Racefail, a seminal moment which brought many of these issues to a head in the online English-speaking science fiction community. But note that despite carefully picking out a broad set of sources above, I know at least two of the authors personally, heck I live with one of the founders of the definition sites linked to, and am probably within two hops, or 500 miles of almost all of the other authors. All of them come from political viewpoints that, while scattered across a political spectrum, are shared by a tiny (but growing?) percentage of the population, even in the countries they write from. Those countries, meanwhile, are all Western, and all in the anglosphere.

That parochialism used to be less weird. But given that part of this discussion is about diversity, it begins to get weirder. Much of the form of Internet discussion is formed by the protocols, and later the platforms that dominated it early on. But is it also defined by broad cultural rules that spread through that medium? Barlow’s Declaration has its force because it came from the epicenter. Now it feels like the strongest, most generative part of the current zeitgeist is a critique of that centering. But much of its most forceful forms come from incredibly close to the same epicenters, the same sources.

(I do apologise if none of this makes any sense to you! These are disjointed notes on my thinking rather than anything more substantial or coherent. I’m also a little weirded out by how often I refer to myself in this. I think there’s an eventual version of this that doesn’t sound quite so personal or egocentric, but for now I’m stuck with being inside my own head, a place full of my personal effects.)


Sick beats… paper? scissors?

Still incoherently poorly. I ended up trying to just poke some old emails, since I knew I’d be too lightheaded to feel entirely guilty at not replying to them, even though I should be.

I think the only meta-thought I had was about why this blog is so consistently retrospective, when I don’t believe I mull over the past that much. I certainly feel a little embarrassed talking about the past to other people: but perhaps that means that I think about it a lot, but it gets blocked at the level of action, so I don’t receive any feedback about it?

I’d much rather think about the unconstrained future! Or the promising present.

Well, one of those ancient emails is still relevant. Bobbie Johnson sent out a mail at the start of Ghost Boat, Medium’s investigative journalism project to discover what happened to 243 people who were supposed to travel from Libya to Italy in a refugee boat — but who disappeared. It’s still ticking along, driven by the momentum of its team, and their audience, who continue to eke out new leads.

There’s something in this, and Serial, and many of the Patreon projects I see, where a research project is drawn forward by its own supporters. A set of works that would normally be constrained by time (because periodicals don’t just pay for one story, and people usually need to move on in their lives), that are now stretching, becoming people’s sole pursuit. It’s not unusual: plenty of people work at one thing for a large period of their lives. But it’s a new way of creating that venture. Is it any more or less predictable or stable than other long-term sources of resources or minor income? Does it lead to a different pattern of investment? Different projects selected?


Interdependence Day

I don’t know what I was doing when Barlow’s Declaration came out. Looking now through some internal landmarks to orientate myself, I think I must have joined the exodus from Wired UK to Virgin Net a couple of months before its February 1996 dateline. The Wired UK essay was sent out a year, less a day, from the Declaration.

I wouldn’t be surprised if I missed it entirely. I don’t think I was hugely enamoured with West Coast techno-utopianism during this period.

What’s surprising, after placing it in the chronology, is how late that date feels. EFF had been around for six years; Wired magazine for three years, the Web for two years or so. The Californian Ideology, probably the most prominent critique of Barlow’s Jeffersonian framing, came out months before it did, in the Autumn of 1995.

It’s also worth digging around to see what the contemporary critiques of the Declaration were. At the time, I remember them as being pretty shoddy: not in terms of the points they made (which were significant, but largely obvious), but in their rhetorical heft. Zeitgeist doesn’t mean everyone thinks the same at the same time; it means that some ideas obtain a velocity that their critics, fighting headwinds, can only dream of achieving.

I wish I could understand more of this German one, awesomely named Die Anti-Barlow. The formatting obscures whether its conclusion is supposed to be a quote from John Perry, or another English-speaker, but it hangs in the air:

“Dominate culture today and you control the laws in 15 years.”

Five years on!


Horace and Pity

I’m sick again, which is hopefully not the leitmotif of 2016. Nothing serious, just a cold, but I’d barely recovered from the last bout of flu. So I’m mostly sleeping, ssh’ing into things to move stuff out of the shutting-down coloc, and watching Louis CK’s Horace and Pete, which is like a little off-off-Broadway production if community theater had HD cameras, Steve Buscemi, Jessica Lange, Alan Alda, and Paul Simon. I don’t mean that in a bad way!

I appreciate CK’s deliberate attempts not to pre-publicise. The first anyone heard about the show was a short mail from him to his subscribers, announcing just the show’s title and the price, $5, payable in PayPal, Amazon, Bitcoin and the rest. A day or so later he explained a bit more:

Part of the idea behind launching it on the site was to create a show in a new way and to provide it to you directly and immediately, without the usual promotion, banner ads, billboards and clips that tell you what the show feels and looks like before you get to see it for yourself. As a writer, there’s always a weird feeling that as you unfold the story and reveal the characters and the tone, you always know that the audience will never get the benefit of seeing it the way you wrote it because they always know so much before they watch it. And as a TV watcher I’m always delighted when I can see a thing without knowing anything about it because of the promotion. So making this show and just posting it out of the blue gave me the rare opportunity to give you that experience of discovery.

It’s a TV show that hasn’t been broadcast on anything like a television network. Not unheard of, but it also feels like a play and a personal project. Is television simply a format now: episodic, under two hours, a budget within these boundaries? I expect that Horace and Pete will end up on TV eventually, but then so do films.

It’s pretty good. It kept my attention through the headaches and coughing and woe and the is and the me. It’s consoling to watch someone do a Mike Leigh about people I am like, rather than people I don’t like. Fumbled lines and good-enough first takes, make me fall in love with you, always. It’s a toolkit of forms and performances being put to good use.


Thanking Hyperlinks For Their Service

Tidied up the sidebar a bit here. Happily deleted the Google Ads (what a strange and distracting experiment advertising proved to be. I mean universally, not here, where I think I got $10 or so across the decade. Entirely undistracted.). I felt sadder cutting down all the links to other people. The people are still here, but the destinations are long gone. I’ll replace them soon I hope, but I didn’t like the smack of anachronism a link to another person’s dead webpage had. That said, looking through some of the older blog entries here, maybe the Web and the Unixy way I had of looking at it was always a nostalgia-tainted vision of the future. Like we were recapitulating the dreams of the Seventies in an attempt to shove away the grip of the present. A short circuit.

I get the same generational cross-patch feel watching J.C.R. Licklider speaking in 1986. You can’t quite place where Licklider is in time here: he’s an old man, over 70, talking about man-machine prosthesis and virtual reality goggles as though they were ancient experiments. But you know that everyone there was looking in a straight line to the future, bucket-brigading these ideas out of the past, smuggling them past all those Eighties DOS boxes.

Those moments are disorienting, when a new future finds its secret history. When all the Rubyists began to find a joy (ha) and a history in Vim, a tool built for a different world; when young artists find themselves veering toward skills thirty-years gone instead of what they are supposed to learn in college. It’s not just about fashion, it’s about a second victory of an old school, on the verge of a total eclipse. There is a political analogy here; right now there always is with me.

(The other thing that’s caught my eye is differences in writing style in 2001. I’m possibly reading too much into a drily factual blog entry, but does even Glenn nowadays write like Glenn wrote then?)


petit disclaimer:
My employer has enough opinions of its own, without having to have mine too.