


Archive for the ‘In lieu of social media’ Category


Lengthy subject matter — email subject lines do seem to be getting longer!

I spent a little time over the weekend pursuing my theory that email subject lines have grown longer over time, based on the surprising terseness of subjects I observed in an old inbox of messages from 1998.

I have two ginormous corpora of outgoing and incoming email: one from about 1999 which fades out into 2007, and one picking up the slack from 2007 onwards. In total, they contain 2,732,487 messages. I measured the subject line length of each of these messages, threw that into a database along with their date, and plotted the average length for every day in the corpus.
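For the curious, the measuring step can be sketched roughly as below. This is a minimal illustration and not the script linked later in this post: the table name `subjects`, the function names, and the choice to count the raw (undecoded) Subject header are all assumptions made for the example.

```python
import sqlite3
from email import message_from_string
from email.utils import parsedate_to_datetime


def record_subject_lengths(messages, db):
    """Store (day, subject length) for every message with a parseable Date header."""
    # Hypothetical schema: one row per message.
    db.execute("CREATE TABLE IF NOT EXISTS subjects (day TEXT, length INTEGER)")
    for msg in messages:
        try:
            day = parsedate_to_datetime(msg["Date"]).date().isoformat()
        except (TypeError, ValueError):
            continue  # skip messages whose Date header is missing or malformed
        # Assumption: measure the raw header text; encoded words are not decoded.
        subject = str(msg["Subject"] or "")
        db.execute("INSERT INTO subjects VALUES (?, ?)", (day, len(subject)))
    db.commit()


def daily_averages(db):
    """Return [(day, average subject length)] ordered by day."""
    return db.execute(
        "SELECT day, AVG(length) FROM subjects GROUP BY day ORDER BY day"
    ).fetchall()


# Tiny demonstration with two hand-written messages on the same day.
demo_db = sqlite3.connect(":memory:")
demo_messages = [
    message_from_string("Date: Mon, 05 Jan 1998 10:00:00 +0000\nSubject: hi\n\nbody"),
    message_from_string("Date: Mon, 05 Jan 1998 12:00:00 +0000\nSubject: lunch?\n\nbody"),
]
record_subject_lengths(demo_messages, demo_db)
demo_result = daily_averages(demo_db)
```

To point it at a real archive, the standard library's `mailbox.mbox("archive.mbox")` can be passed as `messages` (the file name is a placeholder), since iterating over a mailbox yields message objects with the same header access.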

Ta da!

The trend line suggests that subject lines have indeed been lengthening by an average of 1.2 characters a year since 1999.
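A figure like "characters per year" can be read off an ordinary least-squares fit of average length against date. The sketch below assumes the input is a list of (ISO date, daily average) pairs and that every day counts equally regardless of how many messages arrived on it; the actual graph may have been fitted differently, and `slope_per_year` is a name invented for this example.

```python
from datetime import date


def slope_per_year(daily):
    """Least-squares slope of average subject length over time, in characters per year.

    `daily` is a list of (iso_day, average_length) pairs covering at least two distinct days.
    """
    # Convert each day to a fractional year count so the slope comes out per year.
    xs = [date.fromisoformat(day).toordinal() / 365.25 for day, _ in daily]
    ys = [avg for _, avg in daily]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

Weighting each day by its message count would be a reasonable variation, since a day with three emails is a noisier estimate than a day with three thousand.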

What’s going on here? Is it just me? Is it just my email correspondents who are getting more long-winded? Did I make a mistake? If you’re curious, you can check my working, and try it out on your own emails: here’s the code I used to create the graph above. If you have email archives of your own to measure (and speak a little Python), that code can slurp up your email in mbox or Maildir format, or from a notmuchmail database, and plot the results using matplotlib. You can also try out different processes, or check my conclusions, with the 115MB sqlite database of my own subject line lengths, available for download, or as a torrent.

Some potential explanations I’ve been mulling. It could be an artifact of a growth in mailing list traffic, where subject lines are prefixed by a mailing list name (i.e. “[mylist] hello everyone”). I could check that by stripping square-bracketed prefixes, or by excluding mails with mailing list headers.
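Both checks are easy to sketch. The regular expression below assumes a list tag is one or more square-bracketed groups at the very start of the subject, and the header test assumes that `List-Id` or `List-Unsubscribe` is a good enough signal of list traffic; both are guesses for illustration, and the function names are made up.

```python
import re

# Assumption: list tags look like "[mylist] " and sit at the start of the subject.
LIST_TAG = re.compile(r"^(\s*\[[^\]]+\]\s*)+")


def strip_list_tag(subject):
    """Remove leading "[listname]" prefixes, leaving the rest of the subject alone."""
    return LIST_TAG.sub("", subject)


def looks_like_list_mail(msg):
    """Guess whether a message came via a mailing list, based on common list headers."""
    return msg.get("List-Id") is not None or msg.get("List-Unsubscribe") is not None
```

Note that a reply such as “Re: [mylist] hello” keeps its tag under this pattern, so the prefix would need widening before trusting the numbers.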

It could be a rise in marketing email (though probably not spam), which might have different characteristics from artisanally-crafted individual emails. Literature survey time! People who send mass marketing email really care about subject line length (or at least the people who market to email marketers like to write about it, when they’ve run out of other things to write about). One of these studies, which analysed 9 million emails sent in February 2015, let drop that their average subject line length was 41-50 characters, which seems to suggest that marketing mail at that moment in time matched my average, or was maybe slightly shorter. (The most conscientious of these marketing marketeers, incidentally, conclude that subject line length makes no difference to email open rates.)

It might be related to growing screen sizes. If you have more horizontal space to type a subject line, you might tend to stuff more into it. I should compare it with this browser screen resolution dataset from statcounter. It’d be hard to make a causal connection from them both rising at the same time, but there may be some discontinuities in average screen size that might correlate with, for example, whatever weirdness is happening in the subject line in 2009-2011 (could just be my weird data, though). If monitor size is a factor it’s surprising that the rise of mobile hasn’t slowed the curve though. Or maybe it has: I could filter out mobile messages and see if they’re shorter.

Finally, it could simply be that people say more in email subject lines these days. Not sure how you’d check that specific factor: it would be good to confirm that, say, word length was going up also. Odd that it’s such a consistent process though.

What are your theories?


Contractually Required Blogpost

Well, I was hoping to present some stats about subject line lengths over the centuries, but this Python program seems to have a very conservative estimate of how many emails I wrote from 2001-2007. I’ll look at the code again tomorrow.

I just established with Ada and Milo’s help that modern American children know the “Baby Bumblebee” song. I wonder what it used to take for kids’ songs to cross continents (I didn’t learn this song in the UK)? I wonder if it’s easier now?

While I’m talking about old standbys, how Sesame Street successfully battled the gods of ancient Egypt for the soul of a small child is once again doing the rounds: Against Big Bird, The Gods Themselves Contend In Vain.


Coding underwater

Part of my job is keeping up with a narrow subset of news. Being offline from Twitter has been strange for that: I hear news when people tell me. It’s a bit like when you come out of the swimming pool, and your ears are still full of water. I can still hear, but it’s muffled, at a distance. (“Now you have people to read Twitter for you,” says Liz consolingly.)

The lack of Facebook I haven’t noticed so much, but it was Twitter that was making me anxious. I’m already dealing with the consequences of a couple of minor twitter skirmishes second-hand. I can’t work out whether it’s easier to be calming, or whether I’m just a hypocrite for giving advice from the sidelines. Oddly, my continuing Tumblr habit is still pretty calming. Tumblr can get red hot for internecine warfare — I think possibly for the same porous private/public boundaries, contextless reblogging and hot-potato passing that Twitter enables — but I’ve adopted a somewhat lower level of people to follow, a distance away from my own circles. They’re not far away from the frontlines, and you occasionally hear a burst of gunfire, but in general it is quieter there.

I’m taking the time to continue to do digital maintenance. I moved a bunch of very ancient mailspools into somewhere less vulnerable. The earliest is from August 1997; I still remember my annoyance when I lost the rest of them by failing to pick up my backup CDs from Wired when I left.

Looking through them, I wasn’t surprised that the volume was smaller (despite feeling overwhelming at the time). But even the subject lines seem shorter, look:

(Apologies for any privacy squick for anyone listed. Hey, it’s all meta-data, right?)

I blame wider screens. Of course what I should do now is actually do some data-mining of subject lines (and email sizes) and see how they’ve grown over time. ACTUAL CODE AND DATA.

Talking of code, here’s something I did for yesterday’s post. My vision of writing online always had some element of code mixed with words. It was part of what fascinated me about the Dynabook. Back when it would sound funny rather than horrid, I would always say that I preferred my fiction with code examples.

So in yesterday’s blog post, there’s a tiny piece of code. It just randomly shuffles the multiple links to tone argument definitions, because I didn’t want to privilege one version of the story over another. If I’d had more time I would have worked out a way to make it a bit more visible, but as it is it ate about an hour of my time, which is why I’m not eagerly diving headfirst into learning email parsing and MATLAB right now. But I do want to try and integrate code into my writing more. Paul Ford can’t have all the fun!

I was pleased that I could just stick the code into my blog post, like it was just so much more HTML. My Javascript is rusty, so it took me a while to make it sufficiently self-contained. Here’s the code:

The main function does something called a Fisher-Yates shuffle, which I’d never heard about until I’d googled for how to do a shuffle in Javascript and found Frank Mitchell’s only way to shuffle an array in Javascript. Like everyone else, I code by googling these days.
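The snippet in the post was Javascript; as a rough sketch of the same Fisher-Yates idea, here it is in Python instead. The function name and the optional `rng` parameter are inventions for this example, not part of the original code.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle `items` in place so that every ordering is equally likely."""
    # Walk from the last position down to the second, swapping each element
    # with a randomly chosen element at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform choice from 0..i inclusive
        items[i], items[j] = items[j], items[i]
    return items
```

The detail that matters is drawing `j` from the shrinking range 0..i and not from the whole list each time; the naive version that always picks from the full range produces some orderings more often than others.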

Emergent themes

Look! Another no-publicity big-star tv-imitating-but-not-actually-tv feature! One more, and we shall have a trend!

Looks like the Flirble Organization has finally sublimated. I must write a proper obit for it, and, which held together so much of the early British Internet scene. In the exodus, I’m temporarily stashing my decades-old home domain, on an Amazon instance until I can find it a better home.

It’s pretty hard to navigate AWS’s billing system, but when I did, I found that I’d been paying them 3 cents a month for … quite a while. Digging around, I found that I’d already used it as a potential escape route — I created a backup copy of oblomovka from the time of the Haystack Affair. I don’t know if I ever actually switched Oblomovka over to that after Oblomovka started getting a lot of hits, but it’s been patiently waiting to deal with the failover ever since.

I really can’t escape the distant past in this posting series, can I?

I’ve often wondered what I would have done differently with Haystack, if I had the opportunity to go back in time. It seems like it was one of the first of a general rise in the j’accuse mode of dealing with issues in public infosec projects. I don’t do that sort of activism any more, I think because it’s far too stressful on everyone involved, and had a lot of less than optimal outcomes. The hope is that you can get people out of a bad situation quickly with gentler strategies.

I think this may be another emergent theme, though: large explosions of public group emotional intensity may be suspicious. I am certainly suspicious of them, and these days I actively avoid such events, perhaps a little too much. They are contagious, and defining — and are often effective.

It feels to me that part of the current meta-debate online is how emotion should be moderated online. What emotions should you express? What are you allowed to do or say with emotion as your impetus? Who is showing emotion, and who is showing no emotion? (Think of the discussions about trolling and harassment, of civil behaviour and safe and trusted platforms.) Who is deploying emotion, who is authentically demonstrating their emotion, what emotions can you/should you/must you empathise with. Which ones can you/should you/must you reject?

When I am discussing something intensely online (yes that is a euphemism for “being in a flame-war”), I am very emotional. I pace around, am distracted, am twitchy. A few times I’ve asked the other person in the argument how they feel, and I’m surprised when people say that they’re not feeling any emotion at all. Even when they’re writing twenty replies in an hour. Can that be true? I assume good faith, even in an Internet fistfight, but I find it hard to imagine. I have also noted that I have had to explicitly say I’m feeling emotional, because my written style never indicates that, because I’m usually trying to maintain the form of a “correct” Internet discussion.

It feels like one of the shifts in the last few years has been the acceptability of expressing strong emotion in discussion, especially in public debate. The first time the tone argument (&c, &c, &c, &c, &c) was identified as a trope in online discussion was also the moment when people realized that being angry didn’t always reduce your points to rubble. That anger might actually help emphasise and underline your point. That it might be dishonest and unbalancing to discredit it or put it to one side.

Yet when I say that, I am suffixing the description of this shift with “at least in one of the subcultures that might make a claim to define the broad parameters of Internet discussion.”

But what does *that* mean, in an Internet of billions?

I just spent a good 20 minutes attempting to eke out the first use of the phrase “tone argument.” I’m pretty sure most of my trails end just pre-Racefail, a seminal moment which brought many of these issues to a head in the online English-speaking science fiction community. But note that despite carefully picking out a broad set of sources above, I know at least two of the authors personally, heck I live with one of the founders of the definition sites linked to, and am probably within two hops, or 500 miles of almost all of the other authors. All of them come from political viewpoints that, while scattered across a political spectrum, are shared by a tiny (but growing?) percentage of the population, even in the countries they write from. Those countries, meanwhile, are all Western, and all in the anglosphere.

That parochialism used to be less weird. But given that part of this discussion is about diversity, it begins to get weirder. Much of the form of Internet discussion is formed by the protocols, and later the platforms that dominated it early on. But is it also defined by broad cultural rules that spread through that medium? Barlow’s Declaration has its force because it came from the epicenter. Now it feels like the strongest, most generative part of the current zeitgeist is a critique of that centering. But much of its most forceful forms come from incredibly close to the same epicenters, the same sources.

(I do apologise if none of this makes any sense to you! These are more disjointed notes on my thinking than anything more substantial or coherent. I’m also a little weirded out by how often I refer to myself in this. I think there’s an eventual version of this that doesn’t sound quite so personal or egocentric, but for now I’m stuck with being inside my own head, a place full of my personal effects.)


Sick beats… paper? scissors?

Still incoherently poorly. I ended up trying to just poke some old emails, since I knew I’d be too lightheaded to feel entirely guilty at not replying to them, even though I should be.

I think the only meta-thought I had was about why this blog is so consistently retrospective, when I don’t believe I mull over the past that much. I certainly feel a little embarrassed talking about the past to other people: but perhaps that means that I think about it a lot, but it gets blocked at the level of action, so I don’t receive any feedback about it?

I’d much rather think about the unconstrained future! Or the promising present.

Well, one of those ancient emails is still relevant. Bobbie Johnson sent out a mail at the start of Ghost Boat, Medium’s investigative journalism project to discover what happened to 243 people who were supposed to travel from Libya to Italy in a refugee boat — but who disappeared. It’s still ticking along, driven by the momentum of its team, and their audience, who continue to eke out new leads.

There’s something in this, and Serial, and many of the Patreon projects I see, where a research project is drawn forward by its own supporters. A set of works that would normally be constrained by time (because periodicals don’t just pay for one story, and people usually need to move on in their lives), that are now stretching, becoming people’s sole pursuit. It’s not unusual: plenty of people work at one thing for a large period of their lives. But it’s a new way of creating that venture. Is it any more or less predictable or stable than other long-term sources of resources or minor income? Does it lead to a different pattern of investment? Different projects selected?


Thanking Hyperlinks For Their Service

Tidied up the sidebar a bit here. Happily deleted the Google Ads (what a strange and distracting experiment advertising proved to be. I mean universally, not here, where I think I got $10 or so across the decade. Entirely undistracted.). I felt sadder cutting down all the links to other people. The people are still here, but the destinations are long gone. I’ll replace them soon I hope, but I didn’t like the smack of anachronism a link to another person’s dead webpage had. That said, looking through some of the older blog entries here, maybe the Web and the Unixy way I had of looking at it was always a nostalgia-tainted vision of the future. Like we were recapitulating the dreams of the Seventies in an attempt to shove away the grip of the present. A short circuit.

I get the same generational cross-patch feel watching J.C.R. Licklider speaking in 1986. You can’t quite place where Licklider is in time here: he’s an old man, over 70, talking about man-machine prosthesis and virtual reality goggles as though they were ancient experiments. But you know that everyone there was looking in a straight line to the future, bucket-brigading these ideas out of the past, smuggling them past all those Eighties DOS boxes.

Those moments are disorienting, when a new future finds its secret history. When all the Rubyists began to find a joy (ha) and a history in Vim, a tool built for a different world; when young artists find themselves veering toward skills thirty-years gone instead of what they are supposed to learn in college. It’s not just about fashion, it’s about a second victory of an old school, on the verge of a total eclipse. There is a political analogy here; right now there always is with me.

(The other thing that’s caught my eye is differences in writing style in 2001. I’m possibly reading too much into a drily factual blog entry, but does even Glenn nowadays write like Glenn wrote then?)

Permissionless society

I’m tentatively excited about keybase’s new filesystem, but I wonder if some of that excitement is simply because their directory structure — where I have a /keybase/public/<identifier> hierarchy that can be mounted by anyone, and a /keybase/private/<me> folder that is synced only between machines I attest as controlling — maps so well to the structure I’ve been trying to use in my own home directories for, gosh, over a decade.

The top-level directory in my ~danny/ has a Private and a Public folder. The Private directory is encrypted, and is linked into by a menagerie of symlinks whenever I find something that I wouldn’t want the world to see, from configurations to tax documents. The Public folder, in theory, contains everything I wouldn’t mind the world seeing. My ideal was that I’d just share ~/Public on a webserver, and I’d try to err on the Public side. In practice, I’ve never actually been brave enough to open up all of ~/Public. Too much private stuff gets emitted, even accidentally. As I was writing this, for instance, I realized that I had half-written a script that could be used to derive a relatively important password, and it was still slumped around in Public (I’ve always tried to keep all my ongoing code repositories on the Public side). Just the idea of auditing the vast stash that has mounted up in there has led to me growing ever more cautious.

I wish there was some middle ground between those two folders. But there isn’t, and that’s the world we live in. Unless I should mkdir ~/Obscurity one of these days.


petit disclaimer:
My employer has enough opinions of its own, without having to have mine too.