2004-07-29»
notes on: protecting your open discussion forum»
Jamie McCarthy is talking about how Slashdot defends itself from various attacks (from DoS to “just jerks”). I managed to resist the temptation to sit at the back shouting “FIRST POST!” until they threw me out.
So far there’s been a great slice of the life of trolls, extracts from their scripts, IRC chats, and the effects on sites like the Wil Wheaton site. Jamie says he’ll be putting his slides up on his blog later.
“The more an attacker has to lose, the less likely they are to attack your site.” What if they’ve given you money? Or they might lose their job? Or lose access to your site? “Seeing is gaming” means: if they know about the rule, they’ll try to beat it. So if you ban something and they find a way of bypassing the ban, they’ll keep finding new ways. On the other hand, if you remove the visibility of the result, they don’t know they’ve won. If they can score it, they will definitely game it. So trollers try to get Slashdot stories into the “Hall of Fame”, or try to get everybody to make them their Slashdot friends. Same thing with Orkut: it has a top ten, which inspires people to rack up worthless links. The classic example here is Slashdot karma. We made two mistakes with karma. One: we called it karma. Two: we made it a visible, unbounded number. When we made it into an adjective, karma-whoring collapsed. People want consistency — if you change the rules, innocent bystanders get mad. People don’t mind draconian rules if they’re consistent.
Traditional exploits are out of scope for this discussion. Forum attacks are flooding and spamming, and they don’t have to come through comments: any place your site accepts a link is the same, trackbacks included.
First defence is to make it expensive. Code can’t distinguish between an ingenious flood attack and a good discussion. Always close out old discussions on topical sites. Make the attacker spend time (geometrically increasing delays).
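(To make the “spend time” idea concrete, here’s a tiny sketch of my own, not Slashdot’s code, of a geometrically increasing posting delay: every recent post from the same source doubles the wait before the next one, so a flood gets expensive fast while an ordinary commenter never notices. All the names and thresholds below are invented for illustration.)

```python
import time

# Hypothetical sketch of "make the attacker spend time": each post from
# the same source within the last hour doubles the delay required before
# the next one (20s, 40s, 80s, ...). Names and numbers are made up.

BASE_DELAY = 20      # seconds a poster must wait after their first recent post
WINDOW = 3600        # only posts in the last hour count against you

def may_post(recent_post_times, now=None):
    """recent_post_times: timestamps of this source's earlier posts."""
    now = now if now is not None else time.time()
    recent = [t for t in recent_post_times if now - t < WINDOW]
    if not recent:
        return True
    delay = BASE_DELAY * (2 ** (len(recent) - 1))   # doubles with each recent post
    return now - max(recent) >= delay
```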
Or spend IPs. Open proxies are the main force multiplier for attackers here: we need to stop open HTTP proxies. HTTP is the new SMTP. Reputable anonymisers aren’t a problem; it’s the 10,000+ dumb open proxies. Slashdot tests the usual proxy ports (3128, 8080, 1080, 80) of people who comment, using an LWP::UserAgent patched to cope with multiple proxy tests. People complain about the port scans, but they understand. Since April, when we implemented this, the crap floods have diminished. Ask me again in a year to see whether this has worked.
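(Here’s a rough guess of my own at what that test might look like, in Python rather than Slashdot’s patched Perl: when someone comments, try to fetch a page you control through their IP on the usual proxy ports, and if the request gets relayed, you’re looking at an open proxy. The port list is from the talk; the URL and everything else is illustrative.)

```python
import urllib.request

# Probe the commenter's IP on the common HTTP proxy ports by asking it to
# proxy a request back to a page we control. A successful fetch means the
# host relays requests for anybody, i.e. it's an open proxy.
PROXY_PORTS = [3128, 8080, 1080, 80]
PROBE_URL = "http://www.example.org/proxy-probe"   # hypothetical page you serve

def is_open_proxy(ip, timeout=5):
    for port in PROXY_PORTS:
        handler = urllib.request.ProxyHandler({"http": f"http://{ip}:{port}"})
        opener = urllib.request.build_opener(handler)
        try:
            response = opener.open(PROBE_URL, timeout=timeout)
            if response.status == 200:
                return True       # the request was relayed back to us
        except Exception:
            continue              # closed port, refused connection, not a proxy
    return False
```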
Or spend accounts. You do have to ensure the email address is valid by sending a confirmation link. Watch out for robo-created accounts: they will usually come from a single domain. Three hundred accounts from hotmail = normal; a flood of accounts from evilbadmail.com = suspicious. You can make new users’ voices quieter, moderated by a human for their first post or two. If the account has to participate in a human way, that helps. If you want a low barrier to entry for your site, you need to preserve newbie posting rights, and then you can only track by IP.
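(The domain heuristic is easy enough to sketch; this is my own toy version, with an invented threshold and whitelist, not anything Jamie showed: count recent signups per email domain and flag the odd ones out.)

```python
from collections import Counter

# Toy version of "300 accounts from hotmail = normal, a flood from
# evilbadmail.com = suspicious". Threshold and provider list are guesses.
COMMON_PROVIDERS = {"hotmail.com", "yahoo.com", "aol.com"}

def suspicious_domains(new_account_emails, threshold=50):
    """Return email domains with an unusual number of recent signups."""
    counts = Counter(addr.split("@")[-1].lower() for addr in new_account_emails)
    return [domain for domain, n in counts.items()
            if n >= threshold and domain not in COMMON_PROVIDERS]
```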
As an example, someone at OSCON (trolls! trolls amongst us!) created an account, and posted nasty comments so that the IP was banned. But it didn’t work, because we took into account that users of good standing can still post (anonymous posting *was* banned, which explains why all my hot grits links got dropped).
Captchas don’t work. They simply require that a human be briefly present. They don’t reduce your rates enough: only helpful if you’re trying to move actions from millions down to thousands.
Other solutions: host troll discussions. Make it an easter egg, so they think they’re gaming you. They’ll preserve the site, and you can see what they’re doing. Visit where they chat. Offer a bounty (Wil Wheaton offered a $1000 reward for information leading to a prosecution). Or file a lawsuit: FreeRepublic filed for an injunction. Instead of writing code to block a troll, they put out a legal injunction.
Jamie’s summary for protecting your open discussion forum:
- Know their motivations.
- Know your resources: coders, moderators, hardware.
- Hide information judiciously.
- Encourage investment.
- Enforce rules consistently.
- Disallow poisoned input.
- Remove their leverage, like open proxies.
- Give yourself leverage to view and clean up.
- Design your game well.
notes on: book sales tell us about the state of the tech industry»
Tim O’Reilly is exposing his market research – this is sort of the sequel to a great talk Tim gave at Foo Camp, where his team crunched through O’Reilly book sales to try and work out new trends.
As ever, I missed the beginning. We join Mr O’Reilly as he is asserting very carefully that they have no idea what the correlation of these stats is with actual sales, then pointing out that MySQL just overtook Oracle. The stats themselves are a careful picking-over of U.S. book sales from BookScan (not just O’Reilly). What follows are hand-wavey notes with me scrabbling to keep up, but Tim says the actual graphs will be made publicly available soon.
.NET books just passed Java books in the last quarter of 2003.
Technology Supplier Market Share: Microsoft at 35%, all of open source at around 20%.
Programming language market share: 70% open source (including Java), 30% is proprietary (Visual BASIC, etc).
Version trend information — Photoshop, Dreamweaver, Flash: sort of bell-shaped, new versions spike and then slowly decline, the total zig-zaggy but basically constant. With Red Hat Linux, though, after Fedora, RH book sales collapsed. Might RH have made a bad decision? The total market went down pretty quickly.
Macintosh has 3% market share? Not in books – Mac books sell 25% of all computer books. (Linux 17%, Windows 55%).
Tim’s now showing a bunch of tree maps which aren’t really easy to describe. But that’s okay, because you can play with them yourself.
Question from audience: is there any geographical patterning? This is just U.S. sales, although they have U.K. data. Wanted to sort by geography, but didn’t have time.
Question from audience: will you be publishing your categories, the ontology you’ve deduced? Answer: yeah, we really want to, in our copious free time.
Are there any hotspots? Answer: Bay Area, Los Angeles, New York. Best tech shop is in Virginia. Weird, except they’re right next to Langley.
notes on: state of the unix kernel»
Greg Kroah-Hartman sez: we don’t have a development kernel anymore. He’s been a kernel developer for a long time, and is a kernel maintainer: PCI, USB, driver core, sysfs… “lotta crap like that”.
Pre-Bitkeeper, this is how they did it: maintainer sends patches to Linus. Wait. Resend if patches dropped. Lather, rinse, repeat.
January 2002 – the patch penguin lkml flame fest. People were worried about patches getting dropped. Linus decided to use BitKeeper; 2.5.3 was the first kernel managed with it. “License weenies got their panties in a bind, ah feh.” (Greg is like the Vin Diesel of kernel dev: I keep expecting him to shout “I! LOVE! THIS! DEVELOPMENT CYCLE!!!”.)
BitKeeper changed things a lot. BitKeeper maintainers got a lot more work. Linus sucked in all the patches and discarded the crappy stuff. Non-BK maintainers just sent patches as before.
Unexpected consequences: we knew what Linus was doing, so much better feedback. Bk2cvs and bk2svn trees. A bkcommits-head mailing list so you could see what was flying around.
All patches started to go through Andrew Morton; BK stuff goes through Linus.
Result: 2.6.0 = Two years of dev, 27149 different patches, 1.66 patches per hour. At least 916 unique contributors. Top developers handled 6956 patches. Ten patches a day for two years. People should research this data.
2.6.1 = 538 patches. 2.6.2 = 1472, 2.6.3 = 863, 2.6.4 = 1385… and it keeps going. By 2.6.7, 2306 patches. Something is wrong. This is the stable kernel, but we’re going faster — one million lines added, 700 thousand lines deleted: a third of the kernel. A *lot* of data, and this is *stable*. Feeling uncomfortable yet?
All patches end up in the -mm tree. All BK trees end up in the -mm tree. 17 different trees: acpi, agpgart, alsa, … So we have a staging area.
And 2.6.7 is the most stable kernel we’ve ever had. So we’re doing something weird, but right.
The -mm tree *is* the development kernel. Patches can go in, and if they suck, they can get out really quickly. Linus gets the good stuff, the -mm tree is the tester’s tree.
Almost everything is tested in the -mm tree before it goes to Linus.
Near future: Linus releases 2.6.x, maintainers flood him with patches, all have proved their stability in the -mm tree. We all recover, fix bugs. One month later, a new 2.6 is released.
Will 2.7 ever happen then? Maybe, if we have big intrusive patches: page clustering, a timer core rewrite, stuff that touches everything. Linus’ll fork 2.7, and all 2.6 changes will be merged into 2.7 daily. If 2.7 is unworkable, we delete it (so save your 2.7 patches). If 2.7 works, we merge it back into 2.6 and call it 2.8.
Summary: the kernel developers all like how this is working. No stable internal-kernel API, never going to happen, get used to it (syscalls won’t break). Drivers outside of the tree are on their own (quote: “you’re screwed”) (conclusion: get into the kernel, quick before it runs crazily away in a big honking yellow bus of ever-accelerating development.) Proprietary people: if you’re not in the kernel, Greg says he doesn’t know you exist. And if you don’t give back to the community, he doesn’t care, either. (“Just my opinion”)
Everything subject to change at any time. These days, Linus is running the stable tree, -mm is running the unstable tree. When they realise this, they might change their mind about how things are doing.
Q&A: ISVs like Oracle are going to go crazy, because they need to catch up with Greg and the kernel developers. They can’t just certify an ancient kernel and stick with that anymore.
Q&A: We don’t want the distros to fork, as happened when SUSE and Red Hat settled on differently patched 2.4 kernels.
Q&A: What about automated testing? Big companies have test suites – hopefully, they’ll start running them every night. Novell has a giant test department, and they’re ramping up to do nightly test runs. If they can get that under 24 hours, we have a big honking regression test.
Q&A: (from me) What are the limits on the speed of kernel development now? “We’re going so fast, there’s a huge rate of change. Andrew seems to scale pretty well; I’m maxed out; more people coming in. We’ll find out what the limits are.”
Q&A: How many people working on Kernel at IBM? Answer: don’t know, Greg emphasises that he’s only speaking for himself.
Q&A: When are you going to fix broken stuff, like FireWire? “Well, it should build.” (Heckle: is that your test suite then?) “Yes. We can’t test drivers. You need to test them yourselves.”
Q&A: The problem with automated tests for drivers is that they need hardware. Who is going to pay for that? “I’ve written drivers that I don’t have hardware for; I know someone who has debugged joystick drivers via IRC between Czechoslovakia and Israel. ‘Push the joystick left? Does it work?’” Some stuff isn’t easily testable. We need to fix the infrastructure. OSDL is trying to fix this.
Q&A: How can Andrew and Linus do anything but rubber-stamp thousands of patches? Answer: we’re not writing much code these days. It’s a trust network — I trust the people who send us stuff. And now we have a blame network, too, so if somebody sends me something that breaks, I know whose fault it is.
Q&A: Is there still a problem with tracing the origin of code? With BK, I can resolve every line to an email address and a name. The lawyers I speak to say that’s okay. All patches get a “Signed-off-by: name and email”. That’s a little more explicit. Not legally binding. Once again, it’s a blame issue. I’ve seen patches flow through five people. More people to blame!
Somebody was sending me code on plug and play, and it just seemed too good: it conformed with all the Microsoft specs, worked really well, made me very suspicious. So I made him prove it, send me all the places where he’d found the stuff. And I was finally happy with the provenance. Later it turned out he was seventeen years old. You can’t tell where you’re going to get excellent code from.
Oh, btw: Greg says there’s a really good piece about this switch in the latest issue of Linux Weekly News. I don’t have a subscription to LWN, for ridiculous personal reasons (I feel odd using paid-for info to compile NTK. It’s the same reason I don’t like taking journalist freebies: I believe I will be struck by lightning if I do. Previous experience indicates that belief to be justified.) But if I wasn’t so weirdly idiosyncratic, I’d pay their sub rates in a shot.
notes on: make magazine»
Make: Dale Dougherty, publisher of Make Magazine. Mark Frauenfelder is the editor. It will be coming out in Spring. They’re here to have a discussion; they don’t have a presentation, they just want to work out what they want to do with it. They’re here to increase their pool of contributors, rather than subscribers.
Three streams: Dale developed the Hacks series for O’Reilly. Really smart people are doing interesting, clever, nonobvious things with computers that other people would find useful. Another book project inspiration was “Hardware Hacking Projects For Geeks” – like replacing the sound chip in a Furby.
Started because Dale was in a cab with Tim O’Reilly saying “there isn’t a Martha Stewart in the technology space – somebody who rediscovered and recovered crafts and gave them to a wider public”.
A move from mass-manufacturing to individual manufacturing. Creating one-offs at home, using tech Dale’s seen at MIT Media lab. Current magazines are “cargo magazines” just telling you where to buy, not how to manufacture.
Mark Frauenfelder up now. He said the idea reminded him of the old Forties Popular Science magazines: back then it was cheaper to build than buy. Then buying became cheaper, so we lost that knowledge. But it’s always been a lot of fun, and the roles may reverse again, at least for customised stuff.
So example spread they’re showing is of Kite Camera photography. Feature is a low-cost, $10 camera with a silly putty timer. Ebay as a parts supplier for everyone. (Looks good, although in the cutthroat British magazine industry, I can already hear Future and Dennis leaping like jackals on these scraps and sending their first Make rip-off to print before this talk even finishes. “Make Format”? “Slaphappy Magazine” “Dig”. Yeah, “Digger”. With a soft-porn centerfold section called “The Dirty Digger”)
Going for a HOWTO feel, just to improve the overall documentation of these products. It’s valuable to preserve the dead-ends too, so people can take off with those, so they’ll be sticking
Somebody from audience suggesting talking to Fry’s about “MAKE kits”. Popular Science used to offer “buy the plans” services (like Altair, or Nascom). Mark wants to include the complete plans in the magazine itself.
Question: are you going to have a section for quick hacks? Yes: mobile, home entertainment, etc. Guy asking says he’s got a lot of hacks that are too small even for Webpages.
Question: have you seen Readymade? They have a MacGyver challenge, a similar sensibility. Mark used to work with the Readymade publisher. They see it differently — Readymade isn’t as geeky as Make is going to be. Mark likes a lot of Readymade, but Make involves technology. Followup question: what is your definition of technology then? Large and blurry — the spread of the tech metaphor to other areas.
Question: what about the legality issue. It’s an editorial decision, says Dale. In a sense, the Hacks book isn’t written for hackers, it’s written for people who can learn from hackers. But the grey areas move — burning CDs was a grey area a few years ago, and now? There’s a big difference between doing a hack, describing it, and then building it from the plans.
Dale: “we see the O’Reilly audience as a core, but we want to expand beyond that.” Like reading National Geographic sometimes. It’s not just procedure, it’s also personal.
Question: what kind of licensing? Ooooh, good question. Answer: it’s … complicated. We’re just seeking a non-exclusive license. When we work with photographers, they have their own mad rights world, so without paying them it would be hard to get them to shift to a Creative Commons world. My real goal is to build a community with this. The questioner is curious because she’s Indian, and keen to take this stuff and use it in India.
Question: are you familiar with Lindsay Publications’ books? They reprint old titles like “How to Make Your Own Foundry”.
Comment from audience: woodworking magazines really good guide, as are cooking magazines.
One feature is how to take good project shots, deliberately included so they get better pictures when people send stuff in.
Sorry, lost my note-taking ability then as I got into a discussion. Major bit of info: it’ll be 200 pages, quarterly. Subscription-driven, it sounds. Hybrid book/magazine: mook!
URL: http://make.oreilly.com/
notes on: edd dumbill on doap»
Hi, confusing switches-of-first-person-and-third fans, and welcome back to my OSCON incoherentfest. You join us slamming into the Marriott, late enough to miss the redeye all-Dyson triple-slam keynote completely. Sorry. I ended up staying up too late trying to stop Ada bowling into Larry Wall’s legs, tipping him over Nat Torkington’s hotel balcony, and other acts of chaotic-god toddler action. So you get the first post-keynote session: Edd Dumbill’s (whose name I always misspell) DOAP talk.
Aim: to cut down the amount of work a software maintainer has to do to get the news out about their work. Too many project registries: Freshmeat, OSDir, GNOME software map, blah, blah, blah. Hard to keep up to date. Flip-side is if you’re trying to maintain your own registry, it’s hard to keep track.
Goals: it had to cope with internationalized descriptions. There had to be tools for the creation and consumption of these descriptions. It had to have interoperability with other Web metadata – FOAF, Dublin Core, RSS.
Use cases: easy importing of details into registries; exchange between registries; automatic configuration of resources – finding CVS repositories or bug trackers; and assisting packagers.
Tried to learn from recent metadata successes and failures. Dublin Core is a double-edged sword; mostly goodness, great documentation, raising awareness. What it did less well: it underspecified in various ways, so there are questions about how to use certain terms (what’s “author”? Name? Email address? URL?). RSS: very messy history, suffered from underspecification too. (Edd was involved in RSS 1.0, which was very much not underspecified, if I recall.) ebXML is an electronic business vocabulary – boring, but they have schema and lots of documentation. HTML – hard to retrofit validation. Lessons Edd drew from this: docs, interop, schema, community.
XML or RDF, that is the question. Straightforward XML? Or RDF? XML comes in many flavours, though: merely well-formed; with a W3C XML Schema (huge processing overhead; Edd doesn’t like it); or RELAX NG (feels a lot more lightweight, but has its own issues). RDF provides “webby data”: semantics as well as syntax. Edd likes RDF.
Surveyed existing work: Freshmeat, SourceForge, GNOME, KDE, Open Metadata Framework, Advogato. Here he shows a huge spreadsheet of relative features like mirror site lists, purchase links, demo sites, licenses, etc. He thought the social relations between developers and projects were particularly important, because egoboo is so important. And screenshots! Must have screenshots!
Fields that weren’t anywhere else, but he stuck in: non-CVS repositories, wikis, more project roles: translator (woefully underrecognised), testers, etc. Spread the egoboo.
Biggest issue: how do you uniquely identify a project? You can’t use the name: names change, names clash. How about a URL? But what URL? What happens if you lose that domain? He picked the homepage, but what if the homepage moves? So Edd added an “old homepage” property. If two DOAP descriptions share a homepage (either their current or an old one), then they’re the same project.
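(The identity rule is simple enough that a sketch might help; this is just my reading of it, with invented data structures: two descriptions name the same project if any of their current-or-old homepages overlap.)

```python
# Two DOAP descriptions denote the same project if they share any homepage,
# current or old. The dict layout here is invented for illustration.

def same_project(desc_a, desc_b):
    """Each desc is a dict with 'homepage' (str) and 'old_homepages' (list of str)."""
    urls_a = {desc_a["homepage"], *desc_a.get("old_homepages", [])}
    urls_b = {desc_b["homepage"], *desc_b.get("old_homepages", [])}
    return bool(urls_a & urls_b)
```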
What about licenses? So that people can compare licenses across different projects, each license gets a unique URI. He’s defined URIs for common licenses. If you actually resolve those URIs you get an RDF file that points to the GNU site, etc. (Edd makes the argument that the FSF might move their license, so he’s pointing to his own URIs. But what if Edd loses his domain or gets hit by a truck? Not sure this isn’t just adding an indirection, rather than solving the problem.)
Shows a simple DOAP file. Looks good, nice and simple: the DOAP file pulls in the FOAF and RDF namespaces to define a bunch of stuff. Looks easily creatable with a template.
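(I didn’t copy down the actual file Edd showed, but here’s a guess at what generating one might look like, using Python’s rdflib. The doap: property names used here, name, homepage, shortdesc, maintainer, and the usefulinc.com namespace are the ones I believe DOAP uses; treat the details as illustrative rather than gospel.)

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Build a minimal DOAP description: a doap:Project with a name, homepage,
# an internationalized short description, and a FOAF maintainer.
DOAP = Namespace("http://usefulinc.com/ns/doap#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.bind("doap", DOAP)
g.bind("foaf", FOAF)

project = URIRef("http://www.example.org/myproject/")      # hypothetical homepage
maintainer = URIRef("http://www.example.org/myproject/#me")

g.add((project, RDF.type, DOAP.Project))
g.add((project, DOAP.name, Literal("My Project")))
g.add((project, DOAP.homepage, project))
g.add((project, DOAP.shortdesc, Literal("A small example project", lang="en")))
g.add((project, DOAP.maintainer, maintainer))
g.add((maintainer, RDF.type, FOAF.Person))
g.add((maintainer, FOAF.name, Literal("Jane Maintainer")))

print(g.serialize(format="pretty-xml"))
```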
Tools: you need a creator, a viewer, and a validator. Someone else wrote a DOAP-a-matic. There’s also a rel link for autodiscovering projects on webpages. Someone else has written a Firefox plugin that shows a colourful, human-readable version of the DOAP data if an HTML page links to a DOAP file. Edd is writing a toolkit for validating, written in Mono.
Participants: OSDir.com and the GNOME Software Map are already interested, and he’s looking to engage others. Needs more tools, like getting autoconf, distutils and MakeMaker to spit out info about the project they’re managing.
Q and As: Edd says he’s deliberately staying out of category discussions. You can add categories by just pointing to URI of the category — so you can point to Freshmeat categories, Debian tags, etc. And of course because it’s RDF, you can assert relationships between those categories.