skip to main bit
a man slumped on his desk, from 'The Sleep of Reason Produces
      Monsters'

Oblomovka

Currently:

Author Archive

2004-07-28

notes on: crash course in database design: surviving when you don’t have a dba

Okay, now I’m late arriving at Dirk Elmendorf’s crash course in database design. We join him just after he’s explained why you should care about db design, and where to bury your DBA’s body when you’ve accidentally poisoned his coffee. I think.

So first steps: you have entities, and relationships. Entities are bits of data, relationships are connections between them. You don’t have to have relationships between every entity (he’s really seen databases that are like that).

Easiest example: a one-to-one relationship. A professor and a private office, say. So one professor has one office. Then there’s one-to-many. One classroom may have many classes occur in it. Many to one: a class has many classrooms. Then there’s the complicated one, the many to many: if a class was repeated in a bunch of classes.

SQL doesn’t handle many-to-many. So you have to insert something in the middle, so you have a many-to-one, and then a one-to-many. So in this case, you create a new entity: a class schedule. A class relates to a single time and data, and a classroom relates to a single date and time.

(Editorial confession at this point: I learn everything I know about database design from futzing around with the Access visual designer. Relational DBs give me the heebie-geebiesSo I’m now hitting the extent of my knowledge. I may be mangling what Dirk is saying here- d.)

Building your database: use consistent coding standards (you putzes!). Single and plural table neames, upper case, underscore, camelcaps, special table prefixes — e.g ref_units, where “ref” is a prefix for contant reference values.

Normalisation! Hooray! Normalisation is a set of rules to designing schmea. It helps you reduce redundancy. Redundancy is bad because it causes errors, and wastes resources. (Ed: Also I think there’s a commandment against it in the Bible. Or there was until they refactored them).

It eliminates database inconsistencies -poorly laid out data can provide false statements.

Normalization – first Normal form. The easiest one: all records have to save the same “shape” – the same rows, the same data. One way to cheat is to have a fake array in your table, like “author_name1”, “author_name2”, “author_name3”. What happens when you have more than three authors? What do you do when you delete an author. Author3 sometimes becomes a comma-separated field of the rest of the authors. Yuk. Address1,2,3 is very common – but databases can store newlines now.

Second normal form. Row can’t have partial key dependences. So having a field in the employee table marked “office_location”, then you’re mixing ideas, because if you delete all the employees, your office_location will disappear too. Move it out to its own table.

Third normal form. One non-key field cannot be a fact about another non-key field. So you can’t have a book table, with a publisher *and* a publisher’s address. You need to break that out again – a separate publisher table.

Fourth normal form. A record should not contain two or more multi-valued facts which are independent. So a class table that has “room” and “professor”. You really want to move those out into their own classes, because those facts aren’t related.

(Ed: bascally, databases seem to crave to become triplets. I can see why people turn to RDF religion after this.)

Fifth normal form: information cannnot be represented by several smaller record types. So basically, once you’ve gone for the four normal forms, stop breaking out the bloody tables, because you’re GOING TOO FAR.

Normalization – you can have too much of a good thing. Start at a normalized database and work backwards as you need to optimize.

Practical tips: ideally, put data checking into a central library, and then make sure all data is run through it before it enters the database.

Unfortunately, the usual case is that you don’t have enough control over access to the database – the db is used by multiple applications, languages, and the integrity of the data is more important than performance. So you need to put data-checking inside the db.

A primary key is a non-null unique identifier for a row, a composite key is a collection of fields that give you a non-null unique identifies. A foreign key is a primary key that’s stored in a different table. A foreign key constraint means that a foreign key has to appear as a primary key in another table.

Cascading updates and deletes. Cascade allows you to handle foreign tables that your application may not even be aware of. ON UPDATE CASCADE, handles updating of the foreign key. On delete cascade deletes the row that is referencing a foreign key which is being deleted from the primary table of the foreign key. ON DELETE CASCADE is daaaaangerous. You might want it to delete logs, but you really don’t want all records of this person disappearing when you delete something. ON DELETE CASCADE SET NULL – so you can just make individual record values to disappear.

Column restaraints. Default to NOT NULL, because NULL can cause real problems with sorting, etc. UNIQUE – you can use this for multiple columns.

CHECK – stuff like CHECK( age > 0). You’re looking for data validity, but you’re not putting business logic in your database. You want to leave that to a proper DBA.

Triggers! Advanced data checking, or to scrub data – automatically lowercasing stuff. Can also handle more advanced clean up – so deleting a single table can cause a trigger to clean other tables that are related and log the event to a log table. But triggers are daaangerous for programmers, because they’re hard to maintain in the development cycle, and invisible when debugging.

Dirk’s DBA told him to talk more about indexes. Indexes are cheat-sheets for the database to improve performance. But you need to pay attention to actual queries to figure where they are needed. Index a lot, but remove the ones that are not being used. Don’t bother indexing a column which has a small amount of possible values for a lot of rows. Boolean/value ids that have a short list.

Conclusions: DB design isn’t new or cutting edge, so there’s a ton of literature out there to help you learn more about database.

Just because you don’t have are not a DBA doesn’t mean you should build on top of a poorly designed database.

notes on: subvert this! developing with subversion on mac os x

Wow, it’s *really* crowded here. I couldn’t get into the discussion of Power Laws (now! with real stats!), so I’ve nipped into Brian Fitzpatrick‘s guide to using SVN with the MacOS X. I’ve missed the first few minutes, so let’s join Brian as he finishes explaining how Subversion kicks CVS’s HEAD in.

(Offstage: oof, ow, ugh.) … Binding surfaces is big with Subversion. Having a lot of ways to plug in your own code into a system is good (CVS just has a pipe). The Apache foundation are big on big binding surfaces for glue, because that’s how they felt Apache beat out Netscape server

Reasons why Subversion has better binding than CVS: Subversion is written in ANSI C, so it plays well with others. It uses SWIG for external language bindings, so instant Perl and Python APIs (Ruby and others not yet supported because nobody has stepped up to take the bat). Java support is via JNI.

Then there’s the API promise: between 1.0 – 1.XX the API will be binary compatible.

Subversion’s dependencies. There’s the Apache Portable Runtime and APR util. This gave us the capability to run on any platform that Apache runs on. The other dependency is the SWIG, the Apache server, and the Berkeley DB. Oh, and Neon – a client library for DAV operations.

That looks like a lot, but in fact the only one you really need is the APR and APR utils. The rest of the stuff is mainly for the DAV support. (In subversion 1.0, you needed berkeley database if you’re running a server, but the latest version has a backend that uses flatfiles. Good for NFS.)

Subversion has a bunch of libraries. libsvn_client – primary interface for client programs, libsvn_delta is the tree and diff routines (first non-GPL diff engine), libsvn_fs_base is a Berkeley database filesystem library, libsvn_fs_fs – the flat file equivalent. (Filesystem is just a way of describing the db storage; it’s not a real filesystem). Libsvn_ra is the repository access common utilities, and then libsvn_ra_dav, libsvn_ra_local, and libsvn_ra_svn — for DAV client-server communication, local communication, and SVN, subversion’s own client-server protocol (same as pserver in CVS).

Then there’s libsvn_repos, which is the high-level interface. There’s libsvn_subr, which is a misc subroutines. Libsvn_wc is the stuff to cope with SVN/ directories, the equivalent of the CVS directory in checked out copies.

Two Apache modules: mod_authz_svn, a special authorisation module, and mod_dav_svn which handles dav requests and converts them into subversion actions.

(Aside – there’s a SVN plugin for Tortoise CVS. And a Finder plugin, cool!)

Now Brian is going to build a SVN tool, using the Subversion libraries, Xcode, Interface Builder, PyObjC. I have a feeling I will be hand-wavily describing lots of GUI development now…

The subversion team has been converting CVS repositories into SVN for testing purposes. Brian’s working with the Apache 1.3 SVN repository for this demo, which he has locally. The mini app he’ll build will be a program that lets you drag and drop files and see log data on it.

Okay, lots of Interface Builder widgety goodness. He’s using a Filewell palette third-party widget, which I think is this filewell.

More Interface Builder linking of outlets and actions to a controller object. This is making me crave doing programming in Xcode.

Hah! Brian cheated by cutting and pasting a wdoge of PyObjC code! It’s pretty clean – there’s just an “import Logger” statement at the top to pull in the SVN Apache SWIG libraries. Works great.

Tools: SCPlugin, which is the aforementionedFinder plugin, just source code. Full-featured, has all the actions on a right-click button (I didn’t even know Finder had plugins). Eclipse supports subversion, and Xcode 2.0 will support subversion in the glorious Tiger future that Steve promises us all.

a camper at camp smalltalk

On my way to the first day proper of O’Reilly’s Open Source Conference. It seems much busier than last year: I guess something’s improving, whether it’s the economy or business interest in Open Source, or just a fading away of people’s reluctance to tempt terrorist ire by coming out of the woods and onto a plane..

I’ve been in Portland for a week or so now, hanging out as a U.N. Observer at Camp Smalltalk, which is like Camp X-Ray only with objects. Actually, it was a load of fun. Smalltalk has a strong community culture, which I think is one of the reasons that it’s produced such a disproportionately large amount of good practices and useful meta-programming techniques. That, and that when you kick up a Smalltalk session. you can do a “View Source” on the entire operating system’s code. I like Smalltalk.

I got to see Ward Cunningham slinging index cards, and Ralph “Gang of Four” Johnson hacking code. L. Peter Deutsch, virtual machine pioneer, was there. After years working on Ghostscript, he’s been tempted back to Smalltalk, and spent the week porting the Python bytecode compiler to output to the modern Smalltalk VM. He estimates he might be able to get a 10-50 speedup by doing that. If only I could have kidnapped him and dragged him to help out Dan Sugalski, due to be pied by the Pythonistas this week for failing (just!) to speed up Python by porting it to the new Perl6 VM. What a fine mongrel VM that would be.

Other interesting stuff: if you’re interested in new language constructs, you really should check F-Script, a Smalltalkic scripting language for the Mac. It’s strongly tied to the Objective C object model – a bit too tied, in fact, so like ObjC you can’t create your own new classes at runtime, just instantiate objects, which is a bit limiting in a scripting environment. The real magic of F-Script, though, is OOPLA, which is a deeply splendid merging of Smalltalk and APL. No, no, don’t run away – it’s good, even if you don’t know either language.

OOPLA is the syntactic sugar that let’s you send an array of messages to an object. Given that everything in Smalltalk is an object, that means you can turn most basic operations into iterations; which removes most of the need for loops. If you know basic Smalltalk syntax, check out Chapter 16-19 of the FScriptGuide.pdf. If you don’t know Smalltalk, and have a Mac, read the whole thing, and have a play around. It’s fun. (Somebody will now tell me that Ruby does all of these things. Must. Learn. Ruby.)

Other stuff: the Website to Croquet, Alan Kay’s next generation 3D desktop environment, is showing off some more screenshots. Looks like the Squeak-based software is going to see the light of day next month.

I also met James Foster, who is working on a badly-needly simplification of that whole appalling bug-tracking, task management software space. Just looking over his shoulder was fun; I can’t wait to see the final results.

Okay, now I’m typing in the middle of Tim O’Reilly’s keynote, which is distractingly good. I’m sure other people will blog it better, but if anyone was wondering, when O’Reilly showed that book sales almost exactly matched the relative cost of adwords in for those keywords, it was me who very loudly went “woah!”. Information wants to be smuggled out via leaky patterns.

2004-07-12

linux user and developer

Aha. So, I’ve been writing a column for Linux User and Developer, the UK Linux magazine (which also did a fairly well-regarded expo). It’s a fun gig. I haven’t received payment for the last three months, which sadly doesn’t appear to be that unusual in the publishing industry. But you know, why should that stop me having fun?

Some time last week, LinuxUser’s pages on the Live Publishing site disappeared. Uh-oh.

I’ve just called Live Publishing, and their receptionist has instructed any queries to be redirected to Alan Coleman at Unity. Unity “provides constructive advice and effective solutions to all insolvency-related problems”.

Uh-oh.

Can’t get through to speak to Alan yet. I see the expo wing of the company has merged with LinuxWorld. I suspect that was a bit of a one-sided merge.

The odd thing is that, rather naively, I thought that LinuxUser was a magazine being published by Live Publishing rather than a separate company that could hit the skids all on its own. The LinuxUser pages in the Google cache all say “copyright Live Publishing”. I wonder how long it has been this separate, glorious, but sadly a bit cash-strapped institution?

Discuss

2004-07-11

magnetic storage future history

In December of 2002, I uploaded a screen-captured table showing IBM’s estimates for the cost of a terabyte over the next eight years. I couldn’t be bothered to convert it into HTML. Eighteen months on, Adrian Furby did just that. This shows there’s some “can I have some more”‘s law of the lazyweb or something, and that you should optimise for laziness and early public whining instead of planning ahead. I’ve added it to the page.

I just checked with NewEgg for prices, and with a $61 80GB harddrive, we’re four bucks short of right on target. We should be down to a sub-$500 terabyte some time next year.

2004-07-01

freeware, shareware and not-meant-to-be-eitherware

I’m downloading the demo version of a repartitioning application. I can’t help thinking this is a bad idea. “You’re 30 days are up! Now re-repartioning your drive.”

You have to admire their business savvy though, releasing a MacOS partition resizer the day that Tiger hits the file-sharing networks.

2004-06-30

i’m back

… with a motherboard and processor that’s almost exactly the same as the 1999 model that I was using before (except of course now it only costs $100 or so). About the only thing that tempted me to upgrade a bit was how much CPU power my desktop machine spends analysing spam. It’s like some devil mirror image of those worthy distributed processing projects: all those cycles being turned into nothing better than contributory global warming. There has to be some spin-off calculation that could be folded into this. Maybe we could track changes in people’s sexual proclivities through keyword analysis of pr0n spam? Maybe there’s some powerful thesis about word-recognition waiting to be deduced in all those viagra variants?

?

I went to the Jobs keynote, and didn’t tell you about it. I’m sorry. I started to write a blog entry, but it turned into a column. I’m writing too many columns; all my prose is puddling into the generic Flann O’Brien/ Beachcomber/ Richard Geefe knock-off is the end of all columnist’s minds.

I think the only bit that won’t go into the column is that I spent too much time at the keynote trying to work out whether Jobs was wearing the same jeans as last year. I don’t think he was. I think he has a million identical jeans, on a huge Matrix-style clothesrack in his flying Pixar palace.

My notes are full of this crap. Everyone else in the journalist’s compound was clattering into outliners about the H.264 and personal iChat servers and how it all interrelates with current market conditions. My notes say “iChat AV, 3 other people. OMG they ARE trapped like General Zod! Is woman he talking to wife? Wrong kind of flirty. Forced bonhomie. Are you killed if you fuckup Jobs demos? Maybe they have gunmen off unseen, off webcam. “But, darling, it was my net connection it was bursty I couldn’t hear what BLAM BLAM BLAMBLAM”.

I need to let this stuff out here more, before I go mad.

2004-06-23

my miiind (will be slow to answer mail in next few days)

My desktop machine’s motherboard is finally giving up the ghost (in a staccato, clattering, stutters to halt every half-hour death rattle kind of way). Its chief function is mail-wrangling. Without it I’m left to stare rather bemusedly at 2000 and counting mail messages a day.

The upshot of this is, I’m not going to be very good at answering mail for the next few days. Stick “OBHURRY!” in the subject line and I’ll spot you, though.

(Or you could IM me, which is what all the hip kids who have abandoned mail are doing. Rats! Rats leaving the ship!)

2004-06-15

layerone is over

LayerOne was great: in the same way as NotCon was intended to be a scaled down, more social version of commercial technical conferences like Emerging Tech, LayerOne was a scaled-down, more talky version of hacker cons lke DefCon.

As I said in the intro to NotCon, the real secret intent behind a lot of what we’ve done with NTK live events is to “cross the streams” – introduce disparate geek groups who are doing eminently combinable things, but who don’t usually meet. LayerOne was a fantastic example of that kind of crossover. (Justin Mason has already picked up a few potential anti-spam approaches from chatting to white hat computer underground types).

I didn’t get to see as much of the talks as I wanted, but what I did see was very, very good: Dan Kaminsky did his usual “pile one crazy but conceivable idea on top of another until you end up with something that’s impossible yet implemented” magic. You really don’t want to know what he’s been doing with DNS (especially if you’re a sysadmin) – but if you do, here’s the PowerPoint. David Hulton and Lance James showed just how professional ad hoc security audits are getting these days: David reverse-engineered a smartcard parking meter to show the potential exploits, and Lance did some serious cryptoanalysis on Trillian’s secure IM features, including a fascinating digression on how man-in-the-middle-attacks are eminently possible on cable-modem networks.

I’m looking forward to next year!

2004-06-11

if you’re in los angeles

If you’re in Los Angeles, come along to the Westin LAX LayerOne this weekend for a fantastic-looking conference. It should be fun. It will be for me, for I get to go to someone else’s con and imbibe like an overclocked imbibamatic-o-mat. As the invite says: free beer!

This will also (barring being boo-ed off stage) be the canonical version of the Life Hacks talk. I will try very hard to not handwave arbitrary statistics when in full flow. Nor will I guilelessly slander/fawn over prominent Net celebrities in the search of a good joke. Consecutive Life Hack talks have featured me describing one correspondent as a “genius” and inventing a fictional mano e mano fist-fight between us in the next. Neither are strictly true, as this canonical version will make clear.

It also means that I’ll finally slap up the MP3 and PowerPoint of the complete presentation, ending the mystery of the eight or so words that Cory has failed to meticulously transcribe in his notes.

Not that I’m unhappy he’s so detailed. Lacking any memory of what I say on stage, I reconstructed my original talk for NotCon from Cory’s Etech notes. I’m now adding bits from his NotCon coverage for the LayerOne talk. He’s the Boswell to my Mr Pooter!

Doctorow isn’t here for this rendition, which should hopefully stop us getting into a screeching feedback loop. But that’s all irrelevant anyway. Come tomorrow, there will only be one Life Hacks talk. All the rest were imaginary stories taking place on alternate worlds – which never happened. Excelsior!