Archive for July 28th, 2004
2004-07-28»
notes on: perl lightning talks, impressionistically rendered»
Stumbling into the Perl Lightning Talks now. Randal Schwartz (looks like Randal. Certainly wearing Randal-like clothes. He’s the Hooter’s guy, right? I always get him and Tom Phoenix confused. Okay, definitely Randal.) Anyway, he’s written a CGI replacement that uses Class::Prototyped to create a proper MVC-style object interface for Web applications. The stub class implements a default Web app, and you just stick in your own methods which customise it. I wonder if this is how WebObjects works? They worked out how to structure it by looking at oodles of existing CGI apps.
I don’t know what the name of this class is (for I am an idiot), but it’ll be out on CPAN soonest. Look for the Hooters guy.
(Ed: The whole set-up is far better explained by Randal himself in this Linux Magazine article from Feb 2004.)
Perl is too slow! is the name of the next talk.
Perl’s compile cycle is slow. Mod_perl solves this. But what about other environments?
Why can’t we create a generic stub for any program?
This guy (known as anonymous for the purposes of these notes) saw Speedy CGI, tried to use that, but couldn’t get it to work in a command-line environment.
So he wrote pperl. It used to be a big backend that sat communicating via STDIN and STDOUT over a Unix domain socket. But no STDERR or weird files.
Richard Clamp from London.pm came to help. He’s a veeery scary Unix hacker who knows how to use send_fd() and recv_fd() to send file descriptors over Unix domain sockets. Yes, that scary.
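(Ed: for the curious, here is a minimal sketch of the descriptor-passing trick, in Python rather than Perl or C. It is not pperl's code. It assumes Python 3.9 or later on a Unix system, where socket.send_fds() and socket.recv_fds() wrap the same underlying mechanism; the pipe standing in for STDERR and the variable names are mine.)

```python
import os
import socket

# Two connected Unix domain sockets standing in for the client and the backend.
client, backend = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# A pipe plays the role of the client's STDERR (or any other open file).
read_end, write_end = os.pipe()

# Client side: ship the descriptor itself as ancillary (SCM_RIGHTS) data.
socket.send_fds(client, [b"stderr"], [write_end])

# Backend side: receive a new descriptor referring to the same open pipe.
msg, fds, _flags, _addr = socket.recv_fds(backend, 1024, 1)
received_fd = fds[0]

# Anything the backend writes to its copy arrives on the client's pipe.
os.write(received_fd, b"hello from the backend\n")
output = os.read(read_end, 1024)
print(msg, output)

for fd in (read_end, write_end, received_fd):
    os.close(fd)
client.close()
backend.close()
```

The point is that the backend ends up holding a real descriptor for the client's file, so it can write to it directly instead of relaying bytes over the socket.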
Result: /usr/bin/pperl is about ten times faster than normal Perl
BUT THAT’S NOT GOOD ENOUGH!
“String matching in Perl is slow” (controversy!)
Well, it’s *naive*. Converting string matches into C speeds things up, but it’s still naive.
The Aho-Corasick algorithm is faster. 150 times faster.
Text::QSearch implements the Aho-Corasick algorithm, and will be on CPAN (once lawyers look it over) in a few weeks.
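(Ed: a small textbook-style sketch of Aho-Corasick in Python, to show why it beats trying each pattern in turn: all the patterns go into one trie with failure links, and the text is scanned once. This is not the Text::QSearch code, which I haven't seen; the function names and sample patterns are made up for illustration.)

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links and output sets."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link for each state
    out = [set()]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: depth-1 states fail to the root, deeper ones
    # follow their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(text, patterns):
    """Return sorted (start_index, pattern) pairs for every match."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)

matches = search("ushers", ["he", "she", "his", "hers"])
print(matches)  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```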
The next guy (it’s all guys so far) doesn’t like CVS. Which is bad, because he maintains a lot of modules, so he gets a lot of patches. He wants a VCS system where everybody has access to the repository, but prevents complete madmen (apart from him) from trashing it.
There is this (GPL’d) config management system thing called Aegis. So under Aegis, you can declare a change, and put it into a task list. And someone else can pick up something and check out the data. When you commit, though, you have to pass tests, and build: the sort of thing you have to bolt onto CVS.
Then after that the change stays in a waiting room while someone reviews the code. So Mr Module owner can check the code, kick it back or accept it.
He’s offering AEGIS accounts at … dammit. E-mail addresses are meant for projectors and wikis, not audio. Well, ask on #perl I imagine.
This guy’s name is Thomas, and he’s from Amsterdam. He faced a challenge that he couldn’t code his way out of. He went to the developing world, and taught people how to use open source code, but when these people went away to work on it, they weren’t online, so they couldn’t use it. Enter: NGO In A Box, which is prepackaged open source software for NGOs to use.
(Ed: Everybody really likes NGO in a box. Big applause, not least because it was the shortest of the talks.)
Next Perl talk I sacrificed to the God of tidying up the other descriptions. It was something big and clever and Microsofty that Boeing uses. Sorry.
Mark Jason Dominus can’t be here because he has a new baby girl! Ahhhh.
Andy Lester is up. He likes to test stuff. He’s here to talk about “prove” which is part of the Test harness. It is like make test, and is now in core Perl. It will *change how you think about tests*, he says.
Prove spits out better diagnostics than make test, so you can file better bug reports.
Test first and prove are best friends. (Not sure why, I guess because make test isn’t.)
(Ed: prove looks like a TestRunner for Perl. Which is cool, although I’m surprised Perl didn’t have one already.)
Check out more info at “prove --man” with latest Perl distribs, and http://qa.perl.org/
David Turner. Works for RMS. Starts by quoting the “spider in the hands of an angry Lord” preach. “Licensing is not theology – it’s rocket science”. He’s a fine preacher.
Parrot licensing will cast you all into hell, he tells the audience, and lo they are sore afraid. Go ye through the Parrot source, and stick there proper copyright notices. Don’t put “all rights reserved”, on fear of your mortal soul, because that the Lord RMS doth not think that it doth mean what thou thinkest it means. Put it instead under the GPL or the Artistic License. But don’t put it under the Artistic License, put it under the Clarified Artistic License, for the Artistic License as it stands is sorely artistic, and lo it is ambiguous in many areas. And Brad Kuhn did come down from on high and suggest that the CAL be the license for Perl6, and he spake truth, for ye will be sent to Hell if you do not heed him! For he is the prophet of RMS, and the one who is to come that is greater, who is known as Hurd, and whose todo list is legion. (I am paraphrasing a fair bit here)
Some copyright notices say “(c) The Perl Foundation”. You need to get signed declarations for copyright re-assignment, it’s the Law. Talk to the FSF and we’ll sort it out for you. Amen, brother!
Enough. I go find daughter so that I can chase her in giggling circles around the hotel.
notes on: mono 1.0»
I turned up on time for Miguel’s Mono talk. Unfortunately, even though Miguel is the fastest talker in the world, he didn’t get very deeply into the cool, unknown stuff, so it’s mostly cool known stuff.
They just released Mono 1.0. It’s the result of three years of work. Mostly assembled by members of the community; roughly 300 people. According to the stats, they got a lot more code done than other projects — probably because it was easy to compartmentalise. Each class could be built in isolation.
What is Mono? It’s an open source implementation of .Net. It’s a cross-platform implementation of a virtual machine, an SDK, and a bunch of class libraries. Linux, of course, plus MacOS X, Solaris, etc, and Windows. Windows doesn’t need Mono, but fifty percent of the Mono contributors use Windows primarily. A lot of people are looking at a migration path away from Windows.
We support: C#, Java and Nemerle. In preview, VB.NET, Jscript, and Python.
Mono 1.0 doesn’t have Windows.Forms, EnterpriseServices or InstallationServices. But they do have Gtk# and Gnome.net, bindings for building desktop applications on Linux. Also they have a Cairo (graphical subsystem) substrate. Lot of third-party database support, Relax NG.
Documentation is lifted mostly from the ECMA spec (“We’re bad, but not as bad as most open source software”). The Documentation system has a “wiki” feature, so you can enter and upload contributions using the help system.
Mono is now the official Novell desktop development platform. We’re using it initially for new software, and for extending existing software (there’s APIs for using Mono as an extension system). Examples include: Beagle (a filing-system extension which does Spotlightlike metadata features, demo’ed it in Norway to 300 developers six hours before Steve Jobs demo’ed the same stuff in Tiger), Dashboard, F-Spot, Evolution 2. None of that is shipping, it’s for the next version of the desktop. Nat’s department is doing all the interesting desktop stuff. We’ve mixed up the kernel and desktop people to make sure that they’re not working in isolation.
Roll-outs: Voelcker uses Mono to run 400 servers with 150,000 users; they ported an ASP.NET application to Unix.
Reuse: we can run existing .Net apps written in C# or VB. Third party compilers in Eiffel, Ada, Fortran, C/C++ etc.
Java. We have a JIT that converts Java bytecodes into the .NET VM. You can run C# and Java code side-by-side. We use the GNU Classpath, which means we have the same limitations (no Swing, etc). Applications like Eclipse run out of the box.
There are two stacks: there’s the ASP.NET/ADO.NET/Windows.Forms stack which is the “Microsoft Compatibility Libraries”. And then the rest of the code, which is free software running on both Windows and Linux. The Mozilla bindings don’t work on both, and the Gnome APIs don’t either. But everything else is cross-platform between Windows and *nix. We have a nice Rendezvous stack someone wrote at Novell.
Who develops Mono? Novell has twenty engineers working full-time on Mono, and 300 developers from open source community. Also help from “nameless embedded system vendors”, Mainsoft – a product that hooks to VisualStudio which lets you run ASP.NET stuff on J2EE servers, SourceGear.
Where are we going? Continue to improve Unix, Gnome, Cocoa. We’re building new things in .Net, iFolder 3 (multi-user, open source), Beagle (WinFS), Novell Dashboard.
Mono 1.2 – incremental update, debugger, Cocoa 1.0, Gtk# 1.2, Windows.Forms
Mono 2.0 – ASP.Net, ADO.NET, System.Xml, Windows.Forms 2.0
Currently: Lots of optimisations. SSA Partial Redundancy Elimination, cool ex-Mozilla SportsModel garbage collector.
Out of time!
notes on: crash course in database design: surviving when you don’t have a dba»
Okay, now I’m late arriving at Dirk Elmendorf’s crash course in database design. We join him just after he’s explained why you should care about db design, and where to bury your DBA’s body when you’ve accidentally poisoned his coffee. I think.
So first steps: you have entities, and relationships. Entities are bits of data, relationships are connections between them. You don’t have to have relationships between every entity (he’s really seen databases that are like that).
Easiest example: a one-to-one relationship. A professor and a private office, say. So one professor has one office. Then there’s one-to-many. One classroom may have many classes occur in it. Many to one: a class has many classrooms. Then there’s the complicated one, the many to many: if a class was repeated in a bunch of classrooms.
SQL doesn’t handle many-to-many. So you have to insert something in the middle, so you have a many-to-one, and then a one-to-many. So in this case, you create a new entity: a class schedule. A class relates to a single date and time, and a classroom relates to a single date and time.
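(Ed: a minimal sketch of that "something in the middle", using Python's built-in sqlite3 module. The table and column names and the sample rows are my own invention, not Dirk's slides.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE classroom (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The entity in the middle: one row per scheduled meeting.
CREATE TABLE class_schedule (
    class_id     INTEGER NOT NULL REFERENCES class(id),
    classroom_id INTEGER NOT NULL REFERENCES classroom(id),
    starts_at    TEXT NOT NULL,
    PRIMARY KEY (class_id, classroom_id, starts_at)
);

INSERT INTO class VALUES (1, 'Databases 101'), (2, 'Perl for Poets');
INSERT INTO classroom VALUES (1, 'Room A'), (2, 'Room B');
INSERT INTO class_schedule VALUES
    (1, 1, '2004-07-28 09:00'),
    (1, 2, '2004-07-29 09:00'),
    (2, 1, '2004-07-28 11:00');
""")

# One class, many classrooms...
rooms_for_class = [row[0] for row in conn.execute("""
    SELECT classroom.name FROM class_schedule
    JOIN classroom ON classroom.id = class_schedule.classroom_id
    WHERE class_schedule.class_id = 1
    ORDER BY classroom.name""")]

# ...and one classroom, many classes.
classes_in_room = [row[0] for row in conn.execute("""
    SELECT class.title FROM class_schedule
    JOIN class ON class.id = class_schedule.class_id
    WHERE class_schedule.classroom_id = 1
    ORDER BY class.title""")]

print(rooms_for_class, classes_in_room)
```

Each side only ever has a one-to-many relationship with class_schedule, which is the whole trick.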
(Editorial confession at this point: I learned everything I know about database design from futzing around with the Access visual designer. Relational DBs give me the heebie-jeebies. So I’m now hitting the extent of my knowledge. I may be mangling what Dirk is saying here – d.)
Building your database: use consistent coding standards (you putzes!). Single and plural table names, upper case, underscore, camelcaps, special table prefixes, e.g. ref_units, where “ref” is a prefix for constant reference values.
Normalisation! Hooray! Normalisation is a set of rules for designing schemas. It helps you reduce redundancy. Redundancy is bad because it causes errors, and wastes resources. (Ed: Also I think there’s a commandment against it in the Bible. Or there was until they refactored them).
It eliminates database inconsistencies: poorly laid out data can provide false statements.
Normalization – first normal form. The easiest one: all records have to have the same “shape” – the same columns, the same data. One way to cheat is to have a fake array in your table, like “author_name1”, “author_name2”, “author_name3”. What happens when you have more than three authors? What do you do when you delete an author? Author3 sometimes becomes a comma-separated field of the rest of the authors. Yuk. Address1,2,3 is very common – but databases can store newlines now.
Second normal form. A row can’t have partial key dependencies. So if you have a field in the employee table marked “office_location”, then you’re mixing ideas, because if you delete all the employees, your office_location will disappear too. Move it out to its own table.
Third normal form. One non-key field cannot be a fact about another non-key field. So you can’t have a book table, with a publisher *and* a publisher’s address. You need to break that out again – a separate publisher table.
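(Ed: a sketch of the book/publisher split with sqlite3, under my own made-up table names and sample data. The payoff is that a publisher's address lives in exactly one row, so changing it can't leave stale copies behind on other books.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The address is a fact about the publisher, so it lives with the publisher.
CREATE TABLE publisher (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
CREATE TABLE book (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher(id)
);
INSERT INTO publisher VALUES (1, 'Example Press', '1 Old Street');
INSERT INTO book VALUES (1, 'First Book', 1), (2, 'Second Book', 1);
""")

# The publisher moves: one UPDATE, one row.
conn.execute("UPDATE publisher SET address = ? WHERE id = ?", ("2 New Road", 1))

# Every book now reports the same, current address.
addresses = {row[0] for row in conn.execute("""
    SELECT publisher.address FROM book
    JOIN publisher ON publisher.id = book.publisher_id""")}
print(addresses)
```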
Fourth normal form. A record should not contain two or more multi-valued facts which are independent. So a class table that has “room” and “professor”. You really want to move those out into their own tables, because those facts aren’t related.
(Ed: basically, databases seem to crave to become triplets. I can see why people turn to RDF religion after this.)
Fifth normal form: information cannot be represented by several smaller record types. So basically, once you’ve gone for the four normal forms, stop breaking out the bloody tables, because you’re GOING TOO FAR.
Normalization – you can have too much of a good thing. Start at a normalized database and work backwards as you need to optimize.
Practical tips: ideally, put data checking into a central library, and then make sure all data is run through it before it enters the database.
Unfortunately, the usual case is that you don’t have enough control over access to the database – the db is used by multiple applications, languages, and the integrity of the data is more important than performance. So you need to put data-checking inside the db.
A primary key is a non-null unique identifier for a row, a composite key is a collection of fields that give you a non-null unique identifier. A foreign key is a primary key that’s stored in a different table. A foreign key constraint means that a foreign key has to appear as a primary key in another table.
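(Ed: a hedged sqlite3 sketch of those three kinds of key. The tables and the try_insert helper are hypothetical names of mine. One SQLite-specific wrinkle: foreign key checks are off unless you switch them on per connection with PRAGMA foreign_keys.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE book_author (
    book_id   INTEGER NOT NULL REFERENCES book(id),    -- foreign key
    author_id INTEGER NOT NULL REFERENCES author(id),  -- foreign key
    PRIMARY KEY (book_id, author_id)                   -- composite key
);
INSERT INTO author VALUES (1, 'A. Writer');
INSERT INTO book VALUES (1, 'A Book');
INSERT INTO book_author VALUES (1, 1);
""")

def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# Author 99 doesn't exist, so the foreign key constraint refuses the row.
dangling = try_insert("INSERT INTO book_author VALUES (1, 99)")
# (1, 1) is already there, so the composite primary key refuses the repeat.
duplicate = try_insert("INSERT INTO book_author VALUES (1, 1)")
print(dangling, duplicate)
```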
Cascading updates and deletes. Cascade allows you to handle foreign tables that your application may not even be aware of. ON UPDATE CASCADE handles updating of the foreign key. ON DELETE CASCADE deletes the row that is referencing a foreign key which is being deleted from the primary table of the foreign key. ON DELETE CASCADE is daaaaangerous. You might want it to delete logs, but you really don’t want all records of this person disappearing when you delete something. ON DELETE SET NULL is the alternative – so you can just make individual record values disappear.
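(Ed: here is roughly what that looks like in practice, sketched with sqlite3. The person, login_log and invoice tables are invented for the example: the logs are deleted along with the person, while the invoice survives with its owner set to NULL.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to act on FKs
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Logs are disposable: let them vanish with the person.
CREATE TABLE login_log (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL
        REFERENCES person(id) ON UPDATE CASCADE ON DELETE CASCADE
);

-- Invoices are not: keep the row, blank out the reference.
CREATE TABLE invoice (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER
        REFERENCES person(id) ON UPDATE CASCADE ON DELETE SET NULL
);

INSERT INTO person VALUES (1, 'Alice');
INSERT INTO login_log VALUES (1, 1), (2, 1);
INSERT INTO invoice VALUES (1, 1);
""")

# Changing the primary key ripples out to the referencing rows.
conn.execute("UPDATE person SET id = 42 WHERE id = 1")
owner_after_update = conn.execute("SELECT person_id FROM invoice").fetchone()[0]

# Deleting the person removes the logs but only nulls the invoice owner.
conn.execute("DELETE FROM person WHERE id = 42")
logs_left = conn.execute("SELECT COUNT(*) FROM login_log").fetchone()[0]
invoices_left = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
owner_after_delete = conn.execute("SELECT person_id FROM invoice").fetchone()[0]
print(owner_after_update, logs_left, invoices_left, owner_after_delete)
```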
Column constraints. Default to NOT NULL, because NULL can cause real problems with sorting, etc. UNIQUE – you can use this for multiple columns.
CHECK – stuff like CHECK( age > 0). You’re looking for data validity, but you’re not putting business logic in your database. You want to leave that to a proper DBA.
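(Ed: a quick sqlite3 sketch of NOT NULL, a multi-column UNIQUE and the CHECK from the talk, all on one hypothetical member table. The attempt helper is mine and just reports whether the database accepted the row.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE member (
    id         INTEGER PRIMARY KEY,
    email      TEXT NOT NULL,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    age        INTEGER NOT NULL CHECK (age > 0),
    UNIQUE (first_name, last_name)   -- UNIQUE across two columns
)""")

def attempt(email, first_name, last_name, age):
    """Try to insert one member; return 'ok' or 'rejected'."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO member (email, first_name, last_name, age) "
                "VALUES (?, ?, ?, ?)",
                (email, first_name, last_name, age),
            )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    attempt("a@example.org", "Ada", "Smith", 36),   # valid row
    attempt(None, "Bob", "Jones", 40),              # NOT NULL violated
    attempt("c@example.org", "Cy", "Young", 0),     # CHECK (age > 0) violated
    attempt("d@example.org", "Ada", "Smith", 50),   # UNIQUE pair violated
]
print(results)  # ['ok', 'rejected', 'rejected', 'rejected']
```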
Triggers! Advanced data checking, or to scrub data – automatically lowercasing stuff. Can also handle more advanced clean up – so deleting a single table can cause a trigger to clean other tables that are related and log the event to a log table. But triggers are daaangerous for programmers, because they’re hard to maintain in the development cycle, and invisible when debugging.
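(Ed: two small triggers sketched in sqlite3, one that scrubs data by lowercasing an email on insert and one that logs deletions to a log table. Table and trigger names are invented for the example. Note how the application code never mentions either trigger, which is exactly the "invisible when debugging" problem.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE audit_log (event TEXT NOT NULL, account_id INTEGER NOT NULL);

-- Scrub: store every email in lower case.
CREATE TRIGGER account_lowercase_email AFTER INSERT ON account
BEGIN
    UPDATE account SET email = lower(NEW.email) WHERE id = NEW.id;
END;

-- Clean up: record each deletion in a log table.
CREATE TRIGGER account_log_delete AFTER DELETE ON account
BEGIN
    INSERT INTO audit_log VALUES ('deleted', OLD.id);
END;
""")

conn.execute("INSERT INTO account (id, email) VALUES (1, 'Danny@Example.COM')")
stored_email = conn.execute("SELECT email FROM account WHERE id = 1").fetchone()[0]

conn.execute("DELETE FROM account WHERE id = 1")
log_rows = conn.execute("SELECT event, account_id FROM audit_log").fetchall()
print(stored_email, log_rows)
```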
Dirk’s DBA told him to talk more about indexes. Indexes are cheat-sheets for the database to improve performance. But you need to pay attention to actual queries to figure where they are needed. Index a lot, but remove the ones that are not being used. Don’t bother indexing a column which has a small number of possible values for a lot of rows: booleans, say, or value ids that have a short list.
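(Ed: "pay attention to actual queries" can be as simple as asking the database for its plan. A sketch with sqlite3 and EXPLAIN QUERY PLAN; the book table, the index name and the plan helper are made up, and the exact plan wording varies between SQLite versions, so I only look for the index name in it.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT NOT NULL, "
    "in_print INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO book (author, in_print) VALUES (?, ?)",
    [(f"author{i}", i % 2) for i in range(1000)],
)

def plan(sql):
    """Return the query planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # 4th column is the detail text

query = "SELECT id FROM book WHERE author = 'author42'"

before = plan(query)   # no index yet: a scan of the whole table
conn.execute("CREATE INDEX idx_book_author ON book(author)")
after = plan(query)    # now the planner can search via the index

print(before)
print(after)
```

An index on in_print, which only ever holds 0 or 1 here, would be the kind of low-variety column the talk says not to bother with.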
Conclusions: DB design isn’t new or cutting edge, so there’s a ton of literature out there to help you learn more about databases.
Just because you don’t have a DBA doesn’t mean you should build on top of a poorly designed database.
notes on: subvert this! developing with subversion on mac os x»
Wow, it’s *really* crowded here. I couldn’t get into the discussion of Power Laws (now! with real stats!), so I’ve nipped into Brian Fitzpatrick’s guide to using SVN with Mac OS X. I’ve missed the first few minutes, so let’s join Brian as he finishes explaining how Subversion kicks CVS’s HEAD in.
(Offstage: oof, ow, ugh.) … Binding surfaces are big with Subversion. Having a lot of ways to plug in your own code into a system is good (CVS just has a pipe). The Apache foundation are big on big binding surfaces for glue, because that’s how they felt Apache beat out Netscape server.
Reasons why Subversion has better binding than CVS: Subversion is written in ANSI C, so it plays well with others. It uses SWIG for external language bindings, so instant Perl and Python APIs (Ruby and others not yet supported because nobody has stepped up to take the bat). Java support is via JNI.
Then there’s the API promise: between 1.0 – 1.XX the API will be binary compatible.
Subversion’s dependencies. There’s the Apache Portable Runtime and APR util. This gave us the capability to run on any platform that Apache runs on. The other dependencies are SWIG, the Apache server, and Berkeley DB. Oh, and Neon – a client library for DAV operations.
That looks like a lot, but in fact the only ones you really need are APR and APR util. The rest of the stuff is mainly for the DAV support. (In Subversion 1.0, you needed Berkeley DB if you were running a server, but the latest version has a backend that uses flat files. Good for NFS.)
Subversion has a bunch of libraries. libsvn_client – primary interface for client programs, libsvn_delta is the tree and diff routines (first non-GPL diff engine), libsvn_fs_base is a Berkeley database filesystem library, libsvn_fs_fs – the flat file equivalent. (Filesystem is just a way of describing the db storage; it’s not a real filesystem). Libsvn_ra is the repository access common utilities, and then libsvn_ra_dav, libsvn_ra_local, and libsvn_ra_svn — for DAV client-server communication, local communication, and SVN, subversion’s own client-server protocol (same as pserver in CVS).
Then there’s libsvn_repos, which is the high-level interface. There’s libsvn_subr, which is miscellaneous subroutines. Libsvn_wc is the stuff to cope with .svn/ directories, the equivalent of the CVS directory in checked out copies.
Two Apache modules: mod_authz_svn, a special authorisation module, and mod_dav_svn which handles dav requests and converts them into subversion actions.
(Aside – there’s a SVN plugin for Tortoise CVS. And a Finder plugin, cool!)
Now Brian is going to build a SVN tool, using the Subversion libraries, Xcode, Interface Builder, PyObjC. I have a feeling I will be hand-wavily describing lots of GUI development now…
The subversion team has been converting CVS repositories into SVN for testing purposes. Brian’s working with the Apache 1.3 SVN repository for this demo, which he has locally. The mini app he’ll build will be a program that lets you drag and drop files and see log data on it.
Okay, lots of Interface Builder widgety goodness. He’s using a Filewell palette third-party widget, which I think is this filewell.
More Interface Builder linking of outlets and actions to a controller object. This is making me crave doing programming in Xcode.
Hah! Brian cheated by cutting and pasting a wodge of PyObjC code! It’s pretty clean – there’s just an “import Logger” statement at the top to pull in the SVN Apache SWIG libraries. Works great.
Tools: SCPlugin, which is the aforementioned Finder plugin, just source code. Full-featured, has all the actions on a right-click button (I didn’t even know Finder had plugins). Eclipse supports subversion, and Xcode 2.0 will support subversion in the glorious Tiger future that Steve promises us all.
a camper at camp smalltalk»
On my way to the first day proper of O’Reilly’s Open Source Conference. It seems much busier than last year: I guess something’s improving, whether it’s the economy or business interest in Open Source, or just a fading away of people’s reluctance to tempt terrorist ire by coming out of the woods and onto a plane.
I’ve been in Portland for a week or so now, hanging out as a U.N. Observer at Camp Smalltalk, which is like Camp X-Ray only with objects. Actually, it was a load of fun. Smalltalk has a strong community culture, which I think is one of the reasons that it’s produced such a disproportionately large amount of good practices and useful meta-programming techniques. That, and that when you kick up a Smalltalk session, you can do a “View Source” on the entire operating system’s code. I like Smalltalk.
I got to see Ward Cunningham slinging index cards, and Ralph “Gang of Four” Johnson hacking code. L. Peter Deutsch, virtual machine pioneer, was there. After years working on Ghostscript, he’s been tempted back to Smalltalk, and spent the week porting the Python bytecode compiler to output to the modern Smalltalk VM. He estimates he might be able to get a 10-50 speedup by doing that. If only I could have kidnapped him and dragged him to help out Dan Sugalski, due to be pied by the Pythonistas this week for failing (just!) to speed up Python by porting it to the new Perl6 VM. What a fine mongrel VM that would be.
Other interesting stuff: if you’re interested in new language constructs, you really should check F-Script, a Smalltalkic scripting language for the Mac. It’s strongly tied to the Objective C object model – a bit too tied, in fact, so like ObjC you can’t create your own new classes at runtime, just instantiate objects, which is a bit limiting in a scripting environment. The real magic of F-Script, though, is OOPAL, which is a deeply splendid merging of Smalltalk and APL. No, no, don’t run away – it’s good, even if you don’t know either language.
OOPAL is the syntactic sugar that lets you send an array of messages to an object. Given that everything in Smalltalk is an object, that means you can turn most basic operations into iterations; which removes most of the need for loops. If you know basic Smalltalk syntax, check out Chapters 16-19 of the FScriptGuide.pdf. If you don’t know Smalltalk, and have a Mac, read the whole thing, and have a play around. It’s fun. (Somebody will now tell me that Ruby does all of these things. Must. Learn. Ruby.)
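(Ed: a very rough Python analogy of the "operations become iterations" idea, for people without a Mac to hand. This is not F-Script syntax and not how F-Script implements it; the Broadcast class is a toy of my own that forwards any message to every element of a list, so the loop disappears from the calling code.)

```python
class Broadcast:
    """Wrap a list so that a message sent to it goes to every element."""

    def __init__(self, items):
        self.items = list(items)

    def __getattr__(self, name):
        # Any message we don't understand is forwarded to each element.
        def send(*args):
            return Broadcast(getattr(item, name)(*args) for item in self.items)
        return send

    def __add__(self, other):
        # Operators broadcast too: element-wise against another Broadcast,
        # or against a single value repeated for every element.
        if isinstance(other, Broadcast):
            others = other.items
        else:
            others = [other] * len(self.items)
        return Broadcast(a + b for a, b in zip(self.items, others))

words = Broadcast(["perl", "python", "smalltalk"])
shouted = words.upper().items                  # one message, three receivers

plus_ten = (Broadcast([1, 2, 3]) + 10).items   # scalar spread over the array
pairwise = (Broadcast([1, 2, 3]) + Broadcast([10, 20, 30])).items

print(shouted, plus_ten, pairwise)
```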
Other stuff: the Website to Croquet, Alan Kay’s next generation 3D desktop environment, is showing off some more screenshots. Looks like the Squeak-based software is going to see the light of day next month.
I also met James Foster, who is working on a badly-needed simplification of that whole appalling bug-tracking, task management software space. Just looking over his shoulder was fun; I can’t wait to see the final results.
Okay, now I’m typing in the middle of Tim O’Reilly’s keynote, which is distractingly good. I’m sure other people will blog it better, but if anyone was wondering, when O’Reilly showed that book sales almost exactly matched the relative cost of adwords for those keywords, it was me who very loudly went “woah!”. Information wants to be smuggled out via leaky patterns.