Currently:
2004-07-29»
notes on: protecting your open discussion forum»
Jamie McCarthy is talking about how Slashdot defends itself from various
attacks (from DoS to "just jerks"). I managed to resist the temptation to sit
at the back shouting "FIRST POST!" until they threw me out.
So far there's been a great slice of the life of trolls, extracts from
their scripts, IRC chats, and the effects on sites like the Wil Wheaton site.
Jamie says he'll be putting his slides up on his
blog later.
"The more an attacker has to lose, the less likely they are to attack your
site." What if they've given you money? Or they might lose their job? Or lose
access to your site? Seeing is gaming means: if they know about the rule,
they'll try and beat it. So if you ban something, and they find out a way of
bypassing the ban, they'll find another way. On the other hand, if you remove
the visibility of the result, then they don't know they've won. If they can
score it, they will definitely game it. So trollers try to get Slashdot
stories into the "Hall of Fame" site, or try to get everybody to make them
their Slashdot friends. Same thing with Orkut: they have a top ten, so that
inspires people to rack up worthless links. Classic example here is Slashdot
karma. We made two mistakes with karma. One: we called it karma. Second: we
made it a visible number that was unbounded. When we made it into an
adjective, karma-whoring collapsed. People want consistency -- if you change
the rules, innocent bystanders get mad. People don't mind draconian rules if
they're consistent.
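The karma fix boils down to: keep a bounded number internally, show only a coarse adjective. A minimal sketch in Python; the bounds, thresholds and label names below are invented for illustration and are not Slashdot's real configuration.

```python
# Hypothetical bounds and labels; not Slashdot's real configuration.
KARMA_MIN, KARMA_MAX = -10, 50
LABELS = [  # (lowest score for this label, label shown to the user)
    (25, "Excellent"),
    (10, "Good"),
    (1, "Positive"),
    (0, "Neutral"),
    (-5, "Bad"),
]


def adjust_karma(current: int, delta: int) -> int:
    """Apply a change but keep the stored score inside fixed bounds."""
    return max(KARMA_MIN, min(KARMA_MAX, current + delta))


def karma_label(score: int) -> str:
    """Return the only thing the user gets to see: a coarse adjective."""
    for floor, label in LABELS:
        if score >= floor:
            return label
    return "Terrible"
```

Because the visible value only changes at a few thresholds, there is no running total to show off or compete over.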
Trad exploits are out of scope for this discussion. Forum attacks are
flooding, spamming. Doesn't have to be comments. Any link on your site is the
same: trackbacks are the same.
First defence is to make it expensive. Code
can't distinguish between an ingenious flood attack and a good discussion.
Always close out old discussions on topical sites. Make the attacker spend
time (geometric increases)
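A rough sketch of what "geometric increases" could look like, in Python. This is not Slashdot's code; the class name and the constants are assumptions picked for the example. Each accepted post inside the window doubles the wait before the next one.

```python
import time

BASE_DELAY = 5.0   # seconds before a second post is allowed (made-up value)
FACTOR = 2.0       # each further post doubles the wait
WINDOW = 3600.0    # posts older than this no longer count


class PostThrottle:
    """Per-key (IP or account) throttle with geometrically growing delays."""

    def __init__(self):
        self._posts = {}  # key -> timestamps of accepted posts

    def required_delay(self, key, now):
        """Seconds that must have passed since the key's last accepted post."""
        recent = [t for t in self._posts.get(key, []) if now - t < WINDOW]
        self._posts[key] = recent
        if not recent:
            return 0.0
        return BASE_DELAY * FACTOR ** (len(recent) - 1)

    def try_post(self, key, now=None):
        """Return True and record the post if the key has waited long enough."""
        now = time.time() if now is None else now
        delay = self.required_delay(key, now)
        recent = self._posts[key]
        if recent and now - recent[-1] < delay:
            return False
        recent.append(now)
        return True
```

A normal commenter never notices the first few delays; a flood script hits waits of minutes after a handful of posts.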
Or spend IPs. Open proxies are the main force multiplier for attackers here: we
need to stop open HTTP proxies. HTTP is the new SMTP. Reputable anonymisers
aren't a problem, it's the 10000+ dumb open proxies. (Test the proxy ports
3128, 8080, 1080 and 80 of people who comment.) Slashdot has a
LWP::UserAgent patched to cope with multiple proxy tests. People complain
about port scans, but they understand. Since April, when we implemented this,
the crap floods diminished. Ask me again in a year to see whether this has
worked.
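For illustration only, here is a small Python sketch of the port test described above. Slashdot's version is a patched LWP::UserAgent in Perl; this is a swapped-in stand-in, and the function names are hypothetical. The probe is passed in as a parameter so the logic can be exercised without touching the network.

```python
import socket

# Ports named in the talk as common open-proxy ports.
PROXY_PORTS = (3128, 8080, 1080, 80)


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def suspicious_ports(host, ports=PROXY_PORTS, probe=tcp_port_open):
    """List the proxy-style ports that accept connections on a commenter's IP.

    An open port is only a hint; a real check would also try to relay a
    request through it before treating the host as an open proxy.
    """
    return [port for port in ports if probe(host, port)]
```

A site would run this only against addresses that are actually posting, which is presumably why people "complain about port scans, but they understand".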
Or spend accounts. You do have to ensure that the email is valid by sending
them the confirmation link. Watch out for robo-created accounts. They will
usually just come from a single domain. Three hundred accounts from hotmail =
normal, email accounts from evilbadmail.com = suspicious. You can make new
users' voices quieter - moderated by a human for the first or second time. If
the account has to participate in a human way, that helps. If you want a low
barrier entry for your site, you need to preserve newbie posting rights. And
if so, you can only track by IP.
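The single-domain tell can be sketched in a few lines of Python. The allowlist of big providers and the threshold are assumptions made up for this example, not anything Jamie showed.

```python
from collections import Counter

# Hypothetical allowlist of big providers where bulk signups are normal.
LARGE_PROVIDERS = {"hotmail.com", "yahoo.com", "gmail.com"}


def suspicious_domains(emails, threshold=20):
    """Return domains with at least `threshold` signups that are not
    well-known large providers, mapped to their signup counts."""
    counts = Counter(e.rsplit("@", 1)[-1].lower() for e in emails)
    return {
        domain: n
        for domain, n in counts.items()
        if n >= threshold and domain not in LARGE_PROVIDERS
    }
```

The output is a list for a human to look at, not an automatic ban: a burst from a small domain could also be a university class signing up together.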
As an example, someone at OSCON (trolls! trolls amongst us!) created an
account, and posted nasty comments so that the IP was banned. But it didn't
work, because we took into account that users of good standing can still post
(anonymous posting *was* banned, which explains why all my hot grits links got
dropped).
Captchas don't work. These simply require that humans need to be briefly
present. These don't reduce your rates enough -- only helpful if you're trying
to move actions from millions to thousands.
Other solutions: host troll discussions. Make it an easter egg, so they
think they're gaming you. They'll preserve the site; and you can see what
they're doing. Visit where they chat. Offer a bounty (Wil Wheaton offered
$1000 for information that leads to prosecution). Or file a lawsuit. FreeRepublic filed for an
injunction. Instead of writing code to block a troll, they put out a legal
injunction.
Jamie's summary for protecting your open discussion forum:
- Know their motivations.
- Know your resources: coders, moderators, hardware.
- Hide information judiciously.
- Encourage investment.
- Enforce rules consistently.
- Disallow poisoned input.
- Remove their leverage, like open proxies.
- Give yourself leverage to view and clean up.
- Design your game well.
notes on: book sales tell us about the state of the tech industry »
Tim O'Reilly is exposing his market research - this is sort of the sequel of
a great talk Tim gave at Foo camp where his team crunched through O'Reilly
book sales to try and work out new trends.
As ever, I missed the beginning. We join Mr O'Reilly as he is asserting
very carefully that they have no idea how these stats correlate
with actual sales, then pointing out that MySQL just overtook Oracle. The
stats themselves are a careful picking-over of U.S. book sales from BookScan (not
just O'Reilly). What follows is hand-wavey, with me scrabbling to take
notes, but the actual graphs, Tim says, will be made publicly available
soon.
.NET books just passed Java in the last quarter of 2003.
Technology Supplier Market Share: Microsoft at 35%, all of open source at
around 20%.
Programming language market share: 70% open source (including Java), 30% is
proprietary (Visual BASIC, etc).
Version trend information -- Photoshop, Dreamweaver, Flash: sort of
bell-shaped, new versions spike and then slowly decline, total zig-zaggy but
basically constant. With Red Hat Linux, though, after Fedora, RH book sales
collapsed. Might RH have made a bad decision? Total market went down pretty
quickly.
Macintosh has 3% market share? Not in books - Mac books sell 25% of all
computer books. (Linux 17%, Windows 55%).
Tim's now showing a bunch of tree maps which aren't really easy to describe.
But that's okay, because you can play with them yourself.
Question from audience: is there any geographical patterning? This is just
U.S. sales, although they have U.K. data. Wanted to sort by geography, but
didn't have time.
Question from audience: will you be publishing your categories, the
ontology you've deduced. Answer: yeah, we really want to in our copious free
time.
Are there any hotspots? Answer: Bay Area, Los Angeles, New York. Best tech
shop is in Virginia. Weird, except they're right next to Langley.
notes on: state of the linux kernel»
Greg Kroah-Hartman sez: we don't have a development kernel anymore. He's been a
kernel developer for a long time, and a kernel maintainer: PCI, USB, driver core,
sysfs... "lotta crap like that".
Pre-Bitkeeper, this is how they did it: maintainer sends patches to Linus.
Wait. Resend if patches dropped. Lather, rinse, repeat.
January 2002 - patch penguin lkml flame fest. Worried about patches getting
dropped. Linus decided to use BitKeeper; 2.5.3 was the first kernel. "License weenies
got their panties in bind, ah feh." (Greg is like the Vin Diesel of kernel
dev: I keep expecting him to shout "I! LOVE! THIS! DEVELOPMENT CYCLE!!!".)
BitKeeper changed things a lot. BitKeeper maintainers got a lot more work.
Linus sucks in all patches, discards crappy stuff. Non-BK maintainers just
sent patches as it goes.
Unexpected consequences: we knew what Linus was doing, so much better
feedback. Bk2cvs and bk2svn trees. bkcommits-head mailing list so you could
see what was flying around.
All patches started to go through Andrew Morton; BK stuff goes through
Linus.
Result: 2.6.0 = Two years of dev, 27149 different patches, 1.66 patches per
hour. At least 916 unique contributors. Top developers handled 6956 patches.
Ten patches a day for two years. People should research this data.
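A quick check of those rates in Python. The patch counts are the ones quoted above; the figure of 680 days is my assumption (the talk only says "two years"), chosen because it reproduces the quoted 1.66.

```python
# Figures quoted in the talk.
total_patches = 27149
top_dev_patches = 6956

# Assumption: about 680 days of development; the talk just says "two years".
days = 680

patches_per_hour = total_patches / (days * 24)
top_dev_per_day = top_dev_patches / days

print(f"{patches_per_hour:.2f} patches per hour")
print(f"{top_dev_per_day:.1f} patches per day from the top developers")
```

With a full 730 days the hourly rate comes out at about 1.55, so the quoted 1.66 suggests a span a little shorter than two calendar years.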
2.6.1 = 538 patches, 2.6.2 = 1472, 2.6.3 = 863 patches, 2.6.4 = 1385
patches... keeps going. By 2.6.7, 2306 patches. Something is wrong. This is
the stable kernel, but we're going faster --- one million lines added, 700
thousand lines deleted: a third of the kernel. A *lot* of data, and this is
*stable*. Feeling uncomfortable yet?
All patches end up in the -mm tree. All bk trees end up in the -mm tree.
17 different trees, acpi, agpgart, alsa, ... So we have a staging area.
And 2.6.7 is the most stable kernel we've ever had. So we're doing
something weird, but right.
The -mm tree *is* the development kernel. Patches can go in, and if they
suck, they can get out really quickly. Linus gets the good stuff, the -mm tree
is the tester's tree.
Almost everything is tested in the -mm tree before it goes to
Linus.
Near future: Linus releases 2.6.x, maintainers flood him with patches, all
have proved their stability in the -mm tree. We all recover, fix bugs. One
month later, a new 2.6 is released.
Will 2.7 ever happen then? Maybe, if we have big intrusive patches -- page
clustering, timer core rewrite, stuff that touches everything. Linus'll fork
2.7, all 2.6 changes will be merged into 2.7 daily. If 2.7 is unworkable, we
delete it (so save your 2.7 patches). If 2.7 works, we merge to 2.6, call it
2.8.
Summary: the kernel developers all like how this is working. No stable
internal-kernel API, never going to happen, get used to it (syscalls won't
break). Drivers outside of the tree are on their own (quote: "you're screwed")
(conclusion: get into the kernel, quick before it runs crazily away in a big
honking yellow bus of ever-accelerating development.) Proprietary people: if
you're not in the kernel, Greg says he doesn't know you exist. And if you
don't give back to the community, he doesn't care, either. ("Just my
opinion")
Everything subject to change at any time. These days, Linus is running the
stable tree, -mm is running the unstable tree. When they realise this, they
might change their mind about how things are doing.
Q&A: ISVs like Oracle are going to go crazy, because they need to catch
up with Greg and the kernel developers. They can't just certify an ancient
kernel and stick with that anymore.
Q&A: We don't want the distros to fork, as happened when SUSE and Red Hat
settled on differently patched 2.4.
Q&A: What about automated testing? Big companies have a test suite -
hopefully, they'll start running them every night. Novell has a giant test
department, and they're ramping up to do nightly test runs. If they can get
that under 24 hours, we have a big honking regression test.
Q&A: (from me) What are the limits on the speed of kernel development
now? "We're going so fast, there's a huge rate of change. Andrew seems to
scale pretty well; I'm maxed out; more people coming in. We'll find out what
the limits are."
Q&A: How many people working on Kernel at IBM? Answer: don't know, Greg
emphasises that he's only speaking for himself.
Q&A: When are you going to fix broken stuff, like Firewire? "Well, it
should build." (heckle: is that your testsuite then?) "Yes. We can't test
drivers. You need to write them yourselves."
Q&A: The problem with automated tests for drivers is that it needs
hardware. Who is going to pay for that? "I've written drivers that I don't
have hardware for; I know someone who has debugged joystick drivers via IRC
between Czechoslovakia and Israel. 'Push the joystick left? Does it work?'"
Some stuff isn't easily testable. We need to fix the infrastructure. OSDL
trying to fix this.
Q&A: How can Andrew and Linus do anything but rubber-stamp thousands of
patches? Answer: we're not writing much code these days. It's a trust network
-- I trust people who send us stuff. And now we have a blame network, too, so
if somebody sends me something that breaks, I know whose fault it is.
Q&A: Is there still a problem with tracing origin of code? With BK, I
can resolve all lines to an email address and a name. The lawyers I speak to
say that's okay. All patches get a "signed off by: name and email". That's a
little more explicit. Not legally binding. Once again, it's a blame issue.
I've seen patches flow through five people. More people to blame!
Somebody was sending me code on plug and play, and it just seemed too good:
conformed with all the Microsoft specs, worked really well, made me very
suspicious. So I made him prove it, send me all the places where he found
the stuff. And I was finally happy with the provenance. Later it turns out he
was seventeen years old. You can't tell where you're going to get excellent
code.
Oh, btw: Greg says there's a really good piece about this switch in
the latest issue of Linux Weekly News.
I don't have a subscription to LWN, for ridiculous personal reasons (I
feel odd using paid-for info to compile NTK. Same thing why I don't like
taking journalist freebies: I believe I will be struck by lightning if I
do. Previous experience indicates that belief to be justified.) But if I
wasn't so weirdly idiosyncratic, I'd pay their sub rates in a
shot.
notes on: Make magazine»
Make: Dale Dougherty, publisher of Make Magazine. Mark Frauenfelder is the
editor. Will be coming out in Spring. Here to have a discussion; they don't
have a presentation, they just want to work out what they want to do with it.
They're here to increase their pool of contributors, rather than
subscribers.
Three streams: Dale developed the Hack series for O'Reilly. Really smart
people are doing interesting, clever, nonobvious things with computers that
other people would find useful. Another book project inspiration was the
"Hardware Hacking Projects For Geeks" - like replacing the sound chip in a
Furby.
Started because Dale was in a cab with Tim O'Reilly saying "there isn't a
Martha Stewart in the technology space - somebody who rediscovered and
recovered crafts and gave them to a wider public".
A move from mass-manufacturing to individual manufacturing. Creating
one-offs at home, using tech Dale's seen at MIT Media lab. Current magazines
are "cargo magazines" just telling you where to buy, not how to
manufacture.
Mark Frauenfelder up now. He said the idea reminded him of the old Forties
Popular Science magazines, back then it was cheaper to build than buy. Then
buying was cheaper, so we lost that knowledge. But it's always been a lot of
fun, especially when the roles may reverse again, at least for customised
stuff.
So example spread they're showing is of Kite Camera photography. Feature is
a low-cost, $10 camera with a silly putty timer. Ebay as a parts supplier for
everyone. (Looks good, although in the cutthroat British magazine industry, I
can already hear Future and Dennis leaping like jackals on these scraps and
sending their first Make rip-off to print before this talk even finishes.
"Make Format"? "Slaphappy Magazine" "Dig". Yeah, "Digger". With a soft-porn
centerfold section called "The Dirty Digger")
Going for a HOWTO feel, just to improve the overall documentation of these
products. It's valuable to preserve the dead-ends too, so people can take off
with those, so they'll be sticking
Somebody from audience suggesting talking to Fry's about "MAKE kits".
Popular Science used to offer "buy the plans" services (like Altair, or
Nascom). Mark wants to include the complete plans in the magazine itself.
Question: are you going to have a section for quick hacks? Yes: mobile,
home entertainment, etc. Guy asking says he's got a lot of hacks that are too
small even for Webpages.
Question: have you seen Readymade? They have a MacGyver challenge, similar
sensibility. Mark used to work with the Readymade publisher. They see it
differently --- Readymade not as geeky as Make is going to be. Mark likes a
lot of Readymade, but Make involves technology. Followup question: what is
your definition of technology then? Large and blurry -- the spread of the tech
metaphor to other areas.
Question: what about the legality issue. It's an editorial decision, says
Dale. In a sense, the Hacks book isn't written for hackers, it's written for
people who can learn from hackers. But the grey areas move -- burning CDs was
a grey area a few years ago, and now? There's a big difference between doing a
hack, describing it, and then building it from the plans.
Dale: "we see the O'Reilly audience as a core, but we want to expand beyond
that." Like reading National Geographic sometimes. It's not just procedure,
it's also personal.
Question: what kind of licensing. Ooooh good question. Answer: it's ...
complicated. We're just seeking a non-exclusive license. When we work with
photography, they have their own mad rights world. So without paying them, it
would be hard to get them to shift to a creative commons world. My real goal
is build a community with this. Questioner is curious because she's Indian,
and keen to take this stuff and use it in India.
Question: are you familiar with the Lindsay Publications books - they reprint old
books like "How to Make Your Own Foundry"?
Comment from audience: woodworking magazines really good guide, as are
cooking magazines.
One feature is how to do good project shots, which is deliberate, so they
get better pictures when people send stuff in.
Sorry, lost my note-taking ability then as I got into a discussion. Major
bit of info: it'll be 200 pages, quarterly. Subscription-driven, it sounds.
Hybrid book/magazine: mook!
URL: http://make.oreilly.com/
notes on: Edd Dumbill on DOAP»
Hi, confusing switches-of-first-person-and-third fans, and welcome back to
my OSCON incoherentfest. You join us slamming into the Marriott, late enough
to miss the redeye all-Dyson triple-slam keynote completely. Sorry. I ended up
staying up too late trying to stop Ada bowling into Larry Wall's legs, tipping
him over Nat Torkington's hotel balcony, and other acts of chaotic-god toddler
action. So you get the first post-keynote session: Edd Dumbill's (whose name I
always misspell) DOAP talk.
Aim: to cut down the amount of work a software maintainer has to do to get
the news out about their work. Too many project registries: Freshmeat, OSDir,
GNOME software map, blah, blah, blah. Hard to keep up to date. Flip-side is if
you're trying to maintain your own registry, it's hard to keep track.
Goals: it had to cope with internationalized descriptions. There had to be
tools for the creation and consumption of these descriptions. It had to have
interoperability with other Web metadata - FOAF, Dublin Core, RSS.
Use cases: easy importing of details into registries; exchange between
registries; automatic configuration of resources - finding CVS repositories or
bug trackers; and assisting packagers.
Tried to learn from recent metadata successes and failures. Dublin Core is a
double-edged sword; mostly goodness, great documentation, raising awareness.
Not done so well is that they underspecified in various ways, so there are
questions about how to use certain terms (what's "author"? Name? E-mail
address? URL?). RSS: very messy history, suffered from underspecification too.
(Edd was involved in RSS 1.0, which was very much not underspecified, if I recall.)
ebXML is an electronic business vocabulary - boring, but they have schema and
lots of documentation. HTML - hard to retrofit validation. Lessons Edd drew
from this: docs, interop, schema, community.
XML or RDF, that is the question. Straightforward XML? Or RDF? XML comes in
many flavours, though: well-formed, with a W3C XML Schema (huge processing
overhead; Edd doesn't like), RELAX NG (feels a lot more lightweight, but has
its own issues). RDF provides "webby data" - semantics as well as syntax. Edd
likes RDF.
Surveyed existing work: Freshmeat, SourceForge, GNOME, KDE, Open Metadata
Framework, Advogato. Here he shows a huge spreadsheet of relative features like
mirror site lists, purchase links, demo site, license, etc. He thought the
social relations between developers and projects were particularly important,
because egoboo is so important. And screenshots! Must have screenshots!
Fields that weren't anywhere else, but he stuck in: non-CVS repositories,
wikis, more project roles: translator (woefully underrecognised), testers,
etc. Spread the egoboo.
Biggest issue: how do you uniquely identify a project? Can't use name,
names change, names clash. How about a URL? But what URL? What happens if you lose
that domain? He picked homepage, but what if homepage moves? So Edd added "old
homepage" property. If two DOAP descriptions share a homepage (either their
current or their old homepage), then they're the same project.
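The identity rule is simple enough to sketch in Python. The `ProjectDescription` class is a hypothetical in-memory stand-in for a parsed DOAP file, not part of any DOAP toolkit, and the URLs are invented.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectDescription:
    """Hypothetical in-memory stand-in for one parsed DOAP description."""
    name: str
    homepage: str
    old_homepages: set = field(default_factory=set)

    def all_homepages(self):
        """Current homepage plus every recorded old homepage."""
        return {self.homepage} | set(self.old_homepages)


def same_project(a, b):
    """Two descriptions name the same project if any current or old
    homepage URL appears in both."""
    return bool(a.all_homepages() & b.all_homepages())


moved = ProjectDescription(
    "Foo", "http://foo.example.org/", {"http://example.com/~me/foo/"}
)
stale = ProjectDescription("Foo NG", "http://example.com/~me/foo/")
other = ProjectDescription("Foo", "http://other.example.net/")
```

Here `moved` and `stale` match despite different names, while `other` shares a name but no homepage and so counts as a different project. The comparison is on exact strings; URL normalisation is left out of this sketch.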
What about license? So that people can compare licenses in different
projects, we give a unique URI for each license. He's defined URIs for common
licenses. If you actually resolve those URIs you get an RDF file that points
to the GNU site, etc. (Edd makes the argument that FSF might move their
license, so he's pointing to his own URIs. But what if Edd loses his domain or
gets hit by a truck. Not sure this isn't just adding an indirection, and not
solving problem.)
Shows a simple DOAP file. Looks good, nice and simple: DOAP file pulls in
FOAF and RDF namespaces to define a bunch of stuff. Looks easily createable
with a template.
Tools: need a Creator, Viewer, and a Validator. Someone else wrote a
DOAP-a-matic. There's also a rel link for autodiscovering projects on
Webpages. Someone else has written a Firefox plugin that shows a colourful
human-readable version of the DOAP data if an HTML page has a link to a DOAP
file in it. Edd is writing a toolkit for validating, written in Mono.
Participants: OSDir.com and GNOME Software Map are already interested,
looking to engage others. Needs more tools, like autoconf and distutils, or
MakeMaker, to spit out info about the project they're managing.
Q and As: Edd says he's deliberately staying out of category discussions.
You can add categories by just pointing to URI of the category -- so you can
point to Freshmeat categories, Debian tags, etc. And of course because it's
RDF, you can assert relationships between those categories.
2004-07-28»
notes on: perl lightning talks, impressionistically rendered»
Stumbling into the Perl Lightning Talks now. Randal Schwartz (looks
like Randal. Certainly wearing Randal-like clothes. He's the Hooter's guy,
right? I always get him and Tom Phoenix confused. Okay, definitely Randal.)
Anyway, he's written a CGI replacement that uses Class::Prototyped to create a
proper MVC-style object interface for Web applications. The stub class
implements a default Web app, and you just stick in your own methods which
customise it. I wonder if this is how WebObjects works? They worked out how
to structure it by looking at oodles of existing CGI apps.
I don't know what the name of this class is (for I am an idiot), but it'll
be out on CPAN soonest. Look for the Hooters guy.
(Ed: The whole set-up is far better explained by Randal himself in this Linux
Magazine article from Feb 2004.)
Perl is too slow! is the name of the next talk.
Perl's compile cycle is slow. Mod_perl solves this. But what about other
environments?
Why can't we create a generic stub for any program?
This guy (known as anonymous for the purposes of these notes) saw Speedy
CGI, tried to use that, but couldn't get it to work in a command-line
environment.
So he wrote pperl. It used to be a big backend that sat communicating via
STDIN and STDOUT over a Unix domain socket. But no STDERR or weird files.
Richard Clamp from London.pm came to help. He's a veeery scary Unix hacker
who knows how to use send_fd() and recv_fd() to send file descriptors over
Unix domain sockets. Yes, that scary.
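For the curious, the file-descriptor trick looks roughly like this in Python (3.9 or later, Unix only), using the standard `socket.send_fds` and `socket.recv_fds`. This is not pperl's code, just a sketch of the mechanism; both ends live in one process here to keep it short, where pperl would have a client and a backend.

```python
import os
import socket


def pass_fd_demo(message: bytes) -> bytes:
    """Send the read end of a pipe across a Unix socket pair, then read
    the message back through the received descriptor."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, message)
        os.close(write_end)
        # Some ordinary data has to travel alongside the descriptor.
        socket.send_fds(left, [b"x"], [read_end])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received = fds[0]  # a new descriptor referring to the same pipe
        try:
            return os.read(received, len(message))
        finally:
            os.close(received)
    finally:
        os.close(read_end)
        left.close()
        right.close()
```

The receiver gets its own descriptor number for the same open file, which is how a long-running backend could be handed the caller's real STDERR rather than a copy of its contents.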
Result: /usr/bin/pperl is about ten times faster than normal Perl
BUT THAT'S NOT GOOD ENOUGH!
"String matching in Perl is slow" (controversy!)
Well, it's *naive*. Converting string matches into C speeds things up, but
it's still naive.
The Aho-Corasick algorithm is faster. 150 times faster.
Text::QSearch implements the Aho-Corasick algorithm, and will be on CPAN (once
lawyers look it over) in a few weeks.
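Since Text::QSearch isn't out yet, here is a small textbook Aho-Corasick in Python to show the idea: build a trie of all the patterns, add failure links, then scan the text once. It has nothing to do with the module's actual internals, and the function names are my own.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, failure and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first, so a state's failure target is finished before it is used.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits
```

The point of the failure links is that the text is never re-read: matching many patterns costs one pass over the input instead of one pass per pattern.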
The next guy (it's all guys so far) doesn't like CVS. Which is bad, because
he maintains a lot of modules, so he gets a lot of patches. He wants a VCS
system where everybody has access to the repository, but prevents complete
madmen (apart from him) from trashing it.
There is this (GPL'd) config management system thing called Aegis. So under Aegis, you can
declare a change, and put it into a task list. And someone else can pick up
something and check out the data. When you commit, though, you have to pass
tests, and build: the sort of thing you have to bolt onto CVS.
Then after that the change stays in a waiting room until someone reviews the
code. So Mr Module owner can check the code, kick it back or accept it.
He's offering AEGIS accounts at ... dammit. E-mail addresses are meant for
projectors and wikis, not audio. Well, ask on #perl I imagine.
This guy's name is Thomas, and he's from Amsterdam. He faced a challenge
that he couldn't code his way out of. He went to the developing world, and
taught people how to use open source code, but when these people went away to
work on it, they weren't online, so they couldn't use it. Enter: NGO In A Box,
which is prepackaged open source software for NGOs to use.
(Ed: Everybody really likes NGO in a box. Big applause, not least because
it was the shortest of the talks.)
Next Perl talk I sacrificed to the God of tidying up the other
descriptions. It was something big and clever and Microsofty that Boeing uses.
Sorry.
Mark Jason Dominus can't be here because he has a new baby girl! Ahhhh.
Andy Lester is up. He likes to test stuff. He's here to talk about
"prove" which is part of the Test harness. It is like make test, and is
now in core Perl. It will *change how you think about tests*, he says.
Prove spits out better diagnostics than make test, so you can file better
bug reports.
Test first and prove are best friends. (Not sure why, I guess because make
test isn't.)
(Ed: prove looks like a TestRunner for Perl. Which is cool, although I'm
surprised Perl didn't have one already.)
Check out more info at "prove --man" with latest Perl distribs, and
http://qa.perl.org/
David Turner. Works for RMS. Starts by quoting the "spider in the hands of
an angry Lord" sermon. "Licensing is not theology - it's rocket science". He's
a fine preacher.
Parrot licensing will cast you all into hell, he tells the audience, and lo
they are sore afraid. Go ye through the Parrot source, and stick there
proper copyright notices. Don't put "all rights reserved", on fear of
your mortal soul, because that the Lord RMS doth not think that it doth mean
what thou thinkest it means. Put it instead under the GPL or the Artistic
License. But don't put it under the Artistic License, put it under the
Clarified Artistic License, for the Artistic License as it stands is sorely
artistic, and lo it is ambiguous in many areas. And Brad Kuhn did come down
from on high and suggest that the CAL be the license for Perl6, and he spake
truth, for ye will be sent to Hell if you do not heed him! For he is the
prophet of RMS, and the one who is to come that is greater, who is known as
Hurd, and whose todo list is legion. (I am paraphrasing a fair bit here)
Some copyright notices say "(c) The Perl Foundation". You need to
get signed declarations for copyright re-assignment, it's the Law. Talk to the
FSF and we'll sort it out for you. Amen, brother!
Enough. I go find daughter so that I can chase her in giggling circles
around the hotel.
notes on: Mono 1.0»
I turned up on time for Miguel's Mono talk. Unfortunately, even though Miguel
is the fastest talker in
the world, he didn't get very deeply into the cool, unknown stuff, so it's
mostly cool known stuff.
They just released Mono 1.0. It's the result of three years of work.
Mostly assembled by members of the community; roughly 300 people. According to
the stats, they got a lot more code done than other projects -- probably
because it was easy to compartmentalise. Each class could be built in
isolation.
What is Mono? It's an open source implementation of .Net: a cross-platform
implementation of a virtual machine, an SDK, and a bunch of class
libraries. Linux, of course, plus MacOS X, Solaris, etc, and Windows.
Windows doesn't need Mono, but fifty percent of the Mono contributors use
Windows primarily. A lot of people are looking at a migration path away from
Windows.
We support: C#, Java and Nemerle. In preview, VB.NET, Jscript, and Python.
Mono 1.0 doesn't have Windows.Forms, EnterpriseServices or
InstallationServices. But they do have Gtk# and Gnome.net, bindings for
building desktop applications on Linux. Also they have a Cairo (graphical
subsystem) substrate. Lot of third-party database support, Relax NG.
Documentation is lifted mostly from the ECMA spec ("We're bad, but not as
bad as most open source software"). The Documentation system has a "wiki"
feature, so you can enter and upload contributions using the help system.
Mono is now the official Novell desktop development platform. We're using
it initially for new software, and for extending existing software (there's
APIs for using Mono as an extension system). Examples include: Beagle (a
filing-system extension which does Spotlightlike metadata features, demo'ed
it in Norway to 300 developers six hours before Steve Jobs demo'ed the same
stuff in Tiger), Dashboard, F-Spot, Evolution 2. None of that is shipping,
it's for the next version of the desktop. Nat's department is doing all the
interesting desktop stuff. We've mixed up the kernel and desktop people to
make sure that they're not working in isolation.
Roll-outs: Voelcker uses Mono to run 400 servers with 150,000 users; they ported
an ASP.NET application to Unix.
Reuse: we can run existing .Net apps written in C# or VB. Third party
compilers in Eiffel, Ada, Fortran, C/C++ etc.
Java. We have a JIT that converts Java bytecodes into the .NET VM. You can run
C# and Java code side-by-side. We use the GNU Classpath, which means we have
the same limitations (no Swing, etc). Applications like Eclipse run out of the
box.
There are two stacks: there's the ASP.NET/ADO.NET/Windows.Forms stack
which is the "Microsoft Compatibility Libraries". And then the rest of the
code, which is free software running on both Windows and Linux. The Mozilla
bindings don't work on both, and the Gnome APIs don't either. But everything
else is cross-platform between Windows and *nix. We have a nice Rendezvous
stack someone wrote at Novell.
Who develops Mono? Novell has twenty engineers working full-time on Mono,
and 300 developers from open source community. Also help from "nameless
embedded system vendors", Mainsoft - a product that hooks to VisualStudio
which lets you run ASP.NET stuff on J2EE servers, SourceGear.
Where are we going? Continue to improve Unix, Gnome, Cocoa. We're building
new things in .Net, iFolder 3 (multi-user, open source), Beagle (WinFS),
Novell Dashboard.
Mono 1.2 - incremental update, debugger, Cocoa 1.0, Gtk# 1.2, Windows.Forms
Mono 2.0 - ASP.Net, ADO.NET, System.Xml, Windows.Forms 2.0
Currently: Lots of optimisations. SSA Partial Redundancy Elimination, cool
ex-Mozilla SportsModel garbage collector.
Out of time!
notes on: crash course in database design: surviving when you don't have a dba »
Okay, now I'm late arriving at Dirk Elmendorf's crash course in database
design. We join him just after he's explained why you should care about db
design, and where to bury your DBA's body when you've accidentally poisoned
his coffee. I think.
So first steps: you have entities, and relationships. Entities are bits of
data, relationships are connections between them. You don't have to have
relationships between every entity (he's really seen databases that are like
that).
Easiest example: a one-to-one relationship. A professor and a private
office, say. So one professor has one office. Then there's one-to-many. One
classroom may have many classes occur in it. Many to one: a class has many
classrooms. Then there's the complicated one, the many to many: if a class was
repeated in a bunch of classes.
SQL doesn't handle many-to-many directly. So you have to insert something in
the middle, so you have a many-to-one, and then a one-to-many. So in this case,
you create a new entity: a class schedule. A class relates to a single date
and time, and a classroom relates to a single date and time.
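A minimal sketch of the "something in the middle" idea, using Python's built-in sqlite3 module. The table and column names (class, classroom, class_schedule, starts_at) are made up for illustration and are not from Dirk's slides.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class     (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE classroom (room_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);

    -- The entity in the middle: one row per booking of a class into a room.
    CREATE TABLE class_schedule (
        class_id  INTEGER NOT NULL REFERENCES class(class_id),
        room_id   INTEGER NOT NULL REFERENCES classroom(room_id),
        starts_at TEXT    NOT NULL,
        PRIMARY KEY (room_id, starts_at)  -- a room holds one class at a time
    );

    INSERT INTO class VALUES (1, 'Databases 101');
    INSERT INTO classroom VALUES (1, 'Room A'), (2, 'Room B');
    INSERT INTO class_schedule VALUES
        (1, 1, '2004-07-29 09:00'),
        (1, 2, '2004-07-30 09:00');
""")

# The many-to-many question "which rooms does this class use?" becomes
# a walk through the schedule table.
rooms_for_class = [row[0] for row in conn.execute("""
    SELECT classroom.name
    FROM class_schedule
    JOIN classroom ON classroom.room_id = class_schedule.room_id
    WHERE class_schedule.class_id = 1
    ORDER BY classroom.name
""")]
print(rooms_for_class)
```

The same class shows up in two rooms without either the class or the classroom table needing a repeated column.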
(Editorial confession at this point: I learned everything I know about
database design from futzing around with the Access visual designer.
Relational DBs give me the heebie-jeebies. So I'm
now hitting the extent of my knowledge. I may be mangling what Dirk is saying
here - d.)
Building your database: use consistent coding standards (you putzes!).
Single and plural table names, upper case, underscore, camelcaps, special
table prefixes -- e.g. ref_units, where "ref" is a prefix for constant reference
values.
Normalisation! Hooray! Normalisation is a set of rules for designing schemas.
It helps you reduce redundancy. Redundancy is bad because it causes errors,
and wastes resources. (Ed: Also I think there's a commandment against it in
the Bible. Or there was until they refactored them).
It eliminates database inconsistencies - poorly laid out data can provide
false statements.
Normalization - first normal form. The easiest one: all records have to
have the same "shape" - the same fields, the same kind of data. One way to
cheat is to have a fake array in your table, like "author_name1", "author_name2",
"author_name3". What happens when you have more than three authors? What do
you do when you delete an author? Author3 sometimes becomes a comma-separated
field of the rest of the authors. Yuk. Address1,2,3 is very common - but
databases can store newlines now.
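Here's a rough sketch of the alternative to the fake array, again with Python's sqlite3 module. The book and book_author tables and the author names are invented for the example: each author gets a row of their own, so a fourth author or a deleted author needs no special handling.

```python
import sqlite3

# Hypothetical schema: one row per (book, author) instead of author_name1..3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE book_author (
        book_id     INTEGER NOT NULL REFERENCES book(book_id),
        author_name TEXT    NOT NULL,
        PRIMARY KEY (book_id, author_name)
    );
    INSERT INTO book VALUES (1, 'A Book With Many Authors');
""")

# Four authors: more than the fake array had room for.
authors = ["Ann", "Bob", "Cat", "Dan"]
conn.executemany("INSERT INTO book_author VALUES (1, ?)",
                 [(name,) for name in authors])

# Deleting an author is a plain DELETE; nothing has to shuffle up a slot.
conn.execute("DELETE FROM book_author WHERE book_id = 1 AND author_name = 'Bob'")

remaining = [row[0] for row in conn.execute(
    "SELECT author_name FROM book_author WHERE book_id = 1 ORDER BY author_name")]
print(remaining)
```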
Second normal form. A row can't have partial key dependencies. So if you
have a field in the employee table marked "office_location", then you're mixing
ideas, because if you delete all the employees, your office_location will
disappear too. Move it out to its own table.
Third normal form. One non-key field cannot be a fact about another non-key
field. So you can't have a book table, with a publisher *and* a publisher's
address. You need to break that out again - a separate publisher table.
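A small sketch of the broken-out publisher table, using Python's sqlite3 module. The schema and the sample publisher are assumptions for illustration. The payoff is that the address lives in exactly one row, so one UPDATE fixes it for every book.

```python
import sqlite3

# Hypothetical schema: the publisher's address is a fact about the publisher,
# so it lives in the publisher table and books just point at it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (
        publisher_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        address      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id      INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
    );
    INSERT INTO publisher VALUES (1, 'Example Press', '1 Old Street');
    INSERT INTO book VALUES (1, 'First Book', 1), (2, 'Second Book', 1);
""")

# The publisher moves: one row changes, however many books it has.
conn.execute("UPDATE publisher SET address = '2 New Street' WHERE publisher_id = 1")

addresses = {row[0] for row in conn.execute("""
    SELECT publisher.address
    FROM book JOIN publisher USING (publisher_id)
""")}
print(addresses)
```

With the address stored on each book row instead, the same move would need every book updated, and a missed row would leave the database contradicting itself.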
Fourth normal form. A record should not contain two or more multi-valued
facts which are independent. So a class table that has "room" and "professor".
You really want to move those out into their own tables, because those facts
aren't related.
(Ed: basically, databases seem to crave to become triplets. I can see why
people turn to RDF religion after this.)
Fifth normal form: information cannot be represented by several smaller
record types. So basically, once you've gone for the four normal forms, stop
breaking out the bloody tables, because you're GOING TOO FAR.
Normalization - you can have too much of a good thing. Start at a
normalized database and work backwards as you need to optimize.
Practical tips: ideally, put data checking into a central library, and then
make sure all data is run through it before it enters the database.
Unfortunately, the usual case is that you don't have enough control over
access to the database - the db is used by multiple applications, languages,
and the integrity of the data is more important than performance. So you need
to put data-checking inside the db.
A primary key is a non-null unique identifier for a row, a composite key is
a collection of fields that give you a non-null unique identifier. A foreign
key is a primary key that's stored in a different table. A foreign key
constraint means that a foreign key has to appear as a primary key in another
table.
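To make the key vocabulary concrete, here is a hedged sketch in Python's sqlite3 module. The professor and office tables are invented for the example. One SQLite-specific wrinkle: foreign key enforcement is off unless you turn it on per connection with PRAGMA foreign_keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical tables for illustration.
    CREATE TABLE professor (
        professor_id INTEGER PRIMARY KEY,          -- primary key
        name         TEXT NOT NULL
    );
    CREATE TABLE office (
        building     TEXT    NOT NULL,
        room_no      INTEGER NOT NULL,
        professor_id INTEGER REFERENCES professor(professor_id),  -- foreign key
        PRIMARY KEY (building, room_no)            -- composite key
    );
    INSERT INTO professor VALUES (1, 'Prof. Example');
    INSERT INTO office VALUES ('Main', 101, 1);
""")

# Professor 99 does not exist, so the constraint refuses the row.
try:
    conn.execute("INSERT INTO office VALUES ('Main', 102, 99)")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```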
Cascading updates and deletes. Cascade allows you to handle foreign tables
that your application may not even be aware of. ON UPDATE CASCADE handles
updating of the foreign key. ON DELETE CASCADE deletes the row that is
referencing a foreign key which is being deleted from the primary table of the
foreign key. ON DELETE CASCADE is daaaaangerous. You might want it to delete
logs, but you really don't want all records of this person disappearing when
you delete something. ON DELETE SET NULL - so you can just make
individual record values disappear.
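A sketch of the two delete behaviours side by side, in Python's sqlite3 module. The person, login_log and invoice tables are assumptions for the example. In standard SQL the set-null variant is spelled ON DELETE SET NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades to fire
conn.executescript("""
    -- Hypothetical tables for illustration.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Logs are disposable: let them vanish along with the person.
    CREATE TABLE login_log (
        log_id    INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL
            REFERENCES person(person_id) ON UPDATE CASCADE ON DELETE CASCADE
    );

    -- Invoices must survive: just blank out the reference.
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY,
        person_id  INTEGER
            REFERENCES person(person_id) ON UPDATE CASCADE ON DELETE SET NULL
    );

    INSERT INTO person VALUES (1, 'Alice');
    INSERT INTO login_log VALUES (1, 1), (2, 1);
    INSERT INTO invoice VALUES (10, 1);
""")

conn.execute("DELETE FROM person WHERE person_id = 1")

logs_left = conn.execute("SELECT COUNT(*) FROM login_log").fetchone()[0]
invoice_owner = conn.execute(
    "SELECT person_id FROM invoice WHERE invoice_id = 10").fetchone()[0]
print(logs_left, invoice_owner)
```

The log rows are gone, while the invoice row is still there with an empty person_id.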
Column constraints. Default to NOT NULL, because NULL can cause real
problems with sorting, etc. UNIQUE - you can use this for multiple
columns.
CHECK - stuff like CHECK( age > 0). You're looking for data validity,
but you're not putting business logic in your database. You want to leave that
to a proper DBA.
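The three constraints in one hypothetical table, sketched with Python's sqlite3 module. The member table and its rows are made up. Each bad row below trips a different constraint and is refused by the database itself, whichever application sent it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        member_id  INTEGER PRIMARY KEY,
        first_name TEXT    NOT NULL,
        last_name  TEXT    NOT NULL,
        age        INTEGER NOT NULL CHECK (age > 0),
        UNIQUE (first_name, last_name)   -- UNIQUE across multiple columns
    )
""")
conn.execute("INSERT INTO member VALUES (1, 'Ada', 'Example', 36)")

bad_rows = [
    (2, 'Bob', 'Example', -5),   # fails CHECK (age > 0)
    (3, 'Ada', 'Example', 40),   # fails UNIQUE (first_name, last_name)
    (4, 'Cy', None, 20),         # fails NOT NULL on last_name
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO member VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as err:
        rejected.append(row[0])
        print(row[0], "rejected:", err)
```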
Triggers! Advanced data checking, or to scrub data - automatically
lowercasing stuff. Can also handle more advanced clean up - so deleting from a
single table can cause a trigger to clean other tables that are related and
log the event to a log table. But triggers are daaangerous for programmers,
because they're hard to maintain in the development cycle, and invisible when
debugging.
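A minimal sketch of both trigger uses, scrubbing and logging, in Python's sqlite3 module with SQLite's trigger syntax. The account and audit_log tables and the trigger names are assumptions. Note how the application code never mentions lowercasing or logging, which is exactly the invisibility problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for illustration.
    CREATE TABLE account   (account_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE audit_log (entry TEXT NOT NULL);

    -- Scrub data: lowercase the email right after it is inserted.
    CREATE TRIGGER account_lowercase_email AFTER INSERT ON account
    BEGIN
        UPDATE account SET email = lower(NEW.email)
        WHERE account_id = NEW.account_id;
    END;

    -- Log the event: record every deleted account in a log table.
    CREATE TRIGGER account_log_delete AFTER DELETE ON account
    BEGIN
        INSERT INTO audit_log VALUES ('deleted ' || OLD.email);
    END;
""")

conn.execute("INSERT INTO account (account_id, email) VALUES (1, 'Dirk@Example.COM')")
stored = conn.execute("SELECT email FROM account WHERE account_id = 1").fetchone()[0]

conn.execute("DELETE FROM account WHERE account_id = 1")
log = [row[0] for row in conn.execute("SELECT entry FROM audit_log")]
print(stored, log)
```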
Dirk's DBA told him to talk more about indexes. Indexes are cheat-sheets
for the database to improve performance. But you need to pay attention to
actual queries to figure where they are needed. Index a lot, but remove the
ones that are not being used. Don't bother indexing a column which has a
small number of possible values for a lot of rows, such as booleans or value
ids that come from a short list.
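One way to "pay attention to actual queries" is to ask the database how it plans to run them. This sketch uses SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module; the orders table and index name are invented, and the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,   -- many distinct values: worth indexing
        paid        INTEGER NOT NULL    -- 0 or 1: an index here buys little
    )
""")
conn.executemany("INSERT INTO orders (customer_id, paid) VALUES (?, ?)",
                 [(i % 500, i % 2) for i in range(5000)])

def plan(sql):
    """Return the planner's description; the last column holds the detail text."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)   # no index yet, so a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)    # now the planner can search via the index

print(plan_before)
print(plan_after)
```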
Conclusions: DB design isn't new or cutting edge, so there's a ton of
literature out there to help you learn more about databases.
Just because you don't have a DBA doesn't mean you should build on
top of a poorly designed database.
notes on: subvert this! developing with subversion on mac os x »
Wow, it's *really* crowded here. I couldn't get into the discussion of
Power Laws (now! with real stats!), so I've nipped into Brian Fitzpatrick's guide
to using SVN with Mac OS X. I've missed the first few minutes, so let's join
Brian as he finishes explaining how Subversion kicks CVS's HEAD in.
(Offstage: oof, ow, ugh.) ... Binding surfaces are big with Subversion.
Having a lot of ways to plug in your own code into a system is good (CVS just
has a pipe). The Apache foundation are big on big binding surfaces for glue,
because that's how they felt Apache beat out Netscape server.
Reasons why Subversion has better binding than CVS: Subversion is written
in ANSI C, so it plays well with others. It uses SWIG for external language
bindings, so instant Perl and Python APIs (Ruby and others not yet supported
because nobody has stepped up to take the bat). Java support is via JNI.
Then there's the API promise: between 1.0 - 1.XX the API will be binary
compatible.
Subversion's dependencies. There's the Apache Portable Runtime and APR
util. This gave us the capability to run on any platform that Apache runs on.
The other dependencies are SWIG, the Apache server, and Berkeley DB. Oh,
and Neon - a client library for DAV operations.
That looks like a lot, but in fact the only ones you really need are
APR and APR util. The rest of the stuff is mainly for the DAV support.
(In Subversion 1.0, you needed Berkeley DB if you were running a server,
but the latest version has a backend that uses flatfiles. Good for NFS.)
Subversion has a bunch of libraries. libsvn_client - primary interface for
client programs, libsvn_delta is the tree and diff routines (first non-GPL
diff engine), libsvn_fs_base is a Berkeley database filesystem library,
libsvn_fs_fs - the flat file equivalent. (Filesystem is just a way of
describing the db storage; it's not a real filesystem). Libsvn_ra is the
repository access common utilities, and then libsvn_ra_dav, libsvn_ra_local,
and libsvn_ra_svn -- for DAV client-server communication, local communication,
and SVN, subversion's own client-server protocol (same as pserver in CVS).
Then there's libsvn_repos, which is the high-level interface. There's
libsvn_subr, which is miscellaneous subroutines. Libsvn_wc is the stuff to cope
with .svn/ directories, the equivalent of the CVS directory in checked out
copies.
Two Apache modules: mod_authz_svn, a special authorisation module, and
mod_dav_svn, which handles DAV requests and converts them into Subversion
actions.
(Aside - there's a SVN plugin for Tortoise CVS. And a Finder plugin,
cool!)
Now Brian is going to build a SVN tool, using the Subversion libraries,
Xcode, Interface Builder, PyObjC. I have a feeling I will be hand-wavily
describing lots of GUI development now...
The subversion team has been converting CVS repositories into SVN for
testing purposes. Brian's working with the Apache 1.3 SVN repository for this
demo, which he has locally. The mini app he'll build will be a program that
lets you drag and drop files and see log data on them.
Okay, lots of Interface Builder widgety goodness. He's using a Filewell
palette third-party widget, which I think is this filewell.
More Interface Builder linking of outlets and actions to a controller
object. This is making me crave doing programming in Xcode.
Hah! Brian cheated by cutting and pasting a wodge of PyObjC code! It's
pretty clean - there's just an "import Logger" statement at the top to
pull in the SVN Apache SWIG libraries. Works great.
Tools: SCPlugin, which is the
aforementioned Finder plugin, just source code. Full-featured, has all the
actions on a right-click button (I didn't even know Finder had plugins).
Eclipse supports subversion, and Xcode 2.0 will support subversion in the
glorious Tiger future that Steve promises us all.
a camper at camp smalltalk»
On my way to the first day proper of O'Reilly's Open Source Conference.
It seems much busier than last year: I guess something's improving, whether
it's the economy or business interest in Open Source, or just a fading away of
people's reluctance to tempt terrorist ire by coming out of the woods and onto
a plane.
I've been in Portland for a week or so now, hanging out as a U.N. Observer
at Camp Smalltalk, which is like Camp X-Ray only with objects. Actually, it
was a load of fun.
Smalltalk has a strong community culture, which I think is one of the reasons
that it's produced such a disproportionately large number of good practices
and useful meta-programming techniques. That, and that when you kick up a
Smalltalk session, you can do a "View Source" on the entire operating system's
code. I like Smalltalk.
I got to see Ward Cunningham slinging index cards, and Ralph "Gang of Four"
Johnson hacking code. L. Peter Deutsch, virtual machine pioneer, was there.
After years working on
Ghostscript, he's been tempted back to Smalltalk, and spent the week porting
the Python bytecode compiler to output to the modern Smalltalk VM. He
estimates he might be able to get a 10-50 speedup by doing that. If only I
could have kidnapped him and dragged him to help out Dan Sugalski, due to be pied by
the Pythonistas this week for failing (just!) to speed up Python by porting it
to the new Perl6 VM. What a fine mongrel VM that would be.
Other interesting stuff: if you're interested in new language constructs,
you really should check F-Script, a
Smalltalkic scripting language for the Mac. It's strongly tied to the
Objective C object model - a bit too tied, in fact, so like ObjC you can't
create your own new classes at runtime, just instantiate objects, which is a
bit limiting in a scripting environment. The real magic of F-Script, though,
is OOPAL, which is a deeply splendid merging of Smalltalk and APL. No,
no, don't run away - it's good, even if you don't know either language.
OOPAL is the syntactic sugar that lets you send an array of messages to an
object. Given that everything in Smalltalk is an object, that means you can
turn most basic operations into iterations; which removes most of the need for
loops. If you know basic Smalltalk syntax, check out Chapters 16-19 of the FScriptGuide.pdf.
If you don't know Smalltalk, and have a Mac, read the whole thing, and have a
play around. It's fun. (Somebody will now tell me that Ruby does all of these
things. Must. Learn. Ruby.)
Other stuff: the website for Croquet, Alan Kay's next generation
3D desktop environment, is showing off some more
screenshots. Looks like the Squeak-based software is going to see the
light of day next month.
I also met James Foster, who is working on a badly-needed simplification of
that whole appalling bug-tracking, task management software space. Just
looking over his shoulder was fun; I can't wait to see the final results.
Okay, now I'm typing in the middle of Tim O'Reilly's keynote, which is
distractingly good. I'm sure other
people will blog it better, but if anyone was wondering, when O'Reilly
showed that book sales almost exactly matched the relative cost of AdWords
for those keywords, it was me who very loudly went "woah!". Information wants
to be smuggled out via leaky patterns.
petit disclaimer:
My employer has enough opinions of its own, without having to have mine too.