This page looks very fancy in a modern browser, with "stylesheets" and "layout" and things, but frankly I prefer the way you're seeing it here. Congratulations for not crumbling to the Browser Upgrade Initiative! Support the Web Designer Downgrade Conclusion!
a man slumped on his desk, from 'The Sleep of Reason Produces Monsters'

Oblomovka

Tagling

Wow, it's messy in this website's CMS. It's like I'm typing this *directly into my webserver*. Anyway.

I've always rather admired Yahoo!'s Content Analysis Term Extraction Web Service, even if it's not the most mash-up-tastic of APIs. Still, I did think that it might work bolted onto that other Web 2.0 theme, tagging.

This is what term extraction means: you feed Yahoo! a pile of text from a document via its API, and it spits out what it thinks are the significant words or phrases in that sample. This little Ruby program runs Yahoo! Term Extraction on text you supply via STDIN, then gives you (by default) a list of suggested Technorati tag links for the text. If you give the program "-txt" as its first option, it'll just spit out the tags in human-readable plaintext, one per line.
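
To make the two output modes concrete without touching the network, here is a minimal offline sketch. The sample XML is hand-written for illustration and only assumes the reply is a set of `<Result>` elements, which is what the program below looks for; the helper names `extract_terms` and `technorati_link` are made up for this example.

```ruby
require 'cgi'
require 'rexml/document'

# Hypothetical sample reply, hand-written for illustration (not real API output).
SAMPLE_XML = <<~XML
  <ResultSet>
    <Result>term extraction</Result>
    <Result>ruby</Result>
  </ResultSet>
XML

# Collect the text of every <Result> element in a reply.
def extract_terms(xml)
  terms = []
  REXML::Document.new(xml).each_element('//Result') { |e| terms << e.text }
  terms
end

# Format one term as a Technorati tag link (the program's default output).
def technorati_link(term)
  %(<a href="http://technorati.com/tag/#{CGI.escape(term)}" rel="tag">#{CGI.escapeHTML(term)}</a>)
end

terms = extract_terms(SAMPLE_XML)
puts terms                                  # roughly what "-txt" prints
puts terms.map { |t| technorati_link(t) }   # roughly the default output
```

Splitting parsing from formatting like this is only a convenience for experimenting; the real script does both in one loop.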

Yahoo! will generally give you more tags than you reasonably need, so the results usually require manual pruning. Here are the tags it suggested for the text on this page.

      danny% w3m -dump -T text/html tagling.php3 | tagling

Some of those are from the secret text you see if you read this page in a text browser, and I'm not sure what depths of my subconscious it was ploughing at the end, but not bad.
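
The manual pruning mentioned above could be partly automated. What follows is a rough sketch, not part of tagling itself: `prune_tags`, its `max` and `min_length` parameters, and the sample list are all invented for illustration. It normalises case, drops very short terms and duplicates, and keeps the first few in the order they arrived.

```ruby
# Hypothetical helper: trim a list of suggested terms before publishing them.
def prune_tags(terms, max: 5, min_length: 3)
  terms.map { |t| t.strip.downcase }          # normalise whitespace and case
       .reject { |t| t.length < min_length }  # drop very short terms
       .uniq                                  # drop duplicates
       .first(max)                            # keep the first few, in order
end

# Made-up input, standing in for whatever the service suggests.
suggested = ['Term Extraction', 'ruby', 'term extraction', 'ms', 'technorati tags']
puts prune_tags(suggested, max: 3)
```

A filter like this only removes the obvious noise; deciding which tags actually describe the text still takes a human eye.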

Here's the code. I may move it from here if it grows any bigger, but right now you might as well just cut and paste. Don't worry about trying to get developer tokens from Yahoo! -- they dole them out per application rather than per user, so the pre-cooked one in the code should do you fine.

#!/usr/bin/env ruby
# Warnings: run as "ruby -w tagling"; many systems pass "ruby -w" to env as one argument.
###
# tagling - Use Yahoo to spit out tags for a piece of text
###
#

require 'cgi'  # lowercase: require is case-sensitive on case-sensitive filesystems
require 'rexml/document'
require 'net/http'

appid = 'tagling1.0'
api_uri = URI.parse('http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction')

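# Read the whole document from STDIN and POST it to the term extraction service.
text = STDIN.read
response = Net::HTTP.post_form(api_uri, { 'appid' => appid, 'context' => text })

# The reply is XML; each <Result> element holds one suggested term.
doc = REXML::Document.new(response.body)

doc.each_element('//Result') do |result|
    term = result.text
    puts case ARGV[0]
    when '-txt' then term
    # URL-escape the term for the href, HTML-escape it for the link text.
    else %(<a href="http://technorati.com/tag/#{CGI.escape(term)}" rel="tag">#{CGI.escapeHTML(term)}</a>)
    end
end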
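
For anyone curious what that `post_form` call puts on the wire: the hash is form-encoded into the request body. The sketch below uses `URI.encode_www_form` from the standard library, which, as far as I know, produces the same kind of `application/x-www-form-urlencoded` encoding; the sample text is made up.

```ruby
require 'uri'

# Same parameter names as the script; the context text here is invented.
params = { 'appid' => 'tagling1.0', 'context' => 'hello world' }

# Form-encode the hash the way an HTML form submission would.
body = URI.encode_www_form(params)
puts body
```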