Corante is an interesting collection of blogs. Unfortunately, it renders so slowly in Firefox that I basically can’t stand reading it. I don’t know if it’s Firefox’s problem (IE seems to do ok), but in the meantime it’d be nice if the Corante folks fixed it (yes, I mailed a comment in)…
I’m going to be on a few planes over the next two weeks (to San Diego for ETech, to San Francisco for a customer visit, and to DC for PyCon). I like to read when I travel, and there’s a bunch of stuff that I’ve been meaning to get to. As batteries run out and eyes tire, I like to occasionally print the more substantive pieces on dead trees. Also, I occasionally print stuff to share with people who, for reasons as diverse as network security, poor eyesight, or simple personal preferences, would rather read on paper than on a screen (it’s much easier to share Malcolm Gladwell’s piece on SUVs around the playground if it’s on paper).
And I have to say that way too many blogs are basically unprintable. Too many of them are clearly never tested for printability (uh, I should test my own… phew, it’s a bit of a small font, but at least the layout’s fine).
So, if you have a blog, please print one of your own entries once in a while, and see what it’s like. Also, if you have a popular blog where people leave lots of comments, consider making it possible to print your words and not all the comments.
There, rant over, I feel better now.
Doing a little digging on the topic of my last post, I was poking around nature.com, and found Connotea, which is described as a derivative of del.icio.us. It is apparently similar to an independent effort called CiteULike.
At first, it seems like an awful lot of duplication — the core is basically a clone of del.icio.us. The biggest difference is that it seems to treat URLs as handles to actual bibliographic entries, which are extracted at bookmarking time from the pages being bookmarked, and the bibliographic handle is the “primary key” (I wonder what happens if two URLs point to the same biblio entry). The analysis works on a few major sites so far, including PubMed and Amazon. Having the bibliographic data then lets them integrate with citation management software (like EndNote). If enough of one’s sources are found online, I can certainly see that being a useful tool — I spent way too much time entering LaTeX bibliographies over the years.
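To make the “primary key” idea concrete, here’s a minimal sketch (all names and identifiers are my own illustrations, not Connotea’s actual design): bookmarks are stored under a bibliographic identifier such as a DOI, so two different URLs pointing at the same paper collapse into a single entry.

```python
# Hypothetical sketch: bookmarks keyed by a bibliographic identifier
# (e.g. a DOI) instead of by URL, so two URLs pointing at the same
# paper collapse into one entry with merged tags.

bookmarks = {}

def add_bookmark(url, biblio_id, tags):
    """Store the bookmark under the bibliographic 'primary key'."""
    entry = bookmarks.setdefault(biblio_id, {"urls": set(), "tags": set()})
    entry["urls"].add(url)
    entry["tags"].update(tags)
    return entry

# The same (made-up) paper bookmarked from two different URLs...
add_bookmark("https://pubmed.example/12345", "doi:10.1000/xyz", ["genomics"])
add_bookmark("https://journal.example/abs/xyz", "doi:10.1000/xyz", ["sequencing"])

# ...ends up as a single entry, with both URLs and both tags.
print(len(bookmarks))
print(sorted(bookmarks["doi:10.1000/xyz"]["tags"]))
```

That merging is exactly what a URL-keyed system like del.icio.us can’t do, and it answers my parenthetical question above: two URLs for the same biblio entry simply accumulate onto one record.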
But is the new feature “worth” having a segregated social bookmarking service (and data pool) just for scientists?
First, will it work? Assuming that the system is bootstrapped, my guess is: probably. The social aspect of del.icio.us, i.e. the tag-sharing, link-exploring and folksonomy-building will probably work just fine in a “vertical” community such as scientists or lawyers (assuming a high enough degree of participation). The profession-specific shared bookmarking service could very well make folksonomy development go a tad faster, within well-defined communities with a shared jargon (although I feel that jargon semantics don’t carry across subfields, with one field’s definition of a term quite at odds with another’s). Paul Kedrosky will be happy to see another vertical search concept (if he doesn’t know about it already!).
Apart from the duplication of effort, which is only theoretically bad, one obvious downside of the verticalization of the tool is that people doing interdisciplinary work (e.g. scientific lawyers, aka patent lawyers) will probably suffer from the compartmentalization of the meta-data — but they’re used to it by now.
Most interesting to me is the notion that the folks at Nature may have figured out a possible new feature/concept for systems like del.icio.us. Maybe it’s worth considering the possibilities that follow from doing more in-depth analysis of the “stuff” being bookmarked, and extracting the key parts of the content of interest, as opposed to focusing (as technologists would naturally do) on the “simple bit”, i.e. the URL. After all, the URL isn’t what’s interesting — it’s the stuff in the page that is.
As an example, this morning I bookmarked the page on Gawker that was my introduction to the Starbucks corporate anthem (warning: it’s depressing as hell). I bookmarked the page because “it was there” — but it would be nice for the system to know that what’s key about that page is the link to the MP3 file, not just the words that Gawker uses to introduce it. If others have bookmarked another page that happens to include the same link, del.icio.us wouldn’t let me know about it. A version of something like Connotea that knew about link structures might.
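The “knew about link structures” bit could be as simple as this rough sketch: parse the bookmarked page, pull out its outbound links, and treat the media files as the likely payload. The heuristic and the example page are mine, purely for illustration.

```python
# Rough sketch: instead of storing only the bookmarked URL, extract the
# outbound links from the page, so two pages pointing at the same MP3
# can be connected later. The ".mp3" heuristic is purely illustrative.

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect all href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def interesting_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    # Crude heuristic: media files are probably the page's real payload.
    return [link for link in parser.links if link.lower().endswith(".mp3")]

page = '<p>Listen to <a href="http://example.com/anthem.mp3">this</a>.</p>'
print(interesting_links(page))
```

With that extra bit of metadata, the system could cluster my Gawker bookmark with anyone else’s page that links to the same file.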
As my kids say, very instering.
Due in part to my initial waffling about domain names (and subsequent struggles with Apache and WordPress configuration), Bloglines now has multiple URLs which map to the same actual feed (david.ascher.ca, http://www.ascher.ca/blog, ascher.ca/blog, ascher.ca/wordpress, etc.). This isn’t a problem (I hope) for readers, because of DNS redirects and Apache rewrite rules. However, it’s a very minor annoyance for me when it comes to understanding my readership trends (as represented by the Bloglines contingent) — I have to look at the statistics for 9 different Bloglines ids.
Are there any ways to tell Bloglines that some feeds should be merged? (I have tried to do “permanent redirects”, but I’m not convinced that it’s had any effect on the Bloglines database.)
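For the record, the permanent redirects I mean look something like this in Apache; the hostnames and paths below are illustrative guesses at my setup, not the actual config:

```apache
# Collapse the old locations onto one canonical feed URL with 301s.
# Requires mod_rewrite; hostnames and paths here are illustrative.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.ascher\.ca$ [NC]
RewriteRule ^/blog/(.*)$ http://ascher.ca/blog/$1 [R=301,L]

# Old WordPress path, handled by mod_alias:
Redirect permanent /wordpress http://ascher.ca/blog
```

A well-behaved aggregator seeing a 301 ought to update its stored feed URL — the open question is whether Bloglines actually does.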
More broadly: it’s interesting that Bloglines didn’t put in the infrastructure to let people “claim” feeds, the way Technorati does. It would let authors help Bloglines serve its readers, automating URL changes and the like, thereby making it a higher-value aggregator from the author’s point of view (note that I don’t yet understand the value in claiming a Technorati feed).
Note to self: figure out why inbound trackbacks don’t seem to work. My first guess is that it’s the Apache URL rewriting.
Note to the world: trackbacks, pingbacks etc. are still way too obtuse.
Interesting bit found in my webserver access logs:
220.127.116.11 - - [08/Nov/2004:00:07:32 -0500] "GET /blog/index.cgi/?flav=rss HTTP/1.0" 404 24542 "-" "Twisted PageGetter"
Googling “Twisted PageGetter” confirms that it’s a spider that comes with Twisted Python. Once more, Python’s in the web spidering business, this time with Bloglines.
I just figured out one reason why my old blog was slow — I was including the HTML snippet from Technorati which pulls the JS code from their servers, and it caused serious delays on page rendering. Out with that, and in with the simpler version.