Bottom-up semantic web

Once upon a time, there was structured markup, and there was hypertext, and they were good.

Or at least, they seemed good to those who cared. Which was a small but sophisticated community, working on things like SGML and Xanadu. Hypertext systems were complicated and centrally maintained. After all, that was the only way to make everything work.

And then, along came a Brit working in Switzerland, named Tim Berners-Lee. He took those beautiful hypertext systems, and he broke them. In their place, he created a little tool called the World Wide Web.

The earth shook. (Or, so we can imagine in hindsight.)

For the Web was very good. Very, very good. Sure, you could put a link on a Web page to a site that didn’t exist, or to a site that stopped existing at some later point. Anyone who clicked on that link would get an ugly “404 not found” error. No matter. The Web still worked. And because there was no central system to prevent errors and collisions, it was suddenly far easier for anyone to publish on the Web. Before long, we had Yahoo!, Amazon.com, eBay, Mahir, an expensive sock puppet, the Google IPO, and various and sundry other epochal developments.

So far, this story is a paraphrase of things Clay Shirky has written far more eloquently here, here, and probably elsewhere.

The irony, of course, is that Berners-Lee has lately poured much of his energy into something called the semantic Web. (Clay nails this one too, as does Cory Doctorow.) The semantic Web looks for all the world like the bad, old pre-Web top-down approach to structured markup. The core idea is a good one: adding structure to the Web would make the vast assemblage of information on the global network even more powerful. If computers knew that a certain page was talking about books, or that a certain photo was of a new Macintosh computer, or any number of other structural relationships, they could do wonderful things with that information. Google and other tools have shown that we can get further than expected with brute force and clever algorithms operating on unstructured text. But semantic structure is still the next great frontier for the Web.

Well, guess what. It’s happening. Just not in the way the Semantic Web proponents have been advocating.

The latest and greatest example of the bottom-up semantic Web in action is tags. Tags are user-created labels for objects on the Web, such as pages and photos. Using a tool such as del.icio.us (for bookmark links) or Flickr (for photos), anyone can assign tags. Once objects are tagged, users can search on those tags and retrieve human-categorized results. Technorati recently introduced tag search across blog posts, del.icio.us bookmarks, and Flickr photos, with the ability to tag other types of objects as well.

What’s cool about this is that, in true Web spirit, it simply ignores the biggest problems with a decentralized system. I might think something belongs under a “politics” tag that you categorize differently. Or, different users will tag the same item in inconsistent ways. Not to mention that, to take a trivial example, “blogs,” “weblogs,” and “Web logs” might all refer to the same thing, but be treated as distinct tags. So what. Tags work well enough to be useful, despite not being perfect. Just like the Web vs. SGML, just like Ethernet vs. token ring networking, the lightweight, decentralized solution wins.
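
To make that concrete, here is a minimal sketch in Python of how a tagging service might store and search labels. The TagIndex class, its method names, and the example.com URLs are all invented for illustration; this is not a description of how any real tagging service is built.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory tag store (hypothetical): anyone can put any label on any URL."""

    def __init__(self):
        # Maps each tag string to the set of (url, user) pairs that carry it.
        self._by_tag = defaultdict(set)

    def tag(self, user, url, *labels):
        """Record that `user` applied each of `labels` to `url`. No vocabulary is enforced."""
        for label in labels:
            self._by_tag[label].add((url, user))

    def search(self, label):
        """Return the sorted, de-duplicated URLs carrying exactly this tag string."""
        return sorted({url for url, _user in self._by_tag.get(label, ())})


if __name__ == "__main__":
    index = TagIndex()
    index.tag("alice", "https://example.com/a", "blogs", "politics")
    index.tag("bob", "https://example.com/a", "weblogs")
    index.tag("bob", "https://example.com/b", "blogs")
    print(index.search("blogs"))
    print(index.search("weblogs"))
```

Nothing here reconciles "blogs" with "weblogs". The two stay separate tags, and a search on either one still returns something useful.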

And it gets better.

The exciting part of tags is that they fit together with mechanisms to build open programmatic interfaces to Web resources. A tag category, for example, can easily become an RSS syndication feed. And more. Lots of smart people, and many startups, are coming up with intriguing applications of these new capabilities.
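
As a rough illustration of how little machinery that takes, the sketch below turns the items under one tag into an RSS 2.0 feed using only Python's standard library. The function name tag_to_rss, the site parameter, and the example.com URL scheme are assumptions made up for this example, not any real service's API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def tag_to_rss(tag, items, site="https://example.com"):
    """Build an RSS 2.0 document for one tag.

    `items` is a list of (title, url) pairs; `site` is a hypothetical base URL.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = f"Items tagged {tag}"
    ET.SubElement(channel, "link").text = f"{site}/tag/{quote(tag)}"
    ET.SubElement(channel, "description").text = f"Everything users have tagged {tag}"
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    # ElementTree escapes characters such as & and < in text automatically.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(tag_to_rss("weblogs", [("A post about tags", "https://example.com/a")]))
```

Any feed reader that understands RSS can then subscribe to a tag the same way it subscribes to a blog.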

The semantic Web is dead. Long live the semantic web.