Proposal for a Simple Semantic Web Blog - The Transducer

The Semantic Web (SW), also known as Web 3.0, promises to transform the World Wide Web from a large, indexed collection of hyperlinked documents into a vast knowledge base of documents, maps, and other media forms that can be linked, remixed, and mashed up by intelligent agents for a variety of purposes. Your mission is to create a prototypical content development system (by modding a blog or wiki, for example) that would allow for the mass creation of mashable semantic content. Such a tool, if simple enough to use, could make a viral contribution to the emergence of the SW.


The key feature of the SW (and the reason it’s called semantic in the first place) is the addition of a layer of markup to the web’s documents that describes, in a way that is both machine- and human-readable, the “meaning” of what the documents contain. For example, instead of simply italicizing and capitalizing a string of text in a document to signify that it is a book title, one would wrap the string in a title tag, like so: “<title>Pride and Prejudice</title>.” In addition, these tags would be part of a larger framework, or ontology, that would give machines the opportunity to further disambiguate the meanings of the strings in documents.
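
To make the contrast concrete, here is a minimal Python sketch, assuming a plain HTML page and the essay’s illustrative `<title>` tag (real HTML reserves `<title>` for the document head, so a production vocabulary would need a different element or attribute). The `extract` helper and `TagTextExtractor` class are hypothetical; they use only the standard library’s `html.parser` to show that a program can find the book title in the semantically tagged sentence but not in the merely italicized one.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False   # True while between the target start and end tags
        self.buffer = []      # text fragments of the element being read
        self.found = []       # completed matches, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.inside:
            self.inside = False
            self.found.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.inside:
            self.buffer.append(data)


def extract(html, tag):
    """Return the text of each `tag` element in `html` (nested `tag`s are not handled)."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.found


# The same sentence, marked up two ways.
presentational = "<p>I reread <i>Pride and Prejudice</i> last week.</p>"
semantic = "<p>I reread <title>Pride and Prejudice</title> last week.</p>"

print(extract(presentational, "title"))  # []
print(extract(semantic, "title"))        # ['Pride and Prejudice']
```

The italics tell a human reader "this is a title," but only the tagged version says so to a program.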

With a critical mass of properly marked-up documents, it becomes easy to write programs that combine documents to produce other documents or media forms that summarize or visualize the content of the source documents. In the language of relational databases, SW markup allows programmers to develop agents that can automatically perform join-like queries across a document collection, producing interesting and useful reports. (Object-oriented database developers may recognize the concept of traversal here.) The key is a shared set of tags and identifiers among the documents.
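
A join-like query over marked-up documents might look like the following sketch. It assumes the markup has already been harvested into simple records and that both kinds of documents use the same identifier for a book; the records, the `urn:example:` identifiers, and the `hash_join` helper are all hypothetical, illustrating a plain in-memory hash join rather than any particular SW toolkit.

```python
from collections import defaultdict

# Hypothetical facts harvested from two kinds of marked-up documents.
# Both use the same (invented) identifier scheme for books.
reviews = [
    {"review_doc": "blog/post-1", "book": "urn:example:pride-and-prejudice", "rating": 5},
    {"review_doc": "blog/post-2", "book": "urn:example:emma", "rating": 4},
]
holdings = [
    {"library_doc": "library/page-7", "book": "urn:example:pride-and-prejudice", "shelf": "A-12"},
]


def hash_join(left, right, key):
    """Inner join of two lists of dicts on a shared key, like SQL's JOIN ... ON."""
    index = defaultdict(list)
    for row in right:              # build phase: index the right side by key
        index[row[key]].append(row)
    return [                       # probe phase: look up each left row's key
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


for row in hash_join(reviews, holdings, "book"):
    print(row)
```

Only the review whose book the library also holds survives, as with an SQL inner join; the shared identifier is what makes the combination possible at all.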

The problem, of course, is how to generate a critical mass of marked-up documents. Traditional SW proponents envision a new regime of markup in which all documents are written according to a newer, more complex dialect of XML that replaces (X)HTML. This is known as the “bottom-up” approach to the SW. We know that’s never going to happen. If anything, Web 2.0 technologies have taught us that critical masses of content, in which network effects are possible, require very low thresholds of participation. Blogs and wikis are incredibly successful because they are incredibly easy to use.

Another approach to the SW is the “top-down” approach. In this view, machines do much of the work of interpreting what humans already know how to read. In effect, this view puts its faith in artificial intelligence, and in the Google model, to tell what an italicized, capitalized string of text in a document is. (Projects like CiteSeer have shown that this is possible with traditional citation styles in academic documents.) The problem with this approach is that, beyond some simple things like text references, it is very hard to write programs that can read text the way humans do. We will get there at some point (probably sooner than we imagine), but we aren’t there now.

In between these views is a method that adopts the wisdom of Web 2.0 applications: the use of microformats. Instead of trying to reinvent either the way content producers write or the way search engines work, the microformats method piggybacks on existing markup and search practices and adds the 20% of effort that may make 80% of the difference (or maybe it’s 1 to 99). For example, RDFa is a standard that allows users to add attributes carrying semantic content to existing XHTML documents. Other standards for adding semantic content to documents are already in use: hCalendar, FOAF, XFN, hCard, etc.
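
As a rough illustration of how RDFa attributes carry meaning, the sketch below pulls `property` values out of a small snippet that uses the schema.org vocabulary through RDFa’s `vocab` and `property` attributes. `PropertyCollector` and `collect` are hypothetical names, and this is a deliberately simplified collector built on Python’s `html.parser`, not an RDFa processor: it ignores subjects, `typeof`, `content` overrides, CURIE prefixes, and the scoping rules a conforming parser must apply.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements with no end tag; they must not affect the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "wbr"}


class PropertyCollector(HTMLParser):
    """Collect (property IRI, text) pairs from elements with an RDFa `property` attribute.

    Simplified on purpose: one document-wide `vocab`, no subjects, and no
    `typeof`/`content`/prefix handling, unlike a conforming RDFa processor.
    """

    def __init__(self):
        super().__init__()
        self.vocab: Optional[str] = None
        self.current: Optional[str] = None  # property being captured, if any
        self.depth = 0                      # open elements nested inside the captured one
        self.buffer: List[str] = []
        self.pairs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.vocab is None and attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        if self.current is not None:
            if tag not in VOID:
                self.depth += 1
        elif attrs.get("property") and tag not in VOID:
            self.current = attrs["property"]
            self.depth = 0
            self.buffer = []

    def handle_endtag(self, tag):
        if self.current is None or tag in VOID:
            return
        if self.depth > 0:
            self.depth -= 1
            return
        iri = (self.vocab or "") + self.current      # expand the term against vocab
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        self.pairs.append((iri, text))
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)


def collect(xhtml):
    parser = PropertyCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.pairs


snippet = """
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Pride and Prejudice</span> by
  <span property="author">Jane Austen</span>
</div>
"""
for iri, value in collect(snippet):
    print(iri, "=", value)
```

The page still renders as ordinary prose; the attributes merely add full vocabulary identifiers that any program can read.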

But these formats are still not easy enough to use. Ideally, one would have a user interface that makes it easy to add these attributes to elements while writing, something as easy as, or easier than, adding a trackback or a set of tags to a blog post. Imagine being able to block off an arbitrary segment of text, have a dialog box appear asking what kind of content it is, and then add a few simple attributes, perhaps with some AJAX-fed data fields to smooth the process. Imagine also being able to add any number of microformats to a document, drawing them from a public repository.
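
The core transformation behind such a dialog box could be as small as the following sketch, assuming the post is still plain text when the user makes a selection. The template table stands in for the hypothetical public repository; the class names `vcard`/`fn` and `vevent`/`summary` come from the classic hCard and hCalendar microformats, while `TEMPLATES` and `wrap_selection` are invented for illustration.

```python
from html import escape

# Hypothetical stand-in for a public repository of microformat templates:
# dialog-box choice -> (root class, property class for the selected text).
TEMPLATES = {
    "person (hCard)": ("vcard", "fn"),
    "event (hCalendar)": ("vevent", "summary"),
}


def wrap_selection(text, start, end, kind):
    """Return `text` as HTML with text[start:end] wrapped in the chosen microformat."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection is empty or out of range")
    root, prop = TEMPLATES[kind]
    selected = escape(text[start:end])
    wrapped = f'<span class="{root}"><span class="{prop}">{selected}</span></span>'
    # Escape the unselected parts too, since the input is treated as plain text.
    return escape(text[:start]) + wrapped + escape(text[end:])


post = "Lunch with Jane Austen today."
start = post.index("Jane Austen")
print(wrap_selection(post, start, start + len("Jane Austen"), "person (hCard)"))
# Lunch with <span class="vcard"><span class="fn">Jane Austen</span></span> today.
```

A real editor plugin would run this on the browser selection and fetch templates over AJAX; the sketch shows only the wrapping step the user never has to type by hand.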

Efforts to produce such a tool would not be in vain. We know that a productive mutual calibration already goes on between content providers and search engines. For example, there is a whole subindustry devoted to SEO (search engine optimization), in which content providers have demonstrated their willingness to adapt content to the ways of search engines such as Google. And it works the other way, too: all the buzz about Web 3.0 has led companies like Yahoo! to incorporate semantic web principles into their search engines, giving rise to a new field of semantic search optimization. The industry seems ripe for a spark to catalyze the various developments in the field of the semantic web; see Calais, Twine, and DBpedia for some examples.

The content development tool suggested by this proposal could be that spark. It could ignite a dynamic in which search engines begin to seek out, privilege, and select for documents with semantic content. The pressure will then be on content providers (and I mean anyone who produces web content, not just big companies) to shape up their content for the search engines, and the forces of distribution will pull production along in their considerable draft.

A Practical Note

As for which content development system to use, three come to mind as both standard and extensible: MediaWiki, WordPress, and Drupal.
