Software


Applications

Figura

Figures and manuscript page

An interactive critical edition of Chrétien de Troyes’s Le Chevalier de la Charrette (Lancelot, ca. 1180), an Old French Arthurian romance studied by the late Professor Karl Uitti of Princeton University. Figura integrates facsimile images and TEI-encoded diplomatic transcriptions of eight manuscript traditions (held in libraries around the world), the 1989 Foulet-Uitti critical edition of the text, the grammatical data associated with each word in the critical edition, and an exhaustive collection of the rhetorical figures that scholars have mapped onto the text. These figures include adnominatio, chiasmus, enjambment, oratio recta, oratio obliqua, and rich rhyme.

From a technical perspective, Figura is a custom web application designed to present the several layers of text and data associated with the Princeton Charrette Project. Because of the complex, non-overlapping nature of the text and its interpretive layers (from the physical to the rhetorical), and because the encoding process involved many simultaneous editors in both Europe and America, a non-XML solution was chosen to manage the data and its presentation. However, the entire database is exportable as an XML document, the FAUX Charrette.

The application was first written in Cocoon and later migrated to PHP for speed and portability.

TextGarden

View of Hebrew text

An interactive critical edition of the Geniza archive associated with Professor Mark Cohen’s Geniza Project at Princeton University. To avoid writing Yet Another Boutique Humanities Computing Application, TextGarden was also written as an experiment in creating a highly flexible application framework that lets users define data structures on the fly. TG was designed to accomplish two main goals: (1) to allow users to create classes, properties, and instances as part of their data entry workflow, and (2) to allow granular annotation and linking among media objects, for example between text elements of documents.

The data model is based on my own experience with humanities computing applications, RDF, and XTM. It is strongly influenced by my work on Almagest, which uses a similar graph-like data model to accommodate the complex and highly flexible structures that scholars tend to require when annotating media collections. It is perhaps closest in design and concept to Freebase. Structurally, it distinguishes between semantic data and media, and stores all metadata as semantic data in a simple graph constructed of triples.
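To make the triple-based model concrete, here is a minimal Python sketch of a graph in which classes, properties, and instances are all ordinary triples, and media files are only referenced from the graph. It is an illustration under my own assumptions, not TextGarden's actual code: the `TripleGraph` class, the predicate names (`is_a`, `written_in`, `has_media`), and the sample records are all hypothetical.

```python
class TripleGraph:
    """A tiny in-memory graph of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # Every statement, whether schema or data, is stored the same way.
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


graph = TripleGraph()

# Hypothetical class and property, defined on the fly as plain triples.
graph.add("Letter", "is_a", "Class")
graph.add("written_in", "is_a", "Property")

# A hypothetical instance that uses them.
graph.add("doc_001", "is_a", "Letter")
graph.add("doc_001", "written_in", "Judaeo-Arabic")

# Media stays outside the graph; the graph only points to it.
graph.add("doc_001", "has_media", "media/doc_001.jpg")

letters = graph.match(predicate="is_a", obj="Letter")
about_doc = graph.match(subject="doc_001")
```

Because the schema lives in the same graph as the data, adding a new class or property during data entry needs no migration step: it is just one more triple.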

Before I left Princeton, TextGarden was to be my main application for supporting humanities computing projects. It was intended to provide a series of service layers: (1) media consolidation services, (2) ontology services, (3) collaboration services, (4) analytical services, (5) visualization services, and (6) publication services.

Consolidation services concern such foundational needs as the storage and databasing of primary sources, including facsimiles, transcriptions and metadata.

Ontology services concern the provision of a flexible data model, expressible as a topic map or a semantic web, designed to capture the intertextual links immanent in the consolidated materials conceived as a single “supertext.”

Collaboration services concern the provision of tools to allow scholars to collaborate in the uploading, editing, linking and annotation of the source materials; these services involve authentication and authorization in addition to “social software” functions such as those associated with blogs and wikis.

Analytical services concern such activities as latent semantic indexing and other statistical analyses of text corpora, as well as the simple generation of reports and lists.

Visualization services comprise tools designed to display the structure of the archive by means of such technologies as Flash and SVG.

Finally, publication services concern the rights-sensitive distribution of both primary and derived materials to the public for teaching and research, whether as traditional media, new media, or web services.

Almagest

LectureBuilder 1.0

Princeton’s homegrown asset management system, originally designed by Kirk Alexander and Kevin Perry to support the teaching needs of art historians. Almagest is currently an open source project that supports a large number of courses at Princeton across the disciplines. My main contribution to this project was to develop LectureBuilder, a JavaScript-based GUI that allows users to quickly create lectures out of their annotated collections of images. In addition to collaborating with Kevin on the development of Almagest 2.0, I also built, in Java, the infrastructure to support TEI-encoded XML texts that could be annotated and linked at the element level. This work became the basis for my thinking behind TextGarden. Unfortunately, this work remains behind a firewall, although you may download an open source version of the software here.

Almagest was ahead of its time in many ways. Not only did it support a wiki-like syntax in Punstar, which allowed users to link to data records in their annotations through simple syntax, it also used what is essentially a graph-based data model to organize metadata. This meant that users could break free of the flat structure of traditional metadata models (like Dublin Core) and create networks of meaning, or semantic webs, that linked data through a rhizomic network of bidirectional links. To this day it remains revolutionary in this respect, and I hope to develop other tools that use this model.

Metamorph

An early experiment in database-driven textuality, originally written to support a classicist’s need for a quick index to Ovid’s Metamorphoses. The idea is simple: parse a text into a series of tokens (words and punctuation) and store these in a database. Then display the text as a searchable index, where each word in the displayed text is a link to a search on itself. Its strength is that it is very easy to build; its weakness is that it cannot search for strings more complex than single tokens, although this functionality can be added. This approach to managing text became the foundation for Figura.
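The token-index idea can be sketched in a few lines of Python. This is a reconstruction of the general technique, not the original Metamorph code: the table layout, the function names, and the `/search?q=` URL pattern are assumptions made for illustration, and an in-memory SQLite database stands in for whatever database the original used.

```python
import re
import sqlite3
from html import escape
from urllib.parse import quote

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # a word, or one punctuation mark
WORD_RE = re.compile(r"\w+")

# Opening words of Ovid's Metamorphoses, used as sample input.
SAMPLE = "In nova fert animus mutatas dicere formas corpora; di, coeptis"


def build_index(text):
    """Split the text into tokens and store one row per token."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token ("
        "position INTEGER PRIMARY KEY, form TEXT, norm TEXT, is_word INTEGER)"
    )
    rows = [
        (pos, tok, tok.lower(), int(WORD_RE.fullmatch(tok) is not None))
        for pos, tok in enumerate(TOKEN_RE.findall(text))
    ]
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, word):
    """Return the positions of every token equal to the word, ignoring case."""
    cur = conn.execute(
        "SELECT position FROM token WHERE norm = ? ORDER BY position",
        (word.lower(),),
    )
    return [row[0] for row in cur]


def render(conn):
    """Rebuild the text as HTML in which each word links to a search on itself."""
    html = ""
    query = "SELECT form, norm, is_word FROM token ORDER BY position"
    for form, norm, is_word in conn.execute(query):
        if is_word:
            link = f'<a href="/search?q={quote(norm)}">{escape(form)}</a>'
            html += (" " if html else "") + link
        else:
            html += escape(form)  # punctuation attaches to the preceding word
    return html


conn = build_index(SAMPLE)
hits = search(conn, "formas")
page = render(conn)
```

Because each row holds exactly one token, a lookup is a single equality test, which is why the approach is easy to build. Matching a phrase would require joining the table to itself on consecutive positions, which is the kind of extension the paragraph above alludes to.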

Experiments

Stweet, the Semantic Twitter

This is a project in progress. The idea is to capture tweets (Twitter utterances) in a particular format and have them contribute to a triple store. Tweeting thus becomes a simple, massively collaborative data entry tool for the linked data web. For more information, check out the project information page: “http://ontoligent.com/stweet”.
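As a rough illustration of the idea, the Python sketch below parses tweets written in a made-up syntax and collects the resulting triples. The `#stweet subject > predicate > object` format is purely hypothetical and is not the actual Stweet format, which this page does not specify; the function names and sample tweets are likewise invented for the example.

```python
import re

# Hypothetical syntax, NOT the real Stweet format:
#   "#stweet subject > predicate > object"
STWEET_RE = re.compile(r"^#stweet\s+(.+?)\s*>\s*(.+?)\s*>\s*(.+?)\s*$")


def parse_tweet(text):
    """Return a (subject, predicate, object) tuple, or None if the tweet does not match."""
    m = STWEET_RE.match(text.strip())
    return m.groups() if m else None


def collect(tweets):
    """Fold a stream of tweet texts into a set of triples, skipping ordinary tweets."""
    store = set()
    for text in tweets:
        triple = parse_tweet(text)
        if triple is not None:
            store.add(triple)  # the set silently drops exact duplicates
    return store


sample_tweets = [
    "#stweet Ovid > wrote > Metamorphoses",
    "Having lunch, back at 2",
    "#stweet Ovid > wrote > Metamorphoses",
    "#stweet Metamorphoses > written_in > Latin",
]
store = collect(sample_tweets)
```

A real version would also need to record who asserted each triple and when, since a collaboratively built store has to cope with conflicting statements.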

Graph Visualization Experiments

I am currently very interested in presenting graph data using visualization tools. I believe such tools will eventually become genres of interaction with complex data that will take us beyond traditional search and browse. I will be adding examples here in the near future.
