An interactive critical edition of Chrétien de Troyes’s Le Chevalier de la Charrette (Lancelot, ca. 1180), an Old French Arthurian romance studied by the late Professor Karl Uitti of Princeton University. Figura integrates facsimile images and TEI-encoded diplomatic transcriptions of eight manuscript traditions (held in libraries around the world), the 1989 Foulet-Uitti critical edition of the text, the grammatical data associated with each word in the critical edition, and an exhaustive collection of the rhetorical figures that scholars have mapped onto the text. These figures include adnominatio, chiasmus, enjambment, oratio recta, oratio obliqua, and rich rhyme.
From a technical perspective, Figura is a custom web application designed to present the several layers of text and data associated with the Princeton Charrette Project. Because of the complex, overlapping nature of the text and its interpretive layers, from the physical to the rhetorical, as well as the fact that the encoding process involved many simultaneous editors in both Europe and America, a non-XML solution was chosen to manage the data and its presentation. However, the entirety of the database is exportable as an XML document, the FAUX Charrette.
The application was first written in Cocoon, but then migrated to PHP for reasons of speed and portability.
An interactive critical edition of the Geniza archive associated with Professor Mark Cohen’s Geniza Project at Princeton University. To avoid writing Yet Another Boutique Humanities Computing Application, TextGarden was also conceived as an experiment in building a highly flexible application framework that lets users define data structures on the fly. TG was designed to accomplish two main goals: (1) allow users to create classes, properties, and instances as part of their data entry workflow, and (2) allow granular annotation and linking among media objects, for example between text elements of documents.
The data model is based on my own experience with humanities computing applications, RDF, and XTM. It is strongly influenced by my work on Almagest, which uses a similar graph-like model to organize the complex and highly flexible data structures that scholars tend to require when annotating media collections. It is perhaps closest in design and concept to Freebase. Structurally, it distinguishes between semantic data and media, and stores all metadata as semantic data in a simple graph constructed of triples.
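To make the idea concrete, here is a minimal sketch (in Python, with names of my own choosing rather than TextGarden's actual vocabulary) of how classes, properties, and instances can all be expressed as triples in a single graph:

```python
# A minimal, illustrative triple store in which schema (classes, properties)
# and data (instances) are all plain (subject, predicate, object) triples.
# Predicates like "is_a" and "instance_of" are stand-ins, not TextGarden's terms.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]


store = TripleStore()

# A user defines a class and a property in the course of data entry...
store.add("Letter", "is_a", "Class")
store.add("sender", "is_property_of", "Letter")

# ...then creates an instance and describes it with the new property.
store.add("doc_042", "instance_of", "Letter")
store.add("doc_042", "sender", "merchant_017")

print(store.query(predicate="instance_of", obj="Letter"))
# -> [('doc_042', 'instance_of', 'Letter')]
```

Because schema and data share one graph, new classes and properties can be introduced at the moment of data entry rather than by migrating a fixed relational schema.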
Before I left Princeton, TextGarden was intended to become my main framework for supporting humanities computing applications. It was to provide a series of service layers: (1) media consolidation services, (2) ontology services, (3) collaboration services, (4) analytical services, (5) visualization services, and (6) publication services.
Consolidation services concern such foundational needs as the storage and databasing of primary sources, including facsimiles, transcriptions and metadata.
Ontology services concern the provision of a flexible data model, expressible as a topic map or a semantic web, designed to capture the intertextual links immanent in the consolidated materials conceived as a single “supertext.”
Collaboration services concern the provision of tools to allow scholars to collaborate in the uploading, editing, linking and annotation of the source materials; these services involve authentication and authorization in addition to “social software” functions such as those associated with blogs and wikis.
Analytical services concern such activities as latent semantic indexing and other statistical analyses of text corpora, as well as the simple generation of reports and lists. Visualization services consist of tools designed to display the structure of the archive by means of technologies such as Flash and SVG.
Finally, publication services concern the rights-sensitive distribution of both primary and derived materials to the public for teaching and research, whether as traditional media, new media, or web services.
- The Geniza instance of TextGarden
- An article on the Geniza Project from Princeton’s IT’s Academic blog
Almagest was ahead of its time in many ways. Not only did it support a wiki-like syntax in Punstar, which allowed users to link to data records from within their annotations, it also used what is essentially a graph-based data model to organize metadata. This meant that users could break free of the flat structure of traditional metadata schemes (like Dublin Core) and create networks of meaning, semantic webs, that linked data through a rhizomic network of bidirectional links. To this day it remains revolutionary in this respect, and I hope to develop other tools that use this model.
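By way of illustration only, the sketch below shows how a wiki-like link syntax can be resolved against data records; the [[...]] notation and record names are hypothetical stand-ins, not Punstar's actual syntax:

```python
import re

# Hypothetical lookup table; in a real system the target would be a record in
# the metadata graph. The [[record_key]] notation is illustrative only.
RECORDS = {"record_42": "/records/42"}

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def resolve_links(annotation: str) -> str:
    """Replace wiki-style record references in an annotation with HTML links."""
    def replace(match):
        key = match.group(1)
        url = RECORDS.get(key)
        return f'<a href="{url}">{key}</a>' if url else match.group(0)
    return LINK.sub(replace, annotation)

print(resolve_links("Compare this reading with [[record_42]] before emending."))
# -> Compare this reading with <a href="/records/42">record_42</a> before emending.
```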
An early experiment in database-driven textuality, originally written to support a classicist’s need for a quick index to Ovid’s Metamorphoses. The idea is simple: parse a text into a series of tokens (words and punctuation) and store these in a database. Then display the text as a searchable index, where each word in the displayed text is a link to a search on itself. Its strength is that it is very easy to build; its weakness is that it cannot search across strings longer than a single token, although this functionality can be added. This approach to managing text became the foundation for Figura.
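A minimal sketch of this token-and-index approach, assuming SQLite and a table layout of my own choosing rather than the original application's schema:

```python
import re
import sqlite3

# Parse a text into word and punctuation tokens and store them with positions.
# The table layout here is illustrative, not the original application's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (position INTEGER, token TEXT)")

text = "In nova fert animus mutatas dicere formas corpora;"
tokens = re.findall(r"\w+|[^\w\s]", text)
conn.executemany("INSERT INTO tokens VALUES (?, ?)", list(enumerate(tokens)))

def render(token):
    """Display side: each word is rendered as a link to a search on itself."""
    return f'<a href="?search={token}">{token}</a>'

def search(word):
    """Search side: every position at which the given token occurs."""
    rows = conn.execute("SELECT position FROM tokens WHERE token = ?", (word,))
    return [r[0] for r in rows]

print(render("animus"))   # -> <a href="?search=animus">animus</a>
print(search("animus"))   # -> [3]
```

Searching for multi-word phrases would require joining on adjacent positions, which is the kind of extension mentioned above.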
Stweet, the Semantic Twitter
This is a project in progress. The idea is to capture tweets (Twitter utterances) in a particular format and have them contribute to a triple store. Stweet thus becomes a simple, massively collaborative data entry tool for the linked data web. For more information, check out the project information page: “http://ontoligent.com/stweet”.
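Since the tweet format itself is not spelled out above, the following is a purely hypothetical sketch of the idea: a tweet written as a subject > predicate > object statement, parsed into a triple for the store.

```python
import re

# Hypothetical tweet convention: "#stweet subject > predicate > object".
# The actual Stweet format may differ; this only illustrates the general idea.
TRIPLE_PATTERN = re.compile(r"#stweet\s+(.+?)\s*>\s*(.+?)\s*>\s*(.+)")

def tweet_to_triple(tweet: str):
    """Parse a conforming tweet into a (subject, predicate, object) triple."""
    match = TRIPLE_PATTERN.search(tweet)
    if match:
        return tuple(part.strip() for part in match.groups())
    return None

triple_store = set()
triple = tweet_to_triple("#stweet Chretien de Troyes > wrote > Le Chevalier de la Charrette")
if triple:
    triple_store.add(triple)

print(triple_store)
# -> {('Chretien de Troyes', 'wrote', 'Le Chevalier de la Charrette')}
```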
Graph Visualization Experiments
I am currently very interested in presenting graph data using visualization tools. I believe such tools will eventually become genres of interaction with complex data that will take us beyond traditional search and browse. I will be adding examples here in the near future.