Cultural Analytics and Machine Learning

The rise of effective text mining by means of machine learning has produced new opportunities for the study of culture. Available text analytic methods, such as n-gram extraction, topic modeling (e.g. LDA), entity extraction, and word embedding (e.g. word2vec), allow for the investigation of cultural patterns within digital textual media through the human interpretation of their products, i.e. distributions and networks of topics, entities, word sequences, and word pairs. The theoretical foundation for this approach is provided by an operational theory of culture that employs the concept of ontology as a bridge term between anthropology and computational thinking.

Although the culture concept is complex, comprising mental, social, biological, and physical dimensions, cultural anthropologists have historically focused on the study of shared cognitive models and symbolic structures to provide insights into the behavior of human communities, from groups of hunter-gatherers to nation-states. A foundational idea in cultural anthropology is that such models and structures consist of networks and clusters of symbols and meanings that undergird the use of language. In this view, the spoken and written products of language are patterned by a symbolic substrate that provides a referential context for language. This substrate, often referred to as “context,” may be operationally defined as an ontology, a concept that bridges the domains of anthropology and computational thinking. An ontology in this view is a set of categories organized by a set of core operators, such as opposition, analogy, mediation, and metonymy.

Given this framework, the products generated by text analytic methods may be used to build approximate representations of culture-as-ontology. For example, the topics in a topic model may form a network on the basis of their mutual information. This network may then be used as evidence for symbolic structures that may have existed among the authors of the textual corpus from which the topic model was extracted. In addition, abstracted symbolic structures may be employed as features in downstream investigations of the relationship between culture and social action.
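As a rough illustration of the topic-network idea, the sketch below links topics whose presence across documents shares mutual information. It makes several assumptions that are not part of the framework above: the topic model is given as a document-by-topic weight matrix, a topic counts as "present" in a document when its weight reaches a threshold, and an edge is kept when the mutual information reaches a cutoff. The function names, the thresholds, and the toy matrix are all hypothetical.

```python
import math
from itertools import combinations


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        p_a = sum(1 for xi in x if xi == a) / n
        for b in (0, 1):
            p_b = sum(1 for yi in y if yi == b) / n
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def topic_network(doc_topic, presence=0.2, min_mi=0.05):
    """Return {(i, j): mi} for topic pairs whose mutual information >= min_mi.

    doc_topic: list of rows, one per document, each a list of topic weights.
    presence and min_mi are illustrative thresholds, not recommended values.
    """
    n_topics = len(doc_topic[0])
    # Binarize: one 0/1 presence vector per topic, indexed by document.
    present = [
        [1 if row[k] >= presence else 0 for row in doc_topic]
        for k in range(n_topics)
    ]
    edges = {}
    for i, j in combinations(range(n_topics), 2):
        mi = mutual_information(present[i], present[j])
        if mi >= min_mi:
            edges[(i, j)] = mi
    return edges


# Toy document-topic matrix (4 documents, 4 topics); values are made up.
toy_doc_topic = [
    [0.40, 0.30, 0.30, 0.00],
    [0.50, 0.40, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
edges = topic_network(toy_doc_topic)
```

In this toy matrix, topics 0 and 1 always appear together and so receive an edge, while topic 2 occurs independently of topic 0 and receives none. Binarizing the weights is only one of several ways to estimate mutual information between topics; on a real corpus the choice of estimator and thresholds would need to be justified against the data.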

The effectiveness of this approach depends in large measure on the ability to rapidly generate models from large and temporally differentiated text corpora. Temporal differentiation over relatively long periods of time (years and decades) has the potential to surface what changes and what remains constant in a cultural ontology. Given this, the need for improved processing power, at the level of both software optimization and hardware innovation, becomes paramount.
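To make temporal differentiation concrete, here is a minimal sketch that buckets dated documents into periods and separates terms that recur in every period from terms that appear in only some. It is a deliberately crude stand-in for comparing full models across time slices: it assumes documents arrive as hypothetical `(year, text)` pairs, tokenizes on whitespace, and treats a term as salient when it reaches an arbitrary count within a period.

```python
from collections import Counter, defaultdict


def terms_by_period(docs, period=10, min_count=2):
    """Map each period start (e.g. 1960 for the 1960s) to its salient terms.

    docs: iterable of (year, text) pairs.
    A term is salient in a period if it occurs at least min_count times there.
    """
    counts = defaultdict(Counter)
    for year, text in docs:
        start = (year // period) * period
        counts[start].update(text.lower().split())
    return {
        start: {term for term, n in counter.items() if n >= min_count}
        for start, counter in counts.items()
    }


def constant_and_changing(period_terms):
    """Split terms into those salient in every period and those salient in only some."""
    sets = list(period_terms.values())
    if not sets:
        return set(), set()
    constant = set.intersection(*sets)
    changing = set.union(*sets) - constant
    return constant, changing


# Toy corpus; the years and texts are invented for illustration.
toy_docs = [
    (1961, "freedom nation freedom"),
    (1968, "nation war war"),
    (1973, "freedom market market"),
    (1979, "freedom nation nation"),
]
constant, changing = constant_and_changing(terms_by_period(toy_docs))
```

On the toy corpus, "freedom" and "nation" are salient in both decades, while "war" and "market" are each salient in only one. The same bucketing step could feed a separate topic model per period, which is where the processing cost discussed above would accumulate.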