Cultural Analytics and Machine Learning
The rise of effective text mining by means of machine learning has produced new opportunities for the study of culture. Available text analytic methods, such as n-gram extraction, topic modeling (e.g., LDA), entity extraction, and word embedding (e.g., word2vec), allow for the investigation of cultural patterns within digital textual media through the human interpretation of their products, i.e., distributions and networks of topics, entities, word sequences, and word pairs. The theoretical foundation for this approach is provided by an operational theory of culture that employs the concept of ontology as a bridge term between anthropology and computational thinking.
Although the culture concept is complex, comprising mental, social, biological, and physical dimensions, cultural anthropologists have historically focused on the study of shared cognitive models and symbolic structures to provide insights into the behavior of human communities, from groups of hunter-gatherers to nation-states. A foundational idea in cultural anthropology is that such models and structures consist of networks and clusters of symbols and meanings that undergird the use of language. In this view, the spoken and written products of language are patterned by a symbolic substrate that provides a referential context for language. This substrate, often referred to as “context,” may be operationally defined as an ontology, a concept that bridges the domains of anthropology and computational thinking. An ontology in this view is a set of categories organized by a set of core operators, such as opposition, analogy, mediation, and metonymy.
Given this framework, the products of text analytic methods may be used to build approximate representations of culture-as-ontology. For example, the topics in a topic model may be linked into a network on the basis of their mutual information. This network may then be used as evidence for symbolic structures that may have existed among the authors of the textual corpus from which the topic model was extracted. In addition, abstracted symbolic structures may be employed as features in downstream investigations of the relationship between culture and social action.
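The topic network described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive method: it assumes a document-topic weight matrix has already been produced by some topic model, treats a topic as "present" in a document when its weight meets a cutoff, and links two topics when the mutual information between their presence indicators meets a second cutoff. The function names, the toy matrix, and both thresholds are hypothetical choices made for the example.

```python
import math
from itertools import combinations


def topic_presence(doc_topic, threshold):
    """Binarize document-topic weights: 1 where a topic's weight meets the threshold."""
    return [[1 if w >= threshold else 0 for w in row] for row in doc_topic]


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length binary sequences."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for xi, yi in zip(x, y) if xi == a and yi == b) / n
            p_a = sum(1 for xi in x if xi == a) / n
            p_b = sum(1 for yi in y if yi == b) / n
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def topic_network(doc_topic, presence_threshold=0.25, mi_threshold=0.1):
    """Return {(i, j): mi} for every topic pair whose mutual information meets mi_threshold."""
    presence = topic_presence(doc_topic, presence_threshold)
    columns = list(zip(*presence))  # one presence vector per topic
    edges = {}
    for i, j in combinations(range(len(columns)), 2):
        mi = mutual_information(columns[i], columns[j])
        if mi >= mi_threshold:
            edges[(i, j)] = mi
    return edges


# Toy document-topic matrix (rows: documents, columns: topics); values are illustrative only.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.0, 0.9],
    [0.2, 0.1, 0.7],
]

for (i, j), mi in topic_network(doc_topic).items():
    print(f"topic {i} -- topic {j}: {mi:.3f} bits")
```

Mutual information is high both when two topics tend to appear together and when they tend to exclude each other, so an edge in this sketch signals dependence rather than similarity. Deciding whether an edge reflects, say, analogy or opposition remains an interpretive step.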
The effectiveness of this approach depends in large measure on the ability to rapidly generate models from large and temporally differentiated text corpora. Temporal differentiation over relatively long periods of time, on the order of years and decades, has the potential to surface what changes and what remains constant in a cultural ontology. Given this, the need for improved processing power, at the level of both software optimization and hardware innovation, becomes paramount.