Four Theses on Big Data

The following is a fragment from an internal document I am drafting on why humanists, including but not only digital humanists, should be closely involved with UVa’s new Data Science Institute. Much of this will be old hat to many digital humanists who have already embraced the rise of big data as an opportunity to be seized and a movement to be examined. However, I also think the term Big Data is often dismissed as a mere marketing term without any substance behind it. I disagree with this dismissal, and the following is meant as a plea for taking the idea seriously.

Big Data is a social fact. The expression “Big Data,” although now a marketing term with its origin in the hard sciences, indexes a genuine historical transformation in the social organization of knowledge. This transformation is the most recent episode in a decades-long development of a concrete, global, and pervasive network of electronic data-producing and data-consuming devices, embedded in society’s major sectors, including government, medicine, finance, education, and business. This network is not an abstraction; it is not virtual. It is a material development of the human biosphere with a geography and concreteness comparable to that of the free market as described in Polanyi’s The Great Transformation. We might call it the “datasphere”: a sphere of exchange in which the production, distribution, and consumption of digital data sets have developed in relation to other spheres of exchange that constitute what Castells has called the networked society. The datasphere emerges from the combination of separate trends with long histories, such as the development of computational thinking; the rise of statistical methods and world hypotheses within the sciences and society; the use of records, both paper and electronic, by organizations to “represent” and manage populations; and the construction of a network of computational devices for the sensing, storing, and analysis of data. Many symptoms of this historical development are not new: the anxiety of information overload, the millenarian belief in the transformative effects of abundant data, and so forth. But the historical moment is unique and genuine. We inhabit a situation that requires new perspectives and approaches to understand it.

When we speak of Big Data, we refer to a system of representing the world. This system inheres in an assemblage of technologies and practices that compose the datasphere, including databases, data models, best practices, user interfaces, character sets, query languages, sensors, network protocols, algorithms, software stacks, modes of office and lab work, and so forth. These elements, considered as an ecology with information flows, energy (and therefore economic) requirements, and selective pressures on behavior, are producing a series of effects in the areas of cognition and epistemology. Observers of knowledge-work in the datasphere have claimed that Big Data and associated analytical methods have made obsolete such apparently established notions as causal models (Chris Anderson), categories and systems of classification (Clay Shirky), reading (Franco Moretti), and even meaning itself (Claude Shannon). In place of these long-established ideas and practices, the emerging field of Data Science suggests a novum organum, a scienza nuova, in which the study of culture and society becomes a branch of statistical physics. When epistemologies change, especially to this degree, so do ontologies, ethical perspectives, and aesthetic sensibilities. Such transformations call for the participation of humanists, both to join the conversation about their effects on how we think and to pursue new forms of humanistic research.

Big Data is often big social data. One of the most compelling aspects of the current moment is its social dimension. The concept of Big Data has migrated from the more esoteric hard sciences, where the phrase was coined to designate massive and logistically problematic data sets generated by new sensing technologies, to the wider worlds of policy and marketing, where it now stands for both a threat and an opportunity to various social constituencies, precisely because Big Data increasingly refers to big social and cultural data. We now generate real-time data about human behavior (through digitized libraries, institutional records, transactional data such as credit card use or Google searching, and social media) in quantities vastly exceeding those available through traditional methods, such as surveys, participant observation, and archival records. These data are not only massive in scale but rich in scope, including precise and exhaustive behavioral traces (think of consumer data tracked by scannable cards or discourse data generated by social media) that could not be captured without the technical apparatus described above. It is this change in both the quantity and quality of social data, with the enormous technical and cultural challenges and opportunities it presents, that defines Big Data as an area of concern and interest to the humanist and social scientist.

The phrase “big social data” was first used, as far as I can determine, by Lev Manovich in 2011. It is now routinely used by social media marketing gurus to describe the data product of social media sites.

Humanists should both use and critically study Big Data. Beyond the legal and ethical issues raised by the use of Big Data, there are significant opportunities opened up by this historical moment that are of interest to a broad range of the humanities. We may define at least three general areas in which humanists may participate alongside the emerging field of Data Science: (1) what might be called, broadly, the philosophy of data, which would focus on the epistemological and methodological issues raised by both the existence and the use of these new sets of data and their accompanying methods; (2) the history and sociology of data, which would focus on how cultural life and social organization are affected and transformed by the “data products” being developed by institutions such as governments, businesses, and hospitals; and (3) digital humanities research into data sets of cultural materials, represented by such schools of thought as distant reading, cultural analytics, macroanalysis, and culturomics.

Under the category of the philosophy of data are a series of rich interpretive questions such as: What is the relationship between data as processable records in a software environment and the cultural and social realities to which they purportedly refer? Are many aspects of these vast data sets beyond the reach of human understanding? How can we understand such dimensions as truth, accuracy, and bias in large data sets? Is a physics of culture now possible with the rise of Big Data? In what ways do the scale and variety of Big Data change the kinds of questions we can ask of a specific domain? Are models and traditional statistical methods made obsolete by the algorithmic methods developed to make sense of Big Data? How might a genuine data criticism be developed, one that would integrate interpretive methods and concerns with quantitative techniques?

Under the category of the history and sociology of Big Data are questions such as: What are the genealogies and contours of the institutional frameworks within which the datasphere has developed? How are fundamental human associations being transformed, if at all, by the new data products and data-based platforms of participation that characterize the datasphere? How is the traditional public sphere being affected by the datasphere? What are the cultural and identity effects of Big Data, viewed as a means of representing populations to institutions that make decisions that influence the lives of people? How does Big Data affect the balance of power between governments, corporations, and publics? What happens to the res publica when the scope of res increasingly includes data? How is the social contract changed when Big Data mediates key social relationships, such as that between citizen and representative or doctor and patient?

The field of digital humanities comprises a number of developments in the humanities, with a concentration in literary studies and history. Through the vector of available computational methods and large digitized collections of primary sources, digital humanists have discovered and adapted quantitative approaches to the study of culture and society previously employed by social scientists, such as archaeologists in the study of material culture. They have applied these resources to areas normally considered unreachable by quantitative methods, such as the study of voice, genre, narrative, influence, and symbolism in literature and works of art. In addition, digital humanists have rediscovered human geography through the use of GIS and related technologies, contributing to what has been called the “spatial turn” in the humanities. Both approaches have been coupled with high-performance computing and have encountered many of the same logistical difficulties that natural scientists have encountered in their domains.

*     *     *
