Metadata and the Research Project

In a widely reported quotation, former director of the NSA and CIA General Michael Hayden said in May 2014 that “We kill people based on metadata.” Metadata is increasingly valuable today: it would also seem that it carries not one but multiple forms of value, some of those forms payable in blood.

Information scientist Jeffrey Pomerantz, in his book Metadata (Cambridge, MA: The MIT Press, 2015), argues that until recently, the term “metadata” has typically been used to refer to “[d]ata that was created deliberately; data exhaust, on the contrary, is produced incidentally as a result of doing other things” (126, emphasis mine). That’s an interesting term, “data exhaust,” perhaps an analogue to the pollution associated with the economic production and consumption of the industrial age. And of course corporations and governments are finding new things to do with this so-called data exhaust: killing people, for example, or charting the social networks of potential insurgents like Paul Revere, as Kieran Healy charmingly demonstrates, or even advertising Target products to covertly pregnant teenagers until their parents find out, as a once-popular anecdote has it. It has cash value and click-through value, and my Digital Technology and Culture (DTC) students last semester put together some really terrific projects examining the use of cookies, Web advertising, and geolocation for ubiquitous monitoring and monetizing.

But that idea of useful information as by-product keeps coming back to me: I wonder if someone has ever tried to copyright the spreading informational ripples they leave in their wakes as they travel through their digital lives, since those ripples would seem to be information in fixed form (they’re recorded and tracked, certainly) created by individual human activity, if not intention. There’s a whole apparatus there that we interact with: as Pomerantz notes, “[i]n the modern era of ubiquitous computing, metadata has become infrastructural, like the electrical grid or the highway system. These pieces of modern infrastructure are indispensable but are also only the tip of the iceberg: when you flick on a lightswitch, for example, you are the end user of a large set of technologies and policies. Individually, these technologies and policies may be minor, and may seem trivial. . . but in the aggregate, they have far-reaching cultural and economic implications. And it’s the same with metadata” (3). So the research paper has as its infrastructure things like the credit hour and plagiarism policies and the Library of Congress Classification system, which composition instructors certainly address as at once central to the research project and also incidental, because the thing many of us want to focus on is the agent and the intentional action: the student and the research.

In that older form of the first-year composition “research project” or “inquiry paper,” we pose both the production of information and the retrieval of information (via research-based human-directed activity) as entirely unified and intentional acts, but I’m increasingly moving toward advocating that we shift our attention much more to examining the incidental and algorithmic production and retrieval of information as a model of research, especially given that “[w]hen using online resources, data is produced incidentally as a result simply of using those resources” (Pomerantz 126). One example of that orientation toward data exhaust is the National Science Digital Library, which Pomerantz points out “is using the term ‘paradata’ to mean ‘use data about educational resources’” (129). And the most obvious place where that’s happening, to me, is in learning management systems: we can see (and even assign a value to) how much students are engaging with the digital schoolhouses we’ve constructed for them. In Blackboard Learn, I can track how many views a certain online resource has received, or who’s interacted with it, and even grade those clicks as a sort of panoptic enforcer: that’s the dark side of it. More hopefully, we can encourage students to do the same by using tools like Eli Review to engage with data and metadata about their own interactions in peer review, and adapt that metadata to their own purposes and learn from it. My colleagues and I have experimented with asking students to track their time use with tools like Toggl, and I’m developing a plan to do so again in a way where we might create an evolving collaborative public time-map of writing and research activity and relate that time-map to their engagement with the writing process.

And there are other forms of metadata, as well. At Kairos an important part of our editorial process is inserting specific types of Dublin Core metadata into all the webtexts we publish. A citation can be a type of provenance metadata that helps one know the value of the journal that a citation comes from, especially if that journal is ranked by SCImago or reports an h-index, and of course that provenance metadata has economic drivers: Pomerantz points to the rise of provenance metadata as a result of the fact that “the marginal cost of [re]production for digital resources is nearly zero. Because of this, data about the provenance of resources is more important in the online world” (101). And because metadata is designed to be machine-readable, we can begin to automate knowing where our texts and information (our various forms of pseudoimmaterial capital) come from.
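To make “machine-readable” a little more concrete, here is a minimal sketch in Python of how a program might pull Dublin Core fields out of a web page. It assumes the common convention of embedding Dublin Core in HTML meta tags whose names begin with “DC.”; the sample page, its title, author, and date are all hypothetical, and this is not a description of how Kairos actually encodes or processes its webtexts.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collects <meta> tags whose name starts with 'DC.' (an assumed convention)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.lower().startswith("dc."):
            # A field may repeat (e.g. several creators), so store a list.
            self.fields.setdefault(name, []).append(attr.get("content") or "")


# Hypothetical webtext header, invented purely for illustration.
SAMPLE_PAGE = """<html><head>
<meta name="DC.title" content="A Hypothetical Webtext">
<meta name="DC.creator" content="Example Author">
<meta name="DC.date" content="2015-01-01">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""


def extract_dublin_core(html):
    """Return a dict mapping each DC.* field name to its list of values."""
    parser = DublinCoreParser()
    parser.feed(html)
    return parser.fields


fields = extract_dublin_core(SAMPLE_PAGE)

if __name__ == "__main__":
    for name, values in fields.items():
        print(name, "=", "; ".join(values))
```

The sketch ignores the non-Dublin Core viewport tag and keeps only the provenance-style fields, which is the kind of filtering that lets software, rather than a human reader, answer the question of where a text comes from.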

Of course, that’s what the NSA has been doing as well with our phone records, email headers, tweets, and status updates, and many of us have mostly gotten used to it. Still, I encrypt my backups, use strong passwords, and use the Tor Browser, and will probably buy a VPN subscription and start looking into encrypted messaging solutions sometime this year. A significant part of the DTC course I taught last semester dealt with technological (in addition to cultural and legal) surveillance countermeasures (including but certainly not limited to dreaming about the possibilities of the NSA Playset), and the students felt like they were at a sort of cultural cusp: some of them were not too worried about privacy or were resigned to being monitored, and some of them were deeply worried about it, actively investigating countermeasures and shaping their social media lives in response to what they see as the intrusion of pervasive data-gathering. As someone who studies rhetoric, I think it’s becoming increasingly important to think and talk about the rhetoric of secrecy in an age of metadata.

Marc Ambinder and D. B. Grady, in Deep State: Inside the Government Secrecy Industry (Hoboken, NJ: Wiley, 2013), discuss the disclosure of the NSA’s creation of the Stuxnet virus and the uproar in Congress and the media caused by that disclosure, even though the workings of the virus and its Iranian targets (and therefore its very likely originators) were widely known over a year before, and ask as a consequence: “When is a secret not really a secret? . . . What is the value of authoritative confirmation when all it does is tell us that what we think we know is indeed what we know?” (255). They follow these questions with an account of the Obama administration’s promised 2009 release of a “review of federal cyber policies” to promote “a new age of open discussion about the technological and security challenges posed by the age of ubiquitous, instantaneous communication” that was later pulled back by the administration, for the reason that “although the cyber policy questions that the lawyers debated were obvious and common, the ‘mere fact that we recognize them could be of use to the enemy.’ In other words, merely because the review sought the formal opinion of lawyers from the Department of Defense, the CIA, Homeland Security, the Justice Department, and the National Security Agency, releasing it might somehow provide those with nefarious intentions a guidebook to exploit the gaps in U.S. law” (261–262). This would seem to be another sort of metadata; rather than provenance metadata, maybe call it authorization metadata, confirming the obvious either by action or by inaction in secrecy classification or comment. Later, an “unclassified presentation” on the same topic “makes a point that the classified review finds too secret to be released” (Ambinder and Grady 262).

Audience determines action in the absence or oversaturation of information, and as we already know from the work of Lawrence Lessig and Frank Pasquale and Bruce Schneier, technology and its (mis)uses always advance faster than policy and the law. So what do we see as the rhetorical situation for digital research writing in this world of neither-confirm-nor-deny information and aggregated and manipulated data exhaust?

Cryptome recently released a video art project that in one portion posed the prisoner’s dilemma in relation to the sharing of information: if each nation or each spy has the option to release information or to withhold it, and each individually bears less penalty for withholding and more for releasing, but both end up worse off if both withhold than if both release, what will they do? Edward Snowden and Chelsea Manning already made that gamble. When will such gambles become more common in students’ lives?
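For readers who want the structure of that dilemma spelled out, here is a minimal sketch in Python. The penalty numbers are hypothetical, chosen only to satisfy the ordering of a standard prisoner’s dilemma; they come from neither the Cryptome video nor any real case.

```python
# Hypothetical penalties (lower is better), keyed by (my_choice, other_choice).
PENALTY = {
    ("release", "release"): 1,    # both share: small cost to each
    ("release", "withhold"): 5,   # I share alone: I bear the worst cost
    ("withhold", "release"): 0,   # I withhold while the other shares: best for me
    ("withhold", "withhold"): 3,  # both withhold: worse for each than mutual release
}
CHOICES = ("release", "withhold")


def best_response(other_choice):
    """Return the choice that minimizes my penalty, given the other's choice."""
    return min(CHOICES, key=lambda mine: PENALTY[(mine, other_choice)])


def is_dominant(choice):
    """True if `choice` is my best response to every choice the other can make."""
    return all(best_response(other) == choice for other in CHOICES)


if __name__ == "__main__":
    for other in CHOICES:
        print(f"If the other side chooses to {other}, my best response is to {best_response(other)}.")
    print("Withholding is dominant:", is_dominant("withhold"))
```

With these assumed numbers, withholding is each party’s best response no matter what the other does, and yet mutual withholding leaves both with a heavier penalty than mutual release would. That gap between individual incentive and shared outcome is what makes releasing information a gamble.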

In fall 2010, an officer at West Point’s Combating Terrorism Center generously shared information about the English-language PDF publicity magazine published by al Qaeda in the Arabian Peninsula, titled Inspire. I made the perhaps foolish decision to share it online with a colleague, was immediately discovered via internet logs to have done so, and was scolded by the officer, who noted that the FBI at the time did not want the magazine freely circulated in the US. (It has a lot of grim stuff in it, including bomb-making instructions apparently followed by the Tsarnaev brothers, and has lately become widely available online for industrious searchers.) But I also shared it in the classroom as a way to discuss and evaluate rhetorical strategies in relation to audience: in the first issue, for example, there’s an article by Usama Bin Laden on the immiserating global effects of climate change linked to United States economic policies that’s absolutely fascinating in the way it’s argued. And I still share it each semester with my DTC students as an example of synthesis of argument and design, and again as a way to discuss and evaluate rhetorical strategies.

The other aspect worth sharing, of course, is that its original editor Samir Khan was killed in Yemen in a US drone strike on September 30, 2011, along with Anwar Al-Awlaki. Both were American citizens. One wonders what part metadata played in their killing.
