dat in Adema 2009


in a
Turkish publishing house and make translations). But they miss a point: it was
the very movement which made the book a medium that de-posits "book" (in the
Blanchotian sense). These blogs do indeed perform a very important service: they save
books from the databanks. I'm not going to make an easy rider argument and
decry technology. What I mean is this: these books are the very bricks which
make up resistance - they are not compost - it is a sharing, "partage", and these
fragmentary impartations (the act in whi


dat in Adema 2019


l
authors earned their incomes solely from writing, where in 2013 this figure
had dropped to just 11.5%.[4](ch3.xhtml#footnote-149)

It seems that one of the primary reasons for the ALCS to conduct this survey
was to collect ‘accurate, independent data’ on writers’ earnings and
contractual issues, in order for the ALCS to ‘make the case for authors’
rights’ — at least, that is what the ALCS Chief Executive Owen Atkinson writes
in the introduction accompanying the survey, which was sent out to all ALCS
members.[5](ch3.xhtml#footnote-148) Yet although this research was conducted
independently and the researchers did not draw conclusions based on the data
collected — in the form of policy recommendations for example — the ALCS did
frame the data and findings in a very specific way, as I will outline in what
follows; this framing includes both the introduction to the survey and the
press release that accompanies the survey’s findings. Yet to some extent this
framing, as I will argue, is already apparent in the methodology used to
produce the data underlying the research report.

First of all, let me provide an example of how the research findings have been
framed in a specific way. Chief Executive Atkinson mentions in his
introduction to the survey that the ALCS ‘exists to ensure that writers are
treated fairly and remunerated appropriately’. He continues that the ALCS
commissioned the survey to collect ‘accurate, independent data,’ in order to
‘make the case for writers’ rights’.[6](ch3.xhtml#footnote-147) Now this focus
on rights in combination with remuneration is all the more noteworthy if we
look at an earlier ALCS funded report from 2007, ‘Authors’ Earnings


survey,
which the 2013 survey updates. The 2007 report argues conclusively that
current copyright law has empirically failed to ensure that authors receive
appropriate reward or remuneration for the use of their
work.[7](ch3.xhtml#footnote-146) The data from the subsequent 2013 survey show
an even bleaker picture as regards the earnings of writers. Yet Atkinson
argues in the press release accompanying the findings of the 2013 survey that
‘if writers are to continue making their irreplaceable cont


ed and
determined by focusing on two fixed and predetermined entities in advance.
First of all, the study focuses on individual human agents of creativity (i.e.
creators contributing economic value): the value of writing is established by
collecting data and making measurements at the level of individual authorship,
addressing authors/writers as singular individuals throughout the survey.
Secondly, economic worth is further determined by focusing on the fixed and
stable creative objects authors prod


(ch3.xhtml#footnote-143-backlink) Ibid.

[11](ch3.xhtml#footnote-142-backlink) In the survey, three questions that
focus on various sources of remuneration do list digital publishing and/or
online uses as an option (questions 8, 11, and 15). Yet the data tables
provided in the appendix to the report do not provide the findings for
questions 11 and 15 nor do they differentiate according to type of media for
other tables related to remuneration. The only data table we find in the
report related to digital publishing is table 3.3, which lists ‘Earnings
ranked (1 to 7) in relation to categories of work’, where digital publishing
ranks third after books and magazines/periodicals, but before newspapers,


dat in Adema & Hall 2013


mic knowledge - companies can build new
businesses based on its use and exploitation, for example - thus increasing the impact
of higher education on society and helping the UK, Europe and the West (and North)
to be more competitive globally. 36

To date, the open access movement has progressed much further toward its goal of
making all journal articles available open access than it has toward making all
academic books available in this fashion. There are a number of reasons why this is
the case. Fi


that two
of the three BBB component definitions (the Bethesda and Berlin statements) require
removing barriers to derivative works.
53
An examination of the licenses used on two of the largest open access book publishing
platforms or directories to date, the OAPEN (Open Access Publishing in Academic
Networks) platform and the DOAB (Directory of Open Access Books), reveals that on the
OAPEN platform (accessed May 6th 2012) 2 of the 966 books are licensed with a CC-BY
license, and 153 with a CC-BY-NC


is is especially the case with regard to the
publication of books, where a more conservative vision frequently holds sway. For
instance, it is intriguing that in an era in which online texts are generally connected to
a network of other information, data and mobile media environments, the open access
book should for the most part still find itself presented as having definite limits and a
clear, distinct materiality.

But if the ability to re-use material is an essential feature of open access – a


rbound book is being re-imagined in offline
environments. In this post-digital print culture, paper publishing is being used as a new form
of avant-garde social networking that, thanks to its analog nature, is not so easily controlled
by the digital data-gathering commercial hegemonies of Google, Amazon, Facebook et al. For
more, see Alessandro Ludovico, Post-Digital Print - the Mutation of Publishing Since 1984,
Onomatopee, 2012; and Florian Cramer, 'Post-Digital Writing', Electronic Book Review,
D


dat in Barok 2014


File:Papyrus_of_Plato_Phaedrus.jpg


A crucial difference between print and digital is that text files, be they HTML
documents, markdown documents or database-driven texts, did not inherit this
quality. Their containers are simply not structured into pages, precisely
because of the nature of their materiality as media. Files are written on
memory drives in scattered chunks, beginning at point A and ending


adapts to properties of its display when rendered, as is nicely implied in
the animated logo of this event and as we know it from EPUBs, for example.

The general format of a bibliographic record is:



Author. Title. Publisher. [Place.] Date. [Page.] URL.


In the case of 'reference-linking' we can refer to a passage by including
information about its beginning and length, determined by the character
position within the text (in analogy to the _pp._ operator used for printed
publications), as well as the text version information (in printed texts served
by the edition and date of publication). So what in printed text is given as
page information is here replaced by a character position range and a version.
Such a reference-link is more precise in addressing a particular section of a
particular version of a document
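
A minimal sketch of how such a reference-link could be assembled, assuming an invented URL-fragment convention (the 'chars' and 'v' parameter names, and the hashing of the text into a version identifier, are illustrative assumptions rather than an existing standard):

```python
import hashlib

def make_reference_link(url: str, text: str, passage: str, version: str) -> str:
    """Address a passage by character position range and document version
    (the 'chars' and 'v' fragment parameters are an invented convention)."""
    start = text.index(passage)       # character position where the passage begins
    end = start + len(passage)        # character position where it ends
    return f"{url}#chars={start}-{end}&v={version}"

# A version identifier can be derived from the text itself (a content hash),
# so the reference stays tied to one particular version of the document.
text = "Socrates: At the Egyptian city of Naucratis, there was a famous old god ..."
version = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
print(make_reference_link("https://example.org/phaedrus", text,
                          "city of Naucratis", version))
```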


dat in Barok 2014


to a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be consulted in
existing documents or can generate new documents based on the collection of
data in the field and through experiment, before proceeding to reasoning
(arguments and deductions). The formulation of a query is determined by the
protocols providing access to documents, which means that there is a difference
between collecting data outside the archive (the undocumented, i.e. in the field
and through experiment), consulting a person -- an archivist (expert, librarian,
documentalist) -- and consulting a database storing documents. Phenomena
such as the deepening of specialization and thorough digitization have given
privilege to the database as the fundamental means of research. Obviously,
this is a very recent phenomenon. Queries were once formulated in natural
language; now, given the fact that databases are queried using the SQL
language, their interfaces are mere extensions of it, and researchers pose
their questions by manipulating dropdowns, checkboxes and input boxes mashed
together on a flat screen run by software that in turn translates them into a
long line of conditioned _SELECTs_ and _JOINs_ performed on tables of data.
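
As a toy illustration of that translation, here is roughly what a couple of dropdown selections might boil down to, using Python's built-in sqlite3 module (the 'documents' and 'tags' tables, their contents and the selected values are all invented for illustration):

```python
import sqlite3

# Illustrative schema: a small catalogue of documents joined to subject tags.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE tags (doc_id INTEGER, tag TEXT);
    INSERT INTO documents VALUES (1, 'Phaedrus', -370), (2, 'Post-Digital Print', 2012);
    INSERT INTO tags VALUES (1, 'writing'), (2, 'publishing'), (2, 'writing');
""")

# What the dropdowns and checkboxes of a search form come down to:
# a conditioned SELECT with a JOIN over tables of data.
tag, year_from = "writing", 1900          # values picked in the interface
rows = con.execute(
    """SELECT d.title, d.year
         FROM documents AS d
         JOIN tags AS t ON t.doc_id = d.id
        WHERE t.tag = ? AND d.year >= ?
        ORDER BY d.year""",
    (tag, year_from),
).fetchall()
print(rows)  # [('Post-Digital Print', 2012)]
```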

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to flesh and paper, has been
entrusted to the digital and networked. Researchers are querying the black
box.

C

Searching in a colle


dat in Barok 2014


ally every
utterance on the internet, or rather every utterance made by means of equipment connected
to it through standard protocols, is recorded, in encrypted or unencrypted form,
on the servers of information agencies, besides copies of a striking share of these data
on the servers of private companies. We are only at the beginning of a civil mobilization
towards a reversal of this situation, and the future is open; yet nothing so far suggests
that there is any real alternative other than “to demand the impossible.”
T


t of their surveillance techniques. This situation is obviously
radically different from the totalitarianism we got to know. Even though secret
agencies in the Eastern Bloc were blackmailing people to produce miserable literature as their agents, samizdat publications could at least theoretically escape their
attention.
This is not the only difference. While captured samizdats were read by agents of
flesh and blood, publications collected through internet surveillance are “read”
by software ag


n the page on Prague.
2. Links to their web presence from inside their wiki pages, which usually
implies their (self-)presentation.
http://monoskop.org/The_Media_Are_With_Us
3. Basic information, including a name or title in the original language, dates
of birth, foundation, realization, and relations to other agents, ideally through
links inside the wiki. These are presented as a narrative and in English.
4. Literature or bibliography in as many languages as possible, with links to
versions of texts on


life of the previous century.
Since then the focus has expanded considerably to more than a century of art and
new media on the whole continent. Still, it portrays merely another layer of the
research, one which is as yet a collection of fragmentary data, without much
context. Soon we also hit the limit of what is available about this field online. The next
question was how to work in the internet environment with printed sources.
Log
http://monoskop.org/log
When I was installing this blog five years ago I tr


dat in Barok 2018


ht,
it’s supported by all browsers, footnotes too, you can adapt its layout
easily.
That’s completely fine for a researcher.
As a researcher, you just need source code:
you need plain text, page numbers, images, working footnotes, relevant data
and code.
_Data and code_ as well:
this is where online companions to print books come in,
you want to publish your research material,
your interviews, spreadsheets, software you made.
...
Here we distinguish between researchers and read


dat in Bodo 2014


s not just size that made Gigapedia unique. Unlike most sites, it
moved beyond its initial specialization in scientific texts to incorporate a wide range of academic
disciplines. Compared to its peers, it also had a highly developed central metadata database, which
contained bibliographic details on the collection and also, significantly, on gaps in the collection, which
underpinned a process of actively solicited contributions from users. With the ubiquitous
scanner/copiers, the production of book


seed of all scientific digital libraries on the net. Its mission is simple and straightforward. It
collects free-floating scientific texts and other collections from the Internet and consolidates them (both
content and metadata) into a single, open database. Though ordinary users can search the catalog and
retrieve the texts, its main focus is the distribution of the catalog and the collection to anyone who
wants to build services upon them. Aleph has regularly updated links that point to its own, neatly packed
source code, its database dump, and to the terabytes worth of collection. It is a knowledge infrastructure
that can be freely accessed, used and built upon by anyone. This radical openness enables a number of
other pirate libraries to offer Aleph’s catalogue along with


arding books is very, very
important, and did just that. If someone had free access to a Xerox machine, they were Xeroxing
everything in sight. A friend of mine had entire room full of Xeroxed books.”2
From the 1960s onwards, the ever-growing Samizdat networks tried to counterbalance the effects of
censorship and provide access to both censored classics and information on the current state of Soviet

2 Anonymous source #1
society. Reaching a readersh


around 200,000, these networks operated in a networked, bottom-up
manner. Each node in the chain of distribution copied the texts it received, and distributed the copies.
The nodes also carried information backwards, towards the authors of the samizdat publications.
In the immediate post-Soviet political turmoil and economic calamity, access to print culture did not get
any easier. Censorship officially ended, but so too did much of the funding for the state-funded
publishing sector. Mass unemploym


in RuNet
The copying of censored and uncensored works (by hand, by typewriters, by photocopying or by
computers), the hoarding of copied texts, the buying and selling of books on the black market, and the
informal, peer-to-peer distribution of samizdat material were integral parts of the everyday experience
of much of the educated Soviet and post-Soviet readership. The building and maintenance of individual
collections and the participation in the informal networks of exchange offered a sense of political


nd printers became available,
people started to print forbidden books, or just books that were difficult to find, not necessarily
forbidden. I have seen myself a print-out on a mainframe computer of a science fiction novel,
printed in all caps! Samizdat was printed on typewriters, xeroxed, printed abroad and xeroxed, or
printed on computers. Only paper circulated, files could not circulate until people started to have
PCs at home. As late as 1992 most people did not have a PC at home. So the only re


ed to kolhoz and processed into
DJVU. The focus was on collecting the most important science textbooks and monographs of all time, in
all fields of natural science.
There was never any commercial support. The kolhoz group never had a web site with a database, like
most projects today. They had an ftp server with files, and the access to ftp was given by PM in a forum.
This ftp server was privately supported by one of the members (who was an academic researcher, like
most kolhoz members). The files w


chivist: a user of the forum started the
laborious task of organizing the texts into a usable, searchable format—first filtering duplicates and
organizing existing metadata into an Excel spreadsheet, and later moving to a more open, web-based database operating under the name Aleph.
Aleph inherited more than just books from Kolhoz and Moshkov’s lib.ru. It inherited their elitism with
regard to canonical texts, and their understanding of librarianship as a community effort. Like the earlier


te widely publicized services that interface with the
public. They let others run the risk of getting famous.

Mirrors and communities
Aleph serves as a source archive for around a half-dozen freely accessible pirate libraries on the net. The
catalog database is downloadable, the content is downloadable, even the server code is downloadable.
No passwords are required to download and there are no gatekeepers. There are no obstacles to setting
up a similar library with a wider catalog, with improved use


e centralization of the book collection: many of the mirrors have their own upload pages
where one can contribute to a mirror’s collection, and it is not clear how or whether books that land at
one of the mirrors find their way back to the central database. Aleph also offers a desktop library
management tool, which enables dedicated librarians to see the latest Aleph database on their desktop
and integrate their local collections with the central database via this application. Nevertheless, it seems
that nothing really stands in the way of the fragmentation of the collection, apart from the willingness of
uploaders to contribute directly to Aleph rather than to one of its mirrors (or other sites)


outs and identity of web services is already operational. But
[i]f people are physically served court invitations, they will have to close the site. The idea is, however,
that the entire collection is copied throughout the world many times over, the database is open, the code
for the site is open, so other people can continue.16

On methodology
We tried to reconstruct the story behind Aleph by conducting interviews and browsing through the BBS
of the community. Access to the site and community membe


raries linking past and present, and preparing for the future.
Sezneva, O., & Karaganis, J. (2011). Chapter 4: Russia. In J. Karaganis (Ed.), Media Piracy in Emerging
Economies. New York: Social Science Research Council.
Skilling, H. G. (1989). Samizdat and an Independent Society in Central and Eastern Europe. Palgrave
Macmillan.
Solzhenitsyn, A. I. (1974). The Gulag Archipelago 1918-1956: An Experiment in Literary Investigation,
Parts I-II. Harper & Row.
Stelmach, V. D. (1993). Reading in Russ


dat in Bodo 2015


that called it to life and shaped its development. I looked at its catalogue to
understand what it has to offer and how that piratical supply of books is related to the legal supply of
books through libraries and online distributors. I also acquired data on its usage, so was able to
reconstruct some aspects of piratical demand. After a short introduction, in the first part of this essay I
will outline some of the main findings, and in the second part will situate the findings in the wider context
of


reated uncertainty in the book market. The previous decades, however, have taught
authors and readers how to overcome political and economic obstacles to access to books. During the
Soviet times authors, editors and readers operated clandestine samizdat distribution networks, while
informal book black markets, operating in semi-private spheres, made uncensored but hard to come by
books accessible (Stelmakh, 2001). This survivalist attitude and the skills that came with it became handy
in the post-So


n ISBN number registered in the catalogue3 was
available in print either as a new copy or a second hand one, only about one third of the titles were
available in e-book formats. The mean price of the titles still in print was 62 USD according to the data
gathered from Amazon.com.
The number of works accessed through Aleph is as impressive as its catalogue. In the three months
between March and June, 2012, on average 24,000 documents were downloaded every day from one of
its half-a-dozen mirrors.4


nd what it is not
Aleph is an example of the library in the post-scarcity age. It is founded on the idea that books should no
longer be a scarce resource. Aleph set out to remove both sources of scarcity: the natural source of
3

Market availability data is only available for that 40% of books in the Aleph catalogue that had an ISBN number
on file. The titles without a valid ISBN number tend to be older, Russian language titles, in general with low
expected print and e-book availability.
4
Download data is based on the logs provided by one of the shadow library services which offers the books in
Aleph’s catalogue as well as other works also free and without any restraints or limitations.

i


les the peer production of the
library. Aleph is an open source library. This means that every resource it uses and every resource it
creates is freely accessible to anyone for use without any further restrictions. This includes the server
code, the database, the catalogue and the collection. The open source nature of Aleph rests on the
ideological claim that the scientific knowledge produced by humanity, mostly through public funds,
should be open for anyone to access without any restrictions. Every


dat in Bodo 2016


d unrestricted access to them.

The support for a freely accessible scholarly knowledge commons takes many
different forms. A growing number of academics publish in open access
journals, and offer their own scholarship via self-archiving. But as the data
suggest (Bodó 2014a), there are also hundreds of thousands of people who use
pirate libraries on a regular basis. There are many who participate in
courtesy-based academic self-help networks that provide ad hoc access to
paywalled scholarly papers (Cabanac 2015).[1] But a few people believe that
scholarly knowledge could and should be liberated from proprietary databases,
even by force, if that is what it takes. There are probably no more than a few
thousand individuals who occasionally donate a few bucks to cover the
operating costs of piratical services or share their private digital
collections with the worl


libraries are
familiar with the crippling consequences of not having access to fundamental
texts in science, either for political or for purely economic reasons. The
Soviet intelligentsia had decades of experience in bypassing censors, creating
samizdat content distribution networks to deal with the lack of access to
legal distribution channels, and running gray and black markets to survive in
a shortage economy (Bodó 2014b). Their skills and attitudes found their way to
the next generation, who no


gued for the unilateral liberation of scholarly
knowledge from behind paywalls to provide universal access to a common human
heritage. A few years later he tried to put his ideas into action by
downloading millions of journal articles from the JSTOR database without
authorization. Alexandra Elbakyan is a 27-year-old neurotechnology researcher
from Kazakhstan and the founder of Sci-hub, a piratical collection of tens of
millions of journal articles that provides unauthorized access to paywalled
artic


the paradox of the total piratical archive: they collect enormous
wealth, but they do not own or control any of it. As an insurance policy
against copyright enforcement, they have already given everything away: they
release their source code, their databases, and their catalogs; they put up
the metadata and the digitalized files on file-sharing networks. They realize
that exclusive ownership/control over any aspects of the library could be a
point of failure, so in the best traditions of archiving,


y and probably sealed her own legal and
financial fate. But her library is probably safe. In the wake of this lawsuit,
pirate libraries are busy securing themselves: pirates are shutting down
servers whose domain names were confiscated and archiving databases, again and
again, spreading the illicit collections through the underground networks
while setting up new servers. It may be easy to destroy individual
collections, but nothing in history has been able to destroy the idea of the
universal libra


dat in Constant 2009


figure 13
figure 17
figure 19 Doppelgänger: The electronic double (duplicate, twin) in a society of control and surveillance
figure 14
figure 20 CookieSensus: Cookies found on washingtonpost.com
figure 22 Image Tracer: Images and data accumulate into layers as the query is repeated over time
figure 21 ... and cookies sent by tacodo.net
figure 23 Shmoogle: In one click, Google hierarchy crumbles down
figure 24 Jussi Parikka: We move onto a baroque world, a mode


extended objects. EN, NL, FR. 93
Leiff Elgren, CM von Hausswolff: Elgaland-Vargaland. EN, NL, FR. 95
CM von Hausswolff, Guy-Marc Hinant: Ghost Machinery. EN, NL. 98
Read Feel Feed Real. EN, NL, FR. 101
Manu Luksch, Mukul Patel: Faceless: Chasing the Data Shadow. EN. 104
Julien Ottavi: Electromagnetic spectrum Research code 0608. FR. 119
Michael Murtaugh: Active Archives or: What's wrong with the YouTube documentary? EN. 131
EN, NL, FR. 139
Femke Snelting. NL. 143
Adrian Mackenzie: Centres of envelo


stion of the interaction
between body and technology on the table. How to think
about the actual effects of surveillance, the ubiquitous presence of cameras and public safety procedures that can only
regard individuals as an amalgam of analysable data?
What is the status of ‘identity' when it appears both elusive and unchangeable? How are we conditioned by the
technology we use? What is the relationship between commitment and reward? Between flexibility of work and a healthy life?
Which traces does technology leave in our thinking, behavior, our routine movements? And what residue do we
leave behind ourselves on electr(on)ic fields through our
presence in forums, social platforms, databases, log files?
The dual nature of the term ‘notation' formed an important source of inspiration. Systems that choreographers,
composers and computer programmers use to record ideas
and observations, can then be interpreted as instruction,
as a c


hemes, not meant to isolate areas of thinking, but rather
as ‘spider threads' interlinking various projects:
E-traces (p. 35) subjected the current reality of Web 2.0
to a number of critical considerations. How do we regain
control of the abundant data correlation that mega-companies such as Google and Yahoo produce, in exchange for
our usage of their services? How do we understand ‘service' when we are confronted with their corporate Janus
face: one a friendly interface, the other Machiavellian


ectivity, this profound sense of micro-collaboration, which has often been tapped into.”
Constant, October 2009

EN

E-Traces

How does the information we seize in search engines
circulate, what happens to our data entered in social networking sites, health records, news sites, forums and chat
services we use? Who is interested? How does the ‘market' of the electronic profile function? These questions
constitute the framework of the E-traces project.
For thi



than order. While Google serves the users with information ready for
immediate consumption, Shmoogle forces its users to scroll down and
make their own choices. If Google is a search engine, then Shmoogle
is a research engine.

figure 22
Images
and data
accumulate
into layers
as the query
is repeated
over time

figure 23 In
one click,
Google
hierarchy
crumbles
down

In Image Tracer, order is important. Image Tracer is a collaboration between artist group De Geuzen and myself. Tracer was born
out of our mutual interest in the traces images leave behind them on
their networked paths. In Tracer images and data accumulate into
layers as the query is repeated over time. Boundaries between image
and data are blurred further as the image is deliberately reduced to
thumbnail size, and emphasis is placed on the image's context, the
neighbouring images, and the metadata related to that image. Image Tracer builds up an archive of juxtaposed snapshots of


cameras and legislation are ingredients for a science fiction film, live annotation of videostreaming with the help
of IRC chats. . .
A mobile video laboratory was set up during the festival, to test out how to bring together scripting, annotation, data readings and recordings in digital archives.
Operating somewhere between surveillance and observation, the Open Source video team mixed hands-on Icecast
streaming workshops with experiments looking at the way
movements are regulated through motion control and vice
versa.

MANU LUKSCH, MUKUL PATEL
License: Creative Commons Attribution - NonCommercial - ShareAlike license
figure 94
CCTV
sculpture
in a park
in London

EN

Faceless: Chasing the Data Shadow
Stranger than fiction
Remote-controlled UAVs (Unmanned Aerial Vehicles) scan the city
for anti-social behaviour. Talking cameras scold people for littering
the streets (in children's voices). Biometric data is extracted from
CCTV images to identify pedestrians by their face or gait. A housing project's surveillance cameras stream images onto the local cable
channel, enabling the community to monitor itself.

figure 95
Poster in
London

These are not pr


control room operated by the
London Metropolitan Police Force. In April 2001 the existing CCTV system in Birmingham city centre was upgraded to smart CCTV. People are routinely scanned by both
systems and have their faces checked against the police databases.”
Centre for Computing and Social Responsibility, http://www.ccsr.cse.dmu.ac.uk/resources/general/ethicol/Ecv12no1.html

leads the world in the deployment of surveillance technologies. With
an estimated 4.2 million


al
communications surveillance system Echelon, shopping habits monitored through store loyalty cards, individual purchases located using
RfiD (Radio-frequency identification) tags, and our meal preferences
collected as part of PNR (flight passenger) data. 5 Our digital selves
are many dimensional, alert, unforgetting.

4 A Report on the Surveillance Society. For the Information Commissioner by the Surveillance Studies Network, September 2006, p.19. Available from http://www.ico.gov.uk
5 ‘e-Border


languages spoken in the household and many other personal details.
The Canadian Federal Government granted Lockheed Martin a $43.3 million deal to
conduct its 2006 Census. Public outcry against it resulted in only civil servants handling the actual data, and a new government task force being set up to monitor privacy
during the Census.
http://censusalert.org.uk/
http://www.vivelecanada.ca/staticpages/index.php/20060423184107361


Increasingly, these data traces are arrayed and administered in
networked structures of global reach. It is not necessary to posit a
totalitarian conspiracy behind this accumulation – data mining is an
exigency of both market efficiency and bureaucratic rationality. Much
has been written on the surveillance society and the society of control,
and it is not the object here to construct a general critique of data
collection, retention and analysis. However, it should be recognised
that, in the name of efficiency and rationality – and, of course, security – an ever-increasing amount of data is being shared (also sold,
lost and leaked 6) between the keepers of such seemingly unconnected
records as medical histories, shopping habits, and border crossings.
6 Sales: “Personal details of all 44 million adults living in Britain could be s


partment of Work and Pensions were
found on a road near Exeter airport, following their loss from a TNT courier vehicle.
There were also documents relating to home loans and mortgage interest, and details
of national insurance numbers, addresses and dates of birth.
In November 2007, HM Revenue and Customs (HMRC) posted, unrecorded and unregistered via TNT, computer discs containing personal information on 25 million people
from families claiming child benefit, including the bank details of parents and the dates
of birth and national insurance numbers of children. The discs were then lost.
Also in November, HMRC admitted a CD containing the personal details of thousands
of Standard Life pension holders has gone missing, leaving them at heightened risk
of identity theft. The CD, which contained data relating to 15,000 Standard Life
pensions customers including their names, National Insurance numbers and pension
plan reference numbers was lost in transit from the Revenue office in Newcastle to the
company's headquarters in Edinburgh by ‘an exte


rom the boot of
an HMRC car. A staff member had been using the PC for a routine audit of tax
information from several investment firms. HMRC refused to comment on how many
individuals may be at risk, or how many financial institutions have had their data
stolen as well. BBC suggest the computer held data on around 400 customers with
high value individual savings accounts (ISAs), at each of five different companies –
including Standard Life and Liontrust. (In May, Standard Life sent around 300 policy
documents to the wrong people.)


Legal frameworks intended to safeguard a conception of privacy by
limiting data transfers to appropriate parties exist. Such laws, and in
particular the UK Data Protection Act (DPA, 1998) 7, are the subject
of investigation of the film Faceless.
From Act to Manifesto
“I wish to apply, under the Data Protection Act,
for any and all CCTV images of my person held
within your system. I was present at [place] from
approximately [time] onwards on [date].” 8
For several years, ambientTV.NET conducted a series of exercises
to visualise the data traces that we leave behind, to render them
into experience and to dramatise them, to watch those who watch
us. These experiments, scrutinising the boundary between public
and private in post-9/11 daily life, were run under the title ‘the Spy
School'. In 2002, the Spy School carried out an exercise to test the
reach of the UK Data Protection Act as it applies to CCTV image
data.
The Data Protection Act 1998 seeks to strike a balance between
the rights of individuals and the sometimes competing interests
of those with legitimate reasons for using personal information.
The DPA gives individuals certain rights regarding information
held about them. It places obligations on those who process information (data controllers) while giving rights to those who are
the subject of that data (data subjects). Personal information
covers both facts and opinions about the individual. 9

7 The full text of the DPA (1998) is at http://www.opsi.gov.uk/ACTS/acts1998/19980029.htm
9 Data Protection Act Fact Sheet available from the UK Information Commissioners Office, http://www.ico.gov.uk

The original DPA (1984) was devised to ‘permit and regulate'
access to computerised personal data such as health and financial
records. A later EU directive broadened the scope of data protection
and the remit of the DPA (1998) extended to cover, amongst other
data, CCTV recordings. In addition to the DPA, CCTV operators
‘must' comply with other laws related to human rights, privacy, and
procedures for criminal investigations, a


: Lights out for the territory, Granta, London, 1998, p. 91)


“RealTime orients the life of every citizen. Eating, resting, going
to work, getting married – every act is tied to RealTime. And every
act leaves a trace of data – a footprint in the snow of noise...” 11
The film plays in an eerily familiar city, where the reformed RealTime calendar has dispensed with the past and the future, freeing
citizens from guilt and regret, anxiety and fear. Without memory or
ant


ucture. Faceless interrogates the laws that govern the video surveillance of society
and the codes of communication that articulate their operation, and
in both its mode of coming into being and its plot, develops a specific
critique.
Reclaiming the data body
Through putting the DPA into practice and observing the consequences over a long exposure, close-up, subtle developments of the
law were made visible and its strengths and lacunae revealed.
“I can confirm there are no such recordings of
yourself from that date, our recording system was
not working at that time.” (11/2003)

11 Faceless, 2007

Many data requests had negative outcomes because either the surveillance camera, or the recorder, or the entire CCTV system in question
was not operational. Such a situation constitutes an illegal use of
CCTV: the law demands that operators: “comply with th


her that can be
done regarding the tapes, and I can only apologise
for all the inconvenience you have been caused.”
(11/2003)
Technical failures on this scale were common. Gross human errors
were also readily admitted to:

12 CCTV Systems and the Data Protection Act 1998, available from http://www.ico.gov.uk

“As I had advised you in my previous letter, a request was made to remove the tape and for it not
to be destroyed. Unhappily this request was not
carried out and


s
look quite indistinct in the tape, but the picture you sent to us shows you wearing a similar
fur coat, and our main identification had been made
through this and your description of the location.”
(07/2002)


To release data on the basis of such weak identification compounds
the failure.
Much confusion is caused by the obligation to protect the privacy
of third parties in the images. Several data controllers claimed that
this relieved them of their duty to release images:
“[... W]e are not able to supply you with the images you requested because to do so would involve
disclosure of information and images relating to
other persons who can be identified from the tape
and we are not in a position to obtain their consent to disclosure of the images. Further, it is
simply not possible for us to eradicate the other
images. I would refer you to section 7 of the Data
Protection Act 1998 and in particular Section 7
(4).” (11/2003)
Even though the section referred to states that it is:
“not to be construed as excusing a data controller
from communicating so much of the information
sought by the request as can be communicated without disclosing the identity of the other individual concerned, whether by the omission of names or
other identifying particulars or otherwise.”
Where video is concerned, anonymisation of third parties is an expensive, labour-intensive procedure – one common technique is to occlude
each head with a black oval. Data controllers may only charge the
statutory maximum of £10 per request, though not all seemed to be
aware of this:


“It was our understanding that a charge for production of the tape should be borne by the person
making t


f with
their heads!

Visually provocative and symbolically charged as the occluded heads
are, they do not necessarily guarantee anonymity. The erasure of a
face may be insufficient if the third party is known to the person requesting images. Only one data controller undeniably (and elegantly)
met the demands of third party privacy, by masking everything but
the data subject, who was framed in a keyhole. (This was an uncommented second offering; the first tape sent was unprocessed.) One
CCTV operator discovered a useful loophole in the DPA:
“I should point out that we reserve the right, in
accordance with Section 8(2) of the Data Protection
Act, not to provide you with copies of the information requested if to do so would take disproportionate effort.” (12/2004)
What counts as ‘disproportionate effort'? The gold standard was set
by an institution whose approach was almos


o be
passed to the branch sundry income account.” (Head
of Security, internal communication 09/2003)
From 2004, the process of obtaining images became much more difficult.
“It is clear from your letter that you are aware
of the provisions of the Data Protection Act and
that being the case I am sure you are aware of
the principles in the recent Court of Appeal decision in the case of Durant vs. financial Services Authority. It is my view that the footage you
have requested is not personal data and therefore
[deleted] will not be releasing to you the footage
which you have requested.” (12/2004)
Under Common Law, judgements set precedents. The decision in
the case Durant vs. Financial Services Authority (2003) redefined
‘personal data'; since then, simply featuring in raw video data does
not give a data subject the right to obtain copies of the recording.
Only if something of a biographical nature is revealed does the subject
retain the right.


“Having considered the matter carefully, we do not
believe that the information we hold has the necessary relevance or proximity to you. Accordingly
we do not believe that we are obligated to provide
you with a copy pursuant to the Data Protection Act
1988. In particular, we would remark that the video
is not biographical of you in any significant way.”
(11/2004)
Further, with the introduction of cameras that pan and zoom, being
filmed as part of a crowd by a static camera is no longer grounds for
a data request.
“[T]he Information Commissioners office has indicated that this would not constitute your personal
data as the system has been set up to monitor the
area and not one individual.” (09/2005)
As awareness of the importance of data rights grows, so the actual
provision of those rights diminishes:


figure 89 Still from Faceless, 2007

"I draw your attention to CCTV systems and the Data
Protection Act 1998 (DPA) Guidance Note on when the
Act applies. Under the guidance notes our CCTV system is no longer covered by the DPA [because] we:
• only have a couple of cameras
• cannot move them remotely
• just record on video whatever the cameras pick
up
• only give the recorded images to the police to
investigate an incident on our premises"
(05/2004)
Data retention periods (which data controllers define themselves)
also constitute a hazard to the CCTV filmmaker:
“Thank you for your letter dated 9 November addressed to our Newcastle store, who have passed
it to me for reply. Unfortunately, your letter was
delayed in the post to me and only received this
week. [...] There was nothing on the tapes that you
requested that caused the store to


he Met are to
have live access to them, having been exempted from parts of the
Data Protection Act to do so. 15 As such realities of CCTV's daily
operation become more widely known, existing acceptance may be
somewhat tempered.
Physical bodies leave data traces: shadows of presence, conversation, movement. Networked databases incorporate these traces into
data bodies, whose behaviour and risk are priorities for analysis and
commodification, by business and by government. The securing of
a data body is supposedly necessary to secure the human body, either preventatively or as a forensic tool. But if the former cannot
be assured, as is the case, what grounds are there for trust in the
hollow promise of the latter? The all-seeing eye of the panopticon is
not complete, yet. Regardless, could its one-way gaze ever assure an
enabling conception of security?

15 Surveillance State Function Creep – London Congestion Charge “real-time bulk data” to be automatically handed over to the Metropolitan Police etc. http://p10.hostingprod.com/@spyblog.org.uk/blog/2007/07/surveillance_state_function_creep_london_congestion_charge_realtime_bulk_data.html

MICHAEL MURTAU


sus a
‘slideshow', versus a ‘music video', together with a sense that these
different kinds of material might need to be handled differently. Each
clip is compressed in a uniform way, meaning at the moment into a
flash format video file of fixed data rate and screen size.
Clips have no history
Despite these limitations, users of YouTube have found workarounds
to, for instance, download clips to then rework them into derived clips.
Although the derived works are often placed back again on YouTube


h comparing the kinds of algorithmic processes that take
place in DSP with those found in new media more generally. Although
it is an incredibly broad generalisation, I think it is safe to say that
DSP does not belong to the set-based algorithms and data-structures
that form the basis of much interest in new media interactivity or
design.
DSP differs from set-based code. If we think of social software such
as flickr, Google, or Amazon, if we think of basic information infrastructures such as relational databases or networks, if we think of
communication protocols or search engines, all of these systems rely
on listing, enumerating, and sorting data. The practices of listing,
indexing, addressing, enumerating and sorting, all concern sets. Understood in a fairly abstract way, this is what much software and code
does: it makes and changes sets. Even areas that might seem quite
remote from set-ma


cebook or
YouTube also can be understood as massive deployments of set theory
in the form of code. Their sociality is very much dependent on set
making and set changing operations, both in the composition of the
user interfaces and in the underlying databases that constantly
seek to attach new relations to data, to link identities and attributes.
In terms of activism and artwork, relations that can be expressed in
the form of sets and operations on sets are highly manipulable. They
can be learned relatively easily, and they are not too difficult to work


tbooks in these areas often do
not mention DSP. The distinction between DSP and other forms of
computation is clearly defined in a textbook of DSP:
Digital Signal Processing is distinguished from other areas in
computer science by the unique type of data it uses: signals.
In most cases, these signals originate as sensory data from the
real world: seismic vibrations, visual images, sound waves, etc.
DSP is the mathematics, the algorithms, and the techniques


used to manipulate these signals after they have been converted
into a digital form. (Smith, 2004)
While it draws on some of the logical and set-based operations
found in code in general, DSP code deals with signals that usually involve some kind of sensory data – vibrations, waves, electromagnetic
radiation, etc. These signals often involve forms of rapid movement,
rhythms, patterns or fluctuations. Sometimes these movements are
embodied in physical senses, such as the movements of air involved in
hearin


applied to chunks of
sensation – video frames – to make them into something that can be
manipulated, stored, changed in size or shape, and circulated. Notice

that the code here is quite opaque in comparison to the graph data
structures discussed previously. This opacity reflects the sheer number of operations that have to be compressed into code in order for
digital signal processing to work.
Working with DSP: architecture and geography
So we can perhaps see from the tw


ure.
Physically, codecs take many forms, in software and hardware. Today, codecs nestle in
set-top boxes, mobile phones, video cameras and webcams, personal computers, media
players and other gizmos. Codecs perform encoding and decoding on a digital data
stream or signal, mainly in the interest of finding what is different in a signal and what
is mere repetition. They scale, reorder, decompose and reconstitute perceptible images
and sounds. They only move the differences that matter through informat


ame time, yet all be individualised and separate? The flow of
information and messages promises something highly individualised
(we saw this in the UMPC video from Intel). In terms of this individualising change, the movement of images, messages and data, and the
movement of people, have become linked in very specific ways today.
The greater the degree of individualization, the more dense becomes
the mobility of people and the signals they transmit and receive. And
as people mobilise, they drag pers


ier in the decoding
process in contemporary wireless networks, a fairly generic computational algorithm comes into action: the Fast Fourier Transform
(FFT). In some ways, it is not surprising to find the FFT in wireless networks or in digital video. Dating from the mid-1960s, FFTs
have long been used to analyse electrical signals in many scientific
and engineering settings. It provides the component frequencies of
a time-varying signal or waveform. Hence, in ‘spectral analysis', the
FFT can show t


f waveforms. The envelope of a signal becomes something that
contains many simple signals. It is interesting that wireless networks
tend to use this process in reverse. It deliberately takes a well-separated and discrete set of signals – a digital datastream – and turns it
into a single complex signal. In contrast to the normal uses of the FFT in
separating important from insignificant parts of a signal, in wireless
networks, and in many other communications settings, the FFT is used to
put signals toget


ence of other transmitters into account. How does the FFT allow many transmitters to
inhabit the same spectrum, and even use the same frequencies?
The name of this technique is OFDM (Orthogonal Frequency Division Multiplexing). OFDM spreads a single data stream coming
from a single device across a large number of sub-carrier signals (52
in IEEE 802.11a/g). It splits the data stream into dozens of separate signals of slightly different frequency that together evenly use
the whole available radio spectrum. This is done in such a way that
many different transmitters can be transmitting at the same time,
on the same frequency, without interfering with each other. The advantage of spreading a single high speed data stream across many
signals (wideband) is that each individual signal can carry data at a

much slower rate. Because the data is split into 52 different signals,
each signal can be much slower (1/50). That means each bit of data
can be spaced apart more in time. This has great advantages in urban
environments where there are many obstacles to signals, and signals
can reflect and echo often. In this context, the slower the data is
transmitted, the better.
At the transmitter, a reverse FFT (IFFT) is used to re-combine
the 50 signals onto 1 signal. That is, it takes the 50 or so different
sub-carriers produced by OFDM, each of which has a single slightly
different, but caref


oks like
'white noise': it has no remarkable or outstanding tendency whatsoever, except to a receiver synchronised to exactly the right carrier
frequency. At the receiver, this complex signal is transformed, using FFT, back into a set of 50 separate data streams, that are then
reconstituted into a single high speed stream.
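
A toy sketch of this round trip using numpy's FFT routines (the 52 sub-carriers follow the figure quoted above; the random bit stream and the QPSK mapping are illustrative assumptions, and a real OFDM transmitter adds pilots, a cyclic prefix and channel correction on top):

```python
import numpy as np

# Toy OFDM round trip: spread one data stream across 52 sub-carriers,
# combine them into a single complex time-domain signal with an inverse FFT
# (the transmitter side), then separate them again with a forward FFT (receiver).
rng = np.random.default_rng(0)
n_subcarriers = 52                                  # as in IEEE 802.11a/g
bits = rng.integers(0, 2, size=2 * n_subcarriers)   # the high-speed stream

# Map bit pairs onto QPSK symbols, one symbol per sub-carrier (illustrative).
symbols = (2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)

tx_signal = np.fft.ifft(symbols)     # one complex signal on the air
rx_symbols = np.fft.fft(tx_signal)   # the receiver separates the sub-carriers

# Decode and check that the original stream is reconstituted.
rx_bits = np.empty_like(bits)
rx_bits[0::2] = (rx_symbols.real > 0).astype(int)
rx_bits[1::2] = (rx_symbols.imag > 0).astype(int)
assert np.array_equal(bits, rx_bits)
print("recovered", rx_bits.size, "bits without error")
```
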
Even if we cannot come to grips with the techniques of transformation used in DSP in any great detail, I hope that one point stands
out. The transformation involves changes in kind. Data does not
simply move through space. It changes in kind in order to move
through space, a space whose geography is understood as too full of
potential relations.
Conclusion
A couple of points in conclusion:
a. The spectrum of different wireless-audio


of performance and dance
video art, and (documentary) film, which reflect upon our complex
body–technique relations. Searching for the indicating, probing, disturbing or subverting gesture(s) in the endless feedback loop between
technology, tools, data and bodies, we collected historical as well as
contemporary material for this temporary archive.

Modern Times or the Assembly Line
Reflects the body in work environments, which are structured by
technology, ranging from the pre-industrial manual wo


of the feminist ‘liberation', or the
monster that devours us.” (Insite 2000
program, San Diego Museum of Art)

http://www.livemovie.org

Perform the script, write the score
Considers dance and performance as knowledge systems where movement and data interact. With excerpts of performance documents,
interviews and (dance) films. But also the script, the code, as system
of perversion, as an explorative space for the circulation of bodies.
William Forsythe's works
Choreography can be understood as


at work.
In a Skype conversation that was live broadcast in La Bellone during Verbindingen/Jonctions 10, we spoke about ingimp, a clone of the
popular image manipulation programme Gimp, but with an important difference. Ingimp allows users to record data about their usage
into a central database, and subsequently makes this data available
to anyone.
At the Libre Graphics Meeting 2008 in Wroclaw, just before Michael
Terry presents ingimp to an audience of Gimp developers and users,
Ivan Monroy Lopez and Femke Snelting meet up with Michael Terry
again to talk more about the p


l, there is this
group of people who appear to be using it in this way, these are the
characteristics of their environment, these are the sets of tools they
work with, these are the types of images they work with and so on,
so that we have some real data to ground discussions about how the
software is actually used by people.
You asked me now why Gimp? I actually used Gimp extensively
for my PhD work. I had these little cousins come down and hang
out with me in my apartment after school, and I would


is a
great application, there is a lot of power to it, and I had already an
investment in its code base, so it made sense to use that as a platform
for testing out ideas of open instrumentation.
FS: What is special about ingimp is the fact that the data you
collect is equally free to use, run, study and distribute as the software
you are studying. Could you describe how that works?


MT: Every bit of data we collect, we make available: you can go to
the website, you can download every log file that we have collected.
The intent really is for us to build tools and infrastructure so that the
community itself can sustain this analysis, can sustain this


and I both had this
experience where we work with user interfaces, and since everybody
uses an interface, everybody feels they are an expert, so there can be
a lot of noise. So, not only did we want to create an open environment for collecting this data, and analysing it, but we also wanted to
increase the chance that we are making valuable contributions, and
that the community itself can make valuable contributions. Like I
said, there is enough opinion out there. What we really need to do
is to be


g used. So, we have
made a point from the start to try to be as open as possible with
everything, so that anyone can really contribute to the project.
FS: Ingimp has been running for a year now. What are you finding?
MT: I have started analysing the data, and I think one of the things
that we realised early on is that it is a very rich data set; we have lots
and lots of data. So, after a year we've had over 800 installations, and
we've collected about 5000 log files, representing over half a million
commands, representing thousands of hours of the application being
used. And one of the things you have to realise is that when you have
a data set of that size, there are so many different ways to look at it
that my particular perspective might not be enough. Even if you sit

someone down, and you have him or her use the software for twenty
minutes, and you videota


available, but
they really didn't have an infrastructure for analysing them. So, we
created this new piece of software called ‘Stats Jam', an extension
to MediaWiki, which allows anyone to go to the website and embed
SQL-queries against the ingimp data set and then visualise those
results within the Wiki text. So, I'll be announcing that today and
demonstrating that, but I have been using that tool now for a week
to complement the existing data analysis we have done.
One of the first things that we realized is that we have over 800
installations, but then you have to ask, how many of those are really serious users? A lot of people probably just were curious, they
downloaded it and installed it, found that it didn't really do much
for them and so maybe they don't use it anymore. So, the first thing
we had to do is figure out which data points should we really pay
attention to. We decided that a person should have used ingimp on
two different occasions, preferably at least a day apart, where they'd
saved an image on both of the instances. We used that as an indication of what a serious user is. So with that filter in place, the ‘800
installations' drops down to about 200 people. So we had about 200
people using ingimp; and looking at the data, this represents about
800 hours of use, about 4000 log files, and again still about half a
million commands. So, it's still a very significant group of people.
200 people are still a lot, and that's a lot of data, representing about
11000 images they have been working on – there's just a lot.
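
A sketch of how that 'serious user' filter could be written as one of the SQL queries Stats Jam embeds, here run against a toy SQLite table (the table layout and the sample rows are invented for illustration, not ingimp's actual schema):

```python
import sqlite3

# Hypothetical filter for "serious users": an installation counts if it saved
# an image on at least two occasions, at least a day apart.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sessions (installation_id INTEGER, session_date TEXT, saved_image INTEGER);
    INSERT INTO sessions VALUES
        (1, '2008-04-01', 1), (1, '2008-04-03', 1),  -- two saves, days apart
        (2, '2008-04-01', 1),                        -- tried it once
        (3, '2008-04-01', 1), (3, '2008-04-01', 1);  -- twice, but on the same day
""")
serious = con.execute("""
    SELECT installation_id
      FROM sessions
     WHERE saved_image = 1
     GROUP BY installation_id
    HAVING COUNT(*) >= 2
       AND julianday(MAX(session_date)) - julianday(MIN(session_date)) >= 1
""").fetchall()
print(serious)  # only installation 1 passes the filter
```
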
From that group, what we found is that use of ingimp is really
short and versatile. So, most sessions are about fifteen minutes or
less, on average. There are outlier


are some people who use it
for longer periods of time, but really it boils down to them using it for
about fifteen minutes, and they are applying fewer than a hundred
operations when they are working on the image. I should probably
be looking at my data analysis as I say this, but they are very quick,
short, versatile sessions, and when they use it, they use less than 10
different tools, or they apply less than 10 different commands.
What else did we find? We found that th


e how people are using
that as a medium for communicating to us. Some people will say,
“Just testing out, ignore this!” Or, people are trying to do things like
insert HTML code, to do like a cross-site scripting attack, because,
you have all the data on the website, so they will try to play with
that. Some people are very sparse and they say ‘image manipulation'
or ‘graphic design' or something like that, but then some people are
much more verbose, and they give more of a plan, “This is what I
expect to be doing.” So, I think it has been interesting to see how
people have adopted that and what's nice about it, is that it adds a
really nice human element to all this empirical data.
Ivan Monroy Lopez (IM): I wanted to ask you about the data;
without getting too technical, could you explain how these data are
structured, what do the log files look like?
MT: So the log files are all in XML, and generally we compress
them, because they can get rather large. And the reason that they
are rather large is that we are very verbose in our logging. We want
to be completely transparent with respect to everything, so that if
you have some doubts or if you have some questions about what kind
of data has been collected, you should be able to look at the log file,
and figure out a lot about what that data is. That's how we designed
the XML log files, and it was really driven by privacy concerns and
by the desire to be transparent and open. On the server side we take
that log file and we parse it out, and then we throw it into a database,
so that we can query the data set.
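On the server side this is essentially 'parse, then insert'. A minimal sketch, with an invented <event> element and command/time attributes standing in for whatever the actual ingimp logs contain:

    # Minimal parse-and-insert step: read one (possibly gzipped) XML log and
    # load its events into SQLite so they can be queried. The <event> element
    # and its attributes are invented stand-ins for the real log schema.
    import gzip
    import sqlite3
    import xml.etree.ElementTree as ET

    def load_log(path, db="ingimp.sqlite"):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rb") as fh:
            root = ET.parse(fh).getroot()
        rows = [(path, ev.get("command"), ev.get("time"))
                for ev in root.iter("event")]
        conn = sqlite3.connect(db)
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events (log TEXT, command TEXT, time TEXT)")
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        conn.close()
        return len(rows)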
FS: Now we are talking about privacy. . . I was impressed by the
work you have done on this; the project is unusually clear about why
certain things are logged, and other things not; mainly to prevent
the possibility of ‘playing back' actions so that one could identify
individual users from the data set. So, while I understand there are
privacy issues at stake I was wondering... what if you could look at the
collected data as a kind of scripting for use, as writing a choreography
that might be replayed later?
MT: Yes, we have been fairly conservative with the type of information that we collect, because this really is the first instance where
anyone has captured such rich data about how people are using software on a day to day basis, and then made all that data publicly
available. When a company does this, they will keep the data internally, so you don't have this risk of someone outside figuring something out about a user that wasn't intended to be discovered. We
have to deal with that risk, because we are trying to go about this
in a very open and transparent way, which means that people may
be able to subject our data to analysis or data mining techniques
that we haven't thought of, and extract information that we didn't
intend to be recording in our file, but which is still there. So there are
fairly sophisticated techniques where you can do things like look at
audio recordings of


nd name
to the documentation – but I didn't have time to implement it, but
certainly there are possibilities like that, you can imagine.
FS: Maybe another group can figure something out like that? That's
the beauty of opening up your software plus data set of course.
Well, just a bit more on what is logged and what not... Maybe you
could explain where and why you put the limit, and what kind of use
you might miss out on as a result?
MT: I think it is important to keep in mind that whatever instrum


nitor maybe, or maybe you are not really seeing what their
hands are doing. No matter what instrument you use, you are always
getting a particular slice.
I think you have to work backwards and ask what kind of things
do you want to learn. And so the data that we collect right now, was
really driven by what people have done in the past in the area of instrumentation, but also by us bringing people into the lab, observing
them as they are using the application, and noticing particular behaviours and saying, hey, that seems to be interesting, so what kind of
data could we collect to help us identify those kind of phenomena, or
that kind of performance, or that kind of activity? So again, the data
that we were collecting was driven by watching people, and figuring
out what information will help us to identify these types of activities.
As I've said, this is really the first project that is doing this, and
we really need to make sure we don't


matter of time before people start doing
this, because there are a lot of grumblings about, “We should be
doing instrumentation, someone just needs to sit down and do it.”
Now there is an extension out for Firefox that will collect this kind
of data as well, so you know. . .
IM: Maybe users could talk with each other, and if they are aware
that this type of monitoring could happen, then that would add a
different social dimension. . .
MT: It could. I think it is a matter of awareness, really. W


iginal
string was.
There are these little ‘gotchas' like that, that I don't think most
people are aware of, and this is why I get really concerned about
instrumentation efforts right now, because there isn't this body of
experience of what kind of data should we collect, and what shouldn't
we collect.
FS: As we are talking about this, I am already more aware of what
data I would allow being collected. Do you think by opening up this
data set and the transparent process of collecting and not collec


ifferent tool very clear.
MT: Right. You are more aware, because you are making that
choice to download that, compared to the regular version. There is
this awareness about that.
We have this lengthy text based consent agreement that talks about
the data we collect, but less than two percent of the population reads
license agreements. And, most of our users are actually non-native
English speakers, so there are all these things that are working against
us. So, for the past year we have really been focussing on privacy, not
only in terms of how we collect the data, but how we make people
aware of what the software does.
We have been developing wordless diagrams to illustrate how the
software functions, so that we don't have to worry about localisation
errors as much. And so we have these illustrations that show someone
downloading ingimp, starting it up, a graph appears, there is a little
icon of a mouse and a keyboard on the graph, and they type and you
see the keyboard bar go up, and then at the end when they close the
application, you see the data being sent to a web server. And then
we show snapshots of them doing different things in the software, and
then show a corresponding graph change. So, we developed these by
bringing in both native and non-native speakers, h



and researchers have been doing instrumentation for at least ten years,
probably ten to twenty years. So, the idea is not new, but what is
new – in terms of the research aspects of this –, is how do we do this
in a way where we can make all the data open? The fact that you
make the data open, really impacts your decision about the type of
data you collect and how you are representing it. And you need to
really inform people about what the software does.
But I think your question is... how does it impact the GIMP's
usability process? Not at all, right now. But that is because we have
intentionally been laying off to the side, until we got to the point
where we had an infrastructure, where the entire community could
really participate with the data analysis. We really want to have
this to be a self-sustaining infrastructure, we don't want to create a
system where you have to rely on just one other person for this to
work.
IM: What approach did you take in order to make this project
self-sustainable?

MT: Collecting data is not hard. The challenge is to understand
the data, and I don't want to create a situation where the community
is relying on only one person to do that kind of analysis, because this
is dangerous for a number of reasons. First of all, you are creating
a dependency on an external party, and that part


eave at some point.
If that is the case, then you need to be able to pass the baton to
someone else, even if that could take a considerable amount of time
and so on.
You also don't want to have this external dependency, because of
the richness in the data, you really need to have multiple people
looking at it, and trying to understand and analyse it. So how are
we addressing this? It is through this Stats Jam extension to the
MediaWiki that I will introduce today. Our hope is that this type
of tool will lower the barrier for the entire community to participate
in the data analysis process, whether they are simply commenting on
the analysis we made or taking the existing analysis, tweaking it to
their own needs, or doing something brand new.
In talking with members of the GIMP project here at the Libre
Graphics Meeting, they started asking questions like, “So how many
people are doing this, how many people are doing this and how many
this?” They'll ask me while we are sitting in a café, and I will be able
to pop the database open and say, “A certain number of people have
done this.” or, “No one has actually used this tool at all.”
The danger is that this data is very rich and nuanced, and you
can't really reduce these kinds of questions to an answer of “N people
do this”, you have to understand the larger context. You have to
understand why they are doing it, why they are not doing it. So, the
data h


or is this some more
widespread phenomenon?
They asked me yesterday how many people are using this colour
picker tool – I can't remember the exact name – so I looked and there
was no record of it being used at all in my data set. So I asked them
when did this come out, and they said, “Well it has been there at
least since 2.4.” And then you look at my data set, and you notice
that most of my users are in the 2.2 series, so that could be part of
the reasons. Another reason could be, that they just don't know that
it is there, they don't know how to use it and so on. So, I can answer
the question, but t


ed across all log files? Is it the number of people that have
used it? Is it the number of log files where it has been used at least
once? There are lots and lots of ways in which you can interpret
this question. So, you really need to approach this data analysis as
a discourse, where you are saying: here are my assumptions, here is
how I am getting to this conclusion, and this is what it means for
this particular group of people. So again, I think it is dangerous if
one person does that and you bec
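For instance, the same question about a single tool can be answered with several different aggregates over the toy events table from the sketch above, and each one encodes its own assumptions:

    # Several different readings of "how many use tool X", over the assumed
    # events table from the earlier sketch.
    import sqlite3

    QUERIES = {
        "total uses across all log files":
            "SELECT COUNT(*) FROM events WHERE command = ?",
        "log files where it was used at least once":
            "SELECT COUNT(DISTINCT log) FROM events WHERE command = ?",
        # counting *people* would additionally require a user or installation
        # id, which the toy schema above does not record
    }

    def describe_usage(command, db="ingimp.sqlite"):
        with sqlite3.connect(db) as conn:
            for label, sql in QUERIES.items():
                print(label, "->", conn.execute(sql, (command,)).fetchone()[0])

    describe_usage("colour-picker")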


ace design,
I see it really as a sort of reality check: this is how communities are
using the software and now you can take that information and ask,
do we want to better support these people or do we. . . For example
on my data set, most people are working on relatively small images
for short periods of time, the images typically have one or two layers,
so they are not really complex images. So regarding your question,
one of the things you can ask is, should we be creatin


ing, fairly common operations, so should we create a tool
that strips away the rest of the stuff? Or, should we figure out why
people are not using any other functionality, and then try to improve
the usability of that?
There are so many ways to use data – I don't really know how
it is going to be used, but I know it doesn't drive design. Design
happens from a really good understanding of the users, the types of
tasks they perform, the range of possible interface designs that are
out there, lots of prototyping, evaluating those prototypes and so on.
Our data set really is a small potential part of that process. You can
say, well, according to this data set, it doesn't look like many people
are using this feature, let's not focus too much on that, let's focus on
these other features or conversely, let's figure out why they are not
using them. . . Or you might even look at things like how big their


oftware, actor network theory, digital archives, knowledge management,
machine readability, semantic web,
data mining, information visualization, profiling, privacy, ubiquitous
computing, locative media.


ware, the compilation of data and the
exploration of numerical archives and
privacy. In 2007 he obtained an M.A.
in Media Design at the Piet Zwart
Instituut in Rotterdam.

amazons (1st version in Tanzfabrik,
2nd in Ausland, Berlin) and The
bitch is back under pressure (reloaded


f a perfect day - a complex story
that combines autobiographical facts
and fictions.

Tsila Hassine
http://www.missdata.org/

EN

Tsila Hassine is a media artist / designer.
Her interests lie with the
hidden potentialities withheld in the
electronic data mines. In her practice she endeavours to extrude undercurrents of information and traces of
processes that are not easily discerned
through regular consumption of mass
networked media. This she accomplishes through repetitive misuse of
available pla


dat in Constant 2015


es Lieux

Distributed Version Control

Even when you are done, you are not done
Having the tools is just the beginning
Data analysis as a discourse

Why you should own the beer company you design for
Just Ask and That Will Be That
Tying the story to data
Unicodes

If the design thinking is correct, the tools should be irrelevant
You need to copy to understand
What’s the thinking here

The construction of a book (Aether9)
Performing Libre Graphics

The Making of Conversations



function, I think someone would implement it for Scribus in a short
time, and I think we would actually like it. Maybe we would generalize it a
little, so that for example you could also add other licenses too. We already
have support for some meta data, and in the future we might put some more
function in to support license managing, for example also for fonts.

6 because the fonts get outlined and/or reencoded
7 http://creativecommons.org/press-releases/entry/5947


About the relation between


ia. Adobe FreeHand — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]
OpenType is a format for scalable computer fonts. It was built on its predecessor TrueType,
retaining TrueType’s basic structure and adding many intricate data structures for prescribing
typographic behavior. Wikipedia. OpenType — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]
Unicode is a computing industry standard for the consistent encoding, representation, and
handling of text


Linux user group, a group of Linux users who meet
regularly to really work together on Linux and Free Software. It is the most
active group of Linux users in the French speaking part of Belgium.

How did you come into contact with this group?

That dates a while back. I have been trained in Linux a long time ago ...
Five years? Ten years? Twenty years?

Almost twenty years ago. I came across the beginnings of Linux in 1995 or
1996, I am not sure. I had some Slackware 2 installed, I messed around wi


) in 2014. She studied communication in
Istanbul and Leuven and joined Constant for a few months
to document the various working practices at Constant
Variable. Between 2011 and 2014, Variable housed studios
for Artists, Designers, Techno Inventors, Data Activists,
Cyber Feminists, Interactive Geeks, Textile Hackers, Video
Makers, Sound Lovers, Beat Makers and other digital creators who were interested in using F/LOS software for
their creative experiments.

Why do you think people should use and or


hat is missing from the analysis in
Free Culture discourse, the economic reality. It depends on where they (developers) work. A lot of them are employed by companies so they get a
salary. Others do it for a hobby. I’d be interested to get accurate data on
what percentage of F/LOSS developers are getting paid, etc. In the absence
of that data, I think it’s fair to say it is an unsolved problem. If we think
that developers ‘should’ be compensated for their work, then we need to talk
about capitalism. Or at least, about statutory funding models.

It is interesting that you used


g. What Git does, is that it
makes that process somehow transparent in the sense that, it takes care of
it for you. Or better, you have to make it take care for you. So instead of
having all files visible in your working directory, you put them in a database,
so you can go back to them later on. And then you have some commands to
manipulate this history. To show, to comment, to revert to specific versions.
More than versioning your own files, it is a tool to synchronize your work
with others. It all
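The workflow described here maps onto a handful of everyday Git commands. A minimal sketch, driven from Python for consistency and assuming Git is installed and a user name/email are configured; it only illustrates the put-files-in-a-database, show and revert steps, not any particular project's setup.

    # A minimal illustration of the workflow described above. Each call wraps
    # an ordinary command-line git invocation; nothing here is project-specific.
    import pathlib
    import subprocess

    repo = pathlib.Path("demo-repo")
    repo.mkdir(exist_ok=True)

    def git(*args):
        """Run a git command inside the demo repository and return its output."""
        return subprocess.run(["git", *args], cwd=repo, check=True,
                              capture_output=True, text=True).stdout

    git("init")                            # the 'database' that holds every version
    (repo / "chapter.txt").write_text("first draft\n")
    git("add", "chapter.txt")              # hand the file over to Git's care
    git("commit", "-m", "first draft")     # record a version you can go back to
    print(git("log", "--oneline"))         # show the history
    git("checkout", "--", "chapter.txt")   # revert the file to the recorded version
    # Synchronizing with others uses the same history: git("pull") / git("push")
    # against a shared remote, which is not configured in this sketch.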


to go where.
This doesn’t really work for the Etherpad-style-direct-collaboration. For
me it’s cool to think about how you could make these things work together.
Now I’m working on this collaborative font editor which does that in some
sort of database. How would that work? It would not work if every revision
would be in the Git. I was thinking you could save, or sort of commit, and
that would put it in a Git repository, this you can pull and push. But if
you want to have four people working t


transpose that table. If you’re just flipping it 90 degrees then you are using
it as a layout grid, and not as a table. That’s one obvious thing. Even then,
deciding to display it as a tabular thing means that it probably came from a
much bigger dataset, and you’ve just chosen to sum all of the sales data over
one year. Another one: you have again the sales data, you could have it as pie
chart, but you could also have it as a bar chart, you could have it in various
other ways. You can imagine that what you would do is ship some XML
that has that data, and then you would have a script or something which
would turn it into an SVG pie chart. And you could have a bar chart, or you
could also say show me only February. That interaction is one of the things
that one can do, and arguably you’re givin
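A minimal sketch of that 'ship the data, let a script draw it' idea: the sales figures, labels and colours below are invented, and the output is a bare-bones SVG pie; swapping the arc loop for rectangles would give the bar-chart variant.

    # Invented sales figures rendered as a bare-bones SVG pie chart.
    # Only the arc geometry is real; numbers, colours and sizes are placeholders.
    import math

    sales = {"January": 40, "February": 25, "March": 35}
    colours = ["#b22", "#2b2", "#22b", "#bb2"]

    def pie_chart(data, r=100, cx=120, cy=120):
        total = sum(data.values())
        slices, angle = [], 0.0
        for i, (label, value) in enumerate(data.items()):
            sweep = 2 * math.pi * value / total
            x1, y1 = cx + r * math.cos(angle), cy + r * math.sin(angle)
            angle += sweep
            x2, y2 = cx + r * math.cos(angle), cy + r * math.sin(angle)
            large = 1 if sweep > math.pi else 0
            slices.append(
                f'<path fill="{colours[i % len(colours)]}" '
                f'd="M{cx},{cy} L{x1:.1f},{y1:.1f} '
                f'A{r},{r} 0 {large} 1 {x2:.1f},{y2:.1f} Z">'
                f'<title>{label}: {value}</title></path>')
        return ('<svg xmlns="http://www.w3.org/2000/svg" width="240" height="240">'
                + "".join(slices) + "</svg>")

    with open("sales.svg", "w") as out:
        out.write(pie_chart(sales))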


within Fontlab. The ever important thing about RoboFab was
that they developed UFO, I think it’s the Universal Font Object – I’m not
sure what the exact name is – but it's an XML font format which means that
you can interchange font source data with different programs and specifically
that means that you have a really good font interpolation program that can
read and write that UFO XML format and then you can have your regular
type design format font editor that will generate bitmap font formats that
you actually use in a system. You can write your own tool for a specific
task and push and pull the data back and forth. Some of these Dutch guys,
especially Erik has written a really good interpolation tool. So, as a kind
of thread in the story of font. Remember that time where Fontographer
was not developed actively then you have George Williams fro
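The interpolation mentioned here boils down to very simple arithmetic once two masters share the same point structure, which is what an exchange format like UFO makes possible. The sketch below is not RoboFab or any UFO library, just that arithmetic on plain coordinate lists, with made-up coordinates.

    # The interpolation idea in its simplest form: blend two compatible
    # outlines point by point. Only the arithmetic, not RoboFab's actual API.
    def interpolate(master_a, master_b, t):
        """Linear interpolation between two outlines given as [(x, y), ...]."""
        if len(master_a) != len(master_b):
            raise ValueError("masters must have matching point structures")
        return [((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
                for (xa, ya), (xb, yb) in zip(master_a, master_b)]

    light = [(10, 0), (90, 0), (90, 700), (10, 700)]     # a thin vertical stem
    bold  = [(10, 0), (160, 0), (160, 700), (10, 700)]   # the same stem, heavier
    print(interpolate(light, bold, 0.5))                 # a medium weight in between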


rsity of Waterloo, Canada and his
main research focus is on improving usability in Open Source
software. We speak about ingimp, a clone of the popular image
manipulation programme GIMP, but with an important difference: ingimp allows users to record data about their usage into
a central database, and subsequently makes this data available to
anyone. This conversation was also published in the Constant
publication Tracks in electr(on)ic fields.
Maybe we could start this conversation with a description of the ingimp project
you are developing and why you chose to work on usab


re is this group of people who appear to be using it in this way,
these are the characteristics of their environment, these are the sets of tools
they work with, these are the types of images they work with and so on, so
that we have some real data to ground discussions about how the software
is actually used by people. You asked me now why GIMP? I actually used
GIMP extensively for my PhD work. I had these little cousins come down
and hang out with me in my apartment after school, and I would


t it is a great application, there is a lot of power to it, and I had already
an investment in its code base so it made sense to use that as a platform for
testing out ideas of open instrumentation.
What is special about ingimp is the fact that the data you generate is made by
the software you are studying itself. Could you describe how that works?
Every bit of data we collect, we make available: you can go to the website,
you can download every log file that we have collected. The intent really
is for us to build tools and infrastructure so that the community itself can
sustain this analysis, can sustain this


and I both had this experience where we work with
user interfaces, and since everybody uses an interface, everybody feels they
are an expert, so there can be a lot of noise. So, not only did we want to
create an open environment for collecting this data, and analysing it, but we
also want to increase the chance that we are making valuable contributions,
and that the community itself can make valuable contributions. Like I said,
there is enough opinion out there. What we really need to do is t


is being used. So, we have made a point from
the start to try to be as open as possible with everything, so that anyone can
really contribute to the project.
ingimp has been running for a year now. What are you finding?
I have started analysing the data, and I think one of the things that we
realised early on is that it is a very rich data set; we have lots and lots of
data. So, after a year we’ve had over 800 installations, and we’ve collected
about 5000 log files, representing over half a million commands, representing thousands of hours of the application being used. And one of the things
you have to realise is that when you have a data set of that size, there are so
many different ways to look at it that my particular perspective might not
be enough. Even if you sit someone down, and you have him or her use the
software for twenty minutes, and you videotape it, then you can spend


ilable, but they really didn’t
have an infrastructure for analysing them. So, we created this new piece of
software called ‘StatsJam’, an extension to MediaWiki, which allows anyone
to go to the website and embed SQL-queries against the ingimp data set
and then visualise those results within the Wiki text. So, I’ll be announcing
that today and demonstrating that, but I have been using that tool now for
a week to complement the existing data analysis we have done. One of the
first things that we realized is that we have over 800 installations, but then
you have to ask, how many of those are really serious users? A lot of people
probably just were curious, they downloaded it and installed it, found that it
didn’t really do much for them and so maybe they don’t use it anymore. So,
the first thing we had to do is figure out which data points should we really
pay attention to. We decided that a person should have saved an image,
and they should have used ingimp on two different occasions, preferably at
least a day apart, where they’d saved an image on both of the instances. We
used that as an indication of what a serious user is. So with that filter in
place, then the ‘800 installations’ drops down to about 200 people. So we
had about 200 people using ingimp, and looking at the data this represents
about 800 hours of use, about 4000 log files, and again still about half a million commands. So, it’s still a very significant group of people. 200 people
is still a lot, and that’s a lot of data, representing about 11000 images they
have been working on, there’s just a lot.
From that group, what we found is that use of ingimp is really short and
versatile. So, most sessions are about fifteen minutes or less, on average.
There are outliers


are some people who use it for longer periods of
time, but really it boils down to them using it for about fifteen minutes, and
they are applying fewer than a hundred operations when they are working on
the image. I should probably be looking at my data analysis as I say this, but
they are very quick, short, versatile sessions, and when they use it, they use
less than 10 different tools, or they apply less than 10 different commands
when they are using it. What else did we find? We found that the t


to see how people are using that as a medium
for communicating to us. Some people will say, Just testing out, ignore this!
Or, people are trying to do things like insert HTML code, to do like a
cross-site scripting attack, because, you have all the data on the website, so
they will try to play with that. Some people are very sparse and they say
‘image manipulation’ or ‘graphic design’ or something like that, but then
some people are much more verbose, and they give more of a plan, This
is what I expect to be doing. So, I think it has been interesting to see how
people have adopted that and what’s nice about it, is that it adds a really nice
human element to all this empirical data.
I wanted to ask you about the data, without getting too technical, could
you explain how these data are structured, what do the log files look like?

So the log files are all in XML, and generally we compress them, because
they can get rather large. And the reason that they are rather large is that we
are very verbose in our logging. We want to be completely transparent with
respect to everything, so that if you have some doubts or if you have some
questions about what kind of data has been collected, you should be able to
look at the log file, and figure out a lot about what that data is. That’s how
we designed the XML log files, and it was really driven by privacy concerns
and by the desire to be transparent and open. On the server side we take
that log file and we parse it out, and then we throw it into a database, so
that we can query the data set.
Now we are talking about privacy ... I was impressed by the work you have done
on this; the project is unusually clear about why certain things are logged, and
other things not; mainly to prevent the possibility of ‘playing back’ actions so that
one could identify individual users from the data set. So, while I understand
there are privacy issues at stake I was wondering ... what if you could look at the
collected data as a kind of scripting for use? Writing a choreography that might
be replayed later?
Yes, we have been fairly conservative with the type of information that we
collect, because this really is the first instance where anyone has captured
such rich data about how people are using software on a day to day basis,
and then made all that data publicly available. When a company does
this, they will keep the data internally, so you don’t have this risk of someone outside figuring something out about a user that wasn’t intended to be
discovered. We have to deal with that risk, because we are trying to go about
this in a very open and transparent way, which means that people may be
able to subject our data to analysis or data mining techniques that we haven’t
thought of and extract information that we didn't intend to be recording in
our file, but which is still there. So there are fairly sophisticated techniques
where you can do things like look at audio recordings


d name to the documentation – but I didn’t have time to
implement it, but certainly there are possibilities like that, you can imagine.

Maybe another group can figure something out like that? That’s the beauty of
opening up your software plus data set of course. Well, just a bit more on what
is logged and what not ... Maybe you could explain where and why you put the
limit and what kind of use you might miss out on as a result?

I think it is important to keep in mind that whatever instr


le have done
in the past in the area of instrumentation, but also by us bringing people
into the lab, observing them as they are using the application, and noticing
particular behaviours and saying, hey, that seems to be interesting, so what
kind of data could we collect to help us identify those kind of phenomena,
or that kind of performance, or that kind of activity? So again, the data that
we were collecting was driven by watching people, and figuring out what
information will help us to identify these types of activities. As I’ve said,
this is really the first project that is doing this, and we really need to make
sure we don


only a matter of time before people start doing this, because
there are a lot of grumblings about, we should be doing instrumentation, someone just needs to sit down and do it. Now there is an extension out for Firefox
that will collect this kind of data as well, so you know ...
Maybe users could talk with each other, and if they are aware that this
type of monitoring could happen, then that would add a different social
dimension ...

It could. I think it is a matter of awareness, really, so when we bring
people into the lab and have them go to the ingimp website, download and
install it and use it, and go check out the stats on the website, and then we
ask questions like, what kind of data are we collecting? We have a lengthy
consent agreement that details the type of information we are collecting and
the ways your privacy could be impacted, but people don’t read it.
So concretely ... what information are you recording, and what inf


here are these little ‘gotchas’, things to look out for, that I
don’t think most people are aware of, and this is why I get really concerned
about instrumentation efforts right now, because there isn’t this body of
experience of what kind of data should we collect, and what shouldn’t we
collect.

As we are talking about this, I am already more aware of what data I would allow
to be collected. Do you think by opening up this data set and the transparent
process of collecting and not collecting, this will help educate users about these
kinds of risks?
It might, but honestly I think probably the thing that will educate people
the most is if there was a really large privacy err


a different
tool very clear.

Right. You are more aware, because you are making that choice to download
that, compared to the regular version. There is this awareness about that.
We have this lengthy text based consent agreement that talks about the data
we collect, but less than two percent of the population reads license agreements. And, most of our users are actually non-native English speakers,
so there are all these things that are working against us. So, for the past
year we have really been focussing on privacy, not only in terms of how we
collect the data, but how we make people aware of what the software does.
We have been developing wordless diagrams to illustrate how the software
functions, so that we don’t have to worry about localisation errors as much.
And so we have these illustrations


that show someone downloading ingimp,
starting it up, a graph appears, there is a little icon of a mouse and a keyboard on the graph, and they type and you see the keyboard bar go up, and
then at the end when they close the application, you see the data being sent
to a web server. And then we show snapshots of them doing different things
in the software, and then show a corresponding graph change. So, we developed these by bringing in both native and non-native speakers, having
them look at the dia


ompanies and researchers
have been doing instrumentation for at least ten years, probably ten to
twenty years. So, the idea is not new but what is new, in terms of the
research aspects of this, is how do we do this in a way where we can make
all the data open? The fact that you make the data open, really impacts your
decision about the type of data you collect and how you are representing it.
And you need to really inform people about what the software does. But I
think your question is ... how does it impact the GIMP’s usability process?
Not at all, right now. But that is because we have intentionally been laying
off to the side, until we got to the point where we had an infrastructure,
where the entire community could really participate with the data analysis.
We really want to have this to be a self-sustaining infrastructure, we don’t
want to create a system where you have to rely on just one other person for
this to work.

What approach did you take in order to make this project self-sustainable?

Collecting data is not hard. The challenge is to understand the data, and I
don’t want to create a situation where the community is relying on only one
person to do that kind of analysis, because this is dangerous for a number of
reasons. First of all, you are creating a dependency on an external party, and
t


ve at some point. If that is the case, then you need to be able to pass the
baton to someone else, even if that could take a considerable amount of time
and so on. You also don’t want to have this external dependency, because
of the richness in the data, you really need to have multiple people looking
at it, and trying to understand and analyse it. So how are we addressing
this? It is through this StatsJam extension to the MediaWiki that I will
introduce today. Our hope is that this type of tool will lower the barrier
for the entire community to participate in the data analysis process, whether
they are simply commenting on the analysis we made or taking the existing
analysis, tweaking it to their own needs, or doing something brand new.

In talking with members of the GIMP project here at the Libre Graphics
Meeting, they started asking questions like, So how many people are doing
this, how many people are doing this and how many this? They’ll ask me while
we are sitting in a café, and I will be able to pop the database open and say, A
certain number of people have done this, or, no one has actually used this tool at
all. The danger is that this data is very rich and nuanced, and you can’t really
reduce these kinds of questions to an answer of N people do this, you have to
understand the larger context. You have to understand why they are doing
it, why they are not doing it. So, the data helps to answer some questions,
but it generates new questions. They give you some understanding of how
the people are using it, but then it generates new questions of, Why is this
the case? Is this because these are just the people using ingimp, or is this
some more widespread phenomenon? They asked me yesterday how many
people are using this colour picker tool – I can’t remember the exact name –
so I looked and there was no record of it being used at all in my data set. So
I asked them when did this come out, and they said, Well it has been there at
least since 2.4. And then you look at my data set, and you notice that most of
my users are in the 2.2 series, so that could be part of the reasons. Another
reason could be, that they just don’t know that it is there, they don’t know
how to use it and so on. So, I can answer the question, b


ed across all log files? Is it the number of people
that have used it? Is it the number of log files where it has been used at
least once? There are lots and lots of ways in which you can interpret this
question. So, you really need to approach this data analysis as a discourse,
where you are saying, here are my assumptions, here is how I am getting to
this conclusion, and this is what it means for this particular group of people.
So again, I think it is dangerous if one person does that and you bec


t is going to impact interface design, I see it
really as a sort of reality check: this is how communities are using the
software and now you can take that information and ask, do we want to
better support these people or do we ... For example on my data set, most
people are working on relatively small images for short periods of time,
the images typically have one or two layers, so they are not really complex
images. So regarding your question, one of the things you can ask is, should
we be creatin


ing, fairly common operations, so should we
create a tool that strips away the rest of the stuff? Or, should we figure out
why people are not using any other functionality, and then try to improve
the usability of that? There are so many ways to use data – I don't really
know how it is going to be used, but I know it doesn’t drive design. Design
happens from a really good understanding of the users, the types of tasks
they perform, the range of possible interface designs that are out there, lots
of prototyping, evaluating those prototypes and so on. Our data set really
is a small potential part of that process. You can say, well according to this
data set, it doesn't look like many people are using this feature, let's not
focus too much on that, let's focus on these other features or convers


e a hippie but I
think I’ll have to take the risk (laughs).
If you go to the Flash website, it tells you the important things you need to
know about Flash, and then you click download. Maybe there is a link to a
complex survey that tries to gather data en masse of untold millions of users.
I think that any randomly chosen website of a Libre Graphics project will
look similar. But instead it could say when you click download or run the
software ... we’re a bunch of people ... why don’t you come


ld do that. There are 6 billion people on this planet and the
amount of people not doing F/LOSS is enormous. Don’t wring your hands
about ‘where are the women’. Just ask them to join and that will be that!

Tying the story to data

In the summer of 2010, Constant commissioned artist and
researcher Evan Roth to develop a work of his choice, and
to make the development process available in some way.
He decided to use a part of his fee as prize-money for
The GML-Recorder Challenge, inviting makers to propose an Open Source device ‘that can unobtrusively record
graffiti motion data during a graffiti writer’s normal practice in the city’. In three interviews that took place in
Brussels and Paris within a period of one and a half years,
we spoke about the collaborative powers of the GML standard, about contact points between


mo3010.
Brussels, July 2010

ER: So what should we talk about?

FS: Can you explain what GML stands for?

ER: GML stands for Graffiti Markup Language. 1 It is a very simple file format
designed for amateur programmers. It is a way to store graffiti motion data. I
started working with graffiti writers, combining graffiti and technology back
in New York, in 2003. In graduate school, my thesis was on graffiti analysis,
and writing software that could capture their gestures, to archive motion data
from graffiti writers. Back then I was saving the data in an x-y-time array, I
was calling them .graph files and I sensed there was something interesting
about the data, the visualization of motion data but I had never opened up the
project at that time.
About a year ago I released the second part of the project, of which the
source code was open but the dataset wasn't. In conversation with a friend of
mine named Theo 2 , who also collaborated with me on the L.A.S.E.R. Tag
project 3 , he brought up the .graph file again and how we could bring back
the file format as a way to connect all these different

1 Graffiti Markup Language (.gml) is a universal, XML based, open file format
designed to store graffiti motion data (x and y coordinates and time). The
format is designed to maximize readability and ease of implementation, even
for hobbyist programmers, artists and graffiti writers.
http://www.graffitimarkuplanguage.com


Analysis 4 , L.A.S.E.R. Tag, EyeWriter 5 ... so I
worked with Theo Watson, Chris Sugrue 6 and Jamie Wilkinson 7 and
other people to develop Graffiti Markup Language. It is a simple set of
guidelines, basically an .xml file format that saves x-y-time data but does
it in a way that is very specifically related to graffiti so there’s a drip tag
and there’s tags related to the size of the brush and to how many strokes
you have: is it one stroke or two strokes or three strokes.
The main idea is: How
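What is being described amounts to a very small XML document per tag: strokes made of x-y-time points, plus optional extras such as drip or brush information. The sketch below writes such a file with the nesting the published format appears to use (gml > tag > drawing > stroke > pt); treat the element names as an approximation and check graffitimarkuplanguage.com for the authoritative spec.

    # Writes a toy .gml file: strokes of x-y-time points. The element nesting
    # approximates the published format; see graffitimarkuplanguage.com.
    import xml.etree.ElementTree as ET

    def write_gml(strokes, path="tag.gml"):
        gml = ET.Element("gml")
        drawing = ET.SubElement(ET.SubElement(gml, "tag"), "drawing")
        for stroke in strokes:
            stroke_el = ET.SubElement(drawing, "stroke")
            for x, y, t in stroke:
                pt = ET.SubElement(stroke_el, "pt")
                ET.SubElement(pt, "x").text = str(x)
                ET.SubElement(pt, "y").text = str(y)
                ET.SubElement(pt, "t").text = str(t)
        ET.ElementTree(gml).write(path, encoding="utf-8", xml_declaration=True)

    # one short stroke, recorded as (x, y, time in seconds)
    write_gml([[(0.10, 0.20, 0.00), (0.15, 0.25, 0.04), (0.22, 0.31, 0.09)]])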


The EyeWriter system uses inexpensive cameras and Open Source computer vision
software to track the wearer’s eye movements. http://www.eyewriter.org
Chris Sugrue http://csugrue.com
Jamie Wilkinson http://www.jamiedubs.com

and upload it to an open database. The 000000book-site 8 hosts all this
data and some people are writing software for this.

FS: So there are three parts: the GML-standard, software to record and play
and then there is the data itself – all of it is 'open' in some way. Could you
go through each of them and talk about how they produce uploads and downloads?

ER: Right. It starts with Graffiti Analysis. It is software written in C++
using OpenFrameworks, an Open Source


aphy
analysis project and he used Graffiti Analysis as a starting point. I find it
exciting when that happens but more often people take the file-format as
a starting point, and use it as a jumping-off point for making their own
work.
Second was the database. We had this file-format that we loosely defined.
I worked with Jamie to develop the 000000book site. It is pretty nuts-and-bolts but you can click 'upload' and click on your own .gml files and
it will playback in the browser. People have dev


e made
Flash players, SVG players. Golan Levin has developed an application
that converts a .gml file into an auto-CAD format. The 000000book site
is basically where graffiti writers connect to developers.
In the middle between Graffiti Analysis and database is the Graffiti Markup
Language, that I think will have its own place on the web. But sometimes

8 http://000000book.com. Pronounced: 'Black Book': 'A black book is a graffiti
artist's sketchbook. Often used to sketch out and plan


o previous illicit works.' Wikipedia. Glossary of graffiti — Wikipedia, The
Free Encyclopedia, 2014. [Online; accessed 5.8.2014]

9 Stéphane Buellet, Camera Linea
http://www.chevalvert.fr/portfolio/numerique/camera-linea

I see it as one project. One of my interests is in archiving graffiti and all
of these things are ways of doing that. It is interesting how these three
things work together. In terms of an OS development model it has been
producing results I haven’t seen when I just released source code.
FS: How do you do that, develop a standard for graffiti?

ER: We started by looking at Graffiti Analysis and L.A.S.E.R. Tag, the apps
that were using graffiti motion data. From those two projects I had a lot of
experience of meeting graffiti writers as a userbase. When you meet with them,
they tell you right away what pieces of the software they think are missing.
So from talking with them we developed a lot of featu


cations for. It is there but we are looking for how to use it.

FS: Did you ever think about this standard as a way to define a discipline?

ER: (laughs) I think in the beginning it was a very functional conversation.
We were having apps running this data and I don't think we were thinking of
defining graffiti when we were writing the format. But looking back, it is
interesting to think about it.
Graffiti has a lot of privacy issues related to it too, right? So we did
discuss what it would mean to start recording geo-located data.
There are different interests in graffiti. There is an interest in visuals and
in deconstructing characters. Another group is interested in it, because
it is a sport and more of a performance art. For this type of interest, it
is more important to know exactly where and when it happened because
it is different on a rooftop in New York to a studio in the basement of
someones house. But if someone realizes this data resulted from an illegal
action, and wanted to tie it back to someone, then it starts to be like
a surveillance camera. What happens when someone is caught with a
laptop with all this data?
FS: Your desire to archive, is it also about producing new work?

ER: I see graffiti writers as hackers. They use the city in the same way as
hackers are using computer systems. They are finding ways of using a system to
make it do things that it wasn


hardly overlap. One of the interests I have is making these two groups of
people hang out more. I was physically the person bridging these two groups; I
was the nerd person meeting the graffiti writers talking to them about
software and having this database.
Now it is not about my personal collection anymore, it is making a handshake
between two communities; making them run off with each other and having fun as
opposed to me having to be there all the time to make introductions.

FS: Is GML about th


ti writers. A tag might be something they have been writing for more than 25
years and that will be very personal to them and the way they write this is
because they've written it a million times. So on the one hand it is
super-personal, but on the other hand a lot of graffiti writers have no
problem sharing this data. To them it is just another tag. They feel like, I
have written this tag a billion times and so when you want to keep one of
them, it is no big deal.
I don't think the conversation has gotten as involved as it could have.
You set something in moti


ence that I have been reluctant to deal with yet. Once you start talking too
much about it, you will scare off people on either side of the fence. I think
that will have to happen at some point but for now I have decided to refer to
it as an 'open database' and I hope that people will play nicely, like I said.

FS: But just imagine, what kind of licence would you need?

ER: It might make more sense to go for a media-related licence than for a code
licence. Creative Commons licences would lend themselves easily for this.
People could choose non-commercial or pure public domain. Does that make
sense?

FS: Well, yes but if you look at the objects that people share, we're much
closer to code than to a video file?

ER: Functionally it is code. But would a graffiti writer know what GPL is?

FS: I am interested in the apprentice-system you were talking


had one camera recorder tracking the pen and another camera behind the hand
and another so you could see the full body. But there was something about
tracking just the pen tip that I liked. It is an easier point of entry for
dealing with the motion data than having three different video feeds.

FS: Maybe it is more about metadata? Not a question of device or application,
but about space for a comment.

ER: Maybe in the keywords there will be something like: Rooftop. Brooklyn.
Arrested.
The most interesting part is often the stories that people tell afterward
anyway. So it is an interesting idea, how to tie the story to the data.
It is a design problem too. Historically graffiti has been documented many
times by outsiders. The movie Style Wars 10 is a good example of this epic
documentary that was made by outsiders that became insiders. Also, the people
that have been documenting most of the graffiti are not necessarily graffiti
writers.
Graffiti has a history with documentarians entering into their community a

10 Style Wars. Tony Silver, 1983. http://www.stylewars.com


ffiti writers to document their stories into the .gml files themselves, or is
it going to take outsiders? How does the format facilitate that?

FS: Do you think the availability of a project like GML can have an impact on
the way graffiti is learned? If data becomes available in a community that
operates traditionally through apprenticeships and person-to-person sharing,
what does it do?

ER: I am interested in Open Source culture being influenced by graffiti, and I
am interested in Open Source culture i


t the rules, the culture of writing graffiti often has a rigid structure. To
people in that community what I do is a blip on their radar. I am honored when
I get to meet graffiti writers and they are interested in what I am doing but
I don't think it will change anything in what is in some ways a very strict
system. And I don't want that either. I l

11 KATSU http://www.flickr.com/search/?q=graffiti+katsu
12 Mark Jenkins tapesculptures http://tapesculpture.org


ublic domain through the research and development of creative technologies and media.
Release early, often and with rap music. http://fffff.at
Blender is a free Open Source 3D content creation suite. http://www.blender.org/

be 3D printed, to become something physical. The video that I posted online intentionally showed screenshots from Blender and it ended
up on one of the bigger community sites. I only saw it when my cousin,
who is a big Blender user, e-mailed


talk where I explained sort of where I see these things overlap, I could make
a better case than the three minute video they reacted to.

FS: What about Gesture Markup Language instead of Graffiti Markup Language?

ER: Essentially GML records x-y-time data. If you talk about what it
functionally does, it is probably more related to gesture than it is to
graffiti. There is nothing at the core specifically related to graffiti. I am
interested in branding it in relation to graffiti and to get people to talk
about Open Source where it is traditionally not talked about. To me that is
interesting. It is a way to get people excited about open data, and
popularizing ideas about Open Source.

FS: Would you be OK if it would get more popular in non-graffiti circles?

ER: I am super excited when I see it used in bizarre places. I'll keep using
it for graffiti, but someone e-mailed me that they w


use it to track juggling, but how to track multiple balls
in the air? I keep calling it Graffiti Markup Language because I think it
is a good story.

15 http://www.blendernation.com/2010/07/09/blender-graffiti-analysis

PW: What's the licence on GML?

ER: We haven't really entered into that. Why would you need a licence on a
file format?

FS: It would prevent anyone from owning the standard.

ER: That sounds good. Actually it would be interesting for the project,


ndardization practices. Related, how can GML connect to other standard
practices? Could it be RDF compliant?

PW: Gesture recognition to help out the police?

FS: Or maps of places that are in need of some graffiti? How to link GML to
other types of data?

ER: It is hard for me to imagine something. But one thing is interesting
for example, how GML is used in the EyeWriter project. It has not
so much to do with gesture, but more with how you would draft in a
computer. TEMPT is plotting points, so the time data might not be so
interesting but because it is in the same format, the community might
pick it up and do something with it. All the TEMPT data he writes with
his eyes and it is uploaded to the 000000book site automatically. That
allowed another artist called Benjamin Gaulon 16 who I now know, but
didn’t know at the time, to use it with his Print Ball project. He took the
tag data from a paralyzed graffiti writer in Los Angeles and painted it on
a wall in Dublin. Eye-movement translated into a paint-ball gun ... that
is the kind of collaboration that I hope GML can be the middle-point
for. If that happens, things can start to


ring
collaborators, while something much more general like GML seems to be
more compelling for people to contribute to?

16 Benjamin Gaulon, Print Ball
http://www.eyewriter.org/paintball-shooting-robot-writes-tempt1-tag

I’ll answer that in a second, but you reminded me of something
else: because EyeWriter was GML based, a lot of the collaborations
that happened with people outside of the project were GML related,
not EyeWriter related. So we did have artists lik


graffiti markup language which is one of the three points of the triangle. The
second step would be a new addition to the wish-list, a challenge with a prize
associated to it which seems funny. The project I'd like to concentrate on is
making the data collection easier so that graffiti writers can be more active
in the upload sense. Taking the NASA development model: Can you get into orbit
on this budget?

FS: How is that different from the way you record graffiti motion at the
moment?

ER: If I go


n security cameras would still pick it up. The design I am focusing momentum
on is a system that's easier. A system that can work without me there, without
having to have a laptop there. The whole idea is that it would be a natural
way to get good data, to document graffiti without a red-head holding a laptop
following you around the whole time!

Paris, December 2010

FS: How is it to be the sole jury member?

ER: I tried to get another jury-member on there actually. Do you know Limor
Fried? She runs Adafruit Industries. 17 I really like her work. She works with
her partner Phil Torrone who runs


t. If you solved one of the design problems by the Mozilla community you could
receive kudos from the community, but if you solved one of my projects, you
don't really get kudos from my community, do you?
Having the money associated makes it this big thing. At Ars Electronica and so
on, it got people talking about it and so i

17 Limor Fried, Adafruit Industries http://www.adafruit.com
18 Phillip Torrone, Makezine http://makezine.com/pub/au/Phillip_Torrone


idea was that if you could take a photo of it on the wall, and then with your
finger you guide it for how it was written. It has an algorithm for image
processing and that combined with your best guess of how it was written would
be backed out in motion data. But it is faked data.

FS: That is really interesting!

ER: Yes it is and I would love it if he would make it but I am not going to
let him win with it (laughs). I understand why he wants to do it; especially
if you are not inside the graffiti community, your only experience is what you
see on the wall and you don't know who these people are and it is going to be
almost impossible to ever get data for those tags. If you don't have access to
that community you are never going to get the tag of the person that you
really want. I like the idea that he is thinking about getting some data from
the wall as opposed to getting it from the hand.

FS: Learning by copying. Nowhere near solving the challenge, but interesting.
At OSP 22 we were discussing the way designers are invited into Open Source
Software by way of contest. Troy James

19 Kyle McDonald http://kylemcdonald.net
20 Michael Auger http://lm4k.com
21 Golan Levin http://www.flong.com


aesthetic, music, and other
such diverse fields when we are so stuck on how much more consistent a damn panel looks with tripe
22 pixel icons of a given flavour?
http://www.librescope.com/975/spec-work-and-contests-part-two

definable design goals of what we wanted to reach, especially between the
first version and where we are now with the second version.
FS: How did that work?

ER: We are not talking about a ton of money here, 10 to 20.000, and we tried
to get as far as


p is already a lot. The biggest success is the project space, to see all the
projects happening.

FS: What happened on the site since we talked?

ER: A project I like is kml2gml 24 for example. It is done by a friend from
Tokyo. He was gathering GPS data riding his bike around various cities, and
building up a font based on his path. I like projects like this, where someone
takes a work that is already done and just writes an application to convert
the data into another format. To see him riding his bike played back in GML
was really nice. It is super low barrier to entry, he already did all the hard
work. I like that there is now a system for piping very different kinds of
data through GML.

24 Yamaguchi Takahiro http://www.graffitimarkuplanguage.com/kml2GML
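Projects like kml2GML boil down to re-labelling coordinates: read the points of a GPS track and emit them in the x-y-time shape GML expects. A rough sketch, assuming a KML 2.2 file and reusing the equally assumed write_gml structure from the earlier sketch; the timestamps here are faked as one unit per sample.

    # Rough sketch of the kml2GML idea: pull <coordinates> out of a KML 2.2
    # track and re-emit them as x-y-time points. Timestamps are faked; real
    # tracks would carry their own timing.
    import xml.etree.ElementTree as ET

    KML = "{http://www.opengis.net/kml/2.2}"

    def kml_track_to_points(path):
        root = ET.parse(path).getroot()
        points = []
        for coords in root.iter(KML + "coordinates"):
            for i, triple in enumerate(coords.text.split()):
                lon, lat, *_ = (float(v) for v in triple.split(","))
                points.append((lon, lat, float(i)))
        return points

    # points = kml_track_to_points("ride.kml")
    # write_gml([points])   # using the write_gml sketch shown earlier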
FS

But it could also work the other way around?

Yeah. This is maybe a tangent but depending on how someone solves
the GML challenge ... I was discussing this with Mike (the person that is
developing the sensor based version). He was thinking that if you would
turn on his system, and leave it on for a whole night of graffiti writing,
you would have the gestural data plus the GPS data. You could make
a .gml file that is tracking you down the street, and zoom in when you
start making the tag. Also you would get much more information on
3D movement, like tilt and when the pen is picking up and going down.
Right now all I am getting is a 2D view through video data. I am really
keeping my fingers crossed. But he ran into trouble though.
ER

FS

Like what?

I have my doubts about using these kinds of sensors, because ‘drift’ is a problem. When you use these sensors for too long, it tends to move a little
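The drift problem can be illustrated numerically: a position reconstructed by double-integrating accelerometer readings accumulates the effect of even a tiny constant bias. A minimal sketch, with an assumed bias value:

# Sensor drift illustrated: a constant accelerometer bias of 0.01 m/s^2
# (an assumed, optimistic value) double-integrated at 100 Hz.
dt, bias = 0.01, 0.01                  # time step in seconds, bias in m/s^2
velocity = position = 0.0
for step in range(1, 60 * 100 + 1):    # one minute of integration
    velocity += bias * dt
    position += velocity * dt
    if step % (10 * 100) == 0:
        print(f"after {step * dt:4.0f} s: drift = {position:6.2f} m")
# Even this small bias drifts well over ten metres within a minute,
# which is why a purely sensor-based recorder needs regular re-referencing.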


all before writing on it, feeling out the
playing field before starting! It is like working on a tablet; to move from
actual movement to instruction; navigation blends into the action of drawing
itself.
FS

ER

I like that!

SV

The guy using the iPhone did not use it as a sensor at all?

Theo was interested in using the iPhone to record motion data in GML, but also to save the coordinates so you could tie it into Google Earth or something, but he had trouble with the sensitivity of the sensor.
Maybe it is better now but you needed to draw on a huge scale for one
letter. You could not record anything small.
ER

But it could be nice if you could record with a device that is less conspicuous.
FS

I know. I have just been experimenting with mounting cameras on
spray-cans. A tangent to GML, but related. It is not data, but video.
ER

What do you think is the difference between recording video, and recording data? You mentioned that you wanted to move away from documenting the image to capturing movement. Video is somehow indirect data?
FS

Video is annoying in that it is computationally expensive. In Brazil 25
I have been using the laptop but the data is not very precise.
Kyle thinks he might be able to back out GML data from videos. This
might solve the challenge, depending on how many cameras you need and
how expensive they are. But so far I have not heard back from him. He
said it needs three different cameras all looking at the wall. I mean: talk
about computati


late them. To me it seems more difficult than it needs to be
(laughs).
ER

It is both overcomplicated and beautiful, trying to reverse engineer
movement from the image.
FS

I am getting more into video myself. I get more enjoyment from capturing the data than from the projections, like what most people associate
with my work.
ER

FS

Why is it so much more interesting to capture, rather than to project?

In part because it stays new, I’ve been doing those projections for a
while now and I know what happens at these events. For a while it was
very new, we just did it with friends, to project on the Brooklyn bridge
ER

25 Graffiti Analysis: Belo Horizonte, Brazil 2010 http://vimeo.com/16997642

for example. Now it has turned into these events where everyone knows
in advance, instead of just showing up at a certain time at a set corner.
It has lost a lot of its magic and power.
Michele and I have done so many of these projections and w


entation is interesting. I don’t know where all of this
is going right now, I am just trying to get the footage; I put these pieces
together showing all this movement but I don’t really know what the final
project is. It is more about collecting data so I am interested in having
video, audio and GML that can be synced up, and the sound from these
microphones is something to do something with later. This is research
for me. I like the idea of having all this data related to a 10 second gesture.
I am thinking that in the future we can do interesting things with it. I
am even thinking about how the audio could be used as a signal to tell
you what is drawing and what is not drawing. It is a really analog way of
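That idea of using audio to separate drawing from not-drawing could be prototyped with a simple energy threshold; a rough sketch, assuming a 16-bit mono WAV recording and a hand-picked threshold value:

import array, math, wave

def spray_segments(wav_path, window_s=0.5, threshold=1500):
    # Returns one boolean per window: True means the RMS energy suggests spraying.
    with wave.open(wav_path, "rb") as w:
        assert w.getsampwidth() == 2, "sketch assumes 16-bit audio"
        frames_per_window = int(w.getframerate() * window_s)
        active = []
        while True:
            chunk = w.readframes(frames_per_window)
            if not chunk:
                break
            samples = array.array("h", chunk)   # signed 16-bit samples
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            active.append(rms > threshold)      # threshold is a hand-picked assumption
    return active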


ressing that way.
ER

Are you thinking of other ways of capturing? You talk about capturing
movement, but do you also archive other elements? Do you take notes,
pictures? What happens to the conversations you are having?
FS

I have been missing out on that piece. It is a small amount of time
we have, and I am already trying to get so much. I am setting up a
camera that shoots straight video from a tripod, I am capturing from the
laptop and I am also screencasting the a


tation as a shared space that allows collaboration but also defines
the end of a collaboration.
FS

26 momo3010 http://momo1030.com
27 Simon Yuill. All problems of notation will be solved by the masses. Mute Magazine, 2008

Maybe using an XML-like structure was a bad idea? Maybe if I had
started with a less code-based set of rules? If the files were raw video,
it would encourage people to go outside more often? By picking XML
I am defining where the thing heads in a w


more common knowledge. If I would start to do
ER

28 Interview with François Chastanet http://www.youtube.com/watch?v=ayPcaGVKJHg
29 François Chastanet, Cholo writing: Latino gang graffiti in Los Angeles. Dokument, 2009

that now, I would quickly lose my small user-base. I love that idea though;
the way XML is programmed fits very much to the way you program for the
web. But what if it was playing more with language, starting from graffiti
which is very coded?
When


was very
against projection, because I felt that was totally against the idea of graffiti. I was presenting all of these print ideas and the output would be
pasted back into the city because I was against making an impermanent
representation of the data. In the end Zach said, you are just fighting this
because you have a motion project and you want to project motion and
then I said alright, I’ll do a test. And the tests were so exciting that I felt
OK with it.
ER

In what way does GML bridge the


graffiti writers would be into that too. How to develop
a style that is unique enough to stand out in an existing canon is already
hard enough. This could give someone an edge.
ER

I think the next challenge I’d like to run is about recreating the data
outside. I’ve been thinking about these helicopters with embedded wireless
ER

cameras, have you seen them? The obvious thing to me would be uploading
a .gml file to one of these helicopters that is dripping paint on a rooftop.
Scale is so important, so going bigger is always going to be better.
Gigantic rooftop tags could


an edge back to the project. The
GML-challenge is already a step in that direction; it is not about the
prettiest screensaver. To ask people to design something that is tying back
to what graffiti is, which is in a way a crime.
I think fixing the data capture is the right place to start, the next one could
be about making marks in the city. Like: the first person to recreate this
GML-tag on the roof of this building, that would be fun. The first person
that could put this ‘Hello World’ tag onto the Brooklyn bridge and get a
photo of it gets the prize. That would get us back to the question of how
we leave marks on the surface of the city.
When you capture data of an individual writer in a certain standard,
it ends up as typography?
FS

That’s another trend that happens when designers look at graffiti, and
I’ve fallen into this too sometimes, you want to be able to make fonts out of
it. People have don


ry much about how it is written and
the order of the letters. When TEMPT picked this style he made a smart decision that a lot of people miss: when you make a font, you miss all the motions and the connections.
ER

What if a programmer could put this data in a font, and generate
alternating connections?
SV

ER
That kind of stuff is interesting. It would help graffiti writers to design tags maybe?
To get my feet wet, I designed a tag once, and it was so not-fun to write!
I was thinking about a tag that would look different and that would fit

into corners, I was interested in designing something that wasn’t curved;
that would fit the angles of the city, hard edges. So I had forgotten all
my research about drafting and writing. I think I stopped writing in part
because the tag I picked


ng with his eyes, he ended up writing in the same way
as he would have written with his hands. When he saw the video with the
robot, it freaked him out because he was like: That’s how my hand moved
when I did that tag!
ER

The Graffiti Markup Field Recorder challenge

An easily reproducible DIY device that can unobtrusively record graffiti motion data during a graffiti writer’s normal practice in the city. 30
Project Description and Design Requirements:



The GML Field Recorder Challenge is a DIY hardware and software solution for unobtrusively recording graffiti motion data during a graffiti writer’s
normal practice in the city. The winning project will be an easy to follow
instruction set that can be reproduced by graffiti writers and amateur technologists. The goal is to create a device that will document a night o


ble to make
something for the first time. But I also did not want to make it so small
that the design would be impossible.

ER

30 GML-recorder challenge as published on: http://www.graffitimarkuplanguage.com/challenges



Computers and equipment outside of the €300 can be used for non-field activities (such as downloading and manipulating data captured in-field), but at the time of capture a graffiti writer should have no more than €300 worth of equipment on him or herself.

I was trying to think of how the challenge could be gamed ... I did not
want to get into a situation where we were g


e video-camera on me that is just documenting, I have another one on a tripod, and I am usually screen capturing
the software as it processes the video-footage because it tells another story.
I screw up because I forget to hit stop or record. If the data-capture just
works, I can go have fun getting good video-footage.
ER

What if it had to be operated by more than one person? It is nice
how the documentation now turns the act of writing into a performance-for-one.
FS

If you record alone, the data becomes more interesting and mysterious,
right? I mean, no one else has seen it. Something captured very privately,
then gets potentially shared publicly and turned into things that are very
different. I also thought: you don’t want to be dependent on someone else.
It is a lot to ask, especially if you are doing something illegal.
ER



Any setup and/or calibration should be limited to 10
seconds or less.

This came out of me dealing with the current system. It feels wrong
that it takes ten to fifteen minutes to get it running. Graffiti is not meant
to be that way. This speak


out and trying to capture with a system where it requires you to attach
a flashlight to a graffiti implement. I didn’t want anyone solving the
problem and then, Step one is: ‘Attach a police siren to a spraypaint can’



The resulting solution should be able to record at least
10 unique GML tags of approximately 10 seconds each in
length in one session without the need for connecting
to or using additional equipment.

I wasn’t thinking this was going to be a


t. I did not want the
graffiti writer to behave as if he was on vacation with a camera that could take
only three photos. I wanted to make sure they were not making decisions
on what they were writing based on how much memory they had.
ER



All data recorded using the field recorder should be
saved in a secure and non-incriminating fashion.

(laughs) If I had to do that one again, I would have put that in the Bonus category actually. That’s a difficult question to ask. What does secure
mean? It s


Who knows if that’s true, there were a lot of people around
him, but how do you really know?
ER

FS

GML could help balance the load?

You mean it would not be just the image of a tag but more like signing
at the bank?
ER

I mean that if you copy and distribute your data, the chance is small
that you can link it to an individual.
FS



The winning design will have some protection in the event
that the device falls into the wrong hands.

This again should probably have been a bonus item. Wouldn’t it be
awesome i


ace. If it works in those two situations, you
should theoretically be able to tie it to anything, even outside of graffiti. If
it was too much about spraypaint, it would be harder for someone to strap
it to a skateboard.
ER



System should be able to record writing on various surfaces and materials.

It is something you can easily forget about. When you are developing
something in the studio and it works well against a white wall, and then
when you go out in the ci


about graffiti that much. The street and the
studio are so different.
ER



Data should be captured at 30 points per second minimum.

I was assuming that lots of people were going to use cameras, and
I wanted to make sure they were taking enough data points. With other
capturing methods it is probably not such a problem. Even at 30 points per second you can start to see the facets if you zoom in, so anything less is not ideal.
ER
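A quick back-of-the-envelope sketch makes the figure concrete: the distance between consecutive sample points depends on hand speed, and the speeds assumed below are only illustrative:

# Spacing between consecutive samples for a few assumed hand speeds.
for rate in (15, 30, 60):                 # samples per second
    for speed in (0.5, 1.0, 2.0):         # assumed hand speed in metres per second
        spacing_cm = speed / rate * 100
        print(f"{rate:2d} pts/s at {speed:.1f} m/s -> {spacing_cm:.1f} cm between points")
# At 30 pts/s a fast 2 m/s stroke still leaves roughly 6.7 cm between points,
# which is why the facets become visible as soon as you zoom in.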



The recording system should not interfere with the writer’s


to have on record: I love his solution! There’s a lot in his
design that is ‘making us more aware’ of what’s happening in the creation
of a tag. One thing that he is doing that is not in the specs, is that he is
ER

logging strokes, like up and down. When you watch him using it, you
can see a little light going from red to green when the finger goes on
and off the spraypaint can. When you watch graffiti, it is too small of a
movement to even notice but when you are seeing that, it adds another
level of understanding of how they are writing.


All motion data should be saved using the current GML
standard. 31
FS



Obvious.

All aspects of the winning design should be able to be
reproduced by graffiti writers and amateur technologists.

It wouldn’t be exciting if only ten people can make this thing


nother direction, back to my interest in what graffiti is rather than
anything that people might find aesthetically pleasing. It is not about
‘graffiti influenced visuals’.
ER

31 http://graffitimarkuplanguage.com/spec



All software must be released Open Source.

All hardware must include clear DIY instructions/tutorials.

All media must be released under an Open Content licence that promotes collaboration (such as a Free Art License or Creative Commons S


ur eyes
and ears are supposed to tell you about who’s coming around the corner.
Is there traffic coming or a train? There are so many other things you
need to pay attention to rather than: Is this button on?
The whole project is about getting good data. As soon as you force people
to think too much about the capture process, I think it influences when
and how they are writing.
ER

Bonus, but not required:


Inclusion of date, time and location saved in the .gml
file.

Yes. Security-wise that is questionable, but the nerd in me would just
love it. You could get really interesting data about a whole night of writing.
You could see a bigger story than just that of a single tag. How long did it
take to gain entry? How long were they hiding in the bushes? These things
get back to graffiti as a performance art rather than a form of visual art.
ER

Paris, November 2011
Last time we had contact we discussed how to invite Muharrem to
Brussels. 32 But now on the day of the deadline, it seems there are new
developments?
FS

ER
I think in terms of the actual challenge, the main update is that sin


ses a modified lens on top of a plastic lens
that comes on top of a mouse, so that it can look at a surface that is a set
distance away. It has another sensor that looks at pitch, tilt and orientation,
but he is using that only to orient, the actual data gets recorded through the
mouse. It can get very high resolution, he is looking at up to a millimeter I
guess.
FS

Muharrem’s solution seems less precise?

I think he gets away with more because his solution is only for spraypaint
and once you are writing on that scale, even if you are off a few centimeters,
it might not ruin the data. If you look at the data he is getting, it actually
looks very good. I don’t think he has any numbers on the actual resolution
he is getting but if you were using his system with a pen, I think it would
be a different case. I like a lot of his solution too, it is an inter


from Phoenix (US) to Brussels (BE) and document his project in a worksession as
part of the Verbindingen/Jonctions 13 meetingdays. http://www.vj13.constantvzw.org
Joshua Noble http://www.thefactoryfactory.com/gmlchallenge/

cause I did not want to hand graffiti writers a mouse (laughter). I had
done all this research into graffiti and started to be embedded in the
community and I knew enough about the community that if you were
going to ask them to take part in someth


ideo I had not made and it did not have my name on it but personally I
ER

34 Torvalds, Linus; David Diamond (2001). Just For Fun: The Story of an Accidental Revolutionary. New York, New York, United States: HarperCollins.

still felt a part of it. I think when you are working in open systems, you
take pride when a project has wings. It is maybe even a selfish act. It is
the story of me receiving some art funding and realizing that I am not the
best toolmaker for the


Marini: Some personal projects of mine, for example specific effects and ‘looks’ that I have a
personal attachment to, I don’t release
https://github.com/kylemcdonald/SharingInterviews/blob/master/antonmarini.markdown

My focus has been on tags, this one portion of graffiti. I do think
there could be cool uses for more involved pieces. It would be great if
someone else would come in and do that, because it is a part of graffiti that
I haven’t studied that much.


f that, you have to do it a million times, for twenty years.
ER

In Seattle they call a piece that stays up for a longer time a ‘burner’. I
was connecting that to an archival practice of ephemera. It is a self-agreed
JH

upon archival process, and it means that the piece will not be touched, even
for years.

ER Graffiti has an interesting relationship to archiving. On the one hand,
many graffiti writers think: Now that tag’s done, but I’ve got another
million o


uires more reverence and it is even worse when it is painted
over.
But I think that GML is different, it is really more similar to a photo of
the tag. It is not trying to be the actual thing.
FS

Once a tag is saved in GML, what can be done with the data?

I am myself reluctant to take any of these tags that I’ve collected and
do anything with it at all without talking closely to whoever’s tag it is,
because it is such an intimate thing. In that sense it is strange to have
an open data repository and to be so reluctant to use it in a way that is
looking at anyone too specifically.
The sculpture I’ve been working on is an average from a workshop; sixteen different graffiti writers merged into one. I don’t want to take advantage


to referentiality. Like beat jacking for
DJs or biting rhymes for MCs, there must be a moment where you are not
just homaging, but stealing a style.
JH

I’ve seen cases where both parties have been happy, like when Yamaguchi
Takahiro used some GML data from KATSU and piped it into Google
Maps, so he was showing these big KATSU tags all over the earth which
was a nice web-based implementation. I think he was doing what a graffiti writer does naturally: Get out there and make the tag bigger but in
different ways. He is not taking KATSU-data from the database without
shining light back on him.
ER

GML seems very inspired by the practice of Free Software, but at the
same time it reiterates the conventional hierarchies of who is supposed to
FS

use what ... in which way ... from whom. For me the excitement with open
licences is that you can do things without asking permission. So, usage
can develop even if it is not already prescribed by the culture. How would
someone like me, pretty far removed from graffiti culture ever know what I
am entitled to do?

I have my reasons for which I would and would not use certain pieces
of data in certain contexts, but I like the fact that it is open for people
that might use it for other things, even if I would not push some of those
boundaries myself.
ER

Even when I am sometimes disappointed by the actual closedness of
F/LOSS, at least in theory through its licensing and refusal to limit who is
entitled and who’s not, it is a liberating force. It seems GML is only half
liberating?
FS

I agree. I think the lack of that is related to the data. The looseness of
its licence makes it less of an invitation in a sense. If the people that put
data up there would sit down and really talk about what this means, when
they would really walk through all the implications of what it means to
public d


hat would be great. I would love that. Then you
could use it without having to worry about all the morality issues and
people’s feelings. It would be more free.
I think it would be good to do a workshop with graffiti writers where
beyond capturing data, you reserve an hour after the workshop to talk to
everybody about what it would mean to add an open licence. I’ve done
workshops with graffiti writers and I talked to everyone: Look, I am
going to upload this tag up to this place where everyone c


o be worried about copyright on something that is
illegal, things you can not publicly claim ownership of.
JH

Would you agree that standards are a normalizing practice, that in a
way GML is part of a legalizing process?
FS

For that to happen, a larger community would have to get involved. It
would need to be Gesture Markup Language, and a community other than
graffiti writers would need to get involved.
ER

FS
ER

Would you be interested in legalizing graffiti?
No. T


ake point between these two cultures, but
GML is a specific thing within this larger world of F/LOSS and graffiti
JH

36 KRS-One Master Teacher. AN INTRODUCTION TO HIP HOP. http://www.krs-one.com/#!temple-of-hip-hop/c177q

in the larger world of hiphop. What other types of contact points might
there be? Do you see any similarities and differences?

For me, even beyond technology and beyond graffiti it all boils down to
this idea of the hack that is really a phenomeno


en in
Linux Journal in a fold-out spread of him posing with a Lamborghini or
something. Talk about braggadocio! You get into certain levels or certain
dynamics within the community where it’s really like pissing contests.
JH

I like that, I think there’s something there. At the instigation of the
Open Source Initiative, though: like Linus ‘pre-stock option’, sitting in his
bedroom not seeing the sun for a year and hacking and nerding out. To me
they are so differe


sort of showing off his stuff and
he has this machismo about him. Not necessarily directly misogynistic
but a macho kind of character and then take a nerd and have them do the
same.
JH

FS

Would they really be so different?

Obviously some rappers and some nerds, I mean that’s one of the
beauties – I mean it’s a global movement, you can’t help but have diversity
– but if we’re just speaking in generalizations?
JH

FS

There’s a lot of showing off in F/LOSS t


still on the to-do list. Yet a large part of the code is quite directly reusable. The code can parse different types of files. E-mails and chat-logs are often found in project archives. Here the Python scripts allow ordering them according to date information, and will automatically assign a style to the different content fields.
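A minimal sketch of the kind of reuse described above, assuming a folder of .eml files; the field list and CSS class names are illustrative assumptions, not the project's actual code:

import pathlib
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime

def archive_to_html(folder):
    # Parse every .eml file in the folder, order by Date, and style each field.
    messages = []
    for path in pathlib.Path(folder).glob("*.eml"):
        with open(path, "rb") as f:
            messages.append(BytesParser(policy=policy.default).parse(f))
    messages.sort(key=lambda m: parsedate_to_datetime(m["Date"]))
    rows = []
    for m in messages:
        for field in ("Date", "From", "Subject"):
            # Automatically assign a style (a CSS class) to each content field.
            rows.append(f'<div class="{field.lower()}">{m[field]}</div>')
    return "\n".join(rows)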

The code itself is a documentation source, as much on concrete aspects, such as e-mail parsing, as on a possible architecture, on certain coding motives, etc. And


tion of the common work by exercising the
rights to reproduce, distribute, and modify that
are granted by the license.
“Originals” (sources or resources of the work)
means all copies of either the initial work or any
subsequent work mentioning a date and used

by their author(s) as references for any subsequent updates, interpretations, copies or reproductions.
“Copy” means any reproduction of an original
as defined by this license.
OBJECT

The aim of this license is to define the conditions


ving rise to authors rights and
related rights shall not challenge the rights
granted by this license. For example, this is the
reason why performances must be subject to the
same license or a compatible license. Similarly,
integrating the work in a database, a compilation or an anthology shall not prevent anyone
from using the work under the same conditions
as those defined in this license.
INCORPORATION OF THE WORK

Incorporating this work into a larger work that
is not subject to the Free Art Lic


dat in Constant 2016


ive History of the Google Cultural Institute GERALDINE JUÁREZ EN+NL
FR EN Une histoire préventive du Google Cultural Institute GERALDINE JUÁREZ
Special:Disambiguation

• Location, location, location
◦ EN From Paper Mill to Google Data Center SHINJOUNG YEO
◦ EN House, City, World, Nation, Globe NATACHA ROUSSEL
◦ EN The Smart City - City of Knowledge DENNIS POHL
◦ FR La ville intelligente - Ville de la connaissance DENNIS POHL
◦ EN The Itinerant Archive
• Cross-readings


ry (2011)
7. Some people have said, "Why do I need the Semantic Web? I have Google!" Google is great for helping people find things, yes!
But finding things more easily is not the same thing as using the Semantic Web. It's about creating things from data you've
compiled yourself, or combining it with volumes (think databases, not so much individual documents) of data from other sources
to make new discoveries. It's about the ability to use and reuse vast volumes of data. Yes, Google can claim to index billions of
pages, but given the format of those diverse pages, there may not be a whole lot more the search engine tool can reliably do.
We're looking at applications that enable transformations, by being able to take large amounts of data and be able to run models
on the fly - whether these are financial models for oil futures, discovering the synergies between biology and chemistry researchers
in the Life Sciences, or getting the best price and service on a new pair of hiking boots.


all that. And then, we generate the Dublin Core metadata. The identifier, a title, everything concerning the contributors: publishers, illustrators, printers, etc. It is a description, it is an indexation by keywords, it is a date, it is a geographic location, if there is one. It is also about making links, either with internal resources or with external resources. So for example, if I think of a poster, if it was in an exhibition, if it


we have each time four digital files: a RAW file, a Tiff at 300 DPI, a JPEG at 300 DPI and a last JPEG at 72 DPI, which are in fact the three formats we use most. And then, the same again, you enter a title, a date, and you also have everything concerning the authorisations, the rights... For each document there are all these fields to fill in.
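For illustration, a minimal sketch of what generating such a Dublin Core record could look like in code; the field values and the plain XML serialisation are assumptions, not the Mundaneum's actual workflow or software:

import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
fields = {
    "identifier": "MUND-AFF-0001",                 # invented identifier
    "title": "Affiche (example)",
    "contributor": "printer: example atelier",
    "description": "Keyword-indexed description of the poster.",
    "date": "1913",                                # ISO-style date, as the date field requires
    "coverage": "Bruxelles",                       # geographic location, if there is one
    "relation": "link to an internal or external resource",
}
for name, value in fields.items():
    ET.SubElement(record, f"{{{DC}}}{name}").text = value
print(ET.tostring(record, encoding="unicode"))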
SM: Faced with one of Otlet's schemas, we sometimes wondered what all those scribbles are. We do not understand everything straight away



RC: In fact, since the decimal classification is not a standardised indexing method, it is not required in these fields. For each field to be filled in in Dublin Core, you have standards to use. For example, for dates, countries and languages you have the ISO standards, and the UDC is not recognised as a standard.
When I describe something in Pallas, I do add the UDC number, because the iconographic collections are classified by theme. The postcards


our archives. You have to distinguish between the elements and the digitisation policy; I am not trying to say: 'Look, we are doing big-data management here.' We do not manage large quantities of data. Big Data does not really concern us, in terms of the data kept here. The debate interests us in the same way that it existed in another form at the end of the 19th century, with the advent of the periodical press and the multiplication


things in order to be able to change them. He understood that from the very start; that is why the writing of the index cards is standardised, you cannot write them just any way you like. That is why he develops the UDC: you need a language that is

bibliography. And today Google has it all together, with the full text added, searchable on every single word. That is nothing more than taking the dream of both Vander Haeghen and Otlet further. From that idea we are naturally


:
SVP: But ... you cannot go knocking on Google's door, Google chooses you. We did draw their attention to the Mundaneum, with the link between Vander Haeghen and Otlet. When Google Belgium organises something they always try to involve us, simply because we are a university. You have seen the Mundaneum; it is a very beautiful archive, but that is also all it is. For us it would only be one piece of a collection. They are also supported by Google in a completely different way than we are.

Our intention with digitisation is not that one, and we do not see our own work solely through that lens. At


dexalist:
"Bij elke verwijzing stond weer een
andere verwijzing, de één nog
interessanter dan de ander. Elk
vormde de top van een piramide van
weer verdere literatuurstudie, zwanger
met de dreiging om af te dwalen. Elk
was een strakgespannen koord dat
indien niet in acht genomen de auteur
in de val van een fout zou lokken, een
vondst al uitgevonden en
opgeschreven."
From The Indexalist:
“At every reference stood another
reference, each more interesting than
the last. Each the apex of a pyramid
o


institutional libraries have to pay.
From Voor elk boek is een gebruiker:
FS: How do you deal with books and publications that are digital from the very beginning? DM: We buy e-books and e-journals and make them available to researchers. But those are completely different environments, because that content never physically enters our walls. We buy access to the servers of publishers or of the aggregator. That content never comes to us, it stays on their machines. So we cannot really do much with it, apart from referring to it and making sure it is just as findable as the print.

MODULE 1: WORKFLOWS
• from book to e-book
◦ digitizing a book on a book scanner
◦ removing DRM and converting e-book formats
• from clutter to catalogue
◦ managing an e-book library with C


aries have to pay for subscriptions.

The amateur librarian programme develops several aspects and implications of such a definition. Some parts of the programme were built out of various workshops and presentations th


guage nothing of words
(language is nothing but a bag of words)
MICHAEL MURTAUGH

In text indexing and other machine reading applications the term "bag of
words" is frequently used to underscore how processing algorithms often
represent text using a data structure (word histograms or weighted vectors)
where the original order of the words in sentence form is stripped away. While
"bag of words" might well serve as a cautionary reminder to programmers of
the essential violence perpetrated to a text an


away.
The resulting representation is then a collection of each unique word used in the text,
typically weighted by the number of times the word occurs.
Bag of words, also known as word histograms or weighted term vectors, are a standard part
of the data engineer's toolkit. But why such a drastic transformation? The utility of "bag of
words" is in how it makes text amenable to code, first in that it's very straightforward to
implement the translation from a text document to a bag of words representa
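As a minimal illustration of how straightforward that translation is, a few lines of Python that reduce a text to a word histogram:

import re
from collections import Counter

def bag_of_words(text):
    words = re.findall(r"[a-z']+", text.lower())   # crude tokenisation
    return Counter(words)                          # word -> number of occurrences

print(bag_of_words("language is nothing but a bag of words, nothing of words"))
# Counter({'nothing': 2, 'of': 2, 'words': 2, 'language': 1, 'is': 1, 'but': 1, 'a': 1, 'bag': 1})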


nication for reasons of safety,
commercial telegraphy extended this network of communication to include those parties
coordinating the "raw materials" being mined, grown, or otherwise extracted from overseas
sources and shipped back for sale.

"RAW DATA NOW!"
Tim Berners-Lee: [...] Make a beautiful website, but
first give us the unadulterated data, we want the data.
We want unadulterated data. OK, we have to ask for
raw data now. And I'm going to ask you to practice
that, OK? Can you say "raw"?
Audience: Raw.
Tim Berners-Lee: Can you say "data"?
Audience: Data.
TBL: Can you say "now"?
Audience: Now!
TBL: Alright, "raw data now"!
[...]

From La ville intelligente - Ville de la
connaissance:
Étant donné que les nouvelles formes
modernistes et l'utilisation de
matériaux propageaient l'abondance
d'éléments décoratifs, Paul Otlet
croyait en la possibilité du langage


de tous les
éléments inefficaces et subjectifs.
From The Smart City - City of
Knowledge:
As new modernist forms and use of
materials propagated the abundance
of decorative elements, Otlet believed
in the possibility of language as a
model of 'raw data', reducing it to
essential information and
unambiguous facts, while removing all
inefficient assets of ambiguity or
subjectivity.

So, we're at the stage now where we have to do this -the people who think it's a great idea. And all the
people -- and


immediate return on the investment because it will only really pay off when everybody
else has done it -- they'll do it because they're the sort of person who just does things
which would be good if everybody else did them. OK, so it's called linked data. I want you to make it. I want you to demand it. [6]
UN/STRUCTURED

As graduate students at Stanford, Sergey Brin and Lawrence (Larry) Page had an early
interest in producing "structured data" from the "unstructured" web. [7]
The World Wide Web provides a vast source of information of almost all types,
ranging from DNA databases to resumes to lists of favorite restaurants. However, this
information is often scattered among many web servers and hosts, using many different
formats. If these chunks of information could be extracted from the World Wide Web
and integrated into a structured form, they would form an unprecedented source of
information. It would include the largest international directory of people, the largest
and most diverse databases of products, the greatest bibliography of academic works,
and many other useful resources. [...]

2.1 The Problem
Here we define our problem more formally:
Let D be a large database of unstructured information such as the World Wide Web [8]
[...]

In a paper titled Dynamic Data Mining Brin and Page situate their research looking for rules
(statistical correlations) between words used in web pages. The "baskets" they mention stem
from the origins of "market basket" techniques developed to find correlations between the
items


o tackle the scale of the web and still perform
using contemporary computing power completing its task in a reasonably short amount of
time.
A traditional algorithm could not compute the large itemsets in the lifetime of the
universe. [...] Yet many data sets are difficult to mine because they have many
frequently occurring items, complex relationships between the items, and a large
number of items per basket. In this paper we experiment with word usage in documents
on the World Wide Web (see Section 4.2 for details about this data set). This data set
is fundamentally different from a supermarket data set. Each document has roughly
150 distinct words on average, as compared to roughly 10 items for cash register
transactions. We restrict ourselves to a subset of about 24 million documents from the
web. This set of documents contains over 14 millio
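To make the "basket" framing concrete, a toy sketch that treats each document as a basket of unique words and counts how often pairs of words occur together; this is only a stand-in for the far more scalable sampling approach the paper actually describes:

from collections import Counter
from itertools import combinations

documents = [
    "data mining on the web",
    "mining coal in the region",
    "data centres on the web",
]
pair_counts = Counter()
for doc in documents:
    basket = sorted(set(doc.split()))            # unique items per basket
    pair_counts.update(combinations(basket, 2))  # every unordered pair of items
print(pair_counts.most_common(3))                # most frequently co-occurring pairs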


hat's quite symptomatic. It goes
something like this: you (the programmer) have managed to cobble out a lovely "content
management system" (either from scratch, or using any number of helpful frameworks)
where your user can enter some "items" into a database, for instance to store bookmarks.
After this ordered items are automatically presented in list form (say on a web page). The
author: It's great, except... could this bookmark come before that one? The problem stems
from the fact that the database ordering (a core functionality provided by any database)
somehow applies a sorting logic that's almost but not quite right. A typical example is the
sorting of names where details (where to place a name that starts with a Norwegian "Ø" for
instance), are language-specific, and when a mixture of languages occurs, no single ordering
is necessarily "correct". The (often) exasperated programmer might hastily add an
additional database field so that each item can also have an "order" (perhaps in the form of a
date or some other kind of (alpha)numerical "sorting" value) to be used to correctly order
the resulting list. Now the author has a means, awkward and indirect but workable, to control

the order of the presented data on the start page. But one might well ask, why not just edit
the resulting listing as a document? Not possible! Contemporary content management
systems are based on a data flow from a "pure" source of a database, through controlling
code and templates to produce a document as a result. The document isn't the data, it's the
end result of an irreversible process. This problem, in this and many variants, is widespread and reveals an essential backwardness in a particular "computer scientist" mindset about what constitutes "data", and in particular about its relationship to order, which turns what might be a straightforward question of editing a document into an over-engineered database.
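The two workarounds mentioned here, locale-aware collation and a hand-maintained order field, can be sketched in a few lines of Python; the names, the locale and the fallback are illustrative assumptions:

import locale

names = ["Østergaard", "Olsen", "Zimmer", "Ångström"]

# 1. Locale-aware collation: correct only for one language at a time.
try:
    locale.setlocale(locale.LC_COLLATE, "nb_NO.UTF-8")   # Norwegian rules, if installed
    print(sorted(names, key=locale.strxfrm))
except locale.Error:
    print(sorted(names))   # plain codepoint order puts Ø and Å after Z

# 2. The "extra order field" workaround: the author hand-assigns a sort key.
bookmarks = [
    {"title": "Zimmer", "order": 2},
    {"title": "Østergaard", "order": 1},
]
print(sorted(bookmarks, key=lambda row: row["order"]))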
Recently working with Nikolaos Vogiatzis, whose research explores playful and radically subjective alternatives to the list, Vogiatzis was struck by how the earliest specifications of HTML (still valid today) have separate elements (OL and


on, still followed by modern web
browsers, the only difference between the two visually is that UL items are preceded by a
bullet symbol, while OL items are numbered.
The idea of ordering runs deep in programming practice where essentially different data
structures are employed depending on whether order is to be maintained. The indexes of a
"hash" table, for instance (also known as an associative array), are ordered in an
unpredictable way governed by a representation's particular implementation. This data
structure, extremely prevalent in contemporary programming practice sacrifices order to offer
other kinds of efficiency (fast text-based retrieval for instance).
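The contrast between order-preserving and hash-based structures is easy to demonstrate; a small Python sketch, with the sample values chosen only for illustration:

# A list preserves the order items were added in; a set (a hash-based structure)
# only promises membership, not any stable or meaningful ordering.
items = ["Ångström", "Olsen", "Østergaard", "Zimmer"]

ordered = list(items)       # iteration order is the insertion order
unordered = set(items)      # iteration order depends on hashing details

print(ordered)
print(sorted(unordered))    # to get an order back, it has to be imposed explicitly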
DATA MINING

In announcing Google's impending data center in Mons, Belgian prime minister Di Rupo
invoked the link between the history of the mining industry in the region and the present and
future interest in "data mining" as practiced by IT companies such as Google.
Whether speaking of bales of c


gorithm, and in the process
(voluntarily) blind themselves to the work practices which have produced and maintain these
"resources".
Berners-Lee, in chastising his audience of web publishers to not only publish online, but to
release "unadulterated" data, belies a lack of imagination in considering how language is itself
structured and a blindness to the need for more than additional technical standards to connect
to existing publishing practices.
Last Revision: 2·08·2016

1. Benjamin Franklin Lieb


hese days and I guess I'm no exception." from Brin's Stanford webpage
8. Extracting Patterns and Relations from the World Wide Web, Sergey Brin, Proceedings of the WebDB Workshop at EDBT
1998, http://www-db.stanford.edu/~sergey/extract.ps
9. Dynamic Data Mining: Exploring Large Rule Spaces by Sampling; Sergey Brin and Lawrence Page, 1998; p. 2 http://
ilpubs.stanford.edu:8090/424/
10. Hypertext Markup Language (HTML): "Internet Draft", Tim Berners-Lee and Daniel Connolly, June 1993, http://
www.w3.o


tions.
(2) The unique identifier at hand for these text portions is not the bibliographic
information, but the URL.
(3) The text is as long as web-crawlers of a given search engine are set to reach,
refashioning the library into a storage of indexed data.

These are some of the lines along which online texts appear to produce difference. The first
contrasts the distinct printed publication to the machine-readable text, the second the
bibliographic information to the URL, and the third the library to


s created an
environment in which all machine-readable online
documents in reach are effectively treated as one single
document. For any text-sequence to be locatable, it
doesn't matter in which file format it appears, nor whether
its interface is a database-powered website or mere
directory listing. As long as text can be extracted from a
document, it is a container of text sequences which itself
is a sequence in a 'book' of the web.
Even though this is hardly news after almost two decades
of Googl


nal papers, newspaper articles, etc., that are
designed to be read from beginning to end.

From Voor elk boek is een gebruiker:
FS: But it is also about the way you offer access, the library as interface? Online you now leave that to Google.
SVP: Access is no longer about 'this institution has this, that institution has something else'; all those institutions can be reached through the same interface. You can search across all those collections, and that again is a little piece of the original dream of Otlet and Vander Haeghen, the idea of a world library. For every book there is a user, the library just has to go and find that user.
What I find intriguing is that all books have become one single book because they are searchable at the same level, that is incredibly exciting. That is a different way of reading that even Otlet could not have imagined. They would go mad if they knew this.

Still, the scope of textual forms appearing in search
results, and thus a corpus of texts in which they are being
brought int


e cards are further arranged between
coloured guide cards. As an alternative to tabbed cards, signal flags may be used. Here,
metal clips may be attached to the top end of the card and that stand out like guides. For use
of the system in relation to dates of the month, the card is printed with the numbers 1 to 31
at the top. The metal clip is placed as a signal to indicate the card is to receive attention on
the specified day. Within a large organisation a further card can be drawn up to assign
responsibility for processing that date’s cards. There were numerous means of working the
cards, special techniques for integrating them into any type of research or organisation, means
by which indexes operating on indexes could open mines of information and expand the
knowledge and ca


pite shifts in handwriting styles, whereby letters sometimes
appear extremely rushed and distorted in multiple idiosyncratic ways, the
experts consulted unanimously declared that the manuscript was most likely
authored by one and the same person. To date, the author remains unknown.
Q

I've been running with a word in my mouth, running with this burning untitled shape, and I
just can't spit it out. Spit it with phlegm from a balcony, kiss it in a mirror, brush it away one
morning. I've been running


lbs
replaced with a micro fish-eye lens implant?” Knitted eyebrows: “Someone whose neural
pathways zigzagged phrenologist categories?” Microexpressionist: “How many semioticiandentists and woodworm-writers have visited the Chaos Institute to date?” A ragged mane:
“The same number as the number of neurological tools for brain mapping that the Institute
owns?” {one lengthy word crossed out, probably a name}: “Would your brain topography get
upset and wrinkle if you imagined all the bur


infrastructures, who are
fundamental for these systems to work but often forgotten or displaced. Next, an account of
the elements of distribution and control that appear both in the idea of a Reseau Mundaneum
, and in the contemporary functioning of data centres, and the resulting interaction with other
types of infrastructures. Finally, there is a brief analysis of the two approaches to the
'organization of world's knowledge', which examines their regimes of truth and the issues that



ype of badge and are isolated in a section of the
Mountain View complex secluded from the rest of the
workers through strict access permissions and fixed time
schedules. Their work consists of scanning the pages of
printed books for the Google Books database, a task that
is still more convenient to do by hand (especially in the
case of rare or fragile books). The workers are mostly
women and ethnic minorities, and there is no mention of
them on the Google Books website or elsewhere; in fact
the whol


cesses and labor of writing, editing, design, layout, typesetting, and eventually publishing, collecting and cataloging. [9]

In 2013, while Prime Minister Di Rupo was celebrating the beginning of the second phase
of constructing the Saint Ghislain data centre, a few hundred kilometres away a very similar
situation started to unroll. In the municipality of Eemsmond, in the Dutch province of
Groningen, the local Groningen Sea Ports and NOM development were rumoured to have
plans with another code named company, Saturn, to build a data centre in the small port of
Eemshaven.
A few months later, when it was revealed that Google
was behind Saturn, Harm Post, director of Groningen
Sea Ports, commented: "Ten years ago Eemshaven
became the laughing stock of ports and industrial
development in the Netherlands, a planning failure of the
previous century. And now Google is building a very
large data centre here, which is 'pure advertisement' for
Eemshaven and the data port."[10] Further details on tax
cuts were not disclosed and once finished, the data centre will provide at most 150 jobs in
the region.
Yet another territory fortunately chosen by Google, just like Mons, but what are the selection
criteria? For one thing, data centres need to interact with existing infrastructures and flows of
various type. Technically speaking, there are three prerequisites: being near a substantial
source of electrical power (the finished installation will consume twice as much as the w


artly due to the rapid growth of the importance of Software as a service, so-called cloud
computing, which is the rental of computational power from a central provider. With the rise
of the SaaS paradigm the geographical and topological placement of data centres becomes of
strategic importance to achieve lower latencies and more stable service. For this reason,

Google has in the last 10 years been pursuing a policy of end-to-end connection between its
facilities and user interfaces. This includes buying leftover fibre networks[11], entering the
business of underwater sea cables[12] and building new data centres, including the ones in
Mons and Eemshaven.
The spread of data centres around the world, along the main network cables across
continents, represents a new phase in the diagram of the Internet. This should not be
confused with the idea of decentralization that was a cornerstone value in the early stages of
inter


kind of operations in specific buildings, that is fostering their distribution.
The tension between centralization and distribution and the dependence on neighbouring
infrastructures as the electrical grid is not an exclusive feature of contemporary data storage
and networking models. Again, similarities emerge from the history of the Mundaneum,
illustrating how these issues relate closely to the logistic organization of production first
implemented during the industrial revolution, and theorized wi


of a planned living environment implied
that methods similar to those employed for managing the
flows of coal and electricity could be used for the
organization of culture and knowledge.

From From Paper Mill to Google Data Center:
In a sense, data centers are similar to the capitalist factory system; but

The Traité de Documentation, published in 1934, includes an extended reflection on a
Universal Network of Documentation, that would coordinate the transfer of knowledge
between diff


m idea, facilitated by the railway system[20]. No wonder that
future Mundaneums were foreseen to be built next to a train station.
In Otlet's plans for a Reseau Mundaneum we can already detect some of the key
transformations that reappear in today's data centre scenario. First of all, a drive for
centralization, with the accumulation of materials that led to the monumental plans of World
Cities. In parallel, the push for international exchange, resulting in a vision of a distribution
network. Thirdl


ture.
While the plan for Antwerp was in the end rejected in favour of more traditional housing
development, 80 years later the legacy of the relation between existing infrastructural flows
and logistics of documentation storage is highlighted by the data ports plan in Eemshaven.
Since private companies are the privileged actors in these types of projects, the circulation of
information increasingly responds to the same tenets that regulate the trade of coal or
electricity. The very different welcome that traditional politics reserve for Google data centres
is a symptom of a new dimension of power in which information infrastructure plays a vital
role. The celebrations and tax cuts that politicians lavish on these projects cannot be
explained with 150 jobs or economic incentives for a depressed


k behind the same table, here it is. Now it is your turn to sow the
good seed of documentation, of institution, and of Mundaneum, through the pre-book and the
spoken word[1]
NL

When I arrived in Brussels that autumn I was still very young. I thought I would be helping in the household as an au pair, but instead I had to help the professor finish his book. When I arrived the writing was already done, but the printer was still struggling with the manuscript because the handwriting was hard to decipher. It became my task to correct the proofs. There were many words that the printer and I could not decipher, so we had to ask. But often the professor had no time for us. I



transfer of the archives from the Friends of the Palais Mondial to the Centre de Lecture
Public of the French community.
In the inventory, the cats are nowhere to be found.[3]
NL

She pours us coffee from a ceramic coffee pot and serves pastries she bought at the nearby bakery. She repeatedly reminds us that 'it is all written down in the documents'. She tells us that one day in the sixties her husband came home and excitedly told her that he had discovered the Mundaneum on the Leuvense Steenweg in Brussels. From then on he returned there regularly to meet the friends of the Palais Mondial: the devoted caretakers of that immense paper legacy. I have not been there very often myself, she says. But I remember that there were cats to keep the mice away from all that paper. And my husband loved cats. In the eighties, when he finally had a position that allowed him to save the archives, the cats had to be taken care of as well. He wanted to include the cats in the inventory.
We finish our coffee and she takes us behind a curtain that separates the salon from a small office. She shows us four green folders containing the neatly ordered papers of her former husband concerning the Mundaneum. In the third folder is the deed

describing the transfer of the archives from the Fr


transforming
Le Traité de Documentation into a printed book.
2. NL
Wilhelmina Coops kwam in 1932 uit Nederland naar Brussel om Frans te leren. Ze hielp het manuscript voor Le Traité de
Documentation omzetten naar een gedrukt boek.
3. EN
The act is dated April 4 1985. Madame Canonne is a librarian, widow of André Canonne († 1990). She is custodian of
the documents relating to the wanderings of The Mundaneum in Brussels.
4. NL
De akte is gedateerd op 4 april 1985. Madame Canonne is bibliothecares


ife of Paul Otlet, collaborated with her husband on many projects. Her family fortune kept
the Mundaneum running after other sources had dried up.
6. NL
Cato van Nederhasselt, de tweede vrouw van Paul Otlet, werkte met haar man aan vele projecten. Nadat alle andere
bronnen waren uitgeput hield haar familiefortuin het Mundaneum draaiende.

A Preemptive
History
of the
Google
Cultural
Institute
GERALDINE JUÁREZ

I. ORGANIZING INFORMATION IS NEVER INNOCENT

Six years ago, Google, an Alphabet company,


ne.”[2]
The Google Cultural Institute is strictly divided into Art Project, Historical Moments and
World Wonders, roughly corresponding to fine art, world history and material culture.
Technically, the Google Cultural Institute can be described as a database that powers a
repository of high-resolution images of fine art, objects, documents and ephemera, as well as
information about and from their ‘partners’ - the public museums, galleries and cultural
institutions that provide this cultural mate


nduced starvation of publicly funded cultural institutions even throughout the
wealthy countries”[3]. It is important to understand that what Google is really doing is
bankrolling the technical infrastructure and labour needed to turn culture into data. In this way it can be easily managed and can feed all kinds of products needed in the neoliberal city to promote and exploit these cultural ‘assets’, in order to compete with other urban centres on the global stage, but also to feed Google’s unst


ogle Cultural Institute is a complex subject of interest since it reflects the colonial
impulses embedded in the scientific and economic desires that formed the very collections
which the Google Cultural Institute now mediates and accumulates in its database.

Who colonizes the colonizers? It is a very difficult issue which I have raised before in an
essay dedicated to the Google Cultural Institute, Alfred Russel Wallace and the colonial
impulse behind archive fevers from the 19th but also the 21st


a
continuation of the idea of Enlightenment that gave birth to the impulse to collect, organise
and manage information in the 19th century. My use of this term aims to emphasize and
situate contemporary accumulation and management of information and data within a
technoscientific landscape driven by “profit above else” as a “logical extension of the surplus
value accumulated through colonialism and slavery.”[7]
Unlike in colonial times, in contemporary technocolonialism the important narrative is not the
supremacy of a specific human culture. Technological culture is the saviour. It doesn’t matter
if the culture is Muslim, French or Mayan, the goal is to have the best technologies to turn it
into data, rank it, produce content from it and create experiences that can be monetized.
It only makes sense that Google, a company with a mission to organise the world’s
information for profit, found ideal partners in the very institutions that were pr


re that this is not the only way of
seeing things. That the museum – the installation, the arrangement, the collection – has a
history, and that it also has an ideological baggage”[8]. But the Google Cultural Institute is
not a museum, it is a database with an interface that makes it possible to browse cultural content.
Unlike the prestigious museums it collaborates with, it lacks a history situated in a specific
cultural discourse. It is about fine art, world wonders and historical moments in a general


istory of information science beyond Silicon Valley. After all, they
understand that “ownership over the historical narratives and their material correlates
becomes a tool for demonstrating and realizing economic claims”.[9]
After establishing a data centre in the Belgian city of Mons, home of the Mundaneum
archive center, Google lent its support to "the Mons 2015 adventure, in particular by
working with our longtime partners, the Mundaneum archive. More than a century ago, two
visionary Belgian


hives in the
Historical Moments section of the Google Cultural Institute.

Later in August, Eric Schmidt declares that education should bring art and science together
just like in “the glory days of the Victorian Era”.[21]
2012
EU DATA AUTHORITIES INITIATE A NEW INVESTIGATION INTO GOOGLE AND
THEIR NEW TERMS OF USE.

At the request of the French authorities, the European Union initiates an investigation against
Google, related to the breach of data privacy due to the new terms of use published by
Google on 1 March 2012.[22]
THE GOOGLE CULTURAL INSTITUTE CONTINUES TO DIGITIZE CULTURAL
‘ASSETS’.

According to the Google Cultural Institute website, 151 partners join the Google Art
Project i


ations in the UK
via Ireland as "devious, calculated and, in my view, unethical".[24]
2014
EUROPEAN COURT OF JUSTICE RULES ON THE “RIGHT TO BE FORGOTTEN”
AGAINST GOOGLE.

The controversial ruling holds search engines responsible for the personal data they handle;
under European law the court ruled “that the operator is, in certain circumstances,
obliged to remove links to web pages that are published by third parties and contain
information relating to a person from the list of results


HE CITY OF MONS, EUROPEAN
CAPITAL OF CULTURE IN 2015.

A press release from Google[30] describes the new partnership with the Belgian city of Mons
as a result of its position as a local employer and investor in the city, since one of its two
major data centres in Europe is located there.
2015
EU COMMISSION SENDS STATEMENT OF OBJECTIONS TO GOOGLE.

The European Commission has sent a Statement of Objections to Google alleging the
company has abused its dominant position in the markets for general in


. Google Paris. Accessed Dec 22, 2016 http://www.google.se/about/careers/locations/paris/

3. Schiller, Dan & Yeo, Shinjoung. “Powered By Google: Widening Access And Tightening Corporate Control.” (In Aceti, D.
L. (Ed.). Red Art: New Utopias in Data Capitalism: Leonardo Electronic Almanac, Vol. 20, No. 1. London: Goldsmiths
University Press. 2014):48
4. Dowd, Maureen. “The Google Art Heist”. The New York Times. Sept 12, 2015 http://www.nytimes.com/2015/09/13/
opinion/sunday/the-google-art-h


n des
revendications économiques ».[9]
Après avoir établi un centre de données dans la ville belge de Mons, ville du Mundaneum,
Google a offert son soutien à « l'aventure Mons 2015, en particulier en travaillant avec nos
partenaires de longue date, les archives du Mundaneum. Plus d'un siècle auparavant, deux
visionnaires belges ont imaginé l'architecture du World Wide Web d'hyperliens et

d'indexation de l'information, non pas sur des ordinateurs, mais sur des cartes de papier.
Leur créat


le 22 décembre 2016 http://www.google.se/about/careers/locations/paris/

3. Schiller, Dan & Yeo, Shinjoung. « Powered By Google: Widening Access And Tightening Corporate Control. » (In Aceti, D.
L. (Éd.). Red Art: New Utopias in Data Capitalism: Leonardo Electronic Almanac, Vol. 20, No. 1. Londres : Goldsmiths
University Press. 2014): 48
4. Dowd, Maureen. « The Google Art Heist ». The New York Times. 12 septembre 2015 http://
www.nytimes.com/2015/09/13/opinion/sunday/the-googl


Vint Cerf, so-called 'internet evangelist', or 'father of the internet',
working at LA MÉGA-ENTREPRISE
◦ Jiddu Krishnamurti, priest at the 'Order of the Star', a theosophist
splinter group that Paul Otlet related to
◦ Sir Tim Berners-Lee, 'open data evangelist', heading the World Wide
Web Consortium (W3C)

4. L'UTOPISTE may refer to:
◦ Paul Otlet, documentalist, universalist, internationalist, indexalist. At
times considered as the 'father of information science', or 'visionary inventor of
t


ay refer to:
◦ Wallonia (Belgium), or La Wallonie. Former mining area, homebase of former prime minister Elio di Rupo, location of two Google
datacenters and the Mundaneum Archive Center
◦ Groningen (The Netherlands), future location of a Google data
center in Eemshaven
◦ Hamina (Finland), location of a Google data center

9. LE BIOGRAPHE is used for persons that are instrumental in constructing the
narrative of Paul Otlet. It may refer to:
◦ André Canonne, librarian and director of the Centre de Lecture
publique de la Communauté française


services, health,
education, self-driving cars, internet of things, life sciences, and the like. Google’s lucrative
internet business does not only generate profits. As Google’s chief economist Hal Varian
states:
…it also generates torrents of data about users’ tastes and habits, data that Google
then sifts and processes in order to predict future consumer behavior, find ways to
improve its products, and sell more ads. This is the heart and soul of Googlenomics.
It’s a system of constant self-analysis: a data-fueled feedback loop that defines not only
Google’s future but the future of anyone who does business online.[8]

Google’s business model is emblematic of the “new economy” which is primarily built around
data and information


capitalist social relations and transcended the material
world? Google and other Internet companies have been investing heavily in industrial-scale
real estate around the world and continue to build large-scale physical infrastructure in the
way of data centers where the world’s bits and bytes are stored, processed and delivered.
Terms like “tube”, “cloud” or “weightless” often create the illusion that our newly marketed
social and cultural activities over the Internet transcend the ph


n abstract place but rather is manifested
in the concrete material world, one deeply embedded in capitalist development which
reproduces structural inequality on a global scale. Specifically, the analysis will focus on
Google’s growing large-scale data center infrastructure that is restructuring and reconfiguring
previously declining industrial cities and towns as new production places within the US and
around the world.
Today, data centers are found in nearly every sector of the economy: financial services,
media, high-tech, education, retail, medical, government etc. The study of the development of
data centers in each of these sectors could be separate projects in and of themselves;
however, for this project, I will only look at Google as a window into the “new” economy, the

company which has led the way in the internet sector in building out and linking up data
centers as it expands its territory of profit.[10]
DATA CENTRES IN CONTEXT

The concepts of “spatial fix” by critical geographer David Harvey[11] and “digital capitalism”
by historian of communication and information Dan Schiller[12] are useful to contextualize and
place the emergence of large-scale data centers within capitalist development. Harvey
illustrates the notion of the spatial fix to explicate and situate the geographical dynamics and crisis
tendencies of capitalism, with its over-accumulation and under-consumption. Harvey’s spatial fix
has dual me


ed and extended beyond information industries and reorganized the entire economy
from manufacturing production to finance to science to education to arts and health and
impacts every iota of people’s social lives.[14] Current growth of large-scale data centers by
Internet companies and their reoccupation of industrial towns needs to be situated within the
context of the development of digital capitalism.
FROM MANUFACTURING FACTORY TO DATA FACTORY

Large-scale data centers – sometimes called “server farms” in an oddly quaint allusion to the
pre-industrial agrarian society – are centralized facilities that primarily contain large numbers
of servers and computer equipment used for data processing, data storage, and high-speed
telecommunications. In a sense, data centers are similar to the capitalist factory system; but
instead of a linear process of input of raw materials to
output of material goods for mass consumption, they input
mass data in order to facilitate and expand the endless
cycle of commodification – an Ouroboros-like machine.
As the factory system enables the production of more

From X = Y:
In these proposals, Otlet's archival

goods at a lower cost through automation and control of labor to maximize profit, data centers
have been developed to process large quantities of bits and bytes as fast as possible and at as
low a cost as possible through automation and centralization. The data center is a hyperautomated digital factory system that enables the operation of hundreds of thousands of
servers through centralization in order to conduct business around the clock and around the
globe. Compared to traditional industrial factories that produce material goods and generally
employ entire towns if not cities, large-scale data centers each generally employ fewer than
100 full-time employees – most of these employees are either engineers or security guards.
In a way, data centers are the ultimate automated factory. Moreover, the owner of a
traditional factory needs to acquire/purchase/extract raw materials to produce commodities;
however, much of the raw data for a data center are freely drawn from the labor and
everyday activities of Internet users without a direct cost to the data center. The factory
system is to industrial capitalism what data centers are becoming to digital capitalism.
THE GROWTH OF GOOGLE’S DATA FACTORIES

Today, there is a growing arms race among leading Internet companies – Google, Microsoft,
Amazon, Facebook, IBM – in building out large-scale data centers around the globe.[16]
Among these companies, Google has so far been leading in terms of scale and capital
investment. In 2014, the company spent $11 billion for real estate purchases, production
equipment, and data center construction,[17] compared to Amazon which spent $4.9 billion
and Facebook with $1.8 billion in the same year.[18]
Until 2002, Google rented only one colocation facility in Santa Clara, California to house
about 300 servers.[19] However, by 2003 the company had started to purchase entire
colocation buildings that were cheaply available due to overexpansion during the dot-com
era. Google soon began to design and build its own data centers containing thousands of
custom-built servers as it expanded its services and global market and responded to
competitive pressures. Initially, Google was highly secretive about its data center locations
and related technologies; a former Google employee called this Google’s “Manhattan
project.” However, in 2012, Google began to open up its data centers. While this seems
like Google had a change of heart and wanted to be more transparent about its data
centers, it is in reality more a self-serving public relations onslaught to show how its cloud
infrastructure is superior to its competitors’ and to secure future cloud clients.[20]
As of 2016, Google has data centers in 14 locations around the globe – eight in the Americas,
two in Asia and four in Europe – with an unknown number of colocated centers – ones in
which space, servers, and infrastructure are shared with other companies – in undisclosed
locations. The sheer size of Google’s data centers is reflected in its server chip consumption.
In all, Google supposedly accounts for 5% of all server chips sold in the world,[21] and it is
even affecting the price of chips, as the company is one of the biggest chip buyers. Google’s
recent all


optic cables in US
cities,[25] and investing in building massive undersea cables to maintain its dominance and
expand its markets by controlling Internet infrastructure.[26]
With its own customized servers and software, Google is building a massive data center
network infrastructure, delivering its service at unprecedented speeds around the clock and
around the world. According to one report, Google’s global network of data centers, with a
capacity to deliver 1-petabit-per-second bandwidth, is powerful enough to read all of the
scanned books in the Library of Congress in a fraction of a second.[27] New York Times
columnist Pascal Zachary once reported:
…I believe tha


eded to support the “new economy” is beginning to occupy and transform our landscapes,
building a new fixed network of global digital production space.
NEW NETWORK OF DIGITAL PRODUCTION SPACE:
RESTRUCTURING INDUSTRIAL CITIES

While Google’s data traffic and exchange extends well beyond geographic boundaries, its
physical plants are fixed in places where digital goods and services are processed and
produced. For the production of material goods, access to cheap labor has long been one of
the primary criteria for companies to select their places of production; but for data centers, a
large quantity of cheap labor is not as important since they require only a small number of
employees. The common characteristics necessary for data center sites have so far been:
good fiber-optic infrastructure; cheap and reliable power sources for cooling and running
servers; geographical diversity for redundancy and speed; cheap land; and locations close to
target markets.[29] Today, if one finds geographical areas in the world with some combination
of these factors, there will likely be data centers there already or in the planning stages for the
near future.

Given these criteria, there has been an emerging trend of reconfiguration and conversion to
data centers of former industrial sites such as paper mills, printing plants, st


t regions of the upper Northeast, Great Lakes and Midwest regions – previously hubs of
manufacturing industries and heart lands of both industrial capitalism and labor movements –
are turning (or attempting to turn) into hotspots for large-scale data centers for Internet
companies.[30] These cities are the remains of past crises of industrial capitalism as well as of
long labor struggles.
The reason that former industrial sites in the US and other parts of the world are attractive
for data center conversion is that, starting in the 1970s, many factories closed or moved
their operations overseas in search of ever-cheaper labor and concomitantly weak or
nonexistent labor laws, leaving behind solid physical plants and industrial infrastructures of
power, water and cooling systems once used to drive industrial machines and production lines
and now perfectly fit for data center development.[31] Finding cheap energy is especially
crucial for companies like Google, since data center energy costs are a major expenditure.
Moreover, many communities surrounding former industrial sites have struggled and become
distressed with increasing poverty, high unemployment and little labor power. Thus, under
the guise of “economic development,” many state and local governments have been eager to
lure data centers by offering lavish subsidies for IT companies. For at least the last five years,
state after state has legislated tax breaks for data centers and about a dozen states have
created customized incentives programs for data center operations.[32] State incentives range
from full or partial exemptions of sales/use taxes on equipment, construction materials, and in
some cases purchases of electricity and backup fuel.[33] This kind of corporate-centric
economic developmen


es; but rather the goal is to, “create a good business climate and therefore to
optimize conditions for capital accumulation no matter what the consequences for
employment or social and environmental well-being.”[34]
Google’s first large-scale data center site is located in one of these struggling former industrial
towns. In 2006, Google opened its first data center in The Dalles – now nicknamed
Googleville – a town of a little over 15,000 located alongside the Columbia River and
about 80 miles east of Portland, Oregon. It is an ideal site in the sense that it is close to a
major metropolitan corrido


ir
workers and left their installed infrastructure behind.
Since then, The Dalles, like other industrial towns, has suffered from high unemployment,
poverty, aging population and budget-strapped schools, etc. Thus, the decision for Google to
build a data center the size of two football fields (68,680-square-foot storage buildings) in
order to take advantage of the preinstalled fiber optic infrastructure, relatively cheap
hydropower from the Dalles Dam, and tax benefits was presented as the new hope


While public subsidies were a necessary precondition of building the
data center,[41] there was no transparency and no open public debate on alternative visions of
development that reflect collective community interests.
Google’s highly anticipated data center in The Dalles opened in 2006, but it “opened” only
in the sense that it became operational. To this day, Google’s data center site is off-limits to
the community and is well guarded, with multiple CCTV cameras surveying the grounds
around the clock. Google might boast of its corporate culture as “open” and “non-hierarchical”,
but this does not extend to the data centers in the communities from which Google
benefits as it extracts resources. Not only was the building process secretive, but access to the
data center itself is highly restricted. Data centers are well secured with several guards, gates
and checkpoints. Google’s data center has reshaped the landscape into a pseudo-militarized
zone; it is not far off from a top-secret military compound – access denied.
This kind of landscape is reproduced in other parts of the US as well. New data center hubs
have begun to emerge in other rural communities; one of them is in southwestern North
Carolina where the leading tech giants – Google, Facebook, Apple, Disney and American
Express – have built data centers in close proximity to each other. The cluster of data
centers is referred to as the “NC Data Center Corridor,”[42] a neologism used to market the
area.
At one time, the southwestern part of North Carolina had a heavy concentration of highly
labor-intensive textile and furniture industries that exploited the region’s cheap labor supply
an


anded as a center of the “new economy” geared toward
attracting high-tech industries. For many towns, abandoned manufacturing plants are no
longer an eyesore but rather are becoming major selling points to the IT industry. Rich
Miller, editor of Data Center Knowledge, stated, “one of the things that’s driving the
competitiveness of our area is the power capacity built for manufacturers in the past 50
years.”[44]
In 2008, Google opened a $600 million data center in Lenoir, NC, a town in Caldwell
County (population 18,228[45]). Lenoir was once known as the furniture capital of the South
but lost 1,120 jobs in 2006.[46] More than 300,000 furniture jobs moved away from the
United States during 2000 as f


ide hundreds of good-paying, knowledge-based jobs that North Carolina’s citizens want;”[49] yet
he addressed neither the cost to taxpayers of attracting Google – including those laid-off
factory workers – nor the environmental impact of the data center. In 2013, Google expanded
its operation in Lenoir with an additional $600 million investment, and as of 2015 it has 250
employees on its 220-plus-acre data center site.[50]
The company continues its crusade of giving “hope” to distressed communities and now
“saving” the environment from the old coal-fueled industrial economy. Google’s latest project
in the US is in Widows Creek, Alabama, where the company is converting a coal-burning
power plant commissioned in 1952 – which has been polluting the area for years – into its
14th data center, powered by renewable energy. Shifting from coal to renewable energy seems to
demonstrate how Google has gone “green” and is a different kind of corporation that
cares for the environment. However, this is a highly calculated business


hat
relying on renewable energy is more economical over the long term than coal – which is
more volatile as commodity prices greatly fluctuate.[51] Google is gobbling up renewable
energy deals around the world to procure cheap energy and power its data centers.[52]
However, Google’s “green” public relations also camouflage environmental damages that are
brought by the data center’s enormous power consumption, e-waste from hardware, rare
earth mining and the environmental damage over the entire supply chain.[53]
The trend of reoccupation of industrial sites by data centers is not confined to the US.
Google’s Internet business operates across territories and more than 50% of its revenues
come from outside the US. As Google’s domestic search market share has stabilized at
around 60%, the company has aggressively moved to build data centers around the
world for its global expansion. One of Google’s most ambitious data center projects outside
the US was in Hamina, Finland where Google converted a paper mill to a data center.

In 2008, Stora Enso, the Finnish paper maker, in which the Finnish Government held 16%
of the company’s shares and controlled 34% of the company, shut down its Summa paper
mill on the site close to the city of Hamina in Southeastern Finl


mill
and its infrastructure itself.
Whitewashing the workers’ struggles, the Helsinki Times reported that, “everyone was
excited about Google coming to Finland. The news that the Internet giant had bought the old
Stora Enso mill in Hamina for a data centre was great news for a community stunned by job
losses and a slowing economy.”[56] However, the local elites recognized that jobs created by
Google would not drastically affect the city’s unemployment rate or alleviate the economic
plight f


cision by arguing that
connecting Google’s logo to the city’s image would result in increased investments in the
area.[57] The facility had roughly 125 full-time employees when Google announced its
Hamina operation’s expansion in 2013.[58] The data center is monitored by Google’s
customary CCTV cameras and motion detectors; even Google staff only have access to the
server halls after passing biometric authentication using iris recognition scanners.[59]
Like Google’s other data centers, Google’s decision to build a data center in Hamina is not
merely because of favorable existing infrastructure or natural resources. The location of
Hamina as its first Nordic data center is vital and strategic in terms of extending Google’s
reach into geographically dispersed markets, speed and management of data traffic. Hamina
is located close to the border with Russia and the area has long been known for good
Internet connectivity via Scandinavian telecommunications giant TeliaSonera, whose services
and international connections run right through the area


re in Hamina, Google is establishing its
strategic global digital production beach-head for both the Nordic and Russian markets.
As Google is trying to maintain its global dominance and expand its business, the company
has continued to build out its data center operations on European soil. Besides Finland,
Google has built data centers in Dublin, Ireland, and St. Ghislain and Mons in Belgium,
which respectively had expanded their operations after their initial construction. However,
the story of each of these data centers is similar: the aluminum-smelting town of The Dalles,
Oregon, and the furniture town of Lenoir, North Carolina, in the US; the paper-mill town of
Hamina, Finland; the coal-mining region of Saint-Ghislain and Mons, Belgium; and a
converted warehouse in Dublin, Ireland. Each of these was once an industrial production site
and/or a site for the extraction of environmental resources, now turned into data centers
creating temporal production spaces to accelerate digital capitalism. Google’s latest venture
in Europe is in the seaport town of
Eemshaven, Netherlands, which hosts several power stations as well as the transatlantic
fiber-optic cable that links the US and Europe.
To many struggling communities around the world, the building of Google’s large-scale data
centers has been presented by the company and by political elites as an opportunity to
participate in the “new economy” – as well as a veiled threat of being left behind by the
“new economy” – as if this would magically lead to the cre


tion
for capitalist development.
CONCLUSION

Is the current physical landscape that supports the “new economy” outside of capitalist social
relations? Does the process of redeveloping struggling former industrial cities by
building Google data centers under the slogan of participation in the “new economy” really
meet social needs and express democratic values? The “new economy” is boasted about as if
it were radically different from past industrial capitalist development, the solut


cal
innovation, relocation and reconstruction of new physical production places to link
geographically dispersed markets, reduction of labor costs, removal of obstacles that hinder its
growth and continuous expansion. Google’s purely market-driven data centers illustrate that
the “new economy” built on data and information does not bypass physical infrastructures and
physical places for the production and distribution of digital commodities. Rather, it is firmly
anchored in the physical world and simply establishes new infrastructures on top of existin


ernet-traffic-to-surpass-one-zettabyte-in-2016/
2. Ibid.

3. Cade Metz, “A new company called Alphabet now owns Google,” Wired, August 10, 2015. http://wired.com/2015/08/
new-company-called-alphabet-owns-google/.
4. Google hasn’t released new data since 2012, but the data extrapolate from based on Google annual growth date. See Danny
Sullivan, “Google Still Doing At Least 1 Trillion Searches Per Year,” Search Engine Land, January 16, 2015, http://
searchengineland.com/google-1-trillion-searches-per-year-212940
5. This is Google’s desktop search engine market as


07742/alphabet-annual-global-revenue/.
7. “Advertising revenue of Google from 2001 to 2015 (in billion U.S. dollars),” Statista, http://www.statista.com/
statistics/266249/advertising-revenue-of-google/.
8. Steven Levy, “Secret of Googlenomics: Data-Fueled Recipe Brews Profitability,” Wired, May 22, 2009, http://
www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics?currentPage=all.
9. Daniel Bell, The Coming of Post-Industrial Society: A Venture In Social Forecasting (New York


ontention?” Open Democracy, October 13, 2015, https://
www.opendemocracy.net/digitaliberties/dan-schiller/digital-capitalism-stagnation-and-contention.
15. Ibid: 113-117.
16. Jason Hiner, “Why Microsoft, Google, and Amazon are racing to run your data center.” ZDNet, June 4, 2009, http://
www.zdnet.com/blog/btl/why-microsoft-google-and-amazon-are-racing-to-run-your-data-center/19733.
17. Derrick Harris, “Google had its biggest quarter ever for data center spending. Again,” Gigaom, February 4, 2015, https://
gigaom.com/2015/02/04/google-had-its-biggest-quarter-ever-for-data-center-spending-again/.
18. Ibid.
19. Steven Levy, In the plex: how Google thinks, works, and shapes our lives (New York: Simon & Schuster, 2011), 182.
20. Steven Levy, “Google Throws Open Doors to Its Top-Secret Data Center,” Wired, October 17 2012, http://
www.wired.com/2012/10/ff-inside-google-data-center/.
21. Cade Metz, “Google’s Hardware Endgame? Making Its Very Own Chips,” Wired, February 12, 2016, http://
www.wired.com/2016/02/googles-hardware-end


gle's undersea cable,” Computerworld, July 14, 2015, http://
www.computerworld.com/article/2947841/network-hardware-solutions/9-things-you-didnt-know-about-googles-underseacable.html
27. Jaikumar Vijayan, “Google Gives Glimpse Inside Its Massive Data Center Network,” eWeek, June 18, 2015, http://
www.eweek.com/servers/google-gives-glimpse-inside-its-massive-data-center-network.html
28. Pascal Zachary, “Unsung Heroes Who Move Products Forward,” New York Times, September 30, 2007, http://
www.nytimes.com/2007/09/30/technology/30ping.html

29. Tomas Freeman, Jones Lang, and Jason Warner, “What’s Important in the Data Center Location Decision,” Spring 2011,
http://www.areadevelopment.com/siteSelection/may2011/data-center-location-decision-factors2011-62626727.shtml
30. “From rust belt to data center green?” Green Data Center News, February 10, 2011, http://www.greendatacenternews.org/
articles/204867/from-rust-belt-to-data-center-green-by-doug-mohney/
31. Rich Miller, “North Carolina Emerges as Data Center Hub,” Data Center Knowledge, November 7, 2010, http://
www.datacenterknowledge.com/archives/2010/11/17/north-carolina-emerges-as-data-center-hub/.
32. David Chernicoff, “US tax breaks, state by state,” Datacenter Dynamics, January 6, 2016, http://
www.datacenterdynamics.com/design-build/us-tax-breaks-state-by-state/95428.fullarticle; “Case Study: Server Farms,” Good
Jobs First, http://www.goodjobsfirst.org/corporate-subsidy-watch/server-farms.
33. John Leino, “The role of incentives in Data Center Location Decisions,” Critical Environment Practice, February 28, 2011,
http://www.cbrephoenix.com/wp_eig/?p=68.
34. David Harvey, Spaces of global capitalism (London: Verso, 2006), 25.
35. Marsha Spellman, “Broadband, and Google, Come to


fe.
37. Ginger Strand, “Google’s addiction to cheap electricity,” Harper’s Magazine, March 2008, https://web.archive.org/web/20080410194348/http://harpers.org/media/
slideshow/annot/2008-03/index.html.

38. Linda Rosencrance, “Top-secret Google data center almost completed,” Computerworld, June 16, 2006, http://
www.computerworld.com/article/2546445/data-center/top-secret-google-data-center-almost-completed.html.
39. Bryon Beck, “Welcome to Googleville America’s newest information superhighway begins On Oregon’s Silicon Prairie,”
Willamette Week, June 4, 2008, http://wweek.com/portland/article-9089-welcome_to_googleville.html.
40. Rich Miller, “Google & Facebook: A Tale of Two Data Centers,” Data Center Knowledge, August 2, 2010, http://
www.datacenterknowledge.com/archives/2010/08/10/google-facebook-a-tale-of-two-data-centers/
41. Ibid.
42. Alex Barkinka, “From textiles to tech, the state’s newest crop,” Reese News Lab, April 13, 2011


/
reesenews.org/2011/04/13/from-textiles-to-tech-the-states-newest-crop/14263/.
43. “Textile & Apparel Overview,” North Carolina in the Global Economy, http://www.ncglobaleconomy.com/textiles/
overview.shtml.
44. Rich Miller, “The Apple-Google Data Center Corridor,” Data Center knowledge, August 4, 2009, http://
www.datacenterknowledge.com/archives/2009/08/04/the-apple-google-data-center-corridor/.
45. “2010 Decennial Census from the US Census Bureau,” http://factfinder.census.gov/bkmk/cf/1.0/en/place/Lenoir cit


g,” Federal Reserve Bank of
Richmond, Working Paper Series, September 2004, https://www.richmondfed.org/~/media/richmondfedorg/publications/
research/working_papers/2004/pdf/wp04-7.pdf
48. Stephen Shankland, “Google gives itself leeway for N.C., data center,” Cnet, December 5, 2008, http://
news.cnet.com/8301-1023_3-10114349-93.html; Bill Bradley, “Cities Keep Giving Out Money for Server Farms, See
Very Few Jobs in Return,” Next City, August 15, 2013, https://nextcity.org/daily/entry/cities-keep-giving-out-money-forserver-farms-see-few-jobs-in-return.
49. Katherine Noyes, “Google Taps North Carolina for New Datacenter,” E-Commerce Times, January 19, 2007, http://
www.ecommercetimes.com/story/55266.html?wlc=1255976822
50. Getahn Ward, “Google to invest in new Clarksville data center,” Tennessean, December 22, 2015, http://
www.tennessean.com/story/money/real-estate/2015/12/21/google-invest-500m-new-clarksville-data-center/77474046/.
51. Ingrid Burrington, “The Environmental Toll of a Netflix Binge,” Atlantic, Decem


ulp-mills.
56. David Cord, “Welcome to Finland,” The Helsinki Times, April 9, 2009, http://www.helsinkitimes.fi/helsinkitimes/2009apr/
issue15-95/helsinki_times15-95.pdf.
57. Elina Kervinen, “Google is busy turning the old Summa paper mill into a data centre,” Helsingin Sanomat International Edition,
October 9, 2010, https://web.archive.org/web/20120610020753/http://www.hs.fi/english/article/Google+is+busy
+turning+the+old+Summa+paper+mill+into+a+data+centre/1135260141400.
58. “Google invests 450M in expansion of Hamina data centre,” Helsinki Times, November 4, 2013, http://
www.helsinkitimes.fi/business/8255-google-invests-450m-in-expansion-of-hamina-data-centre.html.
59. “Revealed: Google’s new mega data center in Finland,” Pingdon, September 15, 2010, http://
royal.pingdom.com/2010/09/15/googles-mega-data-center-in-finland/
60. Ibid.
61. Shiv Mehta, “What's Google Strategy for the Russian Market?” Investopedia, July 28, 2015, http://www.inves


t the fundamental principles of modern town planning.”[6] For Le Corbusier
“statistics are merciless things,” because they “show the past and foreshadow the future”[7];
therefore such a formula must be based on the objectivity of diagrams, data and maps.

CORBUSIER - SCHEME FOR THE TRAFFIC
CIRCULATION

OTLET'S FORMULA

Moreover, they “give us an exact picture of
our present state and also of former states;
[...] (through statistics) we are enabled to
penetrate the future a


with contemporary architectural developments. As new modernist forms and use of
materials propagated the abundance of decorative elements, Otlet believed in the
possibility of language as a model of 'raw data', reducing it to essential information
and unambiguous facts, while removing all inefficient assets of ambiguity or
subjectivity. “Information, from which has been removed all

From A bag but is language nothing of words:
Tim Berners-Lee: [...] Make a beautiful website, but first give us the


d its right functional and holistic desiderata.”[12] An abstraction would enable Otlet to
constitute the “equation of urbanism” as a type of sociology (S): U = u(S), because
according to his definition, urbanism “is an art of

unadulterated data, we want the data. We want unadulterated data. OK, we have to ask
for raw data now. And I'm going to ask you to practice that, OK? Can you say "raw"?
Audience: Raw.
Tim Berners-Lee: Can you say "data"?
Audience: Data.
TBL: Can you say "now"?
Audience: Now!
TBL: Alright, "raw data now"!
[...]

From La ville intelligente - Ville de la connaissance:
Étant donné que les nouvelles formes modernistes et l'utilisation de matériaux
propageaient l'abondance d'éléments décoratifs, Paul Otlet croyait en la possibilité du langage


ds from the ideas of Auguste Comte, Frederic Le Play and Elisée Reclus in
order to reach a unified understanding of an urban development in a special context. This
position would allow the complexity of an inhabited environment to be represented through data.[15]
THINKING THE MUNDANEUM

The only person that Otlet considered capable of the architectural realization of the
Mundaneum was Le Corbusier, whom he approached for the first time in spring 1928. In
one of the first letters he addressed the need to


preparing the ways for the coming years”, Le Corbusier wrote to Arthur Fontaine and
Albert Thomas from the International Labor Organization that prediction is free and
“preparing the ways for the coming years”.[28] Free because statistical data is always
available, but he didn't seem to consider that prediction is a form of governing. A similar
premise underlies the present domination of smart city ideologies, where large amounts of
data are used to predict for the sake of efficiency. Although most of the actors behind these
ideas consider themselves apolitical, the governmental aspect is more than obvious: a form of
control and government which is not only biopolitical but epistemic. The data is not
only used to standardize units for architecture, but also to determine categories of knowledge
that restrict life to the normality of what can be classified. What becomes clear in this
juxtaposition of Le Corbusier's and Paul Otlet's work is


into the everyday experience and, with material, form and function, becomes an actor that
performs an epistemic practice on its inhabitants and users. In this case: the conception that
everything can be known, represented and (pre)determined through data.

1. Paul Otlet, Monde: essai d'universalisme - Connaissance du Monde, Sentiment du Monde, Action organisée et Plan du Monde
(Bruxelles: Editiones Mundaneum, 1935): 448.
2. Steve Lohr, Sidewalk Labs, a Start-Up Created by Google, Has B


essentielles et aux faits sans ambiguïté, tout en se débarrassant de tous les éléments
inefficaces et subjectifs. « Des informations, dont tout déchet et élément étrangers
ont été supprimés, seront présentées d'une manière assez analytique. Elles seront
encodées sur différentes feuilles ou cartes plutôt que confinées dans des volumes, »
ce qui permettra l'annotation standardisée de l'hypertexte pour la classification
décimale universelle ( CDU ).[11] De plus, la « régulation à travers l'architecture et
sa tendance à un urbanisme total favoriseront une meilleure compréhension du livre
Traité de documentation ainsi que du désidérata fonctionnel et holistique adéquat. »[12]
Une abstraction permettrait à Paul Otlet de constituer « l'équation de l'urbanisme »
comme un type de sociologie : U = u(S), car selon sa définition, l'urbanisme
« L'urbanisme est l'art

From A bag but is language nothing of words:
Tim Berners-Lee: [...] Make a beautiful website, but first give us the unadulterated data,
we want the data. We want unadulterated data. OK, we have to ask for raw data now.
And I'm going to ask you to practice that, OK? Can you say "raw"?
Audience: Raw.
Tim Berners-Lee: Can you say "data"?
Audience: Data.
TBL: Can you say "now"?
Audience: Now!
TBL: Alright, "raw data now"!
[...]

Fr


de toute l'activité qu'une Société déploie pour arriver au but qu'elle se propose ;
l'expression matérielle (corporelle) de son organisation. »[13] La position scientifique qui
détermine toutes les valeurs caractéristiques d'une certaine région par une classification
et une observatio

materials propagated the abundance of decorative elements, Otlet believed in the
possibility of language as a model of 'raw data', reducing it to essential information and
unambiguous facts, while removing all


Traité de Documentation . In 1934, Otlet did not have enough money to pay
for the full print run of the book, and the edition therefore remained with Van Keerberghen,
who distributed the copies himself through mail order. The plaque on the door dates
from the period when the Traité was printed. So far we have not been able to confirm whether
this family business is still in operation.

RUE OTLET

O P T I O N A L :

(from Rue Piers, ca. 30") Follow Rue
Piers and turn left into
Me


esteenweg, continue
onto Chaussée de Mons. Turn left onto
Otletstraat. Alternatively you can
take tram 51 or 81 to Porte
D'Anderlecht.

Although it seems that this dreary street is named to honor Paul Otlet, it already
mysteriously appears on a map dated 1894, when Otlet was not even 26 years old,[19] and
again on a map from 1910, when the Mundaneum had not yet opened its doors.[20]

OUTSIDE BRUSSELS

1998: THE MUNDANEUM RESURRECTED

Bernard Anselme, the new minister-president of


ets
technology". The Mundaneum archive center plays a central role in the media campaigns
and activities leading up to the festive year. In that same period, the center undergoes a
large-scale renovation to finally bring the archive facilities up to date. A new reading room is
named after André Canonne, the conference room is called Utopia. The mise-en-scène of
Otlet's messy office is removed, but otherwise the scenography remains largely unchanged.

2007: CRYSTAL COMPUTING

Jean-Pa


confidentiality agreement bound Google, the Awex and the Idea, among others. "On several
occasions things got tense, because it was agreed that at the slightest hitch on this point,
Google would call everything off"[27]
A lot of show, few jobs: for its Belgian data center, the search engine giant secured one of
the finest industrial sites in Wallonia. The result: barely 40 direct jobs and not one euro in
taxes. Still, the Region does not see things that way. By


negotiating with its electricity supplier[28] (editor's note: Electrabel) a reduction of its enormous bill.

In 2005, Elio di Rupo succeeds in bringing a company called "Crystal Computing" – a code
name for Google Inc. – to the region, with plans to build a data center at Saint-Ghislain, a
prime industrial site close to Mons. Promising 'a thousand jobs', the presence of Google
becomes a way for Di Rupo to demonstrate that the Marshall Plan for Wallonia, an attempt to
"step up the efforts taken to put Wallonia back on the track to prosperity", is attaining its
goals. The first data center opens in 2007 and is followed by a second one in 2015. The
direct impact on employment in the region is estimated to be somewhere between 110[29] and
120 jobs.[30]

Last
Revision:
2·08·2016

1. Paul Otlet (1868-1944)


ed on a common subject:
• Think of a category that is the common ground for the link. For example if two texts
refer to a similar issue or specific concept (eg. 'rawdata'), formulate it without
spaces or using underscores (eg. 'raw_data', not 'raw data');
• Edit the two or more pages which you want to link, adding {{RT|rawdata}}
<section begin=rawdata /> before the text section, and <section end=rawdata /> at the end (take care of the closing '/>');
• All text sections in other wiki pages th


; 1962.
• Marlene Manoff, "Theories of the archive from across the
disciplines," in portal: Libraries and the Academy, Vol. 4, No.
1 (2004), pp. 9–25.
• Charles van den Heuvel, W. Boyd Rayward, Facing
Interfaces: Paul Otlet's Visualizations of Data Integration.

Journal of the American Society for Information Science and
Technology (2011).

DON'T BE EVIL

Standing on the hands of Internet giants.
• René König, Miriam Rasch (eds), Society of the Query
Reader: Reflections on Web Search, Amst


offey, Evil Media. Cambridge,
Mass., United States: MIT Press, 2012.
• Steven Levy, In The Plex. Simon & Schuster, 2011.
• Dan Schiller, ShinJoung Yeo, Powered By Google: Widening
Access and Tightening Corporate Control in: Red Art: New
Utopias in Data Capitalism, Leonardo Electronic Almanac,
Volume 20 Issue 1 (2015).
• Invisible Committee, Fuck Off Google, 2014.
• Dave Eggers, The Circle. Knopf, 2014.
• Matteo Pasquinelli, Google’s PageRank Algorithm: A
Diagram of the Cognitive Capitalism


dat in Constant 2018


ensified identity shaping and self-management. It also affects the
public, as more and more libraries, universities and public
infrastructures as well as the management of public life rely on
\"solutions\" provided by private companies. Centralizing data flows in
the clouds, services blur the last traces of the thin line that
separates bio- from necro-politics.

Given how fast these changes resonate and reproduce, there is a growing
urgency to engage in a critique of software that goes beyond taking


the background?

We adopted the term of observation for a number of reasons. We regard
observation as a way to approach software, as one way to organize
engagement with its implications. Observation, and the enabling of
observation through intensive data-centric feedback mechanisms, is part
of the cybernetic principles that underpin present day software
production. Our aim was to scrutinize this methodology in its many
manifestations, including in "observatories" -- high cost
infrastructures [te


s
> people of all kinds. A wide range of professional and amateur
> practitioners will provide you with
> Software-as-a-Critique-as-a-Service on the spot. Available services
> range from immediate interface critique, collaborative code
> inspection, data dowsing, various forms of network analyses,
> unusability testing, identification of unknown viruses, risk
> assessment, opening of black-boxes and more. Free software
> observations provided. Last intake at 16:45.\
> (invitation to the Walk-In Clin


arding software curiosity. The
publisher will not accept any responsibility in case of damages caused
by misuse, misunderstanding of instructions or lack of curiosity. By
trying the actions exposed in the guide, you accept the responsibility of
losing data or altering hardware, including hard disks, USB keys, cloud
storage, or screens by throwing them on the floor, or even when falling on
the floor with your laptop by tangling your feet in an entanglement of
cables. No harm has been done to human, animal,


he fact that it's the
conservation of multiple stages of the life of a piece of software, from its initial
computerization until today. The idea of introducing informatics into
the work on the Bible (versions in Hebrew, Greek, Latin,
and French) dates back to 1971, via punch card recordings and their
storage on magnetic tape. Then came the step of analyzing texts
using computers.

[SHOW IMAGE HERE:
http://gallery.constantvzw.org/var/resizes/Preparing-the-Techno-galactic-Software-Observatory


/dev/mem` tools to explore processes stored in the memory

ps ax | grep process
cd /proc/numberoftheprocess
cat maps

\--\> check what it is using

The proc filesystem is a pseudo-filesystem which provides an interface
to kernel data structures. It is commonly mounted at `/proc`. Most of it
is read-only, but some files allow kernel variables to be changed.
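A minimal sketch (my own addition, not part of the original notes; it assumes a Linux machine with /proc mounted) that reads the same kernel data structures programmatically, printing the first mappings of the current process:

` {.sourceCode .python}
# Read this process's memory map from the proc pseudo-filesystem.
# Each line: address-range perms offset dev inode [pathname]
with open("/proc/self/maps") as maps:
    for line in list(maps)[:10]:
        print(line.rstrip())
`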

dump to a file --> change something in the file --> dump new to a file --> diff oldfile newfile

\"where am i?\"

to find r


descriptor} [Example]{.example .empty
.descriptor}

` {.verbatim}
# ends of time

https://en.wikipedia.org/wiki/Year_2038_problem

Exact moment of the epoch:
03:14:07 UTC on 19 January 2038

## commands

local UNIX time of this machine
%XBASHCODE: date +%s

UNIX time + 1
%BASHCODE: echo $((`date +%s` +1 ))

## goodbye unix time

while :
do
sleep 1
figlet $((2147483647 - `date +%s`))
done
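A small Python sketch (my own addition, assuming only the standard library) of why that timestamp matters: one second after 03:14:07 UTC on 19 January 2038, a signed 32-bit time_t wraps around to December 1901.

` {.sourceCode .python}
# Force a timestamp into a signed 32-bit integer and see where it lands.
import struct, datetime

def as_int32(t):
    return struct.unpack("<i", struct.pack("<I", t & 0xFFFFFFFF))[0]

epoch = datetime.datetime(1970, 1, 1)
last_ok = 2147483647                      # 2038-01-19 03:14:07 UTC
for t in (last_ok, last_ok + 1):
    wrapped = as_int32(t)
    print(t, "->", (epoch + datetime.timedelta(seconds=wrapped)).isoformat())
`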

# Sundial Time Protocol Group tweaks

printf 'Current Time in Millennium Unix Time


tension between
\"software as language\" and \"software as operation\".]{.why
.descriptor} [How: By running a quine you will get your code back. You
may do a step forward and wonder about functionality and aesthetics,
uselessness and performativity, data and code.]{.how .descriptor}
[Example: A quine (Python). When executed it outputs the same text as
the source:]{.example .descriptor}

` {.sourceCode .python}
s = 's = %r\nprint(s%%s)'
print(s%s)
`

[Example: A oneline unibash/etherpad


.\" \"Aquine
is aquine is aquine. \" Aquine is not a quine This is not aquine

[Remember: Although seemingly absolutely useless, quines can be used as
exploits.]{.remember .descriptor}

Exploring boundaries/tensions

databases treat their content as data (database punctualization); some
exploits manage to smuggle operations into a database
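A minimal sketch (my own illustration, not from the guide; the table and payload are hypothetical) of how content that a database is expected to treat as mere data can smuggle in an operation, and how a parameterized query keeps it on the data side, using Python's built-in sqlite3:

` {.sourceCode .python}
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE notes (body TEXT)")

# hypothetical 'content' that carries an operation inside it
payload = "x'); DROP TABLE notes; --"

# Unsafe: pasting the payload into the SQL text lets it close the
# statement and append its own -- the notes table disappears.
con.executescript("INSERT INTO notes (body) VALUES ('%s');" % payload)
print(con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())

# Safe: the same payload passed as a bound parameter stays plain text.
con.execute("CREATE TABLE notes (body TEXT)")
con.execute("INSERT INTO notes (body) VALUES (?)", (payload,))
print(con.execute("SELECT body FROM notes").fetchall())
`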

[TODO: RELATES TO
http://pad.constantvzw.org/p/observatory.guide.monopsychism]{.tmp}
[]{#zwu0ogu0 .anchor}
[[Method:](http://pad.constantvzw.org/p/observatory.guide.glossary)
Glossaries as an exercise]{.method .descriptor} [What: Use the techni


l use it? If not why?



- Do you pay for media distribution/streaming services?
- Do you remember your first attempt at using free software and how
did that make you feel?
- Have you used any of these software services : facebook, dating app
(grindr, tinder, etc.), twitter, instagram or equivalent.



- Can you talk about your favorite apps or webtools that you use
regularly?
- What is most popular software your friends use?



- SKILL
- Would you s


ut of market due to a problematic
licensing agreement (intuitively I knew it was wrong) - it had too many
unprofessional pixelated edges in its graphics.

### \...meAsSoftwareUserOfDatingWebsites:

\"I got one feature request implemented by a prominent dating website
(to search profiles by language they speak), however I was never
publicly acknowledged (though I tried to make use of it few times), that
made our relations feel a bit exploitative and underappreciated. \"

### \...meAsSoftwareUserTryingTo


ome proprietary drivers.

Q: Do you remember your first attempt at using free software and how did
that make you feel?
A: Yes i installed my dual boot in ... 10 years ago. scared and
powerful.

Q: Do you use one of these software services: facebook, dating app (grindr
or the like), twitter, instagram or equivalent?
A: Google, gmail that's it

Q: Can you talk about your favorite apps or webtools that you use
regularly?
A: Music player. vanilla music and f-droid. browser. I pay attention to
clearing my


pped working
the evening before, but it was unclear why. So I started looking around
the pi's filesystem to find out what was wrong. Took me a while to find
the relevant lines in /var/log/syslog but it became clear that there was
a problem with the database. Which database? Where does etherpad
'live'? I found it in /opt/etherpad and in a subdirectory named var/
there it was: dirty.db, and dirty it was.

A first look at the file revealed no apparent problem. The last lines
looked like this:

`{"key":"sessionstora


dat in Custodians 2015


own knowledge today. Consider Elsevier, the
largest scholarly publisher, whose 37% profit margin1 stands in sharp contrast
to the rising fees, expanding student loan debt and poverty-level wages for
adjunct faculty. Elsevier owns some of the largest databases of academic
material, which are licensed at prices so scandalously high that even Harvard,
the richest university of the global north, has complained that it cannot
afford them any longer. Robert Darnton, the past director of Harvard Library,
s


for
what we here urge you to stand up for too, wrote: "We need to take
information, wherever it is stored, make our copies and share them with the
world. We need to take stuff that's out of copyright and add it to the
archive. We need to buy secret databases and put them on the Web. We need to
download scientific journals and upload them to file sharing networks. We need
to fight for Guerilla Open Access. With enough of us, around the world, we'll
not just send a strong message opposing the privati


dat in Dean, Dockray, Ludovico, Broekman, Thoburn & Vilensky 2013


oving image files, analytic frameworks, slogans or memes (‘We
are the 99%’), but also more abstract forms such as densities of reposting
and forwarding, and, in that wonderful ‘VersuS’ social media visualisation
you mention, cartographies of data flow. Here a multiplicity of social media
communications, each with their particular communicative function on the
day, are converted into a strange kind of collective, intensive entity, a digital
‘solar flare’ as you put it.6 Its creators, ‘A


has had to keep
moving. Since this perpetual change seems to be part of the nature of the
project, my convention has been to be deliberately inconsistent with the name.
I think one part of what you’re referring to about the web is the way in
which data moves from place to place in two ways - one is that it is copied
between directories or computers; and the other is that the addressing is
changed. Although it seems fairly stable at this point, over time it changes
significantly with things slippin


e #30 we delivered ‘Notepad’ to all our
subscribers - an artwork by the S.W.A.M.P. duo. It was an apparently ordinary
yellow legal pad, but each ruled line, when magnified, reveals itself to be
‘microprinted’ text enumerating the full names, dates, and locations of each
Iraqi civilian death on record over the first three years of the Iraq War. And
in issue #40 we’ve printed and will distribute in the same way a leaflet of
the Newstweek project (a device which hijacks online major news webs


f ‘archipelagos’ of previously submerged archives that would emerge,
if collectively and digitally indexed, and shared with those who need to access
them. I’m trying to apply this to Neural itself in the ‘Neural Archive’ project,
an online database with all the data about the publications received by Neural
over the years, which should be part of a larger network of small institutions,
whose final goal would be to test and then formulate a viable model to easily
build and share these kind of databases.
Turning to my projects outside of Neural, these social and commercial
aspects of the relation between the materiality of the printed page and the
manipulability of its digital embodiment were foregrounded in Amazon Noir,
an artwork which I dev


truct our own physical memory.
Finally, in Face to Facebook (developed again with Paolo Cirio and part of
the ‘Hacking Monopolism’ trilogy together with Amazon Noir and Google
Will Eat Itself) we ‘stole’ 1 million Facebook profiles’ public data, filtering
them through their profile pictures with face-recognition software, and then
posted all the filtered data on a custom-made dating website, sorted by their
facial expression characteristics.11 In the installation we produced, we glued
more than 1,700 profile pictures on white-painted square wood panels,
and projected also the software diagram and an introductory video. Here
the ‘printed’ part deals more with materializing ‘stolen’ personal online
information. The ‘profile pictures’ treated as public data by Facebook, and
scraped with a script by Paolo and me, once properly printed are a terrific
proof of our online fragility and at the same time of how ‘printing’ is becoming
a contemporary form of ‘validation’. In fact we decided to print th


, I analyze ten different moments in
history when the death of paper was announced (before the digital); of course,
it never happened, proving that perhaps even current pronouncements
will prove to be mistaken (by the way, the first one I’ve found dates back to
1894, which explains the subtitle). In the second chapter I’ve tried to track
a history of how avant-garde and underground movements have used print

11. http://www.face-to-facebook.net/

tactic


dat in Dekker & Barok 2017


platform for texts and media started to emerge and
Monoskop became a reality. More than a decade later, Barok
is well-known as the main editor of Monoskop. In 2016, he
began a PhD research project at the University of Amsterdam. His project, titled Database for the Documentation of
Contemporary Art, investigates art databases as discursive
platforms that provide context for artworks. In an extended
email exchange, we discuss the possibilities and restraints
of an online ‘archive’.
ANNET DEKKER

You started Monoskop in 2004, already some time ago. What
does the n


d
archives (in Mons)4 last year.

4. https://monoskop.org/Ideographies_of_Knowledge. Accessed 28 May 2016.

AD

Did you have a background in library studies, or have you taken their ideas/methods
of systemization and categorization (meta data)? If not, what are your methods and
how did you develop them?

been an interesting process, clearly showing the influence
of a changing back-end system. Are you interested in the
idea of sharing and


her hand, besides providing access, digital
libraries are also fit to provide context by treating publications as a corpus of texts that can be accessed through an
unlimited number of interfaces designed with an understanding of the functionality of databases and an openness
to the imagination of the community of users. This can
be done by creating layers of classification, interlinking
bodies of texts through references, creating alternative
indexes of persons, things and terms, making full-text
se


016.

to also include other artists and movements around the
world.11
AD

Can you say something about the longevity of the project?
You briefly mentioned before that the web was your best
backup solution. Yet, it is of course known that websites
and databases require a lot of maintenance, so what will
happen to the type of files that you offer? More and more
voices are saying that, for example, the PDF format is all
but stable. How do you deal with such challenges?
DB

Surely, in the realm of bits,


to embrace redundancy, to promote
spreading their contents across as many nodes and sites
as anyone wishes. We may look at copying not as merely
mirroring or making backups, but as opening up possibilities to start new libraries, new platforms, new databases.
That is how these came about as well. Let there be Zzzzzrgs,
Ůbuwebs and Multiskops.

AD

What were your biggest challenges beside technical ones?
For example, have you ever been in trouble regarding copyright issues, or if not, how would you


dat in Dockray 2010


ey.
Those categorical definitions offer very little to
help think about digital files and their native
tendency to replicate and travel across networks.
What kinds of public spaces are these, coming into
the foreground by an incessant circulation of data?
Two paradigmatic forms of publicness can be
described through the lens of the scan and the
export, two methods for producing a digital text.
Although neither method necessarily results in a
file that must be distributed, such files typically
are.


erent after all: one is a legitimate
copy and the other is not. Legitimacy in this case
has nothing whatsoever to do with internal traits,
such as fidelity to the original, but with external
ones, namely, records of economic transactions in
customer databases.
In practical terms, this means that a digital
book must be purchased by every single reader.
Unlike the book, which is commonly purchased,
read, then handed off to a friend (who then
shares it with another friend and so on until it
comes to


ear as it loses the historical structure
of the book and becomes pure, continuous
text. For example, page numbers give way to the
more abstract concept of a "location" when the
file is derived from the export as opposed to the
scan, from the text data as opposed to the
physical object. The act of reading in a group is also

different ways. An analogy: they are not prints
from the same negative, but entirely different
photographs of the same subject. Our scans are
variations, perhaps compe


dat in Dockray 2013


eat. Such repetition then gives way to copy-and-pasting (or merely calling). The analogy here is to the robot, to the replacement of human labor
with technology.

Now, when a program is in the midst of being executed, the computer's memory fills with data - but some of that is obsolete, no longer necessary for that program to run. If left alone, the memory
would become clogged, the program would crash, the computer might crash. It is the role of the
garbage collector to free up memory, deleting what i


to every person-computer. If the files were books.. then this collective collection would be a public library.

In order for a system like this to work, for the inputs and the outputs to actually engage with one
another to produce action or transmit data, there needs to be something in place already to enable
meaningful couplings. Before there is any interaction or any relationship, there must be some
common ground in place that allows heterogeneous objects to ‘talk to each other’ (to use a phras
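
The 'common ground' the passage points to can be pictured, very loosely, as a shared interface that heterogeneous objects agree to implement. The sketch below is illustrative only (the names are invented, not taken from the source); it simply shows how unlike things become couplable once they expose the same minimal surface.

```typescript
// A shared interface as 'common ground': very different objects can be
// handled uniformly once they implement it.
interface Readable {
  title: string;
  open(): string; // return the text, however it happens to be stored
}

const scannedBook: Readable = {
  title: "A scanned book",
  open: () => "text recognized from page images",
};

const exportedFile: Readable = {
  title: "An exported file",
  open: () => "text data exported directly from a word processor",
};

// Because both satisfy the same interface, a collection can treat them alike.
for (const item of [scannedBook, exportedFile]) {
  console.log(`${item.title}: ${item.open()}`);
}
```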


through low prices and wide selection is
the most visible platform for buying books and uses that position to push retailers and publishers
both to, at best, the bare minimum of profitability.

In addition to selling things to people and collecting data about its users (what they look at and
what they buy) to personalize product recommendations, Amazon has also made an effort to be a
platform for the technical and logistical parts of other retailers. Ultimately collecting data from
them as well, Amazon realizes a competitive advantage from having a comprehensive, up-to-the-minute perspective on market trends and inventories. This volume of data is so vast and valuable
that warehouses packed with computers are constructed to store it, protect it, and make it readily
available to algorithms. Data centers, such as these, organize how commodities circulate (they run
business applications, store data about retail, manage fulfillment) but also - increasingly - they
hold the commodity itself - for example, the book. Digital book sales started the millennium very
slowly but by 2010 had overtaken hardcover sales.

Amazon’s store of digital books (or Apple’s or Google’s, for that matter) is a distorted reflection of
the collection circulating within the file-sharing network, displaced from personal computers to
corporate data centers. Here are two regimes of digital property: the swarm and the cloud. For
swarms (a reference to swarm downloading where a single file can be downloaded in parallel
from multiple sources) property is held in common between peers -- however, property is
positioned out of reach, on the cloud, accessible only through an interface that has absorbed legal
and business requirements.

It's just half of the story, however, to associate the cloud with mammoth data centers; the other
half is to be found in our hands and laps. Thin computing, including tablets and e-readers, iPads
and Kindles, and mobile phones have co-evolved with data centers, offering powerful, lightweight
computing precisely because so much processing and storage has been externalized.

In this technical configuration of the cloud, the thin computer and the fat data center meet through
an interface, inevitably clean and simple, that manages access to the remote resources. Typically,
a person needs to agree to certain “terms of service,” have a unique, measurable account, and
provide payment information; in


of
operation cut, and are in some cases being closed down entirely, and on the other side, the
traditional publishing industry finds its stores, books, and profits dematerialized, the image is
perhaps appropriate. Server racks, in photographs inside data centers, strike an eerie resemblance
to library stacks -- while e-readers are consciously designed to look and feel something like a
book. Yet, when one peers down into the screen of the device, one sees both the book - and the
library.

Like a Fac


arehouses of remote, secure hard
drives. But the cloud internalizes processing as well as storage, capturing the new forms of cooperation and collaboration characterizing the new economy and its immaterial labor. Social
relations are transmuted into database relations on the "social web," which absorbs self-organization as well. Because of this, the cloud impacts as strongly on the production of
publications as on their consumption, in the traditional sense.

Storage, applications, and services offered in the cloud are marketed for consumption by authors
and publishers alike. Document editing, project management, and accounting are peeled slowly
away from the office staff and personal computers into the data centers; interfaces are established
into various publication channels from print on demand to digital book platforms. In the fully
realized vision of cloud publishing, the entire technical and logistical apparatus is externalized,
leaving only the h


ised by
themselves, but now without the omnipresent threat of legal prosecution. One has the sneaking
suspicion though.. that such a compromise is as hollow.. as the promises to a desperate city of the

jobs that will be created in a newly constructed data center -- and that pitting “food on the table”
against “access to knowledge” is both a distraction from and a legitimation of the forms of power
emerging in the cloud. It's a distraction because it's by policing access to knowledge that the


ations are becoming more wealthy, or working less to survive. If we turn the picture
sideways, however, a new contradiction emerges, between the indebted, living labor - of authors,
editors, translators, and readers - on one side, and on the other.. data centers, semiconductors,
mobile technology, expropriated software, power companies, and intellectual property.
The talk in the data center industry of the “industrialization” of the cloud refers to the scientific
approach to improving design, efficiency, and performance. But the term also recalls the basic
narrative of the Industrial Revolution: the movement from home-based


nce, we shift from a
networked, but small-scale, relationship to computation (think of “home publishing”) to a
reorganized form of production that puts the accumulated energy of millions to work through
these cloud companies and their modernized data centers.

What kind of buildings are these blank superstructures? Factories for the 21st century? An engineer
named Ken Patchett described the Facebook data center that way in a television interview, “This is
a factory. It’s just a different kind of factory than you might be used to.” Those factories that we’re
“used to” continue to exist (at Foxconn, for instance), producing the infrastructure, under
recognizably exploitative conditions, for a “different kind of factory” - a factory that extends far
beyond the walls of the data center.

But the idea of the factory is only part of the picture - this building is also a mine.. and the
dispersed workforce devote most of their waking hours to mining-in-reverse, packing it full of data,
under the expectation that someone - soon - will figure out how to pull out something valuable.

Both metaphors rely on the image of a mass of workers (dispersed as it may be) and leave a darker
and more difficult possibility: the data center is like the hydroelectric plant, damming up property,
sociality, creativity and knowledge, while engineers and financiers look for the algorithms to
release the accumulated cultural and social resources on demand, as profit.

This returns us


osures have taken form (for example, see
Apple's iOS products, Google's search box, and Amazon's "marketplace"). Control over the
interface is guaranteed by control over the entire techno-business stack: the distributed hardware
devices, centralized data centers, and the software that mediates the space between. Every major
technology corporation must now operate on all levels to protect against any loss.

There is a centripetal force to the cloud and this essay has been written in its irresistible


gravity and the seeming
insurmountability of it all, there is no chance that the system will absolutely manage and control
the noise within it. Riots break out on the factory floor; algorithmic trading wreaks havoc on the
stock market in an instant; data centers go offline; 100 million Facebook accounts are discovered
to be fake; the list will go on. These cracks in the interface don't point to any possible future, or
any desirable one, but they do draw attention to openings that might circumvent th


dat in Dockray, Forster & Public Office 2018


are unwanted by official institutions or, worse,
buried beneath good intentions and bureaucracy, then what tools and platforms
and institutions might we develop instead?

While trying to both formulate and respond to these questions, we began making
Dat Library and HyperReadings:

**Dat Library** distributes libraries across many computers so that many
people can provide disk space and bandwidth, sharing in the labour and
responsibility of the archival infrastructure.

**HyperReadings** implements ‘reading lists’ or a structured set of pointers
(a list, a syllabus, a bibliography, etc.) into one or more libraries,
_activating_ the archives.
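
As a rough illustration of what such a structured set of pointers might look like, here is a minimal sketch in TypeScript. It is not HyperReadings' actual data model; the field names and the example entry are assumptions made for the sake of the example (only the 64-digit key notation is taken from later in this text).

```typescript
// A reading list as a structured set of pointers into one or more libraries.
interface Pointer {
  library: string; // 64-digit hex key identifying a library
  path: string;    // location of the item within that library (hypothetical)
  note?: string;   // optional commentary, as in an annotated syllabus
}

interface ReadingList {
  title: string;
  items: Pointer[];
}

// Hypothetical usage: a one-item syllabus pointing into a single library.
const syllabus: ReadingList = {
  title: "Example reading list",
  items: [
    {
      library: "6f963e59e9948d14f5d2eccd5b5ac8e157ca34d70d724b41cb0f565bc01162bf",
      path: "/texts/example.pdf",
      note: "week 1",
    },
  ],
};
console.log(`${syllabus.title}: ${syllabus.items.length} item(s)`);
```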

## Installation

The easiest way to get started is to install [Dat Library as a desktop
app](http://dat-dat-dat-library.hashbase.io), but there is also a programme
called ‘[datcat](http://github.com/sdockray/dat-cardcat)’, which can be run on
the command line or included in other NodeJS projects.

## Accidents o


s worldwide. When
expanding the scope to consider public, private, and community libraries, that
number becomes uncountable.

Published during the early days of the World Wide Web, the report acknowledges
the emerging role of digitization (“online databases, CD-ROM etc.”), but today
we might reflect on the last twenty years, which has also introduced new forms
of loss.

Digital archives and libraries are subject to a number of potential hazards:
technical accidents like disk failures, accidental deletions, misplaced data
and imperfect data migrations, as well as political-economic accidents like
defunding of the hosting institution, deaccessioning parts of the collection
and sudden restrictions of access rights. Immediately after library.nu was
shut down on the grounds of copyright in


e as sites for [biblioleaks](https://www.jmir.org/2014/4/e112/).
Furthermore, given the vulnerability of these archives, we ought to look for
alternative approaches that do not rule out using their resources, but which
also do not _depend_ on them.

Dat Library takes the concept of “a library of libraries” not to manifest it
in a single, universal library, but to realise it progressively and partially
with different individuals, groups and institutions.

## Archival properties

So far, the empha


reservation, but ultimately create a rarefied
relationship between the archives and their publics. Disregarding this
precious tendency toward preciousness, we also introduce _adaptability_ as a
fundamental consideration in the making of the projects Dat Library and
HyperReadings.

To adapt is to fit something for a new purpose. It emphasises that the archive
is not a dead object of research but a set of possible tools waiting to be
activated in new circumstances. This is always a possibility of an a


ting computers running mostly open-source
software can be the guts of an advanced capitalist engine, like Facebook. So,
could it be possible to organise our networked devices, embedded as they are
in a capitalist economy, in an anti-capitalist way?

Dat Library is built on the [Dat
Protocol](https://github.com/datproject/docs/blob/master/papers/dat-paper.md),
a peer-to-peer protocol for syncing folders of data. It is not the first
distributed protocol ([BitTorrent](https://en.wikipedia.org/wiki/BitTorrent)
is the best known and is noted as an inspiration for Dat), nor is it the only
new one being developed today ([IPFS](https://ipfs.io) or the Inter-Planetary
File System is often referenced in comparison), but it is unique in its
foundational goals of preserving scientific knowledge as a public good. Dat’s
provocation is that by creating custom infrastructure it will be possible to
overcome the accidents that restrict access to scientific knowledge. We would
specifically acknowledge here the role that the Dat community — or any
community around a protocol, for that matter — has in the formation of the
world that is built on top of that protocol. (For a sense of the Dat
community’s values — see its [code of conduct](https://github.com/datproject/Code-of-Conduct/blob/master/CODE_OF_CONDUCT.md).)

When running Dat Library, a person sees their list of libraries. These can be
thought of as similar to a
[torrent](https://en.wikipedia.org/wiki/Torrent_file), where items are stored
across many computers. This means that many people will share in the provision
of di


onymous with accessibility — if
something can’t be accessed, it doesn’t exist. Here, we disentangle the two in
order to consider _access_ independent from questions of resilience.

##### Technically Accessible

When you create a new library in Dat, a unique 64-digit “key” will
automatically be generated for it. An example key is
`6f963e59e9948d14f5d2eccd5b5ac8e157ca34d70d724b41cb0f565bc01162bf`, which
points to a library of texts. In order for someone else to see the library you
have creat


unique key (by email,
chat, on paper or you could publish it on your website). In short, _you_
manage access to the library by copying that key, and then every key holder
also manages access _ad infinitum_.

At the moment this has its limitations. A Dat is only writable by a single
creator. If you want to collaboratively develop a library or reading list, you
need to have a single administrator managing its contents. This will change in
the near future with the integration of
[hyperdb](https://github.com/mafintosh/hyperdb) into Dat’s core. At that
point, the platform will enable multiple contributors and the management of
permissions, and our single key will become a key chain.
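
As a small illustration of what the passage above describes, the sketch below checks that a shared string has the shape of such a key and turns it into a dat:// link. This is not the Dat API, just string handling; the example key is the one quoted earlier in this text.

```typescript
// A Dat key is a 64-character hexadecimal string (the hex encoding of a
// 32-byte public key). Whoever holds the key can locate the library.
function isDatKey(key: string): boolean {
  return /^[0-9a-f]{64}$/i.test(key);
}

function toDatLink(key: string): string {
  if (!isDatKey(key)) {
    throw new Error(`not a 64-digit hex key: ${key}`);
  }
  return `dat://${key.toLowerCase()}`;
}

// Usage: a key received by email, chat or on paper.
const key = "6f963e59e9948d14f5d2eccd5b5ac8e157ca34d70d724b41cb0f565bc01162bf";
console.log(toDatLink(key));
```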

How is this key any different from knowing the domain name of a website? If a
site isn’t indexed


because they are shared, i.e., held in common.

It is important, while imagining the possibilities of a technological
protocol, to also consider how different _cultural protocols_ might be
implemented and protected through the life of a project like Dat Library.
Certain aspects of this might be accomplished through library metadata, but
ultimately it is through people hosting their own archives and libraries
(rather than, for example, having them hosted by a state institution) that
cultural protocol


ic group, but rather that
it should generate spaces that people can inhabit as they wish. The poet Jean
Paul once wrote that books are thick letters to friends. Books as
infrastructure enable authors to find their friends. This is how we ideally
see Dat Library and HyperReadings working.

## Use cases

We began work on Dat Library and HyperReadings with a range of exemplary use
cases, real-world circumstances in which these projects might intervene. Not
only would the use cases make demands on the software we were and still are
beginning to write, but they would also give us demands to make on the Dat
protocol, which is itself still in the formative stages of development. And,
crucially, in an iterative feedback loop, this process of design produces
transformative effects on those situations described in the use cases
themselves, resulting in furt


most invisible. On the other hand, if the
issues are presented together, with commentary and surrounding publications,
the political environment becomes palpable. Wendy and Chris have kindly
allowed us to make their personal collection available via Dat Library (the
key is: 73fd26846e009e1f7b7c5b580e15eb0b2423f9bea33fe2a5f41fac0ddb22cbdc), so
you can discover this for yourself.

### Academia.edu alternative

Academia.edu, started in 2008, has raised tens of millions of dollars as a
social network fo


rnal/2015/10/18/does-academiaedu-mean-open-access-is-becoming-irrelevant.html) that “its financial
rationale rests … on the ability of the angel-investor and venture-capital-
funded professional entrepreneurs who run Academia.edu to exploit the data
flows generated by the academics who use the platform as an intermediary for
sharing and discovering research”. Moreover, he emphasises that in the open-
access world (outside of the exploitative practice of for-profit publishers
like Elsevier, who charge a premium for subscriptions), the privileged
position is to be the one “ _who gate-keeps the data generated around the use
of that content_ ”. This lucrative position has been produced by recent
“[recentralising tendencies](http://commonstransition.org/the-revolution-will-not-be-decentralised-blockchains/)” of the internet, which in Acade


ies, personal web pages, and
other archives.

Is it possible to redecentralise? Can we break free of the subjectivities that
Academia.edu is crafting for us as we are interpellated by its infrastructure?
It is incredibly easy for any scholar running Dat Library to make a library of
their own publications and post the key to their faculty web page, Facebook
profile or business card. The tricky — and interesting — thing would be to
develop platforms that aggregate thousands of these libraries in direct
competition with Academia.edu. This way, individuals would maintain control
over their own work; their peer groups would assist in mirroring it; and no
one would be capitalising on the sale of data related to their performance and
popularity.

We note that Academia.edu is a typically centripetal platform: it provides no
tools for exporting one’s own content, so an alternative would necessarily be
a kind of centrifuge.

This alternative is becoming increasingly realistic. With open-access journals
already paving the way, there has more recently been a [call for free and open
access to citation data](https://www.insidehighered.com/news/2017/12/06/scholars-push-free-access-online-citation-data-saying-they-need-and-deserve-access).
[The Initiative for Open Citations (I4OC)](https://i4oc.org) is
mobilising against the privatisation of data and working towards the
unrestricted availability of scholarly citation data. We see their new
database of citations as making this centrifugal force a possibility.

### Publication format

In writing this README, we have strung together several references. This
writing might be published in a book and the references will be


suppositions that amount to a
set of social propositions.

### The role of individuals in the age of distribution

Different people have different technical resources and capabilities, but
everyone can contribute to an archive. By simply running the Dat Library
software and adding an archive to it, a person is sharing their disk space and
internet bandwidth in the service of that archive. At first, it is only the
archive’s index (a list of the contents) that is hosted, but if the person
downloads


ise together to
guarantee the durability and accessibility of an archive, saving a future
UbuWeb from ever having to worry about their ‘ISP pulling the plug’. As
supporters of many archives, as members of many communities, individuals can
use Dat Library to perform this function many times over.

On the Web, individuals are usually users or browsers — they use browsers. In
spite of the ostensible interactivity of the medium, users are kept at a
distance from the actual code, the infrastructure of a website, which is run
on a server. With a distributed protocol like Dat, applications such as
[Beaker Browser](https://beakerbrowser.com) or Dat Library eliminate the
central server, not by destroying it, but by distributing it across all of the
users. Individuals are then not _just_ users, but also hosts. What kind of
subject is this user-host, especially as compared to the user of the serve


ly written in a
[Git](https://en.wikipedia.org/wiki/Git)
[repository](https://en.wikipedia.org/wiki/Repository_\(version_control\)).
Git is a free and open-source tool for version control used in software
development. All the code for HyperReadings, Dat Library and their numerous
associated modules is managed openly using Git and hosted on GitHub under
open source licenses. In a real way, Git’s specification formally binds our
collaboration as well as the open invitation for others to participate


dat in Elbakyan 2016


uses to post on its own site. What Anderson does is point out that if
that information falls into the wrong hands, there are all sorts of terrible
things that can be done because those access codes provide access to personal
information, to student data, to all sorts of other things that could be badly
misused, so my question to you is what assurances can you give us that that
kind of information will not fall into the wrong hands.

**Elbakyan** : Well, first of all I doubt that it's possible t


dat in USDC 2015


uct names of other major global publishers (collectively with www.sci-hub.org the “Sci-Hub Website”). The sci-hub.org domain name is registered by
“Fundacion Private Whois,” located in Panama City, Panama, to an unknown registrant. As of
the date of this filing, the Sci-Hub Website is assigned the IP address 31.184.194.81. This IP address is part of a range of IP addresses assigned to Petersburg Internet Network Ltd., a web-hosting company located in Saint Petersburg, Russia.

6. Upon informa


g domain is registered by “Whois Privacy
Corp.,” located at Ocean Centre, Montagu Foreshore, East Bay Street, Nassau, New Providence,
Bahamas, to an unknown registrant. As of the date of this filing, libgen.org is assigned the IP address 93.174.95.71. This IP address is part of a range of IP addresses assigned to Ecatel Ltd., a web-hosting company located in Amsterdam, the Netherlands.

7. The Libgen Domains include “elibgen.or


and hospitals that purchase physical and electronic copies of Elsevier’s products and
access to Elsevier’s digital libraries. Elsevier distributes its scientific journal articles and book
chapters electronically via its proprietary subscription database “ScienceDirect”
(www.sciencedirect.com). In most cases, Elsevier holds the copyright and/or exclusive
distribution rights to the works available through ScienceDirect. In addition, Elsevier holds
trademark rights in “Elsevier,” “ScienceDirect,” and several other related trade names.
19. The ScienceDirect database is home to almost one-quarter of the world's peer-reviewed, full-text scientific, technical and medical content. The ScienceDirect service features
sophisticated search and retrieval tools for students and professionals which facilitates acces


, the user may connect remotely to the university’s
network using a proxy connection. Universities offer proxy connections to their students and
faculty so that those users may access university computing resources – including access to
research databases such as ScienceDirect – from remote locations which are unaffiliated with the
university. This practice facilitates the use of ScienceDirect by students and faculty while they
are at home, travelling, or otherwise off-campus.
Defendants’ Un


ect. Specifically, Defendants utilize their websites located at sci-hub.org and at the Libgen
Domains to operate an international network of piracy and copyright infringement by
circumventing legal and authorized means of access to the ScienceDirect database. Defendants’
piracy is supported by the persistent intrusion and unauthorized access to the computer networks
of Elsevier and its institutional subscribers, including universi


l, article or book identifier (such as a Digital
Object Identifier, PubMed Identifier, or the source URL).
31. When a user performs a keyword search on Sci-Hub, the website returns a proxied version of search results from the Google Scholar search database.1 When a user selects one of
the search results, if the requested content is not available from the Library Genesis Project, Sci-Hub unlawfully retrieves the content from ScienceDirect using the access previously obtained.
Sci-Hub then provides


to a user request, in addition to providing a copy of that article to that
user, Sci-Hub also provides a duplicate copy to the Library Genesis Project, which stores the
article in a database accessible through the Internet. Upon information and belief, the Library
Genesis Project is designed to be a permanent repository of this and other illegally obtained
content.
36. Upon information and belief, in the event that a Sci-Hub user r


dat in USDC 2015


lation

Arista, 604 F.3d 110, 117 (2d Cir. 2010) (quoting Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 361 (1991)).
Elsevier has made a substantial evidentiary showing,
documenting the manner in which the Defendants access its
ScienceDirect database of scientific literature and post
copyrighted material on their own websites free of charge.
According to Elsevier, the Defendants gain access to ScienceDirect by using credentials fraudulently obtained from educational institutions, includin


States that is used in a manner that affects interstate or foreign commerce or communication of the United States." § (e)(2)(B); Nexans Wires S. A. v. Sa Inc., 166 F. App'x 559, 562 n. 5 (2d Cir. 2006).

Elsevier's ScienceDirect database is located on multiple servers throughout the world and is accessed by educational institutions and their students, and qualifies as a computer used in interstate commerce, and therefore as a protected computer under the CFAA. See Woltermann D


ave suffered over $5,000 in damage or loss, see Register.com, Inc. v. Verio, Inc., 356 F.3d 393, 439 (2d Cir. 2004), Elsevier has made the necessary showing since it documented between 2,000 and 8,500 of its articles being added to the LibGen database each day (Woltermann Dec. at 8, Exs. G & H) and because its articles carry purchase prices of between $19.95 and $41.95 each. Id. at 2; see Millennium TGA, Inc. v. Leon, No. 12 Civ. 1360, 2013 WL 5719079, at *10 (E.D.N.Y. Oct. 1


dat in Fuller 2016


e cards are further arranged between
coloured guide cards. As an alternative to tabbed cards, signal flags may be
used. Here, metal clips may be attached to the top end of the card so that they
stand out like guides. For use of the system in relation to dates of the
month, the card is printed with the numbers 1 to 31 at the top. The metal clip
is placed as a signal to indicate the card is to receive attention on the
specified day. Within a large organisation a further card can be drawn up to
assign responsibility for processing that date’s cards. There were numerous
means of working the cards, special techniques for integrating them into any
type of research or organisation, means by which indexes operating on indexes
could open mines of information and expand the knowledge and capabilities of
mankind.

As he pressed me further, I began to experiment with such methods myself by
withdrawing data from the sanatorium’s records and transferring it to cards in
the night. The advantages of the system are overwhelming. Cards, cut to the
right mathematical degree of accuracy, arrayed readily in drawers, set in
cabinets of standard sizes that may


dat in Fuller & Dockray 2011


so on. I wonder, what is a kind of characteristic or
unusual reading behavior? For instance are there people who download the
entire list? Or do you see people being relatively selective? How does the
mania of the net, with this constant churning of data, map over to forms of
bibliomania?

**SD:** Well, in Aaaaarg it's again very specific. Anecdotally again, I have
heard from people how much they download and sometimes they're very selective,
they just see something that's interesting and download i


eginning was the highly partial and
subjective nature to the contents and that is something I would want to
preserve, which is why I never thought it to be particularly exciting to have
lots of high quality metadata - it doesn't have the publication date, it
doesn't have all the great metadata that say Amazon might provide. The system
is pretty dismal in that way, but I don't mind that so much. I read something
on the Internet which said it was like being in the porn section of a video
store with al


**MF:** You could also find that in different ways for instance with a pdf, a
pdf that was bought directly as an ebook that's digitally watermarked will
have traces of the purchaser coded in there. So then there's also this work of
stripping out that data which will become a new kind of labour. So it doesn't
have this kind of humanistic refrain, the actual hand, the touch of the
labour. This is perhaps more interesting, the work of the code that strips it
out, so it's also kind of recognising that co


dat in Giorgetta, Nicoletti & Adema 2015


uerillaOpenAccessManifesto/Goamjuly2008_djvu.txt):

> We need to take information, wherever it is stored, make our copies and
share them with the world. We need to take stuff that’s out of copyright and
add it to the archive. We need to buy secret databases and put them on the
Web. We need to download scientific journals and upload them to file sharing
networks. We need to fight for Guerilla Open Access. (Swartz 2008)

However, whatever form or vision of open access you prefer, I do not think it
is


dat in Graziano, Mars & Medak 2019


increasingly becoming central for teaching and research. In addition to their publishing business, Elsevier
has expanded its ‘research intelligence’ offering, which now encompasses a whole
range of digital services, including the Scopus citation database; Mendeley reference
manager; the research performance analytics tools SciVal and Research Metrics; the
centralized research management system Pure; the institutional repository and pub-

22 Vincent Larivière, Stefanie Haustein, and Philippe Mon


hly contested terrain—the very idea of a public good being under attack by
dominant managerial techniques that try to redefine it, driving what Randy Martin

23 Ben Williamson, ‘Number Crunching: Transforming Higher Education into “Performance Data”’,
Medium, 16 August 2018, https://medium.com/ussbriefs/number-crunching-transforming-highereducation-into-performance-data-9c23debc4cf7.
24 Max Chafkin, ‘Udacity’s Sebastian Thrun, Godfather Of Free Online Education, Changes Course’,
Fast


bus migrating online?
In the contemporary university, critical pedagogy is clashing head-on with the digitization of higher education. Education that should empower and research that should
emancipate are increasingly left out in the cold due to the data-driven marketization
of academia, short-cutting the goals of teaching and research to satisfy the fluctuating demands of labor market and financial speculation. Resistance against the capture of data, research workflows, and scholarship by means of digitization is a key
struggle for the future of mass intellectuality beyond exclusions of class, disability,
gender, and race.
What have we learned from #Syllabus as a media object?
As old formats tr


es for Housework: Pamphlets – Flyers – Photographs,’ MayDay Rooms, http://maydayrooms.org/
archives/wages-for-housework/wfhw-pamphlets-flyers-photographs/.
Williamson, Ben. ‘Number Crunching: Transforming Higher Education into “Performance Data”’,
Medium, 16 August 2018, https://medium.com/ussbriefs/number-crunching-transforming-highereducation-into-performance-data-9c23debc4cf7/.



dat in Hamerman 2015


iversal.

_[UbuWeb](http://www.ubuweb.com/resources/index.html)_ , founded in 1996 by
conceptual artist/ writer Kenneth Goldsmith, is the largest online archive of
avant-garde art resources. Its holdings include sound, video and text-based
works dating from the historical avant-garde era to today. While many of the
sites in the “pirate library” continuum source their content through
community-based or peer-to-peer models, UbuWeb focuses on making available out
of print, obscure or difficult


dat in Kelty, Bodo & Allen 2018


brary, is now entangling
print and digital in novel ways. And, as he warns, the terrain
of antagonism is shifting. While for-profit publishers are
seemingly conceding to Guerrilla Open Access, they are
opening new territories: platforms centralizing data, metrics
and workflows, subsuming academic autonomy into new
processes of value extraction.
The 2010s brought us hope and then the realization of how little
digital networks could help revolutionary movements. The
redistribution toward the wealthy, assiste


itutions of solidarity. The embrace of privilege—
marked by misogyny, racism and xenophobia—this has catalyzed
is nowhere more evident than in the climate denialism of the
Trump administration. Guerrilla archiving of US government
climate change datasets, as recounted by Laurie Allen,
indicates that more technological innovation simply won't do
away with the 'post-truth' and that our institutions might be in
need of revision, replacement and repair.
As the contributions to this pamphlet indicate


editor Ken Wissoker were
enthusiastically accommodating of my demands to make the book freely and openly
available. They also played along with my desire to release the 'source code' of the
book (i.e. HTML files of the chapters), and to compare the data on readers of the
open version to print customers. It was a moment of exploration for both scholarly
presses and for me. At the time, few authors were doing this other than Yochai Benkler
(2007) and Cory Doctorow2, both activists and advocates for f


ternet at its outset in the
1980s, such as gopher, WAIS, and the HTML of CERN, was conducted in the name
of the digital transformation of the library. But by 2007, these aims were swamped
by attempts to transform the Internet into a giant factory of data extraction. Even
in 2006-7 it was clear that this unfinished business of digitizing the scholarly record
was going to become a problem—both because it was being overshadowed by other
concerns, and because of the danger it would eventually be subje


ng 'life
cycles' or 'pipeline' of research, not just its dissemination.

Metrics
More than anything, OA is promoted as a way to continue
to feed the metrics God. OA means more citations, more
easily computable data, and more visible uses and re-uses of
publications (as well as 'open data' itself, when conceived of
as product and not measure). The innovations in the world
of metrics—from the quiet expansion of the platforms of the
publishers, to the invention of 'alt metrics', to the enthusiasm
of 'open science' for metrics-driven


publishers are happy to let go of access control and copyright,
it means that they’ve found something that is even more profitable than selling
back to us academics the content that we have produced. And this more profitable
something is of course data. Did you notice where all the investment in academic
publishing went in the last decade? Did you notice SSRN, Mendeley, Academia.edu,
ScienceDirect, research platforms, citation software, manuscript repositories, library
systems being bought up by t


s
and technologies operate on and support open access content, while they generate
data on the creation, distribution, and use of knowledge; on individuals, researchers,
students, and faculty; on institutions, departments, and programs. They produce data
on the performance, on the success and the failure of the whole domain of research
and education. This is the data that is being privatized, enclosed, packaged, and sold
back to us.

Drip, drip, drop, it's only nostalgia. My heart is light, as I don’t have to worry about
gutting the library. Soon it won’t matter at all.

Taylorism reached academia. In the nam


of efficiency, austerity, and transparency,
our daily activities are measured, profiled, packaged, and sold to the highest bidder.
But in this process of quantification, knowledge on ourselves is lost for us, unless we
pay. We still have some patchy datasets on what we do, on who we are, we still have
this blurred reflection in the data-mirrors that we still do control. But this path of
self-enlightenment is quickly waning as less and less data sources about us are freely
available to us.

Who is downloading books and articles? Everyone. Radical open access? We won,
if you like.

I strongly believe that information on the self is the foundation
of self-determination. We need to have data on how we operate,
on what we do in order to know who we are. This is what is being
privatized away from the academic community, this is being
taken away from us.
Radical open access. Not of content, but of the data about
ourselves. This is the next challenge. We will digitize every page,
by hand if we must, that process cannot be stopped anymore.
No outside power can stop it and take that from us. Drip, drip,
drop, this is what I console myself with, as another handful of
books land among the waste.
But the data we lose now will not be so easy to reclaim.


What if We Aren't the Only Guerrillas Out There?

Laurie Allen

My goal in this paper is to tell the story
of a grass-roots project called Data
Refuge (http://www.datarefuge.org)
that I helped to co-found shortly after,
and in response to, the Trump election
in the USA. Trump’s reputation as
anti-science, and the promise that his
administration would elevate people into
positions of power with a track record
of distorting, hiding, or obscuring the
scientific evidence of climate change
caused widespread concern that
valuable federal data was now in danger.
The Data Refuge project grew from the
work of Professor Bethany Wiggin and
the graduate students within the Penn
Program in Environmental Humanities
(PPEH), notably Patricia Kim, and was
formed in collaboration with the Penn
Libraries, where I work. In this paper, I
will discuss the Data Refuge project, and
call attention to a few of the challenges
inherent in the effort, especially as
they overlap with the goals of this
collective. I am not a scholar. Instead,
I am a librarian, and my perspective as
a practicing information professional
informs the way I approach this paper,
which weaves together the practical
and technical work of ‘saving data’ with
the theoretical, systemic, and ethical
issues that frame and inform what we
have done.

I work as the head of a relatively small and new department within the libraries
of the University of Pennsylvania, in the city of Philadelphia, Pennsylv


was hired to lead the Digital Scholarship department in the spring of 2016,
and most of the seven (soon to be eight) people within Digital Scholarship joined
the library since then in newly created positions. Our group includes a mapping
and spatial data librarian and three people focused explicitly on supporting the
creation of new Digital Humanities scholarship. There are also two people in the
department who provide services connected with digital scholarly open access
publishing, including the maintenance of the Penn Libraries’ repository of open
access scholarship, and one Data Curation and Management Librarian. This
Data Librarian, Margaret Janz, started working with us in September 2016, and
features heavily into the story I’m about to tell about our work helping to build Data
Refuge. While Margaret and I were the main people in our department involved in
the project, it is useful to understand the work we did as connected more broadly
to the intersection of activities—from multimodal, digital, humanities creation to
open access publishing across disciplines—represented in our department in Penn.
At the start of Data Refuge, Professor Wiggin and her students had already been
exploring the ways that data about the environment can empower communities
through their art, activism, and research, especially along the lower Schuylkill
River in Philadelphia. They were especially attuned to the ways that missing data,
or data that is not collected or communicated, can be a source of disempowerment.
After the Trump election, PPEH graduate students raised the concern that the
political commitments of the new administration would result in the disappearance
of environmental and climate data that is vital to work in cities and communities
around the world. When they raised this concern with the library, together we co-founded Data Refuge. It is notable to point out that, while the Penn Libraries is a
large and relatively well-resourced research library in the United States, it did not
have any automatic way to ingest and steward the data that Professor Wiggin and
her students were concerned about. Our system of acquiring, storing, describing
and sharing publications did not account for, and could not easily handle, the
evident need to take in large quantities of public data from the open web and make
them available and citable by future scholars. Indeed, no large research library
was positioned to respond to this problem in a systematic way, though there was
general agreement that the community would like to help.
The collaborative, grass-roots movement that formed Data Refuge included many
librarians, archivists, and information professionals, but it was clear from the
beginning that my own profession did not have in place a system for stewarding
these vital information resources, or for treating them as ‘public


documents librarians, our project
joined efforts that were ongoing in a huge range of communities, including: open
data and open science activists; archival experts working on methods of preserving
born-digital content; cultural historians; federal data producers and the archivists
and data scientists they work with; and, of course, scientists.

This distributed approach to the work of downloading and saving the data
encouraged people to see how they were invested in environmental and scientific
data, and to consider how our government records should be considered the
property of all of us. Attending Data Rescue events was a way for people who value
the scientific record to fight back, in a concrete way, against
an anti-fact establishment. By downloading data and moving
it into the Internet Archive and the Data Refuge repository,
volunteers were actively claiming the importance of accurate
records in maintaining or creating a just society.

Of course, access to data need not rely on its inclusion in
a particular repository. As is demonstrated so well in other
contexts, technological methods of sharing files can make
the digital repositories of libraries and archives seem like a
redundant holdover from the past. However, as I will argue
further in this paper, the data that was at risk in Data Refuge
differed in important ways from the contents of what Bodó
refers to as ‘shadow libraries’ (Bodó 2015). For opening
access to copies of journal articles, shadow libraries work
perfectly. However, the value of these shadow libraries relies
on the existence of the widely agreed upon trusted versions.
If in doubt about whether a copy is trustworthy, scholars
can turn to more mainstream copies, if necessary. This was
not the situation we faced building Data Refuge. Instead, we
were often dealing with the sole public, authoritative copy
of a federal dataset and had to assume that, if it were taken
down, there would be no way to check the authenticity of
other copies. The data was not easily pulled out of systems
as the data and the software that contained them were often
inextricably linked. We were dealing with unique, tremendously
valuable, but often difficult-to-untangle datasets rather than
neatly packaged publications. The workflow we established
was designed to privilege authenticity and trustworthiness
over either the speed of the copying or the easy usability of
the resulting data. 2 This extra care around authenticity was
necessary because of the politicized nature of environmental
data that made many people so worried about its removal
after the election. It was important that our project
supported the strongest possible scientific arguments that
could be made with the data we were ‘saving’. That meant
that our copies of the data needed to be citable in scientific
scholarly papers, and that those citations needed to be
able to withstand hostile political forces who claim that the
science of human-caused climate change is ‘uncertain’. It

Born from the collaboration between Environmental Humanists and Librarians,
Data Refuge was always an effort both at storytelling and at storing data. During
the first six months of 2017, volunteers across the US (and elsewhere) organized
more than 50 Data Rescue events, with participants numbering in the thousands.
At each event, a group of volunteers used tools created by our collaborators at
the Environmental and Data Governance Initiative (EDGI) (https://envirodatagov.org/)
to support the End of Term Harvest (http://eotarchive.cdlib.org/) project
by identifying seeds from federal websites for web archiving in the Internet
Archive. Simultaneously, more technically advanced volunteers wrote scripts to
pull data out of complex data systems, and packaged that data for longer term
storage in a repository we maintained at datarefuge.org. Still other volunteers
held teach-ins, built profiles of data storytellers, and otherwise engaged in
safeguarding environmental and climate data through community action (see
http://www.ppehlab.org/datarefugepaths). The repository at datarefuge.org that
houses the more difficult data sources has been stewarded by myself and Margaret
Janz through our work at Penn Libraries, but it exists outside the library’s main
technical infrastructure.1

was easy to imagine in the Autumn of 2016, and even easier
to imagine now, that hostile actors might wish to muddy the
science of climate change by releasing fake data designed
to cast doubt on the science of climate change. For that
reason, I believe that the unique facts we were seeking
to safeguard in the Data Refuge bear less similarity to the
contents of shadow libraries than they do to news reports
in our current distributed and destabilized mass media
environment. Referring to the ease of publishing ideas on the
open web, Zeynep Tufecki wrote in a rec


ssian bots? Was it maybe even
generated with the help of artificial intelligence? (Yes, there
are systems that can create increasingly convincing fake
videos.)” (Tufekci 2018). This was the state we were trying to
avoid when it comes to scientific data, fearing that we might
have the only copy of a given dataset without solid proof that
our copy matched the original.
If US federal websites cease functioning as reliable stewards
of trustworthy scientific data, reproducing their data
without a new model of quality control risks producing the
very censorship that our efforts are supposed to avoid,
and further undermining faith in science. Said another way,
if volunteers duplicated federal data all over the Internet
without a trusted system for ensuring the authenticity of
that data, then as soon as the originals were removed, a sea of
fake copies could easily render the original invisible, and they
would be just as effectively censored. “The most effective
forms of censorship today involve meddling with trust and
attention, not muzzling speech itself.” (Tufekci 2018).
These concerns about the risks of open access to data should
not be understood as capitulation to the current market-driven approach to scholarly publishing, nor as a call for
continuation of the status quo. Instead, I hope to encourage
continuation of the creative approaches to scholarship
represented


len

Data Refuge will serve as a call to take greater responsibility for the systems into
which scholarship flows and the structures of power and assumptions of trust (by
whom, of whom) that scholarship relies on.
While plenty of participants in the Data Refuge community posited scalable
technological approaches to help people trust data, none emerged that were
strong enough to risk further undermining faith in science that a malicious attack
might cause. Instead of focusing on technical solutions that rely on the existing
systems staying roughly as they are, I would like to focus o


rmation
landscape has proved to be tremendously harmful for the dissemination of facts,
and has been especially dangerous to marginalized communities (Noble 2018).
While the world of scholarly humanities publishing is doing somewhat better than
open data or mass media, there is still a risk that without new forms of filtering and
establishing quality and trustworthiness, good ideas and important scholarship will
be lost in the rankings of search engines and the algorithms of social media. We
need ne


t we’re getting is true.” (boyd 2018)
In closing, I’ll return to the notion of Guerrilla warfare that brought this panel
together. While some of our collaborators and some in the press did use the term
‘Guerrilla archiving’ to describe the data rescue efforts (Currie and Paris 2017),
I generally did not. The work we did was indeed designed to take advantage of
tactics that allow a small number of actors to resist giant state power. However,



titutions where many of us work and by communities of scholars and
activists who make up these institutions. It was designed to get as many people as
possible working to address the complex issues raised by the two interconnected
challenges that the Data Refuge project threw into relief. The first challenge,
of course, is the need for new scientific, artistic, scholarly and narrative ways of
contending with the reality of global, human-made climate change. And the second
challenge, as I’ve argued


repair, and perhaps in some cases, replacement. And
this work will rely on scholars, as well as expert information practitioners from a
range of fields (Caswell 2016).

¹ At the time of this writing, we are working
on un-packing and repackaging the data
within Data Refuge for eventual inclusion
in various Research Library Repositories.

Ideally, of course, all federally produced
datasets would be published in neatly
packaged and more easily preservable
containers, along with enough technical
checks to ensure their validity (hashes,
checksums, etc.) and each agency would
create a periodical published inventory of
datasets. But the situation we encountered
with Data Refuge did not start us in
anything like that situation, despite the
hugely successful and important work of
the employees who created and maintained
data.gov. For a fuller view of this workflow,
see my talk at CSVConf 2017 (Allen 2017).
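
As a minimal illustration of the kind of technical check mentioned here, the sketch below computes a SHA-256 checksum for a downloaded file so that later copies can be compared against the recorded digest. It uses only Node's built-in modules and is not the workflow Data Refuge actually used; the file name is hypothetical.

```typescript
// Record a SHA-256 digest for a dataset file; any faithful copy of the file
// will produce the same digest, so the value can travel with the data.
import { createHash } from "crypto";
import { createReadStream } from "fs";

function sha256(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")))
      .on("error", reject);
  });
}

// Usage: store the digest alongside the dataset's descriptive metadata.
sha256("example-dataset.csv").then((digest) => {
  console.log(`sha256  ${digest}  example-dataset.csv`);
});
```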

2 Closing note: The workflow established and used at Data Rescue events was
designed to tackle this set of difficult issues, but needed refinement, and was retired
in mid-2017. The Data Refuge project continues, led by Professor Wiggin and her
colleagues and students at PPEH, who are “building a storybank to document
how data lives in the world – and how it connects people, places, and non-human
species.” (“DataRefuge” n.d.) In addition, the set of issues raised by Data Refuge
continue to inform my work and the work of many of our collaborators.


References
Allen, Laurie. 2017. “Contexts and Institutions.” Paper presented at csv,conf,v3, P


ries in the Post - Scarcity Era.” In Copyrighting Creativity:
Creative Values, Cultural Heritage Institutions and Systems of Intellectual Property,
edited by Porsdam. Routledge.
boyd, danah. 2018. “You Think You Want Media Literacy… Do You?” Data & Society: Points.
March 9, 2018. https://points.datasociety.net/you-think-you-want-media-literacy-doyou-7cad6af18ec2.
Caswell, Michelle. 2016. “‘The Archive’ Is Not an Archives: On Acknowledging the
Intellectual Contributions of Archival Stud


g/datarefuge/.
“DataRescue Paths.” n.d. PPEH Lab. Accessed May 20, 2018.
http://www.ppehlab.org/datarefugepaths/.
“End of Term Web Archive: U.S. Government Websites.” n.d. Accessed May 20, 2018.
http://eotarchive.cdlib.org/.
“Environmental Data and Governance Initiative.” n.d. EDGI. Accessed May 19, 2018.
https://envirodatagov.org/.
Laster, Shari. 2016. “After the Election: Libraries, Librarians, and the Government - Free
Government Information (FGI).” Free Government Information (FG


es Reinforce
Racism. New York: NYU Press.
Tufekci, Zeynep. 2018. “It’s the (Democracy-Poisoning) Golden Age of Free Speech.”
WIRED. Accessed May 20, 2018.
https://www.wired.com/story/free-speech-issue-tech-turmoil-new-censorship/.
“Welcome - Data Refuge.” n.d. Accessed May 20, 2018. https://www.datarefuge.org/.
Williams, Stacie M, and Jarrett Drake. 2017. “Power to the People: Documenting Police
Violence in Cleveland.” Journal of Critical Library and Information Studies 1 (2).
https://


dat in Liang 2012


blurred, the very idea of a
library is up for grabs. It has taken me well over two decades to build a
collection of a few thousand books while around two hundred thousand books
exist as bits and bytes on my computer. Admittedly hard drives crash and data
is lost, but is that the same threat as those of rain or fire? Which then is
my library and which its shadow? Or in the spirit of logotopias would it be
more appropriate to ask the spatial question: where is the library?

If the possibility of havin


dat in Marczewska, Adema, McDonald & Trettien 2018


hat the open availability of research content has
been an important material condition for scholars and publishers to explore new
formats and new forms of interaction around publications. In order to remix and
re-use content, do large scale text and data-mining, experiment with open peer
review and emerging genres such as living books, wiki-publications, versionings and
multimodal adaptations, both the scholarly materials and platforms that lie at the
basis of these publishing gestures strongly bene


ess as just another profitable business model, retaining
and further exploiting existing relations instead of disrupting
them; of how new commercial intermediaries and gatekeepers
parasitical on open forms of communication are mining
and selling the data around our content to further their
own pockets—e.g. commercial SSRNs such as Academia.
edu and ResearchGate. In addition to all this, open access
can do very little to further experimentation if it is met by
a strong conservatism from scholars, t


dat in Mars & Medak 2019




Against innovation

Consider Elsevier, the largest scholarly publisher, whose 37% profit margin stands
in sharp contrast to the rising fees, expanding student loan debt and poverty-level
wages for adjunct faculty. Elsevier owns some of the largest databases of academic
material, which are licensed at prices so scandalously high that even Harvard, the
richest university of the global north, has complained that it cannot afford them
any longer. (Custodians.online, 2015: n.p.)

The enormous profits a


the university and the library.
Where is their equalizing capacity in a historical conjuncture marked by the
rising levels of inequality? In the accelerating ‘race against the machine’
(Brynjolfsson and McAfee, 2012), with the advances in big data, AI and
robotization threatening to obliterate almost half of the jobs in advanced
economies (Frey and Osborne, 2013; McKinsey Global Institute, 2018), the
university is no longer able to fulfill the promise that it can provide both the
breadth and


As
Gary Hall reports in his ‘Uberfication of the university’ (2016), a survey of UK vice-chancellors has detected a number of areas where universities under their
command should become more disruptively innovative:
Among them are “uses of student data analytics for personalized services” (the
number one innovation priority for 90 percent of vice-chancellors); “uses of
technology to transform learning experiences” (massive open online courses
[MOOCs]; mobile virtual learning environments [VL


ir public funders leave them in their underfunded torpor to
improvise their way through education and research processes. It is these
institutions that depend the most on the Library Genesis and Science Hubs of
this world. If we look at the download data of Library Genesis, as Balázs Bodó
(2015) has, we can discern a clear pattern: users in the rich economies use
these shadow libraries to find publications that are not available in the digital
form or are pay-walled, while the users in the


dat in Mars & Medak 2019


to the governance of access to MIT’s
own resources, it is well known that anyone who is registered and
connected to the “open campus” wireless network, either by being
physically present or via VPN, can search JSTOR, Google Scholar,
and other databases in order to access otherwise paywalled journals from major publishers such as Reed Elsevier, Wiley-Blackwell,
Springer, Taylor and Francis, or Sage.
The MIT Press has also published numerous books that we love
and without which we would have


platform for accessing academic journals, Sci-hub. A
voluntary and noncommercial project of anonymous scientists
mostly from Eastern Europe, Sci-hub provides, as of the end of 2015,
access to more than 41 million academic articles either stored
in its database or retrieved through bypassing the paywalls of
academic publishers. The only person explicitly named in Elsevier’s
lawsuit was Sci-hub’s founder Alexandra Elbakyan, who minced no
words: “When I was working on my research project, I found


shing models, workflows, and metrics, radicalizing the work of conventional open access, which has by now increasingly
become recuperated by big for-­profit publishers, who see in open access an
opportunity to assume control over the economy of data in academia.
Some established academic publishers, too, have been open to experiments
that go beyond mere open access and are trying to redesign how academic
writing is produced, made accessible, and valorized. This essay has the good
fortune of app


dat in Mattern 2014


[Image](https://placesjournal.org/wp-content/uploads/2014/06/mattern-library-infrastructure-1x.jpg): Left: Rijksmuseum Library, Amsterdam. [Photo by [Ton Nolles](https://www.flickr.com/photos/tonnolles/9428619486/)] Right: Google data center in Council Bluffs, Iowa. [Photo by Google/Connie Zhou]

Melvil Dewey was a one-man Silicon Valley born a century before Steve Jobs. He
was the quintessential Industrial Age entrepreneur, but unlike the Carnegies
and Rockefellers, with their i


ed them, preserved
them and made them accessible (or not) to patrons. But the [forms of those
resources](http://www.spl.org/prebuilt/cen_conceptbook/page16.htm) have
changed — from scrolls and codices; to LPs and LaserDiscs; to e-books,
electronic databases and open data sets. Libraries have had at least to
comprehend, if not become a key node within, evolving systems of media
production and distribution. Consider the medieval scriptoria where
manuscripts were produced; the evolution of the publishing industry and b


well-connected they are, [they
actually _don’t_ have the world at their
fingertips](https://placesjournal.org/article/marginalia-little-libraries-in-
the-urban-margins/) — that “material protected by stringent copyright and held
in proprietary databases is often inaccessible outside libraries” and that,
“as digital rights management becomes ever more complicated, we … rely even
more on our libraries to help us navigate an increasingly fractured and
litigious digital terrain.” 21 And th


experts in “copyright
compliance, licensing, privacy, information use, and ethics”; gurus of
“aligning … programs with collections, space, and resources”; skilled creators
of “custom ontologies, vocabularies, taxonomies” and structured data; adept
practitioners of data mining. 28 Others recommend that libraries get into the
content production business. In the face of increasing pressure to rent and
license proprietary digital content with stringent use policies, why don’t
libraries do more to promote the creatio


al,
not an individualistic or entrepreneurial zero-sum game to be won by the most
industrious. 32

Libraries, she argued, “will always be at a disadvantage” to Google and Amazon
because they value privacy; they refuse to exploit users’ private data to
improve the search experience. Yet libraries’ failure to compete in
_efficiency_ is what affords them the opportunity to offer a “different kind
of social reality.” I’d venture that there _is_ room for entrepreneurial
learning in the libr



###### Author's Note

I’d like to thank the students in my “Archives, Libraries and Databases”
seminar and my “Digital Archives” studio at The New School, who’ve given me
much food for thought over the years. Thanks, too, to my colleagues at the
[Architectural League of New York](http://archleague.org/) and the [Center for
an Ur


dat in Mattern 2018


he un-affiliated readers, equally interested and invested in
decolonization, who had no academic librarians to serve as their liaisons.

I’ve found myself standing before similar gates in similar provinces of
paradox: the scholarly book on “open data” that sells for well over $100; the
conference on democratizing the “smart city,” where tickets sell for ten times
as much. Librarian Ruth Tillman was [struck with “acute irony
poisoning”](https://twitter.com/ruthbrarian/status/9327011528


n-profit, university-supported, open-access
venue for public scholarship on landscape, architecture, urbanism. After
having written thirteen (fifteen by Fall 2017) long-form pieces for  _Places_
since 2012, I’ve effectively assumed their “urban data and mediated spaces”
beat. I work with paid, professional editors who care not only about subject
matter – they’re just as much domain experts as any academic peer reviewer
I’ve encountered – but also about clarity and style and visual pre


dat in Mars & Medak 2017


e collective open
letter ‘In solidarity with Library Genesis and Sci-Hub’ (Custodians.online, 2015),
five for-profit publishers (Elsevier, Springer, Wiley-Blackwell, Taylor & Francis
and Sage) own more than half of all existing databases of academic material, which
are licensed at prices so scandalously high that even Harvard, the richest university
of the Global North, has complained that it cannot afford them any longer. Robert
Darnton, the past director of Harvard Library, s


e resulting vacuum did not last for long, as the Library.nu repository got
merged into the holdings of Library Genesis. Building on the legacy of Soviet
scholars who devised ways of shadow production and distribution of
knowledge in the form of samizdat and early digital distribution of texts in the
post-Soviet period (Balázs, 2014), Library Genesis has built a robust infrastructure
with the mission to provide access to the largest online library in existence while
keeping a low profile. At this mo


cient and
engaging way of politicization. The naïve and oft overused claim – particularly
during the Californian nineties – of the revolutionary potential of emerging digital
networks turned out to be a good candidate for replacement by a story dating back
two centuries earlier – the story of the emergence of public libraries in the early days
of the French bourgeois revolution in the 19th century.
The seizure of book collections from the Church and the aristocracy in the
course of revolutions ca


dat in Medak, Mars & WHW 2015


itutions in crisis.
Library Genesis (see http://libgen.org/) is an online repository with over
a million books and is the first project in history to
offer everyone on the Internet free download of its
entire book collection (as of this writing, about fifteen terabytes of data), together with all the
metadata (MySQL dump) and the PHP/HTML/JavaScript code for its webpages.
The most popular earlier repositories, such as Gigapedia (later Library.nu), handled
thei


, or legal digests in card form).
Even the idea of the encyclopedia has taken this
form (Nelson’s Perpetual Cyclopedia [6]).
Theoretically and technically, we now have in
the Repertory a new instrument for analytically or
monographically recording data, ideas, information. The system has been improved by divisionary cards of various shapes and colours, placed in
such a way that they express externally the outline
of the classification being used and reduce search
time to a minimum. It has been imp


ke repertories of objects, persons,
phenomena; and documentary repertories of files
made up of written or printed materials of all kinds.
The possibility can be envisaged of encyclopedic
repertories in which are registered and integrated
the diverse data of a scientific field and which draw
for this purpose on materials published in periodicals. Let each article, each report, each item of news
henceforth carry a classification number and, automatically, by clipping, encyclopedias on cards can



se
repertories: bibliographic repertories; repertories of
documentary dossiers gathering pamphlets and extracts together by subject; catalogues; chronological
repertories of facts or alphabetical ones of names;
encyclopedic repertories of scientific data, of laws,
of patents, of physical and technical constants, of
statistics, etc. All of these repertories will be set up
according to the method described above and arranged by the same universal classification. As soon
as an organisation to contain t


reau. Hollerith, his invention and his business connections lie at the roots of the
present IBM company. The equipment and its uses in the
census from 1890 to 1910 are briefly described in John H.
Blodgett and Claire K. Schultz, “Herman Hollerith: Data
Processing Pioneer,” American Documentation 20 (1969):
221-226. As they observe, suggesting the accuracy of Otlet’s
extrapolation, “his was not simply a calculating machine,
it performed selective sorting, an operation basic to all information


he circulation of many kinds of what Hito Steyerl
calls the poor image. Often low in resolution, these
détourned materials circulated thanks both to the
compression of information and to the
addition of information. There might be less data
but there’s added metadata, or data about data, enabling its movement.
Needless to say the old culture industries went
into something of a panic about all this. As I wrote
over ten years ago in A Hacker Manifesto, “information wants to be free but is everywhere in chains.”
It is one of the q


he old culture industries but what I call the vulture
industries. Their strategy was not to try to stop the
flow of free information but rather to see it as an
environment to be leveraged in the service of creating a new kind of business. “Let the data roam free!”
says the vulture industry (while quietly guarding
their own patents and trademarks). What they aim
to control is the metadata.
It’s a new kind of exploitation, one based on an
unequal exchange of information. You can have the
little scraps of détournement that you desire, in exchange for performing a whole lot of free labor—and
giving up all of the metadata. So you get your little
bit of data; they get all of it, and more importantly,
any information about that information, such as
the where and when and what of it.


It is an interesting feature of this mode of exploitation that you might not even be getting paid for


uch the actions of the working class
to which the ruling class had to respond in this case,
as what I call the hacker class. They had to recuperate a whole social movement, and they did. So our
tactics have to change.
In the past we were acting like data-punks. Not
so much “here’s three chords, now form your band.”
More like: “Here’s three gigs, now go form your autonomous art collective.” The new tactic might be
more a question of being metadata-punks. On the one
hand, it is about freeing


ble in that case.
It takes matters off the internet and out of circulation among strangers. Ask me about it in person if
we meet in person.
The other two are Monoskop Log and UbuWeb.
It is hard to know what to call them. They are websites, archives, databases, collections, repositories,
but they are also a bit more than that. They could be
thought of also as the work of artists or of curators;
of publishers or of writers; of archivists or researchers. They contain lots of files. Monoskop is mostly
b


was a sufficient answer to that
question in the era of the culture industries, they try
to formulate, in their modest way, a suitable tactic
for answering the property question in the era of
the vulture industries.
This takes the form of moving from data to metadata, expressed in the form of the move from writing
to publishing, from art-making to curating, from
research to archiving. Another way of thinking this,
suggested by Hiroki Azuma, would be the move from
narrative to database. The object of critical attention
acquires a third dimension, a kind of informational
depth. The objects before us are not just a text or an
image but databases of potential texts and images,
with metadata attached.


The object of any avant-garde is always to practice the relation between aesthetics and everyday
life with a new kind of intensity. UbuWeb and
Monoskop seem to me to b


f European avant-gardes and media art.
If we take the index as a formalization of cross-referential relations between names of people, titles
of works and concepts that exist in the books and
across the books, what emerges is a model of a relational database reflecting the rich mesh of cultural
networks. Each book can serve as an index linking
its text to people, other books, segments in them.
To provide a paradigmatic demonstration of that
idea, Monoskop.org has assembled an index of all
persons in
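
A toy sketch may help to make this relational reading of the index concrete: assuming invented book titles and person names (not Monoskop's actual data), the per-book indexes below are inverted into a single cross-book index linking each person to the books that reference them.

```python
# A toy sketch of the index as a relational model: invented per-book person
# indexes are inverted into one cross-book index (person -> books).
from collections import defaultdict

book_indexes = {
    "Book A": ["Paul Otlet", "Melvil Dewey"],
    "Book B": ["Paul Otlet", "Herman Hollerith"],
    "Book C": ["Melvil Dewey"],
}

person_index = defaultdict(list)
for book, persons in book_indexes.items():
    for person in persons:
        person_index[person].append(book)

print(dict(person_index))
# {'Paul Otlet': ['Book A', 'Book B'], 'Melvil Dewey': ['Book A', 'Book C'],
#  'Herman Hollerith': ['Book B']}
```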


sent
into something radically uncertain. The efforts of
Monoskop.org in digitizing of the artifacts of the
20th century avant-gardes and playing with the
epistemic tools of early book culture is a parallel
gesture, with a technological twist. If big data and
the control over information flows of today increasingly naturalizes and re-affirms the 19th century
positivist assumptions of the steerability of society,
then the endlessly recombinant relations and affiliations between cultural objects threate


y under a double attack. One unleashed by
the dismantling of the institutionalized forms of
social redistribution and solidarity. The other by
the commodifying forces of expanding copyright
protections and digital rights management, control
over the data flows and command over the classification and order of information. In a world of
collapsing planetary boundaries and unequal development, those who control the epistemic order


control the future.08 The Googles and the NSAs ru


dat in Medak, Sekulic & Mertens 2014


, typeface and quality of print - and there aren't that many OCR tools
that are good at it. There is, however, a relatively good free software solution - Tesseract
(http://code.google.com/p/tesseract-ocr/) - that has solid performance, good language data and can
be trained for even better performance, although it has its problems. Proprietary solutions (e.g.
ABBYY FineReader) sometimes provide superior results.
Tesseract primarily supports .tiff files as input. It produces a plain text file


under 'Output'. Save the project.
IV. Optical character recognition & V. Creating a finalized e-book file
If using all free software:
1) open gscan2pdf (if not already installed on your machine, install gscan2pdf from the
repositories, Tesseract and data for your language from https://code.google.com/p/tesseract-ocr/)
- point gscan2pdf to open your .tiff files
- for Optical Character Recognition, select 'OCR' under the drop down menu 'Tools',
select the Tesseract engine and your language, start the
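
For readers who would rather script this OCR step than click through gscan2pdf, a minimal sketch follows, assuming Tesseract, Pillow and the pytesseract wrapper are installed and that the .tiff scans sit in a placeholder directory called scans/ (the wrapper is not part of the workflow described above).

```python
# Minimal sketch: OCR every .tiff page with Tesseract via the pytesseract
# wrapper and join the results into one plain-text string.
import glob

from PIL import Image        # pip install pillow
import pytesseract           # pip install pytesseract (needs the tesseract binary)

def ocr_scans(tiff_dir="scans", lang="eng"):
    """Run Tesseract over every .tiff page in tiff_dir and return plain text."""
    pages = []
    for path in sorted(glob.glob(f"{tiff_dir}/*.tiff")):
        image = Image.open(path)
        # image_to_string returns the recognized plain text for one page
        pages.append(pytesseract.image_to_string(image, lang=lang))
    return "\n\n".join(pages)

if __name__ == "__main__":
    print(ocr_scans())
```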


dat in Murtaugh 2016


)

[Michael Murtaugh](/wiki/index.php?title=Michael_Murtaugh "Michael Murtaugh")

In text indexing and other machine reading applications the term "bag of
words" is frequently used to underscore how processing algorithms often
represent text using a data structure (word histograms or weighted vectors)
where the original order of the words in sentence form is stripped away. While
"bag of words" might well serve as a cautionary reminder to programmers of the
essential violence perpetrated to a text an


way. The resulting representation is then
a collection of each unique word used in the text, typically weighted by the
number of times the word occurs.

Bags of words, also known as word histograms or weighted term vectors, are a
standard part of the data engineer's toolkit. But why such a drastic
transformation? The utility of "bag of words" is in how it makes text amenable
to code, first in that it's very straightforward to implement the translation
from a text document to a bag of words representa
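
A minimal sketch of that translation, using a plain Python word histogram (the sample sentence is invented for illustration):

```python
# Bag of words as a word histogram: word order is discarded, each unique
# word is weighted by the number of times it occurs.
from collections import Counter
import re

def bag_of_words(text):
    words = re.findall(r"[a-z']+", text.lower())   # crude tokenization
    return Counter(words)

print(bag_of_words("the library is not the database, the database is not the library"))
# Counter({'the': 4, 'library': 2, 'is': 2, 'not': 2, 'database': 2})
```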


cation for reasons of safety, commercial
telegraphy extended this network of communication to include those parties
coordinating the "raw materials" being mined, grown, or otherwise extracted
from overseas sources and shipped back for sale.

## "Raw data now!"

From [La ville intelligente - Ville de la connaissance](/wiki/index.php?title
=La_ville_intelligente_-_Ville_de_la_connaissance "La ville intelligente -
Ville de la connaissance"):

Given that the new modernist forms and the use


ndex.php?title
=The_Smart_City_-_City_of_Knowledge "The Smart City - City of Knowledge"):

As new modernist forms and use of materials propagated the abundance of
decorative elements, Otlet believed in the possibility of language as a model
of '[raw data](/wiki/index.php?title=Bag_of_words "Bag of words")', reducing
it to essential information and unambiguous facts, while removing all
inefficient assets of ambiguity or subjectivity.


> Tim Berners-Lee: [...] Make a beautiful website, but first give us the
unadulterated data, we want the data. We want unadulterated data. OK, we have
to ask for raw data now. And I'm going to ask you to practice that, OK? Can
you say "raw"?

>

> Audience: Raw.

>

> Tim Berners-Lee: Can you say "data"?

>

> Audience: Data.

>

> TBL: Can you say "now"?

>

> Audience: Now!

>

> TBL: Alright, "raw data now"!

>

> [...]

>

> So, we're at the stage now where we have to do this -- the people who think
it's a great idea. And all the people -- and I think there's a lot of people
at TED who do things because -- even though there's not an immediate return on
the investment because it will only really pay off when everybody else has
done it -- they'll do it because they're the sort of person who just does
things which would be good if everybody else did them. OK, so it's called
linked data. I want you to make it. I want you to demand it. [6]

## Un/Structured

As graduate students at Stanford, Sergey Brin and Lawrence (Larry) Page had an
early interest in producing "structured data" from the "unstructured" web. [7]

> The World Wide Web provides a vast source of information of almost all
types, ranging from DNA databases to resumes to lists of favorite restaurants.
However, this information is often scattered among many web servers and hosts,
using many different formats. If these chunks of information could be
extracted from the World Wide Web and integrated i


rectory of people, the largest and most diverse
databases of products, the greatest bibliography of academic works, and many
other useful resources. [...]

>

> **2.1 The Problem**
> Here we define our problem more formally:
> Let D be a large database of unstructured information such as the World
Wide Web [...] [8]

In a paper titled _Dynamic Data Mining_ Brin and Page situate their research
looking for _rules_ (statistical correlations) between words used in web
pages. The "baskets" they mention stem from the origins of "market basket"
techniques developed to find correlations between the it


ackle the scale of the web and still perform using
contemporary computing power completing its task in a reasonably short amount
of time.

> A traditional algorithm could not compute the large itemsets in the lifetime
of the universe. [...] Yet many data sets are difficult to mine because they
have many frequently occurring items, complex relationships between the items,
and a large number of items per basket. In this paper we experiment with word
usage in documents on the World Wide Web (see Section 4.2 for details about
this data set). This data set is fundamentally different from a supermarket
data set. Each document has roughly 150 distinct words on average, as compared
to roughly 10 items for cash register transactions. We restrict ourselves to a
subset of about 24 million documents from
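
As a rough illustration of the "basket" idea (a toy example, not Brin and Page's actual sampling algorithm), each document below is reduced to its set of distinct words, co-occurring word pairs are counted across documents, and pairs appearing in at least two baskets stand in for the "frequent itemsets" of market basket analysis.

```python
# Toy market-basket counting: each document is a basket of distinct words;
# word pairs that co-occur in at least two baskets are kept.
from collections import Counter
from itertools import combinations

documents = [
    "the library is a public infrastructure",
    "the shadow library is a database of books",
    "a database is an infrastructure for data",
]

pair_counts = Counter()
for doc in documents:
    basket = sorted(set(doc.split()))            # distinct words, like an itemset
    pair_counts.update(combinations(basket, 2))

frequent = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(frequent)
```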


hat's quite
symptomatic. It goes something like this: you (the programmer) have managed to
cobble out a lovely "content management system" (either from scratch, or using
any number of helpful frameworks) where your user can enter some "items" into
a database, for instance to store bookmarks. After this ordered items are
automatically presented in list form (say on a web page). The author: It's
great, except... could this bookmark come before that one? The problem stems
from the fact that the database ordering (a core functionality provided by any
database) somehow applies a sorting logic that's almost but not quite right. A
typical example is the sorting of names where details (where to place a name
that starts with a Norwegian "Ø" for inst


ge-specific, and
when a mixture of languages occurs, no single ordering is necessarily
"correct". The (often) exascerbated programmer might hastily add an additional
database field so that each item can also have an "order" (perhaps in the form
of a date or some other kind of (alpha)numerical "sorting" value) to be used
to correctly order the resulting list. Now the author has a means, awkward and
indirect but workable, to control the order of the presented data on the start
page. But one might well ask, why not just edit the resulting listing as a
document? Not possible! Contemporary content management systems are based on a
data flow from a "pure" source of a database, through controlling code and
templates to produce a document as a result. The document isn't the data, it's
the end result of an irreversible process. This problem, in this and many
variants, is widespread and reveals an essential backwardness of a
particular "computer scientist" mindset relating to what constitutes "data"
and in particular its r


n, still followed by modern web browsers, the only difference
between the two visually is that UL items are preceded by a bullet symbol,
while OL items are numbered.

The idea of ordering runs deep in programming practice where essentially
different data structures are employed depending on whether order is to be
maintained. The indexes of a "hash" table, for instance (also known as an
associative array), are ordered in an unpredictable way governed by a
representation's particular implementation. This data structure, extremely
prevalent in contemporary programming practice, sacrifices order to offer other
kinds of efficiency (fast text-based retrieval, for instance).
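
A minimal Python illustration of that trade-off (assuming CPython; the exact iteration order of the hash-based set may differ between implementations and versions):

```python
# Order-preserving versus hash-based containers.
items = ["Østby", "Aaron", "Zhou", "Dewey"]

ordered = list(items)    # a list preserves insertion order
hashed = set(items)      # a set is hash-based; iteration order is not guaranteed

print(ordered)           # always: ['Østby', 'Aaron', 'Zhou', 'Dewey']
print(hashed)            # some implementation-dependent order
# Note: Python's dict is a hash table that, since version 3.7, does guarantee
# insertion order -- an exception to the general rule described above.
```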

## Data mining

In announcing Google's impending data center in Mons, Belgian prime minister
Di Rupo invoked the link between the history of the mining industry in the
region and the present and future interest in "data mining" as practiced by IT
companies such as Google.

Whether speaking of bales of


orithm, and in the process (voluntarily) blind themselves to the work
practices which have produced and maintain these "resources".

Berners-Lee, in chastising his audience of web publishers to not only publish
online, but to release "unadulterated" data, belies a lack of imagination in
considering how language is itself structured and a blindness to the need for
more than additional technical standards to connect to existing publishing
practices.

Last Revision: 2-08-2016

1. ↑ Benjamin Franklin


Stanford webpage](http://infolab.stanford.edu/~sergey/)
8. ↑ Extracting Patterns and Relations from the World Wide Web, Sergey Brin, Proceedings of the WebDB Workshop at EDBT 1998,
9. ↑ Dynamic Data Mining: Exploring Large Rule Spaces by Sampling; Sergey Brin and Lawrence Page, 1998; p. 2
10. ↑ Hypertext Markup Language (HTML): "Internet Draft", Tim Berners-Lee and Daniel Connolly, June 1993,


dat in Sekulic 2018


tworks. We need to fight for Guerilla Open Access.”(7)
On January 6, 2011, the MIT police and the US Secret Service arrested Aaron
Swartz on charges of having downloaded a large number of scientific articles
from one of the most used and paywalled databases. The federal prosecution
decided to show the increasingly nervous publishing industry the lengths it
was willing to go to protect them by indicting Swartz on 13 criminal counts.
With the threat of 50 years in prison and a US$1 million fine, Aaron


dat in Sollfrank 2018


e shut down in 2012 as a consequence of a series of
injunctions from powerful publishing houses. The now leading shadow library in
the field, Library Genesis (LibGen), can be considered as its even more
influential successor. As of November 2016 the database contained 25 million
documents (42 terabytes), of which 2.1 million were books, with digital copies
of scientific articles published in 27,134 journals by 1342 publishers.18 The
large majority of the digital material is of scientific and educat


ilable under various and changing
domain names.20

The related project Sci-Hub is an online service that processes requests for
pay-walled articles by providing systematic, automated, but unauthorized
backdoor access to proprietary scholarly journal databases. Users requesting
papers not present in LibGen are advised to download them through Sci-Hub; the
respective PDF files are served to users and automatically added to LibGen (if
not already present). According to _Nature_ magazine, Sci-Hub hosts


dat in Sollfrank & Dockray 2013


cording to (A) what would
make things work, but (B), like you say, in a way that expresses the politics,
as we see them, of the site. [10:14] And so almost at every level, at every
design decision that Kayla might be making, or every kind of code or database
decision, you know, interactive decision that I might be making – those
conversations and those ideas are finding their way into that. [10:45] And
vice versa, that you see code, in a certain way, as not determining politics,
but certainly infl


dat in Sollfrank & Goldsmith 2013


e thing. [17:15] For many, many years
people would always come up to me and say, we'd like to put UbuWeb in a
database. And I said no. It’s working really well as it is. And, you know,
imagine if Ubu had been locked up in some sort of horrible SQL database. And
the administrator of the database walks away, the guy that knows all that
stuff walks away with the keys – which always happens. No… [17:39] This way it
is free, is open, is simple, is backwardly compatible – it always works.
[17:45] I like the simplicity of it. It's not


dat in Sollfrank & Kleiner 2012


everybody could
communicate with everybody without any kind of mediation, or control or
censorship – why that has been replaced with centralised, privatised
platforms, from an economic basis. [02:00] So that the need for capitalist
capture of user data, and user interaction, in order to allow investors to
recoup profits, is the driving force behind centralisation, and so it explains
that.

[02:15]
Copyright Myth

[02:19]
C.S.: The framework of these whole interviews is the relation bet


equirement to seek these
goods. [21:53] If you are running a company like Amazon, you are not making
any money selling Linux, you are making money selling web services, books and
other kinds of derivative products. You need free software to run your data
centre, to run your computer. [22:08] So the cost of software to you is a
cost, and so you're happy to have free software and support it. Because it
makes a lot more sense for you to contribute to some project that is also
used by five other com


ck you can upload
and download files to it – it's a file sharing system. It has a Wiki and file
space, essentially. Then you hide the stick somewhere, and you text the system
and it forwards your message to the next person that is waiting to share data.
And this continues like that, so then that person can share data on it, they
hide it somewhere and send an SMS to the system which then it gets forwarded
to the next person. [36:28] This work serves a few different functions at
once. First, it starts to get people to understand networks and all the basic
componen


ever. The problem is political. [39:43] The
problem is that these systems will not be financed by capital, because capital
requires profit in order to sustain itself. In order to capture profit it
needs to have control of user interaction and user's data. [39:57] To
illustrate this, we created a micro-blogging platform like Twitter, but using
a protocol of the 1970s called Finger. So we've used the protocol that has
been around since the 1970s and made a micro-blogging platform out of it –
fully,


m is economic. [41:23] For Thimbl to become a
reality, society has to transcend its economic limitations – its social and
economic limitations in order to find ways to create communication systems
that are not simply funded by the capture of user data and information, which
Thimbl can't do because it is a distributive system. You can't control the
users, you can't know who is using it or what they are doing, because it's
fully distributed.

[41:47]
R15N

[41:52]
The R15N has elements


dat in Sollfrank & Mars 2013


It’s done by some
Russian hackers, who also allow anyone to download all of that. It’s 9
Terabytes of books, quite some chunk of hard disks which you need for that.
[10:47] And you can also download PHP, the back end of the website and the
MySQL database (a dump of the MySQL database), so you can run your own
Library Genesis. That’s one of the ways how you can do that. [11:00] You can
also go and join Aaaaarg.org, where it is also not just about downloading
books and uploading books, it’s also about communication and i


dat in Stalder 2018


landscape is its
*algorithmicity*. It is characterized, in other words, by automated decision-making processes
that reduce and give shape to the glut of information, by extracting
information from the volume of data produced by machines. This extracted
information is then accessible to human perception and can serve as the
basis of singular and communal activity. Faced with the enormous amount
of data generated by people and machines, we would be blind were it not
for algorithms.

The third chapter will focus on *political dimensions*. These are the
factors that enable the formal dimensions described in the preceding
chapter to manifest themselve


gh for the crisis to subside. The old administrative methods,
which involved manual information processing, simply could no longer
keep up. The crisis reached its first dramatic peak in 1889 in the
United States, with the realization that the census data from the year
1880 had not yet been analyzed when the next census was already
scheduled to take place during the subsequent year. In the same year,
the Secretary of the Interior organized a conference to investigate
faster methods of data processing. Two methods were tested for making
manual labor more efficient, one of which had the potential to achieve
greater efficiency by means of novel data-processing machines. The
latter system emerged as the clear victor; developed by an engineer
named Hermann Hollerith, it mechanically processed and stored data on
punch cards. The idea was based on Hollerith\'s observations of the
coupling and decoupling of railroad cars, which he interpreted as
modular units that could be combined in any desired order. The punch
card transferred this approach to information management. Data were no longer stored in
fixed, linear arrangements (tables and lists) but rather in small units
(the punch cards) that, like railroad cars, could be combined in any
given way. The increase in efficiency -- with respect to speed *and*
flexibility --


, and nearly a hundred of Hollerith\'s
machines were used by the Census
Bureau.[^65^](#c1-note-0065){#c1-note-0065a} This marked a turning point
in the history of information processing, with technical means no longer
being used exclusively to store data, but to process data as well. This
was the only way to avoid the impending crisis, ensuring that
bureaucratic management could maintain centralized control. Hollerith\'s
machines proved to be a resounding success and were implemented in many
more branches of government


mporary manner. The principle was easy enough: the program would
automatically reload a certain website over and over again in order to
exhaust the capacities of its network
servers.[^79^](#c1-note-0079){#c1-note-0079a} The goal was not to
destroy data but rather to disturb the normal functioning of an
institution in order to draw attention to the activities and interests
of the protesters.
:::

::: {.section}
### Networks as places of action {#c1-sec-0012}

What this new generation of media activ


ion among developers -- and
technological platforms, which enabled this form of cooperation
by providing archives, filter
functions, and search capabilities that made it possible to organize
large amounts of data, was thus advanced even further. The programmers
were no longer primarily working on the development of the internet
itself, which by then was functioning quite reliably, but were rather
using the internet to apply their cooperative principles to ot


ce and
development depend on communal
formations. "Algorithmicity" denotes those aspects of cultural processes
that are (pre-)arranged by the activities of machines. Algorithms
transform the vast quantities of data and information that characterize
so many facets of present-day life into dimensions and formats that can
be registered by human perception. It is impossible to read the content
of billions of websites. Therefore we turn to services such as Google\'s
search algorithm, which reduces the data flood ("big data") to a
manageable amount and translates it into a format that humans can
understand ("small data"). Without them, human beings could not
comprehend or do anything within a culture built around digital
technologies, but they influence our understanding and activity in an
ambivalent way. They create new dependencies by pre-sorting and making
the


hey knew, they were now
looking ahead toward what they might not (yet) know.

In order to organize this information flood of rapidly amassing texts,
it was necessary to create new conventions: books were now specified by
their author, publisher, and date of publication, not to mention
furnished with page numbers. This enabled large numbers of texts to be
catalogued and every individual text -- indeed, every single passage --
to be referenced.[^11^](#c2-note-0011){#c2-note-0011a} Scientists could
leg


rom the internet.[^20^](#c2-note-0020){#c2-note-0020a} At the
same time, new providers have entered the market of free access; their
method is not to facilitate distributed downloads but rather to offer,
on account of the drastically reduced cost of data transfers, direct
streaming. Although some of these services are relatively easy to locate
and some have been legally banned -- the best-known case in Germany
being that of the popular site kino.to -- more of them continue to
appear.[^21^](#c2-note-


*. The total number of photographs saved there has been
estimated to be 250 billion. In addition, there are also large platforms
for professional "stock photos" (supplies of pre-produced images that
are supposed to depict generic situations) and the databanks of
professional agencies such as Getty Images or Corbis. All of these images
can be found easily and acquired quickly (though not always for free).
Yet photography is not unique in this regard. In all fields, the number
of cultural artifacts avail


ic
collection devoted to the colonial history of France, it is now possible
for everything to exist side by side. Europeana is not an archive in the
traditional sense, or even a museum with a fixed and meaningful order;
rather, it is just a standard database. Everything in it is just one
search request away, and every search generates a unique order in the
form of a sequence of visible artifacts. As a result, individual objects
are freed from those meta-narratives, created by the museums and
archive


request and the corpus
of material, which is likewise constantly changing.

Precisely because it offers so many different approaches to more or less
freely combinable elements of information, the order of the database no longer really provides a
framework for interpreting search results in a meaningful way.
Altogether, the meaning of many objects and signs is becoming even more
uncertain. On the one hand, this is because the connection to their
original con


on contexts. In less official archives and in less
specialized search engines, the dissolution of context is far more
pronounced than it is in the case of the Europeana project. For the sake
of orienting its users, for instance, YouTube provides the date when a
video has been posted, but there is no indication of when a video was
actually produced. Further information provided about a video, for
example in the comments section, is essentially unreliable. It might be
true -- or it might not. The inte


"are not an affirmative confirmation of the past; rather, they are
*questionings* of the present through reaching back to historical
events," especially as they are represented in images and other forms of
documentation. Thanks to search engines and databases, such
representations are more or less always present, though in the form of
indeterminate images, ambivalent documents, and contentious
interpretations. Artists in this situation, as Arns explains,

::: {.extract}
do not ask the naïve questio


ividuals must
continuously communicate in order to constitute themselves within the
fields and practices, or else they will remain invisible. The mass of
tweets, updates, emails, blogs, shared pictures, texts, posts on
collaborative platforms, and databases (etc.) that are necessary for
this can only be produced and processed by means of digital
technologies. In this act of incessant communication, which is a
constitutive element of social existence, the personal desire for
self-constitution and o


communication is always
the here and now. With the instant transmission of information,
everything that is not "here" is inaccessible and everything that is not
"now" has disappeared. Powerful infrastructure has been built to achieve
these effects: data centers, intercontinental networks of cables,
satellites, high-performance nodes, and much more. Through globalized
high-frequency trading, actors in the financial markets have realized
this technical vision t


for example, common
languages, technical standards, or social conventions. The fundamental
protocol for the internet is the Transmission Control Protocol/Internet
Protocol (TCP/IP). This suite of protocols defines the common language
for exchanging data. Every device that exchanges information over the
internet -- be it a smartphone, a supercomputer in a data center, or a
networked thermostat -- has to use these protocols. In growing areas of
social contexts, the common language is English. Whoever wishes to
belong has to speak it increasingly often. In the natural sciences,
communication now takes place


ormation,
which in practical terms is infinitely large, and all of the growth
curves continue to climb steeply -- today\'s cultural reality is
nevertheless entirely different from that described by Borges. Our
ability to deal with massive amounts of data has radically improved, and
thus our faith in the utility of information is not only unbroken but
rather gaining strength. What is new is precisely such large quantities
of data ("big data"), which, as we are promised or forewarned, will lead
to new knowledge, to a comprehensive understanding of the world, indeed
even to "omniscience."[^76^](#c2-note-0076){#c2-note-0076a} This faith
in data is based above all on the fact that the two processes described
above -- referentiality and communality -- are not the only new
mechanisms for filtering, sorting, aggregating, and evaluating things.
Beneath or ahead of the social mechanisms of decentralized and networked
cultural production, there are algorithmic processes that pre-sort the
immeasurably large volumes of data and convert them into a format that
can be apprehended by individuals, evaluated by communities, and
invested with meaning.

Strictly speaking, it is impossible to maintain a categorical
distinction between social processes that take place in and by


would be able to execute
not only one but (theoretically) every written algorithm. The
Hungarian-born mathematician John von Neumann made it his goal to
implement this idea. In 1945, he published a model in which the program
(the algorithm) and the data (the input and output) were housed in a
common storage device. Thus, both could be manipulated simultaneously
without having to change the hardware. In this way, he converted the
"Turing machine" into the "universal Turing machine"; that is, the
mod


tio by a factor of a billion.
With inflation taken into consideration, this factor would be even
higher. No less dramatic were the increases in performance -- or rather
the price reductions -- in the
area of data storage. In 1980, it cost more than \$400,000 to store a
gigabyte of data, whereas 30 years later it would cost just 10 cents to
do the same -- a price reduction by a factor of 4 million. And in both
areas, this development has continued without pause.

These increases in performance have formed the material basis for the


mposing
texts or analyzing the content of images, are now frequently done by
machines. As early as 2010, a program called Stats Monkey was introduced
to produce short reports about baseball games. All that the program
needs for this is comprehensive data about the games, which can be
accumulated mechanically and which have since become more detailed due
to improved image recognition and sensors. From these data, the program
extracts the decisive moments and players of a game, recognizes
characteristic patterns throughout the course of play (such as
"extending an early lead," "a dramatic comeback," etc.), and on this
basis generates its own report. Regardin


usiness was created from the original interdisciplinary research
project: Narrative Science. In addition to sport reports it now offers
texts of all sorts, but above all financial reports -- another field for
which there is a great deal of available data. These texts have been
published by reputable media outlets such as the business magazine
*Forbes*, in which their authorship is credited to "Narrative Science." Although these
contributions are still limite


world, for, in the summer of 2013, a single bot contributed
more than 200,000 articles to it.[^89^](#c2-note-0089){#c2-note-0089a}
Since 2013, moreover, the company Epagogix has offered software that
uses historical data to evaluate the market potential of film scripts.
At least one major Hollywood studio uses this software behind the backs
of scriptwriters and directors, for, according to the company\'s CEO,
the latter would be "nervous" to learn that their creativ


ns -- with edges
or surfaces in images, for instance -- for it is extremely complex and
computationally intensive to program such learning processes. In recent
years, however, there have been enormous leaps in available computing
power, and both the data inputs and the complexity of the learning
models have increased exponentially. Today, on the basis of simple
patterns, algorithms are developing improved recognition of the complex
content of images. They are refining themselves on their own. The te


xtracting {#c2-sec-0022}

Orders generated by algorithms are a constitutive element of the digital
condition. On the one hand, the mechanical pre-sorting of the
(informational) world is a precondition for managing immense and
unstructured amounts of data. On the other hand, these large amounts of
data and the computing centers in which they are stored and processed
provide the material precondition for developing increasingly complex
algorithms. Necessities and possibilities are mutually motivating


a search
engine. What the user does not see are the complex preconditions for
assembling the search results. By the middle of 2014, according to the
company\'s own information, the Google index alone included more than a
hundred million gigabytes of data.

Originally (that is, in the second half of the 1990s), PageRank
functioned in such a way that the algorithm analyzed the structure of
links on the World Wide Web, first by noting the number of links that
referred to a given document, and second


op dynamic orders for rapidly changing fields, enabling the
evaluation of the importance of individual documents without knowledge
of their content. Because the analysis of citations or links operates on
a purely quantitative basis, large amounts of data can be quickly
structured with them, and especially relevant positions can be
determined. The second advantage of this approach is that it does not
require any assumptions about the contours of different fields or their
relationships to one another.
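
A textbook-style sketch of such link analysis, reduced to a short power iteration in Python (the link graph is invented for illustration; this is the principle, not Google's production algorithm):

```python
# Illustrative PageRank by power iteration: a page is important if important
# pages link to it; no knowledge of page content is required.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                     # dangling page: share rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(links))    # "c", the most linked-to page, gets the highest rank
```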


nerated uniquely for every
user and then presented. Google is not the only company that has gone
down this path. Orders produced by algorithms have become increasingly
oriented toward creating, for each user, his or her own singular world.
Facebook, dating services, and other social mass media have been
pursuing this approach even more radically than Google.
:::

::: {.section}
### From the data shadow to the synthetic profile {#c2-sec-0024}

This form of generating the world requires not only detailed information
about the external world (that is, the reality shared by everyone) but also informatio


y that Amazon
assembles its book recommendations, for the company knows that, within
the cluster of people that constitutes part of every person\'s profile,
a certain percentage of them have already gone through this sequence of
activity. Or, as the data-mining company Science Rockstars (!) once
pointedly expressed on its website, "Your next activity is a function of
the behavior of others and your own past."

Google and other providers of algorithmically generated orders have been
devoting increase


gle Now, and its slogan is
"The right information at just the right time." The program, which was
originally developed as an app but has since been made available on
Chrome, Google\'s own web browser, attempts to anticipate, on the basis
of existing data, a user\'s next step, and to provide the necessary
information before it is searched for in order that such steps take
place efficiently. Thus, for instance, it draws upon information from a
user\'s calendar in order to figure out where he or she will have to go
next. On the basis of real-time traffic data, it will then suggest the
optimal way to get there. For those driving cars, the amount of traffic
on the road will be part of the equation. This is ascertained by
analyzing the motion profiles of other drivers, which will allow the
program to determine whether the traffic is flowing or stuck in a jam.
If enough historical data is taken into account, the hope is that it
will be possible to redirect cars in such a way that traffic jams should
no longer occur.[^110^](#c2-note-0110){#c2-note-0110a} For those who use
public transport, Google Now evaluates real-time data about the
locations of various transport services. With this information, it will
suggest the optimal route and, depending on the calculated travel time,
it will send a reminder (sometimes earlier, sometimes later) when it is
time to go. That which


sed in technical or
mathematical terms, codifies assumptions that express a specific
position in the world. There can be no purely descriptive variables,
just as there can be no such thing as "raw
data."[^112^](#c2-note-0112){#c2-note-0112a} Both -- data and variables
-- are always already "cooked"; that is, they are engendered through
cultural operations and formed within cultural
categories.[^113^](#c2-note-0113){#c2-note-0113a} With every use of
produced data and with every execution of an algorithm, the assumptions
embedded in them are activated, and the positions contained within them
have effects on the world that the algorithm generates and presents.

As already mentioned, the early version of the Pa


o price negotiations with the
company.[^122^](#c2-note-0122){#c2-note-0122a}

Controversies over the methods of Amazon or Google, however, are the
exception rather than the rule. Necessary (but never neutral) decisions
about recording and evaluating data with algorithms are being made almost all the time without
any discussion whatsoever. The logic of the original PageRank algorithm
was criticized as early as the year 2000 for essentially representing
the


deled group. In other words, Google\'s new algorithm
favors that which is gaining popularity within a user\'s social network.
The global village is thus becoming more and more
provincial.[^124^](#c2-note-0124){#c2-note-0124a}
:::

::: {.section}
### Data behaviorism {#c2-sec-0026}

Algorithms such as Google\'s thus reiterate and reinforce a tendency
that has already been apparent on both the level of individual users and
that of communal formations: in order to deal with the vast amounts and
complex


e of critique. It
was held to be mechanistic, reductionist, and authoritarian because it
privileged the observing scientist over the subject. In practice, it
quickly ran into its own limitations: it was simply too expensive and
complicated to gather data about human behavior.

Yet that has changed radically in recent years. It is now possible to
measure ever more activities, conditions, and contexts empirically.
Algorithms like Google\'s or Amazon\'s form the technical backdrop for
the revival of a


re.[^127^](#c2-note-0127){#c2-note-0127a} Every critique
of this positivistic perspective -- that every measurement result, for
instance, reflects not only the measured but also the measurer -- is
brushed aside with reference to the sheer amounts of data that are now
at our disposal.[^128^](#c2-note-0128){#c2-note-0128a} This attitude
substantiates the claim of those in possession of these new and
comprehensive powers of observation (which, in addition to Google and
Facebook, also includes the intel


2-note-0075}  Jorge Luis Borges, "The Library of
Babel," trans. Anthony Kerrigan, in Borges, *Ficciones* (New York: Grove
Weidenfeld, 1962), pp. 79--88.

[76](#c2-note-0076a){#c2-note-0076}  Heinrich Geiselberger and Tobias
Moorstedt (eds), *Big Data: Das neue Versprechen der Allwissenheit*
(Berlin: Suhrkamp, 2013).

[77](#c2-note-0077a){#c2-note-0077}  This is one of the central tenets
of science and technology studies. See, for instance, Geoffrey C. Bowker
and Susan Leigh Star, *Sorting Thin


007).

[94](#c2-note-0094a){#c2-note-0094}  Each of these models was tested on
the basis of the 50 million most common search terms from the years
2003--8 and classified according to the time and place of the search.
The results were compared with data from the health authorities. See
Jeremy Ginsberg et al., "Detecting Influenza Epidemics Using Search
Engine Query Data," *Nature* 457 (2009): 1012--4.

[95](#c2-note-0095a){#c2-note-0095}  In absolute terms, the rate of
correct hits, at 15.8 percent, was still relatively low. With the same
dataset, however, random guessing would only have an accuracy of 0.005
perc


107--17.

[100](#c2-note-0100a){#c2-note-0100}  Eugene Garfield, "Citation Indexes
for Science: A New Dimension in Documentation through Association of
Ideas," *Science* 122 (1955): 108--11.

[101](#c2-note-0101a){#c2-note-0101}  Since 1964, the data necessary for
this has been published as the Science Citation Index (SCI).

[102](#c2-note-0102a){#c2-note-0102}  The assumption that the subjects
produce these structures indirectly and without any strategic intention
has proven to be problematic


, it is not only the world of
advertising that motivates the collection of personal information. Such
information is also needed for the development of personalized
algorithms that give order to
the flood of data. It can therefore be assumed that the rampant
collection of personal information will not cease or slow down even if
commercial demands happen to change, for instance to a business model
that is not based on advertising.

[109](#c2-note-0109a){#c2-n


hich would now be
traffic-free.

[111](#c2-note-0111a){#c2-note-0111}  Pamela Vaughan, "Demystifying How
Facebook\'s EdgeRank Algorithm Works," *HubSpot* (April 23, 2013),
online.

[112](#c2-note-0112a){#c2-note-0112}  Lisa Gitelman (ed.), *"Raw Data"
Is an Oxymoron* (Cambridge, MA: MIT Press, 2013).

[113](#c2-note-0113a){#c2-note-0113}  The terms "raw," in the sense of
unprocessed, and "cooked," in the sense of processed, derive from the
anthropologist Claude Lévi-Strauss, who introduced th


  One estimate that continues to be
cited quite often is already obsolete: Michael K. Bergman, "White Paper
-- The Deep Web: Surfacing Hidden Value," *Journal of Electronic
Publishing* 7 (2001), online. The more content is dynamically generated
by databases, the more questionable such estimates become. It is
uncontested, however, that only a small portion of online information is
registered by search engines.

[116](#c2-note-0116a){#c2-note-0116}  Theo Röhle, "Die Demontage der
Gatekeeper: Rela


eral of Google\'s competitors, including
Microsoft, TripAdvisor, and Oracle.

[119](#c2-note-0119a){#c2-note-0119}  "Antitrust: Commission Sends
Statement of Objections to Google on Comparison Shopping Service,"
*European Commission: Press Release Database* (April 15, 2015), online.

[120](#c2-note-0120a){#c2-note-0120}  Amit Singhal, "An Update to Our
Search Algorithms," *Google Inside Search* (August 10, 2012), online. By
the middle of 2014, according to some sources, Google had received
aroun


2-note-0124a){#c2-note-0124}  Eli Pariser, *The Filter Bubble:
How the New Personalized Web Is Changing What We Read and How We Think*
(New York: Penguin, 2012).

[125](#c2-note-0125a){#c2-note-0125}  Antoinette Rouvroy, "The End(s) of
Critique: Data-Behaviourism vs. Due-Process," in Katja de Vries and
Mireille Hildebrandt (eds), *Privacy, Due Process and the Computational
Turn: The Philosophy of Law Meets the Philosophy of Technology* (New
York: Routledge, 2013), pp. 143--65.

[126](#c2-note-


to find the
cause. Our 'independent variables' -- the causes of behavior -- are the
external conditions of which behavior is a function."

[127](#c2-note-0127a){#c2-note-0127}  Nathan Jurgenson, "View from
Nowhere: On the Cultural Ideology of Big Data," *New Inquiry* (October
9, 2014), online.

[128](#c2-note-0128a){#c2-note-0128}  danah boyd and Kate Crawford,
"Critical Questions for Big Data: Provocations for a Cultural,
Technological and Scholarly Phenomenon," *Information, Communication &
Society* 15 (2012): 662--79.
:::
:::

[III]{.chapterNumber} [Politics]{.chapterTitle} {#c3}

::: {.s


evertheless cooperate on the level
of the technical protocol and allow users to send information back and
forth regardless of which providers are used. Switching providers would
not mean forfeiting individuals\' address books or any data. Those who put
convenience first can use one of the large
commercial providers, or they can choose one of the many small
commercial or non-commercial services that specialize in certain niches.
It is even possible to set up one\'s own server in orde


courses of
action. Admittedly, modern email services are set up in such a way that
most of their users remain on the surface, while the essential decisions
about how they are able to act are made on the "back side"; that is, in
the program code, in databases, and in configuration files. Yet these
two levels are not structurally (that is, organizationally and
technically) separated from one another. Whoever is willing and ready to
[]{#Page_131 type="pagebreak" title="131"}appropriate the correspondi


today, and the large providers -- above all Google,
whose Gmail service had more than 500 million users in 2014 -- dominate
the market. The gap has thus widened between user interfaces and the
processes that take place behind them on servers and in data centers,
and this has expanded what Crouch referred to as "the influence of the
privileged elite." In this case, the elite are the engineers and
managers employed by the large providers, and everyone else with access
to the underbelly of the infrastructure, including the British
Government Communications Headquarters (GCHQ) and the US National
Security Agency (NSA), both of which employ programs such as MUSCULAR
to record data transfers between the computer centers operated by large
American providers.[^10^](#c3-note-0010){#c3-note-0010a}

Nevertheless, email essentially remains an open application, for the
SMTP protocol forces even the largest providers to cooperate. Sma


by all of these providers, communal formations
can be created with ease. Every day, groups are formed that organize
information, knowledge, and resources in order to establish self-defined
practices (both online and offline). The immense amounts of data,
information, and cultural references generated by this are pre-sorted by
algorithms that operate in the background to ensure that users never
lose their orientation.[^13^](#c3-note-0013){#c3-note-0013a} Viewed from
the perspective of output legitim


al social networks have institutionalized a power
imbalance between those engaged with the user interface and those who
operate the services behind the scenes. The possibility for users to
organize themselves and exert influence -- over the way their data are
treated, for instance -- is severely limited.

One (nominal) exception to this happened to be Facebook itself. From
2009 to 2012, the company allowed users to vote about any proposed
changes to its terms and conditions, which attracted more than


feature, providers such as Facebook have further tilted the balance of
power between users and operators. With every new version and with every
new update, the possibilities of interaction are changed in such a way
that, within closed networks, more data can be produced in a more
uniform format. Thus, it becomes easier to make connections between
them, which is their only real source of value. Facebook\'s compulsory
"real-name" policy, for instance, which no longer permits users to
register under a


e companies to
assemble, in the background, a uniform profile out of the activities of
users on sites or applications that seem at first to have nothing to do
with one another.[^17^](#c3-note-0017){#c3-note-0017a} Google, for
instance, connects user data from its search function with information
from YouTube and other online services, but also with data from Nest, a
networked thermostat. Facebook connects data from its social network
with those from WhatsApp, Instagram, and the virtual-reality service
Oculus.[^18^](#c3-note-0018){#c3-note-0018a} This trend is far from
over. Many services are offering more and more new functions for
generating data, and entire new areas of recording data are being
developed (think, for instance, of Google\'s self-driving car). Yet
users have access to just a minuscule portion of the data that they
themselves have generated and with which they are being described. This
information is available in full only to the programmers and analysts.
All of this is done -- as the sanctimonious argument goes -- in the name
of data protection.
:::

::: {.section}
### Selling, predicting, modifying {#c3-sec-0005}

Unequal access to information has resulted in an imbalance of power, for
the evaluation of data opens up new possibilities for action. Such data
can be used, first, to earn revenue from personalized advertisements;
second, to predict user behavior with greater accuracy; and third, to
adjust the parameters of interaction in such a way that preferred
patterns of []{#Page_135 type="pagebreak" t


ebook and other
social mass media are set up in such a way that those who control the
servers are always able to see everything. All of this information,
moreover, is formatted in such a way as to optimize its statistical
analysis. As the amounts of data increase, even the smallest changes in
frequencies and correlations begin to gain significance. In its study of
romantic relationships, for instance, Facebook discovered that the
number of online interactions reaches its peak 12 days before a
relati


they might usually
exchange. With traditional methods of surveillance, which focus on
individual people, such a small deviation would not have been detected.
To do so, it is necessary to have immense numbers of users generating
immense volumes of data. Accordingly, these new []{#Page_136
type="pagebreak" title="136"}analytic possibilities do not mean that
Facebook can accurately predict the behavior of a single user. The
unique person remains difficult to calculate, for all that could be
ascert


138 type="pagebreak" title="138"}on that day. An application
was surreptitiously loaded into the timelines of more than 10 million
people that contained polling information and a list of friends who had
already voted. It was possible to collect this data because the
application had a built-in function that enabled people to indicate
whether they had already cast a vote. A control group received a message
that encouraged them to vote but lacked any personalization or the
possibility of social interac


failing to shape one\'s
own activity in a coherent manner are ideal-typical manifestations of
the power of networks.

The problem experienced by the unwilling-willing users of Facebook has
not been caused by the transformation of communication into data as
such. This is necessary to provide input for algorithms, which turn the
flood of information into something usable. To this extent, the general
complaint about the domination of algorithms is off the mark. The
problem is not the algorithms themse



In June 2013, Edward Snowden exposed an additional and especially
problematic aspect of the expansion of post-democratic structures: the
comprehensive surveillance of the internet by government intelligence
agencies. The latter do not use collected data primarily for commercial
ends (although they do engage in commercial espionage) but rather for
political repression and the protection of central power interests --
or, to put it in more neutral terms, in the service of general security.
Yet the NSA


ersonnel swapping of this sort takes place at all levels and
is facilitated by the fact that the two sectors are engaged in nearly
the same activity: analyzing social interactions in real time by means
of their exclusive access to immense volumes of data. The lines of
inquiry and the applied methods are so similar that universities,
companies, and security organizations are able to cooperate closely with
one another. In many cases, certain programs or analytic methods are
just as suitable for commer


which is "to involve
European scientists and researchers in the development of solutions to
and tools for automatic threat
detection."[^47^](#c3-note-0047){#c3-note-0047a} Research, however, is
just one area of activity. As regards the collection of data and the
surveillance of communication, there is also a high degree of
cooperation between private and government actors, though it is not
always without tension. Snowden\'s revelations have done little to
change this. The public outcry of large inte


through smart homes, which are still
limited to the high end of the market, and smart meters, which have been
implemented across all social
strata.[^51^](#c3-note-0051){#c3-note-0051a} The latter provide
electricity companies with detailed real-time data about a household\'s
usage behavior and are supposed to enhance energy efficiency, but it
remains unclear exactly how this new efficiency will be
achieved.[^52^](#c3-note-0052){#c3-note-0052a} The concept of the "smart
city" extends this process to


infiltrate human beings. Adherents of the
Quantified Self movement work diligently to record digital information
about their own bodies. The number of platforms that incite users to
stay fit (and []{#Page_147 type="pagebreak" title="147"}share their data
with companies) with competitions, point systems, and similar incentives
has been growing steadily. It is just a small step from this hobby
movement to a disciplinary regime that is targeted at the
body.[^54^](#c3-note-0054){#c3-note-0054a} Imagine the possibilities of
surveillance and sanctioning that will come about when data from
self-optimizing applications are combined with the data available to
insurance companies, hospitals, authorities, or employers. It does not
take too much imagination to do so, because this is already happening in
part today. At the end of 2014, for instance, the Generali Insurance
Company announced a new


beral democracy with a
few problems that can be eliminated through well-intentioned reforms.
Rather, a new social system has emerged in which allegedly relaxed
control over social activity is compensated for by a heightened level of
control over the data and structural conditions pertaining to the
activity itself. In this system, both the virtual and the physical world
are altered to achieve particular goals -- goals determined by just a
few powerful actors -- without the inclusion of those affected by these
changes and often without them being able to notice the changes at all.
Whoever refuses to share his or her data freely comes to look suspicious
and, regardless of the motivations behind this anonymity, might even be
regarded as a potential enemy. In July 2014, for instance, the following
remarks were included in Facebook\'s terms of use: "On Facebook people
c


nt of
things, it falls somewhat short, for every form of power provokes its
own forms of resistance.[^61^](#c3-note-0061){#c3-note-0061a} In the
context of post-democracy under the digital condition, these forms have
likewise shifted to the level of data, and an especially innovative and
effective means of resistance []{#Page_149 type="pagebreak"
title="149"}has been the "leak"; that is, the unauthorized publication
of classified documents, usually in the form of large datasets. The most
famous platform for this is WikiLeaks, which since 2006 has attracted
international attention to this method with dozens of spectacular
publications -- on corruption scandals, abuses of authority, corporate
malfeasance, environmental damage, and war crimes. As a form of
resistance, however, leaking entire databases is not limited to just one
platform. In recent years and through a variety of channels, large
amounts of data (from banks and accounting firms, for instance) have
been made public or have been handed over to tax investigators by
insiders. Thus, in 2014, for instance, the *Süddeutsche Zeitung*
(operating as part of the International Consortium of Investigative
Journalists based in Washington, DC) was not only able to analyze the
so-called "Offshore Leaks" -- a database concerning approximately
122,000 shell companies registered in tax
havens[^62^](#c3-note-0062){#c3-note-0062a} -- but also the "Luxembourg
Leaks," which consisted of 28,000 pages of documents demonstrating the
existence of secret and extensive t


from such advances.
Even institutions that depend on keeping secrets, such as banks and
intelligence agencies, have to "share" their information internally and
rely on a large pool of technical personnel to record and process the
massive amounts of data. To accomplish these tasks, employees need the
fullest possible access to this information, for even the most secret
databases have to be maintained by someone, and this also involves
copying data. Thus, it is far easier today than it was just a few
decades ago to smuggle large volumes of data out of an
institution.[^65^](#c3-note-0065){#c3-note-0065a}

This new form of leaking, however, did not become an important method of
resistance on account of technical developments alone. In the era of big
data, databases are the central resource not only for analyzing how the
world is described by digital communication, but also for generating
that communication. The power of networks in particular is organized
through the construction of environmental conditio


e often banal and harmless, but as a whole they
contribute to a dynamic field that is meant to produce the results
desired by the planners who issue them. In order to reconstruct this
process, it is necessary to have access to these large amounts of data.
With such information at hand, it is possible to relocate the
surreptitious operations of post-democracy into the sphere of political
debate -- the public sphere in its emphatic, liberal sense -- and this
needs to be done in order to strengthen dem


lation of these
many individual programs. One of these programs written by outsiders is
the Linux kernel, which in many respects is the central and most complex
program within a GNU/Linux operating system. Governing the organization
of processes and data, it thus forms the interface between hardware and
software. An entire institutional subsystem has been built up around
this complex program, upon which everything else depends. The community
of developers was initiated by Linus Torvalds, who wrote t


ibution of music into their own hands without the authorization of
copyright owners. This incited a flood of litigation that managed to
shut the service down in July 2001. This did not, however, put an end to
the large-scale practice of unauthorized data sharing. New services and
technologies, many of which used (the file-sharing protocol) BitTorrent,
quickly filled in the gap. The number of court cases skyrocketed, not
least because new legal standards expanded the jurisdiction of copyright
law and


es, with its low percentage of women editors (around 10 percent),
exhausting discussions, complex rules, lack of young personnel, and
systematic attempts at manipulation, have been well documented because
Wikipedia also guarantees free access to the data generated by the
activities of users, and thus makes the development of the commons
fairly transparent for outsiders.[^84^](#c3-note-0084){#c3-note-0084a}

One of the most fundamental and complex decisions in the history of
Wikipedia was to change i


h types of contribution
ultimately derive from the same motivation: they are expressions of
appreciation for the meaning that the common resource possesses for
one\'s own activity.
:::

::: {.section}
### At the interface with physical space: open data {#c3-sec-0014}

Wikipedia, however, is an exception. None of the other new commons have
managed to attract such large financial contributions. The project known
as OpenStreetMap (OSM), which was founded in 2004 by Steve Coast,
happens to be the most


ocus on building a consensus that
does not have to be perfect but simply good enough for the overwhelming
majority of the community to acknowledge it (a "rough consensus").
Today, the coverage and quality of the maps that can be generated from
these data are so good for so many areas that they now represent serious
competition to commercial digital alternatives. OSM data are used not
only by Wikipedia and other non-commercial projects but also
increasingly by large commercial services that need geographical
information and suitable maps but do not want to rely on a commercial
provider whose terms and conditions can


from donations and half from holding
conferences.[^89^](#c3-note-0089){#c3-note-0089a} That said, OSM is
nevertheless a socially, technologically, and financially robust
commons, though one with a model entirely different from Wikipedia\'s.
Because data are at the heart of the project, its needs for hardware and
bandwidth are negligible compared to Wikipedia\'s, and its servers can
be housed at universities or independently operated by individual
groups. Around this common resource, a global networ


the experience of working with free software
to the generation of large bases of knowledge, the community responsible
for OpenStreetMap succeeded in making the experiences of the Wikipedia
project useful for the creation of a commons based on large datasets,
and managed to adapt these experiences according to the specific needs
of such a project.[^91^](#c3-note-0091){#c3-note-0091a}

It is of great political significance that informational commons have
expanded into the areas of data recording and data use. Control over
data, which specify and describe the world in real time, is an essential
element of the contemporary constitution of power. From large volumes
of data, new types of insight can be gained and new strategies for
action can be derived. The more one-sided access to data becomes, the
more it yields imbalances of power.

In this regard, the commons model offers an alternative, for it allows
various groups equal and unobstructed access to this potential resource
of power. This, at least, is how the Open Data movement sees things.
Data are considered "open" if they are available to everyone without
restriction to be used, distributed, and developed freely. For this to
occur, it is necessary to provide data in a standard-compatible format
that is machine-readable. Only in such a way can they be browsed by
algorithms and further processed. Open data are an important
precondition for implementing the power of algorithms in a democratic
manner. They ensure that there can be an effective diversity of
algorithms, for anyone can write his or her own algorithm or commission
others to process data in various ways and in light of various
interests. Because algorithms cannot be neutral, their diversity -- and
the resulting ability to compare the results of different methods -- is
an important precondition for them not becoming an uncontrollable
instrument of power. This can be achieved most dependably through free
access to data, which are maintained and cultivated as a commons.
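
As a rough illustration of the point about standard-compatible, machine-readable formats, the sketch below assumes a toy CSV dataset (the station names and values are invented) and shows two unrelated analyses running over the same open resource without any gatekeeper.

```python
# Illustrative sketch (not from the source): a machine-readable open dataset
# can be processed by anyone's algorithm, for any interest.
import csv
import io

# A hypothetical open dataset: air-quality measurements published as CSV.
open_data = io.StringIO(
    "station,timestamp,no2_ugm3\n"
    "Bern-Mitte,2015-06-01T12:00,41\n"
    "Bern-Mitte,2015-06-01T13:00,58\n"
    "Wien-West,2015-06-01T12:00,23\n"
)

rows = list(csv.DictReader(open_data))

# Analysis 1: average NO2 per station.
stations = {}
for r in rows:
    stations.setdefault(r["station"], []).append(float(r["no2_ugm3"]))
averages = {s: sum(v) / len(v) for s, v in stations.items()}

# Analysis 2: flag readings above a threshold -- a different interest,
# same common resource.
alerts = [r for r in rows if float(r["no2_ugm3"]) > 50]

print(averages)
print(alerts)
```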

Motivated by the conviction that free access to data represents a
necessary condition for autonomous activity in the []{#Page_167
type="pagebreak" title="167"}digital condition, many new initiatives
have formed that are devoted to the decentralized collection,
networking, and communal organization of data. For several years, for
instance, there has been a global community of people who observe
airplanes in their field of vision, share this information with one
another, and make it generally accessible. Outside of the tight
community, these data are typically of little interest. Yet it was
through his targeted analysis of this information that the geographer
and artist Trevor Paglen succeeded in mapping out the secret arrests
made by American intelligence services. Ultimately, even the CIA\'s
clandestine airplanes have to take off and land like any others, and
thus they can be observed.[^92^](#c3-note-0092){#c3-note-0092a} Around
the collection of environmental data, a movement has formed whose
adherents enter measurements themselves. To cite just one example:
thanks to a successful crowdfunding campaign that raised more than
\$144,000 (just \$39,000 was needed), it was possible to finance the
development of a simple set of sensors called the Air Quality Egg. This
device can measure the concentration of carbon dioxide or nitrogen
dioxide in the air and send its findings to a public database. It
involves the use of relatively simple technologies that are likewise
freely licensed (open hardware). How to build and use it is documented
in such a detailed and user-friendly manner -- in instructional videos
on YouTube, for instance -- th


ommercial product. Over time, this has brought about a
network of stations that is able to measure the quality of the air
exactly, locally, and in places that are relevant to users. All of this
information is stored in a global and freely accessible database, from
which it is possible to look up and analyze hyper-local data in real
time and without restrictions.[^93^](#c3-note-0093){#c3-note-0093a}

A list of examples of data commons, both the successful and the
unsuccessful, could go on and on. It will suffice, however, to point out
that many new commons have come about that are redefining the interface
between physical and informational space and creating new strategie


cial infrastructures for
communal learning, compiling documentation, making infor­mation
available, and thus facilitating access for those interested and
building up the community. All of this depends on free knowledge, from
Wikipedia to scientific databases. This enables a great variety of
actors -- in this case environmental scientists, programmers,
engineers, and interested citizens -- to come together and create a
common frame of reference in which everyone can pursue his or her own
goals and yet do so on the basis of communal resources. This, in turn,
has given rise to a new commons, namely that of environmental data.

Not all data can or must be collected by individuals, for a great deal
of data already exists. That said, many scientific and state
institutions face the problem of having data that, though nominally
public (or at least publicly funded), are in fact extremely difficult
for third parties to use. Such information may exist, but it is kept in
institutions to which there is no or little public access, or it exists
only in analog or non-machine-readable formats (as PDFs of scanned
documents, for instance), or its use is tied to high license fees. One
of the central demands of the Open Data and Open Access movements is
thus to have free access to these collections. Yet there has been a
considerable amount of resistance. Whether for political or economic
reasons, many public and scientific institutions do not want their data
to be freely accessible. In many cases, moreover, they also lack the
competence, guidelines, budgets, and internal processes that would be
necessary to make their data available to begin with. But public
pressure has been mounting, not least through initiatives such as the
global Open Data Index, which compares countries according to the
accessibility of their information.[^94^](#c3-note-0094){#c3-note-0094a}
In Germany, the Digital Openness Index evaluates states and communities
in terms of open data, the use of open-source software, the availability
of open infrastructures (such as free internet access in public places),
open policies (the licensing of public information,
freedom-of-information laws, the transparency of budget planning, etc.),
and open education (freely accessible educational resources, for
instance).[^95^](#c3-note-0095){#c3-note-0095a} The results are rather
sobering. The Open Data Index has identified 10 []{#Page_169
type="pagebreak" title="169"}different datasets that ought to be open,
including election results, company registries, maps, and national
statistics. A study of 97 countries revealed that, by the middle of
2015, only 11 percent of these datasets were entirely freely accessible
and usable.

Although public institutions are generally slow and resistant in making
their data freely available, important progress has nevertheless been
made. Such progress indicates not only that the new commons have
developed their own structures in parallel with traditional
institutions, but also that the commoners have begun to make new


ndamental level with respect to their procedures,
self-perception, and relation to citizens. This is easier said than
done.
:::

::: {.section}
### Municipal infrastructures as commons: citizen networks {#c3-sec-0015}

The demands for open access to data, however, are not exhausted by
attempts to redefine public institutions and civic participation. In
fact, they go far beyond that. In Germany, for instance, there has been
a recent movement toward (re-)communalizing the basic provision of water
and


iversity, and although
its citizens are able to (or have to) lead their lives in a
self-responsible manner, they are no longer able to exert any influence
over the political and economic structures in which their lives are
unfolding. On the basis of data-intensive and comprehensive
surveillance, these structures are instead shaped disproportionally by
an influential few. The resulting imbalance of power has been growing
steadily, as has income inequality. In contrast to this, the tendency
toward com


o a renewal of democracy, based on
institutions that exist outside of the market and the state. At its core
this movement involves a new combination of economic, social, and
(ever-more pressing) ecological dimensions of everyday life on the basis
of data-intensive participatory processes.

What these two developments share in common is their comprehensive
realization of the infrastructural possibilities of the present. Both of
them develop new relations of production on the basis of new productive
f


first providers of
Webmail was Hotmail, which became available in 1996. Just one year
later, the company was purchased by Microsoft.

[10](#c3-note-0010a){#c3-note-0010}  Barton Gellman and Ashkan Soltani,
"NSA Infiltrates Links to Yahoo, Google Data Centers Worldwide, Snowden
Documents Say," *Washington Post* (October 30, 2013), online.

[11](#c3-note-0011a){#c3-note-0011}  Initiated by hackers and activists,
the Mailpile project raised more than \$160,000 in September 2013 (the
fundraising g


ounced that it would support "end-to-end" encryption for emails. See
"Making End-to-End Encryption Easier to Use," *Google Security Blog*
(June 3, 2014), online.

[13](#c3-note-0013a){#c3-note-0013}  Not all services use algorithms to
sort through data. Twitter does not filter the news stream of individual
users but rather allows users to create their own lists or to rely on
external service providers to select and configure them. This is one of
the reasons why Twitter is regarded as "difficult."


-0019a){#c3-note-0019}  Wolfie Christl, "Kommerzielle
digitale Überwachung im Alltag," *Studie im Auftrag der
Bundesarbeitskammer* (November 2014), online.

[20](#c3-note-0020a){#c3-note-0020}  Viktor Mayer-Schönberger and
Kenneth Cukier, *Big Data: A Revolution That Will Change How We Live,
Work and Think* (Boston, MA: Houghton Mifflin Harcourt, 2013).

[21](#c3-note-0021a){#c3-note-0021}  Carlos Diuk, "The Formation of
Love," *Facebook Data Science Blog* (February 14, 2014), online.

[22](#c3-note-0022a){#c3-note-0022}  Facebook could have determined this
simply by examining the location data that were transmitted by its own
smartphone app. The study in question, however, did not take such
information into account.

[23](#c3-note-0023a){#c3-note-0023}  Dan Lyons, "A Lot of Top
Journalists Don\'t Look at Traffic Numbers: Here\'s Why," *


r local
institutional review board had approved it -- and apparently on the
grounds that Facebook apparently manipulates people\'s News Feeds all
the time."

[27](#c3-note-0027a){#c3-note-0027}  In a rare moment of openness, the
founder of a large dating service made the following remark: "But guess
what, everybody: []{#Page_198 type="pagebreak" title="198"}if you use
the Internet, you\'re the subject of hundreds of experiments at any
given time, on every site. That\'s how websites work." See Chri


ense,
to use these services.

[40](#c3-note-0040a){#c3-note-0040}  Mary Madden et al., "Teens, Social
Media and Privacy," *Pew Research Center: Internet, Science & Tech* (May
21, 2013), online.

[41](#c3-note-0041a){#c3-note-0041}  Meta-data are data that provide
information about other data. In the case of an email, the header lines
(the sender, recipient, date, subject, etc.) form the meta-data, while
the data are made up of the actual content of communication. In
practice, however, the two categories cannot always be sharply
distinguished from one another.
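
A minimal sketch of this distinction, assuming a toy message (addresses and content are invented), using Python's standard email parser.

```python
# Header lines are meta-data, the body is the data.
from email import message_from_string

raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Date: Mon, 1 Jun 2015 12:00:00 +0000\n"
    "Subject: Lunch?\n"
    "\n"
    "Shall we meet at noon?\n"
)

msg = message_from_string(raw)

# Meta-data: who communicated with whom, and when.
meta = {k: msg[k] for k in ("From", "To", "Date", "Subject")}

# Data: the actual content of the communication. Note that the Subject
# header already carries content, which is why the two categories cannot
# always be sharply distinguished.
content = msg.get_payload()

print(meta)
print(content)
```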

[42](#c3-note-0042a){#c3-note-0042}  By manipulating online polls, for
instance, or flooding soc


e Atlantic* (March 25, 2014), online.

[49](#c3-note-0049a){#c3-note-0049}  See the documentary film *Low
Definition Control* (2011), directed by Michael Palm.

[50](#c3-note-0050a){#c3-note-0050}  Felix Stalder, "In der zweiten
digitalen Phase: Daten versus Kommunikation," *Le Monde Diplomatique*
(February 14, 2014), online.

[51](#c3-note-0051a){#c3-note-0051}  In 2009, the European Parliament
and the European Council ratified Directive 2009/72/EC, which stipulates
that, by the year 2020, 80


00
in 2012 to the []{#Page_203 type="pagebreak" title="203"}company Mapbox
in order for the latter to make improvements to OSM\'s infrastructure.

[91](#c3-note-0091a){#c3-note-0091}  This was accomplished, for
instance, by introducing methods for data indexing and quality control.
See Ramthum, "Offene Geodaten durch OpenStreetMap" (cited above).

[92](#c3-note-0092a){#c3-note-0092}  Trevor Paglen and Adam C. Thompson,
*Torture Taxi: On the Trail of the CIA\'s Rendition Flights* (Hoboken,
NJ: Me


permission
of the Estate of Richard Brautigan; all rights reserved.

ISBN-13: 978-1-5095-1959-0

ISBN-13: 978-1-5095-1960-6 (pb)

A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data

Names: Stalder, Felix, author.

Title: The digital condition / Felix Stalder.

Other titles: Kultur der Digitalitaet. English

Description: Cambridge, UK ; Medford, MA : Polity Press, \[2017\] \|
Includes bibliographical references and index.

Iden


dat in Tenen & Foxman 2014


er contains the following
provisions:

-- We neither condone nor condemn any forms of information exchange.\
-- We strive to protect our sources and do not retain any identifying
personal information.\
-- We seek transparency in sharing our methods, data, and findings with
the widest possible audience.\
-- Credit where credit is due. We believe in documenting attribution
thoroughly.\
-- We limit our usage of licensed material to the analysis of metadata,
with results used for non-commercial, nonprof


s, Goals, and
Scope of the Project." He answers: "we loot sites with ready-made
collections," "sort the indices in arbitrary normalized formats," "for
uncatalogued books we build a 'technical index': name of file, size,
hashcode," "write scripts for database sorting after the initial catalog
process," "search the database," "use the database for the construction
of an accessible catalog," "build torrents for the distribution of files
in the collection."^[29](#fn-2025-29){#fnref-2025-29}^ But, "everything
begins with the forum," in the words of another founding
member.^[30](#fn-2025-
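
The "technical index" described in these steps (name of file, size, hashcode) can be sketched roughly as follows; the function name, the choice of MD5, and the record layout are assumptions for illustration, not Aleph's actual scripts.

```python
# Rough sketch: one record per uncatalogued file -- name, size, hashcode.
import hashlib
import os

def technical_index(directory):
    """Return one record per file: file name, size in bytes, MD5 hashcode."""
    records = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        records.append({"file": name, "size": os.path.getsize(path),
                        "md5": h.hexdigest()})
    return records

# "Database sorting after the initial catalog process," e.g. by size:
# sorted(technical_index("incoming/"), key=lambda r: r["size"])
```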


veals a multitude
of conscious choices that work to further atomize *Aleph* and to
decentralize it along the axes of the collection, governance, and
engineering.

By March of 2009 these efforts resulted in approximately 79k volumes or
around 180gb of data.^[36](#fn-2025-36){#fnref-2025-36}^ By December of
the same year, the moderators began talking about a terabyte, 2tb in
2010, and around 7tb by 2011.^[37](#fn-2025-37){#fnref-2025-37}^ By
2012, the core group of "prospectors" grew to 1,000 registere


als behind the *BitTorrent* protocol. At its bare minimum (as
it was described in the original specification by Bram Cohen) the
protocol involves a "seeder," someone willing to share something it its
entirety; a "leecher," someone downloading shared data; and a torrent
"tracker" that coordinates activity between seeders and
leechers.^[41](#fn-2025-41){#fnref-2025-41}^
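
A toy illustration of the three roles just named; the class and its announce method are invented stand-ins for the tracker's coordination function, not the actual BitTorrent wire protocol.

```python
# The tracker only coordinates; seeders and leechers exchange the data
# among themselves.
class Tracker:
    def __init__(self):
        self.swarms = {}          # torrent id -> set of peer addresses

    def announce(self, torrent_id, peer):
        """A peer reports itself and receives the current swarm list."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others

tracker = Tracker()
tracker.announce("album123", "seeder:6881")            # has the whole album
print(tracker.announce("album123", "leecher-a:6881"))  # ['seeder:6881']
print(tracker.announce("album123", "leecher-b:6881"))  # seeder and leecher-a
```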

Imagine a music album sharing agreement between three friends, where,
initially, only one holds a copy of some album: for example, N


power of *BitTorrent* comes from shifting the burden of sharing from a
single seeder (friend one) to a "swarm" of leechers (friends two and
three). On this model, the first leecher joining the network (friend
two, in our case) would begin to get his data from the seeder directly,
as before. But the second leecher would receive some bits from the
seeder and some from the first leecher, in a non-linear, asynchronous
fashion. In our example, we can imagine the remaining friend getting
some songs from t


h* combats the problem of fading torrents by renting
"seedboxes"--servers dedicated to keeping the *Aleph* seeds containing
the archive alive, preserving the availability of the collection. The
server in production as of 2014 can serve up to 12tb of data at speeds of
100-800 megabits per second. Other file sharing communities address the
issue by enforcing a certain download to upload ratio on members of
their network.
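
A rough sketch, under the assumption of a simple minimum-ratio rule, of how such a download-to-upload ratio requirement could be checked; the threshold and function are hypothetical, not any specific community's policy.

```python
# A member's share ratio is uploaded bytes divided by downloaded bytes;
# falling below a minimum restricts further downloading.
MIN_RATIO = 0.5  # hypothetical threshold

def may_download(uploaded_bytes, downloaded_bytes, min_ratio=MIN_RATIO):
    """Return True if the member's upload/download ratio meets the minimum."""
    if downloaded_bytes == 0:
        return True  # new members have nothing to compensate for yet
    return uploaded_bytes / downloaded_bytes >= min_ratio

print(may_download(uploaded_bytes=40 * 10**9, downloaded_bytes=60 * 10**9))  # True
print(may_download(uploaded_bytes=5 * 10**9, downloaded_bytes=60 * 10**9))   # False
```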

The lack of true anonymity is the second problem intrinsic to the
*BitTorrent* prot


sustainability of *Aleph* as a distributed system therefore
requires a rare participant: one interested in downloading the archive
as a whole (as opposed to downloading individual books), one who owns
the hardware to store and transmit terabytes of data, and one possessing
the technical expertise to do so safely.

**Peer preservation**

In light of the challenges and the effort involved in maintaining the
archive, one would be remiss to describe *Aleph* merely in terms of book
piracy, understood in


ike *Wikipedia*, which, according to recent studies, saw a
decline in new contributors due to increasingly strict rule
enforcement.^[54](#fn-2025-54){#fnref-2025-54}^ However, our results are
merely speculative at the moment. The analysis of a large dataset we
have collected as corollary to our field work online may offer further
evidence for these initial intuitions. In the meantime, it is not enough
to conclude that brick-and-mortar libraries should learn from these
emergent, distributed architect


dat in Thylstrup 2019


ctice of mass digitization is forming
new nexuses of knowledge, and new ways of engaging with that knowledge. What
at first glance appears to be a simple act of digitization (the transformation
of singular books from boundary objects to open sets of data), reveals, on
closer examination, a complex process teeming with diverse political, legal,
and cultural investments and controversies.

This volume asks why mass digitization has become such a “matter of concern,”2
and explores its implications


his volume
argues that the shape-shifting quality of mass digitization, and its social
dynamics, alters the politics of cultural memory institutions. Two movements
simultaneously drive mass digitization programs: the relatively new phenomenon
of big data gold rushes, and the historically more familiar archival
accumulative imperative. Yet despite these prospects, mass digitization
projects are also uphill battles. They are costly and speculative processes,
with no guaranteed rate of return, and they


is book asks the question
of how mass digitization affects the politics of cultural memory institutions.
As a matter of practice, something is clearly changing in the conversion of
bounded—and scarce—historical material into ubiquitous ephemeral data. In
addition to the technical aspects of digitization, mass digitization is also
changing the political territory of cultural memory objects. Global commercial
platforms are increasingly administering and operating their scanning
activities in favor


ter led a nomadic life, moving from The
Hague to Brussels and then in 1993 to the city of Mons in Belgium, where it
now exists as a museum called the Mundaneum Archive Center. Fatefully, Mons, a
former mining district, also houses Google’s largest data center in Europe and
it did not take Google long to recognize the cultural value in entering a
partnership with the Mundaneum, the two parties signing a contract in 2013.
The contract entailed among other things that Google would sponsor a traveling


as “librarians,”
“cultural works,” and “taxonomies,” and cultural memory practices such as
“curating,” “reading,” and “ownership.” Librarians were “disintermediated” by
technology, cultural works fragmented into flexible data, and curatorial
principles were revised and restructured just as reading was now beginning to
take place in front of screens, meaning-making to be performed by machines,
and ownership of works to be substituted by contractual renewals.

Thinking abo


andardization is linked with globalization (and various neoliberal
regimes) and the attendant widespread contraction of the state, while on the
other hand, standardization implies a reconfiguration of everyday life.98
Standards allow for both minute data analytics and overarching political
systems that “govern at a distance.”99 Standardization understood in this way
is thus a mode of capturing, conceptualizing, and configuring reality, rather
than simply an economic instrument or lubricant. In a


rds
to file standards such as Word and MP4 and HTTP.103 Moreover, mass
digitization assemblages confront users with a series of additional standards,
from cultural standards of tagging to technical standards of interoperability,
such as the European Data Model (EDM) and Google’s schema.org, or legal
standards such as copyright and privacy regulations. Yet, while these
standards share affinities with the standardization processes of
industrialization, in many respects they also deviate from them. I



and interoperability to increase their range.107 One area of such
reconfiguration in mass digitization is the taxonomic field, where stable
institutional taxonomic structures are converted to new flexible modes of
knowledge organization like linked data.108 Linked data can connect cultural
memory artifacts as well as metadata in new ways, and the move from a cultural
memory web of interlinked documents to a cultural memory web of interlinked
data can potentially “amplify the impact of the work of libraries and
archives.”109 However, in order to work effectively, linked data demands
standards and shared protocols.
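
A minimal sketch of what such interlinked, standards-based data can look like; the vocabulary follows schema.org (mentioned above), while the record itself and the example.org identifiers are invented for illustration.

```python
# A catalog record as linked data rather than an interlinked document:
# shared vocabulary, resolvable identifiers.
import json

record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "@id": "https://example.org/items/42",
    "name": "Traité de documentation",
    "author": {"@type": "Person", "name": "Paul Otlet",
               "sameAs": "https://example.org/authority/otlet"},
    "isPartOf": {"@type": "Collection",
                 "name": "Mundaneum Archive Center"},
}

# Because format and vocabulary are shared standards, a third party's
# algorithm can parse the record and follow its identifiers without asking
# the holding institution how to read it.
print(json.dumps(record, ensure_ascii=False, indent=2))
```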

Flexibility allows the user a freer range of actions, and thus potentially
also the possibility of innovation. These affordances often translate into
user freedom or empowerment. Yet flexibility does not nece


012. 48. Beagle et al. 2003; Lavoie and Dempsey 2004; Courant 2006;
Earnshaw and Vince 2007; Rieger 2008; Leetaru 2008; Deegan and Sutherland
2009; Conway 2010; Samuelson 2014. 49. The earliest textual reference to the
mass digitization of books dates to the early 1990s. Richard de Gennaro,
Librarian of Harvard College, in a panel on funding strategies, argued that an
existing preservation program called “brittle books” should take precedence
over other preservation strategies such as mass d


2017. 101.
Busch 2011. 102. Peters 2015, 224. 103. DeNardis 2011. 104. Hall and Jameson
1990. 105. Kolko 1988. 106. Agre 2000. 107. For more on the importance of
standard flexibility in digital networks, see Paulheim 2015. 108. Linked data
captures the intellectual information users add to information resources when
they describe, annotate, organize, select, and use these resources, as well as
social information about their patterns of usage. On one hand, linked data
allows users and institutions to create taxonomic categories for works on a
par with cultural memory experts—and often in conflict with such experts—for
instance by linking classical nudes with porn; and on the other hand, it
allows users and institutions to harness social information about patterns of
use. Linked data has ideological and economic underpinnings as much as
technical ones. 109.  _The National Digital Platform: for Libraries, Archives
and Museums_ , 2015, report-national-digital-platform>. 110.


ttititudes and pessimism about “the end of the book” to the
triumphalist mythologizing of liquid virtual books that were shedding their
analog ties like butterflies shedding their cocoons.

The most widely publicized mass digitization project to date, Google Books,
precipitated the entire emotional spectrum that could arise from these textual
transversals: from fears that control over culture was slipping from authors
and publishers into the hands of large tech companies, to hopeful ideas about


the creation of new habits and techniques of acceleration and
rationalization that tie in with the politics of digital culture and digital
devices. The industrial scaling of mass digitization becomes a crucial part of
the industrial apparatus of big data, which provide new modes of inscription
for both individuals and digital industries that in turn can be capitalized on
via data-mining, just as it raises questions of digital labor and copyright.

Yet, what kinds of scaling techniques—and what kinds of investments—Google
would have to leverage to achieve its initial goals were still unclear to
Google in those early years


n with scale and time, Google
bought a consignment of books from a second-hand book store in Arizona. They
scanned them and subsequently experimented with how to best index these works
not only by using information from the book, but also by pulling data about
the books from various other sources on the web. These extractions allowed
them to calculate a work’s relevance and importance, for instance by looking
at the number of times it had been referred to.12
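
A rough sketch of the general idea of gauging a work's relevance by counting how often other sources refer to it; the titles and counts are invented, and this is not Google's actual ranking method.

```python
# Count inbound references per work and rank by frequency.
from collections import Counter

# Hypothetical references harvested from other sources on the web.
references = [
    "Middlemarch", "Moby-Dick", "Middlemarch",
    "Middlemarch", "Moby-Dick", "Walden",
]

relevance = Counter(references)
for work, count in relevance.most_common():
    print(work, count)
# Middlemarch 3 / Moby-Dick 2 / Walden 1
```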

In 2004 Google was also granted patent


oved to be much more complex
than the simple physical exchange of books and digital files. As the next
section outlines, this complex system of cultural production was held together
by contractual arrangement—central joints, as it were, connecting data and
works, public and private, local and global, in increasingly complex ways. For
Google Books, these contractual relations appear as the connective tissues
that make these assemblages possible, and which are therefore fundamental to
their affectiv


ural infrastructures. These infrastructures
are governed less by the hierarchical world of curators, historians, and
politicians, and more by feedback networks of tech companies, users, and
algorithms. Moreover, they forge ever closer connections to data-driven market
logics, where computational rather than representational power counts. Mass
digitization PPPs such as Google Books are thus also symptoms of a much more
pervasive infrapolitical situation, in which cultural memory institutions are
incr


ard scientific publications. As mathematicians Eitan Pechenik et
al. show, the contents of the Google Books corpus in the period of the 1900s
are “increasingly dominated by scientific publications rather than popular
works,” and “even the first data set specifically labeled as fiction appears
to be saturated with medical literature.”64 The fact that Google Books is
constellated in such a manner thus challenges a “vast majority of existing
claims drawn from the Google Books corpus,” just as it points to the need “to
fully characterize the dynamics of the corpus before using these data sets to
draw broad conclusions about cultural and linguistic evolution.”65

Last but not least, Google Books’s collection still bespeaks its beginnings:
it still primarily covers Anglophone ground. There is hardly any literature
that reviews the


arked lack of online availability of twentieth-century
collections.” 33 The lack of a common copyright mechanism not only hinders
online availability, but also challenges European cross-border digitization
projects as well as the possibilities for data-mining collections à la Google
because of the difficulties connected to ascertaining the relevant
public domain and hence definitively flagging the public domain status of an
object.34

While Europeana’s twentieth-century black hole poses a prob


eoffrey Bowker, and
others have successfully managed to frame infrastructure “not only in terms of
human versus technological components but in terms of a set of interrelated
social, organizational, and technical components or systems (whether the data
will be shared, systems interoperable, standards proprietary, or maintenance
and redesign factored in).”50 It follows, then, as Christine Borgman notes,
that even if interoperability in technical terms is a “feature of products and
services that allows the connection of people, data, and diverse systems,”51
policy practice, standards and business models, and vested interest are often
greater determinants of interoperability than is technology.52 In similar
terms, information science scholar Jerome McDonough notes that “we n


ultural memory industry, such as private individual users and commercial
industries.57

The logic of interoperability is also born of a specific kind of
infrapolitics: the politics of modular openness. Interoperability is motivated
by the “open” data movements that seek to break down proprietary and
disciplinary boundaries and create new cultural memory infrastructures and
ways of working with their collections. Such visions are often fueled by
Lawrence Lessig’s conviction that “the most imp


.67 And as the already monumental
and ever accelerating digital collections exceed human curatorial capacity,
the computing power of machines and cognitive capabilities of ordinary
citizens are increasingly needed to penetrate and make meaning of the data
accumulations.

What role is Europeana’s user given in this new environment? With the
increased modulation of public-private boundaries, which allow different
modules to take on different tasks and on different levels, the strict
separation betwee


ext.
4. Augmenting collections, that is, enriching collections with additional dimensions. One example is the recently launched Europeana Sound Connections, which encourages and enables visitors to “actively enrich geo-pinned sounds from two data providers with supplementary media from various sources. This includes using freely reusable content from Europeana, Flickr, Wikimedia Commons, or even individuals’ own collections.”69
5. And finally, Europeana also offers participation th


e premature digital
distribution of _Mein Kampf_ in Europeana was thus, according to copyright
legislation, illegal. While the _Mein Kampf_ case was extraordinary, it
flagged a more fundamental problem of how to police and analyze all the
incoming data from individual cultural heritage institutions.

On a more fundamental level, however, _Mein Kampf_ indicated not only a legal,
but also a political, issue for Europeana: how to deal with the expressions
that Europeana’s feedback mechanisms facilitated. Mass digitization promoted a
new kind of cultural memory logic, namely of feedback. Feedback mechanisms are
central to data-driven companies like Google because they offer us traces of
the inner worlds of people that would otherwise never appear in empirical
terms, but that can be catered to in commercial terms. 81 Yet, while the
traces might interest the corporation (or


h nation-state
contributes to Europeana.83 So while Europeana is in principle representing
Europe’s collective cultural memory, in reality it represents a highly
fragmented image of Europe with a lot of European countries not even appearing
in the databases. Moreover, even these numbers are potentially misleading, as
one information scholar formerly working with Europeana notes: to pump up
their statistical representation, many institutions strategically invented
counting systems that would make t


erms, they recount not only the classic tale of a fragmented Europe
but also how Europe is increasingly perceived, represented, and managed by
calculative technologies. In technical terms, they reveal the gray areas of
how to delineate and calculate data: what makes a data object? And in cultural
policy terms, they reflect the highly divergent prioritization of mass
digitization in European countries.

The final question is, then: how is this fragmented European collection
distributed? This is the point where European


es/Europeana_Professional/Advocacy/Twentieth%20Century%20Black%20Hole
/copy-of-europeana-policy-illustrating-the-20th-century-black-hole-in-the-
europeana-dataset.pdf> . 34. C. Handke, L. Guibault, and J. J. Vallbé, “Is
Europe Falling Behind in Data Mining? Copyright’s Impact on Data Mining in
Academic Research,” 2015, id-12015-15-handke-elpub2015-paper-23>. 35. Interview with employee, DG
Copyright, DC Commission, 2010. 36. Interview with employee, DG Information
and Societ


der and connect to useful information across
systems, and calls for interoperability have increased as systems have become
increasingly complex. 44. There are “myriad technical and engineering issues
associated with connecting together networks, databases, and other computer-
based systems”; digitized cultural memory institutions have the option of
providing “a greater array of services” than traditional libraries and
archives from sophisticated search engines to document reformatting as r


noncapitalist practices of dissent without
profit motives.24 The dissent, however, was not necessarily explicitly
expressed. Lacking the defining fervor of a clear political ideology, and
offering no initiatives to overthrow the Soviet regime, samizdat was rather a
mode of dissent that evaded centralized ideological control. Indeed, as
Aleksei Yurchak notes, samizdat practices could even be read as a mode of
“suspending the political,” thus “avoiding the political concerns that had a
binary logic determined by the sovereign state” to demonstrate “to themselves
and to others that there were subjects, col


of life, and
physical and symbolic spaces in the Soviet context that, without being overtly
oppositional or even political, exceeded that state’s abilities to define,
control, and understand them.”25 Yurchak thus reminds us that even though
samizdat was practiced as a form of nonpolitical practice, it nevertheless
inherently had significant political implications.

The infrapolitics of samizdat not only referred to a specific social practice
but were also, as Ann Komaromi reminds us, a particular discourse network
rooted in the technology of the typewriter: “Because so many people had their
own typewriters, the production of samizdat was more individual and typically
less linked to ideology and organized political structures. … The circulation
of Samizdat was more rhizomatic and spontaneous than the underground
press—samizdat was like mushroom ‘spores.’”26 The technopolitical
infrastructure of samizdat changed, however, with the fall of the Berlin Wall
in 1989, the further decentralization of the Russian media landscape, and the
emergence of digitization. Now, new nodes emerged in the Russian information
landscape, and there was no centralized auth


,
the transmission of the Western capitalist system gave rise to new types of
shadow activity that produced items instead of just sharing items, adding a
new consumerist dimension to shadow libraries. Indeed, as Kuznetsov notes, the
late-Soviet samizdat created a dynamic textual space that aligned with more
general tendencies in mass digitization where users were “both readers and
librarians, in contrast to a traditional library with its order, selection,
and strict catalogisation.”27

If many o


library’s operator Ilya Larin adhered
to the international piracy movement, calling his site a pirate library and
gracing Librusek’s website with a small animated pirate, complete with sabre
and parrot.

The integration and proliferation of samizdat practices into a complex
capitalist framework produced new global readings of the infrapolitics of
shadow libraries. Rather than reading shadow libraries as examples of late-
socialist infrapolitics, scholars also framed them as capitalist symptoms o


eeded to stem
infringing activity. Yet, this book argues that Karaganis’s report, and the
approach it represents, also frames the infrapolitics of shadow libraries
within a consumerist framework that excises the noncommercial infrapolitics of
samizdat from the picture. The increasing integration of Russian media
infrapolitics into Western apparatuses, and the reframing of shadow libraries
from samizdat practices of political dissent to market failure, situates the
infrapolitics of shadow libraries within a consumerist dispositive and the
individual participants as consumers. As some critical voices suggest, this
has an impact on the political poten


aries
as the “other” in the landscape of mass digitization. Shadow libraries
instigate new creative relations, the dynamics of which are infrastructurally
premised upon the medium they use. Just as typewriters were an important
component of samizdat practices in the Soviet Union, digital infrastructures
are central components of shadow libraries, and in many respects shadow
libraries bring to the fore the same cultural-political questions as other
forms of mass digitization: questions of territo


for quantity, which drives mass digitization, is—much like the Borges stories
to which Kelly also refers—laced with ambivalence. On the one hand, the
quantitative aspirations are driven forth by the basic assumption that “more
is more”: more data and more cultural memory equal better industrial and
intellectual progress. On the other hand, the sheer scale of ambition also
causes frustration, anxiety, and failed plans.

The sense that sheer size and big numbers hold the promise of progress a


11 In this way, new
questions with old trajectories arise: What is important for understanding a
collection and its life? What should be included and excluded? And how will we
know what will turn out to be important in the future?

In the era of big data, the imperative is often to digitize and “save all.”
Prestige mass digitization projects such as Google Books and Europeana have
thus often contextualized their importance in terms of scale. Indeed, as we
saw in the previous chapters, the questi


tion of authority generates anxiety in the cultural memory circles
that had hitherto been able to hold claim to knowledge organization expertise.
This is the dizzying perspective that haunts the cultural memory professionals
faced with Europeana’s data governance model. Thus, as one Europeana
professional explained to me in 2010, “Europeana aims at an open-linked-data
model with a number of implications. One implication is that there will be no
control of data usage, which makes it possible, for instance, to link classics
with porn. Libraries do not agree to this loss of control which was at the
base of their self-understanding.”60 The Europeana professional then proceeded
to recount the profound anxiet


so many projects that existed outside social
media platforms and operated across mass digitization projects. One example
was the “serendipity engine,” Serendip-o-matic, which first examined the
user’s research interests and then, based on this data, identified “related
content in locations such as the Digital Public Library of America (DPLA),
Europeana, and Flickr Commons.”80 While this initiative was not endorsed by
any of these mass digitization projects, they nevertheless featured it on


ing into a parasitical game of relational network effects, where
different platforms challenge and use each other to gain more views and
activity. This gives successful platforms a great advantage in the digital
economy. They not only gain access to data, but they also control the rules of
how the data is to be managed and governed. Therefore, when a user is surfing
Google Books, Google—and not the library—collects the user’s search queries,
including results that appeared in searches and pages the user visited from
the search. The browser, moreover, tracks the user’s activity, including pages
the user has visited and when, user data, and possibly user login details with
auto-fill features, user IP address, Internet service provider, device
hardware details, operating system and browser version, cookies, and cached
data from websites. The labyrinthine infrastructure of the mass digitization
ecosystem also means that if you access one platform through another, your
data will be collected in different ways. Thus, if you visit Europeana through
Facebook, it will be Facebook that collects your data, including name and
profile; biographical information such as birthday, hometown, work history,
and interests; username and unique identifier; subscriptions, location,
device, activity date, time and time-zone, activities; and likes, check-ins,
and events.115 As more platforms emerge from which one can access mass
digitized archives, such as social media sites like Facebook, Google+,
Pinterest, and Twitter, as well as mobile devices such as Android, gaining an
overview of who collects one’s data and how becomes more nebulous.

Europeana’s reminder illustrates the assemblatic infrastructural set-up of
mass digitization projects and how they operate with multiple entry points,
each of which may attach its own infrapolitical dynamics. It als


trained—pre-determined by a set of design decisions about what is necessary, relevant and
useful. Platforms put those design decisions back into the hands of users.
Instead of a single interface, there are innumerable ways of interacting with
the data.” See Tim Sherratt, “From Portals to Platforms: Building New
Frameworks for User Engagement,” National Library of Australia, November 5,
2013, platform>. 98. “Europeana


_Wired_, November 4, 2017, /how-google-book-search-got-lost>. 7. What to make, for instance, of the new
trend of employing Google’s neural networks to find one’s museum doppelgänger
from the company’s image database? Or the fact that Google Cultural Institute
is consistently turning out new cultural memory hacks such as its cardboard VR
glasses, its indoor mapping of museum spaces, and its gigapixel Art Camera
which reproduces artworks in uncanny detail? Or


te Sovereignty in Europe and Beyond_. New York: Palgrave Macmillan.
4. Agre, Philip E. 2000. “The Market Logic of Information.” _Knowledge, Technology & Policy_ 13 (3): 67–77.
5. Aiden, Erez, and Jean-Baptiste Michel. 2013. _Uncharted: Big Data as a Lens on Human Culture_. New York: Riverhead Books.
6. Ambati, Vamshi, N. Balakrishnan, Raj Reddy, Lakshmi Pratha, and C. V. Jawahar. 2006. “The Digital Library of India Project: Process, Policies and Architecture.” _CiteSeer_.


for Coping with Information Overload ca. 1550–1700.” _Journal of the History of Ideas_ 64 (1): 11–28.
34. Bloom, Harold. 2009. _The Labyrinth_. New York: Bloom’s Literary Criticism.
35. Bodó, Balazs. 2015. “The Common Pathways of Samizdat and Piracy.” In _Samizdat: Between Practices and Representations_, ed. V. Parisi. Budapest: CEU Institute for Advanced Study. Available at SSRN.
36. Bodó, Balazs. 2016. “Libraries in the Post-Scarcity Era.


ed. Ivo de Gennaro. Leiden, the Netherlands: Brill.
42. Borghi, Maurizio, and Stavroula Karapapa. 2013. _Copyright and Mass Digitization: A Cross-Jurisdictional Perspective_. Oxford: Oxford University Press.
43. Borgman, Christine L. 2015. _Big Data, Little Data, No Data: Scholarship in the Networked World_. Cambridge, MA: MIT Press.
44. Bottando, Evelyn. 2012. _Hedging the Commons: Google Books, Libraries, and Open Access to Knowledge_. Iowa City: University of Iowa.
45. Bowker, Geoffrey C., Karen Baker, Floren


dgehampton, NY: Bridge Works Pub. Co.
149. Hayles, N. Katherine. 2005. _My Mother Was a Computer: Digital Subjects and Literary Texts_. Chicago: University of Chicago Press.
150. Helmond, Anne. 2015. “The Platformization of the Web: Making Web Data Platform Ready.” _Social Media + Society_ 1 (2).
151. Hicks, Marie. 2018. _Programmed Inequality: How Britain Discarded Women Technologists and Lost its Edge in Computing_. Cambrid


Knowledge and Power: The Library as a Gendered Space in the Western Imaginary_. Utrecht, the Netherlands: Utrecht University.
170. Kolko, Joyce. 1988. _Restructuring the World Economy_. New York: Pantheon Books.
171. Komaromi, Ann. 2012. “Samizdat and Soviet Dissident Publics.” _Slavic Review_ 71 (1): 70–90.
172. Kramer, Bianca. 2016a. “Sci-Hub: Access or Convenience? A Utrecht Case Study, Part 1.” _I&M / I&O 2.0_, June 20.


ihar K., Bharat Kumar, and Ashis K. Pani. 2014. _Progressive Trends in Electronic Resource Management in Libraries_. Hershey, PA: Information Science Reference.
227. Paulheim, Heiko. 2015. “What the Adoption of Schema.org Tells About Linked Open Data.” _CEUR Workshop Proceedings_ 1362:85–90.
228. Peatling, G. K. 2004. “Public Libraries and National Identity in Britain, 1850–1919.” _Library History_ 20 (1): 33–47.
229. Pechenick, Eitan A., Christopher M. Danforth, Peter S. Dodds,


ironments for Learning_, eds. G. Dettori, T. Giannetti, A. Paiva, and A. Vaz, 103–114. Dordrecht: Sense Publishers.
306. Walker, Neil. 2003. _Sovereignty in Transition_. Oxford: Hart.
307. Weigel, Moira. 2016. _Labor of Love: The Invention of Dating_. New York: Farrar, Straus and Giroux.
308. Weiss, Andrew, and Ryan James. 2012. “Google Books’ Coverage of Hawai’i and Pacific Books.” _Proceedings of the American Society for Information Science and Technology_ 49 (1): 1–3.
309. We


without permission in writing from the
publisher.

This book was set in ITC Stone Sans Std and ITC Stone Serif Std by Toppan
Best-set Premedia Limited. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Names: Thylstrup, Nanna Bonde, author.

Title: The politics of mass digitization / Nanna Bonde Thylstrup.

Description: Cambridge, MA : The MIT Press, [2018] | Includes bibliographical
references and index.

Identifiers: LCCN 2018010472 | ISBN 9780


dat in WHW 2016


itical terrain became blatantly direct was the exhibition Written-off: On the Occasion of the 20th Anniversary of Operation
Storm, which we organized in the summer of 2015 at Gallery Nova (figs.
2–4).
The exhibition/action Written-off was based on data from Ante Lesaja’s
extensive research on “library purification”, which he published in his book
Knjigocid: Uništavanje knjige u Hrvatskoj 1990-ih (Libricide: The Destruction
of Books in Croatia in the 1990s).18 People were invited to bring in


adrid, 2014.
Photo by Joaquín Cortés and Román Lores / MNCARS.

through the control of metadata (information about information),21 Public Library shifts the focus away from aesthetic intention – from unique,
closed, and discrete works – to a database of works and the metabolism
of the database. It creates values through indexing and connectivity, imagined communities and imaginative dialecticization. The web of interpenetration and determination activated by Public Library creates a pedagogical endeavour that also includes a propagand


t Autonomy
Cube enables and protects is that it prevents so-called traffic analysis – the
tracking, analysis, and theft of metadata for the purpose of anticipating
people’s behaviour and relationships. In the hands of the surveillance
state this data becomes not only a means of steering our tastes, modes of
consumption, and behaviours for the sake of making profit but also, and
more crucially, an effective method and weapon of political control that
can affect political organizing in often still

 
