Medak, Mars & WHW
Public Library
2015


Public Library

may • 2015
price 50 kn

This publication was produced in conjunction with the exhibition
Public Library • 27/5–13/06 2015 • Gallery Nova • Zagreb
Publishers
Editors
Tomislav Medak • Marcell Mars • What, How & for Whom / WHW
ISBN 978-953-55951-3-7 [Što, kako i za koga/WHW]
ISBN 978-953-7372-27-9 [Multimedijalni institut]
A CIP catalogue record for this book is available from the
National and University Library in Zagreb under 000907085

With the support of the Creative Europe Programme of the
European Union

ZAGREB • May • 2015

Public Library

1. Marcell Mars, Manar Zarroug & Tomislav Medak
Public Library (essay)

2. Paul Otlet
Transformations in the Bibliographical Apparatus of the Sciences (Repertory — Classification — Office of Documentation)

3. McKenzie Wark
Metadata Punk

4. Tomislav Medak
The Future After the Library: UbuWeb and Monoskop’s Radical Gestures

Marcell Mars,
Manar Zarroug
& Tomislav Medak

Public library (essay)

In What Was Revolutionary about the French Revolution?01 Robert Darnton considers what a complete collapse of the social order (when absolutely
everything, all social values, is turned upside
down) would look like. Such trauma happens often in
the life of individuals but only rarely on the level
of an entire society.
In 1789 the French had to confront the collapse of
a whole social order—the world that they defined
retrospectively as the Ancien Régime — and to find
some new order in the chaos surrounding them.
They experienced reality as something that could
be destroyed and reconstructed, and they faced
seemingly limitless possibilities, both for good and
evil, for building a utopia and for falling back into
tyranny.02
The revolution bootstraps itself.
01 Robert H. Darnton, What Was Revolutionary about the
French Revolution? (Waco, TX: Baylor University Press,
1996), 6.
02 Ibid.

In the dictionaries of the time, the word revolution was said to derive from the verb to revolve and
was defined as “the return of the planet or a star to
the same point from which it parted.”03 French political vocabulary spread no further than the narrow
circle of the feudal elite in Versailles. The citizens,
the revolutionaries, had to invent new words and concepts,
an entirely new language, in order to describe the
revolution that had taken place.
They began with the vocabulary of time and space.
In the French revolutionary calendar, used from 1793
until 1805, time started on 1 Vendémiaire, Year 1, the
date marking the abolition of the old monarchy, which
corresponds to 22 September 1792 in the Gregorian calendar.
With a decree in 1795, the metric system was
adopted. As with the new calendar,
this was an attempt to organize space in a rational
and natural way. The gram became the unit of mass.
In Paris, 1,400 streets were given new names.
Every reminder of the tyranny of the monarchy
was erased. The revolutionaries even changed their
names and surnames. Surnames such as Le Roy or Leveque, common
until then, were changed to Le Loi or Liberté.
Addressing someone respectfully as vous was
forbidden by a resolution passed on 24 Brumaire,
Year 2. Vous was replaced with tu. People are equal.
The watchwords Liberté, égalité, fraternité (freedom, equality, brotherhood)04 were built through
literacy, new epistemologies, classifications, declarations, standards, reason, and rationality. What first
comes to mind about the revolution will never again
be the return of a planet or a star to the same point
from which it departed. Revolution bootstrapped,
revolved, and hermeneutically circularized itself.
03 Ibid.
04 Slogan of the French Republic, France.fr, n.d.,
http://www.france.fr/en/institutions-and-values/slogan-french-republic.html.
Melvil Dewey was born in the state of New York in
1851.05 His thirst for knowledge found its satisfaction in libraries. His knowledge about how to
gain knowledge was developed by studying libraries.
Grouping books on library shelves according to the
color of the covers, the size and thickness of the spine,
or by title or author’s name did not satisfy Dewey’s
intention to develop appropriate new epistemologies in the service of the production of knowledge
about knowledge. At the age of twenty-four, he had
already published the first of nineteen editions of
A Classification and Subject Index for Cataloguing
and Arranging the Books and Pamphlets of a Library,06 the classification system that still bears its
author’s name: the Dewey Decimal System. Dewey
had a dream: for his twenty-first birthday he had
announced, “My World Work [will be] Free Schools
and Free Libraries for every soul.”07
05 Richard F. Snow, “Melvil Dewey”, American Heritage 32,
no. 1 (December 1980),
http://www.americanheritage.com/content/melvil-dewey.
06 Melvil Dewey, A Classification and Subject Index for Cataloguing and Arranging the Books and Pamphlets of a
Library (1876), Project Gutenberg e-book 12513 (2004),
http://www.gutenberg.org/files/12513/12513-h/12513-h.htm.
07 Snow, “Melvil Dewey”.

His dream came true. Public Library is an entry
in the catalog of History where a fantastic decimal08
describes a category of phenomena that (together
with free public education, free public healthcare,
the scientific method, the Universal Declaration of
Human Rights, Wikipedia, and free software, among
others) we, the people, are most proud of.
The public library is one of those invisible infrastructures that we start to notice only once they
begin to disappear. The utopian dream of a
place from which every human being would have access to every piece of available knowledge that can
be collected looked impossible for a long time,
until the egalitarian impetus of social revolutions,
the Enlightenment idea of the universality of knowledge,
and the exceptional suspension of commercial
barriers to access to knowledge made it possible.
The internet has, as in many other situations, completely changed our expectations and imagination
about what is possible. The dream of a catalogue
of the world, universal access to all available
knowledge for every member of society, became
realizable. It became merely a question of where two
curves on a graph meet: the point at which the line of
global distribution of personal computers meets
that of the critical mass of people with access to
the internet. Today nobody lacks the imagination
necessary to see public libraries as part of a global infrastructure of universal access to knowledge
for literally every member of society. However, the
emergence and development of the internet is taking place precisely at the point at which an institutional crisis, one with traumatic and inconceivable
consequences, has also begun.
08 “Dewey Decimal Classification: 001.”, Dewey.info, 27 October 2014, http://dewey.info/class/001/2009-08/about.en.
The internet is a new challenge, creating experiences commonly described as ‘revolutionary’. Yet the
true revolution of the internet is the universal access
to all knowledge that it makes possible. However,
unlike the new epistemologies developed during
the French Revolution, the tendency today is to keep the
‘old regime’ (of intellectual property rights, market
concentration, and control of access). The new possibilities for classification, development of languages,
and invention of epistemologies which the internet poses,
and which might launch off into new orbits from
existing classification systems, are being suppressed.
In fact, the reactionary forces of the ‘old regime’
are staging a ‘Thermidor’ to prevent public libraries from pursuing their mission. Today public
libraries cannot acquire, cannot even buy, digital
books from the world’s largest publishers.09 The
few e-books that they have already been able to acquire they must destroy after only twenty-six
loans.10 Libraries, and the principle of universal
access to all existing knowledge that they embody,
are losing, in every possible way, the battle with a
market dominated by new players such as Amazon.com, Google, and Apple.
09 “American Library Association Open Letter to Publishers on
E-Book Library Lending”, Digital Book World, 24 September
2012, http://www.digitalbookworld.com/2012/american-library-association-open-letter-to-publishers-on-e-book-library-lending/.
10 Jeremy Greenfield, “What Is Going On with Library E-Book
Lending?”, Forbes, 22 June 2012, http://www.forbes.com/sites/jeremygreenfield/2012/06/22/what-is-going-on-with-library-e-book-lending/.
In 2012, Canada’s Conservative Party–led government cut financial support for Library and
Archives Canada (LAC) by Can$9.6 million, which
resulted in the loss of 400 archivist and librarian
jobs, the shutting down of some of LAC’s internet
pages, and the cancellation of further purchases
of new books.11 In only three years, from 2010 to
2012, some 10 percent of public libraries were closed
in Great Britain.12
11 Aideen Doran, “Free Libraries for Every Soul: Dreaming
of the Online Library”, The Bear, March 2014, http://www.thebear-review.com/#!free-libraries-for-every-soul/c153g.
12 Alison Flood, “UK Lost More than 200 Libraries in 2012”,
The Guardian, 10 December 2012, http://www.theguardian.com/books/2012/dec/10/uk-lost-200-libraries-2012.
The commodification of knowledge, education,
and schooling (a consequence of a
globally harmonized, restrictive legal regime for intellectual property), combined with neoliberal austerity politics,
curtails the possibilities of adapting to new sociotechnological conditions, let alone further development, innovation, or even basic maintenance of
public libraries’ infrastructure.
Public libraries are an endangered institution,
doomed to extinction.
Petit-bourgeois denial prevents society from confronting this disturbing insight. As in many other
fields, the only way out offered is innovative market-based entrepreneurship. Some have even suggested that the public library should become an
open software platform on top of which creative
developers can build app stores13 or Internet cafés
for the poorest, ensuring that they are only a click
away from the Amazon.com catalog or the Google
search bar. But these proposals overlook, perhaps
deliberately, the fundamental principles of access
upon which the idea of the public library was built.
Those who are well-meaning, intelligent, and
tactful will try to remind the public of the many
sides of the phenomenon that is the public library:
major community center, service for the vulnerable,
center of literacy and of informal and lifelong learning,
a place where hobbyists and enthusiasts, old and young,
meet and share knowledge and skills.14 Fascinating. Unfortunately, for purely tactical reasons, this
reminder to the public does not always contain an
explanation of how these varied effects arise out of
the foundational idea of a public library: universal
access to knowledge for each member of society produces knowledge, produces knowledge about
knowledge, produces knowledge about knowledge
transfer; the public library produces sociability.
13 David Weinberger, “Library as Platform”, Library Journal,
4 September 2012, http://lj.libraryjournal.com/2012/09/future-of-libraries/by-david-weinberger/.
14 Shannon Mattern, “Library as Infrastructure”, Design
Observer, 9 June 2014, http://places.designobserver.com/entryprint.html?entry=38488.
The public library does not need the sort of creative crisis management that proposes what
the library should be transformed into once our society, obsessed with market logic, has made it impossible for the library to perform its main mission. Such
proposals, if they do not insist on universal access
to knowledge for all members, are Trojan horses for
the silent but galloping disappearance of the public
library from the historical stage. Sociability, produced by public libraries in all the richness of its
various appearances, will be best preserved if we
manage to fight for the values upon which we have
built the public library: universal access to knowledge for each member of our society.
Freedom, equality, and brotherhood need brave librarians practicing civil disobedience.
Library Genesis, aaaaarg.org, Monoskop, and UbuWeb
are all examples of fragile knowledge infrastructures,
built and maintained by brave librarians practicing
civil disobedience, on which researchers
in the humanities rely. These projects are reinventing the public library in the gap left by today’s
institutions in crisis.
Library Genesis15 is an online repository with over
a million books and is the first project in history to
offer everyone on the Internet free download of its
entire book collection (as of this writing, about fifteen terabytes of data), together with all the metadata
(a MySQL dump) and the PHP/HTML/JavaScript code
for its webpages. The most popular earlier repositories, such as Gigapedia (later Library.nu), covered
their upload and maintenance costs by selling advertising space to the pornographic and gambling
industries. Legal action was initiated against them,
and they were closed.16 News of the termination of
Gigapedia/Library.nu strongly resonated in
academic and book-enthusiast circles and was
even reported in the mainstream Internet media, just
like other major world events. The decision by Library Genesis to share its resources has resulted
in a network of identical sites (so-called mirrors),
through the development of an entire range of Net
services for metadata exchange and catalog maintenance, thus ensuring an exceptionally resilient
survival architecture.
15 See http://libgen.org/.
aaaaarg.org, started by the artist Sean Dockray, is
an online repository with over 50,000 books and
texts. A community of enthusiastic researchers from
critical theory, contemporary art, philosophy, architecture, and other fields in the humanities maintains,
catalogs, annotates, and initiates discussions around
it. It also serves as a courseware extension to the self-organized education platform The Public School.17
16 Andrew Losowsky, “Library.nu, Book Downloading Site,
Targeted in Injunctions Requested by 17 Publishers”, Huffington Post, 15 February 2012, http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction_n_1280383.html.
17 “The Public School”, The Public School, n.d.,
https://www.thepublicschool.org/.

UbuWeb18 is the most significant and largest online
archive of avant-garde art; it was initiated and is led
by conceptual artist Kenneth Goldsmith. UbuWeb,
although still informal, has grown into a relevant
and recognized critical institution of contemporary
art. Artists want to see their work in its catalog and
thus agree to a relationship with UbuWeb that has
no formal contractual obligations.
Monoskop is a wiki for the arts, culture, and media
technology, with a special focus on the avant-garde,
conceptual, and media arts of Eastern and Central
Europe; it was launched by Dušan Barok and others.
On Monoskop.org/log, which takes the form of a blog,
Dušan uploads an online catalog of curated titles (at the
moment numbering around 3,000), and, as with
UbuWeb, it is becoming more and more relevant
as an online resource.
Library Genesis, aaaaarg.org, Kenneth Goldsmith,
and Dušan Barok show us that the future of the
public library does not need crisis management,
venture capital, start-up incubators, or outsourcing, but simply the freedom to continue extending
the dreams of Melvil Dewey, Paul Otlet19 and other
visionary librarians, just as it did before the emergence of the internet.

18 See http://ubu.com/.
19 “Paul Otlet”, Wikipedia, 27 October 2014,
http://en.wikipedia.org/wiki/Paul_Otlet.

With the emergence of the internet and software
tools such as Calibre and “[let’s share books],”20 librarianship has been given an opportunity, similar to that of astronomy with the SETI@home project,21 to
include thousands of amateur librarians who will,
together with the experts, build a distributed peer-to-peer network to care for the catalog of available
knowledge, because
a public library is:
— free access to books for every member of society
— library catalog
— librarian
With books ready to be shared, meticulously
cataloged, everyone is a librarian.
When everyone is librarian, library is
everywhere.22


20 “Tools”, Memory of the World, n.d.,
https://www.memoryoftheworld.org/tools/.
21 See http://setiathome.berkeley.edu/.
22 “End-to-End Catalog”, Memory of the World, 26 November 2012,
https://www.memoryoftheworld.org/end-to-end-catalog/.

Paul Otlet

Transformations in the Bibliographical Apparatus of the Sciences [1]
Repertory — Classification — Office of Documentation
1. Because of its length, its extension to all countries,
the profound harm that it has created in everyone’s
life, the War has had, and will continue to have, repercussions for scientific productivity. The hour for
the revision of the old order is about to strike. Forced
by the need for economies of men and money, and
by the necessity of greater productivity in order to
hold out against all the competition, we are going to
have to introduce reforms into each of the branches
of the organisation of science: scientific research, the
preservation of its results, and their wide diffusion.
Everything happens simultaneously, and the distinctions that we will introduce here are only to
facilitate our thinking. Adjacent areas, and even
those that are very distant, always exert an influence
on each other. This is why we should recognize the ever-growing
impetus, within the organisation of science, of the three great trends of
our times: the power of associations, technological
progress and the democratic orientation of institutions. We would like here to draw attention to some
of their consequences for the book in its capacity
as an instrument for recording what has been discovered and as a necessary means for stimulating
new discoveries.
The Book, the Library in which it is preserved,
and the Catalogue which lists it, have seemed for
a long time as if they had achieved their heights of
perfection or at least were so satisfactory that serious
changes need not be contemplated. This may have
been so up to the end of the last century. But for a
score of years great changes have been occurring
before our very eyes. The increasing production of
books and periodicals has revealed the inadequacy of
older methods. The increasing internationalisation
of science has required workers to extend the range
of their bibliographic investigations. As a result, a
movement has occurred in all countries, especially
Germany, the United States and England, for the
expansion and improvement of libraries and for
an increase in their numbers. Publishers have been
searching for new, more flexible, better-illustrated,
and cheaper forms of publication that are better-coordinated with each other. Cataloguing enterprises
on a vast scale have been carried out, such as the
International Catalogue of Scientific Literature and
the Universal Bibliographic Repertory. [2]
Three facts, three ideas, especially merit study
for they represent something really new which in
the future can give us direction in this area. They
are: The Repertory, Classification and the Office of
Documentation.
•••

2. The Repertory, like the book, has gradually been
increasing in size, and improvements in it suggest
the emergence of something new which will radically modify our traditional ideas.
From the point of view of form, a book can be
defined as a group of pages cut to the same format
and gathered together in such a way as to form a
whole. It was not always so. For a long time the
Book was a roll, a volumen. The substances which
then took the place of paper — papyrus and parchment — were written on continuously from beginning to end. Reading required unrolling. This was
certainly not very practical for the consultation of
particular passages or for writing on the verso. The
codex, which was introduced in the first centuries of
the modern era and which is the basis of our present
book, removed these inconveniences. But its faults
are numerous. It constitutes something completed,
finished, not susceptible of addition. The Periodical
with its successive issues has given science a continuous means of concentrating its results. But, in
its turn, the collections that it forms run into the
obstacle of disorder. It is impossible to link similar
or connected items; they are added to one another
pell-mell, and research requires handling great masses of heavy paper. Of course indexes are a help and
have led to progress — subject indexes, sometimes
arranged systematically, sometimes analytically,
and indexes of names of persons and places. These
annual indexes are preceded by monthly abstracts
and are followed by general indexes cumulated every
five, ten or twenty-five years. This is progress, but
the Repertory constitutes much greater progress.

The aim of the Repertory is to detach what the
book amalgamates, to reduce all that is complex to
its elements and to devote a page to each. Pages, here,
are leaves or cards according to the format adopted.
This is the “monographic” principle pushed to its
ultimate conclusion. No more binding or, if it continues to exist, it will become movable, that is to
say, at any moment the cards held fast by a pin or a
connecting rod or any other method of conjunction
can be released. New cards can then be intercalated,
replacing old ones, and a new arrangement made.
The Repertory was born of the Catalogue. In
such a work, the necessity for intercalations was
clear. Nor was there any doubt as to the unitary or
monographic notion: one work, one title; one title,
one card. As a result, registers which listed the same
collections of books for each library but which had
constantly to be re-done as the collections expanded,
have gradually been discarded. This was practical
and justified by experience. But upon reflection one
wonders whether the new techniques might not be
more generally applied.
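The intercalation that the card form permits can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not a model of any historical repertory: `Card` and `Repertory` are hypothetical names, each card carries a filing heading and a title ("one title, one card"), and the repertory keeps its cards in filing order so that a new card is slipped into place instead of the whole register being re-done.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Card:
    """One title, one card: a hypothetical catalogue card."""

    heading: str  # filing key, here the author's surname
    title: str  # compared only when two headings are equal


class Repertory:
    """A card repertory whose cards are always kept in filing order."""

    def __init__(self) -> None:
        self.cards: list[Card] = []

    def intercalate(self, card: Card) -> None:
        # Slip the new card into its place; no other card is rewritten.
        bisect.insort(self.cards, card)

    def replace(self, old: Card, new: Card) -> None:
        # Release an outdated card and file its replacement.
        self.cards.remove(old)
        self.intercalate(new)


repertory = Repertory()
repertory.intercalate(Card("Otlet", "Transformations in the Bibliographical Apparatus of the Sciences"))
repertory.intercalate(Card("Dewey", "A Classification and Subject Index"))
repertory.intercalate(Card("Darnton", "What Was Revolutionary about the French Revolution?"))
print([card.heading for card in repertory.cards])  # ['Darnton', 'Dewey', 'Otlet']
```

A sorted list is simply the closest software stand-in for a drawer of movable cards: `bisect.insort` finds each card's position by binary search, which is the programmatic counterpart of releasing the rod and intercalating.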
What is a book, in fact, if not a single continuous line which has initially been cut to the length
of a page and then cut again to the size of a justified
line? Now, this cutting up, this division, is purely
mechanical; it does not correspond to any division
of ideas. The Repertory provides a practical means
of physically dividing the book according to the
intellectual division of ideas.
Thus, the manuscript library catalogue on cards
has been quickly followed by catalogues printed on
cards (American Library Bureau, the Catalogue of
the Library of Congress in Washington) [3]; then by
bibliographies printed on cards (International Institute of Bibliography, Concilium Bibliographicum)
[4]; next, indices of species have been published on
cards (Index Speciorum) [5]. We have moved from
the small card to the large card, the leaf, and have
witnessed compendia abandoning the old form for
the new (Jurisclasseur, or legal digests in card form).
Even the idea of the encyclopedia has taken this
form (Nelson’s Perpetual Cyclopedia [6]).
Theoretically and technically, we now have in
the Repertory a new instrument for analytically or
monographically recording data, ideas, information. The system has been improved by divisionary cards of various shapes and colours, placed in
such a way that they express externally the outline
of the classification being used and reduce search
time to a minimum. It has been improved further
by the possibility of using, by cutting and pasting,
materials that have been printed on large leaves or
even books that have been published without any
thought of repertories. Two copies, the first providing the recto, the second the verso, can supply
all that is necessary. One has gone even further still
and, from the example of statistical machines like
those in use at the Census of Washington (sic) [7],
extrapolated the principle of “selection machines”
which perform mechanical searches in enormous
masses of materials, the machines retaining from
the thousands of cards processed by them only those
related to the question asked.
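The "selection machine" principle, a machine that passes every card through once and retains only those related to the question asked, can be approximated as a linear filter. The sketch below is an illustration built on assumptions, not a model of the Census machines: it supposes (hypothetically) that each card bears a decimal classification number of the kind described in the next section, and that a card answers the question when its number begins with the question's number, since subdivisions only add digits to the number of the subject they divide.

```python
from typing import Iterable, Iterator, Tuple

Card = Tuple[str, str]  # (classification number, subject written on the card)


def digits(number: str) -> str:
    # The point in "535.7" only groups digits for reading; drop it before comparing.
    return number.replace(".", "")


def select(cards: Iterable[Card], question: str) -> Iterator[Card]:
    """Pass each card through once and retain only those filed under `question`."""
    wanted = digits(question)
    for number, subject in cards:
        if digits(number).startswith(wanted):
            yield number, subject


drawer = [
    ("535.7", "optical physiology"),
    ("331.2", "salaries"),
    ("535", "optics"),
    ("31", "statistics"),
]
print(list(select(drawer, "535")))
# [('535.7', 'optical physiology'), ('535', 'optics')]
```

Like the mechanical version, the filter never reorders the drawer; it only lets the matching cards through, in the order in which they were fed.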
•••

3. But such a development, like the Repertory before it, presupposes a classification. This leads us to
examine the second practical idea that is bringing
about the transformation of the book.
Classification plays an enormous role in scientific thought. If one could say that a science was a
well-made language, one could equally assert that
it is a completed classification. Science is made up
of verified facts which are organised in a structure
of systems, hypotheses, theories, laws. If there is
a certain order in things, it is necessary to have it
also in science which reflects and explains nature.
That is why, since the time of Greek thought until
the present, constant efforts have been made to improve classification. These have taken three principal directions: classification studied as an activity
of the mind; the general classification and sequence
of the sciences; the systematization appropriate to
each discipline. The idea of order, class, genus and
species has been studied since Aristotle, by way of
Porphyry, the scholastic philosophers and
modern logicians. The classification of knowledge
goes back to the Greeks and owes much to the contributions of Bacon and the Renaissance. It was posed
as a distinct and separate problem by D’Alembert
and the Encyclopédie, and by Ampère, Comte, and
Spencer. The recent work of Manouvrier, Durand
de Gros, Goblot, Naville and de la Grasserie has focussed on various aspects of it. [8] As to systematics,
one can say that this has become the very basis of
the organisation of knowledge as a body of science.
When one has demonstrated the existence of 28 million stars, a million chemical compounds, 300,000
vegetable species, 200,000 animal species, etc., it is
necessary to have a means, an Ariadne’s thread, of
finding one’s way through the labyrinth formed by
all these objects of study. Because there are sciences of beings as well as sciences of phenomena, and
because they intersect with each other as we better
understand the whole of reality, it is necessary that
this means be used to retrieve both. The state of development of a science is reflected at any given time
by its systematics, just as the general classification
of the sciences reflects the state of development of
the encyclopedia, of the philosophy of knowledge.
The need has been felt, however, for a practical
instrument of classification. The classifications of
which we have just spoken are constantly changing, at least in their detail if not in broad outline. In
practice, such instability, such variability which is
dependent on the moment, on schools of thought
and individuals, is not acceptable. Just as the Repertory had its origin in the catalogue, so practical
classification originated in the Library. Books represent knowledge and it is necessary to arrange them
in collections. Schemes for this have been devised
since the Middle Ages. The elaboration of grand
systems occurred in the 17th and 18th centuries
and some new ones were added in the 19th century. But when bibliography began to emerge as an
autonomous field of study, it soon began to develop
along the lines of the catalogue of an ideal library
comprising the totality of what had been published.
From this to drawing on library classifications was
but a step, and it was taken under certain conditions
which must be stressed.

Up to the present time, 170 different classifications
have been identified. Now, no cooperation is possible if everyone stays shut up in his own system. It
has been necessary, therefore, to choose a universal
classification and to recommend it as such in the
same way that the French Convention recognized
the necessity of a universal system of weights and
measures. In 1895 the first International Conference
of Bibliography chose the Decimal Classification
and adopted a complete plan for its development. In
1904, the edition of the expanded tables appeared. A
new edition was being prepared when the war broke
out. Brussels, headquarters of the International Institute of Bibliography, which was doing this work,
was part of the invaded territory.
In its latest state, the Decimal Classification has
become an instrument of great precision which
can meet many needs. The printed tables contain
33,000 divisions and they have an alphabetical index consisting of about 38,000 words. Learning is
here represented in its entire sweep: the encyclopedia of knowledge. Its principle is very simple. The
empiricism of an alphabetical classification by subject-heading cannot meet the need for organising
and systematizing knowledge. There is scattering;
there is also the difficulty of dealing with the complex expressions which one finds in the modern terminology of disciplines like medicine, technology,
and the social sciences. Above all, it is impossible
to achieve any international cooperation on such
a national basis as language. The Decimal Classification is a vast systematization of knowledge, “the
table of contents of the tables of contents” of all
treatises. But, as it would be impossible to find a
particular subject’s relative place by reference to
another subject, a system of numbering is needed.
This numbering is decimal, as an example will make clear.
Optical Physiology would be classified thus:
5th Class: Natural Sciences
3rd Group: Physics
5th Division: Optics
7th Sub-division: Optical Physiology

or 535.7
This number 535.7 is called decimal because all
knowledge is taken as one of which each science is
a fraction and each individual subject is a decimal
subdivided to a lesser or greater degree. For the sake
of abbreviation, the zero of the complete number,
which would be 0.5357, has been suppressed because
the zero would be repeated in front of each number.
The numbers 5, 3, 5, 7 (which one could call five hundred and thirty-five point seven and which could
be arranged in blocks of three as for the telephone,
or in groups of two) form a single number when
the implied words, “class, group, division and subdivision,” are uttered.
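Reading 535.7 as the fraction 0.5357, one digit per level, can be made concrete with a short Python sketch. The heading table holds only the four headings named in the example above, and the function names are illustrative assumptions; the text names no levels below the sub-division, so the sketch simply numbers them. One consequence of treating the number as a fraction is that a subject files directly after the subject it subdivides and before the next heading at the same level.

```python
from decimal import Decimal

# Only the headings named in Otlet's example; the real tables are far larger.
HEADINGS = {
    "5": "Natural Sciences",
    "53": "Physics",
    "535": "Optics",
    "5357": "Optical Physiology",
}
LEVELS = ["class", "group", "division", "sub-division"]


def as_fraction(number: str) -> Decimal:
    # Restore the suppressed leading zero: 535.7 stands for 0.5357.
    return Decimal("0." + number.replace(".", ""))


def explain(number: str) -> list[str]:
    """Name the level and heading that each successive digit selects."""
    digits = number.replace(".", "")
    steps = []
    for depth, digit in enumerate(digits):
        level = LEVELS[depth] if depth < len(LEVELS) else f"level {depth + 1}"
        heading = HEADINGS.get(digits[: depth + 1], "(not in this sample table)")
        steps.append(f"{digit}: {level} -> {heading}")
    return steps


print(as_fraction("535.7"))  # 0.5357
for step in explain("535.7"):
    print(step)
# Read as fractions, these sample numbers file in classification order.
print(sorted(["6", "535.7", "53", "536"], key=as_fraction))  # ['53', '535.7', '536', '6']
```

One caveat of the fraction reading: numbers that differ only by trailing zeros (5 and 50, say) compare as equal fractions, so a real filing rule would need a tie-break, for instance on the length of the number.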
The classification is also called decimal because
all subjects are divided into ten classes, then each
of these into at least ten groups, and each group
into at least ten divisions. All that is needed for the
number 535.7 always to have the same meaning is
to translate the tables into all languages. All that is
needed to deal with future scientific developments
in optical physiology in all of its ramifications is to
subdivide this number by further decimal numbers
corresponding to the subdivisions of the subject.
Finally, all that is needed to ensure that any document or item pertaining to optical physiology finds
its place within the sum total of scientific subjects
is to write this number on it. In the alphabetic index
to the tables references are made from each word
to the classification number just as the index of a
book refers to page numbers.
This first remarkable principle of the decimal
classification is generally understood. Its second,
which has been introduced more recently, is less
well known: the combination of various classification numbers whenever there is some utility in expressing a compound or complex heading. In the
social sciences, statistics is 31 and salaries, 331.2. By
a convention these numbers can be joined by the
simple sign : and one may write 31:331.2 statistics
of salaries.01
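A compound heading joined with the colon can be modelled as an ordered list of simple numbers. In this hypothetical sketch the function names are invented for illustration and the table contains only the two numbers quoted in the text; rendering the colon as "of" is a convenient reading for this one example, since the sign itself only marks a general relationship.

```python
# Minimal sketch of the relation sign ':'; function names are illustrative.
# Only the two numbers quoted in the text are included.
TABLES = {"31": "Statistics", "331.2": "Salaries"}


def combine(*numbers: str) -> str:
    """Join simple classification numbers with the relation sign ':'."""
    return ":".join(numbers)


def split(compound: str) -> list[str]:
    """Recover the simple numbers from a compound such as '31:331.2'."""
    return compound.split(":")


def describe(compound: str) -> str:
    """Render a compound as words; unknown parts are shown in brackets.

    Reading ':' as 'of' is a simplification: the sign itself only
    marks a general relationship between the two subjects.
    """
    names = [TABLES.get(part, f"[{part}]") for part in split(compound)]
    return " of ".join(names)


number = combine("31", "331.2")
print(number)            # 31:331.2
print(describe(number))  # Statistics of Salaries
```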
This indicates a general relationship, but a subject also has its place in space and time. The subject
may be salaries in France limited to a period such as
the 18th century (that is to say, from 1700 to 1799).
01 The first ten divisions are: 0 Generalities, 1 Philosophy, 2
Religion, 3 Social Sciences, 4 Philology, Language, 5 Pure
Sciences, 6 Applied Science, Medicine, 7 Fine Arts, 8 Literature, 9 History and Geography. The Index number 31 is
derived from: 3rd class social sciences, 1st group statistics. The
Index number 331.2 is derived from 3rd class social sciences,
3rd group political economy, 1st division topics about work,
2nd subdivision salaries.

The sign that characterises division by place being
the parenthesis and that by time quotation marks
or double parentheses, one can write:
31:331.2 (44) «17» statistics — of salaries — in
France — in the 17th century
or ten figures and three signs to indicate, in terms
of the universe of knowledge, four subordinated
headings comprising 42 letters. And all of these
numbers are reversible and can be used for geographic or chronologic classification as well as for
subject classification:
(44) 31:331.2 «17»
France — Statistics — Salaries — 17th Century
«17» (44) 31:331.2
17th Century — France — Statistics — Salaries
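The reversibility of the notation can be sketched as follows. This is an illustration under assumptions, not a description of how the Institute's cards were actually produced: the regular expression, the function names and the facet labels are hypothetical, and the time code «17» is kept exactly as printed without interpreting which century it denotes. Because place and time carry their own brackets, the parts of a heading can be recognised in any order and rearranged so that the same card can head a subject, a geographic or a chronological file.

```python
import re
from itertools import permutations

# Hypothetical parser: place sits in parentheses, time in guillemets, and the
# subject is digits and points, optionally joined by the relation sign ':'.
FACET = re.compile(
    r"\((?P<place>[\d.]+)\)|«(?P<time>[\d.]+)»|(?P<subject>[\d.]+(?::[\d.]+)*)"
)
BRACKETS = {"subject": "{}", "place": "({})", "time": "«{}»"}


def parse(notation: str) -> dict[str, str]:
    """Split e.g. '31:331.2 (44) «17»' into its subject, place and time facets."""
    facets = {}
    for match in FACET.finditer(notation):
        for name, value in match.groupdict().items():
            if value is not None:
                facets[name] = value
    return facets


def arrange(notation: str, order: tuple[str, ...]) -> str:
    """Rewrite the same heading with its facets in the requested order."""
    facets = parse(notation)
    return " ".join(BRACKETS[name].format(facets[name]) for name in order if name in facets)


def all_arrangements(notation: str) -> list[str]:
    """Every ordering of the facets present; each could head a different file."""
    return [arrange(notation, order) for order in permutations(parse(notation))]


card = "31:331.2 (44) «17»"
print(arrange(card, ("place", "subject", "time")))  # (44) 31:331.2 «17»
print(arrange(card, ("time", "place", "subject")))  # «17» (44) 31:331.2
print(len(all_arrangements(card)))                  # 6
```

The text shows three of the possible arrangements; with three facets there are six orderings in all.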
The subdivisions of relation and location explained
here are completed by documentary subdivisions
for the form and the language of the document (for
example, periodical, in Italian), and by functional
subdivisions (for example, in zoology all the divisions by species of animal being subdivided by biological aspects). It follows by virtue of the law of
permutations and combinations that the present
tables of the classification permit the formulation
at will of millions of classification numbers. Just as
arithmetic does not give us all the numbers readymade but rather a means of forming them as we
need them, so the classification gives us the means
of creating classification numbers insofar as we have
compound headings that must be translated into a
notation of numbers.
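A back-of-the-envelope count shows how combination alone can reach millions. The table sizes below are purely hypothetical, chosen for illustration and not taken from Otlet: suppose the tables offer S simple subject numbers, P place numbers and T time numbers, that place and time are each optional on a heading (hence the extra 1 for "no place" and "no time"), and that each heading has a single subject.

```latex
N = S\,(P+1)\,(T+1)
\qquad\text{e.g. } S = 10\,000,\ P = 100,\ T = 20:\quad
N = 10\,000 \times 101 \times 21 = 21\,210\,000
```

Colon compounds, form and language subdivisions, and the different orderings of facets would each multiply this figure further.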
Like chemistry, mathematics and music, bibliography thus has its own extremely simple notations:
numbers. Immediately and without confusion, it
allows us to find a place for each idea, for each thing
and consequently for each book, article, or document and even for each part of a book or document.
Thus it allows us to take our bearings in the midst
of the sources of knowledge, just as the system of
geographic coordinates allows us to take our bearings on land or sea.
One may well imagine the usefulness of such a
classification to the Repertory. It has rid us of the
difficulty of not having continuous pagination. Cards
to be intercalated can be placed according to their
class number and the numbering is that of tables
drawn up in advance, once and for all, and maintained with an unvarying meaning. As the classification has a very general use, it constitutes a true
documentary classification which can be used in
various kinds of repertories: bibliographic repertories; catalogue-like repertories of objects, persons,
phenomena; and documentary repertories of files
made up of written or printed materials of all kinds.
The possibility can be envisaged of encyclopedic
repertories in which are registered and integrated
the diverse data of a scientific field and which draw
for this purpose on materials published in periodicals. Let each article, each report, each item of news
henceforth carry a classification number and, automatically, by clipping, encyclopedias on cards can
be created in which all the results of international
scientific cooperation are brought together at the
same number. This constitutes a profound change
in the technology of the Book, since the repertory
thus formed is simultaneously a constantly updated book and a cooperative book in which are found
printed elements produced in all locations.
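The filing principle of the Repertory can be sketched as a sorted card file. This is a hypothetical model, not a description of the Institute's actual cabinets: the class `Repertory` and its methods are invented for illustration, and the ordering assumes the decimal-fraction reading of class numbers described earlier in the text. New cards are intercalated by number without renumbering anything, and cards from different sources that carry the same number end up side by side.

```python
import bisect


class Repertory:
    """A card file kept in classification order (hypothetical model)."""

    def __init__(self) -> None:
        # Each entry: (digit key, arrival order, number, text). The arrival
        # order breaks ties, so cards filed at one number keep their sequence.
        self._cards: list[tuple[str, int, str, str]] = []

    def file(self, number: str, text: str) -> None:
        """Intercalate a card: nothing is renumbered, it slots in by class number."""
        key = number.replace(".", "")
        bisect.insort(self._cards, (key, len(self._cards), number, text))

    def under(self, number: str) -> list[str]:
        """All cards at this number or any of its subdivisions."""
        prefix = number.replace(".", "")
        # Keys sharing a prefix are contiguous in string order.
        start = bisect.bisect_left(self._cards, (prefix,))
        found = []
        for key, _, _, text in self._cards[start:]:
            if not key.startswith(prefix):
                break
            found.append(text)
        return found


rep = Repertory()
rep.file("535.7", "periodical article A")
rep.file("31", "statistical report B")
rep.file("535", "optics treatise C")
rep.file("535.7", "congress report D")
rep.file("536", "card E")
print(rep.under("535.7"))  # ['periodical article A', 'congress report D']
print(rep.under("535"))    # ['optics treatise C', 'periodical article A', 'congress report D']
```

Asking for a broader number returns its subdivisions as well, which is how a clipped "encyclopedia on cards" would gather everything filed under one subject.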
•••
4. If we can realize the third idea, the Office of Documentation, then reform will be complete. Such an
office is the old library, but adapted to a new function. Hitherto the library has been a museum of
books. Works were preserved in libraries because
they were precious objects. Librarians were keepers.
Such establishments were not organised primarily
for the use of documents. Moreover, their outmoded
regulations, if they did not exclude the most modern
forms of publication, at least did not admit them.
They have poor collections of journals; collections
of newspapers are nearly nonexistent; photographs,
films, phonograph discs have no place in them, nor
do film negatives, microscopic slides and many other “documents.” The subject catalogue is considered
secondary in the library so long as there is a good
register for administrative purposes. Thus there is
little possibility of developing repertories in the
library, that is to say of taking publications to pieces and redistributing them in a more directly and
quickly accessible form. For want of personnel to
arrange them, there has not even been a place for
the cards that are received already printed.
The Office of Documentation, on the contrary, is
conceived of in such a way as to achieve all that is
lacking in the library. Collections of books are the
necessary basis for it, but books, far from being
considered as finished products, are simply materials which must be developed more fully. This
development consists in establishing the connections each individual book has with all of the other
books and forming from them all what might be
called The Universal Book. It is for this that we use
repertories: bibliographic repertories; repertories of
documentary dossiers gathering pamphlets and extracts together by subject; catalogues; chronological
repertories of facts or alphabetical ones of names;
encyclopedic repertories of scientific data, of laws,
of patents, of physical and technical constants, of
statistics, etc. All of these repertories will be set up
according to the method described above and arranged by the same universal classification. As soon
as an organisation to contain these repertories is
created, the Office of Documentation, one may be
sure that what happened to the book when libraries
first opened — scientific publication was regularised
and intensified — will happen to them. Then there
will be good reason for producing in bibliographies,
catalogues, and above all in books and periodicals
themselves, the rational changes which technology and the creative imagination suggest. What is
still an exception today will be common tomorrow.
New possibilities will exist for cooperative work
and for the more effective organisation of science.
•••

5. Repertory, Classification, Office of Documentation are therefore the three related elements of a
single reform in our methods of registering scientific discoveries and making them available to the
greatest number of people. Already one must speak
less of experiments and uncertain trials than of the
beginning of serious achievement. The International Institute of Bibliography in Brussels constitutes
a vast intellectual cooperative whose members are
becoming more numerous each day. Associations,
scientific establishments, periodical publications,
scientific and technical workers of every kind are
affiliating with it. Its repertories contain millions of
cards. There are sections in several countries.02 But
this was before the War. Since its outbreak, a movement in France, England and the United States has
been emerging everywhere to improve the organisation of the Book. The Office of Documentation has
been suggested as the solution for the requirements
that have been discussed.
It is important that the world of science and
technology should support this movement and
above all that it should endeavour to apply the new
methods to the works which it will be necessary to
re-organise. Among the most important of these is
the International Catalogue of Scientific Literature,
that fine and great work begun at the initiative of the
Royal Society of London. Until now, this work has
02 In France, the Bureau Bibliographique de Paris and great
associations such as the Société pour l’encouragement de
l’industrie nationale, l’Association pour l’avancement des
sciences, etc., are affiliated with it.

been carried on without relation to other works of
the same kind: it has not recognised the value of a
card repertory or a universal classification. It must
recognise them in the future.03 ❧

03 See Paul Otlet, “La Documentation et l’information au service de l’industrie”, Bulletin de la Société d’encouragement
de l’industrie nationale, June 1917. — La Documentation au
service de l’invention. Euréka, October 1917. — L’Institut
International de Bibliographie, Bibliographie de la France,
21 December 1917. — La Réorganisation du Catalogue international de la littérature scientifique. Revue générale des
sciences, 15 February 1918. The publications of the Institute,
especially the expanded tables of the Decimal Classification,
have been deposited at the Bureau Bibliographique de Paris,
44 rue de Rennes at the apartments of the Société de l’encouragement. — See also the report presented by General
Sebert [9] to the Congrès du Génie civil, in March 1918 and
whose conclusions about the creation in Paris of a National
Office of Technical Documentation have been adopted.

Editor’s Notes
[1] “Transformations opérées dans l’appareil bibliographique
des sciences,” Revue scientifique 58 (1918): 236-241.
[2] The International Catalogue of Scientific Literature, an enormous work, was compiled by a Central Bureau under the
sponsorship of the Royal Society from material sent in from
Regional Bureaus around the world. It was published annually beginning in 1902 in 17 parts each corresponding to
a major subject division and comprising one or more volumes. Publication was effectively suspended in 1914. By the
time war broke out, the Universal Bibliographic Repertory
contained over 11 million entries.
[3] For card publication by the Library Bureau and Library of
Congress, see Edith Scott, “The Evolution of Bibliographic
Systems in the United States, 1876–1945” and Editor’s Note
36 to the second paper and Note 5 to the seventh paper in
International Organisation and Dissemination of Knowledge; Selected Essays of Paul Otlet, translated and edited by
W. Boyd Rayward. Amsterdam: Elsevier, 1990: 148–156.
[4] Otlet refers to the Concilium Bibliographicum also in Paper
No. 7, “The Reform of National Bibliographies...” in International Organisation and Dissemination of Knowledge; Selected
Essays of Paul Otlet. See also Editor’s Note 5 in that paper
for the major bibliographies published by the Concilium
Bibliographicum.
[5] A possible example of what Otlet is referring to here is the
Gray Herbarium Index. This was “planned to provide cards
for all the names of vascular plant taxa attributable to the
Western Hemisphere beginning with the literature of 1886”
(Gray Herbarium Index, Preface, p. iii). Under its first compiler, 20 instalments consisting in all of 28,000 cards were
issued between 1894 and 1903. It has been continued after
that time and was for many years “issued quarterly at the
rate of about 4,000 cards per year.” At the time the cards
were reproduced in a printed catalogue by G. K. Hall in 1968,
there were 85 subscribers to the card sets.
[6] Nelson’s Perpetual Loose-Leaf Encyclopedia was a popular,
12-volume work which went through many editions, its
principle being set down at the beginning of the century.
It was published in binders and the publisher undertook to
supply a certain number of pages of revisions (or renewals)
semi-annually after each edition, the first of which appeared
in 1905. An interesting reference presumably to this work
occurs in a notice, “An Encyclopedia on the Card-Index System,” in the Scientific American 109 (1913): 213. The Berlin
Correspondent of the journal reports a proposal made in
Berlin which contains “an idea, in a sense ... already carried
out in an American loose-leaf encyclopedia, the publishers
of which supply new pages to take the place of those that
are obsolete” (Nelsons, an English firm, set up a New York
branch in 1896. Publication in the U.S. of works to be widely
circulated there was a requirement of the copyright law.)
The reporter observes that the principle suggested “affords
a means of recording all facts at present known as well as
those to be discovered in the future, with the same safety
and ease as though they were registered in our memory, by
providing a universal encyclopedia, incessantly keeping
abreast of the state of human knowledge.” The “bookish”
form of conventional encyclopedias acts against its future
success. “In the case of a mere storehouse of facts the
infinitely more mobile form of the card index should however
be adopted, possibly,” the author goes on making a most interesting reference, “in conjunction with Dr. Goldschmidt’s
Microphotographic Library System.” The need for a central
institute, the nature of its work, the advantages of the work
so organised are described in language that is reminiscent
of that of Paul Otlet (see also the papers of Goldschmidt
and Otlet translated in International Organisation and
Dissemination of Knowledge; Selected Essays of Paul Otlet).
[7] These machines were derived from Herman Hollerith’s
punched cards and tabulating machines. Hollerith had
introduced them under contract into the U.S. Bureau of
the Census for the 1890 census. This equipment was later
modified and developed by the Bureau. Hollerith, his invention and his business connections lie at the roots of the
present IBM company. The equipment and its uses in the
census from 1890 to 1910 are briefly described in John H.
Blodgett and Claire K. Schultz, “Herman Hollerith: Data
Processing Pioneer,” American Documentation 20 (1969):
221-226. As they observe, suggesting the accuracy of Otlet’s
extrapolation, “his was not simply a calculating machine,
it performed selective sorting, an operation basic to all information retrieval.”
[8] The history of the classification of knowledge has been treated
in English in detail by E.C. Richardson in his Classification
Theoretical and Practical, the first edition of which appeared
in 1901 and was followed by editions in 1912 and 1930. A
different treatment is given in Robert Flint’s Philosophy as
Scientia Scientiarum: a History of the Classification of the
Sciences which appeared in 1904. Neither of these works
deals with Manouvrier, a French anthropologist, or Durand
de Cros. Joseph-Pierre Durand, sometimes called Durand
de Cros after his birth place, was a French physiologist and
philosopher who died in 1900. In his Traité de documentation,
in the context of his discussion of classification, Otlet refers
to an Essai de taxonomie by Durand published by Alcan. It
seems that this is an error for Aperçus de taxonomie (Alcan,
1899).
[9] General Hippolyte Sebert was President of the Association française pour l’avancement des sciences, and the Société d’encouragement pour l’industrie nationale. He had
been active in the foundation of the Bureau bibliographique
de Paris. For other biographical information about him see
Editor’s Note 9 to Paper no 17, “Henri La Fontaine”, in International Organisation and Dissemination of Knowledge;
Selected Essays of Paul Otlet.

English translation of Paul Otlet’s text published with the
permission of W. Boyd Rayward. The translation was originally
published as Paul Otlet, “Transformations in the Bibliographical
Apparatus of the Sciences: Repertory–Classification–Office of
Documentation”, in International Organisation and Dissemination of Knowledge; Selected Essays of Paul Otlet, translated and
edited by W. Boyd Rayward, Amsterdam: Elsevier, 1990: 148–156.

public library

http://aaaaarg.org/

McKenzie Wark

Metadata Punk

So we won the battle but lost the war. By “we”, I
mean those avant-gardes of the late twentieth century whose mission was to free information from the
property form. It was always a project with certain
nuances and inconsistencies, but overall it succeeded beyond almost anybody’s wildest dreams. Like
many dreams, it turned into a nightmare in the end,
the one from which we are now trying to awake.
The place to start is with what the situationists
called détournement. The idea was to abolish the
property form in art by taking all of past art and
culture as a commons from which to copy and correct. We see this at work in Guy Debord’s texts and
films. They do not quote from past works, as to do
so acknowledges their value and their ownership.
The elements of détournement are nothing special.
They are raw materials for constructing theories,
narratives, affects of a subjectivity no longer bound
by the property form.
Such a project was recuperated soon enough
back into the art world as “appropriation.” Richard
Prince is the dialectical negation of Guy Debord,
in that appropriation both values the original fragment and contributes not to a subjectivity outside of
property but rather makes a career as an art world
star for the appropriating artist. Of such dreams is
mediocrity made.
If there was a more promising continuation of
détournement it had little to do with the art world.
Détournement became a social movement in all but
name. Crucially, it involved an advance in tools,
from Napster to BitTorrent and beyond. It enabled
the circulation of many kinds of what Hito Steyerl
calls the poor image. Often low in resolution, these
détourned materials circulated thanks not only to the
compression of information but also to the
addition of information. There might be less data
but there’s added metadata, or data about data, enabling its movement.
Needless to say the old culture industries went
into something of a panic about all this. As I wrote
over ten years ago in A Hacker Manifesto, “information wants to be free but is everywhere in chains.”
It is one of the qualities of information that it is indifferent to the medium that carries it and readily
escapes being bound to things and their properties.
Yet it is also one of its qualities that access to it can
be blocked by what Alexander Galloway calls protocol. The late twentieth century was — among other
things — about the contradictory nature of information. It was a struggle between détournement and
protocol. And protocol nearly won.
The culture industries took both legal and technical steps to strap information once more to fixity
in things and thus to property and scarcity. Interestingly, those legal steps were not just a question of
pressuring governments to make free information
a crime. It was also a matter of using international
trade agreements as a place outside the scope of democratic oversight to enforce the old rules of property. Here the culture industries join hands with the
drug cartels and other kinds of information-based
industry to limit the free flow of information.
But laws are there to be broken, and so are protocols of restriction such as encryption. These were
only ever delaying tactics, meant to shore up old
monopoly business for a bit longer. The battle to
free information was the battle that the forces of
détournement largely won. Our defeat lay elsewhere.
While the old culture industries tried to put information back into the property form, there were
other kinds of strategy afoot. The winners were not
the old culture industries but what I call the vulture
industries. Their strategy was not to try to stop the
flow of free information but rather to see it as an
environment to be leveraged in the service of creating a new kind of business. “Let the data roam free!”
says the vulture industry (while quietly guarding
their own patents and trademarks). What they aim
to control is the metadata.
It’s a new kind of exploitation, one based on an
unequal exchange of information. You can have the
little scraps of détournement that you desire, in exchange for performing a whole lot of free labor—and
giving up all of the metadata. So you get your little
bit of data; they get all of it, and more importantly,
any information about that information, such as
the where and when and what of it.
It is an interesting feature of this mode of exploitation that you might not even be getting paid for your
labor in making this information—as Trebor Scholz
has pointed out. You are working for information
only. Hence exploitation can be extended far beyond
the workplace and into everyday life. Only it is not
so much a social factory, as the autonomists call it.
This is more like a social boudoir. The whole of social
space is in some indeterminate state between public
and private. Some of your information is private to
other people. But pretty much all of it is owned by
the vulture industry — and via them ends up in the
hands of the surveillance state.
So this is how we lost the war. Making information free seemed like a good idea at the time. Indeed, one way of seeing what transpired is that we
forced the ruling class to come up with these new
strategies in response to our own self-organizing
activities. Their actions are reactions to our initiatives. In this sense the autonomists are right, only
it was not so much the actions of the working class
to which the ruling class had to respond in this case,
as what I call the hacker class. They had to recuperate a whole social movement, and they did. So our
tactics have to change.
In the past we were acting like data-punks. Not
so much “here’s three chords, now form your band.”
More like: “Here’s three gigs, now go form your autonomous art collective.” The new tactic might be
more a question of being metadata-punks. On the one
hand, it is about freeing information about information rather than the information itself. We need
to move up the order of informational density and
control. On the other hand, it might be an idea to
be a bit discreet about it. Maybe not everyone needs
to know about it. Perhaps it is time to practice what
Zach Blas calls informatic opacity.
Three projects seem to embody much of this
spirit to me. One I am not even going to name or
discuss, as discretion seems advisable in that case.
It takes matters off the internet and out of circulation among strangers. Ask me about it if
we meet in person.
The other two are Monoskop Log and UbuWeb.
It is hard to know what to call them. They are websites, archives, databases, collections, repositories,
but they are also a bit more than that. They could be
thought of also as the work of artists or of curators;
of publishers or of writers; of archivists or researchers. They contain lots of files. Monoskop is mostly
books and journals; UbuWeb is mostly video and
audio. The work they contain is mostly by or about
the historic avant-gardes.
Monoskop Log bills itself as “an educational
open access online resource.” It is a component part
of Monoskop, “a wiki for collaborative studies of
art, media and the humanities.” One commenter
thinks they see the “fingerprint of the curator” but
nobody is named as its author, so let’s keep it that
way. It is particularly strong on Eastern European
avant-garde material. UbuWeb is the work of Kenneth Goldsmith, and is “a completely independent
resource dedicated to all strains of the avant-garde,
ethnopoetics, and outsider arts.”
There are two aspects to consider here. One is the
wealth of free material both sites collect. For anybody trying to teach, study or make work in the
avant-garde tradition these are very useful resources.
The other is the ongoing selection, presentation and
explanation of the material going on at these sites
themselves. Both of them model kinds of ‘curatorial’
or ‘publishing’ behavior.
For instance, Monoskop has wiki pages, some
better than Wikipedia, which contextualize the work
of a given artist or movement. UbuWeb offers “top
ten” lists by artists or scholars which give insight
not only into the collection but into the work of the
person making the selection.
Monoskop and UbuWeb are tactics for intervening in three kinds of practices, those of the artworld, of publishing and of scholarship. They respond to the current institutional, technical and
political-economic constraints of all three. As it
says in the Communist Manifesto, the forces for social change are those that ask the property question.
While détournement was a sufficient answer to that
question in the era of the culture industries, they try
to formulate, in their modest way, a suitable tactic
for answering the property question in the era of
the vulture industries.
This takes the form of moving from data to metadata, expressed in the form of the move from writing
to publishing, from art-making to curating, from
research to archiving. Another way of thinking this,
suggested by Hiroki Azuma, would be the move from
narrative to database. The object of critical attention
acquires a third dimension, a kind of informational
depth. The objects before us are not just a text or an
image but databases of potential texts and images,
with metadata attached.
The object of any avant-garde is always to practice the relation between aesthetics and everyday
life with a new kind of intensity. UbuWeb and
Monoskop seem to me to be intimations of just
such an avant-garde movement. One that does not
offer a practice but a kind of meta-practice for the
making of the aesthetic within the everyday.
Crucial to this project is the shifting of aesthetic
intention from the level of the individual work to the
database of works. They contain a lot of material, but
not just any old thing. Some of the works available
here are very rare, but not all of them are. It is not
just rarity, or that the works are available for free.
It is more that these are careful, artful, thoughtful
collections of material. There are the raw materials here with which to construct a new civilization.
So we lost the battle, but the war goes on. This
civilization is over, and even its defenders know it.
We live among ruins that accrete in slow motion.
It is not so much a civil war as an incivil war, waged
against the very conditions of existence of life itself.
So even if we have no choice but to use its technologies and cultures, the task is to build another way
of life among the ruins. Here are some useful practices, in and on and of the ruins. ❧

public library

http://midnightnotes.memoryoftheworld.org/

Tomislav Medak

The Future After the Library
UbuWeb and Monoskop’s
Radical Gestures

The institution of the public library has crystallized,
developed and advanced around historical junctures
unleashed by epochal economic, technological and
political changes. A series of crises since the advent
of print have contributed to the configuration of the
institutional entanglement of the public library as
we know it today:01 defined by a publicly available
collection, housed in a public building, indexed and
made accessible with the help of a public catalog, serviced by trained librarians and supported through
public financing. Libraries today embody the idea
of universal access to all knowledge, acting as custodians of a culture of reading, archivists of material
and ephemeral cultural production, go-betweens
of information and knowledge. However, libraries have also embraced a broader spirit of public
service and infrastructure: providing information,
01 For the concept and the full scope of the contemporary library
as institutional entanglement see Shannon Mattern, “Library
as Infrastructure”, Places Journal, accessed April 9, 2015,
https://placesjournal.org/article/library-as-infrastructure/.

education, skills, assistance and, ultimately, shelter
to their communities — particularly their most vulnerable members.
This institutional entanglement, consisting in
a comprehensive organization of knowledge, universally accessible cultural goods and social infrastructure, historically emerged with the rise of (information) science, social regulation characteristic
of modernity and cultural industries. Established
in its social aspect as the institutional exemption
from the growing commodification and economic
barriers in the social spheres of culture, education
and knowledge, it is a result of struggles for institutionalized forms of equality that still reflect the
best in solidarity and universality that modernity
had to offer. Yet, this achievement is marked by
contradictions that beset modernity at its core. Libraries and archives can be viewed as an organon
through which modernity has reacted to the crises
unleashed by the growing production and fixation
of text, knowledge and information through a history of transformations that we will discuss below.
They have been an epistemic crucible for the totalizing formalizations that have propelled both the
advances and pathologies of modernity.
Positioned at a slight monastic distance and indolence toward the forms of pastoral, sovereign or
economic domination that defined the surrounding world that sustained them, libraries could never
close the rift between the universalist aspirations
of knowledge and their institutional compromise.
Hence, they could never avoid being the battlefield
where their own, and modernity’s, ambivalent epistemic and social character was constantly re-examined and ripped asunder. It is this ambivalent
character that has been a potent motor for critical theory, artistic and political subversion — from
Marx’s critique of political economy, psychoanalysis
and historic avant-gardes, to revolutionary politics.
Here we will examine the formation of the library
as an epistemic and social institution of modernity
and the forms of critical engagement that continue
to challenge the totalizing order of knowledge and
appropriation of culture in the present.
Here Comes the Flood02
Prior to the advent of print, the collections held in
monastic scriptoria, royal courts and private libraries
typically contained a limited number of canonical
manuscripts, scrolls and incunabula. In Medieval
and early Renaissance Europe the canonized knowledge considered necessary for the administration of
heavenly and worldly affairs was premised on reading and exegesis of biblical and classical texts. It is
02 The metaphor of the information flood, here incanted in the
words of Peter Gabriel’s song with apocalyptic overtones, as
well as a good part of the historic background of the development of the index card catalog in the following paragraphs
are based on Markus Krajewski, Paper Machines: About
Cards & Catalogs, 1548–1929 (MIT Press, 2011). The organizing idea of Krajewski’s historical account, that the index
card catalog can be understood as a Turing machine avant
la lettre, served as a starting point for the understanding
of the library as an epistemic institution developed here.

estimated that by the 15th century in Western Europe
there were no more than 5 million manuscripts held
mainly in the scriptoria of some 21,000 monasteries and a small number of universities. While the
number of volumes had grown sharply from less
than 0.8 million in the 12th century, the number of
monasteries had remained constant throughout that
period. The number of manuscripts read averaged
around 1,000 per million inhabitants, with the total
population of Europe peaking around 60 million.03
All in all, the book collections were small, access was
limited and reading culture played a marginal role.
The proliferation of written matter after the invention of mechanical movable type printing would
not only greatly increase the number of books but also transform
patterns of literacy and knowledge production. Already in the first fifty years after Gutenberg’s invention, 12 million volumes were printed, and from
this point onwards the output of printing presses
grew exponentially to 700 million volumes in the
18th century. In the aftermath of the explosion in
book production the cost of producing and buying
books fell drastically, reducing the economic barriers to literacy, but also creating a material vector
for a veritable shift of the epistemic paradigm. The
emerging reading public was gaining access to the
new works of a nascent Enlightenment movement,
ushering in the modern age of science. In parallel
with those larger epochal transformations, the explosion of print also created a rising tide of new books
that suddenly inundated the libraries. The libraries
now had to contend both with the orders-of-magnitude greater volume of printed matter and with the
growing complexity of systematically storing, ordering, classifying and tracking all of the volumes
in their collection. A once almost static collection
of canonical knowledge became an ever expanding
dynamic flux. This flood of new books, the first of
three to follow, presented principled, infrastructural and organizational challenges to the library that
radically transformed and coalesced its functions.
03 For an economic history of the book in Western Europe
see Eltjo Buringh and Jan Luiten Van Zanden, “Charting
the ‘Rise of the West’: Manuscripts and Printed Books in
Europe, A Long-Term Perspective from the Sixth through
Eighteenth Centuries”, The Journal of Economic History 69,
No. 02 (June 2009): 409–45, doi:10.1017/S0022050709000837,
particularly Tables 1–5.
The epistemic shift created by this explosion of
library holdings led to a revision of the assumption
that the library is organized around a single holy
scripture and a small number of classical sources.
Coextensive with the emergence and multiplication of new sciences, the books that were entering
the library now covered an ever more diversified scope
of topics and disciplines. And the sheer number of
new acquisitions demanded the physical expansion of libraries, which in turn required a radical
rethinking of the way the books were stored, displayed and indexed. In fact, the flood caused by the
printing press was nothing short of a revolution in
the organization, formalization and processing of
information and knowledge. This becomes evident
in the changes that unfolded between the 16th and
the early 20th century in the cataloging of library collections.
The initial listings of books were kept in bound
volumes, books in their own right. But as the number of items arriving in the library grew, the constant need to insert new entries made the bound
book format increasingly impractical for library
catalogs. To make things more complicated still,
the diversification of the printed matter demanded
a richer bibliographic description that would allow
better comprehension of what was contained in the
volumes. Alongside the name of the author and the
book’s title, the description now needed to include
the format of the volume, the classification of the
subject matter and the book’s location in the library.
As the pace of new arrivals accelerated, the effort to
create a library catalog became unending, causing a
true crisis in the emerging librarian profession. This
would result in a number of physical and epistemic
innovations in the organization and formalization
of information and knowledge. The requirement
to constantly rearrange the order of entries in the
listing led to the eventual unbinding of the bound
catalog into separate slips of paper and finally to the
development of the index card catalog. The unbound
index cards and their floating rearrangement, not
unlike that of the movable type, would in turn result in the design of filing cabinets. From Conrad
Gessner’s Bibliotheca Universalis, a three-volume
book-format catalog of around 3,000 authors and
10,000 texts, arranged alphabetically and topically,
published in the period 1545–1548; Gottfried Wilhelm Leibniz’s proposals for a universal library
during his tenure at the Wolfenbüttel library in the
late 17th century; to Gottfried van Swieten’s catalog
of the Viennese court library, the index card catalog and the filing cabinets would develop almost to
their present form.04
The unceasing inflow of new books into the library
prompted the need to spatially organize and classify
the arrangement of the collection. The simple addition of new books to the shelves by size, canonical
relevance or alphabetical order made little sense
in a situation where the corpus of printed matter
was quickly expanding and no individual librarian
could retain an intimate overview of the library’s
entire collection. The inflow of books required that
the brimming shelf-space be planned ahead, while
the increasing number of expanding disciplines required that the collection be subdivided into distinct
sections by fields. First the shelves became classified
and then the books individually received a unique
identifier. With the completion of the Josephinian
catalog in the Viennese court library, every book became compartmentalized according to a systematic
plan of sciences and assigned a unique sequence of
a Roman numeral, a Roman letter and an Arabic
numeral by which it could be tracked down regardless of its physical location.05 The physical location
of the shelves in the library no longer needed to be
reflected in the ordering of the catalog, and the catalog became a symbolic representation of the freely
re-arrangeable library. In the technological lingo of
today, the library required storage, index, search
and address in order to remain navigable. It is this
formalization, combining a universal system of classification
of objects in the library with the relative location of
objects and a re-arrangeable index, that would in
1876 receive its present standardized form in Melvil
Dewey’s Decimal System.
04 Krajewski, Paper Machines, op. cit., chapter 2.
05 Ibid., 30.
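To make the reading of the catalog as “storage, index, search and address” concrete, here is a minimal Python sketch. It assumes a hypothetical three-part shelfmark modeled loosely on the Roman numeral, Roman letter and Arabic numeral sequence described above; the class and method names, the shelf labels and the sample record are invented for illustration and do not reconstruct the actual Josephinian catalog. What it shows is that the catalog entry is keyed by a fixed address, so books can be moved around the building without rewriting the catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shelfmark:
    """Hypothetical address: Roman numeral (subject class), Roman letter, Arabic numeral."""
    subject_class: str  # e.g. "IV" (illustrative, not a historical class)
    letter: str         # e.g. "B"
    number: int         # e.g. 17

    def __str__(self) -> str:
        return f"{self.subject_class}.{self.letter}.{self.number}"


class Catalog:
    """Symbolic catalog: the shelfmark never changes, the physical location may."""

    def __init__(self) -> None:
        self.records: dict[Shelfmark, str] = {}    # index: address -> title
        self.locations: dict[Shelfmark, str] = {}  # storage: address -> current shelf

    def add(self, mark: Shelfmark, title: str, shelf: str) -> None:
        self.records[mark] = title
        self.locations[mark] = shelf

    def move(self, mark: Shelfmark, new_shelf: str) -> None:
        # Rearranging the library touches only the location table, not the catalog entry.
        self.locations[mark] = new_shelf

    def search(self, word: str) -> list[Shelfmark]:
        # Search titles; the results are addresses, not physical shelves.
        hits = (m for m, t in self.records.items() if word.lower() in t.lower())
        return sorted(hits, key=str)

    def locate(self, mark: Shelfmark) -> str:
        return self.locations[mark]


cat = Catalog()
mark = Shelfmark("IV", "B", 17)
cat.add(mark, "Bibliotheca Universalis", "hall 1, shelf 3")
cat.move(mark, "hall 2, shelf 9")  # shelves rearranged; the address is unchanged
print(mark, "->", cat.locate(mark))  # IV.B.17 -> hall 2, shelf 9
```

Keeping the address table separate from the location table is the design choice that mirrors the text: the catalog stops reflecting the physical order of the shelves and becomes a representation of a freely re-arrangeable library.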
The development of the library as an institution of
public access and popular literacy did not proceed
apace with the development of its epistemic aspects.
It was only a series of social upheavals and transformations in the course of the 18th and 19th centuries
that would bring about another flood of books and
political demands, pushing the library to become
embedded in an egalitarian and democratic political culture. The first big step in that direction came
with the decision of the French revolutionary National Assembly of 2 November 1789 to seize all
book collections from the Church and aristocracy.
Millions of volumes were transferred to the Bibliothèque Nationale and local libraries across France.
In parallel, particularly in England, capitalism was
on the rise. It massively displaced the impoverished rural population into growing urban centers,
propelled the development of industrial production and, by the mid-19th century, introduced the
steam-powered rotary press into the book business.
As books came to be produced more easily and on a mass scale,
the commercial subscription libraries catering to the
better-off parts of society blossomed. This brought
the class aspect of the nascent demand for public
access to books to the fore. After the failed attempts
to introduce universal suffrage and end the system
of political representation based on property entitlements in the 1830s and 1840s, the English Chartist
movement started to open reading rooms and cooperative lending libraries that would quickly become
a popular hotbed of social exchanges between the
lower classes. In the aftermath of the revolutionary
upheavals of 1848, the fearful ruling classes heeded
the demand for tax-financed public libraries, hoping
that the access to literature and edification would
ultimately hegemonize the working class for the
benefit of capitalism’s culture of self-interest and
competition.06
The Avant-gardes in the Library
As we have just demonstrated, the public library
in its epistemic and social aspects coalesced in the
context of the broader social transformations of
modernity: early capitalism and processes of nation-building in Europe and the USA. These transformations were propelled by the advancement of
political and economic rationalization, public and
business administration, statistical and archival
procedures. Archives underwent a corresponding and largely concomitant development with the
libraries, responding with a similar apparatus of
classification and ordering to the exponential expansion of administrative records documenting the
social world and to the historicist impulse to capture the material traces of past events. Overlaying
the spatial organization of documentation, the rules
of its classification and the symbolic representation of
the archive in reference tools, they tried to provide
a formalization adequate to the passion for capturing historical or present events. Characteristic
of the ascendant positivism of the 19th century, the
archivists’ and librarians’ epistemologies harbored
a totalizing tendency that would become subject to
subversion and displacement in the first decades of
the 20th century.
06 For the social history of the public library see Matthew Battles,
Library: An Unquiet History (Random House, 2014) chapter
5: “Books for all”.
The assumption that the classificatory form can
fully capture the archival content would become
destabilized over and over by the early avant-gardist
permutations of formal languages of classification:
dadaist montage of the contingent compositional
elements, surrealist insistence on the unconscious
surpluses produced by automatized formalized language, constructivist foregrounding of dynamic and
spatialized elements in the acts of perception and
cognition of an artwork.07 The material composition
of the classified and ordered objects already contained formalizations deposited into those objects
by the social context of their provenance or projected onto them by the social situation of encounter
with them. Form could become content and content
could become form. The appropriations, remediations and displacements enacted by the neo-avantgardes in the second half of the 20th century produced subversions, resignifications and simulacra
that only further blurred the lines between histories
and their construction, dominant classifications and
their immanent instabilities.
07 Sven Spieker, The Big Archive: Art from Bureaucracy (MIT
Press, 2008) provides a detailed account of strategies that
the historic avant-gardes and post-war art developed toward the classificatory and ordering regime of the
archive.
Where does the library fit into this trajectory? Operating around an uncertain and politically embattled universal principle of public access to knowledge
and organization of information, libraries continued being sites of epistemic and social antagonisms,
adaptations and resilience in response to the challenges created by the waves of radical expansion of
textuality and conflicting social interests between
the popular reading culture and the commodification of cultural consumption. This precarious position is presently being made evident by the third
big flood — after those unleashed by movable type
printing and the social context of industrial book
production — that is unfolding with the transition
of the book into the digital realm. Both the historic
mode of the institutional regulation of access and
the historic form of epistemic classification are
swept up in this transformation. While the internet
has made possible a radically expanded access to
digitized culture and knowledge, the vested interests of cultural industries reliant on copyright for
their control over cultural production have deepened the separation between cultural producers and
their readers, listeners and viewers. While the hypertextual capacity for cross-reference has blurred
the boundaries of the book, digital rights management technologies have transformed e-books into
closed silos. Both the decommodification of access
and the overcoming of the reified construct of the
self-enclosed work in the form of a book come at
the cost of illegality.
Even the avant-gardes in all their inappropriable
and idiosyncratic recalcitrance fall no less under
the legally delimited space of copyrightable works.
As they shift format, new claims of ownership and
appropriation are built. Copyright is a normative
classification that is totalizing, regardless of the
effects of leaky networks speaking to the contrary.
Few efforts have insisted on subverting the juridical classification imposed by copyright more lastingly than
the UbuWeb archive. Espousing the avant-gardes’
ethos of appropriation, for almost 20 years it has
collected and made accessible the archives of the
unknown, outsider, rare and canonized avant-gardes and contemporary art that would otherwise have remained reserved for the vaults and restricted access
channels of esoteric markets, selective museological
presentations and institutional archives. Knowing
that asking for permission to publish would amount to aligning itself with the totalizing logic of copyright, UbuWeb
has shunned the permission culture. At the level of
poetical operation, as a gesture of displacing the cultural archive from a regime of limited access into a regime
of unlimited access, it has created provocations and
challenges directed at the classifying and ordering
arrangements of property over cultural production.
One can only assume that as such it has become a
mechanism for small acts of treason for the artists,
who, short of turning their back fully on the institutional arrangements of the art world they inhabit,
use UbuWeb to release their own works into unlimited circulation on the net. Sometimes there might
be no way or need to produce a work outside the
restrictions imposed by those institutions, just as
sometimes it is impossible for academics to avoid
the contradictory world of academic publishing,
yet that is still no reason to keep one’s allegiance to
their arrangements.
At the same time UbuWeb has played the game
of avant-gardist subversion: “If it doesn’t exist on
the internet, it doesn’t exist”. Provocation is most
effective when it is ignorant of the complexities of
the contexts that it is directed at. Its effect starts
where fissures in the defense of the opposition start
to show. By treating UbuWeb as massive evidence
for the internet as a process of reappropriation, a
process of “giving to all”, its volunteer spiritus
movens, Kenneth Goldsmith, has been constantly rubbing copyright apologists up the wrong way.
Rather than producing qualifications, evasions and
ambivalences, straightforward affirmation of copying,
plagiarism and reproduction as a dominant
yet suppressed mode of operation of digital culture re-enacts the avant-gardes’ gesture of taking
no hostages from the officially sanctioned systems
of classification. By letting the incumbents of control over cultural production react to the norm of
copying, you leave them to dispute the norm
rather than having to defend it yourself.
UbuWeb was an early-comer. Starting in 1996
and still functioning today on seemingly similar
technology, it is a child of the early days of the World
Wide Web and the promissory period of the experimental internet. It is resolutely Web 1.0, with
a single maintainer, idiosyncratically simple in its
layout and programmatically committed to
eventual obsolescence and sudden abandonment.
No platform, no generic design, no widgets, no
kludges and no community features. Only Beckett
avec links. Endgame.
A Book is an Index is an Index is an Index...
Since the first book flood, the librarian dream of
epistemological formalization has revolved around
the aspiration to cross-reference all the objects in
the collection. Within the physical library the topical designation has been relegated to the confines of
the index card catalog, which remained isolated from the
structure of citations and indexes in the books themselves. With the digital transition of the book, the
time-shifted hypertextuality of citations and indexes
became realizable as the immediate cross-referentiality of segments of an individual text to segments
of other texts and other digital artifacts across the now
permeable boundaries of the book.
Developed as a wiki for collaborative studies of
art, media and the humanities, Monoskop.org took
up the task of mapping and describing avant-gardes and media art in Europe. In its approach both
indexical and encyclopedic, it is an extension of
the collaborative editing made possible by wiki
technology. Wikis rose to prominence in the early
2000s, allowing anyone to edit and extend websites running on that technology by mastering a
very simple markup language. Wikis have been the
harbinger of a democratization of web publishing
that would eventually produce the largest collaborative website on the internet, Wikipedia, as
well as a number of other collaborative platforms.
Monoskop.org embraces the encyclopedic spirit of
Wikipedia, focusing on its own specific topical and
topological interests. However, from its earliest days
Monoskop.org has also developed as a form of index
that maps out places, people, artworks, movements,
events and venues that compose the dense network
of European avant-gardes and media art.
If we take the index as a formalization of cross-referential relations between names of people, titles
of works and concepts that exist in the books and
across the books, what emerges is a model of a relational database reflecting the rich mesh of cultural
networks. Each book can serve as an index linking
its text to people, to other books and to segments within them.
To provide a paradigmatic demonstration of that
idea, Monoskop.org has assembled an index of all
persons in Friedrich Kittler’s Discourse Networks,
with each index entry linking both to its location
in the digital version of the book displayed on the
aaaaarg.org archive and to relevant resources for
those persons on Monoskop.org and elsewhere on the internet. Hence, each object in the library, an index
in its own right, potentially allows one to initiate
the relational re-classification and re-organization
of all other works in the library through linkable
information.
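The idea of the index as a relational structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the persons, books and page numbers are hypothetical placeholders, not entries from the Monoskop.org index of Discourse Networks, and the function names are invented. Each index entry links a person to a location in a book; following those links outward lets any single book re-group the other books by the persons they share.

```python
from collections import defaultdict

# Hypothetical index entries: (person, book, page). Names and pages are placeholders only.
ENTRIES = [
    ("Person A", "Book 1", 12),
    ("Person B", "Book 1", 40),
    ("Person A", "Book 2", 7),
    ("Person C", "Book 2", 88),
    ("Person B", "Book 3", 3),
]


def build_index(entries):
    """Map each person to every (book, page) location where they are indexed."""
    index = defaultdict(list)
    for person, book, page in entries:
        index[person].append((book, page))
    return index


def related_books(index, book):
    """Books linked to `book` through at least one shared person."""
    people = {p for p, locs in index.items() if any(b == book for b, _ in locs)}
    return sorted({b for p in people for b, _ in index[p] if b != book})


index = build_index(ENTRIES)
print(index["Person A"])               # [('Book 1', 12), ('Book 2', 7)]
print(related_books(index, "Book 1"))  # ['Book 2', 'Book 3']
```

Starting from a different book yields a different grouping of the same collection, which is the sense in which each book, as an index, can initiate a relational re-classification of the others.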
Fundamental to the works of the post-socialist
retro-avant-gardes of the last couple of decades has
been the re-writing of a history of art in reverse.
In the works of IRWIN, Laibach or Mladen Stilinović, or comparable work of Komar & Melamid,
totalizing modernity is detourned by re-appropriating the forms of visual representation and classification that the institutions of modernity used to
construct a linear historical narrative of evolutions
and breaks in the 19th and 20th centuries. Genealogical
tables, events, artifacts and discourses of the past
were re-enacted, over-affirmed and displaced to
open up the historic past relegated to the archives
to an understanding that transformed the present
into something radically uncertain. The efforts of
Monoskop.org in digitizing the artifacts of the
20th century avant-gardes and playing with the
epistemic tools of early book culture are a parallel
gesture, with a technological twist. If big data and
the control over information flows of today increasingly naturalize and re-affirm the 19th century
positivist assumptions of the steerability of society,
then the endlessly recombinant relations and affiliations between cultural objects threaten to overflow
that recurrent epistemic framework of modernity’s
barbarism in its cybernetic form.
The institution of the public library finds itself
today under a double attack. One is unleashed by
the dismantling of the institutionalized forms of
social redistribution and solidarity; the other by
the commodifying forces of expanding copyright
protections and digital rights management, control
over the data flows and command over the classification and order of information. In a world of
collapsing planetary boundaries and unequal development, those who control the epistemic order
control the future.08 The Googles and the NSAs run
on capturing totality — the world’s knowledge and
communication made decipherable, organizable and
controllable. The instabilities of the epistemic order
that the library continues to instigate at its margins
contribute to keeping the future open beyond the
script of ‘commodify and control’. In their acts of
re-appropriation UbuWeb and Monoskop.org are
but a reminder of the resilience of libraries’ instability that signals toward a future that can be made
radically open. ❧

08 In his article “Controlling the Future—Edward Snowden and
the New Era on Earth” (accessed April 13, 2015, http://www.eurozine.com/articles/2014-12-19-altvater-en.html), Elmar
Altvater makes a comparable argument that the efforts of
the “Five Eyes” to monitor global communication flows,
revealed by Edward Snowden, and the control of future
social development defined by the urgency of mitigating the
effects of the planetary ecological crisis cannot be considered
in isolation from each other.

public library

http://kok.memoryoftheworld.org

Public Library
www.memoryoftheworld.org

Publishers
What, How & for Whom / WHW
Slovenska 5/1 • HR-10000 Zagreb
+385 (0) 1 3907261
whw@whw.hr • www.whw.hr
ISBN 978-953-55951-3-7 [Što, kako i za koga/WHW]
Multimedia Institute
Preradovićeva 18 • HR-10000 Zagreb
+385 (0)1 4856400
mi2@mi2.hr • www.mi2.hr
ISBN 978-953-7372-27-9 [Multimedijalni institut]
Editors
Tomislav Medak • Marcell Mars • What, How & for Whom / WHW
Copy Editor
Dušanka Profeta [Croatian]
Anthony Iles [English]
Translations
Una Bauer
Tomislav Medak
Dušanka Profeta
W. Boyd Rayward
Design & layout
Dejan Kršić @ WHW
Typography
MinionPro [robert slimbach • adobe]

English translation of Paul
Otlet’s text published with the permission of W. Boyd
Rayward. The translation was originally published as
Paul Otlet, “Transformations in the Bibliographical
Apparatus of the Sciences: Repertory–Classification–Office
of Documentation”, in International Organisation and
Dissemination of Knowledge; Selected Essays of Paul Otlet,
translated and edited by W. Boyd Rayward, Amsterdam:
Elsevier, 1990: 148–156. ❧
format / size
120 × 200 mm
pages
144
Paper
Agrippina 120 g • Rives Laid 300 g
Printed by
Tiskara Zelina d.d.
Print Run
1000
Price
50 kn
May • 2015

This publication, realized along with the exhibition
Public Library in Gallery Nova, Zagreb 2015, is a part of
the collaborative project This Is Tomorrow. Back to Basics:
Forms and Actions in the Future organized by What, How
& for Whom / WHW, Zagreb, Tensta Konsthall, Stockholm
and Latvian Center for Contemporary Art / LCCA, Riga, as a
part of the book edition Art As Life As Work As Art. ❧

Supported by
Office of Culture, Education and Sport of the City of Zagreb
Ministry of Culture of the Republic of Croatia
Croatian Government Office for Cooperation with NGOs
Creative Europe Programme of the European Commission
National Foundation for Civil Society Development
Kultura Nova Foundation

This project has been funded with support
from the European Commission. This publication reflects
the views only of the authors, and the Commission
cannot be held responsible for any use which may be
made of the information contained therein. ❧
Publishing of this book is enabled by financial support of
the National Foundation for Civil Society Development.
The content of the publication is the responsibility of
its authors and as such does not necessarily reflect
the views of the National Foundation. ❧
This project is financed
by the Croatian Government Office for Cooperation
with NGOs. The views expressed in this publication
are the sole responsibility of the publishers. ❧

This book is licensed under a Creative
Commons Attribution–ShareAlike 4.0
International License. ❧


Tenen & Foxman
Book Piracy as Peer Preservation
2014


Book Piracy as Peer Preservation

**Abstract**

In describing the people, books, and technologies behind one of the
largest "shadow libraries" in the world, we find a tension between the
dynamics of sharing and preservation. The paper proceeds to
contextualize contemporary book piracy historically, challenging
accepted theories of peer production. Through a close analysis of one
digital library's system architecture, software and community, we assert
that the activities cultivated by its members are closer to those of
conservationists of the public libraries movement, with the goal of
preserving rather than mass distributing their collected material.
Unlike in common peer production models, emphasis is placed on the expertise
of its members as digital preservationists, as well as on the absorption of
digital repositories. Additionally, we highlight issues that arise from
their particular form of distributed architecture and community.

> *Literature is the secretion of civilization, poetry of the ideal.
> That is why literature is one of the wants of societies. That is why
> poetry is a hunger of the soul. That is why poets are the first
> instructors of the people. That is why Shakespeare must be translated
> in France. That is why Molière must be translated in England. That is
> why comments must be made on them. That is why there must be a vast
> public literary domain. That is why all poets, all philosophers, all
> thinkers, all the producers of the greatness of the mind must be
> translated, commented on, published, printed, reprinted, stereotyped,
> distributed, explained, recited, spread abroad, given to all, given
> cheaply, given at cost price, given for nothing.*
> ^[1](#fn-2025-1){#fnref-2025-1}^

**Introduction**

The big money (and the bandwidth) in online media is in film, music, and
software. Text is less profitable for copyright holders; it is cheaper
to duplicate and easier to share. Consequently, issues surrounding the
unsanctioned sharing of print material receive less press and scant
academic attention. The very words, "book piracy," fail to capture the
spirit of what is essentially an Enlightenment-era project, openly
embodied in many contemporary "shadow libraries":^[2](#fn-2025-2){#fnref-2025-2}^
in the words of Victor Hugo, to establish a "vast public
literary domain." Writers, librarians, and political activists from Hugo
to Leo Tolstoy and Andrew Carnegie have long argued for unrestricted
access to information as a public good essential to civic
engagement. In that sense, people participating in online book exchanges
enact a role closer to that of a librarian than that of a bootlegger or
a plagiarist. Whatever the reader's stance on the ethics of copyright
and copyleft, book piracy should not be dismissed as a mere search for
free entertainment. Under the conditions of "digital
disruption,"^[3](#fn-2025-3){#fnref-2025-3}^ when the traditional
institutions of knowledge dissemination---the library, the university,
the newspaper, and the publishing house---feel themselves challenged and
transformed by the internet, we can look to online book sharing
communities for lessons in participatory governance, technological
innovation, and economic sustainability.

The primary aims of this paper are ethnographic and descriptive: to
study and to learn from a library that constitutes one of the world's
largest digital archives, rivaling *Google Books*, *Hathi Trust*, and
*Europeana*. In approaching a "thick description" of this archive we
begin to broach questions of scope and impact. We would like to ask:
Who? Where? and Why? What kind of people distribute books online? What
motivates their activity? What technologies enable the sharing of print
media? And what lessons can we draw from them? Our secondary aim is to
continue the work of exploring the phenomenon of book sharing more
widely, placing it in the context of other commons-based peer production
communities like Project Gutenberg and Wikipedia. The archetypal model
of peer production is one motivated by altruistic participation. But the
very history of public libraries is one that combines the impulse to
share and to protect. To paraphrase Jacques Derrida
^[4](#fn-2025-4){#fnref-2025-4}^ writing in "Archive Fever," the archive
shelters memory just as it shelters itself from memory. We encompass
this dual dynamic under the term "peer preservation," where the
logistics of "peers" and of "preservation" can sometimes work at odds with
one another.

Academic literature tends to view piracy on the continuum between free
culture and intellectual property rights. On the one side, an argument
is made for unrestricted access to information as a prerequisite to
properly deliberative democracy.^[5](#fn-2025-5){#fnref-2025-5}^ On this
view, access to knowledge is a form of political power, which must be
equitably distributed, redressing regional and social imbalances of
access.^[6](#fn-2025-6){#fnref-2025-6}^ The other side offers pragmatic
reasoning related to the long-term sustainability of the cultural
sphere, which, in order to prosper, must provide proper economic
incentives to content creators.^[7](#fn-2025-7){#fnref-2025-7}^

It is our contention that grassroots file sharing practices cannot be
understood solely in terms of access or intellectual property. Our field
work shows that while some members of the book sharing community
participate for activist or ideological reasons, others do so as
collectors, preservationists, curators, or simply readers. Despite
romantic notions to the contrary, reading is a social and mediated
activity. The reader encounters texts in conversation, through a variety
of physical interfaces and within an ecosystem of overlapping
communities, each projecting their own material contexts, social norms,
and ideologies. A technician who works in a biology laboratory, for
example, might publish closed-access peer-reviewed articles by day, as
part of his work collective, and release terabytes of published material
by night, in the role of a moderator for an online digital library. Our
approach, then, is to capture some of the complexity of such an
ecosystem, particularly in the liminal areas where people, texts, and
technology converge.

**Ethics disclaimer**

Research for this paper was conducted under the aegis of piracyLab, an
academic collective exploring the impact of technology on the spread of
knowledge globally.^[8](#fn-2025-8){#fnref-2025-8}^ One of the lab's
first tasks was to discuss the ethical challenges of collaborative
research in this space. The conversation involved students, faculty,
librarians, and informal legal counsel. Neutrality, to the extent that
it is possible, emerged as one of our foundational principles. To keep
all channels of communication open, we wanted to avoid bias and to give
voice to a diversity of stakeholders: from authors, to publishers, to
distributors, whether sanctioned or not. Following a frank discussion
and after several iterations, we drafted an ethics charter that
continues to inform our work today. The charter contains the following
provisions:

-- We neither condone nor condemn any forms of information exchange.\
-- We strive to protect our sources and do not retain any identifying
personal information.\
-- We seek transparency in sharing our methods, data, and findings with
the widest possible audience.\
-- Credit where credit is due. We believe in documenting attribution
thoroughly.\
-- We limit our usage of licensed material to the analysis of metadata,
with results used for non-commercial, nonprofit, educational purposes.\
-- Lab participants commit to abiding by these principles as long as
they remain active members of the research group.

In accordance with these principles and following the practice of
scholars like Balazs Bodo ^[9](#fn-2025-9){#fnref-2025-9}^, Eric Priest
^[10](#fn-2025-10){#fnref-2025-10}^, and Ramon Lobato and Leah Tang
^[11](#fn-2025-11){#fnref-2025-11}^, we redact the names of file sharing
services and user names, where such names are not made explicitly public
elsewhere.

**Centralization**

We begin with the intuition that all infrastructure is social to an
extent. Even private library collections cannot be said to reflect the
work of a single individual. Collective forces shape furniture, books,
and the very cognitive scaffolding that enables reading and
interpretation. Yet, there are significant qualitative differences in
the systems underpinning private collections, public libraries, and
unsanctioned peer-to-peer information exchanges such as *The Pirate
Bay*. Given these differences, the recent history of online book
sharing can be divided roughly into two periods. The first is
characterized by local, ad-hoc peer-to-peer document exchanges and the
subsequent growth of centralized content aggregators. Following trends
in the development of the web as a whole, shadow libraries of the second
period are characterized by communal governance and distributed
infrastructure.

Shadow libraries of the first period resemble a private library in that
they often emanate from a single authoritative source--a site of
collection and distribution associated with an individual collector,
sometimes explicitly. The library of Maxim Moshkov, for example,
established in 1994 and still thriving at *lib.ru*, is one of the most
visible collections of this kind. Despite their success, such libraries
are limited in scale by the means and efforts of a few individuals. Due
to their centralized architecture they are also susceptible to legal
challenges from copyright owners and to state intervention.
Shadow libraries of the second period responded to these problems by
distributing labor,
responsibility, and infrastructure, resulting in a system that is more
robust, more redundant, and more resistant to any single point of
failure or control.

The case of *Gigapedia* (later *library.nu*) and its related file
hosting service *ifile.it* demonstrates the successes and the
deficiencies of the centralized digital library model. Arguably among
the largest and most popular virtual libraries online in the period of
2009-2011, the sites were operated by Irish
nationals^[12](#fn-2025-12){#fnref-2025-12}^ on domains registered in
Italy and on the island state of Niue, with servers on the territory of
Germany and Ukraine. At its peak, *library.nu* (LNU) hosted more than
400,000 books and was purported to make an "estimated turnover of EUR 8
million (USD 10,602,400) from advertising revenues, donations and sales
of premium-level accounts," at least according to a press release made
by the International Publishers Association
(IPA).^[13](#fn-2025-13){#fnref-2025-13}^

Its apparent popularity notwithstanding, *LNU/Gigapedia* was supported
by a relatively simple architecture, likely maintained by a lone
developer-administrator. The site itself consisted of a catalog of
digital books and related metadata, including title, author, year of
publication, number of pages, description, category classification, and
a number of boolean parameters (whether the file was bookmarked,
paginated, vectorized, and searchable, and whether it had a cover).
Although the books could be hosted anywhere, many in the catalog resided
on the servers of a "cyberlocker" service, *ifile.it*, affiliated with the main
site. Not strictly a single-source archive, *LNU/Gigapedia* was
nevertheless a federated entity, tied to a single site and to a single
individual. On February 15, 2012, in a Munich court, the IPA, in
conjunction with a consortium of international publishing houses and the
help of the German law firm Lausen
Rechtsanwalte,^[14](#fn-2025-14){#fnref-2025-14}^ served judicial
cease-and-desist orders naming both sites (*Gigapedia* and *ifile.it*).
Seventeen injunctions were sought in Ireland, with the consequent
voluntary shut-down of both domains, which for a brief time redirected
visitors first to *Google Books* and then to *Blue Latitudes*, a *New
York Times* bestseller about pirates, for sale on *Amazon*.

::: {#attachment_2430 .wp-caption .alignnone style="width: 310px"}
[![](http://computationalculture.net/wp-content/uploads/2014/11/figure-13-300x176.jpg "figure-1"){.size-medium
.wp-image-2430 width="300" height="176"
sizes="(max-width: 300px) 100vw, 300px"
srcset="http://computationalculture.net/wp-content/uploads/2014/11/figure-13-300x176.jpg 300w, http://computationalculture.net/wp-content/uploads/2014/11/figure-13-1024x603.jpg 1024w"}](http://computationalculture.net/wp-content/uploads/2014/11/figure-13.jpg)

Figure 1: Archived version of library.nu, circa 12/10/2010
:::

The existence of *LNU/Gigapedia*, relatively brief by library standards,
underscores a weakness in the federated library model. The site
flourished as long as it did not attract the ire of the publishing
industry. The lack of redundancy in the site's administrative structure
paralleled the lack of redundancy at the server level. Once the
authorities were able to establish the identity of the site's operators
(via *PayPal* receipts, according to a partner at Lausen Rechtsanwalte),
the project was forced to shut down irrevocably.^[15](#fn-2025-15){#fnref-2025-15}^
The system's single point of origin proved also to be its single point
of failure.

Jens Bammel, Secretary General of the IPA, called the action "an
important step towards a more transparent, honest and fair trade of
digital content on the Internet."^[16](#fn-2025-16){#fnref-2025-16}^ The
rest of the internet mourned the passage of "the greatest, largest and
the best website for downloading
eBooks,"^[17](#fn-2025-17){#fnref-2025-17}^ comparing the demise of
*LNU/Gigapedia* to the burning of the ancient Library of
Alexandria.^[18](#fn-2025-18){#fnref-2025-18}^ Readers from around the
world flocked to sites like *Reddit* and *TorrentFreak* to express their
support and anger. For example, one reader wrote on *TorrentFreak*:

> I live in Macedonia (the Balkans), a country where the average salary
> is somewhere around 200eu, and I'm a student, attending a MA degree in
> communication sci. \[...\] where I come from the public library is not
> an option. \[...\] Our libraries are so poor, mostly containing 30year
> or older editions of books that almost never refer to the field of
> communication or any other contemporary science. My professors never
> hide that they use sites like library.nu \[...\] Original textbooks
> \[...\] are copy-printed handouts of some god knows how obtained
> original \[...\] For a country like Macedonia and the Balkans region
> generally THIS IS A APOCALYPTIC SCALE DISASTER! I really feel like the
> dark age is just around the corner these
> days.^[19](#fn-2025-19){#fnref-2025-19}^

A similar comment on *Reddit* reads:

> This is the saddest news of the year...heart-breaking...shocking...I
> was so attached to this site...I am from a third world country where
> buying original books is way too expensive if we see currency exchange
> rates...library.nu was a sea of knowledge for me and I learnt a lot
> from it \[...\] RIP library.nu...you have ignited several minds with
> free knowledge.^[20](#fn-2025-20){#fnref-2025-20}^

Another redditor wrote:

> This was an invaluable resource for international academics. The
> catalog of libraries overseas often cannot meet the needs of
> researchers in fields not specific to the country in which they are
> located. My doctoral research has taken a significant blow due to this
> recent shutdown \[...\] Please publishers, if you take away such a
> valuable resource, realize that you have created a gap that will be
> filled. This gap can either be filled by you or by
> us.^[21](#fn-2025-21){#fnref-2025-21}^

Another concludes:

> This just makes me want to start archiving everything I can get my
> hands on.^[22](#fn-2025-22){#fnref-2025-22}^

These anecdotal reports confirm our own experiences of studying and
teaching at universities with a diverse audience of international
students, who often recount a similar personal narrative. *Gigapedia*
and analogous sites filled an unmet need in the international market,
redressing global inequities of access to
information.^[23](#fn-2025-23){#fnref-2025-23}^

But, as a cyberlocker-based service, *Gigapedia* did not succeed in
cultivating a meaningful sense of community (even though it supported
a forum for brief periods of its existence). As Lobato and Tang
^[24](#fn-2025-24){#fnref-2025-24}^ write in their paper on
cyberlocker-based media distribution systems, cyberlockers in general
"do not foster collaboration and co-creation," taking an "instrumental
view of content hosted on their
sites."^[25](#fn-2025-25){#fnref-2025-25}^ Although not strictly a
cyberlocker, *LNU/Gigapedia* fit the profile of a passive,
non-transformative site by these criteria. For Lobato and Tang, the
rapid disappearance of many prominent cyberlocker sites underscores the
"structural instability" of a "fragile file-hosting
ecology."^[26](#fn-2025-26){#fnref-2025-26}^ In our case, it would be
more precise to say that cyberlocker architecture highlights the
structural instability of centralized media archives rather than that of
file sharing communities in general. Although bereaved readers were
concerned about the irrevocable loss of a valuable resource, the digital
libraries that followed built a model of file sharing that is more
resilient, more transparent, and more participatory than that of their
predecessor, *LNU/Gigapedia*.

**Distribution**

In parallel with the development of *LNU/Gigapedia*, a group of Russian
enthusiasts were working on a meta-library of sorts, under the name of
*Aleph*. Records of *Aleph's* activity go back at least as far as 2009.
Colloquially known as "prospectors," the volunteer members of *Aleph*
compiled library collections widely available on the gray market, with
an emphasis on academic and technical literature in Russian and
English.

At its inception, *Aleph* aggregated several "home-grown" archives,
already in wide circulation in universities and on the gray market.
These included:

-- *KoLXo3*, a collection of scientific texts that was at one time
distributed on 20 DVDs, overlapping with early Gigapedia efforts;\
-- *mexmat*, a library collected by the members of Moscow State
University's Department of Mechanics and Mathematics for internal use,
originally distributed through private FTP servers;\
-- *Homelab*, *Ihtik*, and *Ingsat* libraries;\
-- the Foreign Fiction archive collected from IRC \#\*\*\*
2003.09-2011.07.09 and the Internet Library;\
-- the *Great Science Textbooks* collection and, later, over 20 smaller
miscellaneous archives.^[27](#fn-2025-27){#fnref-2025-27}^

In retrospect, we can categorize the founding efforts along three
parallel tracks: 1) the development of "front-end" server software
for searching and downloading books, 2) the organization of an online
forum for enthusiasts willing to contribute to the project, and 3) the
collection effort required to expand and maintain the "back-end" archive
of documents, primarily in .pdf and .djvu
formats.^[28](#fn-2025-28){#fnref-2025-28}^ "What do we do?" writes one
of the early volunteers (in 2009) on the topic of "Outcomes, Goals, and
Scope of the Project." He answers: "we loot sites with ready-made
collections," "sort the indices in arbitrary normalized formats," "for
uncatalogued books we build a 'technical index': name of file, size,
hashcode," "write scripts for database sorting after the initial catalog
process," "search the database," "use the database for the construction
of an accessible catalog," "build torrents for the distribution of files
in the collection."^[29](#fn-2025-29){#fnref-2025-29}^ But, "everything
begins with the forum," in the words of another founding
member.^[30](#fn-2025-30){#fnref-2025-30}^ *Aleph*, the very name of the
group, reflects the aspiration to develop a "platform for the inception
of subsequent and more user-friendly" libraries--a platform "useful for
the developer, the reader, and the
librarian."^[31](#fn-2025-31){#fnref-2025-31}^
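
The volunteer's list describes a "technical index" for uncatalogued books (name of file, size, hashcode) that is then sorted and searched. The Python sketch below builds such an index under stated assumptions. Files are passed in memory as a mapping from file name to bytes (a real tool would read them from disk). The "hashcode" is taken to be an MD5 digest, the algorithm the group later settled on. Sorting by file name stands in for the unspecified "arbitrary normalized formats." `IndexEntry`, `build_technical_index`, and `search` are hypothetical names, not *Aleph* code.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    """One row of a hypothetical 'technical index' for an uncatalogued book."""
    filename: str
    size: int   # size in bytes
    md5: str    # hex digest of the file's contents (assumed hash function)


def build_technical_index(files: dict[str, bytes]) -> list[IndexEntry]:
    """Build (name of file, size, hashcode) rows, sorted by file name."""
    entries = [
        IndexEntry(name, len(data), hashlib.md5(data).hexdigest())
        for name, data in files.items()
    ]
    return sorted(entries, key=lambda e: e.filename)


def search(index: list[IndexEntry], term: str) -> list[IndexEntry]:
    """Case-insensitive substring search over file names."""
    term = term.lower()
    return [e for e in index if term in e.filename.lower()]
```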

::: {#attachment_2431 .wp-caption .alignnone style="width: 310px"}
[![](http://computationalculture.net/wp-content/uploads/2014/11/figure-21-300x300.jpg "figure-2"){.size-medium
.wp-image-2431 width="300" height="300"
sizes="(max-width: 300px) 100vw, 300px"
srcset="http://computationalculture.net/wp-content/uploads/2014/11/figure-21-300x300.jpg 300w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21-150x150.jpg 150w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21-1024x1024.jpg 1024w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21.jpg 1200w"}](http://computationalculture.net/wp-content/uploads/2014/11/figure-21.jpg)

Figure 2: DVD case cover of "Traum's library" advertising "more than
167,000 books" in fb2 format. Similar DVDs sell for around 1,000 RUB
(\$25-30 US) on the streets of Moscow.
:::

What is *Aleph*? Is it a collection of books? A community? A piece of
software? What makes a library? When attempting to visualize Aleph's
constituents (Figure 3), it seems insufficient to point to books alone,
or to social structure, or to technology in the absence of people and
content. Taking a systems approach to description, we understand a
library to comprise an assemblage of books, people, and infrastructure,
along with their corresponding words and texts, rules and institutions,
and shelves and servers.^[32](#fn-2025-32){#fnref-2025-32}^ In this
light, *Aleph*'s improvement on *LNU/Gigapedia* lies not in technological
advancement alone but in system architecture, at all levels of
analysis.

Where *LNU/Gigapedia* relied on proprietary server applications, *Aleph*
built software that enabled others to mirror and to serve the site in
its entirety. The server was written by d\* from www.l\*.com (Bet),
utilizing a codebase common to several similar large book-sharing
communities. The initial organizational efforts happened on a sub-forum
of a popular torrent tracker (*RR*). Fifteen founding members reached an
early consensus to replace document filenames with hashes of the files
(using the MD5 message-digest algorithm), rather than to store files
as-is, with their appropriate .pdf or .mobi
extensions.^[33](#fn-2025-33){#fnref-2025-33}^
Bit-wise hashing was likely chosen as a (computationally) cheap way to
de-duplicate documents, since two identical files would hash into an
identical string. Hashing the filenames was also hoped to have the
side-effect of discouraging direct (file system-level) browsing of the
archive.^[34](#fn-2025-34){#fnref-2025-34}^ Instead, the books were
meant to be accessed through the front-end "librarian" interface, which
added a layer of meta-data and search tools. In other words, the group
went out of its way to distribute *Aleph* as a library and not merely as
a large aggregation of raw files.
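
A minimal Python sketch can illustrate the effect of naming files by the hash of their contents. It assumes, hypothetically, an in-memory archive keyed by the MD5 hex digest of each file's bytes. Titles are kept in a separate catalog that plays the role of the front-end "librarian" layer. `HashedArchive` and its methods are illustrative inventions, not the group's server software. Uploading the same bytes under two titles yields a single stored file, and the stored names reveal nothing about their contents.

```python
import hashlib
from collections import defaultdict


class HashedArchive:
    """Sketch of a content-addressed book archive (hypothetical design).

    Files are stored under the MD5 hex digest of their bytes, so identical
    uploads collapse into one entry; human-readable titles live only in a
    separate catalog consulted by a front end.
    """

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}                     # digest -> contents
        self.catalog: dict[str, set[str]] = defaultdict(set)  # digest -> titles seen

    def add(self, title: str, data: bytes) -> bool:
        """Add a file; return True only if its contents were new to the archive."""
        digest = hashlib.md5(data).hexdigest()
        is_new = digest not in self.files
        if is_new:
            self.files[digest] = data
        self.catalog[digest].add(title)  # metadata layer remembers every title
        return is_new

    def find(self, word: str) -> list[str]:
        """Front-end search: digests whose recorded titles contain `word`."""
        word = word.lower()
        return sorted(
            d for d, titles in self.catalog.items()
            if any(word in t.lower() for t in titles)
        )
```

MD5 suits this purpose because it is fast and accidental collisions are vanishingly rare. It is not, however, resistant to deliberately crafted collisions, so it deduplicates honest copies rather than authenticating them.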

::: {#attachment_2221 .wp-caption .alignnone style="width: 593px"}
[![](http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg "figure-3"){.size-full
.wp-image-2221 width="583" height="526"
sizes="(max-width: 583px) 100vw, 583px"
srcset="http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg 583w, http://computationalculture.net/wp-content/uploads/2014/10/figure-3-300x270.jpg 300w"}](http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg)

Figure 3: Aleph's anatomy
:::

Site volunteers coordinate their efforts asynchronously, by means of a
simple online forum (using *phpBB* software), open to all interested
participants. Important issues related to the governance of the
project--decisions about new hardware upgrades, software design, and
book acquisition--receive public airing. For example, at one point, the
site experienced increased traffic from *Google* searches. Some senior
members welcomed the attention, hoping to attract new volunteers. Others
worried increased visibility would bring unwanted scrutiny. To resolve
the issue, a member suggested delisting the website by altering the
robots.txt configuration file and thereby blocking *Google*
crawlers.^[35](#fn-2025-35){#fnref-2025-35}^ Consequently, the site
would become invisible to *Google*, while remaining freely accessible
via a direct link. Early conversations on *RR* reflect a consistent
concern about the archive's longevity and its vulnerability to official
sanctions. Rather than following the cyberlocker model of distribution,
the prospectors decided to release canonical versions of the library in
chunks, via *BitTorrent*--a distributed protocol for file sharing.
Another decision was made to "store" the library on open trackers (like
*The Pirate Bay*), rather than tying it to a closed, by-invitation-only
community. Although *LNU/Gigapedia* was already decentralized to an
extent, the archeology of the community discussion reveals a multitude
of conscious choices that work to further atomize *Aleph* and to
decentralize it along the axes of collection, governance, and
engineering.
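
The delisting proposal relies on the Robots Exclusion Protocol. A minimal robots.txt of the kind described might address only Google's crawler. Python's standard `urllib.robotparser` module can check how a compliant crawler would read such a file. The file contents and the `allowed` helper are assumptions for illustration; the forum's actual configuration is not documented here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind proposed on the forum: it asks
# Google's crawler to stay away from every path and says nothing to others.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Would a crawler that honors robots.txt fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

The protocol is advisory. It asks well-behaved crawlers to stay away but does nothing to prevent access by a direct link, which is precisely the property the forum member relied on. Google's own documentation also cautions that a blocked URL may still be listed in results if other pages link to it.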

By March of 2009, these efforts had resulted in approximately 79,000
volumes, or around 180 GB of data.^[36](#fn-2025-36){#fnref-2025-36}^ By December of
the same year, the moderators began talking about a terabyte, 2 TB in
2010, and around 7 TB by 2011.^[37](#fn-2025-37){#fnref-2025-37}^ By
2012, the core group of "prospectors" had grown to 1,000 registered users.
*Aleph*'s main mirror received over a million page views per month and
about 40,000 unique visits per day.^[38](#fn-2025-38){#fnref-2025-38}^
An online eBook piracy report estimates a combined total of a million
unique visitors per day for *Aleph* and its
mirrors.^[39](#fn-2025-39){#fnref-2025-39}^

As of January 2014, the *Aleph* catalog contains over a million books
(1,021,000) and over 15 million academic articles, "weighing in" at just
under 10 TB. Most remarkably, one of the world's largest digital
libraries operates on an annual budget of \$1,900
US.^[40](#fn-2025-40){#fnref-2025-40}^

**Vulnerability**

Distributed architecture gives *Aleph* significant advantages over its
federated predecessors. Were *Aleph* servers to go offline, the archive
would survive "in the cloud" of the *BitTorrent* network. Should the
forum (*Bet*) close, another online forum could easily take its place.
And were the *Aleph* library portal itself to go dark, other mirrors
would (and usually do) quickly take its place.

But the decentralized model of content distribution is not without its
challenges. To understand them, we need to review some of the
fundamentals behind the *BitTorrent* protocol. At its bare minimum (as
it was described in the original specification by Bram Cohen) the
protocol involves a "seeder," someone willing to share something in its
entirety; a "leecher," someone downloading shared data; and a torrent
"tracker" that coordinates activity between seeders and
leechers.^[41](#fn-2025-41){#fnref-2025-41}^

Imagine a music album sharing agreement between three friends, where,
initially, only one holds a copy of some album: for example, Nirvana's
*Nevermind*. Under the centralized model of file sharing, the friend
holding the album would transmit two copies, one to each friend. The
power of *BitTorrent* comes from shifting the burden of sharing from a
single seeder (friend one) to a "swarm" of leechers (friends two and
three). On this model, the first leecher joining the network (friend
two, in our case) would begin to get his data from the seeder directly,
as before. But the second leecher would receive some bits from the
seeder and some from the first leecher, in a non-linear, asynchronous
fashion. In our example, we can imagine the remaining friend getting
some songs from the first friend and some from the second. The friend
who held the album originally now transmits something less than two
full copies of the album, since the other two friends exchanged some
bits of information between themselves, lessening the load on the
original album holder.
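
A toy simulation in Python can check the arithmetic of the three-friend example. The model is a deliberate simplification, not the protocol's actual piece-selection or choking algorithm. The album is split into numbered pieces, and every incomplete leecher downloads one piece per round. A leecher takes a piece from the other friend whenever that friend has one it lacks; otherwise it asks the seeder for a piece nobody else is fetching that round. The function `simulate` and its parameters are hypothetical.

```python
def simulate(num_pieces: int, leechers: tuple[str, ...] = ("friend2", "friend3")) -> int:
    """Return how many pieces the seeder uploads until every leecher is complete.

    Toy model (an assumption): each round, every incomplete leecher fetches one
    piece, preferring another leecher as the source and falling back to the seeder.
    """
    full = set(range(num_pieces))
    have: dict[str, set[int]] = {name: set() for name in leechers}
    seeder_uploads = 0
    while any(have[n] != full for n in leechers):
        snapshot = {n: set(pieces) for n, pieces in have.items()}  # holdings at round start
        asked_of_seeder: set[int] = set()
        transfers: list[tuple[str, int, bool]] = []  # (receiver, piece, from_seeder)
        for n in leechers:
            missing = full - snapshot[n]
            if not missing:
                continue
            others = set().union(*(snapshot[m] for m in leechers if m != n))
            from_peers = others & missing
            if from_peers:
                transfers.append((n, min(from_peers), False))
                continue
            # Ask the seeder, preferring a piece no one else requested this round.
            fresh = missing - asked_of_seeder
            piece = min(fresh or missing)
            asked_of_seeder.add(piece)
            transfers.append((n, piece, True))
        for n, piece, from_seeder in transfers:
            have[n].add(piece)
            if from_seeder:
                seeder_uploads += 1
    return seeder_uploads
```

Under these assumptions, `simulate(12)` returns 12, a single copy's worth of uploads from the original holder, against 24 pieces under the centralized model. With an odd number of pieces, as in `simulate(5)`, the seeder sends 6 pieces, still fewer than the 10 that two full copies would require.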

When downloading from the *BitTorrent* network, a peer may receive some
bits from the beginning of the document, some from the middle, and some
from the end, in parts distributed among the members of the swarm. A
local application called the "client" is responsible for checking the
integrity of the pieces and for reassembling them into a coherent
whole. A torrent "tracker" coordinates the activity between peers,
keeping track of who has what where. Having received the whole document,
a leecher can, in turn, become a seeder by sharing all of his downloaded
bits with the remaining swarm (who only have partial copies). The
leecher can also take the file offline, choosing not to share at
all.^[42](#fn-2025-42){#fnref-2025-42}^
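
The client's integrity check can be sketched in Python. In the original (version 1) BitTorrent specification, the metainfo file lists a SHA-1 digest for every fixed-length piece, and a client accepts a piece only if its digest matches. The tiny four-byte piece length and the helper names `piece_hashes` and `reassemble` are assumptions chosen for readability; real torrents use pieces many kilobytes long.

```python
import hashlib

PIECE_LENGTH = 4  # bytes; an illustrative value, far smaller than in real torrents


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def reassemble(received: dict[int, bytes], hashes: list[bytes]) -> bytes:
    """Verify pieces that arrived in any order, then join them by index."""
    if set(received) != set(range(len(hashes))):
        raise ValueError("missing pieces")
    for index, piece in received.items():
        if hashlib.sha1(piece).digest() != hashes[index]:
            raise ValueError(f"piece {index} failed its integrity check")
    return b"".join(received[i] for i in range(len(hashes)))
```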

The original protocol left torrent trackers vulnerable to charges of
aiding and abetting copyright
infringement.^[43](#fn-2025-43){#fnref-2025-43}^ Early in 2008, Cohen
extended *BitTorrent* to make use of "distributed sloppy hash tables"
(DHT) for storing peer locations without resorting to a central tracker.
Under these new guidelines, each peer would maintain a small routing
table pointing to a handful of nearby peer locations. In effect, DHT
placed additional responsibility on the swarm to become a tracker of
sorts, however "sloppy" and imperfect. By November of 2009, *The Pirate
Bay* announced its transition away from tracking entirely, in favor of
DHT and the related PEX (peer exchange) and magnet link protocols,
calling the result the "world's most resilient
tracking."^[44](#fn-2025-44){#fnref-2025-44}^
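
A toy model in Python can illustrate the routing idea behind the trackerless extension. In the Mainline DHT specification, nodes and torrents share a 160-bit identifier space, and the "distance" between two identifiers is their bitwise XOR. Peer contacts for a torrent are stored on the nodes closest to its info-hash. The sketch assumes, unrealistically, that every node knows every other node; a real node consults its small routing table and queries iteratively. `ToyDHT`, `node_id`, and the replication factor `k` are hypothetical.

```python
import hashlib


def node_id(label: str) -> int:
    """Hypothetical helper: a 160-bit ID from a label (real nodes pick IDs at random)."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers under the XOR metric."""
    return a ^ b


def closest_nodes(target: int, nodes: list[int], k: int = 8) -> list[int]:
    """The k nodes whose IDs are closest to `target` by XOR distance."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


class ToyDHT:
    """Global-view toy: each announcement is stored on the k nodes nearest the info-hash."""

    def __init__(self, node_ids: list[int], k: int = 3) -> None:
        self.k = k
        # node ID -> {info-hash -> set of (ip, port) peer contacts}
        self.storage: dict[int, dict[int, set[tuple[str, int]]]] = {n: {} for n in node_ids}

    def announce(self, info_hash: int, peer: tuple[str, int]) -> None:
        for n in closest_nodes(info_hash, list(self.storage), self.k):
            self.storage[n].setdefault(info_hash, set()).add(peer)

    def get_peers(self, info_hash: int) -> set[tuple[str, int]]:
        found: set[tuple[str, int]] = set()
        for n in closest_nodes(info_hash, list(self.storage), self.k):
            found |= self.storage[n].get(info_hash, set())
        return found
```

Each announcement is replicated on several nearby nodes, so removing the closest one still leaves the peer contact reachable. This is the sense in which the swarm itself becomes a "sloppy" tracker.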

Despite these advancements, the decentralized model of file sharing
remains susceptible to several chronic ailments. The first follows from
the fact that ad-hoc distribution networks privilege popular material. A
file needs to be actively traded to ensure its availability. If nobody
is actively sharing and downloading Nirvana's *Nevermind*, the album is
in danger of fading out of the cloud. As one member wrote succinctly on
*Gimel* forums, "unpopular files are in danger of become
inaccessible."^[45](#fn-2025-45){#fnref-2025-45}^ This dynamic is less
of a concern for Hollywood blockbusters, but more so for "long tail"
specialized materials of the sort found in *Aleph*, and indeed, for
*Aleph* itself as a piece of software distributed through the network.
*Aleph* combats the problem of fading torrents by renting
"seedboxes"--servers dedicated to keeping the *Aleph* seeds containing
the archive alive, preserving the availability of the collection. The
server in production as of 2014 can serve up to 12 TB of data at speeds
of 100-800 megabits per second. Other file sharing communities address
the issue by enforcing a minimum upload-to-download ratio on members of
their network.
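
The fading-torrent problem and the seedbox remedy reduce to a simple condition: a torrent remains recoverable only while every piece is held by at least one connected peer. The Python sketch below counts copies per piece. The peer names and the `piece_availability` and `is_recoverable` helpers are hypothetical.

```python
from collections import Counter


def piece_availability(num_pieces: int, swarm: dict[str, set[int]]) -> Counter:
    """Number of connected peers holding each piece; 0 means no one can supply it."""
    counts = Counter({i: 0 for i in range(num_pieces)})
    for pieces in swarm.values():
        counts.update(p for p in pieces if 0 <= p < num_pieces)
    return counts


def is_recoverable(num_pieces: int, swarm: dict[str, set[int]]) -> bool:
    """A complete copy can be rebuilt only if every piece has at least one holder."""
    return all(count > 0 for count in piece_availability(num_pieces, swarm).values())
```

A swarm of casual readers holding scattered pieces may leave gaps. Adding a single seedbox with a complete copy raises every count to at least one.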

The lack of true anonymity is the second problem intrinsic to the
*BitTorrent* protocol. Peers sharing bits directly cannot avoid
exposing their IP addresses (unless these are masked behind virtual
private networks or Tor relays). A "Sybil" attack becomes possible when
a malicious peer shares bits in bad faith, with the intent to log IP
addresses.^[46](#fn-2025-46){#fnref-2025-46}^ Researchers exploring this
vector of attack were able to harvest more than 91,000 IP addresses in
less than 24 hours of sharing a popular television
show.^[47](#fn-2025-47){#fnref-2025-47}^ They report that more than 9%
of requests made to their servers indicated "modified clients", which
are likely also to be running experiments in the DHT. Legitimate
copyright holders and copyright "trolls" alike have used this
vulnerability to bring lawsuits against individual sharers in
court.^[48](#fn-2025-48){#fnref-2025-48}^

These two challenges are further exacerbated in the case of *Aleph*,
which uses *BitTorrent* to distribute large parts of its own
architecture. These parts are relatively large--around 40-50 GB each.
Long-term sustainability of *Aleph* as a distributed system therefore
requires a rare participant: one interested in downloading the archive
as a whole (as opposed to downloading individual books), one who owns
the hardware to store and transmit terabytes of data, and one possessing
the technical expertise to do so safely.
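
A back-of-the-envelope calculation suggests why such participants are rare. It assumes an archive of roughly 10 TB (the January 2014 figure), parts of 40-50 GB, and decimal units. It also assumes a sustained link at the 100-800 megabits per second quoted above for the seedbox, ignoring protocol overhead and peer availability. The Python helpers below are illustrative only; they give the number of parts and the time needed to mirror the whole archive.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def parts_needed(archive_bytes: int, part_bytes: int) -> int:
    """Number of torrent parts, rounding up for a final partial part."""
    return -(-archive_bytes // part_bytes)


def transfer_days(archive_bytes: int, megabits_per_second: float) -> float:
    """Days of continuous transfer at a constant rate, ignoring overhead."""
    seconds = archive_bytes * 8 / (megabits_per_second * 10**6)
    return seconds / 86_400
```

These figures come to 200-250 separate torrents and roughly 9.3 days of continuous transfer at 100 megabits per second, or about 28 hours at 800.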

**Peer preservation**

In light of the challenges and the effort involved in maintaining the
archive, one would be remiss to describe *Aleph* merely in terms of book
piracy, understood in conventional terms of financial gain, theft, or
profiteering. The day-to-day labor of the core group is much more
comprehensible as a mode of commons-based peer production, which is, in
the canonical definition, work made possible by a "networked
environment," "radically decentralized, collaborative, and
non-proprietary; based on sharing resources and outputs among widely
distributed, loosely connected individuals who cooperate with each other
without relying on either market signals or managerial
commands."^[49](#fn-2025-49){#fnref-2025-49}^ *Aleph* answers the
definition of peer production, resembling in many respects projects like
*Linux*, *Wikipedia*, and *Project Gutenberg*.

Yet, *Aleph* is also patently a library. Its work can and should be
viewed in the broader context of Enlightenment ideals: access to
literacy, universal education, and the democratization of knowledge. The
very same ideals gave birth to the public library movement as a whole at
the turn of the 20th century, in the United States, Europe, and
Russia.^[50](#fn-2025-50){#fnref-2025-50}^ Parallels between free
library movements of the early 20th and the early 21st centuries point
to a social dynamic that runs contrary to the populist spirit of
commons-based peer production projects, in a mechanism that we describe
as peer preservation. The idea encompasses conflicting drives both to
share and to hoard information.

The roots of many public libraries lie in extensive private collections.
The Bodleian Library at Oxford, for example, traces its origins back to the
collections of Thomas Cobham, Bishop of Worcester, Humphrey, Duke of
Gloucester, and to Thomas Bodley, himself an avid book collector.
Similarly, Poland's Zaluski Library, one of Europe's oldest, owes its
existence to the collecting efforts of the Zaluski brothers, both
bishops and bibliophiles.^[51](#fn-2025-51){#fnref-2025-51}^ As we
mentioned earlier, *Aleph* too began its life as an aggregator of
collections, including the personal libraries of Moshkov and Traum. When
books are scarce, private libraries are a sign of material wealth and
prestige. In the digital realm, where the cost of media acquisition is
low, collectors amass social capital. *Aleph* extends its collecting
efforts on *RR*, a much larger, moderated torrent exchange forum and
tracker. *RR* hosts a number of sub-forums dedicated to the exchange of
software, film, music, and books (where members of *Aleph* often make an
appearance). In the exchange economy of symbolic goods, top collectors
are known by their standing in the community, as measured by their
seniority, upload and download ratios, and the number of "releases." A
release is more than just a file: it must not duplicate items in the
archive and must follow strict community guidelines related to packaging,
quality, and meta-data accompanying the document. Less experienced
members of the community treat high status numbers with reverence and
respect.

According to a question and answer session with an official *RR*
representative, *RR* is not particularly friendly to new
users.^[52](#fn-2025-52){#fnref-2025-52}^ In fact, high barriers to
entry are exactly what differentiates *RR* from sites like *The Pirate
Bay* and other unmoderated, open trackers. *RR* prides itself on the
"quality of its moderation." Unlike *Pirate Bay*, *RR* sees itself as a
"media library", where content is "organized and properly shelved." To
produce an acceptable book "release" one needs to create a package of
files, including well-formatted meta-data (following strict stylistic
rules) in the header, the name of the book, an image of its cover, the
year of release, author, genre, publisher, format, language, a required
description, and screenshots of a sample page. The files must be named
according to a convention, be "of the same kind" (that is, belong to the
same collection), and be of the right size. Home-made scans are
discouraged and governed by a 1,000-word instruction manual. Scanned
books must have clear attribution to the releaser responsible for
scanning and processing.

More than that, guidelines indicate that smaller releases should be
expected to be "absorbed" into larger ones. In this way, a single novel
by Charles Dickens can and will be absorbed into his collected works,
which might further be absorbed into "Novels of 19th Century," and then
into "Foreign Fiction" (as a hypothetical, but realistic example).
According to the rules, the collection doing the absorbing must be "at
least 50% larger than the collection it is absorbing." Releases are
further governed by a subset of rules particular to the forum
subsections (e.g. journals, fiction, documentation, service manuals,
etc.).^[53](#fn-2025-53){#fnref-2025-53}^
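
The absorption rule can be stated as a simple predicate. The forum's wording does not say how size is measured, so the Python sketch assumes it is the number of items in a release. The titles and the helpers `can_absorb` and `absorb` are hypothetical. Integer arithmetic (2 × absorbing ≥ 3 × absorbed) avoids floating-point rounding at the 50% threshold.

```python
def can_absorb(absorbing_size: int, absorbed_size: int) -> bool:
    """True if the absorbing release is at least 50% larger than the absorbed one."""
    return 2 * absorbing_size >= 3 * absorbed_size


def absorb(collection: frozenset[str], release: frozenset[str]) -> frozenset[str]:
    """Merge `release` into `collection`, enforcing the 50% rule; duplicates collapse."""
    if not can_absorb(len(collection), len(release)):
        raise ValueError("absorbing collection is not at least 50% larger")
    return collection | release
```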

All this to say that although barriers to acquisition are low, the
barriers to active participation are high and continually *increase with
time*. The absorption of smaller collections by larger ones favors the
veterans. Rules and regulations grow in complexity with the maturation
of the community, further widening the rift between senior and junior
peers. We are then witnessing something like the institutionalization of
a professional "librarian" class, whose task it is to protect the
collection from the encroachment of low-quality contributors. Rather
than serving the public, a librarian's primary commitment is to the
preservation of the archive as a whole. Thus what starts as a true peer
production project, may, in the end, grow to erect solid walls to
peering. This dynamic is already embodied in the history of public
libraries, where amateur librarians of the late 19th century eventually
gave way to their modern degree-holding counterparts. The conflicting
logics of access and preservation may lead digital library
development along a similar path.

The expression of this dual push and pull dynamic in the observed
practices of peer preservation communities conforms to Derrida's insight
into the nature of the archive. Just as the walls of a library serve to
shelter the documents within, they also isolate the collection from the
public at large. Access and preservation, in that sense, subsist at
opposite and sometime mutually exclusive ends of the sharing spectrum.
And it may be that this dynamic is particular to all peer production
communities, like *Wikipedia*, which, according to recent studies, saw a
decline in new contributors due to increasingly strict rule
enforcement.^[54](#fn-2025-54){#fnref-2025-54}^ However, our results are
merely speculative at the moment. The analysis of a large dataset we
have collected as corollary to our field work online may offer further
evidence for these initial intuitions. In the meantime, it is not enough
to conclude that brick-and-mortar libraries should learn from these
emergent, distributed architectures of peer preservation. If the future
of *Aleph* is leading to increased institutionalization, the community
may soon face the fate embodied by its own procedures: the absorption of
smaller, wonderfully messy, ascending collections into larger, more
established, and more rigid social structures.

**Biographies**

Dennis Tenen teaches in the fields of new media and digital humanities
at Columbia University, Department of English and Comparative
Literature. His research often happens at the intersection of people,
texts, and technology. He is currently writing a book on minimal
computing, called *Plain Text*.

Maxwell Foxman is an adjunct professor at Marymount Manhattan College
and a PhD candidate in Communications at Columbia University, where he
studies the use and adoption of digital media into everyday life. He has
written on failed social media and on gamification in electoral
politics, newsrooms, and mobile media.

**References**

Allen, Elizabeth Akers, and James Phinney Baxter. *Dedicatory Exercises
of the Baxter Building*. Auburn, Me: Lakeside Press, 1889.

Anonymous author. "Library.nu: Modern era's 'Destruction of the Library
of Alexandria.'" *Breaking Culture*. Last edited on February 16, 2012
and archived on January 14, 2014.
[http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip](https://web.archive.org/web/20140113135846/http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip).

Benkler, Yochai. *The Wealth of Networks: How Social Production
Transforms Markets and Freedom*. New Haven: Yale University Press, 2006.

Bittorrent.org. "The BitTorrent Protocol Specification." Last modified
October 20, 2012 and archived on June 13, 2014.
[http://www.bittorrent.org/beps/bep\_0003.html](http://web.archive.org/web/20140613190300/http://www.bittorrent.org/beps/bep_0003.html).

Bodo, Balazs. "Set the Fox to Watch the Geese: Voluntary IP Regimes in
Piratical File-Sharing Communities." In *Piracy: Leakages from
Modernity*. Litwin Books, LLC, 2012.

Bowker, Geoffrey C., and Susan Leigh Star. *Sorting Things Out:
Classification and Its Consequences*. The MIT Press, 1999.

Calandrillo, Steve P. "An Economic Analysis of Property Rights in
Information: Justifications and Problems of Exclusive Rights, Incentives
to Generate Information, and the Alternative of a Government-Run Reward
System." *Fordham Intellectual Property, Media & Entertainment Law
Journal* 9 (1998): 301.

Calhoun, Craig. "Information Technology and the International Public
Sphere." In *Shaping the Network Society: the New Role of Civil Society
in Cyberspace*, edited by Douglas Schuler and Peter Day, 229--52. MIT
Press, 2004.

Castells, Manuel. "Communication, Power and Counter-Power in the Network
Society." *International Journal of Communication* 1 (2007): 238--66.

Cholez, Thibault, Isabelle Chrisment, and Olivier Festor. "Evaluation of
Sybil Attacks Protection Schemes in KAD." In *Scalability of Networks
and Services*, edited by Ramin Sadre and Aiko Pras, 70--82. Lecture
Notes in Computer Science 5637. Springer Berlin Heidelberg, 2009.

Cohen, Bram. *Incentives Build Robustness in BitTorrent*, May 22, 2003.
[http://www.bittorrent.org/bittorrentecon.pdf](http://www.bittorrent.org/bittorrentecon.pdf).

Cohen, Julie. "Creativity and Culture in Copyright Theory." *U.C. Davis
Law Review* 40 (2006): 1151.

Day, Brian R. *In Defense of Copyright: Creativity, Record Labels, and
the Future of Music*. SSRN Scholarly Paper. Rochester, NY: Social
Science Research Network, May 2010.

Derrida, Jacques. "Archive Fever: a Freudian Impression." *Diacritics*
25, no. 2 (July 1995): 9--63.

DiMaggio, Paul, Eszter Hargittai, W. Russell Neuman, and John P.
Robinson. "Social Implications of the Internet." *Annual Review of
Sociology* 27 (January 2001): 307--36.

Edwards, Paul N. "Infrastructure and Modernity: Force, Time, and Social
Organization in the History of Sociotechnical Systems." In *Modernity
and Technology*, 185--225, 2003.

---------. "Y2K: Millennial Reflections on Computers as Infrastructure."
*History and Technology* 15, no. 1-2 (1998): 7--29.

Edwards, Paul N., Geoffrey C. Bowker, Steven J. Jackson, and Robin
Williams. "Introduction: an Agenda for Infrastructure Studies." *Journal
of the Association for Information Systems* 10, no. 5 (2009): 364--74.

Ernesto. "US P2P Lawsuit Shows Signs of a 'Pirate Honeypot'."
Technology. *TorrentFreak*. Last edited in June 2011 and archived on
January 14, 2014.
[http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/](https://web.archive.org/web/20140114200326/http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/).

Gauravaram, Praveen, and Lars R. Knudsen. "Cryptographic Hash
Functions." In *Handbook of Information and Communication Security*,
edited by Peter Stavroulakis and Mark Stamp, 59--79. Springer Berlin
Heidelberg, 2010.

Greenwood, Thomas. *Public Libraries: a History of the Movement and a
Manual for the Organization and Management of Rate Supported Libraries*.
Simpkin, Marshall, Hamilton, Kent, 1890.

Halfaker, Aaron, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl.
"The Rise and Decline of an Open Collaboration System: How Wikipedia's
Reaction to Popularity Is Causing Its Decline." *American Behavioral
Scientist*, December 2012, 0002764212469365.

Harris, Michael H. *History of Libraries of the Western World*. Fourth
Edition. Lanham, Md.; London: Scarecrow Press, 1999.

Hughes, Justin. "The Philosophy of Intellectual Property." *Georgetown
Law Journal* 77 (1988): 287.
http://heinonline.org/HOL/Page?handle=hein.journals/glj77&id=309&div=&collection=journals.

Hugo, Victor. *Works of Victor Hugo*. New York: Nottingham Society,
1907.

International Publishers Association. "Publishers Strike Major Blow
against Internet Piracy." Last modified February 15, 2012.
[http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy](http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy).

Johnson, Simon for Reuters.com. "Pirate Bay Copyright Test Case Begins
in Sweden." Last edited on February 16, 2009 and archived on August 4,
2014.
[http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216](http://web.archive.org/web/20140804000829/http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216).

Karaganis, Joe, ed. *Media Piracy in Emerging Economies*. Social Science
Research Network, March 2011.
[http://piracy.americanassembly.org/the-report/](http://piracy.americanassembly.org/the-report/).

Landes, William M., and Richard A. Posner. *The Economic Structure of
Intellectual Property Law*. Harvard University Press, 2003.

Larkin, Brian. "Degraded Images, Distorted Sounds: Nigerian Video and
the Infrastructure of Piracy." *Public Culture* 16, no. 2 (2004):
289--314.

---------. "Pirate Infrastructures." In *Structures of Participation in
Digital Culture*, edited by Joe Karaganis, 74--87. New York: SSRC, 2008.

Lessig, Lawrence. *Free Culture: How Big Media Uses Technology and the
Law to Lock Down Culture and Control Creativity*. The Penguin Press,
2004.

Liang, Lawrence. "Shadow Libraries." *e-flux journal*, last edited 2012 and
archived on October 14, 2014.
http://www.e-flux.com/journal/shadow-libraries/.

Lobato, Ramon, and Leah Tang. "The Cyberlocker Gold Rush: Tracking the
Rise of File-Hosting Sites as Media Distribution Platforms."
*International Journal of Cultural Studies*, November 2013.

Losowsky, Andrew. "Book Downloading Site Targeted in Injunctions
Requested by 17 Publishers." *Huffington Post*, last edited on February
2012 and archived on October 14, 2014.
[http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction\_n\_1280383.html](http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction_n_1280383.html).

Papacharissi, Zizi. "The Virtual Sphere: The Internet as a Public
Sphere." *New Media & Society* 4, no. 1 (February 2002): 9--27.

Priest, Eric. "The Future of Music and Film Piracy in China." *Berkeley
Technology Law Journal* 21 (2006): 795.

Salmon, Ricardo, Jimmy Tran, and Abdolreza Abhari. "Simulating a File
Sharing System Based on BitTorrent." In *Proceedings of the 2008 Spring
Simulation Multiconference*, 21:1--21:5. SpringSim '08. San Diego, CA,
USA: Society for Computer Simulation International, 2008.

Shirky, Clay. *Here Comes Everybody: the Power of Organizing Without
Organizations*. New York: Penguin Press, 2008.

Star, Susan Leigh, and Geoffrey C. Bowker. "How to Infrastructure." In
*Handbook of New Media: Social Shaping and Social Consequences of ICTs*,
Updated Student Edition., 230--46. SAGE Publications Ltd, 2010.

Stuart, Mary. "Creating a National Library for the Workers' State: the
Public Library in Petrograd and the Rumiantsev Library Under Bolshevik
Rule." *The Slavonic and East European Review* 72, no. 2 (April 1994):
233--58.

---------. "'The Ennobling Illusion': the Public Library Movement in
Late Imperial Russia." *The Slavonic and East European Review* 76, no. 3
(July 1998): 401--40.

---------. "The Evolution of Librarianship in Russia: the Librarians of
the Imperial Public Library, 1808-1868." *The Library Quarterly* 64, no.
1 (January 1994): 1--29.

Timpanaro, J.P., T. Cholez, I. Chrisment, and O. Festor. "BitTorrent's
Mainline DHT Security Assessment." In *2011 4th IFIP International
Conference on New Technologies, Mobility and Security (NTMS)*, 1--5,
2011.

TPB. "Worlds most resiliant tracking." Last edited November 17, 2009 and
archived on August 4, 2014.
[thepiratebay.se/blog/175](http://web.archive.org/web/20140804015645/http://thepiratebay.se/blog/175).

Vik. "Gigapedia: The greatest, largest and the best website for
downloading eBooks." Emotionallyspeaking.com. Last edited on August 10,
2009 and archived on July 15, 2012.
[http://vikas-gupta.in/2009/08/10/gigapedia-the-greatest-largest-and-the-best-website-for-downloading-free-e-books/](http://archive.is/g205).

::: {#footnotes-2025 .footnotes}
::: {.footnotedivider}
:::

1. [Victor Hugo, *Works of Victor Hugo* (New York: Nottingham Society,
1907), 230. [[↩](#fnref-2025-1)]{.footnotereverse}]{#fn-2025-1}
2. [Lawrence Liang, "Shadow Libraries," *e-flux journal*, 2012.
[[↩](#fnref-2025-2)]{.footnotereverse}]{#fn-2025-2}
3. [Joseph McKendrick, *Libraries: At the Epicenter of the Digital
Disruption, The Library Resource Guide Benchmark Study on 2013/14
Library Spending Plans* (Unisphere Media, 2013).
[[↩](#fnref-2025-3)]{.footnotereverse}]{#fn-2025-3}
4. ["Archive Fever: a Freudian Impression," *Diacritics* 25, no. 2
(July 1995): 9--63.
[[↩](#fnref-2025-4)]{.footnotereverse}]{#fn-2025-4}
5. [Yochai Benkler, *The Wealth of Networks: How Social Production
Transforms Markets and Freedom* (New Haven: Yale University Press,
2006), 92; Paul DiMaggio et al., "Social Implications of the
Internet," *Annual Review of Sociology* 27 (January 2001): 320; Zizi
Papacharissi, "The Virtual Sphere: The Internet as a Public Sphere,"
*New Media & Society* 4.1 (2002): 9--27; Craig Calhoun, "Information
Technology and the International Public Sphere," in *Shaping the
Network Society: the New Role of Civil Society in Cyberspace*, ed.
Douglas Schuler and Peter Day (MIT Press, 2004), 229--52.
[[↩](#fnref-2025-5)]{.footnotereverse}]{#fn-2025-5}
6. [Benkler, *The Wealth of Networks*, 442; Manuel Castells,
"Communication, Power and Counter-Power in the Network Society,"
*International Journal of Communication* (2007): 251; Lawrence
Lessig, *Free Culture: How Big Media Uses Technology and the Law to
Lock Down Culture and Control Creativity* (The Penguin Press, 2004);
Clay Shirky, *Here Comes Everybody: the Power of Organizing Without
Organizations* (New York: Penguin Press, 2008), 153.
[[↩](#fnref-2025-6)]{.footnotereverse}]{#fn-2025-6}
7. [Brian R. Day, "In Defense of Copyright: Creativity, Record Labels,
and the Future of Music," *Seton Hall Journal of Sports and
Entertainment Law* 21.1 (2011); William M. Landes and Richard A.
Posner, *The Economic Structure of Intellectual Property Law*
(Harvard University Press, 2003). For further discussion see
Steve P. Calandrillo, "An Economic Analysis of Property Rights in
Information: Justifications and Problems of Exclusive Rights,
Incentives to Generate Information, and the Alternative of a
Government-Run Reward System," *Fordham Intellectual Property, Media
& Entertainment Law Journal* 9 (1998): 306; Julie Cohen, "Creativity
and Culture in Copyright Theory," *U.C. Davis Law Review* 40 (2006):
1151; Justin Hughes, "The Philosophy of Intellectual Property,"
*Georgetown Law Journal* 77 (1988): 303.
[[↩](#fnref-2025-7)]{.footnotereverse}]{#fn-2025-7}
8. [[piracylab.org](http://piracylab.org).
[[↩](#fnref-2025-8)]{.footnotereverse}]{#fn-2025-8}
9. ["Set the Fox to Watch the Geese: Voluntary IP Regimes in Piratical
File-Sharing Communities," in *Piracy: Leakages from Modernity*
(Litwin Books, LLC, 2012).
[[↩](#fnref-2025-9)]{.footnotereverse}]{#fn-2025-9}
10. ["The Future of Music and Film Piracy in China," *Berkeley
Technology Law Journal* 21 (2006): 795.
[[↩](#fnref-2025-10)]{.footnotereverse}]{#fn-2025-10}
11. ["The Cyberlocker Gold Rush: Tracking the Rise of File-Hosting Sites
as Media Distribution Platforms," *International Journal of Cultural
Studies*, (2013).
[[↩](#fnref-2025-11)]{.footnotereverse}]{#fn-2025-11}
12. [The injunctions name I\* and F\* N\* (also known as Smiley).
[[↩](#fnref-2025-12)]{.footnotereverse}]{#fn-2025-12}
13. ["Publishers Strike Major Blow against Internet Piracy," last
modified February 15, 2012 and archived on January 10, 2014,
[http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy](http://web.archive.org/web/20140110160254/http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy).
[[↩](#fnref-2025-13)]{.footnotereverse}]{#fn-2025-13}
14. [Including the German Publishers and Booksellers Association,
Cambridge University Press, Georg Thieme, Harper Collins, Hogrefe,
Macmillan Publishers Ltd., Cengage Learning, Elsevier, John Wiley &
Sons, The McGraw-Hill Companies, Pearson Education Ltd., Pearson
Education Inc., Oxford University Press, Springer, Taylor & Francis,
C.H. Beck as well as Walter De Gruyter. The legal proceedings are
also supported by the Association of American Publishers (AAP), the
Dutch Publishers Association (NUV), the Italian Publishers
Association (AIE) and the International Association of Scientific,
Technical and Medical Publishers (STM).
[[↩](#fnref-2025-14)]{.footnotereverse}]{#fn-2025-14}
15. [Andrew Losowsky, "Book Downloading Site Targeted in Injunctions
Requested by 17 Publishers," *Huffington Post*, accessed on
September 1, 2014,
[http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction\_n\_1280383.html](http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction_n_1280383.html).
[[↩](#fnref-2025-15)]{.footnotereverse}]{#fn-2025-15}
16. [International Publishers Association.
[[↩](#fnref-2025-16)]{.footnotereverse}]{#fn-2025-16}
17. [Vik, "Gigapedia: The greatest, largest and the best website for
downloading eBooks," Emotionallyspeaking.com, last edited on August
10, 2009 and archived on July 15, 2012,
[http://vikas-gupta.in/2009/08/10/gigapedia-the-greatest-largest-and-the-best-website-for-downloading-free-e-books/](http://archive.is/g205).
[[↩](#fnref-2025-17)]{.footnotereverse}]{#fn-2025-17}
18. [Anonymous author, "Library.nu: Modern era's 'Destruction of the
Library of Alexandria,'" *Breaking Culture* (on tumblr.com), last
edited on February 16, 2012 and archived on January 14, 2014,
[http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip](https://web.archive.org/web/20140113135846/http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip).
[[↩](#fnref-2025-18)]{.footnotereverse}]{#fn-2025-18}
19. [[http://torrentfreak.com/book-publishers-shut-down-library-nu-and-ifile-it-120215](https://web.archive.org/web/20140110050710/http://torrentfreak.com/book-publishers-shut-down-library-nu-and-ifile-it-120215)
archived on January 10, 2014.
[[↩](#fnref-2025-19)]{.footnotereverse}]{#fn-2025-19}
20. [[http://www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-20)]{.footnotereverse}]{#fn-2025-20}
21. [[http://www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-21)]{.footnotereverse}]{#fn-2025-21}
22. [[www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-22)]{.footnotereverse}]{#fn-2025-22}
23. [This point is made at length in the report on media piracy in
emerging economies, released by the American Assembly in 2011. See
Joe Karaganis, ed. *Media Piracy in Emerging Economies* (Social
Science Research Network, March 2011),
[http://piracy.americanassembly.org/the-report/](http://piracy.americanassembly.org/the-report/), I.
[[↩](#fnref-2025-23)]{.footnotereverse}]{#fn-2025-23}
24. [Lobato and Tang, "The Cyberlocker Gold Rush."
[[↩](#fnref-2025-24)]{.footnotereverse}]{#fn-2025-24}
25. [Lobato and Tang, "The Cyberlocker Gold Rush," 9.
[[↩](#fnref-2025-25)]{.footnotereverse}]{#fn-2025-25}
26. [Lobato and Tang, "The Cyberlocker Gold Rush," 7.
[[↩](#fnref-2025-26)]{.footnotereverse}]{#fn-2025-26}
27. [GIMEL/viewtopic.php?f=8&t=169; GIMEL/viewtopic.php?f=17&t=299.
[[↩](#fnref-2025-27)]{.footnotereverse}]{#fn-2025-27}
28. [GIMEL/viewtopic.php?f=17&t=299.
[[↩](#fnref-2025-28)]{.footnotereverse}]{#fn-2025-28}
29. [GIMEL/viewtopic.php?f=8&t=169. All quotes translated from Russian
by the authors, unless otherwise noted.
[[↩](#fnref-2025-29)]{.footnotereverse}]{#fn-2025-29}
30. [GIMEL/viewtopic.php?f=8&t=6999&p=41911.
[[↩](#fnref-2025-30)]{.footnotereverse}]{#fn-2025-30}
31. [GIMEL/viewtopic.php?f=8&t=757.
[[↩](#fnref-2025-31)]{.footnotereverse}]{#fn-2025-31}
32. [In this sense, we see our work as complementary to but not
exhausted by infrastructure studies. See Geoffrey C. Bowker and
Susan Leigh Star, *Sorting Things Out: Classification and Its
Consequences* (The MIT Press, 1999); Paul N. Edwards, "Y2K:
Millennial Reflections on Computers as Infrastructure," *History and
Technology* 15.1-2 (1998): 7--29; Paul N. Edwards, "Infrastructure
and Modernity: Force, Time, and Social Organization in the History
of Sociotechnical Systems," in *Modernity and Technology*, 2003,
185--225; Paul N. Edwards et al., "Introduction: an Agenda for
Infrastructure Studies," *Journal of the Association for Information
Systems* 10.5 (2009): 364--74; Brian Larkin, "Degraded Images,
Distorted Sounds: Nigerian Video and the Infrastructure of Piracy,"
*Public Culture* 16.2 (2004): 289--314; Brian Larkin, "Pirate
Infrastructures," in *Structures of Participation in Digital
Culture*, ed. Joe Karaganis (New York: SSRC, 2008), 74--87; Susan
Leigh Star and Geoffrey C. Bowker, "How to Infrastructure," in
*Handbook of New Media: Social Shaping and Social Consequences of
ICTs*, (SAGE Publications Ltd, 2010), 230--46.
[[↩](#fnref-2025-32)]{.footnotereverse}]{#fn-2025-32}
33. [For information on cryptographic hashing see Praveen Gauravaram and
Lars R. Knudsen, "Cryptographic Hash Functions," in *Handbook of
Information and Communication Security*, ed. Peter Stavroulakis and
Mark Stamp (Springer Berlin Heidelberg, 2010), 59--79.
[[↩](#fnref-2025-33)]{.footnotereverse}]{#fn-2025-33}
34. [See GIMEL/viewtopic.php?f=8&t=55kj and
GIMEL/viewtopic.php?f=8&t=18&sid=936.
[[↩](#fnref-2025-34)]{.footnotereverse}]{#fn-2025-34}
35. [GIMEL/viewtopic.php?f=8&t=714.
[[↩](#fnref-2025-35)]{.footnotereverse}]{#fn-2025-35}
36. [GIMEL/viewtopic.php?f=8&t=47.
[[↩](#fnref-2025-36)]{.footnotereverse}]{#fn-2025-36}
37. [GIMEL/viewtopic.php?f=17&t=175&hilit=RR&start=25.
[[↩](#fnref-2025-37)]{.footnotereverse}]{#fn-2025-37}
38. [GIMEL/viewtopic.php?f=17&t=104&start=450.
[[↩](#fnref-2025-38)]{.footnotereverse}]{#fn-2025-38}
39. [URL redacted. These numbers should be taken as a very rough
estimate because 1) we do not consider Alexa to be a reliable source
for web traffic and 2) some of the other figures cited in the report
are suspicious. For example, *Aleph* has a relatively small archive
of foreign fiction, at odds with the reported figure of 800,000
volumes. [[↩](#fnref-2025-39)]{.footnotereverse}]{#fn-2025-39}
40. [GIMEL/viewtopic.php?f=17&t=7061.
[[↩](#fnref-2025-40)]{.footnotereverse}]{#fn-2025-40}
41. ["The BitTorrent Protocol Specification," last modified October 20,
2012 and archived on June 13, 2014,
[http://www.bittorrent.org/beps/bep\_0003.html](http://web.archive.org/web/20140613190300/http://www.bittorrent.org/beps/bep_0003.html).
[[↩](#fnref-2025-41)]{.footnotereverse}]{#fn-2025-41}
42. [For more information on BitTorrent, see Bram Cohen, *Incentives
Build Robustness in BitTorrent*, last modified on May 22, 2003,
[http://www.bittorrent.org/bittorrentecon.pdf](http://www.bittorrent.org/bittorrentecon.pdf);
Ricardo Salmon, Jimmy Tran, and Abdolreza Abhari, "Simulating a File
Sharing System Based on BitTorrent," in *Proceedings of the 2008
Spring Simulation Multiconference*, SpringSim '08 (San Diego, CA,
USA: Society for Computer Simulation International, 2008), 21:1--5.
[[↩](#fnref-2025-42)]{.footnotereverse}]{#fn-2025-42}
43. [In 2008 *The Pirate Bay* co-founders Peter Sunde, Gottfrid
Svartholm Warg, Fredrik Neij, and Carl Lundström were charged
with "conspiracy to break copyright related offenses" in Sweden. See
Simon Johnson for Reuters.com, "Pirate Bay Copyright Test Case
Begins in Sweden," last edited on February 16, 2009 and archived on
August 4, 2014,
[http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216](http://web.archive.org/web/20140804000829/http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216).
[[↩](#fnref-2025-43)]{.footnotereverse}]{#fn-2025-43}
44. [TPB, "Worlds most resiliant tracking," last edited November 17,
2009 and archived on August 4, 2014,
[thepiratebay.se/blog/175](http://web.archive.org/web/20140804015645/http://thepiratebay.se/blog/175).
[[↩](#fnref-2025-44)]{.footnotereverse}]{#fn-2025-44}
45. [GIMEL/viewtopic.php?f=8&t=6999.
[[↩](#fnref-2025-45)]{.footnotereverse}]{#fn-2025-45}
46. [Thibault Cholez, Isabelle Chrisment, and Olivier Festor, "Evaluation
of Sybil Attacks Protection Schemes in KAD," in *Scalability of
Networks and Services*, ed. Ramin Sadre and Aiko Pras, Lecture Notes
in Computer Science 5637 (Springer Berlin Heidelberg, 2009), 70--82.
[[↩](#fnref-2025-46)]{.footnotereverse}]{#fn-2025-46}
47. [J.P. Timpanaro et al., "BitTorrent's Mainline DHT Security
Assessment," in *2011 4th IFIP International Conference on New
Technologies, Mobility and Security (NTMS)*, 2011, 1--5.
[[↩](#fnref-2025-47)]{.footnotereverse}]{#fn-2025-47}
48. [Ernesto, "US P2P Lawsuit Shows Signs of a 'Pirate Honeypot',"
Technology, *TorrentFreak*, last edited in June 2011 and archived on
January 14, 2014,
[http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/](https://web.archive.org/web/20140114200326/http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/).
[[↩](#fnref-2025-48)]{.footnotereverse}]{#fn-2025-48}
49. [Benkler, *The Wealth of Networks*, 60.
[[↩](#fnref-2025-49)]{.footnotereverse}]{#fn-2025-49}
50. [On the free and public library movement in England and the United
States see Thomas Greenwood, *Public Libraries: a History of the
Movement and a Manual for the Organization and Management of Rate
Supported Libraries* (Simpkin, Marshall, Hamilton, Kent, 1890);
Elizabeth Akers Allen and James Phinney Baxter, *Dedicatory
Exercises of the Baxter Building* (Auburn, Me: Lakeside Press,
1889). To read more about the history of free and public library
movements in Russia see Mary Stuart, "The Evolution of Librarianship
in Russia: the Librarians of the Imperial Public Library,
1808-1868," *The Library Quarterly* 64.1 (January 1994): 1--29; Mary
Stuart, "Creating a National Library for the Workers' State: the
Public Library in Petrograd and the Rumiantsev Library Under
Bolshevik Rule," *The Slavonic and East European Review* 72.2 (April
1994): 233--58; Mary Stuart, "The Ennobling Illusion: the Public
Library Movement in Late Imperial Russia," *The Slavonic and East
European Review* 76.3 (July 1998): 401--40.
[[↩](#fnref-2025-50)]{.footnotereverse}]{#fn-2025-50}
51. [Michael H. Harris, *History of Libraries of the Western World*
(London: Scarecrow Press, 1999), 136.
[[↩](#fnref-2025-51)]{.footnotereverse}]{#fn-2025-51}
52. [http://s\*.d\*.ru/comments/508985/.
[[↩](#fnref-2025-52)]{.footnotereverse}]{#fn-2025-52}
53. [RR/forum/viewtopic.php?t=1590026.
[[↩](#fnref-2025-53)]{.footnotereverse}]{#fn-2025-53}
54. [Aaron Halfaker et al., "The Rise and Decline of an Open Collaboration
System: How Wikipedia's Reaction to Popularity Is Causing Its
Decline," *American Behavioral Scientist*, December 2012.
[[↩](#fnref-2025-54)]{.footnotereverse}]{#fn-2025-54}
:::

:::

Article printed from Computational Culture:
**http://computationalculture.net**

URL to article:
**http://computationalculture.net/book-piracy-as-peer-preservation/**

Copyright © 2012 Computational Culture. All rights reserved.

Murtaugh
A bag but is language nothing of words
2016


## A bag but is language nothing of words

### From Mondotheque

(language is nothing but a bag of words)

[Michael Murtaugh](/wiki/index.php?title=Michael_Murtaugh "Michael Murtaugh")

In text indexing and other machine reading applications the term "bag of
words" is frequently used to underscore how processing algorithms often
represent text using a data structure (word histograms or weighted vectors)
where the original order of the words in sentence form is stripped away. While
"bag of words" might well serve as a cautionary reminder to programmers of the
essential violence perpetrated on a text and a call to critically question the
efficacy of methods based on subsequent transformations, the expression's use
seems in practice more like a badge of pride or a schoolyard taunt that would
go: Hey language: you're nothin' but a big BAG-OF-WORDS.

## Bag of words

In information retrieval and other so-called _machine-reading_ applications
(such as text indexing for web search engines) the term "bag of words" is used
to underscore how in the course of processing a text the original order of the
words in sentence form is stripped away. The resulting representation is then
a collection of each unique word used in the text, typically weighted by the
number of times the word occurs.
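To make the transformation concrete, here is a minimal Python sketch of reducing a text to a bag of words. The function name `bag_of_words` and the tokenization rule (lowercase the text, keep runs of letters and apostrophes, discard everything else) are illustrative assumptions. Real indexing systems also make decisions about stemming, stop words and punctuation that are not modelled here.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a text to a bag of words: each unique word mapped to its count.

    Tokenization is a deliberately naive assumption: the text is lowercased
    and only runs of letters (and apostrophes) are kept as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


bag = bag_of_words("The cat sat on the mat. The mat was flat.")
print(bag.most_common(2))  # [('the', 3), ('mat', 2)]
```

The output keeps only the tally. It no longer records that "the" opened both sentences, or which word followed which.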

Bags of words, also known as word histograms or weighted term vectors, are a
standard part of the data engineer's toolkit. But why such a drastic
transformation? The utility of "bag of words" is in how it makes text amenable
to code, first in that it's very straightforward to implement the translation
from a text document to a bag of words representation. More significantly,
this transformation then opens up a wide collection of tools and techniques
for further transformation and analysis purposes. For instance, a number of
libraries available in the booming field of "data sciences" work with "high
dimension" vectors; bag of words is a way to transform a written document into
a mathematical vector where each "dimension" corresponds to the (relative)
quantity of each unique word. While physically unimaginable and abstract
(imagine each of Shakespeare's works as points in a 14 million dimensional
space), from a formal mathematical perspective, it's quite a comfortable idea,
and many complementary techniques (such as principal component analysis) exist
to reduce the resulting complexity.
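The following Python sketch, under the same simplifying assumptions (hypothetical helper names, whitespace tokenization), shows how several documents become points in one shared space. Each unique word across the collection becomes a dimension, and each coordinate is that word's relative frequency within one document. Cosine similarity is included as one example of the further analysis such vectors invite; it is our choice of technique, not one named above. Dimensionality reduction such as principal component analysis is omitted for brevity.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Naive tokenization (assumption): lowercase, then split on whitespace.
    return Counter(text.lower().split())


def to_vectors(texts: list[str]) -> tuple[list[str], list[list[float]]]:
    """Map each text to a point with one dimension per unique word.

    Each coordinate is the word's relative frequency within that text.
    """
    bags = [bag_of_words(t) for t in texts]
    vocabulary = sorted(set().union(*bags))  # shared dimensions, in a fixed order
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append([bag[word] / total for word in vocabulary])
    return vocabulary, vectors


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 when either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = ["to be or not to be", "to be is to do", "the quick brown fox"]
vocabulary, vectors = to_vectors(docs)
print(len(vocabulary))  # 10 dimensions for this tiny collection
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # 0.717
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0, no shared words
```

With three short phrases the space already has ten dimensions. A full corpus produces the very large, sparse spaces described above.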

What's striking about a bag of words representation, given its centrality in so
many text retrieval applications, is its irreversibility. Faced with a bag of
words representation of a text and the task of reproducing the original, one
would need in essence the "brain" of a writer to recompose sentences,
working with the patience of a devoted cryptogram puzzler to draw from the
precise stock of available words. While "bag of words" might well serve as a
cautionary reminder to programmers of the essential violence perpetrated on a
text and a call to critically question the efficacy of methods based on
subsequent transformations, the expression's use seems in practice more like a
badge of pride or a schoolyard taunt that would go: Hey language: you're
nothing but a big BAG-OF-WORDS. Following this spirit of the term, "bag of
words" celebrates a perfunctory step of "breaking" a text into a purer form
amenable to computation, stripping language of its silly redundant
repetitions and foolishly contrived stylistic phrasings to reveal a purer
inner essence.
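The irreversibility can be demonstrated in a few lines. In this Python sketch (hypothetical helper names, whitespace tokenization assumed), two sentences with opposite meanings produce the identical bag. A multinomial count then gives the number of distinct word orders a would-be reconstructor would have to choose between.

```python
from collections import Counter
from math import factorial


def bag_of_words(text: str) -> Counter:
    # Naive tokenization (assumption): lowercase, then split on whitespace.
    return Counter(text.lower().split())


def orderings(bag: Counter) -> int:
    """Number of distinct word sequences that share exactly this bag.

    This is the multinomial coefficient n! / (c1! * c2! * ...), where n is
    the total word count and the c's are the per-word counts.
    """
    n = factorial(sum(bag.values()))
    for count in bag.values():
        n //= factorial(count)  # exact: each partial product divides n!
    return n


a = "the dog bit the man"
b = "the man bit the dog"

# Two sentences with opposite meanings collapse into the same bag.
print(bag_of_words(a) == bag_of_words(b))  # True
print(orderings(bag_of_words(a)))  # 60 candidate orderings for five words
```

Only a reader supplying grammar and sense could pick out the two meaningful sequences among the sixty.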

## Book of words

Lieber's Standard Telegraphic Code, first published in 1896 and republished in
various updated editions through the early 1900s, is one of
several competing systems of telegraph code books. The idea was for both
senders and receivers of telegraph messages to use the books to translate
their messages into a sequence of code words, which could then be sent for less
money, since telegraph messages were paid for by the word. In the front of the
book, a list of examples shows how messages such as "Have bought for your
account 400 bales of cotton, March delivery, at 8.34" can be conveyed by a
telegram with the message "Ciotola, Delaboravi". In each case the reduction in
the number of transmitted words is highlighted to underscore the efficacy of the
method. Like a dictionary or thesaurus, the book is primarily organized around
key words, such as _act_, _advice_, _affairs_, _bags_, _bail_, and
_bales_, under which exhaustive lists of useful phrases involving the
corresponding word are provided in the main pages of the volume. [1]
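The code book's principle of economy can be sketched in Python as a lookup table from phrases to single code words. The phrases and code words below are invented for illustration and are not taken from Lieber's code. As the example above shows, Lieber's code compresses the same message further, into two words. The greedy, longest-phrase-first substitution is also our simplification of what a clerk did by hand.

```python
# Hypothetical code book: these phrases and code words are invented for
# illustration and are NOT taken from Lieber's Standard Telegraphic Code.
CODE_BOOK = {
    "have bought for your account": "ALBOR",
    "march delivery": "MARZA",
    "bales of cotton": "COTIL",
}


def encode(message: str, code_book: dict[str, str]) -> str:
    """Replace every known phrase with its code word, longest phrases first.

    The message is lowercased so matching is case-insensitive; code words are
    uppercase, so they cannot be re-matched by a later (lowercase) phrase.
    """
    encoded = message.lower()
    for phrase in sorted(code_book, key=len, reverse=True):
        encoded = encoded.replace(phrase, code_book[phrase])
    return encoded


def word_count(text: str) -> int:
    # Telegraph tariffs were charged per word, so this is the cost measure.
    return len(text.split())


message = "Have bought for your account 400 bales of cotton, March delivery, at 8.34"
coded = encode(message, CODE_BOOK)
print(coded)  # ALBOR 400 COTIL, MARZA, at 8.34
print(word_count(message), "->", word_count(coded))  # 13 -> 6
```

Like the bag of words, the encoding is lossy without the book. Only a receiver holding the same table can expand the code words back into phrases.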

[![Liebers
P1016847.JPG](/wiki/images/4/41/Liebers_P1016847.JPG)](/wiki/index.php?title=File:Liebers_P1016847.JPG)

[![Liebers
P1016859.JPG](/wiki/images/3/35/Liebers_P1016859.JPG)](/wiki/index.php?title=File:Liebers_P1016859.JPG)

[![Liebers
P1016861.JPG](/wiki/images/3/34/Liebers_P1016861.JPG)](/wiki/index.php?title=File:Liebers_P1016861.JPG)

[![Liebers
P1016869.JPG](/wiki/images/f/fd/Liebers_P1016869.JPG)](/wiki/index.php?title=File:Liebers_P1016869.JPG)

> [...] my focus in this chapter is on the inscription technology that grew
parasitically alongside the monopolistic pricing strategies of telegraph
companies: telegraph code books. Constructed under the bywords “economy,”
“secrecy,” and “simplicity,” telegraph code books matched phrases and words
with code letters or numbers. The idea was to use a single code word instead
of an entire phrase, thus saving money by serving as an information
compression technology. Generally economy won out over secrecy, but in
specialized cases, secrecy was also important.[2]

In her chapter devoted to telegraph code books, Katherine Hayles observes:

> The interaction between code and language shows a steady movement away from
a human-centric view of code toward a machine-centric view, thus anticipating
the development of full-fledged machine codes with the digital computer. [3]

Aspects of this transitional moment are apparent in a notice prominently
inserted in Lieber's code book:

> After July, 1904, all combinations of letters that do not exceed ten will
pass as one cipher word, provided that it is pronounceable, or that it is
taken from the following languages: English, French, German, Dutch, Spanish,
Portuguese or Latin -- International Telegraphic Conference, July 1903 [4]

The stipulation, set out by the international conventions then regulating
telegraph communication, that code words be pronounceable or drawn from a
short list of European languages (many of Lieber's code words are indeed
arbitrary Dutch, German, and Spanish words) underscores this particular moment
of transition: reference to the human body, in the form of "pronounceable"
speech from representative languages, begins to yield to the inherent
potential for arbitrariness in digital representation.
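The 1904 rule reads almost like a validation function. The Python sketch below checks the two conditions the notice names. Since the notice does not define "pronounceable", the `looks_pronounceable` heuristic (at least one vowel, treating y as one, and no run of four or more consonants) is a hypothetical stand-in rather than the Conference's actual test, and the caller must supply the source language, which the code cannot verify.

```python
import re
from typing import Optional

# Languages listed in the 1903 notice.
ALLOWED_LANGUAGES = {"English", "French", "German", "Dutch", "Spanish",
                     "Portuguese", "Latin"}
VOWELS = set("aeiouy")


def looks_pronounceable(word: str) -> bool:
    """Hypothetical, crude stand-in for 'pronounceable': at least one vowel
    and no run of four or more consonants. Not the Conference's test."""
    w = word.lower()
    if not any(c in VOWELS for c in w):
        return False
    return re.search(r"[^aeiouy]{4,}", w) is None


def passes_as_one_word(word: str, language: Optional[str] = None) -> bool:
    """Sketch of the 1904 rule: at most ten letters, and either
    pronounceable or taken from one of the listed languages."""
    if not word.isalpha() or len(word) > 10:
        return False
    return looks_pronounceable(word) or language in ALLOWED_LANGUAGES


for candidate in ["Ciotola", "Delaboravi", "Choliambos", "Delaboravian", "xkcdqzt"]:
    print(candidate, passes_as_one_word(candidate))
```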

What telegraph code books remind us of is the relation of language in general
to economy. Whether the economy is one of memory, attention, costs paid to a
telecommunications company, or computer processing time and storage space,
encoding language or knowledge in any form of writing is a kind of shorthand
and always involves an interplay with what one expects to perform or "get out"
of the resulting encoding.

> Along with the invention of telegraphic codes comes a paradox that John
Guillory has noted: code can be used both to clarify and occlude. Among the
sedimented structures in the technological unconscious is the dream of a
universal language. Uniting the world in networks of communication that
flashed faster than ever before, telegraphy was particularly suited to the
idea that intercultural communication could become almost effortless. In this
utopian vision, the effects of continuous reciprocal causality expand to
global proportions capable of radically transforming the conditions of human
life. That these dreams were never realized seems, in retrospect, inevitable.
[5]


Far from providing a universal system for encoding messages in the English
language, Lieber's code is quite clearly designed for the particular needs and
conditions of its use. In addition to the phrases ordered by keyword, the book
includes a number of tables of terms for specialized use. One table lists a
set of words used to describe all possible combinations of numeric grades of
coffee (Choliam = 3,4, Choliambos = 3,4,5, Choliba = 4,5, etc.); another table
lists pairs of code words to express the daily rise or fall of the price of
coffee at the port of Le Havre in increments of a quarter of a franc per 50
kilos ("Chirriado = prices have advanced 1 1/4 francs"). From an
archaeological perspective, Lieber's code book reveals a cross section of the
needs and desires of early 20th-century business communication between the
United States and its trading partners.
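Such specialized tables are, in effect, small two-way lookup tables. The Python sketch below encodes only the entries quoted above; the dictionary names and helper functions are hypothetical, and treating a combination of grades as order-independent (sorting before lookup) is an assumption about how the table was meant to be read.

```python
from fractions import Fraction

# Only the entries quoted above; the book's tables are much longer ("etc.").
COFFEE_GRADES = {
    "Choliam": (3, 4),
    "Choliambos": (3, 4, 5),
    "Choliba": (4, 5),
}
# The sender's direction: a combination of grades -> code word.
GRADES_TO_CODE = {grades: code for code, grades in COFFEE_GRADES.items()}

# Price movements at Le Havre, per 50 kilos, in quarter-franc steps.
QUARTER_FRANC = Fraction(1, 4)
PRICE_MOVES = {
    "Chirriado": ("advanced", 5 * QUARTER_FRANC),  # 1 1/4 francs
}


def encode_grades(grades):
    """Look up the code word for a combination of grades (order ignored)."""
    return GRADES_TO_CODE[tuple(sorted(grades))]


def as_mixed_number(amount):
    """Render a Fraction as the book does, e.g. 5/4 -> '1 1/4'."""
    whole, rest = divmod(amount.numerator, amount.denominator)
    if rest == 0:
        return str(whole)
    if whole == 0:
        return f"{rest}/{amount.denominator}"
    return f"{whole} {rest}/{amount.denominator}"


def decode_price(code):
    direction, amount = PRICE_MOVES[code]
    return f"prices have {direction} {as_mixed_number(amount)} francs"


print(encode_grades([5, 3, 4]))   # Choliambos
print(decode_price("Chirriado"))  # prices have advanced 1 1/4 francs
```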

The advertisements lining Lieber's code book further situate its use and that
of commercial telegraphy. Among the many advertisements for banking and law
services, office equipment, and alcohol are several ads for gunpowder and
explosives, drilling equipment, and metallurgical services, all with specific
applications to mining. Building on telegraphy's formative role in
ship-to-shore and ship-to-ship communication for reasons of safety, commercial
telegraphy extended this network of communication to include the parties
coordinating the "raw materials" being mined, grown, or otherwise extracted
from overseas sources and shipped back for sale.

## "Raw data now!"

From [The Smart City - City of Knowledge](/wiki/index.php?title=The_Smart_City_-_City_of_Knowledge "The Smart City - City of Knowledge"):

As new modernist forms and use of materials propagated the abundance of
decorative elements, Otlet believed in the possibility of language as a model
of '[raw data](/wiki/index.php?title=Bag_of_words "Bag of words")', reducing
it to essential information and unambiguous facts, while removing all
inefficient assets of ambiguity or subjectivity.


> Tim Berners-Lee: [...] Make a beautiful website, but first give us the
> unadulterated data, we want the data. We want unadulterated data. OK, we have
> to ask for raw data now. And I'm going to ask you to practice that, OK? Can
> you say "raw"?
>
> Audience: Raw.
>
> Tim Berners-Lee: Can you say "data"?
>
> Audience: Data.
>
> TBL: Can you say "now"?
>
> Audience: Now!
>
> TBL: Alright, "raw data now"!
>
> [...]
>
> So, we're at the stage now where we have to do this -- the people who think
> it's a great idea. And all the people -- and I think there's a lot of people
> at TED who do things because -- even though there's not an immediate return on
> the investment because it will only really pay off when everybody else has
> done it -- they'll do it because they're the sort of person who just does
> things which would be good if everybody else did them. OK, so it's called
> linked data. I want you to make it. I want you to demand it. [6]

## Un/Structured

As graduate students at Stanford, Sergey Brin and Lawrence (Larry) Page had an
early interest in producing "structured data" from the "unstructured" web. [7]

> The World Wide Web provides a vast source of information of almost all
> types, ranging from DNA databases to resumes to lists of favorite restaurants.
> However, this information is often scattered among many web servers and hosts,
> using many different formats. If these chunks of information could be
> extracted from the World Wide Web and integrated into a structured form, they
> would form an unprecedented source of information. It would include the
> largest international directory of people, the largest and most diverse
> databases of products, the greatest bibliography of academic works, and many
> other useful resources. [...]
>
> **2.1 The Problem**
>
> Here we define our problem more formally:
>
> Let D be a large database of unstructured information such as the World
> Wide Web [...] [8]

In a paper titled _Dynamic Data Mining_, Brin and Page describe their research
as a search for _rules_ (statistical correlations) between words used in web
pages. The "baskets" they mention derive from "market basket" techniques,
developed to find correlations between the items recorded in the purchase
receipts of supermarket customers. In their case, they deal with web pages
rather than shopping baskets, and words instead of purchases. In moving to the
much larger scale of the web, they describe the usefulness of their research
in terms of its computational economy, that is, its ability to tackle the
scale of the web and still complete its task in a reasonably short amount of
time using contemporary computing power.
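The basic market-basket idea can be sketched in a few lines: each document becomes a "basket" holding its set of distinct words, pairs of words are counted across baskets, and a rule A -> B is kept when the pair's support (the share of baskets containing both) clears a threshold, with confidence the share of A-baskets that also contain B. This is a naive pair-counting sketch under those textbook definitions, not Brin and Page's sampling method; the toy documents, `min_support` value and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(documents, min_support=0.5):
    """Naive pair-based association rules over documents as word baskets.

    support(A, B)    = share of baskets containing both A and B
    confidence(A->B) = baskets with A and B / baskets with A
    """
    baskets = [set(doc.lower().split()) for doc in documents]
    n = len(baskets)
    item_count = Counter(word for basket in baskets for word in basket)
    pair_count = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields a candidate rule in each direction.
        rules.append((a, b, support, count / item_count[a]))
        rules.append((b, a, support, count / item_count[b]))
    return sorted(rules)


docs = [
    "raw data now",
    "raw data for the web",
    "linked data now",
    "bales of cotton",
]
for a, b, support, conf in association_rules(docs):
    print(f"{a} -> {b}: support={support:.2f}, confidence={conf:.2f}")
```

With these toy documents only the pairs (data, now) and (data, raw) reach the threshold; "raw -> data" gets confidence 1.00 because every basket containing "raw" also contains "data". Counting every pair in every basket is exactly what becomes expensive at web scale.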

> A traditional algorithm could not compute the large itemsets in the lifetime
of the universe. [...] Yet many data sets are difficult to mine because they
have many frequently occurring items, complex relationships between the items,
and a large number of items per basket. In this paper we experiment with word
usage in documents on the World Wide Web (see Section 4.2 for details about
this data set). This data set is fundamentally different from a supermarket
data set. Each document has roughly 150 distinct words on average, as compared
to roughly 10 items for cash register transactions. We restrict ourselves to a
subset of about 24 million documents from the web. This set of documents
contains over 14 million distinct words, with tens of thousands of them
occurring above a reasonable support threshold. Very many sets of these words
are highly correlated and occur often. [9]
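The figures in this quotation can be put into perspective with a little arithmetic on pairs alone (larger itemsets grow faster still). A basket of 10 items contains 45 possible pairs, a document of 150 distinct words 11,175, roughly 250 times as many, and a vocabulary of 14 million words allows about 9.8e13 distinct pairs. The short check below uses Python's `math.comb`; restricting the count to pairs is a simplification chosen for illustration.

```python
from math import comb

# Candidate pairs inside a single basket.
pairs_per_receipt = comb(10, 2)     # about 10 items per cash-register receipt
pairs_per_web_page = comb(150, 2)   # about 150 distinct words per document

# Candidate pairs across the whole vocabulary quoted above.
vocabulary = 14_000_000
all_possible_pairs = comb(vocabulary, 2)

print(pairs_per_receipt, pairs_per_web_page)  # 45 11175
print(f"{all_possible_pairs:.2e}")            # 9.80e+13
```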

## Un/Ordered

In programming, I've encountered a recurring "problem" that's quite
symptomatic. It goes something like this: you (the programmer) have managed to
cobble together a lovely "content management system" (either from scratch, or
using any number of helpful frameworks) where your user can enter some "items"
into a database, for instance to store bookmarks. These items are then
automatically presented as an ordered list (say on a web page). The author
responds: "It's great, except... could this bookmark come before that one?"
The problem stems from the fact that the database ordering (a core
functionality provided by any database) somehow applies a sorting logic that's
almost but not quite right. A typical example is the sorting of names, where
details (for instance, where to place a name that starts with a Norwegian "Ø")
are language-specific, and when a mixture of languages occurs, no single
ordering is necessarily "correct". The (often) exasperated programmer might
hastily add an additional database field so that each item can also have an
"order" (perhaps in the form of a date or some other kind of (alpha)numerical
"sorting" value) to be used to correctly order the resulting list. Now the
author has a means, awkward and indirect but workable, to control the order of
the presented data on the start page.

But one might well ask, why not just edit the resulting listing as a document?
Not possible! Contemporary content management systems are based on a data flow
from a "pure" source of a database, through controlling code and templates, to
produce a document as a result. The document isn't the data; it's the end
result of an irreversible process. This problem, in this and many other
variants, is widespread and reveals an essential backwardness in a particular
"computer scientist" mindset about what constitutes "data", and in particular
about its relationship to order, a mindset that turns what might be a
straightforward question of editing a document into an over-engineered
database.
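The Norwegian "Ø" case can be made concrete. The Python sketch below sorts the same five hypothetical names three ways: by raw code point (roughly what a naive default sort does), by a hand-written Norwegian alphabet in which Æ, Ø and Å follow Z, and by an English-style convention that folds Å and Ø to A and O. All three orders are defensible and all three differ. The last lines mimic the programmer's workaround, an extra hand-maintained field, here given the hypothetical name `position`. Real systems would normally rely on a locale-aware collation library rather than keys written by hand.

```python
names = ["Olsen", "Østby", "Åsheim", "Berg", "Zahl"]

# 1. Raw code-point order: Å (U+00C5) and Ø (U+00D8) land after Z.
codepoint_order = sorted(names)

# 2. Norwegian alphabet: Æ, Ø, Å follow Z, in that order.
NORWEGIAN = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"


def norwegian_key(name):
    # Characters outside the alphabet sort after it, by code point.
    return [NORWEGIAN.index(c) if c in NORWEGIAN else len(NORWEGIAN) + ord(c)
            for c in name.upper()]


norwegian_order = sorted(names, key=norwegian_key)

# 3. English-style convention: fold the letters to their Latin look-alikes.
FOLD = str.maketrans({"Å": "A", "å": "a", "Ø": "O", "ø": "o",
                      "Æ": "AE", "æ": "ae"})
folded_order = sorted(names, key=lambda n: n.translate(FOLD).lower())

# The workaround: an extra, hand-maintained ordering field per item.
bookmarks = [{"title": t, "position": p} for t, p in
             [("Zahl", 1), ("Berg", 2), ("Østby", 3), ("Olsen", 4), ("Åsheim", 5)]]
manual_order = [b["title"] for b in sorted(bookmarks, key=lambda b: b["position"])]

print(codepoint_order)  # ['Berg', 'Olsen', 'Zahl', 'Åsheim', 'Østby']
print(norwegian_order)  # ['Berg', 'Olsen', 'Zahl', 'Østby', 'Åsheim']
print(folded_order)     # ['Åsheim', 'Berg', 'Olsen', 'Østby', 'Zahl']
print(manual_order)     # ['Zahl', 'Berg', 'Østby', 'Olsen', 'Åsheim']
```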

Nikolaos Vogiatzis, with whom I have recently been working and whose research
explores playful and radically subjective alternatives to the list, was struck
by how the earliest specifications of HTML already defined (as they still do
today) separate elements, OL and UL, for "ordered" and "unordered" lists.

> The representation of the list is not defined here, but a bulleted list for
unordered lists, and a sequence of numbered paragraphs for an ordered list
would be quite appropriate. Other possibilities for interactive display
include embedded scrollable browse panels. [10]

Vogiatzis' surprise lay in the idea of a list ever being considered
"unordered" (or, to invert the language of the specification, of order ever
being considered "insignificant"). Indeed, in the suggested representation,
still followed by modern web browsers, the only visual difference between the
two is that UL items are preceded by a bullet symbol, while OL items are
numbered.

The idea of ordering runs deep in programming practice, where fundamentally
different data structures are employed depending on whether order is to be
maintained. The keys of a "hash" table (also known as an associative array),
for instance, are ordered in an unpredictable way governed by the particular
implementation. This data structure, extremely prevalent in contemporary
programming practice, sacrifices order to offer other kinds of efficiency
(fast retrieval by a text key, for instance).
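The loss of order in a hash table can be seen in a deliberately tiny implementation. The Python sketch below is a hypothetical `ToyHashTable` using separate chaining and the 32-bit FNV-1a hash, chosen only because it is short and deterministic. Each key is stored in a bucket chosen by its hash value, so iterating over the buckets yields keys in an order fixed by the hash function and table size, not by insertion. Production hash tables differ in many details, and some, such as Python's built-in `dict` since version 3.7, spend extra bookkeeping to remember insertion order.

```python
class ToyHashTable:
    """A deliberately minimal hash table using separate chaining."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    @staticmethod
    def _hash(key):
        # 32-bit FNV-1a over the UTF-8 bytes: deterministic across runs,
        # unlike Python's built-in str hash, which is randomized per process.
        h = 0x811C9DC5
        for byte in key.encode("utf-8"):
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF
        return h

    def _bucket(self, key):
        # The hash value alone decides where a key lives.
        return self.buckets[self._hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)

    def keys(self):
        # Walk the buckets in index order: the result depends on the hash
        # function and table size, not on the order of insertion.
        return [key for bucket in self.buckets for key, _ in bucket]


inserted = ["act", "advice", "affairs", "bags", "bail", "bales"]
table = ToyHashTable()
for position, word in enumerate(inserted):
    table.put(word, position)

print(table.keys())        # the same six words, in hash-determined order
print(table.get("bales"))  # 5
```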

## Data mining

In announcing Google's impending data center in Mons, Belgian prime minister
Di Rupo invoked the link between the history of the mining industry in the
region and the present and future interest in "data mining" as practiced by IT
companies such as Google.

Whether speaking of bales of cotton, barrels of oil, or bags of words, what
links these subjects is the way in which the notion of "raw material" obscures
the labor and power structures employed to secure them. "Raw" is always
relative: "purity" depends on processes of "refinement" that typically carry
social/ecological impact.

Stripping language of order is an act of "disembodiment", detaching it from
the acts of writing and reading. The shift from (human) reading to machine
reading involves a shift of responsibility from the individual human body to
the obscured responsibilities and seemingly inevitable forces of the
"machine", be it the machine of a market or the machine of an algorithm.
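What stripping language of order means in computational terms can be shown with a "bag of words", a representation that keeps only how often each word occurs. In the minimal Python sketch below (the `bag_of_words` helper is hypothetical, and splitting on whitespace is a simplification of real tokenization), the sentence "language is nothing but a bag of words" and its reshuffling "a bag but is language nothing of words", the title under which this text appears on the Mondotheque wiki, become indistinguishable once reduced to bags, even though as sequences they differ.

```python
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order."""
    return Counter(text.lower().split())


sentence = "Language is nothing but a bag of words"
shuffled = "A bag but is language nothing of words"

print(bag_of_words(sentence) == bag_of_words(shuffled))       # True
print(sentence.lower().split() == shuffled.lower().split())   # False
```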

From [X = Y](/wiki/index.php?title=X_%3D_Y "X = Y"):

Still, it is reassuring to know that the products hold traces of the work,
that even with the progressive removal of human signs in automated processes,
the workers' presence never disappears completely. This presence is proof of
the materiality of information production, and becomes a sign of the economies
and paradigms of efficiency and profitability that are involved.


The computer scientist's view of textual content as "unstructured", be it in a
web page or the OCR-scanned pages of a book, reflects a neglect of the
processes and labor of writing, editing, design, layout, typesetting, and
eventually publishing, collecting, and cataloging [11].

To the computer scientist, "unstructured" means non-conformant to particular
forms of machine reading. "Structuring", then, is a social process by which
particular (additional) conventions are agreed upon and employed. Computer
scientists often view text through the eyes of their particular reading
algorithm, and in the process (voluntarily) blind themselves to the work
practices which have produced and maintain these "resources".
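The point that "structure" is an agreed convention rather than a property of the text can be illustrated with the cotton message from Lieber's code book. The Python sketch below assumes a hypothetical convention for one kind of purchase message, expressed as a regular expression with named fields (`quantity`, `commodity`, `month`, `price`, all illustrative names). A message that follows the convention becomes a structured record; the same information phrased differently is, to this particular reader, "unstructured".

```python
import re

# A hypothetical convention, agreed on in advance, for one kind of message.
PURCHASE = re.compile(
    r"Have bought for your account (?P<quantity>\d+) bales of (?P<commodity>\w+), "
    r"(?P<month>\w+) delivery, at (?P<price>\d+(?:\.\d+)?)"
)


def read_purchase(message):
    """Return a record if the message follows the convention, else None."""
    match = PURCHASE.fullmatch(message)
    if match is None:
        return None  # "unstructured": not readable by this particular reader
    return {
        "quantity": int(match["quantity"]),
        "commodity": match["commodity"],
        "month": match["month"],
        "price": float(match["price"]),
    }


print(read_purchase(
    "Have bought for your account 400 bales of cotton, March delivery, at 8.34"))
print(read_purchase(
    "Bought 400 bales of cotton for you, March delivery, at 8.34"))  # None
```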

In urging his audience of web publishers not only to publish online but to
release "unadulterated" data, Berners-Lee reveals a lack of imagination in
considering how language is itself structured, and a blindness to the need for
more than additional technical standards to connect to existing publishing
practices.

Last Revision: 2/08/2016

1. Benjamin Franklin Lieber, _Lieber's Standard Telegraphic Code_, New York, 1896.
2. Katherine Hayles, "Technogenesis in Action: Telegraph Code Books and the Place of the Human", _How We Think: Digital Media and Contemporary Technogenesis_, 2012.
3. Hayles.
4. Lieber.
5. Hayles.
6. Tim Berners-Lee, "The next web", TED Talk, February 2009.
7. "Research on the Web seems to be fashionable these days and I guess I'm no exception." From Brin's [Stanford webpage](http://infolab.stanford.edu/~sergey/).
8. Sergey Brin, "Extracting Patterns and Relations from the World Wide Web", Proceedings of the WebDB Workshop at EDBT 1998.
9. Sergey Brin and Lawrence Page, "Dynamic Data Mining: Exploring Large Rule Spaces by Sampling", 1998, p. 2.
10. Tim Berners-Lee and Daniel Connolly, "Hypertext Markup Language (HTML)", Internet Draft, June 1993.
11.

Retrieved from
[https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480](https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480)

 
