
Barok
Poetics of Research
2014

_An unedited version of a talk given at the conference [Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._

_Bracketed sequences are to be reformulated._


In this talk I'm going to attempt to identify [particular] cultural
algorithms, i.e. processes in which cultural practices and software meet. They
imply a sphere in which algorithms gather to form bodies of practices and in
which cultures gather around algorithms. I'm going to approach them from the
perspective of my practice as a cultural worker, editor and artist, considering
practice as equal in rank to theory and poetics, and holding that the
theorization of practice can also lead to the identification of poetical
devices.

The primary motivation for this talk is an attempt to figure out where we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things), and what sort of things
might be considered to make a difference [or to keep making a difference].

The talk mainly [considers] the role of text and the word in research, by way
of several figures.

A

A reference, list, scheme, table, index: those things that intervene in the
flow of narrative, illustrating the point, perhaps more economically than the
linear text would. Yet they don't function as pictures; they are primarily
texts, arranged in figures. Their forms have been standardised[normalised] over
centuries and have withstood the transition to the digital without any
significant change, remaining completely intuitive to the modern reader.
Compared to the body of text they are secondary, running parallel to it. Their
function, however, differs from that of punctuation. They are there neither to
shape the narrative nor to aid in structuring the argument into logical blocks.
Nor is their function spatial, as in visual poems. Their positions within a
document are determined by the sequential order of the text, [standing as
attachments], and they are there to clarify the nature of relations among
elements of the subject-matter, or to establish relations with other
documents. The [premise] of my talk is that these _textual figures_ also came
to serve as the abstract[relational] models determining possible relations
among documents as such, and in consequence [to structure conditions [of
research]].

B

It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be answered by consulting
existing documents, or it can generate new documents based on data collected
[in] the field and through experiment, before proceeding to reasoning
[arguments and deductions]. The formulation of a query is determined by the
protocols providing access to documents, which means that there is a
difference between collecting data outside the archive (the undocumented, i.e.
in the field and through experiment), consulting a person such as an archivist
(expert, librarian, documentalist), and consulting a database storing
documents. Phenomena such as the [deepening] of specialization and thorough
digitization [have given] privilege to the database as [a|the] [fundamental]
means for research. Obviously, this is a very recent [phenomenon]. Queries were
once formulated in natural language; now, given that databases are queried
[using] the SQL language, their interfaces are mere extensions of it, and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen run by software that in turn
translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
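To make this concrete, here is a minimal, hypothetical sketch of how the fields of a search form might be translated into such a query. The table and column names (`documents`, `authors`, `subjects`, `author_id` and so on) are invented for illustration and do not describe any particular catalogue. Each filled-in field adds a condition, and a subject filter also adds a _JOIN_; the values are kept as `?` placeholders, as in a parameterized query, rather than pasted into the SQL text.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn search-form fields into a parameterized SQL query.
# Schema names (documents, authors, subjects, ...) are invented for illustration.
build_query() {  # build_query AUTHOR SUBJECT YEAR_FROM (empty string = field left blank)
  local author=$1 subject=$2 year_from=$3
  local sql='SELECT d.title, a.name FROM documents d JOIN authors a ON a.id = d.author_id'
  local -a conds=() params=()
  if [[ -n $subject ]]; then            # a subject dropdown adds a JOIN and a condition
    sql+=' JOIN subjects s ON s.document_id = d.id'
    conds+=('s.term = ?'); params+=("$subject")
  fi
  if [[ -n $author ]]; then             # an author input box adds a condition
    conds+=('a.name = ?'); params+=("$author")
  fi
  if [[ -n $year_from ]]; then          # a "from year" field adds a condition
    conds+=('d.year >= ?'); params+=("$year_from")
  fi
  local i
  for i in "${!conds[@]}"; do           # chain the conditions as WHERE ... AND ...
    if (( i == 0 )); then sql+=" WHERE ${conds[i]}"; else sql+=" AND ${conds[i]}"; fi
  done
  printf '%s\n' "$sql"
  printf 'params:'; printf ' %s' "${params[@]}"; printf '\n'
}

build_query "Foucault" "archaeology" ""
```

Filling in two fields yields one extra _JOIN_ and two chained conditions; every further checkbox or input box would lengthen the line in the same way.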

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to the flesh and paper, has been
[entrusted] to the digital and networked. Researchers are querying the black
box.

C

Searching in a collection of [amassed/assembled] [tangible] documents (i.e. a
bookshelf) is different from searching in a systematically structured
repository (a library) and even more so from searching in a digital repository
(a digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents (and we will return to this issue), are they persistent
within the inquiry as such? [wait]

D

This question itself is a rupture in the sequence. It demands a departure from
one narrative [a continuous flow of words] to another, to figure things out
while remaining bound to it [it would be even more so as a so-called
rhetorical question]. So there has been one sequence, or line, of the inquiry:
about the kinds of query and their properties. That sequence is itself a
digression from within the sequence about what research is and what its parts
(queries) are. We thus return to it and continue with the question whether the
properties of the inquiry are the same as the properties of the query.

E

But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_, giving a hard time to his English
translators, who wondered whether _utterance_ or _statement_ is more
appropriate, or whether they are interchangeable, and what impact each choice
would have on his reception in the Anglophone world. Limiting ourselves to
textual forms for now (and not translating his work but pursuing a different
inquiry), let us say the utterance is a word [or a phrase or an idiom] in a
sequence such as a sentence, a paragraph, or a document.

## (F) The structure

This distinction is as old as recorded Western thought, since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the
subject-matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns:
not as several words denoting a similar thing but rather as one word denoting
various things. Categories were outlined as a device to differentiate among
words according to the kinds of these things. Every word as such belonged to
no less and no more than one of ten categories.

So it happens to the word _utterance_, as to any other word uttered in a
sequence, that it poses a question, a query about which share of the spectrum
of possibly denoted things is the most appropriate in a given context. The
more context, the more precise the share that comes to the fore. When a word is
taken out of context, ambiguity prevails as the spectrum unfolds in its variety.

Thus single words [like any other utterances] are themselves questions,
queries, and as they occur in statements, in context, their [means] are
singled out.

This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.

### (G) The structure: words as words


P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE.
¹(http://163.1.169.40/cgi-bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---20-about-2260--00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
²(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-highlight.jpeg)


Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, 1728, p. 210.
³(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)


Detail from the Liddell-Scott Greek-English Lexicon, c. 1843.

Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos, who lived in the 4th c. BCE, wrote a vocabulary explaining the meanings
of rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, except for a few fragments quoted by later authors. One example
is that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia, in contrast to the same word meaning "milk pail" in Homer's _Iliad_.

Not much has changed in the way dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filter and sort them, and in
turn the spectrum of [denoted] things allocated in this way is structured into
groups and subgroups, which are then given, according to another set of rules,
shorter or longer names. These constitute facets of the [potential] meanings
of a word.

So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few, and these criteria are rarely
disclosed, despite their impact on research and, more generally, their
influence as conditions for the production[making] of so-called _common sense_.

It doesn't take that much to reimagine what a dictionary is and what it could
be, especially with large specialized corpora of texts at hand. These can also
serve as aids in the production of new words and new meanings.

### (H) The structure: words as knowledge and the world


Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3rd c.), [6th c.] 10th c.
⁴(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)


Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, London, 1728, p. II.
⁵(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)


Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_, 1751.
⁶(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-connaissances-humaines)


Haeckel - Darwin's tree.

Another _formalized_ and [internalized] process at play when figuring out a
word is its [containment]. A word is structured not only by the things it
potentially denotes but also by the words it is potentially part of and those
it contains.

The fuzz around the categorization of knowledge _and_ the world in Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd-century AD Neoplatonist began expanding the
notions of genus and species into their hypothetical consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), relation (πρός τι), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In a
different work, _Topics_, Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (ὅρος), a genus (γένος), a property (ἴδιον), and an
accident (συμβεβηκός). Porphyry does not explicitly refer to _Topics_, and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
⁸(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids stating whether he is talking about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:

> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
⁹(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))

Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Whereas Aristotle
employed categories, genera and species as means for logical operations, for
dialectic, Porphyry's interpretation made them resemble the perceived
_structures_ of the world more closely. So they began to bloom.

There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.
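The single-rooted tree described in the quoted passage can be sketched as a simple data structure: each term records exactly one genus above it, so following the links upward from any term ends at the same root, _substance_. The sketch encodes only the chain given in Owen's translation quoted above, and it assumes Bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# The Porphyrian chain from the quoted passage as a parent map:
# each term points to the single genus directly above it.
declare -A genus=(
  [body]=substance
  ["animated body"]=body
  [animal]="animated body"
  ["rational animal"]=animal
  [man]="rational animal"
  [Socrates]=man
  [Plato]=man
)

# ancestors TERM: walk upward until a term has no genus (the root).
ancestors() {
  local term=$1 chain=$1
  while [[ -n ${genus[$term]:-} ]]; do
    term=${genus[$term]}
    chain+=" < $term"
  done
  printf '%s\n' "$chain"
}

ancestors Socrates
# Socrates < man < rational animal < animal < animated body < body < substance
```

Because no term has more than one genus, there is exactly one path from any individual to the top, which is what makes the figure a tree rather than a web.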

Classification schemes [growing from one point] played a major role in
untangling the format of the modern encyclopedia from that of the dictionary
governed by the alphabet. Two of the most influential encyclopedias of the 18th
century are cases in point. Although still keeping 'dictionary' in their
titles, they were conceived to represent not words but knowledge. The
[uppermost] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches, "natural and scientifical" and "artificial and
technical"; these further split into 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as a
table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from understanding
(_entendement_) and branches into memory as history, reason as philosophy, and
imagination as poetry. The logic of containers was employed as an aid not only
in the enormous task of naming everything that is known without omitting
anything, but also in managing the labour of hundreds of writers and
researchers, creating a mechanism for delegating work and distributing
responsibilities. Flesh was also more present, in field research, with
researchers attending workshops and sites of everyday life to annotate it.

The world came forward to outshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems, such as
Charles A. Cutter's _Expansive Classification_ (1882), set out to classify the
world itself and laid the ground for what has come to be known as the authority
lists structuring metadata in today's computing.

### The structure (summary)

Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.

While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.

One seeks to describe the word as a faceted list of small worlds, the other to
describe the world as a structured list of words. One prevails in the domain of
epistemology, in what is known, controlling the vocabulary; the other in the
domain of ontology, in what is, controlling reality.

Every [word] has its given things, every thing has its place, closer to or
further from a single word.

The schism between classifying words and classifying the world implies that it
is not possible to construct a universal classification scheme[system]. On top
of that, any classification system of words is bound to the corpus of texts it
operates upon, and any classification system of the world again operates with
words, which are bound to a vocabulary[lexicon], which is in turn bound to a
corpus [of texts]. That doesn't prevent people from trying. Classifications
function as descriptors of and 'inscriptors' upon the world, imprinting their
authority. They operate from [a locus of] their corpus[context]-specificity.
The larger the corpus, the more power it has in shaping the world, as far as
the word shapes it (yes, I do imply Google here, for which it is a domain to be
potentially exploited).

## (J) The sequence

The structure-yielding query [of] the single word [shrinks][narrows, becomes
more precise] with preceding and following words. Inquiry proceeds in a flow
that establishes another kind[mode] of relationality, chaining words into a
sequence. While the structuring property of the query sets words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.

This is what is responsible for attaching the textual figures mentioned
earlier (lists, schemes, tables) to the body of the text. Associations can
also be stated explicitly, by indexing tables and then referring to them from
a particular point in the text. The same goes for explicit associations made
between blocks of text by means of indexed paragraphs, chapters or pages.

It follows that all utterances point to the following utterance by the nature
of sequential order, and indexing provides the means for pointing elsewhere in
the document as well.

A lot can be said about references to other texts. Here, to save time, I
would refer you to a talk I gave a few months ago, which is online
¹⁰(http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units, each of which is assigned a
numbered position in the sequence of the document. From this perspective
digitization can be viewed as a total indexation of the document. It is
converted into units rendered for machine operations. This sequentiality is
made explicit by means of an underlying index.

Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
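A minimal sketch of this idea, taking words as the units (an assumption made for brevity; a digitized document may equally be segmented into characters, lines or pages): once the text is split into an indexed array, any element can be reached directly by its address, and a series of jumps is still carried out one element at a time.

```bash
#!/usr/bin/env bash
# Digitization as total indexation: split a text into units (here: words),
# each of which receives a numbered position (0, 1, 2, ...).
text="Sequences and chains are orders of one dimension"
read -r -a units <<< "$text"

echo "${#units[@]} units"              # 8 units
echo "unit 4: ${units[4]}"             # random access by address: "orders"

# Jumping between addresses still processes one element at a time.
for addr in 7 0 3; do
  printf '%d: %s\n' "$addr" "${units[addr]}"
done
```

The loop prints `dimension`, `Sequences` and `are` in turn: the addresses can be visited in any order, but only one after another.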

## (K) The index


Summa confessorum [1297-98], 1310.
⁷(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)

[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now, when I say the word _utterance_, each time there surface contexts in
which I have used it earlier.

A long quote from Frederick G. Kilgour, _The Evolution of the Book_, 1998, pp.
76-77:

> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinning?')

>

> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.

>

> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"

In one sense neither the subject-index nor the concordance is an index; they
are words or groups of words selected according to given criteria from the
body of the text, each accompanied by a list of identifiers. These identifiers
are elements of an index, whether they represent a page, chapter, column, or
other [kind of] block of text. Every identifier is a unique _address_.
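The structure just described, words taken from the text and each accompanied by a list of addresses, can be sketched in a few lines of awk. The block addresses (`1.1`, `1.2`, `2.1`, loosely in the manner of chapter and verse) and the sample sentences are made up for illustration. The second function prints each occurrence of a word with a few words of context on either side, the form that Kilgour describes as the model for later biblical concordances.

```bash
#!/usr/bin/env bash
# Toy corpus: one block per line, "address<TAB>text". Addresses are invented.
corpus=$'1.1\tthe word is a query\n1.2\ta query yields words\n2.1\tthe index is a table of addresses'

# concordance: list every word with the addresses of the blocks it occurs in.
concordance() {
  awk -F'\t' '
    {
      n = split(tolower($2), w, /[^a-z]+/)
      for (i = 1; i <= n; i++)
        if (w[i] != "" && index(" " idx[w[i]] " ", " " $1 " ") == 0)
          idx[w[i]] = idx[w[i]] (idx[w[i]] == "" ? "" : " ") $1
    }
    END { for (k in idx) print k ": " idx[k] }
  ' | LC_ALL=C sort
}

# kwic WORD N: each occurrence of WORD with up to N words of context per side.
kwic() {
  awk -F'\t' -v key="$1" -v n="$2" '
    {
      m = split($2, w, " ")
      for (i = 1; i <= m; i++) if (w[i] == key) {
        line = $1 ":"
        for (j = i - n; j <= i + n; j++)
          if (j >= 1 && j <= m) line = line " " (j == i ? "[" w[j] "]" : w[j])
        print line
      }
    }'
}

printf '%s\n' "$corpus" | concordance
printf '%s\n' "$corpus" | kwic query 2
```

In the output, `a: 1.1 1.2 2.1` is exactly such an entry: the word itself carries no content here, only the list of addresses at which it can be found.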

The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, where each element is given a unique
combination of symbols. Sets of different sizes yield different numbers of
variations. Symbol sets such as an alphabet, Arabic numerals, Roman numerals,
and binary digits have different proportions between the length of a string
of symbols and the number of possible variations it can contain. Thus two
symbols of the English alphabet can store 26^2 (676) distinct values, two
Arabic numerals 10^2 (100), two Roman numerals (I, V, X, L, C, D, M) at most
7^2 (49), and two binary digits 2^2 (4).
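The arithmetic can be checked with a short sketch. `combinations` computes the number of distinct strings of a given length over a set of symbols; `min_length` goes the other way and finds the shortest string length that provides enough addresses for a given number of units, here 11 million, the size of the Aquinas corpus mentioned below. Both treat the symbols positionally, which is an assumption: Roman numerals are not positional, so they are left out.

```bash
#!/usr/bin/env bash
# combinations BASE LENGTH: distinct strings of LENGTH symbols drawn from BASE symbols.
combinations() {
  echo $(( $1 ** $2 ))
}

# min_length BASE COUNT: shortest string length offering at least COUNT distinct addresses.
min_length() {
  local base=$1 count=$2 len=1 cap=$1
  while (( cap < count )); do
    cap=$(( cap * base ))
    len=$(( len + 1 ))
  done
  echo "$len"
}

for size in 26 10 2; do   # Latin alphabet, Arabic numerals, binary digits
  printf '%2d symbols: %3d strings of length 2; length %2d needed for 11,000,000 units\n' \
    "$size" "$(combinations "$size" 2)" "$(min_length "$size" 11000000)"
done
```

For 26, 10 and 2 symbols this gives 676, 100 and 4 two-symbol strings, and address lengths of 5, 8 and 24 symbols for 11 million units: the smaller the symbol set, the longer the address.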

Indexation is segmentation, a breaking into segments. From as early as the
13th century, indexes such as those of sections have served as enablers of
search. The more [detailed] the indexation, the more precise the search
results it enables.

The subject-index and the concordance are tables of search results. There is
a direct lineage from the 13th-century biblical concordances to the birth of
computational linguistic analysis: both were initiated and realised by
priests.

During World War II, the Jesuit Father Roberto Busa began to look for
machines to automate the linguistic analysis of the 11-million-word Latin
corpus of Thomas Aquinas and related authors.

Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas, he
realised two things:

> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has to precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach to the words he reads the significance they have in his mind, but
should try to find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is this
basic logic that allows the transfer from what the words mean today to what
they meant to the writer.

>

> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
¹¹(http://www.alice.id.tue.nl/references/busa-1980.pdf)

Busa collaborated with IBM in New York from 1949, and the work, a concordance
of all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version has been online since 2005
¹²(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words was created by a
team of ten priests over two years (in two phases: grouping all the forms of
an inflected word under their lemma, and coding the morphological categories
of each form and lemma), containing 150,000 forms
¹³(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father Busa
has been dubbed the father of humanities computing and, more recently, of
digital humanities.
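The two phases can be illustrated with a toy lexicon. Each inflected form points to its lemma and carries a morphological code; looking up the forms of a text then lets occurrences be grouped under their lemma. The handful of Latin entries and the coding scheme are made up for illustration and are not taken from Busa's lexicon.

```bash
#!/usr/bin/env bash
# Toy lexicon: "form<TAB>lemma<TAB>morphological code". Codes are illustrative only.
lexicon=$'est\tsum\tverb-3sg-present\nsunt\tsum\tverb-3pl-present\nesse\tsum\tverb-infinitive\nfuit\tsum\tverb-3sg-perfect\nverbum\tverbum\tnoun-sg\nverba\tverbum\tnoun-pl'

# lemmatize: read one word per line on stdin, print "form lemma code"
# ("?" marks a form missing from the lexicon).
lemmatize() {
  awk -F'\t' '
    NR == FNR { lemma[$1] = $2; code[$1] = $3; next }   # first input: the lexicon
    { f = tolower($0)
      if (f in lemma) print f, lemma[f], code[f]; else print f, "?", "?" }
  ' <(printf '%s\n' "$lexicon") -
}

# Look up each form, then group the occurrences under their lemmas.
printf '%s\n' Verba sunt esse est | lemmatize
printf '%s\n' Verba sunt esse est | lemmatize |
  awk '{ n[$2]++ } END { for (l in n) print l, n[l] }' | LC_ALL=C sort
```

The grouped output counts three occurrences of _sum_ and one of _verbum_, although neither lemma appears as such in the input; this is what makes a lemmatized concordance list all forms of a word in one place.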

The subject-index has a crucial role in the printed book. It is the only means
of search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).

Its role seemingly vanishes in the digital text. But it can easily be
transformed. Besides serving as a table of pre-searched results, the
subject-index also gives a distinct idea of the content of the book. Two
patterns give us a clue: the numbers of occurrences of selected words give
subjects weights, while words that seem specific to the book outweigh others
even if they don't occur very often. A selection of these words then serves as
a descriptor of the whole text, and can be thought of as a specific kind of
'tags'.

This process was formalized as a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones (1972) known as 'inverse document frequency'
(IDF), or in other words, "term specificity". It is derived from the
proportion of texts in the corpus in which the word appears at least once to
the total number of texts: the rarer the word across the corpus, the higher
its weight, commonly computed as the logarithm of the total number of texts
divided by the number of texts containing the word. When multiplied by the
frequency of the word _in_ the text (divided by the maximum frequency of any
word in the text), we get _term frequency-inverse document frequency_
(tf-idf). In this way we can get an automated list of subjects that are
particular to the text when compared to a group of texts.
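A minimal sketch of the weighting as described: tf is a word's frequency divided by the highest word frequency in the same text, and idf is taken here as the natural logarithm of the number of texts divided by the number of texts containing the word. This is one common form; the command-line example at the end of the talk uses a smoothed variant. The three one-line "texts" are made up for illustration.

```bash
#!/usr/bin/env bash
# Toy corpus: one text per line, "id<TAB>text".
corpus=$'1\tthe index of the book\n2\tthe concordance of the bible\n3\tthe book of words'

# tfidf: print "id word weight", heaviest words first within each text.
tfidf() {
  awk -F'\t' '
    {
      if (!($1 in seen)) { seen[$1] = 1; N++ }                # N = number of texts
      n = split(tolower($2), w, /[^a-z]+/)
      for (i = 1; i <= n; i++) if (w[i] != "") {
        if (++f[$1, w[i]] == 1) df[w[i]]++                     # texts containing the word
        if (f[$1, w[i]] > max[$1] + 0) max[$1] = f[$1, w[i]]   # highest frequency in the text
      }
    }
    END {
      for (key in f) {
        split(key, p, SUBSEP)
        printf "%s %s %.4f\n", p[1], p[2], (f[key] / max[p[1]]) * log(N / df[p[2]])
      }
    }' | LC_ALL=C sort -k1,1 -k3,3nr -k2,2
}

printf '%s\n' "$corpus" | tfidf
```

In the first text, _index_, which occurs in no other text, comes out on top (0.5493), _book_, shared with one other text, follows, while _the_ and _of_, present in every text, get a weight of zero however often they occur.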

We have come to learn it through the practice of searching the web. It is a
mechanism not dissimilar to the thought process involved in retrieving
particular information online. And search engines have it built into their
indexing algorithms as well.

There is a paper proposing that words generated by tf-idf be attached to
hyperlinks when referring to websites
¹⁴(http://bscit.berkeley.edu/cgi-bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in the paper's references use this feature, and it can easily be
tested:
¹⁵(http://www.cs.berkeley.edu/~phelps/papers/dissertation-abstract.html?lexical-signature=notemarks+multivalent+semantically+franca+stylized).
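A sketch of how such a link might be assembled from a list of weighted words: the highest-weighted terms are appended to the URL as a `lexical-signature` parameter, following the form of the link quoted above. The URL, words and weights in the example are hypothetical, and the choice of five terms is an assumption.

```bash
#!/usr/bin/env bash
# lexical_signature URL K: read "word weight" lines on stdin and append the
# K highest-weighted words to URL as a lexical-signature query parameter.
lexical_signature() {
  local url=$1 k=$2 sep='?' sig
  [[ $url == *\?* ]] && sep='&'          # the URL already has a query string
  sig=$(LC_ALL=C sort -k2,2nr -k1,1 | head -n "$k" | awk '{print $1}' | paste -sd+ -)
  printf '%s%slexical-signature=%s\n' "$url" "$sep" "$sig"
}

printf '%s\n' "index 2.1" "the 0.0" "concordance 3.4" "query 1.7" "sequence 1.2" "structure 0.9" |
  lexical_signature "https://example.org/talk.html" 5
```

This prints `https://example.org/talk.html?lexical-signature=concordance+index+query+sequence+structure`; should the address stop working, the five words can still be passed to a search engine to look for the moved text.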

There is another measure, cosine similarity, which takes tf-idf further: it
compares texts as vectors of tf-idf weights and can be applied to clustering
texts according to similarities in their specificity. This might be
interesting as a feature for digital libraries, or even as a way of organising
a library bottom-up into novel categories from which new discourses could
emerge. It could also aid researchers in sorting through texts, or even
editors in producing interesting anthologies.
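A sketch of the measure: each text is treated as a vector of word weights (for instance tf-idf weights), and the similarity of two texts is the dot product of their vectors divided by the product of their lengths. With non-negative weights it ranges from 0 (no shared weighted words) to 1 (the same proportions). The input format and the toy weights are assumptions made for illustration; grouping texts whose similarity exceeds some threshold would be one simple way to cluster them.

```bash
#!/usr/bin/env bash
# cosine: read "text word weight" lines; print "textA textB similarity" per pair.
cosine() {
  awk '
    { v[$1, $2] = $3; norm[$1] += $3 * $3; words[$2] = 1; docs[$1] = 1 }
    END {
      for (d in docs) list[++nd] = d
      for (i = 1; i <= nd; i++) for (j = i + 1; j <= nd; j++) {
        a = list[i]; b = list[j]; dot = 0
        for (w in words)
          if (((a, w) in v) && ((b, w) in v)) dot += v[a, w] * v[b, w]
        s = (norm[a] > 0 && norm[b] > 0) ? dot / sqrt(norm[a] * norm[b]) : 0
        if (a > b) { t = a; a = b; b = t }   # print each pair in a fixed order
        printf "%s %s %.4f\n", a, b, s
      }
    }' | LC_ALL=C sort
}

printf '%s\n' "A index 1" "A concordance 1" "B index 1" "C tree 1" | cosine
```

For the toy input, texts A and B share _index_ and score 0.7071, while C shares nothing with either and scores 0.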

## Final remarks

1

New disciplines emerge all the time: most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunity but, first of all, frameworks of new
perspectives on the world, new domains of knowledge. From the perspective of a
researcher, partaking in a discipline involves negotiating its vocabulary,
classifications, corpus, reference field, and specific terms[subjects].
Creating new fields involves all that, and more. Even when one goes against
all disciplines.

2

Google can still surprise us.

3

Knowledge has been in the making for millennia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
reference retrieval. These all belong to the poetic devices of
knowledge-making.

4

Command-line example of tf-idf and concordance in 3 steps.

* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:

> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]' '[\012*]' |
tr -d '[:punct:]' | sort | uniq -c | sort -k 1nr | sed '1,1d' > temp.txt;
max=$(awk -vvar=1 -F" " 'NR==1 {print $var}' temp.txt);
awk -vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' temp.txt > freq.$i.txt;
done && rm temp.txt

* 2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm -f freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) &&
for i in {1..500}; do word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt);
[ -n "$word" ] || break; tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt);
count=$(grep -lw "$word" freq.?.txt | wc -l); idf=$(echo "1+l(5/$count)" | bc -l);
tfidf=$(echo "$tf*$idf" | bc); echo $word $tfidf >> freq.$j.txt.temp; done;
sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

* 3\. Process the files tfidf.1-5.txt and their source texts, text.1-5.txt, and produce occ.txt with a concordance of the top 3 words from each of them:

> rm -f occ.txt; for j in {1..5}; do echo "$j" >> occ.txt;
ptx -f -w 150 text.$j.txt > occ.$j.txt; for i in {1..3}; do
word=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' tfidf.$j.txt);
grep -i "[[:alpha:]] $word" occ.$j.txt >> occ.txt; done; done

Dušan Barok

_Written 23 October - 1 November 2014 in Bratislava and Stuttgart._

Tenen & Foxman
Book Piracy as Peer Preservation
2014

Book Piracy as Peer Preservation

Abstract

In describing the people, books, and technologies behind one of the
largest "shadow libraries" in the world, we find a tension between the
dynamics of sharing and preservation. The paper proceeds to
contextualize contemporary book piracy historically, challenging
accepted theories of peer production. Through a close analysis of one
digital library's system architecture, software and community, we assert
that the activities cultivated by its members are closer to that of
conservationists of the public libraries movement, with the goal of
preserving rather than mass-distributing their collected material.
Unlike in common peer production models, emphasis is placed on the expertise
of its members as digital preservationists, as well as on the absorption of
digital repositories. Additionally, we highlight issues that arise from
their particular form of distributed architecture and community.

> Literature is the secretion of civilization, poetry of the ideal.
> That is why literature is one of the wants of societies. That is why
> poetry is a hunger of the soul. That is why poets are the first
> instructors of the people. That is why Shakespeare must be translated
> in France. That is why Molière must be translated in England. That is
> why comments must be made on them. That is why there must be a vast
> public literary domain. That is why all poets, all philosophers, all
> thinkers, all the producers of the greatness of the mind must be
> translated, commented on, published, printed, reprinted, stereotyped,
> distributed, explained, recited, spread abroad, given to all, given
> cheaply, given at cost price, given for nothing.
> ^[1](#fn-2025-1){#fnref-2025-1}^

Introduction

The big money (and the bandwidth) in online media is in film, music, and
software. Text is less profitable for copyright holders; it is cheaper
to duplicate and easier to share. Consequently, issues surrounding the
unsanctioned sharing of print material receive less press and scant
academic attention. The very words, "book piracy," fail to capture the
spirit of what is essentially an Enlightenment-era project, openly
embodied in many contemporary "shadow libraries":^[2](#fn-2025-2){#fnref-2025-2}^
in the words of Victor Hugo, to establish a "vast public
literary domain." Writers, librarians, and political activists from Hugo
to Leo Tolstoy and Andrew Carnegie have long argued for unrestricted
access to information as a form of public good essential to civic
engagement. In that sense, people participating in online book exchanges
enact a role closer to that of a librarian than that of a bootlegger or
a plagiarist. Whatever the reader's stance on the ethics of copyright
and copyleft, book piracy should not be dismissed as a mere search for
free entertainment. Under the conditions of "digital
disruption,"^[3](#fn-2025-3){#fnref-2025-3}^ when the traditional
institutions of knowledge dissemination---the library, the university,
the newspaper, and the publishing house---feel themselves challenged and
transformed by the internet, we can look to online book sharing
communities for lessons in participatory governance, technological
innovation, and economic sustainability.

The primary aims of this paper are ethnographic and descriptive: to
study and to learn from a library that constitutes one of the world's
largest digital archives, rivaling Google Books, Hathi Trust, and
Europeana. In approaching a "thick description" of this archive we
begin to broach questions of scope and impact. We would like to ask:
Who? Where? and Why? What kind of people distribute books online? What
motivates their activity? What technologies enable the sharing of print
media? And what lessons can we draw from them? Our secondary aim is to
continue the work of exploring the phenomenon of book sharing more
widely, placing it in the context of other commons-based peer production
communities like Project Gutenberg and Wikipedia. The archetypal model
of peer production is one motivated by altruistic participation. But the
very history of public libraries is one that combines the impulse to
share and to protect. To paraphrase Jacques Derrida
^[4](#fn-2025-4){#fnref-2025-4}^ writing in "Archive Fever," the archive
shelters memory just as it shelters itself from memory. We encompass
this dual dynamic under the term "peer preservation," where the
logistics of "peers" and of "preservation" can sometimes work at odds with
one another.

Academic literature tends to view piracy on the continuum between free
culture and intellectual property rights. On the one side, an argument
is made for unrestricted access to information as a prerequisite to
properly deliberative democracy.^[5](#fn-2025-5){#fnref-2025-5}^ On this
view, access to knowledge is a form of political power, which must be
equitably distributed, redressing regional and social imbalances of
access.^[6](#fn-2025-6){#fnref-2025-6}^ The other side offers pragmatic
reasoning related to the long-term sustainability of the cultural
sphere, which, in order to prosper, must provide proper economic
incentives to content creators.^[7](#fn-2025-7){#fnref-2025-7}^

It is our contention that grassroots file sharing practices cannot be
understood solely in terms of access or intellectual property. Our field
work shows that while some members of the book sharing community
participate for activist or ideological reasons, others do so as
collectors, preservationists, curators, or simply readers. Despite
romantic notions to the contrary, reading is a social and mediated
activity. The reader encounters texts in conversation, through a variety
of physical interfaces and within an ecosystem of overlapping
communities, each projecting their own material contexts, social norms,
and ideologies. A technician who works in a biology laboratory, for
example, might publish closed-access peer-reviewed articles by day, as
part of his work collective, and release terabytes of published material
by night, in the role of a moderator for an online digital library. Our
approach, then, is to capture some of the complexity of such an
ecosystem, particularly in the liminal areas where people, texts, and
technology converge.

Ethics disclaimer

Research for this paper was conducted under the aegis of piracyLab, an
academic collective exploring the impact of technology on the spread of
knowledge globally.^[8](#fn-2025-8){#fnref-2025-8}^ One of the lab's
first tasks was to discuss the ethical challenges of collaborative
research in this space. The conversation involved students, faculty,
librarians, and informal legal counsel. Neutrality, to the extent that
it is possible, emerged as one of our foundational principles. To keep
all channels of communication open, we wanted to avoid bias and to give
voice to a diversity of stakeholders: from authors, to publishers, to
distributors, whether sanctioned or not. Following a frank discussion
and after several iterations, we drafted an ethics charter that
continues to inform our work today. The charter contains the following
provisions:

-- We neither condone nor condemn any forms of information exchange.\
-- We strive to protect our sources and do not retain any identifying
personal information.\
-- We seek transparency in sharing our methods, data, and findings with
the widest possible audience.\
-- Credit where credit is due. We believe in documenting attribution
thoroughly.\
-- We limit our usage of licensed material to the analysis of metadata,
with results used for non-commercial, nonprofit, educational purposes.\
-- Lab participants commit to abiding by these principles as long as
they remain active members of the research group.

In accordance with these principles and following the practice of
scholars like Balazs Bodo ^[9](#fn-2025-9){#fnref-2025-9}^, Eric Priest
^[10](#fn-2025-10){#fnref-2025-10}^, and Ramon Lobato and Leah Tang
^[11](#fn-2025-11){#fnref-2025-11}^, we redact the names of file sharing
services and user names, where such names are not made explicitly public
elsewhere.

Centralization

We begin with the intuition that all infrastructure is social to an
extent. Even private library collections cannot be said to reflect the
work of a single individual. Collective forces shape furniture, books,
and the very cognitive scaffolding that enables reading and
interpretation. Yet, there are significant qualitative differences in
the systems underpinning private collections, public libraries, and
unsanctioned peer-to-peer information exchanges like The Pirate Bay,
for example. Given these differences, the recent history of online book
sharing can be divided roughly into two periods. The first is
characterized by local, ad-hoc peer-to-peer document exchanges and the
subsequent growth of centralized content aggregators. Following trends
in the development of the web as a whole, shadow libraries of the second
period are characterized by communal governance and distributed
infrastructure.

Shadow libraries of the first period resemble a private library in that
they often emanate from a single authoritative source--a site of
collection and distribution associated with an individual collector,
sometimes explicitly. The library of Maxim Moshkov, for example,
established in 1994 and still thriving at lib.ru, is one of the most
visible collections of this kind. Despite their success, such libraries
are limited in scale by the means and efforts of a few individuals. Due
to their centralized architecture, they are also susceptible to legal
challenges from copyright owners and to state intervention.
Shadow libraries responded to these problems by distributing labor,
responsibility, and infrastructure, resulting in a system that is more
robust, more redundant, and more resistant to any single point of
failure or control.

The case of Gigapedia (later library.nu) and its related file
hosting service ifile.it demonstrates the successes and the
deficiencies of the centralized digital library model. Arguably among
the largest and most popular virtual libraries online in the period of
2009-2011, the sites were operated by Irish
nationals^[12](#fn-2025-12){#fnref-2025-12}^ on domains registered in
Italy and on the island state of Niue, with servers on the territory of
Germany and Ukraine. At its peak, library.nu (LNU) hosted more than
400,000 books and was purported to make an "estimated turnover of EUR 8
million (USD 10,602,400) from advertising revenues, donations and sales
of premium-level accounts," at least according to a press release made
by the International Publishers Association
(IPA).^[13](#fn-2025-13){#fnref-2025-13}^

Its apparent popularity notwithstanding, LNU/Gigapedia was supported
by a relatively simple architecture, likely maintained by a lone
developer-administrator. The site itself consisted of a catalog of
digital books and related metadata, including title, author, year of
publication, number of pages, description, category classification, and
a number of boolean parameters (whether the file is bookmarked,
paginated, vectorized, is searchable, and has a cover). Although the
books could be hosted anywhere, many in the catalog resided on the
servers of a "cyberlocker" service ifile.it, affiliated with the main
site. Not strictly a single-source archive, LNU/Gigapedia was
nevertheless a federated entity, tied to a single site and to a single
individual. On February 15, 2012, in a Munich court, the IPA, in
conjunction with a consortium of international publishing houses and the
help of the German law firm Lausen
Rechtsanwalte,^[14](#fn-2025-14){#fnref-2025-14}^ served judicial
cease-and-desist orders naming both sites (Gigapedia and ifile.it).
Seventeen injunctions were sought in Ireland, with the consequent
voluntary shut-down of both domains, which for a brief time redirected
visitors first to Google Books and then to Blue Latitudes, a New
York Times bestseller about pirates, for sale on Amazon.

::: {#attachment_2430 .wp-caption .alignnone style="width: 310px"}
[![](http://computationalculture.net/wp-content/uploads/2014/11/figure-13-300x176.jpg "figure-1"){.size-medium
.wp-image-2430 width="300" height="176"
sizes="(max-width: 300px) 100vw, 300px"
srcset="http://computationalculture.net/wp-content/uploads/2014/11/figure-13-300x176.jpg 300w, http://computationalculture.net/wp-content/uploads/2014/11/figure-13-1024x603.jpg 1024w"}](http://computationalculture.net/wp-content/uploads/2014/11/figure-13.jpg)

Figure 1: Archived version of library.nu, circa 12/10/2010
:::

The relatively brief (by library standards) existence of LNU/Gigapedia
underscores a weakness in the federated library model. The site
flourished as long as it did not attract the ire of the publishing
industry. The lack of redundancy in the site's administrative structure
paralleled the lack of redundancy at the server level. Once the authorities
were able to establish the identity of the site's operators (via PayPal
receipts, according to a partner at Lausen Rechtsanwalte), the project
was forced to shut down irrevocably.^[15](#fn-2025-15){#fnref-2025-15}^
The system's single point of origin proved also to be its single point
of failure.

Jens Bammel, Secretary General of the IPA, called the action "an
important step towards a more transparent, honest and fair trade of
digital content on the Internet."^[16](#fn-2025-16){#fnref-2025-16}^ The
rest of the internet mourned the passage of "the greatest, largest and
the best website for downloading
eBooks,"^[17](#fn-2025-17){#fnref-2025-17}^ comparing the demise of
LNU/Gigapedia to the burning of the ancient Library of
Alexandria.^[18](#fn-2025-18){#fnref-2025-18}^ Readers from around the
world flocked to sites like Reddit and TorrentFreak to express their
support and anger. For example, one reader wrote on TorrentFreak:

> I live in Macedonia (the Balkans), a country where the average salary
> is somewhere around 200eu, and I'm a student, attending a MA degree in
> communication sci. \[...\] where I come from the public library is not
> an option. \[...\] Our libraries are so poor, mostly containing 30year
> or older editions of books that almost never refer to the field of
> communication or any other contemporary science. My professors never
> hide that they use sites like library.nu \[...\] Original textbooks
> \[...\] are copy-printed handouts of some god knows how obtained
> original \[...\] For a country like Macedonia and the Balkans region
> generally THIS IS A APOCALYPTIC SCALE DISASTER! I really feel like the
> dark age is just around the corner these
> days.^[19](#fn-2025-19){#fnref-2025-19}^

A similar comment on Reddit reads:

> This is the saddest news of the year...heart-breaking...shocking...I
> was so attached to this site...I am from a third world country where
> buying original books is way too expensive if we see currency exchange
> rates...library.nu was a sea of knowledge for me and I learnt a lot
> from it \[...\] RIP library.nu...you have ignited several minds with
> free knowledge.^[20](#fn-2025-20){#fnref-2025-20}^

Another redditor wrote:

> This was an invaluable resource for international academics. The
> catalog of libraries overseas often cannot meet the needs of
> researchers in fields not specific to the country in which they are
> located. My doctoral research has taken a significant blow due to this
> recent shutdown \[...\] Please publishers, if you take away such a
> valuable resource, realize that you have created a gap that will be
> filled. This gap can either be filled by you or by
> us.^[21](#fn-2025-21){#fnref-2025-21}^

Another concludes:

> This just makes me want to start archiving everything I can get my
> hands on.^[22](#fn-2025-22){#fnref-2025-22}^

These anecdotal reports confirm our own experiences of studying and
teaching at universities with a diverse audience of international
students, who often recount a similar personal narrative. Gigapedia
and analogous sites fulfilled an unmet need in the international market,
redressing global inequities of access to
information.^[23](#fn-2025-23){#fnref-2025-23}^

But, being a cyberlocker-based service, Gigapedia did not succeed in
cultivating a meaningful sense of community (even though it supported
a forum for brief periods of its existence). As Lobato and Tang
^[24](#fn-2025-24){#fnref-2025-24}^ write in their paper on
cyberlocker-based media distribution systems, cyberlockers in general
"do not foster collaboration and co-creation," taking an "instrumental
view of content hosted on their
sites."^[25](#fn-2025-25){#fnref-2025-25}^ Although not strictly a
cyberlocker, LNU/Gigapedia fit the profile of a passive,
non-transformative site by these criteria. For Lobato and Tang, the
rapid disappearance of many prominent cyberlocker sites underscores the
"structural instability" of "fragile file-hosting
ecology."^[26](#fn-2025-26){#fnref-2025-26}^ In our case, it would be
more precise to say that cyberlocker architecture highlights the structural
instability of centralized media archives, rather than that of file-sharing
communities in general. Although bereaved readers were concerned
about the irrevocable loss of a valuable resource, digital libraries
that followed built a model of file sharing that is more resilient, more
transparent, and more participatory than their LNU/Gigapedia
predecessors.

Distribution

In parallel with the development of LNU/Gigapedia, a group of Russian
enthusiasts were working on a meta-library of sorts, under the name of
Aleph. Records of Aleph's activity go back at least as far as 2009.
Colloquially known as "prospectors," the volunteer members of Aleph
compiled library collections widely available on the gray market, with
an emphasis on academic and technical literature in Russian and
English.\
DVD case cover of "Traum's library" advertising "more than 167,000
books" in fb2 format. Similar DVDs sell for around 1,000 RUB (\$25-30
US) on the streets of Moscow.

At its inception, Aleph aggregated several "home-grown" archives,
already in wide circulation in universities and on the gray market.
These included:

-- KoLXo3, a collection of scientific texts that was at one time
distributed on 20 DVDs, overlapping with early Gigapedia efforts;\
-- mexmat, a library collected by the members of Moscow State
University's Department of Mechanics and Mathematics for internal use,
originally distributed through private FTP servers;\
-- Homelab, Ihtik, and Ingsat libraries;\
-- the Foreign Fiction archive collected from IRC \#\\\*
2003.09-2011.07.09 and the Internet Library;\
-- the Great Science Textbooks collection and, later, over 20 smaller
miscellaneous archives.^[27](#fn-2025-27){#fnref-2025-27}^

In retrospect, we can categorize the founding efforts along three
parallel tracks: 1) the development of "front-end" server software
for searching and downloading books, 2) the organization of an online
forum for enthusiasts willing to contribute to the project, and 3) the
collection effort required to expand and maintain the "back-end" archive
of documents, primarily in .pdf and .djvu
formats.^[28](#fn-2025-28){#fnref-2025-28}^ "What do we do?" writes one
of the early volunteers (in 2009) on the topic of "Outcomes, Goals, and
Scope of the Project." He answers: "we loot sites with ready-made
collections," "sort the indices in arbitrary normalized formats," "for
uncatalogued books we build a 'technical index': name of file, size,
hashcode," "write scripts for database sorting after the initial catalog
process," "search the database," "use the database for the construction
of an accessible catalog," "build torrents for the distribution of files
in the collection."^[29](#fn-2025-29){#fnref-2025-29}^ But, "everything
begins with the forum," in the words of another founding
member.^[30](#fn-2025-30){#fnref-2025-30}^ Aleph, the very name of the
group, reflects the aspiration to develop a "platform for the inception
of subsequent and more user-friendly" libraries--a platform "useful for
the developer, the reader, and the
librarian."^[31](#fn-2025-31){#fnref-2025-31}^

::: {#attachment_2431 .wp-caption .alignnone style="width: 310px"}
[![](http://computationalculture.net/wp-content/uploads/2014/11/figure-21-300x300.jpg "figure-2"){.size-medium
.wp-image-2431 width="300" height="300"
sizes="(max-width: 300px) 100vw, 300px"
srcset="http://computationalculture.net/wp-content/uploads/2014/11/figure-21-300x300.jpg 300w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21-150x150.jpg 150w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21-1024x1024.jpg 1024w, http://computationalculture.net/wp-content/uploads/2014/11/figure-21.jpg 1200w"}](http://computationalculture.net/wp-content/uploads/2014/11/figure-21.jpg)

Figure 2: DVD case cover of "Traum's library" advertising "more than
167,000 books"
:::

What is Aleph? Is it a collection of books? A community? A piece of
software? What makes a library? When attempting to visualize Aleph's
constituents (Figure 3), it seems insufficient to point to books alone,
or to social structure, or to technology in the absence of people and
content. Taking a systems approach to description, we understand a
library to comprise an assemblage of books, people, and infrastructure,
along with their corresponding words and texts, rules and institutions,
and shelves and servers.^[32](#fn-2025-32){#fnref-2025-32}^ In this
light, Aleph's iteration on LNU/Gigapedia lies not in technological
advancement alone, but in system architecture, on all levels of
analysis.

Where the latter relied on proprietary server applications, Aleph
built software that enabled others to mirror and to serve the site in
its entirety. The server was written by d\* from www.l\.com (Bet),
utilizing a codebase common to several similar large book-sharing
communities. The initial organizational efforts happened on a sub-forum
of a popular torrent tracker (RR). Fifteen founding members reached
early consensus to start hashing document filenames (using the MD5
message-digest algorithm), rather than to store files as is, with their
appropriate .pdf or .mobi extensions.^[33](#fn-2025-33){#fnref-2025-33}^
Bit-wise hashing was likely chosen as a (computationally) cheap way to
de-duplicate documents, since two identical files would hash into an
identical string. It was hoped that hashing the filenames would have the
side effect of discouraging direct (file system-level) browsing of the
archive.^[34](#fn-2025-34){#fnref-2025-34}^ Instead, the books were
meant to be accessed through the front-end "librarian" interface, which
added a layer of meta-data and search tools. In other words, the group
went out of its way to distribute Aleph as a library and not merely as
a large aggregation of raw files.
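
The de-duplication effect of naming files by hash can be shown with standard tools. This sketch assumes GNU coreutils (`md5sum`) and uses a hypothetical helper name and directory layout; it is not Aleph's code, which the sources do not reproduce. Each file is copied into a store under the MD5 digest of its contents, so byte-identical files collapse into one entry and the `.pdf` or `.djvu` extension disappears.

```bash
# store_by_hash SRC_DIR STORE_DIR  (hypothetical helper)
# Copies every regular file in SRC_DIR into STORE_DIR, named by the
# MD5 digest of its contents; identical files are stored only once.
store_by_hash() {
  local src=$1 store=$2 f h
  mkdir -p "$store"
  for f in "$src"/*; do
    [ -f "$f" ] || continue                 # skip subdirectories and the like
    h=$(md5sum < "$f" | cut -d' ' -f1)     # 32 hexadecimal characters
    if [ -e "$store/$h" ]; then
      echo "duplicate of $h: $f" >&2       # same bytes already stored
    else
      cp "$f" "$store/$h"                  # original name and extension dropped
    fi
  done
}
```

Browsing such a store directly shows only hexadecimal names, so titles and authors have to come from a separate catalog. MD5 is no longer collision-resistant against deliberate attacks, but for spotting accidental duplicates it remains cheap and adequate.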

::: {#attachment_2221 .wp-caption .alignnone style="width: 593px"}
[![](http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg "figure-3"){.size-full
.wp-image-2221 width="583" height="526"
sizes="(max-width: 583px) 100vw, 583px"
srcset="http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg 583w, http://computationalculture.net/wp-content/uploads/2014/10/figure-3-300x270.jpg 300w"}](http://computationalculture.net/wp-content/uploads/2014/10/figure-3.jpg)

Figure 3: Aleph's anatomy
:::

Site volunteers coordinate their efforts asynchronously, by means of a
simple online forum (using phpBB software), open to all interested
participants. Important issues related to the governance of the
project--decisions about new hardware upgrades, software design, and
book acquisition--receive public airing. For example, at one point, the
site experienced increased traffic from Google searches. Some senior
members welcomed the attention, hoping to attract new volunteers. Others
worried increased visibility would bring unwanted scrutiny. To resolve
the issue, a member suggested delisting the website by altering the
robots.txt configuration file and thereby blocking Google
crawlers.^[35](#fn-2025-35){#fnref-2025-35}^ Consequently, the site
would become invisible to Google, while remaining freely accessible
via a direct link. Early conversations on RR reflect a consistent
concern about the archive's longevity and its vulnerability to official
sanctions. Rather than following the cyber-locker model of distribution,
the prospectors decided to release canonical versions of the library in
chunks, via BitTorrent--a distributed protocol for file sharing.
Another decision was made to "store" the library on open trackers (like
The Pirate Bay), rather than tying it to a closed, by-invitation-only
community. Although LNU/Gigapedia was already decentralized to an
extent, the archeology of the community discussion reveals a multitude
of conscious choices that work to further atomize Aleph and to
decentralize it along the axes of the collection, governance, and
engineering.
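
A delisting of the kind proposed on the forum could look like the two-line file below, placed at the root of the site. The sources do not quote the file the members actually used; this sketch assumes the aim was to exclude only Google's crawler (user-agent token `Googlebot`) from every path, and writes the file with a shell here-document. The Robots Exclusion Protocol is advisory: compliant crawlers honor it, but it does not restrict access, which is why direct links keep working.

```bash
# Write a robots.txt asking Google's crawler to skip every path.
# Other crawlers are not addressed and remain unaffected.
cat > robots.txt <<'EOF'
User-agent: Googlebot
Disallow: /
EOF
```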

By March of 2009 these efforts resulted in approximately 79k volumes or
around 180gb of data.^[36](#fn-2025-36){#fnref-2025-36}^ By December of
the same year, the moderators began talking about a terabyte, 2tb in
2010, and around 7tb by 2011.^[37](#fn-2025-37){#fnref-2025-37}^ By
2012, the core group of "prospectors" grew to 1,000 registered users.
Aleph's main mirror received over a million page views per month and
about 40,000 unique visits per day.^[38](#fn-2025-38){#fnref-2025-38}^
An online eBook piracy report estimates a combined total of a million
unique visitors per day for Aleph and its
mirrors.^[39](#fn-2025-39){#fnref-2025-39}^

As of January 2014, the Aleph catalog contains over a million books
(1,021,000) and over 15 million academic articles, "weighing in" at just
under 10tb. Most remarkably, one of the world's largest digital
libraries operates on an annual budget of \$1,900
US.^[40](#fn-2025-40){#fnref-2025-40}^

Vulnerability

Distributed architecture gives Aleph significant advantages over its
federated predecessors. Were Aleph's servers to go offline, the archive
would survive "in the cloud" of the BitTorrent network. Should the
forum (Bet) close, another online forum could easily take its place.
And were the Aleph library portal itself to go dark, other mirrors would
(and usually do) quickly take its place.

But the decentralized model of content distribution is not without its
challenges. To understand them, we need to review some of the
fundamentals behind the BitTorrent protocol. At its bare minimum (as
described in the original specification by Bram Cohen), the
protocol involves a "seeder," someone willing to share something in its
entirety; a "leecher," someone downloading shared data; and a torrent
"tracker" that coordinates activity between seeders and
leechers.^[41](#fn-2025-41){#fnref-2025-41}^

Imagine a music album sharing agreement between three friends, where,
initially, only one holds a copy of some album: for example, Nirvana's
Nevermind. Under the centralized model of file sharing, the friend
holding the album would transmit two copies, one to each friend. The
power of BitTorrent comes from shifting the burden of sharing from a
single seeder (friend one) to a "swarm" of leechers (friends two and
three). On this model, the first leecher joining the network (friend
two, in our case) would begin to get his data from the seeder directly,
as before. But the second leecher would receive some bits from the
seeder and some from the first leecher, in a non-linear, asynchronous
fashion. In our example, we can imagine the remaining friend getting
some songs from the first friend and some from the second. The friend
who held the album originally has now transmitted something less than two
full copies of the album, since the other two friends exchanged some
bits of information between themselves, lessening the load on the
original album holder.
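
The saving in the three-friend example can be made explicit with a simple count. Assume (our simplification) that the album is divided into $n$ equal pieces, that friend two receives all $n$ pieces from friend one, and that friend three receives $k$ pieces from friend one and the remaining $n-k$ from friend two. Friend one's upload, measured in pieces, is then

$$
U_1 = n + k = 2n - (n - k) < 2n \qquad \text{whenever } k < n,
$$

so the saving is exactly the $n-k$ pieces that friend two passes on. With $n = 10$ and $k = 4$, for example, friend one uploads 14 pieces, or 1.4 copies of the album instead of 2.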

When downloading from the BitTorrent network, a peer may receive some
bits from the beginning of the document, some from the middle, and some
from the end, in parts distributed among the members of the swarm. A
local application called the "client" is responsible for checking the
integrity of the pieces and for reassembling them into a coherent
whole. A torrent "tracker" coordinates the activity between peers,
keeping track of who has what where. Having received the whole document,
a leecher can, in turn, become a seeder by sharing all of his downloaded
bits with the remaining swarm (who only have partial copies). The
leecher can also take the file offline, choosing not to share at
all.^[42](#fn-2025-42){#fnref-2025-42}^
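
The client's integrity check can be illustrated with standard tools. In version 1 of the BitTorrent protocol, the .torrent metainfo file lists a SHA-1 hash for every fixed-size piece of the shared data, and a client accepts a piece only if its hash matches. The sketch below assumes GNU coreutils (`split`, `sha1sum`) and uses hypothetical helper names: it splits a file into pieces, records their hashes, then verifies and reassembles them. A real client would instead receive the pieces out of order from different peers and write each one at its offset.

```bash
# make_pieces FILE DIR PIECE_BYTES  (hypothetical helper)
# Splits FILE into fixed-size pieces (the last may be shorter) and
# writes one SHA-1 hash per piece, in order, to DIR/hashes.
make_pieces() {
  local file=$1 dir=$2 size=$3 p
  mkdir -p "$dir"
  split -b "$size" -d -a 4 "$file" "$dir/piece."   # piece.0000, piece.0001, ...
  for p in "$dir"/piece.*; do
    sha1sum < "$p" | cut -d' ' -f1
  done > "$dir/hashes"
}

# verify_and_join DIR OUTPUT  (hypothetical helper)
# Checks every piece against its recorded hash and concatenates the
# pieces in order into OUTPUT; stops with status 1 at the first bad piece.
verify_and_join() {
  local dir=$1 out=$2 i=0 p expected actual
  : > "$out"                                  # start with an empty output file
  for p in "$dir"/piece.*; do
    i=$((i + 1))
    expected=$(sed -n "${i}p" "$dir/hashes")
    actual=$(sha1sum < "$p" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
      echo "piece $i failed its hash check" >&2
      return 1
    fi
    cat "$p" >> "$out"
  done
}
```

A corrupted or tampered piece fails the check and can simply be requested again from another peer, which is what allows a client to accept data from strangers in the swarm.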

The original protocol left torrent trackers vulnerable to charges of
aiding and abetting copyright
infringement.^[43](#fn-2025-43){#fnref-2025-43}^ Early in 2008, Cohen
extended BitTorrent to make use of "distributed sloppy hash tables"
(DHT) for storing peer locations without resorting to a central tracker.
Under these new guidelines, each peer would maintain a small routing
table pointing to a handful of nearby peer locations. In effect, DHT
placed additional responsibility on the swarm to become a tracker of
sorts, however "sloppy" and imperfect. By November of 2009, The Pirate
Bay announced its transition away from tracking entirely, in favor of
DHT, peer exchange (PEX), and magnet links. At the time, they called it
the "world's most resilient
tracking."^[44](#fn-2025-44){#fnref-2025-44}^
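
In the DHT adopted by BitTorrent clients (the Mainline DHT, specified in BEP 5 and based on Kademlia), "nearby" refers to distance between identifiers, not geography: the distance between two IDs is their bitwise XOR read as an unsigned integer, and a node's routing table holds more entries for IDs close to its own. A lookup repeatedly asks the known peers closest to a target ID. The sketch below picks the k closest of a set of known peers; real node IDs are 160 bits, so the made-up 16-bit IDs are a simplification that lets bash arithmetic compute the XOR, and `closest_peers` is a hypothetical name.

```bash
# closest_peers TARGET K ID...  (hypothetical helper)
# Prints the K known peer IDs nearest to TARGET under the XOR metric.
# IDs are hex strings such as 0x00f1; 16 bits here, 160 bits in BEP 5.
closest_peers() {
  local target=$1 k=$2 id
  shift 2
  for id in "$@"; do
    printf '%d %s\n' "$(( id ^ target ))" "$id"   # distance, then ID
  done | sort -n | head -n "$k" | cut -d' ' -f2
}

# Usage:
#   closest_peers 0x00f0 2 0x00f1 0xff00 0x00e0 0x0f0f
```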

Despite these advancements, the decentralized model of file sharing
remains susceptible to several chronic ailments. The first follows from
the fact that ad-hoc distribution networks privilege popular material. A
file needs to be actively traded to ensure its availability. If nobody
is actively sharing and downloading Nirvana's Nevermind, the album is
in danger of fading out of the cloud. As one member wrote succinctly on
Gimel forums, "unpopular files are in danger of become
inaccessible."^[45](#fn-2025-45){#fnref-2025-45}^ This dynamic is less
of a concern for Hollywood blockbusters, but more so for "long tail"
specialized materials of the sort found in Aleph, and indeed, for
Aleph itself as a piece of software distributed through the network.
Aleph combats the problem of fading torrents by renting
"seedboxes"--servers dedicated to keeping the Aleph seeds containing
the archive alive, preserving the availability of the collection. The
server in production as of 2014 can serve up to 12 TB of data at speeds of
100 to 800 megabits per second. Other file sharing communities address the
issue by enforcing a certain download to upload ratio on members of
their network.
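A ratio rule of the kind mentioned above can be expressed compactly. In this sketch the ratio is computed as bytes uploaded over bytes downloaded, a common convention in BitTorrent clients; the 0.5 threshold and the class and function names are hypothetical, since each community sets and enforces its own figures through its tracker.

```python
from dataclasses import dataclass

# Hypothetical minimum; each community sets (and names) its own threshold.
MIN_RATIO = 0.5


@dataclass
class Member:
    name: str
    uploaded_bytes: int
    downloaded_bytes: int

    @property
    def ratio(self) -> float:
        """Bytes uploaded per byte downloaded; infinite if nothing downloaded yet."""
        if self.downloaded_bytes == 0:
            return float("inf")
        return self.uploaded_bytes / self.downloaded_bytes


def may_download(member: Member, min_ratio: float = MIN_RATIO) -> bool:
    """Allow new downloads only while the member's ratio meets the minimum."""
    return member.ratio >= min_ratio


if __name__ == "__main__":
    members = (
        Member("veteran", 900, 300),
        Member("newcomer", 0, 0),
        Member("leecher", 10, 400),
    )
    for m in members:
        print(m.name, m.ratio, may_download(m))
```

The design choice matters for availability: a member who wants to keep downloading must keep seeding, which keeps less popular files in circulation as a side effect.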

The lack of true anonymity is the second problem intrinsic to the
BitTorrent protocol. Peers sharing bits directly cannot avoid
exposing their IP addresses (unless these are masked behind virtual
private networks or Tor relays). A "Sybil" attack becomes possible when
a malicious peer shares bits in bad faith, with the intent to log IP
addresses.^[46](#fn-2025-46){#fnref-2025-46}^ Researchers exploring this
vector of attack were able to harvest more than 91,000 IP addresses in
less than 24 hours of sharing a popular television
show.^[47](#fn-2025-47){#fnref-2025-47}^ They report that more than 9%
of requests made to their servers indicated "modified clients", which
are likely also running experiments in the DHT. Legitimate
copyright holders and copyright "trolls" alike have used this
vulnerability to bring lawsuits against individual sharers in
court.^[48](#fn-2025-48){#fnref-2025-48}^

These two challenges are further exacerbated in the case of Aleph,
which uses BitTorrent to distribute large parts of its own
architecture. These parts are relatively large--around 40-50GB each.
Long-term sustainability of Aleph as a distributed system therefore
requires a rare participant: one interested in downloading the archive
as a whole (as opposed to downloading individual books), one who owns
the hardware to store and transmit terabytes of data, and one possessing
the technical expertise to do so safely.

Peer preservation

In light of the challenges and the effort involved in maintaining the
archive, one would be remiss to describe Aleph merely in terms of book
piracy, understood in conventional terms of financial gain, theft, or
profiteering. The day-to-day labor of the core group is better
understood as a mode of commons-based peer production, which is, in
the canonical definition, work made possible by a "networked
environment," "radically decentralized, collaborative, and
non-proprietary; based on sharing resources and outputs among widely
distributed, loosely connected individuals who cooperate with each other
without relying on either market signals or managerial
commands."^[49](#fn-2025-49){#fnref-2025-49}^ Aleph meets the
definition of peer production, resembling in many respects projects like
Linux, Wikipedia, and Project Gutenberg.

Yet, Aleph is also patently a library. Its work can and should be
viewed in the broader context of Enlightenment ideals: access to
literacy, universal education, and the democratization of knowledge. The
very same ideals gave birth to the public library movement as a whole at
the turn of the 20th century, in the United States, Europe, and
Russia.^[50](#fn-2025-50){#fnref-2025-50}^ Parallels between free
library movements of the early 20th and the early 21st centuries point
to a social dynamic that runs contrary to the populist spirit of
commons-based peer production projects, in a mechanism that we describe
as peer preservation. The idea encompasses conflicting drives both to
share and to hoard information.

The roots of many public libraries lie in extensive private collections.
The Bodleian Library at Oxford, for example, traces its origins to the
collections of Thomas Cobham, Bishop of Worcester, Humphrey, Duke of
Gloucester, and to Thomas Bodley, himself an avid book collector.
Similarly, Poland's Zaluski Library, one of Europe's oldest, owes its
existence to the collecting efforts of the Zaluski brothers, both
bishops and bibliophiles.^[51](#fn-2025-51){#fnref-2025-51}^ As we
mentioned earlier, Aleph too began its life as an aggregator of
collections, including the personal libraries of Moshkov and Traum. When
books are scarce, private libraries are a sign of material wealth and
prestige. In the digital realm, where the cost of media acquisition is
low, collectors amass social capital. Aleph extends its collecting
efforts to RR, a much larger, moderated torrent exchange forum and
tracker. RR hosts a number of sub-forums dedicated to the exchange of
software, film, music, and books (where members of Aleph often make an
appearance). In the exchange economy of symbolic goods, top collectors
are known by their standing in the community, as measured by their
seniority, upload and download ratios, and the number of "releases." A
release is more than just a file: it must not duplicate items in the
archive and must follow strict community guidelines related to packaging,
quality, and meta-data accompanying the document. Less experienced
members of the community treat high status numbers with reverence and
respect.

According to a question and answer session with an official RR
representative, RR is not particularly friendly to new
users.^[52](#fn-2025-52){#fnref-2025-52}^ In fact, high barriers to
entry are exactly what differentiates RR from sites like The Pirate
Bay and other unmoderated, open trackers. RR prides itself on the
"quality of its moderation." Unlike Pirate Bay, RR sees itself as a
"media library", where content is "organized and properly shelved." To
produce an acceptable book "release" one needs to create a package of
files, including well-formatted meta-data (following strict stylistic
rules) in the header, the name of the book, an image of its cover, the
year of release, author, genre, publisher, format, language, a required
description, and screenshots of a sample page. The files must be named
according to a convention, be "of the same kind" (that is, belong to the
same collection), and be of the right size. Home-made scans are
discouraged and governed by a 1,000-word instruction manual. Scanned
books must have clear attribution to the releaser responsible for
scanning and processing.
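The packaging rules described above amount to a checklist that could, in principle, be checked mechanically. The following sketch assumes a release is described by a simple dictionary; the field names are hypothetical stand-ins for RR's actual template, which is not reproduced here, and the check covers only the presence of each field, not the stylistic rules governing its content.

```python
# Hypothetical field names standing in for RR's required release metadata.
REQUIRED_FIELDS = (
    "title", "cover_image", "year", "author", "genre",
    "publisher", "format", "language", "description", "sample_screenshots",
)


def missing_fields(release: dict) -> list[str]:
    """Return the required metadata fields that are absent or empty."""
    missing = [field for field in REQUIRED_FIELDS if not release.get(field)]
    # Scanned books must also credit whoever scanned and processed them.
    if release.get("is_scan") and not release.get("scanned_by"):
        missing.append("scanned_by")
    return missing


if __name__ == "__main__":
    draft = {
        "title": "Bleak House", "author": "Charles Dickens", "year": 1853,
        "format": "djvu", "language": "en", "is_scan": True,
    }
    print(missing_fields(draft))
    # ['cover_image', 'genre', 'publisher', 'description', 'sample_screenshots', 'scanned_by']
```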

More than that, guidelines indicate that smaller releases should be
expected to be "absorbed" into larger ones. In this way, a single novel
by Charles Dickens can and will be absorbed into his collected works,
which might further be absorbed into "Novels of 19th Century," and then
into "Foreign Fiction" (as a hypothetical, but realistic example).
According to the rules, the collection doing the absorbing must be "at
least 50% larger than the collection it is absorbing." Releases are
further governed by a subset of rules particular to the forum
subsections (e.g. journals, fiction, documentation, service manuals,
etc.).^[53](#fn-2025-53){#fnref-2025-53}^
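The 50 percent rule is precise enough to state as code. In this sketch "size" is assumed to mean a count of items, since the quoted rule does not say whether it refers to titles, files, or bytes, and the numbers in the Dickens chain are invented for illustration.

```python
def can_absorb(absorber_size: int, absorbed_size: int) -> bool:
    """True if the absorbing collection is at least 50% larger than the absorbed one.

    Tests 2 * absorber >= 3 * absorbed, which is equivalent to
    absorber >= 1.5 * absorbed but avoids floating-point rounding.
    "Size" is assumed here to be an item count.
    """
    return 2 * absorber_size >= 3 * absorbed_size


def valid_chain(sizes: list[int]) -> bool:
    """Check that each collection in a chain may absorb the one before it."""
    return all(can_absorb(bigger, smaller) for smaller, bigger in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    # Hypothetical item counts: a single novel, collected works,
    # "Novels of 19th Century", "Foreign Fiction".
    print(valid_chain([1, 20, 500, 10_000]))  # True
    print(can_absorb(14, 10), can_absorb(15, 10))  # False True
```

Because the threshold is relative, each step up the chain requires a proportionally larger collection, which is one concrete way the rules favor those who already hold the most.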

All this to say that although barriers to acquisition are low, the
barriers to active participation are high and continually increase with
time. The absorption of smaller collections by larger ones favors the
veterans. Rules and regulations grow in complexity with the maturation
of the community, further widening the rift between senior and junior
peers. We are then witnessing something like the institutionalization of
a professional "librarian" class, whose task it is to protect the
collection from the encroachment of low-quality contributors. Rather
than serving the public, a librarian's primary commitment is to the
preservation of the archive as a whole. Thus what starts as a true peer
production project may, in the end, grow to erect solid walls to
peering. This dynamic is already embodied in the history of public
libraries, where amateur librarians of the late 19th century eventually
gave way to their modern degree-holding counterparts. The conflicting
logistics of access and preservation may lead digital library
development along a similar path.

The expression of this dual push and pull dynamic in the observed
practices of peer preservation communities conforms to Derrida's insight
into the nature of the archive. Just as the walls of a library serve to
shelter the documents within, they also isolate the collection from the
public at large. Access and preservation, in that sense, subsist at
opposite and sometimes mutually exclusive ends of the sharing spectrum.
And it may be that this dynamic is common to all peer production
communities, like Wikipedia, which, according to recent studies, saw a
decline in new contributors due to increasingly strict rule
enforcement.^[54](#fn-2025-54){#fnref-2025-54}^ However, our results are
merely speculative at the moment. The analysis of a large dataset we
have collected as a corollary to our field work online may offer further
evidence for these initial intuitions. In the meantime, it is not enough
to conclude that brick-and-mortar libraries should learn from these
emergent, distributed architectures of peer preservation. If the future
of Aleph is leading to increased institutionalization, the community
may soon face the fate embodied by its own procedures: the absorption of
smaller, wonderfully messy, ascending collections into larger, more
established, and more rigid social structures.

Biographies

Dennis Tenen teaches in the fields of new media and digital humanities
at Columbia University, Department of English and Comparative
Literature. His research often happens at the intersection of people,
texts, and technology. He is currently writing a book on minimal
computing, called Plain Text.

Maxwell Foxman is an adjunct professor at Marymount Manhattan College
and a PhD candidate in Communications at Columbia University, where he
studies the use and adoption of digital media into everyday life. He has
written on failed social media and on gamification in electoral
politics, newsrooms, and mobile media.

References

Allen, Elizabeth Akers, and James Phinney Baxter. Dedicatory Exercises
of the Baxter Building. Auburn, Me: Lakeside Press, 1889.

Anonymous author. "Library.nu: Modern era's 'Destruction of the Library
of Alexandria.'" Breaking Culture. Last edited on February 16, 2012
and archived on January 14, 2014.
[http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip](https://web.archive.org/web/20140113135846/http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip).

Benkler, Yochai. The Wealth of Networks: How Social Production
Transforms Markets and Freedom. New Haven: Yale University Press, 2006.

Bittorrent.org. "The BitTorrent Protocol Specification." Last modified
October 20, 2012 and archived on June 13, 2014.
[http://www.bittorrent.org/beps/bep\_0003.html](http://web.archive.org/web/20140613190300/http://www.bittorrent.org/beps/bep_0003.html).

Bodo, Balazs. "Set the Fox to Watch the Geese: Voluntary IP Regimes in
Piratical File-Sharing Communities." In Piracy: Leakages from
Modernity. Litwin Books, LLC, 2012.

Bowker, Geoffrey C., and Susan Leigh Star. Sorting Things Out:
Classification and Its Consequences. The MIT Press, 1999.

Calandrillo, Steve P. "An Economic Analysis of Property Rights in
Information: Justifications and Problems of Exclusive Rights, Incentives
to Generate Information, and the Alternative of a Government-Run Reward
System." Fordham Intellectual Property, Media & Entertainment Law
Journal 9 (1998): 301.

Calhoun, Craig. "Information Technology and the International Public
Sphere." In Shaping the Network Society: the New Role of Civil Society
in Cyberspace, edited by Douglas Schuler and Peter Day, 229--52. MIT
Press, 2004.

Castells, Manuel. "Communication, Power and Counter-Power in the Network
Society." International Journal of Communication 1 (2007): 238--66.

Cholez, Thibault, Isabelle Chrisment, and Olivier Festor. "Evaluation of
Sybil Attacks Protection Schemes in KAD." In Scalability of Networks
and Services, edited by Ramin Sadre and Aiko Pras, 70--82. Lecture
Notes in Computer Science 5637. Springer Berlin Heidelberg, 2009.

Cohen, Bram. Incentives Build Robustness in BitTorrent, May 22, 2003.
[http://www.bittorrent.org/bittorrentecon.pdf](http://www.bittorrent.org/bittorrentecon.pdf).

Cohen, Julie. "Creativity and Culture in Copyright Theory." U.C. Davis
Law Review 40 (2006): 1151.

Day, Brian R. In Defense of Copyright: Creativity, Record Labels, and
the Future of Music. SSRN Scholarly Paper. Rochester, NY: Social
Science Research Network, May 2010.

Derrida, Jacques. "Archive Fever: a Freudian Impression." Diacritics
25, no. 2 (July 1995): 9--63.

DiMaggio, Paul, Eszter Hargittai, W. Russell Neuman, and John P.
Robinson. "Social Implications of the Internet." Annual Review of
Sociology 27 (January 2001): 307--36.

Edwards, Paul N. "Infrastructure and Modernity: Force, Time, and Social
Organization in the History of Sociotechnical Systems." In Modernity
and Technology, 185--225, 2003.

---------. "Y2K: Millennial Reflections on Computers as Infrastructure."
History and Technology 15, no. 1-2 (1998): 7--29.

Edwards, Paul N., Geoffrey C. Bowker, Steven J. Jackson, and Robin
Williams. "Introduction: an Agenda for Infrastructure Studies." Journal
of the Association for Information Systems 10, no. 5 (2009): 364--74.

Ernesto. "US P2P Lawsuit Shows Signs of a 'Pirate Honeypot'."
Technology. TorrentFreak. Last edited in June 2011 and archived on
January 14, 2014.
[http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/](https://web.archive.org/web/20140114200326/http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/).

Gauravaram, Praveen, and Lars R. Knudsen. "Cryptographic Hash
Functions." In Handbook of Information and Communication Security,
edited by Peter Stavroulakis and Mark Stamp, 59--79. Springer Berlin
Heidelberg, 2010.

Greenwood, Thomas. Public Libraries: a History of the Movement and a
Manual for the Organization and Management of Rate Supported Libraries.
Simpkin, Marshall, Hamilton, Kent, 1890.

Halfaker, Aaron, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl.
"The Rise and Decline of an Open Collaboration System: How Wikipedia's
Reaction to Popularity Is Causing Its Decline." American Behavioral
Scientist, December 2012, 0002764212469365.

Harris, Michael H. History of Libraries of the Western World. Fourth
Edition. Lanham, Md.; London: Scarecrow Press, 1999.

Hughes, Justin. "The Philosophy of Intellectual Property." Georgetown
Law Journal 77 (1988): 287.
http://heinonline.org/HOL/Page?handle=hein.journals/glj77&id=309&div=&collection=journals.

Hugo, Victor. Works of Victor Hugo. New York: Nottingham Society,
1907.

International Publishers Association. "Publishers Strike Major Blow
against Internet Piracy." Last modified February 15, 2012.
[http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy](http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy).

Johnson, Simon for Reuters.com. "Pirate Bay Copyright Test Case Begins
in Sweden." Last edited on February 16, 2009 and archived on August 4,
2014.
[http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216](http://web.archive.org/web/20140804000829/http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216).

Karaganis, Joe, ed. Media Piracy in Emerging Economies. Social Science
Research Network, March 2011.
[http://piracy.americanassembly.org/the-report/](http://piracy.americanassembly.org/the-report/).

Landes, William M., and Richard A. Posner. The Economic Structure of
Intellectual Property Law. Harvard University Press, 2003.

Larkin, Brian. "Degraded Images, Distorted Sounds: Nigerian Video and
the Infrastructure of Piracy." Public Culture 16, no. 2 (2004):
289--314.

---------. "Pirate Infrastructures." In Structures of Participation in
Digital Culture, edited by Joe Karaganis, 74--87. New York: SSRC, 2008.

Lessig, Lawrence. Free Culture: How Big Media Uses Technology and the
Law to Lock Down Culture and Control Creativity. The Penguin Press,
2004.

Liang, Lawrence. "Shadow Libraries E-Flux," last edited 2012 and
archived on October 14, 2014.
http://www.e-flux.com/journal/shadow-libraries/.

Lobato, Ramon, and Leah Tang. "The Cyberlocker Gold Rush: Tracking the
Rise of File-Hosting Sites as Media Distribution Platforms."
International Journal of Cultural Studies, November 2013.

Losowsky, Andrew. "Book Downloading Site Targeted in Injunctions
Requested by 17 Publishers." Huffington Post, last edited on February
2012 and archived on October 14, 2014.
[http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction\_n\_1280383.html](http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction_n_1280383.html).

Papacharissi, Zizi. "The Virtual Sphere: The Internet as a Public
Sphere." New Media & Society 4, no. 1 (February 2002): 9--27.

Priest, Eric. "The Future of Music and Film Piracy in China." Berkeley
Technology Law Journal 21 (2006): 795.

Salmon, Ricardo, Jimmy Tran, and Abdolreza Abhari. "Simulating a File
Sharing System Based on BitTorrent." In Proceedings of the 2008 Spring
Simulation Multiconference, 21:1--21:5. SpringSim '08. San Diego, CA,
USA: Society for Computer Simulation International, 2008.

Shirky, Clay. Here Comes Everybody: the Power of Organizing Without
Organizations. New York: Penguin Press, 2008.

Star, Susan Leigh, and Geoffrey C. Bowker. "How to Infrastructure." In
Handbook of New Media: Social Shaping and Social Consequences of ICTs,
Updated Student Edition, 230--46. SAGE Publications Ltd, 2010.

Stuart, Mary. "Creating a National Library for the Workers' State: the
Public Library in Petrograd and the Rumiantsev Library Under Bolshevik
Rule." The Slavonic and East European Review 72, no. 2 (April 1994):
233--58.

---------. "'The Ennobling Illusion': the Public Library Movement in
Late Imperial Russia." The Slavonic and East European Review 76, no. 3
(July 1998): 401--40.

---------. "The Evolution of Librarianship in Russia: the Librarians of
the Imperial Public Library, 1808-1868." The Library Quarterly 64, no.
1 (January 1994): 1--29.

Timpanaro, J.P., T. Cholez, I. Chrisment, and O. Festor. "BitTorrent's
Mainline DHT Security Assessment." In 2011 4th IFIP International
Conference on New Technologies, Mobility and Security (NTMS), 1--5,
2011.

TPB. "Worlds most resiliant tracking." Last edited November 17, 2009 and
archived on August 4, 2014.
[thepiratebay.se/blog/175](http://web.archive.org/web/20140804015645/http://thepiratebay.se/blog/175).

Vik. "Gigapedia: The greatest, largest and the best website for
downloading eBooks." Emotionallyspeaking.com. Last edited on August 10,
2009 and archived on July 15, 2012.
[http://vikas-gupta.in/2009/08/10/gigapedia-the-greatest-largest-and-the-best-website-for-downloading-free-e-books/](http://archive.is/g205).

::: {#footnotes-2025 .footnotes}
::: {.footnotedivider}
:::

1. [Victor Hugo, Works of Victor Hugo (New York: Nottingham Society,
1907), 230. [[↩](#fnref-2025-1)]{.footnotereverse}]{#fn-2025-1}
2. [Lawrence Liang, "Shadow Libraries E-Flux," 2012.
[[↩](#fnref-2025-2)]{.footnotereverse}]{#fn-2025-2}
3. [Joseph McKendrick, Libraries: At the Epicenter of the Digital
Disruption, The Library Resource Guide Benchmark Study on 2013/14
Library Spending Plans (Unisphere Media, 2013).
[[↩](#fnref-2025-3)]{.footnotereverse}]{#fn-2025-3}
4. ["Archive Fever: a Freudian Impression," Diacritics 25, no. 2
(July 1995): 9--63.
[[↩](#fnref-2025-4)]{.footnotereverse}]{#fn-2025-4}
5. [Yochai Benkler, The Wealth of Networks: How Social Production
Transforms Markets and Freedom (New Haven: Yale University Press,
2006), 92; Paul DiMaggio et al., "Social Implications of the
Internet," Annual Review of Sociology 27 (January 2001): 320; Zizi
Papacharissi, "The Virtual Sphere: The Internet as a Public Sphere,"
New Media & Society 4.1 (2002): 9--27; Craig Calhoun, "Information
Technology and the International Public Sphere," in Shaping the
Network Society: the New Role of Civil Society in Cyberspace, ed.
Douglas Schuler and Peter Day (MIT Press, 2004), 229--52.
[[↩](#fnref-2025-5)]{.footnotereverse}]{#fn-2025-5}
6. [Benkler, The Wealth of Networks, 442; Manuel Castells,
"Communication, Power and Counter-Power in the Network Society,"
International Journal of Communication (2007): 251; Lawrence
Lessig, Free Culture: How Big Media Uses Technology and the Law to
Lock Down Culture and Control Creativity (The Penguin Press, 2004);
Clay Shirky, Here Comes Everybody: the Power of Organizing Without
Organizations (New York: Penguin Press, 2008), 153.
[[↩](#fnref-2025-6)]{.footnotereverse}]{#fn-2025-6}
7. [Brian R. Day, "In Defense of Copyright: Creativity, Record Labels,
and the Future of Music," Seton Hall Journal of Sports and
Entertainment Law, 21.1 (2011); William M. Landes and Richard A.
Posner, The Economic Structure of Intellectual Property Law
(Harvard University Press, 2003). For further discussion see
Steve P. Calandrillo, "An Economic Analysis of Property Rights in
Information: Justifications and Problems of Exclusive Rights,
Incentives to Generate Information, and the Alternative of a
Government-Run Reward System," Fordham Intellectual Property, Media
& Entertainment Law Journal 9 (1998): 306; Julie Cohen, "Creativity
and Culture in Copyright Theory," U.C. Davis Law Review 40 (2006):
1151; Justin Hughes, "The Philosophy of Intellectual Property,"
Georgetown Law Journal 77 (1988): 303.
[[↩](#fnref-2025-7)]{.footnotereverse}]{#fn-2025-7}
8. [[piracylab.org](http://piracylab.org).
[[↩](#fnref-2025-8)]{.footnotereverse}]{#fn-2025-8}
9. ["Set the Fox to Watch the Geese: Voluntary IP Regimes in Piratical
File-Sharing Communities," in Piracy: Leakages from Modernity
(Litwin Books, LLC, 2012).
[[↩](#fnref-2025-9)]{.footnotereverse}]{#fn-2025-9}
10. ["The Future of Music and Film Piracy in China," Berkeley
Technology Law Journal 21 (2006): 795.
[[↩](#fnref-2025-10)]{.footnotereverse}]{#fn-2025-10}
11. ["The Cyberlocker Gold Rush: Tracking the Rise of File-Hosting Sites
as Media Distribution Platforms," International Journal of Cultural
Studies (2013).
[[↩](#fnref-2025-11)]{.footnotereverse}]{#fn-2025-11}
12. [The injunctions name I\* and F\* N\* (also known as Smiley).
[[↩](#fnref-2025-12)]{.footnotereverse}]{#fn-2025-12}
13. ["Publishers Strike Major Blow against Internet Piracy," last
modified February 15, 2012 and archived on January 10, 2014,
[http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy](http://web.archive.org/web/20140110160254/http://www.internationalpublishers.org/ipa-press-releases/286-publishers-strike-major-blow-against-internet-piracy).
[[↩](#fnref-2025-13)]{.footnotereverse}]{#fn-2025-13}
14. [Including the German Publishers and Booksellers Association,
Cambridge University Press, Georg Thieme, Harper Collins, Hogrefe,
Macmillan Publishers Ltd., Cengage Learning, Elsevier, John Wiley &
Sons, The McGraw-Hill Companies, Pearson Education Ltd., Pearson
Education Inc., Oxford University Press, Springer, Taylor & Francis,
C.H. Beck as well as Walter De Gruyter. The legal proceedings are
also supported by the Association of American Publishers (AAP), the
Dutch Publishers Association (NUV), the Italian Publishers
Association (AIE) and the International Association of Scientific
Technical and Medical Publishers (STM).
[[↩](#fnref-2025-14)]{.footnotereverse}]{#fn-2025-14}
15. [Andrew Losowsky, "Book Downloading Site Targeted in Injunctions
Requested by 17 Publishers," Huffington Post, accessed on
September 1, 2014,
[http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction\_n\_1280383.html](http://www.huffingtonpost.com/2012/02/15/librarynu-book-downloading-injunction_n_1280383.html).
[[↩](#fnref-2025-15)]{.footnotereverse}]{#fn-2025-15}
16. [International Publishers Association.
[[↩](#fnref-2025-16)]{.footnotereverse}]{#fn-2025-16}
17. [Vik, "Gigapedia: The greatest, largest and the best website for
downloading eBooks," Emotionallyspeaking.com, last edited on August
10, 2009 and archived on July 15, 2012,
[http://vikas-gupta.in/2009/08/10/gigapedia-the-greatest-largest-and-the-best-website-for-downloading-free-e-books/](http://archive.is/g205).
[[↩](#fnref-2025-17)]{.footnotereverse}]{#fn-2025-17}
18. [Anonymous author, "Library.nu: Modern era's 'Destruction of the
Library of Alexandria,'" Breaking Culture (on tumblr.com), last
edited on February 16, 2012 and archived on January 14, 2014,
[http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip](https://web.archive.org/web/20140113135846/http://breakingculture.tumblr.com/post/17697325088/gigapedia-rip).
[[↩](#fnref-2025-18)]{.footnotereverse}]{#fn-2025-18}
19. [[http://torrentfreak.com/book-publishers-shut-down-library-nu-and-ifile-it-120215](https://web.archive.org/web/20140110050710/http://torrentfreak.com/book-publishers-shut-down-library-nu-and-ifile-it-120215)
archived on January 10, 2014.
[[↩](#fnref-2025-19)]{.footnotereverse}]{#fn-2025-19}
20. [[http://www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-20)]{.footnotereverse}]{#fn-2025-20}
21. [[http://www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-21)]{.footnotereverse}]{#fn-2025-21}
22. [[www.reddit.com/r/trackers/comments/ppfwc/librarynu\_admin\_the\_website\_is\_shutting\_down\_due](https://web.archive.org/web/20140110050450/http://www.reddit.com/r/trackers/comments/ppfwc/librarynu_admin_the_website_is_shutting_down_due)
archived on January 10, 2014.
[[↩](#fnref-2025-22)]{.footnotereverse}]{#fn-2025-22}
23. [This point is made at length in the report on media piracy in
emerging economies, released by the American Assembly in 2011. See
Joe Karaganis, ed. Media Piracy in Emerging Economies (Social
Science Research Network, March 2011),
[http://piracy.americanassembly.org/the-report/](http://piracy.americanassembly.org/the-report/), I.
[[↩](#fnref-2025-23)]{.footnotereverse}]{#fn-2025-23}
24. [Lobato and Tang, "The Cyberlocker Gold Rush."
[[↩](#fnref-2025-24)]{.footnotereverse}]{#fn-2025-24}
25. [Lobato and Tang, "The Cyberlocker Gold Rush," 9.
[[↩](#fnref-2025-25)]{.footnotereverse}]{#fn-2025-25}
26. [Lobato and Tang, "The Cyberlocker Gold Rush," 7.
[[↩](#fnref-2025-26)]{.footnotereverse}]{#fn-2025-26}
27. [GIMEL/viewtopic.php?f=8&t=169; GIMEL/viewtopic.php?f=17&t=299.
[[↩](#fnref-2025-27)]{.footnotereverse}]{#fn-2025-27}
28. [GIMEL/viewtopic.php?f=17&t=299.
[[↩](#fnref-2025-28)]{.footnotereverse}]{#fn-2025-28}
29. [GIMEL/viewtopic.php?f=8&t=169. All quotes translated from Russian
by the authors, unless otherwise noted.
[[↩](#fnref-2025-29)]{.footnotereverse}]{#fn-2025-29}
30. [GIMEL/viewtopic.php?f=8&t=6999&p=41911.
[[↩](#fnref-2025-30)]{.footnotereverse}]{#fn-2025-30}
31. [GIMEL/viewtopic.php?f=8&t=757.
[[↩](#fnref-2025-31)]{.footnotereverse}]{#fn-2025-31}
32. [In this sense, we see our work as complementary to but not
exhausted by infrastructure studies. See Geoffrey C. Bowker and
Susan Leigh Star, Sorting Things Out: Classification and Its
Consequences (The MIT Press, 1999); Paul N. Edwards, "Y2K:
Millennial Reflections on Computers as Infrastructure," History and
Technology 15.1-2 (1998): 7--29; Paul N. Edwards, "Infrastructure
and Modernity: Force, Time, and Social Organization in the History
of Sociotechnical Systems," in Modernity and Technology, 2003,
185--225; Paul N. Edwards et al., "Introduction: an Agenda for
Infrastructure Studies," Journal of the Association for Information
Systems 10.5 (2009): 364--74; Brian Larkin, "Degraded Images,
Distorted Sounds: Nigerian Video and the Infrastructure of Piracy,"
Public Culture 16.2 (2004): 289--314; Brian Larkin, "Pirate
Infrastructures," in Structures of Participation in Digital
Culture, ed. Joe Karaganis (New York: SSRC, 2008), 74--87; Susan
Leigh Star and Geoffrey C. Bowker, "How to Infrastructure," in
Handbook of New Media: Social Shaping and Social Consequences of
ICTs (SAGE Publications Ltd, 2010), 230--46.
[[↩](#fnref-2025-32)]{.footnotereverse}]{#fn-2025-32}
33. [For information on cryptographic hashing see Praveen Gauravaram and
Lars R. Knudsen, "Cryptographic Hash Functions," in Handbook of
Information and Communication Security, ed. Peter Stavroulakis and
Mark Stamp (Springer Berlin Heidelberg, 2010), 59--79.
[[↩](#fnref-2025-33)]{.footnotereverse}]{#fn-2025-33}
34. [See GIMEL/viewtopic.php?f=8&t=55kj and
GIMEL/viewtopic.php?f=8&t=18&sid=936.
[[↩](#fnref-2025-34)]{.footnotereverse}]{#fn-2025-34}
35. [GIMEL/viewtopic.php?f=8&t=714.
[[↩](#fnref-2025-35)]{.footnotereverse}]{#fn-2025-35}
36. [GIMEL/viewtopic.php?f=8&t=47.
[[↩](#fnref-2025-36)]{.footnotereverse}]{#fn-2025-36}
37. [GIMEL/viewtopic.php?f=17&t=175&hilit=RR&start=25.
[[↩](#fnref-2025-37)]{.footnotereverse}]{#fn-2025-37}
38. [GIMEL/viewtopic.php?f=17&t=104&start=450.
[[↩](#fnref-2025-38)]{.footnotereverse}]{#fn-2025-38}
39. [URL redacted. These numbers should be taken as a very rough
estimate because 1) we do not consider Alexa to be a reliable source
for web traffic and 2) some of the other figures cited in the report
are suspicious. For example, Aleph has a relatively small archive
of foreign fiction, at odds with the reported figure of 800,000
volumes. [[↩](#fnref-2025-39)]{.footnotereverse}]{#fn-2025-39}
40. [GIMEL/viewtopic.php?f=17&t=7061.
[[↩](#fnref-2025-40)]{.footnotereverse}]{#fn-2025-40}
41. ["The BitTorrent Protocol Specification," last modified October 20,
2012 and archived on June 13, 2014,
[http://www.bittorrent.org/beps/bep\_0003.html](http://web.archive.org/web/20140613190300/http://www.bittorrent.org/beps/bep_0003.html).
[[↩](#fnref-2025-41)]{.footnotereverse}]{#fn-2025-41}
42. [For more information on BitTorrent, see Bram Cohen, Incentives
Build Robustness in BitTorrent, last modified on May 22, 2003,
[http://www.bittorrent.org/bittorrentecon.pdf](http://www.bittorrent.org/bittorrentecon.pdf);
Ricardo Salmon, Jimmy Tran, and Abdolreza Abhari, "Simulating a File
Sharing System Based on BitTorrent," in Proceedings of the 2008
Spring Simulation Multiconference, SpringSim '08 (San Diego, CA,
USA: Society for Computer Simulation International, 2008), 21:1--5.
[[↩](#fnref-2025-42)]{.footnotereverse}]{#fn-2025-42}
43. [In 2008 The Pirate Bay co-founders Peter Sunde, Gottfrid
Svartholm Warg, Fredrik Neij, and Carl Lundstrom were charged
with "conspiracy to break copyright related offenses" in Sweden. See
Simon Johnson for Reuters.com, "Pirate Bay Copyright Test Case
Begins in Sweden," last edited on February 16, 2009 and archived on
August 4, 2014,
[http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216](http://web.archive.org/web/20140804000829/http://uk.reuters.com/article/2009/02/16/tech-us-sweden-piratebay-idUKTRE51F3K120090216).
[[↩](#fnref-2025-43)]{.footnotereverse}]{#fn-2025-43}
44. [TPB, "Worlds most resiliant tracking," last edited November 17,
2009 and archived on August 4, 2014,
[thepiratebay.se/blog/175](http://web.archive.org/web/20140804015645/http://thepiratebay.se/blog/175).
[[↩](#fnref-2025-44)]{.footnotereverse}]{#fn-2025-44}
45. [GIMEL/viewtopic.php?f=8&t=6999.
[[↩](#fnref-2025-45)]{.footnotereverse}]{#fn-2025-45}
46. [Thibault Cholez, Isabelle Chrisment, and Olivier Festor, "Evaluation
of Sybil Attacks Protection Schemes in KAD," in Scalability of
Networks and Services, ed. Ramin Sadre and Aiko Pras, Lecture Notes
in Computer Science 5637 (Springer Berlin Heidelberg, 2009), 70--82.
[[↩](#fnref-2025-46)]{.footnotereverse}]{#fn-2025-46}
47. [J.P. Timpanaro et al., "BitTorrent's Mainline DHT Security
Assessment," in 2011 4th IFIP International Conference on New
Technologies, Mobility and Security (NTMS), 2011, 1--5.
[[↩](#fnref-2025-47)]{.footnotereverse}]{#fn-2025-47}
48. [Ernesto, "US P2P Lawsuit Shows Signs of a 'Pirate Honeypot',"
Technology, TorrentFreak, last edited in June 2011 and archived on
January 14, 2014,
[http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/](https://web.archive.org/web/20140114200326/http://torrentfreak.com/u-s-p2p-lawsuit-shows-signs-of-a-pirate-honeypot-110601/).
[[↩](#fnref-2025-48)]{.footnotereverse}]{#fn-2025-48}
49. [Benkler, The Wealth of Networks, 60.
[[↩](#fnref-2025-49)]{.footnotereverse}]{#fn-2025-49}
50. [On the free and public library movement in England and the United
States see Thomas Greenwood, Public Libraries: a History of the
Movement and a Manual for the Organization and Management of Rate
Supported Libraries (Simpkin, Marshall, Hamilton, Kent, 1890);
Elizabeth Akers Allen and James Phinney Baxter, Dedicatory
Exercises of the Baxter Building (Auburn, Me: Lakeside Press,
1889). To read more about the history of free and public library
movements in Russia see Mary Stuart, "The Evolution of Librarianship
in Russia: the Librarians of the Imperial Public Library,
1808-1868," The Library Quarterly 64.1 (January 1994): 1--29; Mary
Stuart, "Creating a National Library for the Workers' State: the
Public Library in Petrograd and the Rumiantsev Library Under
Bolshevik Rule," The Slavonic and East European Review 72.2 (April
1994): 233--58; Mary Stuart, "The Ennobling Illusion: the Public
Library Movement in Late Imperial Russia," The Slavonic and East
European Review 76.3 (July 1998): 401--40.
[[↩](#fnref-2025-50)]{.footnotereverse}]{#fn-2025-50}
51. [Michael H. Harris, History of Libraries of the Western World
(London: Scarecrow Press, 1999), 136.
[[↩](#fnref-2025-51)]{.footnotereverse}]{#fn-2025-51}
52. [http://s\.d\.ru/comments/508985/.
[[↩](#fnref-2025-52)]{.footnotereverse}]{#fn-2025-52}
53. [RR/forum/viewtopic.php?t=1590026.
[[↩](#fnref-2025-53)]{.footnotereverse}]{#fn-2025-53}
54. [Aaron Halfaker et al., "The Rise and Decline of an Open Collaboration
System: How Wikipedia's Reaction to Popularity Is Causing Its
Decline," American Behavioral Scientist, December 2012.
[[↩](#fnref-2025-54)]{.footnotereverse}]{#fn-2025-54}
:::

:::


Article printed from Computational Culture:
http://computationalculture.net

URL to article:
http://computationalculture.net/book-piracy-as-peer-preservation/


Copyright © 2012 Computational Culture. All rights reserved.
