Barok
Poetics of Research
2014


_An unedited version of a talk given at the conference [Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._

_Bracketed sequences are to be reformulated._

Poetics of Research

In this talk I'm going to attempt to identify particular cultural
algorithms, i.e. processes in which cultural practices and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice to be of the same rank as theory and
poetics, and holding that theorization of practice can also lead to the
identification of poetical devices.

The primary motivation for this talk is an attempt to figure out where we
stand as operators, users and communities gathering around infrastructures
containing a massive body of text (among other things), and what sort of
things might be considered to make a difference, or to keep making a
difference.

The talk mainly considers the role of text and the word in research, by way
of several figures.

A

A reference, list, scheme, table, index: those things that intervene in the
flow of narrative, illustrating the point, perhaps more economically than
linear text would do. Yet they don't function as pictures; they are
primarily texts, arranged in figures. Their forms have been standardised
over centuries and withstood the transition to the digital without any
significant change, being completely intuitive to the modern reader. Compared
to the body of text they are secondary, running parallel to it. Their
function is however different from that of punctuation. They are there
neither to shape the narrative nor to aid structuring the argument into
logical blocks. Nor is their function spatial, as in visual poems. Their
positions within a document are determined according to the sequential order
of the text, standing as attachments, and they are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The premise of my talk is that these
_textual figures_ also came to serve as the abstract models determining
possible relations among documents as such, and in consequence to structure
the conditions of research.

B

It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be consulted in existing
documents, or generate new documents based on collecting data in the field
and through experiment, before proceeding to reasoning, arguments and
deductions. The formulation of a query is determined by the protocols
providing access to documents, which means that there is a difference between
collecting data outside the archive (the undocumented, i.e. in the field and
through experiment), consulting with a person--an archivist (expert,
librarian, documentalist)--and consulting with a database storing documents.
Phenomena such as the deepening of specialization and thorough digitization
have privileged the database as the fundamental means of research. Obviously,
this is a very recent phenomenon. Queries were once formulated in natural
language; now, given the fact that databases are queried using the SQL
language, their interfaces are mere extensions of it, and researchers pose
their questions by manipulating dropdowns, checkboxes and input boxes mashed
together on a flat screen run by software that in turn translates them into a
long line of conditioned _SELECTs_ and _JOINs_ performed on tables of data.
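A toy analogue of that translation can be sketched in the shell, with a flat
tab-separated stream standing in for a database table; the catalogue fields
(author, title, year) are invented for the example and stand for no actual
library system:

```shell
# A hypothetical catalogue "table": author, title, year.
# The awk condition plays the role of the WHERE clause in:
#   SELECT title FROM catalogue WHERE year > 1900;
printf '%s\t%s\t%s\n' \
  'Chambers' 'Cyclopaedia' '1728' \
  'Kilgour' 'The Evolution of the Book' '1998' \
| awk -F'\t' '$3 > 1900 { print $2 }'
```

The dropdowns and input boxes of a search form do no more than fill in such
conditions before the query is handed to the database.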

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to flesh and paper, has been entrusted to
the digital and networked. Researchers are querying the black box.

C

Searching in a collection of amassed tangible documents (i.e. a bookshelf) is
different from searching in a systematically structured repository (a
library), and even more so from searching in a digital repository (a digital
library). Not that they are mutually exclusive. One can devise structures and
algorithms to search through a printed text, or read books in a library one
by one. They are rather models embodying various processes associated with
the query. These properties of the query might be called the sequence, the
structure and the index. If they are present in the ways of querying
documents, and we will return to this issue, are they persistent within the
inquiry as such?

D

This question itself is a rupture in the sequence. It makes a demand to
depart from one narrative, a continuous flow of words, to another, to figure
something out while remaining bound to it (it would be even more so with a
so-called rhetorical question). So there has been one sequence, or line, of
the inquiry--about the kinds of the query and its properties. That sequence
itself is a digression from within the sequence about what research is,
describing its parts (queries). We are thus returning to it, and continue
with the question whether the properties of the inquiry are the same as the
properties of the query.

E

But isn't it true that every single utterance occurring in a sequence yields
a query as well? Let's consider the word _utterance_. It can produce a number
of associations, for example with how Foucault employs the notion of _énoncé_
in his _Archaeology of Knowledge_, giving a hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact each choice would have
on his reception in the Anglophone world. Limiting ourselves to textual forms
for now (and not translating his work but pursuing a different inquiry), let
us say the utterance is a word (or a phrase, or an idiom) in a sequence such
as a sentence, a paragraph, or a document.

## (F) The structure

This distinction is as old as recorded Western thought, since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's
_Categories_ rests on the notion of words on their own, and these are made
the subject-matter of that inquiry. For him, the ambiguity of connotation
that words produce lies in their synonymity, understood differently from the
moderns--not as more words denoting a similar thing but rather one word
denoting various things. The categories were outlined as a device to
differentiate among words according to the kinds of these things. Every word
as such belonged to no less and no more than one of ten categories.

So it happens to the word _utterance_, as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might turn out the most appropriate in a given
context. The more context, the more precise a share comes to the fore. When
taken out of context, ambiguity prevails as the spectrum unveils its variety.

Thus single words, like any other utterances, are questions, queries, in
themselves, and by occurring in statements, in context, their meanings are
singled out.

This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.

### (G) The structure: words as words

* P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation
from Philitas, early 2nd c. CE.
1(http://163.1.169.40/cgi-bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---20-about-2260--00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
2(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-highlight.jpeg)

* Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, 1728, p. 210.
3(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)

* Detail from the Liddell-Scott Greek-English Lexicon, c1843.

Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia; contrasted to the same word meaning "milk pail" in Homer's _Iliad_.

Not much has changed in the way dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria_ are applied to filtering and sorting them, and in turn the
spectrum of denoted things allocated in this way is structured into groups
and subgroups, which are then given, according to another set of rules,
shorter or longer names. These constitute the facets of the potential
meanings of a word.

So there are at least _four_ sets of conditions structuring dictionaries. One
is required to delimit a corpus of texts, one to select and give weight to
occurrences of a word, another to cluster them, and yet another to generalize
the subject-matter of each of these clusters. Needless to say, this is a
craft of a few, and these criteria are rarely disclosed, despite their impact
on research and, more generally, their influence as conditions for the
production of so-called _common sense_.

It doesn't take that much to reimagine what a dictionary is and what it could
be, especially with large specialized corpora of texts at hand. These can
also serve as aids in the production of new words and new meanings.
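A minimal sketch of the first two of those conditions--delimiting a corpus
and pulling out occurrences of a word with enough context to cluster
them--can be improvised with standard tools; the three-line corpus here is
invented for the example:

```shell
# Each line stands for one statement in the delimited corpus.
corpus='the pella was a wine cup in boeotia
homer uses pella for a milk pail
a cup and a pail are both vessels'

# Occurrences of "pella" with up to three words of context on
# either side -- the raw material a lexicographer clusters into senses.
printf '%s\n' "$corpus" | grep -o -E '([[:alnum:]]+ ){0,3}pella( [[:alnum:]]+){0,3}'
```

From such lists of occurrences-in-context the grouping into senses (wine cup,
milk pail) begins.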

### (H) The structure: words as knowledge and the world

* Boethius's rendering of a classification tree described in Porphyry's
Isagoge (3rd c.), [6th c.] 10th c.
4(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)

* Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, London, 1728, p. II.
5(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)

* Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_, 1751.
6(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-connaissances-humaines)

* Ernst Haeckel, Stammbaum des Menschen (Darwin's tree), 1874.

Another _formalized_ and internalized process at play when figuring out a
word is its containment. A word is not only structured by way of the things
it potentially denotes but also by the words it is potentially part of and
those it contains.

The fuzziness around the categorization of knowledge _and_ the world in
Western thought can be traced back to Porphyry, if not further. In his
introduction to Aristotle's _Categories_, this 3rd-century AD Neoplatonist
began expanding the notions of genus and species into their hypothetical
consequences. Aristotle's brief work outlines ten categories of 'things that
are said' (legomena, λεγόμενα), namely substance (or substantive, {not the
same as matter!}, οὐσία), quantity (ποσόν), qualification (ποιόν), a relation
(πρός), where (ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or
state, condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In
another work, _Topics_, Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (όρος), a genus (γένος), a property (ἴδιος), and an
accident (συμβεβηκός). Porphyry does not explicitly refer to _Topics_, and
says he omits speaking "about genera and species, as to whether they subsist
(in the nature of things) or in mere conceptions only"
8(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion,
as the following passage suggests:

> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
9(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))

Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. By employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation came to bear more resemblance to the
perceived _structures_ of the world. So they began to bloom.

There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification, implying the figure
of a tree, into the locus of Aristotle's thought. Knowledge became
monotheistic.

Classification schemes growing from a single point play a major role in
untangling the format of the modern encyclopedia from that of the dictionary
governed by the alphabet. Two of the most influential encyclopedias of the
18th century are cases in point. Although still keeping 'dictionary' in their
titles, they are conceived to represent not words but knowledge. The
uppermost genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches, "natural and scientifical" and "artificial and
technical"; these further split down into 47 classes in total, each carrying
a structured list (on the following pages) of thematic articles, serving as a
table of contents. The French _Encyclopedia: or a Systematic Dictionary of
the Sciences, Arts, and Crafts_ (1751) unwinds from understanding
(_entendement_) and branches into memory as history, reason as philosophy,
and imagination as poetry. The logic of containers was employed as an aid not
only to deal with the enormous task of naming, and of not omitting anything
from what is known, but also for the management of the labour of hundreds of
writers and researchers, to create a mechanism for delegating work and
distributing responsibilities. Flesh was also more present, in the field
research, with researchers attending workshops and sites of everyday life to
annotate them.

The world came forward to outshine the word in other schemes. Darwin's tree
of evolution and some of the modern document classification systems, such as
Charles A. Cutter's _Expansive Classification_ (1882), set out to classify
the world itself, and prepared the field for what has come to be known as the
authority lists structuring metadata in today's computing.

### The structure (summary)

Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.

While lexicographers structure thought through multi-layered processes of
abstraction of the written record, knowledge growers dissect it into
hierarchies of mutually contained notions.

One seeks to describe the word as a faceted list of small worlds, the other
to describe the world as a structured list of words. One holds primacy in the
domain of epistemology, in what is known, controlling the vocabulary; the
other in the domain of ontology, in what is, controlling reality.

Every word has its given things, every thing has its place, closer to or
further from a single word.

The schism between classifying words and classifying the world implies that
it is not possible to construct a universal classification scheme. On top of
that, any classification system of words is bound to the corpus of texts it
operates upon, and any classification system of the world again operates with
words which are bound to a lexicon, which is again bound to a corpus of
texts. That doesn't prevent people from trying. Classifications function as
descriptors of, and 'inscriptors' upon, the world, imprinting their
authority. They operate from the locus of their corpus-specificity. The
larger the corpus, the more power it has in shaping the world, as far as the
word shapes it (yes, I do imply Google here, for which this is a domain to be
potentially exploited).

## (J) The sequence

The structure-yielding query of the single word narrows and becomes more
precise with preceding and following words. Inquiry proceeds in a flow that
establishes another kind of relationality, chaining words into a sequence.
While the structuring property of the query sets words apart from each other,
its sequential property establishes continuity and brings these units into an
ordered set.

This is what is responsible for attaching the textual figures mentioned
earlier (lists, schemes, tables) to the body of the text. Associations can
also be stated explicitly, by indexing tables and then referring to them from
a particular point in the text. The same goes for explicit associations made
between blocks of text by means of indexed paragraphs, chapters or pages.

From this it follows that all utterances point to the following utterance by
the nature of sequential order, and indexing provides the means for pointing
elsewhere in the document as well.

A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago and which is online
10(http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units, each of which is assigned a
numbered position in the sequence of the document. From this perspective
digitization can be viewed as a total indexation of the document. It is
converted into units rendered for machine operations. Its sequentiality is
made explicit, by means of an underlying index.
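As a concrete miniature of that total indexation (the sentence is an
arbitrary example), each word-unit of a digitized sequence can be given its
numbered position:

```shell
# Break a "document" into word-units and assign each its
# position in the sequence -- an underlying index made explicit.
echo 'researchers are querying the black box' | tr ' ' '\n' | nl -ba
```

The same could be done at the level of characters, lines, or pages; the unit
changes, the principle of positional addressing does not.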

Sequences and chains are orders of one dimension. Their one-dimensional
ordering makes each element addressable and allows random access. Jumps
between arbitrary addresses are still sequential, processing elements one at
a time.

## (K) The index

* Summa confessorum [1297-98], 1310.
7(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)

Sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now, when I am saying the word _utterance_, each time contexts surface in
which I have used it earlier.

A long quote from Frederick G. Kilgour, _The Evolution of the Book_, 1998,
pp. 76-77:

> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinning?')

>

> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.

>

> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"

In one sense neither the subject-index nor the concordance are indexes; they
are words or groups of words selected according to given criteria from the
body of the text, each accompanied by a list of identifiers. These
identifiers are elements of an index, whether they represent a page, chapter,
column, or another kind of block of text. Every identifier is a unique
_address_.
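Under that description, an index is easy to sketch: words from a body of
text, each accompanied by a list of addresses. Here line numbers stand in for
pages or columns, and the three-line text is invented for the example:

```shell
# Build a rudimentary subject-index: every word listed with the
# line numbers (addresses) at which it occurs.
printf '%s\n' 'substance is itself genus' \
              'under substance is body' \
              'under body is animal' \
| awk '{ for (i = 1; i <= NF; i++) occ[$i] = occ[$i] " " NR }
       END { for (w in occ) print w ":" occ[w] }' \
| sort
```

A real subject-index differs only in selecting its words by criteria rather
than taking all of them, and in the kind of block its addresses point to.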

The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, where each element is given a unique
combination of symbols. Different sizes of sets yield different numbers of
variations. Symbol sets such as the alphabet, arabic numerals, roman
numerals, and binary digits have different proportions between the length of
a string of symbols and the number of possible variations it can contain.
Thus two symbols of the English alphabet can store 26^2 distinct values, of
arabic numerals 10^2, of roman numerals 7^2, and of binary digits 2^2.
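The arithmetic can be checked directly (roman numerals counted as the seven
symbols I, V, X, L, C, D, M):

```shell
# Distinct addresses encodable by a two-symbol string:
# (size of symbol set) squared.
for base in 26 10 7 2; do
  echo "$base symbols: $((base * base)) addresses"
done
```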

Indexation is segmentation, a breaking into segments. From as early as the
13th century, the index, such as that of sections, has served as an enabler
of search. The more detailed the indexation, the more precise the search
results it enables.

The subject-index and the concordance are tables of search results. There is
a direct lineage from the 13th-century biblical concordances to the birth of
computational linguistic analysis: both were initiated and realised by
priests.

During World War II, the Jesuit Father Roberto Busa began to look for
machines for the automation of the linguistic analysis of the 11-million-word
Latin corpus of Thomas Aquinas and related authors.

Working on his PhD thesis on the concept of _praesens_ in Aquinas, he
realised two things:

> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has to precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach to the words he reads the significance they have in his mind, but
should try to find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic
of being which generates the basic structures of human discourse. It is this
basic logic that allows the transfer from what the words mean today to what
they meant to the writer.

>

> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
11(http://www.alice.id.tue.nl/references/busa-1980.pdf)

Collaborating with IBM in New York from 1949, the work, a concordance of all
the words of Thomas Aquinas, was finally published in the 1970s in 56 printed
volumes (a version has been online since 2005
12(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for the automatic lemmatization of Latin words was created
by a team of ten priests in the scope of two years (in two phases: grouping
all the forms of an inflected word under their lemma, and coding the
morphological categories of each form and lemma), containing 150,000 forms
13(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father Busa
has been dubbed the father of humanities computing and, recently, also of
digital humanities.

The subject-index has a crucial role in the printed book. It is the only
means of search the book offers. The subjects composing an index can be
selected according to a classification scheme (specific to the field of an
inquiry), for example as elements of a certain degree (with a given minimum
number of subclasses).

Its role seemingly vanishes in digital text. But it can easily be
transformed. Besides serving as a table of pre-searched results, the
subject-index also gives a distinct idea about the content of the book. Two
patterns give us a clue: the numbers of occurrences of selected words give
subjects their weights, while words that seem specific to the book outweigh
others even if they don't occur very often. A selection of these words then
serves as a descriptor of the whole text, and can be thought of as a specific
kind of 'tags'.

This process was formalized in a mathematical function in the 1970s, thanks
to a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as
the (logarithm of the) proportion of the total number of texts in the corpus
to the number of texts in which the word appears at least once. When
multiplied by the frequency of the word _in_ the text (divided by the maximum
frequency of any word in the text), we get _term frequency-inverse document
frequency_ (tf-idf). In this way we can get an automated list of the subjects
which are particular to a text when compared to a group of texts.

We have come to learn it by the practice of searching the web. It is a
mechanism not dissimilar to the thought process involved in retrieving
particular information online. And search engines have it built into their
indexing algorithms as well.

There is a paper proposing attaching words generated by tf-idf to hyperlinks
when referring to websites 14(http://bscit.berkeley.edu/cgi-
bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
The hyperlinks in the paper's references use this feature, and it can be
easily tested: 15(http://www.cs.berkeley.edu/~phelps/papers/dissertation-
abstract.html?lexical-
signature=notemarks+multivalent+semantically+franca+stylized).

There is another measure, cosine similarity, which takes tf-idf further and
can be applied to clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even as a way of organising a library bottom-up into novel categories, from
which new discourses could emerge. Or as an aid for researchers sorting
through texts, or even for editors producing interesting anthologies.
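The measure itself is a sketch away; here it is applied to two hypothetical
three-term tf-idf vectors, chosen to point in the same direction so that the
similarity comes out as 1:

```shell
# Cosine similarity: dot(a,b) / (|a| * |b|).
printf '%s\n' '1.0 2.0 0.0' '2.0 4.0 0.0' \
| awk 'NR == 1 { for (i = 1; i <= NF; i++) x[i] = $i }
       NR == 2 { for (i = 1; i <= NF; i++) y[i] = $i
                 for (i = 1; i <= NF; i++) { dot += x[i]*y[i]; nx += x[i]*x[i]; ny += y[i]*y[i] }
                 printf "%.4f\n", dot / (sqrt(nx) * sqrt(ny)) }'
```

Texts whose tf-idf vectors score close to 1 against each other would fall
into the same cluster; orthogonal vocabularies score 0.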

## Final remarks

1

New disciplines emerge all the time; most recently, for example, cultural
techniques, software studies, and media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are
not only sites of academic opportunity but, firstly, frameworks of new
perspectives for looking at the world, new domains of knowledge. From the
perspective of the researcher, partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific terms.
Creating new fields involves all that, and more. Even when one goes against
all disciplines.

2

Google can still surprise us.

3

Knowledge has been in the making for millennia. There have been abstract
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground on which
to discuss and experiment with dictionaries, classifications, indexes, and
tools for reference retrieval. These all belong to the poetic devices of
knowledge-making.

4

Command-line example of tf-idf and concordance in 3 steps.

* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words in the respective texts, ordered by frequency:

```shell
for i in {1..5}; do
  # split into lowercase words, count occurrences, sort by frequency
  tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]' '[\012*]' | tr -d '[:punct:]' \
    | sort | uniq -c | sort -k 1nr | sed '1,1d' > temp.txt
  # frequency of the most frequent word
  max=$(awk -vvar=1 -F" " 'NR == 1 {print $var}' temp.txt)
  # augmented term frequency: 0.5 + f/(2*max)
  awk -vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' temp.txt > freq.$i.txt
done && rm temp.txt
```

* 2. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of the 500 most frequent in respective lists), ordered by weight (specificity for each text):

```shell
for j in {1..5}; do
  rm -f freq.$j.txt.temp
  for i in {1..500}; do
    word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt)
    tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt)
    count=$(egrep -lw $word freq.?.txt | wc -l)
    idf=$(echo "1+l(5/$count)" | bc -l)
    tfidf=$(echo $tf*$idf | bc)
    echo $word $tfidf >> freq.$j.txt.temp
  done
  sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt
done
```

* 3. Process the files tfidf.1-5.txt and their source texts text.1-5.txt, and produce occ.txt with a concordance of the top 3 words from each of them:

```shell
rm -f occ.txt && for j in {1..5}; do
  echo "$j" >> occ.txt
  ptx -f -w 150 text.$j.txt > occ.$j.txt
  for i in {1..3}; do
    word=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' tfidf.$j.txt)
    egrep -i "[[:alpha:]] $word" occ.$j.txt >> occ.txt
  done
done
```

Dušan Barok

_Written 23 October - 1 November 2014 in Bratislava and Stuttgart._


Dockray
Interface Access Loss
2013


Interface Access Loss

I want to begin this talk at the end -- by which I mean the end of property - at least according to
the cyber-utopian account of things, where digital file sharing and online communication liberate
culture from corporations and their drive for profit. This is just one of the promised forms of
emancipation -- property, in a sense, was undone. People, on a massive scale, used their
computers and their internet connections to share digitized versions of their objects with each
other, quickly producing a different, common form of ownership. The crisis that this provoked is
well-known -- it could be described in one word: Napster. What is less recognized - because it is
still very much in process - is the subsequent undoing of property, of both the private and common
kind. What follows is one story of "the cloud" -- the post-dot-com bubble techno-super-entity -- which sucks up property, labor, and free time.

Object, Interface

It's debated whether the growing automation of production leads to global structural
unemployment or not -- Karl Marx wrote that "the self-expansion of capital by means of machinery
is thenceforward directly proportional to the number of the workpeople, whose means of
livelihood have been destroyed by that machinery" - but the promise is, of course, that when
robots do the work, we humans are free to be creative. Karl Kautsky predicted that increasing
automation would actually lead, not to a mass surplus population or widespread creativity, but
something much more mundane: the growth of clerks and bookkeepers, and the expansion of
unproductive sectors like "the banking system, the credit system, insurance empires and
advertising."

Marx was analyzing the number of people employed by some of the new industries in the middle
of the 19th century: "gas-works, telegraphy, photography, steam navigation, and railways." The
facts were that these industries were incredibly important, expansive and growing, highly
mechanized.. and employed a very small number of people. It is difficult not to read his study of
these technologies of connection and communication - against the background of our present
moment, in which the rise of the Internet has been accompanied by the deindustrialization of
cities, increased migrant and mobile labor, and jobs made obsolete by computation.

There are obvious examples of the impact of computation on the workplace: at factories and
distribution centers, robots engineered with computer-vision can replace a handful of workers,
with a savings of millions of dollars per robot over the life of the system. And there are less
apparent examples as well, like algorithms determining when and where to hire people and for
how long, according to fluctuating conditions.
Both examples have parallels within computer programming, namely reuse and garbage
collection. Code reuse refers to the practice of writing software in such a way that the code can be
used again later, in another program, to perform the same task. It is considered wasteful to give the
same time, attention, and energy to a function, because the development environment is not an
assembly line - a programmer shouldn't repeat. Such repetition then gives way to copy-and-pasting (or merely calling). The analogy here is to the robot, to the replacement of human labor
with technology.

Now, when a program is in the midst of being executed, the computer's memory fills with data -- but some of that is obsolete, no longer necessary for that program to run. If left alone, the memory
would become clogged, the program would crash, the computer might crash. It is the role of the
garbage collector to free up memory, deleting what is no longer in use. And here, I'm making the
analogy with flexible labor, workers being made redundant, and so on.

In Object-Oriented Programming, a programmer designs the software that she is writing around
“objects,” where each object is conceptually divided into “public” and “private” parts. The public
parts are accessible to other objects, but the private ones are hidden to the world outside the
boundaries of that object. It's a “black box” - a thing that can be known through its inputs and
outputs - even in total ignorance of its internal mechanisms. What difference does it make if the
code is written in one way versus another.. if it behaves the same? As William James wrote, “If no
practical difference whatever can be traced, then the alternatives mean practically the same thing,
and all dispute is idle.”

By merely having a public interface, an object is already a social entity. It makes no sense to even
provide access to the outside if there are no potential objects with which to interact! So to
understand the object-oriented program, we must scale up - not by increasing the size or
complexity of the object, but instead by increasing the number and types of objects such that their
relations become more dense. The result is an intricate machine with an on and an off state, rather
than a beginning and an end. Its parts are interchangeable -- provided that they reliably produce
the same behavior, the same inputs and outputs. Furthermore, this machine can be modified:
objects can be added and removed, changing but not destroying the machine; and it might be,
using Gerald Raunig’s appropriate term, “concatenated” with other machines.

Inevitably, this paradigm for describing the relationship between software objects spread outwards,
subsuming more of the universe outside of the immediate code. External programs, powerful
computers, banking institutions, people, and satellites have all been “encapsulated” and
“abstracted” into objects with inputs and outputs. Is this a conceptual reduction of the richness
and complexity of reality? Yes, but only partially. It is also a real description of how people,
institutions, software, and things are being brought into relationship with one another according to
the demands of networked computation.. and the expanding field of objects are exactly those
entities integrated into such a network.

Consider a simple example of decentralized file-sharing: its diagram might represent an object-oriented piece of software, but here each object is a person-computer, shown in potential relation
to every other person-computer. Files might be sent or received at any point in this machine,
which seems particularly oriented towards circulation and movement. Much remains private, but a
collection of files from every person is made public and opened up to the network. Taken as a
whole, the entire collection of all files - which on the one hand exceeds the storage capacity of
any one person’s technical hardware, is on the other hand entirely available to every person-computer. If the files were books.. then this collective collection would be a public library.

In order for a system like this to work, for the inputs and the outputs to actually engage with one
another to produce action or transmit data, there needs to be something in place already to enable
meaningful couplings. Before there is any interaction or any relationship, there must be some
common ground in place that allows heterogenous objects to ‘talk to each other’ (to use a phrase
from the business casual language of the Californian Ideology). The term used for such a common
ground - especially on the Internet - is platform, a word for that which enables and anticipates
future action without directly producing it. A platform provides tools and resources to the objects
that run “on top” of the platform so that those objects don't need to have their own tools and
resources. In this sense, the platform offers itself as a way for objects to externalize (and reuse)
labor. Communication between objects is one of the most significant actions that a platform can
provide, but it requires that the objects conform some amount of their inputs and outputs to the
specifications dictated by the platform.

But haven’t I only introduced another coupling, instead of between two objects, this time between
the object and the platform? What I'm talking about with "couplings" is the meeting point between
things - in other words, an “interface.” In the terms of OOP, the interface is an abstraction that
defines what kinds of interaction are possible with an object. It maps out the public face of the
object in a way that is legible and accessible to other objects. Similarly, computer interfaces like
screens and keyboards are designed to meet with human interfaces like fingers and eyes, allowing
for a specific form of interaction between person and machine. Any coupling between objects
passes through some interface and every interface obscures as much as it reveals - it establishes
the boundary between what is public and what is private, what is visible and what is not. The
dominant aesthetic values of user interface design actually privilege such concealment as “good
design,” appealing to principles of simplicity, cleanliness, and clarity.
Cloud, Access

One practical outcome of this has been that there can be tectonic shifts behind the interface - where entire systems are restructured or revolutionized - without any interruption, as long as the
interface itself remains essentially unchanged. In Pragmatism’s terms, a successful interface keeps
any difference (in back) from making a difference (in front). Using books again as an example: for
consumers to become accustomed to the initial discomfort of purchasing a product online instead
of from a shop, the interface needs to make it so that “buying a book” is something that could be
interchangeably accomplished either by a traditional bookstore or the online "marketplace"
equivalent. But behind the interface is Amazon, which through low prices and wide selection is
the most visible platform for buying books and uses that position to push retailers and publishers
both to, at best, the bare minimum of profitability.

In addition to selling things to people and collecting data about its users (what they look at and
what they buy) to personalize product recommendations, Amazon has also made an effort to be a
platform for the technical and logistical parts of other retailers. Ultimately collecting data from
them as well, Amazon realizes a competitive advantage from having a comprehensive, up-to-the-minute perspective on market trends and inventories. This volume of data is so vast and valuable
that warehouses packed with computers are constructed to store it, protect it, and make it readily
available to algorithms. Data centers, such as these, organize how commodities circulate (they run
business applications, store data about retail, manage fulfillment) but also - increasingly - they
hold the commodity itself - for example, the book. Digital book sales started the millennium very
slowly but by 2010 had overtaken hardcover sales.

Amazon’s store of digital books (or Apple’s or Google’s, for that matter) is a distorted reflection of
the collection circulating within the file-sharing network, displaced from personal computers to
corporate data centers. Here are two regimes of digital property: the swarm and the cloud. For
swarms (a reference to swarm downloading where a single file can be downloaded in parallel
from multiple sources) property is held in common between peers; in the cloud, however, property
is positioned out of reach, accessible only through an interface that has absorbed legal
and business requirements.

It's just half of the story, however, to associate the cloud with mammoth data centers; the other
half is to be found in our hands and laps. Thin computing, including tablets and e-readers, iPads
and Kindles, and mobile phones have co-evolved with data centers, offering powerful, lightweight
computing precisely because so much processing and storage has been externalized.

In this technical configuration of the cloud, the thin computer and the fat data center meet through
an interface, inevitably clean and simple, that manages access to the remote resources. Typically,
a person needs to agree to certain “terms of service,” have a unique, measurable account, and
provide payment information; in return, access is granted. This access is not ownership in the
conventional sense of a book, or even the digital sense of a file, but rather a license that gives the
person a “non-exclusive right to keep a permanent copy… solely for your personal and non-commercial use,” contradicting the First Sale Doctrine, which gives the “owner” the right to sell,
lease, or rent their copy to anyone they choose at any price they choose. The doctrine,
established within America's legal system in 1908, separated the rights of reproduction from
distribution, as a way to "exhaust" the copyright holder's control over the commodities that people
purchased.. legitimizing institutions like used book stores and public libraries. Computer software
famously attempted to bypass the First Sale Doctrine with its "shrink wrap" licenses that restricted
the rights of the buyer once she broke through the plastic packaging to open the product. This
practice has only evolved and become ubiquitous over the last three decades as software began
being distributed digitally through networks rather than as physical objects in stores. Such
contradictions are symptoms of the shift in property regimes, or what Jeremy Rifkin called “the age
of access.” He writes that “property continues to exist but is far less likely to be exchanged in
markets. Instead, suppliers hold on to property in the new economy and lease, rent, or charge an
admission fee, subscription, or membership dues for its short-term use.”

Thinking again of books, Rifkin’s description gives the image of a paid library emerging as the
synthesis of the public library and the marketplace for commodity exchange. Considering how, on
the one side, traditional public libraries are having their collections deaccessioned, hours of
operation cut, and are in some cases being closed down entirely, and on the other side, the
traditional publishing industry finds its stores, books, and profits dematerialized, the image is
perhaps appropriate. Server racks, in photographs inside data centers, strike an eerie resemblance
to library stacks -- while e-readers are consciously designed to look and feel something like a
book. Yet, when one peers down into the screen of the device, one sees both the book - and the
library.

Like a Facebook account, which must uniquely correspond to a real person, the e-reader is an
individualizing device. It is the object that establishes trusted access with books stored in the cloud
and ensures that each and every person purchases their own rights to read each book. The only
transfer that is allowed is of the device itself, which is the thing that a person actually does own.
But even then, such an act must be reported back to the cloud: the hardware needs to be de-registered and then re-registered with credit card and authentication details about the new owner.

This is no library - or it's only a library in the most impoverished sense of the word. It is a new
enclosure, and it is a familiar story: things in the world (from letters, to photographs, to albums, to
books) are digitized (as emails, JPEGs, MP3s, and PDFs) and subsequently migrate to a remote
location or service (Gmail, Facebook, iTunes, Kindle Store). The middle phase is the biggest
disruption, when the interface does the poorest job concealing the material transformations taking
place, when the work involved in creating those transformations is most apparent, often because
the person themselves is deeply involved in the process (of ripping vinyl, for instance). In the third
phase, the user interface becomes easier, more “frictionless,” and what appears to be just another
application or folder on one’s computer is an engorged, property-and-energy-hungry warehouse a
thousand miles away.

Capture, Loss

Intellectual property's enclosure is easy enough to imagine in warehouses of remote, secure hard
drives. But the cloud internalizes processing as well as storage, capturing the new forms of
cooperation and collaboration characterizing the new economy and its immaterial labor. Social
relations are transmuted into database relations on the "social web," which absorbs
self-organization as well. Because of this, the cloud impacts as strongly on the production of
publications as on their consumption, in the traditional sense.

Storage, applications, and services offered in the cloud are marketed for consumption by authors
and publishers alike. Document editing, project management, and accounting are peeled slowly
away from the office staff and personal computers into the data centers; interfaces are established
into various publication channels from print on demand to digital book platforms. In the fully
realized vision of cloud publishing, the entire technical and logistical apparatus is externalized,
leaving only the human labor.. and their thin devices remaining. Little distinguishes the author-object from the editor-object from the reader-object. All of them.. maintain their position in the
network by paying for lightweight computers and their updates, cloud services, and broadband
internet connections.
On the production side of the book, the promise of the cloud is a recovery of the profits “lost” to
file-sharing, as all that exchange is disciplined, standardized and measured. Consumers are finally
promised access to the history of human knowledge that they had already improvised by
themselves, but now without the omnipresent threat of legal prosecution. One has the sneaking
suspicion though.. that such a compromise is as hollow.. as the promises to a desperate city of the
jobs that will be created in a newly constructed data center -- and that pitting “food on the table”
against “access to knowledge” is both a distraction from and a legitimation of the forms of power
emerging in the cloud. It's a distraction because it's by policing access to knowledge that the
middle-man platform can extract value from publication, both on the writing and reading sides of
the book; and it's a legitimation because the platform poses itself as the only entity that can resolve
the contradiction between the two sides.

When the platform recedes behind the interface, these two sides are the most visible
antagonism - in a tug-of-war with each other -- yet neither the “producers” nor the “consumers” of
publications are becoming more wealthy, or working less to survive. If we turn the picture
sideways, however, a new contradiction emerges, between the indebted, living labor - of authors,
editors, translators, and readers - on one side, and on the other.. data centers, semiconductors,
mobile technology, expropriated software, power companies, and intellectual property.
The talk in the data center industry of the “industrialization” of the cloud refers to the scientific
approach to improving design, efficiency, and performance. But the term also recalls the basic
narrative of the Industrial Revolution: the movement from home-based manufacturing by hand to
large-scale production in factories. As desktop computers pass into obsolescence, we shift from a
networked, but small-scale, relationship to computation (think of “home publishing”) to a
reorganized form of production that puts the accumulated energy of millions to work through
these cloud companies and their modernized data centers.

What kind of buildings are these blank superstructures? Factories for the 21st century? An engineer
named Ken Patchett described the Facebook data center that way in a television interview, “This is
a factory. It’s just a different kind of factory than you might be used to.” Those factories that we’re
“used to” continue to exist (at Foxconn, for instance), producing the infrastructure, under
recognizably exploitative conditions, for a “different kind of factory” - a factory that extends far
beyond the walls of the data center.

But the idea of the factory is only part of the picture - this building is also a mine.. and the
dispersed workforce devote most of their waking hours to mining-in-reverse, packing it full of data,
under the expectation that someone - soon - will figure out how to pull out something valuable.

Both metaphors rely on the image of a mass of workers (dispersed as it may be) and leave a darker
and more difficult possibility: the data center is like the hydroelectric plant, damming up property,
sociality, creativity and knowledge, while engineers and financiers look for the algorithms to
release the accumulated cultural and social resources on demand, as profit.

This returns us to the interface, site of the struggles over the management and control of access to
property and infrastructure. Previously, these struggles were situated within the computer-object
and the implied freedom provided by its computation, storage, and possibilities for connection
with others. Now, however, the eviscerated device is more interface than object, and it is exactly
here at the interface that the new technological enclosures have taken form (for example, see
Apple's iOS products, Google's search box, and Amazon's "marketplace"). Control over the
interface is guaranteed by control over the entire techno-business stack: the distributed hardware
devices, centralized data centers, and the software that mediates the space between. Every major
technology corporation must now operate on all levels to protect against any loss.

There is a centripetal force to the cloud and this essay has been written in its irresistible pull. In
spite of the sheer mass of capital that is organized to produce this gravity and the seeming
insurmountability of it all, there is no chance that the system will absolutely manage and control
the noise within it. Riots break out on the factory floor; algorithmic trading wreaks havoc on the
stock market in an instant; data centers go offline; 100 million Facebook accounts are discovered
to be fake; the list will go on. These cracks in the interface don't point to any possible future, or
any desirable one, but they do draw attention to openings that might circumvent the logic of
access.

"What happens from there is another question." This is where I left things off in the text when I
finished it a year ago. It's a disappointing ending: we just have to invent ways of occupying the
destruction, violence and collapse that emerge out of economic inequality, global warming,
dismantled social welfare, and so on. And there's not much that's happened since then to make us
very optimistic - maybe here I only have to mention the NSA. But as I began with an ending, I
really should end at a beginning.
I think we were obliged to adopt a negative, critical position in response to the cyber-utopianism of
the last almost 20 years, whether in its naive or cynical forms. We had to identify and theorize the
darker side of things. But it can become habitual, and when the dark side materializes, as it has
over the past few years - so that everyone knows the truth - then the obligation flips around,
doesn't it? To break out of habitual criticism as the tacit, defeated acceptance of what is. But, what
could be? Where do we find new political imaginaries? Not to ask what is the bright side, or what
can we do to cope, but what are the genuinely emancipatory possibilities that are somehow still
latent, buried under the present - or emerging within those ruptures in it? -- I can't make it all
the way to a happy ending, to a happy beginning, but at least it's a beginning and not the end.

 
