Barok
Poetics of Research
2014


_An unedited version of a talk given at the conference [Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._

_Bracketed sequences are to be reformulated._

Poetics of Research

In this talk I'm going to attempt to identify [particular] cultural
algorithms, i.e. processes in which cultural practices and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice in the same rank as theory and
poetics, and where theorization of practice can also lead to the
identification of poetical devices.

The primary motivation for this talk is an attempt to figure out where we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things) and what sort of things
might be considered to make a difference [or to keep making difference].

The talk mainly [considers] the role of text and the word in research, by way
of several figures.

A

A reference, list, scheme, table, index; those things that intervene in the
flow of narrative, illustrating the point, perhaps in a more economic way than
the linear text would do. Yet they don't function as pictures, they are
primarily texts, arranged in figures. Their forms have been
standardised[normalised] over centuries, withstood the transition to the
digital without any significant change, being completely intuitive to the
modern reader. Compared to the body of text they are secondary, run parallel
to it. Their function is however different to that of the punctuation. They
are there neither to shape the narrative nor to aid structuring the argument
into logical blocks. Nor is their function spatial, like in visual poems.
Their positions within a document are determined according to the sequential
order of the text, [standing as attachments] and are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The [premise] of my talk is that these
_textual figures_ also came to serve as the abstract[relational] models
determining possible relations among documents as such, and in consequence [to
structure conditions [of research]].

B

It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be consulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, i.e. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents. The
phenomena such as [deepening] of specialization and thorough digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given the fact that databases are queried
[using] SQL, their interfaces are mere extensions of it and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen run by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
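
To make this translation tangible, here is a minimal sketch of how a search form might be turned into a query. The tables `documents` and `authors`, their columns, and the function `build_query` are all hypothetical, invented for illustration. No particular catalogue is claimed to work this way, and a real interface would pass the values as bound parameters rather than pasting them into the string.

```shell
#!/bin/sh
# Sketch only: the schema (documents, authors) and all column names are
# assumptions made up for this example.

# build_query AUTHOR YEAR_FROM FULLTEXT
# Each argument stands for one form element: an input box, a dropdown and
# a checkbox. An empty argument means the element was left untouched.
build_query() {
    author=$1
    year_from=$2
    fulltext=$3

    sql="SELECT d.title, a.name FROM documents d"
    sql="$sql JOIN authors a ON a.id = d.author_id WHERE 1=1"

    # Every filled-in form element adds one more condition.
    if [ -n "$author" ]; then
        sql="$sql AND a.name = '$author'"
    fi
    if [ -n "$year_from" ]; then
        sql="$sql AND d.year >= $year_from"
    fi
    if [ "$fulltext" = "on" ]; then
        sql="$sql AND d.has_fulltext = 1"
    fi

    printf '%s;\n' "$sql"
}

# The researcher types a name, picks a year and ticks a checkbox:
build_query "Foucault" 1969 on
```

The question the researcher believes to be asking and the line the database receives are two different texts; the form is the only place where they meet.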

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to the flesh and paper, has been
[entrusted] to the digital and networked. Researchers are querying the black
box.

C

Searching in a collection of [amassed/assembled] [tangible] documents (i.e.
a bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents, and we will return to this issue, are they persistent
within the inquiry as such? [wait]

D

This question itself is a rupture in the sequence. It makes a demand to depart
from one narrative [a continuous flow of words] to another, to figure out,
while remaining bound to it [it would be even more as a so-called rhetorical
question]. So there has been one sequence, or line, of the inquiry--about the
kinds of the query and its properties. That sequence itself is a digression,
from within the sequence about what is research and describing its parts
(queries). We are thus returning to it and continuing with the question whether
the properties of the inquiry are the same as the properties of the query.

E

But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_, giving a hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact each choice would have on
his reception in the Anglophone world. Limiting ourselves to textual forms for
now (and not translating his work but pursuing a different inquiry), let us say
the utterance is a word [or a phrase or an idiom] in a sequence such as a
sentence, a paragraph, or a document.

## (F) The structure

This distinction is as old as recorded Western thought since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the subject-
matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns--
not as more words denoting a similar thing but rather one word denoting
various things. Categories were outlined as a device to differentiate among
words according to kinds of these things. Every word as such belonged to not
less and not more than one of ten categories.

So it happens to the word _utterance_, as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might yield as the most appropriate in a given
context. The more context the more precise share comes to the fore. When taken
out of the context ambiguity prevails as the spectrum unveils in its variety.

Thus single words [as any other utterances] are questions, queries,
themselves, and by occurring in statements, in context, their [means] are being
singled out.

This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.

### (G) The structure: words as words

* [![](/images/thumb/c/c8/Philitas_in_P.Oxy.XX_2260_i.jpg/144px-Philitas_in_P.Oxy.XX_2260_i.jpg)](/File:Philitas_in_P.Oxy.XX_2260_i.jpg)

P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE.
[1](http://163.1.169.40/cgi-bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---20-about-2260--00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
[2](http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-highlight.jpeg)

* [![](/images/thumb/9/9e/Cyclopaedia_1728_page_210_Dictionary_entry.jpg/88px-Cyclopaedia_1728_page_210_Dictionary_entry.jpg)](/File:Cyclopaedia_1728_page_210_Dictionary_entry.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, 1728, p. 210.
[3](http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/b/b8/Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg/160px-Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)](/File:Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)

Detail from the Liddell-Scott Greek-English Lexicon, c1843.

Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia; contrasted to the same word meaning "milk pail" in Homer's _Iliad_.

Not much has changed in the way dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filtering and sorting them and
in turn the spectrum of [denoted] things allocated in this way is structured
into groups and subgroups which are then given, according to another set of
rules, shorter or longer names. These constitute facets of [potential]
meanings of a word.

So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few and these criteria are rarely being
disclosed, despite their impact on research, and more generally, their
influence as conditions for production[making] of a so-called _common sense_.

It doesn't take that much to reimagine what a dictionary is and what it could
be, especially having large specialized corpora of texts at hand. These can
also serve as aids in production of new words and new meanings.

### (H) The structure: words as knowledge and the world

* [![](/images/thumb/0/02/Boethius_Porphyrys_Isagoge.jpg/120px-Boethius_Porphyrys_Isagoge.jpg)](/File:Boethius_Porphyrys_Isagoge.jpg)

Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3rd c.), [6th c.] 10th c.
[4](http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)

* [![](/images/thumb/d/d0/Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg/94px-Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)](/File:Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, London, 1728, p. II.
[5](http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/d/d6/Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg/116px-Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)](/File:Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)

Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_, 1751.
[6](http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-connaissances-humaines)

* [![](/images/thumb/9/96/Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg/96px-Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)](/File:Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)

Haeckel - Darwin's tree.

Another _formalized_ and [internalized] process at play when figuring
out a word is its [containment]. A word is not only structured by way of things
it potentially denotes but also by words it is potentially part of and those
it contains.

The fuss around categorization of knowledge _and_ the world in Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd century AD Neoplatonist began expanding the
notions of genus and species into their hypothetical consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), a relation (πρός), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In a
different work, _Topics_, Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (ὅρος), a genus (γένος), a property (ἴδιον), and an
accident (συμβεβηκός). Porphyry does not explicitly refer to _Topics_, and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
[8](http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:

> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
[9](http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))

Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation resulted in having more resemblance
to the perceived _structures_ of the world. So they began to bloom.

There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.

Classification schemes [growing from one point] play a major role in
untangling the format of modern encyclopedia from that of the dictionary
governed by alphabet. Two of the most influential encyclopedias of the 18th
century are cases in point. Although still keeping 'dictionary' in their
titles, they are conceived not to represent words but knowledge. The
[uppermost] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches: "natural and scientifical" and "artificial and
technical"; these further split down to 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as
a table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from understanding (_entendement_),
branches into memory as history, reason as philosophy, and imagination as
poetry. The logic of containers was employed as an aid not only to deal with
the enormous task of naming and not omitting anything from what is known, but
also for the management of labour of hundreds of writers and researchers, to
create a mechanism for delegating work and the distribution of
responsibilities. Flesh was also more present, in the field research, with
researchers attending workshops and sites of everyday life to annotate it.

The world came forward to outshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems such as
Charles A. Cutter's _Expansive Classification_ (1882) set out to classify the
world itself and set the field for what has come to be known as authority
lists structuring metadata in today's computing.

### The structure (summary)

Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.

While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.

One seeks to describe the word as a faceted list of small worlds, another to
describe the world as a structured list of words. One plays prime in the
domain of epistemology, in what is known, controlling the vocabulary, another
in the domain of ontology, in what is, controlling reality.

Every [word] has its given things, every thing has its place, closer or
further from a single word.

The schism between classifying words and classifying the world implies it is
not possible to construct a universal classification scheme[system]. On top of
that, any classification system of words is bound to a corpus of texts it is
operating upon and any classification system of the world again operates with
words which are bound to a vocabulary[lexicon] which is again bound to a
corpus [of texts]. This does not prevent people from trying.
Classifications function as descriptors of and 'inscriptors' upon the world,
imprinting their authority. They operate from [a locus of] their
corpus[context]-specificity. The larger the corpus, the more power it has in
shaping the world, as far as the word shapes it (yes, I do imply Google here,
for which it is a domain to be potentially exploited).

## (J) The sequence

The structure-yielding query [of] the single word [shrinks][narrows, becomes
more precise] with preceding and following words. Inquiry proceeds in the flow
that establishes another kind[mode] of relationality, chaining words into the
sequence. While the structuring property of the query brings words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.

This is what is responsible for attaching textual figures mentioned earlier
(lists, schemes, tables) to the body of the text. Associations can be also
stated explicitly, by indexing tables and then referring to them from a
particular point in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.

From this follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides means for pointing elsewhere
in the document as well.

A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago, which is online
[10](http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.

Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
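
As a rough illustration of digitization as total indexation, the sketch below breaks a text into units and gives each a numbered position, then jumps to one address. It assumes that the unit is a word (an actual encoding would address characters or bytes), and the function names `index_words` and `word_at` are made up for this example.

```shell
#!/bin/sh
# Sketch only: "unit" is taken to be a word; an actual text encoding would
# index characters or bytes instead. Function names are hypothetical.

# index_words: read text on stdin, print "position<TAB>word" for every word.
index_words() {
    tr -cs '[:alnum:]' '\n' | awk 'NF { printf "%d\t%s\n", ++n, $0 }'
}

# word_at N: read text on stdin, print the word stored at address N.
word_at() {
    index_words | awk -F '\t' -v addr="$1" '$1 == addr { print $2 }'
}

# Every word of the sentence receives its place in the sequence ...
printf 'Sequences and chains are orders of one dimension\n' | index_words

# ... and any of them can then be addressed directly.
printf 'Sequences and chains are orders of one dimension\n' | word_at 3
```

Even this jump is carried out by walking through the numbered lines one at a time until the address matches, which is the sequential processing described above.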

## (K) The index

* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103px-Summa_confessorum.1310.jpg)](/File:Summa_confessorum.1310.jpg)

Summa confessorum [1297-98], 1310.
[7](http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)

[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now when I am saying the word _utterance_, each time there surface contexts
in which I have used it earlier.

A long quote from Frederick G. Kilgour, _The Evolution of the Book_, 1998, pp.
76-77:

> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinnying?'")

>

> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.

>

> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297-1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"

In one sense neither the subject-index nor the concordance is an index; they are
words or groups of words selected according to given criteria from the body of the
text, each accompanied by a list of identifiers. These identifiers are
elements of an index, whether they represent a page, chapter, column, or other
[kind of] block of text. Every identifier is a unique _address_.
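
A minimal sketch of this arrangement, words each followed by a list of addresses, might look as follows. It assumes that the address is a line number (Hugh's men used book, chapter and a letter for the part of the chapter) and that words consist of plain ASCII letters; the function name `concordance` is hypothetical.

```shell
#!/bin/sh
# Sketch only: addresses are line numbers; a printed concordance would use
# book, chapter or page instead. The function name is hypothetical.

# concordance: read text on stdin and print, for every word, the list of
# line numbers on which it occurs ("word: 1 3").
concordance() {
    awk '{
        line = tolower($0)
        gsub(/[^a-z]+/, " ", line)          # keep plain ASCII letters only
        n = split(line, w, " ")
        for (i = 1; i <= n; i++)
            if (!((w[i], NR) in seen)) {    # list each address once per word
                seen[w[i], NR] = 1
                print w[i], NR
            }
    }' | LC_ALL=C sort -k1,1 -k2,2n | awk '
        $1 != prev { if (NR > 1) printf "\n"; printf "%s:", $1; prev = $1 }
        { printf " %s", $2 }
        END { if (NR > 0) printf "\n" }'
}

printf '%s\n' \
    'the word on its own' \
    'words in the company of other words' \
    'the word' | concordance
```

The words come out in alphabetical order, each with its addresses, so that the entry for `the` reads `the: 1 2 3`. The list on the right is the index proper; the word on the left is what has been selected from the text.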

The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, where each element is given a unique combination
of symbols. Different sizes of sets yield different numbers of variations.
Symbol sets such as an alphabet, Arabic numerals, Roman numerals, and binary
digits have different proportions between the length of a string of symbols
and the number of possible variations it can contain. Thus two symbols of the
English alphabet can store 26^2 various values, of Arabic numerals 10^2, of
Roman numerals 7^2 and of binary digits 2^2.
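
The proportion can be stated generally. As a sketch, writing $k$ for the number of symbols in the set, $n$ for the length of the string and $N$ for the number of distinct combinations (the three letters are introduced here only for illustration):

```latex
\[
  N = k^{n}, \qquad 26^{2} = 676, \quad 10^{2} = 100, \quad 2^{2} = 4
\]
\[
  n = \lceil \log_{k} N \rceil
\]
```

Read in the second direction, the formula gives the length of address needed for a given number of elements: a thousand elements can be addressed with three letters or three Arabic digits, but need ten binary digits.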

Indexation is segmentation, a breaking into segments. From as early as the
13th century the index such as that of sections has served as an enabler of
search. The more [detailed] indexation the more precise search results it
enables.

The subject-index and concordance are tables of search results. There is a
direct lineage from the 13th-century biblical concordances to the birth of
computational linguistic analysis: both were initiated and realised by
priests.

During World War II, Jesuit Father Roberto Busa began to look for machines
for the automation of the linguistic analysis of the 11 million-word Latin
corpus of Thomas Aquinas and related authors.

Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas he
realised two things:

> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has to precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach to the words he reads the significance they have in his mind, but
should try to find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is this
basic logic that allows the transfer from what the words mean today to what
they meant to the writer.

>

> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
[11](http://www.alice.id.tue.nl/references/busa-1980.pdf)

Busa collaborated with IBM in New York from 1949, and the work, a concordance
of all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version has been online since 2005
[12](http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words was created by a
team of ten priests over two years (in two phases: grouping all the
forms of an inflected word under their lemma, and coding the morphological
categories of each form and lemma), containing 150,000 forms
[13](http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father
Busa has been dubbed the father of humanities computing and recently also of
digital humanities.

The subject-index has a crucial role in the printed book. It is the only means
for search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of an inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).

Its role seemingly vanishes in the digital text. But it can be easily
transformed. Besides serving as a table of pre-searched results the subject-
index also gives a distinct idea about content of the book. Two patterns give
us a clue: numbers of occurrences of selected words give subjects weights,
while words that seem specific to the book outweigh others even if they don't
occur very often. A selection of these words then serves as a descriptor of
the whole text, and can be thought of as a specific kind of 'tags'.

This process was formalized in a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as the
logarithm of the ratio of the total number of texts in the corpus to the number
of texts where the word appears at least once. When multiplied by the frequency
of the word _in_ the text (divided by the maximum frequency of any word in the
text), we get _term frequency-inverse document frequency_ (tf-idf). In this way
we can get an automated list of subjects which are particular in the text when
compared to a group of texts.
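
The weighting can be sketched in a few lines of shell. This is an illustration under assumptions rather than a reference implementation: it takes tf as the word's count divided by the count of the most frequent word in the text, and idf as the natural logarithm of the number of texts divided by the number of texts containing the word. The function name `tfidf`, the sample files and the plain-ASCII tokenizer are invented for the example.

```shell
#!/bin/sh
# Sketch only: the function name and the plain-ASCII tokenizer are
# assumptions; tf = count / highest count, idf = ln(texts / texts with word).

# tfidf FILE [FILE...]: weigh the words of the first file against the whole
# group of files (the first one included). Prints "word weight", heaviest first.
tfidf() {
    awk -v target="$1" '
        FNR == 1 { ndocs++ }                    # a new text begins
        {
            line = tolower($0)
            gsub(/[^a-z]+/, " ", line)
            n = split(line, w, " ")
            for (i = 1; i <= n; i++) {
                # document frequency: count each text once per word
                if (!((w[i], FILENAME) in seen)) {
                    seen[w[i], FILENAME] = 1
                    df[w[i]]++
                }
                # raw term frequency, only for the text being described
                if (FILENAME == target) {
                    tf[w[i]]++
                    if (tf[w[i]] > max) max = tf[w[i]]
                }
            }
        }
        END {
            for (word in tf)
                printf "%s %.4f\n", word, (tf[word] / max) * log(ndocs / df[word])
        }' "$@" | LC_ALL=C sort -k2,2nr -k1,1
}

# Three tiny stand-in texts; the first one is the text being described.
dir=$(mktemp -d)
printf 'index index query word\n' > "$dir/a.txt"
printf 'query word sequence\n'    > "$dir/b.txt"
printf 'word structure\n'         > "$dir/c.txt"
tfidf "$dir/a.txt" "$dir/b.txt" "$dir/c.txt"
rm -r "$dir"
```

In the sample, `index` comes out on top because it occurs only in the first text, while `word`, present in all three, receives a weight of zero: it says nothing particular about any of them.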

We came to learn it through the practice of searching the web. It is a mechanism
not dissimilar to the thought process involved in retrieving particular information
online. And search engines have it built into their indexing algorithms as well.

There is a paper proposing attaching words generated by tf-idf to
hyperlinks when referring to websites
[14](http://bscit.berkeley.edu/cgi-bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in references in the paper use this feature and it can be easily
tested:
[15](http://www.cs.berkeley.edu/~phelps/papers/dissertation-abstract.html?lexical-signature=notemarks+multivalent+semantically+franca+stylized).

There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even as a way of organising a library bottom-up into novel categories from which
new discourses could emerge. It could also serve as an aid for researchers
sorting through texts, or for editors producing interesting anthologies.
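
As a sketch of the measure, the function below compares two lists of weighted words, treating each list as a vector with one dimension per word. It assumes input files with one `word weight` pair per line, the layout of the tfidf.N.txt files in the command-line example under the final remarks; the function name `cosine` and the sample weights are made up.

```shell
#!/bin/sh
# Sketch only: the function name and the sample weights are invented.

# cosine FILE_A FILE_B: both files hold "word weight" lines. Prints the
# cosine of the angle between the two weight vectors: 1 means the same
# profile, 0 means no weighted word in common.
cosine() {
    awk '
        # first file: remember its weights and add up its squared length
        NR == FNR { a[$1] = $2; na += $2 * $2; next }
        # second file: squared length, plus the products for shared words
        { nb += $2 * $2; if ($1 in a) dot += a[$1] * $2 }
        END {
            if (na == 0 || nb == 0) { print "0.0000"; exit }
            printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
        }' "$1" "$2"
}

# Two tiny stand-in profiles sharing a single word.
dir=$(mktemp -d)
printf 'index 1\nquery 2\n' > "$dir/a.txt"
printf 'index 2\nword 1\n'  > "$dir/b.txt"
cosine "$dir/a.txt" "$dir/b.txt"
rm -r "$dir"
```

Computing this figure for every pair of texts in a collection gives a table of distances from which clusters can be drawn; how the clusters are then cut and named remains an editorial decision.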

## Final remarks

1

New disciplines emerge all the time - most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunities but, firstly, frameworks of new
perspectives of looking at the world, new domains of knowledge. From the
perspective of a researcher, partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific
terms[subjects]. Creating new fields involves all that, and more. Even when
one goes against all disciplines.

2

Google can still surprise us.

3

Knowledge has been in the making for millennia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-
making.

4

Command-line example of tf-idf and concordance in 3 steps.

* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:

> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]' '[\012*]' | tr -d '[:punct:]' | grep -v '^$' | sort | uniq -c | sort -k 1nr > temp.txt; max=$(awk 'NR==1 {print $1}' temp.txt); awk -v maxx="$max" '{printf "%-7.7f %s\n", 0.5+($1/(maxx*2)), $2}' temp.txt > freq.$i.txt; done && rm temp.txt

* 2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm -f freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) && for i
in {1..500}; do word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print
$field}' freq.$j.txt); tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print
$field}' freq.$j.txt); count=$(egrep -lw $word freq.?.txt | wc -l); idf=$(echo
"1+l(5/$count)" | bc -l); tfidf=$(echo $tf*$idf | bc); echo $word $tfidf >>
freq.$j.txt.temp; done; sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

* 3\. Process the files tfidf.1-5.txt and their source texts, text.1-5.txt, and produce occ.txt with concordance of top 3 words from each of them:

> rm -f occ.txt && for j in {1..5}; do echo "$j" >> occ.txt; ptx -f -w 150
text.$j.txt > occ.$j.txt; for i in {1..3}; do word=$(awk -vline="$i" -vfield=1
-F" " 'NR==line {print $field}' tfidf.$j.txt); egrep -i
"[[:alpha:]] $word" occ.$j.txt >> occ.txt; done; done
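
As a reading aid, the weighting that steps 1 and 2 compute can be restated for a single word: the term frequency is `0.5 + count / (2 * max)`, the inverse document frequency is `1 + ln(5 / count of texts containing the word)`, and the weight is their product. The sketch below only illustrates that arithmetic; the function name `tfidf` and the numbers passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# tfidf COUNT MAXCOUNT TEXTS_WITH_WORD TOTAL_TEXTS
# Same weighting as in the one-liners:
#   tf  = 0.5 + count / (2 * maxcount)
#   idf = 1 + ln(total_texts / texts_with_word)
tfidf() {
  awk -v c="$1" -v m="$2" -v n="$3" -v N="$4" 'BEGIN {
    tf = 0.5 + c / (2 * m)
    idf = 1 + log(N / n)        # awk log() is the natural logarithm, like l() in bc
    printf "%.4f\n", tf * idf
  }'
}

# Hypothetical word: 30 occurrences, most frequent word has 120, found in 2 of 5 texts
tfidf 30 120 2 5
# Same counts, but the word occurs in all 5 texts, so it is less specific
tfidf 30 120 5 5
```

The second call yields the lower weight: a word that appears in every text contributes nothing beyond its term frequency.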

Dušan Barok

_Written 23 October - 1 November 2014 in Bratislava and Stuttgart._


Dockray
The Scan and the Export
2010


Sean Dockray

The Scan and the Export

The scan is an ambivalent image. It oscillates back and forth: between a
physical page and a digital file, between one reader and another, between an
economy of objects and an economy of data. Scans are failures in terms of
quality, neither as “readable” as the original book nor the inevitable ebook,
always containing too much visual information or too little.

Technically speaking, it is by scanning that one can make a digital
representation of a physical object, such as a book. When a representation of
that representation (the image) appears on a digital display device, it hovers
like a ghost, one world haunting another. But it is not simply the object
asserting itself in the milieu of light, information, and electricity. Much
more is encoded in the image: indexes of past readings and the act of scanning
itself.

An incomplete inventory of modifications to the book through reading and other
typical events in the life of the thing: folded pages, underlines, marginal
notes, erasures, personal symbolic systems, coffee spills, signatures, stamps,
tears, etc. Intimacy between reader and text marking the pages, suggesting
some distant future palimpsest in which the original text has finally given
way to a mass of negligible marks.

Whereas the effects of reading are cumulative, the scan is a singular event.
Pages are spread and pressed flat against a sheet of glass. The binding
stretches, occasionally to the point of breaking. A camera driven by a geared
down motor slides slowly down the surface of the page. Slight movement by the
person scanning (who is also a scanner; this is a man-machine performance)
before the scan is complete produces a slight motion blur, the type goes
askew, maybe a finger enters the frame of the image. The glass is rarely
covered in its entirety by the book and these windows into the actual room
where the scanning is done are ultimately rendered as solid, censored black.
After the physical scanning process comes post-production. Software— automated
or not— straightens the image, corrects the contrast, crops out the useless
bits, sharpens the text, and occasionally even attempts to read it. All of
this computation wants to repress any traces of reading and scanning, with the
obvious goal of returning to the pure book, or an even more Platonic form.

That purified, originary version of the text might be the e-book. Publishers
are occasionally skipping the act of printing altogether and selling the files
themselves, such that the words reserved for “well-scanned” books ultimately
describe ebooks: clean, searchable, small (i.e., file size). Although it is
perfectly understandable for a reader to prefer aligned text without smudges
or other markings where “paper” is nothing but a pure, bright white, this
movement towards the clean has its consequences. Distinguished as a form by
the fact that it is produced, distributed, and consumed digitally, the e-book
never leaves the factory.

A minimal gap is, however, created between the file that the producer uses and
the one that the consumer uses— imagine the cultural chaos if the typical way
of distributing books were as Word documents!— through the process of
exporting. Whereas scanning is a complex process and material transformation
(which includes exporting at the very end), exporting is merely converting
formats. But however minor an act, this conversion is what puts a halt to the
writing and turns the file into a product for reading. It is also at this
stage that forms of “digital rights management” are applied in order to
restrict copying and printing of the file.

Sharing and copying texts is as old as books themselves— actually, one could
argue that this is almost a definition of the book— but computers and the
Internet have only accelerated this activity. From transcription to tracing to
photocopying to scanning, the labour and material costs involved in producing
a copy has fallen to nothing in our present digital file situation. Once the
scan has generated a digitized version of some kind, say a PDF, it easily
replicates and circulates. This is not aberrant behaviour, either, but
normative computer use: copy and paste are two of the first choices in any
contextual menu. Personal file storage has slowly been migrating onto computer
networks, particularly with the growth of mobile devices, so one's files are
not always located on one's equipment. The act of storing and retrieving
shuffles data across machines and state lines.

A public space is produced when something
is shared— which is to say, made public — but this
space is not the same everywhere or in all
circumstances. When music is played for a room full of
people, or rather when all those people are simply
sharing the room, something is being made public.
Capitalism itself is a massive mechanism for
making things public, for appropriating materials,
people, and knowledge and subjecting them to its
logic. On the other hand, a circulating library, or a
library with a reading room, creates a public space
around the availability of books and other forms of
material knowledge. And even books being sold
through shops create a particular kind of public,
which is quite different from the public that is
formed by bootlegging those same books.

It would appear that publicness is not simply a
question of state control or the absence of money.
Those categorical definitions offer very little to
help think about digital files and their native
tendency to replicate and travel across networks.
What kinds of public spaces are these, coming into
the foreground by an incessant circulation of data?

Two paradigmatic forms of publicness can be
described through the lens of the scan and the
export, two methods for producing a digital text.
Although neither method necessarily results in a
file that must be distributed, such files typically
are. In the case of the export, the system of
distribution tends to be through official, secure
digital repositories; limited previews provide a
small window into the content, which is ultimately
accessible only through the interface of the
shopping cart. On the other hand, the scan is
created by and moves between individuals, often
via improvised and itinerant distribution systems.
The scan travels from person to person, like a
virus. As long as it passes between people, that
common space between them stays alive. That
space might be contagious; it might break out into
something quite persuasive, an intimate publicness
becoming more common.

The scan is an image of a thing and is therefore
different from the thing (it is digital, not physical,
and it includes indexes of reading and scanning),
whereas a copy of the export is essentially identical
to the export. Here is one reason there will exist
many variations of a scan for a particular text,
while there will be one approved version (always a
clean one) of the export. A person may hold in his
or her possession a scan of a book but, no matter
what publishers may claim, the scan will never be
the book. Even if one was to inspect two files and
find them to be identical in every observable and
measurable quality, it may be revealed that these
are in fact different after all: one is a legitimate
copy and the other is not. Legitimacy in this case
has nothing whatsoever to do with internal traits,
such as fidelity to the original, but with external
ones, namely, records of economic transactions in
customer databases.

In practical terms, this means that a digital
book must be purchased by every single reader.
Unlike the book, which is commonly purchased,
read, then handed off to a friend (who then
shares it with another friend and so on until it
comes to rest on someone's bookshelf) the digital
book is not transferable, by design and by law.
If ownership is fundamentally the capacity to give
something away, these books are never truly ours.
The intimate, transient publics that emerge out
of passing a book around are here eclipsed by a
singular, more inclusive public in which everyone
relates to his or her individual (identical) file.

Recently, with the popularization of digital
book readers (a device for another man-machine
pairing), the picture of this kind of publicness has
come into greater definition. Although a group of
people might all possess the same file, they will be
viewing that file through their particular readers,
which means surprisingly that they might all be
seeing something different. With variations built
into the device (in resolution, size, colour, display
technology) or afforded to the user (perhaps to
change font size or other flexible design elements),
familiar forms of orientation within the
writing disappear as it loses the historical structure
of the book and becomes pure, continuous
text. For example, page numbers give way to the
more abstract concept of a "location" when the
file is derived from the export as opposed to the
scan, from the text data as opposed to the physical object. The act of reading
in a group is also different — "Turn to page 24" is followed by the sound of a
race of collective page flipping, while "Go to location 2136" leads to finger
taps and caresses on plastic. Factions based on who has the same edition of a
book are now replaced by those with people who have the same reading device.

If historical structures within the book are made abstract then so are those
organizing structures outside of the book. In other words, it's not simply
that the book has become the digital book reader, but that the reader now
contains the library itself! Public libraries are on the brink of being
outmoded; books are either not being acquired or they are moving into deep
storage; and physical spaces are being reclaimed as cafes, restaurants,
auditoriums, and gift shops. Even the concept of donation is thrown into
question: when most public libraries were being initiated a century ago, it
was often women's clubs that donated their collections to establish the
institution; it is difficult to imagine a corresponding form of cultural
sharing of texts within the legal framework of the export. Instead, publishers
might enter into a contract directly with the government to allow access to
files from computers within the premises of the library building. This fate
seems counter-intuitive, considering the potential for distribution latent in
the underlying technology, but even more so when compared to the "traveling
libraries" at the turn of the twentieth century, which were literally small
boxes that brought books to places without libraries (most often, rural
communities).

Many scans, in fact, are made from library books, which are identified through
a stamp or a sticker somewhere. (It is not difficult to see how the scan is
closely related to the photocopy, such that they are now mutually evolving
technologies.) Although it circulates digitally, like the export, the scan is
rooted in the object and is never complete. In a basic sense, scanning is slow
and time-consuming (photocopies were slow and expensive), and it requires that
choices are made about what to focus on. A scan of an entire book is rare—
really a labour of love and endurance; instead, scanners excerpt from books,
pulling out the most interesting, compelling, difficult-to-find, or useful
bits. They skip pages. The scan is partial, subjective. You and I will scan
the same book in different ways. An analogy: they are not prints from the same
negative, but entirely different photographs of the same subject. Our scans
are variations, perhaps competing (if we scanned the same pages from the same
edition), but, more likely, functioning in parallel.

Completists prefer the export, which has a number of advantages from their
perspective: the whole book is usually kept intact as one unit, the file; file
sizes are smaller because the files are based more on the text than an image;
the file is found by searching (the Internet) as opposed to searching through
stacks, bookstores, and attics; it is at least theoretically possible to have
every file. Each file is complete and the same everywhere, such that there
should be no need for variations. At present, there are important examples of
where variations do occur, notably efforts to improve metadata, transcode out
of proprietary formats, and to strip DRM restrictions. One imagines an
imminent future where variations proliferate based on an additive reading— a
reader makes highlights, notations, and marginal arguments and then
redistributes the file such that someone's "reading" of a particular text
would generate its own public, the logic of the scan infiltrating the export.

About the Author

Sean Dockray is a Los Angeles-based artist. He is a
co-director of Telic Arts Exchange and has initiated several
collaborative projects including AAAARG.ORG and The
Public School. He recently co-organized There is
nothing less passive than the act of fleeing, a 13-day seminar at
various sites in Berlin organized through The Public School
that discussed the promises, pitfalls, and possibilities for
extra-institutionality.


t often the starting-point is an idea composed of
a group of centrally aroused sensations due to simultaneous
excitation of a group
This would probably
in every case be in large part the result of association by
contiguity in terms of the older classification, although
there might be some part played by the immediate
excitation of the separate ... by an external stimulus. Starting
from this given mass of central elements, all change comes
from the fact that some of the elements disappear and are
replaced by others through a second series of associations
by contiguity. The parts of the original idea which remain
serve as the excitants for the new elements which arise.
The nature of the process is exactly like that by which
the elements of the first idea were excited, and no new
process comes in. These successive associations are thus
really in their mechanism but a series of simultaneous
associations in which the elements that make up the different
ideas are constantly changing, but with some elements
that persist from idea to idea. There is thus a constant
flux of the ideas, but there is always a part of each idea
that persists over into the next and serves to start the
mechanism of revival. There is never an entire stoppage
in the course of the ideas, never an absolute break in the
series, but the second idea is joined to the one that precedes
by an identical element in each.


A short time later, this control of urban noise had been implemented almost
everywhere, or at least in the politically best-controlled cities, where repetition
is most advanced.
We see noise reappear, however, in exemplary fashion at certain ritualized
moments: in these instances, the horn emerges as a derivative form of violence
masked by festival. All we have to do is observe how noise proliferates in echo
at such times to get a hint of what the epidemic proliferation of the essential
violence can be like. The noise of car horns on New Year's Eve is, to my mind,
for the drivers an unconscious substitute for Carnival, itself a substitute for the
Dionysian festival preceding the sacrifice. A rare moment, when the hierarchies
are masked behind the windshields and a harmless civil war temporarily breaks
out throughout the city.
Temporarily. For silence and the centralized monopoly on the emission,
audition and surveillance of noise are afterward reimposed. This is an essential
control, because if effective it represses the emergence of a new order and a
challenge to repetition.


Thus, with the ball, we are all possible victims; we all expose ourselves
to this danger and we escape back and forth of "I."
The "I" in the game is a token exchanged. And
this passing, this network of passes, these vicariances of subjects weave
the collection. I am I now, a subject, that is to say, exposed to being
thrown down, exposed to falling, to being placed beneath the compact
mass of the others; then you take the relay, you are substituted for "I"
and become it; later on, it is he who gives it to you, his work done, his
danger finished, his part of the collective constructed. The "we" is made
by the bursts and occultations of the "I." The "we" is made by passing
the "I." By exchanging the "I." And by substitution and vicariance of
the "I."
That immediately appears easy to think about. Everyone carries
his stone, and the wall is built. Everyone carries his "I," and the "we" is
built. This addition is idiotic and resembles a political speech. No.


But then let them say it clearly:

The practice of happiness is subversive when it becomes collective.
Our will for happiness and liberation is their terror, and they react by terrorizing
us with prison, when the repression of work, of the patriarchal family, and of sexism
is not enough.

But then let them say it clearly:

To conspire means to breathe together.

And that is what we are accused of, they want to prevent us from breathing
because we have refused to breathe in isolation, in their asphyxiating places of
work, in their individuating familial relationships, in their atomizing houses.

There is a crime I confess I have committed:

It is the attack against the separation of life and desire, against sexism in interindividual relationships, against the reduction of life to the payment of a salary.


Counterpublics

The stronger modification of ... analysis — one in which
he has shown little interest, though it is clearly of major
significance in the critical analysis of gender and sexuality — is that some
publics are defined by their tension with a larger public. Their
participants are marked off from persons or citizens in general.
Discussion within such a public is understood to contravene the rules
obtaining in the world at large, being structured by alternative
dispositions or protocols, making different assumptions about what
can be said or what goes without saying. This kind of public is, in
effect, a counterpublic: it maintains at some level, conscious or
not, an awareness of its subordinate status. The sexual cultures of
gay men or of lesbians would be one kind of example, but so would
camp discourse or the media of women's culture. A counterpublic
in this sense is usually related to a subculture, but there are
important differences between these concepts. A counterpublic, against
the background of the public sphere, enables a horizon of opinion
and exchange; its exchanges remain distinct from authority and
can have a critical relation to power; its extent is in principle
indefinite, because it is not based on a precise demography but
mediated by print, theater, diffuse networks of talk, commerce, and ...


The term slang, which is less broad than language variety is described
by ... as a label that is frequently used to denote
certain informal or faddish usages of nearly anyone in the speech community.
However, slang, while subject to rapid change, is widespread and
familiar to a large number of speakers, unlike Polari. The terms jargon
and argot perhaps signify more what Polari stands for, as they are
associated with group membership and are used to serve as affirmation or
solidarity with other members. Both terms refer to "obscure or secret
language" or language of a particular occupational group ...
While jargon tends to refer to an occupational sociolect,
or a vocabulary particular to a field, argot is more concerned with language
varieties where speakers wish to conceal either themselves or aspects of
their communication from non-members. Although argot is perhaps the
most useful term considered so far in relation to Polari, there exists a
more developed theory that concentrates on stigmatised groups, and could
have been created with Polari specifically in mind: anti-language.
For ..., anti-language was to anti-society what language
was to society. An anti-society is a counter-culture, a society within a
society, a conscious alternative to society, existing by resisting either
passively or by more hostile, destructive means. Anti-languages are
generated by anti-societies and in their simplest forms are partially relexicalised
languages, consisting of the same grammar but a different vocabulary
... in areas central to the activities of subcultures.
Therefore a subculture based around illegal drug use would have words for
drugs, the psychological effects of drugs, the police, money and so on. In
anti-languages the social values of words and phrases tend to be more
emphasised than in mainstream languages.

... found that 41 per cent of the criminals he
interviewed gave "the need for secrecy" as an important reason for using
an anti-language, while 38 per cent listed 'verbal art'. However ...
in his account of the anti-language or grypserka of Polish
prisoners, describes how, for the prisoners, their identity was threatened and
the creation of an anti-society provided a means by which an alternative
social structure (or reality) could be constructed, becoming the source of
a second identity for the prisoners.


Streetwalker theorists cultivate the ability to sustain and create hangouts by hanging
out. Hangouts are highly fluid, worldly, nonsanctioned,
communicative, occupations of space, contestatory retreats for the
passing on of knowledge, for the tactical-strategic fashioning
of multivocal sense, of enigmatic vocabularies and gestures,
for the development of keen commentaries on structural
pressures and gaps, spaces of complex and open-ended recognition.
Hangouts are spaces that cannot be kept captive by the
private / public split. They are worldly, contestatory concrete
spaces within geographies sieged by and in defiance of logics
and structures of domination. The streetwalker theorist
walks in illegitimate refusal to legitimate oppressive
arrangements and logics.

Common


As we apprehend it, the process of instituting communism
can only take the form of a collection of
acts of communisation, of making common such-and-such
space, such-and-such machine, such-and-such knowledge.
That is to say, the elaboration
of the mode of sharing that attaches to them.
Insurrection itself is just an accelerator, a decisive
moment in the process.

... is a collection of places, infrastructures,
communised means; and the dreams, bodies,
murmurs, thoughts, desires that circulate among those
places, the use of those means, the sharing of those
infrastructures.
The notion of ... responds to the necessity of
a minimal formalisation, which makes us accessible
as well as allows us to remain invisible. It belongs
to the communist way that we explain to ourselves
and formulate the basis of our sharing. So that the
most recent arrival is, at the very least, the equal of
the elder.

Whatever singularity, which wants to appropriate belonging itself,
its own being-in-language, and thus rejects all identity and every
condition of belonging, is the principal enemy of the State. Wherever these
singularities peacefully demonstrate their being in common there will be a
Tiananmen, and, sooner or later, the tanks will appear.



 
