1 {print $var}' temp.txt); awk
-vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' > freq.$i.txt; done && rm temp.txt

* 2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm -f freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) && for i in {1..500}; do word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt); tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt); count=$(grep -lw "$word" freq.?.txt | wc -l); idf=$(echo "1+l(5/$count)" | bc -l); tfidf=$(echo "$tf*$idf" | bc); echo $word $tfidf >> freq.$j.txt.temp; done; sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

* 3\. Process the files tfidf.1-5.txt and their source text, text.txt, and produce occ.txt with concordance of top 3 words from each of them:

> rm occ.txt && for j in {1..5}; do echo "$j" >> occ.txt; ptx -f -w 150
text.txt.$j > occ.$j.txt; for i in {1..3}; do word=$(awk -vline="$i" -vfield=1
-F" " 'NR

Barok
Poetics of Research
2014

_An unedited version of a talk given at the conference [Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._

_Bracketed sequences are to be reformulated._

Poetics of Research

In this talk I'm going to attempt to identify [particular] cultural
algorithms, i.e. processes in which cultural practices and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice in the same rank as theory and
poetics, and where theorization of practice can also lead to the
identification of poetical devices.

The primary motivation for this talk is an attempt to figure out where we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things) and what sort of things
might be considered to make a difference [or to keep making difference].

The talk mainly [considers] the role of text and the word in research, by way
of several figures.

A

A reference, list, scheme, table, index; those things that intervene in the
flow of narrative, illustrating the point, perhaps in a more economic way than
the linear text would do. Yet they don't function as pictures, they are
primarily texts, arranged in figures. Their forms have been
standardised[normalised] over centuries, withstood the transition to the
digital without any significant change, being completely intuitive to the
modern reader. Compared to the body of text they are secondary, run parallel
to it. Their function is however different from that of punctuation. They
are there neither to shape the narrative nor to aid structuring the argument
into logical blocks. Nor is their function spatial, as in visual poems.
Their positions within a document are determined according to the sequential
order of the text, [standing as attachments] and are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The [premise] of my talk is that these
_textual figures_ also came to serve as the abstract[relational] models
determining possible relations among documents as such, and in consequence [to
structure conditions [of research]].

B

It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be consulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, i.e. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents. Phenomena
such as [deepening] of specialization and thorough digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given that databases are queried
[using] SQL, their interfaces are mere extensions of it and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen, run by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to flesh and paper, has been
[entrusted] to the digital and networked. Researchers are querying the black
box.

C

Searching in a collection of [amassed/assembled] [tangible] documents (i.e.
a bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents, and we will return to this issue, are they persistent
within the inquiry as such? [wait]

D

This question itself is a rupture in the sequence. It makes a demand to depart
from one narrative [a continuous flow of words] to another, to figure out,
while remaining bound to it [it would be even more as a so-called rhetorical
question]. So there has been one sequence, or line, of the inquiry--about the
kinds of the query and its properties. That sequence itself is a digression,
from within the sequence about what is research and describing its parts
(queries). We thus return to it and continue with the question whether
the properties of the inquiry are the same as the properties of the query.

E

But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_, giving a hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact each choice would have on
his reception in the Anglophone world. Limiting ourselves to textual forms for
now (and not translating his work but pursuing a different inquiry), let us say
the utterance is a word [or a phrase or an idiom] in a sequence such as a
sentence, a paragraph, or a document.

## (F) The structure

This distinction is as old as recorded Western thought since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the subject-
matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns--
not as more words denoting a similar thing but rather one word denoting
various things. Categories were outlined as a device to differentiate among
words according to kinds of these things. Every word as such belonged to not
less and not more than one of ten categories.

So it happens to the word _utterance_ , as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might yield as the most appropriate in a given
context. The more context the more precise share comes to the fore. When taken
out of the context ambiguity prevails as the spectrum unveils in its variety.

Thus single words [as any other utterances] are questions, queries,
themselves, and by occurring in statements, in context, their [means] are being
singled out.

This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.

### (G) The structure: words as words

* [![](/images/thumb/c/c8/Philitas_in_P.Oxy.XX_2260_i.jpg/144px-Philitas_in_P.Oxy.XX_2260_i.jpg)](/File:Philitas_in_P.Oxy.XX_2260_i.jpg)

P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE. ¹(http://163.1.169.40/cgi-bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---20-about-2260--00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
²(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-highlight.jpeg)

* [![](/images/thumb/9/9e/Cyclopaedia_1728_page_210_Dictionary_entry.jpg/88px-Cyclopaedia_1728_page_210_Dictionary_entry.jpg)](/File:Cyclopaedia_1728_page_210_Dictionary_entry.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, 1728, p. 210. ³(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/b/b8/Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg/160px-Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)](/File:Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)

Detail from the Liddell-Scott Greek-English Lexicon, c1843.

Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia; contrasted to the same word meaning "milk pail" in Homer's _Iliad_.

Not much has changed in the way dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filtering and sorting them and
in turn the spectrum of [denoted] things allocated in this way is structured
into groups and subgroups which are then given, according to another set of
rules, shorter or longer names. These constitute facets of [potential]
meanings of a word.

So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few and these criteria are rarely
disclosed, despite their impact on research, and more generally, their
influence as conditions for production[making] of a so-called _common sense_.

It doesn't take that much to reimagine what a dictionary is and what it could
be, especially having large specialized corpora of texts at hand. These can
also serve as aids in production of new words and new meanings.

### (H) The structure: words as knowledge and the world

* [![](/images/thumb/0/02/Boethius_Porphyrys_Isagoge.jpg/120px-Boethius_Porphyrys_Isagoge.jpg)](/File:Boethius_Porphyrys_Isagoge.jpg)

Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3rd c.), [6th c.] 10th c.
⁴(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)

* [![](/images/thumb/d/d0/Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg/94px-Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)](/File:Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_, London, 1728, p. II. ⁵(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/d/d6/Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg/116px-Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)](/File:Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)

Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_, 1751.
⁶(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-connaissances-humaines)

* [![](/images/thumb/9/96/Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg/96px-Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)](/File:Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)

Haeckel - Darwin's tree.

Another _formalized_ and [internalized] process at play when figuring
out a word is its [containment]. A word is structured not only by way of things
it potentially denotes but also by words it is potentially part of and those
it contains.

The fuss around categorization of knowledge _and_ the world in Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd century AD Neoplatonist began expanding the
notions of genus and species into their hypothetical consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), a relation (πρός), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In a
different work, _Topics_, Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (ὅρος), a genus (γένος), a property (ἴδιος), and an
accident (συμβεβηκός). Porphyry does not explicitly refer to _Topics_, and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
⁸(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:

> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
⁹(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))

Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation resulted in having more resemblance
to the perceived _structures_ of the world. So they began to bloom.

There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.

Classification schemes [growing from one point] play a major role in
untangling the format of the modern encyclopedia from that of the dictionary
governed by the alphabet. Two of the most influential encyclopedias of the 18th
century are cases in point. Although still keeping 'dictionary' in their
titles, they are conceived not to represent words but knowledge. The
[uppermost] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches: "natural and scientifical" and "artificial and
technical"; these further split down to 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as
table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from judgement (_entendement_),
branches into memory as history, reason as philosophy, and imagination as
poetry. The logic of containers was employed as an aid not only to deal with
the enormous task of naming and not omitting anything from what is known, but
also for the management of labour of hundreds of writers and researchers, to
create a mechanism for delegating work and the distribution of
responsibilities. Flesh was also more present, in the field research, with
researchers attending workshops and sites of everyday life to annotate it.

The world came forward to outshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems such as
Charles A. Cutter's _Expansive Classification_ (1882) set out to classify the
world itself and set the field for what has come to be known as authority
lists structuring metadata in today's computing.

### The structure (summary)

Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.

While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.

One seeks to describe the word as a faceted list of small worlds, the other to
describe the world as a structured list of words. One plays a primary role in the
domain of epistemology, in what is known, controlling the vocabulary, the other
in the domain of ontology, in what is, controlling reality.

Every [word] has its given things, every thing has its place, closer or
further from a single word.

The schism between classifying words and classifying the world implies it is
not possible to construct a universal classification scheme[system]. On top of
that, any classification system of words is bound to a corpus of texts it is
operating upon and any classification system of the world again operates with
words which are bound to a vocabulary[lexicon] which is again bound to a
corpus [of texts]. This does not prevent people from trying.
Classifications function as descriptors of and 'inscriptors' upon the world,
imprinting their authority. They operate from [a locus of] their
corpus[context]-specificity. The larger the corpus, the more power it has in
shaping the world, as far as the word shapes it (yes, I do imply Google here,
for which it is a domain to be potentially exploited).

## (J) The sequence

The structure-yielding query [of] the single word [shrinks][narrows, becomes
more precise] with preceding and following words. Inquiry proceeds in the flow
that establishes another kind[mode] of relationality, chaining words into the
sequence. While the structuring property of the query brings words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.

This is what is responsible for attaching textual figures mentioned earlier
(lists, schemes, tables) to the body of the text. Associations can also be
stated explicitly, by indexing tables and then referring to them from a
particular point in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.

From this it follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides means for pointing elsewhere
in the document as well.

A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago and which is online
¹⁰(http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.

Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
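
The total indexation described above can be sketched on the command line. This is only an illustration, not a description of any particular digitization system: taking the word as the unit, and the function names `index_units` and `unit_at`, are assumptions made for the example.

```shell
# Break a text (read from stdin) into units, here words, and give each
# unit a numbered position in the sequence: "position<TAB>word".
index_units() {
  tr -cs '[:alnum:]' '\n' | awk 'NF { print ++n "\t" $0 }'
}

# Address a single unit by its position in the indexed sequence.
unit_at() {
  awk -F '\t' -v pos="$1" '$1 == pos { print $2 }'
}

# Prints the fifth unit of the sentence: document
printf 'What happens with a document when it is digitized?\n' | index_units | unit_at 5
```

Note that `unit_at` still walks the sequence one element at a time until it reaches the requested address, which is the sense in which the jumps above remain sequential.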

## (K) The index

* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103px-Summa_confessorum.1310.jpg)](/File:Summa_confessorum.1310.jpg)

Summa confessorum [1297-98], 1310.
⁷(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)

[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now when I am saying the word _utterance_ , each time there surface contexts
in which I have used it earlier.

A long quote from Frederick G. Kilgour, _The Evolution of the Book_, 1998, pp.
76-77:

> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinnying?')

>

> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.

>

> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"

In one sense neither the subject-index nor the concordance is an index; they are
words or groups of words selected according to given criteria from the body of the
text, each accompanied by a list of identifiers. These identifiers are
elements of an index, whether they represent a page, chapter, column, or other
[kind of] block of text. Every identifier is a unique _address_.
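
A rough sketch of such a list of words with their identifiers follows. It assumes that the line number serves as the address (the thirteenth-century tools used book, chapter and a letter for the part of the chapter), it indexes every word rather than a selection, and it only handles unaccented Latin letters; the function name `word_addresses` is hypothetical.

```shell
# For every word in the text on stdin, list the line numbers (addresses)
# on which it occurs, e.g. "horse: 1 2". Lines stand in for pages,
# chapters or columns.
word_addresses() {
  awk '{
    line = tolower($0)
    gsub(/[^a-z]+/, " ", line)          # keep letters only
    n = split(line, w, " ")
    for (i = 1; i <= n; i++) {
      if (seen[w[i], NR]++) continue    # one entry per word per line
      if (w[i] in addr) addr[w[i]] = addr[w[i]] " " NR
      else addr[w[i]] = NR
    }
  }
  END { for (word in addr) print word ": " addr[word] }' | LC_ALL=C sort
}
```

A subject-index would keep only the entries chosen by some criterion; a concordance would keep them all.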

The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, where each element is given a unique combination
of symbols. Different sizes of sets yield different numbers of variations.
Symbol sets such as an alphabet, Arabic numerals, Roman numerals, and binary
digits have different proportions between the length of a string of symbols
and the number of possible variations it can contain. Thus two symbols of the
English alphabet can store 26^2 various values, of Arabic numerals 10^2, of
Roman numerals 8^2 and of binary digits 2^2.
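
The arithmetic can be checked directly. The small function below, whose name `addresses` is made up for this example, computes how many distinct strings of a given length a symbol set of a given size can form, assuming every combination of symbols counts as a valid address.

```shell
# Number of distinct addresses that `len` symbols drawn from a set of
# `size` symbols can form: size^len.
addresses() {
  awk -v size="$1" -v len="$2" 'BEGIN { printf "%d\n", size ^ len }'
}

addresses 26 2   # two letters of the English alphabet: 676
addresses 10 2   # two Arabic numerals: 100
addresses 2 2    # two binary digits: 4
```

Seen the other way round, covering the 676 addresses of two letters takes three decimal digits or ten binary digits.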

Indexation is segmentation, a breaking into segments. From as early as the
13th century an index such as that of sections has served as an enabler of
search. The more [detailed] the indexation, the more precise the search results
it enables.

The subject-index and concordance are tables of search results. There is a
direct lineage from the 13th-century biblical concordances to the birth of
computational linguistic analysis: both were initiated and realised by
priests.
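
To make the idea of a concordance as a table of search results concrete, here is a minimal sketch in the spirit of the third version described by Kilgour, with each word cited in a few words of context. The function name `concordance`, the default of three words of context on either side, and the use of the word's position as its address are all assumptions of the example.

```shell
# Print every occurrence of a word in the text on stdin together with
# `ctx` words of context on either side, prefixed by its word position.
concordance() {
  tr -cs '[:alnum:]' '\n' | awk -v target="$1" -v ctx="${2:-3}" '
    NF { w[++n] = $0 }
    END {
      for (i = 1; i <= n; i++) {
        if (tolower(w[i]) != tolower(target)) continue
        line = ""
        for (j = i - ctx; j <= i + ctx; j++)
          if (j >= 1 && j <= n) line = line (line == "" ? "" : " ") w[j]
        print i ": " line
      }
    }'
}
```

Called as `concordance horse 3 < text.txt`, it lists each place where the word occurs with its surroundings. GNU `ptx`, used in the command-line example at the end of this talk, produces a comparable permuted index.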

During World War II, Jesuit Father Roberto Busa began to look for machines
for the automation of the linguistic analysis of the 11 million-word Latin
corpus of Thomas Aquinas and related authors.

Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas he
realised two things:

> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has to precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach to the words he reads the significance they have in his mind, but
should try to find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is this
basic logic that allows the transfer from what the words mean today to what
they meant to the writer.

>

> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
¹¹(http://www.alice.id.tue.nl/references/busa-1980.pdf)

Busa collaborated with IBM in New York from 1949, and the work, a concordance of
all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version has been online since 2005
¹²(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words was created by a
team of ten priests over two years (in two phases: grouping all the
forms of an inflected word under their lemma, and coding the morphological
categories of each form and lemma), containing 150,000 forms
¹³(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father
Busa has been dubbed the father of humanities computing and recently also of
digital humanities.

The subject-index has a crucial role in the printed book. It is the only means
for search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of an inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).

Its role seemingly vanishes in the digital text. But it can be easily
transformed. Besides serving as a table of pre-searched results the
subject-index also gives a distinct idea about the content of the book. Two
patterns give us a clue: numbers of occurrences of selected words give subjects
weights, while words that seem specific to the book outweigh others even if they
don't occur very often. A selection of these words then serves as a descriptor of
the whole text, and can be thought of as a specific kind of 'tags'.

This process was formalized in a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as the
logarithm of the ratio of the total number of texts in the corpus to the
number of texts where the word appears at least once. When multiplied by the
frequency of the word _in_ the text (divided by the maximum frequency of any
word in the text), we get _term frequency-inverse document frequency_
(tf-idf). In this way we can get an automated list of subjects which are
particular to the text when compared to a group of texts.
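
A worked example may help. The weighting below follows the command-line example at the end of this talk (term frequency scaled as 0.5 + f/(2·max), IDF as 1 + ln(N/n)), which is one common variant among several; the function name `tfidf` and the numbers are invented for illustration.

```shell
# tf-idf of one word in one text.
#   f   occurrences of the word in the text
#   max occurrences of the most frequent word in the text
#   n   number of texts in the corpus
#   df  number of texts containing the word at least once
tfidf() {
  awk -v f="$1" -v max="$2" -v n="$3" -v df="$4" 'BEGIN {
    tf  = 0.5 + f / (2 * max)   # scaled term frequency
    idf = 1 + log(n / df)       # log() is the natural logarithm
    printf "%.4f\n", tf * idf
  }'
}

tfidf 3 12 5 2   # word found in 2 of 5 texts: 1.1977
tfidf 3 12 5 5   # same frequency, but found in all 5 texts: 0.6250
```

The two calls differ only in how many texts contain the word: the rarer it is across the corpus, the more weight it gains as a descriptor of this particular text.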

We came to learn it by the practice of searching the web. It is a mechanism not
dissimilar to the thought process involved in retrieving particular information
online. And search engines have it built into their indexing algorithms as well.

There is a paper proposing attaching words generated by tf-idf to the
hyperlinks when referring to websites ¹⁴(http://bscit.berkeley.edu/cgi-bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in references in the paper use this feature and it can be easily
tested: ¹⁵(http://www.cs.berkeley.edu/~phelps/papers/dissertation-abstract.html?lexical-signature=notemarks+multivalent+semantically+franca+stylized).
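
As a sketch of how such a signature might be put together, the function below joins the highest-ranked words of a list into a value of the kind seen in the `lexical-signature=` parameter of the link above. It assumes a file with one "word weight" pair per line, already sorted by weight (like the tfidf.N.txt files of the command-line example at the end of this talk); the function name `lexical_signature` and the default of five words are assumptions, not the procedure of the cited paper.

```shell
# Join the first `n` words of a ranked "word weight" list into a
# query-string value: word1+word2+...
lexical_signature() {
  head -n "${2:-5}" "$1" | awk '{ print $1 }' | paste -s -d '+' -
}
```

With a ranked list in place, `lexical_signature tfidf.1.txt 5` would print five words joined by `+`, ready to be appended to a URL.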

There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even a way of organising a library bottom-up into novel categories from which
new discourses could emerge. Or as an aid for researchers to sort through texts,
or even for editors as an aid in producing interesting anthologies.
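
Cosine similarity itself is a small computation. The sketch below compares two lists of "word weight" pairs, one pair per line (the format assumed here matches the tfidf.N.txt files of the command-line example at the end of this talk); the function name `cosine` is hypothetical, and clustering proper would need a further step that groups texts by these pairwise scores.

```shell
# Cosine similarity of two "word weight" lists: the dot product of the
# two weight vectors divided by the product of their lengths.
cosine() {
  awk '
    FNR == NR { a[$1] = $2; na += $2 * $2; next }      # first file
    { nb += $2 * $2; if ($1 in a) dot += a[$1] * $2 }  # second file
    END {
      if (na == 0 || nb == 0) { print "0.0000"; exit }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }' "$1" "$2"
}
```

A result of 1 means the two texts weight the same words in the same proportions, and 0 means they share no weighted words at all.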

## Final remarks

1

New disciplines emerge all the time - most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunities but, firstly, frameworks of new
perspectives of looking at the world, new domains of knowledge. From the
perspective of a researcher, partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific
terms[subjects]. Creating new fields involves all that, and more. Even when
one goes against all disciplines.

2

Google can still surprise us.

3

Knowledge has been in the making for millennia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-
making.

4

Command-line example of tf-idf and concordance in 3 steps.

* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:

> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]' '[\012*]' | tr -d '[:punct:]' | sort | uniq -c | sort -k 1nr | sed '1,1d' > temp.txt; max=$(awk -vvar=1 -F" " 'NR==1 {print $var}' temp.txt); awk -vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' temp.txt > freq.$i.txt; done && rm temp.txt

* 2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm -f freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) && for i in {1..500}; do word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt); tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt); count=$(egrep -lw $word freq.?.txt | wc -l); idf=$(echo "1+l(5/$count)" | bc -l); tfidf=$(echo "$tf*$idf" | bc); echo $word $tfidf >> freq.$j.txt.temp; done; sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

* 3\. Process the files tfidf.1-5.txt and their source text, text.txt, and produce occ.txt with concordance of top 3 words from each of them:

> rm -f occ.txt && for j in {1..5}; do echo "$j" >> occ.txt; ptx -f -w 150 text.txt.$j > occ.$j.txt; for i in {1..3}; do word=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' tfidf.$j.txt); egrep -i "[[:alpha:]] $word" occ.$j.txt >> occ.txt; done; done

Dušan Barok

_Written 23 October - 1 November 2014 in Bratislava and Stuttgart._

Sollfrank, Francke & Weinmayr
Piracy Project
2013

Giving What You Don't Have

Andrea Francke, Eva Weinmayr
Piracy Project

Birmingham, 6 December 2013

[00:12]
Eva Weinmayr: When we talk about the word piracy, it causes a lot of problems
to quite a few institutions to deal with it. So events that we’ve organised
have been announced by Central Saint Martins without using the word piracy.
That’s interesting, the problems it still causes…

Cornelia Sollfrank: And how do you announce the project without “Piracy”? The
Project?

E. W.: It’s a project about intellectual property.

C. S.: The P Project.

Andrea Francke, Eva Weinmayr: [laugh] Yes.

[00:52]
Andrea Francke: The Piracy Project is a knowledge platform, and it is based
around a collection of pirated books, of books that have been copied by
people. And we use it to raise discussion about originality, authorship,
intellectual property questions, and to produce new material, new essays and
new questions.

[01:12]
E. W.: So the Piracy Project includes several aspects. One is that it is an
act of piracy in itself, because it is located in an art school, in a library,
in an officially built-up collection of pirated books. [01:30] So that’s the
second aspect, it’s a collection of books which have been copied,
appropriated, modified, improved, which live in this library. [01:40] And the
third part is that it is a collection of physical books, which is touring. We
create reading rooms and invite people to explore the books and discuss issues
raised by cultural piracy.
[01:58] The Piracy Project started in an art college library, which was
supposed to be closed down. And the Piracy Project is one project of And
Publishing. And Publishing is a publishing activity exploring print-on-demand
and new modes of production and of dissemination, the immediacy of
dissemination. [02:20] And Publishing is a collaboration between myself and
Lynn Harris, and we were hosted by Central Saint Martins College of Art and
Design in London. And the campus where this library was situated was the
campus we were working at. [02:40] So when the library was being closed, we
moved in the library together with other members of staff, and kept the
library open in a self-organised way. But we were aware that there’s no budget
to buy new books, and we wanted to have this as a lively space, so we created
an open call for submissions and we asked people to select a book which is
really important to them and make a copy of it. [03:09] So we weren’t
interested in piling up a collection of second hand books, we were really
interested in this process: what happens when you make a copy of a book, and
how does this copy sit next to the original authoritative copy of the book.
This is how it started.

[03:31]
A. F.: I met Eva at the moment when And Publishing was helping to set up this
new space in the library, and they were trying to think how to make the
library more alive inside that university. [03:44] And I was doing research on
Peruvian book piracy at that time, and I had found this book that was modified
and was in circulation. And it was a very exciting moment for us to think what
happens if we can promote this type of production inside this academic
library.

[04:05] Piracy Project
Collection / Reading Room / Research

[04:11]
The Collection

[04:15]
E. W.: We asked people to make a copy of a book which is important to them and
send it to us, and so with these submissions we started to build up the
collections. Lots of students were getting involved, but also lots of people
who work in this topic, and were interested in these topics. [04:38] So we
received about one hundred books in a couple of months. And then, parallel to
this, we started to do research ourselves. [04:50] We had a residency in
China, so we went to China, to Beijing and Shanghai, to meet illegal
booksellers of pirated architecture books. And we had a residency in Turkey,
in Istanbul, where we did lots of interviews with publishers and artists on
book piracy. [05:09] So the collection is a mix of our own research and cases
from the real book markets, and creative work, artistic work which is produced
in the context of an art college and the wider cultural realm.

[05:29]
A. F.: And it is an ongoing project.

E. W.: The project is ongoing, we still receive submissions. The collection is
growing, and at the moment here we have about 180 books, here at Grand Union
(Birmingham).

[05:42]
A. F.: When we did the open call, something that was really important to us
was to make clear for people that they have a space of creativity when they
are making a copy. So we wrote, please send us a copy of a book, and be aware
that things happen when you copy a book. [05:57] Whether you do it
intentionally or not a copy is never the same. So you can use that space, take
ownership of that space and make something out of that; or you can take a step
back and allow things to happen without having control. And I think that is
something that is quite important for us in the project. [06:12] And it is
really interesting how people have embraced that in different measures, like
subtle things, or material things, or adding text, taking text out, mixing
things, judging things. Sometimes just saying, I just want it to circulate, I
don’t mind what happens in the space, I just want the subject to be in the
world again.

[06:35]
E. W.: I think this is one which I find interesting in terms of making a copy,
because it’s not so much about my own creativity, it’s more about exploring
how technology edits what you can see. It’s Jan van Toorn’s Critical Practice,
and the artist is Hester Barnard, a Canadian artist. [07:02] She sent us these
three copies, and we thought, that’s really generous, three copies. But they
are not identical copies, they are very different. Some have a lot of empty
pages in the book. And this book has been screen-captured on a 3.5 inch
iPhone, whereas this book has been screen-captured on a desktop, and this one
has been screen-captured with a laptop. [07:37] So the device you use to
access information online determines what you actually receive. And I find
this really interesting, that she translated this back into a hardcopy, the
online edited material. [07:53] And this is kind of taught by this book,
standard International Copyright. She went to Google Books, and screen-
captured all the pages Google Books are showing. So we are all familiar with
blurry text pages, but then it starts that you get the message “Page 38 is not
shown in this preview.” [08:18] And then it’s going through the whole book, so
she printed every page basically, omitting the actual information. But the
interesting thing is that we are all aware that this is happening on Google,
on screen online, but the fact that she’s translating this back into an
object, into a printed book, is interesting.

[08:44]
Reading Room

[08:48]
A. F.: We create these reading rooms with the collection as a way to tour the
collection, and meet people and have conversations around the books. And that
is something quite important to us, that we go with the physical books to a
place, either for two or three months, and meet different people that have
different interests in relation to the collection in that locality. We’ve been
doing that for the last two years, I think, three years. [09:12] And it’s
quite interesting because different places have very different experiences of
piracy. So you can go to a country where piracy is something very common, or a
different place where people have a very strong position against piracy, or a
different legal framework. And I feel the type of conversations and the
quality of interactions is quite different from being present on the space and
with the books. [09:36] And that’s why we don’t call these exhibitions,
because we always have places where people can come and they can stay, and
they can come again. Sometimes people come three or four times and they
actually read the books. And a few times they go back to their houses and they
bring books back, and they said, I’m going to contact this friend who has been
to Russia and he told me about this book – so we can add it to the collection.
I think that makes a big difference to how the research in the project
functions.

[10:06]
E. W.: One of the most interesting events we did with the Piracy collection
was at the Show Room where we had a residency for the last year. There were
three events, and one was A Day At The Courtroom. This was an afternoon where
we invited three copyright lawyers coming from different legal systems: the
US, the UK, and the Continental European, Athens. And we presented ten
selected cases from the collection and the three copyright lawyers had to
assess them in the eyes of the law, and they had to agree where to put this
book in a scale from legal to illegal. [10:51] So we weren’t interested really
to say, this is legal and this is illegal, we were interested in all the
shades in between. And then they had to discuss where they would place the
book. But then the audience had the last verdict, and then the audience placed
the book. [11:05] And this was an extremely interesting discussion, because it
was interesting to see how different the legal backgrounds are, how blurry the
whole field is, how you can assess when is the moment where a work becomes a
transformative work, or when it stays a derivative work, and this whole
discussion.
[11:30] When we do these reading rooms – and we had one in New York, for
example, at the New York Art Book Fair – people are coming, and they are
coming to see the physical books in a physical space, so this creates a social
encounter and we have these conversations. [11:47] For example, a woman stood
up to us in New York and she told us about a piracy project she ran where she
was working in a juvenile detention centre, and she produced a whole shadow
library of books because the incarcerated kids couldn’t take the books in
their cells, so she created these copies, individual chapters, and they could
circulate. [12:20] I’m telling this because the fact that we are having this
reading room and that we are meeting people, and that we are having these
conversations, really furthers our research. We find out about these projects
by sharing knowledge.

[12:38]
Categories

[12:42]
A. F.: Whenever we set our reading room for the Piracy Project we need to
organise the books in a certain way. What we started to do now is that we’ve
created these different categories, and the first set of categories came from
the legal event. [12:56] So we set up, we organised the books in different
categories that would help us have questions for the lawyers, that would work
for groups of books instead of individual works. [13:07] And the idea is that,
for example, we are going to have our next events with librarians, and a new
set of categories would come. So the categories change as our interest or
research in the project is changing. [13:21] The current categories are:
Pirated Design, so books where the look of the book has been copied but not
the content; recirculation, books that have been copied trying to be
reproduced exactly as they were, because they need to be circulating again;
transformation, books that have been modified; First Sale Doctrine, so we
receive quite a few books where people haven’t actually made a copy but they
have cut the book or drawn inside the book, and legally you are allowed to do
anything with a book except copy it, so we thought that it was quite important
so that we didn’t have to discuss that with the lawyers; [14:03] Public
Domain, which are works that are already out of copyright, again, so whatever
you do with those books is legal; and collation, books gathered from different
sources, and who owns the copyright, which was a really interesting question,
which is when you have a book that has many authors – it’s really interesting.
Different systems in different countries have different ways to deal with who
owns the copyright and what are the rights of the owners of the different
works.

[14:36]
E. W.: Ahmet Şık is a journalist who published a book about the Ergenekon
scandal and the Turkish government, and connects that kind of mafioso
structures. Before the book could be published he was arrested and put in jail
for a whole year without trial, and he sent the PDF to friends, and the PDF
was circulating on many different computers so it couldn’t be taken. [15:06]
They published the PDF, and as authors they put over a hundred different
author names, so there was not just one author who could be taken into
responsibility.

[15:22] We have in the collection this book, it’s Teignmouth Electron by
Tacita Dean. This is the original, it’s published by Book Works and Steidl.
And to this round table, to this event, we invited also Jane Rolo, director of
Book Works (and she published this book). [15:41] And we invited her saying,
do you know that your book has been pirated? So she was really interested and
she came along. This is the pirated version, it’s Alias, [by] Damián Ortega in
Mexico. It’s a series of books where he translates texts and theory into
Spanish, which are not available in Spanish. So it’s about access, it’s about
circulation. [16:07] But actually he redesigned the book. The pirated version
looks very different, and it has a small film roll here, from Tacita Dean’s
book. And it was really amazing that Jane Rolo flipped the pirated book and
she said, well, actually this is really very nice.

[16:31] This is kind of a standard academic publishing format, it’s Gilles
Deleuze’s Proust and Signs, and the contributor, the artist who produced the
book is Neil Chapman, a writer based in London. And he made a facsimile of his
copy of this book, including the binding mistakes – so there’s one chapter
upside down printed in the book. [17:04] But the really interesting thing is
that he scanned it on his home inkjet printer – he scanned it on his scanner
and then printed it on his home inkjet printer. And the feel of it is very
crafty, because the inkjet has a very different typographic appearance than
the official copy. [17:28] And this makes you read the book in quite a
different way, you relate differently to the actual text. So it’s not just
about the information conveyed on this page, it’s really about how I can
relate to it visually. I find this really interesting when we put this book
into the library, in our collection in the library, and it sat next to the
original, [17:54] it raises really interesting questions about what kind of
authority decides which book can access the library, because this is
definitely and obviously a self-made copy – so if this self-made copy can
enter the library, any self-made text and self-published copy could enter the
library. So it was raising really interesting questions about gatekeepers of
knowledge, and hierarchies and authorities.

[18:26]
On-line catalogue

[18:30]
E. W.: We created this online catalogue to give an overview of what we have in
the collection. We have a cover photograph and then we have a short text where
we try to frame and to describe the approach taken, like the strategy, what’s
been pirated and what was the strategy. [18:55] And this is quite a lot,
because it’s giving you the framework of it, the conceptual framework. But
it’s not giving you the book, and this is really important because lots of the
books couldn’t be digitised, because it’s exactly their material quality which
is important, and which makes the point. [19:17] So if I would… if I have a
project which is working about mediation, and then I put another layer of
mediation on top of it by scanning it, it just wouldn’t work anymore.
[19:29] The purpose of the online catalogue isn’t to give you insight into all
the books to make actually all the information available, it’s more to talk
about the approach taken and the questions which are raised by this specific
book.

[19:47]
Cultures of the copy

[19:51]
A topic of cultural difference became really obvious when we went to Istanbul.
A copy shop which had many academic titles on the shelves, copied, pirated
titles... The fact is that in London, where I’m based, you can access anything
in any library, and it’s not too expensive to get the original book. [20:27]
But in Istanbul it’s very expensive, and the whole academic community thrives
on pirated, copied academic titles.

[20:39]
A. F.: So this is the original Jaime Bayly [No se lo digas a nadie], and this
is the pirated copy of the Jaime Bayly. This book is from Peru, it was bought
on the street, on a street market. [20:53] And Peru has a very big pirated
book market, most books in Peru are pirated. And we found this because there
was a rumour that books in Peru had been modified, pirated books. And this
version, the pirated version, has two extra chapters that are not in the
original one. [21:13] It’s really hard to understand the motivation behind it.
There’s no credit, so the person is inhabiting this author’s identity in a
sense. They are not getting any cultural capital from it. They are not getting
extra money, because if they are found out, nobody would buy books from this
publisher anymore. [21:33] The chapters are really well written, so you as a
reader would not realise that you are reading something that has been pirated.
And that was really fascinating in terms of what space you create. So when you
have this technology that allows you to have the book open and print it so
easily – how can you take advantage of that, and take ownership or inhabit
these spaces that technology is opening up for you.

[22:01]
E. W.: Book piracy in China is really important when it comes to architecture
books, Western architecture books. Lots of architecture studios, but even
university libraries would buy from pirate book sellers, because it’s just so
much cheaper. [22:26] And we’ve found this Mark magazine with one of the
architecture sellers, and it’s supposed to be a bargain because you have six
magazines in one. [22:41] And we were really interested in the question, what
are the criteria for the editing? How do you edit six issues into one? But
basically everything is in here, from advertisement, to text, to images, it’s
all there. But then a really interesting question arises when it comes to
technology, because in this magazine there are pages in Italian language
clearly taken from other magazines.

[23:14]
A. F.: But it was also really interesting to go there, and actually interview
the distributor and go through the whole experience. We had to meet the
distributor in a neutral place, and he interviewed us to see if he was going
to allow us to go into the shop and buy his books. [23:31] And then going
through the catalogue and realising how Rem Koolhaas is really popular among
the pirates, but actually Chinese architecture is not popular, so there’s only
like three pirated books on Chinese architecture; or that from all the
architecture universities in the world only the AA books are copied – the
Architectural Association books. [23:51] And I think those small things are
really things that are worth spending time and reflecting on.

[23:58]
E. W.: We found this pirate copy of Tintin when we visited Beijing, and
obviously compared to the original, it looks different, a different format.
But also it’s black and white, but it’s not a photocopy of the original full-
colour. [24:23] It’s redrawn by hand, so all the drawings are redrawn and
obviously translated into Chinese. This is quite a labour of love, which is
really amazing. I can compare the two. The space is slightly differently
interpreted.

[24:50]
A. F.: And it’s really incredible, because at some point in China there were
14 or 15 different publishers publishing Tintin, and they all have their
versions. They are all hand-drawn by different people, so in the back, in
Chinese, it’s the credit. So you can buy it by deciding which person does the
best drawings of the production of Tintin, which I thought it was really…
[25:14] It’s such a different cultural way to actually give credit to the
person that is copying it, and recognise the labour, and the intention and the
value of that work.

[25:24]
Why books?

[25:28]
E. W.: Books have always been very important in my practice, in my artistic
practice, because lots of my projects culminated in a book, or led into a
book. And publications are important because they can circulate freely, they
can circulate much easier than artworks in a gallery. [25:50] So this question
of how to make things public and how to create an audience… not how to create
an audience – how to reach a reader and how to create a dialogue. So the book
is the perfect tool for this.

[26:04]
A. F.: My interest in books comes from making art, or thinking about art as a
way to interact with the world, so outside art settings, and I found books
really interesting in that. And that’s how I met Eva, in a sense, because I
was interested in that part of her practice. [26:26] When I found the Jaime
Bayly book, for me that was a real moment of excitement, of this person that
was doing these things in the world without taking any credit, but was having
such a profound effect on so many readers. I’m quite fascinated by that.
[26:44] I'm also really interested in research and using events – research
that works with people. So it kind of creates communities around certain
subjects, and then it uses that to explore different issues and to interact
with different areas of knowledge. And I think books are a privileged space to
do that.

[27:11]
E. W.: The books in the Piracy collection, because they are objects you can
grab, and because they need a place, they are a really important tool to start
a dialogue. When we had this reading room in the New York Art Book Fair, it
was really the book that created this moment when you started a conversation
with somebody else. And I think this is a very important moment in the Piracy
collection as a tool to start this discussion. [27:44] In the Piracy
collection the books are not so important to circulate, because they don’t
circulate. They only travel with us, in a way, or they travel here to Grand
Union to be installed in this reading room. But they are not meant to be
printed in a print run of thousands and circulated in the world.

C. S.: So what is their function?

[28:08]
E. W.: The functions of the books here in the Piracy collection are to create
a dialogue, debate about these issues they are raising, and they are a tool
for a direct encounter, for a social encounter. As Andrea said, building a
community which is debating these issues which they are raising. [28:32] And I
also find it really interesting – when we were in China we also talked with
lots of publishers and artists, and they said that the book, in comparison to
an online file, is a really important tool in China, because it can’t be
controlled as easily as online communication. [28:53] So a book is an
autonomous object which can be passed on from one hand to the other, without
the state or another authority to intervene. I think that is an important
aspect when you talk about books in comparison with circulating information
online.

[29:13]
Passion for piracy

[29:17]
A. F.: I’m quite interested in enclosures, and people that jump those
enclosures. I’m kind of interested in these imposed… Maybe because I come from
Peru and we have a different relation to rules, and I’m in Britain where rules
seem to have so much strength. And I’m quite interested in this agency of
taking personal responsibility and saying, I’m going to obey this rule, I’m
not going to obey this one, and what does that mean. [29:42] That makes me
really interested in all these different strategies, and also to find a way to
value them and show them – how when you make this decision to jump a rule, you
actually help bring up questions, modifications, and propose new models or new
ways about thinking things. [30:02] And I think that is something that is part
of all the other projects that I do: stating the rules and the people that
break them.

[30:12]
E. W.: The pirate as a trickster who tries to push the boundaries which are
being set. And I think the interesting, or the complex part of the Piracy
Project is that we are not saying, I’m for piracy or I’m against piracy, I’m
for copyright, I’m against copyright. It’s really about testing out these
decisions and one’s own boundaries, the legal boundaries, the moral limits – to
push them and find them. [30:51] I mean, the Piracy Project as a whole is a
project which is pushing the boundaries because it started in this academic
library, and it’s assessed by copyright lawyers as illegal, so to run such a
project is an act of piracy in itself.

[31:17]
This method of doing or approaching this art project is to create a
collaboration to instigate this discourse, and this discourse is happening on
many different levels. One of them is conversation, debate. But the other one
is this material outcome, and then this material outcome is creating a new
debate.
