Barok
Techniques of Publishing
2014
Techniques of Publishing
Draft translation of a talk given at the seminar Informace mezi komoditou a komunitou [The Information Between Commodity and Community] held at Tranzitdisplay in Prague, Czech Republic, on May 6, 2014
My contribution has three parts. I will begin by sketching the current environment of publishing in general, move on to some of the specificities of publishing
in the humanities and art, and end with a brief introduction to the Monoskop
initiative I was asked to include in my talk.
I would like to thank Milos Vojtechovsky, Matej Strnad and CAS/FAMU for
the invitation, and Tranzitdisplay for hosting this seminar. It offers itself as an
opportunity for reflection for which there is a decent distance from a previous
presentation of Monoskop in Prague eight years ago when I took part in a new
media education workshop prepared by Miloš and Denisa Kera. Many things
changed since then, not only in new media, but in the humanities in general,
and I will try to articulate some of these changes from today’s perspective and
primarily from the perspective of publishing.
I. The Environment of Publishing
One change, perhaps the most serious, and which indeed relates to the humanities
publishing as well, is that from a subject that was just a year ago treated as a paranoia of a bunch of so called technological enthusiasts, is today a fact with which
the global public is well acquainted: we are all being surveilled. Virtually every
utterance on the internet, or rather made by means of the equipment connected
to it through standard protocols, is recorded, in encrypted or unencrypted form,
on servers of information agencies, besides copies of a striking share of these data
on servers of private companies. We are only at the beginning of civil mobilization towards reversal of the situation and the future is open, yet nothing suggests
so far that there is any real alternative other than “to demand the impossible.”
There are at least two certaintes today: surveillance is a feature of every communication technology controlled by third parties, from post, telegraphy, telephony
to internet; and at the same time it is also a feature of the ruling power in all its
variants humankind has come to know. In this regard, democracy can be also understood as the involvement of its participants in deciding on the scale and use of
information collected in this way.
I mention this because it suggests that also all publishing initiatives, from libraries,
through archives, publishing houses to schools have their online activities, back1
ends, shared documents and email communication recorded by public institutions–
which intelligence agencies are, or at least ought to be.
In regard to publishing houses it is notable that books and other publications today are printed from digital files, and are delivered to print over email, thus it is
not surprising to claim that a significant amount of electronically prepared publications is stored on servers in the public service. This means that besides being
required to send a number of printed copies to their national libraries, in fact,
publishers send their electronic versions to information agencies as well. Obviously, agencies couldn’t care less about them, but it doesn’t change anything on
the likely fact that, whatever it means, the world’s largest electronic repository of
publications today are the server farms of the NSA.
Information agencies archive publications without approval, perhaps without awareness, and indeed despite disapproval of their authors and publishers, as an
“incidental” effect of their surveillance techniques. This situation is obviously
radically different from a totalitarianism we got to know. Even though secret
agencies in the Eastern Bloc were blackmailing people to produce miserable literature as their agents, samizdat publications could at least theoretically escape their
attention.
This is not the only difference. While captured samizdats were read by agents of
flesh and blood, publications collected through the internet surveillance are “read”
by software agents. Both of them scan texts for “signals”, ie. terms and phrases
whose occurrences trigger interpretative mechanisms that control operative components of their organizations.
Today, publishing is similarly political and from the point of view of power a potentially subversive activity like it was in the communist Czechoslovakia. The
difference is its scale, reach and technique.
One of the messages of the recent “revelations” is that while it is recommended
to encrypt private communication, the internet is for its users also a medium of
direct contact with power. SEO, or search engine optimization, is now as relevant technique for websites as for books and other publications since all of them
are read by similar algorithms, and authors can read this situation as a political
dimension of their work, as a challenge to transform and model these algorithms
by texts.
2
II. Techniques of research in the humanities literature
Compiling the bibliography
Through the circuitry we got to the audience, readers. Today, they also include
software and algorithms such as those used for “reading” by information agencies
and corporations, and others facilitating reading for the so called ordinary reader,
the reader searching information online, but also the “expert” reader, searching
primarily in library systems.
Libraries, as we said, are different from information agencies in that they are
funded by the public not to hide publications from it but to provide access to
them. A telling paradox of the age is that on the one hand information agencies
are storing almost all contemporary book production in its electronic version,
while generally they absolutely don’t care about them since the “signal” information lies elsewhere, and on the other in order to provide electronic access, paid or
direct, libraries have to costly scan also publications that were prepared for print
electronically.
A more remarkable difference is, of course, that libraries select and catalogize
publications.
Their methods of selection are determined in the first place by their public institutional function of the protector and projector of patriotic values, and it is reflected
in their preference of domestic literature, ie. literature written in official state languages. Methods of catalogization, on the other hand, are characterized by sorting
by bibliographic records, particularly by categories of disciplines ordered in the
tree structure of knowledge. This results in libraries shaping the research, including academic research, towards a discursivity that is national and disciplinary, or
focused on the oeuvre of particular author.
Digitizing catalogue records and allowing readers to search library indexes by their
structural items, ie. the author, publisher, place and year of publication, words in
title, and disciplines, does not at all revert this tendency, but rather extends it to
the web as well.
I do not intend to underestimate the value and benefits of library work, nor the
importance of discipline-centered writing or of the recognition of the oeuvre of
the author. But consider an author working on an article who in the early phase
of his research needs to prepare a bibliography on the activity of Fluxus in central Europe or on the use of documentary film in education. Such research cuts
through national boundaries and/or branches of disciplines and he is left to travel
not only to locate artefacts, protagonists and experts in the field but also to find
literature, which in turn makes even the mere process of compiling bibliography
relatively demanding and costly activity.
3
In this sense, the digitization of publications and archival material, providing their
free online access and enabling fulltext search, in other words “open access”, catalyzes research across political-geographical and disciplinary configurations. Because while the index of the printed book contains only selected terms and for
the purposes of searching the index across several books the researcher has to have
them all at hand, the software-enabled search in digitized texts (with a good OCR)
works with the index of every single term in all of them.
This kind of research also obviously benefits from online translation tools, multilingual case bibliographies online, as well as second hand bookstores and small
specialized libraries that provide a corrective role to public ones, and whose “open
access” potential has been explored to the very small extent until now, but which
I won’t discuss here further for the lack of time.
Writing
The disciplinarity and patriotism are “embedded” in texts themselves, while I repeat that I don’t say this in a pejorative way.
Bibliographic records in bodies of texts, notes, attributions of sources and appended references can be read as formatted addresses of other texts, making apparent a kind of intertextual structure, well known in hypertext documents. However, for the reader these references are still “virtual”. When following a reference
she is led back to a library, and if interested in more references, to more libraries.
Instead, authors assume certain general erudition of their readers, while following references to their very sources is perceived as an exception from the standard
self-limitation to reading only the body of the text. Techniques of writing with
virtual bibliography thus affirm national-disciplinary discourses and form readers
and authors proficient in the field of references set by collections of local libraries
and so called standard literature of fields they became familiar with during their
studies.
When in this regime of writing someone in the Czech Republic wants to refer to
the work of Gilbert Simondon or Alexander Bogdanov, to give an example, the
effect of his work will be minimal, since there was practically nothing from these
authors translated into Czech. His closely reading colleague is left to try ordering
books through a library and wait for 3-4 weeks, or to order them from an online
store, travel to find them or search for them online. This applies, in the case of
these authors, for readers in the vast majority of countries worldwide. And we can
tell with certainty that this is not only the case of Simondon and Bogdanov but
of the vast majority of authors. Libraries as nationally and pyramidally situated
institutions face real challenges in regard to the needs of free research.
This is surely merely one aspect of techniques of writing.
4
Reading
Reading texts with “live” references and bibliographies using electronic devices is
today possible not only to imagine but to realise as well. This way of reading
allows following references to other texts, visual material, other related texts of
an author, but also working with occurrences of words in the text, etc., bringing
reading closer to textual analysis and other interesting levels. Due to the time
limits I am going to sketch only one example.
Linear reading is specific by reading from the beginning of the text to its end,
as well as ‘tree-like’ reading through the content structure of the document, and
through occurrences of indexed words. Still, techniques of close reading extend
its other aspect – ‘moving’ through bibliographic references in the document to
particular pages or passages in another. They make the virtual reference plastic –
texts are separated one from another merely by a click or a tap.
We are well familiar with a similar movement through the content on the web
– surfing, browsing, and clicking through. This leads us to an interesting parallel: standards of structuring, composing, etc., of texts in the humanities has been
evolving for centuries, what is incomparably more to decades of the web. From
this stems also one of the historical challenges the humanities are facing today:
how to attune to the existence of the web and most importantly to epistemological consequences of its irreversible social penetration. To upload a PDF online is
only a taste of changes in how we gain and make knowledge and how we know.
This applies both ways – what is at stake is not only making production of the
humanities “available” online, it is not only about open access, but also about the
ways of how the humanities realise the electronic and technical reality of their
own production, in regard to the research, writing, reading, and publishing.
Publishing
The analogy between information agencies and national libraries also points to
the fact that large portion of publications, particularly those created in software,
is electronic. However the exceptions are significant. They include works made,
typeset, illustrated and copied manually, such as manuscripts written on paper
or other media, by hand or using a typewriter or other mechanic means, and
other pre-digital techniques such as lithography, offset, etc., or various forms of
writing such as clay tablets, rolls, codices, in other words the history of print and
publishing in its striking variety, all of which provide authors and publishers with
heterogenous means of expression. Although this “segment” is today generally
perceived as artists’ books interesting primarily for collectors, the current process
of massive digitization has triggered the revival, comebacks, transformations and
5
novel approaches to publishing. And it is these publications whose nature is closer
to the label ‘book’ rather than the automated electro-chemical version of the offset
lithography of digital files on acid-free paper.
Despite that it is remarkable to observe a view spreading among publishers that
books created in software are books with attributes we have known for ages. On
top of that there is a tendency to handle files such as PDFs, EPUBs, MOBIs and
others as if they are printed books, even subject to the rules of limited edition, a
consequence of what can be found in the rise of so called electronic libraries that
“borrow” PDF files and while someone reads one, other users are left to wait in
the line.
Whilst, from today’s point of view of the humanities research, mass-printed books
are in the first place archives of the cultural content preserved in this way for the
time we run out of electricity or have the internet ‘switched off’ in some other
way.
III. Monoskop
Finally, I am getting to Monoskop and to begin with I am going to try to formulate
its brief definition, in three versions.
From the point of view of the humanities, Monoskop is a research, or questioning, whose object’s nature renders no answer as definite, since the object includes
art and culture in their widest sense, from folk music, through visual poetry to
experimental film, and namely their history as well as theory and techniques. The
research is framed by the means of recording itself, what makes it a practise whose
record is an expression with aesthetic qualities, what in turn means that the process of the research is subject to creative decisions whose outcomes are perceived
esthetically as well.
In the language of cultural management Monoskop is an independent research
project whose aim is subject to change according to its continual findings; which
has no legal body and thus as organisation it does not apply for funding; its participants have no set roles; and notably, it operates with no deadlines. It has a reach
to the global public about which, respecting the privacy of internet users, there
are no statistics other than general statistics on its social networks channels and a
figure of numbers of people and bots who registered on its website and subscribed
to its newsletter.
At the same time, technically said, Monoskop is primarily an internet website
and in this regard it is no different from any other communication media whose
function is to complicate interpersonal communication, at least due to the fact
that it is a medium with its own specific language, materiality, duration and access.
6
Contemporary media
Monoskop has began ten years ago in the milieu of a group of people running
a cultural space where they had organised events, workshops, discussion, a festival,
etc. Their expertise, if to call that way the trace left after years spent in the higher
education, varied well, and it spanned from fine art, architecture, philosophy,
through art history and literary theory, to library studies, cognitive science and
information technology. Each of us was obviously interested in these and other
fields other than his and her own, but the praxis in naming the substance whose
centripetal effects brought us into collaboration were the terms new media, media
culture and media art.
Notably, it was not contemporary art, because a constituent part of the praxis was
also non-visual expression, information media, etc., so the research began with the
essentially naive question ‘of what are we contemporary?’. There had been not
much written about media culture and art as such, a fact I perceived as drawback
but also as challenge.
The reflection, discussion and critique need to be grounded in reality, in a wider
context of the field, thus the research has began in-field. From the beginning, the
website of Monoskop served to record the environment, including people, groups,
organizations, events we had been in touch with and who/which were more or
less explicitly affiliated with media culture. The result of this is primarily a social
geography of live media culture and art, structured on the wiki into cities, with
a focus on the two recent decades.
Cities and agents
The first aim was to compile an overview of agents of this geography in their
wide variety, from eg. small independent and short-lived initiatives to established
museums. The focus on the 1990s and 2000s is of course problematic. One of
its qualities is a parallel to the history of the World Wide Web which goes back
precisely to the early 1990s and which is on the one hand the primary recording
medium of the Monoskop research and on the other a relevant self-archiving and–
stemming from its properties–presentation medium, in other words a platform on
which agents are not only meeting together but potentially influence one another
as well.
http://monoskop.org/Prague
The records are of diverse length and quality, while the priorities for what they
consist of can be generally summed up in several points in the following order:
7
1. Inclusion of a person, organisation or event in the context of the structure.
So in case of a festival or conference held in Prague the most important is to
mention it in the events section on the page on Prague.
2. Links to their web presence from inside their wiki pages, while it usually
implies their (self-)presentation.
http://monoskop.org/The_Media_Are_With_Us
3. Basic information, including a name or title in an original language, dates
of birth, foundation, realization, relations to other agents, ideally through
links inside the wiki. These are presented in narrative and in English.
4. Literature or bibliography in as many languages as possible, with links to
versions of texts online if there are any.
5. Biographical and other information relevant for the object of the research,
while the preference is for those appearing online for the first time.
6. Audiovisual material, works, especially those that cannot be found on linked
websites.
Even though pages are structured in the quasi same way, input fields are not structured, so when you create a wiki account and decide to edit or add an entry, the
wiki editor offers you merely one input box for the continuous text. As is the case
on other wiki websites. Better way to describe their format is thus articles.
There are many related questions about representation, research methodology,
openness and participation, formalization, etc., but I am not going to discuss them
due to the time constraint.
The first research layer thus consists of live and active agents, relations among
them and with them.
Countries
Another layer is related to a question about what does the field of media culture
and art stem from; what and upon what does it consciously, but also not fully
consciously, builds, comments, relates, negates; in other words of what it may be
perceived a post, meta, anti, retro, quasi and neo legacy.
An approach of national histories of art of the 20th century proved itself to be
relevant here. These entries are structured in the same way like cities: people,
groups, events, literature, at the same time building upon historical art forms and
periods as they are reflected in a range of literature.
8
http://monoskop.org/Czech_Republic
The overviews are organised purposely without any attempts for making relations
to the present more explicit, in order to leave open a wide range of intepretations
and connotations and to encourage them at the same time.
The focus on art of the 20th century originally related to, while the researched
countries were mostly of central and eastern Europe, with foundations of modern
national states, formations preserving this field in archives, museums, collections
but also publications, etc. Obviously I am not saying that contemporary media
culture is necessarily archived on the web while art of the 20th century lies in
collections “offline”, it applies vice versa as well.
In this way there began to appear new articles about filmmakers, fine artists, theorists and other partakers in artistic life of the previous century.
Since then the focus has considerably expanded to more than a century of art and
new media on the whole continent. Still it portrays merely another layer of the
research, the one which is yet a collection of fragmentary data, without much
context. Soon we also hit the limit of what is about this field online. The next
question was how to work in the internet environment with printed sources.
Log
http://monoskop.org/log
When I was installing this blog five years ago I treated it as a side project, an offshoot, which by the fact of being online may not be only an archive of selected
source literature for the Monoskop research but also a resource for others, mainly
students in the humanities. A few months later I found Aaaarg, then oriented
mainly on critical theory and philosophy; there was also Gigapedia with publications without thematic orientation; and several other community library portals
on password. These were the first sources where I was finding relevant literature
in electronic version, later on there were others too, I began to scan books and catalogues myself and to receive a large number of scans by email and soon came to
realise that every new entry is an event of its own not only for myself. According
to the response, the website has a wide usership across all the continents.
At this point it is proper to mention the copyright. When deciding about whether
to include this or that publication, there are at least two moments always present.
One brings me back to my local library at the outskirts of Bratislava in the early
1990s and asks that if I would have found this book there and then, could it change
my life? Because books that did I was given only later and elsewhere; and here I
think of people sitting behind computers in Belarus, China or Kongo. And even
9
if not, the latter is a wonder on whether this text has a potential to open up some
serious questions about disciplinarity or national discursivity in the humanities,
while here I am reminded by a recent study which claims that more than half
of academic publications are not read by more than three people: their author,
reviewer and editor. What does not imply that it is necessary to promote them
to more people but rather to think of reasons why is it so. It seems that the
consequences of the combination of high selectivity with open access resonate
also with publishers and authors from whom the complaints are rather scarce and
even if sometimes I don’t understand reasons of those received, I respect them.
Media technology
Throughout the years I came to learn, from the ontological perspective, two main
findings about media and technology.
For a long time I had a tendency to treat technologies as objects, things, while now
it seems much more productive to see them as processes, techniques. As indeed
nor the biologist does speak about the dear as biology. In this sense technology is
the science of techniques, including cultural techniques which span from reading,
writing and counting to painting, programming and publishing.
Media in the humanities are a compound of two long unrelated histories. One of
them treats media as a means of communication, signals sent from point A to the
point B, lacking the context and meaning. Another speaks about media as artistic
means of expression, such as the painting, sculpture, poetry, theatre, music or
film. The term “media art” is emblematic for this amalgam while the historical
awareness of these two threads sheds new light on it.
Media technology in art and the humanities continues to be the primary object of
research of Monoskop.
I attempted to comment on political, esthetic and technical aspects of publishing.
Let me finish by saying that Monoskop is an initiative open to people and future
and you are more than welcome to take part in it.
Dušan Barok
Written May 1-7, 2014, in Bergen and Prague. Translated by the author on May 10-13,
2014. This version generated June 10, 2014.
Barok
Poetics of Research
2014
_An unedited version of a talk given at the conference[Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._
_Bracketed sequences are to be reformulated._
Poetics of Research
In this talk I'm going to attempt to identify [particular] cultural
algorithms, ie. processes in which cultural practises and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice in the same rank as theory and
poetics, and where theorization of practice can also lead to the
identification of poetical devices.
The primary motivation for this talk is an attempt to figure out where do we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things) and what sort of things
might be considered to make a difference [or to keep making difference].
The talk mainly [considers] the role of text and the word in research, by way
of several figures.
A
A reference, list, scheme, table, index; those things that intervene in the
flow of narrative, illustrating the point, perhaps in a more economic way than
the linear text would do. Yet they don't function as pictures, they are
primarily texts, arranged in figures. Their forms have been
standardised[normalised] over centuries, withstood the transition to the
digital without any significant change, being completely intuitive to the
modern reader. Compared to the body of text they are secondary, run parallel
to it. Their function is however different to that of the punctuation. They
are there neither to shape the narrative nor to aid structuring the argument
into logical blocks. Nor is their function spatial, like in visual poems.
Their positions within a document are determined according to the sequential
order of the text, [standing as attachments] and are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The [premise] of my talk is that these
_textual figures_ also came to serve as the abstract[relational] models
determining possible relations among documents as such, and in consequence [to
structure conditions [of research]].
B
It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties does it have, and so on, can be consulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, ie. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents. The
phenomena such as [deepening] of specialization and throughout digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given the fact that databases are queried
[using] SQL language, their interfaces are mere extensions of it and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen being ran by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to the flesh and paper has been
[entrusted] to the digital and networked. Researchers are querying the black
box.
C
Searching in a collection of [amassed/assembled] [tangible] documents (ie.
bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents, and we will return to this issue, are they persistent
within the inquiry as such? [wait]
D
This question itself is a rupture in the sequence. It makes a demand to depart
from one narrative [a continuous flow of words] to another, to figure out,
while remaining bound to it [it would be even more as a so-called rhetorical
question]. So there has been one sequence, or line, of the inquiry--about the
kinds of the query and its properties. That sequence itself is a digression,
from within the sequence about what is research and describing its parts
(queries). We are thus returning to it and continue with a question whether
the properties of the inquiry are the same as the properties of the query.
E
But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_ , giving hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact would each choice have on
his reception in the Anglophone world. Limiting ourselves to textual forms for
now (and not translating his work but pursing a different inquiry), let us say
the utterance is a word [or a phrase or an idiom] in a sequence such as a
sentence, a paragraph, or a document.
## (F) The
structure[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=1
"Edit section: \(F\) The structure")]
This distinction is as old as recorded Western thought since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the subject-
matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns--
not as more words denoting a similar thing but rather one word denoting
various things. Categories were outlined as a device to differentiate among
words according to kinds of these things. Every word as such belonged to not
less and not more than one of ten categories.
So it happens to the word _utterance_ , as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might yield as the most appropriate in a given
context. The more context the more precise share comes to the fore. When taken
out of the context ambiguity prevails as the spectrum unveils in its variety.
Thus single words [as any other utterances] are questions, queries,
themselves, and by occuring in statements, in context, their [means] are being
singled out.
This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.
### (G) The structure: words as
words[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=2
"Edit section: \(G\) The structure: words as words")]
* [![](/images/thumb/c/c8/Philitas_in_P.Oxy.XX_2260_i.jpg/144px-Philitas_in_P.Oxy.XX_2260_i.jpg)](/File:Philitas_in_P.Oxy.XX_2260_i.jpg)
P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE. 1(http://163.1.169.40/cgi-
bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---
20-about-2260--
00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
2(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-
highlight.jpeg)
* [![](/images/thumb/9/9e/Cyclopaedia_1728_page_210_Dictionary_entry.jpg/88px-Cyclopaedia_1728_page_210_Dictionary_entry.jpg)](/File:Cyclopaedia_1728_page_210_Dictionary_entry.jpg)
Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , 1728, p. 210. 3(http://digicoll.library.wisc.edu/cgi-
bin/HistSciTech/HistSciTech-
idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)
* [![](/images/thumb/b/b8/Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg/160px-Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)](/File:Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)
Detail from the Liddell-Scott Greek-English Lexicon, c1843.
Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia; contrasted to the same word meaning "milk pail" in Homer's _Iliad_.
Not much has changed in the way how dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filtering and sorting them and
in turn the spectrum of [denoted] things allocated in this way is structured
into groups and subgroups which are then given, according to other set of
rules, shorter or longer names. These constitute facets of [potential]
meanings of a word.
So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few and these criteria are rarely being
disclosed, despite their impact on research, and more generally, their
influence as conditions for production[making] of a so called _common sense_.
It doesn't take that much to reimagine what a dictionary is and what it could
be, especially having large specialized corpora of texts at hand. These can
also serve as aids in production of new words and new meanings.
### (H) The structure: words as knowledge and the
world[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=3
"Edit section: \(H\) The structure: words as knowledge and the world")]
* [![](/images/thumb/0/02/Boethius_Porphyrys_Isagoge.jpg/120px-Boethius_Porphyrys_Isagoge.jpg)](/File:Boethius_Porphyrys_Isagoge.jpg)
Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3th c.), [6th c.] 10th c.
4(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)
* [![](/images/thumb/d/d0/Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg/94px-Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)](/File:Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)
Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , London, 1728, p. II. 5(http://digicoll.library.wisc.edu/cgi-
bin/HistSciTech/HistSciTech-
idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)
* [![](/images/thumb/d/d6/Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg/116px-Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)](/File:Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)
Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_ , 1751.
6(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-
connaissances-humaines)
* [![](/images/thumb/9/96/Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg/96px-Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)](/File:Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)
Haeckel - Darwin's tree.
Another _formalized_ and [internalized] process being at play when figuring
out a word is its [containment]. Word is not only structured by way of things
it potentially denotes but also by words it is potentially part of and those
it contains.
The fuzz around categorization of knowledge _and_ the world in the Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd century AD Neoplatonist began expanding the
notions of genus and species into their hypothetic consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), a relation (πρός), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In his
different work, _Topics_ , Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (όρος), a genus (γένος), a property (ἴδιος), and an
accident (συμβεβηϰόϛ). Porphyry does not explicitly refer _Topics_ , and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
8(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:
> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
9(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))
Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation resulted in having more resemblance
to the perceived _structures_ of the world. So they began to bloom.
There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.
Classification schemes [growing from one point] play a major role in
untangling the format of modern encyclopedia from that of the dictionary
governed by alphabet. Two of the most influential encyclopedias of the 18th
century are cases in the point. Although still keeping 'dictionary' in their
titles, they are conceived not to represent words but knowledge. The [upper-
most] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches: "natural and scientifical" and "artificial and
technical"; these further split down to 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as
table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from judgement ( _entendement_ ),
branches into memory as history, reason as philosophy, and imagination as
poetry. The logic of containers was employed as an aid not only to deal with
the enormous task of naming and not omiting anything from what is known, but
also for the management of labour of hundreds of writers and researchers, to
create a mechanism for delegating work and the distribution of
responsibilities. Flesh was also more present, in the field research, with
researchers attending workshops and sites of everyday life to annotate it.
The world came forward to unshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems such as
Charles A. Cutter's _Expansive Classification_ (1882) set to classify the
world itself and set the field for what has came to be known as authority
lists structuring metadata in today's computing.
### The structure
(summary)[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=4
"Edit section: The structure \(summary\)")]
Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.
While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.
One seek to describe the word as a faceted list of small worlds, another to
describe the world as a structured lists of words. One play prime in the
domain of epistemology, in what is known, controlling the vocabulary, another
in the domain of ontology, in what is, controlling reality.
Every [word] has its given things, every thing has its place, closer or
further from a single word.
The schism between classifying words and classifying the world implies it is
not possible to construct a universal classification scheme[system]. On top of
that, any classification system of words is bound to a corpus of texts it is
operating upon and any classification system of the world again operates with
words which are bound to a vocabulary[lexicon] which is again bound to a
corpus [of texts]. It doesn't mean it would prevent people from trying.
Classifications function as descriptors of and 'inscriptors' upon the world,
imprinting their authority. They operate from [a locus of] their
corpus[context]-specificity. The larger the corpus, the more power it has on
shaping the world, as far as the word shapes it (yes, I do imply Google here,
for which it is a domain to be potentially exploited).
## (J) The
sequence[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=5
"Edit section: \(J\) The sequence")]
The structure-yielding query [of] the single word [shrinks][zuzuje
sa,spresnuje] with preceding and following words. Inquiry proceeds in the flow
that establishes another kind[mode] of relationality, chaining words into the
sequence. While the structuring property of the query brings words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.
This is what is responsible for attaching textual figures mentioned earlier
(lists, schemes, tables) to the body of the text. Associations can be also
stated explicitly, by indexing tables and then referring them from a
particular point in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.
From this follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides means for pointing elsewhere
in the document as well.
A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago and which is online
10(http://monoskop.org/Talks/Communing_Texts).
This is still the realm of print. What happens with document when it is
digitized?
Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.
Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
## (K) The
index[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=6
"Edit section: \(K\) The index")]
* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103px-Summa_confessorum.1310.jpg)](/File:Summa_confessorum.1310.jpg)
Summa confessorum [1297-98], 1310.
7(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)
[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now when I am saying the word _utterance_ , each time there surface contexts
in which I have used it earlier.
A long quote from Frederick G. Kilgour, _The Evolution of the Book_ , 1998, pp
76-77:
> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinning?')
>
> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.
>
> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"
In one sense neither subject-index nor concordane are indexes, they are words
or group of words selected according to given criteria from the body of the
text, each accompanied with a list of identifiers. These identifiers are
elements of an index, whether they represent a page, chapter, column, or other
[kind of] block of text. Every identifier is an unique _address_.
The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, when each element is given unique combination
of symbols. Different sizes of sets yield different number of variations.
Symbol sets such as an alphabet, arabic numerals, roman numerals, and binary
digits have different proportions between the length of a string of symbols
and the number of possible variations it can contain. Thus two symbols of
English alphabet can store 26^2 various values, of arabic numerals 10^2, of
roman numberals 8^2 and of binary digits 2^2.
Indexation is segmentation, a breaking into segments. From as early as the
13th century the index such as that of sections has served as enabler of
search. The more [detailed] indexation the more precise search results it
enables.
The subject-index and concordance are tables of search results. There is a
direct lineage from the 13th-century biblical concordances and the birth of
computational linguistic analysis, they were both initiated and realised by
priests.
During the World War II, Jesuit Father Roberto Busa began to look for machines
for the automation of the linguistic analysis of the 11 million-word Latin
corpus of Thomas Aquinas and related authors.
Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas he
realised two things:
> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has t o precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach t o the words he reads the significance they have in his mind, but
should try t o find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is .this
basic logic that allows the transfer from what the words mean today t o what
they meant to the writer.
>
> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
11(http://www.alice.id.tue.nl/references/busa-1980.pdf)
Collaborating with the IBM in New York from 1949, the work, a concordance of
all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version is online since 2005
12(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words was created by a
team of ten priests in the scope of two years (in two phases: grouping all the
forms of an inflected word under their lemma, and coding the morphological
categories of each form and lemma), containing 150,000 forms
13(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father
Busa has been dubbed the father of humanities computing and recently also of
digital humanities.
The subject-index has a crucial role in the printed book. It is the only means
for search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of an inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).
Its role seemingly vanishes in the digital text. But it can be easily
transformed. Besides serving as a table of pre-searched results the subject-
index also gives a distinct idea about content of the book. Two patterns give
us a clue: numbers of occurrences of selected words give subjects weights,
while words that seem specific to the book outweights other even if they don't
occur very often. A selection of these words then serves as a descriptor of
the whole text, and can be thought of as a specific kind of 'tags'.
This process was formalized in a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as a
proportion of texts in the corpus where the word appears at least once to the
total number of texts. When multiplied by the frequency of the word _in_ the
text (divided by the maximum frequency of any word in the text), we get _term
frequency-inverse document frequency_ (tf-idf). In this way we can get an
automated list of subjects which are particular in the text when compared to a
group of texts.
We came to learn it by practice of searching the web. It is a mechanism not
dissimilar to thought process involved in retrieving particular information
online. And search engines have it built in their indexing algorithms as well.
There is a paper proposing attaching words generated by tf-idf to the
hyperlinks when referring websites 14(http://bscit.berkeley.edu/cgi-
bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in references in the paper use this feature and it can be easily
tested: 15(http://www.cs.berkeley.edu/~phelps/papers/dissertation-
abstract.html?lexical-
signature=notemarks+multivalent+semantically+franca+stylized).
There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even a way of organising library bottom-up into novel categories, new
discourses could emerge. Or as an aid for researchers to sort through texts,
or even for editors as an aid in producing interesting anthologies.
## Final
remarks[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=7
"Edit section: Final remarks")]
1
New disciplines emerge all the time - most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunities but, firstly, frameworks of new
perspectives of looking at the world, new domains of knowledge. From the
perspective of researcher the partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific
terms[subjects]. Creating new fields involves all that, and more. Even when
one goes against all disciplines.
2
Google can still surprise us.
3
Knowledge has been in the making for millenia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-
making.
4
Command-line example of tf-idf and concordance in 3 steps.
* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:
> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]'
'[\012*]' | tr -d '[:punct:]' | sort | uniq -c | sort -k 1nr | sed '1,1d' >
temp.txt; max=$(awk -vvar=1 -F" " 'NRDisplay 200
300
400
500
600
700
800
900
1000
ALL
characters around the word.