Talks/More Than Numbers Less Than Words

From Monoskop

A talk given at Public Library conference on 5 June 2015 at Nova Gallery in Zagreb, Croatia.

I am not going to defend the electronic book against the printed one, nor digital libraries against brick-and-mortar ones. There is enough space for them all, especially given that what were once distinct categories are today names we use for lack of better terms.

Nor am I going to talk about universal access. I am a supporter of unrestricted access to written knowledge, but I am unable to identify with the imperative of universal access to all published content. Something tells me that the politics of knowledge resides elsewhere.

What I have been trying to understand for some time and what I am going to talk about now is something very simple. It also seems to be one of the big challenges for books and libraries today. Namely: what are we taking part in when dealing with texts online? What is the logic, what are the mechanisms, what are the algorithms at work making texts operate in a digital networked environment? How is it possible that those texts are at the same time digital and environmental? In what exactly are we taking part when we scan, OCR, put texts there, read, search, and share them? The task here is to identify broader epistemological, aesthetic and poetic conditions of writing today.


Let us start with collecting texts. Something, I assume, we are all involved with in one way or another. Collections are organised or disorganised onto shelves holding books and other material, into structures of folders and subfolders with pdfs and epubs on hard drives, sorted in the Calibre framework, uploaded, kept, moved, deleted from a website. Think of any of these images for now. As the number of things grows, they are split into groups. These can represent a letter of the alphabet, year of publication, genre, subject matter, format, size, color, those one wants to read, those one has read, those one would never read, and so on. They can represent a combination of them; they can as well stand for a refusal of them all.

Ordering and disordering revolves around insistence on or resistance to splitting things into distinct groups. Such structuring brings along the element of hierarchy, a scheme with tops and bottoms. And how we design such hierarchies in turn imposes upon how we navigate them--how we can find things, where we put new things, memorise where they are and so on. They have a certain imprint on how we think. But let's look at a larger picture.

The question of how to order books is a very old and persistent one because it emerges as soon as a bin, shelf or folder starts to be "messy". Outliving the lifespans of publishing houses and bookstores, libraries have gained a privileged position in establishing a norm for these schemes. They adopted subject matter and made it the single most dominant ordering principle. Schemes organising subject matters into various classes and subclasses were standardised around the turn of the previous century.

Far from being neutral, they carry strong authorial imprints. They have emerged from the minds of individual thinkers such as Dewey, Otlet or Cutter. Even though they are now managed by various consortiums, their main features have been kept pretty much the same. From this perspective we can say that most of the libraries in Europe and North America today (I am not familiar with the situation elsewhere) are still "curated" by Dewey, Otlet, Cutter and the like. These schemes are products of their authors' impressions of how knowledge was meant to be structured around 1900, and they have continued to imprint that order on libraries, and more widely on research, ever since.

It is not so easy, however, to trace their imprint in the libraries we are talking about at this event, nor in many others. Curating a collection involves not only actual items but also the structures these items are contained in, as well as defining the conditions under which these structures are to be negotiated and changed over time. So here we have an aspect of knowledge-making, an agency that operates toward defamiliarising the "known" and toward questioning how it is known. In turn, it affects us constantly. We employ a structuring gaze not only when we look for or order books on shelves and in folders but also while reading them and even more broadly, when we think in categories, judging what belongs in what.


The way texts are ordered into hierarchies has an imprint on how they are accessed, treated, produced, reproduced--we learn also through categories. However, one should not forget about other ways in which text collections are navigated (and which in turn normalise perception and making).

One of them is following a reference--i.e. following citations further to their sources, following recommended further reading, bibliographies, etc. The importance of this topic is strangely overlooked in the context of techniques of writing, especially when the web and open access principles call for turning textual references into hypertextual links, to "commune" texts. And even more when the imperative of the web appears to be nothing less than "I link therefore I am."

But I would like to talk about another way of navigating collections of texts: the index. We have come to treat the index as a long list of words at the end of a book, useful for quickly looking up where in the text they are discussed. Due to increasing digitisation and the introduction of full text search, it is often said to be destined to become obsolete.

However, the index is the very element that allows full text search. Without indexing words a search would simply not be possible. Whether it is done on-the-fly or is pre-calculated, the occurrence of every single word is accounted for and none is omitted: this kind of book index is total. It is as if a list of all the words occurring in a book were appended to it.
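To make the idea of a total index concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any particular search engine works: the function name `build_index` and the two-item collection are invented for the example, and real systems add much more (stemming, compression, ranking).

```python
import re
from collections import defaultdict

def build_index(texts):
    """Map every word to the (title, position) pairs where it occurs.

    No word is skipped, which is what makes this index "total".
    """
    index = defaultdict(list)
    for title, text in texts.items():
        words = re.findall(r"\w+", text.lower())
        for position, word in enumerate(words):
            index[word].append((title, position))
    return index

# Hypothetical collection, invented for illustration.
collection = {
    "a": "The index is total",
    "b": "No word is omitted from the index",
}
index = build_index(collection)

# Looking up a word is a single dictionary access, however large the collection.
hits = index["index"]
```

Once such a structure exists, a search no longer reads the texts at all; it only consults the list.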

This is where boundaries of collections emerge most clearly. Given a total index of a text collection and the possibility to consult it within milliseconds, the reader is given a view of it as a whole, at one glance, however large it is and whatever classifications it adheres to. This is a perspective very different from walking inside a library.

Full text search makes the difference between printed text and digital text perhaps most apparent. And it opens the way to devices not available in print.

First of all, one may search a collection that contains much more than books, magazines and printed ephemera - it can include basically anything from which there exist ways to extract text -- images, videos, music, emails, personal profiles, etc. This is something that radically expands the field of references available to a researcher and writer.

One can easily see whether a collection has anything to offer for a particular interest.

One can easily see the contexts in which the same text sequence is repeated (which shows that an implicit reference such as an uncredited quotation can be read very explicitly, a fact that calls for further explorations by writers).
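One simple way to surface such repetitions, offered here only as a sketch, is to compare texts by the word sequences they share. The helper names `ngrams` and `shared_sequences`, the window of four words and the second sample sentence are all assumptions made for the example; the first sample borrows the Otlet phrase quoted later in this talk.

```python
import re

def ngrams(text, n):
    """Return the set of all runs of n consecutive words in a text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_sequences(text_a, text_b, n=4):
    """Return the word sequences of length n that occur in both texts."""
    return ngrams(text_a, n) & ngrams(text_b, n)

# The second sentence is invented to stand in for an uncredited quotation.
source = "Otlet wrote that once one read; today one refers to, checks through, skims."
reuse = "As someone put it, today one refers to, checks through and skims."

overlap = shared_sequences(source, reuse)
```

Punctuation and capitalisation are ignored here, so a quotation survives small changes in typography but not changes in wording.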

What is not so easy to see is which texts are not indexed - what is missing. Control over the corpus of texts is another device available to providers of online collections and also exploited for various ends. The larger the collection, the more possibilities are at their disposal.

Another instance of obfuscation comes with sorting search results - there is hardly a more ambivalent notion than "relevance". Displaying results is subject to relatively limited interface design options--it normally comes down to a list. Here, setting rules for sorting is available as yet another device.

And so on.

An interesting thing about these devices, or "grey media" (after Fuller and Goffey), is that they reveal not only epistemic but also poetic conditions of operating online text collections.

From the perspective of full text search, structures don't matter much, a collection is simply one long "shelf", or even one long text, one book if you like. It is not "messy" because it is not meant to be read cover to cover. Where the reading begins is in a passage, an excerpt, where a searched term appears.

It needs to be noted how little this quite common way of reading books has "translated" into ways of writing them. More than a hundred years ago Otlet wrote that "once one read; today one refers to, checks through, skims." The normativity of that statement is questionable, but we have certainly already had a long time to develop what was once something to condemn into just another technique, skill.

Here, there is something to learn from popular online journalism and from Twitter. There the writing revolves around paragraphs and tweets, each somehow standing on its own. An ideal way of writing "compatible" with the availability of full text search would be in the form of sequences of paragraphs, each paragraph standing as a self-contained story, ready to be taken out of context. To also accommodate the possibility of reading further from the paragraph one "lands on" (proceeding forward or backward), the form calls for a somewhat fractal, or hologrammatic, writing in which the content of the whole book can be summed up by any of its paragraphs. This doesn't mean simply publishing books of aphorisms or anecdotes but searching for new ways of letting the whole become more than a sum of its parts. And to reimagine the book differently than a mere sum of its pasts.


What classification schemes divide into shelves and subshelves the index treats as a single shelf. Despite that, it is more intuitive to find things on it, even if it gets very long. This is not to say that classifications are automatically made obsolete, only that their normative power is not what it used to be. At the same time, it is not impossible to imagine whole collections being structured into clusters by computation, i.e. according to the specificity of expressions and jargons occurring across various groups of texts (calculable as certain ratios of occurrences of words using tf-idf analysis, even on the fly) or according to densities of bibliographic interrelationships. This is one of the ways in which the normative classificatory power of universalising human minds may get outdone by algorithmic procedures for which knowledge is mere words, words without semantics, instead representing quantities of their occurrences, numbers.
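As a rough illustration of what "ratios of occurrences of words" can mean, the sketch below computes one common textbook variant of tf-idf: the share of a text taken up by a word, multiplied by the logarithm of how rare that word is across the collection. The function name `tf_idf` and the three tiny texts are invented for the example, other weighting variants exist, and grouping texts into clusters would be a further step that is not shown.

```python
import math
import re
from collections import Counter

def tf_idf(texts):
    """Score each word in each text: term frequency times inverse document frequency."""
    tokenised = {t: re.findall(r"\w+", body.lower()) for t, body in texts.items()}
    n_docs = len(tokenised)
    # In how many texts does each word occur at least once?
    doc_freq = Counter(w for words in tokenised.values() for w in set(words))
    scores = {}
    for title, words in tokenised.items():
        counts = Counter(words)
        scores[title] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return scores

# Hypothetical miniature collection, invented for illustration.
texts = {
    "law": "the court the statute the appeal",
    "botany": "the leaf the stem the root",
    "cooking": "the salt the pan the oven",
}
scores = tf_idf(texts)

# The word with the highest score is the one most specific to a text.
most_specific = max(scores["botany"], key=scores["botany"].get)
```

A word that occurs everywhere, like "the" in this example, scores zero, while the jargon of a single text stands out. Nothing in the calculation knows what any of the words mean.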

While around 1900 classification schemes were crafted by the minds of individual library theorists and, channeled through libraries, in turn regulated the ordering of scholarly disciplines and knowledge, today this craft is strangely echoed by the regulatory and normative power of devices for searching, sorting and clustering vast bodies of texts.

Various utilitarian, political and aesthetic motives have led us to take part in wider processes of digitisation. What lies ahead is also to reimagine what digits do to letters.

Dušan Barok

Written 4-5 June 2015 in Bergen and Zagreb. Slightly edited 10 June 2015.