A talk presented at the Technopolitics seminar in Vienna on 16 June 2015.
One of my long-term interests has been to explore the operativity of online texts. At its core it is an attempt to explore, in theory, the differences and similarities between print and online textuality, side by side with my work on editing online collections such as Monoskop. It is open-ended research that develops from talk to talk. But rather than attempting to summarise what has been said earlier, I prefer to start anew each time, from a different entry point. This brief talk deals with some of the ways in which collections of online texts redefine the notion of the public library by constituting alternative corpora of texts and by reversing access control and the function of full-text search.
Historically, we have been accustomed to treating texts as discrete units, distinguished by material properties such as cover, binding and script: characteristics that establish them as books, magazines, memos, manuscripts, diaries or sheet music. One book differs from another book, books differ from magazines, and printed matter differs from handwritten matter. Each volume is a self-contained whole, further distinguished by descriptors such as title, author and classification codes that allow it to be located and referred to. The demarcation of the publication as a container of text is a frame that organises searching and reading. When researching a particular subject, the reader is carried along the classificatory schemes under which volumes are organised, along references inside texts pointing to other volumes, and along subject indexes appended to texts pointing to places within that volume.
So while material properties separate texts into distinct objects, bibliographic information provides them with a unique identifier, a unique address in the world of print culture. That world is further organised into containers of objects called libraries.
The online environment, however, intervenes in this condition: texts are digitised, hyperlinked and made searchable for any text sequence.
A. Rather than as distinct entities, texts are accessed through full-text search as if they were one long text whose portions are spread across the web, including texts that had never been considered candidates for library collections.
B. The unique identifier for these text portions is not the bibliographic information, but the URL.
C. The text is as long as the web crawlers of a given search engine are set to reach, refashioning the library into a storage of indexed data.
These are some of the lines along which online texts appear to produce difference.
The correspondences are: A. between the publication and the machine-readable text; B. between the bibliographic information and the URL; C. between the library and the search engine.
The introduction of full-text search has created an environment in which all machine-readable online documents within reach are effectively treated as a single document. For a text sequence to be locatable, it does not matter in which file format it appears, nor whether its interface is an SQL-powered website or a mere directory listing. As long as text can be extracted from a document, the document is a container of text sequences and is itself a sequence in the "book" of the web.
Even though this is hardly news after almost two decades of the rule of Google Search, little seems to have changed with respect to the forms and genres of writing. Loyal to the standard forms of publishing, writing adheres to the principle of coherence based on units such as book chapters, journal papers and newspaper articles, designed to be read from beginning to end.
Still, the range of forms appearing in search results, and thus the corpus of texts into which these units are brought, is radically diversified. It may include discussion board comments, product reviews, private e-mails, weather information, spam and so on: content that used to be omitted from library collections.
Rather than being published in the traditional sense, all these texts are released onto digital networks by mere typing, copying or OCR-ing, or by being fed through sensors tracking movement, temperature and the like.
Even though portions of this text may come with human or nonhuman authors attached, the authors have relatively little control over the discourses their writing gets embedded in. This is also where the ambiguity of copyright manifests itself. Crawling bots pre-read the internet, with all its attached devices, according to the agenda of their providers, and the decisions about which indexed texts are served in search results, how, and to whom lie in the code of a library.
Libraries in this sense are not only digital versions of public or private libraries as we know them from history, but also commercial search engines, intelligence agencies, and virtually all forms of online text collections.
Acquisition policies rank here alongside crawling bots, dragnet surveillance algorithms and the arbitrary motivations of users. All of these actuate the selection and embedding of texts into structures that regulate their retrievability and, through access control, produce particular communities of readers. The author's intention to partake in this or that discourse is confronted by the discourse-conditioning operations of retrieval algorithms. Hence Google structures discourse through Google Search differently from how the Internet Archive does through its Wayback Machine, or GCHQ through its dragnet programme.
They are libraries, each containing a single "book" whose pages are URLs with timestamps and geostamps (in the form of IP addresses). Google, GCHQ, JSTOR, Elsevier and SpringerLink are each confined to their own searchable corpus of texts. Decisions about who is admitted to read which sections, and under which conditions, are informed by copyright laws, corporate interests, management hierarchies and national security concerns. The various sets of these conditions at work in a particular library also redefine the notions of publishing and of the publication, and in turn the notion of the public.
Corporate journal repositories exploit publicly funded research by renting it only to libraries that can afford it. Intelligence agencies are set to extract text from any moving target, basically any networked device, ostensibly in the public service and away from the public eye. Publicly funded libraries are prevented by outdated copyright laws and bureaucracy from providing digitised content online. Search engines give a sense of searching the entire public record online, while only a few know what is excluded and how search results are ordered.
It is within and against this milieu that libraries such as the Internet Archive, Wikileaks, Aaaaarg, UbuWeb, Monoskop, Nettime, TheNextLayer and others gain their political agency. The counter-techniques available to them for negotiating the publics of publishing include self-archiving, open access, book liberation, leaking, whistleblowing, open source search algorithms and so on.
Digitising texts and posting them online are interventions in the procedures that make search possible. Operating an online collection of texts is as much the work of organising texts within the collection as it is of placing them within the "book of the web".
Written 15-16 June 2015 in Prague, Brno and Vienna.