for further analysis and an alternative
approach to the same issues: “When everyone is librarian, library is
everywhere.”) The repository, growing to more than 60 millions papers, was
sued in 2015 by Elsevier for $15 million, resulting in a permanent injunction.
Library Genesis, another library of comparable scale, finds itself in a
similar legal predicament.
Arguably one of the largest digital archives of the “avant-garde” (loosely
defined), UbuWeb is transparent about this fragility. In 2011, its founder
[Kenneth Goldsmith wrote](http://www.ubu.com/resources/): “by the time you
read this, UbuWeb may be gone. […] Never meant to be a permanent archive, Ubu
could vanish for any number of reasons: our ISP pulls the plug, our university
support dries up, or we simply grow tired of it.” Even the banality of
exhaustion is a real risk to these libraries.
The simple fact is that some of these libraries are among the largest in the
world yet are subject to sudden disappearance. We can only begin to guess at
what the contours of “Lost Memory: Libraries and Archives Destroyed in the
Twenty-First Century” will be when it is written ninety years from now.
## Non-profit, non-state archives
Cultural and social movements have produced histories which are only partly
represented in state libraries and archives. Often they are deemed too small
or insignificant or, in some cases, dangerous. Most frequently, they are not
deemed to be anything at all — they are simply neglected. While the market,
eager for new resources to exploit, might occasionally fill in the gaps, it is
ultimately motivated by profit and not by responsibility to communities or
archives. (We should not forget the moment [Amazon silently erased legally
purchased copies of George Orwell’s
1984](http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html)
from readers’ Kindle devices because of a change in the commercial agreement
with the publisher.)
So, what happens to these minor libraries? They are innumerable, but for the
sake of illustration let’s say that each could be represented by a single
book. Gathered together, these books would form a great library (in terms of
both importance and scale). But to extend the metaphor, the current reality
could be pictured as these books flying off their shelves to the furthest
reaches of the world, their covers flinging open and the pages themselves
scattering into bookshelves and basements, into the caring hands of relatives
or small institutions devoted to passing these words on to future generations.
While the massive digital archives listed above (library.nu, Library Genesis,
Sci-Hub, etc.) could play the role of the library of libraries, they tend to
be defined more as sites for [biblioleaks](https://www.jmir.org/2014/4/e112/).
Furthermore, given the vulnerability of these archives, we ought to look for
alternative approaches that do not rule out using their resources, but which
also do not _depend_ on them.
Dat Library takes the concept of “a library of libraries” not to manifest it
in a single, universal library, but to realise it progressively and partially
with different individuals, groups and institutions.
## Archival properties
So far, the emphasis of this README has been on _durability_ , and the
“accidents of the archive” have been instances of destruction and loss. The
persistence of an archive is, however, no guarantee of its _accessibility_ , a
common reality in digital libraries where access management is ubiquitous.
Official institutions police access to their archives vigilantly for the
ostensible purpose of preservation, but ultimately create a rarefied
relationship between the archives and their publics. Disregarding this
precious tendency toward preciousness, we also introduce _adaptability_ as a
fundamental consideration in the making of the projects Dat Library and
HyperReadings.
To adapt is to fit something for a new purpose. It emphasises that the archive
is not a dead object of research but a set of possible tools waiting to be
activated in new circumstances. This is always a possibility of an archive,
but we want to treat this possibility as desirable, as the horizon towards
which these projects move. We know how infrastructures can attenuate desire
and simply make things difficult. We want to actively encourage radical reuse.
In the following section, we don’t define these properties but rather discuss
how we implement (or fail to implement) them in software, while highlighting
some of the potential difficulties introduced.
### Durability
In 1964, in the midst of the “loss” of the twentieth-century, Paul Baran’s
RAND Corporation publication [On Distributed
Communications](https://www.rand.org/content/dam/rand/pubs/research_memoranda/2006/RM3420.pdf)
examined “redundancy as one means of building … highly survivable and reliable
communications systems”, thus midwifing the military foundations of the
digital networks that we operate within today. While the underlying framework
of the Internet generally follows distributed principles, the client–server/
request–response model of the HTTP protocol is highly centralised in practice
and is only as durable as the server.
Capitalism places a high value on originality and novelty, as exemplified in
art where the ultimate insult would to be the label “redundant”. Worse than
being derivative or merely unoriginal, being redundant means having no reason
to exist — a uselessness that art can’t tolerate. It means wasting a perfectly
good opportunity to be creative or innovative. In a relational network, on the
other hand, redundancy is a mode of support. It doesn’t stimulate competition
to capture its effects, but rather it is a product of cooperation. While this
attitude of redundancy arose within a Western military context, one can’t help
but notice that the shared resources, mutual support, and common
infrastructure seem fundamentally communist in nature. Computer networks are
not fundamentally exploitative or equitable, but they are used in specific
ways and they operate within particular economies. A redundant network of
interrelated, mutually supporting computers running mostly open-source
software can be the guts of an advanced capitalist engine, like Facebook. So,
could it be possible to organise our networked devices, embedded as they are
in a capitalist economy, in an anti-capitalist way?
Dat Library is built on the [Dat
Protocol](https://github.com/datproject/docs/blob/master/papers/dat-paper.md),
a peer-to-peer protocol for syncing folders of data. It is not the first
distributed protocol ([BitTorrent](https://en.wikipedia.org/wiki/BitTorrent)
is the best known and is noted as an inspiration for Dat), nor is it the only
new one being developed today ([IPFS](https://ipfs.io) or the Inter-Planetary
File System is often referenced in comparison), but it is unique in its
foundational goals of preserving scientific knowledge as a public good. Dat’s
provocation is that by creating custom infrastructure it will be possible to
overcome the accidents that restrict access to scientific knowledge. We would
specifically acknowledge here the role that the Dat community — or any
community around a protocol, for that matter — has in the formation of the
world that is built on top of that protocol. (For a sense of the Dat
community’s values — see its [code of conduct](https://github.com/datproject
/Code-of-Conduct/blob/master/CODE_OF_CONDUCT.md).)
When running Dat Library, a person sees their list of libraries. These can be
thought of as similar to a
[torrent](https://en.wikipedia.org/wiki/Torrent_file), where items are stored
across many computers. This means that many people will share in the provision
of disk space and bandwidth for a particular library, so that when someone
loses electricity or drops their computer, the library will not also break.
Although this is a technical claim — one that has been made in relation to
many projects, from Baran to BitTorrent — it is more importantly a social
claim: the users and lovers of a library will share the library. More than
that, they will share in the work of ensuring that it will continue to be
shared.
This is not dissimilar to the process of reading generally, where knowledge is
distributed and maintained through readers sharing and referencing the books
important to them. As [Peter Sloterdijk
describes](https://rekveld.home.xs4all.nl/tech/Sloterdijk_RulesForTheHumanZoo.pdf),
written philosophy is “reinscribed like a chain letter through the
generations, and despite all the errors of reproduction — indeed, perhaps
because of such errors — it has recruited its copyists and interpreters into
the ranks of brotherhood (sic)”. Or its sisterhood — but, the point remains
clear that the reading / writing / sharing of texts binds us together, even in
disagreement.
### Accessibility
In the world of the web, durability is synonymous with accessibility — if
something can’t be accessed, it doesn’t exist. Here, we disentangle the two in
order to consider _access_ independent from questions of resilience.
##### Technically Accessible
When you create a new library in Dat, a unique 64-digit “key” will
automatically be generated for it. An example key is
`6f963e59e9948d14f5d2eccd5b5ac8e157ca34d70d724b41cb0f565bc01162bf`, which
points to a library of texts. In order for someone else to see the library you
have created, you must provide to them your library’s unique key (by email,
chat, on paper or you could publish it on your website). In short, _you_
manage access to the library by copying that key, and then every key holder
also manages access _ad infinitum_.
At the moment this has its limitations. A Dat is only writable by a single
creator. If you want to collaboratively develop a library or reading list, you
need to have a single administrator managing its contents. This will change in
the near future with the integration of
[hyperdb](https://github.com/mafintosh/hyperdb) into Dat’s core. At that
point, the platform will enable multiple contributors and the management of
permissions, and our single key will become a key chain.
How is this key any different from knowing the domain name of a website? If a
site isn’t indexed by Google and has a suitably unguessable domain name, then
isn’t that effectively the same degree of privacy? Yes, and this is precisely
why the metaphor of the key is so apt (with whom do you share the key to your
apartment?) but also why it is limited. With the key, one not only has the
ability to _enter_ the library, but also to completely _reproduce_ the
library.
##### Consenting Accessibility
When we say “accessibility”, some hear “information wants to be free” — but
our idea of accessibility is not about indiscriminate open access to
everything. While we do support, in many instances, the desire to increase
access to knowledge where it has been restricted by monopoly property
ownership, or the urge to increase transparency in delegated decision-making
and representative government, we also recognise that Indigenous knowledge
traditions often depend on ownership, control, consent, and secrecy in the
hands of the traditions’ people. [see [“Managing Indigenous Knowledge and
Indigenous Cultural and Intellectual
Property”](https://epress.lib.uts.edu.au/system/files_force/Aus%20Indigenous%20Knowledge%20and%20Libraries.pdf?download=1),
pg 83] Accessibility understood in merely quantitative terms isn’t able to
reconcile these positions, which this is why we refuse to limit “access” to a
question of technology.
While “digital rights management” technologies have been developed almost
exclusively for protecting the commercial interests of capitalist property
owners within Western intellectual property regimes, many of the assumptions
and technological implementations are inadequate for the protection of
Indigenous knowledge. Rather than describing access in terms of commodities
and ownership of copyright, it might be defined by membership, status or role
within a community, and the rules of access would not be managed by a
generalised legal system but by the rules and traditions of the people and
their knowledge. [[“The Role of Information Technologies in Indigenous
Knowledge
Management”](https://epress.lib.uts.edu.au/system/files_force/Aus%20Indigenous%20Knowledge%20and%20Libraries.pdf?download=1),
101-102] These rights would not expire, nor would they be bought and sold,
because they are shared, i.e., held in common.
It is important, while imagining the possibilities of a technological
protocol, to also consider how different _cultural protocols_ might be
implemented and protected through the life of a project like Dat Library.
Certain aspects of this might be accomplished through library metadata, but
ultimately it is through people hosting their own archives and libraries
(rather than, for example, having them hosted by a state institution) that
cultural protocols can be translated and reproduced. Perhaps we should flip
the typical question of how might a culture exist within digital networks to
instead ask how should digital networks operate within cultural protocols?
### Adaptability (ability to use/modify as one’s own)
Durability and accessibility are the foundations of adoptability. Many would
say that this is a contradiction, that adoption is about use and
transformation and those qualities operate against the preservationist grain
of durability, that one must always be at the expense of the other. We say:
perhaps that is true, but it is a risk we’re willing to take because we don’t
want to be making monuments and cemeteries that people approach with reverence
or fear. We want tools and stories that we use and adapt and are always making
new again. But we also say: it is through use that something becomes
invaluable, which may change or distort but will not destroy — this is the
practical definition of durability. S.R. Ranganathan’s very first Law of
Library Science was [“BOOKS ARE FOR
USE”](https://babel.hathitrust.org/cgi/pt?id=uc1.$b99721;view=1up;seq=37),
which we would extend to the library itself, such that when he arrives at his
final law, [“THE LIBRARY IS A LIVING
ORGANISM”](https://babel.hathitrust.org/cgi/pt?id=uc1.$b99721;view=1up;seq=432),
we note that to live means not only to change, but also to live _in the
world_.
To borrow and gently distort another concept of Raganathan’s concepts, namely
that of ‘[Infinite
Hospitality](http://www.dextersinister.org/MEDIA/PDF/InfiniteHospitality.pdf)’,
it could be said that we are interested in ways to construct a form of
infrastructure that is infinitely hospitable. By this we mean, infrastructure
that accommodates the needs and desires of new users/audiences/communities and
allows them to enter and contort the technology to their own uses. We really
don’t see infrastructure as aimed at a single specific group, but rather that
it should generate spaces that people can inhabit as they wish. The poet Jean
Paul once wrote that books are thick letters to friends. Books as
infrastructure enable authors to find their friends. This is how we ideally
see Dat Library and HyperReadings working.
## Use cases
We began work on Dat Library and HyperReadings with a range of exemplary use
cases, real-world circumstances in which these projects might intervene. Not
only would the use cases make demands on the software we were and still are
beginning to write, but they would also give us demands to make on the Dat
protocol, which is itself still in the formative stages of development. And,
crucially, in an iterative feedback loop, this process of design produces
transformative effects on those situations described in the use cases
themselves, resulting in further new circumstances and new demands.
### Thorunka
Wendy Bacon and Chris Nash made us aware of Thorunka and Thor.
_Thorunka_ and _Thor_ were two underground papers in the early 1970’s that
spewed out from a censorship controversy surrounding the University of New
South Wales student newspaper _Tharunka_. Between 1971 and 1973, the student
magazine was under focused attack from the NSW state police, with several
arrests made on charges of obscenity and indecency. Rather than ceding to the
charges, this prompted a large and sustained political protest from Sydney
activists, writers, lawyers, students and others, to which _Thorunka_ and
_Thor_ were central.
> “The campaign contested the idea of obscenity and the legitimacy of the
legal system itself. The newspapers campaigned on the war in Vietnam,
Aboriginal land rights, women’s and gay liberation, and the violence of the
criminal justice system. By 1973 the censorship regime in Australia was
broken. Nearly all the charges were dropped.” – [Quotation from the 107
Projects Event](http://107.org.au/event/tharunka-thor-journalism-politics-
art-1970-1973/).
Although the collection of issues of _Tharunka_ is largely accessible [via
Trove](http://trove.nla.gov.au/newspaper/page/24773115), the subsequent issues
of _Thorunka_ , and later _Thor_ , are not. For us, this demonstrates clearly
how collections themselves can encourage modes of reading. If you focus on
_Tharunka_ as a singular and long-standing periodical, this significant
political moment is rendered almost invisible. On the other hand, if the
issues are presented together, with commentary and surrounding publications,
the political environment becomes palpable. Wendy and Chris have kindly
allowed us to make their personal collection available via Dat Library (the
key is: 73fd26846e009e1f7b7c5b580e15eb0b2423f9bea33fe2a5f41fac0ddb22cbdc), so
you can discover this for yourself.
### Academia.edu alternative
Academia.edu, started in 2008, has raised tens of millions of dollars as a
social network for academics to share their publications. As a for-profit
venture, it is rife with metrics and it attempts to capitalise on the innate
competition and self-promotion of precarious knowledge workers in the academy.
It is simultaneously popular and despised: popular because it fills an obvious
desire to share the fruits of ones intellectual work, but despised for the
neoliberal atmosphere that pervades every design decision and automated
correspondence. It is, however, just trying to provide a return on investment.
[Gary Hall has written](http://www.garyhall.info/journal/2015/10/18/does-
academiaedu-mean-open-access-is-becoming-irrelevant.html) that “its financial
rationale rests … on the ability of the angel-investor and venture-capital-
funded professional entrepreneurs who run Academia.edu to exploit the data
flows generated by the academics who use the platform as an intermediary for
sharing and discovering research”. Moreover, he emphasises that in the open-
access world (outside of the exploitative practice of for-profit publishers
like Elsevier, who charge a premium for subscriptions), the privileged
position is to be the one “ _who gate-keeps the data generated around the use
of that content_ ”. This lucrative position has been produced by recent
“[recentralising tendencies](http://commonstransition.org/the-revolution-will-
not-be-decentralised-blockchains/)” of the internet, which in Academia’s case
captures various, scattered open access repositories, personal web pages, and
other archives.
Is it possible to redecentralise? Can we break free of the subjectivities that
Academia.edu is crafting for us as we are interpellated by its infrastructure?
It is incredibly easy for any scholar running Dat Library to make a library of
their own publications and post the key to their faculty web page, Facebook
profile or business card. The tricky — and interesting — thing would be to
develop platforms that aggregate thousands of these libraries in direct
competition with Academia.edu. This way, individuals would maintain control
over their own work; their peer groups would assist in mirroring it; and no
one would be capitalising on the sale of data related to their performance and
popularity.
We note that Academia.edu is a typically centripetal platform: it provides no
tools for exporting one’s own content, so an alternative would necessarily be
a kind of centrifuge.
This alternative is becoming increasingly realistic. With open-access journals
already paving the way, there has more recently been a [call for free and open
access to citation data](https://www.insidehighered.com/news/2017/12/06
/scholars-push-free-access-online-citation-data-saying-they-need-and-deserve-
access). [The Initiative for Open Citations (I4OC)](https://i4oc.org) is
mobilising against the privatisation of data and working towards the
unrestricted availability of scholarly citation data. We see their new
database of citations as making this centrifugal force a possibility.
### Publication format
In writing this README, we have strung together several references. This
writing might be published in a book and the references will be listed as
words at the bottom of the page or at the end of the text. But the writing
might just as well be published as a HyperReadings object, providing the
reader with an archive of all the things we referred to and an editable
version of this text.
A new text editor could be created for this new publication format, not to
mention a new form of publication, which bundles together a set of
HyperReadings texts, producing a universe of texts and references. Each
HyperReadings text might reference others, of course, generating something
that begins to feel like a serverless World Wide Web.
It’s not even necessary to develop a new publication format, as any book might
be considered as a reading list (usually found in the footnotes and
bibliography) with a very detailed description of the relationship between the
consulted texts. What if the history of published works were considered in
this way, such that we might always be able to follow a reference from one
book directly into the pages of another, and so on?
### Syllabus
The syllabus is the manifesto of the twenty-first century. From [Your
Baltimore “Syllabus”](https://apis4blacklives.wordpress.com/2015/05/01/your-
baltimore-syllabus/), to
[#StandingRockSyllabus](https://nycstandswithstandingrock.wordpress.com/standingrocksyllabus/),
to [Women and gender non-conforming people writing about
tech](https://docs.google.com/document/d/1Qx8JDqfuXoHwk4_1PZYWrZu3mmCsV_05Fe09AtJ9ozw/edit),
syllabi are being produced as provocations, or as instructions for
reprogramming imaginaries. They do not announce a new world but they point out
a way to get there. As a programme, the syllabus shifts the burden of action
onto the readers, who will either execute the programme on their own fleshy
operating system — or not. A text that by its nature points to other texts,
the syllabus is already a relational document acknowledging its own position
within a living field of knowledge. It is decidedly not self-contained,
however it often circulates as if it were.
If a syllabus circulated as a HyperReadings document, then it could point
directly to the texts and other media that it aggregates. But just as easily
as it circulates, a HyperReadings syllabus could be forked into new versions:
the syllabus is changed because there is a new essay out, or because of a
political disagreement, or because following the syllabus produced new
suggestions. These forks become a family tree where one can follow branches
and trace epistemological mutations.
## Proposition (or Presuppositions)
While the software that we have started to write is a proposition in and of
itself, there is no guarantee as to _how_ it will be used. But when writing,
we _are_ imagining exactly that: we are making intuitive and hopeful
presuppositions about how it will be used, presuppositions that amount to a
set of social propositions.
### The role of individuals in the age of distribution
Different people have different technical resources and capabilities, but
everyone can contribute to an archive. By simply running the Dat Library
software and adding an archive to it, a person is sharing their disk space and
internet bandwidth in the service of that archive. At first, it is only the
archive’s index (a list of the contents) that is hosted, but if the person
downloads the contents (or even just a small portion of the contents) then
they are sharing in the hosting of the contents as well. Individuals, as
supporters of an archive or members of a community, can organise together to
guarantee the durability and accessibility of an archive, saving a future
UbuWeb from ever having to worry about if their ‘ISP pulling the plug’. As
supporters of many archives, as members of many communities, individuals can
use Dat Library to perform this function many times over.
On the Web, individuals are usually users or browsers — they use browsers. In
spite of the ostensible interactivity of the medium, users are kept at a
distance from the actual code, the infrastructure of a website, which is run
on a server. With a distributed protocol like Dat, applications such as
[Beaker Browser](https://beakerbrowser.com) or Dat Library eliminate the
central server, not by destroying it, but by distributing it across all of the
users. Individuals are then not _just_ users, but also hosts. What kind of
subject is this user-host, especially as compared to the user of the server?
Michel Serres writes in _The Parasite_ :
> “It is raining; a passer-by comes in. Here is the interrupted meal once
more. Stopped for only a moment, since the traveller is asked to join the
diners. His host does not have to ask him twice. He accepts the invitation and
sits down in front of his bowl. The host is the satyr, dining at home; he is
the donor. He calls to the passer-by, saying to him, be our guest. The guest
is the stranger, the interrupter, the one who receives the soup, agrees to the
meal. The host, the guest: the same word; he gives and receives, offers and
accepts, invites and is invited, master and passer-by… An invariable term
through the transfer of the gift. It might be dangerous not to decide who is
the host and who is the guest, who gives and who receives, who is the parasite
and who is the table d’hote, who has the gift and who has the loss, and where
hospitality begins with hospitality.” — Michel Serres, The Parasite (Baltimore
and London: The Johns Hopkins University Press), 15–16.
Serres notes that _guest_ and _host_ are the same word in French; we might say
the same for _client_ and _server_ in a distributed protocol. And we will
embrace this multiplying hospitality, giving and taking without measure.
### The role of institutions in the age of distribution
David Cameron launched a doomed initiative in 2010 called the Big Society,
which paired large-scale cuts in public programmes with a call for local
communities to voluntarily self-organise to provide these essential services
for themselves. This is not the political future that we should be working
toward: since 2010, austerity policies have resulted in [120,000 excess deaths
in England](http://bmjopen.bmj.com/content/7/11/e017722). In other words,
while it might seem as though _institutions_ might be comparable to _servers_
, inasmuch as both are centralised infrastructures, we should not give them up
or allow them to be dismantled under the assumption that those infrastructures
can simply be distributed and self-organised. On the contrary, institutions
should be defended and organised in order to support the distributed protocols
we are discussing.
One simple way for a larger, more established institution to help ensure the
durability and accessibility of diverse archives is through the provision of
hardware, network capability and some basic technical support. It can back up
the archives of smaller institutions and groups within its own community while
also giving access to its own archives so that those collections might be put
to new uses. A network of smaller institutions, separated by great distances,
might mirror each other’s archives, both as an expression of solidarity and
positive redundancy and also as a means of circulating their archives,
histories and struggles amongst each of the others.
It was the simultaneous recognition that some documents are too important to
be privatised or lost to the threats of neglect, fire, mould, insects, etc.,
that prompted the development of national and state archives (See page 39 in
[Beredo, B. C., Import of the archive: American colonial bureaucracy in the
Philippines, 1898-1916](http://hdl.handle.net/10125/101724)). As public
institutions they were, and still are, tasked with often competing efforts to
house and preserve while simultaneously also ensuring access to public
documents. Fire and unstable weather understandably have given rise to large
fire-proof and climate-controlled buildings as centralised repositories,
accompanied by highly regulated protocols for access. But in light of new
technologies and their new risks, as discussed above, it is compelling to
argue now that, in order to fulfil their public duty, public archives should
be distributing their collections where possible and providing their resources
to smaller institutions and community groups.
Through the provision of disk space, office space, grants, technical support
and employment, larger institutions can materially support smaller
organisations, individuals and their archival afterlives. They can provide
physical space and outreach for dispersed collectors, gathering and piecing
together a fragmented archive.
But what happens as more people and collections are brought in? As more
institutional archives are allowed to circulate outside of institutional
walls? As storage is cut loose from its dependency on the corporate cloud and
into forms of interdependency, such as mutual support networks? Could this
open up spaces for new forms of not-quite-organisations and queer-
institutions? These would be almost-organisations that uncomfortable exist
somewhere between the common categorical markings of the individual and the
institution. In our thinking, its not important what these future forms
exactly look like. Rather, as discussed above, what is important to us is that
in writing software we open up spaces for the unknown, and allow others agency
to build the forms that work for them. It is only in such an atmosphere of
infinite hospitality that we see the future of community libraries, individual
collections and other precarious archives.
## A note on this text
This README was, and still is being, collaboratively written in a
[Git](https://en.wikipedia.org/wiki/Git)
[repository](https://en.wikipedia.org/wiki/Repository_\(version_control\)).
Git is a free and open-source tool for version control used in software
development. All the code for Hyperreadings, Dat Library and their numerous
associated modules are managed openly using Git and hosted on GitHub under
open source licenses. In a real way, Git’s specification formally binds our
collaboration as well as the open invitation for others to participate. As
such, the form of this README reflects its content. Like this text, these
projects are, by design, works in progress that are malleable to circumstances
and open to contributions, for example by opening a pull request on this
document or raising an issue on our GitHub repositories.
Barok
Poetics of Research
2014
_An unedited version of a talk given at the conference[Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._
_Bracketed sequences are to be reformulated._
Poetics of Research
In this talk I'm going to attempt to identify [particular] cultural
algorithms, ie. processes in which cultural practises and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice in the same rank as theory and
poetics, and where theorization of practice can also lead to the
identification of poetical devices.
The primary motivation for this talk is an attempt to figure out where do we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things) and what sort of things
might be considered to make a difference [or to keep making difference].
The talk mainly [considers] the role of text and the word in research, by way
of several figures.
A
A reference, list, scheme, table, index; those things that intervene in the
flow of narrative, illustrating the point, perhaps in a more economic way than
the linear text would do. Yet they don't function as pictures, they are
primarily texts, arranged in figures. Their forms have been
standardised[normalised] over centuries, withstood the transition to the
digital without any significant change, being completely intuitive to the
modern reader. Compared to the body of text they are secondary, run parallel
to it. Their function is however different to that of the punctuation. They
are there neither to shape the narrative nor to aid structuring the argument
into logical blocks. Nor is their function spatial, like in visual poems.
Their positions within a document are determined according to the sequential
order of the text, [standing as attachments] and are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The [premise] of my talk is that these
_textual figures_ also came to serve as the abstract[relational] models
determining possible relations among documents as such, and in consequence [to
structure conditions [of research]].
B
It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties does it have, and so on, can be consulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, ie. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents. The
phenomena such as [deepening] of specialization and throughout digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given the fact that databases are queried
[using] SQL language, their interfaces are mere extensions of it and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen being ran by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to the flesh and paper has been
[entrusted] to the digital and networked. Researchers are querying the black
box.
C
Searching in a collection of [amassed/assembled] [tangible] documents (ie.
bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents, and we will return to this issue, are they persistent
within the inquiry as such? [wait]
D
This question itself is a rupture in the sequence. It makes a demand to depart
from one narrative [a continuous flow of words] to another, to figure out,
while remaining bound to it [it would be even more as a so-called rhetorical
question]. So there has been one sequence, or line, of the inquiry--about the
kinds of the query and its properties. That sequence itself is a digression,
from within the sequence about what is research and describing its parts
(queries). We are thus returning to it and continue with a question whether
the properties of the inquiry are the same as the properties of the query.
E
But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_ , giving hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact would each choice have on
his reception in the Anglophone world. Limiting ourselves to textual forms for
now (and not translating his work but pursing a different inquiry), let us say
the utterance is a word [or a phrase or an idiom] in a sequence such as a
sentence, a paragraph, or a document.
## (F) The
structure[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=1
"Edit section: \(F\) The structure")]
This distinction is as old as recorded Western thought since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the subject-
matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns--
not as more words denoting a similar thing but rather one word denoting
various things. Categories were outlined as a device to differentiate among
words according to kinds of these things. Every word as such belonged to not
less and not more than one of ten categories.
So it happens to the word _utterance_ , as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might yield as the most appropriate in a given
context. The more context the more precise share comes to the fore. When taken
out of the context ambiguity prevails as the spectrum unveils in its variety.
Thus single words [as any other utterances] are questions, queries,
themselves, and by occuring in statements, in context, their [means] are being
singled out.
This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.
### (G) The structure: words as
words[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=2
"Edit section: \(G\) The structure: words as words")]
* [![](/images/thumb/c/c8/Philitas_in_P.Oxy.XX_2260_i.jpg/144px-Philitas_in_P.Oxy.XX_2260_i.jpg)](/File:Philitas_in_P.Oxy.XX_2260_i.jpg)
P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE. 1(http://163.1.169.40/cgi-
bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---
20-about-2260--
00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
2(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-
highlight.jpeg)
* [![](/images/thumb/9/9e/Cyclopaedia_1728_page_210_Dictionary_entry.jpg/88px-Cyclopaedia_1728_page_210_Dictionary_entry.jpg)](/File:Cyclopaedia_1728_page_210_Dictionary_entry.jpg)
Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , 1728, p. 210. 3(http://digicoll.library.wisc.edu/cgi-
bin/HistSciTech/HistSciTech-
idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)
* [![](/images/thumb/b/b8/Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg/160px-Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)](/File:Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)
Detail from the Liddell-Scott Greek-English Lexicon, c1843.
Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia; contrasted to the same word meaning "milk pail" in Homer's _Iliad_.
Not much has changed in the way how dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filtering and sorting them and
in turn the spectrum of [denoted] things allocated in this way is structured
into groups and subgroups which are then given, according to other set of
rules, shorter or longer names. These constitute facets of [potential]
meanings of a word.
So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few and these criteria are rarely being
disclosed, despite their impact on research, and more generally, their
influence as conditions for production[making] of a so called _common sense_.
It doesn't take that much to reimagine what a dictionary is and what it could
be, especially having large specialized corpora of texts at hand. These can
also serve as aids in production of new words and new meanings.
### (H) The structure: words as knowledge and the
world[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=3
"Edit section: \(H\) The structure: words as knowledge and the world")]
* [![](/images/thumb/0/02/Boethius_Porphyrys_Isagoge.jpg/120px-Boethius_Porphyrys_Isagoge.jpg)](/File:Boethius_Porphyrys_Isagoge.jpg)
Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3th c.), [6th c.] 10th c.
4(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)
* [![](/images/thumb/d/d0/Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg/94px-Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)](/File:Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)
Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , London, 1728, p. II. 5(http://digicoll.library.wisc.edu/cgi-
bin/HistSciTech/HistSciTech-
idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)
* [![](/images/thumb/d/d6/Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg/116px-Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)](/File:Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)
Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_ , 1751.
6(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-
connaissances-humaines)
* [![](/images/thumb/9/96/Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg/96px-Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)](/File:Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)
Haeckel - Darwin's tree.
Another _formalized_ and [internalized] process being at play when figuring
out a word is its [containment]. Word is not only structured by way of things
it potentially denotes but also by words it is potentially part of and those
it contains.
The fuzz around categorization of knowledge _and_ the world in the Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd century AD Neoplatonist began expanding the
notions of genus and species into their hypothetic consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), a relation (πρός), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In his
different work, _Topics_ , Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (όρος), a genus (γένος), a property (ἴδιος), and an
accident (συμβεβηϰόϛ). Porphyry does not explicitly refer _Topics_ , and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
8(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:
> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
9(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))
Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation resulted in having more resemblance
to the perceived _structures_ of the world. So they began to bloom.
There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.
Classification schemes [growing from one point] play a major role in
untangling the format of modern encyclopedia from that of the dictionary
governed by alphabet. Two of the most influential encyclopedias of the 18th
century are cases in the point. Although still keeping 'dictionary' in their
titles, they are conceived not to represent words but knowledge. The [upper-
most] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches: "natural and scientifical" and "artificial and
technical"; these further split down to 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as
table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from judgement ( _entendement_ ),
branches into memory as history, reason as philosophy, and imagination as
poetry. The logic of containers was employed as an aid not only to deal with
the enormous task of naming and not omiting anything from what is known, but
also for the management of labour of hundreds of writers and researchers, to
create a mechanism for delegating work and the distribution of
responsibilities. Flesh was also more present, in the field research, with
researchers attending workshops and sites of everyday life to annotate it.
The world came forward to unshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems such as
Charles A. Cutter's _Expansive Classification_ (1882) set to classify the
world itself and set the field for what has came to be known as authority
lists structuring metadata in today's computing.
### The structure
(summary)[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=4
"Edit section: The structure \(summary\)")]
Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.
While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.
One seek to describe the word as a faceted list of small worlds, another to
describe the world as a structured lists of words. One play prime in the
domain of epistemology, in what is known, controlling the vocabulary, another
in the domain of ontology, in what is, controlling reality.
Every [word] has its given things, every thing has its place, closer or
further from a single word.
The schism between classifying words and classifying the world implies it is
not possible to construct a universal classification scheme[system]. On top of
that, any classification system of words is bound to a corpus of texts it is
operating upon and any classification system of the world again operates with
words which are bound to a vocabulary[lexicon] which is again bound to a
corpus [of texts]. It doesn't mean it would prevent people from trying.
Classifications function as descriptors of and 'inscriptors' upon the world,
imprinting their authority. They operate from [a locus of] their
corpus[context]-specificity. The larger the corpus, the more power it has on
shaping the world, as far as the word shapes it (yes, I do imply Google here,
for which it is a domain to be potentially exploited).
## (J) The
sequence[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=5
"Edit section: \(J\) The sequence")]
The structure-yielding query [of] the single word [shrinks][zuzuje
sa,spresnuje] with preceding and following words. Inquiry proceeds in the flow
that establishes another kind[mode] of relationality, chaining words into the
sequence. While the structuring property of the query brings words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.
This is what is responsible for attaching textual figures mentioned earlier
(lists, schemes, tables) to the body of the text. Associations can be also
stated explicitly, by indexing tables and then referring them from a
particular point in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.
From this follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides means for pointing elsewhere
in the document as well.
A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago and which is online
10(http://monoskop.org/Talks/Communing_Texts).
This is still the realm of print. What happens with document when it is
digitized?
Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.
Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
## (K) The
index[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=6
"Edit section: \(K\) The index")]
* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103px-Summa_confessorum.1310.jpg)](/File:Summa_confessorum.1310.jpg)
Summa confessorum [1297-98], 1310.
7(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)
[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now when I am saying the word _utterance_ , each time there surface contexts
in which I have used it earlier.
A long quote from Frederick G. Kilgour, _The Evolution of the Book_ , 1998, pp
76-77:
> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinning?')
>
> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.
>
> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"
In one sense neither subject-index nor concordane are indexes, they are words
or group of words selected according to given criteria from the body of the
text, each accompanied with a list of identifiers. These identifiers are
elements of an index, whether they represent a page, chapter, column, or other
[kind of] block of text. Every identifier is an unique _address_.
The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, when each element is given unique combination
of symbols. Different sizes of sets yield different number of variations.
Symbol sets such as an alphabet, arabic numerals, roman numerals, and binary
digits have different proportions between the length of a string of symbols
and the number of possible variations it can contain. Thus two symbols of
English alphabet can store 26^2 various values, of arabic numerals 10^2, of
roman numberals 8^2 and of binary digits 2^2.
Indexation is segmentation, a breaking into segments. From as early as the
13th century the index such as that of sections has served as enabler of
search. The more [detailed] indexation the more precise search results it
enables.
The subject-index and concordance are tables of search results. There is a
direct lineage from the 13th-century biblical concordances and the birth of
computational linguistic analysis, they were both initiated and realised by
priests.
During the World War II, Jesuit Father Roberto Busa began to look for machines
for the automation of the linguistic analysis of the 11 million-word Latin
corpus of Thomas Aquinas and related authors.
Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas he
realised two things:
> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has t o precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach t o the words he reads the significance they have in his mind, but
should try t o find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is .this
basic logic that allows the transfer from what the words mean today t o what
they meant to the writer.
>
> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
11(http://www.alice.id.tue.nl/references/busa-1980.pdf)
Collaborating with the IBM in New York from 1949, the work, a concordance of
all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version is online since 2005
12(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words was created by a
team of ten priests in the scope of two years (in two phases: grouping all the
forms of an inflected word under their lemma, and coding the morphological
categories of each form and lemma), containing 150,000 forms
13(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father
Busa has been dubbed the father of humanities computing and recently also of
digital humanities.
The subject-index has a crucial role in the printed book. It is the only means
for search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of an inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).
Its role seemingly vanishes in the digital text. But it can be easily
transformed. Besides serving as a table of pre-searched results the subject-
index also gives a distinct idea about content of the book. Two patterns give
us a clue: numbers of occurrences of selected words give subjects weights,
while words that seem specific to the book outweights other even if they don't
occur very often. A selection of these words then serves as a descriptor of
the whole text, and can be thought of as a specific kind of 'tags'.
This process was formalized in a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as a
proportion of texts in the corpus where the word appears at least once to the
total number of texts. When multiplied by the frequency of the word _in_ the
text (divided by the maximum frequency of any word in the text), we get _term
frequency-inverse document frequency_ (tf-idf). In this way we can get an
automated list of subjects which are particular in the text when compared to a
group of texts.
We came to learn it by practice of searching the web. It is a mechanism not
dissimilar to thought process involved in retrieving particular information
online. And search engines have it built in their indexing algorithms as well.
There is a paper proposing attaching words generated by tf-idf to the
hyperlinks when referring websites 14(http://bscit.berkeley.edu/cgi-
bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in references in the paper use this feature and it can be easily
tested: 15(http://www.cs.berkeley.edu/~phelps/papers/dissertation-
abstract.html?lexical-
signature=notemarks+multivalent+semantically+franca+stylized).
There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even a way of organising library bottom-up into novel categories, new
discourses could emerge. Or as an aid for researchers to sort through texts,
or even for editors as an aid in producing interesting anthologies.
## Final
remarks[[edit](/index.php?title=Talks/Poetics_of_Research&action=edit§ion=7
"Edit section: Final remarks")]
1
New disciplines emerge all the time - most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunities but, firstly, frameworks of new
perspectives of looking at the world, new domains of knowledge. From the
perspective of researcher the partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific
terms[subjects]. Creating new fields involves all that, and more. Even when
one goes against all disciplines.
2
Google can still surprise us.
3
Knowledge has been in the making for millenia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-
making.
4
Command-line example of tf-idf and concordance in 3 steps.
* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:
> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]'
'[\012*]' | tr -d '[:punct:]' | sort | uniq -c | sort -k 1nr | sed '1,1d' >
temp.txt; max=$(awk -vvar=1 -F" " 'NRDisplay 200
300
400
500
600
700
800
900 1000
ALL
characters around the word.