tf-idf in Barok 2014

rching the web. It is a mechanism not
dissimilar to thought process involved in retrieving particular information
online. And search engines have it built in their indexing algorithms as well.

There is a paper proposing attaching words generated by tf-idf to the
hyperlinks when referring websites 14(
This would enable finding the referred content even after the link is dead.

this feature and it can be easily
tested: 15(

There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even a way of organising library bottom-up into novel categories, new
discourses could em

xts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-


Command-line example of tf-idf and concordance in 3 steps.

* 1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:

> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]'

tf-idf in Dekker & Barok 2017

ough a tag cloud acting as
a multidimensional filter (unpublished). There is a reader
containing some fifty books whose mutual references are
turned into hyperlinks, and whose main interface consists
of terms specific to each text, generated through tf-idf algorithm (unpublished). And so on.


The publishing market frames the publication as a singular
body of work, autonomous from other titles on offer, and
subjects it to the rules of the market—with a price tag and
copyright notice attached. But


Display 200 300 400 500 600 700 800 900 1000 ALL characters around the word.