Sollfrank & Mars
Public Library
2013

Marcell Mars
Public Library

Berlin, 1 February 2013

[00:13]
Public Library is the concept, the idea, to encourage people to become a
librarian, where a librarian is a person which can allow access to books – and
also which has a catalogue or index, so that it's searchable. [00:32] And the
person, the human being, can communicate, can talk with others who are
interested in that catalogue of books. [00:43] And then when you have a
librarian, and you have a lot of librarians, you have a Public Library,
because we have access to books, we have a catalogue, and we have a librarian.
That's the basic set up. [00:55] And in order to really work, in practice, we
need to introduce a set of tools which are easy to use, like Calibre, for
example, for book management. [01:07] And then also some part of that set up
should be also developed because at the moment, because of the configuration
of the routers, IP addresses and other things, it's not that easy to share
your local library which you have on your laptop with the world. [01:30] So we
also provide... When I say ‘we,’ it's a small team, at the moment, of
developers who try to address that problem. [01:38] We don't need to reinvent
the public library. It's invented, and it should be just maintained. [01:47]
The old-school public libraries – they are in decline because of many reasons.
And when it comes to the digital networks, the digital books, it's almost like
the worst position. [01:59] For example, public libraries in the US, they are
not allowed to buy digital books, for example from Penguin. So even when they
want to buy, it's not that they are getting them, it's that they can't buy the
books. [02:16] By the current legal regulation, it's considered as illegal – a
million of books, or even more, are unavailable, and I think that these books
should be really available. [02:29] And it doesn't really matter how it got on
Internet – did it come from a graphic designer who is preparing that for
print, or if it was uploaded somewhere from the author of the book (that is
also very common, especially in humanities), or if it was digitised anywhere.
[02:50] So these are the books which we have, and we can't be blinded, they
are here. The practice at the moment is almost like trying to find a
prostitute or something, so when you want to get a book online you need to get
onto the websites with advertisements for casinos, for porn and things like
that. [03:14] I don't think that the library should be like that.

[03:18]
Book Management

[03:22]
What we are trying to provide is just suggesting what kind of book management
software they can use, and also what kind of new software tools they can
install in order to easily get the messy directory into the directory of
metadata which Calibre can recognise – and then you can just use Calibre. The
next step is if you can share your local library with the world. [03:52] You
need something like a management software where it's easy to see who are the
authors, what the titles, publishers and all of the metadata – and it's
accessible from the outside.

[04:08]
Calibre

[04:12]
Calibre is a book management software. It's developed by Kovid Goyal, a
software developer. [04:22] It's a free software, open source, and it started
like many other free software projects. It started as a small tool to solve
very particular small problems. [04:31] But then, because it was useful, it
got more and more users, and then Kovid started to develop it more into a
proper, big book management software. At the moment it has more than 10
million registered users who are running that. [04:52] It does so many things
for book management. It's really ‘the’ software tool... If you have an
e-reader, for example, it recognises your e-reader, it registers it inside of
Calibre and then you can easily just transfer the books. [05:08] Also for
years there was a big problem of file formats. So for example, Amazon, in
order to keep their monopoly in that area, they wouldn't support EPUB or PDF.
And then if you got your book somewhere – if you bought it or just downloaded
from the Internet, you wouldn't be able to read it on your reader. [05:31]
Then Calibre was just developing the converter tools. And it was all in one
package, so that Calibre just became the tool for book management. [05:43] It
has a web server as a part of it. So in a local area network – if you just
start that web server and you are running a local area network, it can have a
read-only searchable access to your local library, to your books, and it can
search by any of these metadata.

[06:05]
Tools Around Calibre

[06:09]
I developed a software which I call Let's Share Books, which is super small
compared to Calibre. It just allows you, with one click, to get your library
shared on the Internet. [06:24] So that means that you get a public URL, which
says something like www some-number dot memoryoftheworld dot net, and that is
the temporary public URL. You can send it to anyone in the world. [06:37] And
while you are running your local web server and share books, it would just
serve these books to the Internet. [06:45] I also set up a web chat – kind of
a room where people can talk to each other, chat to each other. [06:54] So
it’s just, trying to develop tools around Calibre, which is mostly for one
person, for one librarian – to try to make some kind of ecosystem for a lot of
librarians where they can meet with their readers or among themselves, and
talk about the books which they love to read and share. [07:23] It’s mostly
like a social networking around the books, where we use the idea and tradition
of the public library. [07:37] In order to get there I needed to set up a
server which only does routing. So with my software I don’t know which books
are transferred, anything. It’s just like a router. [07:56] You can do that
also if you have control of your router, or what we usually call modem, so the
device which you use to get to the Internet. But that is quite hard to hack,
just hackers know how to do that. [08:13] So I just made a server on the
Internet which you can use with one click, and it just routes the traffic
between you, if you’re a librarian, and your users, readers. So that’s that
easy.

[08:33]
Librarians

[08:38] It’s super easy to become a librarian, and that is what we should
celebrate. It’s not that the only librarians which we have were the librarians
who were the only ones wanting to become a librarian. [08:54] So lots of
people want to be a librarian, and lots of people are librarians whenever they
have a chance. [09:00] So you would probably recommend me some books which you
like. I’ll recommend you some books which I like. So I think we should
celebrate that now it’s super easy that anyone can be a librarian. [09:11] And
of course, we will still need professional librarians in order to push forward
the whole field. But that goes, again, in collaboration with software
engineers, information architects, whatever… [09:26] It’s so easy to have
that, and the benefits of that are so great, that there is no reason why not
to do that, I would say.

[09:38]
Functioning

[09:43]
If you want to share your collection then you need to install at the moment
Calibre, and Let’s Share Books software, which I wrote. But also you can – for
example, there is a Calibre plugin for Aaaaarg, so if you use Calibre… from
Calibre you can search Aaaaarg, you can download books from Aaaaarg, you can
also change the metadata and upload the metadata up to Aaaaarg.

[10:13]
Repositories

[10:17]
At the moment the biggest repository for the books, in order to download and
make your catalogue, is Library Genesis. It’s around 900,000 books. It’s
libgen.info, libgen.org. And it’s a great project. [10:33] It’s done by some
Russian hackers, who also allow anyone to download all of that. It’s 9
Terabytes of books, quite some chunk of hard disks which you need for that.
[10:47] And you can also download PHP, the back end of the website and the
MySQL database (a dump of the MySQL database), so you can run your own
Library Genesis. That’s one of the ways how you can do that. [11:00] You can
also go and join Aaaaarg.org, where it is also not just about downloading
books and uploading books, it’s also about communication and interpretation of
making, different issues and catalogues. [11:14] It’s a community of book
lovers who like to share knowledge, and who add quite a lot of value around
the books by doing that. [11:26] And then there is… you can use Calibre and
Let’s Share Books. It’s just one of these complementary tools. So it’s not
really that Calibre and Let’s Share Books is the only way how you can today
share books.

[11:45]
Goal

[11:50]
What we do also has a non-hidden agenda for fighting for the public library. I
would say that most of the people we know, even the authors, they all
participate in the huge, massive Public Library – which we don’t call Public
Library, but usually just trying to hide that we are using that because we are
afraid of the restrictive regime. [12:20] So I don’t see a reason why we
should shut down such a great idea and great implementation – a great resource
which we have all around the world. [12:30] So it’s just an attempt to map all
of these projects and to try to improve them. Because, in order to get it into
the right shape, we need to improve the metadata. [12:47] Open Library, a
project which started also with Aaron Swartz, has 20 million items, and we
use it. There is a basedata.org which connects the hash files, the MD5 hashes,
with the Open Library ID. And we try to contribute to Open Library as much as
possible. [13:10] So with very few people, around 5 people, we can improve it
so much that it will be for a billion of users a great Public Library, and at
the same time we can have millions of librarians, which we never had before.
So that’s the idea. [13:35] The goal is just to keep the Public Library. If we
didn’t screw up the whole situation with the Public Library, probably we’d
just try to add a little bit of new software, and new ways that we can read
the books. [13:53] But at the moment [it’s] super important actually to keep
this infrastructure running, because this super important infrastructure for
the access to knowledge is now under huge threat.

[14:09]
Copyright

[14:13]
I just think that it’s completely inappropriate – that copyright law is
completely inappropriate for the Public Library. I don’t know about other
cases, but in terms of Public Library it’s absolutely inappropriate. [14:29]
We should find the new ways of how to reward the ones who are adding value to
sharing knowledge. First authors, then anyone who is involved in public
libraries, like librarians, software engineers – so everyone who is involved
in that ecosystem should be rewarded, because it’s a great thing, it’s a
benefit for the society. [15:03] If this kind of things happens, so if the law
which regulates this blocks and doesn’t let that field blossom, it’s something
wrong with that law. [15:16] It’s getting worse and worse, so I don’t know for
how long we should wait, because while we’re waiting it’s getting worse.
[15:24] I don’t care. And I think that I can say that because I’m an artist.
Because all of these laws are made saying that they are representing art, they
are representing the interest of artists. I’m an artist. They don’t really
represent my interests. [15:46] I think that it should be taken over by the
artists. And if there are some artists who disagree – great, let’s have a
discussion.

[15:58]
Civil Disobedience

[16:03]
In the possibilities of civil disobedience – which are done also by
institutions, not just by individuals – and I think that in such clear cases
like the Public Library it’s easy. [16:17] So I think that what I did in this
particular case is nothing really super smart – it’s just reducing this huge
issue to something which is comprehensible, which is understandable for most
of the people. [16:31] There is no one really who doesn’t understand what
public library is. And if you say to anyone in the world, saying, like hey, no
more public libraries, hey, no books anymore, no books for the poor people. We
are just giving up on something which we almost consensually accepted through
the whole world. [16:55] And I think that in such clear cases, I’m really
interested [in] what institutions could do, like Transmediale. I’m now in
[Akademie] Schloss Solitude, I also proposed to make a server with a Public
Library. If you invest enough it’s a million of books, it’s a great library.
[17:16] And of course they are scared. And I think that the system will never
really move if people are not brave. [17:26] I’m not really trying to
encourage people to do something where no one could really understand, you
know, and you need expertise or whatever. [17:37] In my opinion this is the
big case. And if Transmediale or any other art institution is playing with
that, and showing that – let’s see how far away we can support this kind of
things. [17:56] The other issue which I am really interested in is what is the
infrastructure, who is running the infrastructures, and what kind of
infrastructures are happening in between these supposedly avant-garde
institutions, or something. [18:12] So I’m really interested in raising these
issues.

[18:17]
Art Project

[18:21]
Public Library is also an art project where… I would say that just in the same
way that corporations, by their legal status, can really kind of mess around
with different… they can’t be that much accountable and responsible – I think
that this is the counterpart. [18:44] So civil disobedience can use art just
the same way that corporations can use their legal status. [18:51] When I was
invited as a curator and artist to curate the HAIP Festival in Ljubljana, I
was already quite into the topic of sharing access to knowledge. And then I
came up with this idea and everybody liked it and everybody was enthusiastic.
It's one of these ideas where you can see that it’s great, there is no one
really who would oppose to that. [19:28] At the same time there was an
exhibition, Dear Art, curated by WHW, quite established curators. And then it
immediately became an art piece for that exhibition. Then I was invited here
to Transmediale, and have a couple of other invitations. [19:45] I think that
it also shows that art institutions are accepting that, they play with that
idea. And I think that this kind of projects – by having that acceptance it
becomes the issue, it becomes the problem of the whole arts establishment.
[20:10] So I think that if I do this in this way, and if there is a curator
who invites this kind of projects – so who invites Public Library into their
exhibition – it’s also showing their kind of readiness to fight for that
issue. [20:27] And if there are a number of art festivals, a number of art
exhibitions, who are supporting this kind of, let’s say, civil disobedience,
that also shows something. [20:38] And I think that that kind of context
should be pushed into the confrontation, so it’s not anymore just playing “oh,
is it ok, is it not? We should deal with all the complexity…” [20:57] There
is no real complexity here. That complexity is somewhere else, and in some
other step we should take care of that. But this is an art piece, it’s a well
established art piece. [21:11] If you make a Public Library, I'm fine, I’m
sacrificing for taking the responsibility. But you shouldn't melt down that
art piece, I think. [21:26] And I feel super stupid that such a simple concept
should be, in 2013, articulated to whom? In many ways it’s like playing dummy,
I play dummy. It’s like, why should I? [21:50] When we started to play in
Ljubljana like software developers we came up with so many great ideas of how
to use those resources. So it was immediately… just after a couple of hours we
had tools – visualisations of that, a reader of Wikipedia which can embed any
page which is referred, as a reference, a quote. [22:17] It was immediately
obvious for anyone there and for anyone from the outside what a huge resource
is having a Public Library like that – and what’s the huge harm that we don’t
have it. [22:32] But still we need to play dummy, I need to play the artist’s
role, you know.

Murtaugh
A bag but is language nothing of words
2016

## A bag but is language nothing of words

### From Mondotheque

(language is nothing but a bag of words)

[Michael Murtaugh](/wiki/index.php?title=Michael_Murtaugh "Michael Murtaugh")

In text indexing and other machine reading applications the term "bag of
words" is frequently used to underscore how processing algorithms often
represent text using a data structure (word histograms or weighted vectors)
where the original order of the words in sentence form is stripped away. While
"bag of words" might well serve as a cautionary reminder to programmers of the
essential violence perpetrated to a text and a call to critically question the
efficacy of methods based on subsequent transformations, the expression's use
seems in practice more like a badge of pride or a schoolyard taunt that would
go: Hey language: you're nothin' but a big BAG-OF-WORDS.

## Bag of words

In information retrieval and other so-called _machine-reading_ applications
(such as text indexing for web search engines) the term "bag of words" is used
to underscore how in the course of processing a text the original order of the
words in sentence form is stripped away. The resulting representation is then
a collection of each unique word used in the text, typically weighted by the
number of times the word occurs.
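
The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bag_of_words`, the sample sentence, and the choice to lowercase the text and treat any run of letters or apostrophes as a word are all assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each unique word to the number of times it occurs."""
    # Assumption: a "word" is any run of letters or apostrophes, compared in lowercase.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


bag = bag_of_words("The cat sat on the mat. The mat sat still.")
print(bag["the"])  # 3
print(len(bag))    # 6 unique words; the order of the ten words is gone
```
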

Bag of words, also known as word histograms or weighted term vectors, are a
standard part of the data engineer's toolkit. But why such a drastic
transformation? The utility of "bag of words" is in how it makes text amenable
to code, first in that it's very straightforward to implement the translation
from a text document to a bag of words representation. More significantly,
this transformation then opens up a wide collection of tools and techniques
for further transformation and analysis purposes. For instance, a number of
libraries available in the booming field of "data sciences" work with "high
dimension" vectors; bag of words is a way to transform a written document into
a mathematical vector where each "dimension" corresponds to the (relative)
quantity of each unique word. While physically unimaginable and abstract
(imagine each of Shakespeare's works as points in a 14 million dimensional
space), from a formal mathematical perspective, it's quite a comfortable idea,
and many complementary techniques (such as principal component analysis) exist
to reduce the resulting complexity.
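
As a rough sketch of what "each dimension corresponds to the (relative) quantity of each unique word" can mean in practice, the following Python turns a small set of documents into vectors over a shared vocabulary. The function name `to_vectors`, the two sample phrases, and the use of plain whitespace splitting are assumptions for illustration only.

```python
from collections import Counter


def to_vectors(documents):
    """Map each document to a vector of relative word frequencies.

    Every unique word across all documents becomes one dimension.
    """
    bags = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        # A word absent from a document contributes 0 along its dimension.
        vectors.append([bag[word] / total for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = to_vectors(
    ["to be or not to be", "to sleep perchance to dream"]
)
print(len(vocabulary))  # 7 dimensions for these two short phrases
```

With a real corpus the vocabulary, and so the number of dimensions, grows into the millions, which is where dimension-reducing techniques come in.
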

What's striking about a bag of words representation, given its centrality in so
many text retrieval applications, is its irreversibility. Given a bag of words
representation of a text and faced with the task of producing the original
text would require in essence the "brain" of a writer to recompose sentences,
working with the patience of a devoted cryptogram puzzler to draw from the
precise stock of available words. While "bag of words" might well serve as a
cautionary reminder to programmers of the essential violence perpetrated to a
text and a call to critically question the efficacy of methods based on
subsequent transformations, the expression's use seems in practice more like a
badge of pride or a schoolyard taunt that would go: Hey language: you're
nothing but a big BAG-OF-WORDS. Following this spirit of the term, "bag of
words" celebrates a perfunctory step of "breaking" a text into a purer form
amenable to computation, to stripping language of its silly redundant
repetitions and foolishly contrived stylistic phrasings to reveal a purer
inner essence.
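
The irreversibility can be made concrete with a small Python sketch. The two sentences are invented for the example, and the helper `bag` is a hypothetical name; the point is only that sentences with different meanings can collapse into the same bag.

```python
from collections import Counter


def bag(text):
    """Count words, discarding the order in which they appeared."""
    return Counter(text.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Two different sentences, one and the same bag of words.
same_bag = bag(first) == bag(second)
print(same_bag)  # True
```
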

## Book of words

Lieber's Standard Telegraphic Code, first published in 1896 and republished in
various updated editions through the early 1900s, is an example of one of
several competing systems of telegraph code books. The idea was for both
senders and receivers of telegraph messages to use the books to translate
their messages into a sequence of code words which can then be sent for less
money as telegraph messages were paid by the word. In the front of the book, a
list of examples gives a sampling of how messages like: "Have bought for your
account 400 bales of cotton, March delivery, at 8.34" can be conveyed by a
telegram with the message "Ciotola, Delaboravi". In each case the reduction of
number of transmitted words is highlighted to underscore the efficacy of the
method. Like a dictionary or thesaurus, the book is primarily organized around
key words, such as _act_ , _advice_ , _affairs_ , _bags_ , _bail_ , and
_bales_ , under which exhaustive lists of useful phrases involving the
corresponding word are provided in the main pages of the volume. [1]
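
The economy of the example above can be tallied in a short Python sketch. Counting words by splitting on whitespace is an assumption; the actual counting rules applied by telegraph companies (for numbers and punctuation, for instance) are not modelled here.

```python
def word_count(message):
    """Count words by splitting on whitespace (a simplifying assumption)."""
    return len(message.split())


plain = "Have bought for your account 400 bales of cotton, March delivery, at 8.34"
coded = "Ciotola, Delaboravi"

saved = word_count(plain) - word_count(coded)
print(word_count(plain), word_count(coded), saved)  # 13 2 11
```
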

[![Liebers
P1016847.JPG](/wiki/images/4/41/Liebers_P1016847.JPG)](/wiki/index.php?title=File:Liebers_P1016847.JPG)

[![Liebers
P1016859.JPG](/wiki/images/3/35/Liebers_P1016859.JPG)](/wiki/index.php?title=File:Liebers_P1016859.JPG)

[![Liebers
P1016861.JPG](/wiki/images/3/34/Liebers_P1016861.JPG)](/wiki/index.php?title=File:Liebers_P1016861.JPG)

[![Liebers
P1016869.JPG](/wiki/images/f/fd/Liebers_P1016869.JPG)](/wiki/index.php?title=File:Liebers_P1016869.JPG)

> [...] my focus in this chapter is on the inscription technology that grew
parasitically alongside the monopolistic pricing strategies of telegraph
companies: telegraph code books. Constructed under the bywords “economy,”
“secrecy,” and “simplicity,” telegraph code books matched phrases and words
with code letters or numbers. The idea was to use a single code word instead
of an entire phrase, thus saving money by serving as an information
compression technology. Generally economy won out over secrecy, but in
specialized cases, secrecy was also important.[2]

In Katherine Hayles' chapter devoted to telegraph code books she observes how:

> The interaction between code and language shows a steady movement away from
a human-centric view of code toward a machine-centric view, thus anticipating
the development of full-fledged machine codes with the digital computer. [3]

[![Liebers
P1016851.JPG](/wiki/images/1/13/Liebers_P1016851.JPG)](/wiki/index.php?title=File:Liebers_P1016851.JPG)
Aspects of this transitional moment are apparent in a notice prominently
inserted in the Lieber's code book:

> After July, 1904, all combinations of letters that do not exceed ten will
pass as one cipher word, provided that it is pronounceable, or that it is
taken from the following languages: English, French, German, Dutch, Spanish,
Portuguese or Latin -- International Telegraphic Conference, July 1903 [4]

Conforming to international conventions regulating telegraph communication at
that time, the stipulation that code words be actual words drawn from a
variety of European languages (many of Lieber's code words are indeed
arbitrary Dutch, German, and Spanish words) underscores this particular moment
of transition as reference to the human body in the form of "pronounceable"
speech from representative languages begins to yield to the inherent potential
for arbitrariness in digital representation.

What telegraph code books do is remind us of the relation of language in
general to economy. Whether they may be economies of memory, attention, costs
paid to a telecommunications company, or in terms of computer processing time
or storage space, encoding language or knowledge in any form of writing is a
form of shorthand and always involves an interplay with what one expects to
perform or "get out" of the resulting encoding.

> Along with the invention of telegraphic codes comes a paradox that John
Guillory has noted: code can be used both to clarify and occlude. Among the
sedimented structures in the technological unconscious is the dream of a
universal language. Uniting the world in networks of communication that
flashed faster than ever before, telegraphy was particularly suited to the
idea that intercultural communication could become almost effortless. In this
utopian vision, the effects of continuous reciprocal causality expand to
global proportions capable of radically transforming the conditions of human
life. That these dreams were never realized seems, in retrospect, inevitable.
[5]

[![Liebers
P1016884.JPG](/wiki/images/9/9c/Liebers_P1016884.JPG)](/wiki/index.php?title=File:Liebers_P1016884.JPG)

[![Liebers
P1016852.JPG](/wiki/images/7/74/Liebers_P1016852.JPG)](/wiki/index.php?title=File:Liebers_P1016852.JPG)

[![Liebers
P1016880.JPG](/wiki/images/1/11/Liebers_P1016880.JPG)](/wiki/index.php?title=File:Liebers_P1016880.JPG)

Far from providing a universal system of encoding messages in the English
language, Lieber's code is quite clearly designed for the particular needs and
conditions of its use. In addition to the phrases ordered by keywords, the
book includes a number of tables of terms for specialized use. One table lists
a set of words used to describe all possible permutations of numeric grades of
coffee (Choliam = 3,4, Choliambos = 3,4,5, Choliba = 4,5, etc.); another table
lists pairs of code words to express the respective daily rise or fall of the
price of coffee at the port of Le Havre in increments of a quarter of a Franc
per 50 kilos ("Chirriado = prices have advanced 1 1/4 francs"). From an
archaeological perspective, the Lieber's code book reveals a cross section of
the needs and desires of early 20th century business communication between the
United States and its trading partners.
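
Seen as a data structure, such a table is a two-way lookup between code words and the values they stand for. The sketch below uses only the three coffee-grade entries quoted above; the names `COFFEE_GRADES`, `encode` and `decode` are hypothetical, and the rest of the table is not reproduced.

```python
# The three entries quoted above from Lieber's table of coffee grades.
COFFEE_GRADES = {
    "Choliam": (3, 4),
    "Choliambos": (3, 4, 5),
    "Choliba": (4, 5),
}

# Inverted table, as the sender of a telegram would use it.
CODE_WORDS = {grades: word for word, grades in COFFEE_GRADES.items()}


def decode(word):
    """Receiver's side: look up the grades a code word stands for."""
    return COFFEE_GRADES[word]


def encode(grades):
    """Sender's side: find the single code word for a set of grades."""
    return CODE_WORDS[tuple(grades)]


print(encode([3, 4, 5]))  # Choliambos
print(decode("Choliba"))  # (4, 5)
```
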

The advertisements lining the Lieber's code book further situate its use and
that of commercial telegraphy. Among the many advertisements for banking and
law services, office equipment, and alcohol are several ads for gun powder and
explosives, drilling equipment and metallurgic services all with specific
applications to mining. Extending telegraphy's formative role for ship-to-
shore and ship-to-ship communication for reasons of safety, commercial
telegraphy extended this network of communication to include those parties
coordinating the "raw materials" being mined, grown, or otherwise extracted
from overseas sources and shipped back for sale.

## "Raw data now!"

From [La ville intelligente - Ville de la connaissance](/wiki/index.php?title
=La_ville_intelligente_-_Ville_de_la_connaissance "La ville intelligente -
Ville de la connaissance"):

Étant donné que les nouvelles formes modernistes et l'utilisation de matériaux
propageaient l'abondance d'éléments décoratifs, Paul Otlet croyait en la
possibilité du langage comme modèle de « [données
brutes](/wiki/index.php?title=Bag_of_words "Bag of words") », le réduisant aux
informations essentielles et aux faits sans ambiguïté, tout en se débarrassant
de tous les éléments inefficaces et subjectifs.

From [The Smart City - City of Knowledge](/wiki/index.php?title
=The_Smart_City_-_City_of_Knowledge "The Smart City - City of Knowledge"):

As new modernist forms and use of materials propagated the abundance of
decorative elements, Otlet believed in the possibility of language as a model
of '[raw data](/wiki/index.php?title=Bag_of_words "Bag of words")', reducing
it to essential information and unambiguous facts, while removing all
inefficient assets of ambiguity or subjectivity.

> Tim Berners-Lee: [...] Make a beautiful website, but first give us the
unadulterated data, we want the data. We want unadulterated data. OK, we have
to ask for raw data now. And I'm going to ask you to practice that, OK? Can
you say "raw"?

>

> Audience: Raw.

>

> Tim Berners-Lee: Can you say "data"?

>

> Audience: Data.

>

> TBL: Can you say "now"?

>

> Audience: Now!

>

> TBL: Alright, "raw data now"!

>

> [...]

>

> So, we're at the stage now where we have to do this -- the people who think
it's a great idea. And all the people -- and I think there's a lot of people
at TED who do things because -- even though there's not an immediate return on
the investment because it will only really pay off when everybody else has
done it -- they'll do it because they're the sort of person who just does
things which would be good if everybody else did them. OK, so it's called
linked data. I want you to make it. I want you to demand it. [6]

## Un/Structured

As graduate students at Stanford, Sergey Brin and Lawrence (Larry) Page had an
early interest in producing "structured data" from the "unstructured" web. [7]

> The World Wide Web provides a vast source of information of almost all
types, ranging from DNA databases to resumes to lists of favorite restaurants.
However, this information is often scattered among many web servers and hosts,
using many different formats. If these chunks of information could be
extracted from the World Wide Web and integrated into a structured form, they
would form an unprecedented source of information. It would include the
largest international directory of people, the largest and most diverse
databases of products, the greatest bibliography of academic works, and many
other useful resources. [...]

>

> **2.1 The Problem**
> Here we define our problem more formally:
> Let D be a large database of unstructured information such as the World
Wide Web [...] [8]

In a paper titled _Dynamic Data Mining_ Brin and Page situate their research
looking for _rules_ (statistical correlations) between words used in web
pages. The "baskets" they mention stem from the origins of "market basket"
techniques developed to find correlations between the items recorded in the
purchase receipts of supermarket customers. In their case, they deal with web
pages rather than shopping baskets, and words instead of purchases. In
transitioning to the much larger scale of the web, they describe the
usefulness of their research in terms of its computational economy, that is
the ability to tackle the scale of the web and still perform using
contemporary computing power completing its task in a reasonably short amount
of time.

> A traditional algorithm could not compute the large itemsets in the lifetime
of the universe. [...] Yet many data sets are difficult to mine because they
have many frequently occurring items, complex relationships between the items,
and a large number of items per basket. In this paper we experiment with word
usage in documents on the World Wide Web (see Section 4.2 for details about
this data set). This data set is fundamentally different from a supermarket
data set. Each document has roughly 150 distinct words on average, as compared
to roughly 10 items for cash register transactions. We restrict ourselves to a
subset of about 24 million documents from the web. This set of documents
contains over 14 million distinct words, with tens of thousands of them
occurring above a reasonable support threshold. Very many sets of these words
are highly correlated and occur often. [9]
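
To give a sense of what "baskets", "itemsets" and a "support threshold" refer to, here is a deliberately naive Python sketch that counts how often pairs of words share a document. It is not the algorithm of the paper: it is the brute-force kind of counting that does not scale to the web, and the function name `frequent_pairs` and the three toy documents are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return word pairs that occur together in at least min_support baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Each document is a basket; each distinct word in it is an item.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


documents = [
    "raw data now".split(),
    "raw data please".split(),
    "linked data now".split(),
]
pairs = frequent_pairs(documents, min_support=2)
print(pairs)  # {('data', 'now'): 2, ('data', 'raw'): 2}
```

The number of candidate pairs grows with the square of the number of distinct words per basket, and larger itemsets grow faster still, which hints at why the scale of the web calls for something less direct.
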

## Un/Ordered

In programming, I've encountered a recurring "problem" that's quite
symptomatic. It goes something like this: you (the programmer) have managed to
cobble out a lovely "content management system" (either from scratch, or using
any number of helpful frameworks) where your user can enter some "items" into
a database, for instance to store bookmarks. After this ordered items are
automatically presented in list form (say on a web page). The author: It's
great, except... could this bookmark come before that one? The problem stems
from the fact that the database ordering (a core functionality provided by any
database) somehow applies a sorting logic that's almost but not quite right. A
typical example is the sorting of names where details (where to place a name
that starts with a Norwegian "Ø" for instance), are language-specific, and
when a mixture of languages occurs, no single ordering is necessarily
"correct". The (often) exasperated programmer might hastily add an additional
database field so that each item can also have an "order" (perhaps in the form
of a date or some other kind of (alpha)numerical "sorting" value) to be used
to correctly order the resulting list. Now the author has a means, awkward and
indirect but workable, to control the order of the presented data on the start
page. But one might well ask, why not just edit the resulting listing as a
document? Not possible! Contemporary content management systems are based on a
data flow from a "pure" source of a database, through controlling code and
templates to produce a document as a result. The document isn't the data, it's
the end result of an irreversible process. This problem, in this and many
variants, is widespread and reveals an essential backwardness that a
particular "computer scientist" mindset relating to what constitutes "data"
and in particular it's relationship to order that makes what might be a
straightforward question of editing a document into an over-engineered
database.
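The situation described above can be sketched in a few lines of Python. This is only an illustration, not any particular content management system: the names, the `bookmarks` records and their `order` values are all invented here. The default sort compares Unicode code points, which puts "Å" and "Ø" after "Z" (and "Å" before "Ø", whereas the Norwegian alphabet ends Æ, Ø, Å), so the author's preferred sequence has to be supplied through the extra field.

```python
# Hypothetical bookmark names, written with escapes so the code points are explicit.
names = ["Zola", "\u00d8rsted", "Olsen", "\u00c5berg", "Adams"]  # Ørsted, Åberg

# Default sorting compares Unicode code points: "Å" (U+00C5) and "Ø" (U+00D8)
# both come after "Z" (U+005A), whatever the author's language would expect.
by_codepoint = sorted(names)

# The "hasty" workaround: an explicit per-item "order" value set by the author.
bookmarks = [
    {"title": "Zola", "order": 5},
    {"title": "\u00d8rsted", "order": 4},
    {"title": "Olsen", "order": 3},
    {"title": "\u00c5berg", "order": 2},
    {"title": "Adams", "order": 1},
]
by_author = [b["title"] for b in sorted(bookmarks, key=lambda b: b["order"])]

print(by_codepoint)
print(by_author)
```

The second listing is "correct" only because the author said so, item by item, which is exactly the editing of a document that the database model claims not to need.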

While I was recently working with Nikolaos Vogiatzis, whose research explores
playful and radically subjective alternatives to the list, Vogiatzis was
struck by how the earliest specifications of HTML (still valid today) have
separate elements (OL and UL) for "ordered" and "unordered" lists.

> The representation of the list is not defined here, but a bulleted list for
unordered lists, and a sequence of numbered paragraphs for an ordered list
would be quite appropriate. Other possibilities for interactive display
include embedded scrollable browse panels. [10]

Vogiatzis' surprise lay in the idea of a list ever being considered
"unordered" (or in opposition to the language used in the specification, for
order to ever be considered "insignificant"). Indeed in its suggested
representation, still followed by modern web browsers, the only difference
between the two visually is that UL items are preceded by a bullet symbol,
while OL items are numbered.

The idea of ordering runs deep in programming practice where essentially
different data structures are employed depending on whether order is to be
maintained. The indexes of a "hash" table, for instance (also known as an
associative array), are ordered in an unpredictable way governed by a
representation's particular implementation. This data structure, extremely
prevalent in contemporary programming practice, sacrifices order to offer
other kinds of efficiency (fast text-based retrieval, for instance).
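As a small sketch of that trade-off, assuming Python and some invented example items: a list stores position as part of the data, while a hash-based `set` gives fast lookup but stores no position at all. (Python's own `dict` has preserved insertion order since version 3.7, so the `set` is the clearer illustration here.)

```python
# Invented example items; the duplicate is deliberate.
items = ["mons", "cotton", "oil", "words", "cotton"]

# A list keeps the sequence (and the duplicates) exactly as entered.
first_entered = items[0]

# A set is hash-based: membership tests are fast, but the iteration order is
# an implementation detail and no position is stored with the items.
unique_items = set(items)
has_oil = "oil" in unique_items

# To present the set as a list again, an ordering must be imposed from outside.
presented = sorted(unique_items)

print(first_entered, has_oil, presented)
```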

## Data mining

In announcing Google's impending data center in Mons, Belgian prime minister
Di Rupo invoked the link between the history of the mining industry in the
region and the present and future interest in "data mining" as practiced by IT
companies such as Google.

Whether speaking of bales of cotton, barrels of oil, or bags of words, what
links these subjects is the way in which the notion of "raw material" obscures
the labor and power structures employed to secure them. "Raw" is always
relative: "purity" depends on processes of "refinement" that typically carry
social/ecological impact.

Stripping language of order is an act of "disembodiment", detaching it from
the acts of writing and reading. The shift from (human) reading to machine
reading involves a shift of responsibility from the individual human body to
the obscured responsibilities and seemingly inevitable forces of the
"machine", be it the machine of a market or the machine of an algorithm.
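What stripping order means in practice can be shown with a minimal "bag of words" sketch. The function name and both example sentences are illustrative assumptions, not taken from any particular text-mining system: once a text is reduced to word counts, a sentence and its alphabetical scramble become indistinguishable.

```python
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding the order of the words."""
    return Counter(text.lower().split())


# Two illustrative sentences: the second is the first with its words sorted.
original = "language is nothing but a bag of words"
scrambled = "a bag but is language nothing of words"

# Both sentences yield exactly the same bag.
same_bag = bag_of_words(original) == bag_of_words(scrambled)

# Sorting the words of the first sentence produces the second.
reordered = " ".join(sorted(original.split()))

print(same_bag, reordered)
```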

From [X = Y](/wiki/index.php?title=X_%3D_Y "X = Y"):

> Still, it is reassuring to know that the products hold traces of the work,
that even with the progressive removal of human signs in automated processes,
the workers' presence never disappears completely. This presence is proof of
the materiality of information production, and becomes a sign of the economies
and paradigms of efficiency and profitability that are involved.

The computer scientists' view of textual content as "unstructured", be it in a
webpage or the OCR scanned pages of a book, reflects a neglect of the
processes and labor of writing, editing, design, layout, typesetting, and
eventually publishing, collecting and cataloging [11].

"Unstructured", to the computer scientist, means non-conformant to particular
forms of machine reading. "Structuring" then is a social process by which
particular (additional) conventions are agreed upon and employed. Computer
scientists often view text through the eyes of their particular reading
algorithm, and in the process (voluntarily) blind themselves to the work
practices which have produced and maintain these "resources".

In chastising his audience of web publishers to not only publish online but
also to release "unadulterated" data, Berners-Lee betrays a lack of
imagination in considering how language is itself structured, and a blindness
to the need for more than additional technical standards to connect to
existing publishing practices.

Last Revision: 2·08·2016

1. Benjamin Franklin Lieber, Lieber's Standard Telegraphic Code, 1896, New York
2. Katherine Hayles, "Technogenesis in Action: Telegraph Code Books and the Place of the Human", How We Think: Digital Media and Contemporary Technogenesis, 2006
3. Hayles
4. Lieber's
5. Hayles
6. Tim Berners-Lee: The next web, TED Talk, February 2009
7. "Research on the Web seems to be fashionable these days and I guess I'm no exception." from Brin's [Stanford webpage](http://infolab.stanford.edu/~sergey/)
8. Sergey Brin, Extracting Patterns and Relations from the World Wide Web, Proceedings of the WebDB Workshop at EDBT 1998
9. Sergey Brin and Lawrence Page, Dynamic Data Mining: Exploring Large Rule Spaces by Sampling, 1998, p. 2
10. Tim Berners-Lee and Daniel Connolly, Hypertext Markup Language (HTML): "Internet Draft", June 1993
11.

Retrieved from

[https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480](https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480)
