Liang
Shadow Libraries
2012


Journal #37 - September 2012

# Shadow Libraries

Over the last few monsoons I lived with the dread that the rain would
eventually find its way through my leaky terrace roof and destroy my books.
Last August my fears came true when I woke up in the middle of the night to
see my room flooded and water leaking from the roof and through the walls.
Much of the night was spent rescuing the books and shifting them to a dry
room. While timing and speed were essential to the task at hand, they were also
the key hazards in navigating a slippery floor with books stacked up to one’s
neck. At the end of the rescue mission I sat alone, exhausted, amid a
mountain of books, assessing the damage that had been done but also finding
books I had forgotten or had not seen in years; books I had thought
permanently borrowed by others or misplaced found their way back, and I set
many aside in a kind of ritual of renewed commitment.


Sorting the badly damaged from the mildly wet, I could not help but think
about the fragile histories of books, from the library of Alexandria to the
great Florence flood of 1966. It may seem presumptuous to move from the
precarity of one’s small library and collection to these larger events, but is
there any other way to experience earth-shattering events than through a
microcosmic filtering of one’s own experience? I sent a distressed email to a
friend, Sandeep, a committed bibliophile and book collector with a fantastic
personal library, who had also been responsible for many of my new
acquisitions. He wrote back on August 17; I quote an extract from the email:

> Dear Lawrence,
>
> I hope your books are fine. I feel for you very deeply, since my nightmares
about the future all contain as a key image my books rotting away under a
steady drip of grey water. Where was this leak, in the old house or in the
new? I spent some time looking at the books themselves: many of them I greeted
like old friends. I see you have Lewis Hyde’s _Trickster Makes the World_ and
Edward Rice’s _Captain Sir Richard Francis Burton_ in the pile: both top-class
books. (Burton is a bit of an obsession with me. The man did and saw
everything there was to do and see, and thought about it all, and wrote it all
down in a massive pile of notes and manuscripts. He squirrelled a fraction of
his scholarship into the tremendous footnotes to the Thousand and One Nights,
but most of it he could not publish without scandalising the Victorians, and
then he died, and his widow made a bonfire in the backyard, and burnt
everything because she disapproved of these products of a lifetime’s labors,
and of a lifetime such as few have ever had, and no one can ever have again. I
almost hope there is a special hell for Isabel Burton to burn in.)

Moving from one’s personal pile to the burning of the work of one of the
greatest autodidacts of the nineteenth century and back again, it was
strangely comforting to be reminded that libraries, the greatest time machines
ever invented, are testimonies to both the grandeur and the fragility of
civilizations. Whenever I enter a huge library it is with a tingling sense of
excitement normally reserved for horror movies, but this sense of awe is often
accompanied by an almost debilitating sense of what it means to encounter
one’s finitude, dwarfed by centuries of words and scholarship. Yet,
strangely, when I think of libraries it is rarely the New York Public Library
that comes to mind, even though I wish we had similar institutions in India. I
think instead of much smaller collections: sometimes those of institutions,
but often just those of friends and acquaintances. I enjoy browsing through
people’s bookshelves, not just to discern their reading preferences or to
discover unknown treasures for myself, but also to take delight in the local
logic of their libraries and their spatial preferences, and to understand the
order of things not as a global knowledge project but as a personal, often
quirky, rationale.


Machine room for book transportation at the Library of Congress, early 20th
century.

Like romantic love, bibliophilia is perhaps shaped by one’s first love. The
first library that I knew intimately was a little six-by-eight-foot shop
hidden in a by-lane off Commercial Street, one of the busiest roads in
Bangalore. From its name to what it contained, Mecca Stores could well have
been transported out of an Arabian Nights tale. One side of the store was
lined with plasticware and kitchen utensils of every shape and size, while the
other wall was piled with books, comics, and magazines. From my
eight-year-old perspective it seemed large enough to contain all the
knowledge of the world. I earned a weekly stipend packing noodles for an hour
every day after school in the home shop my parents ran, and I used it to
borrow or buy second-hand books from the store. I was usually done with them
by Sunday and would have reread them by Wednesday. The real anguish came in
waiting from Wednesday to Friday for the next set. After finally acquiring a
small collection of books and comics of my own, I decided, spurred on by a
fatal combination of entrepreneurial enthusiasm and a pedantic desire to
educate others, to start a small library myself. Packing my books into a small
aluminum case and armed with a makeshift ledger, I went from house to house
convincing children in the neighborhood to part with twenty-five paisa in
exchange for a book or comic, with the additional caveat that they were not to
share it with any of their friends. The enterprise got off to a reasonable
start, but it soon met its end when I realized that, despite my instructions,
my friends were generously sharing the comics once they were done with them.
So ended my biblioempire ambitions.

Over the past few years the explosion of ebook readers and the consequent rise
in the availability of pirated books have opened new worlds to my booklust.
[Library.nu](library.nu), which began as gigapedia, suddenly made the idea of
the universal library seem like reality. By the time it shut down in February
2012, the library had close to a million books and over half a million active
users. Bibliophiles across the world were distraught when the site was shut
down; if it were ever possible to experience what the burning of the library
of Alexandria must have felt like, it was in that collective ache at the
closure of [library.nu](library.nu).

What brings together something as monumental as the New York Public Library, a
collective enterprise like [library.nu](library.nu), and Mecca Stores, if not
the word library? As physical spaces they may have little in common, but as
virtual spaces they speak as equals, even if the scale of their imagination
differs. All of them partake of the world of logotopias. In an exhibition
designed to celebrate the place of the library in art, architecture, and the
imagination, the curator Sascha Hastings coined the term logotopia to
designate “word places,” a happy coincidence of architecture and language.

There is, however, a risk of flattening the differences between these spaces
by classifying them all under a single utopian ideal of the library.
Imagination, after all, has a geography and a physiology, and requires our
alertness to these distinctions. Let’s think instead of an entire pantheon (of
spaces as well as practices) that we can designate as shadow libraries (or
shadow logotopias, if you like), which exist in the shadows cast by the long
history of monumental libraries. While they are often dwarfed by the idea of
the library, sometimes, like the shadows cast by our bodies, these shadows
surge ahead of the body.


The London Library after the Blitz, c. 1940.

At the heart of all libraries lies a myth: that of the burning of the library
of Alexandria. No one knows what the library of Alexandria looked like, and no
one possesses an accurate list of its contents. What we have long known,
though, is a sense of loss. But a loss of what? Of all the forms of knowledge
in the world at a particular time, because that was precisely what the library
of Alexandria sought to collect under its roof. It is believed that in order
to succeed in assembling a universal library, King Ptolemy I wrote “to all the
sovereigns and governors on earth” begging them to send him every kind of
book by every kind of author, “poets and prose-writers, rhetoricians and
sophists, doctors and soothsayers, historians, and all others too.” The king’s
scholars had calculated that five hundred thousand scrolls would be required
if they were to collect in Alexandria “all the books of all the peoples of the
world.”[1]

What was special about the Library of Alexandria was that, until then, the
libraries of the ancient world had been either the private collections of
individuals or government storehouses where legal and literary documents were
kept for official reference. By imagining a space where the public could have
access to all the knowledge of the world, the library also expressed a new
idea of the human itself. While the library of Alexandria is rightly
celebrated, what is often forgotten in the mourning of its demise is another
library, one that existed in the shadow of the grand library but whose
location ensured that it survived Caesar’s papyrus-destroying flames.

According to the Sicilian historian Diodorus Siculus, writing in the first
century BC, Alexandria boasted a second library, the so-called daughter
library, intended for the use of scholars not affiliated with the Museion. It
was situated in the south-western neighborhood of Alexandria, close to the
temple of Serapis, and was stocked with duplicate copies of the Museion
library’s holdings. This shadow library survived the fire that destroyed the
primary library of Alexandria but has since been eclipsed by the latter’s
myth.

Alberto Manguel says that if the library of Alexandria stood tall as an
expression of universal ambitions, there is another structure that haunts our
imagination: the Tower of Babel. If the library attempted to conquer time, the
tower sought to vanquish space. He writes: “The Tower of Babel in space and the
Library of Alexandria in time are the twin symbols of these ambitions. In
their shadow, my small library is a reminder of both impossible yearnings—the
desire to contain all the tongues of Babel and the longing to possess all the
volumes of Alexandria.”[2] Writing about the two failed projects, Manguel adds
that, seen within the limiting frame of the real, the one exists only as a
nebulous reality and the other as an unsuccessful, if ambitious, real estate
enterprise. But seen as myths, and in the imagination at night, the solidity
of both buildings is for him unimpeachable.[3]

The utopian ideal of the universal library was more than a question of built
form or space, or even of the possibility of storing all of the knowledge of
the world; its real aspiration lay in the illusion of order that it could
impose on a chaotic world where the lines drawn by a fine hairbrush
distinguished the world of animals from men, fairies from ghosts, science from
magic, and Europe from Japan. In some cases, even after the physical structure
that housed the books had crumbled and the books had been reduced to dust, the
ideal remained in the form of the order imagined for the library. One such
piece of residual evidence comes to us by way of the _Pandectae_, a
comprehensive bibliography created by Conrad Gesner in 1545 when he feared
that the Ottoman conquerors would destroy all the books in Europe. He created
a bibliography from which the library could be built again: an all-embracing
index with a systematic organization of twenty principal groups and a
matrix-like structure that contained 30,000 concepts.[4]

It is not surprising that Alberto Manguel would attempt to write a literary,
historical, and personal history of the library. As a seventeen-year-old in
Buenos Aires, Manguel read for the blind seer Jorge Luis Borges, author of the
appropriately named story “The Library of Babel,” who once imagined paradise
as a kind of library. Modifying his mentor’s statement in what can be
understood as a gesture to the inevitable demands of the real, and yet
acknowledging the possible pleasures of living in shadows, Manguel asserts
that sometimes paradise must adapt itself to suit circumstantial requirements.
Similarly, Jacques Rancière, writing about the libraries of the working class
in the nineteenth century, tells us about Gauny, a joiner and a boy in love
with vagrancy and botany, who decides to build a library for himself. For the
sons of poor proletarians living in the Saint-Marcel district, libraries were
built only a page at a time. He learnt to read by tracing the pages in which
his mother’s lentils were wrapped, and he would be disappointed whenever he
came to the end of a page and the next page was not available, even though he
urged his mother to buy her lentils from the same grocer.[5]


Dominique Gonzalez-Foerster, _Chronotopes & Dioramas_, 2009. Diorama
installation at The Hispanic Society of America, New York.

Is the utopian ideal of the universal library, as exemplified by the library
of Alexandria or by the modernist pedagogic institutions of the twentieth
century, adequate to the task of describing the space of the shadow library,
or do we need a different account of these other spaces? In an era of the
ebook reader, where the line between a book and a library is blurred, the very
idea of a library is up for grabs. It has taken me well over two decades to
build a collection of a few thousand books, while around two hundred thousand
books exist as bits and bytes on my computer. Admittedly, hard drives crash and
data is lost, but is that the same threat as rain or fire? Which, then, is my
library and which its shadow? Or, in the spirit of logotopias, would it be
more appropriate to ask the spatial question: where is the library?

If the possibility of having 200,000 books on one’s computer feels staggering,
here is an even more startling statistic. The Library of Congress, the largest
library in the world, holds approximately thirty million books, which, if
piled on the floor, would cover 364 kilometers; yet they could potentially fit
onto an SD card. It is estimated that by 2030 an ordinary SD card will be able
to store up to 64 TB; assuming each book were digitized at an average size of
1 MB, it would technically be possible to fit two Libraries of Congress in
one’s pocket.
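The arithmetic behind this claim can be checked in a few lines. This is a back-of-the-envelope sketch that simply takes the essay’s figures as given (a 64 TB card, 1 MB per book, roughly thirty million books) and assumes decimal units, where 1 TB is 1,000,000 MB; with binary units the result comes out slightly larger, and the conclusion does not change.

```python
# Back-of-the-envelope check of the "two Libraries of Congress" figure.
# Inputs are the essay's assumptions; decimal units are assumed (1 TB = 1,000,000 MB).
CARD_CAPACITY_MB = 64 * 1_000_000  # a projected 64 TB SD card
AVG_BOOK_SIZE_MB = 1               # assumed average size of one digitized book
LOC_BOOKS = 30_000_000             # approximate holdings of the Library of Congress

books_on_card = CARD_CAPACITY_MB // AVG_BOOK_SIZE_MB
libraries_per_card = books_on_card / LOC_BOOKS

print(f"{books_on_card:,} books, about {libraries_per_card:.2f} Libraries of Congress")
```

This reports 64,000,000 books, or about 2.13 Libraries of Congress, so under these assumptions “two” is, if anything, a slight understatement.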

It sounds like science fiction, but isn’t it the case that much of the science
fiction of a decade ago now sits comfortably within the weave of everyday
life? How do we make sense of the future of the library? While it may be
tempting to throw our hands up in boggled perplexity about what it means to be
able to have thirty million books, let’s face it: the point of libraries has
never been that you will finish what’s there. Anyone with even a modest book
collection will testify to the impossibility of ever finishing their library,
and if anything the library stands precisely at the cusp of our finitude and
our infinity. Perhaps that is what Borges, the consummate mixer of time and
space, meant when he described paradise as a library, not as a spatial idea
but as a temporal one: it is only within the confines of infinity that one
can imagine finishing one’s library. It would therefore be more interesting to
think of the shadow library as a way of thinking about what it means to dwell
in knowledge. While all our aspirations for a habitat should have a utopian
element to them, let’s face it, utopias have always been difficult spaces to
live in.

In contrast to the idea of utopia stands heterotopia, a term with its origins
in medicine (referring to an organ of the body that has been dislodged from
its usual place) and popularized by Michel Foucault both in relation to
language and as a spatial metaphor. If utopia exists as a nowhere, an
imaginary space with no connection to any existing social space, then
heterotopias, in contrast, are realities that exist and are even foundational,
but in which all other spaces are potentially inverted and contested. A
mirror, for instance, is a utopia (a placeless place) even as it exists in
reality. But from the standpoint of the mirror you discover your absence as
well. Foucault remarks, “The mirror functions as a heterotopia in this
respect: it makes this place that I occupy at the moment when I look at myself
in the glass at once absolutely real, connected with all the space that
surrounds it, and absolutely unreal, since in order to be perceived it has to
pass through this virtual point which is over there.”[6]

In _The Order of Things_ Foucault sought to investigate the conceptual space
that makes the order of knowledge possible. In his famed reading of Borges’s
Chinese encyclopedia, he argues that the impossibility of the encyclopedia
consists less in the fantastical status of its animals, or in the coexistence
of fabulous animals such as (e) sirens with real ones such as (d) sucking
pigs, than in where they coexist: what “transgresses the boundaries of all
imagination, of all possible thought, is simply that alphabetical series (a,
b, c, d) which links each of those categories to all the others.”[7]
Heterotopias destabilize the ground on which we build order and, in doing so,
reframe the very epistemic basis of how we know.

Foucault later developed a more spatial understanding of heterotopias, using
specific examples such as the cemetery (at once a familiar space, since
everyone has someone in the cemetery, and one at the heart of the city, but
also, over time, the other city, where each family possesses its dark resting
place).[8] Indeed, the paradox of heterotopias is that they are both separate
from and connected to all other spaces. This connectedness is precisely what
builds contestation into heterotopias. Imaginary spaces such as utopias exist
completely outside of order. Heterotopias, by virtue of their connectedness,
become sites in which epistemes collide and overlap. They bring together
heterogeneous collections of unusual things without allowing them a unity or
order established through resemblance. Instead, their ordering is derived from
a process of similitude that produces, in an almost magical, uncertain space,
monstrous combinations that unsettle the flow of discourse.

If the utopian ideal of the library was to bring together everything that we
know of the world, then the length of its bookshelves was coterminous with the
breadth of the world. But, like its predecessors in Alexandria and Babel, the
project is destined to be incomplete, haunted by what it necessarily leaves
out and misses. The library as heterotopia reveals itself only through the
interstices and lays bare the fiction of any possibility of a coherent ground
on which a knowledge project can be built. Finally, there is the question of
where we stand once the ground that we stand on has itself been dislodged.
From my first foray into the tiny six-by-eight-foot Mecca Stores to the
innumerable hours spent on [library.nu](library.nu), the answer remains the
same: the heterotopic pleasure of our finite selves in infinity.


This essay is part of the work I am doing for an exhibition curated by Raqs
Media Collective, Sarai Reader 09. The show began on August 19, 2012, with a
deceptively empty space containing only the proposal, with ideas for the
artworks to come over a period of nine months. See
.

**Lawrence Liang** is a researcher and writer based at the Alternative Law
Forum, Bangalore. His work lies at the intersection of law and cultural
politics, and in recent years has focused on questions of media piracy. He
is currently finishing a book on law and justice in Hindi cinema.

© 2012 e-flux and the author


Notes - Shadow Libraries

1. Esther Shipman and Sascha Hastings, eds., _Logotopia: The Library in
Architecture, Art and the Imagination_ (Cambridge Galleries / ABC Art Books
Canada, 2008).

2. Alberto Manguel, “My Library,” in Shipman and Hastings, eds., _Logotopia:
The Library in Architecture, Art and the Imagination_ (Cambridge Galleries /
ABC Art Books Canada, 2008).

3. Alberto Manguel, _The Library at Night_ (Yale University Press, 2009).

4. Shipman and Hastings, eds., _Logotopia_.

5. Jacques Rancière, _The Nights of Labour: The Workers’ Dream in
Nineteenth-Century France_ (Philadelphia: Temple University Press, 1991).

6. Michel Foucault, “Different Spaces,” in _Aesthetics, Method, Epistemology_,
ed. James D. Faubion (New York: The New Press, 1998), 179. For Foucault on
language and heterotopias, see _The Order of Things: An Archaeology of the
Human Sciences_ (New York: Pantheon, 1970).

7. Foucault, _The Order of Things_, xv.

8. Foucault, “Different Spaces,” originally presented as a lecture to the
_Architecture Studies Circle_ in 1967, shortly after the publication of _The
Order of Things_.



Murtaugh
A bag but is language nothing of words
2016


## A bag but is language nothing of words

### From Mondotheque


(language is nothing but a bag of words)

[Michael Murtaugh](/wiki/index.php?title=Michael_Murtaugh "Michael Murtaugh")

In text indexing and other machine-reading applications, the term "bag of
words" is frequently used to underscore how processing algorithms often
represent text using a data structure (word histograms or weighted vectors)
in which the original order of the words in sentence form is stripped away.
While "bag of words" might well serve as a cautionary reminder to programmers
of the essential violence perpetrated on a text, and as a call to critically
question the efficacy of methods based on subsequent transformations, the
expression's use seems in practice more like a badge of pride or a schoolyard
taunt that would go: Hey language: you're nothin' but a big BAG-OF-WORDS.

## Bag of words

In information retrieval and other so-called _machine-reading_ applications
(such as text indexing for web search engines), the term "bag of words" is
used to underscore how, in the course of processing a text, the original order
of the words in sentence form is stripped away. The resulting representation
is then a collection of each unique word used in the text, typically weighted
by the number of times the word occurs.
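The transformation described above can be sketched in a few lines of standard-library Python. The tokenizer used here (lowercasing, then keeping runs of letters and apostrophes) is an assumption chosen for brevity; real indexing pipelines make many different choices about punctuation, case, and stemming.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return a word histogram for `text`.

    The text is lowercased and split into runs of letters and apostrophes;
    each unique word is then counted. Word order is not recorded anywhere.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


bag = bag_of_words("To be, or not to be, that is the question")
print(bag)
```

The resulting `Counter` maps each unique word to its count ("to" and "be" twice, every other word once); nothing in it records which word came first.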

Bags of words, also known as word histograms or weighted term vectors, are a
standard part of the data engineer's toolkit. But why such a drastic
transformation? The utility of "bag of words" lies in how it makes text
amenable to code: first, it is very straightforward to implement the
translation from a text document to a bag of words representation. More
significantly, this transformation opens up a wide collection of tools and
techniques for further transformation and analysis. For instance, a number of
libraries available in the booming field of "data sciences" work with "high
dimension" vectors; bag of words is a way to transform a written document into
a mathematical vector where each "dimension" corresponds to the (relative)
quantity of each unique word. While physically unimaginable and abstract
(imagine each of Shakespeare's works as a point in a 14-million-dimensional
space), from a formal mathematical perspective it's quite a comfortable idea,
and many complementary techniques (such as principal component analysis) exist
to reduce the resulting complexity.
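To make the "dimension" idea concrete, the following sketch (standard-library Python, with the same assumed tokenizer as a simple bag of words) builds a shared vocabulary across a small collection of documents and turns each document into a vector of relative word frequencies, one dimension per vocabulary word. Cosine similarity is used only as one common example of the analysis such vectors enable; the example documents are invented, and dimensionality reduction such as principal component analysis is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def to_vectors(texts):
    """Turn each text into a vector over the shared vocabulary.

    Each dimension corresponds to one unique word in the whole collection
    and holds that word's relative frequency in the given text.
    """
    bags = [bag_of_words(t) for t in texts]
    vocabulary = sorted(set().union(*bags))
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1  # avoid dividing by zero for empty texts
        vectors.append([bag[word] / total for word in vocabulary])
    return vocabulary, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "the king is dead long live the king",
    "the queen is dead",
    "a bag of words",
]
vocab, vecs = to_vectors(docs)
print(len(vocab), "dimensions")
print(round(cosine_similarity(vecs[0], vecs[1]), 3))
print(cosine_similarity(vecs[0], vecs[2]))
```

Here three short documents already occupy an 11-dimensional space; with a real corpus the vocabulary, and so the number of dimensions, grows into the thousands or millions, which is where dimensionality-reduction techniques come in.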

What's striking about a bag of words representation, given its centrality in
so many text retrieval applications, is its irreversibility. Given a bag of
words representation of a text, the task of producing the original text would
in essence require the "brain" of a writer to recompose sentences, working
with the patience of a devoted cryptogram puzzler to draw from the precise
stock of available words. While "bag of words" might well serve as a
cautionary reminder to programmers of the essential violence perpetrated on a
text, and as a call to critically question the efficacy of methods based on
subsequent transformations, the expression's use seems in practice more like a
badge of pride or a schoolyard taunt that would go: Hey language: you're
nothing but a big BAG-OF-WORDS. In this spirit, "bag of words" celebrates the
perfunctory step of "breaking" a text into a purer form amenable to
computation, stripping language of its silly redundant repetitions and
foolishly contrived stylistic phrasings to reveal a purer inner essence.
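The irreversibility can be demonstrated directly. In this sketch (the whitespace tokenizer is an assumption made to keep the example short), two sentences with opposite meanings produce exactly the same bag of words, and even a five-word bag admits sixty distinct orderings, only a few of which a writer would accept as sentences.

```python
from collections import Counter
from itertools import permutations


def bag_of_words(text):
    # Minimal assumed tokenizer: lowercase, then split on whitespace.
    return Counter(text.lower().split())


a = "The dog bit the man"
b = "The man bit the dog"
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)  # True: the bag cannot tell the two sentences apart

# Every distinct ordering of the bag's words is a candidate "original".
words = list(bag_of_words(a).elements())
candidates = {" ".join(p) for p in permutations(words)}
print(len(candidates))  # 60 = 5! / 2!, since "the" occurs twice
```

Choosing the right candidate out of the sixty is exactly the cryptogram puzzler's work: the bag itself carries no information that could decide between them.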

## Book of words

Lieber's Standard Telegraphic Code, first published in 1896 and republished in
various updated editions through the early 1900s, is one of several competing
systems of telegraph code books. The idea was for both senders and receivers
of telegraph messages to use the books to translate their messages into a
sequence of code words, which could then be sent more cheaply, since telegraph
messages were charged by the word. At the front of the book, a list of
examples shows how messages like "Have bought for your account 400 bales of
cotton, March delivery, at 8.34" can be conveyed by a telegram with the
message "Ciotola, Delaboravi". In each case the reduction in the number of
transmitted words is highlighted to underscore the efficacy of the method.
Like a dictionary or thesaurus, the book is primarily organized around key
words, such as _act_, _advice_, _affairs_, _bags_, _bail_, and _bales_, under
which exhaustive lists of useful phrases involving the corresponding word are
provided in the main pages of the volume. [1]
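The principle behind such a code book can be sketched as a lookup table from phrases to code words. In the minimal Python sketch below, the phrases and code words are hypothetical: we invented them, and they are not taken from Lieber's. Matching the longest phrase first is also our own assumption about how a clerk might apply the book. For comparison, Lieber's own example compresses a message of this kind into just two code words, "Ciotola, Delaboravi":

```python
# Hypothetical code book: the phrases and code words below are invented for
# illustration; they are NOT entries from Lieber's code.
CODE_BOOK = {
    ("have", "bought", "for", "your", "account"): "ABACA",
    ("bales", "of", "cotton"): "BALOX",
    ("march", "delivery"): "MARDEL",
}
LONGEST = max(len(phrase) for phrase in CODE_BOOK)


def encode(message: str) -> list[str]:
    """Replace known phrases with code words, preferring the longest match.

    Words not covered by any phrase (numbers, prices) are sent as-is.
    """
    words = message.lower().replace(",", "").split()
    encoded, i = [], 0
    while i < len(words):
        for n in range(min(LONGEST, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in CODE_BOOK:
                encoded.append(CODE_BOOK[phrase])
                i += n
                break
        else:
            # No phrase starts here: transmit the plain word.
            encoded.append(words[i])
            i += 1
    return encoded


message = "Have bought for your account 400 bales of cotton, March delivery, at 8.34"
code = encode(message)
# Telegrams were charged per word, so fewer words means a cheaper message.
print(len(message.split()), "->", len(code), ":", " ".join(code))
```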

[![Liebers
P1016847.JPG](/wiki/images/4/41/Liebers_P1016847.JPG)](/wiki/index.php?title=File:Liebers_P1016847.JPG)

[![Liebers
P1016859.JPG](/wiki/images/3/35/Liebers_P1016859.JPG)](/wiki/index.php?title=File:Liebers_P1016859.JPG)

[![Liebers
P1016861.JPG](/wiki/images/3/34/Liebers_P1016861.JPG)](/wiki/index.php?title=File:Liebers_P1016861.JPG)

[![Liebers
P1016869.JPG](/wiki/images/f/fd/Liebers_P1016869.JPG)](/wiki/index.php?title=File:Liebers_P1016869.JPG)

> [...] my focus in this chapter is on the inscription technology that grew
parasitically alongside the monopolistic pricing strategies of telegraph
companies: telegraph code books. Constructed under the bywords “economy,”
“secrecy,” and “simplicity,” telegraph code books matched phrases and words
with code letters or numbers. The idea was to use a single code word instead
of an entire phrase, thus saving money by serving as an information
compression technology. Generally economy won out over secrecy, but in
specialized cases, secrecy was also important.[2]

In her chapter devoted to telegraph code books, Katherine Hayles observes:

> The interaction between code and language shows a steady movement away from
a human-centric view of code toward a machine-centric view, thus anticipating
the development of full-fledged machine codes with the digital computer. [3]

[![Liebers
P1016851.JPG](/wiki/images/1/13/Liebers_P1016851.JPG)](/wiki/index.php?title=File:Liebers_P1016851.JPG)

Aspects of this transitional moment are apparent in a notice prominently
inserted in Lieber's code book:

> After July, 1904, all combinations of letters that do not exceed ten will
pass as one cipher word, provided that it is pronounceable, or that it is
taken from the following languages: English, French, German, Dutch, Spanish,
Portuguese or Latin -- International Telegraphic Conference, July 1903 [4]

The stipulation, conforming to the international conventions that regulated
telegraph communication at the time, that code words be pronounceable or
drawn from a list of European languages (many of Lieber's code words are
indeed arbitrary Dutch, German, and Spanish words) underscores this particular
moment of transition, as reference to the human body in the form of
"pronounceable" speech from representative languages begins to yield to the
inherent potential for arbitrariness in digital representation.
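The 1903 rule is mechanical enough to express as a check. In the Python sketch below, only the ten-letter limit comes from the notice. The vowel-and-consonant-run test for "pronounceable" is a hypothetical heuristic of our own. The optional `dictionary` parameter stands in for word lists of the seven permitted languages, which we do not supply:

```python
VOWELS = set("aeiouy")


def looks_pronounceable(word: str) -> bool:
    # Hypothetical heuristic, NOT the Conference's definition: the word must
    # contain a vowel and never run more than three consonants in a row.
    has_vowel, run = False, 0
    for ch in word.lower():
        if ch in VOWELS:
            has_vowel, run = True, 0
        else:
            run += 1
            if run > 3:
                return False
    return has_vowel


def passes_as_one_word(word: str, dictionary: frozenset = frozenset()) -> bool:
    """Would `word` pass as a single cipher word after July 1904?

    Only the ten-letter limit is taken from the notice. `dictionary` is a
    hypothetical stand-in for the seven permitted languages' vocabularies.
    """
    if not word.isalpha() or len(word) > 10:
        return False
    return looks_pronounceable(word) or word.lower() in dictionary


for candidate in ["Ciotola", "Delaboravi", "xqzrtplkmn", "Delaboravissimo"]:
    print(candidate, passes_as_one_word(candidate))
```

A program can check the length limit trivially, but "pronounceable" can only be approximated. Any heuristic like the one above is an arbitrary machine guess at a bodily criterion.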

What telegraph code books remind us of is the relation of language in general
to economy. Whether these are economies of memory, of attention, of costs paid
to a telecommunications company, or of computer processing time or storage
space, encoding language or knowledge in any form of writing is a form of
shorthand and always involves an interplay with what one expects to perform
or "get out" of the resulting encoding.

> Along with the invention of telegraphic codes comes a paradox that John
Guillory has noted: code can be used both to clarify and occlude. Among the
sedimented structures in the technological unconscious is the dream of a
universal language. Uniting the world in networks of communication that
flashed faster than ever before, telegraphy was particularly suited to the
idea that intercultural communication could become almost effortless. In this
utopian vision, the effects of continuous reciprocal causality expand to
global proportions capable of radically transforming the conditions of human
life. That these dreams were never realized seems, in retrospect, inevitable.
[5]

[![Liebers
P1016884.JPG](/wiki/images/9/9c/Liebers_P1016884.JPG)](/wiki/index.php?title=File:Liebers_P1016884.JPG)

[![Liebers
P1016852.JPG](/wiki/images/7/74/Liebers_P1016852.JPG)](/wiki/index.php?title=File:Liebers_P1016852.JPG)

[![Liebers
P1016880.JPG](/wiki/images/1/11/Liebers_P1016880.JPG)](/wiki/index.php?title=File:Liebers_P1016880.JPG)

Far from providing a universal system of encoding messages in the English
language, Lieber's code is quite clearly designed for the particular needs and
conditions of its use. In addition to the phrases ordered by keywords, the
book includes a number of tables of terms for specialized use. One table lists
a set of words used to describe all possible combinations of numeric grades of
coffee (Choliam = 3,4, Choliambos = 3,4,5, Choliba = 4,5, etc.); another table
lists pairs of code words to express the respective daily rise or fall of the
price of coffee at the port of Le Havre in increments of a quarter of a franc
per 50 kilos ("Chirriado = prices have advanced 1 1/4 francs"). From an
archaeological perspective, Lieber's code book reveals a cross section of
the needs and desires of early 20th-century business communication between the
United States and its trading partners.
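Decoding with such tables amounts to a series of dictionary lookups. The Python sketch below uses only the entries quoted above (the book's tables are far longer) and represents the quarter-franc price steps as exact fractions. The two-word telegram it decodes is a hypothetical one of our own making:

```python
from fractions import Fraction

# Only the entries quoted in the text; the book's tables contain many more.
COFFEE_GRADES = {"Choliam": (3, 4), "Choliambos": (3, 4, 5), "Choliba": (4, 5)}
# Daily price movement at Le Havre, in francs per 50 kilos.
PRICE_MOVES = {"Chirriado": Fraction(5, 4)}  # "prices have advanced 1 1/4 francs"


def decode(telegram: str) -> list[str]:
    """Expand each code word using whichever table contains it."""
    expanded = []
    for word in telegram.replace(",", " ").split():
        if word in COFFEE_GRADES:
            grades = ", ".join(str(g) for g in COFFEE_GRADES[word])
            expanded.append(f"coffee of grades {grades}")
        elif word in PRICE_MOVES:
            move = PRICE_MOVES[word]
            # Exact fractions keep quarter-franc steps free of rounding error.
            expanded.append(f"prices have advanced {move} francs ({move * 4} quarter-franc steps)")
        else:
            expanded.append(word)  # not in these tables: left as received
    return expanded


# A hypothetical telegram combining two of the quoted code words.
print(decode("Choliambos, Chirriado"))
```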

The advertisements lining Lieber's code book further situate its use and
that of commercial telegraphy. Among the many advertisements for banking and
law services, office equipment, and alcohol are several ads for gunpowder and
explosives, drilling equipment, and metallurgical services, all with specific
applications to mining. Building on telegraphy's formative role in
ship-to-shore and ship-to-ship communication for reasons of safety, commercial
telegraphy extended this network of communication to include those parties
coordinating the "raw materials" being mined, grown, or otherwise extracted
from overseas sources and shipped back for sale.

## "Raw data now!"

From [The Smart City - City of Knowledge](/wiki/index.php?title=The_Smart_City_-_City_of_Knowledge "The Smart City - City of Knowledge"):

As new modernist forms and use of materials propagated the abundance of
decorative elements, Otlet believed in the possibility of language as a model
of '[raw data](/wiki/index.php?title=Bag_of_words "Bag of words")', reducing
it to essential information and unambiguous facts, while removing all
inefficient assets of ambiguity or subjectivity.


> Tim Berners-Lee: [...] Make a beautiful website, but first give us the
unadulterated data, we want the data. We want unadulterated data. OK, we have
to ask for raw data now. And I'm going to ask you to practice that, OK? Can
you say "raw"?
>
> Audience: Raw.
>
> Tim Berners-Lee: Can you say "data"?
>
> Audience: Data.
>
> TBL: Can you say "now"?
>
> Audience: Now!
>
> TBL: Alright, "raw data now"!
>
> [...]
>
> So, we're at the stage now where we have to do this -- the people who think
it's a great idea. And all the people -- and I think there's a lot of people
at TED who do things because -- even though there's not an immediate return on
the investment because it will only really pay off when everybody else has
done it -- they'll do it because they're the sort of person who just does
things which would be good if everybody else did them. OK, so it's called
linked data. I want you to make it. I want you to demand it. [6]

## Un/Structured

As graduate students at Stanford, Sergey Brin and Lawrence (Larry) Page had an
early interest in producing "structured data" from the "unstructured" web. [7]

> The World Wide Web provides a vast source of information of almost all
types, ranging from DNA databases to resumes to lists of favorite restaurants.
However, this information is often scattered among many web servers and hosts,
using many different formats. If these chunks of information could be
extracted from the World Wide Web and integrated into a structured form, they
would form an unprecedented source of information. It would include the
largest international directory of people, the largest and most diverse
databases of products, the greatest bibliography of academic works, and many
other useful resources. [...]
>
> **2.1 The Problem**
>
> Here we define our problem more formally:
>
> Let D be a large database of unstructured information such as the World
Wide Web [...] [8]

In a paper titled _Dynamic Data Mining_, Brin and Page situate their research
as a search for _rules_ (statistical correlations) between words used in web
pages. The "baskets" they refer to derive from "market basket" techniques,
developed to find correlations between the items recorded in the purchase
receipts of supermarket customers. In their case, they deal with web pages
rather than shopping baskets, and words instead of purchases. In
transitioning to the much larger scale of the web, they describe the
usefulness of their research in terms of its computational economy, that is,
the ability to tackle the scale of the web and still perform using
contemporary computing power, completing its task in a reasonably short amount
of time.

> A traditional algorithm could not compute the large itemsets in the lifetime
of the universe. [...] Yet many data sets are difficult to mine because they
have many frequently occurring items, complex relationships between the items,
and a large number of items per basket. In this paper we experiment with word
usage in documents on the World Wide Web (see Section 4.2 for details about
this data set). This data set is fundamentally different from a supermarket
data set. Each document has roughly 150 distinct words on average, as compared
to roughly 10 items for cash register transactions. We restrict ourselves to a
subset of about 24 million documents from the web. This set of documents
contains over 14 million distinct words, with tens of thousands of them
occurring above a reasonable support threshold. Very many sets of these words
are highly correlated and occur often. [9]
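Market-basket analysis starts by counting how often items occur together and keeping only the combinations that clear a support threshold. The Python sketch below shows that counting step for word pairs, along with the usual confidence measure for a rule. It is a naive exhaustive count, not the sampling approach named in the paper's subtitle, and the toy "pages" are invented. The cost of the naive approach grows quadratically with basket size. With roughly 150 distinct words per document instead of 10 items per receipt, each basket holds about 11,175 word pairs rather than 45:

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count every pair of distinct items that co-occur in a basket and keep
    the pairs appearing in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`,
    i.e. the strength of the rule antecedent -> consequent."""
    with_a = [set(b) for b in baskets if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)


# Web pages as baskets, words as items (toy data, invented for illustration).
pages = [
    "raw data now".split(),
    "linked data now".split(),
    "raw data linked".split(),
    "bag of words".split(),
]
print(frequent_pairs(pages, min_support=2))
print(confidence(pages, "raw", "data"))
```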

## Un/Ordered

In programming, I've encountered a recurring "problem" that's quite
symptomatic. It goes something like this: you (the programmer) have managed to
cobble together a lovely "content management system" (either from scratch, or
using any number of helpful frameworks) where your user can enter some "items"
into a database, for instance to store bookmarks. These items are then
automatically presented in list form (say on a web page). The author responds:
"It's great, except... could this bookmark come before that one?" The problem
stems from the fact that the database ordering (a core functionality provided
by any database) somehow applies a sorting logic that's almost, but not quite,
right. A typical example is the sorting of names, where details (where to
place a name that starts with a Norwegian "Ø", for instance) are
language-specific, and when a mixture of languages occurs, no single ordering
is necessarily "correct". The (often) exasperated programmer might hastily add
an additional database field so that each item can also have an "order"
(perhaps in the form of a date or some other kind of (alpha)numerical
"sorting" value) to be used to correctly order the resulting list. Now the
author has a means, awkward and indirect but workable, to control the order of
the presented data on the start page. But one might well ask, why not just
edit the resulting listing as a document? Not possible! Contemporary content
management systems are based on a data flow from a "pure" source of a
database, through controlling code and templates, to produce a document as a
result. The document isn't the data; it's the end result of an irreversible
process. This problem, in this and many variants, is widespread and reveals an
essential backwardness in a particular "computer scientist" mindset regarding
what constitutes "data", and in particular its relationship to order, which
turns what might be a straightforward question of editing a document into an
over-engineered database.
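Here is a minimal Python sketch of the situation described above. The `Bookmark` record and its `order` field are hypothetical. Python's plain code-point string comparison stands in for a database's default sort; real databases usually let you pick a collation, but with mixed languages no single choice is right. The non-ASCII names are written as escapes so that the comparison does not depend on how the source file encodes accented letters:

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    title: str
    order: int = 0  # hypothetical extra field, added only to control sorting


bookmarks = [
    Bookmark("Zeller"),
    Bookmark("\u00c5lesund"),   # "Ålesund"
    Bookmark("Olsen"),
    Bookmark("\u00d8verland"),  # "Øverland"
]

# 1. The default: compare titles code point by code point. Both accented
#    names land after Z, and Å (U+00C5) lands before Ø (U+00D8), the reverse
#    of Norwegian alphabetical order.
by_title = [b.title for b in sorted(bookmarks, key=lambda b: b.title)]

# 2. The workaround: the author edits one item's "order" value by hand, and
#    the listing now sorts on (order, title) instead.
bookmarks[3].order = -1  # move "Øverland" to the top
by_order = [b.title for b in sorted(bookmarks, key=lambda b: (b.order, b.title))]

print(by_title)
print(by_order)
```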

Recently, while I was working with Nikolaos Vogiatzis, whose research explores
playful and radically subjective alternatives to the list, he was struck by
how the earliest specifications of HTML (still valid today) already had
separate elements (OL and UL) for "ordered" and "unordered" lists.

> The representation of the list is not defined here, but a bulleted list for
unordered lists, and a sequence of numbered paragraphs for an ordered list
would be quite appropriate. Other possibilities for interactive display
include embedded scrollable browse panels. [10]

Vogiatzis' surprise lay in the idea of a list ever being considered
"unordered" (or, in opposition to the language used in the specification, of
order ever being considered "insignificant"). Indeed, in its suggested
representation, still followed by modern web browsers, the only visual
difference between the two is that UL items are preceded by a bullet symbol,
while OL items are numbered.
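A tiny Python sketch makes this concrete (the `render_list` helper is hypothetical). Given the same items in the same sequence, the two kinds of list produce markup that differs only in the tag name. Whether a browser shows bullets or numbers depends on that tag alone:

```python
from html import escape


def render_list(items: list[str], ordered: bool) -> str:
    """Emit an HTML list; only the tag name records whether order "matters"."""
    tag = "ol" if ordered else "ul"
    body = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<{tag}>{body}</{tag}>"


items = ["first", "second", "third"]
print(render_list(items, ordered=True))
print(render_list(items, ordered=False))
```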

The idea of ordering runs deep in programming practice, where essentially
different data structures are employed depending on whether order is to be
maintained. The keys of a "hash" table (a common implementation of an
associative array), for instance, are ordered in an unpredictable way governed
by the particular implementation. This data structure, extremely prevalent in
contemporary programming practice, sacrifices order to offer other kinds of
efficiency (fast retrieval of a value by its text key, for instance).
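To see why a hash table gives up order, consider this deliberately tiny Python sketch with separate chaining. Its hash function (the sum of character codes) is a hypothetical choice made for reproducibility, and real implementations use much better ones. Some implementations also add an order guarantee on top: Python's own `dict`, for example, has preserved insertion order since version 3.7:

```python
class ToyHashTable:
    """A tiny hash table with separate chaining (illustration only)."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: str) -> int:
        # Hypothetical hash: sum of character codes, modulo the bucket count.
        return sum(map(ord, key)) % len(self.buckets)

    def put(self, key: str, value) -> None:
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key: str):
        # Only one bucket is searched: this is the efficiency gained.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def keys(self) -> list[str]:
        # Iteration walks the buckets, so keys come out in hash order,
        # not in the order they were inserted.
        return [k for bucket in self.buckets for k, _ in bucket]


table = ToyHashTable()
for word in ["now", "raw", "data"]:
    table.put(word, len(word))
print(table.keys())
print(table.get("data"))
```

The keys go in as "now", "raw", "data" but come back as "raw", "data", "now", which is simply the order of the buckets they hash into.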

## Data mining

In announcing Google's impending data center in Mons, Belgian prime minister
Di Rupo invoked the link between the history of the mining industry in the
region and the present and future interest in "data mining" as practiced by IT
companies such as Google.

Whether speaking of bales of cotton, barrels of oil, or bags of words, what
links these subjects is the way in which the notion of "raw material" obscures
the labor and power structures employed to secure them. "Raw" is always
relative: "purity" depends on processes of "refinement" that typically carry
social/ecological impact.

Stripping language of order is an act of "disembodiment", detaching it from
the acts of writing and reading. The shift from (human) reading to machine
reading involves a shift of responsibility from the individual human body to
the obscured responsibilities and seemingly inevitable forces of the
"machine", be it the machine of a market or the machine of an algorithm.

From [X = Y](/wiki/index.php?title=X_%3D_Y "X = Y"):

Still, it is reassuring to know that the products hold traces of the work,
that even with the progressive removal of human signs in automated processes,
the workers' presence never disappears completely. This presence is proof of
the materiality of information production, and becomes a sign of the economies
and paradigms of efficiency and profitability that are involved.


The computer scientists' view of textual content as "unstructured", be it in a
webpage or the OCR-scanned pages of a book, reflects a neglect of the
processes and labor of writing, editing, design, layout, typesetting, and
eventually publishing, collecting, and cataloging [11].

To the computer scientist, "unstructured" means non-conformant to particular
forms of machine reading. "Structuring", then, is a social process by which
particular (additional) conventions are agreed upon and employed. Computer
scientists often view text through the eyes of their particular reading
algorithm, and in the process (voluntarily) blind themselves to the work
practices which have produced and maintain these "resources".

Berners-Lee, in urging his audience of web publishers not only to publish
online but to release "unadulterated" data, betrays a lack of imagination in
considering how language is itself structured, and a blindness to the need for
more than additional technical standards to connect to existing publishing
practices.

Last Revision: 2·08·2016

1. Benjamin Franklin Lieber, _Lieber's Standard Telegraphic Code_, New York, 1896.
2. Katherine Hayles, "Technogenesis in Action: Telegraph Code Books and the Place of the Human", _How We Think: Digital Media and Contemporary Technogenesis_, 2012.
3. Hayles.
4. Lieber.
5. Hayles.
6. Tim Berners-Lee, "The next web", TED Talk, February 2009.
7. "Research on the Web seems to be fashionable these days and I guess I'm no exception." From Brin's [Stanford webpage](http://infolab.stanford.edu/~sergey/).
8. Sergey Brin, "Extracting Patterns and Relations from the World Wide Web", Proceedings of the WebDB Workshop at EDBT, 1998.
9. Sergey Brin and Lawrence Page, "Dynamic Data Mining: Exploring Large Rule Spaces by Sampling", 1998, p. 2.
10. Tim Berners-Lee and Daniel Connolly, "Hypertext Markup Language (HTML)", Internet Draft, June 1993.
11.

Retrieved from
[https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480](https://www.mondotheque.be/wiki/index.php?title=A_bag_but_is_language_nothing_of_words&oldid=8480)

 
