digitization in Bodo 2014


dreds of thousands of books and millions of
journal articles. In this contribution we try to understand the factors that led to the development of
these sites, and the sociocultural and legal conditions that enable them to operate under hostile legal
and political conditions. Through the reconstruction of the micro-histories of peer-produced online text
collections that played a central role in the history of RuNet, we are able to link the formal and informal
support for these sites to the specific conditions that developed in Soviet and post-Soviet times.

(pirate) libraries on the net
The digitization and collection of texts was one of the very first activities enabled by computers. Project
Gutenberg, the first in the line of digital libraries, was established as early as 1971. By the early nineties, a
number of online electronic text archives emerged, all hoping to finally realize the dream that was
chased by humans ever since the first library: the collection of everything (Battles, 2004), the Memex
(Bush, 1945), the Mundaneum (Rieusset-Lemarié, 1997), the Library of Babel (Borges, 1998). It did not
take long to realize that the dream was still beyond reach: the information storage and retri


the most important books,
novels that "everyone must read" and such stuff. People typed in poetry, smaller prose pieces. I have
myself read a sci-fi novel printed on a mainframe, which was obviously typed in. This novel was by
the Strugatski brothers. It was not prohibited or dissident, but just impossible to buy in the stores. These
were culturally important, cult novels, so people typed them in. […] At this point it became clear that
there was a lot of value in having a plaintext file with some novels, and the most popular novels were first
digitized in this way.”
The next stage of text digitization started around 1994. By that time growing numbers of people had
computers, scanning peripherals and OCR software. Russian internet and PC penetration, while extremely
low overall in the 1990s (0.1% of the population having internet access in 1994, growing to 8.3% by
2003), began to make inroads in educational and scientific institutions and among Moscow and
St. Petersburg elites, who were often the critical players in these networks. As access to technologies
increased, a much wider array of people began to digitize their favorite texts, and these collections began
to circulate, first via CD-ROMs, later via the internet.
One such collection belonged to Maxim Moshkov, who published his library under the name lib.ru in
1994. Moshkov was a graduate of the Moscow State University Department of Mechanics and
Mathematics, which played a large role in the digitization of scientific works. After graduation, he started
to work for the Scientific Research Institute of System Development, a computer science institute
associated with the Russian Academy of Sciences. He describes the early days of his collection as follows:
“I began to collect electronic texts in 1990, on a desktop computer. When I got on the Internet in 1994, I
found lots of sites with texts. It was like a dream came true: there they were, all the desired books. But
these collections were in a dreadful state! Incompatible formats, different encodings, missing content. I
had to spend hours sco


owly the library grew, and the audience increased with it. People started
to send books to me, because they were easier to read in my collection. And the time came when I
stopped surfing the internet for books: regular readers are now sending me the books. Day after day I get
about 100 emails, and 10-30 of them contain books. So many books were sent in, that I did not have time
to process them. Authors, translators and publishers also started to send texts. They all needed the
library.” (Мошков, 1999)

In the second half of the 1990s, the Russian Internet (RuNet) was awash in book digitization projects.
With the advent of scanners, OCR technology, and the Internet, the work of digitization eased
considerably. Texts migrated from print to digital and sometimes back to print again. They circulated
through different collections, which, in turn, merged, fell apart, and re-formed. Digital libraries with the
mission to collect and consolidate these free-floating texts sprang up by the dozens.
Such digital librarianship was the antithesis of official Soviet book culture: it was free, bottom-up,
democratic, and uncensored. It also offered a partial remedy to problems created by the post-Soviet
collapse of the economy: the impoverishment of libraries, readers, and publishers. In this context, book
digitization and collecting also offered a sense of political, economic and cultural agency, with parallels
to the copying and distribution of texts in Soviet times. The capacity to scale up these practices coincided
with the moment when anti-totalitarian social sentiments were the strongest, and economic needs the
direst.
The unprecedented bloom of digital librarianship was the result of the superimposition of multiple waves
of distinct transformations: technological, political, economic and social. “Maksim Moshkov's Library”
was ground zero for this convergence and soon became a central point of exchange for the community
engaged in text digitization and collection:
[At the outset] there were just a couple of people who started scanning books in large quantities. Literally
hundreds of books. Others started proofreading, etc. There was a huge hole in the market for books.
Science fiction, adventure, crime fiction, all of this was hugely in demand by the public. So lib.ru was to a
large part the response, and was filled by those books that people most desired and most valued.
For years, lib.ru integrated as much as it could of the different digital libraries flourishing in the RuNet. By
doing so, it preserved the collections of the many shor


ial support. The kolhoz group never had a web site with a database, like
most projects today. They had an ftp server with files, and the access to ftp was given by PM in a forum.
This ftp server was privately supported by one of the members (who was an academic researcher, like
most kolhoz members). The files were distributed directly by burning files on writable DVDs and giving the

4 DJVU is a file format that revolutionized online book distribution the way MP3 revolutionized online music
distribution. For books that contain graphs, images and mathematical formulae, scanning is the only digitization
option. However, the large number of resulting image files is difficult to handle. The DJVU file format allows the
images of scanned book pages to be stored at a very small file size, which makes it a well-suited medium for
the distribution of scanned e-books.

Draft Manuscript, 11/4/2014, DO NOT CITE!
DVDs away. Later, the ftp access was closed to the public, and only a temporary file-swapping ftp server
remained. Today the kolhoz DVD releases are mostly spread via torrents.” 5
Kolhoz amassed around fifty thousand documents, the mexmat collection of the Moscow State
Universi


he collection is represented not by the number of books but
by the amount of knowledge it contains. [ALEPH] does not need to grow more and I am not the only one
among us who thinks so. […]
We have absolutely no idea who sends books in. It is practically impossible to know, because there are a
million books. We gather huge collections which eliminate any traces of the original uploaders.
My expectation is that new arrivals will dry up. Not completely, as I described above, some books will
always be scanned or rescanned (it nowadays happens quite surprisingly often) and the overall process of
digitization cannot and should not be stopped. It is also hard to say when the slowdown will occur: I
expected it about a year ago, but then library.nu got shut down and things changed dramatically in many
respects. Now we are "in charge" (we had been the largest anyways, just now everyone thinks we are in
5 Anonymous source #1

charge) and there has been a temporary rise in the book inflow. At the moment, relatively small or
previously unseen collections are being integrated into [ALEPH]. Perhaps in a year it will saturate.
However, intuition is not a good g


had to adopt global norms, while the global norms struggled to adapt to the emergence of digital
copying.
The first post-Soviet decade produced new copyright laws that conformed with some of the international
norms advocated by Western rightsholders, but little legal clarity or enforceability (Sezneva & Karaganis,
2011). Under such conditions, informally negotiated copynorms stepped in to fill the void left by non-existent,
unreasonable, or unenforceable laws. The pirate libraries in the RuNet are regulated as much by such
norms as by the actual laws themselves.
During most of the 1990s user-driven digitization and archiving was legal, or to be more exact, wasn’t
illegal. The first Russian copyright law, enacted in 1993, did not cover “internet rights” until a 2006
amendment (Budylin & Osipova, 2007; Elst, 2005, p. 425). As a result, many, including the
Moscow prosecutor’s office, argued that the distribution of copyrighted works via the internet was not
copyright infringement. Authors and publishers who saw their works appear in digital form and
circulate via CD-ROMs and the internet had to rely on informal norms, still in development, to establish
control over their texts vis-à-vis en


am afraid to live in a world where no one reads
books. This is already the case in America, and it is speeding up with us. I don’t just want to derail this
process, I would like to turn it around.”

Moshkov played a crucial role in consolidating copynorms in the Russian digital publishing domain. His
reputation and place in the Russian literary domain are marked by a number of prizes¹² and the library’s
continued existence. This place was secured by a number of closely intertwined factors:

• Framing and anchoring the digitization and distribution practice in the library tradition.
• The non-profit status of the enterprise.
• Respecting the wishes of the rights holders even if he was not legally obliged to do so.
• Maintaining active communication with the different stakeholders in the community, including authors and readers.
• Responding to a clear gap in affordable, legal access.
• Conservatism with regard to the book, anchored in the argument that digital texts are not substitutes for printed matter.

Many other digital libraries tried to follow Moshkov’s formula, but the times were changing. Internet and
computer access le


digitization in Barok 2014


onsulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, i.e. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents. Phenomena
such as the [deepening] of specialization and thorough digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given that databases are queried
[using] SQL, their interfaces are mere extensions of it, and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen, run by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
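
To make this translation concrete, here is a minimal sketch in Python of how a search form might turn a dropdown, a checkbox and an input box into one conditioned _SELECT_ with a _JOIN_. Everything in it is an assumption made for illustration: the `documents` and `authors` tables, their columns, the filter names and the sample rows are hypothetical and do not describe any particular library's database.

```python
import sqlite3


def build_query(filters):
    """Translate form-control values into one parameterized SELECT with a JOIN.

    The table and column names are hypothetical, chosen only for illustration.
    """
    sql = ("SELECT documents.title, authors.name "
           "FROM documents JOIN authors ON documents.author_id = authors.id")
    conditions, params = [], []
    if filters.get("language"):          # value picked in a dropdown
        conditions.append("documents.language = ?")
        params.append(filters["language"])
    if filters.get("digitized_only"):    # state of a checkbox
        conditions.append("documents.digitized = 1")
    if filters.get("title_contains"):    # text typed into an input box
        conditions.append("documents.title LIKE ?")
        params.append("%" + filters["title_contains"] + "%")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY documents.title", params


def demo():
    """Run the generated query against a tiny in-memory database of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY, title TEXT, language TEXT,
            digitized INTEGER, author_id INTEGER REFERENCES authors(id));
        INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
        INSERT INTO documents VALUES
            (1, 'On Indexes', 'en', 1, 1),
            (2, 'On Archives', 'en', 0, 2),
            (3, 'Index und Archiv', 'de', 1, 2);
    """)
    sql, params = build_query(
        {"language": "en", "digitized_only": True, "title_contains": "Index"})
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows
```

Calling `demo()` returns the generated SQL string together with the matching rows. The researcher never sees that string; only the controls and the result list appear on screen.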

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to flesh and paper, has been
[entrusted] to the digital and networked. Researchers are querying the black
box.

C

Searching in a collection of [amassed/assembled] [tangible] documents (i.e.
bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are


in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.

From this it follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides a means for pointing elsewhere
in the document as well.

A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago, which is online
[10](http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.

Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.
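
A small sketch in Python may help to picture this. It assumes, purely for illustration, that the unit of digitization is the single character; the text above does not fix the unit, and the function names are hypothetical. The document becomes a one-dimensional sequence in which every unit has a numbered position, an index built on top of it points to addresses, and a jump to an address is followed by sequential reading.

```python
def digitize(text):
    """Break a document into units (assumed here: characters).

    The position of each unit in the returned list is its numbered address.
    """
    return list(text)


def index_words(units):
    """Build an index mapping each word to the addresses at which it starts."""
    index = {}
    word, start = [], None
    # A trailing space acts as a sentinel so that the last word is flushed.
    for pos, unit in enumerate(units + [" "]):
        if unit.isalnum():
            if start is None:
                start = pos
            word.append(unit.lower())
        elif word:
            index.setdefault("".join(word), []).append(start)
            word, start = [], None
    return index


def context(units, pos, width):
    """Jump to an address, then read the surrounding units one after another."""
    return "".join(units[max(0, pos - width): pos + width])
```

For example, with `units = digitize("The index points. The index jumps.")`, the expression `index_words(units)["index"]` gives the addresses `[4, 22]`, and `context(units, 22, 4)` reads the units around the second occurrence.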

## (K) The index

* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103p

 
