

various branches of the Openbare Bibliotheek in Amsterdam, the libraries of the University of
Amsterdam, with a computer in front of me on which another library was running: a library that is
entirely virtual, with no monumental buildings, no multi-million-euro budget, no miles of stacks and
no hundreds of staff, but which, despite lacking everything that apparently makes a library, has millions of
literary works and millions of scientific books, all digitized, all available at the click of a mouse to
everyone on earth, without any charge and without library or university membership. As I was sitting in these

Bodó B. (2015): Libraries in the post-scarcity era.
in: Porsdam (ed): Copyrighting Creativity: Creative values, Cultural Heritage Institutions and Systems of Intellectual Property, Ashgate

physical spaces, where the past seemed to define the present, I wondered where I should look to find
the library of the future: down at my screen or up around me.
The library on my screen was Aleph, one of the biggest of the countless piratical text collections on the
internet. It has more than a million scientific works and another million literary works to offer, all free to
download, without any charge or fee, for anyone on the net. I’ve spent months among its virtual stacks,
combing through the catalogue, talking to the librarians who maintain the collection, and watching the
library patrons as they used the collection. I kept going back to Aleph both as a user and as a researcher.
As a user, Aleph offered me books that the local libraries around me didn’t, in formats that were more
convenient than print. As a researcher, I was interested in the origins of Aleph, its modus operandi and its
future, and I was curious where the journey on which it has taken readers, authors, publishers
and libraries would end.
In this short essay I will introduce some of the findings of a two-year research project conducted on
Aleph. In the project I looked at several things. I reconstructed the pirate library’s genesis in order to
understand the forces that called it to life and shaped its development. I looked at its catalogue to
understand what it has to offer and how that piratical supply of books is related to the legal supply of
books through libraries and online distributors. I also acquired data on its usage, so I was able to
reconstruct some aspects of piratical demand. After a short introduction, in the first part of this essay I
will outline some of the main findings, and in the second part I will situate them in the wider context
of the future of libraries.

Book pirates and shadow librarians
Book piracy has a fascinating history, tightly


activity in the grey and black zones of legality marks the emergence of a
demand which legal suppliers are unwilling or unable to serve (Bodó, 2011a). That friction, more often
than not, leads to change. Earlier waves of book piracy foretold fundamental economic, political, societal
or technological shifts (Bodó, 2011b): changes in how the book publishing trade was organized (Judge,
1934; Pollard, 1916, 1920); the emergence of the new, bourgeois reading class (Patterson, 1968; Solly,
1885); the decline of pre-publication censorship (Rose, 1993); the advent of the Reformation and of the
Enlightenment (Darnton, 1982, 2003), or the rapid modernization of more than one nation (Khan &
Sokoloff, 2001; Khan, 2004; Yu, 2000).
The latest wave of piracy has coincided with the digital revolution, which in itself profoundly upset the
economics of cultural production and distribution (Landes & Posner, 2003). However, technology is not
the primary cause of the emergence of cultural black markets like Aleph. The proliferation of computers
and the internet has merely revealed a more fundamental issue, which has everything to do with the uneven
distribution of access to knowledge around the globe.
Sometimes book pirates do more than just forecast and react to changes that are independent of them.
Under certain conditions, they themselves can be powerful agents of change (Bodó, 2011b). Their agency
rests on their ability to challenge the status quo and resist cooptation or subjugation. In that respect, digital
pirates seem to be quite resilient (Giblin, 2011; Patry, 2009). They have the technological upper hand and
so far they have been able to outsmart every copyright enforcement effort (Bodó, forthcoming). As long as
file-sharing technologies cannot be completely eradicated, and as long as there is a substantial
difference between what is legally available and what is in demand, cultural black markets will be here to
compete with and outcompete the established and recognized cultural i


public piratical shadow libraries.

Aleph¹
Aleph² is a meta-library, and currently one of the biggest piratical text collections on the internet.
The project started around 2008 on a Russian bulletin board devoted to piracy, as an effort to integrate
various free-floating text collections that circulated online, on optical media, on various public and private
FTP servers and on hard drives. Its aim was to consolidate these separate text collections, many of which
were created in various Russian academic institutions, into a single, unified catalogue, to standardize the
technical aspects, to add missing metadata and correct incorrect metadata, and to offer the resulting catalogue,
computer code and collection of files as an open infrastructure.

From Russia with love
It is no accident that Aleph was born in Russia. In post-Soviet Russia a unique constellation
of several different factors created the necessary conditions for the digital librarianship movement that
ultimately led to the development of Aleph. A rich literary legacy, the Soviet heritage, the pace with
which various copying technologies penetrated the market, the shortcomings of the legal environment and
the informal norms that stood in for the non-existent digital copyrights all contributed to the emergence of
the biggest piratical library in the history of mankind.
Russia cherishes a rich literary tradition, which suffered and endured extreme economic hardships and
political censorship during the Soviet period (Ermolaev, 1997; Friedberg, Watanabe, & Nakamoto, 1984;
Stelmakh, 2001). The political transformation in the early 1990s liberated authors, publishers, librarians
and readers from much of the political oppression, but it did not solve the economic issues that stood in
the way of a healthy literary market. Disposable income was low, state subsidies were limited, and the dire
economic situation created uncertainty in the book market. The previous decades, however, had taught
authors and readers how to overcome political and economic obstacles to accessing books. During
Soviet times authors, editors and readers operated clandestine samizdat distribution networks, while
informal book black markets, operating in semi-private spheres, made uncensored but hard-to-come-by
books accessible (Stelmakh, 2001). This survivalist attitude and the skills that came with it came in handy
in the post-Soviet turmoil, and were directly transferable to the then emerging digital technologies.

¹ I have conducted extensive research on the origins of Aleph, on its catalogue and on its users. At the time of
writing this contribution, the detailed findings are being prepared for publication. The following section is a brief summary of
those findings and is based upon two forthcoming book chapters on Aleph in a report, edited by Joe Karaganis, on
the role of shadow libraries in the higher education systems of multiple countries.
² Aleph is a pseudonym chosen to protect the identity of the shadow library in question.

Russia is not the only country with a significant informal media economy of books, but in most other
places it was the photocopy machine that emerged to serve such book grey/black markets. In pre-1990
Russia and in other Eastern European countries access to this technology was limited, and when
photocopiers finally became available, computers were close behind them in terms of accessibility. The
result of the parallel introduction of the photocopier and the computer was that the photocopy technology
did not have time to lock in the informal market of texts. In many countries where the photocopy machine
preceded the computer by decades, copy shops still capture the bulk of the informal production and
distribution


the Russian internet in the late 1990s and early 2000s.
First, lib.ru provided the technological blueprint for any future digital library. But more importantly,
Moshkov’s way of handling the texts, and his way of responding to the claims, requests, questions and complaints
of authors and publishers, paved the way for the development of copynorms (Schultz, 2007) that continue
to define the Russian digital library scene to this day. Moshkov was instrumental in creating an
enabling environment for digital librarianship while respecting the claims of authors, at a time
when the formal copyright framework and the enforcement environment were both unable and unwilling to
protect works of authorship (Elst, 2005; Sezneva, 2012).

Guerilla Open Access
Around the late 2000s, when Aleph started to merge the Kolkhoz collection with other free-floating text collections, two other notable events took place. In 2008 Aaron Swartz penned
his Guerilla Open Access Manifesto (Swartz, 2008), in which he called for the liberation and sharing of
scientific knowledge. Swartz forcefully argued that scientific knowledge, the production of which is
mostly funded by the public and by the voluntary labor of academics, cannot be locked up behind
corporate paywalls set up by publishers. He framed the unauthorized copying and transfer of scientific
works from closed access text repositories to public archives as a moral act, and by doing so, he created
an ideological framework which was more radical and promised to be more effective than either the
Creative Commons (Lessig, 2004) or the open access (Suber, 2013) movements, which tried to address the
access to knowledge issues in a more copyright friendly manner. During interviews, the administrators of
Aleph used the very same arguments to justify the raison d'être of their piratical library. While it seems
that Aleph is the practical realization of Swartz’s ideas, it is hard to tell which served as an inspiration for
the other.
Around the same time another piratical library, gigapedia/library.nu, started its
operation, focusing mostly on making English-language scientific works freely available (Liang, 2012).
Until its legal troubles and subsequent shutdown in 2012, gigapedia/library.nu was the biggest English-language
piratical scientific library on the internet, amassing several hundred thousand books, ranging from
high-quality, print-ready proofs to low-resolution scans possibly prepared by a student or a lecturer.
In 2012 the mostly Russian-language, natural-sciences-focused Aleph absorbed the English-language,
social-sciences-rich gigapedia/library.nu, and with the subsequent shutdown of
gigapedia/library.nu Aleph became the center of the scientific shadow library ecosystem and community.

Aleph by numbers
By adding pre-existing text collections to its catalogue, Aleph was able to grow at an astonishing rate.
Since 2009 Aleph has added on average 17,500 books to its collection each month, and as a result, by April
2014 it had more than 1.15 million documents. Nearly two thirds of the collection is in English, one fifth
of the documents are in Russian, while German works make up the third largest group, with 8.5% of the
collection. Each of the other major European languages, such as French or Spanish, has fewer than 15,000 works
in the collection.
More than 50,000 publishers have works in the library, but most of the collection is published by
mainstream Western academic publishers. Springer published more than 12% of the works in the
collection, followed by Cambridge University Press, Wiley, Routledge and Oxford University Press,
each with more than 9,000 works in the collection.
Most of the collection is relatively recent: more than 70% of it was published in 1990 or
later. Despite the recentness of the collection, the electronic availability of its titles is
limited. While around 80% of the books that had an ISBN registered in the catalogue³ were
available in print, either as a new or a second-hand copy, only about one third of the titles were
available in e-book formats. According to data gathered from Amazon.com, the mean price of the titles
still in print was 62 USD.
The number of works accessed through Aleph is as impressive as its catalogue. In the three months
between March and June 2012, on average 24,000 documents were downloaded every day from just one of
its half-a-dozen mirrors.⁴ Extrapolating from this single mirror, the number of documents downloaded daily from Aleph is
probably in the 50,000 to 100,000 range. The library users come from more than 150 different countries. The
biggest users in terms of volume were the Russian Federation, Indonesia, the USA, India, Iran, Egypt, China,
Germany and the UK. Meanwhile, many of the highest per-capita users are Central and Eastern European
countries.

What Aleph is and what it is not
Aleph is an example of the library in the post-scarcity age. It is founded on the idea that books should no
longer be a scarce resource. Aleph set out to remove both sources of scarcity: the natural source of
³ Market availability data is only available for the 40% of books in the Aleph catalogue that had an ISBN
on file. The titles without a valid ISBN tend to be older, Russian-language titles, generally with low
expected print and e-book availability.
⁴ Download data is based on the logs provided by one of the shadow library services, which offers the books in
Aleph’s catalogue, as well as other works, free and without any restrictions or limitations.

scarcity in physical copies is overcome through distributed digitization; the artificial source of scarcity
created by copyright protection is overcome through infringement. Liberation from both constraints is
necessary to create a truly scarcity-free environment and to release the potential of the library in the post-scarcity age.
Aleph is also an ongoing demonstration that, under conditions of non-scarcity, the library can
be a decentralized, distributed, commons-based institution created and maintained through peer
production (Benkler, 2006). The message of Aleph is clear: users, left to their own devices, can produce a
library by themselves for themselves. In fact, users are the library. And when everyone has the means to
digitize, collect, catalogue and share his/her own library, the library is suddenly everywhere. Small
individual and institutional collections are aggregated into Aleph, which, in turn, is constantly fragmented
into smaller, local, individual collections as users download works from the collection. The library is
breathing (Battles, 2004) books in and out, but for the first time, this circulation of books is not a zero-sum
game but a cumulative one: with every cycle the collection grows.
On the other hand, Aleph may have lots of books on offer, but it is clearly neither universal in its
scope nor able to fulfill all the critical functions of a library. Most importantly, Aleph is disembedded
from the local contexts and communities that usually define the focus of the library. While it relies on the
availability of local digital collections for its growth, it has no means to play an active role in its own
development. The guardians of Aleph can prevent books from entering the collection, but they cannot
pay, ask or force anyone to provide a title if it is missing. Aleph is reliant on the weak copy-protection
technologies of official e-text repositories and the goodwill of individual document submitters when it
comes to the expansion of the collection. This means that the Aleph collection is both fragmented and
biased, and it lacks the necessary safeguards to ensure that it stays either current or relevant.
Aleph, with all its strengths and weaknesses, carries an important lesson for the discussions on the future
of libraries. In the next section I’ll try to situate these lessons in the wider context of the library in the
post-scarcity age.

The future of the library
There is hardly a week without a blog post, a conference, a workshop or an academic paper discussing the
future of libraries. While existing libraries are buzzing with activity, librarians are well aware that they
need to re-define themselves and their institutions, as the book collections around which libraries were
organized slowly go the way the catalogue has gone: into the digital realm. It would be impossible to give
a faithful summary of all the discussions on the future of libraries in such a short contribution. There are,
however, a few threads to which the story of Aleph may contribute.

Competition
It is very rare to find the words ‘libraries’ and ‘competition’ in the same sentence. No wonder: libraries
enjoyed a near-perfect monopoly in their field of activity. Though there may have been many different
local initiatives that provided free access to books, as an institution specialized in doing so, the library was
unmatched and unchallenged. This monopoly position has been lost in a remarkably short period of time
due to the internet and the rapid innovations in the legal e-book distribution markets. Textbooks can be
rented, e-books can be lent, and a number of new startups and major sellers offer flat-rate access to huge
collections. Expertise that helps navigate the domains of knowledge is abundant; there are multiple
authoritative sources of information and meta-information online. The search box of the library catalogue is
only one, and not even the most usable, of all the search boxes one can type a query into.⁵
Meanwhile there are plenty of physic


ption being orphan works, which are presumed to be still copyrighted but have no identifiable
rights owner. In the EU, Directive 2012/28/EU on certain permitted uses of orphan works in theory eases access
to such works, but in practice its impact is limited by the many constraints in its provisions. With no
orphan works legislation and the Google Book Settlement still in limbo, the US is even further from making
orphan works generally accessible to the public.

licenses to e-book catalogues of various sizes, these arrangements also carry the danger of a commercial
lock-in of the access to digital works, and render libraries dependent upon the services of commercial
providers who may or may not be the best defenders of the public interest (OECD, 2012).
Shadow libraries like Aleph are called into existence by the vacuum that was left behind by the collapse
of libraries in the digital sphere and by the inability of the commercial arrangements to provide adequate
substitute services. Shadow libraries pool distributed resources and expertise over the internet, and
exploit the absence of legal or technological barriers to innovation in the informal sphere to fill the void left
behind by libraries.

What can Aleph teach us about the future of libraries?
The story of Aleph offers two closely interrelated considerations for the debate on the future of libraries:
a legal and an organizational one. Aleph operates beyond the limits of legality, as almost all of its
activities infringe copyright, including the unauthorized digitization of books, the unauthorized
mass downloads from e-text repositories, the unauthorized uploading of books to the archive, the
unauthorized distribution of books and, in most countries, users’ unauthorized downloading of
books from the archive. In the debates around copyright infringement, illegality is usually interpreted as a
necessary condition for accessing works for free. While this is undoubtedly true, the fact that Aleph provides
no-cost access to books seems to be less important than the fact that it provides access to them in the
first place.
Aleph is a clear indicator of the volume of demand for current books in digital formats in developed
and in developing countries. The legal digital availability, or rather unavailability, of its catalogue also
demonstrates the limits of the current commercial and library-based arrangements that aim to provide low-cost
access to books over the internet. As mentioned earlier, Aleph’s catalogue consists mostly of recent books:
80% of the titles with a valid ISBN are still in print and available as a new or used
print copy through commercial retailers. It is also clear that around 66% of these books have yet to be
made available in electronic format. While publishers in theory have a strong incentive to make their most
recent titles available as e-books, they lag behind in doing so.
This might explain why one third of all the e-book downloads in Aleph come from highly developed
Western countries, and two thirds of these downloads are of books without a Kindle version. Having access
to print copies either through libraries or through commercial retailers is simply not enough anymore.
Developing countries are a slightly different case. There, compared to developed countries, twice as many
of the downloads (17% compared to 8% in developed countries) are of titles that aren’t available in print
at all. Not having access to books in print seems to be a more pressing problem for developing countries
than not having access to electronic copies. Aleph thus fulfills at least two distinct types of demand: in
developed countries it provides access to missing electronic versions, in developing countries it provides
access to missing print copies.
The ability to fulfill an otherwise unfulfilled demand is not the only function of illegality. Copyright
infringement in the case of Aleph has a much more important role: it enables the peer production of the
library. Aleph is an open source library. This means that every resource it uses and every resource it
creates is freely accessible to anyone for use without any further restrictions. This includes the server
code, the database, the catalogue and the collection. The open source nature of Aleph rests on the
ideological claim that the scientific knowledge produced by humanity, mostly through public funds,
should be open for anyone to access without any restrictions. Everything else in and around Aleph stems
from this claim, replicating the open access logic in every other aspect of Aleph’s operation. Aleph
uses the peer-produced Open Library to fetch book metadata, the BitTorrent and ed2k P2P networks
to store books and make them accessible, and Linux and MySQL to run its code; it also allows its users to
upload books and edit book metadata. As a consequence of its open source nature, anyone can contribute
to the project, and everyone can enjoy its benefits.
It is hard to quantify the impact of this piratical open access library on education, science and research in
various local contexts where Aleph is the prime source of otherwise inaccessible books. But it is
relatively easy to measure the consequences of openness at the level of Aleph, the library. The
collection of Aleph was created mostly by those individuals and communities who decided to digitize
books by themselves for their own use. While any single individual can digitize at most a few
books, the small contributions quickly add up. Digitizing the 1.15 million documents in
the Aleph collection would require an investment of several hundred million euros, and a substantial
subsequent investment in storage, collection management and access provision (Poole, 2010). Compared
to these figures, the costs of running Aleph are infinitesimal, as it survives on the volunteer
labor of a few individuals and on annual donations totaling a few thousand dollars. The hundreds
of thousands who use Aleph on a more or less regular basis have an immense amount of resources, and by
disregarding copyright law Aleph is able to tap into those resources and use them for the
development of the library. The value of these resources and of the peer-produced library is the difference
between the actual costs associated with Aleph and the investment that would be required to create
something remotely similar.

The decentralized, collaborative mass digitization and making available of current, and thus most relevant,
scientific works is at the moment only possible through massive copyright infringement. It is debatable
whether the copyrighted corpus of scientific works should be completely open, and whether the blatant
disregard of copyrights through which Aleph achieved this openness is the right path towards a more
openly accessible body of scientific knowledge. It also remains to be measured what effects shadow libraries
may have on commercial intermediaries and on the health of scientific publishing and science in
general. But Aleph, in any case, is a case study in the potential benefits of open sourcing the library.

Conclusion
If we can take Aleph as an expression of what users around the globe want from a library, then the answer
is that there is a strong need for a universally accessible collection of current, relevant (scientific) books
in restriction-free electronic formats. Can we expect any single library to provide anything even remotely
similar to that in the foreseeable future? Does such a service have a place in the future of libraries? It is as
hard to imagine the future library with such a service as without.
While the legal and financial obstacles to the creation of a scientific library with as universal a reach as
Aleph may be difficult to overcome, other aspects of it may be more easily replicable. The way Aleph
operates demonstrates the amount of material and immaterial resources users are willing to contribute to
build a library that responds to their needs and expectations. If libraries plan only to ‘host’ user-governed
activities, the library is still imagined as an entity separate from its users. Aleph teaches us
that this separation can be overcome and users can constitute a library. But for that they need
opportunities to participate in the production of the library: they need the right to digitize books and copy
digital books to and from the library, they need the opportunity to participate in the cataloging and
collection building process, they need the opportunity to curate and program the collection. In other
words users need the chance to be librarians in the library if they wish to do so, and so libraries need to be
able to provide access not just to the collection but to their core functions as well. The walls that separate
librarians from library patrons, private and public collections, insiders and outsiders can all prevent the
peer production of the library, and through that, prevent the future that is the closest to what library users
think of as ideal.