aleph in Bodo 2015



physical spaces where the past seemed to define the present, I was wondering where I should look to find
the library of the future: down to my screen or up around me.
The library on my screen was Aleph, one of the biggest of the countless piratical text collections on the
internet. It offers more than a million scientific works and another million literary works, all free to
download for anyone on the net. I’ve spent months among its virtual stacks,
combing through the catalogue, talking to the librarians who maintain the collection, and watching the
library patrons as they used the collection. I kept going back to Aleph both as a user and as a researcher.
As a user, Aleph offered me books that the local libraries around me didn’t, in formats that were more
convenient than print. As a researcher, I was interested in the origins of Aleph, its modus operandi and its
future, and I was curious where the journey on which it has taken readers, authors, publishers
and libraries would end.
In this short essay I will introduce some of the findings of a two-year research project conduct


y has coincided with the digital revolution which, in itself, profoundly upset the
economics of cultural production and distribution (Landes & Posner, 2003). However, technology is not
the primary cause of the emergence of cultural black markets like Aleph. The proliferation of computers
and the internet has just revealed a more fundamental issue, which has to do with the uneven
distribution of access to knowledge around the globe.
Sometimes book pirates do more than just forecast and react to c


atalog, standardize the
technical aspects, add and correct missing or incorrect metadata, and offer the resulting catalogue,
computer code and the collection of files as an open infrastructure.

From Russia with love
It is by no means an accident that Aleph was born in Russia. In post-Soviet Russia, a unique constellation
of several different factors created the necessary conditions for the digital librarianship movement that
ultimately led to the development of Aleph. A rich literary legacy, the Soviet heritage, the pace with
which various copying technologies penetrated the market, the shortcomings of the legal environment and
the informal norms that stood in for the non-existent digital copyrights all contribut


(Stelmakh, 2001). This survivalist attitude and the skills that came with it proved handy
in the post-Soviet turmoil, and were directly transferable to the then-emerging digital technologies.

1

I have conducted extensive research on the origins of Aleph, on its catalogue and its users. The detailed findings are,
at the time of writing this contribution, being prepared for publication. The following section is a brief summary of
those findings and is based upon two forthcoming book chapters on Aleph in a report, edited by Joe Karaganis, on
the role of shadow libraries in the higher education systems of multiple countries.
2
Aleph is a pseudonym chosen to protect the identity of the shadow library in question.



ms of authors, during times
when the formal copyright framework and the enforcement environment were both unable and unwilling to
protect works of authorship (Elst, 2005; Sezneva, 2012).

Guerilla Open Access
Around the late 2000s, when Aleph started to merge the Kolkhoz collection with other free-floating text collections, two other notable events took place. It was in 2008 that Aaron Swartz penned
his Guerilla Open Access Manifesto (Swartz, 2008), in which he called for the liberation


omised to be more effective than either the
creative commons (Lessig, 2004) or the open access (Suber, 2013) movements that tried to address the
access to knowledge issues in a more copyright friendly manner. During interviews, the administrators of
Aleph used the very same arguments to justify the raison d'être of their piratical library. While it seems
that Aleph is the practical realization of Swartz’s ideas, it is hard to tell which served as an inspiration for
the other.
It was also around the same time that another piratical library, gigapedia/library.nu, started its
operation, focusing mostly on maki


library on the internet amassing several hundred thousand books, including
high-quality proofs ready to print and low resolution scans possibly prepared by a student or a lecturer.
During 2012 the mostly Russian-language, natural sciences-focused Aleph absorbed the English-language,
social sciences-rich gigapedia/library.nu, and with the subsequent shutdown of
gigapedia/library.nu Aleph became the center of the scientific shadow library ecosystem and community.

Aleph by numbers


Bodó B. (2015): Libraries in the post-scarcity era.
in: Porsdam (ed): Copyrighting Creativity: Creative values, Cultural Heritage Institutions and Systems of Intellectual Property, Ashgate

By adding pre-existing text collections to its catalogue, Aleph was able to grow at an astonishing rate.
Aleph has added, on average, 17,500 books to its collection each month since 2009, and as a result, by April
2014 it had more than 1.15 million documents. Nearly two thirds of the collection is in English, one fifth
of the documents are in Russian, while German


r as a new copy or a second hand one, only about one third of the titles were
available in e-book formats. The mean price of the titles still in print was 62 USD according to the data
gathered from Amazon.com.
The number of works accessed through Aleph is as impressive as its catalogue. In the three months
between March and June 2012, on average 24,000 documents were downloaded every day from one of
its half-a-dozen mirrors.4 This means that the number of documents downloaded daily from Aleph is
probably in the 50,000 to 100,000 range. The library users come from more than 150 different countries. The
biggest users in terms of volume were the Russian Federation, Indonesia, USA, India, Iran, Egypt, China,
Germany and the UK. Meanwhile, many of the highest per-capita users are Central and Eastern European
countries.
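
The extrapolation from one mirror's logs to the whole network can be sketched as a back-of-the-envelope calculation. The snippet below is only an illustration: the 24,000 daily downloads come from the text, while the share of total traffic handled by the logged mirror (between a quarter and a half) is an assumption chosen here to show how a range of roughly 50,000 to 100,000 could arise. The function and variable names are hypothetical.

```python
def estimate_total_daily_downloads(observed_per_day, mirror_share):
    """Scale one mirror's observed daily downloads up to the whole network.

    mirror_share is the (assumed) fraction of all downloads that the
    logged mirror handles; it is not reported in the text.
    """
    if not 0 < mirror_share <= 1:
        raise ValueError("mirror_share must be in (0, 1]")
    return observed_per_day / mirror_share


# From the text: average daily downloads logged at one mirror.
OBSERVED_PER_DAY = 24_000

# Assumed bounds on the logged mirror's share of total traffic.
low = estimate_total_daily_downloads(OBSERVED_PER_DAY, 0.5)
high = estimate_total_daily_downloads(OBSERVED_PER_DAY, 0.25)

print(f"{low:,.0f} to {high:,.0f} documents per day")
# prints "48,000 to 96,000 documents per day"
```

With half a dozen mirrors of unequal size, the true share of any single mirror is unknown, which is why the estimate is given as a range rather than a single figure.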

What Aleph is and what it is not
Aleph is an example of the library in the post-scarcity age. It is founded on the idea that books should no
longer be a scarce resource. Aleph set out to remove both sources of scarcity: the natural source of
3

Market availability data is only available for that 40% of books in the Aleph catalogue that had an ISBN number
on file. The titles without a valid ISBN number tend to be older, Russian language titles, in general with low
expected print and e-book availability.
4
Download data is based on the logs provided by one of the shado


l source of scarcity
created by copyright protection is overcome through infringement. The liberation from both constraints is
necessary to create a truly scarcity-free environment and to release the potential of the library in the post-scarcity age.
Aleph is also an ongoing demonstration of the fact that under the condition of non-scarcity, the library can
be a decentralized, distributed, commons-based institution created and maintained through peer
production (Benkler, 2006). The message of Aleph is clear: users, left to their own devices, can produce a
library by themselves for themselves. In fact, users are the library. And when everyone has the means to
digitize, collect, catalogue and share his/her own library, then the library suddenly is everywhere. Small
individual and institutional collections are aggregated into Aleph, which, in turn, is constantly fragmented
into smaller, local, individual collections as users download works from the collection. The library is
breathing (Battles, 2004) books in and out, but for the first time, this circulation of books is not a zero-sum
game, but a cumulative one: with every cycle the collection grows.
On the other hand Aleph may have lots of books on offer, but it is clear that it is neither universal in its
scope, nor does it fulfill all the critical functions of a library. Most importantly Aleph is disembedded
from the local contexts and communities that usually define the focus of the library. While it relies on the
availability of local digital collections for its growth, it has no means to play an active role in its own
development. The guardians of Aleph can prevent books from entering the collection, but they cannot
pay, ask or force anyone to provide a title if it is missing. Aleph is reliant on the weak copy-protection
technologies of official e-text repositories and the goodwill of individual document submitters when it
comes to the expansion of the collection. This means that the Aleph collection is both fragmented and
biased, and it lacks the necessary safeguards to ensure that it stays either current or relevant.
Aleph, with all its strengths and weaknesses carries an important lesson for the discussions on the future
of librarie



a faithful summary of all the discussions on the future of libraries in such a short contribution. There are,
however, a few threads, to which the story of Aleph may contribute.

Competition
It is very rare to find the two words: libraries and competition in the same sentence. No wonder: libraries
enjoyed a near perfect monopoly in their field of activity. Though there may have been many different
local initi


ts also carry the danger of a commercial
lock-in of the access to digital works, and render libraries dependent upon the services of commercial
providers who may or may not be the best defenders of public interest (OECD, 2012).
Shadow libraries like Aleph are called into existence by the vacuum that was left behind by the collapse
of libraries in the digital sphere and by the inability of the commercial arrangements to provide adequate
substitute services. Shadow libraries pool distributed resources and expertise over the internet, and
use the lack of legal or technological barriers to innovation in the informal sphere to fill the void left
behind by libraries.

What can Aleph teach us about the future of libraries?
The story of Aleph offers two, closely interrelated considerations for the debate on the future of libraries:
a legal and an organizational one. Aleph operates beyond the limits of legality, as almost all of its
activities are copyright infringing, including the unauthorized digitization of books, the unauthorized
mass downloads from e-text repositories, the unauthorized acts of uploading books to


es, the unauthorized act of users’ downloading
books from the archive. In the debates around copyright infringement, illegality is usually interpreted as a
necessary condition to access works for free. While this is undoubtedly true, the fact that Aleph provides
no-cost access to books seems to be less important than the fact that it provides access to them in the
first place.
Aleph is a clear indicator of the volume of the demand for current books in digital formats in developed
and in developing countries. The legal digital availability, or rather, unavailability of its catalogue also
demonstrates the limits of the current commercial and library-based arrangements that aim to provide
low-cost access to books over the internet. As mentioned earlier, Aleph’s catalogue consists mostly of recent books:
80% of the titles with a valid ISBN are still in print and available as new or used
print copies through commercial retailers. It is also clear that around 66% of these books are yet to be
made available in electronic format. While publishers in theory have a strong incentive to make their most
recent titles available as e-books, they lag behind in doing so.
This might explain why one third of all the e-book downloads in Aleph are from highly developed
Western countries, and two thirds of these downloads are of books without a Kindle version. Having access
to print copies either through libraries or through commercial retailers is simply not enough anymore.
Developing count


ownloads (17% compared to 8% in developed countries) are of titles that aren’t available in print
at all. Not having access to books in print seems to be a more pressing problem for developing countries
than not having access to electronic copies. Aleph thus fulfills at least two distinct types of demand: in
developed countries it provides access to missing electronic versions, in developing countries it provides
access to missing print copies.
The ability to fulfill an otherwise unfulfilled demand is not the only function of illegality. Copyright
infringement in the case of Aleph has a much more important role: it enables the peer production of the
library. Aleph is an open source library. This means that every resource it uses and every resource it
creates is freely accessible to anyone for use without any further restrictions. This includes the server
code, the database, the catalogue and the collection. The open source nature of Aleph rests on the
ideological claim that the scientific knowledge produced by humanity, mostly through public funds
should be open for anyone to access without any restrictions. Everything else in and around Aleph stems
from this claim, as they replicate the open access logic in all the other aspects of Aleph’s operation. Aleph
uses the peer-produced Open Library to fetch book metadata, it uses the BitTorrent and ed2k P2P networks
to store and make books accessible, it uses Linux and MySQL to run its code, and it allows its users to
upload books and edit book metadata. As a


sequence of its open source nature, anyone can contribute
to the project, and everyone can enjoy its benefits.
It is hard to quantify the impact of this piratical open access library on education, science and research in
various local contexts where Aleph is the prime source of otherwise inaccessible books. But it is
relatively easy to measure the consequences of openness at the level of Aleph, the library. The
collection of Aleph was created mostly by those individuals and communities who decided to digitize
books by themselves for their own use. While any single individual is only capable of digitizing a few
books at most, the small contributions quickly add up. To digitize the 1.15 million documents in
the Aleph collection would require an investment of several hundred million euros, and a substantial
subsequent investment in storage, collection management and access provision (Poole, 2010). Compared
to these figures, the costs associated with running Aleph are negligible, as it survives on the volunteer
labor of a few individuals and annual donations totaling a few thousand dollars. The hundreds
of thousands who use Aleph on a more or less regular basis have an immense amount of resources, and by
disregarding the copyright laws Aleph is able to tap into those resources and use them for the
development of the library. The value of these resources and of the peer produced library is the difference
between the actual costs associated with Aleph, and the investment that would be required to create
something remotely similar.



evant
scientific works is only possible at the moment through massive copyright infringement. It is debatable
whether the copyrighted corpus of scientific works should be completely open, and whether the blatant
disregard of copyrights through which Aleph achieved this openness is the right path towards a more
openly accessible body of scientific knowledge. It is also yet to be measured what effects shadow libraries
may have on the commercial intermediaries and on the health of scientific publishing and science in
general. But Aleph, in any case, is a case study in the potential benefits of open sourcing the library.

Conclusion
If we can take Aleph as an expression of what users around the globe want from a library, then the answer
is that there is a strong need for a universally accessible collection of current, relevant (scientific) books
in restrictions-free electronic formats. Can we expect


ture? Does such a service have a place in the future of libraries? It is as
hard to imagine the future library with such a service as without.
While the legal and financial obstacles to the creation of a scientific library with as universal reach as
Aleph may be difficult to overcome, other aspects of it may be more easily replicable. The way Aleph
operates demonstrates the amount of material and immaterial resources users are willing to contribute to
build a library that responds to their needs and expectations. If libraries plan to only ‘host’ user-governed
activities, it means that the library is still imagined to be a separate entity from its users. Aleph teaches us
that this separation can be overcome and users can constitute a library. But for that they need
opportunities to participate in the production of the library: they need the right to digitize books and copy
digital books to and from the lib


aleph in Bodo 2014


]The quality and accessibility of education to poors will drastically grow too. Frankly, I'm seeing this as
the only way to naturally improve mankind: by breeding people with all the information given to them at
any time.” – Anonymous admin of Aleph, explaining the raison d’être of the site

Abstract
RuNet, the Russian segment of the internet, is now the home of the most comprehensive scientific pirate
libraries on the net. These sites offer free access to hundreds of thousands of books and mi


st to involve
major university publishers in particular. Under the injunction, the Library.nu administrators closed the
site. The collection disappeared and the community around it dispersed (Liang, 2012).
Gigapedia’s collection was integrated into Aleph’s predominantly Russian language collection before the
shutdown, making Aleph the natural successor of Gigapedia/library.nu.

Libraries in the RuNet

Electronic copy available at: http://ssrn.com/abstract=2616631

Draft Manuscript, 11/4/2014, DO NOT CITE!
The search soon zeroed in on a number of sites with strong hints to their Russian origins. Sites like Aleph,
[sc], [fi], [os] are open, completely free to use, and each offers access to a catalog comparable to the late
Gigapedia’s.
The similarity of these seemingly distinct services is no coincidence. These sites constitute a tightly knit
network, in which Aleph occupies the central position. Aleph, as its name suggests, is the source library:
it aims to be the seed of all scientific digital libraries on the net. Its mission is simple and straightforward. It
collects free-floating scientific texts and other collections from the Internet and consolidat


them (both
content and metadata) into a single, open database. Though ordinary users can search the catalog and
retrieve the texts, its main focus is the distribution of the catalog and the collection to anyone who
wants to build services upon them. Aleph has regularly updated links that point to its own, neatly packed
source code, its database dump, and to the terabytes worth of collection. It is a knowledge infrastructure
that can be freely accessed, used and built upon by anyone. This radical openness enables a number of
other pirate libraries to offer Aleph’s catalogue along with books coming from other sources. By
mirroring Aleph they take over tasks that the administrators of Aleph are unprepared or unwilling to do.
Handling much of the actual download traffic they relieve Aleph from the unavoidable investment in
servers and bandwidth, which, in turn puts less pressure on Aleph to engage in commercial activities to
finance its operation. While Aleph stays in the background, the mirrors in the network compete for
attention, users and advertising revenue as their design, business model and technical sophistication are fine-tuned to the profile of their intended target audience.
This strategy of creating an open infrastructure serves Aleph well. It ensures the widespread distribution
of books while it minimizes (legal) exposure. By relinquishing control, Aleph also ensures its own long-term survival, as it is copied again and again. In fact, openness is the core element in the philosophy of
Aleph, which was summed up by one of its administrators as to:
“- collect valuable science/technology/math/medical/h


ed the
laborious task of organizing the texts into a usable, searchable format: first filtering duplicates and
organizing existing metadata in an Excel spreadsheet, and later moving to a more open, web-based database operating under the name Aleph.
Aleph inherited more than just books from Kolhoz and Moshkov’s lib.ru. It inherited their elitism with
regard to canonical texts, and their understanding of librarianship as a community effort. Like the earlier
sites, Aleph’s collections are complemented by a stream of user submissions. Like the other sites, the
number of submissions grew rapidly as the site’s visibility, reputation and trustworthiness were
established, and like the others it later fell, as more and


s is what is thought to be relevant by the community,
measured by the act of actively digitizing and sharing books. But it has created a very interesting strategy
to establish a library which is universal in terms of its reach. The administrators of Aleph understand that
Gigapedia’s downfall was due to its visibility and they wish to avoid that trap:
“Well, our policy, which I control as strictly as I can, is to avoid fame. Gigapedia's policy was to gain as
much fame as possible. Books should be a


providing access without jeopardizing their mission by open sourcing
the collection and thus allowing others to create widely publicized services that interface with the
public. They let others run the risk of getting famous.

Mirrors and communities
Aleph serves as a source archive for around a half-dozen freely accessible pirate libraries on the net. The
catalog database is downloadable, the content is downloadable, even the server code is downloadable.
No passwords are required to download and there


stacle to setting
up a similar library with a wider catalog, with improved user interface and better services, with a
different audience or, in fact, a different business model.
This arrangement creates a two-layered community. The core group of the Aleph admins maintains the
current service, while a loose and ever-changing network of ‘mirror sites’ builds on the Aleph
infrastructure.
“The unspoken agreement is that the mirrors support our ideas. Otherwise we simply do not interact with
them. If the mirrors do support this, they appear in the discussions, on the Web etc. in a positive context.
This is again about


expresses his
own views and if they conform with ours, we support them. If the ideology does not match, it breaks
down.”8

6
Anonymous source #1
7
Anonymous source #2
8
Anonymous source #1

The core Aleph team claims to exclusively control only two critical resources: the BBS that is the home of
the community, and the book-uploading interface. That claim is, however, not entirely accurate. For the
time being, the academic-minded e-book community indeed gathers on the BBS managed by Aleph, and
though there is little incentive to move on, technically nothing prevents alternatives from springing
up. As for the centralization of the book collection: many of the mirrors have their own upload pages
where one can contribute to a mirror’s collection, and it is not clear how or whether books that land at
one of the mirrors find their way back to the central database. Aleph also offers a desktop library
management tool, which enables dedicated librarians to see the latest Aleph database on their desktop
and integrate their local collections with the central database via this application. Nevertheless, it seems
that nothing really stands in the way of the fragmentation of the collection, apart from the willingness of
uploaders to contribute directly to Aleph rather than to one of its mirrors (or other sites).
Funding for Aleph comes from the administrators’ personal resources as well as occasional donations
when there is a need to buy or rent equipment or services:
“[W]e've been asking and getting support for this purpose for years. […] All our mirrors are supported


nations 3 or 4 times, for a specific purpose only and
with all the budget spoken for. And after getting the requested amount of money we shut down the
donations.”9
Mirrors, however, do not need to be non-commercial to enjoy the support of the core Aleph community,
they just have to provide free access. Ad-supported business models that do not charge for individual
access are still acceptable to the community, but there has been serious fallout with another site, which
used the Aleph stock to seed its own library, but decided to follow a “collaborative piracy” business
approach.
“To make it utmost clear: we collaborate with anyone who shares the ideology of free knowledge
distribution. No conditions. [But] we can't suddenly


really there or what is being done.
There are very few similarities in common between [e] and [ALEPH], and these similarities are too
superficial to serve as a common ground for communication. […]
They run an illegal business, making a profit.”11
Aleph administrators describe a set of values that differentiates possible site models. They prioritize the
curatorial mission and the provision of long term free access to the collection with all the costs such a
position implies, such as open sourcing th


raining from commercial activities, and as a result, operating on a reduced budget. [e] prioritizes the
expansion of its catalogue on demand but that implies a commercial operation, a larger budget and the
associated high legal risk. Sites carrying Aleph’s catalogue prioritize public visibility, carry ads to cover
costs but respond to takedown requests to avoid as much trouble as they can. From the perspective of
expanding access, these are not easy or straightforward tradeoffs. In Aleph’s case, the strong
commitment to the mission of providing free access comes with significant sacrifices, the most important
of which is relinquishing control over its most valuable asset: its collection of 1.2 million scientific books.
But they believe that these costs are justified by the promise that this way the fate of free access is not
tied to the fate of Aleph.
The fact that piratical file-sharing communities are willing to make substantial sacrifices (in terms of self-restraint) to ensure their long-term survival has been documented in a number of different cases (Bodó,
2013). Aleph is unique, however, in its radical open source approach. No other piratical community has
given up control over itself entirely. This approach is rooted in the way it regards the legal
status of its subject matter, i.e. scholarly publications in the first place. While norms of openness in the
field of scientific knowledge production were first formed in the Enlightenment period, Aleph’s
11

BBS comments posted on Jul 02, 2013, and Aug 25, 2013

copynorms are as much shaped by the specificities of the post-Soviet era as by the age-old realization that in
science we can see further if we


business purposes, which had marked the Russian IP regime throughout the decade (Sezneva &
Karaganis, 2011), slowly gave way to more uniform enforcement.

Closure of the Legal Regime
The legal, economic, and cultural conditions under which Aleph and its mirrors operate today are very
different from those of two decades earlier. The major legal loopholes are now closed, though Russian
authorities have shown little inclination to pursue Aleph so far:
I can't say whether it's the Russian copyright enforcement or the Western one that's most dangerous for
Aleph; I'd say that Russian enforcement is still likely to tolerate most of the things that Western
publishers won't allow. For example, l


o. But such efforts are slowly
increasing, as the market for digital texts grows and as publishers benefit from the enforcement
precedents set or won by the more aggressive rightsholder groups. The domain name of [os], one of the
sites mirroring the Aleph collection, was seized, apparently due to legal action taken by a US
rightsholder, and the site also started to respond to DMCA notices, removing links to books reported to be
infringing. Aleph responds to this with a number of tactical moves:
We want books to be available, but only for those who need them. We do not want [ALEPH] to be visible.
If one knows where to get books, there are here for him or her. In this way we stay relatively in


EPH] is spread mostly by face-to-face communication, where most of the unnecessary
people do not know about it. (Unnecessary are those who aim profit)14
The policy of invisibility is radically different from Moshkov’s policy of maximum visibility. Aleph hopes
that it can recede into the shadows, where it will be protected by the omertà of academics who subscribe to the
sharing ethos:
In Russian academia, [Aleph] is tacitly or actively supported. There are people that do not want to be
included, but it is hard


The protection the academic community has to offer may not be enough to fend off the publishers’
enforcement actions. Receding further into the darknets and hiding behind the veil of privacy
technologies is one option the Aleph site has: the first mirror on I2P, an anonymizing network designed
to hide the whereabouts and identity of web services, is already operational. But
[i]f people are physically served court invitations, they will have to close the site. The idea is, however,
that the entire collection is copied throughout the world many times over, the database is open, the code
for the site is open, so other people can continue.16

On methodology
We tried to reconstruct the story behind Aleph by conducting interviews and browsing through the BBS
of the community. Access to the site and community members was given under a strict condition of
anonymity. We thus removed any reference to the names and URLs of the services in question.
At one

 
