Adema
Scanners, collectors and aggregators. On the underground movement of (pirated) theory text sharing
2009


# Scanners, collectors and aggregators. On the ‘underground movement’ of (pirated) theory text sharing

_“But as I say, let’s play a game of science fiction and imagine for a moment:
what would it be like if it were possible to have an academic equivalent to
the peer-to-peer file sharing practices associated with Napster, eMule, and
BitTorrent, something dealing with written texts rather than music? What would
the consequences be for the way in which scholarly research is conceived,
communicated, acquired, exchanged, practiced, and understood?”_

Gary Hall – [Digitize this
book!](http://www.upress.umn.edu/Books/H/hall_digitize.html) (2008)

![ubuweb](https://openreflections.files.wordpress.com/2009/09/ubuweb.jpg?w=547)

UbuWeb was founded in 1996 by poet [Kenneth Goldsmith](http://en.wikipedia.org/wiki/Kenneth_Goldsmith "Kenneth Goldsmith") and has developed from ‘a repository for visual, concrete and (later) sound poetry’ to a site that ‘embraced all forms of the avant-garde and beyond. Its parameters continue to expand in all directions.’ As [Wikipedia](http://en.wikipedia.org/wiki/UbuWeb) states, Ubu is non-commercial and operates on a gift economy. All the same, by forming an amazing resource and repository for the avant-garde movement, and by offering and hosting these works on its platform, Ubu is violating copyright laws. As they state, however: ‘ _should something return to print, we will remove it from our site
‘ _should something return to print, we will remove it from our site
immediately. Also, should an artist find their material posted on UbuWeb
without permission and wants it removed, please let us know. However, most of
the time, we find artists are thrilled to find their work cared for and
displayed in a sympathetic context. As always, we welcome more work from
existing artists on site_.’

Where in the more affluent and popular media realms of blockbuster movies and pop music the [Piratebay](http://thepiratebay.org/) and other download sites (or p2p networks) like [Mininova](http://www.mininova.org/) are being sued and charged with copyright infringement, the powers that be seem to turn a blind eye when it comes to Ubu and the many other online resource sites that offer digital versions of hard-to-find materials, ranging from books to documentaries.

This has not always been the case, however: in 2002 [Sebastian Lütgert](http://www.wizards-of-os.org/archiv/wos_3/sprecher/l_p/sebastian_luetgert.html) from Berlin/New York was sued by the "Hamburger Stiftung zur Förderung von Wissenschaft und Kultur" for putting online two downloadable texts by Theodor W. Adorno on his website [textz.com](http://www.medienkunstnetz.de/artist/textz-com/biography/), an underground archive for literature. According to [this](http://de.indymedia.org/2004/03/76975.shtml) Indymedia interview with Lütgert, textz.com was referred to as ‘the Napster for books’, offering about 700 titles focusing on, as Lütgert states, _‘theory, novels, science fiction, the Situationists, cinema, the French, Douglas Adams, critical theory, net critique, etc.’_

The interview becomes even more interesting when Lütgert remarks that one can still easily download both Adorno texts without much ado if one wants to. This raises the bigger question of the real reasons underlying the charge: why was textz.com sued? As Lütgert says in the interview, referring to the still available Adorno texts: _“You can do that anyway. But there has long been a clear difference between open availability and the underground. The free distribution of content cannot be suppressed, but they seem to want to prevent it from happening too openly and too matter-of-factly. That is what bothers them.”_

_![I don't have any secrets](https://openreflections.files.wordpress.com/2009/09/i-dont-have-any-secrets.jpg?w=547)_

But how can something be truly underground in an online environment while still trying to spread or disseminate texts as widely as possible? This seems to be the paradox of many of today's - not quite legal and/or copyright-respecting - resource sharing and collecting communities and platforms. However, multiple scenarios are available to evade this dilemma: being frankly open about the ‘status’ of the content on offer, as Ubu is; using little ‘tricks’ like a simple website registration; classifying oneself as a reading group; or absolving oneself of responsibility by stating that one only aggregates (links to) sources from elsewhere and does not host the content on one's own website or blog. One can also state that the offered texts or multimedia files form a special issue or collection of resources, emphasizing their educational, not-for-profit value.

Most of the ‘underground’ text and content sharing communities seem to follow the concept of (the inevitability of) ‘[information wants to be free](https://openreflections.wordpress.com/tag/information-wants-to-be-free/)’, especially on the Internet. As Lütgert states: _“And above all, they have no idea about Walter Benjamin, who faced the same problem of the reproducibility of works of all kinds at the beginning of the last century and recognized: the masses have the right to reappropriate all of it. They have the right to copy, and the right to be copied. In any case, it is quite an uncomfortable situation that his estate is now administered by such a bureaucrat._ _A: Do you think it is legitimate at all to “own” intellectual content? Or to be its proprietor?_ _S: It is *impossible*. “Intellectual” anything keeps on spreading. Reemtsma's ancestors would never have come down from the trees or crawled out of the mud if “intellectual” anything had not spread.”_

![646px-Book_scanner_svg.jpg](https://openreflections.files.wordpress.com/2009/09/646px-book_scanner_svg-jpg1.png?w=547)

What seems increasingly obvious, as the interview also states, is that one can find virtually all the ebooks and texts one needs via p2p networks and other file sharing communities (the true [Darknet](http://en.wikipedia.org/wiki/Darknet_\(file_sharing\)) in a way) – more and more people are offering (and asking for!) selections of texts and books (including the ones by Adorno) on openly available websites and blogs, or they are scanning them and offering them for (educational) use on their domains. Although the Internet is mostly known for the dissemination of pirated movies and music, copyright protected textual content has (of course) always been spread too. But with the rise of ‘born digital’ text content, with the help of massive digitization efforts like Google Books (and accompanying Google Books [download tools](http://www.codeplex.com/GoogleBookDownloader)), and with the appearance of better (and cheaper) scanning equipment, the movement of ‘openly’ spreading (pirated) texts (whether or not focusing on education and ‘fair use’) seems to be growing fast.

The direct harm (to both the producers and their publishers) of the free online availability of in-copyright texts is perhaps also less clear than it is with, for instance, music and films. Many feel that texts and books will still preferably be read in print, making the free online availability of a text little more than a marketing tool for sales of the printed version: once discovered, those truly interested will find and buy the print book. Also, more than with music and film, sharing information is felt to be essential - as a cultural good and a right, to prevent censorship and to improve society.

![Piracy by Mikel Casal](https://openreflections.files.wordpress.com/2009/09/piracy-by-mikel-casal.jpg?w=432&h=312)

This is one of the reasons the [Open Access](http://en.wikipedia.org/wiki/Open_access_\(publishing\)) movement for scientific research was initiated. But where the number of people and institutions supporting this movement is gradually growing (especially where it concerns articles and journals in the sciences), the spread of Open Access (or even digital availability) of monographs in the Humanities and Social Sciences - which make up the majority of the resources on offer in the underground text sharing communities - has only just started.

This has led to a situation in which some have decided that change is not coming fast enough. Instead of waiting for this utopian Open Access future to gradually come about, they are actively spreading, copying, scanning and pirating scholarly texts/monographs online. Though these efforts are often accompanied by lengthy disclaimers about why they are violating copyright (to make the content more widely accessible, for one), many state they will take down the content if asked. Following the [copyleft](http://en.wikipedia.org/wiki/Copyleft) movement, what has thus arisen is in a way a more ‘progressive’ or radical branch of the Open Access movement. The people who spread these texts deem it inevitable that they will be online eventually; they are just speeding up the process. As Lütgert states: ‘_The desire of an ever larger section of the population for 100 percent of the information is irreversible. At worst it can be slowed down, but it cannot be stopped._’

![scribd-logo](https://openreflections.files.wordpress.com/2009/09/scribd-logo.jpg?w=547)

Still, we have not yet answered the question of why publishers (and their pirated authors) are not more upset about these kinds of websites and platforms. It is not simply a matter of them being unaware that this kind of textual dissemination is occurring. As mentioned before, the harm to producers (scholars) and their publishers (in the Humanities and Social Sciences mainly not-for-profit university presses) is less clear. First of all, their main customers are libraries (compare this to the software business model: free for the consumer, companies pay), which are still buying the legal content and mostly follow the policy of buying either print or both print and ebook, so there are no lost sales there for the publishers. Next to that, it is not certain that the piracy is harming sales. Unlike in literary publishing, the authors (academics) are already paid and do not lose money (very little, maybe, in royalties) from the online availability. Perhaps some publishers also see the Open Access movement as something that will inevitably grow, and thus don't feel the urge to step up or organize a collaborative effort against scholarly text piracy (most presses also lack the scale to initiate one). Whereas there has been more of an upsurge in worry about _[textbook piracy](http://bookseller-association.blogspot.com/2008/07/textbook-piracy.html)_ (since this is of course the area where individual consumers – students – do directly buy the material) and about websites like [Scribd](http://www.scribd.com/), this mostly has to do with the fact that these kinds of platforms also host non-scholarly content and actively promote the uploading of texts (where many of the text ‘sharing’ platforms merely offer downloading facilities). In the case of Scribd, the size of the platform (or the amount of content available on it) has also caused concern and much [media coverage](http://labnol.blogspot.com/2007/04/scribd-youtube-for-pirated-ebooks-but.html).

All of this gives a lot of potential power to text sharing communities, and I guess they know this. Only authors might be directly upset (especially famous ones gathering a lot of royalties on their work) or, as in Lütgert's case, their beneficiaries, who still see a lot of money coming directly from individual customers.

Still, it is not only the lack of fear of possible retaliation that is feeding the upsurge of text sharing communities. There is a strong ideological commitment to the inherent good of these developments, and a moral and political striving towards institutional and societal change when it comes to knowledge production and dissemination.

![Information Libre](https://openreflections.files.wordpress.com/2009/09/information-libre.jpg?w=547)

As Adrian Johns states in his [article](http://www.culturemachine.net/index.php/cm/article/view/345/348) _Piracy as a business force_, ‘today's pirate philosophy is a moral philosophy through and through’. As Jonas Andersson [states](http://www.culturemachine.net/index.php/cm/article/view/346/359), the idea of piracy has mostly lost its negative connotations in these communities and is seen as a positive development, where these movements ‘have begun to appear less as a reactive force (i.e. ‘breaking the rules’) and more as a proactive one (‘setting the rules’). Rather than complain about the conservatism of established forms of distribution they simply create new, alternative ones.’ Although Andersson states that this kind of activism is mostly _occasional_, it can be seen expressed clearly in the texts accompanying the text sharing sites and blogs. However, copyright is perhaps not so much _an issue_ on most of these sites (though it is on some of them) as it is something that seems to be simply ignored for the larger good of aggregating and sharing resources on the web. This is stated clearly, for instance, in an [interview](http://blog.sfmoma.org/2009/08/four-dialogues-2-on-aaaarg/) with Sean Dockray, who maintains AAAARG:

_" The project wasn’t about criticizing institutions, copyright, authority,
and so on. It was simply about sharing knowledge. This wasn’t as general as it
sounds; I mean literally the sharing of knowledge between various individuals
and groups that I was in correspondence with at the time but who weren’t
necessarily in correspondence with each other."_

Back to Lütgert. The files from textz.com have been saved and are still [accessible](http://web.archive.org/web/20031208043421/textz.gnutenberg.net/index.php3?enhanced_version=http://textz.com/index.php3) via the [Internet Archive Wayback Machine](http://web.archive.org/collections/web.html). In the case of textz.com these files contain ’typed out text’, so no scanned contents or PDFs. Textz.com (or better said, its shadow or mirror) offers an amazing collection of texts, including artists' statements/manifestos and screenplays from, for instance, David Lynch.

The text sharing community has evolved and now includes many players. Two other large members of this kind of ‘pirate theory base network’ (although – and I have to make that clear! – they offer many (and even mostly) legal and out-of-copyright texts), still active today, are [Monoskop/Burundi](http://burundi.sk/monoskop/log/) and [AAAARG.ORG](http://a.aaaarg.org/). These kinds of platforms all seem to disseminate (often even on a titular level) similar content, focusing mostly on Continental Philosophy and Critical Theory, Cultural Studies and Literary Theory, the Frankfurter Schule, Sociology/Social Theory, Psychology, Anthropology and Ethnography, Media Art and Studies, Music Theory, and critical and avant-garde writers like Kafka, Beckett, Burroughs, Joyce, Baudrillard, etc.

[Monoskop](http://www.burundi.sk/monoskop/index.php/Main_Page) is, as they state, a collaborative wiki researching the social history of media art, or a ‘living archive of writings on art, culture and media technology’. On the sitemap of their log, or under the categories section, you can browse their resources by genre: book, journal, e-zine, report, pamphlet, etc. As I found [here](http://www.slovakia.culturalprofiles.net/?id=7958), Burundi originated in 2003 as a (Slovakian) media lab working between the arts, sciences and technologies, which spread out into a European city-based cultural network; they even functioned as a press, publishing the Anthology of New Media Literature (in Slovak) in 2006, and they hosted media events and curated festivals. Burundi dissolved in June 2005, although the [Monoskop](http://www.slovakia.culturalprofiles.net/?id=7964) research wiki on media art has continued to run since.

![AAAARG](https://openreflections.files.wordpress.com/2009/09/aaaarg.jpg?w=547)

As stated on their website, AAAARG is a conversation platform, or alternatively a school, reading group or journal, maintained by Los Angeles artist [Sean Dockray](http://www.design.ucla.edu/people/faculty.php?ID=64 "Sean Dockray"). In the true spirit of Critical Theory, its aim is to ‘develop critical discourse outside of an institutional framework’. Or, put even more beautifully, it operates in the spaces in between: ‘_But rather than thinking of it like a new building, imagine scaffolding that attaches onto existing buildings and creates new architectures between them_.’ To access the texts and resources that are being ‘discussed’ at AAAARG, you need to register, after which you will be able to browse the [library](http://a.aaaarg.org/library). From this library you can download resources, but you can also upload content. You can subscribe to their [feed](http://aaaarg.org/feed) (RSS/XML) and, [like Monoskop](http://twitter.com/monoskop), AAAARG.ORG also maintains a [Twitter account](http://twitter.com/aaaarg) on which updates are posted. The most interesting part, though, is the ‘extra’ functions the platform offers: after you have made an account, you can create your own collections, aggregations or issues out of the texts in the library or the texts you add. This offers an alternative (thematically ordered) way into the texts archived on the site. You can also comment on the texts or start a discussion about them; see for instance their elaborate [discussion lists](http://a.aaaarg.org/discussions). The AAAARG community thus serves both as a sharing and a feedback community, and in this way operates in a true p2p fashion - the way p2p seemed originally intended. The difference is that AAAARG is not based on a distributed network of computers but on one platform, to which registered users are able to upload files (which is not the case on Monoskop, for instance, where one can only download).

Via [mercerunionhall](http://mercerunionhall.blogspot.com/), I found the image below, which depicts AAAARG.ORG's article index organized as a visual map, showing the connections between the different texts. According to mercerunionhall, this map was created and posted by AAAARG user john.

![Connections-v1 by John](https://openreflections.files.wordpress.com/2009/09/connections-v1-by-john.jpg?w=547)

Where AAAARG.ORG focuses again on the text itself - typed out versions of books - Monoskop works with more modern forms of textual distribution: scanned versions or full ebooks/PDFs with all the possibilities they offer, taking a lot of content from Google Books or from (Open Access) publishers' websites. Monoskop also links back to the publishers' websites or Google Books for information about the books or texts (which again suggests that the publishers must know about their activities). To download the texts, however, Monoskop links to [Sharebee](http://www.sharebee.com/), keeping the actual texts and the real downloading activity away from its own platform.

Another part of the text sharing scene consists of platforms offering documentaries and lectures (multimedia content) online. One example of the latter is the [Discourse Notebook Archive](http://www.discoursenotebook.com/), which describes itself as an effort whose main goal is ‘to make available lectures in contemporary continental philosophy’ and is maintained by Todd Kesselman, a PhD student at The New School for Social Research. Here you can find lectures by Badiou, Kristeva and Zizek (both audio and video) and lectures aggregated from the European Graduate School. Kesselman also links to resources on the web dealing with contemporary continental philosophy.

![Eule - Society of Control](https://openreflections.files.wordpress.com/2009/09/eule-society-of-control.gif?w=547)

Society of Control is a website maintained by [Stephan Dillemuth](http://www.kopenhagen.dk/fileadmin/oldsite/interviews/solmennesker.htm), an artist living and working in Munich, Germany, offering among other things an overview of his work and research. According to [this](http://www2.khib.no/~hovedfag/akademiet_05/tekster/interview.html) interview conducted by Kristian Ø Dahl and Marit Flåtter, his work is a response to the increased influence of the neo-liberal world order on education, which creates a culture industry more often than not driven by commercial interests. He asks the question: ‘How can dissidence grow in the blind spots of the “society of control” and articulate itself?’ His website, the [Society of Control](http://www.societyofcontrol.com/disclaimer1.htm), is, as he states, ‘an independent organization whose profits are entirely devoted to research into truth and meaning.’

Society of Control has a [library section](http://www.societyofcontrol.com/library/) which contains works by some of the biggest thinkers of the twentieth century: Baudrillard, Adorno, Debord, Bourdieu, Deleuze, Habermas, Sloterdijk and so on, and much more, a lot of it in German and all of it ‘typed out’ text. The library section offers a direct search function, a category function and an a-z browse function. Dillemuth states that he offers this material under fair use, emphasizing its not-for-profit nature, freedom of information, the maintenance of freedom of speech, and making information accessible to all:

_“The Societyofcontrol website contains information gathered from many different sources. We see the internet as public domain necessary for the free flow and exchange of information. However, some of the materials contained in this site may be claimed to be copyrighted by various unknown persons. They will be removed at the copyright holder's request within a reasonable period of time upon receipt of such a request at the email address below. It is not the intent of the Societyofcontrol to have violated or infringed upon any copyrights.”_

![Vilem Flusser, Andreas Strohl, Erik Eisel Writings \(2002\)](https://openreflections.files.wordpress.com/2009/09/vilem-flusser-andreas-strohl-erik-eisel-writings-2002.jpg?w=547)

Important in this respect is that he places the responsibility for reading/using/downloading the texts on his site with the viewers, and not with himself: _“Anyone reading or looking at copyright material from this site does so at his/her own peril; we disclaim any participation or liability in such actions.”_

Fark Yaraları = [Scars of Différance](http://farkyaralari.blogspot.com/) and [Multitude of blogs](http://multitudeofblogs.blogspot.com/) are maintained by the same author, Renc-u-ana, a philosophy and sociology student from Istanbul. The first is his personal blog (also with many links to downloadable texts), focused on ‘creating an e-library for a Heideggerian philosophy and Bourdieuan sociology’, on which he writes that ‘market-created inequalities must be overthrown in order to close the knowledge gap.’ The second site has a clear aggregating function, with the aim ‘to give united feedback for e-book publishing sites so that tracing and finding may become easier’, plus a call for similar blogs or websites offering free ebook content. The blog is accompanied by a nice picture of a woman warning us to keep quiet, paradoxically appropriate to the context. Here again, a statement from the host on possible copyright infringement: _‘None of the PDFs are my own productions. I've collected them from the web (e-mule, avax, libreremo, socialist bros, cross-x, gigapedia..) What I did was thematizing._’ The same goes for [pdflibrary](http://pdflibrary.wordpress.com/) (which seems to be by the same author), offering texts by Derrida, Benjamin, Deleuze and the like: _‘None of the PDFs you find here are productions of this blog. They are collected from different places on the web (e-mule, avax, libreremo, all socialist bros, cross-x, …). The only work done here is thematizing and tagging.’_

[![GRUP_Z~1](https://openreflections.files.wordpress.com/2009/09/grup_z11.jpg?w=547)](http://multitudeofblogs.blogspot.com/)

Our student from Istanbul lists many text sharing sites on Multitude of blogs, including [Inishark](http://danetch.blogspot.com/) (amongst others Badiou, Zizek and Derrida), [Revelation](http://revelation-online.blogspot.com/2009/02/keeping-ten-commandments.html) (a lot of history and bible study), [Museum of accidents](http://museumofaccidents.blogspot.com/) (many resources relating to, again, critical theory, political theory and continental philosophy) and [Makeworlds](http://makeworlds.net/) (initiated from the [make world festival](http://www.makeworlds.org/1/index.html) 2001). [Mariborchan](http://mariborchan.wordpress.com/) is mainly a Zizek resource site (also Badiou and Lacan) and, next to ebooks, offers video and audio (lectures and documentaries) and text files, all via links to file sharing platforms.

What is clear is that the text sharing network described above (and I am sure there are many more relating to other fields and subjects) is also formed and maintained by the blogs and resource sites linking to each other in their blogrolls. This is what in the end makes up the network of text sharing, enhanced by RSS feeds and Twitter accounts that keep direct communication streams open with the rest of the community. That there has not been one major platform or aggregation site linking them all together and uploading all the texts is logical if we take into account the text sharing history described before, and it can thus be seen as a clear tactic: it is fear - fear of what happened to textz.com, fear of the issue of scale, and fear of no longer operating at the borders, on the outside or at the fringes - because a larger scale means they might really get noticed. The idea of secrecy and exclusivity that constitutes the underground is very practically combined with the idea that in this way the texts are available in a multitude of places and thus cannot be withdrawn or disappear so easily.

This is the paradox of the underground: staying small means not being noticed (widely), but it also means being able to exist for probably an extended period of time. Becoming (too) big will mean reaching more people and spreading the texts further into society; however, it will also probably mean being noticed as a threat, as a ‘network of text-piracy’. The true strategy is to retain this balance of openly dispersed subversiveness.

Update 25 November 2009: Another interesting resource site recently came to my attention: [Bedeutung](http://www.bedeutung.co.uk/index.php), a philosophical and artistic initiative consisting of three projects: [Bedeutung Magazine](http://www.bedeutung.co.uk/index.php?option=com_content&view=article&id=1&Itemid=3), [Bedeutung Collective](http://www.bedeutung.co.uk/index.php?option=com_content&view=article&id=67&Itemid=4) and [Bedeutung Blog](http://bedeutung.wordpress.com/). It hosts a [library](http://www.bedeutung.co.uk/index.php?option=com_content&view=article&id=85&Itemid=45) section which links to freely downloadable online e-books, articles, audio recordings and videos.


### 17 comments on “Scanners, collectors and aggregators. On the ‘underground movement’ of (pirated) theory text sharing”

1. Pingback: [Humanism at the fringe « Snarkmarket](http://snarkmarket.com/2009/3428)

2. Pingback: [Scanners, collectors and aggregators. On the 'underground movement' of (pirated) theory text sharing « Mariborchan](http://mariborchan.wordpress.com/2009/09/20/scanners-collectors-and-aggregators-on-the-underground-movement-of-pirated-theory-text-sharing/)

3. Mariborchan

September 20, 2009


I took the liberty to pirate this article.

4. [jannekeadema1979](http://www.openreflections.wordpress.com)

September 20, 2009


Thanks, it's all about the sharing! Hope you liked it.

5. Pingback: [links for 2009-09-20 « Blarney Fellow](http://blarneyfellow.wordpress.com/2009/09/21/links-for-2009-09-20/)

6. [scars of différance](http://farkyaralari.blogspot.com)

September 30, 2009


hi there, I'm the owner of the Scars of Différance blog, I'm grateful for your
reading which nurtures self-reflexivity.

text-sharers' phylum is a Tardean phenomenon: it works through imitation, and differences differentiate styles and archives. my question was inherited from aby warburg, who is perhaps the first kantian librarian (not books, but the nomenclatura of books must be thought!). I shape up a library where books speak to each other, each time fragmentary.

you are right about the "fear", that's why I don't reupload books that are deleted from mediafire. the blog is one of the ways; for example, there are e-mail groups where chain-sharings happen, and there are forums where people from different parts of the world ask each other to scan a book that can't be found in their library/country. I understand publishers' qualms (I also work in a Turkish publishing house and make translations), but they miss a point: it was the very movement which made the book a medium that de-posits "book" (in the Blanchotian sense). these blogs do indeed perform a very important service, they save books from the databanks. I'm not going to make an easy rider argument and decry technology. what I mean is this: these books are the very bricks which make up resistance -they are not compost-, it is a sharing "partage", and these fragmentary impartations (the act in which 'we' emancipate books from the proper names they bear: author, editor, publisher, queen,…) make words blare. our work: to disenfranchise.

to get larger, to expand: these are too ambitious terms, one must learn to
stay small, remain finite. a blog can not supplant the non-place of the
friendships we make up around books.

the epigraph at the top of my blog reads: "what/who exorbitates mutates into
its opposite" from a Turkish poet Cahit Zarifoğlu. and this logic is what
generates the slithering of the word. we must save books from its own ends.

thanks again, best.

p.s. I'm not the owner of pdf library.

7. Bedeutung

November 24, 2009


Here, an article that might interest:

sharing-free-piracy>

8. [jannekeadema1979](http://www.openreflections.wordpress.com)

November 24, 2009


Thanks for the link, good article, agree with the contents, especially like
the part 'Could, for instance, the considerable resources that might be
allocated to protecting, policing and, ultimately, sanctioning online file-
sharing not be used for rendering it less financially damaging for the
creative sector?'
I like this kind of pragmatic reasoning, and I know more people do.
By the way, checked Bedeutung, great journal, and love your
[library](http://www.bedeutung.co.uk/index.php?option=com_content&view=article&id=86&Itemid=46)
section! Will add it to the main article.

9. Pingback: [Borderland › Critical Readings](http://borderland.northernattitude.org/2010/01/07/critical-readings/)

10. Pingback: [Mariborchan » Scanners, collectors and aggregators. On the 'underground movement' of (pirated) theory text sharing](http://mariborchan.com/scanners-collectors-and-aggregators-on-the-underground-movement-of-pirated-theory-text-sharing/)

11. Pingback: [Urgh! AAAARG dead? « transversalinflections](http://transversalinflections.wordpress.com/2010/05/29/urgh-aaaarg-dead/)

12. [nick knouf](http://turbulence.org/Works/JJPS)

June 18, 2010


This is Nick, the author of the JJPS project; thanks for the tweet! I actually
came across this blog post while doing background research for the project and
looking for discussions about AAAARG; found out about a lot of projects that I
didn't already know about. One thing that I haven't been able to articulate
very well is that I think there's an interesting relationship between, say,
Kenneth Goldsmith's own poetry and his founding of Ubu Web; a collation and
reconfiguration of the detritus of culture (forgotten works of the avant-
gardes locked up behind pay walls of their own, or daily minutiae destined to
be forgotten), which is something that I was trying to do, in a more
circumscribed space, in JJPS Radio. But the question of distribution of
digital works is something I find fascinating, as there are all sorts of
avenues that we could be investigating but we are not. The issue, as it often
is, is one of technical ability, and that's why one of the future directions
of JJPS is to make some of the techniques I used easier to use. Those who want
to can always look into the code, which is of course freely available, but
that cannot and should not be a prerequisite.

13. [jannekeadema1979](http://www.openreflections.wordpress.com)

June 18, 2010


Hi Nick, thanks for your comment. I love the JJPS and it would be great if the
technology you mention would be easily re-usable. What I find fascinating is
how you use another medium (radio) to translate/re-mediate and in a way also
unlock textual material. I see you also have an Open Access and a Cut-up hour.
I am very much interested in using different media to communicate scholarly
research and even more in remixing and re-mediating textual scholarship. I
think your project(s) is a very valuable exploration of these themes while at
the same time being a (performative) critique of the current system. I am in
awe.

14. Pingback: [Text-sharing "in the paradise of too many books" – SLOTHROP](http://slothrop.com/2012/11/16/text-sharing-in-the-paradise-of-too-many-books/)

15. [Jason Kennedy](http://www.facebook.com/903035234)

May 6, 2015


Some obvious fails suggest major knowledge gaps regarding sourcing texts
online (outside of legal channels).

And featuring Scribd doesn't help.

Q: What's the largest pirate book site on the net, with an inventory almost as
large as Amazon?

And it's not L_____ G_____

16. [Janneke Adema](http://www.openreflections.wordpress.com)

May 6, 2015


Do enlighten us Jason… And might I remind you that this post was written in
2009?

17. Mike Andrews

May 7, 2015


Interesting topic, but also odd in some respects. Not translating the German quotes is very unthoughtful and maybe even arrogant. If you are interested in open access, accessibility needs to be your top priority. I can read German, but many of my friends (and most of the world) can't. It takes a little effort to just fix this, but you can do it.


Medak, Sekulic & Mertens
Book Scanning and Post-Processing Manual Based on Public Library Overhead Scanner v1.2
2014


PUBLIC LIBRARY
&
MULTIMEDIA INSTITUTE

BOOK SCANNING & POST-PROCESSING MANUAL
BASED ON PUBLIC LIBRARY OVERHEAD SCANNER

Written by:
Tomislav Medak
Dubravka Sekulić
With the help of:
An Mertens

Creative Commons Attribution - Share-Alike 3.0 Germany

TABLE OF CONTENTS

Introduction
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
Quick workflow reference for scanning and post-processing
References

INTRODUCTION:
BOOK SCANNING - FROM PAPER BOOK TO E-BOOK
Initial considerations when deciding on a scanning setup
Book scanning tends to be a fragile and demanding process. Many factors can go wrong or produce results of varying quality from book to book or page to page, requiring experience or technical skill to resolve the issues that occur: cameras can fail to trigger, components can fail to communicate, files can get corrupted in the transfer, the storage card doesn't get purged, the focus fails to lock, lighting conditions change. There are trade-offs between automation, which is prone to instability, and robustness, which tends to be time-consuming.
Your initial choice of book scanning setup will have to take these trade-offs into consideration. If your scanning community is confined to your hacklab, you won't be risking much if technological sophistication and integration fail to function smoothly. But if you're aiming at a broad community of users, with varying levels of technological skill and patience, you want to create as much time-saving automation as possible while keeping maximum stability. Furthermore, if the time that individual members of your scanning community can contribute is limited, you might also want to divide some of the tasks between users with different skill levels.
This manual breaks down the process of digitization into a general description of the steps in the workflow leading from the printed book to a digital e-book. In a concrete situation, each step can be addressed in various manners depending on the scanning equipment, software, hacking skills and user skill level available to your book scanning project. Several of those steps can be handled by a single piece of equipment or software, or you might need to use a number of them - your mileage will vary. The manual will therefore try to indicate the design choices you have in the process of planning your workflow and should help you decide which design is best for your situation.
Introducing book scanner designs
The book scanning starts with the capturing of digital image files on the scanning equipment. There are three principal types of book scanner designs:
 flatbed scanner
 single camera overhead scanner
 dual camera overhead scanner
Conventional flatbed scanners are widely available. However, given that they require the book to be spread wide open and pressed down with the platen in order to break the resistance of the binding and sufficiently expose the inner margin of the text, this is the most destructive approach for the book, as well as imprecise and slow.
Therefore, book scanning projects across the globe have taken to custom designing improvised setups or scanner rigs that are less destructive and better suited for fast turning and capturing of pages. Designs abound. Most include:
 one or two digital photo cameras of lesser or higher quality to capture the pages,
 a transparent V-shaped glass or Plexiglas platen to press the open book against a V-shaped cradle, and
 a light source.

The go-to web resource to help you make an informed decision is the DIY book scanning community at http://diybookscanner.org. A good place to start is their intro (http://wiki.diybookscanner.org/) and scanner build list (http://wiki.diybookscanner.org/scanner-build-list).
Book scanners with a single camera are substantially cheaper, but come with the added difficulty of de-warping the page images, which are distorted due to the angle at which pages are photographed and can sometimes be difficult to correct in the post-processing. Hence, in this introductory chapter we'll focus on two-camera designs, where the camera lens stands relatively parallel to the page. However, with a bit of adaptation these instructions can be used to work with any other setup.
The Public Library scanner
The focus of this manual is the scanner built for the Public Library project, designed by Voja Antonić (see Illustration 1). The Public Library scanner was built with immediate use by a wide community of users in mind. Hence, the principal consideration in designing it was less sophistication and more robustness, ease of use and a distributed process of editing.
The board designs can be found here: http://www.memoryoftheworld.org/blog/2012/10/28/our-beloved-bookscanner. The current iterations use two Canon 1100D cameras with the Canon EF-S 18-55mm 1:3.5-5.6 IS kit lens. The cameras are auto-charging.

Illustration 1: Public Library Scanner
The scanner operates by automatically lowering the Plexiglas platen, illuminating the page and then triggering the camera shutters. The turning of pages and the adjustment of the V-shaped cradle holding the book are manual.
The scanner is operated by a two-button controller (see Illustration 2). The upper, smaller button breaks the capture process into two steps: the first click lowers the platen, increases the light level and allows you to adjust the book or the cradle; the second click triggers the cameras and lifts the platen. The lower button has two modes: a quick click will execute the whole capture process in one go, but if you hold it pressed longer, it will lower the platen, allowing you to adjust the book and the cradle, and lift it without triggering the cameras when you press again.

Illustration 2: A two-button controller

More on this manual: steps in the book scanning process
The book scanning process can generally be broken down into six steps, each of which will be dealt with in a separate chapter of this manual:
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
A step by step manual for the Public Library scanner
This manual is primarily meant to provide a detailed description and step-by-step instructions for an actual book scanning setup - based on Voja Antonić's scanner design described above. This is a two-camera overhead scanner, currently equipped with two Canon 1100D cameras with the EF-S 18-55mm 1:3.5-5.6 IS kit lens. It can scan books of up to A4 page size.
The post-processing in this setup is based on a semi-automated transfer of files to a GNU/Linux personal computer and on the use of free software for image editing, optical character recognition and finalization of an e-book file. It was initially developed for the HAIP festival in Ljubljana in 2011 and perfected later at MaMa in Zagreb and at Leuphana University in Lüneburg.
The Public Library scanner is characterized by a somewhat less automated yet more distributed scanning process than the highly automated and sophisticated scanner hacks developed at various hacklabs. A brief overview of one such scanner, developed at the Hacker Space Bruxelles, is also included in this manual.
The Public Library scanning process thus proceeds in the following discrete steps:

1. creating digital images of the pages of a book,
2. manual transfer of the image files to the computer for post-processing,
3. automated renaming of files, ordering of even and odd pages, rotation of images and upload to cloud storage,
4. manual transformation of the source images into .tiff files in ScanTailor,
5. manual optical character recognition and creation of PDF files in gscan2pdf.
The detailed description of the Public Library scanning process follows below.
The Bruxelles hacklab scanning process
For purposes of comparison, we'll briefly reference here the scanner built by the Bruxelles hacklab (http://hackerspace.be/ScanBot). It is a dual camera design too. Aside from some differences in hardware functionality (the Bruxelles scanner turns pages automatically, whereas the Public Library scanner requires manual page turning), the fundamental difference between the two lies in the post-processing - the level of automation in the transfer of images from the cameras and their transformation into the PDF or DjVu e-book format.
The Bruxelles scanning process differs in so far as the cameras are operated by a computer and the images are automatically transferred, ordered and made ready for further post-processing. The scanner is home-brew, but the process is for advanced DIY'ers. If you want to know more about the design of the scanner, contact Michael Korntheuer at contact@hackerspace.be.
The scanning and post-processing is automated by a single Python script that does all the work: http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD
The scanner uses two Canon point-and-shoot cameras. Both cameras are connected to the PC with USB. They both run PTP/CHDK (Canon Hack Development Kit). The scanning sequence is the following:
1. The script sends CHDK command line instructions to the cameras.
2. The script sorts out the incoming files. This part is tricky: there is no reliable way to distinguish between the left and right camera, only between which camera was recognized by USB first. So the protocol is to always power up the left camera first. See the instructions that come with the source code.
3. The images are collected into a PDF file.
4. A script is run to OCR the .PDF file into a plain .TXT file: http://git.constantvzw.org/?p=algolit.git;a=blob;f=scanbot_brussel/ocr_pdf.sh;h=2c1f24f9afcce03520304215951c65f58c0b880c;hb=HEAD

I. PHOTOGRAPHING A PRINTED BOOK
Technologically the most demanding part of the scanning process is creating digital images of the pages of a printed book. It's a process that differs greatly from scanner design to scanner design and from camera to camera. Therefore, here we will focus strictly on the process with the Public Library scanner.
Operating the Public Library scanner
0. Before you start:
Better and more consistent photographs lead to a more optimized and faster post-processing and a
higher quality of the resulting digital e-book. In order to guarantee the quality of images, before you
start it is necessary to set up the cameras properly and prepare the printed book for scanning.
a) Loosening the book
Depending on the type and quality of binding, some books tend to be too resistant to opening fully
to reveal the inner margin under the pressure of the scanner platen. It is thus necessary to “break in”
the book before starting in order to loosen the binding. The best way is to open it as wide as
possible in multiple places in the book. This can be done against the table edge if the book is more
rigid than usual. (Warning – “breaking in” might create irreversible creasing of the spine or lead to
some pages breaking loose.)
b) Switch on the scanner
You start the scanner by pressing the main switch or plugging the power cable into the scanner. This will also turn on the overhead LED lights.

c) Setting up the cameras
Place the cameras onto tripods. You need to move the lever on the tripod's head to allow the tripod
plate screwed to the bottom of the camera to slide into its place. Secure the lock by turning the lever
all the way back.
If the automatic chargers for the camera are provided, open the battery lid on the bottom of the
camera and plug the automatic charger. Close the lid.
Switch on the cameras using the lever on the top right side of the camera body and set them to aperture priority (Av) mode on the mode dial above the lever (see Illustration 3). Use the main dial just above the shutter button on the front side of the camera to set the aperture value to F8.0.

Illustration 3: Mode and main dial, focus mode switch, zoom
and focus ring
On the lens, turn the focus mode switch to manual (MF), turn the large zoom ring to set the value
exactly midway between 24 and 35 mm (see Illustration 3). Try to set both cameras the same.
To focus each camera, open a book on the cradle, lower the platen by holding the big button on the
controller, and turn on the live view on camera LCD by pressing the live view switch (see
Illustration 4). Now press the magnification button twice and use the focus ring on the front of the
lens to get a clear image view.

Illustration 4: Live view switch and magnification button

d) Connecting the cameras
Now connect the cameras to the remote shutter trigger cables that can be found lying on each side
of the scanner. They need to be plugged into a small round port hidden behind a protective rubber
cover on the left side of the cameras.
e) Placing the book into the cradle and double-checking the cameras
Open the book in the middle and place it on the cradle. Hold the large button on the controller pressed to lower the Plexiglas platen without triggering the cameras. Move the cradle so that the platen fits the middle of the book.
Turn on the live view on the cameras' LCD to see if the pages fit into the image and if the cameras are positioned parallel to the page.
f) Double-check storage cards and batteries
It is important that both storage cards in the cameras are empty before starting the scanning, in order not to mess up the page sequence when merging photos from the left and the right camera in the post-processing. To double-check, press the play button on the cameras and erase any photos left over from the previous scan: press the menu button, select the fifth menu from the left and then select 'Erase Images' -> 'All images on card' -> 'OK'.
If no automatic chargers are provided, double-check on the information screen that the batteries are charged. They should be fully charged before starting the scanning of a new book.

g) Turn off the light in the room
Lighting conditions during scanning should be as constant as possible. To reduce glare and achieve maximum quality, remove any source of light that might reflect off the Plexiglas platen. Preferably turn off the light in the room or isolate the scanner with the black cloth provided.

1. Photographing a book
Now you are ready to start scanning. Place the book closed in the cradle and lower the platen by
holding the large button on the controller pressed (see Illustration 2). Adjust the position of the
cradle and lift the platen by pressing the large button again.
To scan you can now either use the small button on the controller to lower the platen, adjust and
then press it again to trigger the cameras and lift the platen. Or, you can just make a short press on
the large button to do it in one go.
ATTENTION: When the cameras are triggered, the shutter sound has to be heard coming from both cameras. If one camera is not working, it's best to reconnect both cameras (see section 0), make sure the batteries are charged or the adapters are connected, erase all images and restart.
A mistake made in the photographing requires a lot of work in the post-processing, so it's much quicker to repeat the photographing process. If you make a mistake while flipping pages, or any other mistake, go back and scan again from the page you missed or incorrectly scanned. Note down the page where the error occurred; the redundant images will be removed in the post-processing.
ADVICE: The scanner has a digital counter. By turning the dial forward and backward, you can set it to tell you which page you should be scanning next. This should help you avoid missing a page due to a distraction.
While scanning, move the cradle a bit to the left from time to time, making sure that the tip of the V-shaped platen stays aligned with the center of the book and the inner margin is exposed enough.

II. GETTING THE IMAGE FILES READY FOR POST-PROCESSING
Once the book pages have been photographed, they have to be transferred to the computer and prepared for post-processing. With two-camera scanners, the capturing process will result in two separate sets of images - odd and even pages - coming from the left and right cameras respectively. You will need to rename and reorder them accordingly, rotate them into a vertical position and collate them into a single sequence of files.
a) Transferring image files
For the transfer of files, your principal design choices are either to copy the files by removing the memory cards from the cameras and copying them to the computer via a card reader, or to transfer them via a USB cable. The latter process can be automated by remotely operating your cameras from the computer; however, this can be done only with a certain number of Canon cameras (http://bit.ly/16xhJ6b) that can be hacked to run the open Canon Hack Development Kit firmware (http://chdk.wikia.com).
After transferring the files, you want to erase all the image files on the camera memory card, so that they don't end up messing up the scan of the next book.
b) Renaming image files
As the left and right cameras are typically operated in sync, the photographing process results in two separate sets of images, with even and odd pages respectively, that have completely different file names and potentially the same time stamps. So before you collate the page images in the order in which they appear in the book, you want to rename the files so that the first image comes from the right camera, the second from the left camera, the third again from the right camera and so on. You probably want to do a batch renaming, where your right camera files start with n and are offset by an increment of 2 (e.g. page_0000.jpg, page_0002.jpg, ...) and your left camera files start with n+1 and are also offset by an increment of 2 (e.g. page_0001.jpg, page_0003.jpg, ...).
Batch renaming can be completed either from your file manager, on the command line or with a number of GUI applications (e.g. GPrename, rename, cuteRenamer on GNU/Linux); a sketch of one way to script it follows below.
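
To illustrate the renaming convention, here is a minimal Python sketch. The folder names right, left and collated are hypothetical stand-ins for wherever you copied each camera's files, and it assumes each camera's files sort into capture order by name. Since it moves both sets into one destination folder, it also takes care of the collating described in step d) below.

import os

def rename_batch(source_folder, start, dest="collated"):
    # interleave one camera's files into the page_XXXX.jpg sequence,
    # offset by an increment of 2 per camera
    os.makedirs(dest, exist_ok=True)
    for i, name in enumerate(sorted(os.listdir(source_folder))):
        new_name = "page_%04d.jpg" % (start + 2 * i)
        os.rename(os.path.join(source_folder, name), os.path.join(dest, new_name))

rename_batch("right", 0)  # right camera: page_0000, page_0002, ...
rename_batch("left", 1)   # left camera: page_0001, page_0003, ...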
c) Rotating image files
Before you collate the renamed files, you might want to rotate them. This step can also be done later in the post-processing (see below), but if you are automating or scripting your steps this is a practical place to do it. The images leaving your cameras will be positioned horizontally. In order to position them vertically, the images from the camera on the right have to be rotated by 90 degrees counter-clockwise, and the images from the camera on the left have to be rotated by 90 degrees clockwise.
Batch rotating can be completed in a number of photo-processing tools, on the command line or with dedicated applications (e.g. Fstop, ImageMagick, Nautilus Image Converter on GNU/Linux); a sketch follows below.
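
Continuing the sketch above (same hypothetical collated folder and naming convention), the rotation could be scripted with the Pillow imaging library; this is just one stand-in for the tools mentioned, not the script used by the Public Library setup:

import glob
from PIL import Image  # pip install Pillow

for path in sorted(glob.glob("collated/page_*.jpg")):
    page_number = int(path[-8:-4])
    im = Image.open(path)
    im.load()  # read the file fully so it can be safely overwritten
    # Pillow rotates counter-clockwise for positive angles: even-numbered
    # pages came from the right camera (CCW), odd ones from the left (CW).
    angle = 90 if page_number % 2 == 0 else -90
    im.rotate(angle, expand=True).save(path)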
d) Collating images into a single batch
Once you're done with the renaming and rotating of the files, you want to collate them into the same
folder for easier manipulation later.

Getting the image files ready for post-processing on the Public Library scanner
In the case of the Public Library scanner, a custom C script was written by Mislav Stublić to facilitate the transfer, renaming, rotating and collating of the images from the two cameras.
The script prompts the user to place the memory card from the right camera into the card reader first, gives a preview of the first and last four images and provides an entry field to create a subfolder in a local cloud storage folder (path: /home/user/Copy).
It transfers, renames and rotates the files, deletes them from the card and prompts the user to replace the card with the one from the left camera in order to transfer the files from there and place them in the same folder. The script was created for GNU/Linux systems and can be downloaded, together with its source code, from: https://copy.com/nLSzflBnjoEB
If you have cameras other than Canon, you can edit line 387 of the source file to change to the naming convention of your cameras, and recompile by running the following command in your terminal: "gcc scanflow.c -o scanflow -ludev `pkg-config --cflags --libs gtk+-2.0`"
In the case of the Hacker Space Bruxelles scanner, this is handled by the same script that operates the cameras, which can be downloaded from: http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD

III. TRANSFORMATION OF SOURCE IMAGES INTO .TIFFS
Images transferred from the cameras are high-definition, full-color images. You want your cameras
to shoot at the largest possible .jpg resolution in order for the resulting files to have at least 300 dpi
(A4 at 300 dpi requires a 9.5 megapixel image). In post-processing, the size of the image files needs
to be reduced radically, so that several hundred images can be merged into an e-book file of a
tolerable size.
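(As a quick check of that figure: an A4 page measures roughly 8.3 × 11.7 inches, so at 300 dpi it
takes about 2480 × 3500 pixels, or some 8.7 megapixels for the page area alone; the 9.5 megapixel
figure thus leaves some headroom for the area captured around the page.)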
Hence, the first step in post-processing is to crop the camera images down to only the content of
the pages. The surroundings of the book captured in the photograph and the white
margins of the page will be cropped away, while the printed text will be transformed into black
letters on white background. The illustrations, however, will need to be preserved in their color or
grayscale form, and mixed with the black and white text. What were initially large .jpg files will
now become relatively small .tiff files that are ready for the optical character recognition (OCR)
process.
These tasks can be completed by a number of software applications. Our manual will focus on one
that can be used across all major operating systems -- ScanTailor. ScanTailor can be downloaded
from: http://scantailor.sourceforge.net/. A more detailed video tutorial of ScanTailor can be found
here: http://vimeo.com/12524529.
ScanTailor: from a photograph of a page to a graphic file ready for OCR
Once you have transferred all the photos from the cameras to the computer and renamed and
rotated them, they are ready to be processed in ScanTailor.
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred and renamed photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder inside
the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click 'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600', click
'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- Rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other page’
followed by 'OK'
- Rotate the following photo clockwise, applying the same procedure like in the previous step
2.2 Deleting redundant photographs/pages
- Remove redundant pages (photographs of the empty cradle at the beginning and the end of the
book scanning sequence; book cover pages if you don’t want them in the final scan; duplicate pages
etc.) by right-clicking on a thumbnail of that page in the preview column on the right side, selecting
‘Remove from project’ and confirming by clicking on ‘Remove’.

# If you accidentally remove the wrong page, you can re-insert it by right-clicking on the page
before/after the missing page in the sequence, selecting 'insert after/before' (depending on which
page you selected) and choosing the file from the list. Before you finish adding, you will need to
go through the Fix DPI and rotation procedure again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and insert them
manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
Steps ‘Split pages’ and ‘Deskew’ should work automatically. Run them by clicking the ‘Play’ button
under the 'Select content' function. This will do the three steps automatically: splitting of pages,
deskewing and selection of content. After this you can manually re-adjust splitting of pages and deskewing.
4) Selecting content
Step ‘Select content’ works automatically as well, but it is important to revise the resulting selection
manually page by page to make sure the entire content is selected on each page (including the
header and page number). Where necessary, use your pointer device to adjust the content selection.
If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected split area. If
the page is skewed, go back to 'Deskew' and adjust the skew of the page. After this go back to
'Select content' and readjust the selection if necessary.
This is the step where you visually check each page. Make sure all pages are there and
selections are as equal in size as possible.
At the bottom of thumbnail column there is a sort option that can automatically arrange pages by
the height and width of the selected content, making the process of manual selection easier. Avoid
extreme differences in height; try to make the selected areas as equal as possible, particularly in
height, across all pages. The exception should be the cover and back pages, where we advise
selecting the full page.
5) Adjusting margins
For best results, select the content of the full cover and back page in the previous step. Now go to the
'Margins' step and set under Margins section both Top, Bottom, Left and Right to 0.0 and do 'Apply
to...' → 'All pages'.
In the Alignment section leave 'Match size with other pages' ticked, choose the central positioning of
the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
Now go to the 'Output' step. Ignore the 'Output Resolution' section.
Next review two consecutive pages from the middle of the book to see if the scanned text is too
faint or too dark. If the text seems too faint or too dark, use slider Thinner – Thicker to adjust. Do
'Apply to' → 'All pages'.
Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White Margins'.
Do the same for the back page.
If there are any pages with illustrations, you can choose the 'Mixed' mode for those pages and then
adjust the zones of the illustrations under the 'Picture Zones' tab.
Now you are ready to output the files. Press the 'Play' button under 'Output'. Once the computer
has finished processing the images, do 'File' → 'Save as' and save the project.

IV. OPTICAL CHARACTER RECOGNITION
Before the edited-down graphic files are finalized as an e-book, we want to transform the image of
the text into an actual text that can be searched, highlighted, copied and transformed. That
functionality is provided by Optical Character Recognition. This is a technically difficult task,
dependent on language, script, typeface and quality of print - and there aren't that many OCR tools
that are good at it. There is, however, a relatively good free software solution - Tesseract
(http://code.google.com/p/tesseract-ocr/) - that has solid performance, good language data and can
be trained for an even better performance, although it has its problems. Proprietary solutions (e.g.
Abby FineReader) sometimes provide superior results.
Tesseract primarily supports .tiff files as input. It produces a plain text file that can be, with
the help of other tools, embedded as a separate layer under the original graphic image of the text in
a PDF file.
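In its simplest form, Tesseract is called from the command line with the input .tiff, the base name
of the output file and a language code; the file names below are just examples:

    tesseract page_0001.tif page_0001 -l eng    # writes page_0001.txt
    # or, for a whole book processed by ScanTailor into an 'out' folder:
    for f in out/page_*.tif; do tesseract "$f" "${f%.tif}" -l eng; done

Appending the hocr configuration to the command (tesseract page_0001.tif page_0001 -l eng hocr)
makes Tesseract write positioned HTML instead of plain text, which other tools can use to place the
text layer under the page image in a PDF.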
With the help of other tools, OCR can also be performed against other input files, such as
graphic-only PDF files. This produces inferior results, depending again on the quality of the graphic
files and the reproduction of text in them. One such tool is a bash script to OCR a PDF file, which
can be found
here: https://github.com/andrecastro0o/ocr/blob/master/ocr.sh
As mentioned in the 'before scanning' section, the quality of the original book will influence the
quality of the scan and thus the quality of the OCR. For a comparison, have a look here:
http://www.paramoulipist.be/?p=1303
Once you have your .txt file, there is still some work to be done. Because OCR has difficulty
interpreting particular elements of the layout and fonts, the .txt file comes with a lot of errors.
Recurrent problems are:
- combinations of specific letters in some fonts (it can mistake 'm' for 'n' or 'I' for 'i' etc.);
- headers become part of body text;
- footnotes are placed inside the body text;
- page numbers are not recognized as such.
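There is no universal fix for these errors, but simple command-line filters can at least help you find
the suspicious spots before editing; two hypothetical examples with grep:

    grep -n '[0-9]' book.txt | less       # lines with digits: stray page numbers in the body text
    grep -n '^.\{1,3\}$' book.txt | less  # very short lines: often broken headers or noise

Blind search-and-replace (e.g. with sed) should be used with care, since a pattern like 'rn' → 'm'
will also corrupt correctly recognized words.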

V. CREATING A FINALIZED E-BOOK FILE
After the optical character recognition has been completed, the resulting text can be merged with
the images of pages and output into an e-book format. While proper e-book file formats such as
ePub have increasingly been gaining ground, PDFs remain popular because many people tend to
read on their computers and because PDFs retain the original layout of the paper book, including
the absolute pagination needed for referencing in citations. DjVu is also an option, as an
alternative to PDF, used because of its purported superiority, but it is far less popular.
The export to PDF can be done again with a number of tools. In our case we'll complete the optical
character recognition and PDF export in gscan2pdf. Again, the proprietary Abbyy FineReader will
produce somewhat smaller PDFs.
If you prefer to use an e-book format that works better with e-book readers, obviously you will have
to remove some of the elements that appear in the book - headers, footers, footnotes and pagination.

This can be done earlier in the process of cropping down the original .jpg image files (see under III)
or later by transforming the PDF files. The latter can be done in Calibre (http://calibre-ebook.com)
by converting the PDF into an ePub, where it can be further tweaked to better accommodate or
remove the headers, footers, footnotes and pagination.
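Calibre also ships a command-line converter, so this step can be scripted; a minimal call (the file
names are just examples):

    ebook-convert book.pdf book.epub

ebook-convert accepts further options for tweaking the conversion, including heuristics for headers
and footers; consult the Calibre documentation for the current flags.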
Optical character recognition and PDF export in Public Library workflow
Optical character recognition with the Tesseract engine can be performed on GNU/Linux by a
number of command line and GUI tools. Many of those tools also exist for other operating systems.
For the users of the Public Library workflow, we recommend using gscan2pdf application both for
the optical character recognition and the PDF or DjVu export.
To do so, start gscan2pdf and open your .tiff files. To OCR them, go to 'Tools' and select 'OCR'. In
the dialog box select the Tesseract engine and your language, then click 'Start OCR'. Once the OCR is
finished, export the graphic files and the OCR text to PDF by selecting 'Save as'.
However, given that proprietary solutions sometimes produce better results, these tasks can also be
done, for instance, with Abbyy FineReader running on a Windows operating system inside
VirtualBox. The prerequisites are that you have both Windows and Abbyy FineReader to install in
VirtualBox. Once you have both installed, you need to designate a shared folder in your VirtualBox
and place the .tiff files there. You can then open them from Abbyy FineReader running in
VirtualBox, OCR them and export them into a PDF.
To use Abbyy FineReader, transfer the output files in your 'out' folder to the shared folder of the
VirtualBox. Then start VirtualBox, start the Windows image and in Windows start Abbyy
FineReader. Open the files and let Abbyy FineReader read them. Once it's done, output the result
into a PDF.
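The shared folder itself can be set up either in the VirtualBox GUI or from the host's command line
with VBoxManage; a sketch, assuming the virtual machine is named "Windows" and the ScanTailor
output lives in /home/user/book/out (both names are examples):

    VBoxManage sharedfolder add "Windows" --name scans --hostpath /home/user/book/out

Inside Windows the folder then appears as a network share named 'scans'.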

VI. CATALOGING AND SHARING THE E-BOOK
Your road from a book on paper to an e-book is complete. If you want to maintain your library you
can use Calibre, a free software tool for e-book library management. You can add the metadata to
your book using the existing catalogues or you can enter metadata manually.
Now you may want to distribute your book. If the work you've digitized is in the public domain
(https://en.wikipedia.org/wiki/Public_domain), you might consider contributing it to the Gutenberg
project
(http://www.gutenberg.org/wiki/Gutenberg:Volunteers'_FAQ#V.1._How_do_I_get_started_as_a_Project_Gutenberg_volunteer.3F),
Wikibooks (https://en.wikibooks.org/wiki/Help:Contributing) or Archive.org.
If the work is still under copyright, you might explore a number of different options for sharing.

QUICK WORKFLOW REFERENCE FOR SCANNING AND
POST-PROCESSING ON PUBLIC LIBRARY SCANNER
I. PHOTOGRAPHING A PRINTED BOOK
0. Before you start:
- loosen the book binding by opening it wide at several places
- switch on the scanner
- set up the cameras:
- place the cameras on tripods and fasten them tightly
- plug the automatic chargers into the battery slot and close the battery lid
- switch on the cameras
- switch the lens to Manual Focus mode
- switch the cameras to Av mode and set the aperture to 8.0
- turn the zoom ring to set the focal length exactly midway between 24mm and 35mm
- focus by turning on the live view, pressing magnification button twice and adjusting the
focus to get a clear view of the text
- connect the cameras to the scanner by plugging the remote trigger cable to a port behind a
protective rubber cover on the left side of the cameras
- place the book into the cradle
- double-check storage cards and batteries
- press the play button on the back of the camera to double-check if there are images on the
camera - if there are, delete all the images from the camera menu
- if using batteries, double-check that batteries are fully charged
- switch off the light in the room that could reflect off the platen and cover the scanner with the
black cloth
1. Photographing
- now you can start scanning either by pressing the smaller button on the controller once to
lower the platen and adjust the book, and then pressing it again to increase the light intensity,
trigger the cameras and lift the platen; or by pressing the large button, which completes the entire
sequence in one go;
- ATTENTION: Shutter sound should be coming from both cameras - if one camera is not
working, it's best to reconnect both cameras, make sure the batteries are charged or adapters
are connected, erase all images and restart.
- ADVICE: The scanner has a digital counter. By turning the dial forward and backward,
you can set it to tell you what page you should be scanning next. This should help you to
avoid missing a page due to a distraction.

II. Getting the image files ready for post-processing
- after finishing with scanning a book, transfer the files to the post-processing computer
and purge the memory cards
- if transferring the files manually:
- create two separate folders,
- transfer the files from the image folders on the cards; using batch
renaming software, rename the files from the right camera following the convention
page_0001.jpg, page_0003.jpg, page_0005.jpg... -- and the files from the left camera
following the convention page_0002.jpg, page_0004.jpg, page_0006.jpg...
- collate image files into a single folder
- before ejecting each card, delete all the photo files on the card
- if using the scanflow script:
- start the script on the computer
- place the card from the right camera into the card reader
- enter the name of the destination folder following the convention
"Name_Surname_Title_of_the_Book" and transfer the files
- repeat with the other card
- the script will automatically transfer the files, rename, rotate and collate them in proper
order and delete them from the card
III. Transformation of source images into .tiffs
ScanTailor: from a photograph of page to a graphic file ready for OCR
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder
inside the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click
'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600',
click 'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other
page’ followed by 'OK'
- rotate the following photo clockwise, applying the same procedure like in the previous
step

2.2 Deleting redundant photographs/pages
- remove redundant pages (photographs of the empty cradle at the beginning and the end;
book cover pages if you don’t want them in the final scan; duplicate pages etc.) by right-clicking on a thumbnail of that page in the preview column on the right, selecting ‘Remove
from project’ and confirming by clicking on ‘Remove’.
# If you accidentally remove the wrong page, you can re-insert it by right-clicking on the page
before/after the missing page in the sequence, selecting 'insert after/before' and choosing the file
from the list. Before you finish adding, you will need to go through the Fix DPI and rotation
procedure again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and
insert them manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
- Functions ‘Split Pages’ and ‘Deskew’ should work automatically. Run them by
clicking the ‘Play’ button under the 'Select content' step. This will do the three steps
automatically: splitting of pages, deskewing and selection of content. After this you can
manually re-adjust splitting of pages and deskewing.

4) Selecting content and adjusting margins
- Step ‘Select content’ works automatically as well, but it is important to revise the
resulting selection manually page by page to make sure the entire content is selected on
each page (including the header and page number). Where necessary use your pointer device
to adjust the content selection.
- If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected
split area. If the page is skewed, go back to 'Deskew' and adjust the skew of the page. After
this go back to 'Select content' and readjust the selection if necessary.
- This is the step where you visually check each page. Make sure all pages are there
and selections are as equal in size as possible.
- At the bottom of thumbnail column there is a sort option that can automatically arrange
pages by the height and width of the selected content, making the process of manual
selection easier. Avoid extreme differences in height; try to make the selected areas as equal as
possible, particularly in height, across all pages. The exception should be the cover and back
pages, where we advise selecting the full page.

5) Adjusting margins
- Now go to the 'Margins' step and set under Margins section both Top, Bottom, Left and
Right to 0.0 and do 'Apply to...' → 'All pages'.
- In the Alignment section leave 'Match size with other pages' ticked, choose the central
positioning of the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
- Now go to the 'Output' step.
- Review two consecutive pages from the middle of the book to see if the scanned text is
too faint or too dark. If the text seems too faint or too dark, use slider Thinner – Thicker to
adjust. Do 'Apply to' → 'All pages'.
- Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White
Margins'. Do the same for the back page.
- If there are any pages with illustrations, you can choose the 'Mixed' mode for those
pages and then adjust the zones of the illustrations under the 'Picture Zones' tab.
- To output the files, press the 'Play' button under 'Output'. Save the project.
IV. Optical character recognition & V. Creating a finalized e-book file
If using all free software:
1) open gscan2pdf (if not already installed on your machine, install gscan2pdf from the
repositories, Tesseract and data for your language from https://code.google.com/p/tesseract-ocr/)
- point gscan2pdf to open your .tiff files
- for Optical Character Recognition, select 'OCR' under the drop down menu 'Tools',
select the Tesseract engine and your language, start the process
- once OCR is finished, output to a PDF by going under 'File' and selecting 'Save'; edit the
metadata, select the format and save
If using non-free software:
2) open Abbyy FineReader in VirtualBox (note: only Abbyy FineReader 10 installs and works, with some limitations, under GNU/Linux)
- transfer files in the 'out' folder to the folder shared with the VirtualBox
- point it to the readied .tiff files and it will complete the OCR
- save the file

REFERENCES
For more information on the book scanning process in general and making your own book scanner
please visit:
DIY Book Scanner: http://diybookscanner.org
Hacker Space Bruxelles scanner: http://hackerspace.be/ScanBot
Public Library scanner: http://www.memoryoftheworld.org/blog/2012/10/28/our-beloved-bookscanner/
Other scanner builds: http://wiki.diybookscanner.org/scanner-build-list
For more information on automation:
Konrad Voeckel's post-processing script (From Scan to PDF/A):
http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/
Johannes Baiter's automation of scanning to PDF process: http://spreads.readthedocs.org
For more information on applications and tools:
Calibre e-book library management application: http://calibre-ebook.com/
ScanTailor: http://scantailor.sourceforge.net/
gscan2pdf: http://sourceforge.net/projects/gscan2pdf/
Canon Hack Development Kit firmware: http://chdk.wikia.com
Tesseract: http://code.google.com/p/tesseract-ocr/
Python script of Hacker Space Bruxelles scanner:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD


 
