Medak, Sekulic & Mertens
Book Scanning and Post-Processing Manual Based on Public Library Overhead Scanner v1.2
2014


PUBLIC LIBRARY
&
MULTIMEDIA INSTITUTE

BOOK SCANNING & POST-PROCESSING MANUAL
BASED ON PUBLIC LIBRARY OVERHEAD SCANNER

Written by:
Tomislav Medak
Dubravka Sekulić
With the help of:
An Mertens

Creative Commons Attribution - Share-Alike 3.0 Germany

TABLE OF CONTENTS

Introduction
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
Quick workflow reference for scanning and post-processing
References

INTRODUCTION:
BOOK SCANNING - FROM PAPER BOOK TO E-BOOK
Initial considerations when deciding on a scanning setup
Book scanning tends to be a fragile and demanding process. Many factors can go wrong or produce results of varying quality from book to book or page to page, requiring experience or technical skill to resolve the issues that occur. Cameras can fail to trigger, components can fail to communicate, files can get corrupted in the transfer, the storage card doesn't get purged, focus fails to lock, lighting conditions change. There is a trade-off between automation, which is prone to instability, and robustness, which is prone to become time consuming.
Your initial choice of book scanning setup will have to take these trade-offs into consideration. If your scanning community is confined to your hacklab, you won't be risking much if technological sophistication and integration fail to function smoothly. But if you're aiming at a broad community of users, with varying levels of technological skill and patience, you want to create as much time-saving automation as possible, on the condition of keeping maximum stability. Furthermore, if the time that individual members of your scanning community can contribute is limited, you might also want to divide some of the tasks between users according to their different skill levels.
This manual breaks down the process of digitization into a general description of the steps in the workflow leading from the printed book to a digital e-book, each of which can be addressed in various manners in a concrete situation, depending on the scanning equipment, software, hacking skills and user skill level available to your book scanning project. Several of those steps can be handled by a single piece of equipment or software, or you might need to use a number of them - your mileage will vary. Therefore, the manual will try to indicate the design choices you have in the process of planning your workflow and should help you decide which design is best for your situation.
Introducing book scanner designs
Book scanning starts with the capture of digital image files on the scanning equipment. There are three principal types of book scanner designs:
- flatbed scanner
- single camera overhead scanner
- dual camera overhead scanner
Conventional flatbed scanners are widely available. However, given that they require the book to be spread wide open and pressed down with the platen in order to break the resistance of the binding and sufficiently expose the inner margin of the text, this is the most destructive approach for the book, as well as imprecise and slow.
Therefore, book scanning projects across the globe have taken to custom designing improvised
setups or scanner rigs that are less destructive and better suited for fast turning and capturing of
pages. Designs abound. Most include:
- one or two digital photo cameras of lesser or higher quality to capture the pages,
- a transparent V-shaped glass or Plexiglas platen to press the open book against a V-shaped cradle, and
- a light source.

The go-to web resource to help you make an informed decision is the DIY book scanning
community at http://diybookscanner.org. A good place to start is their intro
(http://wiki.diybookscanner.org/) and scanner build list (http://wiki.diybookscanner.org/scanner-build-list).
Book scanners with a single camera are substantially cheaper, but come with the added difficulty of de-warping the page images, which are distorted due to the angle at which the pages are photographed and can sometimes be difficult to correct in the post-processing. Hence, in this introductory chapter we'll focus on two-camera designs, where the camera lens stands relatively parallel to the page. However, with a bit of adaptation these instructions can be used to work with any other setup.
The Public Library scanner
The focus of this manual is the scanner built for the Public Library project, designed by Voja Antonić (see Illustration 1). The Public Library scanner was built with immediate use by a wide community of users in mind. Hence, the principal consideration in designing the Public Library scanner was less sophistication and more robustness, ease of use and a distributed editing process.
The board designs can be found here: http://www.memoryoftheworld.org/blog/2012/10/28/our-beloved-bookscanner/. The current iteration uses two Canon 1100D cameras with the Canon EF-S 18-55mm 1:3.5-5.6 IS kit lens. The cameras are powered by automatic chargers.

Illustration 1: Public Library Scanner
The scanner operates by automatically lowering the Plexiglas platen, illuminating the page and then
triggering camera shutters. The turning of pages and the adjustments of the V-shaped cradle holding the book are manual.
The scanner is operated by a two-button controller (see Illustration 2). The upper, smaller button breaks the capture process into two steps: the first click lowers the platen, increases the light level and allows you to adjust the book or the cradle; the second click triggers the cameras and lifts the platen.
The lower button has two modes. A quick click will execute the whole capture process in one go. But if you hold it pressed longer, it will lower the platen, allowing you to adjust the book and the cradle, and lift it again without triggering the cameras when you press it a second time.

Illustration 2: A two-button controller

More on this manual: steps in the book scanning process
The book scanning process can in general be broken down into six steps, each of which will be dealt with in a separate chapter of this manual:
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
A step by step manual for Public Library scanner
This manual is primarily meant to provide a detailed description and step-by-step instructions for an actual book scanning setup -- based on Voja Antonić's scanner design described above. This is a two-camera overhead scanner, currently equipped with two Canon 1100D cameras with the EF-S 18-55mm 1:3.5-5.6 IS kit lens. It can scan books of up to A4 page size.
The post-processing in this setup is based on a semi-automated transfer of files to a GNU/Linux
personal computer and on the use of free software for image editing, optical character recognition
and finalization of an e-book file. It was initially developed for the HAIP festival in Ljubljana in
2011 and perfected later at MaMa in Zagreb and Leuphana University in Lüneburg.
The Public Library scanner is characterized by a somewhat less automated yet more distributed scanning process than the highly automated and sophisticated scanner hacks developed at various hacklabs. A
brief overview of one such scanner, developed at the Hacker Space Bruxelles, is also included in
this manual.
The Public Library scanning process thus proceeds in the following discrete steps:

1. creating digital images of pages of a book,
2. manual transfer of image files to the computer for post-processing,
3. automated renaming of files, ordering of even and odd pages, rotation of images and upload to a
cloud storage,
4. manual transformation of source images into .tiff files in ScanTailor,
5. manual optical character recognition and creation of PDF files in gscan2pdf.
The detailed description of the Public Library scanning process follows below.
The Bruxelles hacklab scanning process
For purposes of comparison, we'll briefly reference here the scanner built by the Bruxelles hacklab (http://hackerspace.be/ScanBot). It is a dual-camera design too. Aside from some differences in hardware functionality (the Bruxelles scanner turns pages automatically, whereas the Public Library scanner requires manual page turning), the fundamental difference between the two is in the post-processing - the level of automation in the transfer of images from the cameras and their transformation into PDF or DjVu e-book format.
The Bruxelles scanning process is different in so far as the cameras are operated by a computer and the images are
automatically transferred, ordered and made ready for further post-processing. The scanner is home-brew, but the
process is for advanced DIY'ers. If you want to know more on the design of the scanner, contact Michael Korntheuer at
contact@hackerspace.be.
The scanning and post-processing is automated by a single Python script that does all the work:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD
The scanner uses two Canon point and shoot cameras. Both cameras are connected to the PC with USB. They both run
PTP/CHDK (Canon Hack Development Kit). The scanning sequence is the following:
1. Script sends CHDK command line instructions to the cameras
2. Script sorts out the incoming files. This part is tricky. There is no reliable way to make a distinction between the left
and right camera, only between which camera was recognized by USB first. So the protocol is to always power up the
left camera first. See the instructions with the source code.
3. Collect images in a PDF file
4. Run script to OCR a .PDF file to plain .TXT file:
http://git.constantvzw.org/?p=algolit.git;a=blob;f=scanbot_brussel/ocr_pdf.sh;h=2c1f24f9afcce03520304215951c65f58c0b880c;hb=HEAD

I. PHOTOGRAPHING A PRINTED BOOK
Technologically the most demanding part of the scanning process is creating digital images of the
pages of a printed book. It's a process that is very different from scanner design to scanner design,
from camera to camera. Therefore, here we will focus strictly on the process with the Public Library
scanner.
Operating the Public Library scanner
0. Before you start:
Better and more consistent photographs lead to a more optimized and faster post-processing and a
higher quality of the resulting digital e-book. In order to guarantee the quality of images, before you
start it is necessary to set up the cameras properly and prepare the printed book for scanning.
a) Loosening the book
Depending on the type and quality of binding, some books tend to be too resistant to opening fully
to reveal the inner margin under the pressure of the scanner platen. It is thus necessary to “break in”
the book before starting in order to loosen the binding. The best way is to open it as wide as
possible in multiple places in the book. This can be done against the table edge if the book is more
rigid than usual. (Warning – “breaking in” might create irreversible creasing of the spine or lead to
some pages breaking loose.)
b) Switch on the scanner
You start the scanner by pressing the main switch or plugging the power cable into the scanner.
This will also turn on the overhead LED lights.

c) Setting up the cameras
Place the cameras onto tripods. You need to move the lever on the tripod's head to allow the tripod
plate screwed to the bottom of the camera to slide into its place. Secure the lock by turning the lever
all the way back.
If the automatic chargers for the cameras are provided, open the battery lid on the bottom of the camera and plug in the automatic charger. Close the lid.
Switch on the cameras using the lever on the top right side of the camera body and set them to the aperture priority (Av) mode using the mode dial above the lever (see Illustration 3). Use the main dial just above the shutter button on the front side of the camera to set the aperture value to F8.0.

Illustration 3: Mode and main dial, focus mode switch, zoom and focus ring
On the lens, turn the focus mode switch to manual (MF) and turn the large zoom ring to set the focal length exactly midway between 24 and 35 mm (see Illustration 3). Try to set both cameras identically.
To focus each camera, open a book on the cradle, lower the platen by holding the big button on the
controller, and turn on the live view on camera LCD by pressing the live view switch (see
Illustration 4). Now press the magnification button twice and use the focus ring on the front of the
lens to get a clear image view.

Illustration 4: Live view switch and magnification button

d) Connecting the cameras
Now connect the cameras to the remote shutter trigger cables that can be found lying on each side
of the scanner. They need to be plugged into a small round port hidden behind a protective rubber
cover on the left side of the cameras.
e) Placing the book into the cradle and double-checking the cameras
Open the book in the middle and place it on the cradle. Hold the large button on the controller pressed to lower the Plexiglas platen without triggering the cameras. Move the cradle so that the platen fits into the middle of the book.
Turn on the live view on the cameras' LCD to see if the pages fit into the image and if the cameras are positioned parallel to the page.
f) Double-check storage cards and batteries
It is important that the storage cards in both cameras are empty before starting the scanning, in order not to mess up the page sequence when merging photos from the left and the right camera in the post-processing. To double-check, press the play button on the cameras and, if there are any photos left from a previous scan, erase them by pressing the menu button, selecting the fifth menu from the left and then selecting 'Erase Images' -> 'All images on card' -> 'OK'.
If no automatic chargers are provided, double-check on the information screen that batteries are
charged. They should be fully charged before starting with the scanning of a new book.

g) Turn off the light in the room
Lighting conditions during scanning should be as constant as possible. To reduce glare and achieve maximum quality, remove any source of light that might reflect off the Plexiglas platen. Preferably turn off the light in the room or isolate the scanner with the black cloth provided.

1. Photographing a book
Now you are ready to start scanning. Place the book closed in the cradle and lower the platen by
holding the large button on the controller pressed (see Illustration 2). Adjust the position of the
cradle and lift the platen by pressing the large button again.
To scan, you can now either use the small button on the controller to lower the platen, adjust the book, and then press it again to trigger the cameras and lift the platen; or you can simply make a short press on the large button to do it all in one go.
ATTENTION: When the cameras are triggered, the shutter sound has to be heard coming
from both cameras. If one camera is not working, it's best to reconnect both cameras (see
Section 0), make sure the batteries are charged or adapters are connected, erase all images
and restart.
A mistake made in the photographing requires a lot of work in the post-processing, so it's
much quicker to repeat the photographing process.
If you make a mistake while flipping pages, or any other mistake, go back and scan again from the page you missed or incorrectly scanned. Note down the page where the error occurred; the redundant images will be removed in the post-processing.
ADVICE: The scanner has a digital counter. By turning the dial forward and backward, you
can set it to tell you what page you should be scanning next. This should help you avoid
missing a page due to a distraction.
While scanning, move the cradle a bit to the left from time to time, making sure that the tip of the V-shaped platen is aligned with the center of the book and the inner margin is exposed enough.

II. GETTING THE IMAGE FILES READY FOR POST-PROCESSING
Once the book pages have been photographed, they have to be transferred to the computer and
prepared for post-processing. With two-camera scanners, the capturing process will result in two
separate sets of images -- odd and even pages -- coming from the left and right cameras respectively
-- and you will need to rename and reorder them accordingly, rotate them into a vertical position
and collate them into a single sequence of files.
a) Transferring image files
For the transfer of files your principal process design choices are either to copy the files by removing the memory cards from the cameras and copying them to the computer via a card reader, or to transfer them via a USB cable. The latter process can be automated by operating your cameras remotely from a computer; however, this can be done only with a certain number of Canon cameras (http://bit.ly/16xhJ6b) that can be hacked to run the open Canon Hack Development Kit firmware (http://chdk.wikia.com).
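If you go the USB route, the copying itself (not the remote triggering) can often be scripted. As a minimal sketch, not part of the workflow described in this manual: on GNU/Linux the gphoto2 command-line tool can pull all images off a connected camera, assuming your camera model is supported by it.

    # Download all image files from the USB-connected camera
    # into the current directory (requires the gphoto2 package)
    gphoto2 --get-all-files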
After transferring the files, you want to erase all the image files on the camera memory card, so that
they would not end up messing up the scan of the next book.
b) Renaming image files
As the left and right cameras are typically operated in sync, the photographing process results in two separate sets of images, with even and odd pages respectively, that have completely different file names and potentially the same time stamps. So before you collate the page images in the order in which they appear in the book, you want to rename the files so that the first image comes from the right camera, the second from the left camera, the third again from the right camera and so on.
You probably want to do a batch renaming, where your right camera files start with n and are offset
by an increment of 2 (e.g. page_0000.jpg, page_0002.jpg,...) and your left camera files start with
n+1 and are also offset by an increment of 2 (e.g. page_0001.jpg, page_0003.jpg,...).
Batch renaming can be completed either from your file manager, on the command line or with a number of GUI applications (e.g. GPrename, rename, cuteRenamer on GNU/Linux).
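For illustration, here is a minimal shell sketch of that renaming scheme. It assumes the right-camera files have been copied into a folder named right/ and the left-camera files into left/, and that the cameras produce .JPG files; all of these names are assumptions to adapt to your own setup.

    # Rename right-camera images to page_0000.jpg, page_0002.jpg, ...
    n=0
    for f in right/*.JPG; do
      mv "$f" "right/$(printf 'page_%04d.jpg' "$n")"
      n=$((n + 2))
    done
    # Rename left-camera images to page_0001.jpg, page_0003.jpg, ...
    n=1
    for f in left/*.JPG; do
      mv "$f" "left/$(printf 'page_%04d.jpg' "$n")"
      n=$((n + 2))
    done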
c) Rotating image files
Before you collate the renamed files, you might want to rotate them. This step can also be done later in the post-processing (see below), but if you are automating or scripting your steps this is a practical place to do it. The images leaving your cameras will be positioned horizontally. In order to position them vertically, the images from the camera on the right will have to be rotated by 90 degrees counter-clockwise, and the images from the camera on the left will have to be rotated by 90 degrees clockwise.
Batch rotating can be completed in a number of photo-processing tools, on the command line or with dedicated applications (e.g. Fstop, ImageMagick, Nautilus Image Converter on GNU/Linux).
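Sticking with the folder names assumed in the renaming sketch above, a command-line sketch using ImageMagick's mogrify could look like this (mogrify overwrites the files in place, so keep a backup if in doubt):

    # Rotate right-camera images 90 degrees counter-clockwise
    mogrify -rotate -90 right/*.jpg
    # Rotate left-camera images 90 degrees clockwise
    mogrify -rotate 90 left/*.jpg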
d) Collating images into a single batch
Once you're done with the renaming and rotating of the files, you want to collate them into the same
folder for easier manipulation later.
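Continuing the same assumed folder layout, collating then amounts to moving everything into one folder:

    # Gather all renamed and rotated pages into a single folder
    mkdir -p collated
    mv right/*.jpg left/*.jpg collated/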

Getting the image files ready for post-processing on the Public Library scanner
In the case of Public Library scanner, a custom C++ script was written by Mislav Stublić to
facilitate the transfer, renaming, rotating and collating of the images from the two cameras.
The script prompts the user to place into the card reader the memory card from the right camera
first, gives a preview of the first and last four images and provides an entry field to create a subfolder in a local cloud storage folder (path: /home/user/Copy).
It transfers, renames and rotates the files, deletes them from the card, and prompts the user to replace the card with the one from the left camera in order to transfer the files from there and place them in the same folder. The script was created for GNU/Linux systems and it can be downloaded, together
with its source code, from: https://copy.com/nLSzflBnjoEB
If you have cameras other than Canon, you can edit line 387 of the source file to change the naming convention to that of your cameras, and recompile by running the following command in your terminal: "gcc scanflow.c -o scanflow -ludev `pkg-config --cflags --libs gtk+-2.0`"
In the case of the Hacker Space Bruxelles scanner, this is handled by the same script that operates the cameras, which can be downloaded from:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD

III. TRANSFORMATION OF SOURCE IMAGES INTO .TIFFS
Images transferred from the cameras are high-definition, full-color images. You want your cameras to shoot at the largest possible .jpg resolution in order for the resulting files to have at least 300 dpi (A4 at 300 dpi requires a 9.5 megapixel image). In the post-processing the size of the image files needs to be reduced radically, so that several hundred images can be merged into an e-book file of a tolerable size.
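As a rough sanity check of that figure (an approximation only, since the camera frame also captures some area around the page that is later cropped away):

    A4 page: 210 mm x 297 mm ≈ 8.27 in x 11.69 in
    at 300 dpi: 8.27 x 300 ≈ 2480 px and 11.69 x 300 ≈ 3508 px
    2480 x 3508 ≈ 8.7 million pixels for the page area alone

so once the surroundings of the page are included in the frame, the capture needs to land in the 9-10 megapixel range.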
Hence, the first step in the post-processing is to crop the images from cameras only to the content of
the pages. The surroundings around the book that were captured in the photograph and the white
margins of the page will be cropped away, while the printed text will be transformed into black
letters on white background. The illustrations, however, will need to be preserved in their color or
grayscale form, and mixed with the black and white text. What were initially large .jpg files will
now become relatively small .tiff files that are ready for optical character recognition process
(OCR).
These tasks can be completed by a number of software applications. Our manual will focus on one
that can be used across all major operating systems -- ScanTailor. ScanTailor can be downloaded
from: http://scantailor.sourceforge.net/. A more detailed video tutorial of ScanTailor can be found
here: http://vimeo.com/12524529.
ScanTailor: from a photograph of a page to a graphic file ready for OCR
Once you have transferred all the photos from the cameras to the computer, and renamed and rotated them, they are ready to be processed in ScanTailor.
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred and renamed photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder inside
the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click 'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600', click
'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- Rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other page’
followed by 'OK'
- Rotate the following photo clockwise, applying the same procedure like in the previous step
2.2 Deleting redundant photographs/pages
- Remove redundant pages (photographs of the empty cradle at the beginning and the end of the
book scanning sequence; book cover pages if you don’t want them in the final scan; duplicate pages
etc.) by right-clicking on a thumbnail of that page in the preview column on the right side, selecting
‘Remove from project’ and confirming by clicking on ‘Remove’.

# If you accidentally remove the wrong page, you can re-insert it by right-clicking on a page before/after the missing page in the sequence, selecting 'insert after/before' (depending on which page you selected) and choosing the file from the list. Before you finish adding, it is necessary to go through the procedure of fixing DPI and rotating again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and insert them
manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
Steps ‘Split pages’ and ‘Deskew’ should work automatically. Run them by clicking the ‘Play’ button
under the 'Select content' function. This will do the three steps automatically: splitting of pages,
deskewing and selection of content. After this you can manually re-adjust splitting of pages and deskewing.
4) Selecting content
Step ‘Select content’ works automatically as well, but it is important to revise the resulting selection
manually page by page to make sure the entire content is selected on each page (including the
header and page number). Where necessary, use your pointer device to adjust the content selection.
If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected split area. If
the page is skewed, go back to 'Deskew' and adjust the skew of the page. After this go back to
'Select content' and readjust the selection if necessary.
This is the step where you do visual control of each page. Make sure all pages are there and
selections are as equal in size as possible.
At the bottom of the thumbnail column there is a sort option that can automatically arrange pages by the height and width of the selected content, making the process of manual selection easier. Extreme differences in height should be avoided; try to make the selected areas as equal as possible, particularly in height, across all pages. The exception should be the cover and back pages, where we advise selecting the full page.
5) Adjusting margins
For best results, select the content of the full cover and back pages in the previous step. Now go to the 'Margins' step and, under the Margins section, set Top, Bottom, Left and Right all to 0.0 and do 'Apply to...' → 'All pages'.
In the Alignment section leave 'Match size with other pages' ticked, choose the central positioning of the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
Now go to the 'Output' step. Ignore the 'Output Resolution' section.
Next review two consecutive pages from the middle of the book to see if the scanned text is too
faint or too dark. If the text seems too faint or too dark, use the Thinner – Thicker slider to adjust it. Do 'Apply to' → 'All pages'.
Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White Margins'.
Do the same for the back page.
If there are any pages with illustrations, you can choose the 'Mixed' mode for those pages and then adjust the zones of the illustrations under the 'Picture Zones' tab.
Now you are ready to output the files. Just press the 'Play' button under 'Output'. Once the computer is finished processing the images, just do 'File' → 'Save as' and save the project.

IV. OPTICAL CHARACTER RECOGNITION
Before the edited-down graphic files are finalized as an e-book, we want to transform the image of
the text into an actual text that can be searched, highlighted, copied and transformed. That
functionality is provided by Optical Character Recognition. This is a technically difficult task - dependent on language, script, typeface and quality of print - and there aren't that many OCR tools that are good at it. There is, however, a relatively good free software solution - Tesseract (http://code.google.com/p/tesseract-ocr/) - that has solid performance and good language data, and can be trained for even better performance, although it has its problems. Proprietary solutions (e.g. Abbyy FineReader) sometimes provide superior results.
Tesseract primarily supports .tiff files as its input format. It produces a plain text file that can be, with the help of other tools, embedded as a separate layer under the original graphic image of the text in a PDF file.
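As a minimal command-line sketch of running Tesseract over ScanTailor's output folder (the folder name out/ and the language code eng are assumptions, adjust them to your project):

    # Run Tesseract on every .tif in the out/ folder;
    # each page_xxxx.tif produces a page_xxxx.txt next to it
    for f in out/*.tif; do
      tesseract "$f" "${f%.tif}" -l eng
    done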
With the help of other tools, OCR can also be performed against other input files, such as graphic-only PDF files. This produces inferior results, depending again on the quality of the graphic files and the reproduction of text in them. One such tool is a bash script to OCR a PDF file that can be found here: https://github.com/andrecastro0o/ocr/blob/master/ocr.sh
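Not the linked script itself, but a rough sketch of the same idea, assuming the poppler-utils package (which provides pdftoppm) is installed:

    # Render the PDF into 300 dpi .tif pages, then OCR each page
    pdftoppm -r 300 -tiff input.pdf page
    for f in page-*.tif; do
      tesseract "$f" "${f%.tif}" -l eng
    done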
As mentioned in the 'before scanning' section, the quality of the original book will influence the
quality of the scan and thus the quality of the OCR. For a comparison, have a look here:
http://www.paramoulipist.be/?p=1303
Once you have your .txt file, there is still some work to be done. Because OCR has difficulty interpreting particular elements of the layout and fonts, the TXT file comes with a lot of errors.
Recurrent problems are:
- combinations of specific letters in some fonts (it can mistake 'm' for 'n' or 'I' for 'i' etc.);
- headers become part of body text;
- footnotes are placed inside the body text;
- page numbers are not recognized as such.

V. CREATING A FINALIZED E-BOOK FILE
After the optical character recognition has been completed, the resulting text can be merged with
the images of pages and output into an e-book format. While proper e-book file formats such as ePub have increasingly been gaining ground, PDF still remains popular because many people tend to read on their computers, and it retains the original layout of the printed book, including the absolute pagination needed for referencing in citations. DjVu is also an option, as an alternative to PDF, used because of its purported superiority, but it is far less popular.
The export to PDF can be done again with a number of tools. In our case we'll complete the optical
character recognition and PDF export in gscan2pdf. Again, the proprietary Abbyy FineReader will
produce a bit smaller PDFs.
If you prefer to use an e-book format that works better with e-book readers, obviously you will have
to remove some of the elements that appear in the book - headers, footers, footnotes and pagination.

This can be done earlier in the process, when cropping down the original .jpg image files (see under III), or later by transforming the PDF files. The latter can be done in Calibre (http://calibre-ebook.com) by converting the PDF into an ePub, where it can be further tweaked to better accommodate or remove the headers, footers, footnotes and pagination.
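As a quick sketch of that Calibre route on the command line (the ebook-convert tool ships with Calibre; the file names are just examples):

    # Convert the finalized PDF into an ePub better suited to e-book readers
    ebook-convert book.pdf book.epub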
Optical character recognition and PDF export in Public Library workflow
Optical character recognition with the Tesseract engine can be performed on GNU/Linux by a
number of command line and GUI tools. Many of these tools also exist for other operating systems.
For the users of the Public Library workflow, we recommend using gscan2pdf application both for
the optical character recognition and the PDF or DjVu export.
To do so, start gscan2pdf and open your .tiff files. To OCR them, go to 'Tools' and select 'OCR'. In
the dialog box select the Tesseract engine and your language, and click 'Start OCR'. Once the OCR is
finished, export the graphic files and the OCR text to PDF by selecting 'Save as'.
However, given that the proprietary solutions sometimes produce better results, these tasks can also be done, for instance, in Abbyy FineReader running on a Windows operating system inside VirtualBox. The prerequisite is that you have both Windows and Abbyy FineReader to install in VirtualBox. Once you've got both installed, you need to designate a shared folder in your VirtualBox and place the .tiff files there. You can now open them from Abbyy FineReader running in VirtualBox, OCR them and export them into a PDF.
To use Abbyy FineReader, transfer the output files in your 'out' folder to the shared folder of the VirtualBox. Then start VirtualBox, start the Windows image and, in Windows, start Abbyy FineReader. Open the files and let Abbyy FineReader process them. Once it's done, output the result into a PDF.

VI. CATALOGING AND SHARING THE E-BOOK
Your road from a book on paper to an e-book is complete. If you want to maintain your library you
can use Calibre, a free software tool for e-book library management. You can add the metadata to
your book using the existing catalogues or you can enter metadata manually.
Now you may want to distribute your book. If the work you've digitized is in the public domain
(https://en.wikipedia.org/wiki/Public_domain), you might consider contributing it to the Gutenberg project (http://www.gutenberg.org/wiki/Gutenberg:Volunteers'_FAQ#V.1._How_do_I_get_started_as_a_Project_Gutenberg_volunteer.3F), Wikibooks (https://en.wikibooks.org/wiki/Help:Contributing) or Archive.org.
If the work is still under copyright, you might explore a number of different options for sharing.

QUICK WORKFLOW REFERENCE FOR SCANNING AND
POST-PROCESSING ON PUBLIC LIBRARY SCANNER
I. PHOTOGRAPHING A PRINTED BOOK
0. Before you start:
- loosen the book binding by opening it wide on several places
- switch on the scanner
- set up the cameras:
- place cameras on tripods and fit them tightly
- plug in the automatic chargers into the battery slot and close the battery lid
- switch on the cameras
- switch the lens to Manual Focus mode
- switch the cameras to Av mode and set the aperture to 8.0
- turn the zoom ring to set the focal length exactly midway between 24mm and 35mm
- focus by turning on the live view, pressing magnification button twice and adjusting the
focus to get a clear view of the text
- connect the cameras to the scanner by plugging the remote trigger cable to a port behind a
protective rubber cover on the left side of the cameras
- place the book into the cradle
- double-check storage cards and batteries
- press the play button on the back of the camera to double-check if there are images on the
camera - if there are, delete all the images from the camera menu
- if using batteries, double-check that batteries are fully charged
- switch off the light in the room that could reflect off the platen and cover the scanner with the
black cloth
1. Photographing
- now you can start scanning either by pressing the smaller button on the controller once to
lower the platen and adjust the book, and then press again to increase the light intensity, trigger the
cameras and lift the platen; or by pressing the large button completing the entire sequence in one
go;
- ATTENTION: Shutter sound should be coming from both cameras - if one camera is not
working, it's best to reconnect both cameras, make sure the batteries are charged or adapters
are connected, erase all images and restart.
- ADVICE: The scanner has a digital counter. By turning the dial forward and backward,
you can set it to tell you what page you should be scanning next. This should help you to
avoid missing a page due to a distraction.

II. Getting the image files ready for post-processing
- after finishing with scanning a book, transfer the files to the post-processing computer
and purge the memory cards
- if transferring the files manually:
- create two separate folders,
- transfer the files from the folders with image files on cards, using a batch
renaming software rename the files from the right camera following the convention
page_0001.jpg, page_0003.jpg, page_0005.jpg... -- and the files from the left camera
following the convention page_0002.jpg, page_0004.jpg, page_0006.jpg...
- collate image files into a single folder
- before ejecting each card, delete all the photo files on the card
- if using the scanflow script:
- start the script on the computer
- place the card from the right camera into the card reader
- enter the name of the destination folder following the convention
"Name_Surname_Title_of_the_Book" and transfer the files
- repeat with the other card
- script will automatically transfer the files, rename, rotate, collate them in proper
order and delete them from the card
III. Transformation of source images into .tiffs
ScanTailor: from a photograph of page to a graphic file ready for OCR
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder
inside the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click
'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600',
click 'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other
page’ followed by 'OK'
- rotate the following photo clockwise, applying the same procedure like in the previous
step

2.2 Deleting redundant photographs/pages
- remove redundant pages (photographs of the empty cradle at the beginning and the end;
book cover pages if you don’t want them in the final scan; duplicate pages etc.) by rightclicking on a thumbnail of that page in the preview column on the right, selecting ‘Remove
from project’ and confirming by clicking on ‘Remove’.
# If you accidentally remove the wrong page, you can re-insert it by right-clicking on a page before/after the missing page in the sequence, selecting 'insert after/before' and choosing the file from the list. Before you finish adding, it is necessary to go through the procedure of fixing DPI and rotating again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and
insert them manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
- Functions ‘Split Pages’ and ‘Deskew’ should work automatically. Run them by
clicking the ‘Play’ button under the 'Select content' step. This will do the three steps
automatically: splitting of pages, deskewing and selection of content. After this you can
manually re-adjust splitting of pages and de-skewing.

4) Selecting content and adjusting margins
- Step ‘Select content’ works automatically as well, but it is important to revise the
resulting selection manually page by page to make sure the entire content is selected on
each page (including the header and page number). Where necessary use your pointer device
to adjust the content selection.
- If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected
split area. If the page is skewed, go back to 'Deskew' and adjust the skew of the page. After
this go back to 'Select content' and readjust the selection if necessary.
- This is the step where you do visual control of each page. Make sure all pages are there
and selections are as equal in size as possible.
- At the bottom of thumbnail column there is a sort option that can automatically arrange
pages by the height and width of the selected content, making the process of manual
selection easier. The extreme differences in height should be avoided, try to make
selected areas as much as possible equal, particularly in height, across all pages. The
exception should be cover and back pages where we advise to select the full page.

5) Adjusting margins
- Now go to the 'Margins' step and, under the Margins section, set Top, Bottom, Left and Right all to 0.0 and do 'Apply to...' → 'All pages'.
- In the Alignment section leave 'Match size with other pages' ticked, choose the central positioning of the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
- Now go to the 'Output' step.
- Review two consecutive pages from the middle of the book to see if the scanned text is
too faint or too dark. If the text seems too faint or too dark, use the Thinner – Thicker slider to adjust it. Do 'Apply to' → 'All pages'.
- Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White
Margins'. Do the same for the back page.
- If there are any pages with illustrations, you can choose the 'Mixed' mode for those pages and then adjust the zones of the illustrations under the 'Picture Zones' tab.
- To output the files press 'Play' button under 'Output'. Save the project.
IV. Optical character recognition & V. Creating a finalized e-book file
If using all free software:
1) open gscan2pdf (if not already installed on your machine, install gscan2pdf from the
repositories, Tesseract and data for your language from https://code.google.com/p/tesseract-ocr/)
- point gscan2pdf to open your .tiff files
- for Optical Character Recognition, select 'OCR' under the drop down menu 'Tools',
select the Tesseract engine and your language, start the process
- once OCR is finished and to output to a PDF, go under 'File' and select 'Save', edit the
metadata and select the format, save
If using non-free software:
2) open Abbyy FineReader in VirtualBox (note: only Abbyy FineReader 10 installs and works - with some limitations - under GNU/Linux)
- transfer files in the 'out' folder to the folder shared with the VirtualBox
- point it to the readied .tiff files and it will complete the OCR
- save the file

REFERENCES
For more information on the book scanning process in general and making your own book scanner
please visit:
DIY Book Scanner: http://diybookscanner.org
Hacker Space Bruxelles scanner: http://hackerspace.be/ScanBot
Public Library scanner: http://www.memoryoftheworld.org/blog/2012/10/28/our-beloved-bookscanner/
Other scanner builds: http://wiki.diybookscanner.org/scanner-build-list
For more information on automation:
Konrad Voelkel's post-processing script (From Scan to PDF/A):
http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/
Johannes Baiter's automation of scanning to PDF process: http://spreads.readthedocs.org
For more information on applications and tools:
Calibre e-book library management application: http://calibre-ebook.com/
ScanTailor: http://scantailor.sourceforge.net/
gscan2pdf: http://sourceforge.net/projects/gscan2pdf/
Canon Hack Development Kit firmware: http://chdk.wikia.com
Tesseract: http://code.google.com/p/tesseract-ocr/
Python script of Hacker Space Bruxelles scanner:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD


Sollfrank, Francke & Weinmayr
Piracy Project
2013


Giving What You Don't Have

Andrea Francke, Eva Weinmayr
Piracy Project

Birmingham, 6 December 2013

[00:12]
Eva Weinmayr: When we talk about the word piracy, it causes a lot of problems
to quite a few institutions to deal with it. So events that we’ve organised
have been announced by Central Saint Martins without using the word piracy.
That’s interesting, the problems it still causes…

Cornelia Sollfrank: And how do you announce the project without “Piracy”? The
Project?

E. W.: It’s a project about intellectual property.

C. S.: The P Project.

Andrea Francke, Eva Weinmayr: [laugh] Yes.

[00:52]
Andrea Francke: The Piracy Project is a knowledge platform, and it is based
around a collection of pirated books, of books that have been copied by
people. And we use it to raise discussion about originality, authorship,
intellectual property questions, and to produce new material, new essays and
new questions.

[01:12]
E. W.: So the Piracy Project includes several aspects. One is that it is an
act of piracy in itself, because it is located in an art school, in a library,
in an officially built up collection of pirated books. [01:30] So that’s the
second aspect, it’s a collection of books which have been copied,
appropriated, modified, improved, which live in this library. [01:40] And the
third part is that it is a collection of physical books, which is touring. We
create reading rooms and invite people to explore the books and discuss issues
raised by cultural piracy.
[01:58] The Piracy Project started in an art college library, which was
supposed to be closed down. And the Piracy Project is one project of And
Publishing. And Publishing is a publishing activity exploring print-on-demand
and new modes of production and of dissemination, the immediacy of
dissemination. [02:20] And Publishing is a collaboration between myself and
Lynn Harris, and we were hosted by Central Saint Martins College of Art and
Design in London. And the campus where this library was situated was the
campus we were working at. [02:40] So when the library was being closed, we
moved in the library together with other members of staff, and kept the
library open in a self-organised way. But we were aware that there’s no budget
to buy new books, and we wanted to have this as a lively space, so we created
an open call for submissions and we asked people to select a book which is
really important to them and make a copy of it. [03:09] So we weren’t
interested in piling up a collection of second hand books, we were really
interested in this process: what happens when you make a copy of a book, and
how does this copy sit next to the original authoritative copy of the book.
This is how it started.

[03:31]
A. F.: I met Eva at the moment when And Publishing was helping to set up this
new space in the library, and they were trying to think how to make the
library more alive inside that university. [03:44] And I was doing research on
Peruvian book piracy at that time, and I had found this book that was modified
and was in circulation. And it was a very exciting moment for us to think what
happens if we can promote this type of production inside this academic
library.

[04:05] Piracy Project
Collection / Reading Room / Research

[04:11]
The Collection

[04:15]
E. W.: We asked people to make a copy of a book which is important to them and
send it to us, and so with these submissions we started to build up the
collection. Lots of students were getting involved, but also lots of people
who work in this topic, and were interested in these topics. [04:38] So we
received about one hundred books in a couple of months. And then, parallel to
this, we started to do research ourselves. [04:50] We had a residency in
China, so we went to China, to Beijing and Shanghai, to meet illegal
booksellers of pirated architecture books. And we had a residency in Turkey,
in Istanbul, where we did lots of interviews with publishers and artists on
book piracy. [05:09] So the collection is a mix of our own research and cases
from the real book markets, and creative work, artistic work which is produced
in the context of an art college and the wider cultural realm.

[05:29]
A. F.: And it is an ongoing project.

E. W.: The project is ongoing, we still receive submissions. The collection is
growing, and at the moment here we have about 180 books, here at Grand Union
(Birmingham).

[05:42]
A. F.: When we did the open call, something that was really important to us
was to make clear for people that they have a space of creativity when they
are making a copy. So we wrote, please send us a copy of a book, and be aware
that things happen when you copy a book. [05:57] Whether you do it
intentionally or not, a copy is never the same. So you can use that space, take
ownership of that space and make something out of that; or you can take a step
back and allow things to happen without having control. And I think that is
something that is quite important for us in the project. [06:12] And it is
really interesting how people have embraced that in different measures, like
subtle things, or material things, or adding text, taking text out, mixing
things, judging things. Sometimes just saying, I just want it to circulate, I
don’t mind what happens in the space, I just want the subject to be in the
world again.

[06:35]
E. W.: I think this is one which I find interesting in terms of making a copy,
because it’s not so much about my own creativity, it’s more about exploring
how technology edits what you can see. It’s Jan van Toorn’s Critical Practice,
and the artist is Hester Barnard, a Canadian artist. [07:02] She sent us these
three copies, and we thought, that’s really generous, three copies. But they
are not identical copies, they are very different. Some have a lot of empty
pages in the book. And this book has been screen-captured on a 3.5 inch
iPhone, whereas this book has been screen-captured on a desktop, and this one
has been screen-captured with a laptop. [07:37] So the device you use to
access information online determines what you actually receive. And I find
this really interesting, that she translated this back into a hardcopy, the
online edited material. [07:53] And this is kind of taught by this book,
standard International Copyright. She went to Google Books, and screen-
captured all the pages Google Books are showing. So we are all familiar with
blurry text pages, but then it starts that you get the message “Page 38 is not
shown in this preview.” [08:18] And then it’s going through the whole book, so
she printed every page basically, omitting the actual information. But the
interesting thing is that we are all aware that this is happening on Google,
on screen online, but the fact that she’s translating this back into an
object, into a printed book, is interesting.

[08:44]
Reading Room

[08:48]
A. F.: We create these reading rooms with the collection as a way to tour the
collection, and meet people and have conversations around the books. And that
is something quite important to us, that we go with the physical books to a
place, either for two or three months, and meet different people that have
different interests in relation to the collection in that locality. We’ve been
doing that for the last two years, I think, three years. [09:12] And it’s
quite interesting because different places have very different experiences of
piracy. So you can go to a country where piracy is something very common, or a
different place where people have a very strong position against piracy, or a
different legal framework. And I feel the type of conversations and the
quality of interactions is quite different from being present in the space and
with the books. [09:36] And that’s why we don’t call these exhibitions,
because we always have places where people can come and they can stay, and
they can come again. Sometimes people come three or four times and they
actually read the books. And a few times they go back to their houses and they
bring books back, and they said, I’m going to contact this friend who has been
to Russia and he told me about this book – so we can add it to the collection.
I think that makes a big difference to how the research in the project
functions.

[10:06]
E. W.: One of the most interesting events we did with the Piracy collection
was at the Show Room where we had a residency for the last year. There were
three events, and one was A Day At The Courtroom. This was an afternoon where
we invited three copyright lawyers coming from different legal systems: the
US, the UK, and the Continental European, Athens. And we presented ten
selected cases from the collection and the three copyright lawyers had to
assess them in the eyes of the law, and they had to agree where to put this
book in a scale from legal to illegal. [10:51] So we weren’t interested really
to say, this is legal and this is illegal, we were interested in all the
shades in between. And then they had to discuss where they would place the
book. But then the audience had the last verdict, and then the audience placed
the book. [11:05] And this was an extremely interesting discussion, because it
was interesting to see how different the legal backgrounds are, how blurry the
whole field is, how you can assess when is the moment where a work becomes a
transformative work, or when it stays a derivative work, and this whole
discussion.
[11:30] When we do these reading rooms – and we had one in New York, for
example, at the New York Art Book Fair – people are coming, and they are
coming to see the physical books in a physical space, so this creates a social
encounter and we have these conversations. [11:47] For example, a woman stood
up to us in New York and she told us about a piracy project she ran where she
was working in a juvenile detention centre, and she produced a whole shadow
library of books because the incarcerated kids couldn’t take the books in
their cells, so she created these copies, individual chapters, and they could
circulate. [12:20] I’m telling this because the fact that we are having this
reading room and that we are meeting people, and that we are having these
conversations, really furthers our research. We find out about these projects
by sharing knowledge.

[12:38]
Categories

[12:42]
A. F.: Whenever we set our reading room for the Piracy Project we need to
organise the books in a certain way. What we started to do now is that we’ve
created these different categories, and the first set of categories came from
the legal event. [12:56] So we set up, we organised the books in different
categories that would help us have questions for the lawyers, that would work
for groups of books instead of individual works. [13:07] And the idea is that,
for example, we are going to have our next events with librarians, and a new
set of categories would come. So the categories change as our interest or
research in the project is changing. [13:21] The current categories are:
Pirated Design, so books where the look of the book has been copied but not
the content; recirculation, books that have been copied trying to be
reproduced exactly as they were, because they need to be circulating again;
transformation, books that have been modified; First Sale Doctrine, so we
receive quite a few books where people haven’t actually made a copy but they
have cut the book or drawn inside the book, and legally you are allowed to do
anything with a book except copy it, so we thought that it was quite important
so that we didn’t have to discuss that with the lawyers; [14:03] Public
Domain, which are works that are already out of copyright, again, so whatever
you do with those books is legal; and collation, books gathered from different
sources, and who owns the copyright, which was a really interesting question,
which is when you have a book that has many authors – it’s really interesting.
Different systems in different countries have different ways to deal with who
owns the copyright and what are the rights of the owners of the different
works.

[14:36]
E. W.: Ahmet Şık is a journalist who published a book about the Ergenekon
scandal and the Turkish government, and connects it to that kind of mafioso
structures. Before the book could be published he was arrested and put in jail
for a whole year without trial, and he sent the PDF to friends, and the PDF
was circulating on many different computers so it couldn’t be taken. [15:06]
They published the PDF, and as authors they put over a hundred different
author names, so there was not just one author who could be taken into
responsibility.

[15:22] We have in the collection this book, it’s Teignmouth Electron by
Tacita Dean. This is the original, it’s published by Book Works and Steidl.
And to this round table, to this event, we invited also Jane Rolo, director of
Book Works (and she published this book). [15:41] And we invited her saying,
do you know that your book has been pirated? So she was really interested and
she came along. This is the pirated version, it’s Alias, [by] Damián Ortega in
Mexico. It’s a series of books where he translates texts and theory into
Spanish, which are not available in Spanish. So it’s about access, it’s about
circulation. [16:07] But actually he redesigned the book. The pirated version
looks very different, and it has a small film roll here, from Tacita Dean’s
book. And it was really amazing that Jane Rolo flipped the pirated book and
she said, well, actually this is really very nice.

[16:31] This is kind of a standard academic publishing format, it’s Gilles
Deleuze’s Proust and Signs, and the contributor, the artist who produced the
book is Neil Chapman, a writer based in London. And he made a facsimile of his
copy of this book, including the binding mistakes – so there’s one chapter
upside down printed in the book. [17:04] But the really interesting thing is
that he scanned it on his home inkjet printer – he scanned it on his scanner
and then printed it on his home inkjet printer. And the feel of it is very
crafty, because the inkjet has a very different typographic appearance than
the official copy. [17:28] And this makes you read the book in quite a
different way, you relate differently to the actual text. So it’s not just
about the information conveyed on this page, it’s really about how I can
relate to it visually. I find this really interesting when we put this book
into the library, in our collection in the library, and it sat next to the
original, [17:54] it raises really interesting questions about what kind of
authority decides which book can access the library, because this is
definitely and obviously a self-made copy – so if this self-made copy can
enter the library, any self-made text and self-published copy could enter the
library. So it was raising really interesting questions about gatekeepers of
knowledge, and hierarchies and authorities.

[18:26]
On-line catalogue

[18:30]
E. W.: We created this online catalogue to give an overview of what we have in
the collection. We have a cover photograph and then we have a short text where
we try to frame and to describe the approach taken, like the strategy, what’s
been pirated and what was the strategy. [18:55] And this is quite a lot,
because it’s giving you the framework of it, the conceptual framework. But
it’s not giving you the book, and this is really important because lots of the
books couldn’t be digitised, because it’s exactly their material quality which
is important, and which makes the point. [19:17] So if I would… if I have a
project which is about mediation, and then I put another layer of
mediation on top of it by scanning it, it just wouldn’t work anymore.
[19:29] The purpose of the online catalogue isn’t to give you insight into all
the books to make actually all the information available, it’s more to talk
about the approach taken and the questions which are raised by this specific
book.

[19:47]
Cultures of the copy

[19:51]
A topic of cultural difference became really obvious when we went to Istanbul,
to a copy shop which had many academic titles on the shelves, copied, pirated
titles... The fact is that in London, where I’m based, you can access anything
in any library, and it’s not too expensive to get the original book. [20:27]
But in Istanbul it’s very expensive, and the whole academic community thrives
on pirated, copied academic titles.

[20:39]
A. F.: So this is the original Jaime Bayly [No se lo digas a nadie], and this
is the pirated copy of the Jaime Bayly. This book is from Peru, it was bought
on the street, on a street market. [20:53] And Peru has a very big pirated
book market, most books in Peru are pirated. And we found this because there
was a rumour that books in Peru had been modified, pirated books. And this
version, the pirated version, has two extra chapters that are not in the
original one. [21:13] It’s really hard to understand the motivation behind it.
There’s no credit, so the person is inhabiting this author’s identity in a
sense. They are not getting any cultural capital from it. They are not getting
extra money, because if they are found out, nobody would buy books from this
publisher anymore. [21:33] The chapters are really well written, so you as a
reader would not realise that you are reading something that has been pirated.
And that was really fascinating in terms of what space you create. So when you
have this technology that allows you to have the book open and print it so
easily – how can you take advantage of that, and take ownership or inhabit
these spaces that technology is opening up for you.

[22:01]
E. W.: Book piracy in China is really important when it comes to architecture
books, Western architecture books. Lots of architecture studios, but even
university libraries would buy from pirate book sellers, because it’s just so
much cheaper. [22:26] And we’ve found this Mark magazine with one of the
architecture sellers, and it’s supposed to be a bargain because you have six
magazines in one. [22:41] And we were really interested in the question, what
are the criteria for the editing? How do you edit six issues into one? But
basically everything is in here, from advertisements to text to images, it’s
all there. But then a really interesting question arises when it comes to
technology, because in this magazine there are pages in Italian,
clearly taken from other magazines.

[23:14]
A. F.: But it was also really interesting to go there, and actually interview
the distributor and go through the whole experience. We had to meet the
distributor in a neutral place, and he interviewed us to see if he was going
to allow us to go into the shop and buy his books. [23:31] And then going
through the catalogue and realising how Rem Koolhaas is really popular among
the pirates, but actually Chinese architecture is not popular, so there’s only
like three pirated books on Chinese architecture; or that from all the
architecture universities in the world only the AA books are copied – the
Architectural Association books. [23:51] And I think those small things are
really things that are worth spending time and reflecting on.

[23:58]
E. W.: We found this pirate copy of Tintin when we visited Beijing, and
obviously compared to the original, it looks different, a different format.
It’s also black and white, but it’s not a photocopy of the original full-
colour version. [24:23] It’s redrawn by hand, so all the drawings are redrawn and
obviously translated into Chinese. This is quite a labour of love, which is
really amazing. I can compare the two. The space is slightly differently
interpreted.

[24:50]
A. F.: And it’s really incredible, because at some point in China there were
14 or 15 different publishers publishing Tintin, and they all have their
versions. They are all hand-drawn by different people, so in the back, in
Chinese, there’s the credit. So you can buy it by deciding which person does the
best drawings of the production of Tintin, which I thought was really…
[25:14] It’s such a different cultural way to actually give credit to the
person that is copying it, and recognise the labour, and the intention and the
value of that work.

[25:24]
Why books?

[25:28]
E. W.: Books have always been very important in my practice, in my artistic
practice, because lots of my projects culminated in a book, or led into a
book. And publications are important because they can circulate freely, they
can circulate much easier than artworks in a gallery. [25:50] So this question
of how to make things public and how to create an audience… not how to create
an audience – how to reach a reader and how to create a dialogue. So the book
is the perfect tool for this.

[26:04]
A. F.: My interest in books comes from making art, or thinking about art as a
way to interact with the world, so outside art settings, and I found books
really interesting in that. And that’s how I met Eva, in a sense, because I
was interested in that part of her practice. [26:26] When I found the Jaime
Bayly book, for me that was a real moment of excitement, of this person that
was doing these things in the world without taking any credit, but was having
such a profound effect on so many readers. I’m quite fascinated by that.
[26:44] I'm also really interested in research and using events – research
that works with people. So it kind of creates communities around certain
subjects, and then it uses that to explore different issues and to interact
with different areas of knowledge. And I think books are a privileged space to
do that.

[27:11]
E. W.: The books in the Piracy collection, because they are objects you can
grab, and because they need a place, they are a really important tool to start
a dialogue. When we had this reading room in the New York Art Book Fair, it
was really the book that created this moment when you started a conversation
with somebody else. And I think this is a very important moment in the Piracy
collection as a tool to start this discussion. [27:44] In the Piracy
collection the books are not so important to circulate, because they don’t
circulate. They only travel with us, in a way, or they travel here to Grand
Union to be installed in this reading room. But they are not meant to be
printed in a print run of thousands and circulated in the world.

C. S.: So what is their function?

[28:08]
E. W.: The functions of the books here in the Piracy collection are to create
a dialogue, debate about these issues they are raising, and they are a tool
for a direct encounter, for a social encounter. As Andrea said, building a
community which is debating these issues which they are raising. [28:32] And I
also find it really interesting – when we were in China we also talked with
lots of publishers and artists, and they said that the book, in comparison to
an online file, is a really important tool in China, because it can’t be
controlled as easily as online communication. [28:53] So a book is an
autonomous object which can be passed on from one hand to the other, without
the state or another authority intervening. I think that is an important
aspect when you talk about books in comparison with circulating information
online.

[29:13]
Passion for piracy

[29:17]
A. F.: I’m quite interested in enclosures, and people that jump those
enclosures. I’m kind of interested in these imposed… Maybe because I come from
Peru and we have a different relation to rules, and I’m in Britain where rules
seem to have so much strength. And I’m quite interested in this agency of
taking personal responsibility and saying, I’m going to obey this rule, I’m
not going to obey this one, and what does that mean. [29:42] That makes me
really interested in all these different strategies, and also to find a way to
value them and show them – how when you make this decision to jump a rule, you
actually help bring up questions, modifications, and propose new models or new
ways of thinking about things. [30:02] And I think that is something that is part
of all the other projects that I do: stating the rules and the people that
break them.

[30:12]
E. W.: The pirate as a trickster who tries to push the boundaries which are
being set. And I think the interesting, or the complex part of the Piracy
Project is that we are not saying, I’m for piracy or I’m against piracy, I’m
for copyright, I’m against copyright. It’s really about testing out these
decisions and one’s own boundaries, the legal boundaries, the moral limits – to
push them and find them. [30:51] I mean, the Piracy Project as a whole is a
project which is pushing the boundaries because it started in this academic
library, and it’s assessed by copyright lawyers as illegal, so to run such a
project is an act of piracy in itself.

[31:17]
This method of doing or approaching this art project is to create a
collaboration to instigate this discourse, and this discourse is happening on
many different levels. One of them is conversation, debate. But the other one
is this material outcome, and then this material outcome is creating a new
debate.

Kelty, Bodo & Allen
Guerrilla Open Access
2018


Guerrilla Open Access

Christopher Kelty
Balazs Bodo
Laurie Allen

Edited by Memory of the World

Published by Post Office Press,
Rope Press and Memory of the
World. Coventry, 2018.
© Memory of the World, papers by
respective Authors.
Freely available at:
http://radicaloa.co.uk/conferences/ROA2
This is an open access pamphlet,
licensed under a Creative
Commons Attribution-ShareAlike
4.0 International (CC BY-SA 4.0)
license.
Read more about the license at:
https://creativecommons.org/licenses/by-sa/4.0/
Figures and other media included
with this pamphlet may be under
different copyright restrictions.
Design by: Mihai Toma, Nick White
and Sean Worley
Printed by: Rope Press,
Birmingham

This pamphlet is published in a series
of 7 as part of the Radical Open
Access II – The Ethics of Care
conference, which took place June
26-27 at Coventry University. More
information about this conference
and about the contributors to this
pamphlet can be found at:
http://radicaloa.co.uk/conferences/ROA2
This pamphlet was made possible due
to generous funding from the arts
and humanities research studio, The
Post Office, a project of Coventry
University’s Centre for Postdigital
Cultures and due to the combined
efforts of authors, editors, designers
and printers.

Table of Contents

Guerrilla Open Access:
Terms Of Struggle
Memory of the World

Recursive Publics and Open Access
Christopher Kelty

Own Nothing
Balazs Bodo

What if We Aren't the Only
Guerrillas Out There?
Laurie Allen

Guerrilla Open Access: Terms of Struggle

In the 1990s, the Internet offered a horizon from which to imagine what society
could become, promising autonomy and self-organization next to redistribution of
wealth and collectivized means of production. While the former was in line with the
dominant ideology of freedom, the latter ran contrary to the expanding enclosures
in capitalist globalization. This antagonism has led to epochal copyfights, where free
software and piracy kept the promise of radical commoning alive.
Free software, as Christopher Kelty writes in this pamphlet, provided a model ‘of a
shared, collective, process of making software, hardware and infrastructures that
cannot be appropriated by others’. Well into the 2000s, it served as an inspiration
for global free culture and open access movements who were speculating that
distributed infrastructures of knowledge production could be built, as the Internet
was, on top of free software.
For a moment, the hybrid world of ad-financed Internet giants—sharing code,
advocating open standards and interoperability—and users empowered by these
services, convinced almost everyone that a new reading/writing culture was
possible. Not long after the crash of 2008, these disruptors, now wary monopolists,
began to ingest smaller disruptors and close off their platforms. There was still
free software somewhere underneath, but without the ‘original sense of shared,
collective, process’. So, as Kelty suggests, it was hard to imagine that for-profit
academic publishers wouldn't try the same with open access.
Heeding Aaron Swartz’s call to civil disobedience, Guerrilla Open Access has
emerged out of the outrage over digitally-enabled enclosure of knowledge that
has allowed these for-profit academic publishers to appropriate extreme profits
that stand in stark contrast to the cuts, precarity, student debt and asymmetries
of access in education. Shadow libraries stood in for the access denied to public
libraries, drastically reducing global asymmetries in the process.

This radicalization of access has changed how publications
travel across time and space. Digital archiving, cataloging and
sharing is transforming what we once considered as private
libraries. Amateur librarianship is becoming public shadow
librarianship. Hybrid use, as poetically unpacked in Balazs
Bodo's reflection on his own personal library, is now entangling
print and digital in novel ways. And, as he warns, the terrain
of antagonism is shifting. While for-profit publishers are
seemingly conceding to Guerrilla Open Access, they are
opening new territories: platforms centralizing data, metrics
and workflows, subsuming academic autonomy into new
processes of value extraction.
The 2010s brought us hope and then the realization of how little
digital networks could help revolutionary movements. The
redistribution toward the wealthy, assisted by digitization, has
eroded institutions of solidarity. The embrace of privilege—
marked by misogyny, racism and xenophobia—that this has catalyzed
is nowhere more evident than in the climate denialism of the
Trump administration. Guerrilla archiving of US government
climate change datasets, as recounted by Laurie Allen,
indicates that more technological innovation simply won't do
away with the 'post-truth' and that our institutions might be in
need of revision, replacement and repair.
As the contributions to this pamphlet indicate, the terms
of struggle have shifted: not only do we have to continue
defending our shadow libraries, but we need to take back the
autonomy of knowledge production and rebuild institutional
grounds of solidarity.

Memory of the World
http://memoryoftheworld.org

Recursive Publics and Open Access

Christopher Kelty

Ten years ago, I published a book called Two Bits: The Cultural Significance of Free
Software (Kelty 2008).1 Duke University Press and my editor Ken Wissoker were
enthusiastically accommodating of my demands to make the book freely and openly
available. They also played along with my desire to release the 'source code' of the
book (i.e. HTML files of the chapters), and to compare the data on readers of the
open version to print customers. It was a moment of exploration for both scholarly
presses and for me. At the time, few authors were doing this other than Yochai Benkler
(2007) and Cory Doctorow2, both activists and advocates for free software and open
access (OA), much as I have been. We all shared, I think, a certain fanaticism of the
convert that came from recognizing free software as an historically new, and radically
different mode of organizing economic and political activity. Two Bits gave me a way
to talk not only about free software, but about OA and the politics of the university
(Kelty et al. 2008; Kelty 2014). Ten years later, I admit to a certain pessimism at the
way things have turned out. The promise of free software has foundered, though not
disappeared, and the question of what it means to achieve the goals of OA has been
swamped by concerns about costs, arcane details of repositories and versioning, and
ritual offerings to the metrics God.
When I wrote Two Bits, it was obvious to me that the collectives who built free
software were essential to the very structure and operation of a standardized
Internet. Today, free software and 'open source' refer to dramatically different
constellations of practice and people. Free software gathers around itself those
committed to the original sense of a shared, collective, process of making software,
hardware and infrastructures that cannot be appropriated by others. In political
terms, I have always identified free software with a very specific, updated, version
of classical Millian liberalism. It sustains a belief in the capacity for collective action
and rational thought as aids to establishing a flourishing human livelihood. Yet it
also preserves an outdated blind faith in the automatic functioning of meritorious
speech, that the best ideas will inevitably rise to the top. It is an updated classical
liberalism that saw in software and networks a new place to resist the tyranny of the
conventional and the taken for granted.

By contrast, open source has come to mean something quite different: an ecosystem
controlled by an oligopoly of firms which maintains a shared pool of components and
frameworks that lower the costs of education, training, and software creation in the
service of establishing winner-take-all platforms. These are built on open source, but
they do not carry the principles of freedom or openness all the way through to the
platforms themselves.3 What open source has become is now almost the opposite of
free software—it is authoritarian, plutocratic, and nepotistic, everything liberalism
wanted to resist. For example, precarious labor and platforms such as Uber or Task
Rabbit are built upon and rely on the fruits of the labor of 'open source', but the
platforms that result do not follow the same principles—they are not open or free
in any meaningful sense—to say nothing of the Uber drivers or task rabbits who live
by the platforms.
Does OA face the same problem? In part, my desire to 'free the source' of my book
grew out of the unfinished business of digitizing the scholarly record. It is an irony
that much of the work that went into designing the Internet at its outset in the
1980s, such as gopher, WAIS, and the HTML of CERN, was conducted in the name
of the digital transformation of the library. But by 2007, these aims were swamped
by attempts to transform the Internet into a giant factory of data extraction. Even
in 2006-7 it was clear that this unfinished business of digitizing the scholarly record
was going to become a problem—both because it was being overshadowed by other
concerns, and because of the danger it would eventually be subjected to the very
platformization underway in other realms.
Because if the platform capitalism of today has ended up being parasitic on the
free software that enabled it, then why would this not also be true of scholarship
more generally? Are we not witnessing a transition to a world where scholarship
is directed—in its very content and organization—towards the profitability of the
platforms that ostensibly serve it?4 Is it not possible that the platforms created to
'serve science'—Elsevier's increasing acquisition of tools to control the entire lifecycle of research, or ResearchGate's ambition to become the single source for all
academics to network and share research—that these platforms might actually end up
warping the very content of scholarly production in the service of their profitability?
To put this even more clearly: OA has come to exist and scholarship is more available
and more widely distributed than ever before. But, scholars now have less control,
and have taken less responsibility for the means of production of scientific research,
its circulation, and perhaps even the content of that science.

The Method of Modulation
When I wrote Two Bits I organized the argument around the idea of modulation:
free software is simply one assemblage of technologies, practices, and people
aimed at resolving certain problems regarding the relationship between knowledge
(or software tools related to knowledge) and power (Hacking 2004; Rabinow
2003). Free software as such was and still is changing as each of its elements
evolve or are recombined. Because OA derives some of its practices directly from
free software, it is possible to observe how these different elements have been
worked over in the recent past, as well as how new and surprising elements are
combined with OA to transform it. Looking back on the elements I identified as
central to free software, one can ask: how is OA different, and what new elements
are modulating it into something possibly unrecognizable?

Sharing source code
Shareable source code was a concrete and necessary achievement for free
software to be possible. Similarly, the necessary ability to circulate digital texts
is a significant achievement—but such texts are shareable in a much different way.
For source code, computable streams of text are everything—anything else is a
'blob' like an image, a video or any binary file. But scholarly texts are blobs: Word or
Portable Document Format (PDF) files. What's more, while software programmers
may love 'source code', academics generally hate it—anything less than the final,
typeset version is considered unfinished (see e.g. the endless disputes over
'author's final versions' plaguing OA).5 Finality is important. Modifiability of a text,
especially in the humanities and social sciences, is acceptable only when it is an
experiment of some kind.
In a sense, the source code of science is not a code at all, but a more abstract set
of relations between concepts, theories, tools, methods, and the disciplines and
networks of people who operate with them, critique them, extend them and try to
maintain control over them even as they are shared within these communities.

Defining openness
For free software to make sense as a solution, those involved first had to
characterize the problem it solved—and they did so by identifying a pathology in
the worlds of corporate capitalism and engineering in the 1980s: that computer
corporations were closed organizations who re-invented basic tools and
infrastructures in a race to dominate a market. An 'open system,' by contrast, would
avoid the waste of 'reinventing the wheel' and of pathological
competition, allowing instead modular, reusable parts that
could be modified and recombined to build better things in an
upward spiral of innovation. The 1980s ideas of modularity,
modifiability, abstraction barriers, interchangeable units
have been essential to the creation of digital infrastructures.
To propose an 'open science' thus modulates this definition—
and the idea works in some sciences better than others.
Aside from the obviously different commercial contexts,
philosophers and literary theorists just don't think about
openness this way—theories and arguments may be used
as building blocks, but they are not modular in quite the
same way. Only the free circulation of the work, whether
for recombination or for reference and critique, remains a
sine qua non of the theory of openness proposed there. It
is opposed to a system where it is explicit that only certain
people have access to the texts (whether that be through
limitations of secrecy, or limitations on intellectual property,
or an implicit elitism).

Writing and using copyright licenses
Of all the components of free software that I analyzed, this
is the one practice that remains the least transformed—OA
texts use the same CC licenses pioneered in 2001, which
were a direct descendant of free software licenses.
A novel modulation of these licenses is the OA policies (the
embrace of OA in Brazil for instance, or the spread of OA
policies starting with Harvard and the University of California,
and extending to the EU Mandate from 2008 forward). Today
the ability to control the circulation of a text with IP rights is
far less economically central to the strategies of publishers
than it was in 2007, even if they persist in attempting to do
so. At the same time, funders, states, and universities have all
adopted patchwork policies intended to both sustain green
OA, and push publishers to innovate their own business
models in gold and hybrid OA. While green OA is a significant
success on paper, the actual use of it to circulate work pales
in comparison to the commercial control of circulation on the
one hand, and the increasing success of shadow libraries on
the other. Repositories have sprung up in every shape and
form, but they remain largely ad hoc, poorly coordinated, and
underfunded solutions to the problem of OA.

Coordinating collaborations
The collective activity of free software is ultimately the
most significant of its achievements—marrying a form of
intensive small-scale interaction amongst programmers,
with sophisticated software for managing complex objects
(version control and GitHub-like sites). There has been
constant innovation in these tools for controlling, measuring,
testing, and maintaining software.
By contrast, the collective activity of scholarship is still
largely a pre-modern affair. It is coordinated largely by the
idea of 'writing an article together' and not by working
to maintain some larger map of what a research topic,
community, or discipline has explored—what has worked and
what has not.
This focus on the coordination of collaboration seemed to
me to be one of the key advantages of free software, but it
has turned out to be almost totally absent from the practice
or discussion of OA. Collaboration and the recombination of
elements of scholarly practice obviously happens, but it does
not depend on OA in any systematic way: there is only the
counterfactual that without it, many different kinds of people
are excluded from collaboration or even simple participation
in scholarship, something that most active scholars are
willfully ignorant of.

Fomenting a movement
I demoted the idea of a social movement to merely one
component of the success of free software, rather than let
it be—as most social scientists would have it—the principal
container for free software. They are not the whole story.

Is there an OA movement? Yes and no. Librarians remain
the most activist and organized. The handful of academics
who care about it have shifted to caring about it in primarily
a bureaucratic sense, forsaking the cross-organizational
aspects of a movement in favor of activism within universities
(to which I plead guilty). But this transformation forsakes
the need for addressing the collective, collaborative
responsibility for scholarship in favor of letting individual
academics, departments, and disciplines be the focus for
such debates.
By contrast, the publishing industry works with a
phantasmatic idea of both an OA 'movement' and of the actual
practices of scholarship—they too defer, in speech if not in
practice, to the academics themselves, but at the same time
must create tools, innovate processes, establish procedures,
acquire tools and companies and so on in an effort to capture
these phantasms and to prevent academics from collectively
doing so on their own.
And what new components? The five above were central to
free software, but OA has other components that are arguably
more important to its organization and transformation.

Money, i.e. library budgets
Central to almost all of the politics and debates about OA
is the political economy of publication. From the 'bundles'
debates of the 1990s to the gold/green debates of the 2010s,
the sole source of money for publication long ago shifted into
the library budget. The relationship that library budgets
have to other parts of the political economy of research
(funding for research itself, debates about tenured/non-tenured, adjunct and other temporary salary structures) has
shifted as a result of the demand for OA, leading libraries
to re-conceptualize themselves as potential publishers, and
publishers to re-conceptualize themselves as serving 'life
cycles' or 'pipeline' of research, not just its dissemination.

Metrics
More than anything, OA is promoted as a way to continue
to feed the metrics God. OA means more citations, more
easily computable data, and more visible uses and re-uses of
publications (as well as 'open data' itself, when conceived of
as product and not measure). The innovations in the world
of metrics—from the quiet expansion of the platforms of the
publishers, to the invention of 'alt metrics', to the enthusiasm
of 'open science' for metrics-driven scientific methods—form
a core feature of what 'OA' is today, in a way that was not true
of free software before it, where metrics concerning users,
downloads, commits, or lines of code were always after-the-fact
measures of quality, and not constitutive ones.
Other components of this sort might be proposed, but the
main point is to resist clutching OA as if it were the beating
heart of a social transformation in science, as if it were a
thing that must exist, rather than a configuration of elements
at a moment in time. OA was a solution—but it is too easy to
lose sight of the problem.

Open Access without Recursive Publics
When we no longer have any commons, but only platforms,
will we still have knowledge as we know it? This is a question
at the heart of research in the philosophy and sociology
of knowledge—not just a concern for activism or social
movements. If knowledge is socially produced and maintained,
then the nature of the social bond surely matters to the
nature of that knowledge. This is not so different from asking
whether we will still have labor or work, as we have long known
it, in an age of precarity. What is the knowledge equivalent of
precarity (i.e. not just the existence of precarious knowledge
workers, but a kind of precarious knowledge as such)?

Do we not already see the evidence of this in the 'post-truth' of fake news, or
the deliberate refusal by those in power to countenance evidence, truth, or
established systems of argument and debate? The relationship between knowledge
and power is shifting dramatically, because the costs—and the stakes—of producing
high quality, authoritative knowledge have also shifted. It is not so powerful any
longer; science does not speak truth to power because truth is no longer so
obviously important to power.
Although this is a pessimistic portrait, it may also be a sign of something yet to
come. Free software as a community has been and still sometimes is critiqued as
being an exclusionary space of white male sociality (Nafus 2012; Massanari 2016;
Ford and Wajcman 2017; Reagle 2013). I think this critique is true, but it is less a
problem of identity than it is a pathology of a certain form of liberalism: a form that
demands that merit consists only in the content of the things we say (whether in
a political argument, a scientific paper, or a piece of code), and not in the ways we
say them, or who is encouraged to say them and who is encouraged to remain silent
(Dunbar-Hester 2014).
One might, as a result, choose to throw out liberalism altogether as a broken
philosophy of governance and liberation. But it might also be an opportunity to
focus much more specifically on a particular problem of liberalism, one that the
discourse of OA also relies on to a large extent. Perhaps it is not the case that
merit derives solely from the content of utterances freely and openly circulated,
but also from the ways in which they are uttered, and the dignity of the people
who utter them. An OA (or a free software) that embraced that principle would
demand that we pay attention to different problems: how are our platforms,
infrastructures, tools organized and built to support not just the circulation of
putatively true statements, but the ability to say them in situated and particular
ways, with respect for the dignity of who is saying them, and with the freedom to
explore the limits of that kind of liberalism, should we be so lucky to achieve it.

References

¹ https://twobits.net/download/index.html

Benkler, Yochai. 2007. The Wealth of Networks: How Social Production Transforms Markets
and Freedom. Yale University Press.
Dunbar-Hester, Christina. 2014. Low Power to the People: Pirates, Protest, and Politics in
FM Radio Activism. MIT Press.
Ford, Heather, and Judy Wajcman. 2017. “‘Anyone Can Edit’, Not Everyone Does:
Wikipedia’s Infrastructure and the Gender Gap”. Social Studies of Science 47 (4):
511–527. doi:10.1177/0306312717692172.
Hacking, I. 2004. Historical Ontology. Harvard University Press.
Kelty, Christopher M. 2014. “Beyond Copyright and Technology: What Open Access Can
Tell Us About Precarity, Authority, Innovation, and Automation in the University
Today”. Cultural Anthropology 29 (2): 203–215. doi:10.14506/ca29.2.02.
——— . 2008. Two Bits: The Cultural Significance of Free Software. Durham, N.C.: Duke
University Press.
Kelty, Christopher M., et al. 2008. “Anthropology In/of Circulation: a Discussion”. Cultural
Anthropology 23 (3).
Massanari, Adrienne. 2016. “#gamergate and the Fappening: How Reddit’s Algorithm,
Governance, and Culture Support Toxic Technocultures”. New Media & Society 19 (3):
329–346. doi:10.1177/1461444815608807.
Nafus, Dawn. 2012. “‘Patches don’t have gender’: What is not open in open source
software”. New Media & Society 14 (4): 669–683. doi:10.1177/1461444811422887.
Rabinow, Paul. 2003. Anthropos Today: Reflections on Modern Equipment. Princeton
University Press.
Reagle, Joseph. 2013. “‘Free As in Sexist?’ Free Culture and the Gender Gap”. First
Monday 18 (1). doi:10.5210/fm.v18i1.4291.

² https://craphound.com/

³ For example, Platform Cooperativism
https://platform.coop/directory

⁴ See for example the figure from ‘Rent Seeking by Elsevier,’ by Alejandro Posada and George Chen
(http://knowledgegap.org/index.php/sub-projects/rent-seeking-and-financialization-of-the-academic-publishing-industry-preliminary-findings/)

⁵ See Sherpa/Romeo
http://www.sherpa.ac.uk/romeo/index.php

Own Nothing

Balazs Bodo

Flow My Tears
My tears cut deep grooves into the dust on my face. Drip, drip,
drop, they hit the floor and disappear among the torn pages
scattered on the floor.
This year it dawned on us that we cannot postpone it any longer:
our personal library has to go. Our family moved countries
more than half a decade ago, we switched cultures, languages,
and chose another future. But the past, in the form of a few
thousand books in our personal library, was still neatly stacked
in our old apartment, patiently waiting, books that we bought
and enjoyed — and forgot; books that we bought and never
opened; books that we inherited from long-dead parents and
half-forgotten friends. Some of them were important. Others
were relevant at one point but no longer, yet they still reminded
us who we once were.
When we moved, we took no more than two suitcases of personal
belongings. The books were left behind. The library was like
a sick child or an ailing parent, it hung over our heads like an
unspoken threat, a curse. It was clear that sooner or later
something had to be done about it, but none of the options
available offered any consolation. It made no sense to move
three thousand books to the other side of this continent. We
decided to emigrate, and not to take our past with us, abandon
the contexts we were fleeing from. We made a choice to leave
behind the history, the discourses, the problems and the pain
that accumulated in the books of our library. I knew exactly
what it was I didn’t want to teach to my children once we moved.
So we did not move the books. We pretended that we would
never have to think about what this decision really meant. Up
until today. This year we needed to empty the study with the
shelves. So I’m standing in our library now, the dust covering
my face, my hands, my clothes. In the middle of the floor there
are three big crates and one small box. The small box swallows
what we’ll ultimately take with us, the books I want to show to
my son when he gets older, in case he still wants to read. One of
the big crates will be taken away by the antiquarian. The other
will be given to the school library next door. The third is the
wastebasket, where everything else will ultimately go.
Drip, drip, drip, my tears flow as I throw the books into this
last crate, drip, drip, drop. Sometimes I look at my partner,
working next to me, and I can see on her face that she is going
through the same emotions. I sometimes catch the sight of
her trembling hand, hesitating for a split second where a book
should ultimately go, whether we could, whether we should
save that particular one, because… But we either save them all
or we are as ruthless as all those millions of people throughout
history, who had an hour to pack their two suitcases before they
needed to leave. Do we truly need this book? Is this a book we’ll
want to read? Is this book an inseparable part of our identity?
Did we miss this book at all in the last five years? Is this a text
I want to preserve for the future, for potential grandchildren
who may not speak my mother tongue at all? What is the function
of the book? What is the function of this particular book in my
life? Why am I hesitating throwing it out? Why should I hesitate
at all? Drop, drop, drop, a decision has been made. Drop, drop,
drop, books are falling to the bottom of the crates.
We are killers, gutting our library. We are like the half-drowned
sailor, who got entangled in the ropes, and went down with the
ship, and who now frantically tries to cut himself free from the
detritus that prevents him from reaching the freedom of the surface,
the sunlight and the air.

Own Nothing, Have Everything
Do you remember Napster’s slogan after it went legit, trying to transform itself into
a legal music service around 2005? ‘Own nothing, have everything’ – that was the
headline that was supposed to sell legal streaming music. How stupid, I thought. How
could you possibly think that lack of ownership would be a good selling point? What
does it even mean to ‘have everything’ without ownership? And why on earth would
not everyone want to own the most important constituents of their own self, their
own identity? The things I read, the things I sing, make me who I am. Why wouldn’t I
want to own these things?
How revolutionary this idea had been I reflected as I watched the local homeless folks
filling up their sacks with the remains of my library. How happy I would be if I could
have all this stuff I had just thrown away without actually having to own any of it. The
proliferation of digital texts led me to believe that we won’t be needing dead wood
libraries at all, at least no more than we need vinyl to listen to, or collect music. There
might be geeks, collectors, specialists, who for one reason or another still prefer the
physical form to the digital, but for the rest of us convenience, price, searchability, and
all the other digital goodies give enough reason not to collect stuff that collects dust.
I was wrong to think that. I now realize that the future is not fully digital, it is more
a physical-digital hybrid, in which the printed book is not simply an endangered
species protected by a few devoted eccentrics who refuse to embrace the obvious
advantages of a fully digital book future. What I see now is the emergence of a strange
and shapeshifting hybrid of diverse physical and electronic objects and practices,
where the relative strengths and weaknesses of these different formats nicely
complement each other.
This dawned on me after we had moved into an apartment without a bookshelf. I grew
up in a flat that housed my parents’ extensive book collection. I knew the books by their
cover and from time to time something made me want to take one from the shelf, open
it and read it. This is how I discovered many of my favorite books and writers. With
the e-reader, and some of the best shadow libraries at hand, I felt the same at first. I
felt liberated. I could experiment without cost or risk, I could start—or stop—a book,
I didn’t have to consider the cost of buying and storing a book that was ultimately
not meant for me. I could enjoy the books without having to carry the burden and
responsibility of ownership.
Did you notice how deleting an epub file gives you a different feeling than throwing
out a book? You don’t have to feel guilty, you don’t have to feel anything at all.
So I was reading, reading, reading like never before. But at that time my son was too
young to read, so I didn’t have to think about him, or anyone else besides myself. But
as he was growing, it slowly dawned on me: without these physical books how will I be
able to give him the same chance of serendipity, and of discovery, enchantment, and
immersion that I got in my father’s library? And even later, what will I give him as his
heritage? Son, look into this folder of PDFs: this is my legacy, your heritage, explore,
enjoy, take pride in it?
Collections of anything, whether they are art, books, objects, people, are inseparable
from the person who assembled that collection, and when that person is gone, the
collection dies, as does the most important inroad to it: the will that created this
particular order of things has passed away. But the heavy and unavoidable physicality
of a book collection forces all those left behind to make an effort to approach, to
force their way into, and try to navigate that garden of forking paths that is someone
else’s library. Even if you ultimately get rid of everything, you have to introduce
yourself to every book, and let every book introduce itself to you, so you know what
you’re throwing out. Even if you’ll ultimately kill, you will need to look into the eyes of
all your victims.
With a digital collection that’s, of course, not the case.
The e-book is ephemeral. It has little past and even less chance to preserve the
fingerprints of its owners over time. It is impersonal, efficient, fast, abundant, like
fast food or plastic, it flows through the hand like sand. It lacks the embodiment, the
materiality which would give it a life in a temporal dimension. If you want to network the
dead and the unborn, as is the ambition of every book, then you need to print and bind,
and create heavy objects that are expensive, inefficient and a burden. This burden
subsiding in the object is the bridge that creates the intergenerational dimension,
that forces you to think of the value of a book.
Own nothing, have nothing. Own everything, and your children will hate you when
you die.
I have to say, I’m struggling to find a new balance here. I started to buy books again,
usually books that I’d already read from a stolen copy on-screen. I know what I want
to buy, I know what is worth preserving. I know what I want to show to my son, what
I want to pass on, what I would like to take care of over time. Before, book buying for
me was an investment into a stranger. Now that thrill is gone forever. I measure up
the merchandise well beforehand, I build an intimate relationship, we make love again
and again, before moving in together.
It is certainly a new kind of relationship with the books I bought since I got my e-reader.
I still have to come to terms with the fact that the books I bought this way are rarely
opened, as I already know them, and their role is not to be read, but to be together.
What do I buy, and what do I get? Temporal, existential security? The chance of
serendipity, if not for me, then for the people around me? The reassuring materiality
of the intimacy I built with these texts through another medium?
All of these and maybe more. But in any case, I sense that this library, the physical
embodiment of a physical-electronic hybrid collection with its unopened books and
overflowing e-reader memory cards, is very different from the library I had, and the
library I’m getting rid of at this very moment. The library that I inherited, the library
that grew organically from the detritus of the everyday, the library that accumulated
books similar to how the books accumulated dust, as is the natural way of things, this
library was full of unknowns, it was a library of potentiality, of opportunities, of trips
waiting to happen. This new, hybrid library is a collection of things that I’m familiar with.
I intimately know every piece, they hold little surprise, they offer few discoveries — at
least for me. The exploration, the discovery, the serendipity, the pre-screening takes
place on the e-reader, among the ephemeral, disposable PDFs and epubs.
Have everything, and own a few.

We Won
This new hybrid model is based on the cheap availability of digital books. In my case, the
free availability of pirated copies through shadow libraries. These libraries
don’t have everything on offer, but they have books in an order of magnitude larger
than I’ll ever have the time and chance to read, so they offer enough, enough for me
to fill up hard drives with books I want to read, or at least skim, to try, to taste. As if I
moved into an infinite bookstore or library, where I can be as promiscuous, explorative,
nomadic as I always wanted to be. I can flirt with books, I can have a quickie, or I can
leave them behind without shedding a single tear.
I don’t know how this hybrid library, and this analogue-digital hybrid practice of reading
and collecting would work without the shadow libraries which make everything freely
accessible. I rely on their supply to test texts, and feed and grow my print library.
E-books are cheaper than their print versions, but they still cost money, carry a
risk, a cost of experimentation. Book-streaming, the flat-rate, the all-you-can-eat
format of accessing books is at the moment only available for audiobooks, but rarely
for e-books. I wonder why.
Did you notice that there are no major book piracy lawsuits?

Of course there is the lawsuit against Sci-Hub and Library Genesis in New York, and
there is another one in Canada against aaaaarg, causing major nuisance to those who
have been named in these cases. But this is almost negligible compared to the high
profile wars the music and audiovisual industries waged against Napster, Grokster,
Kazaa, Megaupload and their likes. It is as if book publishers have completely given up on
trying to fight piracy in the courts, and have launched a few lawsuits only to maintain
the appearance that they still care about their digital copyrights. I wonder why.
I know the academic publishing industry slightly better than the mainstream popular
fiction market, and I have the feeling that in the former copyright-based business
models are slowly being replaced by something else. We see no major anti-piracy
efforts from publishers, not because piracy is non-existent — on the contrary, it is
global, and it is big — but because the publishers most probably realized that in the
long run the copyright-based exclusivity model is unsustainable. The copyright wars
of the last two decades taught them that law cannot put an end to piracy. As the
Sci-Hub case demonstrates, you can win all you want in a New York court, but this
has little real-world effect as long as the conditions that attract the users to the
shadow libraries remain.
Exclusivity-based publishing business models are under assault from other sides as
well. Mandated open access in the US and in the EU means that there is a quickly
growing body of new research for access to which publishers cannot charge
money anymore. LibGen and Sci-Hub make it harder to charge for the back catalogue.
Their sheer existence teaches millions what uncurtailed open access really is, and
makes it easier for university libraries to negotiate with publishers, as they don’t have
to worry about their patrons being left without any access at all.
The good news is that radical open access may well be happening. It is a less and less
radical idea to have things freely accessible. One has to be less and less radical to
achieve the openness that has been long overdue. Maybe it is not yet obvious today
and the victory is not yet universal, maybe it’ll take some extra years, maybe it won’t
ever be evenly distributed, but it is obvious that this genie, these millions of books on
everything from malaria treatments to critical theory, cannot be erased, and open
access will not be undone, and the future will be free of access barriers.
Who is downloading books and articles? Everyone. Radical open access? We won,
if you like.

We Are Not Winning at All
But did we really win? If publishers are happy to let go of access control and copyright,
it means that they’ve found something that is even more profitable than selling
back to us academics the content that we have produced. And this more profitable
something is of course data. Did you notice where all the investment in academic
publishing went in the last decade? Did you notice SSRN, Mendeley, Academia.edu,
ScienceDirect, research platforms, citation software, manuscript repositories, library
systems being bought up by the academic publishing industry? All these platforms
and technologies operate on and support open access content, while they generate
data on the creation, distribution, and use of knowledge; on individuals, researchers,
students, and faculty; on institutions, departments, and programs. They produce data
on the performance, on the success and the failure of the whole domain of research
and education. This is the data that is being privatized, enclosed, packaged, and sold
back to us.

Drip, drip, drop, it’s only nostalgia. My heart is light, as I don’t have to worry about
gutting the library. Soon it won’t matter at all.

Taylorism reached academia. In the name of efficiency, austerity, and transparency,
our daily activities are measured, profiled, packaged, and sold to the highest bidder.
But in this process of quantification, knowledge on ourselves is lost for us, unless we
pay. We still have some patchy datasets on what we do, on who we are, we still have
this blurred reflection in the data-mirrors that we still do control. But this path of
self-enlightenment is quickly waning as less and less data sources about us are freely
available to us.

I strongly believe that information on the self is the foundation
of self-determination. We need to have data on how we operate,
on what we do in order to know who we are. This is what is being
privatized away from the academic community, this is being
taken away from us.
Radical open access. Not of content, but of the data about
ourselves. This is the next challenge. We will digitize every page,
by hand if we must, that process cannot be stopped anymore.
No outside power can stop it and take that from us. Drip, drip,
drop, this is what I console myself with, as another handful of
books land among the waste.
But the data we lose now will not be so easy to reclaim.

What if We Aren't the Only Guerrillas Out There?

Laurie Allen

My goal in this paper is to tell the story
of a grass-roots project called Data
Refuge (http://www.datarefuge.org)
that I helped to co-found shortly after,
and in response to, the Trump election
in the USA. Trump’s reputation as
anti-science, and the promise that his
administration would elevate people into
positions of power with a track record
of distorting, hiding, or obscuring the
scientific evidence of climate change
caused widespread concern that
valuable federal data was now in danger.
The Data Refuge project grew from the
work of Professor Bethany Wiggin and
the graduate students within the Penn
Program in Environmental Humanities
(PPEH), notably Patricia Kim, and was
formed in collaboration with the Penn
Libraries, where I work. In this paper, I
will discuss the Data Refuge project, and
call attention to a few of the challenges
inherent in the effort, especially as
they overlap with the goals of this
collective. I am not a scholar. Instead,
I am a librarian, and my perspective as
a practicing information professional
informs the way I approach this paper,
which weaves together the practical
and technical work of ‘saving data’ with
the theoretical, systemic, and ethical
issues that frame and inform what we
have done.

I work as the head of a relatively small and new department within the libraries
of the University of Pennsylvania, in the city of Philadelphia, Pennsylvania, in the
US. I was hired to lead the Digital Scholarship department in the spring of 2016,
and most of the seven (soon to be eight) people within Digital Scholarship have joined
the library since then in newly created positions. Our group includes a mapping
and spatial data librarian and three people focused explicitly on supporting the
creation of new Digital Humanities scholarship. There are also two people in the
department who provide services connected with digital scholarly open access
publishing, including the maintenance of the Penn Libraries’ repository of open
access scholarship, and one Data Curation and Management Librarian. This
Data Librarian, Margaret Janz, started working with us in September 2016, and
features heavily in the story I’m about to tell about our work helping to build Data
Refuge. While Margaret and I were the main people in our department involved in
the project, it is useful to understand the work we did as connected more broadly
to the intersection of activities—from multimodal, digital, humanities creation to
open access publishing across disciplines—represented in our department at Penn.
At the start of Data Refuge, Professor Wiggin and her students had already been
exploring the ways that data about the environment can empower communities
through their art, activism, and research, especially along the lower Schuylkill
River in Philadelphia. They were especially attuned to the ways that missing data,
or data that is not collected or communicated, can be a source of disempowerment.
After the Trump election, PPEH graduate students raised the concern that the
political commitments of the new administration would result in the disappearance
of environmental and climate data that is vital to work in cities and communities
around the world. When they raised this concern with the library, together we co-founded Data Refuge. It is notable to point out that, while the Penn Libraries is a
large and relatively well-resourced research library in the United States, it did not
have any automatic way to ingest and steward the data that Professor Wiggin and
her students were concerned about. Our system of acquiring, storing, describing
and sharing publications did not account for, and could not easily handle, the
evident need to take in large quantities of public data from the open web and make
them available and citable by future scholars. Indeed, no large research library
was positioned to respond to this problem in a systematic way, though there was
general agreement that the community would like to help.
The collaborative, grass-roots movement that formed Data Refuge included many
librarians, archivists, and information professionals, but it was clear from the
beginning that my own profession did not have in place a system for stewarding
these vital information resources, or for treating them as ‘publications’ of the
federal government. This fact was widely understood by various members of our
profession, notably by government document librarians, who had been calling
attention to this lack of infrastructure for years. As Government Information
Librarian Shari Laster described in a blog post in November of 2016, government
documents librarians have often felt like they are ‘under siege’ not from political
forces, but from the inattention to government documents afforded by our systems
and infrastructure. Describing the challenges facing the profession in light of the
2016 election, she commented: “Government documents collections in print are
being discarded, while few institutions are putting strategies in place for collecting
government information in digital formats. These strategies are not expanding in
tandem with the explosive proliferation of these sources, and certainly not in pace
with the changing demands for access from public users, researchers, students,
and more.” (Laster 2016) Beyond government documents librarians, our project
joined efforts that were ongoing in a huge range of communities, including: open
data and open science activists; archival experts working on methods of preserving
born-digital content; cultural historians; federal data producers and the archivists
and data scientists they work with; and, of course, scientists.

Born from the collaboration between Environmental Humanists and Librarians, Data Refuge was always an effort both at storytelling and at storing data. During the first six months of 2017, volunteers across the US (and elsewhere) organized more than 50 Data Rescue events, with participants numbering in the thousands. At each event, a group of volunteers used tools created by our collaborators at the Environmental Data and Governance Initiative (EDGI) (https://envirodatagov.org/) to support the End of Term Harvest (http://eotarchive.cdlib.org/) project by identifying seeds from federal websites for web archiving in the Internet Archive. Simultaneously, more technically advanced volunteers wrote scripts to pull data out of complex data systems, and packaged that data for longer term storage in a repository we maintained at datarefuge.org. Still other volunteers held teach-ins, built profiles of data storytellers, and otherwise engaged in safeguarding environmental and climate data through community action (see http://www.ppehlab.org/datarefugepaths). The repository at datarefuge.org that houses the more difficult data sources has been stewarded by myself and Margaret Janz through our work at Penn Libraries, but it exists outside the library’s main technical infrastructure.¹

This distributed approach to the work of downloading and saving the data encouraged people to see how they were invested in environmental and scientific data, and to consider how our government records should be considered the property of all of us. Attending Data Rescue events was a way for people who value the scientific record to fight back, in a concrete way, against an anti-fact establishment. By downloading data and moving it into the Internet Archive and the Data Refuge repository, volunteers were actively claiming the importance of accurate records in maintaining or creating a just society.

Of course, access to data need not rely on its inclusion in a particular repository. As is demonstrated so well in other contexts, technological methods of sharing files can make the digital repositories of libraries and archives seem like a redundant holdover from the past. However, as I will argue further in this paper, the data that was at risk in Data Refuge differed in important ways from the contents of what Bodó refers to as ‘shadow libraries’ (Bodó 2015). For opening access to copies of journal articles, shadow libraries work perfectly. However, the value of these shadow libraries relies on the existence of widely agreed upon trusted versions. If in doubt about whether a copy is trustworthy, scholars can turn to more mainstream copies, if necessary. This was not the situation we faced building Data Refuge. Instead, we were often dealing with the sole public, authoritative copy of a federal dataset and had to assume that, if it were taken down, there would be no way to check the authenticity of other copies. The data was not easily pulled out of systems, as the data and the software that contained them were often inextricably linked. We were dealing with unique, tremendously valuable, but often difficult-to-untangle datasets rather than neatly packaged publications. The workflow we established was designed to privilege authenticity and trustworthiness over either the speed of the copying or the easy usability of the resulting data.² This extra care around authenticity was necessary because of the politicized nature of environmental data that made many people so worried about its removal after the election. It was important that our project supported the strongest possible scientific arguments that could be made with the data we were ‘saving’. That meant that our copies of the data needed to be citable in scientific scholarly papers, and that those citations needed to be able to withstand hostile political forces who claim that the science of human-caused climate change is ‘uncertain’. It
was easy to imagine in the Autumn of 2016, and even easier
to imagine now, that hostile actors might wish to muddy the
science of climate change by releasing fake data designed
to cast doubt on the science of climate change. For that reason, I believe that the unique facts we were seeking
to safeguard in the Data Refuge bear less similarity to the
contents of shadow libraries than they do to news reports
in our current distributed and destabilized mass media
environment. Referring to the ease of publishing ideas on the
open web, Zeynep Tufekci wrote in a recent column, “And
sure, it is a golden age of free speech—if you can believe your
lying eyes. Is that footage you’re watching real? Was it really
filmed where and when it says it was? Is it being shared by alt-right trolls or a swarm of Russian bots? Was it maybe even
generated with the help of artificial intelligence? (Yes, there
are systems that can create increasingly convincing fake
videos.)” (Tufekci 2018). This was the state we were trying to
avoid when it comes to scientific data, fearing that we might
have the only copy of a given dataset without solid proof that
our copy matched the original.
If US federal websites cease functioning as reliable stewards
of trustworthy scientific data, reproducing their data
without a new model of quality control risks producing the
very censorship that our efforts are supposed to avoid,
and further undermining faith in science. Said another way,
if volunteers duplicated federal data all over the Internet
without a trusted system for ensuring the authenticity of
that data, then as soon as the originals were removed, a sea of
fake copies could easily render the originals invisible, and they
would be just as effectively censored. “The most effective
forms of censorship today involve meddling with trust and
attention, not muzzling speech itself.” (Tufekci 2018).
These concerns about the risks of open access to data should
not be understood as capitulation to the current market-driven approach to scholarly publishing, nor as a call for
continuation of the status quo. Instead, I hope to encourage
continuation of the creative approaches to scholarship
represented in this collective. I also hope the issues raised in Data Refuge will serve as a call to take greater responsibility for the systems into
which scholarship flows and the structures of power and assumptions of trust (by
whom, of whom) that scholarship relies on.
While plenty of participants in the Data Refuge community posited scalable
technological approaches to help people trust data, none emerged that was strong enough to justify the risk of further undermining faith in science that a malicious attack might cause. Instead of focusing on technical solutions that rely on the existing
systems staying roughly as they are, I would like to focus on developing networks
that explore different models of trust in institutions, and that honor the values
of marginalized and indigenous people. For example, in a recent paper, Stacie
Williams and Jarrett Drake describe the detailed decisions they made to establish
and become deserving of trust in supporting the creation of an Archive of Police
Violence in Cleveland (Williams and Drake 2017). The work of Michelle Caswell and
her collaborators on exploring post-custodial archives, and on engaging in radical
empathy in the archives provides great models of the kind of work that I believe is
necessary to establish new models of trust that might help inform new modes of
sharing and relying on community information (Caswell and Cifor 2016).
Beyond seeking new ways to build trust, it has become clear that new methods
are needed to help filter and contextualize publications. Our current reliance
on a few for-profit companies to filter and rank what we see of the information
landscape has proved to be tremendously harmful for the dissemination of facts,
and has been especially dangerous to marginalized communities (Noble 2018).
While the world of scholarly humanities publishing is doing somewhat better than
open data or mass media, there is still a risk that without new forms of filtering and
establishing quality and trustworthiness, good ideas and important scholarship will
be lost in the rankings of search engines and the algorithms of social media. We
need new, large-scale systems to help people filter and rank the information on the open web. In our current situation, according to media theorist danah boyd, “[t]he
onus is on the public to interpret what they see. To self-investigate. Since we live
in a neoliberal society that prioritizes individual agency, we double down on media
literacy as the ‘solution’ to misinformation. It’s up to each of us as individuals to
decide for ourselves whether or not what we’re getting is true.” (boyd 2018)
In closing, I’ll return to the notion of Guerrilla warfare that brought this panel
together. While some of our collaborators and some in the press did use the term
‘Guerrilla archiving’ to describe the data rescue efforts (Currie and Paris 2017),
I generally did not. The work we did was indeed designed to take advantage of
tactics that allow a small number of actors to resist giant state power. However, if anything, the most direct target of these guerrilla actions in my mind was not
the Trump administration. Instead, the action was designed to prompt responses
by the institutions where many of us work and by communities of scholars and
activists who make up these institutions. It was designed to get as many people as
possible working to address the complex issues raised by the two interconnected
challenges that the Data Refuge project threw into relief. The first challenge,
of course, is the need for new scientific, artistic, scholarly and narrative ways of
contending with the reality of global, human-made climate change. And the second
challenge, as I’ve argued in this paper, is that our systems of establishing and
signaling trustworthiness, quality, reliability and stability of information are in dire
need of creative intervention as well. It is not just publishing but all of our systems
for discovering, sharing, acquiring, describing and storing that scholarship that
need support, maintenance, repair, and perhaps in some cases, replacement. And
this work will rely on scholars, as well as expert information practitioners from a
range of fields (Caswell 2016).

¹ At the time of this writing, we are working
on un-packing and repackaging the data
within Data Refuge for eventual inclusion
in various Research Library Repositories. Ideally, of course, all federally produced
datasets would be published in neatly
packaged and more easily preservable
containers, along with enough technical
checks to ensure their validity (hashes,
checksums, etc.) and each agency would
create a periodically published inventory of
datasets. But the situation we encountered
with Data Refuge did not start us in
anything like that situation, despite the
hugely successful and important work of
the employees who created and maintained
data.gov. For a fuller view of this workflow,
see my talk at CSVConf 2017 (Allen 2017).

² Closing note: The workflow established and used at Data Rescue events was
designed to tackle this set of difficult issues, but needed refinement, and was retired
in mid-2017. The Data Refuge project continues, led by Professor Wiggin and her
colleagues and students at PPEH, who are “building a storybank to document
how data lives in the world – and how it connects people, places, and non-human
species.” (“DataRefuge” n.d.) In addition, the set of issues raised by Data Refuge
continue to inform my work and the work of many of our collaborators.
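
A brief, purely illustrative aside on the ‘hashes, checksums, etc.’ mentioned in the first note: in practice, fixity checks of that kind are often recorded as a per-file checksum manifest written next to a dataset at the moment it is copied, so that later mirrors can be compared against the original capture. The following minimal Python sketch is an illustration under stated assumptions, not the tooling actually used at Data Rescue events; the directory and file names are hypothetical.

# Illustrative sketch only: record fixity information (SHA-256 checksums)
# for every file in a downloaded dataset directory. This is not the actual
# Data Refuge workflow; "rescued-dataset/" and the manifest name are hypothetical.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(dataset_dir: str, manifest_name: str = "manifest-sha256.json") -> Path:
    """Walk a dataset directory and write a JSON manifest of per-file checksums."""
    root = Path(dataset_dir)
    manifest = {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.name != manifest_name
    }
    out = root / manifest_name
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return out

if __name__ == "__main__":
    write_manifest("rescued-dataset/")

Comparing a freshly computed manifest against the one stored with the original capture is one simple way to show that a mirrored copy has not drifted from the version that was first ‘saved’.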


References
Allen, Laurie. 2017. “Contexts and Institutions.” Paper presented at csv,conf,v3, Portland,
Oregon, May 3, 2017. Accessed May 20, 2018. https://youtu.be/V2gwi0CRYto.
Bodó, Balázs. 2015. “Libraries in the Post-Scarcity Era.” In Copyrighting Creativity:
Creative Values, Cultural Heritage Institutions and Systems of Intellectual Property,
edited by Porsdam. Routledge.
boyd, danah. 2018. “You Think You Want Media Literacy… Do You?” Data & Society: Points.
March 9, 2018. https://points.datasociety.net/you-think-you-want-media-literacy-do-you-7cad6af18ec2.
Caswell, Michelle. 2016. “‘The Archive’ Is Not an Archives: On Acknowledging the
Intellectual Contributions of Archival Studies.” Reconstruction: Studies in
Contemporary Culture 16:1 (2016) (special issue “Archives on Fire”),
http://reconstruction.eserver.org/Issues/161/Caswell.shtml.
Caswell, Michelle, and Marika Cifor. 2016. “From Human Rights to Feminist Ethics: Radical
Empathy in the Archives.” Archivaria 82 (0): 23–43.
Currie, Morgan, and Britt Paris. 2017. “How the ‘Guerrilla Archivists’ Saved History – and
Are Doing It Again under Trump.” The Conversation (blog). February 21, 2017.
https://theconversation.com/how-the-guerrilla-archivists-saved-history-and-are-doing-it-again-under-trump-72346.
“DataRefuge.” n.d. PPEH Lab. Accessed May 21, 2018.
http://www.ppehlab.org/datarefuge/.
“DataRescue Paths.” n.d. PPEH Lab. Accessed May 20, 2018.
http://www.ppehlab.org/datarefugepaths/.
“End of Term Web Archive: U.S. Government Websites.” n.d. Accessed May 20, 2018.
http://eotarchive.cdlib.org/.
“Environmental Data and Governance Initiative.” n.d. EDGI. Accessed May 19, 2018.
https://envirodatagov.org/.
Laster, Shari. 2016. “After the Election: Libraries, Librarians, and the Government - Free
Government Information (FGI).” Free Government Information (FGI). November 23,
2016. https://freegovinfo.info/node/11451.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce
Racism. New York: NYU Press.
Tufekci, Zeynep. 2018. “It’s the (Democracy-Poisoning) Golden Age of Free Speech.”
WIRED. Accessed May 20, 2018.
https://www.wired.com/story/free-speech-issue-tech-turmoil-new-censorship/.
“Welcome - Data Refuge.” n.d. Accessed May 20, 2018. https://www.datarefuge.org/.
Williams, Stacie M., and Jarrett Drake. 2017. “Power to the People: Documenting Police
Violence in Cleveland.” Journal of Critical Library and Information Studies 1 (2).
https://doi.org/10.24242/jclis.v1i2.33.


Guerrilla
Open
Access


 
