Graziano, Mars & Medak
Learning from #Syllabus
2019

ACTIONS

LEARNING FROM
#SYLLABUS
VALERIA GRAZIANO,
MARCELL MARS,
TOMISLAV MEDAK


STATE MACHINES

The syllabus is the manifesto of the 21st century.
—Sean Dockray and Benjamin Forster1
#Syllabus Struggles
In August 2014, Michael Brown, an 18-year-old boy living in Ferguson, Missouri,
was fatally shot by police officer Darren Wilson. Soon after, as the civil protests denouncing police brutality and institutional racism began to mount across the United
States, Dr. Marcia Chatelain, Associate Professor of History and African American
Studies at Georgetown University, launched an online call urging other academics
and teachers ‘to devote the first day of classes to a conversation about Ferguson’ and ‘to recommend texts, collaborate on conversation starters, and inspire
dialogue about some aspect of the Ferguson crisis.’2 Chatelain did so using the
hashtag #FergusonSyllabus.
Also in August 2014, using the hashtag #gamergate, groups of users on 4Chan,
8Chan, Twitter, and Reddit instigated a misogynistic harassment campaign against
game developers Zoë Quinn and Brianna Wu, media critic Anita Sarkeesian, as well as
a number of other female and feminist game producers, journalists, and critics. In the
following weeks, The New Inquiry editors and contributors compiled a reading list and
issued a call for suggestions for their ‘TNI Syllabus: Gaming and Feminism’.3
In June 2015, Donald Trump announced his candidacy for President of the United
States. In the weeks that followed, he became the presumptive Republican nominee,
and The Chronicle of Higher Education introduced the syllabus ‘Trump 101’.4 Historians N.D.B. Connolly and Keisha N. Blain found ‘Trump 101’ inadequate, ‘a mock college syllabus […] suffer[ing] from a number of egregious omissions and inaccuracies’,
failing to include ‘contributions of scholars of color and address the critical subjects
of Trump’s racism, sexism, and xenophobia’. They assembled ‘Trump Syllabus 2.0’.5
Soon after, in response to a video in which Trump engaged in ‘an extremely lewd
conversation about women’ with TV host Billy Bush, Laura Ciolkowski put together a
‘Rape Culture Syllabus’.6

1 Sean Dockray, Benjamin Forster, and Public Office, ‘README.md’, Hyperreadings, 15 February 2018, https://samiz-dat.github.io/hyperreadings/.
2 Marcia Chatelain, ‘Teaching the #FergusonSyllabus’, Dissent Magazine, 28 November 2014, https://www.dissentmagazine.org/blog/teaching-ferguson-syllabus/.
3 ‘TNI Syllabus: Gaming and Feminism’, The New Inquiry, 2 September 2014, https://thenewinquiry.com/tni-syllabus-gaming-and-feminism/.
4 ‘Trump 101’, The Chronicle of Higher Education, 19 June 2016, https://www.chronicle.com/article/Trump-Syllabus/236824/.
5 N.D.B. Connolly and Keisha N. Blain, ‘Trump Syllabus 2.0’, Public Books, 28 June 2016, https://www.publicbooks.org/trump-syllabus-2-0/.
6 Laura Ciolkowski, ‘Rape Culture Syllabus’, Public Books, 15 October 2016, https://www.publicbooks.org/rape-culture-syllabus/.


In April 2016, members of the Standing Rock Sioux tribe established the Sacred Stone
Camp and started the protest against the Dakota Access Pipeline, the construction of
which threatened the only water supply at the Standing Rock Reservation. The protest at the site of the pipeline became the largest gathering of Native Americans in
the last 100 years, and the protesters earned significant international support for their ReZpect
Our Water campaign. As the struggle between protestors and the armed forces unfolded, a group of Indigenous scholars, activists, and supporters of the struggles of
First Nations people and persons of color, gathered under the name the NYC Stands
for Standing Rock Committee, put together #StandingRockSyllabus.7
The list of online syllabi created in response to political struggles has continued to
grow, and at present includes many more examples:
All Monuments Must Fall Syllabus
#Blkwomensyllabus
#BLMSyllabus
#BlackIslamSyllabus
#CharlestonSyllabus
#ColinKaepernickSyllabus
#ImmigrationSyllabus
Puerto Rico Syllabus (#PRSyllabus)
#SayHerNameSyllabus
Syllabus for White People to Educate Themselves
Syllabus: Women and Gender Non-Conforming People Writing about Tech
#WakandaSyllabus
What To Do Instead of Calling the Police: A Guide, A Syllabus, A Conversation, A Process
#YourBaltimoreSyllabus
It would be hard to compile a comprehensive list of all the online syllabi that have
been created by social justice movements in the last five years, especially, but not
exclusively, those initiated in North America in the context of feminist and anti-racist
activism. In what is now a widespread phenomenon, these political struggles use
social networks and resort to the hashtag template ‘#___Syllabus’ to issue calls for
the bottom-up aggregation of resources necessary for political analysis and pedagogy
centering on their concerns. For this reason, we’ll call this phenomenon ‘#Syllabus’.
During the same years that saw the spread of the #Syllabus phenomenon, university
course syllabi have also been transitioning online, often in a top-down process initiated
by academic institutions, which has seen the syllabus become a contested document
in the midst of increasing casualization of teaching labor, expansion of copyright protections, and technology-driven marketization of education.
In what follows, we retrace the development of the online syllabus in both of these
contexts, to investigate the politics enmeshed in this new media object. Our argument

7 ‘#StandingRockSyllabus’, NYC Stands with Standing Rock, 11 October 2016, https://nycstandswithstandingrock.wordpress.com/standingrocksyllabus/.


is that, on the one hand, #Syllabus names the problem of contemporary political culture as pedagogical in nature, while, on the other hand, it also exposes academicized
critical pedagogy and intellectuality as insufficiently political in their relation to lived
social reality. Situating our own stakes as both activists and academics in the present
debate, we explore some ways in which the radical politics of #Syllabus could be supported to grow and develop as an articulation of solidarity between amateur librarians
and radical educators.
#Syllabus in Historical Context: Social Movements and Self-Education
When Professor Chatelain launched her call for #FergusonSyllabus, she was mainly
addressing a community of fellow educators:
I knew Ferguson would be a challenge for teachers: When schools opened across
the country, how were they going to talk about what happened? My idea was simple, but has resonated across the country: Reach out to the educators who use
Twitter. Ask them to commit to talking about Ferguson on the first day of classes.
Suggest a book, an article, a film, a song, a piece of artwork, or an assignment that
speaks to some aspect of Ferguson. Use the hashtag: #FergusonSyllabus.8
Her call had a much greater resonance than she had originally anticipated as it reached
beyond the limits of the academic community. #FergusonSyllabus had a significant impact both in shaping the analysis of and the response to the shooting of Michael
Brown, and in inspiring the many other #Syllabus calls that soon followed.
The #Syllabus phenomenon comprises different approaches and modes of operating. In some cases, the material is clearly claimed as the creation of a single individual, as in the case of #BlackLivesMatterSyllabus, which is prefaced on the project’s
landing page by a warning to readers that ‘material compiled in this syllabus should
not be duplicated without proper citation and attribution.’9 A very different position on
intellectual property has been embraced by other #Syllabus interventions that have
chosen a more commoning stance. #StandingRockSyllabus, for instance, is introduced as a crowd-sourced process and as a useful ‘tool to access research usually
kept behind paywalls.’10
The different workflows, modes of engagement, and positioning in relation to
intellectual property make #Syllabus readable as symptomatic of the multiplicity
that composes social justice movements. There is something old school—quite
literally—about the idea of calling a list of online resources a ‘syllabus’; a certain
quaintness, evoking thoughts of teachers and homework. This is worthy of investigation especially if contrasted with the attention dedicated to other online cultural
phenomena such as memes or fake news. Could it be that the online syllabus offers

8 Marcia Chatelain, ‘How to Teach Kids About What’s Happening in Ferguson’, The Atlantic, 25 August 2014, https://www.theatlantic.com/education/archive/2014/08/how-to-teach-kids-aboutwhats-happening-in-ferguson/379049/.
9 Frank Leon Roberts, ‘Black Lives Matter: Race, Resistance, and Populist Protest’, 2016, http://www.blacklivesmattersyllabus.com/fall2016/.
10 ‘#StandingRockSyllabus’, NYC Stands with Standing Rock, 11 October 2016, https://nycstandswithstandingrock.wordpress.com/standingrocksyllabus/.


a useful, fresh format precisely for the characteristics that foreground its connections to older pedagogical traditions and techniques, predating digital cultures?
#Syllabus can indeed be analyzed as falling within a long lineage of pedagogical tools
created by social movements to support processes of political subjectivation and the
building of collective consciousness. Activists and militant organizers have time and
again created and used various textual media objects—such as handouts, pamphlets,
cookbooks, readers, or manifestos—to facilitate a shared political analysis and foment
mass political mobilization.
In the context of the US, anti-racist movements have historically placed great emphasis on critical pedagogy and self-education. In 1964, the Council of Federated Organizations (an alliance of civil rights initiatives) and the Student Nonviolent
Coordinating Committee (SNCC), created a network of 41 temporary alternative
schools in Mississippi. Recently, the Freedom Library Project, a campaign born out
of #FergusonSyllabus to finance under-resourced pedagogical initiatives, openly
referenced this as a source of inspiration. The Freedom Summer Project of 1964
brought hundreds of activists, students, and scholars (many of whom were white)
from the north of the country to teach topics and issues that the discriminatory
state schools would not offer to black students. In the words of an SNCC report,
Freedom Schools were established following the belief that ‘education—facts to
use and freedom to use them—is the basis of democracy’,11 a conviction echoed
by the ethos of contemporary #Syllabus initiatives.
Bob Moses, a civil rights movement leader who was the head of the literacy skills initiative in Mississippi, recalls the movement’s interest, at the time, in teaching methods
that used the very production of teaching materials as a pedagogical tool:
I had gotten hold of a text and was using it with some adults […] and noticed that
they couldn’t handle it because the pictures weren’t suited to what they knew […]
That got me into thinking about developing something closer to what people were
doing. What I was interested in was the idea of training SNCC workers to develop
material with the people we were working with.12
It is significant that for him the actual use of the materials the group created was much
less important than the process of producing the teaching materials together. This focus
on what could be named as a ‘pedagogy of teaching’, or perhaps more accurately ‘the
pedagogy of preparing teaching materials’, is also a relevant mechanism at play in the
current #Syllabus initiatives, as their crowdsourcing encourages different kinds of people
to contribute what they feel might be relevant resources for the broader movement.
Alongside the crucial import of radical black organizing, another relevant genealogy in
which to place #Syllabus would be the international feminist movement and, in particular, the strategies developed in the 70s campaign Wages for Housework, spearheaded

11 Daniel Perlstein, ‘Teaching Freedom: SNCC and the Creation of the Mississippi Freedom Schools’, History of Education Quarterly 30.3 (Autumn 1990): 302.
12 Perlstein, ‘Teaching Freedom’: 306.


by Selma James and Silvia Federici. The Wages for Housework campaign drove home
the point that unwaged reproductive labor provides a foundation for capitalist exploitation. They wanted to encourage women to denaturalize and question the accepted
division of labor into remunerated work outside the house and labor of love within
the confines of domesticity, discussing taboo topics such as ‘prostitution as socialized housework’ and ‘forced sterilization’ as issues impacting poor, often racialized,
women. The organizing efforts of Wages for Housework held political pedagogy at their
core. They understood that this pedagogy required:
having literature and other materials available to explain our goals, all written in a
language that women can understand. We also need different types of documents,
some more theoretical, others circulating information about struggles. It is important
that we have documents for women who have never had any political experience.
This is why our priority is to write a popular pamphlet that we can distribute massively and for free—because women have no money.13
The obstacles faced by the Wages for Housework campaign were many, beginning
with the issue of how to reach a dispersed constituency of isolated housewives
and how to keep the revolutionary message at the core of their claims accessible
to different groups. In order to tackle these challenges, the organizers developed
a number of innovative communication tactics and pedagogical tools, including
strategies to gain mainstream media coverage, pamphlets and leaflets translated
into different languages,14 a storefront shop in Brooklyn, and promotional tables at
local events.
Freedom Schools and the Wages for Housework campaign are only two amongst
the many examples of the critical pedagogies developed within social movements.
The #Syllabus phenomenon clearly stands in the lineage of this history, yet we should
also highlight its specificity in relation to the contemporary political context in which it
emerged. The #Syllabus acknowledges that since the 70s—and also due to students’
participation in protests and their display of solidarity with other political movements—
subjects such as Marxist critical theory, women’s studies, gender studies, and African
American studies, together with some of the principles first developed in critical pedagogy, have become integrated into the educational system. The fact that many initiators of #Syllabus initiatives are women and Black academics speaks to this historical
shift as an achievement of that period of struggles. However, the very necessity felt by
these educators to kick-start their #Syllabus campaigns outside the confines of academia simultaneously reveals the difficulties they encounter within the current privatized and exclusionary educational complex.

13 Silvia Federici and Arlen Austin (eds) The New York Wages for Housework Committee 1972-1977: History, Theory and Documents. New York: Autonomedia, 2017: 37.
14 Some of the flyers and pamphlets were digitized by MayDay Rooms, ‘a safe haven for historical material linked to social movements, experimental culture and the radical expression of marginalised figures and groups’ in London, and can be found in their online archive: ‘Wages for Housework: Pamphlets – Flyers – Photographs’, MayDay Rooms, http://maydayrooms.org/archives/wages-for-housework/wfhw-pamphlets-flyers-photographs/.


#Syllabus as a Media Object
Besides its contextualization within the historical legacy of previous grassroots mobilizations, it is also necessary to discuss #Syllabus as a new media object in its own
right, in order to fully grasp its relevance for the future politics of knowledge production and transmission.
If we were to describe this object, a #Syllabus would be an ordered list of links to
scholarly texts, news reports, and audiovisual media, mostly aggregated through a
participatory and iterative process, and created in response to political events indicative of larger conditions of structural oppression. Still, as we have seen, #Syllabus
as a media object doesn’t follow a strict format. It varies based on the initial vision
of its initiators, the political causes, and the social composition of the relevant struggle.
Nor does it follow the format of traditional academic syllabi. While a list of learning
resources is at the heart of any syllabus, a boilerplate university syllabus typically
also includes objectives, a timetable, attendance, coursework, examination, and an
outline of the grading system used for the given course. Relieved of these institutional
requirements, the #Syllabus typically includes only a reading list and a hashtag. The
reading list provides resources for understanding what is relevant to the here and
now, while the hashtag provides a way to disseminate across social networks the call
to both collectively edit and teach what is relevant to the here and now. Both the list
and the hashtag are specificities and formal features of the contemporary (internet)
culture and therefore merit further exploration in relation to the social dynamics at
play in #Syllabus initiatives.
The different phases of the internet’s development approached the problem of the
discoverability of relevant information in different ways. In the early days, the Gopher
protocol organized information into a hierarchical file tree. With the rise of the World Wide
Web (WWW), Yahoo tried to employ experts to classify and catalog the internet into
a directory of links. That seemed to be a successful approach for a while, but then
Google (founded in 1998) came along and started to use a webgraph of links to rank
the importance of web pages relative to a given search query.
In 2005, Clay Shirky wrote the essay ‘Ontology is Overrated: Categories, Links and
Tags’,15 developed from his earlier talk ‘Folksonomies and Tags: The Rise of User-Developed Classification’. Shirky used Yahoo’s attempt to categorize the WWW to argue
against any attempt to classify a vast heterogeneous body of information into a single
hierarchical categorical system. In his words: ‘[Yahoo] missed [...] that, if you’ve got
enough links, you don’t need the hierarchy anymore. There is no shelf. There is no file
system. The links alone are enough.’ Those words resonated with many. By following
simple formatting rules, we, the internet users, whom Time magazine named Person of
the Year in 2006, proved that it is possible to collectively write the largest encyclopedia
ever. But, even beyond that, and as per Shirky’s argument, if enough of us organized
our own snippets of the vast body of the internet, we could replace old canons, hierarchies, and ontologies with folksonomies, social bookmarks, and (hash)tags.

15 Clay Shirky, ‘Ontology Is Overrated: Categories, Links, and Tags’, 2005, http://shirky.com/writings/herecomeseverybody/ontology_overrated.html.


Very few who lived through those times would have thought that only a few years later
most user-driven services would be acquired by a small number of successful companies and then be shut down. Or that Google would decide not to include the biggest
hashtag-driven platform, Twitter, in its search index, and that the search results on
its first page would only come from a handful of usual suspects: media conglomerates, Wikipedia, Facebook, LinkedIn, Amazon, Reddit, Quora. Or that Twitter would
become the main channel for the racist, misogynist, fascist escapades of the President
of the United States.
This internet folk naivety—stoked by an equally enthusiastic, venture-capital-backed
startup culture—was not just naivety. This was also a period of massive experimental
use of these emerging platforms. This history therefore merits being properly
revisited and researched. In this text, however, we can only hint at this history: to contextualize how the hashtag as a formalization initially emerged, and how with time the
user-driven web lost some of its potential. Nonetheless, hashtags today still succeed in
propagating political mobilizations in the network environment. Some will say that this
propagation is nothing but a reflection of the internet as a propaganda machine, and
there’s no denying that hashtags do serve a propaganda function. However, it equally
matters that hashtags retain the capacity to shape coordination and self-organization,
and they are therefore a reflection of the internet as an organization machine.
As mentioned, #Syllabus as a media object is an ordered list of links to resources.
In the long history of knowledge retrieval systems and attempts to help users find
relevant information from big archives, the list on the internet continues in the tradition of the index card catalog in libraries, of charts in the music industry, or mixtapes
and playlists in popular culture, helping people tell their stories of what is relevant and
what isn’t through an ordered sequence of items. The list (as a format) together with
the hashtag find themselves in the list (pun intended) of the most iconic media objects
of the internet. In the network media environment, being smart in creating new lists
became the way to displace old lists of relevance, the way to dismantle canons, the
way to unlearn. The way to become relevant.
The Academic Syllabus Migrates Online
#Syllabus interventions are a challenge issued by political struggles to educators as
they expose a fundamental contradiction in the operations of academia. While critical pedagogies of yesteryear’s social movements have become integrated into the
education system, the radical lessons that these pedagogies teach students don’t
easily reconcile with their experience: professional practice courses, the rhetoric of
employability, and compulsory internships, where what they learn is merely instrumental, leave them wondering how on earth they are to apply their Marxism or feminism
to their everyday lives.
Cognitive dissonance is at the basis of degrees in the liberal arts. And to make things
worse, the marketization of higher education, the growing fees, and the privatization
of research have placed universities in a position where they increasingly struggle to
provide institutional space for critical interventions in social reality. As universities become more dependent on the ‘customer satisfaction’ of their students for survival, they
steer away from heated political topics or from supporting faculty members who might
decide to engage with them. Borrowing the words of Stefano Harney and Fred Moten,


‘policy posits curriculum against study’,16 creating the paradoxical situation wherein
today’s universities are places in which it is possible to do almost everything except
study. What Harney and Moten propose instead is the re-appropriation of the diffuse
capacity of knowledge generation that stems from the collective processes of self-organization and commoning. As Moten puts it: ‘When I think about the way we use the
term ‘study,’ I think we are committed to the idea that study is what you do with other
people.’17 And it is this practice of sharing a common repertoire—what Moten and
Harney call ‘rehearsal’18—that is crucially constitutive of a crowdsourced #Syllabus.
This contradiction and the tensions it brings to contemporary neoliberal academia can
be symptomatically observed in the recent evolution of the traditional academic syllabus. As a double consequence of (some) critical pedagogies becoming incorporated
into the teaching process and universities striving to reduce their liability risks, academic syllabi have become increasingly complex and extensive documents. They are
now understood as both a ‘social contract’ between the teachers and their students,
and ‘terms of service’19 between the institution providing educational services and the
students increasingly framed as sovereign consumers making choices in the market of
educational services. The growing official import of the syllabus has had the effect that
educators have started to reflect on how the syllabus translates the power dynamics
into their classroom. For instance, the critical pedagogue Adam Heidebrink-Bruno has
demanded that the syllabus be re-conceived as a manifesto20—a document making
these concerns explicit. And indeed, many academics have started to experiment with
the form and purpose of the syllabus, opening it up to a process of co-conceptualization with their students, or proposing ‘the other syllabus’21 to disrupt asymmetries.
At the same time, universities are unsurprisingly moving their syllabi online, a migration
that can be read as indicative of three larger structural shifts in academia.
First, the push to make syllabi available online, initiated in the US, reinforces the differential effects of the reputation economy. It is the Ivy League universities and their professorial star system that can harness the syllabus to advertise the originality of their
scholarship, while the underfunded public universities and junior academics are burdened with teaching the required essentials. This practice is tied up with the replication
in academia of the different valorization between what is considered to be the labor of
production (research) and that of social reproduction (teaching). The low esteem (and
corresponding lower rewards and remuneration) for the kinds of intellectual labors that
can be considered labors of care—editing journals, reviewing papers or marking, for
instance—fits perfectly well with the gendered legacies of the academic institution.

16 Stefano Harney and Fred Moten, The Undercommons: Fugitive Planning & Black Study, New York: Autonomedia, 2013, p. 81.
17 Harney and Moten, The Undercommons, p. 110.
18 Harney and Moten, The Undercommons, p. 110.
19 Angela Jenks, ‘It’s In The Syllabus’, Teaching Tools, Cultural Anthropology website, 30 June 2016, https://culanth.org/fieldsights/910-it-s-in-the-syllabu/.
20 Adam Heidebrink-Bruno, ‘Syllabus as Manifesto: A Critical Approach to Classroom Culture’, Hybrid Pedagogy, 28 August 2014, http://hybridpedagogy.org/syllabus-manifesto-criticalapproach-classroom-culture/.
21 Lucy E. Bailey, ‘The “Other” Syllabus: Rendering Teaching Politics Visible in the Graduate Pedagogy Seminar’, Feminist Teacher 20.2 (2010): 139–56.


Second, with the withdrawal of resources to pay precarious and casualized academics during their ‘prep’ time (that is, the time in which they can develop new
course material, including assembling new lists of references, updating their courses as well as the methodologies through which they might deliver these), syllabi
now assume an ambivalent role between the tendencies for collectivization and
individualization of insecurity. The reading lists contained in syllabi are not covered
by copyrights; they are like playlists or recipes, which historically had the effect of
encouraging educators to exchange lesson plans and make their course outlines
freely available as a valuable knowledge common. Yet, in the current climate where
universities compete against each other, the authorial function is being extended
to these materials too. Recently, US universities have been leading a trend towards
the interpretation of the syllabus as copyrightable material, an interpretation that
opened up, as would be expected, a number of debates over who is a syllabus’
rightful owner, whether the academics themselves or their employers. If the latter interpretation were to prevail, this would enable universities to easily replace
academics while retaining their contributions to the pedagogical offer. The fruits of
a teacher’s labor could thus be turned into instruments of their own deskilling and
casualization: why would universities pay someone to write a course when they can
recycle someone else’s syllabus and get a PhD student or a precarious postdoc to
teach the same class at a fraction of the price?
This tendency to introduce a logic of property therefore spurs competitive individualism and erasure of contributions from others. Thus, crowdsourcing the syllabus
in the context of growing precarization of labor risks remaining a partial process,
as it might heighten the anxieties of those educators who do not enjoy the security
of a stable job and who are therefore the most susceptible to the false promises of
copyright enforcement and authorship understood as a competitive, small entrepreneurial activity. However, when inserted in the context of live, broader political
struggles, the opening up of the syllabus could and should be an encouragement
to go in the opposite direction, providing a ground to legitimize the collective nature
of the educational process and to make all academic resources available without
copyright restrictions, while devising ways to secure the proper attribution and the
just remuneration of everyone’s labor.
The introduction of the logic of property is hard to challenge as it is furthered by commercial academic publishers. Oligopolists, such as Elsevier, are not only notorious for
using copyright protections to extract usurious profits from the mostly free labor of
those who write, peer review, and edit academic journals,22 but they are now developing all sorts of metadata, metrics, and workflow systems that are increasingly becoming central for teaching and research. In addition to their publishing business, Elsevier
has expanded its ‘research intelligence’ offering, which now encompasses a whole
range of digital services, including the Scopus citation database; Mendeley reference
manager; the research performance analytics tools SciVal and Research Metrics; the
centralized research management system Pure; the institutional repository and publishing platform Bepress; and, last but not least, grant discovery and funding flow tools
Funding Institutional and Elsevier Funding Solutions. Given how central digital services
are becoming in today’s universities, whoever owns these platforms is the university.

22 Vincent Larivière, Stefanie Haustein, and Philippe Mongeon, ‘The Oligopoly of Academic Publishers in the Digital Era’, PLoS ONE 10.6 (10 June 2015), https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127502/.

Third, the migration online of the academic syllabus falls into larger efforts by universities to ‘disrupt’ the educational system through digital technologies. The introduction
of virtual learning environments has led to lesson plans, slides, notes, and syllabi becoming items to be deposited with the institution. The doors of public higher education are being opened to commercial qualification providers by means of the rise in
metrics-based management, digital platforming of university services, and transformation of students into consumers empowered to make ‘real-time’ decisions on how to
spend their student debt.23 Such neoliberalization masquerading behind digitization
is nowhere more evident than in the hype that was generated around Massive Open
Online Courses (MOOCs), exactly at the height of the last economic crisis.
MOOCs developed gradually from the Massachusetts Institute of Technology’s (MIT) initial experiments with opening up its teaching materials to the public through the OpenCourseWare project in 2001. By 2011, MOOCs were saluted as a full-on democratization of access to ‘Ivy-League-caliber education [for] the world’s poor.’24 And yet, their
promise quickly deflated following extremely low completion rates (as low as 5%).25
Sebastian Thrun (Google’s celebrated roboticist who in 2012 founded the for-profit MOOC platform Udacity) believed that in fifty years there would be no more than 10 institutions globally delivering
higher education,26 yet by the end of 2013 he had to admit that Udacity
offered a ‘lousy product’ that proved to be a total failure with ‘students from difficult
neighborhoods, without good access to computers, and with all kinds of challenges in
their lives.’27 Critic Aaron Bady has thus rightfully argued that:
[MOOCs] demonstrate what the technology is not good at: accreditation and mass
education. The MOOC rewards self-directed learners who have the resources and
privilege that allow them to pursue learning for its own sake [...] MOOCs are also a
really poor way to make educational resources available to underserved and underprivileged communities, which has been the historical mission of public education.28
Indeed, the ‘historical mission of public education’ was always and remains to this
day highly contested terrain—the very idea of a public good being under attack by
dominant managerial techniques that try to redefine it, driving what Randy Martin

23 Ben Williamson, ‘Number Crunching: Transforming Higher Education into “Performance Data”’,
Medium, 16 August 2018, https://medium.com/ussbriefs/number-crunching-transforming-highereducation-into-performance-data-9c23debc4cf7.
24 Max Chafkin, ‘Udacity’s Sebastian Thrun, Godfather Of Free Online Education, Changes Course’,
FastCompany, 14 November 2013, https://www.fastcompany.com/3021473/udacity-sebastianthrun-uphill-climb/.
25 ‘The Rise (and Fall?) Of the MOOC’, Oxbridge Essays, 14 November 2017, https://www.oxbridgeessays.com/blog/rise-fall-mooc/.
26 Steven Leckart, ‘The Stanford Education Experiment Could Change Higher Learning Forever’,
Wired, 20 March 2012, https://www.wired.com/2012/03/ff_aiclass/.
27 Chafkin, ‘Udacity’s Sebastian Thrun’.
28 Aaron Bady, ‘The MOOC Moment and the End of Reform’, Liberal Education 99.4 (Fall 2013),
https://www.aacu.org/publications-research/periodicals/mooc-moment-and-end-reform.

aptly called the ‘financialization of daily life.’29 The failure of MOOCs finally points to a
broader question, also impacting the vicissitudes of #Syllabus: Where will actual study
practices find refuge in the social, once the social is made directly productive for capital at all times? Where will study actually ‘take place’, in the literal sense of the phrase,
claiming the resources that it needs for co-creation in terms of time, labor, and love?
Learning from #Syllabus
What have we learned from the #Syllabus phenomenon?
The syllabus is the manifesto of the 21st century.
Political struggles against structural discrimination, oppression, and violence in the
present are continuing the legacy of critical pedagogies of earlier social movements
that coupled the process of political subjectivation with that of collective education.
By creating effective pedagogical tools, movements have brought educators and students into the fold of their struggles. In the context of our new network environment,
political struggles have produced a new media object: #Syllabus, a crowdsourced list
of resources—historic and present—relevant to a cause. By doing so, these struggles
adapt, resist, and live in and against the networks dominated by techno-capital, with
all of the difficulties and contradictions that entails.
What have we learned from the academic syllabus migrating online?
In the contemporary university, critical pedagogy is clashing head-on with the digitization of higher education. Education that should empower and research that should
emancipate are increasingly left out in the cold due to the data-driven marketization
of academia, short-cutting the goals of teaching and research to satisfy the fluctuating demands of the labor market and financial speculation. Resistance against the capture of data, research workflows, and scholarship by means of digitization is a key
struggle for the future of mass intellectuality beyond exclusions of class, disability,
gender, and race.
What have we learned from #Syllabus as a media object?
As old formats transform into new media objects, the digital network environment defines the conditions in which these new media objects try to adjust, resist, and live. A
right intuition can intervene and change the landscape—not necessarily for the good,
particularly if the imperatives of capital accumulation and social control prevail. We
thus need to re-appropriate the process of production and distribution of #Syllabus
as a media object in its totality. We need to build tools to collectively control the workflows that are becoming the infrastructures on top of which we collaboratively produce
knowledge that is vital for us to adjust, resist, and live. In order to successfully intervene in the world, every aspect of production and distribution of these new media objects becomes relevant. Every single aspect counts. The order of items in a list counts.
The timestamp of every version of the list counts. The name of every contributor to

29 Randy Martin, Financialization Of Daily Life, Philadelphia: Temple University Press, 2002.

every version of the list counts. Furthermore, the workflow to keep track of all of these
aspects is another complex media object—a software tool of its own—with its own order and its own versions. It is a recursive process of creating an autonomous ecology.
#Syllabus can be conceived as a recursive process of versioning lists, pointing to textual, audiovisual, or other resources. With all of the linked resources publicly accessible to all; with all versions of the lists editable by all; with all of the edits attributable to
their contributors; with all versions, all linked resources, all attributions preservable by
all, just such an autonomous ecology can be made for #Syllabus. In fact, Sean Dockray, Benjamin Forster, and Public Office have already proposed such a methodology in
their HyperReadings, a forkable readme.md plaintext document on GitHub. They write:
A text that by its nature points to other texts, the syllabus is already a relational
document acknowledging its own position within a living field of knowledge. It is
decidedly not self-contained, however it often circulates as if it were.
If a syllabus circulated as a HyperReadings document, then it could point directly to the texts and other media that it aggregates. But just as easily as it circulates, a HyperReadings syllabus could be forked into new versions: the syllabus
is changed because there is a new essay out, or because of a political disagreement, or because following the syllabus produced new suggestions. These forks
become a family tree where one can follow branches and trace epistemological
mutations.30
It is in line with this vision, which we share with the HyperReadings crew, and in line
with our analysis, that we, as amateur librarians, activists, and educators, make our
promise beyond the limits of this text.
The workflow that we are bootstrapping here will keep in mind every aspect of the media object syllabus (order, timestamp, contributor, version changes), allowing diversity
via forking and branching, and making sure that every reference listed in a syllabus
has a corresponding entry in a catalog that leads to the actual material, in digital form,
needed for the syllabus.
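To make the aspects named above concrete, the following is a minimal sketch in Python of how a versioned syllabus might be recorded. It is an illustration under our own assumptions, not the tool itself: the names SyllabusVersion, fork, and unresolved, the catalog keys, and the example.org address are all hypothetical. It shows only that order, timestamp, contributor, and parent version can each be tracked, and that every listed reference can be checked against a catalog.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SyllabusVersion:
    """One immutable version of a syllabus (hypothetical model, for illustration)."""
    items: Tuple[str, ...]          # ordered catalog keys; the order is part of the version
    contributor: str                # who made this version
    timestamp: str                  # ISO 8601 time at which this version was made
    parent: Optional[str] = None    # id of the version this one was derived from

    @property
    def version_id(self) -> str:
        # Hash every tracked aspect, so that any change produces a new id.
        payload = json.dumps(
            [list(self.items), self.contributor, self.timestamp, self.parent]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def fork(base: SyllabusVersion, items, contributor: str,
         timestamp: Optional[str] = None) -> SyllabusVersion:
    """Derive a new version (a fork or a later edit) that points back to its parent."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).isoformat()
    return SyllabusVersion(tuple(items), contributor, timestamp, base.version_id)


def unresolved(version: SyllabusVersion, catalog: Dict[str, str]) -> List[str]:
    """Return the listed references that have no entry in the catalog."""
    return [key for key in version.items if key not in catalog]


# Placeholder data: the catalog maps a reference key to where the material lives.
catalog = {"text-a": "https://example.org/text-a.pdf"}
root = SyllabusVersion(("text-a",), "alice", "2019-01-01T00:00:00+00:00")
child = fork(root, ["text-a", "text-b"], "bob", "2019-02-01T00:00:00+00:00")
missing = unresolved(child, catalog)  # references still lacking a catalog entry
```

Because each version stores the id of its parent, a set of such records can be read as the family tree of forks that the HyperReadings text describes. A plain git repository would give the same properties, so the sketch is meant only to spell out what has to be recorded.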
Against the enclosures of copyright, we will continue building shadow libraries and
archives of struggles, providing access to resources needed for the collective processes of education.
Against the corporate platforming of workflows and metadata, we will work with social
movements, political initiatives, educators, and researchers to aggregate, annotate,
version, and preserve lists of resources.
Against the extractivism of academia, we will take care of the material conditions that
are needed for such collective thinking to take place, both on- and offline.

30 Sean Dockray, Benjamin Forster, and Public Office, ‘README.md’, HyperReadings, 15 February
2018, https://samiz-dat.github.io/hyperreadings/.

Bibliography
Bady, Aaron. ‘The MOOC Moment and the End of Reform’, Liberal Education 99.4 (Fall 2013), https://www.aacu.org/publications-research/periodicals/mooc-moment-and-end-reform/.
Bailey, Lucy E. ‘The “Other” Syllabus: Rendering Teaching Politics Visible in the Graduate Pedagogy Seminar’, Feminist Teacher 20.2 (2010): 139–56.
Chafkin, Max. ‘Udacity’s Sebastian Thrun, Godfather Of Free Online Education, Changes Course’, FastCompany, 14 November 2013, https://www.fastcompany.com/3021473/udacity-sebastianthrun-uphill-climb/.
Chatelain, Marcia. ‘How to Teach Kids About What’s Happening in Ferguson’, The Atlantic, 25 August 2014, https://www.theatlantic.com/education/archive/2014/08/how-to-teach-kids-about-whatshappening-in-ferguson/379049/.
_____. ‘Teaching the #FergusonSyllabus’, Dissent Magazine, 28 November 2014, https://www.dissentmagazine.org/blog/teaching-ferguson-syllabus/.
Ciolkowski, Laura. ‘Rape Culture Syllabus’, Public Books, 15 October 2016, https://www.publicbooks.org/rape-culture-syllabus/.
Connolly, N.D.B. and Keisha N. Blain. ‘Trump Syllabus 2.0’, Public Books, 28 June 2016, https://www.publicbooks.org/trump-syllabus-2-0/.
Dockray, Sean, Benjamin Forster, and Public Office. ‘README.md’, HyperReadings, 15 February 2018, https://samiz-dat.github.io/hyperreadings/.
Federici, Silvia, and Arlen Austin (eds). The New York Wages for Housework Committee 1972-1977: History, Theory, Documents, New York: Autonomedia, 2017.
Harney, Stefano, and Fred Moten. The Undercommons: Fugitive Planning & Black Study, New York: Autonomedia, 2013.
Heidebrink-Bruno, Adam. ‘Syllabus as Manifesto: A Critical Approach to Classroom Culture’, Hybrid Pedagogy, 28 August 2014, http://hybridpedagogy.org/syllabus-manifesto-critical-approach-classroom-culture/.
Jenks, Angela. ‘It’s In The Syllabus’, Teaching Tools, Cultural Anthropology website, 30 June 2016, https://culanth.org/fieldsights/910-it-s-in-the-syllabus/.
Larivière, Vincent, Stefanie Haustein, and Philippe Mongeon. ‘The Oligopoly of Academic Publishers in the Digital Era’, PLoS ONE 10.6 (10 June 2015), https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127502/.
Leckart, Steven. ‘The Stanford Education Experiment Could Change Higher Learning Forever’, Wired, 20 March 2012, https://www.wired.com/2012/03/ff_aiclass/.
Martin, Randy. Financialization Of Daily Life, Philadelphia: Temple University Press, 2002.
Perlstein, Daniel. ‘Teaching Freedom: SNCC and the Creation of the Mississippi Freedom Schools’, History of Education Quarterly 30.3 (Autumn 1990).
Roberts, Frank Leon. ‘Black Lives Matter: Race, Resistance, and Populist Protest’, 2016, http://www.blacklivesmattersyllabus.com/fall2016/.
‘#StandingRockSyllabus’, NYC Stands with Standing Rock, 11 October 2016, https://nycstandswithstandingrock.wordpress.com/standingrocksyllabus/.
Shirky, Clay. ‘Ontology Is Overrated: Categories, Links, and Tags’, 2005, http://shirky.com/writings/herecomeseverybody/ontology_overrated.html.
‘The Rise (and Fall?) Of the MOOC’, Oxbridge Essays, 14 November 2017, https://www.oxbridgeessays.com/blog/rise-fall-mooc/.
‘TNI Syllabus: Gaming and Feminism’, The New Inquiry, 2 September 2014, https://thenewinquiry.com/tni-syllabus-gaming-and-feminism/.
‘Trump 101’, The Chronicle of Higher Education, 19 June 2016, https://www.chronicle.com/article/Trump-Syllabus/236824/.
‘Wages for Housework: Pamphlets – Flyers – Photographs’, MayDay Rooms, http://maydayrooms.org/archives/wages-for-housework/wfhw-pamphlets-flyers-photographs/.
Williamson, Ben. ‘Number Crunching: Transforming Higher Education into “Performance Data”’, Medium, 16 August 2018, https://medium.com/ussbriefs/number-crunching-transforming-highereducation-into-performance-data-9c23debc4cf7/.

WHW
There Is Something Political in the City Air
2016

What, How & for Whom / WHW

“There is something political in the city air”*

The curatorial collective What, How & for Whom / WHW, based in Zagreb and Berlin, examine the interconnections between
contemporary art and political and social strata, including the role of art institutions in contemporary society.
In the present essay, their discussion of recent projects they curated highlights the struggle for access to
knowledge and the free distribution of information, which in Croatia also means confronting the pressures
of censorship and revisionism in the writing of history and the construction of the future.

Contemporary art’s attempts to come to terms with its evasions in delivering on the promise of its own intrinsic capacity to propose alternatives, and
to do better in the constant game of staying ahead of institutional closures
and marketization, are related to a broader malady in leftist politics. The
crisis of organizational models and modes of political action feels especially acute nowadays, after the latest waves of massive political mobilization
and upheaval embodied in such movements as the Arab Spring and Occupy and the widespread social protests in Southern Europe against austerity
measures – and the failure of these movements to bring about structural
changes. As we witnessed in the dramatic events that unfolded through the
spring and summer of 2015, even in Greece, where Syriza was brought to
power, the people’s will behind newly elected governments proved insufficient to change the course of austerity politics in Europe. Simultaneously,
a series of conditional gains and effective defeats gave rise to the alarming
ascent of radical right-wing populism, against which the left has failed to
provide any real vision or driving force.
Both the practice of political articulation and the political practices of
art have been affected by the hollowing and disabling of democracy related
to the ascendant hegemony of the neoliberal rationale that shapes every
domain of our lives in accordance with a specific image of economics,1
as well as the problematic “embrace of localism and autonomy by much
of the left as the pure strategy”2 and the left’s inability to destabilize the
dominant world-view and reclaim the future.3 Consequently, art practices
increasingly venture into novel modes of operation that seek to “expand
our collective imagination beyond what capitalism allows”.4 They not only
point to the problems but address them head on. By negotiating art’s autonomy and impact on the social, and by conceptualizing the whole edifice
of art as a social symptom, such practices attempt to do more than simply
squeeze novel ideas into exhausted artistic formats and endow them with
political content that produces “marks of distinction”,5 which capital then
exploits for the enhancement of its own reproduction.
The two projects visited in this text both work toward building truly
accessible public spaces. Public Library, launched by Marcell Mars and
Tomislav Medak in 2012, is an ongoing media and social project based on
ideas from the open-source software movement, while Autonomy Cube, by
artist Trevor Paglen and the hacker and computer security researcher Jacob Appelbaum, centres on anonymized internet usage in the post–Edward

* David Harvey, Rebel Cities: From the Right to the City to the Urban Revolution, Verso, London and New York, 2012, p. 117.
1 See Wendy Brown, Undoing the Demos: Neoliberalism’s Stealth Revolution, Zone Books, New York, 2015.
2 Harvey, Rebel Cities, p. 83.
3 See Nick Srnicek and Alex Williams, Inventing the Future: Postcapitalism and a World Without Work, Verso, London and New York, 2015.
4 Ibid., p. 495.
5 See Harvey, Rebel Cities, especially pp. 103–109.

Snowden world of unprecedented institutionalized surveillance. Both projects operate in tacit alliance with art institutions that more often than not
are suffering from a kind of “mission drift” under pressure to align their
practices and structures with the profit sector, a situation that in recent
decades has gradually become the new norm.6 By working within and with
art institutions, both Public Library and Autonomy Cube induce the institutions to return to their initial mission of creating new common spaces
of socialization and political action. The projects develop counter-publics
and work with infrastructures, in the sense proposed by Keller Easterling:
not just physical networks but shared standards and ideas that constitute
points of contact and access between people and thus rule, govern, and
control the spaces in which we live.7
By building a repository of digitized books, and enabling others to do this
as well, Public Library promotes the idea of the library as a truly public institution that offers universal access to knowledge, which “together with
free public education, a free public healthcare, the scientific method, the
Universal Declaration of Human Rights, Wikipedia, and free software,
among others – we, the people, are most proud of”, as the authors of the
project have said.8 Public Library develops devices for the free sharing of
books, but it also functions as a platform for advocating social solidarity
in free access to knowledge. By ignoring and avoiding the restrictive legal
regime for intellectual property, which was brought about by decades of
neoliberalism, as well as the privatization or closure of public institutions,
spatial controls, policing, and surveillance – all of which disable or restrict
possibilities for building new social relations and a new commons – Public
Library can be seen as part of the broader movement to resist neoliberal
austerity politics and the commodification of knowledge and education
and to appropriate public spaces and public goods for common purposes.
While Public Library is fully engaged with the movement to oppose the
copyright regime – which developed as a kind of rent for expropriating the
commons and reintroducing an artificial scarcity of cognitive goods that
could be reproduced virtually for free – the project is not under the spell of
digital fetishism, which until fairly recently celebrated a new digital commons as a non-frictional space of smooth collaboration where a new political and economic autonomy would be forged that would spill over and
undermine the real economy and permeate all spheres of life.9 As Matteo
Pasquinelli argues in his critique of “digitalism” and its celebration of the

6 See Brown, Undoing the Demos.
7 Keller Easterling, Extrastatecraft: The Power of Infrastructure Space, Verso, London and New York, 2014.
8 Marcell Mars, Manar Zarroug, and Tomislav Medak, “Public Library”, in Public Library, ed. Marcell Mars, Tomislav Medak, and What, How & for Whom / WHW, exh. publication, What, How & for Whom / WHW and Multimedia Institute, Zagreb, 2015, p. 78.
9 See Matteo Pasquinelli, Animal Spirits: A Bestiary of the Commons, NAi Publishers, Rotterdam, and Institute of Network Cultures, Amsterdam, 2008.

virtues of the information economy with no concern about the material
basis of production, the information economy is a parasite on the material
economy and therefore “an accurate understanding of the common must
be always interlinked with the real physical forces producing it and the material economy surrounding it.”10
Public Library emancipates books from the restrictive copyright regime
and participates in the exchange of information enabled by digital technology, but it also acknowledges the labour and energy that make this possible. There is labour that goes into the cataloguing of the books, and labour
that goes into scanning them before they can be brought into the digital
realm of free reproduction, just as there are the ingenuity and labour of
the engineers who developed a special scanner that makes it easier to scan
books; also, the scanner needs to be installed, maintained, and fed books
over hours of work. This is where the institutional space of art comes in
handy by supporting the material production central to the Public Library
endeavour. But the scanner itself does not need to be visible. In 2014, at
the Museo Nacional Centro de Arte Reina Sofia in Madrid, we curated the
exhibition Really Useful Knowledge, which dealt with conflicts triggered by
struggles over access to knowledge and the effects that knowledge, as the
basis of capital reproduction, has on the totality of workers’ lives. In the
exhibition, the production funds allocated to Public Library were used to
build the book scanner at Calafou, an anarchist cooperative outside Barcelona. The books chosen for scanning were relevant to the exhibition’s
themes – methods of reciprocal learning and teaching, forms of social and
political organization, the history of the Spanish Civil War, etc. – and after
being scanned, they were uploaded to the Public Library website. All that
was visible in the exhibition itself was a kind of index card or business card
with a URL link to the Public Library website and a short statement (fig. 1):
A public library is:
• free access to books for every member of society
• library catalog
• librarian
With books ready to be shared, meticulously cataloged, everyone is a
librarian. When everyone is librarian, the library is everywhere.11
Public Library’s alliance with art institutions serves to strengthen the
cultural capital both for the general demand to free books from copyright
restrictions on cultural goods and for the project itself – such cultural capital could be useful in a potential lawsuit. Simultaneously, the presence and
realization of the Public Library project within an exhibition enlists the host
institution as part of the movement and exerts influence on it by taking
the museum’s public mission seriously and extending it into a grey zone of

10 Ibid., p. 29.
11 Mars, Zarroug, and Medak, “Public Library”, p. 85.

questionable legality. The defence of the project becomes possible by making the traditional claim of the “autonomy” of art, which is not supposed
to assert any power beyond the museum walls. By taking art’s autonomy
at its word, and by testing the truth of the liberal-democratic claim that
the field of art is a field of unlimited freedom, Public Library engages in a
kind of “overidentification” game, or what Keller Easterling, writing about
the expanded activist repertoire in infrastructure space, calls “exaggerated
compliance”.12 Should the need arise, as in the case of a potential lawsuit
against the project, claims of autonomy and artistic freedom create a protective shroud of untouchability. And in this game of liberating books from
the parochial capitalist imagination that restricts their free circulation, the
institution becomes a complicit partner. The long-acknowledged insight
that institutions embrace and co-opt critique is, in this particular case, a
win-win situation, as Public Library uses the public status of the museum
as a springboard to establish the basic message of free access and the free
circulation of books and knowledge as common sense, while the museum
performs its mission of bringing knowledge to the public and supporting
creativity, in this case the reworking, rebuilding and reuse of technology
for the common good. The fact that the institution is not naive but complicit produces a synergy that enhances potentialities for influencing and
permeating the public sphere. The gesture of not exhibiting the scanner in
the museum has, among other things, a practical purpose, as more books
would be scanned voluntarily by the members of the anarchist commune
in Calafou than would be by the overworked museum staff, and employing
somebody to do this during the exhibition would be too expensive (and the
mantra of cuts, cuts, cuts would render negotiation futile). If there is a flirtatious nod to the strategic game of not exposing too much, it is directed less
toward the watchful eyes of the copyright police than toward the exhibition
regime of contemporary art group shows in which works compete for attention, the biggest scarcity of all. Public Library flatly rejects identification
with the object “our beloved bookscanner” (as the scanner is described on
the project website13), although it is an attractive object that could easily
be featured as a sculpture within the exhibition. But its efficacy and use
come first, as is also true of the enigmatic business card–like leaflet, which
attracts people to visit the Public Library website and use books, not only to
read them but also to add books to the library: doing this in the privacy of
one’s home on one’s own computer is certainly more effective than doing
it on a computer provided and displayed in the exhibition among the other
art objects, films, installations, texts, shops, cafés, corridors, exhibition
halls, elevators, signs, and crowds in a museum like Reina Sofia.
For the exhibition to include a scanner that was unlikely to be used or
a computer monitor that showed the website from which books might be

12 Easterling, Extrastatecraft, p. 492.
13 See https://www.memoryoftheworld.org/blog/2012/10/28/our-belovedbookscanner-2/ (accessed July 4, 2016).

downloaded, but probably not read, would be the embodiment of what
philosopher Robert Pfaller calls “interpassivity”, the appearance of activity or a stand-in for it that in fact replaces any genuine engagement.14 For
Pfaller, interpassivity designates a flight from engagement, a misplaced libidinal investment that under the mask of enjoyment hides aversion to an
activity that one is supposed to enjoy, or more precisely: “Interpassivity is
the creation of a compromise between cultural interests and latent cultural
aversion.”15 Pfaller’s examples of participation in an enjoyable process that
is actually loathed include book collecting and the frantic photocopying of
articles in libraries (his book was originally published in 2002, when photocopying had not yet been completely replaced by downloading, bookmarking, etc.).16 But he also discusses contemporary art exhibitions as sites of
interpassivity, with their overabundance of objects and time-based works
that require time that nobody has, and with the figure of the curator on
whom enjoyment is displaced – the latter, he says, is a good example of
“delegated enjoyment”. By not providing the exhibition with a computer
from which books can be downloaded, the project ensures that books are
seen as vehicles of knowledge acquired by reading and not as immaterial
capital to be frantically exchanged; the undeniable pleasure of downloading and hoarding books is, after all, just one step removed from the playground of interpassivity that the exhibition site (also) is.
But Public Library is hardly making a moralistic statement about the
virtues of reading, nor does it believe that ignorance (such as could be
overcome by reading the library’s books) is the only obstacle that stands
in the way of ultimate emancipation. Rather, the project engages with, and
contributes to, the social practice that David Harvey calls “commoning”:
“an unstable and malleable social relation between a particular self-defined social group and those aspects of its actually existing or yet-to-becreated social and/or physical environment deemed crucial to its life and
livelihood”.17 Public Library works on the basis of commoning and tries to
enlist others to join it, which adds a distinctly political dimension to the
sabotage of intellectual property revenues and capital accumulation.

14 Robert Pfaller, On the Pleasure Principle in Culture: Illusions Without Owners, Verso, London and New York, 2014.
15 Ibid., p. 76.
16 Pfaller’s book, which first appeared in German, was published in English only in 2014. His ideas have gained greater relevance over time, not only as the shortcomings of the immensely popular social media activism became apparent – where, as many critics have noted, participation in political organizing and the articulation of political tasks and agendas are often replaced by a click on an icon – but also because of Pfaller’s broader argument about the self-deception at play in interpassivity and its role in eliciting enjoyment from austerity measures and other calamities imposed on the welfare state by the neoliberal regime, which since the early 2000s has exceeded even the most sober (and pessimistic) expectations.
17 Ibid., p. 73.

The political dimension of Public Library and the effort to form and
publicize the movement were expressed more explicitly in the Public Library exhibition in 2015 at Gallery Nova in Zagreb, where we have been
directing the programme since 2003. If the Public Library project was not
such an eminently collective practice that pays no heed to the author function, the Gallery Nova show might be considered something like a solo exhibition. As it was realized, the project again used art as an infrastructure
and resource to promote the movement of freeing books from copyright
restrictions while collecting legitimization points from the art world as enhanced cultural capital that could serve as armour against future attacks
by the defenders of the holy scripture of copyright laws. But here the more
important tactic was to show the movement as an army of many and to
strengthen it through self-presentation. The exhibition presented Public
Library as a collection of collections, and the repertory form (used in archive science to describe a collection) was taken as the basic narrative procedure. It mobilized and activated several archives and open digital repositories, such as MayDay Rooms from London, The Ignorant Schoolmaster and
His Committees from Belgrade, Library Genesis and Aaaaaarg.org, Catalogue
of Free Books, (Digitized) Praxis, the digitized work of the Midnight Notes
Collective, and Textz.com, with special emphasis on activating the digital
repositories UbuWeb and Monoskop. Not only did the exhibition attempt to
enlist the gallery audience but, equally important, the project was testing
its own strength in building, articulating, announcing, and proposing, or
speculating on, a broader movement to oppose the copyright of cultural
goods within and adjacent to the art field.
Presenting such a movement in an art institution changes one of the
basic tenets of art, and for an art institution the project’s main allure probably lies in this kind of expansion of the art field. A shared politics is welcome, but nothing makes an art institution so happy as the sense of purpose that a project like Public Library can endow it with. (This, of course,
comes with its own irony, for while art institutions nowadays compete for
projects that show emphatically how obsolete the aesthetic regime of art is,
they continue to base their claims of social influence on knowledge gained
through some form of aesthetic appreciation, however they go about explaining and justifying it.) At the same time, Public Library’s nonchalance
about institutional maladies and anxieties provides a homeopathic medicine whose effect is sometimes so strong that discussion about placebos
becomes, at least temporarily, beside the point. One occasion when Public
Library’s roving of the political terrain became blatantly direct was the exhibition Written-off: On the Occasion of the 20th Anniversary of Operation
Storm, which we organized in the summer of 2015 at Gallery Nova (figs.
2–4).
The exhibition/action Written-off was based on data from Ante Lesaja’s
extensive research on “library purification”, which he published in his book
Knjigocid: Uništavanje knjige u Hrvatskoj 1990-ih (Libricide: The Destruction
of Books in Croatia in the 1990s).18 People were invited to bring in copies of

18 Ante Lesaja, Knjigocid: Uništavanje knjige u Hrvatskoj 1990-ih, Profil and Srbsko narodno

books that had been removed from Croatian public libraries in the 1990s.
The books were scanned and deposited in a digital archive; they then became available on a website established especially for the project. In Croatia during the 1990s, hundreds of thousands of books were removed from
schools and factories, from public, specialized, and private libraries, from
former Yugoslav People’s Army centres, socio-political organizations, and
elsewhere because of their ideologically inappropriate content, the alphabet they used (Serbian Cyrillic), or the ethnic or political background of the
authors. The books were mostly thrown into rubbish bins, discarded on
the street, destroyed, or recycled. What Lesaja’s research clearly shows is
that the destruction of the books – as well as the destruction of monuments
to the People’s Liberation War (World War II) – was not the result of individuals running amok, as official accounts preach, but a deliberate and systematic action that symbolically summarizes the dominant politics of the
1990s, in which war, rampant nationalism, and phrases about democracy
and sovereignty were used as a rhetorical cloak to cover the nakedness of
the capitalist counter-revolution and criminal processes of dispossession.
Written-off: On the Occasion of the 20th Anniversary of Operation Storm
set up scanners in the gallery, initiated a call for collecting and scanning
books that had been expunged from public institutions in the 1990s, and
outlined the criteria for the collection, which corresponded to the basic
domains in which the destruction of the books, as a form of censorship,
was originally implemented: books written in the Cyrillic alphabet or in
Serbian regardless of the alphabet; books forming a corpus of knowledge
about communism, especially Yugoslav communism, Yugoslav socialism,
and the history of the workers’ struggle; and books presenting the anti-Fascist and revolutionary character of the People’s Liberation Struggle during
World War II.
The exhibition/action was called Written-off because the removal and
destruction of the books were often presented as a legitimate procedure
of library maintenance, thus masking the fact that these books were unwanted, ideologically unacceptable, dangerous, harmful, unnecessary, etc.
Written-off unequivocally placed “book destruction” in the social context
of the period, when the destruction of “unwanted” monuments and books
was happening alongside the destruction of homes and the killing of “unwanted” citizens, outside of and prior to war operations. For this reason,
the exhibition was dedicated to the twentieth anniversary of Operation
Storm, the final military/police operation in what is called, locally, the
Croatian Homeland War.19
The exhibition was intended as a concrete intervention against a political logic that resulted in mass exile and killing, the history of which is
glossed over and critical discussion silenced, and also against the official
19

vijeće, Zagreb, 2012.
Known internationally as the Croatian War of Independence, the war was fought between Croatian forces and the Serb-controlled Yugoslav People’s Army from 1991 to
1995.

“There is something political in the city air”

295

celebrations of the anniversary, which glorified militarism and proclaimed
the ethical purity of the victory (resulting in the desired ethnic purity of the
nation).
As both symbolic intervention and real-life action, then, the exhibition
Written-off took place against a background of suppressed issues relating
to Operation Storm – ethno-nationalism as the flip side of neoliberalism,
justice and the present status of the victims and refugees, and the overall character of the war known officially as the Homeland War, in which
discussions about its prominent traits as a civil war are actively silenced
and increasingly prosecuted. In protest against the official celebrations
and military parades, the exhibition marked the anniversary of Operation
Storm with a collective action that evokes books as symbolic of a “knowledge society” in which knowledge becomes the location of conflictual engagement. It pointed toward the struggle over collective symbolic capital
and collective memory, in which culture as a form of the commons has a
direct bearing on the kind of place we live in. The Public Library project,
however, is engaged not so much with cultural memory and remembrance
as a form of recollection or testimony that might lend political legitimation
to artistic gestures; rather, it engages with history as a construction and
speculative proposition about the future, as Peter Osborne argues in his
polemical hypotheses on the notion of contemporary art that distinguishes
between “contemporary” and “present-day” art: “History is not just a relationship between the present and the past – it is equally about the future.
It is this speculative futural moment that definitively separates the concept
of history from memory.”20 For Public Library, the future that participates
in the construction of history does not yet exist, but it is defined as more
than just a project against the present as reflected in the exclusionary, parochially nationalistic, revisionist and increasingly fascist discursive practices of the Croatian political elites. Rather, the future comes into being as
an active and collective construction based on the emancipatory aspects of
historical experiences as future possibilities.
Although defined as an action, the project is not exultantly enthusiastic
about collectivity or the immediacy and affective affinities of its participants, but rather it transcends its local and transient character by taking
up the broader counter-hegemonic struggle for the mutual management
of joint resources. Its endeavour is not limited to the realm of the political
and ideological but is rooted in the repurposing of technological potentials
from the restrictive capitalist game and the reutilization of the existing infrastructure to build a qualitatively different one. While the culture industry adapts itself to the limited success of measures that are geared toward
preventing the free circulation of information by creating new strategies
for pushing information into a form of property and expropriating value
through the control of metadata (information about information),21 Public Library shifts the focus away from aesthetic intention – from unique,
closed, and discrete works – to a database of works and the metabolism
of the database. It creates values through indexing and connectivity, imagined communities and imaginative dialecticization. The web of interpenetration and determination activated by Public Library creates a pedagogical endeavour that also includes a propagandist thrust, if the notion of
propaganda can be recast in its original meaning as “things that must be
disseminated”.

20 Peter Osborne, Anywhere or Not at All: Philosophy of Contemporary Art, Verso, London and New York, 2013, p. 194.

fig. 1
Marcell Mars, Art as Infrastructure: Public Library, installation
view, Really Useful Knowledge, curated by WHW, Museo
Nacional Centro de Arte Reina Sofia, Madrid, 2014.
Photo by Joaquín Cortés and Román Lores / MNCARS.

fig. 2
Public Library, exhibition view, Gallery Nova, Zagreb, 2015.
Photo by Ivan Kuharic.

fig. 3
Written-off: On the Occasion of the 20th Anniversary of Operation
Storm, exhibition detail, Gallery Nova, Zagreb, 2015.
Photo by Ivan Kuharic.

fig. 4
Written-off: On the Occasion of the 20th Anniversary of Operation
Storm, exhibition detail, Gallery Nova, Zagreb, 2015.
Photo by Ivan Kuharic.

fig. 5
Trevor Paglen and Jacob Appelbaum, Autonomy Cube,
installation view, Really Useful Knowledge, curated by WHW,
Museo Nacional Centro de Arte Reina Sofia, Madrid, 2014.
Photo by Joaquín Cortés and Román Lores / MNCARS.

A similar didactic impetus and constructivist praxis is present in the work
Autonomy Cube, which was developed through the combined expertise of
artist and geographer Trevor Paglen and internet security researcher, activist and hacker Jacob Appelbaum. This work, too, we presented in the
Reina Sofia exhibition Really Useful Knowledge, along with Public Library
and other projects that offered a range of strategies and methodologies
through which the artists attempted to think through the disjunction between concrete experience and the abstraction of capital, enlisting pedagogy as a crucial element in organized collective struggles. Autonomy Cube
offers a free, open-access, encrypted internet hotspot that routes internet
traffic over TOR, a volunteer-run global network of servers, relays, and services, which provides anonymous and unsurveilled communication. The
importance of the privacy of the anonymized information that Autonomy
Cube enables and protects is that it prevents so-called traffic analysis – the
tracking, analysis, and theft of metadata for the purpose of anticipating
people’s behaviour and relationships. In the hands of the surveillance
state this data becomes not only a means of steering our tastes, modes of
consumption, and behaviours for the sake of making profit but also, and
more crucially, an effective method and weapon of political control that
can affect political organizing in often still-unforeseeable ways that offer
few reasons for optimism. Visually, Autonomy Cube references minimalist
sculpture (fig. 5) (specifically, Hans Haacke’s seminal piece Condensation
Cube, 1963–1965), but its main creative drive lies in the affirmative salvaging of technologies, infrastructures, and networks that form both the leading organizing principle and the pervasive condition of complex societies,
with the aim of supporting the potentially liberated accumulation of collective knowledge and action. Aesthetic and art-historical references serve
as camouflage or tools for a strategic infiltration that enables expansion of
the movement’s field of influence and the projection of a different (contingent) future. Engagement with historical forms of challenging institutions
becomes the starting point of a poetic praxis that materializes the object of
its striving in the here and now.
Both Public Library and Autonomy Cube build their autonomy on the dedication and effort of the collective body, without which they would not
exist, rendering this interdependence not as some consensual idyll of cooperation but as conflicting fields that create further information and experiences. By doing so, they question the traditional edifice of art in a way
that supports Peter Osborne’s claim that art is defined not by its aesthetic
or medium-based status, but by its poetics: “Postconceptual art articulates a post-aesthetic poetics.”22 This means going beyond criticality and
bringing into the world something defined not by its opposition to the real,
but by its creation of the fiction of a shared present, which, for Osborne,
is what makes art truly contemporary. And if projects like these become a
kind of political trophy for art institutions, the side the institutions choose
nevertheless affects the common sense of our future.

21 McKenzie Wark, “Metadata Punk”, in Public Library, pp. 113–117 (see n. 9).

22 Osborne, Anywhere or Not at All, p. 33.

Fuller
The Indexalist
2016

## The Indexalist

### From Mondotheque

[Matthew Fuller](/wiki/index.php?title=Matthew_Fuller "Matthew Fuller")

I first spoke to the patient in the last week of that August. That evening the
sun was tender in drawing its shadows across the lines of his face. The eyes
gazed softly into a close middle distance, as if composing a line upon a
translucent page hung in the middle of the air, the hands tapping out a stanza
or two of music on legs covered by the brown folds of a towelling dressing
gown. He had the air of someone who had seen something of great amazement but
yet lacked the means to put it into language. As I got to know the patient
over the next few weeks I learned that this was not for the want of effort.

In his youth he had dabbled with the world-speak language Volapük, one
designed to do away with the incompatibility of tongues, to establish a
standard in which scientific intercourse might be conducted with maximum
efficiency and with minimal friction in movement between minds, laboratories
and publications. Latin biological names, the magnificent table of elements,
metric units of measurement, the nomenclature of celestial objects from clouds
to planets, anatomical parts and medical conditions all had their own systems
of naming beyond any specific tongue. This was an attempt to bring reason into
speech and record, but there were other means to do so when reality resisted
these early measures.

The dabbling, he reflected, had become a little more than that. He had
subscribed to journals in the language, he wrote letters to colleagues and
received them in return. A few words of world-speak remained readily on his
tongue, words that he spat out regularly into the yellow-wallpapered lounge of
the sanatorium with a disgust that was lugubriously palpable.

According to my records, and in piecing together the notes of previous
doctors, there was something else however, something more profound that the
language only hinted at. Just as the postal system did not require the
adoption of any language in particular but had its formats that integrated
them into addressee, address line, postal town and country, something that
organised the span of the earth, so there was a sense of the patient as having
sustained an encounter with a fundamental form of organisation that mapped out
his soul. More thrilling than the question of language indeed was that of the
system of organisation upon which linguistic symbols are inscribed. I present
for the reader’s contemplation some statements typical of those he seemed to
mull over.

“The index card system spoke to my soul. Suffice it to say that in its use I
enjoyed the highest form of spiritual pleasure, and organisational efficiency,
a profound flowering of intellect in which every thought moved between its
enunciation, evidence, reference and articulation in a mellifluous flow of
ideation and the gratification of curiosity.” This sense of the soul as a
roving enquiry moving across eras, across forms of knowledge and through the
serried landscapes of the vast planet and cosmos was returned to over and
over, a sense that an inexplicable force was within him yet always escaping
his touch.

“At every reference stood another reference, each more interesting than the
last. Each the apex of a pyramid of further reading, pregnant with the threat
of digression, each a thin high wire which, if not observed might lead the
author into the fall of error, a finding already found against and written
up.” He mentions too, a number of times, the way the furniture seemed to
assist his thoughts - the ease of reference implied by the way in which the
desk aligned with the text resting upon the pages of the off-print, journal,
newspaper, blueprint or book above which further drawers of cards stood ready
in their cabinet. All were integrated into the system. And yet, amidst these
frenetic recollections there was a note of mourning in his contemplative
moods, “The superposition of all planes of enquiry and of thought in one
system repels those for whom such harmonious speed is suspicious.” This
thought was delivered with a stare that was not exactly one of accusation, but
that lingered with the impression that there was a further statement to follow
it, and another, queued up ready to follow.

As I gained the trust of the patient, there was a sense in which he estimated
me as something of a junior collaborator, a clerk to his natural role as
manager. A lucky, if slightly doubtful, young man whom he might mentor into
efficiency and a state of full access to information. For his world, there was
not the corruption and tiredness of the old methods. Ideas moved faster in his
mind than they might now across the world. To possess a register of thoughts
covering a period of some years is to have an asset, the value of which is
almost incalculable. That it can answer any question respecting any thought
about which one has had an enquiry is but the smallest of its merits. More
important is the fact that it continually calls attention to matters requiring
such attention.

Much of his discourse was about the optimum means of arrangement of the
system, there was an art to laying out the cards. As the patient further
explained, to meet the objection that loose cards may easily be mislaid, cards
may be tabbed with numbers from one to ten. When arranged in the drawer, these
tabs proceed from left to right across the drawer and the absence of a single
card can thus easily be detected. The cards are further arranged between
coloured guide cards. As an alternative to tabbed cards, signal flags may be
used. Here, metal clips that stand out like guides may be attached to the top
end of the card. For use of the system in relation to dates of the
month, the card is printed with the numbers 1 to 31 at the top. The metal clip
is placed as a signal to indicate the card is to receive attention on the
specified day. Within a large organisation a further card can be drawn up to
assign responsibility for processing that date’s cards. There were numerous
means of working the cards, special techniques for integrating them into any
type of research or organisation, means by which indexes operating on indexes
could open mines of information and expand the knowledge and capabilities of
mankind.

As he pressed me further, I began to experiment with such methods myself by
withdrawing data from the sanatorium’s records and transferring it to cards in
the night. The advantages of the system are overwhelming. Cards, cut to the
right mathematical degree of accuracy, arrayed readily in drawers, set in
cabinets of standard sizes that may be added to at ease, may be apportioned
out amongst any number of enquirers, all of whom may work on them
independently and simultaneously. The bound book, by contrast, may only be
used by one person at a time and that must stay upon a shelf itself referred
to by an index card system. I began to set up a structure of rows of mirrors
on chains and pulleys and a set of levered and hinged mechanical arms to allow
me to open the drawers and to privately consult my files from any location
within the sanatorium. The clarity of the image is however so far too much
effaced by the diffusion of light across the system.

It must further be borne in mind that a system thus capable of indefinite
expansion obviates the necessity for hampering a researcher with furniture or
appliances of a larger size than are immediately required. The continuous and
orderly sequence of the cards may be extended further into the domain of
furniture and to the conduct of business and daily life. Reasoning, reference
and the order of ideas emerging as they embrace and articulate a chaotic world
and then communicate amongst themselves turning the world in turn into
something resembling the process of thought in an endless process of
consulting, rephrasing, adding and sorting.

For the patient, ideas flowed like a force of life, oblivious to any unnatural
limitation. Thought became, with the proper use of the system, part of the
stream of life itself. Thought moved through the cards not simply at the
superficial level of the movement of fingers and the mechanical sliding and
bunching of cards, but at the most profound depths of the movement between
reality and our ideas of it. The organisational grace to be found in
arrangement, classification and indexing still stirred the remnants of his
nervous system until the last day.

Last Revision: 2*08*2016

Retrieved from

[https://www.mondotheque.be/wiki/index.php?title=The_Indexalist&oldid=8448](https://www.mondotheque.be/wiki/index.php?title=The_Indexalist&oldid=8448)

1 {print $var}' temp.txt); awk
-vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' > freq.$i.txt; done && rm temp.txt

* 2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) && for i in {1..500}; do
word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt);
tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt);
count=$(egrep -lw "$word" freq.?.txt | wc -l);
idf=$(echo "1+l(5/$count)" | bc -l);
tfidf=$(echo "$tf*$idf" | bc);
echo "$word $tfidf" >> freq.$j.txt.temp;
done; sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

* 3\. Process the files tfidf.1-5.txt and their source text, text.txt, and produce occ.txt with concordance of top 3 words from each of them:

> rm occ.txt && for j in {1..5}; do echo "$j" >> occ.txt; ptx -f -w 150
text.txt.$j > occ.$j.txt; for i in {1..3}; do word=$(awk -vline="$i" -vfield=1
-F" " 'NR
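
Read off the commands of steps 1 and 2 as they survive here, the weighting appears to be the following. The symbols are labels introduced for this note, and since the beginning of step 1 is cut off, it is an assumption that `max` holds the highest word count of the text being processed:

```latex
\mathrm{tf}(w,t) = 0.5 + \frac{f_{w,t}}{2\,\max_{w'} f_{w',t}}, \qquad
\mathrm{idf}(w) = 1 + \ln\frac{5}{n_w}, \qquad
\mathrm{tfidf}(w,t) = \mathrm{tf}(w,t)\cdot\mathrm{idf}(w)
```

Here $f_{w,t}$ is the count of word $w$ in text $t$, and $n_w$ is the number of the five frequency lists in which $w$ occurs (the `egrep -lw ... | wc -l` count). A word present in all five lists gets $\mathrm{idf} = 1 + \ln 1 = 1$, while a word found in only one list gets $\mathrm{idf} = 1 + \ln 5 \approx 2.61$, so words specific to a single text rise to the top of the sorted output.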

Barok
Poetics of Research
2014

_An unedited version of a talk given at the conference [Public
Library](http://www.wkv-stuttgart.de/en/program/2014/events/public-library/)
held at Württembergischer Kunstverein Stuttgart, 1 November 2014._

_Bracketed sequences are to be reformulated._

Poetics of Research

In this talk I'm going to attempt to identify [particular] cultural
algorithms, i.e. processes in which cultural practices and software meet. With
them a sphere is implied in which algorithms gather to form bodies of
practices and in which cultures gather around algorithms. I'm going to
approach them through the perspective of my practice as a cultural worker,
editor and artist, considering practice in the same rank as theory and
poetics, and where theorization of practice can also lead to the
identification of poetical devices.

The primary motivation for this talk is an attempt to figure out where we
stand as operators, users [and communities] gathering around infrastructures
containing a massive body of text (among other things) and what sort of things
might be considered to make a difference [or to keep making difference].

The talk mainly [considers] the role of text and the word in research, by way
of several figures.

A

A reference, list, scheme, table, index; those things that intervene in the
flow of narrative, illustrating the point, perhaps in a more economic way than
the linear text would do. Yet they don't function as pictures, they are
primarily texts, arranged in figures. Their forms have been
standardised[normalised] over centuries, withstood the transition to the
digital without any significant change, being completely intuitive to the
modern reader. Compared to the body of text they are secondary, run parallel
to it. Their function is however different to that of the punctuation. They
are there neither to shape the narrative nor to aid structuring the argument
into logical blocks. Nor is their function spatial, like in visual poems.
Their positions within a document are determined according to the sequential
order of the text, [standing as attachments] and are there to clarify the
nature of relations among elements of the subject-matter, or to establish
relations with other documents. The [premise] of my talk is that these
_textual figures_ also came to serve as the abstract[relational] models
determining possible relations among documents as such, and in consequence [to
structure conditions [of research]].

B

It can be said that research, as inquiry into a subject-matter, consists of
discrete queries. A query, such as a question about what something is, what
kinds, parts and properties it has, and so on, can be consulted in
existing documents or generate new documents based on collection of data [in]
the field and through experiment, before proceeding to reasoning [arguments
and deductions]. Formulation of a query is determined by protocols providing
access to documents, which means that there is a difference between collecting
data outside the archive (the undocumented, i.e. in the field and through
experiment), consulting with a person--an archivist (expert, librarian,
documentalist), and consulting with a database storing documents.
Phenomena such as [deepening] of specialization and thorough digitization
[have given] privilege to the database as [a|the] [fundamental] means for
research. Obviously, this is a very recent [phenomenon]. Queries were once
formulated in natural language; now, given the fact that databases are queried
[using] SQL language, their interfaces are mere extensions of it and
researchers pose their questions by manipulating dropdowns, checkboxes and
input boxes mashed together on a flat screen being run by software that in
turn translates them into a long line of conditioned _SELECTs_ and _JOINs_
performed on tables of data.
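
As a rough illustration of that translation, the sketch below turns three form fields into one SQL statement. Everything in it is hypothetical: the tables `documents` and `authors`, their columns, and the functions `sql_quote` and `build_query` are invented for this note and do not describe any particular library's interface.

```shell
#!/bin/sh
# Hypothetical search form -> SQL translation. The schema (documents,
# authors) and both function names are assumptions made for illustration.

# Wrap a value in single quotes, doubling any quote inside it, so that it
# can sit inside a SQL string literal.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# build_query AUTHOR LANGUAGE YEAR_FROM
# Each argument stands for one form widget; an empty value means "not set".
# The body runs in a subshell, so its variables stay local.
build_query() (
  author=$1 language=$2 year_from=$3
  sql="SELECT d.title FROM documents d JOIN authors a ON a.id = d.author_id"
  sep=" WHERE "

  if [ -n "$author" ]; then          # input box
    sql="$sql${sep}a.name = $(sql_quote "$author")"; sep=" AND "
  fi
  if [ -n "$language" ]; then        # dropdown
    sql="$sql${sep}d.language = $(sql_quote "$language")"; sep=" AND "
  fi
  case "$year_from" in               # numeric field
    ''|*[!0-9]*) ;;                  # unset or not a plain number: skip
    *) sql="$sql${sep}d.year >= $year_from" ;;
  esac

  printf '%s;\n' "$sql"
)

build_query "Foucault" "en" ""
```

The final call prints `SELECT d.title FROM documents d JOIN authors a ON a.id = d.author_id WHERE a.name = 'Foucault' AND d.language = 'en';`. The researcher never sees this line; the question that can be asked is bounded by the widgets the interface offers and the joins it knows how to write.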

Specialization, digitization and networking have changed the language of
questioning. Inquiry, once attached to the flesh and paper, has been
[entrusted] to the digital and networked. Researchers are querying the black
box.

C

Searching in a collection of [amassed/assembled] [tangible] documents (i.e.
bookshelf) is different from searching in a systematically structured
repository (library) and even more so from searching in a digital repository
(digital library). Not that they are mutually exclusive. One can devise
structures and algorithms to search through a printed text, or read books in a
library one by one. They are rather [models] [embodying] various [processes]
associated with the query. These properties of the query might be called [the
sequence], the structure and the index. If they are present in the ways of
querying documents, and we will return to this issue, are they persistent
within the inquiry as such? [wait]

D

This question itself is a rupture in the sequence. It makes a demand to depart
from one narrative [a continuous flow of words] to another, to figure out,
while remaining bound to it [it would be even more as a so-called rhetorical
question]. So there has been one sequence, or line, of the inquiry--about the
kinds of the query and its properties. That sequence itself is a digression,
from within the sequence about what is research and describing its parts
(queries). We are thus returning to it and continue with a question whether
the properties of the inquiry are the same as the properties of the query.

E

But isn't it true that every single utterance occurring in a sequence yields a
query as well? Let's consider the word _utterance_. [wait] It can produce a
number of associations, for example with how Foucault employs the notion of
_énoncé_ in his _Archaeology of Knowledge_ , giving a hard time to his English
translators wondering whether _utterance_ or _statement_ is more appropriate,
or whether they are interchangeable, and what impact would each choice have on
his reception in the Anglophone world. Limiting ourselves to textual forms for
now (and not translating his work but pursuing a different inquiry), let us say
the utterance is a word [or a phrase or an idiom] in a sequence such as a
sentence, a paragraph, or a document.

## (F) The structure

This distinction is as old as recorded Western thought since both Plato and
Aristotle differentiate between a word on its own ("the said", a thing said)
and words in the company of other words. For example, Aristotle's _Categories_
[lay] on the [notion] of words on their own, and they are made the subject-
matter of that inquiry. [For him], the ambiguity of connotation words
[produce] lies in their synonymity, understood differently from the moderns--
not as more words denoting a similar thing but rather one word denoting
various things. Categories were outlined as a device to differentiate among
words according to kinds of these things. Every word as such belonged to not
less and not more than one of ten categories.

So it happens to the word _utterance_ , as to any other word uttered in a
sequence, that it poses a question, a query about what share of the spectrum
of possibly denoted things might yield as the most appropriate in a given
context. The more context the more precise share comes to the fore. When taken
out of the context ambiguity prevails as the spectrum unveils in its variety.

Thus single words [as any other utterances] are questions, queries,
themselves, and by occurring in statements, in context, their [means] are being
singled out.

This process is _conditioned_ by what has been formalized as the techniques of
_regulating_ definitions of words.

### (G) The structure: words as words

* [![](/images/thumb/c/c8/Philitas_in_P.Oxy.XX_2260_i.jpg/144px-Philitas_in_P.Oxy.XX_2260_i.jpg)](/File:Philitas_in_P.Oxy.XX_2260_i.jpg)

P.Oxy.XX 2260 i: Oxyrhynchus papyrus XX, 2260, column i, with quotation from
Philitas, early 2nd c. CE.
¹(http://163.1.169.40/cgi-bin/library?e=q-000-00---0POxy--00-0-0--0prompt-10---4------0-1l--1-en-50---20-about-2260--00031-001-0-0utfZz-8-00&a=d&c=POxy&cl=search&d=HASH13af60895d5e9b50907367)
²(http://en.wikipedia.org/wiki/File:POxy.XX.2260.i-Philitas-highlight.jpeg)

* [![](/images/thumb/9/9e/Cyclopaedia_1728_page_210_Dictionary_entry.jpg/88px-Cyclopaedia_1728_page_210_Dictionary_entry.jpg)](/File:Cyclopaedia_1728_page_210_Dictionary_entry.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , 1728, p. 210.
³(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0576&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/b/b8/Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg/160px-Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)](/File:Detail_from_the_Liddell-Scott_Greek-English_Lexicon_c1843.jpg)

Detail from the Liddell-Scott Greek-English Lexicon, c1843.

Dictionaries have had a long life. The ancient Greek scholar and poet Philitas
of Cos living in the 4th c. BCE wrote a vocabulary explaining the meanings of
rare Homeric and other literary words, words from local dialects, and
technical terms. The vocabulary, called _Disorderly Words_ (Átaktoi glôssai),
has been lost, with only a few fragments quoted by later authors. One example is
that the word πέλλα (pélla) meant "wine cup" in the ancient Greek region of
Boeotia, whereas the same word means "milk pail" in Homer's _Iliad_.

Not much has changed in the way dictionaries constitute order. Selected
archives of statements are queried to yield occurrences of particular words,
various _criteria[indicators]_ are applied to filtering and sorting them and
in turn the spectrum of [denoted] things allocated in this way is structured
into groups and subgroups which are then given, according to another set of
rules, shorter or longer names. These constitute facets of [potential]
meanings of a word.

So there are at least _four_ sets of conditions [structuring] dictionaries.
One is required to delimit an archive[corpus of texts], one to select and give
preference[weights] to occurrences of a word, another to cluster them, and yet
another to abstract[generalize] the subject-matter of each of these clusters.
Needless to say, this is a craft of a few and these criteria are rarely being
disclosed, despite their impact on research, and more generally, their
influence as conditions for production[making] of a so-called _common sense_.
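
To make the four sets of conditions concrete, here is a deliberately crude sketch. The function name `cooccurrences`, the stop-word list and the four toy statements are all invented for illustration, and real lexicography relies on far richer criteria than counting shared context words.

```shell
#!/bin/sh
# Toy sketch of the four sets of conditions; all names and data are invented.

# cooccurrences WORD STOPWORDS
# Reads one statement per line on stdin (condition 1: the delimited corpus).
cooccurrences() {
  awk -v word="$1" -v stop="$2" '
    BEGIN {
      n = split(stop, s, " ")
      for (i = 1; i <= n; i++) skip[s[i]] = 1
      skip[word] = 1
    }
    {
      # Condition 2: select the statements in which the word occurs;
      # every selected statement carries the same weight of 1.
      hit = 0
      for (i = 1; i <= NF; i++) if (tolower($i) == word) hit = 1
      if (!hit) next
      # Condition 3: cluster the occurrences by the context words they share.
      for (i = 1; i <= NF; i++) {
        t = tolower($i)
        if (!(t in skip)) count[t]++
      }
    }
    # Condition 4: name each cluster, here simply by its context word.
    END { for (k in count) print count[k], k }
  ' | sort -k1,1nr -k2,2
}

printf '%s\n' \
  'the pella was filled with wine' \
  'a pella of wine' \
  'a pella of milk' \
  'a cup of water' |
  cooccurrences pella 'the a was filled with of'
```

On the toy statements this prints `2 wine` followed by `1 milk`: two candidate facets of the word, ranked by how often each context recurs. Every choice made along the way (which statements count as the corpus, which words are discarded, how occurrences are weighted) is exactly the kind of criterion that usually stays undisclosed.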

It doesn't take that much to reimagine what a dictionary is and what it could
be, especially having large specialized corpora of texts at hand. These can
also serve as aids in production of new words and new meanings.

### (H) The structure: words as knowledge and the world

* [![](/images/thumb/0/02/Boethius_Porphyrys_Isagoge.jpg/120px-Boethius_Porphyrys_Isagoge.jpg)](/File:Boethius_Porphyrys_Isagoge.jpg)

Boethius's rendering of a classification tree described in Porphyry's Isagoge
(3rd c.), [6th c.] 10th c.
⁴(http://www.e-codices.unifr.ch/en/sbe/0315/53/medium)

* [![](/images/thumb/d/d0/Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg/94px-Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)](/File:Cyclopaedia_1728_page_ii_Division_of_Knowledge.jpg)

Ephraim Chambers, _Cyclopaedia, or an Universal Dictionary of Arts and
Sciences_ , London, 1728, p. II.
⁵(http://digicoll.library.wisc.edu/cgi-bin/HistSciTech/HistSciTech-idx?type=turn&entity=HistSciTech.Cyclopaedia01.p0015&id=HistSciTech.Cyclopaedia01&isize=L)

* [![](/images/thumb/d/d6/Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg/116px-Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)](/File:Encyclopedie_1751_Systeme_figure_des_connaissances_humaines.jpg)

Système figuré des connaissances humaines, _Encyclopédie ou Dictionnaire
raisonné des sciences, des arts et des métiers_ , 1751.
⁶(http://encyclopedie.uchicago.edu/content/syst%C3%A8me-figur%C3%A9-des-connaissances-humaines)

* [![](/images/thumb/9/96/Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg/96px-Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)](/File:Haeckel_Ernst_1874_Stammbaum_des_Menschen.jpg)

Haeckel - Darwin's tree.

Another _formalized_ and [internalized] process at play when figuring
out a word is its [containment]. A word is structured not only by the things
it potentially denotes but also by the words it is potentially part of and those
it contains.

The fuss around the categorization of knowledge _and_ the world in Western
thought can be traced back to Porphyry, if not further. In his introduction to
Aristotle's _Categories_ this 3rd-century AD Neoplatonist began expanding the
notions of genus and species into their hypothetical consequences. Aristotle's
brief work outlines ten categories of 'things that are said' (legomena,
λεγόμενα), namely substance (or substantive, {not the same as matter!},
οὐσία), quantity (ποσόν), qualification (ποιόν), relation (πρός), where
(ποῦ), when (πότε), being-in-a-position (κεῖσθαι), having (or state,
condition, ἔχειν), doing (ποιεῖν), and being-affected (πάσχειν). In a
different work, _Topics_ , Aristotle outlines four kinds of subjects/materials
indicated in propositions/problems from which arguments/deductions start.
These are a definition (ὅρος), a genus (γένος), a property (ἴδιος), and an
accident (συμβεβηκός). Porphyry does not explicitly refer to _Topics_ , and says
he omits speaking "about genera and species, as to whether they subsist (in
the nature of things) or in mere conceptions only"
⁸(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C1),
which means he avoids explicating whether he talks about kinds of concepts or
kinds of things in the sensible world. However, the work sparked confusion, as
the following passage [suggests]:

> "[I]n each category there are certain things most generic, and again, others
most special, and between the most generic and the most special, others which
are alike called both genera and species, but the most generic is that above
which there cannot be another superior genus, and the most special that below
which there cannot be another inferior species. Between the most generic and
the most special, there are others which are alike both genera and species,
referred, nevertheless, to different things, but what is stated may become
clear in one category. Substance indeed, is itself genus, under this is body,
under body animated body, under which is animal, under animal rational animal,
under which is man, under man Socrates, Plato, and men particularly." (Owen
1853,
⁹(http://www.ccel.org/ccel/pearse/morefathers/files/porphyry_isagogue_02_translation.htm#C2))

Porphyry took one of Aristotle's ten categories of the word, substance, and
dissected it using one of his four rhetorical devices, genus. Employing
Aristotle's categories, genera and species as means for logical operations,
for dialectic, Porphyry's interpretation resulted in having more resemblance
to the perceived _structures_ of the world. So they began to bloom.

There were earlier examples, but Porphyry was the most influential in
injecting the _universalist_ version of classification [implying] the figure
of a tree into the [locus] of Aristotle's thought. Knowledge became
monotheistic.

Classification schemes [growing from one point] play a major role in
untangling the format of the modern encyclopedia from that of the dictionary
governed by the alphabet. Two of the most influential encyclopedias of the 18th
century are cases in point. Although still keeping 'dictionary' in their
titles, they are conceived not to represent words but knowledge. The [upper-
most] genus of the body was set as the body of knowledge. The English
_Cyclopaedia, or an Universal Dictionary of Arts and Sciences_ (1728) splits
into two main branches: "natural and scientifical" and "artificial and
technical"; these further split down to 47 classes in total, each carrying a
structured list (on the following pages) of thematic articles, serving as
table of contents. The French _Encyclopedia: or a Systematic Dictionary of the
Sciences, Arts, and Crafts_ (1751) [unwinds] from understanding ( _entendement_ ),
branches into memory as history, reason as philosophy, and imagination as
poetry. The logic of containers was employed as an aid not only to deal with
the enormous task of naming and not omitting anything from what is known, but
also for the management of labour of hundreds of writers and researchers, to
create a mechanism for delegating work and the distribution of
responsibilities. Flesh was also more present, in the field research, with
researchers attending workshops and sites of everyday life to annotate it.

The world came forward to outshine the word in other schemes. Darwin's tree of
evolution and some of the modern document classification systems such as
Charles A. Cutter's _Expansive Classification_ (1882) set out to classify the
world itself and set the field for what has come to be known as authority
lists structuring metadata in today's computing.

### The structure (summary)

Facetization of meaning and branching of knowledge are both the domain of the
unit of utterance.

While lexicographers[dictionarists] structure thought through multi-layered
processes of abstraction of the written record, knowledge growers dissect it
into hierarchies of [mutually] contained notions.

One seeks to describe the word as a faceted list of small worlds, the other to
describe the world as a structured list of words. One plays primarily in the
domain of epistemology, in what is known, controlling the vocabulary, the other
in the domain of ontology, in what is, controlling reality.

Every [word] has its given things, every thing has its place, closer or
further from a single word.

The schism between classifying words and classifying the world implies it is
not possible to construct a universal classification scheme[system]. On top of
that, any classification system of words is bound to a corpus of texts it is
operating upon and any classification system of the world again operates with
words which are bound to a vocabulary[lexicon] which is again bound to a
corpus [of texts]. This does not prevent people from trying.
Classifications function as descriptors of and 'inscriptors' upon the world,
imprinting their authority. They operate from [a locus of] their
corpus[context]-specificity. The larger the corpus, the more power it has in
shaping the world, as far as the word shapes it (yes, I do imply Google here,
for which it is a domain to be potentially exploited).

## (J) The sequence

The structure-yielding query [of] the single word [shrinks][narrows, becomes
more precise] with preceding and following words. Inquiry proceeds in the flow
that establishes another kind[mode] of relationality, chaining words into the
sequence. While the structuring property of the query brings words apart from
each other, its sequential property establishes continuity and brings these
units into an ordered set.

This is what is responsible for attaching textual figures mentioned earlier
(lists, schemes, tables) to the body of the text. Associations can also be
stated explicitly, by indexing tables and then referring to them from a
particular point in the text. The same goes for explicit associations made
between blocks of the text by means of indexed paragraphs, chapters or pages.

From this follows that all utterances point to the following utterance by the
nature of sequential order, and indexing provides means for pointing elsewhere
in the document as well.

A lot can be said about references to other texts. Here, to spare time, I
would refer you to a talk I gave a few months ago and which is online
¹⁰(http://monoskop.org/Talks/Communing_Texts).

This is still the realm of print. What happens to a document when it is
digitized?

Digitization breaks a document into units of which each is assigned a numbered
position in the sequence of the document. From this perspective digitization
can be viewed as a total indexation of the document. It is converted into
units rendered for machine operations. This sequentiality is made explicit, by
means of an underlying index.

Sequences and chains are orders of one dimension. Their one-dimensional
ordering allows addressability of each element and [random] access. [Jumps]
between [random] addresses are still sequential, processing elements one at a
time.

## (K) The index

* [![](/images/thumb/2/27/Summa_confessorum.1310.jpg/103px-Summa_confessorum.1310.jpg)](/File:Summa_confessorum.1310.jpg)

Summa confessorum [1297-98], 1310.
⁷(http://www.bl.uk/onlinegallery/onlineex/illmanus/roymanucoll/j/011roy000008g11u00002000.html)

[The] sequencing not only weaves words into statements but activates other
temporalities, and _presents occurrences of words from past statements_. As
now when I am saying the word _utterance_ , each time there surface contexts
in which I have used it earlier.

A long quote from Frederick G. Kilgour, _The Evolution of the Book_ , 1998,
pp. 76-77:

> "A century of invention of various types of indexes and reference tools
preceded the advent of the first subject index to a specific book, which
occurred in the last years of the thirteenth century. The first subject
indexes were "distinctions," collections of "various figurative or symbolic
meanings of a noun found in the scriptures" that "are the earliest of all
alphabetical tools aside from dictionaries." (Richard and Mary Rouse supply an
example: "Horse = Preacher. Job 39: 'Hast thou given the horse strength, or
encircled his neck with whinnying?'")

>

> [Concordance] By the end of the third decade of the thirteenth century Hugh
de Saint-Cher had produced the first word concordance. It was a simple word
index of the Bible, with every location of each word listed by [its position
in the Bible specified by book, chapter, and letter indicating part of the
chapter]. Hugh organized several dozen men, assigning to each man an initial
letter to search; for example, the man assigned M was to go through the entire
Bible, list each word beginning with M and give its location. As it was soon
perceived that this original reference work would be even more useful if words
were cited in context, a second concordance was produced, with each word in
lengthy context, but it proved to be unwieldy. [Soon] a third version was
produced, with words in contexts of four to seven words, the model for
biblical concordances ever since.

>

> [Subject index] The subject index, also an innovation of the thirteenth
century, evolved over the same period as did the concordance. Most of the
early topical indexes were designed for writing sermons; some were organized,
while others were apparently sequential without any arrangement. By midcentury
the entries were in alphabetical order, except for a few in some classified
arrangement. Until the end of the century these alphabetical reference works
indexed a small group of books. Finally John of Freiburg added an alphabetical
subject index to his own book, _Summa Confessorum_ (1297—1298). As the Rouses
have put it, 'By the end of the [13]th century the practical utility of the
subject index is taken for granted by the literate West, no longer solely as
an aid for preachers, but also in the disciplines of theology, philosophy, and
both kinds of law.'"

In one sense neither the subject-index nor the concordance is an index; they are
words or groups of words selected according to given criteria from the body of the
text, each accompanied by a list of identifiers. These identifiers are
elements of an index, whether they represent a page, chapter, column, or other
[kind of] block of text. Every identifier is a unique _address_.
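
The relation between selected words and their lists of identifiers can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of any historical method: the function names `tokenize`, `build_concordance` and `in_context`, the crude letters-only tokenization, and the use of word positions as addresses are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words; each word's list position is its address."""
    return re.findall(r"[a-z]+", text.lower())


def build_concordance(words):
    """Map every word to the list of addresses (positions) where it occurs."""
    concordance = defaultdict(list)
    for address, word in enumerate(words):
        concordance[word].append(address)
    return dict(concordance)


def in_context(words, address, width=3):
    """Return the word at `address` with up to `width` words on either side."""
    start = max(0, address - width)
    return " ".join(words[start:address + width + 1])


words = tokenize("The index is an ordering of a sequence; the sequence is indexed.")
concordance = build_concordance(words)
for address in concordance["sequence"]:
    print(address, in_context(words, address))
```

Choosing a `width` of a few words on either side mirrors the short contexts of the third concordance described in the quote above; a page or chapter number could serve as the address just as well as a word position.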

The index is thus an ordering of a sequence by means of associating its
elements with a set of symbols, where each element is given a unique combination
of symbols. Different sizes of sets yield different numbers of variations.
Symbol sets such as an alphabet, arabic numerals, roman numerals, and binary
digits have different proportions between the length of a string of symbols
and the number of possible variations it can contain. Thus two symbols of the
English alphabet can store 26^2 various values, of arabic numerals 10^2, of
roman numerals 8^2 and of binary digits 2^2.
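
The arithmetic can be checked with a short Python sketch. It assumes that every string of symbols counts as a valid address, which holds for positional notations such as the alphabet, arabic numerals and binary digits; the function names and the figure of 1000 elements are made up for the example.

```python
def variations(symbol_count, length):
    """Number of distinct strings of `length` symbols drawn from a set of `symbol_count`."""
    return symbol_count ** length


def symbols_needed(symbol_count, elements):
    """Shortest fixed string length that gives each of `elements` a unique address."""
    length = 0
    while variations(symbol_count, length) < elements:
        length += 1
    return length


for name, size in [("English alphabet", 26), ("arabic numerals", 10), ("binary digits", 2)]:
    # Two-symbol capacity, and the string length needed to address 1000 elements.
    print(name, variations(size, 2), symbols_needed(size, 1000))
```

The smaller the symbol set, the longer the string each address requires: a thousand elements fit into three letters or three digits, but need ten binary digits.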

Indexation is segmentation, a breaking into segments. From as early as the
13th century the index such as that of sections has served as an enabler of
search. The more [detailed] the indexation, the more precise the search results it
enables.

The subject-index and concordance are tables of search results. There is a
direct lineage from the 13th-century biblical concordances to the birth of
computational linguistic analysis: both were initiated and realised by
priests.

During World War II, Jesuit Father Roberto Busa began to look for machines
for the automation of the linguistic analysis of the 11 million-word Latin
corpus of Thomas Aquinas and related authors.

Working on his Ph.D. thesis on the concept of _praesens_ in Aquinas he
realised two things:

> "I realized first that a philological and lexicographical inquiry into the
verbal system of an author has to precede and prepare for a doctrinal
interpretation of his works. Each writer expresses his conceptual system in
and through his verbal system, with the consequence that the reader who
masters this verbal system, using his own conceptual system, has to get an
insight into the writer's conceptual system. The reader should not simply
attach to the words he reads the significance they have in his mind, but
should try to find out what significance they had in the writer's mind.
Second, I realized that all functional or grammatical words (which in my mind
are not 'empty' at all but philosophically rich) manifest the deepest logic of
being which generates the basic structures of human discourse. It is this
basic logic that allows the transfer from what the words mean today to what
they meant to the writer.

>

> In the works of every philosopher there are two philosophies: the one which
he consciously intends to express and the one he actually uses to express it.
The structure of each sentence implies in itself some philosophical
assumptions and truths. In this light, one can legitimately criticize a
philosopher only when these two philosophies are in contradiction."
¹¹(http://www.alice.id.tue.nl/references/busa-1980.pdf)

Busa collaborated with IBM in New York from 1949, and the work, a concordance of
all the words of Thomas Aquinas, was finally published in the 1970s in 56
printed volumes (a version has been online since 2005
¹²(http://www.corpusthomisticum.org/it/index.age)). Besides that, an
electronic lexicon for automatic lemmatization of Latin words, containing
150,000 forms, was created by a team of ten priests in the span of two years
(in two phases: grouping all the forms of an inflected word under their lemma,
and coding the morphological categories of each form and lemma)
¹³(http://www.alice.id.tue.nl/references/busa-1980.pdf#page=4). Father
Busa has been dubbed the father of humanities computing and recently also of
digital humanities.

The subject-index has a crucial role in the printed book. It is the only means
for search the book offers. Subjects composing an index can be selected
according to a classification scheme (specific to a field of an inquiry), for
example as elements of a certain degree (with a given minimum number of
subclasses).

Its role seemingly vanishes in the digital text. But it can be easily
transformed. Besides serving as a table of pre-searched results, the
subject-index also gives a distinct idea of the content of the book. Two patterns give
us a clue: numbers of occurrences of selected words give subjects weights,
while words that seem specific to the book outweigh others even if they don't
occur very often. A selection of these words then serves as a descriptor of
the whole text, and can be thought of as a specific kind of 'tags'.

This process was formalized in a mathematical function in the 1970s, thanks to
a formula by Karen Spärck Jones which she entitled 'inverse document
frequency' (IDF), or in other words, "term specificity". It is measured as the
logarithm of the ratio of the total number of texts in the corpus to the
number of texts where the word appears at least once. When multiplied by the
frequency of the word _in_ the
text (divided by the maximum frequency of any word in the text), we get _term
frequency-inverse document frequency_ (tf-idf). In this way we can get an
automated list of subjects which are particular to the text when compared to a
group of texts.
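
A minimal Python sketch of this weighting follows. It uses the same variant as the command-line example at the end of this talk, that is a term frequency of 0.5 + 0.5 × (frequency / maximum frequency) and an inverse document frequency of 1 + ln(number of texts / number of texts containing the word); other variants are common. The function name `tf_idf` and the toy corpus are invented for illustration, and every text is assumed to contain at least one word.

```python
import math
from collections import Counter


def tf_idf(texts):
    """Return one {word: weight} dict per text (each text is a list of words)."""
    counts = [Counter(words) for words in texts]
    # Document frequency: in how many texts does each word appear at least once?
    doc_freq = Counter(word for count in counts for word in count)
    weights = []
    for count in counts:
        max_freq = max(count.values())
        weights.append({
            word: (0.5 + 0.5 * freq / max_freq)            # augmented term frequency
            * (1 + math.log(len(texts) / doc_freq[word]))  # inverse document frequency
            for word, freq in count.items()
        })
    return weights


# Toy corpus, invented for illustration.
texts = [
    "the index orders the sequence".split(),
    "the dictionary orders the words".split(),
    "the tree orders the knowledge".split(),
]
weights = tf_idf(texts)
top = sorted(weights[0], key=weights[0].get, reverse=True)
print(top[:2])
```

In the first toy text, "index" and "sequence" come out on top although "the" is the most frequent word, because "the" and "orders" occur in every text and so carry no specificity.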

We came to learn it by practice of searching the web. It is a mechanism not
dissimilar to thought process involved in retrieving particular information
online. And search engines have it built in their indexing algorithms as well.

There is a paper proposing attaching words generated by tf-idf to
hyperlinks when referring to websites
¹⁴(http://bscit.berkeley.edu/cgi-bin/pl_dochome?query_src=&format=html&collection=Wilensky_papers&id=3&show_doc=yes).
This would enable finding the referred content even after the link is dead.
Hyperlinks in references in the paper use this feature and it can be easily
tested:
¹⁵(http://www.cs.berkeley.edu/~phelps/papers/dissertation-abstract.html?lexical-signature=notemarks+multivalent+semantically+franca+stylized).

There is another measure, cosine similarity, which takes tf-idf further and
can be applied for clustering texts according to similarities in their
specificity. This might be interesting as a feature for digital libraries, or
even as a way of organising a library bottom-up into novel categories, from
which new discourses could emerge. Or as an aid for researchers to sort through texts,
or even for editors as an aid in producing interesting anthologies.
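
As a rough illustration, cosine similarity between two texts can be computed from their tf-idf weights as follows. The representation of a text as a `{word: weight}` dictionary, the function name and the three sets of weights are assumptions made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {word: weight} dicts."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tf-idf weights for three texts.
text_a = {"index": 2.0, "concordance": 1.0}
text_b = {"index": 1.0, "concordance": 0.5}
text_c = {"scanner": 3.0}
print(cosine_similarity(text_a, text_b), cosine_similarity(text_a, text_c))
```

Texts whose specific words carry the same proportions score close to 1 regardless of their length, while texts with no specific words in common score 0; a clustering procedure would group texts by such pairwise scores.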

## Final remarks

1

New disciplines emerge all the time - most recently, for example, cultural
techniques, software studies, or media archaeology. It takes years, even
decades, before they gain dedicated shelves in libraries or a category in
interlibrary digital repositories. Not that it matters that much. They are not
only sites of academic opportunities but, firstly, frameworks of new
perspectives of looking at the world, new domains of knowledge. From the
perspective of a researcher, partaking in a discipline involves negotiating
its vocabulary, classifications, corpus, reference field, and specific
terms[subjects]. Creating new fields involves all that, and more. Even when
one goes against all disciplines.

2

Google can still surprise us.

3

Knowledge has been in the making for millennia. There have been (abstract)
mechanisms established that govern its conditions. We now possess specialized
corpora of texts which are interesting enough to serve as a ground to discuss
and experiment with dictionaries, classifications, indexes, and tools for
references retrieval. These all belong to the poetic devices of knowledge-
making.

4

Command-line example of tf-idf and concordance in 3 steps.

1\. Process the files text.1-5.txt and produce freq.1-5.txt with lists of (nonlemmatized) words (in respective texts), ordered by frequency:

> for i in {1..5}; do tr '[A-Z]' '[a-z]' < text.$i.txt | tr -c '[a-z]' '[\012*]' | tr -d '[:punct:]' | sort | uniq -c | sort -k 1nr | sed '1,1d' > temp.txt; max=$(awk -vvar=1 -F" " 'NR==1 {print $var}' temp.txt); awk -vmaxx=$max -F' ' '{printf "%-7.7f %s\n", $1=0.5+($1/(maxx*2)), $2}' temp.txt > freq.$i.txt; done && rm temp.txt

2\. Process the files freq.1-5.txt and produce tfidf.1-5.txt containing a list of words (out of 500 most frequent in respective lists), ordered by weight (specificity for each text):

> for j in {1..5}; do rm -f freq.$j.txt.temp; lines=$(wc -l freq.$j.txt) && for i in {1..500}; do word=$(awk -vline="$i" -vfield=2 -F" " 'NR==line {print $field}' freq.$j.txt); tf=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' freq.$j.txt); count=$(egrep -lw "$word" freq.?.txt | wc -l); idf=$(echo "1+l(5/$count)" | bc -l); tfidf=$(echo "$tf*$idf" | bc); echo $word $tfidf >> freq.$j.txt.temp; done; sort -k 2nr < freq.$j.txt.temp > tfidf.$j.txt; done

3\. Process the files tfidf.1-5.txt and their source texts, text.1-5.txt, and produce occ.txt with concordance of top 3 words from each of them:

> rm -f occ.txt; for j in {1..5}; do echo "$j" >> occ.txt; ptx -f -w 150 text.$j.txt > occ.$j.txt; for i in {1..3}; do word=$(awk -vline="$i" -vfield=1 -F" " 'NR==line {print $field}' tfidf.$j.txt); egrep -i "[[:alpha:]] $word" occ.$j.txt >> occ.txt; done; done

Dušan Barok

_Written 23 October - 1 November 2014 in Bratislava and Stuttgart._

Medak, Sekulic & Mertens
Book Scanning and Post-Processing Manual Based on Public Library Overhead Scanner v1.2
2014

PUBLIC LIBRARY
&
MULTIMEDIA INSTITUTE

BOOK SCANNING & POST-PROCESSING MANUAL
BASED ON PUBLIC LIBRARY OVERHEAD SCANNER

Written by:
Tomislav Medak
Dubravka Sekulić
With help of:
An Mertens

Creative Commons Attribution - Share-Alike 3.0 Germany

TABLE OF CONTENTS

Introduction
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
Quick workflow reference for scanning and post-processing
References

INTRODUCTION:
BOOK SCANNING - FROM PAPER BOOK TO E-BOOK
Initial considerations when deciding on a scanning setup
Book scanning tends to be a fragile and demanding process. Many factors can go wrong or produce
results of varying quality from book to book or page to page, requiring experience or technical skill
to resolve issues that occur. Cameras can fail to trigger, components to communicate, files can get
corrupted in the transfer, storage card doesn't get purged, focus fails to lock, lighting conditions
change. There are trade-offs between the automation that is prone to instability and the robustness
that is prone to become time consuming.
Your initial choice of book scanning setup will have to take these trade-offs into consideration. If
your scanning community is confined to your hacklab, you won't be risking much if technological
sophistication and integration fails to function smoothly. But if you're aiming at a broad community
of users, with varying levels of technological skill and patience, you want to create as much time-saving automation as possible on the condition of keeping maximum stability. Furthermore, if the
time that individual members of your scanning community can contribute is limited, you might also
want to divide some of the tasks between users and their different skill levels.
This manual breaks down the process of digitization into a general description of steps in the
workflow leading from the printed book to a digital e-book, each of which can be in a concrete
situation addressed in various manners depending on the scanning equipment, software, hacking
skills and user skill level that are available to your book scanning project. Several of those steps can
be handled by a single piece of equipment or software, or you might need to use a number of them; your mileage will vary. Therefore, the manual will try to indicate the design choices you have in the
process of planning your workflow and should help you make decisions on what design is best for
your situation.
Introducing book scanner designs
The book scanning starts with the capturing of digital image files on the scanning equipment. There
are three principal types of book scanner designs:
• flatbed scanner
• single camera overhead scanner
• dual camera overhead scanner
Conventional flatbed scanners are widely available. However, given that they require the book to be
spread wide open and pressed down with the platen in order to break the resistance of the book
binding and expose sufficiently the inner margin of the text, it is the most destructive approach for
the book, imprecise and slow.
Therefore, book scanning projects across the globe have taken to custom designing improvised
setups or scanner rigs that are less destructive and better suited for fast turning and capturing of
pages. Designs abound. Most include:
• one or two digital photo cameras of lesser or higher quality to capture the pages,
• transparent V-shaped glass or Plexiglas platen to press the open book against a V-shape
cradle, and
• a light source.

The go-to web resource to help you make an informed decision is the DIY book scanning
community at http://diybookscanner.org. A good place to start is their intro
(http://wiki.diybookscanner.org/ ) and scanner build list (http://wiki.diybookscanner.org/scannerbuild-list ).
The book scanners with a single camera are substantially cheaper, but come with an added difficulty
of de-warping the distorted page images due to the angle that pages are photographed at, which can
sometimes be difficult to correct in the post-processing. Hence, in this introductory chapter we'll
focus on two camera designs where the camera lens stands relatively parallel to the page. However,
with a bit of adaptation these instructions can be used to work with any other setup.
The Public Library scanner
In the focus of this manual is the scanner built for the Public Library project, designed by Voja
Antonić (see Illustration 1). The Public Library scanner was built with the immediate use by a wide
community of users in mind. Hence, the principal consideration in designing the Public Library
scanner was less sophistication and more robustness, facility of use and distributed process of
editing.
The board designs can be found here: http://www.memoryoftheworld.org/blog/2012/10/28/ourbeloved-bookscanner. The current iterations are using two Canon 1100 D cameras with the kit lens
Canon EF-S 18-55mm 1:3.5-5.6 IS. Cameras are auto-charging.

Illustration 1: Public Library Scanner
The scanner operates by automatically lowering the Plexiglas platen, illuminating the page and then
triggering camera shutters. The turning of pages and the adjustments of the V-shaped cradle holding
the book are manual.
The scanner is operated by a two-button controller (see Illustration 2). The upper, smaller button
breaks the capture process in two steps: the first click lowers the platen, increases the light level and
allows you to adjust the book or the cradle, the second click triggers the cameras and lifts the platen.
The lower button has two modes. A quick click will execute the whole capture process in one go. But if
you hold it pressed longer, it will lower the platen, allowing you to adjust the book and the cradle, and
lift it without triggering cameras when you press again.

Illustration 2: A two-button controller

More on this manual: steps in the book scanning process
The book scanning process in general can be broken down into six steps, each of which will be dealt
with in a separate chapter in this manual:
I. Photographing a printed book
II. Getting the image files ready for post-processing
III. Transformation of source images into .tiffs
IV. Optical character recognition
V. Creating a finalized e-book file
VI. Cataloging and sharing the e-book
A step by step manual for Public Library scanner
This manual is primarily meant to provide a detailed description and step-by-step instructions for an
actual book scanning setup -- based on Voja Antonić's scanner design described above. This is a
two-camera overhead scanner, currently equipped with two Canon 1100 D cameras with EF-S 18-55mm 1:3.5-5.6 IS kit lens. It can scan books of up to A4 page size.
The post-processing in this setup is based on a semi-automated transfer of files to a GNU/Linux
personal computer and on the use of free software for image editing, optical character recognition
and finalization of an e-book file. It was initially developed for the HAIP festival in Ljubljana in
2011 and perfected later at MaMa in Zagreb and Leuphana University in Lüneburg.
Public Library scanner is characterized by a somewhat less automated yet distributed scanning
process than highly automated and sophisticated scanner hacks developed at various hacklabs. A
brief overview of one such scanner, developed at the Hacker Space Bruxelles, is also included in
this manual.
The Public Library scanning process thus proceeds in the following discrete steps:

1. creating digital images of pages of a book,
2. manual transfer of image files to the computer for post-processing,
3. automated renaming of files, ordering of even and odd pages, rotation of images and upload to a
cloud storage,
4. manual transformation of source images into .tiff files in ScanTailor,
5. manual optical character recognition and creation of PDF files in gscan2pdf.
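The renaming and ordering part of step 3 can be sketched in Python as follows. This is not the script used with the Public Library scanner, only an illustration of the idea: the function name `interleave_pages`, the target name pattern and the sample file names are hypothetical, and rotation and upload are left out.

```python
def interleave_pages(left_pages, right_pages, pattern="page_{:04d}.jpg"):
    """Pair each source image with a new name that reflects reading order.

    `left_pages` and `right_pages` are the image names of the left-hand and
    right-hand pages, each already sorted in capture order. For every capture
    the left-hand page comes first, then the right-hand page.
    """
    if len(left_pages) != len(right_pages):
        raise ValueError("both cameras should have captured the same number of pages")
    renames = []
    for left, right in zip(left_pages, right_pages):
        for source in (left, right):
            renames.append((source, pattern.format(len(renames) + 1)))
    return renames


# Hypothetical file names as they might come off the two storage cards.
left = ["IMG_0001.JPG", "IMG_0002.JPG"]
right = ["IMG_0101.JPG", "IMG_0102.JPG"]
for source, target in interleave_pages(left, right):
    print(source, "->", target)
```

Returning a list of (source, target) pairs instead of renaming files directly makes it possible to review the resulting order before anything on disk is touched.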
The detailed description of the Public Library scanning process follows below.
The Bruxelles hacklab scanning process
For purposes of comparison, here we'll briefly reference the scanner built by the Bruxelles hacklab
(http://hackerspace.be/ScanBot). It is a dual camera design too. With some differences in hardware functionality
(Bruxelles scanner has automatic turning of pages, whereas Public Library scanner has manual turning of pages), the
fundamental difference between the two is in the post-processing - the level of automation in the transfer of images
from the cameras and their transformation into PDF or DjVu e-book format.
The Bruxelles scanning process is different in so far as the cameras are operated by a computer and the images are
automatically transferred, ordered and made ready for further post-processing. The scanner is home-brew, but the
process is for advanced DIY'ers. If you want to know more on the design of the scanner, contact Michael Korntheuer at
contact@hackerspace.be.
The scanning and post-processing is automated by a single Python script that does all the work:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD
The scanner uses two Canon point and shoot cameras. Both cameras are connected to the PC with USB. They both run
PTP/CHDK (Canon Hack Development Kit). The scanning sequence is the following:
1. Script sends CHDK command line instructions to the cameras
2. Script sorts out the incoming files. This part is tricky. There is no reliable way to make a distinction between the left
and right camera, only between which camera was recognized by USB first. So the protocol is to always power up the
left camera first. See the instructions with the source code.
3. Collect images in a PDF file
4. Run script to OCR a .PDF file to plain .TXT file:
http://git.constantvzw.org/?p=algolit.git;a=blob;f=scanbot_brussel/ocr_pdf.sh;h=2c1f24f9afcce03520304215951c65f58c0b880c;hb=HEAD

I. PHOTOGRAPHING A PRINTED BOOK
Technologically the most demanding part of the scanning process is creating digital images of the
pages of a printed book. It's a process that is very different from scanner design to scanner design,
from camera to camera. Therefore, here we will focus strictly on the process with the Public Library
scanner.
Operating the Public Library scanner
0. Before you start:
Better and more consistent photographs lead to a more optimized and faster post-processing and a
higher quality of the resulting digital e-book. In order to guarantee the quality of images, before you
start it is necessary to set up the cameras properly and prepare the printed book for scanning.
a) Loosening the book
Depending on the type and quality of binding, some books tend to be too resistant to opening fully
to reveal the inner margin under the pressure of the scanner platen. It is thus necessary to “break in”
the book before starting in order to loosen the binding. The best way is to open it as wide as
possible in multiple places in the book. This can be done against the table edge if the book is more
rigid than usual. (Warning – “breaking in” might create irreversible creasing of the spine or lead to
some pages breaking loose.)
b) Switch on the scanner
You start the scanner by pressing the main switch or plugging the power cable into the scanner.
This will also turn on the overhead LED lights.

c) Setting up the cameras
Place the cameras onto tripods. You need to move the lever on the tripod's head to allow the tripod
plate screwed to the bottom of the camera to slide into its place. Secure the lock by turning the lever
all the way back.
If automatic chargers for the cameras are provided, open the battery lid on the bottom of the
camera and plug in the automatic charger. Close the lid.
Switch on the cameras using the lever on the top right side of the camera's body and place it into the
aperture priority (Av) mode on the mode dial above the lever (see Illustration 3). Use the main dial
just above the shutter button on the front side of the camera to set the aperture value to F8.0.

Illustration 3: Mode and main dial, focus mode switch, zoom
and focus ring
On the lens, turn the focus mode switch to manual (MF), turn the large zoom ring to set the value
exactly midway between 24 and 35 mm (see Illustration 3). Try to set both cameras the same.
To focus each camera, open a book on the cradle, lower the platen by holding the big button on the
controller, and turn on the live view on camera LCD by pressing the live view switch (see
Illustration 4). Now press the magnification button twice and use the focus ring on the front of the
lens to get a clear image view.

Illustration 4: Live view switch and magnification button

d) Connecting the cameras
Now connect the cameras to the remote shutter trigger cables that can be found lying on each side
of the scanner. They need to be plugged into a small round port hidden behind a protective rubber
cover on the left side of the cameras.
e) Placing the book into the cradle and double-checking the cameras
Open the book in the middle and place it on the cradle. Press and hold the large button on the
controller to lower the Plexiglas platen without triggering the cameras. Move the cradle so that
the platen fits into the middle of the book.
Turn on the live view on the cameras' LCD to check that the pages fit into the image and that the
cameras are positioned parallel to the page.
f) Double-check storage cards and batteries
It is important that both storage cards on cameras are empty before starting the scanning in order
not to mess up the page sequence when merging photos from the left and the right camera in the
post-processing. To double-check, press the play button on the cameras and erase any photos
left from the previous scan. You do this by pressing the menu button, selecting the fifth menu from
the left and then selecting 'Erase Images' -> 'All images on card' -> 'OK'.
If no automatic chargers are provided, double-check on the information screen that batteries are
charged. They should be fully charged before starting with the scanning of a new book.

g) Turn off the light in the room
Lighting conditions during scanning should be as constant as possible. To reduce glare and achieve
maximum quality, remove any source of light that might reflect off the Plexiglas platen. Preferably
turn off the light in the room or isolate the scanner with the black cloth provided.

1. Photographing a book
Now you are ready to start scanning. Place the book closed in the cradle and lower the platen by
holding the large button on the controller pressed (see Illustration 2). Adjust the position of the
cradle and lift the platen by pressing the large button again.
To scan, you can either press the small button on the controller to lower the platen, adjust the
book, and then press it again to trigger the cameras and lift the platen, or make a short press on
the large button to do it all in one go.
ATTENTION: When the cameras are triggered, the shutter sound has to be heard coming
from both cameras. If one camera is not working, it's best to reconnect both cameras (see
Section 0), make sure the batteries are charged or adapters are connected, erase all images
and restart.
A mistake made in the photographing requires a lot of work in the post-processing, so it's
much quicker to repeat the photographing process.
If you make a mistake while flipping pages, or any other mistake, go back and scan from the page
you missed or incorrectly scanned. Note down the page where the error occurred; the redundant
images will be removed in the post-processing.
ADVICE: The scanner has a digital counter. By turning the dial forward and backward, you
can set it to tell you what page you should be scanning next. This should help you avoid
missing a page due to a distraction.
While scanning, move the cradle a bit to the left from time to time, making sure that the tip of the
V-shaped platen is aligned with the center of the book and the inner margin is exposed enough.

II. GETTING THE IMAGE FILES READY FOR POST-PROCESSING
Once the book pages have been photographed, they have to be transferred to the computer and
prepared for post-processing. With two-camera scanners, the capturing process will result in two
separate sets of images -- odd and even pages -- coming from the left and right cameras respectively
-- and you will need to rename and reorder them accordingly, rotate them into a vertical position
and collate them into a single sequence of files.
a) Transferring image files
For the transfer of files, your principal process design choices are either to remove the memory
cards from the cameras and copy the files to the computer via a card reader, or to transfer them
via a USB cable. The latter process can be automated by remotely operating your cameras from a
computer. However, this can be done only with a certain number of Canon cameras
(http://bit.ly/16xhJ6b) that can be hacked to run the open Canon Hack Development Kit firmware
(http://chdk.wikia.com).
After transferring the files, you want to erase all the image files on the camera memory card, so that
they do not end up in the scan of the next book.
b) Renaming image files
As the left and right camera are typically operated in sync, the photographing process results in two
separate sets of images, with even and odd pages respectively, that have completely different file
names and potentially the same time stamps. So before you collate the page images in the order in
which they appear in the book, you want to rename the files so that the first image comes from the
right camera, the second from the left camera, the third again from the right camera and so on.
You probably want to do a batch renaming, where your right camera files start with n and are offset
by an increment of 2 (e.g. page_0000.jpg, page_0002.jpg,...) and your left camera files start with
n+1 and are also offset by an increment of 2 (e.g. page_0001.jpg, page_0003.jpg,...).
Batch renaming can be completed either from your file manager, in command line or with a number
of GUI applications (e.g. GPrename, rename, cuteRenamer on GNU/Linux).
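The interleaved numbering described above can be sketched in a few lines of Python. This is only an illustration, not part of the Public Library workflow: the function name interleaved_names is hypothetical, and it assumes that sorting each camera's original file names alphabetically reproduces the order in which the photos were taken.

```python
def interleaved_names(right_files, left_files, start=0):
    """Pair each source file with a page_NNNN.jpg name, right camera first.

    Assumption: sorting the original names gives the capture order.
    """
    pairs = []
    # Right camera: start, start + 2, start + 4, ...
    for i, src in enumerate(sorted(right_files)):
        pairs.append((src, f"page_{start + 2 * i:04d}.jpg"))
    # Left camera: start + 1, start + 3, start + 5, ...
    for i, src in enumerate(sorted(left_files)):
        pairs.append((src, f"page_{start + 2 * i + 1:04d}.jpg"))
    # Return the pairs in final page order.
    return sorted(pairs, key=lambda pair: pair[1])
```

The function only produces the renaming plan. Performing the actual renames (for example with pathlib.Path.rename) is left out so that the plan can be checked before any file is touched.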
c) Rotating image files
Before you collate the renamed files, you might want to rotate them. This is a step that can be done
also later in the post-processing (see below), but if you are automating or scripting your steps this is
a practical place to do it. The images leaving your cameras will be positioned horizontally. In order
to position them vertically, the images from the camera on the right will have to be rotated by 90
degrees counter-clockwise, the images from the camera on the left will have to be rotated by 90
degrees clockwise.
Batch rotating can be completed in a number of photo-processing tools, in command line or
dedicated applications (e.g. Fstop, ImageMagick, Nautilus Image Converter on GNU/Linux).
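If you script this step, the rotation direction can be derived from the file name. The sketch below builds an ImageMagick mogrify command for one file. It assumes the page_NNNN.jpg convention with even numbers coming from the right camera (as in the example page_0000.jpg above); the function names are hypothetical. In ImageMagick, a positive -rotate angle turns the image clockwise. Note that mogrify overwrites the file in place.

```python
import re


def rotation_degrees(filename):
    """Return the -rotate angle for a page_NNNN.jpg file.

    Assumption: even page numbers come from the right camera.
    """
    match = re.fullmatch(r"page_(\d+)\.jpg", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    index = int(match.group(1))
    # Right camera: 90 degrees counter-clockwise (negative angle).
    # Left camera: 90 degrees clockwise (positive angle).
    return -90 if index % 2 == 0 else 90


def mogrify_command(filename):
    """Build the argument list for an in-place ImageMagick rotation."""
    return ["mogrify", "-rotate", str(rotation_degrees(filename)), filename]
```

The argument list can be passed to subprocess.run once you have confirmed on a copy of one image that the direction is right for your camera setup.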
d) Collating images into a single batch
Once you're done with the renaming and rotating of the files, you want to collate them into the same
folder for easier manipulation later.
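Once the files sit in one folder, a quick sanity check can catch a camera that failed to fire. The following sketch, with the hypothetical name camera_counts, counts even-numbered and odd-numbered page files in a folder listing. It assumes the page_NNNN.jpg naming convention; if both cameras were triggered on every shot, the two counts should be equal.

```python
import re


def camera_counts(filenames):
    """Count even-numbered and odd-numbered page_NNNN.jpg files."""
    even = odd = 0
    for name in filenames:
        match = re.fullmatch(r"page_(\d+)\.jpg", name)
        if match is None:
            continue  # skip anything outside the naming convention
        if int(match.group(1)) % 2 == 0:
            even += 1
        else:
            odd += 1
    return even, odd
```

A mismatch between the two numbers suggests that one camera missed at least one shot, in which case it is usually quicker to rephotograph than to repair the sequence by hand.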

Getting the image files ready for post-processing on the Public Library scanner
In the case of Public Library scanner, a custom C++ script was written by Mislav Stublić to
facilitate the transfer, renaming, rotating and collating of the images from the two cameras.
The script prompts the user to place into the card reader the memory card from the right camera
first, gives a preview of the first and last four images and provides an entry field to create a subfolder in a local cloud storage folder (path: /home/user/Copy).
It transfers, renames and rotates the files, deletes them from the card and prompts the user to replace the
card with the one from the left camera in order to transfer the files from there and place them in
the same folder. The script was created for GNU/Linux systems and it can be downloaded, together
with its source code, from: https://copy.com/nLSzflBnjoEB
If your cameras are not Canon, you can edit line 387 of the source file to match the
naming convention of your cameras, and recompile by running the following command in your
terminal: "gcc scanflow.c -o scanflow -ludev `pkg-config --cflags --libs gtk+-2.0`"
In the case of the Hacker Space Bruxelles scanner, this is handled by the same script that operates the cameras, which can be
downloaded from:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD

III. TRANSFORMATION OF SOURCE IMAGES INTO .TIFFS
Images transferred from the cameras are high definition full color images. You want your cameras
to shoot at the largest possible .jpg resolution in order for resulting files to have at least 300 dpi (A4
at 300 dpi requires a 9.5 megapixel image). In the post-processing the size of the image files needs
to be reduced radically, so that several hundred images can be merged into an e-book file of a
tolerable size.
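The relation between page size, resolution and pixel count can be checked with a little arithmetic. The sketch below is illustrative only and the function names are hypothetical. For a bare A4 page (210 x 297 mm) at 300 dpi it gives roughly 8.7 megapixels; the 9.5 megapixel figure above presumably leaves some allowance for the area photographed around the page, but that is an assumption.

```python
MM_PER_INCH = 25.4


def pixels_needed(length_mm, dpi):
    """Pixels required along one edge of the given length at the given dpi."""
    return length_mm / MM_PER_INCH * dpi


def megapixels_needed(width_mm, height_mm, dpi):
    """Total megapixels required to cover a page at the given dpi."""
    total = pixels_needed(width_mm, dpi) * pixels_needed(height_mm, dpi)
    return total / 1_000_000


# A4 page at 300 dpi: about 2480 x 3508 pixels.
a4 = megapixels_needed(210, 297, 300)
```

The same functions can be used the other way round: if the page fills only part of the frame, the pixels that fall on the page, not the sensor total, determine the effective resolution.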
Hence, the first step in the post-processing is to crop the images from cameras only to the content of
the pages. The surroundings around the book that were captured in the photograph and the white
margins of the page will be cropped away, while the printed text will be transformed into black
letters on white background. The illustrations, however, will need to be preserved in their color or
grayscale form, and mixed with the black and white text. What were initially large .jpg files will
now become relatively small .tiff files that are ready for optical character recognition process
(OCR).
These tasks can be completed by a number of software applications. Our manual will focus on one
that can be used across all major operating systems -- ScanTailor. ScanTailor can be downloaded
from: http://scantailor.sourceforge.net/. A more detailed video tutorial of ScanTailor can be found
here: http://vimeo.com/12524529.
ScanTailor: from a photograph of a page to a graphic file ready for OCR
Once you have transferred all the photos from cameras to the computer, renamed and rotated them,
they are ready to be processed in the ScanTailor.
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred and renamed photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder inside
the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click 'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600', click
'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- Rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other page’
followed by 'OK'
- Rotate the following photo clockwise, applying the same procedure as in the previous step
2.2 Deleting redundant photographs/pages
- Remove redundant pages (photographs of the empty cradle at the beginning and the end of the
book scanning sequence; book cover pages if you don’t want them in the final scan; duplicate pages
etc.) by right-clicking on a thumbnail of that page in the preview column on the right side, selecting
‘Remove from project’ and confirming by clicking on ‘Remove’.

# If you remove the wrong page by accident, you can re-insert it by right-clicking on a page
before/after the missing page in the sequence, selecting 'insert after/before' (depending on which
page you selected) and choosing the file from the list. Before you finish adding, it is necessary to
go through the procedure of fixing DPI and rotating again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and insert them
manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
Steps ‘Split pages’ and ‘Deskew’ should work automatically. Run them by clicking the ‘Play’ button
under the 'Select content' function. This will do the three steps automatically: splitting of pages,
deskewing and selection of content. After this you can manually re-adjust splitting of pages and deskewing.
4) Selecting content
Step ‘Select content’ works automatically as well, but it is important to revise the resulting selection
manually page by page to make sure the entire content is selected on each page (including the
header and page number). Where necessary, use your pointer device to adjust the content selection.
If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected split area. If
the page is skewed, go back to 'Deskew' and adjust the skew of the page. After this go back to
'Select content' and readjust the selection if necessary.
This is the step where you do visual control of each page. Make sure all pages are there and
selections are as equal in size as possible.
At the bottom of the thumbnail column there is a sort option that can automatically arrange pages by
the height and width of the selected content, making the process of manual selection easier.
Extreme differences in height should be avoided: try to make the selected areas as equal as
possible, particularly in height, across all pages. The exception is the cover and back pages, where
we advise selecting the full page.
5) Adjusting margins
For best results, select the full content of the cover and back page in the previous step. Now go to the
'Margins' step and, under the Margins section, set Top, Bottom, Left and Right all to 0.0 and do 'Apply
to...' → 'All pages'.
In Alignment section leave 'Match size with other pages' ticked, choose the central positioning of
the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
Now go to the 'Output' step. Ignore the 'Output Resolution' section.
Next review two consecutive pages from the middle of the book to see if the scanned text is too
faint or too dark. If the text seems too faint or too dark, use slider Thinner – Thicker to adjust. Do
'Apply to' → 'All pages'.
Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White Margins'.
Do the same for the back page.
If there are any pages with illustrations, you can choose the 'Mixed' mode for those pages and then
under the tab 'Picture Zones' adjust the zones of the illustrations.
Now you are ready to output the files. Just press 'Play' button under 'Output'. Once the computer is
finished processing the images, just do 'File' → 'Save as' and save the project.

IV. OPTICAL CHARACTER RECOGNITION
Before the edited-down graphic files are finalized as an e-book, we want to transform the image of
the text into an actual text that can be searched, highlighted, copied and transformed. That
functionality is provided by Optical Character Recognition. This is a technically difficult task -
dependent on language, script, typeface and quality of print - and there aren't that many OCR tools
that are good at it. There is, however, a relatively good free software solution - Tesseract
(http://code.google.com/p/tesseract-ocr/) - that has solid performance, good language data and can
be trained for an even better performance, although it has its problems. Proprietary solutions (e.g.
Abbyy FineReader) sometimes provide superior results.
Tesseract supports as input format primarily .tiff files. It produces a plain text file that can be, with
the help of other tools, embedded as a separate layer under the original graphic image of the text in
a PDF file.
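For those who prefer the command line, Tesseract can be called directly on each .tiff file. The sketch below builds the command for one file. The function name is hypothetical and the default language code "eng" is only an example; the basic form of the call is tesseract, the input image, an output base name and -l with the language code, and Tesseract adds the .txt extension to the output base itself.

```python
from pathlib import Path


def tesseract_command(tiff_path, language="eng"):
    """Build the Tesseract call that turns one image into a plain text file."""
    tiff = Path(tiff_path)
    # Output base without extension: Tesseract appends ".txt" on its own.
    output_base = tiff.with_suffix("")
    return ["tesseract", str(tiff), str(output_base), "-l", language]
```

The resulting list can be handed to subprocess.run, provided Tesseract and the data for the chosen language are installed.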
With the help of other tools, OCR can also be performed against other input files, such as
graphic-only PDF files. This produces inferior results, depending again on the quality of graphic files and
the reproduction of text in them. One such tool is a bash script to OCR a PDF file that can be found
here: https://github.com/andrecastro0o/ocr/blob/master/ocr.sh
As mentioned in the 'before scanning' section, the quality of the original book will influence the
quality of the scan and thus the quality of the OCR. For a comparison, have a look here:
http://www.paramoulipist.be/?p=1303
Once you have your .txt file, there is still some work to be done. Because OCR has difficulty
interpreting particular elements in the layout and fonts, the .txt file comes with a lot of errors.
Recurrent problems are:
- combinations of specific letters in some fonts (it can mistake 'm' for 'n' or 'I' for 'i' etc.);
- headers become part of body text;
- footnotes are placed inside the body text;
- page numbers are not recognized as such.
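Some of these problems can be reduced with simple text clean-up. As one illustration, the sketch below drops lines that consist of nothing but digits, which is where unrecognized page numbers often end up. This is a crude heuristic under that assumption, and the function name is hypothetical: it will also delete any legitimate line of body text that contains only a number, so the result still needs proofreading.

```python
def drop_page_number_lines(text):
    """Remove lines that contain only digits (likely stray page numbers)."""
    kept = [line for line in text.splitlines() if not line.strip().isdigit()]
    return "\n".join(kept)
```

Headers and footnotes that have merged into the body text are harder to detect automatically and usually have to be corrected by hand.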

V. CREATING A FINALIZED E-BOOK FILE
After the optical character recognition has been completed, the resulting text can be merged with
the images of pages and output into an e-book format. While increasingly the proper e-book file
formats such as ePub have been gaining ground, PDFs still remain popular because many people
tend to read on their computers, and they retain the original layout of the book on paper including
the absolute pagination needed for referencing in citations. DjVu is also an option, as an alternative
to PDF, used because of its purported superiority, but it is far less popular.
The export to PDF can again be done with a number of tools. In our case we'll complete the optical
character recognition and PDF export in gscan2pdf. Again, the proprietary Abbyy FineReader will
produce somewhat smaller PDFs.
If you prefer to use an e-book format that works better with e-book readers, you will have
to remove some of the elements that appear in the book - headers, footers, footnotes and pagination.
This can be done earlier, in the process of cropping down the original .jpg image files (see under III),
or later by transforming the PDF files. The latter can be done in Calibre (http://calibre-ebook.com) by
converting the PDF into an ePub, where it can be further tweaked to better accommodate or remove
the headers, footers, footnotes and pagination.
Optical character recognition and PDF export in Public Library workflow
Optical character recognition with the Tesseract engine can be performed on GNU/Linux by a
number of command line and GUI tools. Many of those tools also exist for other operating systems.
For the users of the Public Library workflow, we recommend using gscan2pdf application both for
the optical character recognition and the PDF or DjVu export.
To do so, start gscan2pdf and open your .tiff files. To OCR them, go to 'Tools' and select 'OCR'. In
the dialog box select the Tesseract engine and your language. 'Start OCR'. Once the OCR is
finished, export the graphic files and the OCR text to PDF by selecting 'Save as'.
However, given that the proprietary solutions sometimes produce better results, these tasks can also
be done, for instance, in Abbyy FineReader running on a Windows operating system inside
VirtualBox. The prerequisite is that you have both Windows and Abbyy FineReader
to install in VirtualBox. Once you've got both installed, you need to
designate a shared folder in your VirtualBox and place the .tiff files there. You can now open them
from Abbyy FineReader running in VirtualBox, OCR them and export them into a PDF.
To use Abbyy FineReader, transfer the output files in your 'out' folder to the shared folder of
VirtualBox. Then start VirtualBox, start the Windows image and in Windows start Abbyy
FineReader. Open the files and let Abbyy FineReader read them. Once it's done, output the
result into PDF.

VI. CATALOGING AND SHARING THE E-BOOK
Your road from a book on paper to an e-book is complete. If you want to maintain your library you
can use Calibre, a free software tool for e-book library management. You can add the metadata to
your book using the existing catalogues or you can enter metadata manually.
Now you may want to distribute your book. If the work you've digitized is in the public domain
(https://en.wikipedia.org/wiki/Public_domain), you might consider contributing it to Project
Gutenberg
(http://www.gutenberg.org/wiki/Gutenberg:Volunteers'_FAQ#V.1._How_do_I_get_started_as_a_Project_Gutenberg_volunteer.3F),
Wikibooks (https://en.wikibooks.org/wiki/Help:Contributing) or
Archive.org.
If the work is still under copyright, you might explore a number of different options for sharing.

QUICK WORKFLOW REFERENCE FOR SCANNING AND
POST-PROCESSING ON PUBLIC LIBRARY SCANNER
I. PHOTOGRAPHING A PRINTED BOOK
0. Before you start:
- loosen the book binding by opening it wide in several places
- switch on the scanner
- set up the cameras:
- place cameras on tripods and fit them tightly
- plug the automatic chargers into the battery slot and close the battery lid
- switch on the cameras
- switch the lens to Manual Focus mode
- switch the cameras to Av mode and set the aperture to 8.0
- turn the zoom ring to set the focal length exactly midway between 24mm and 35mm
- focus by turning on the live view, pressing magnification button twice and adjusting the
focus to get a clear view of the text
- connect the cameras to the scanner by plugging the remote trigger cable to a port behind a
protective rubber cover on the left side of the cameras
- place the book into the cradle
- double-check storage cards and batteries
- press the play button on the back of the camera to double-check if there are images on the
camera - if there are, delete all the images from the camera menu
- if using batteries, double-check that batteries are fully charged
- switch off the light in the room that could reflect off the platen and cover the scanner with the
black cloth
1. Photographing
- now you can start scanning either by pressing the smaller button on the controller once to
lower the platen and adjust the book, and then press again to increase the light intensity, trigger the
cameras and lift the platen; or by pressing the large button completing the entire sequence in one
go;
- ATTENTION: Shutter sound should be coming from both cameras - if one camera is not
working, it's best to reconnect both cameras, make sure the batteries are charged or adapters
are connected, erase all images and restart.
- ADVICE: The scanner has a digital counter. By turning the dial forward and backward,
you can set it to tell you what page you should be scanning next. This should help you to
avoid missing a page due to a distraction.

II. Getting the image files ready for post-processing
- after finishing with scanning a book, transfer the files to the post-processing computer
and purge the memory cards
- if transferring the files manually:
- create two separate folders,
- transfer the files from the folders with image files on cards, using a batch
renaming software rename the files from the right camera following the convention
page_0001.jpg, page_0003.jpg, page_0005.jpg... -- and the files from the left camera
following the convention page_0002.jpg, page_0004.jpg, page_0006.jpg...
- collate image files into a single folder
- before ejecting each card, delete all the photo files on the card
- if using the scanflow script:
- start the script on the computer
- place the card from the right camera into the card reader
- enter the name of the destination folder following the convention
"Name_Surname_Title_of_the_Book" and transfer the files
- repeat with the other card
- script will automatically transfer the files, rename, rotate, collate them in proper
order and delete them from the card
III. Transformation of source images into .tiffs
ScanTailor: from a photograph of page to a graphic file ready for OCR
1) Importing photographs to ScanTailor
- start ScanTailor and open ‘new project’
- for ‘input directory’ choose the folder where you stored the transferred photo images
- you can leave ‘output directory’ as it is, it will place your resulting .tiffs in an 'out' folder
inside the folder where your .jpg images are
- select all files (if you followed the naming convention above, they will be named
‘page_xxxx.jpg’) in the folder where you stored the transferred photo images, and click
'OK'
- in the dialog box ‘Fix DPI’ click on All Pages, and for DPI choose preferably '600x600',
click 'Apply', and then 'OK'
2) Editing pages
2.1 Rotating photos/pages
If you've rotated the photo images in the previous step using the scanflow script, skip this step.
- rotate the first photo counter-clockwise, click Apply and for scope select ‘Every other
page’ followed by 'OK'
- rotate the following photo clockwise, applying the same procedure as in the previous
step

2.2 Deleting redundant photographs/pages
- remove redundant pages (photographs of the empty cradle at the beginning and the end;
book cover pages if you don’t want them in the final scan; duplicate pages etc.) by
right-clicking on a thumbnail of that page in the preview column on the right, selecting ‘Remove
from project’ and confirming by clicking on ‘Remove’.
# If you remove the wrong page by accident, you can re-insert it by right-clicking on a page
before/after the missing page in the sequence, selecting 'insert after/before' and choosing the file
from the list. Before you finish adding, it is necessary to go through the procedure of fixing DPI and
rotating again.
2.3 Adding missing pages
- If you notice that some pages are missing, you can recapture them with the camera and
insert them manually at this point using the procedure described above under 2.2.
3) Split pages and deskew
- Functions ‘Split Pages’ and ‘Deskew’ should work automatically. Run them by
clicking the ‘Play’ button under the 'Select content' step. This will do the three steps
automatically: splitting of pages, deskewing and selection of content. After this you can
manually re-adjust splitting of pages and de-skewing.

4) Selecting content and adjusting margins
- Step ‘Select content’ works automatically as well, but it is important to revise the
resulting selection manually page by page to make sure the entire content is selected on
each page (including the header and page number). Where necessary use your pointer device
to adjust the content selection.
- If the inner margin is cut, go back to 'Split pages' view and manually adjust the selected
split area. If the page is skewed, go back to 'Deskew' and adjust the skew of the page. After
this go back to 'Select content' and readjust the selection if necessary.
- This is the step where you do visual control of each page. Make sure all pages are there
and selections are as equal in size as possible.
- At the bottom of the thumbnail column there is a sort option that can automatically arrange
pages by the height and width of the selected content, making the process of manual
selection easier. Extreme differences in height should be avoided: try to make the
selected areas as equal as possible, particularly in height, across all pages. The
exception is the cover and back pages, where we advise selecting the full page.

5) Adjusting margins
- Now go to the 'Margins' step and, under the Margins section, set Top, Bottom, Left and
Right all to 0.0 and do 'Apply to...' → 'All pages'.
- In Alignment section leave 'Match size with other pages' ticked, choose the central
positioning of the page and do 'Apply to...' → 'All pages'.
6) Outputting the .tiffs
- Now go to the 'Output' step.
- Review two consecutive pages from the middle of the book to see if the scanned text is
too faint or too dark. If the text seems too faint or too dark, use slider Thinner – Thicker to
adjust. Do 'Apply to' → 'All pages'.
- Next go to the cover page and select under Mode 'Color / Grayscale' and tick on 'White
Margins'. Do the same for the back page.
- If there are any pages with illustrations, you can choose the 'Mixed' mode for those
pages and then under the tab 'Picture Zones' adjust the zones of the illustrations.
- To output the files press 'Play' button under 'Output'. Save the project.
IV. Optical character recognition & V. Creating a finalized e-book file
If using all free software:
1) open gscan2pdf (if not already installed on your machine, install gscan2pdf from the
repositories, Tesseract and data for your language from https://code.google.com/p/tesseract-ocr/)
- point gscan2pdf to open your .tiff files
- for Optical Character Recognition, select 'OCR' under the drop down menu 'Tools',
select the Tesseract engine and your language, start the process
- once OCR is finished and to output to a PDF, go under 'File' and select 'Save', edit the
metadata and select the format, save
If using non-free software:
2) open Abbyy FineReader in VirtualBox (note: only Abbyy FineReader 10 installs and works - with some limitations - under GNU/Linux)
- transfer files in the 'out' folder to the folder shared with the VirtualBox
- point it to the readied .tiff files and it will complete the OCR
- save the file

REFERENCES
For more information on the book scanning process in general and making your own book scanner
please visit:
DIY Book Scanner: http://diybookscanner.org
Hacker Space Bruxelles scanner: http://hackerspace.be/ScanBot
Public Library scanner: http://www.memoryoftheworld.org/blog/2012/10/28/our-belovedbookscanner/
Other scanner builds: http://wiki.diybookscanner.org/scanner-build-list
For more information on automation:
Konrad Voelkel's post-processing script (From Scan to PDF/A):
http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/
Johannes Baiter's automation of scanning to PDF process: http://spreads.readthedocs.org
For more information on applications and tools:
Calibre e-book library management application: http://calibre-ebook.com/
ScanTailor: http://scantailor.sourceforge.net/
gscan2pdf: http://sourceforge.net/projects/gscan2pdf/
Canon Hack Development Kit firmware: http://chdk.wikia.com
Tesseract: http://code.google.com/p/tesseract-ocr/
Python script of Hacker Space Bruxelles scanner:
http://git.constantvzw.org/?p=algolit.git;a=tree;f=scanbot_brussel;h=81facf5cb106a8e4c2a76c048694a3043b158d62;hb=HEAD
