USDC
Opinion: Elsevier against SciHub and LibGen
2015


UNITED STATES DISTRICT COURT
SOUTHERN DISTRICT OF NEW YORK
----------------------------------------

15 Civ. 4282(RWS)
OPINION

ELSEVIER INC., ELSEVIER B.V., and ELSEVIER LTD.,

Plaintiffs,

- against -

WWW.SCI-HUB.ORG, THE LIBRARY GENESIS PROJECT, d/b/a LIBGEN.ORG, ALEXANDRA ELBAKYAN, and JOHN DOES 1-99,

Defendants.

----------------------------------------

APPEARANCES

Attorneys for the Plaintiffs

DEVORE & DEMARCO LLP
99 Park Avenue, Suite 1100
New York, NY 10016
By:
Joseph DeMarco, Esq.
David Hirschberg, Esq.
Urvashi Sen, Esq.

Pro Se

Alexandra Elbakyan
Almaty, Kazakhstan


Sweet, D.J.,

Plaintiffs Elsevier Inc., Elsevier B.V., and Elsevier, Ltd. (collectively, "Elsevier" or the "Plaintiffs") have moved for a preliminary injunction preventing defendants Sci-Hub, Library Genesis Project (the "Project"), Alexandra Elbakyan ("Elbakyan"), Bookfi.org, Elibgen.org, Erestroresollege.org, and Libgen.info (collectively, the "Defendants") from distributing works to which Elsevier owns the copyright. Based upon the facts and conclusions below, the motion is granted and the Defendants are prohibited from distributing the Plaintiffs' copyrighted works.

Prior Proceedings

Elsevier, a major publisher of scientific journal articles and book chapters, brought this action on June 2, 2015, alleging that the Defendants, a series of websites affiliated with the Project (the "Website Defendants") and their owner and operator, Alexandra Elbakyan, infringed Elsevier's copyrighted works and violated the Computer Fraud and Abuse Act. (See generally Complaint, Dkt. No. 1.) Elsevier filed the instant motion for a preliminary injunction on June 11, 2015, via an Order to Show Cause. (Dkt. Nos. 5-13.) On June 18, 2015, the Court granted Plaintiffs' Order to Show Cause and authorized service on the Defendants via email. (Dkt. No. 15.) During the following week, the Plaintiffs served the Website Defendants via email and Elbakyan via email and postal mail. (See Dkt. Nos. 24-31.) On July 7, 2015, the Honorable Ronnie Abrams, acting as Part One Judge, held a telephone conference with the Plaintiffs and Elbakyan, during which Elbakyan acknowledged receiving the papers concerning this case and declared that she did not intend to obtain a lawyer. (See Transcript, Dkt. No. 38.) After the conference, Judge Abrams issued an Order directing Elbakyan to notify the Court whether she wished assistance in obtaining pro bono counsel, and advising her that while she could proceed pro se, the Website Defendants, not being natural persons, must obtain counsel or risk default. (Dkt. No. 36.) A second telephonic conference was held on July 14, 2015, during which Elbakyan stated that she needed additional time to find a lawyer. (See Transcript, Dkt. No. 42.) Judge Abrams granted the request, but warned Elbakyan that "you have to move quickly both in attempting to retain an attorney and you'll have to stick to the schedule that is set once it's set." (Id. at 6.) After the telephone conference, Judge Abrams issued another Order setting the preliminary injunction hearing for September 16 and directing Elbakyan to inform the Court by July 21 if she wished assistance in obtaining pro bono counsel. (Dkt. No. 40.)

The motion for a preliminary injunction was heard on September 16, 2015. None of the Defendants appeared at the hearing, although Elbakyan sent a two-page letter to the court the day before. (Dkt. No. 50.)

Applicable Standard

Preliminary injunctions are "extraordinary and drastic remed[ies] that should not be granted unless the movant, by a clear showing, carries the burden of persuasion." Mazurek v. Armstrong, 520 U.S. 968, 972 (1997). In a copyright case, a district court may, at its discretion, grant a preliminary injunction when the plaintiffs demonstrate 1) a likelihood of success on the merits, 2) irreparable harm in the absence of an injunction, 3) a balance of the hardships tipping in their favor, and 4) that issuance of an injunction would not do a disservice to the public interest. WPIX, Inc. v. ivi, Inc., 691 F.3d 275, 278 (2d Cir. 2012).

The Motion is Granted

With the exception of Elbakyan, none of the Defendants filed any opposition to the instant motion, participated in any hearing or telephone conference, or in any other way appeared in the case. Although Elbakyan acknowledges that she is the "main operator of sci-hub.org website" (Dkt. No. 50 at 1.), she may only represent herself pro se; since the Website Defendants are not natural persons, they may only be represented by an attorney admitted to practice in federal court. See Max Cash Media, Inc. v. Prism Corp., No. 12 Civ. 147, 2012 WL 2861162, at *1 (S.D.N.Y. July 9, 2012); Jones v. Niagara Frontier Transp. Auth., 722 F.2d 20, 22 (2d Cir. 1983) (stating reasons for the rule and noting that it is "venerable and widespread"). Because the Website Defendants did not retain an attorney to defend this action, they are in default.

However, the Website Defendants' default does not automatically entitle the Plaintiffs to an injunction, nor does the fact that Elbakyan's submission raises no merits-based challenge to the Plaintiffs' claims. See Thurman v. Bun Bun Music, No. 13 Civ. 5194, 2015 WL 2168134, at *4 (S.D.N.Y. May 7, 2015). Instead, notwithstanding the default, the Plaintiffs must present evidence sufficient to establish that they are entitled to injunctive relief. See id.; Gucci Am., Inc. v. Curveal Fashion, No. 09 Civ. 8458, 2010 WL 308303, at *2 (S.D.N.Y. Jan. 20, 2010); CFTC v. Vartuli, 228 F.3d 94, 98 (2d Cir. 2000).

A. Likelihood of Success on the Merits

Elsevier has established that the Defendants have reproduced and distributed its copyrighted works, in violation of the exclusive rights established by 17 U.S.C. § 106. (See Complaint, Dkt. No. 1, at 11-13.) In order to prevail on a claim for infringement of copyright, "two elements must be proven: (1) ownership of a valid copyright, and (2) copying of constituent elements of the work that are original." Arista Records, LLC v. Doe 3, 604 F.3d 110, 117 (2d Cir. 2010) (quoting Feist Publ'ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 361 (1991)).
Elsevier has made a substantial evidentiary showing, documenting the manner in which the Defendants access its ScienceDirect database of scientific literature and post copyrighted material on their own websites free of charge. According to Elsevier, the Defendants gain access to ScienceDirect by using credentials fraudulently obtained from educational institutions, including educational institutions located in the Southern District of New York, which are granted legitimate access to ScienceDirect. (See Declaration of Anthony Woltermann (the "Woltermann Dec."), Dkt. No. 8, at 13-14.) As an attachment to one of the supporting declarations to this motion, Elsevier includes a sequence of screenshots showing how a user could go to www.sci-hub.org, one of the Website Defendants, search for information on a scientific article, get a set of search results, click on a link, and be redirected to a copyrighted article on ScienceDirect, via a proxy. (See Woltermann Dec. at 41-44 and Ex. U.) Elsevier also points to a Twitter post (in Russian) indicating that whenever an article is downloaded via this method, the Defendants save a copy on their own servers. (See Declaration of David M. Hirschberg, Dkt. No. 12, Ex. B.) As specific examples, Elsevier includes copies of two of its articles accessed via the Defendants' websites, along with their copyright registrations. (Declaration of Paul F. Doda, Dkt. No. 9, Exs. B-D.) This showing demonstrates a likelihood of success on Elsevier's copyright infringement claims.
Elsevier also shows a likelihood of success on its claim under the Computer Fraud and Abuse Act ("CFAA"). The CFAA prohibits, inter alia, obtaining information from "any protected computer" without authorization, 18 U.S.C. § 1030(a)(2)(C), and obtaining anything of value by accessing any protected computer with intent to defraud. Id. § (a)(4). The definition of "protected computer" includes one "which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States." Id. § (e)(2)(B); Nexans Wires S.A. v. Sark-USA, Inc., 166 F. App'x 559, 562 n.5 (2d Cir. 2006). Elsevier's ScienceDirect database is located on multiple servers throughout the world and is accessed by educational institutions and their students, and qualifies as a computer used in interstate commerce, and therefore as a protected computer under the CFAA. (See Woltermann Dec. at 2-3.) As found above, Elsevier has shown that the Defendants' access to ScienceDirect was unauthorized and accomplished via fraudulent university credentials. While the CFAA requires a civil plaintiff to have suffered over $5,000 in damage or loss, see Register.com, Inc. v. Verio, Inc., 356 F.3d 393, 439 (2d Cir. 2004), Elsevier has made the necessary showing since it documented between 2,000 and 8,500 of its articles being added to the LibGen database each day (Woltermann Dec. at 8, Exs. G & H) and because its articles carry purchase prices of between $19.95 and $41.95 each. Id. at 2; see Millennium TGA, Inc. v. Leon, No. 12 Civ. 1360, 2013 WL 5719079, at *10 (E.D.N.Y. Oct. 18, 2013).[1]

[1] While Elsevier's articles are likely sufficient on their own to qualify as "[]thing[s] of value" under the CFAA, Elbakyan acknowledges in her submission that the Defendants derive revenue from their website. (Letter, Dkt. No. 50, at 1 ("That is true that website collects donations, however we do not pressure anyone to send them.").)

Elsevier's evidence is also buttressed by Elbakyan's submission, in which she frankly admits to copyright infringement. (See Dkt. No. 50.) She discusses her time as a student at a university in Kazakhstan, where she did not have access to research papers and found the prices charged to be just insane. (Id. at 1.) She obtained the papers she needed "by pirating them," and found many similar students and researchers, predominantly in developing countries, who were in similar situations and helped each other illicitly obtain research materials that they could not access legitimately or afford on the open market. (Id.) As Elbakyan describes it, "I could obtain any paper by pirating it, so I solved many requests and people always were very grateful for my help. After that, I created sci-hub.org website that simply makes this process automatic and the website immediately became popular." (Id.)

Given Elsevier's strong evidentiary showing and Elbakyan's admissions, the first prong of the preliminary injunction test is firmly established.

B. Irreparable Harm

Irreparable harm is present "where, but for the grant of equitable relief, there is a substantial chance that upon final resolution of the action the parties cannot be returned to the positions they previously occupied." Brenntag Int'l Chems., Inc. v. Bank of India, 175 F.3d 245, 249 (2d Cir. 1999). Here, there is irreparable harm because it is entirely likely that the damage to Elsevier could not be effectively quantified. Register.com, 356 F.3d at 404 ("irreparable harm may be found where damages are difficult to establish and measure."). It would be difficult, if not impossible, to determine how much money the Plaintiffs have lost due to the availability of thousands of their articles on the Defendant websites; some percentage of those articles would no doubt have been paid for legitimately if they were not downloadable for free, but there appears to be no way of determining how many that would be. There is also the matter of harm caused by "viral infringement," where Elsevier's content could be transmitted and retransmitted by third parties who acquired it from the Defendants even after the Defendants' websites were shut down. See WPIX, Inc. v. ivi, Inc., 765 F. Supp. 2d 594, 620 (S.D.N.Y. 2011), aff'd 691 F.3d 275 (2d Cir. 2012). "[C]ourts have tended to issue injunctions in this context because 'to prove the loss of sales due to infringement is . . . notoriously difficult.'" Salinger v. Colting, 607 F.3d 68, 81 (2d Cir. 2010) (quoting Omega Importing Corp. v. Petri-Kine Camera Co., 451 F.2d 1190, 1195 (2d Cir. 1971) (Friendly, J.)).

Additionally, the harm done to the Plaintiffs is likely irreparable because the scale of any money damages would dramatically exceed Defendants' ability to pay. Brenntag, 175 F.3d at 249-50 (explaining that even where money damages can be quantified, there is irreparable harm when a defendant will be unable to cover the damages). It is highly likely that the Defendants' activities will be found to be willful - Elbakyan herself refers to the websites' activities as "pirating" (Dkt. No. 50 at 1) - in which case they would be liable for between $750 and $150,000 in statutory damages for each pirated work. See 17 U.S.C. § 504(c); HarperCollins Publishers LLC v. Open Road Integrated Media, LLP, 58 F. Supp. 3d 380, 387 (S.D.N.Y. 2014). Since the Plaintiffs credibly allege that the Defendants infringe an average of over 3,000 new articles each day (Woltermann Dec. at 7), even if the Court were to award damages at the lower end of the statutory range the Defendants' liability could be extensive. Since the Defendants are an individual and a set of websites supported by voluntary donations, the potential damages are likely to be far beyond the Defendants' ability to pay.

C. Balance of Hardships

The balance of hardships clearly tips in favor of the Plaintiffs. Elsevier has shown that it is likely to succeed on the merits, and that it continues to suffer irreparable harm due to the Defendants' making its copyrighted material available for free. As for the Defendants, "it is axiomatic that an infringer of copyright cannot complain about the loss of ability to offer its infringing product." WPIX, 691 F.3d at 287 (quotation omitted). The Defendants cannot be legally harmed by the fact that they cannot continue to steal the Plaintiff's content, even if they tried to do so for public-spirited reasons. See id.

D. Public Interest

To the extent that Elbakyan mounts a legal challenge to the motion for a preliminary injunction, it is on the public interest prong of the test. In her letter to the Court, she notes that there are "lots of researchers . . . especially in developing countries" who do not have access to key scientific papers owned by Elsevier and similar organizations, and who cannot afford to pay the high fees that Elsevier charges. (Dkt. No. 50, at 1.) Elbakyan states in her letter that Elsevier "operates by racket: if you do not send money, you will not read any papers. On my website, any person can read as many papers as they want for free, and sending donations is their free will. Why Elsevier cannot work like this, I wonder?" (Id.) Elbakyan also notes that researchers do not actually receive money in exchange for granting Elsevier a copyright. (Id.) Rather, she alleges they give Elsevier ownership of their works "because Elsevier is an owner of so-called 'high-impact' journals. If a researcher wants to be recognized, make a career - he or she needs to have publications in such journals." (Id. at 1-2.) Elbakyan notes that prominent researchers have made attempts to boycott Elsevier and states that "[t]he general opinion in research community is that research papers should be distributed for free (open access), not sold. And practices of such companies like Elsevier are unacceptable, because they limit distribution of knowledge." (Id. at 2.)

Elsevier contends that the public interest favors the issuance of an injunction because doing so will "protect the delicate ecosystem which supports scientific research worldwide." (Pl.'s Br., Dkt. No. 6, at 21.) It states that the money it generates by selling access to scientific research is used to support new discoveries, to create new journals, and to maintain a "definitive and accurate record of scientific discovery." (Id.) It also argues that allowing its articles to be widely distributed risks the spread of bad science - while Elsevier corrects and retracts articles whose conclusions are later found to be flawed, it has no way of doing so when the content is taken out of its control. (Id. at 22.) Lastly, Elsevier argues that injunctive relief against the Defendants is important to deter "cyber-crime," while failing to issue an injunction will incentivize pirates to continue to publish copyrighted works.

It cannot be denied that there is a compelling public interest in fostering scientific achievement, and that ensuring broad access to scientific research is an important component of that effort. As the Second Circuit has noted, "[c]opyright law inherently balances [] two competing public interests . . . the rights of users and the public interest in broad accessibility of creative works, and the rights of copyright owners and the public interest in rewarding and incentivizing creative efforts (the 'owner-user balance')." WPIX, 691 F.3d at 287. Elbakyan's solution to the problems she identifies, simply making copyrighted content available for free via a foreign website, disserves the public interest. As the Plaintiffs have established, there is a "delicate ecosystem which supports scientific research worldwide," (Pl.'s Br., Dkt. No. 6 at 21), and copyright law plays a critical function within that system. "Inadequate protections for copyright owners can threaten the very store of knowledge to be accessed; encouraging the production of creative work thus ultimately serves the public's interest in promoting the accessibility of such works." WPIX, 691 F.3d at 287. The existence of Elsevier shows that publication of scientific research generates substantial economic value.
The public's interest in the broad diffusion of scientific knowledge is sustained by two critical exceptions in copyright law. First, the "idea/expression dichotomy" ensures that while a scientific article may be subject to copyright, the ideas and insights within that article are not. See 17 U.S.C. § 102(b) ("In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery"). "Due to this distinction, every idea, theory, and fact in a copyrighted work becomes instantly available for public exploitation at the moment of publication." Eldred v. Ashcroft, 537 U.S. 186, 219 (2003). So while Elsevier may be able to keep its actual articles behind a paywall, the discoveries within them are fair game for anyone. Secondly, the "fair use" doctrine, codified at 17 U.S.C. § 107, allows the public to use expressions, as well as ideas, "for purposes such as criticism, comment, news reporting, teaching . . . scholarship, or research" without being liable for copyright infringement. (emphasis added) Under this doctrine, Elsevier's articles themselves may be taken and used, but only for legitimate purposes, and not for wholesale infringement. See Eldred, 537 U.S. at 219.[2] The public interest in the broad dissemination and use of scientific research is protected by the idea/expression dichotomy and the fair use doctrine. See Golan v. Holder, 132 S. Ct. 873, 890 (2012); Eldred, 537 U.S. at 219. Given the importance of scientific research and the critical role that copyright plays in promoting it, the public interest weighs in favor of an injunction.

[2] The public interest in wide dissemination of scientific works is also served by the fact that copyrights are given only limited duration. See Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 431-32 (1984).

Conclusion

For the reasons set forth above, the motion for a preliminary injunction is granted. It is hereby ordered that:

1. The Defendants, their officers, directors, principals, agents, servants, employees, successors and assigns, and all persons and entities in active concert or participation with them, are hereby temporarily restrained from unlawful access to, use, reproduction, and/or distribution of Elsevier's copyrighted works and from assisting, aiding, or abetting any other person or business entity in engaging in unlawful access to, use, reproduction, and/or distribution of Elsevier's copyrighted works.
2. Upon the Plaintiffs' request, those organizations which have registered Defendants' domain names on behalf of Defendants shall disclose immediately to the Plaintiffs all information in their possession concerning the identity of the operator or registrant of such domain names and of any bank accounts or financial accounts owned or used by such operator or registrant.
3. Defendants shall not transfer ownership of the Defendants' websites during the pendency of this Action, or until further Order of the Court.
4. The TLD Registries for the Defendants' websites, or their administrators, shall place the domain names on registryHold/serverHold as well as serverUpdate, serverDelete, and serverTransfer prohibited statuses, until further Order of the Court.
5. The Defendants shall preserve copies of all computer files relating to the use of the websites and shall take all necessary steps to retrieve computer files relating to the use of the websites that may have been deleted before entry of this Order.
6. That security in the amount of $5,000 be posted by the Plaintiffs within one week of the entry of this Order. See Fed. R. Civ. P. 65(c).

It is so ordered.

New York, NY
October 2015

ROBERT W. SWEET
U.S.D.J.


Barok
Communing Texts
2014


Communing Texts

_A talk given on the second day of the conference_ [Off the
Press](http://digitalpublishingtoolkit.org/22-23-may-2014/program/) _held at
WORM, Rotterdam, on May 23, 2014. Also available
in [PDF](/images/2/28/Barok_2014_Communing_Texts.pdf "Barok 2014 Communing
Texts.pdf")._

I am going to talk about publishing in the humanities, including scanning
culture, and its unrealised potentials online. For this I will treat the
internet not only as a platform for storage and distribution but also as a
medium with its own specific means for reading and writing, and consider the
relevance of plain text and its various rendering formats, such as HTML, XML,
markdown, wikitext and TeX.

One of the main reasons why books today are downloaded and bookmarked but
hardly read is the fact that they may contain something relevant but they
begin at the beginning and end at the end; or at least we are used to treating
them in this way. E-book readers and browsers are equipped with full-text
search functionality, but a search for "how does the internet change the way
we read" doesn't yield anything interesting, only a diversion of attention,
even though dozens of books have been written on this issue. When being
insistent, one easily ends up with a folder of dozens of other books, stuck on
how to read them. There is a plethora of books online, yet it is indeed mostly
machines that read them.

It is surely tempting to celebrate or to despise the age of artificial
intelligence, flat ontology and narrowing down the differences between humans
and machines, and to write books as if only for machines or return to the
analogue, but we may as well look back and reconsider the beauty of simple
linear reading of the age of print, not for nostalgia but for what we can
learn from it.

This perspective implies treating texts in their context, and particularly in
the way they commute, how they are brought into relations with one another,
into a community, by the mere act of writing, through a technique that has
developed over time into what we have come to call _referencing_. While in the
early days referring to texts was practised simply as a verbal description of
the referred writing, over millennia it evolved into a technique with
standardised practices and styles, and accordingly it gained _precision_. This
precision is however nothing machinic, since referring to particular passages
in other texts instead of texts as wholes is an act of comradeship: it spares
the reader time when locating the passage. It also makes apparent that it is
through contexts that the web of printed books has been woven. But even though
referencing in its precision has been meant to be very concrete, the advent of
the web in particular made apparent that it is instead _virtual_, and, for the
reader, laborious to follow. The web has shown and taught us that a reference
from one document to another can be plastic. To follow a reference from a
printed book the reader has to stand up, walk down the street to a library,
pick up the referred volume, flip through its pages until the referred one is
found and then follow the text until the passage most probably implied in the
text is identified, while on the web the reader, _ideally_, merely moves her
finger a few millimetres. To click or tap; the difference between the long way
and the short way is obviously the hyperlink. Of course, in the absence of the
short way, even scholars are used to following the reference the long way only
as an exception: an unwritten rule was established to write for readers who
are familiar with the literature in the respective field (which in turn
reproduces the disciplinarity of the reader and writer), while in the case of
unfamiliarity with the referred passage the reader infers its content from the
writer's interpretation of it. The beauty of reading across references was
never fully realised. But now our question is, can we be so certain that this
practice is still necessary today?

The web silently brought about a way to _implement_ the plasticity of this
pointing although it has not been realised as the legacy of referencing as we
know it from print. Today, when linking a text and having a particular passage
in mind, and even describing it in detail, the majority of links physically
point merely to the beginning of the text. Hyperlinks link documents as
wholes by default, and the use of anchors in texts has hardly been thought of
as a _requirement_ to enable precise linking.

If we look at popular online journalism and its use of hyperlinks within the
text body, we may claim that rarely can anyone afford to read all those linked
articles, not to mention reports hundreds of pages long and the like, and if
something is wrong, it would get corrected via comments anyway. On the
internet, the writer is meant to be in more immediate feedback with the
reader. But readers are not always keen to comment, and they are not always
allowed to. We may easily be driven to forget that quoting half of a
sentence is never quoting a full sentence, and if there ought to be the entire
quote, its source text in its whole length would need to be quoted. Think of
the quote _information wants to be free_, which is rarely quoted with its
wider context taken into account. Even factoids and numbers can be
carbon-quoted, but if taken out of context their meaning can be shaped
significantly. The reason for the aversion to following a reference may well
be that we are usually pointed to begin reading another text from its
beginning.

This is exactly where the practices of linking as on the web and
referencing as in scholarly work may benefit from one another. The question is
_how_ to bring them closer together.

An approach I am going to propose requires a conceptual leap to something we
have not been taught.

For centuries, the primary format of the text has been the page, a vessel, a
medium, a frame containing text embedded between straight, more or less
explicit, horizontal and vertical borders. Even before the materials of the
page such as papyrus and paper appeared, the text was already contained in
lines and columns, a structure which we have learnt to perceive as a grid. The
idea of the grid allows us to view text as being structured in lines and
pages, which are in turn at hand when something is to be referred to. Pages are
counted as the distance from the beginning of the book, and lines as the
distance from the beginning of the page. This is not surprising because it is
in accord with the inherent quality of the material medium -- a sheet of paper
has a shape which in turn shapes the body of a text. This tradition goes as far
back as ancient times and the bookroll, in which we indeed find textual grids.

[![Papyrus of Plato
Phaedrus.jpg](/images/thumb/4/49/Papyrus_of_Plato_Phaedrus.jpg/700px-
Papyrus_of_Plato_Phaedrus.jpg)](/File:Papyrus_of_Plato_Phaedrus.jpg)



A crucial difference between print and digital is that text files, whether
HTML documents, markdown documents or database-driven texts, did not inherit
this quality. Their containers are simply not structured into pages, precisely
because of the nature of their materiality as media. Files are written on
memory drives in scattered chunks, beginning at point A and ending at point B
of a drive, continuing from C until D, and so on. Where each of these
chunks starts is ultimately independent of what it contains.

Forensic archaeologists would confirm that when a portion of a text survives,
in the case of ASCII documents it is not a page here and page there, or the
first half of the book, but textual blocks from completely arbitrary places of
the document.

This may sound unrelated to how we, humans, structure our writing in HTML
documents, emails, Office documents, even computer code, but it is a reminder
that we structure them for habitual (interfaces are rectangular) and cultural
(human-readability) reasons rather than out of a technical necessity that
would stem from the material properties of the medium. This distinction is
apparent, for example, in HTML, XML, wikitext and TeX documents, whose content
is both stored on the physical drive and treated when rendered for reading
interfaces as a single flow of text, and the same goes for other texts when
treated with the automatic line-break setting turned off. After all, line
breaks, spaces and everything else are merely numbers corresponding to symbols
in a character set.

So how to address a section in this kind of document? An option offers itself
-- how computers do it, or rather how we made them do it -- as the position of
the beginning of the section in the array, in one long line. It would mean
treating the text document not in its grid-like format but as a line, which
merely adapts to the properties of its display when rendered, as is nicely
implied in the animated logo of this event and as we know it from EPUBs, for
example.

The general format of a bibliographic record is:



Author. Title. Publisher. [Place.] Date. [Page.] URL.


In the case of 'reference-linking' we can refer to a passage by including
information about its beginning and length, determined by the character
position within the text (in analogy to the _pp._ operator used for printed
publications), as well as information about the version of the text (a role
served in printed texts by the edition and date of publication). So what in
printed text is commonly given as the page information is here replaced by the
character position range and the version. Such a reference-link is more
precise, since it addresses a particular section of a particular version of a
document regardless of how it is rendered on an interface.
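
What such a record might look like can be sketched as follows. Everything
named here is an assumption made for illustration: the `ReferenceLink` class,
the helper functions, the example URL, and the choice of a truncated SHA-256
hash as the version identifier (an edition label or a date would do as well).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceLink:
    url: str      # where the text lives (hypothetical example URL below)
    version: str  # identifies the exact version of the text
    start: int    # character position where the passage begins
    length: int   # number of characters in the passage


def text_version(text: str) -> str:
    """Assumed versioning scheme: a short hash of the exact text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def make_link(url: str, text: str, passage: str) -> ReferenceLink:
    """Build a reference to the first occurrence of `passage` in `text`."""
    start = text.find(passage)
    if start == -1:
        raise ValueError("passage not found in text")
    return ReferenceLink(url, text_version(text), start, len(passage))


def resolve(link: ReferenceLink, text: str) -> str:
    """Return the referenced passage, refusing a text of another version."""
    if text_version(text) != link.version:
        raise ValueError("text version differs from the one referenced")
    return text[link.start:link.start + link.length]


text = "It is a relatively simple idea."
link = make_link("http://example.org/text", text, "simple idea")
print(link.start, link.length, resolve(link, text))
```

In this sketch a changed text is simply rejected rather than searched for the
passage again. It therefore relies on versions being kept.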

It is a relatively simple idea and its implementation does not seem to be very
hard, so I wonder why it has not been implemented already. I discussed it with
several people yesterday and found out that there have indeed already been
attempts in this direction. Adam Hyde pointed me to a proposal for _fuzzy
anchors_ presented on the blog of the Hypothes.is initiative last year, which,
in order to overcome the need for versioning, employs diff algorithms to
locate the referred section, although it is too complicated to be explained in
this setting.[1] Aaaaarg has recently implemented in its PDF reader an option
to generate URLs for a particular point in the scanned document. This in
itself is a great improvement, although it treats texts as images and is thus
specific to a particular scan of a book, and the generated links are not
public URLs.

Using the character position in references requires an agreement on how to
count. There are at least two options. One is to include all source code in
positioning, which means measuring the distance from an anchor such as the
beginning of the text, the beginning of the chapter, or the beginning of the
paragraph. The second option is to make a distinction between operators and
operands, and to count only operands. Here there are further options as to
where to draw the line between them. We can consider as operands only
characters with phonetic properties -- letters, numbers and symbols --
stripping the text of operators that are there to shape the sonic and visual
rendering of the text, such as whitespaces, commas, periods, HTML, markdown
and other tags, so that we are left with the body of the text to count in.
This would mean rendering operators unreferrable and counting as in _scriptio
continua_.
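
The second option can be sketched like this, under loudly stated assumptions:
only letters and digits are counted as operands (where exactly to draw the
line is left open above), tags are recognised by a naive pattern for HTML-like
markup, and the function name `operand_map` is hypothetical.

```python
import re

# Naive pattern for HTML-like tags; an assumption, not a full parser.
TAG = re.compile(r"<[^>]+>")


def operand_map(source: str) -> tuple[str, list[int]]:
    """Return the text as scriptio continua, plus the source position
    of every counted character."""
    # Collect the positions of all characters that belong to tags.
    tag_positions: set[int] = set()
    for match in TAG.finditer(source):
        tag_positions.update(range(match.start(), match.end()))

    operands: list[str] = []
    positions: list[int] = []
    for index, char in enumerate(source):
        if index in tag_positions:
            continue  # operator: markup
        if char.isalnum():  # operand: a letter or a digit
            operands.append(char)
            positions.append(index)
        # everything else (spaces, commas, periods, ...) is an operator
    return "".join(operands), positions


source = "<p>One, <em>long</em> line.</p>"
continua, positions = operand_map(source)
print(continua)       # Onelongline
print(positions[3])   # source position of the fourth operand
```

The list of positions is what makes a reference counted in operands usable
again in the source: a passage at operand position 3 can be mapped back to its
place among the tags and punctuation.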

_Scriptio continua_ is a very old example of the linear, one-dimensional
treatment of the text. Let's look again at the bookroll with Plato's writing.
Even though it is 'designed' into grids, on a closer look it reveals the lack
of any other structural elements -- there are no spaces, commas, periods or
line-breaks; the text is merely one flow, one long line.

_Phaedrus_ was written in the fourth century BC (this copy comes from the
second century AD). Word and paragraph separators were reintroduced much
later, between the second and sixth century AD when rolls were gradually
transcribed into codices that were bound as pages and numbered (a dramatic
change in publishing comparable to digital changes today).[2]

'Reference-linking' has not been prominent in discussions about sharing books
online and I only came to realise its significance during my preparations for
this event. There is a tremendous amount of very old, recent and new texts
online but we haven't done much to open them up to contextual reading. In
this, publishers of all 'grounds' are in it together.

We are equipped to treat the internet not only as a repository and library but
to take into account its potentials of reading that have been hiding in front
of our very eyes. To expand the notion of hyperlink by taking into account
techniques of referencing and to expand the notion of referencing by realising
its plasticity, which has always been imagined as if it were there. To mesh
texts with public URLs to enable the entanglement of referencing and
hyperlinks. Here, open access gains further relevance and importance.

Dušan Barok

_Written May 21-23, 2014, in Vienna and Rotterdam. Revised May 28, 2014._

Notes

1. ↑ Proposals for paragraph-based hyperlinking can be traced back to the work of Douglas Engelbart, and today there are a number of related ideas, some of which were implemented on a small scale: [fuzzy anchoring](http://hypothes.is/blog/fuzzy-anchoring/); [purple numbers](http://project.cim3.net/wiki/PMWX_White_Paper_2008); [robust anchors](http://github.com/hypothesis/h/wiki/robust-anchors); [_Emphasis_](http://open.blogs.nytimes.com/2011/01/11/emphasis-update-and-source); and [others](http://en.wikipedia.org/wiki/Fragment_identifier#Proposals). The dependence on structural elements such as paragraphs is one of their shortcomings, making them unsuitable for texts with longer paragraphs (e.g. Adorno's _Aesthetic Theory_ ), visual poetry or computer code; another is the requirement to store anchors along with the text.
2. ↑ Works which happened not to be of interest at the time ceased to be copied and mostly disappeared. On the book roll and its gradual replacement by the codex see William A. Johnson, "The Ancient Book", in _The Oxford Handbook of Papyrology_ , ed. Roger S. Bagnall, Oxford, 2009, pp. 256-281, http://google.com/books?id=6GRcLuc124oC&pg=PA256.

Addendum (June 9)

Arie Altena wrote a [report from the panel](http://digitalpublishingtoolkit.org/2014/05/off-the-press-report-day-ii/)
published on the website of the Digital Publishing Toolkit initiative,
followed by another [summary of the talk](http://digitalpublishingtoolkit.org/2014/05/dusan-barok-digital-imprint-the-motion-of-publishing/)
by Irina Enache.

The online repository Aaaaarg [has
introduced](http://twitter.com/aaaarg/status/474717492808413184) the
reference-link function in its document viewer, see [an
example](http://aaaaarg.fail/ref/60090008362c07ed5a312cda7d26ecb8#0.102).


 
