unicode in Constant 2015


you do page
layout, you get into a new frame of mind ... you look in a different way at
publications. It is less content oriented, but more layout oriented. You will
pick something up and it will spread. People by now have understood that
it is not such a good idea to use twelve different fonts in one text ... and I
think that knowledge about better page layout will also spread.


When we came to the Libre Graphics Meeting
for the first time in 2007, we recorded this rare
conversation with George Williams, developer of
FontForge, the editing tool for fonts. We spoke
about Shakespeare, Unicode, the pleasure of making beautiful things, and pottery.

We’re doing these interviews, as we’re working as designers on Open Source

OK.

With Open Source tools, as typographers, but often when we speak to
developers they say well, tell me what you want, or they see our interest in
what they are doing as a kind of feature request or bug report.

(laughs) Yes.

Of course it’s clear that that’s the way it often works, but for us it’s also
interesting to think about these tools as really tools, as ways of shaping
work, to try and understand how they are made or who is making them.
It can h


in fonts. And there was this program
that came out in the eighties called Fontographer which allowed you to create PostScript 1 and later TrueType 2 fonts. And I loved it. And I made lots
of calligraphic fonts with it.

You were ... like 20?

I was 20~30. Let’s see, I was born in 1959, so in the eighties I was in my
twenties mostly. And then Fontographer was bought up by Macromedia 3
who had no interest in it. They wanted FreeHand 4 which was done by
the same company. So they dropped Fon ... well they continued to sell
Fontographer but they didn’t update it. And then OpenType 5 came out and
Unicode 6 came out and Fontographer didn’t do this right and it didn’t do
that right ... And I started making my own fonts, and I used Fontographer
to provide the basis, and I started writing scripts that would add accents to
Latin letters and so on. And figured out the Type1 7 format so that I could
decompose it — decompose the Fontographer output so that I could add
1 PostScript fonts are outline font specifications developed by Adobe Systems for professional
digital typesetting, which uses PostScript file format to encode font information.
Wikipedia. PostScript fonts — Wikiped


ssed 18.12.2014]

4 Adobe FreeHand (formerly Macromedia Freehand) is a computer application for creating
two-dimensional vector graphics. Adobe discontinued development and updates to the
program.
Wikipedia. Adobe FreeHand — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]

5 OpenType is a format for scalable computer fonts. It was built on its predecessor TrueType,
retaining TrueType’s basic structure and adding many intricate data structures for prescribing
typographic behavior.
Wikipedia. OpenType — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]

6 Unicode is a computing industry standard for the consistent encoding, representation, and
handling of text expressed in most of the world’s writing systems.
Wikipedia. Unicode — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]

7 Type 1 is a font format for single-byte digital fonts for use with Adobe Type Manager
software and with PostScript printers. It can support font hinting. It was originally a
proprietary specification, but Adobe released the specification to third-party font
manufacturers provided that all Type 1 fonts adhere to it.
Wikipedia. PostScript fonts — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]


my own things to it. And then Fontographer didn’t do Type0 8 PostScript
fonts, so I figured that o


accessed 18.12.2014]


So when, before we moved, I was curious about, I wanted you to talk
about a Shakespearian influence on your interest in fonts. But on the
other hand you talk about working in a company where you did HTML
editors at the time you actually started, I think. So do you think that
is somehow present ... the web is somehow present in your — in how
FontForge works? Or how fonts work or how you think about fonts?

I don’t think the web had much to do with my — well, that’s not true.
OK, when I was working on the HTML editor, at the time, mid-90s, there
weren’t any Unicode fonts, and so part of the reason I was writing all these
scripts to add accents and get Type0 support in PostScript (which is what
you need for a Unicode font) was because I needed a Unicode font for our
HTML product.
To that extent — yes-s-s-s.
It had an effect. Aside from that, not really.
The web has certainly allowed me to distribute it. Without the web I doubt
anyone would know — I wouldn’t have any idea how to ‘market’ it. If that’s
the right word for something that doesn’t get paid for. And certainly the
web has provided a convenient infrastructure to do the documentation in.
But — as for font design itself — that (the web) has certainly not affected
me.
Maybe with this creative commons talk that Jon Phillips was giving, there
may be, at some point, a butt


author and developer of ConTeXt, past president of NTG, and
active in many other areas of the TeX community
Hans Hagen – Interview – TeX Users Group. http://tug.org/interviews/hagen.html, 2006. [Online; accessed 18.12.2014]


proper way to do it, is to use he. It’s an invented problem. This whole question is
an invented question and there is no such thing as a need for considering any other
options besides this. 3 So I wrote back and said: That’s not up to you to decide,
because if somebody has a problem, then there is a problem. So I kind of naively
suggested that we could make a Unicode character, that can stand in, like a
typographical element, that does not necessarily have a pronunciation yet.
So something that, when you are reading it, you could either say he or she
or they and it would be sort of [emergent|dialogic|personalized].
Like delayed political correctness or delayed embraciveness. But, little did I
know, that Unicode was not the answer.

Did they tell you that? That Unicode is not the answer?

Well, Arthur actually wrote back 4, and he knows a lot about Unicode and
he said: With Unicode you have to prove that it’s in use already. In my sense,
Unicode was a playground where I could just map whatever values I wanted
to be whatever glyph I wanted. Somewhere, in some corner of unused
namespace or something. But that’s not the way it works. But TeX works
like this. So I could always just define a macro that would do this. Hans
actually wrote a macro 5 that would basically flip a coin at the beginning of
your paper. So whenever you wanted to use the gender neutral, you would
just use the macro and then it wouldn’t be up to you. It’s another way of
obfuscating, or pushing the responsibility away from you as an author. It’s
like ok, well,


n
of that because it’s such a simple thing, once you start cramming too much
into it, it starts feeling wrong. But all it’s gonna take is for someone to make
a new app that needs something else and then there will be a reason to
change it but I think the change will always be adding, not removing.


The following text is a transcription of a talk by and conversation with Denis Jacquerye in the context of the Libre
Graphics Research Unit in 2012. We invited him in the
context of a session called Co-position where we tried to
re-imagine layout from scratch. The text-encoding standard Unicode and moreover Denis’ precise understanding of the many cultural and political path-dependencies
involved in the making of it, felt like an obvious place
to start. Denis Jacquerye is involved in language technology, software localization and font engineering. He’s
been the co-lead of the DéjàVu Font project and works
with the African Network for Localization (ANLoc) to remove language limitations that exist in today’s technology.
Denis currently lives in London. This text is also available
in Considering your tools. 1 A shorter version has been published in Libre Graphics Magazine 2.1.
Th


them all in ASCII.
Often they would start with ASCII and then add the specific requirements
but soon they ended up having a lot of different standards because of all the
different needs. So one single byte of representation would have different
meanings and each of these meanings could be displayed differently in fonts.
But old webpages are often using old encodings. If your browser is not
using the right encoding you would have gibberish displayed because of this
chaos of encodings. So in the late eighties, they started thinking about
those problems and in the nineties they started working on Unicode: several
companies got together and worked on one single unifying standard that
would be compatible with all the pre-used standards or the new coming
ones.
Unicode is pretty well defined: you have a universal code point to identify a character, and then that character can be displayed with
different glyphs depending on the font or the style selected. With that
framework, when you need to have the proper character displayed, you have
to go to the code point in a font editor, change the shape of the character and
it can be displayed properly. Then sometimes there’s just no code point for
the character you need because it hasn’t been added, it wasn’t in any existing
standard or nobody has ever needed it before or people who needed it just
used old printers and metal type.
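As a small illustration of the distinction between a code point and its glyph, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample characters are chosen for illustration and do not come from the talk. The code point and name are fixed by Unicode; how each character looks on screen depends entirely on the font.

```python
import unicodedata


def describe(ch):
    """Return the code point (as 'U+XXXX') and the Unicode name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# Sample characters picked for illustration only.
for ch in ("A", "\u00e9", "\u0254"):
    code_point, name = describe(ch)
    print(ch, code_point, name)
```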
So in this case, you have to start to deal with the Unicode organization itself.
They have a few ways to communicate like the mailing list, the public, and
recently they also opened a forum where you can ask questions about the
characters you need as you might just not find them.
In most operating systems, you have a character map application where you
can access all the characters, either all the characters that exist in Unicode or
the ones available in the font you’re using. And it’s quite hard to find what
you need, as it’s most of the time organized with a very restrictive set of
rules. Characters are just ordered in the way they’re ordered within Unicode
using their code point order: for example, capital A is 41, and then B is 42,
etc. The further you go in the alphabet the further you go in the Unicode
blocks and tables, and there are a lot of different writing systems ... Moreover
because Unicode is sort of expanding organically – work is done on one
script, and then on another, then coming back to previous scripts to add
things – things are not really in a logical or practical order. Basic Latin is all
the way up there, and further on you have Latin Extended-A, Latin Extended
Additional, Latin Extended-B, C and D. Those are actually quite far
apart within Unicode, and each of them can have a different setup: for
example, here you have a capital letter that is just alone, and here you have
a capital letter and a lowercase letter. So when you know the character you
want to use, sometimes you would find the uppercase letter but you’d have
to keep looking for the corresponding lowercase.
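The code point order can be checked directly. In the sketch below, note that "41" for capital A is a hexadecimal value (U+0041, or 65 in decimal). The helper `case_pair` and the choice of Ɓ as an example are assumptions made for illustration: its uppercase form sits in the Latin Extended-B block while its lowercase form sits in IPA Extensions, which is one instance of a case pair split across blocks.

```python
def case_pair(upper):
    """Return the code points of an uppercase letter and its lowercase form."""
    lower = upper.lower()
    return f"U+{ord(upper):04X}", f"U+{ord(lower):04X}"


# Basic Latin: A and B are neighbours, and lowercase is a fixed distance away.
print(ord("A") == 0x41, ord("B") == 0x42)
print(case_pair("A"))

# A letter whose two cases live in different blocks (illustrative choice).
print(case_pair("\u0181"))
```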
Basically when you have a character that you can’t find, people from the
mailing list or the forum can tell you if it would be relevant to include it
in Unicode or not. And if you’re very motivated, you can try to meet the
inclusion criteria. But for a proper inclusion, there has to be a formal
proposal using their template with questions to answer, you also have to
provide proof that the characters you want to add are actually used or how
they would be used.


The criteria are quite complicated because you have to make sure that this is
not a glyphic variant (the same character but represented differently). Then
you also have to prove the character doesn’t already exist because sometimes
you just don’t know it’s a variant of another on


he character so
that they can use it in their documentation.

How long does it take usually?

It depends as sometimes they accept it right away if you explain your request
properly and provide enough proof, but they often ask for revisions to the
proposals and then it can be rejected because it doesn’t meet the criteria.
Actually those criteria have changed a bit in the past. They started with
Basic Latin and then added special characters which were used: here for example is the international phonetic alphabet but also all the accented ones ...
As they were used in other encodings and Unicode initially wanted to
be compatible with everything that already exists, they added them. Then
they figured they already had all those accented characters from other encodings so they’re also going to add all the ones they know are used even
though they were not encoded yet. They ended up with different names because they had different policies at the beginning instead of having the same
policy as now. They added here a bunch of Latin letters with marks that
were used for example in transcription. So if you’re transcribing Sanskrit for
example, you would use some of the characters here. Then


ot above in the block of the diacritical
marks. You have access to all the diacritical marks they thought were useful
at some point. At that point, when they realized they would end up having
thousands of accented characters, they figured that this way you can
have just any possibility. So from now on, they’re just going to say: if you
want to have an accented character that hasn’t been encoded already, just
use the parts that can represent it. Then in 1996, some people for Yoruba,
a language spoken in Nigeria, made a proposal to add the characters with
diacritics they needed and Unicode just rejected the proposal as they could
compose those characters by combining existing parts.

Weren’t the elements they needed already in the toolbox?

Yes, the encoding parts are there, meaning it can be represented with
Unicode but the software didn’t handle them properly so it made more
sense to the Yoruba speakers to have it encoded in Unicode.
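To make "combining existing parts" concrete, here is a minimal sketch with Python's standard `unicodedata` module. The letter used, an e with a dot below and an acute accent, is an assumption picked for illustration; the talk does not list the characters in the 1996 proposal. Composition (NFC) can merge the base letter with the dot below, but there is no single precomposed code point for the letter with both marks, so the acute stays a separate combining character.

```python
import unicodedata


def compose(text):
    """Apply canonical composition (NFC) to a string."""
    return unicodedata.normalize("NFC", text)


# e + COMBINING DOT BELOW + COMBINING ACUTE ACCENT (illustrative sequence).
sequence = "e\u0323\u0301"
composed = compose(sequence)

print([f"U+{ord(c):04X}" for c in sequence])
print([f"U+{ord(c):04X}" for c in composed])
```

Software that only expects one code point per accented letter has to handle the remaining two-code-point sequence correctly, which is where rendering and keyboard input tend to fall short.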

So you could type, but you’d need to type two characters of course?

Yes, the way you type things is a big problem. Because most keyboards
are based on old encodings where you have accented characters as single
characters, so when you want to do a sequence of characters, you actually
have to type more, or you’d have to have a special keyboard layout allowing
you to have one key mapped to several characters. So that’s technically
feasible but it’s a slow process to have all the possibilities. You might have
one which is very common so developers end up adding it to the keyboard
layouts or whatever applications they’re using, but not when other people
have different needs.
There is a lot of documentation within Unicode, but it’s quite hard to find
what you want when you’re just starting, and it’s quite technical. Most of it
is actually in a book they publish at every new version. This book has a few
chapters that describe how Unicode works and how characters should work
together, what properties they have. And all the differences between scripts
are relevant. They also have special cases trying to cater to those needs that
weren’t met or the proposals that were rejected. They have a few examples
in the Unicode book: in some transcription systems they have this sequence
of characters or ligature; a t and a s with a ligature tie and then a dot above.
So the ligature tie means that t and s are pronounced together and the dot
above is err ... has a different meaning (laughs). But it has a meaning! But
because of the way characters work in Unicode, applications actually reorder
it: whatever you type in, it’s reordered so that the ligature tie ends up being
moved after the dot. So you always have this representation because you
have the t, there should be the dot, and then there should be the ligature tie
and then the s. So the t goes first, the dot goes above the t, the ligature tie
goes above everything and then the s just goes next to the t. The way they
explain how to do this is that you’re supposed to do the t, the ligature tie, and then a
special diacritical mark that prevents any kind of reordering, then you can
add the dot and then you can do the s. So this kind of use is great as you
have a solution, it’s just super hard because you have to type five characters
instead of ... well ... four (laughs). But still, most of the libraries that are
rendering fonts don’t handle it properly and then even most fonts don’t
plan for it. So even if the fonts did, the libraries wouldn’t handle it
properly. Then there are other things that Unicode does: because of that
separation between accents and characters and then the composition, you
can actually normalize how things are ordered. This sequence of characters
can be reordered into the pre-composed one with a circumflex or whatever;
you have combining marks in the normalized order. All these things have
to be handled in the libraries, in the application or in the fonts.
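The reordering described above can be reproduced with Python's standard `unicodedata` module. This is a sketch under two assumptions: that the ligature tie is U+0361 COMBINING DOUBLE INVERTED BREVE, and that the "special diacritical mark that prevents any kind of reordering" is U+034F COMBINING GRAPHEME JOINER. Normalization sorts adjacent combining marks by their combining class, and a character of class 0 between them stops the sort. The helper names are hypothetical.

```python
import unicodedata

TIE = "\u0361"        # COMBINING DOUBLE INVERTED BREVE (assumed ligature tie)
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER (assumed blocking mark)


def nfd_codes(text):
    """Return the code points of the canonically decomposed (NFD) string."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


# Combining classes decide the order: lower values are moved first.
print({name: unicodedata.combining(ch)
       for name, ch in (("tie", TIE), ("dot", DOT_ABOVE), ("cgj", CGJ))})

# Typed as t, tie, dot, s: the dot is moved in front of the tie.
print(nfd_codes("t" + TIE + DOT_ABOVE + "s"))

# Typed as t, tie, CGJ, dot, s: five characters, and the order is kept.
print(nfd_codes("t" + TIE + CGJ + DOT_ABOVE + "s"))
```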
The documentation of Unicode itself is not prescriptive, meaning that the
shapes of the glyphs are not set in stone. So you can still have room to
have the style you want, the style your target users want. For example
if we have different glyphs: Unicode has just one shape and it’s the font
designer’s choice to have different ones. Unicode is not about glyphs, it’s
really about how information is represented, not how it’s displayed. Or you have
two characters displayed as a ligature: it is actually encoded as one character
because of previous encodings. But if ever it would be a new case, Unicode
wouldn’t take the ligature as a single character.


So all this information is really in a corner there. It’s quite rare to find fonts
that actually use this information to cater to the needs of the people who
need specific features. One of the ways to implement all those features is
with TrueType/OpenType and there are also some alternatives like Graphite
which is a subset of a TrueType/OpenType font. But then, you need your
applications to be able to handle Graphite. So eventually the real unique
standard is TrueType/OpenType. It’s pretty well documented and very technical becau


e system, meaning that some identified languages just can’t be identified in OpenType. One of the features in
OpenType is managing language environment. If I’m using Polish, I’d want
this shape; if I’m using Navajo, I’d want this shape. That’s very cool because you can make just one font that’s used by Polish speakers and Navajo
speakers without them worrying about changing fonts as long as they specify the language they’re using. But you can’t use this feature for languages
which aren’t in the OpenType specifications, as they have a different way of
describing languages than Unicode. It’s really frustrating because you can
find all the characters in Unicode, not organized in a practical way: you have
to look all around the tables to find the characters that may be used by one
language, and then you have to look around for how to actually use them.
There is a real lack of awareness within the font designer community. Because
even when they might add all the characters you need, they might just not
add the positioning, so for example you have a ... when you combine with a
circumflex, it doesn’t position well because most of the font designers still
work with the old encoding mindset where you have one character for one
accented letter. Sometimes they just think that following the Unicode
blocks is good enough. But then you have problems where, as you can see
in the Basic Latin charts at the beginning, the capital is in one block and
its lowercase in a different block. And then they just work on one block,
they just don’t do the other one because they don’t think it’s necessary, but
yet, two blocks of the same letter are there, so it would make sense to have
both. It’s hard because there are very few connections between the Unicode
world, people working on OpenType libraries, font designers and the actual
needs of the users.

At the beginning of the presentation you went for the code point of the characters,
all your characters are subtitled by their code points; it’s kind of the beauty of
Unicode to name everything, every character.
Those names are actually quite long. One funny thing about this. Unicode
has the policy of not changing the names of the characters, so they have an
errata where they realized that oh, we shouldn’t have named this that, so here’s
the actual name that makes sense, and the real name is wrong.
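One way to see this policy in practice is through Unicode's formal name aliases, which Python's standard `unicodedata` module accepts in `lookup` (Python 3.3 and later). The character below, U+01A2, is an example chosen here and is not mentioned in the talk: its permanent name says OI, while the published correction is GHA. The helper names are hypothetical.

```python
import unicodedata


def permanent_name(ch):
    """Return the immutable Unicode character name."""
    return unicodedata.name(ch)


def from_alias(alias):
    """Resolve a character from its name or from a formal name alias."""
    return unicodedata.lookup(alias)


ch = "\u01a2"
print(permanent_name(ch))                      # the original, uncorrectable name
print(from_alias("LATIN CAPITAL LETTER GHA") == ch)  # the corrected alias resolves to it
```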

Pierre refers to the fact that in the character mappings each of the glyphs
also has a description. And those are sometimes so abstract and poetic that
this was a start of a work from OSP, the Dingbats Liberation Fest, to try
to re-imagine what shapes would belong to those descriptions. So ‘combining
dot above’ that’s the textual description of the code point. Bu


at there are decisions,
that need to be taken or that have been taken. And actually I like the feeling
of convenience when things get finished. They are done. Not configurable
forever.

(laughs) That’s convenient, if things get done!

For this specific book, you have made a few decisions, for example your selection of fonts is particular.
Xavier, can you say something about the typography of Conversations?

Huuumn yep, for the typographic decisions ... in the beginning we searched for
fancy fonts, but in a way came back to use very classic fonts, respectively one classic
font. So the Junicode 8 for the text and the OCR-A 9 for anything else. Because
we decided to focus on testing different ways of layouting and use the fonts as a
way to keep a certain continuity between the parts. We thought this can be more
interesting, than to show that we can find a lot of beautiful, fancy fonts.

So in the beginning, we thought about having a different font for every
speaker, but sooner or later we realised that it would be good to have something that keeps the whole thing together. Right now, these are the two
fonts. The Junicode, which is a font for medievalists, and the OCR-A,
which is a opti


p
the content in a readable and understandable source format.

Xavier, what is going to happen next?

Right now, I’m the guy who tests on Scribus, Inkscape. But I don’t know if it’s
the answer to your question.

I was just curious because you have a month to work on this still, so I was
wondering ... are there other things you are testing or trying?

Yeah, I think I want to finish the hotglue2svg.sh, I mean it’s my first
Bash program, I want to raise my baby. (laughs) But right now I’m trying to
find different ways of layouts. The first one is the one with the big squares, the
big unicode characters and all the arrows. So it’s very complicated, but it’s the
attempt to find another way to express a conversation in text.

Can you say more about that?

Because in the beginning, my first try was to keep the ‘life’ of a conversation in
the text with some things, like indentation or with graphic things, like the choice
of the unicode characters. If this can be a way to express a conversation. Because
it’s hard to do it with programming stuff so we’re using GUI based software.

It’s a bit coming to the question, what you are doing differently, if you work
with a direct visual feedback. So you don’t try to reduce the content to get
it through a logical structure. Because that’s in a way how the markdown
to LaTeX transformation is doing it. You set certain rules, that may be in
special cases soft rules, but you really try to establish a logical structure and
have a set of rules and apply them. For me, it’s also an in


ertens, Boris Kish, Christoph Haag, Femke Snelting, George Williams, Gijs
de Heij, ginger coons, Ivan Monroy Lopez, John Haltiwanger, Ludivine Loiseau, Martino Morandi,
Pierre Huyghebaert, Urantsetseg Ulziikhuu, Xavier Klein
Chapter opener: Built with petter by Benjamin Stephan
-> http://github.com/b3nson/petter

Tools: basename, bash, bibtex, cat, Chromium, cp, curl, dpkg, egrep, Etherpad, exit,
ftp, gedit, GIMP, ghostscript, Git, GNU coreutils, grep, ImageMagick, Inkscape, Kate, man,
makeindex, meld, ne, pandoc, pdflatex, pdftk, Processing, python, read, rev, Scribus,
sed, vim, wget
Fonts: Junicode by Peter S. Baker, OCR-A by John Sauter

Source Files:
Texts, fonts and pdf: http://conversations.tools
Software: https://github.com/lafkon/conversations
Published by: Constant Verlag (Brussels, January 2015)
ISBN: 9789081145930

Copyright (C) Constant 2014
Copyleft: This work is free. You may copy, distribute and modify
it according to the terms of the Free Art License (see appendix)
This publication is made possible by the Libre Graphics Community, through the financial support
from the European Commission (Libre Graphics Research Unit) and the Flemish authorities.

Printed in Germany.

http

 
