Unicode in Conversations (Constant, 2015)


er page layout will also spread.

When we came to the Libre Graphics Meeting
for the first time in 2007, we recorded this rare
conversation with George Williams, developer of
FontForge, the editing tool for fonts. We spoke
about Shakespeare, Unicode, the pleasure of making beautiful things, and pottery.
We’re doing these interviews, as we’re working as designers on Open Source
OK.

With Open Source tools, as typographers, but often when we speak to
developers they say well, tell me what you


Fontographer was bought up by Macromedia 3
who had no interest in it. They wanted FreeHand 4 which was done by
the same company. So they dropped Fon ... well they continued to sell
Fontographer but they didn’t update it. And then OpenType 5 came out and
Unicode 6 came out and Fontographer didn’t do this right and it didn’t do
that right ... And I started making my own fonts, and I used Fontographer
to provide the basis, and I started writing scripts that would add accents to
latin letters and so on. And


OpenType is built on its predecessor TrueType, retaining TrueType’s basic structure and adding many intricate data structures for prescribing
typographic behavior. Wikipedia. OpenType — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]
Unicode is a computing industry standard for the consistent encoding, representation, and
handling of text expressed in most of the world’s writing systems.
Wikipedia. Unicode — Wikipedia, The Free Encyclopedia, 2014. [Online; accessed 18.12.2014]

Type 1 is a font format for single-byte digital fonts for use with Adobe Type Manager
software and with PostScript printers. It can support font hinting. It was originally a
proprietary format.


ent in your — in how
FontForge works? Or how fonts work or how you think about fonts?

I don’t think the web had much to do with my — well, that’s not true.
OK, when I was working on the HTML editor, at the time, mid-90s, there
weren’t any Unicode fonts, and so part of the reason I was writing all these
scripts to add accents and get Type0 support in PostScript (which is what
you need for a Unicode font) was because I needed a Unicode font for our
HTML product.
To that extent — yes-s-s-s.
It had an effect. Aside from that, not really.
The web has certainly allowed me to distribute it. Without the web I doubt
anyone would know — I wouldn’t have any idea how to ‘market’ it.


is no such thing as a need for considering any other
options besides this. 3 So I wrote back and said: That’s not up to you to decide,
because if somebody has a problem, then there is a problem. So I kind of naively
suggested that we could make a Unicode character that can stand in, like a
typographical element, that does not necessarily have a pronunciation yet.
So something that, when you are reading it, you could either say he or she
or they and it would be sort of [emergent|dialogic|personalized].
Like delayed political correctness or delayed embraciveness. But, little did I
know, that Unicode was not the answer.

Did they tell you that? That Unicode is not the answer?

Well, Arthur actually wrote back 4, and he knows a lot about Unicode and
he said: With Unicode you have to prove that it’s in use already. The way I saw it,
Unicode was a playground where I could just map whatever values I wanted
to whatever glyph I wanted. Somewhere, in some corner of unused
namespace or something. But that’s not the way it works. But TeX works
like this. So I could always just define a ma


of a talk by and conversation with Denis Jacquerye in the context of the Libre
Graphics Research Unit in 2012. We invited him in the
context of a session called Co-position where we tried to
re-imagine layout from scratch. The text-encoding standard Unicode and moreover Denis’ precise understanding of the many cultural and political path-dependencies
involved in the making of it, felt like an obvious place
to start. Denis Jacquerye is involved in language technology, software localization and font engineering.


using old encodings. If your browser is not
using the right encoding you would have gibberish displayed because of this
chaos of encodings. So in the late eighties, they started thinking about
those problems and in the nineties they started working on Unicode: several
companies got together and worked on one single unifying standard that
would be compatible with all the pre-existing standards and the ones to
come.
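
(A quick way to see that chaos for yourself, sketched in Python: the same bytes read under two different encodings.)

    # The same UTF-8 bytes, decoded with the wrong legacy encoding,
    # produce exactly the kind of gibberish described above.
    raw = "café".encode("utf-8")
    print(raw.decode("utf-8"))    # café
    print(raw.decode("latin-1"))  # cafÃ©, i.e. mojibake
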
Unicode is pretty well defined: you have a universal code point to identify a character, and then that character can be displayed with
different glyphs depending on the font or the style selected. With that
framework, when you need to have the p



the character you need because it hasn’t been added, it wasn’t in any existing
standard or nobody has ever needed it before or people who needed it just
used old printers and metal type.
So in this case, you have to start to deal with the Unicode organization itself.
They have a few ways to communicate, like the public mailing list, and
recently they also opened a forum where you can ask questions about the
characters you need as you might just not find them.
In most operating systems, you have a character map application where you
can access all the characters, either all the characters that exist in Unicode or
the ones available in the font you’re using. And it’s quite hard to find what
you need, as it’s most of the time organized with a very restrictive set of
rules. Characters are just ordered in the way they’re ordered within Unicode
using their code point order: for example, capital A is 41 (in hexadecimal), and then B is 42,
etc. The further you go in the alphabet the further you go in the Unicode
blocks and tables, and there are a lot of different writing systems ... Moreover
because Unicode is sort of expanding organically – work is done on one
script, and then on another, then coming back to previous scripts to add
things – things are not really in a logical or practical order. Basic Latin is all
the way up there, and further on you have Latin Extended A, Latin Extended
Additional, Latin Extended B, C and D. Those are actually quite far
apart within Unicode, and each of them can have a different setup: for
example, here you have a capital letter that is just alone, and here you have
a capital letter and a lowercase letter. So when you know the character you
want to use, sometimes you would find the uppercase letter but you’d have
to keep looking for the corresponding lowercase.
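
(What that code point order looks like in practice: a minimal sketch in Python, using the standard unicodedata module; the four sample letters are our own choice, not Denis’.)

    import unicodedata

    # Code points are just numbers: capital A is 0x41, B is 0x42 ...
    for ch in "AB":
        print(hex(ord(ch)), unicodedata.name(ch))

    # Related Latin letters end up far apart in the code space:
    for ch in ("A", "\u0100", "\u1E00", "\uA722"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # U+0041 Basic Latin, U+0100 Latin Extended-A,
    # U+1E00 Latin Extended Additional, U+A722 Latin Extended-D
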
Basically when you have a character that you can’t find, people from the
mailing list or the forum can tell you if it would be relevant to include it
in Unicode or not. And if you’re very motivated, you can try to meet the
inclusion criteria. But for a proper inclusion, there has to be a formal
proposal using their template with questions to answer; you also have to
provide proof that the characters you w


have changed a bit in the past. They started with
Basic Latin and then added special characters that were in use: here for example is the International Phonetic Alphabet, but also all the accented ones ...
As they were used in other encodings, and Unicode initially wanted to
be compatible with everything that already existed, they added them. Then
they figured they already had all those accented characters from other encodings so they’re also going to add all the ones they know are used even
though t


have an accented character that hasn’t been encoded already, just
use the parts that can represent it. Then in 1996, some people made a
proposal for Yoruba, a language spoken in Nigeria, to add the characters with
diacritics they needed, and Unicode just rejected the proposal as they could
compose those characters by combining existing parts.
Weren’t the elements they needed already in the toolbox?

Yes, the encoding parts are there, meaning it can be represented with
Unicode, but the software didn’t handle them properly, so it made more
sense to the Yoruba speakers to have it encoded in Unicode.
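
(To make the combining approach concrete, a sketch in Python; we picked ọ́, the Yoruba o with dot below and an acute tone mark, as the example.)

    import unicodedata

    # Yoruba ọ́: a base letter plus two combining marks
    seq = "o\u0323\u0301"  # o + COMBINING DOT BELOW + COMBINING ACUTE ACCENT
    nfc = unicodedata.normalize("NFC", seq)
    print([f"U+{ord(c):04X}" for c in nfc])
    # ['U+1ECD', 'U+0301']: the dot below composes into ọ, but no fully
    # precomposed "ọ with acute" exists, so the tone mark stays combining.
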

So you could type, but you’d need to type two characters of course?

Yes, the way you type things is a big problem. Because most keyboards
are based on old encodings where you have accented characters as single
characters, so when you want to do


ave all the possibilities. You might have
one which is very common, so developers end up adding it to the keyboard
layouts or whatever applications they’re using, but not when other people
have different needs.
There is a lot of documentation within Unicode, but it’s quite hard to find
what you want when you’re just starting, and it’s quite technical. Most of it
is actually in a book they publish with every new version. This book has a few
chapters that describe how Unicode works and how characters should work
together, what properties they have, and all the differences between scripts
that are relevant. They also describe special cases, trying to cater to needs that
weren’t met or proposals that were rejected. They have a few examples
in the Unicode book: in some transcription systems they have this sequence
of characters for a ligature: a t and an s with a ligature tie and then a dot above.
So the ligature tie means that t and s are pronounced together and the dot
above is err ... has a different meaning (laughs). But it has a meaning! But
because of the way characters work in Unicode, applications actually reorder
it: whatever you type in is reordered so that the ligature tie ends up being
moved after the dot. So you always have this representation because you
have the t, there should be the dot, and then there should be the


laughs). But still, most of the libraries that are
rendering fonts don’t handle it properly and then even most fonts don’t
plan for it. So even if the fonts did, the libraries wouldn’t handle it
properly. Then there are other things that Unicode does: because of that
separation between accents and characters and then the composition, you
can actually normalize how things are ordered. This sequence of characters
can be reordered into the pre-composed one with a circumflex or whatever;
you have combining marks in the normalized order. All these things have
to be handled in the libraries, in the application or in the fonts.
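
(The t–s example above can be replayed in Python with the standard unicodedata module; a sketch in which the combining classes decide the normalized order.)

    import unicodedata

    # t + COMBINING DOUBLE INVERTED BREVE (the tie) + COMBINING DOT ABOVE + s
    seq = "t\u0361\u0307s"
    print(unicodedata.combining("\u0361"))  # 234, the ligature tie
    print(unicodedata.combining("\u0307"))  # 230, the dot above

    nfd = unicodedata.normalize("NFD", seq)
    print([f"U+{ord(c):04X}" for c in nfd])
    # ['U+0074', 'U+0307', 'U+0361', 'U+0073']: the lower combining class
    # sorts first, so the dot jumps before the tie, whatever you typed.
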
The documentation of Unicode itself is not prescriptive, meaning that the
shapes of the glyphs are not set in stone. So you still have room to
have the style you want, the style your target users want. For example
if we have different glyphs: Unicode has just one shape and it’s the font
designer’s choice to have different ones. Unicode is not about glyphs, it’s
really about how information is represented, not how it’s displayed. Or you have
two characters displayed as a ligature: it is actually encoded as one character
because of previous encodings. But if it were a new case, Unicode
wouldn’t take the ligature as a single character.
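
(One such legacy ligature, sketched in Python: the fi ligature is a single character only for compatibility with old encodings, and normalization maps it back to two.)

    import unicodedata

    lig = "\uFB01"  # LATIN SMALL LIGATURE FI, kept for old encodings
    print(unicodedata.name(lig))
    print(unicodedata.normalize("NFKC", lig))  # 'fi', back to two characters
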

So all this information is really in a corner there. It’s quite rare to find fonts
that actually use this information to provide for the needs of the people who
need specific features. One


ers without them worrying about changing fonts as long as they specify the language they’re using. But you can’t use this feature for languages
which aren’t in the OpenType specifications, as OpenType has its own way of
describing languages, different from Unicode’s. It’s really frustrating because you can
find all the characters in Unicode, not organized in a practical way: you have
to look all around the tables to find the characters that may be used by one
language, and then you have to look around for how to actually use them.
It is a real lack of awareness within the font designer
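
(Which script and language systems a given font actually declares can be inspected with fontTools; a sketch, assuming fontTools is installed, with a placeholder font path.)

    from fontTools.ttLib import TTFont

    font = TTFont("SomeFont.ttf")  # placeholder: any font with a GSUB table
    gsub = font["GSUB"].table
    for rec in gsub.ScriptList.ScriptRecord:
        langs = [ls.LangSysTag for ls in rec.Script.LangSysRecord]
        print(rec.ScriptTag, langs or ["(default only)"])
    # OpenType language tags (e.g. 'YBA ' for Yoruba) are their own
    # registry, separate from the codes Unicode and ISO use.
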


ave a ... when you combine with a
circumflex, it doesn’t position well because most of the font designers still
work with the old encoding mindset where you have one character for one
accented letter. Sometimes they just think that following the Unicode
blocks is good enough. But then you have problems where, as you can see
in the Basic Latin charts at the beginning, the capital is in one block and
its lowercase in a different block. And then they just work on one block,
they just don’t do the other one because they don’t think it’s necessary;
yet both cases of the same letter are there, so it would make sense to have
both. It’s hard because there are very few connections between the Unicode
world, people working on OpenType libraries, font designers and the actual
needs of the users.

At the beginning of the presentation you went for the code points of the characters:
all your characters are subtitled by their code points. It’s kind of the beauty of
Unicode to name everything, every character.
Those names are actually quite long. One funny thing about this: Unicode
has the policy of never changing the names of the characters, so they keep an
errata where they realized oh, we shouldn’t have named this that, so here’s
the name that makes sense, and yet the real name stays wrong.
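
(A famous instance, sketched in Python: the name of U+FE18 misspells BRACKET, and the misspelling is frozen; a formal alias carries the corrected spelling.)

    import unicodedata

    print(unicodedata.name("\uFE18"))
    # PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
    # The typo is kept forever by the name stability policy.
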

Pierre refers to the fact


Xavier, can you say something about the typography of Conversations?

Huuumn yep, for the typographic decisions ... in the beginning we searched for
fancy fonts, but in a way came back to using very classic fonts, or rather one classic
font. So the Junicode 8 for the text and the OCR-A 9 for anything else. Because
we decided to focus on testing different ways of doing layout, and to use the fonts as a
way to keep a certain continuity between the parts. We thought this could be more
interesting than to show that


trying?

Yeah, I think I want to finish the hotglue2svg.sh, I mean it’s my first
Bash program, I want to raise my baby. (laughs) But right now I’m trying to
find different ways of layouts. The first one is the one with the big squares, the
big Unicode characters and all the arrows. So it’s very complicated, but it’s the
attempt to find another way to express a conversation in text.

Can you say more about that?

Because in the beginning, my first try was to keep the ‘life’ of a conversation in
the text with some things, like indentation, or with graphic things, like the choice
of the Unicode characters. To see if this can be a way to express a conversation. Because
it’s hard to do it with programming stuff, so we’re using GUI-based software.

That comes back a bit to the question of what you are doing differently if you work
with a direct visual f


Chromium, cp, curl, dpkg, egrep, Etherpad, exit,
ftp, gedit, GIMP, ghostscript, Git, GNU coreutils, grep, ImageMagick, Inkscape, Kate, man,
makeindex, meld, ne, pandoc, pdflatex, pdftk, Processing, python, read, rev, Scribus,
sed, vim, wget
Fonts: Junicode by Peter S. Baker, OCR-A by John Sauter

Source Files:
Texts, fonts and pdf: http://conversations.tools
Software: https://github.com/lafkon/conversations
Published by: Constant Verlag (Brussels, January 2015)
ISBN: 9789081145930

Copyright (C) Constant

 
