The use of Unicode data in font creation
I've worn many hats over the years…one of my favorites is FontForge contributor. Back on August 13th, I made it so when you close the Execute Script dialog, you won't lose the associated script.
One reason for that is I frequently write useful scripts which I accidentally lose when closing FontForge. Today I'll present one.
I'm making a typewriter font…have been since May! One of the things that makes this font different is that when two same letters are nearby they do not look identical, unlike almost every font in existence…minus another font of mine, Quaerite Regnum Dei. 😉
What I'm going for with this project is maximum realism. I've studied output of real typewriters a little too much, and tinkered with GIMP scripts a little way too much, to get acceptable procedurally generated output. So, the precomposed characters in this font, according to my vision, should look as if they were made on a typewriter—type the base letter, backspace, type the mark. Meaning, marks, within reason, must not move from letter to letter; ñ and ã must have the mark in the same place if overlaid on top of one another, same with Ñ and Ã, however I'm willing to imagine a typewriter with two versions of each mark, one for uppercase and lowercase letters.
So unlike basically all fonts where what you want to do is place anchors carefully on each letter, in this font we want uniformity. That means we can use the FontForge Python API to place our marks. Another anomaly is that we really only need one mark
(Mark to Base Positioning) class, let's call it themarkclass
, even though this font has some top marks, bottom marks, even the Hebrew dagesh middle mark, because it's monospaced.
Here's a script to figure out which characters should get marks:
import unicodedata
f=fontforge.fonts()[0]
for g in f.glyphs("encoding"):
e=chr(0 if g.unicode == -1 else g.unicode) # Encoding...
c=unicodedata.category(e) # Unicode "category", looks like e.g. "Lu"
d=unicodedata.decomposition(e) # Unicode "decomposition", for Ñ would be "004E 0303"
il=('L' in c and ('u' in c or 'l' in c) and not d) # Is "e" a base letter?
if il: print(g.glyphname, e, "L")
We get some interesting output like:
...
grk_ALPHA Α L
grk_BETA Β L
grk_GAMMA Γ L
grk_DELTA Δ L
grk_EPSILON Ε L
grk_ZETA Ζ L
grk_ETA Η L
grk_THETA Θ L
grk_IOTA Ι L
grk_KAPPA Κ L
grk_LAMDA Λ L
grk_MU Μ L
grk_NU Ν L
grk_XI Ξ L
grk_OMICRON Ο L
grk_PI Π L
grk_RHO Ρ L
grk_SIGMA Σ L
grk_TAU Τ L
grk_UPSILON Υ L
grk_PHI Φ L
grk_CHI Χ L
grk_PSI Ψ L
grk_OMEGA Ω L
...
All right! And for our marks we can just say:
im = 'M' in c
if im: print(g.glyphname, e, "M")
Now we've got as well:
...
uni05B0 ְ M
uni05B1 ֱ M
uni05B2 ֲ M
uni05B3 ֳ M
uni05B4 ִ M
uni05B5 ֵ M
uni05B6 ֶ M
uni05B7 ַ M
uni05B8 ָ M
uni05B9 ֹ M
uni05BA ֺ M
uni05BB ֻ M
dagesh ּ M
uni05BF ֿ M
shin_dot ׁ M
sin_dot ׂ M
...
All that's left to do is use the FontForge API to add the marks...
if il: g.addAnchorPoint("theclass","base",0,(0 if 'u' in c else -114))
if im: g.addAnchorPoint("theclass","mark",0,0)
And let's also add the needed tables!
f.addLookup("mark1","gpos_mark2base",(),(("mark",(("latn",("dflt")),("cyrl",("dflt")),("grek",("dflt")))),) )
f.addLookupSubtable("mark1", "mark1-1")
f.addAnchorClass("mark1-1", "theclass")
Voilà!
Now we can tell FontForge we want it to generate Latin-1 for us just by going to the Element→Build menu; as long as we have the letters needed, and the marks needed (it's best for Latin-1 to add 0-width marks to the block beginning at U+0300), it will build them. Be aware, if that's not true, it will do its best, and its best is often...terrible.
Can you tell which one FontForge generated?