The two ligatures
lam_alefWasla-ar.fina, though present in the Arabic -> Basic list, don’t have entries in GlyphsData. As a result, Make Composite Glyphs builds nonessential components for them, which is rather odd since all other lam-alef ligatures in that list work fine.
You are right. I’ll add them. What would you prefer as production name:
or any other combination. I tend to use the non-presentation-form Unicodes if possible.
I’d prefer the
In general I think using presentation forms for Arabic positional variants is bad practice and shouldn’t be enabled by default. Very few applications need this (and these don’t support OpenType at all, so users get a very degraded experience anyway). It also breaks text extraction from PDFs that use the cmap table or glyph names to populate the PDF’s /ToUnicode mapping, as one would end up with presentation forms in the extracted text rather than the original characters (some applications will normalize this away, but since it is NFKC/NFKD normalization, not all applications will do it).
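To illustrate the normalization point, here is a minimal Python sketch using only the standard `unicodedata` module. The sample code point U+FEFB (the isolated lam-alef presentation form) is chosen here for illustration and is not taken from the thread; the point is that canonical normalization (NFC) leaves a presentation form untouched, and only compatibility normalization (NFKC) maps it back to the original letters.

```python
import unicodedata

# U+FEFB is ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM, a presentation form
# that could end up in text extracted from a PDF (illustrative sample only).
extracted = "\uFEFB"

# Canonical normalization does not touch presentation forms.
nfc = unicodedata.normalize("NFC", extracted)

# Compatibility normalization maps the ligature back to lam + alef.
nfkc = unicodedata.normalize("NFKC", extracted)

print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfkc])
```

An application that only applies NFC (or no normalization at all) before searching or comparing text would therefore not match the extracted string against text typed with U+0644 U+0627.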
So should I remove all presentation Unicodes? I would very much like this.
Yes, that would be my preference. If some people actually have use for them (I highly doubt it), they can add the presentation forms Unicode values manually.
That is quite a bit of work to untangle. Some chars in that block don’t have an obvious means to access them (FBC1, FD3E, FDFC). And a lot of production names need to be updated.
This is a diff for the changes (mostly done by some scripting).
0001-don-t-use-Arabic-presentation-form-unicodes.patch.zip (40.5 KB)
I found some issues that I need to solve tomorrow (
uighurkirghizyehHamzaabove_alefMaksura-ar, There is no
Looks fine, some things I noticed:
alef-ar.fina.short still have the production name
alefFathatan-ar.fina should remain legacy encoded, as they are not of much use in modern fonts (they come from metal type, when it was easier to have one sort for this very common combination; it is basically a ligature of alef and fathatan, not a single letter).
alefMaksuraAlefabove-ar.fina are also legacy ligatures of a base letter and a mark, so they should remain legacy encoded (no idea why these got into Unicode at all, can’t imagine what legacy use they were for!).
kasratan-ar.isol, all are legacy positional variants for vowel marks (for systems that didn’t have a way to place marks over glyphs); they should remain legacy encoded as well.
Thanks a lot.
Why is the last group different from all the other positional forms? So why should
fathatan-ar.isol have a presentation unicode and not
And do you have a suggestion for
Combining marks don’t have positional forms. It was a hack in some old systems that couldn’t position the marks, so they were placed over a space (.isol form) or over a tatweel (.medi form), so instead of:
it would be:
But since combining marks have no positional forms, OpenType shaping engines will not apply medi lookups on them. So if one really wants to emulate this behavior in OpenType, it will need contextual substitutions, and legacy systems (I have never seen any of this) will need the legacy code points anyway.
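The "mark over a space" and "mark over a tatweel" hack described above is visible in the Unicode data itself. The following is a small Python sketch, assuming that fathatan-ar.isol and fathatan-ar.medi correspond to U+FE70 and U+FE71 (that mapping is an assumption made here, not something stated in the thread). The helper name `legacy_mark_base` is hypothetical.

```python
import unicodedata

def legacy_mark_base(codepoint):
    """Return (tag, base, mark) from the compatibility decomposition
    of a legacy mark presentation form, e.g. '<isolated> 0020 064B'."""
    tag, base, mark = unicodedata.decomposition(chr(codepoint)).split()
    return tag, unicodedata.name(chr(int(base, 16))), unicodedata.name(chr(int(mark, 16)))

# Assumed code points for the fathatan presentation forms (illustrative).
for cp in (0xFE70, 0xFE71):
    print(f"U+{cp:04X}", legacy_mark_base(cp))
```

The isolated form decomposes to a space plus the combining mark, and the medial form to a tatweel plus the combining mark, which matches the explanation that these were never true positional forms of the mark.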
It decomposes to yeh-ar hamzaabove-ar alefMaksura-ar in Unicode, so it is basically the same as yehHamzaabove_alefMaksura-ar, so it has no use other than the legacy code point (there is no uighurkirghizyehHamzaabove-ar in Unicode AFAICT, probably because it is the same as
But those will need the legacy Unicode for everything else, too? So we still don’t need them?
Indeed, you probably can drop them completely without any loss.
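The decomposition argument made earlier in the thread can be checked with a short Python sketch. It assumes that the two glyphs under discussion correspond to the Unicode characters looked up by name below (the isolated presentation forms); that correspondence is an assumption made for illustration, and the variable names are hypothetical.

```python
import unicodedata

# Look the characters up by their Unicode names rather than hard-coding
# code points (assumed to match the glyphs discussed above).
uighur = unicodedata.lookup(
    "ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM"
)
plain = unicodedata.lookup(
    "ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM"
)

# Full compatibility decomposition of each ligature.
uighur_nfkd = unicodedata.normalize("NFKD", uighur)
plain_nfkd = unicodedata.normalize("NFKD", plain)

for label, text in (("uighur", uighur_nfkd), ("plain", plain_nfkd)):
    print(label, [unicodedata.name(c) for c in text])
```

Both ligatures decompose to the same sequence of yeh, hamza above, and alef maksura, so outside their legacy code points they are indistinguishable in normalized text.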