Gtilde & gtilde from Guaraní

Hi @forum, I’m trying to implement a glyph that has no Unicode value (from the Guaraní language) through localized forms (locl), but nothing happens. Here is the code I’m using.

languagesystems:

languagesystem latn GUA;

and locl:

script latn;
language GUA;
sub G [tilde tildecomb asciitilde] by Gtilde.loclGUA;
sub g [tilde tildecomb asciitilde] by gtilde.loclGUA;

Is there an error in the syntax, or am I missing something?

Thanks in advance!


Try putting the quote marks in:

sub G' [tilde tildecomb asciitilde]' by Gtilde.loclGUA;

If that doesn’t work, split the rules:

sub G tilde by Gtilde.loclGUA;
sub G tildecomb by Gtilde.loclGUA;
sub G asciitilde by Gtilde.loclGUA;

So fast! I will try it that way. Many thanks, Bendy.

It does not work, whether grouped in a list or split into separate rules. Confusingly, it does work in some cases. If anyone is willing to take a look at what is happening, I would appreciate it. Thanks in advance.

I didn’t try this but I think sub G' [tilde tildecomb asciitilde]' by Gtilde.loclGUA; will not work.

How does the “Gtilde.loclGUA” look? If it is just a “G” with a “tilde”, you don’t need the glyph at all. Just add a “top” anchor to the “G” and a “_top” anchor to the “tildecomb”. Allowing “tilde” and “asciitilde” as input might be convenient for the user but might break kerning and other things.
If it is widespread practice to use the wrong encoding, you could “fix” it with these lines in “ccmp”:

script latn;
language GUA;
sub G tilde' by tildecomb;
sub G asciitilde' by tildecomb;

Big thanks, Georg, the trick with the composition and decomposition works perfectly.
Many Guaraní users are not type designers or working in related fields, so they need to be able to type the letter in a very simple way (a keyboard combination). Anyway, we will work on moving our methods closer to ideal practice :wink:

@GeorgSeifert all three ways of making the substitution above compile fine. If the rules aren’t valid, shouldn’t there be an error message?

I wouldn’t have expected that but good to know.

Hm, I do not think it is a good idea to solve these Unicode-less glyphs with the locl feature. Locl might not always work as expected, e.g. in an English text that explains the Guaraní language.
Like Georg explained, anchors will make it work regardless of the language setting.
If you want to add the pre-composed Gtilde and gtilde to your font, so that users can find these glyphs next to G and g in the InDesign Glyphs panel (or because you don’t trust anchors ;-), you have to name them u00470303 and u00670303 (Unicodes of G/g plus combining tilde).
Then make sure the combining tilde uni0303 is in the font (required for MS Word) and then add the following lines to ccmp, without language code:

sub G tilde by u00470303;
sub G uni0303 by u00470303;
sub g tilde by u00670303;
sub g uni0303 by u00670303;

I’ve used this for Bulgarian glyphs with a grave accent, works like a charm.


The names can be handled on a different level. You can use “nice” names in Glyphs, and the proper names (as Lucas explained) are switched in on export.

There are two ways of achieving this. Either call it Gtilde and assign the proper glyph info manually (select it in Font view and press Cmd+Opt+I).


Or, call the glyph G_tildecomb. Then the info will be generated automatically.


I love that dialog!
But I could not get the appropriate info to be generated automatically from G_tildecomb; it kept being exported with that name. I will ask my Glyphs-savvy colleagues tomorrow and read more of the manual.

Even though both u00470303 and uni00470303 will work as intended, I just learned that the latter is more compatible with old Acrobat versions and is the recommended form for glyphs from the BMP, whose Unicode values have four hexadecimal digits. Unicode values with five digits must be preceded by just “u”.
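As a rough illustration of the naming rule described above, here is a minimal Python sketch. The function name `production_name` is made up for this example and is not part of Glyphs or any other tool; it only assumes the convention that BMP code points are written as “uni” plus four-digit groups and that a single code point beyond the BMP gets a “u” prefix.

```python
def production_name(code_points):
    """Build a production-style glyph name from a list of code points.

    Hypothetical helper for illustration only.
    """
    if all(cp <= 0xFFFF for cp in code_points):
        # BMP only: "uni" followed by one 4-digit uppercase hex group per code point.
        return "uni" + "".join(f"{cp:04X}" for cp in code_points)
    if len(code_points) == 1:
        # A single code point beyond the BMP: "u" plus its hex digits (5 or 6).
        return f"u{code_points[0]:X}"
    # Mixed sequences: name each component on its own and join with underscores.
    return "_".join(production_name([cp]) for cp in code_points)


print(production_name([0x0047, 0x0303]))  # uni00470303
print(production_name([0x0067, 0x0303]))  # uni00670303
print(production_name([0x1F600]))         # u1F600
```

The first two calls give the names for G and g plus the combining tilde; the third shows the “u” form for a five-digit value.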


G_tildecomb is a valid production name in the AGLFN. So it is exported as is.

OK, I thought that a name with an underscore might turn the glyph into a ligature and influence caret behavior, but it seems it doesn’t matter.

However, it does make a difference if the glyph is built with a substitution of 1) tildecomb, 2) tilde or 3) asciitilde.
When I type three G’s, each followed by one of the three tildes, with the default paragraph composer, InDesign always splits the resulting glyph into two parts, with a caret in the middle. And only 1) seems to survive the clipboard to MS Word. Doing the same with the World-Ready Composer activated: 1) turns into a single whole glyph; 2) and 3) remain split into two halves.
When inserting the precomposed glyph from the InDesign Glyphs panel while the ccmp substitution also covers 2) or 3), the result is two split halves, even when the World-Ready Composer is active.
Only when ccmp contains sub G tildecomb by G_tildecomb; and nothing else does the Glyphs panel insert one whole glyph.

In MS Word, a caret appears at the center of the glyph when G and asciitilde are typed directly from the keyboard. When the three tildes are inserted right after a G (by typing their Unicode values followed by Alt+X), both 2) tilde and 3) asciitilde result in a caret position at the right side of the glyph, as if there were a non-spacing character. Only 1) tildecomb results in a single whole glyph without any caret position, just like in InDesign with the World-Ready Composer, which in my opinion is the desired result.

Conclusion: my “works like a charm” wasn’t completely charming.
The substitution with tildecomb works best, and adding ccmp substitutions for the spacing tildes might result in unwanted and inconsistent behavior.
Input method for InDesign: the Glyphs panel; input method for MS Word: type G0303 followed by Alt+X.
And the World-Ready Composer should be the default in InDesign, but that is old news.


Great explanation Lucas, clarifies many doubts about how to implement certain aspects of some languages. Thanks!

Thanks, Georg, this method is very useful. I will try it not only with Guaraní but also with other, more complex languages where many accents are stacked. I will come back later with a couple of additional questions.

Lucas, does it make sense, then, to first substitute the spacing tildes with tildecomb, so that it does not cause problems?

Letting the user enter G~ or G˜ to show a G-tilde like G̃ is a bad idea in the long term.
First, as mentioned, a caret is inserted because these are two spacing characters; whichever order you use for the substitution won’t change this. This will happen on all platforms, whereas with the combining mark it might only happen on buggy platforms.
Furthermore, this changes the semantics and will break functionality like searching and matching.

Imagine you search in a document for a word that uses that letter, which one do you try? None of them will match the other. Or imagine you share that document with someone who doesn’t have a font that does that trick, they won’t see the correct letter g-tilde.

The best long-term solution is to provide a font that shapes G̃ correctly, as more and more fonts do nowadays, and provide practical input methods so the


Glyph substitution is only the latter part of what happens in text processing, and that part is more or less predictable (though implementations vary). But the earlier part, the itemization of text into scripts and the processing of various “special” characters, is equally important. And here, text engines vary a lot in terms of which Unicode recommendations they follow, what they do in terms of Unicode normalization, and what they do with graphemes.

What? Graphemes? :slight_smile: Yeah… Here’s a more or less human-readable discussion about it: string - What's the difference between a character, a code point, a glyph and a grapheme? - Stack Overflow
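To make the Unicode normalization point a bit more concrete, here is a small Python sketch using only the standard `unicodedata` module. The helper name `nfc_length` is made up for this example. It shows that n plus a combining tilde collapses into one precomposed character under NFC, while g plus a combining tilde stays as two code points, because Unicode has no precomposed g-tilde. That is why the font has to deal with the sequence itself.

```python
import unicodedata


def nfc_length(text):
    """Return the number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


n_tilde = "n\u0303"  # n + COMBINING TILDE
g_tilde = "g\u0303"  # g + COMBINING TILDE

print(nfc_length(n_tilde))  # 1, composes to U+00F1
print(nfc_length(g_tilde))  # 2, no precomposed character exists
```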

Now, combining mark characters (like tildecomb) are “known” by some text engines as “those that form graphemes with preceding characters”.

So if you enter G followed by a combining tilde, or optionally by more combining marks, some text engines will then treat such a sequence as a grapheme. This means that the caret moves through them as if they were one character; when you select, the base letter and the following marks are selected together; and when you delete, the base plus the marks are deleted in one go. Other text engines will or will not do some of those things…

When you enter G followed by a spacing tilde or even worse the ASCII tilde, these sequences will not be treated as one grapheme. Sometimes they won’t be treated as part of a “word”, so things like Alt+left/right for stepping through the text will not work as expected. And if the user does search, some search engines will “conveniently” skip combining marks so the text will be found even if you only enter the base letters. But the text has an ASCII tilde stuck in the middle — then no.
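The difference between the three tildes can be seen directly in the Unicode character data. The sketch below is a deliberately simplified stand-in for real grapheme segmentation (actual text engines follow more elaborate rules, and each one differently); the function name `simple_clusters` is made up for this example. It attaches every mark character to the character before it and leaves everything else alone.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the mark characters that follow it.

    Simplified illustration, not a full grapheme cluster algorithm.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M" (marks).
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


for tilde in ("\u0303", "\u02DC", "~"):
    word = "G" + tilde + "a"
    print(unicodedata.name(tilde), unicodedata.category(tilde), len(simple_clusters(word)))
```

Only the combining tilde (category Mn) ends up in the same cluster as the G; the spacing tilde (Sk) and the ASCII tilde (Sm) remain separate units, whatever the font draws.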

The font may implement substitutions for spacing and nonspacing tilde in the same way, and you may even get the same visual result — which was OK in print times, when the typeset text went to a PDF and then often to print.

But if the text is supposed to go into an electronic document where the user can search, edit, select, copy-paste — using combining marks is always better, and using those other marks will always cause some unexpected complications somewhere along the way. The composite glyphs will appear in both cases the same, but the behavior of the text will be different.


Botio — substituting tilde by tildecomb on the glyph level via GSUB won’t change anything, because the text engines do copy-paste, stepping through the text and search based mostly on characters and graphemes, and if the text has the tilde, it’ll always have the tilde. Text engines use some info for the font, for example caret positions and glyph metrics with kerning, to decide where to put the text cursor, but the decisions how text selection works, how stepping through works etc. — they’re made on the basis of the underlying Unicode text, and they’re made by each text engine differently.

The differences are fundamental, they start with very basic things like whether you need to press the right arrow or the left arrow to go to the next character in Arabic or other RTL text (different apps do it differently), and there are tons of other differences in behavior that a font cannot influence.
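The point about GSUB not changing the underlying text can be sketched with a toy model in Python. Everything here is hypothetical: `CMAP` stands in for the font’s character-to-glyph mapping, and `shape` imitates the two ccmp steps discussed in this thread (spacing tilde after G becomes tildecomb, then G plus tildecomb becomes G_tildecomb). It is not how any real shaping engine is implemented.

```python
# Hypothetical character-to-glyph mapping, standing in for the font's cmap.
CMAP = {"G": "G", "~": "asciitilde", "\u02DC": "tilde", "\u0303": "tildecomb"}


def shape(text):
    """Toy shaper: map characters to glyph names, then apply two ccmp-like rules."""
    out = []
    for ch in text:
        glyph = CMAP[ch]
        if out and out[-1] == "G" and glyph in ("tilde", "asciitilde"):
            glyph = "tildecomb"      # like: sub G tilde' by tildecomb;
        if out and out[-1] == "G" and glyph == "tildecomb":
            out[-1] = "G_tildecomb"  # like: sub G tildecomb by G_tildecomb;
        else:
            out.append(glyph)
    return out


typed_ascii = "G~"
typed_combining = "G\u0303"

print(shape(typed_ascii))              # ['G_tildecomb']
print(shape(typed_combining))          # ['G_tildecomb']
print(typed_ascii == typed_combining)  # False
```

Both inputs end up as the same glyph, but the stored strings are still different, and the stored string is what copy-paste, search and caret movement work from.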


For example, if you type a Bulgarian word with combining graves to indicate the stressed vowels, and the font produces the correct appearance, either via GPOS mark positioning or via a GSUB substitution — then the users will most likely find that word even if they search for the form without the accents. Because the search engines “know” that combining marks can be skipped.

But if you type the word using spacing marks, and the font produces the same appearance — this word will most likely not be found if the users search for the unaccented version, because to the search engine, this will be a different word.
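A minimal Python sketch of that kind of mark-insensitive search, using only the standard `unicodedata` module. The helper names `fold_marks` and `found` are made up for this example, the test strings are placeholders rather than real words, and real search libraries may fold text differently.

```python
import unicodedata


def fold_marks(text):
    """Decompose the text and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def found(needle, haystack):
    """Search that ignores combining marks on both sides."""
    return fold_marks(needle) in fold_marks(haystack)


print(found("aGa", "aG\u0303a"))  # True: the combining tilde is skipped
print(found("aGa", "aG\u02DCa"))  # False: the spacing tilde stays in the text
print(found("aGa", "aG~a"))       # False: the ASCII tilde stays in the text
```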

Of course there are many text engines and there are many search engines (libraries) that apps use, and some are smarter, while others are dumber, so it’ll never be fully consistent.

But using spacing marks as a workaround will always cause more such problems. The real problem is that manufacturers of keyboard layouts in various OSes still build these layouts like in 1991, and it’s virtually impossible for users to enter combining marks from their keyboards without resorting to complicated solutions.

So we, as the “font people” should lobby (together with “text people” who care about the correct form of electronic texts) that OS makers fix their keyboard layouts and bring them into the 21st century. Because, really, that’s where the core of the problem lies — and trying to fix it at the font end is just wrong (we can provide such hacks in fonts but they should be short-term, and we should still lobby to the OS makers and get some “aware” users to help).

Users will be happy to enter combining marks if the keyboard layouts allow them to. Unfortunately, most of the innovation in keyboard layouts these days is in various ways to enter emoji :face_with_raised_eyebrow:
