Language system tags with 4 characters

OpenType Language System tags with 4 characters, like Shona ‘SNA0’ and Toma ‘TOD0’, seem to be mishandled. See the 12 or so language system tags ending with 0 in Language system tags (OpenType 1.9) - Typography | Microsoft Learn.

When a font source file has a glyph with a suffix like .loclTOD0, it isn’t handled like glyphs with 3-character language system tags, and no substitution is added to the automatic locl code.

Additionally, the LanguageSystems prefix code isn’t updated correctly: having a language TOD0; before a lookup adds an automatic languagesystem TOD; instead of the correct languagesystem TOD0;.
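To make the expected behaviour concrete, here is a minimal Python sketch of reading the tag from a .locl suffix without cutting it to three characters. The function names are hypothetical and this is not the app's actual code; the script tag is passed in by hand here.

```python
def locl_tag_suffix(glyph_name):
    """Return the text after 'locl' in the first '.locl...' suffix, or None.

    Hypothetical helper: keeps the whole suffix, so a 4-character
    tag such as TOD0 is not truncated to TOD.
    """
    for part in glyph_name.split(".")[1:]:
        if part.startswith("locl") and len(part) > 4:
            return part[4:]
    return None


def languagesystem_statement(script, lang_tag):
    """Build a languagesystem statement in feature-file syntax."""
    return f"languagesystem {script} {lang_tag};"


tag = locl_tag_suffix("Bhook.loclTOD0")
print(languagesystem_statement("latn", tag))
```

With Bhook.loclTOD0 this gives languagesystem latn TOD0; rather than the truncated languagesystem latn TOD;.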

I also have a related question: How does one use the automatic locl feature code with glyphs that should be applied to multiple locales/language systems?

For example, Bhook.loclTOD0 should be used in Toma TOD0, Kpelle XPE and others, but the automatic code only uses it in language TOD0 because of the name. It would need to be in language TOD0, language XPE and others. Is there a syntax for multiple language systems?

I think we assumed language tags to always be three letters. Have to fix this.

One feature will be difficult to keep working if the three-letter assumption doesn’t hold. Right now, you can just add multiple lang tags together. But discerning them is much more difficult if you can’t rely on the char count. I’ll see what I can do.
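One possible way to tell concatenated tags apart without relying on the character count is to check against a list of known 4-character tags first. This is only a sketch under that assumption: the function name is hypothetical, and the set below holds just the two tags named in this thread, not the full registry.

```python
# Illustrative and incomplete: only the two 4-character tags named in this thread.
FOUR_CHAR_TAGS = {"SNA0", "TOD0"}


def split_lang_tags(joined):
    """Split concatenated language tags, e.g. 'TOD0XPE', into a list.

    A known 4-character tag is tried first; otherwise 3 characters are taken.
    """
    tags = []
    i = 0
    while i < len(joined):
        candidate = joined[i:i + 4]
        if candidate in FOUR_CHAR_TAGS:
            tags.append(candidate)
            i += 4
        elif len(joined) - i >= 3:
            tags.append(joined[i:i + 3])
            i += 3
        else:
            raise ValueError(f"leftover characters in {joined!r}: {joined[i:]!r}")
    return tags


print(split_lang_tags("TOD0XPE"))
```

The order of the tags does not matter for this approach, so XPETOD0 splits into the same two tags. A real implementation would need the complete list of 4-character tags from the OpenType registry.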

And why is there no script(s) associated with each language? It needs to be in some script? Or just use dflt?

I don’t know why I didn’t think of trying to add the lang tags together; that makes sense.
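For reference, this is roughly the feature code one would hope to get for a glyph shared by two language systems, written as a small Python sketch that prints the lines. The glyph name Bhook.loclTOD0XPE assumes that concatenated tags work with 4-character tags too, which is not the case yet, and the function name is hypothetical.

```python
def locl_rules(script, lang_tags, base, variant):
    """Emit the body of a locl feature that applies one variant in several languages."""
    lines = [f"script {script};"]
    for tag in lang_tags:
        lines.append(f"language {tag};")
        lines.append(f"sub {base} by {variant};")
    return "\n".join(lines)


# Assumed glyph name: tags for Toma (TOD0) and Kpelle (XPE) joined together.
print(locl_rules("latn", ["TOD0", "XPE"], "Bhook", "Bhook.loclTOD0XPE"))
```

The same substitution is simply repeated under each language statement within the one script.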

That’s my mistake, the code generated is indeed languagesystem dflt TOD;.

So when there is no obvious script associated with a language, then just use dflt? Shouldn’t there be a script for each language?

Arghh. Sorry for getting mixed up. The generated code uses the correct script unless the locl glyph variants are for Common characters like combining marks.

So, for example, having Bhook.loclTOD0 will generate languagesystem latn TOD;, which is the main issue with the 4-character codes.

The dflt issue, which was a Freudian slip, still occurs, for example with caroncomb.loclLAZ, which will generate languagesystem dflt LAZ;. That may not do anything useful, depending on the shaper.

My question was: how do I know that TOD0 is a Latin-based language? I didn’t see a mapping for this on the OpenType page.

That has to be inferred from the script of the glyphs. Bhook.loclTOD0 is a variant of Bhook which is Latin.
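As a rough illustration of inferring the script from the base glyph, here is a Python sketch that guesses from the first word of the Unicode character name. That is a crude stand-in for the real Unicode Script property, and the function name and the small mapping are made up for this example.

```python
import unicodedata

# Hypothetical, partial mapping from the first word of a character name
# to an OpenType script tag.
SCRIPT_BY_NAME_PREFIX = {"LATIN": "latn", "CYRILLIC": "cyrl", "GREEK": "grek"}


def guess_script(char):
    """Return a script tag guessed from the character name, or None if unknown."""
    name = unicodedata.name(char, "")
    first_word = name.split(" ", 1)[0]
    return SCRIPT_BY_NAME_PREFIX.get(first_word)


print(guess_script("\u0181"))  # U+0181, the character behind Bhook
print(guess_script("\u030C"))  # U+030C, the character behind caroncomb
```

Bhook resolves to latn, while the combining caron has no script of its own in this sketch and returns None, which is the case where the generated code falls back to dflt.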

Right. The code had a fallback that looks up the script from the lang tag if the script was not set. I’ll ignore it :wink: