Missing Khmer submenu letters, marks, symbols

Is it a good idea to have several versions of it and localise them somehow? That only works with language settings anyway, right?

I would not try to have different versions of dottedCircle, unless you really have to (e.g. when the baseline is different between scripts). You’re right, the language settings are not reliable for this kind of thing. The insertion of dottedCircle is determined by the shaping engine and outside the control of the font in many cases. DottedCircle seems to inherit its script settings from adjacent script runs, so problems can crop up pretty easily, especially in multi-script settings.

Maybe using different anchor names for each script would solve some of the problems (bottomright.thai, bottomright.burmese)?

And all categories can have subcategories.

Oh, that’s good, I didn’t realise that.

I thought that dividing the glyphs by “GDEF” class (base and mark glyphs) would be a good first step. That would mean Vowels and Consonants are in one group. They behave similarly, as they are spacing and can receive marks, and they occupy roughly the same space.

I don’t really agree they behave similarly:

  • Their ‘dependent’ aspect means they can’t exist alone like the bases, they need to hang onto a consonant; in all senses they are different kinds of things from the consonants.

  • In many situations, they don’t receive marks. In standard Thai and Lao, none of the spacing vowel signs would operate as bases for diacritics: ◌ะ ◌า เ◌ แ◌ โ◌ ใ◌ ไ◌ ◌ະ ◌າ ເ◌ ແ◌ ໂ◌ ໄ◌ ໃ◌. As far as I know, the same rules apply to Khmer េ ែ ៃ and Burmese ေ. There may be others.

Several reasons why I’d keep the dependent spacing vowel glyphs separate from the consonant bases:

  • They are always listed alongside the diacritic versions in any primer/textbook/Wikipedia page/character list, and not among the consonants.

  • The spacing/nonspacing aspect of the vowels doesn’t really have any significance. For example in Thai ั◌ and ◌ะ are the same vowel, one is used between two consonants, and one is used in syllable-final position. Other examples are Thai เ◌็◌ and เ◌ะ, แ◌็◌ and แ◌ะ; and Lao ເ◌ັ◌ and ເ◌ະ, ◌ົ◌ and ໂ◌ະ, ◌ັອ◌ and ເ◌າະ. The fact that sometimes bits are above and sometimes next to the base doesn’t mean they’re different categories; each pair represents the same vowel.

  • EDIT: also consider the case of Burmese where -uMark and -uMark.post are alternates for the same character, not just the same vowel. These should stay together in the sort order.

  • For the same reason that we keep the consonants in alphabetic order, we should keep the vowels in their conventional arrangement as far as possible, i.e. all together.

I agree with Ben that GDEF classes aren’t helpful in organizing the characters of South-East Asian scripts. In addition to the dependent vowels he mentions, there are also the conjunct forms in Javanese and Balinese, most of which are subjoined and therefore mark glyphs, but some of which are spacing and positioned to the right of the base consonant and function as base glyphs themselves. For example, ᬲ᭄ᬢ (sta) vs. ᬲ᭄ᬧ (spa).

These conjunct forms also mean that positional terms such as “below” are not good descriptors for conjunct forms – functional terms such as “conj” would work better.

I just wanted to explain my initial thought. I’m fine with whatever you come up with.

Thanks Norbert. It’s a good point about the .below suffix, since several of these are post-base and one is pre-base in Khmer. Should we think of using .coeng for these? (that’s the Khmer word for the pre-, post- or below-base forms of consonants). @sovichet @zakkuri @RobPratley would you prefer the suffix .coeng or .sub or .conj ? (We’d have the same question for Tham script, which puts the conjunct forms in the same places as Khmer.)

I would prefer names/suffixes that work for all scripts. Maybe using the feature tag where they are normally substituted?

The subjoined consonants might not all be in the same feature. I had to put the Khmer sub-Ro in a PREF feature in the last font I made so that it would be reordered before the base in one or two environments. All the other subconsonants were in CLIG. I can’t remember why I didn’t use BLWF, and I know my code is pretty ropey, but I think it’s safe to say things often need to be divided between features to make them work across the broadest range of target environments.
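To illustrate the split described above, here is a rough Python sketch that writes out feature-file text with sub-Ro in pref and the other subconsonants in clig. The glyph names (coeng-khmer, ro-khmer.coeng and so on) and the helper names are assumptions made up for the example, not the names used in that font, and a real font would likely need more rules than this.

```python
def coeng_rule(consonant):
    """Build one ligature substitution: coeng + consonant -> subjoined form.

    The glyph names are hypothetical; adjust them to the font's own naming.
    """
    return f"sub coeng-khmer {consonant}-khmer by {consonant}-khmer.coeng;"


def feature_block(tag, rules):
    """Wrap a list of rule strings in a feature block for the given tag."""
    body = "\n".join(f"    {rule}" for rule in rules)
    return f"feature {tag} {{\n{body}\n}} {tag};"


def split_features(consonants, pre_base=("ro",)):
    """Put pre-base forms in 'pref' and everything else in 'clig'."""
    pref_rules = [coeng_rule(c) for c in consonants if c in pre_base]
    clig_rules = [coeng_rule(c) for c in consonants if c not in pre_base]
    return {
        "pref": feature_block("pref", pref_rules),
        "clig": feature_block("clig", clig_rules),
    }


if __name__ == "__main__":
    for text in split_features(["ka", "kho", "ro"]).values():
        print(text)
```

Keeping the pre-base set as a parameter means the same helper could be tried with a different division between features if a target environment needs it.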

EDIT

I checked Oriya and Gurmukhi, and it looks like there are .below and .post consonants in the ‘combining’ category. Maybe we should just give them different suffixes depending where they go.

BTW in Gujarati you need to search and replace ‘strait’ with ‘straight’. A strait is a stretch of water.

I think that this may be a better approach yes. I’ve seen the terms subjoined and postfixed used, so perhaps .post, .pre, and .sub/.below would work best, as well as giving some guidance to the expected behaviour of relevant glyphs.

Good idea. But this would mean having all anchors named according to that scheme, right? And it still doesn’t solve the problem of possible different sizes (and/or even vertical positions) of the dottedCircle. Anyway, sorry for hijacking this thread. Perhaps it’s a different construction site.

Or just append .burmese to the ones that don’t fit the same system as the others. Maybe Khmer, Thai and Lao can use the same anchor names. I think it’s good to keep the discussion in one place, as we’re talking about how to keep things consistent between scripts.

I think the suffix “.coeng” would be closer to what we call those subscripts in Khmer.

I see that in build 1105 there is a menu for Khmer, and those subscripts are called “coeng_<con>-khmer”. To me, naming them like this is clear and straightforward, but I don’t think other designers would know whether it is another letter or just the substitution.

So using the suffix .coeng or .sub is better to me. I prefer to keep one particular suffix for all subscripts.

I don’t know much about the GDEF table. With my experience in coding the OT features, I tend to create the following main classes:

  • Consonants
  • Pre-base (sub RO)
  • Post-base (sub KHO, sub CHO, sub TTHO, sub YO, sub SSO and sub SA)
  • Subconsonants (excluding pre-base and post-base)
  • Above marks
  • Below marks
  • Numbers
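As a minimal sketch of the subconsonant part of this grouping, the Python below sorts consonants into pre-base, post-base and below-base by the position of their subscript form. It only encodes the pre-base and post-base sets from the list above; the function names and the lowercase spellings are assumptions for illustration.

```python
# Position sets taken from the class list above, lowercased.
PRE_BASE = {"ro"}
POST_BASE = {"kho", "cho", "ttho", "yo", "sso", "sa"}


def coeng_position(consonant):
    """Return 'pre', 'post' or 'below' for a consonant's subscript form."""
    name = consonant.lower()
    if name in PRE_BASE:
        return "pre"
    if name in POST_BASE:
        return "post"
    # Everything not listed as pre-base or post-base sits below the base.
    return "below"


def group_by_position(consonants):
    """Group consonant names by the position of their subscript form."""
    groups = {"pre": [], "post": [], "below": []}
    for consonant in consonants:
        groups[coeng_position(consonant)].append(consonant.lower())
    return groups


if __name__ == "__main__":
    print(group_by_position(["KA", "RO", "YO", "SA"]))
```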

It felt a bit strange to see all the post-base subscripts in the “Letters” category.

I think the suffix “.coeng” would be closer to what we call those subscripts in Khmer.

I see that in build 1105 there is a menu for Khmer, and those subscripts are called “coeng_<con>-khmer”. To me, naming them like this is clear and straightforward, but I don’t think other designers would know whether it is another letter or just the substitution.

So using the suffix .coeng or .sub is better to me. I prefer to keep one particular suffix for all subscripts.

Interesting, I hadn’t thought of the ‘coeng_ka-khmer’ format. They are technically ligatures, but I don’t know whether that’s relevant for the glyph names, and I agree it’s potentially confusing. I think the priority should be to think of what’s clearest to users, and I do think that using ‘coeng’ has the benefit of being correct and clear in the Khmer language. (I think we’re trying to encourage type designers from Cambodia, as well as encouraging non-native designers to learn something of the script they’re working on.) Is it such a problem if these are called the correct things in each language, rather than being identical across scripts?

I don’t know much about the GDEF table.

I think that’s a good thing to bear in mind. I think most users shouldn’t need to be concerned with the GDEF table.

It felt a bit strange to see all the post-base subscripts in the “Letters” category.

I think ‘letters’ is a fudgy term and we should try to be explicit.

‘.coeng’ would be more consistent within Khmer, though arguably less consistent for developing a broader scheme that encompasses different scripts with similar behaviour that use different names for the equivalent character. My concern with ‘.sub’ is that 1) it does not give any reliable indication of the positioning of the pre- and post-base forms, and 2) sub is so ingrained in OT feature code that any semantic relationship to subconsonant or subjoined may be lost initially. I guess this raises the questions: how much should the nice-names rely on script research by the designer, and should naming conventions align faithfully to specific scripts/languages, or more broadly across writing systems?

EDIT: I think Ben beat me to it…

If we can really have subcategories, I’m leaning towards naming the category ‘coeng’ and then maybe having ‘Prebase’, ‘Below-base’ and ‘Post-base’ as subcategories. And then having .coeng in the glyph names.

If it’s important to have consistency between scripts, then maybe ‘conjunct’ is better than ‘subjoined’ for the category and .pre/.below/.post for the glyph names. I think ‘sub’ does kind of imply below-base, and as Rob notes, ‘sub’ is also used elsewhere. I’m not sure whether ‘conjunct’ would be clear and nice for Khmer users.
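To compare the options side by side, here is a small Python sketch that builds a glyph name for one subscript form under each scheme: a script-specific .coeng suffix, positional .pre/.below/.post suffixes, and the ligature-style ‘coeng_ka-khmer’ format. The scheme labels and the function name are hypothetical, and the position has to be supplied by the caller.

```python
POSITIONS = ("pre", "below", "post")


def coeng_glyph_name(consonant, position, scheme):
    """Name the subscript form of a Khmer consonant under one naming scheme.

    scheme is one of:
      'script'   -> script-specific suffix, e.g. ka-khmer.coeng
      'position' -> positional suffix, e.g. ka-khmer.below
      'ligature' -> ligature-style name, e.g. coeng_ka-khmer
    """
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    base = f"{consonant}-khmer"
    if scheme == "script":
        return f"{base}.coeng"
    if scheme == "position":
        return f"{base}.{position}"
    if scheme == "ligature":
        return f"coeng_{base}"
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("script", "position", "ligature"):
        print(scheme, coeng_glyph_name("ro", "pre", scheme))
```

Only the positional scheme shows in the name where the form sits; with the other two, that information would have to come from the category or subcategory.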

We have never been concerned about the term in English, so now we are starting to use one. :) I just rechecked Microsoft’s document and it calls them “conjuncts.”

EDIT: I think “conjuncts” sounds good to me.

Just a small question: What does ‘coeng’ mean?

It literally means “leg.” That’s what we call them in Khmer.

Problem with ‘conjunct’ is it might be used in other scripts to mean the whole conjunct, rather than the attaching part. In language terms, the conjunct is the whole stack, no?

It looks like I can’t add much more on this for now. So we are choosing between ‘conjunct’ and ‘subjoined’. I checked Myanmar and there is a menu for ‘Subjoined.’ What do you think? Do the glyphs there behave the same way as in Khmer?

They’re all below the base in Burmese. I personally think there’s a benefit in giving different names between the scripts, so designers know things don’t behave the same way universally.