Missing Khmer submenu letters, marks, symbols

ksor930 · August 22, 2017, 2:57am

In the left-side panel, under the Southeast Asia language categories, Khmer is missing submenus, while Thai, Myanmar and Lao have submenus for those glyphs. Is this by design, or is it just my computer?

mekkablue · August 22, 2017, 5:59am

For something like this, we need user input. Do you have a suggestion for submenus that make sense? What would you expect there?

ksor930 · August 26, 2017, 6:53am

I think implementing a submenu for Khmer based on the Thai submenu would make the most sense, with a few additions: Letters, Marks, Numerals, Symbols (for lunar dates), Punctuation, Others.

Since writing Khmer follows the same rules of consonant and vowel placement as Thai (almost identical, with one key difference), basing the submenu on what is present for Thai makes total sense to type designers.

The Khmer script is divided into these subcategories: Letters, Marks, Numerals and Others. In the Glyphs app, Letters in Khmer are single-character consonants and independent vowels. Independent vowels are just separate graphical representations of vowels that are rare, but commonly used in old words, Pali, Sanskrit, or special significant technical terms. All other vowels in Khmer, called Marks in the Glyphs app, need to attach to a consonant. These vowels are called dependent vowels for this reason. They cannot be written or read without a base consonant.

Separating the Marks from the Letters makes sense in every context. Khmer does use combining marks that behave just like Latin diacritics; they function in nearly the same way as vowels, except that they always appear above the base consonant (which I think is universal in all Latin-based scripts). Vowels in Khmer can be placed above, below, to the right or to the left of one base consonant, or in a combination of these positions.

Numerals - Khmer has its own numeral digits 0-9, plus special divination lore numerals 0-9, each with its own specific glyph representation.
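For reference, the two numeral sets mentioned here sit in separate Unicode ranges, which would make them easy to split into their own sidebar groups. A minimal Python sketch, assuming the standard digits are U+17E0 to U+17E9 and the divination lore numerals are U+17F0 to U+17F9 (as in the Unicode Khmer block); the group names and the helper `numeral_group` are hypothetical and not something Glyphs defines:

```python
# Code point ranges taken from the Unicode Khmer block.
# Group names are illustrative only, not Glyphs sidebar names.
KHMER_NUMERAL_GROUPS = {
    "Numerals": range(0x17E0, 0x17EA),             # KHMER DIGIT ZERO..NINE
    "Divination numerals": range(0x17F0, 0x17FA),  # KHMER SYMBOL LEK ATTAK ...
}

def numeral_group(ch):
    """Return the group name for a Khmer numeral character, or None."""
    for name, codepoints in KHMER_NUMERAL_GROUPS.items():
        if ord(ch) in codepoints:
            return name
    return None

# Print each group with its characters for a quick visual check.
for name, codepoints in KHMER_NUMERAL_GROUPS.items():
    print(name, " ".join(chr(cp) for cp in codepoints))
```

Keeping the two ranges in separate groups leaves it open whether the divination numerals end up under Numerals or somewhere else.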

Others - I’d keep these for punctuation marks.

RobPratley · August 26, 2017, 9:36am

Personally, I would probably want to add a bit more complexity into the subdivisions to help signify the separation between Khmer letters and the glyphs required for proper orthographic representation. Perhaps putting the blwf, pref, pstf etc. sub-consonants into a second list, similar to how half forms and conjuncts are treated in some of the Indic scripts, would go some way toward satisfying that.

mekkablue · August 26, 2017, 4:20pm

Can you create these lists? Here is how:
https://glyphsapp.com/tutorials/custom-sidebar-entries-in-font-view

RobPratley · August 27, 2017, 11:55am

Sure. Where’s best to send it after: support @ ?

mekkablue · August 27, 2017, 8:53pm

Yes.

sovichet · January 2, 2018, 3:58pm

Hello @RobPratley! It’s interesting to see that you are working on the Khmer list. Could you share the progress? I’m glad to help with the Khmer script.

GeorgSeifert · January 2, 2018, 7:50pm

I just saw that I added an entry for Khmer but didn’t add the subgroups. Help is very welcome. Please suggest groups of glyphs.

mekkablue · January 2, 2018, 10:08pm

@Bendy did we discuss Khmer too the last time? I don’t remember.

Bendy · January 2, 2018, 10:25pm

I think we got sidetracked into a glyph naming discussion: ‘iMark’ (as a combining mark) vs ‘iSign’ (as a spacing pre-base or post-base character [dependent vowel letter]) vs ‘I’ (as independent vowel) … these should ideally be consistent across all the SE Asian scripts, and potentially in the Indian scripts too.

I think anyway we are waiting for @RobPratley to have some time, and Sovichet, Zachary and I are happy to review.

GeorgSeifert · January 3, 2018, 11:00am

I added a first batch of stuff to the sidebar. It is loosely based on the Noto Khmer but doesn’t include alternates and positional forms.

RobPratley · January 3, 2018, 6:25pm

Sorry for the delay: I’ve got some free time at the moment, so I will pick this back up and put something on GitHub, ideally tomorrow, for review/modifications. @Bendy, @zakkuri, @sovichet: I’ll put a link here when it’s up.

RobPratley · January 5, 2018, 4:41pm

A bit late, but here’s some work so far.

Laptop is in for repair so apologies for the readme - it was done from my phone. Nothing is ready to be implemented yet, but hopefully it’s a starting point for discussion/improvements. Everything should be self explanatory (I hope!)

Bendy · January 10, 2018, 1:03pm

Thanks Rob, this is looking like a very good start!

Some questions for general discussion:

I tend to use the same kind of glyph groups for all the Southeast Asian scripts, and I think it would be good to organise them all along similar lines. I tend to order as follows:

Consonants
I don’t call these ‘letters’, since to my mind some other things (see below) are also letters, and ‘consonants’ is more definite and less ambiguous.
Medials and subjoined (medials only for Burmese)
I’ve heard ‘sub’ used for the Khmer below-base consonants and ‘subjoined’ is the term in Burmese.
Lao signLo and nyoVowel should also really go in this class, since they derive from below/post-base consonant forms, despite the Unicode names.
Some Khmer designs (not the Noto Sans here) will also need sub2 versions of the below-base subconsonants which would be smaller versions of the normal ones, to be used when there’s more than one mark below the baseline.
Vowels and marks
This is the most tricky thing to figure out a system for, as there are a number of different things in here.
First we have two kinds of dependent vowels: spacing letters (see the difficulty with ‘letters’, we did discuss calling these ‘signs’ but I don’t know what other options we have) in pre-base or post-base positions, and then nonspacing diacritics (to my mind these are also ‘letters’) which go above or below, and are often used in combination with the spacing dependent vowel letters. Indeed some of these glyphs are made of spacing and nonspacing components (e.g. saraAm in Thai/Lao). Keeping vowels and marks in the same group seems a good idea to me, as these are all dependent (needing to have a base) and need to be treated similarly at the engineering stage (being ignored or included in cluster shaping).
Then we see Burmese and Khmer have independent vowels, and I’d keep these separate from the alphabetic letters, since nobody would list them as part of the alphabet [which would just list the consonant repertoire]. These can occasionally take diacritic marks, so perhaps they belong in a group of their own rather than bundled in here.
Conjuncts and ligatures
These are basically any precomposed things, either made from components or drawn afresh if the consonant or ligature is a different form from the components. I consider conjuncts to be pure consonant-consonant combinations, and ligatures to be anything else, so they could be consonant-consonant-vowel, consonant-mark, numeral-mark, punctuation-mark, or anything else. Note in the current build of Glyphs, Lao ໜ and ໝ have the property ‘ligature’ which prevents the mark feature from including them (it only builds combinations from ‘letter’ category bases).
Though the Noto Khmer fonts do not contain ligatures for every consonant with -aa and -aq, any design with the dipped centre in the letters’ hair will need a ligature glyph in the font (this is actually most Khmer fonts). Khmer ligatures should also include the post-base sub consonants combined with -aa and -aq.
Numerals
I’d probably include the Khmer divination and lunar numerals in here, rather than in the ‘other’ category. In previous versions of Glyphs the numerals needed to be classed as ‘letter’ for the mark feature to include them as bases. I think this is fixed now, but I haven’t tested everything recently.
Punctuation and symbols
Straightforward, but again please make sure these are compiled into the mark feature if there are anchors defined.
Minority
Another tricky section. For Lao it’s easy as there are only two characters, khmuGo and khmuNyo. But if we go to Myanmar minority languages, we double the existing glyph set, with many new glyphs in each of the above categories. Bundling these all together in ‘minority’ means a) minority consonants are now in a different place from standard Burmese consonants, and the same for every other class, and b) minority consonants and minority vowels/marks and minority everything else are all in one place. I think there needs to be a better mechanism for switching between a standard/basic implementation of a script and a comprehensive implementation of all the languages.
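The conjunct/ligature distinction described under ‘Conjuncts and ligatures’ above can be sketched as a simple rule on glyph names. This is only a rough illustration in Python, assuming underscore-joined component names in the style of ka_aasign-khmer; the function `classify_precomposed`, the consonant set passed to it, and the example name ka_kha-khmer are hypothetical:

```python
def classify_precomposed(glyph_name, consonants):
    """Classify an underscore-joined glyph name as 'conjunct' or 'ligature'.

    A conjunct is a pure consonant-consonant combination; anything else
    that is precomposed counts as a ligature. Returns None for names
    with a single component.
    """
    base = glyph_name.split(".")[0]                        # drop suffixes such as .alt
    parts = [p.split("-")[0] for p in base.split("_")]     # drop script suffix, e.g. -khmer
    if len(parts) < 2:
        return None
    return "conjunct" if all(p in consonants for p in parts) else "ligature"

# Hypothetical consonant names, for illustration only.
consonants = {"ka", "kha", "ba"}
print(classify_precomposed("ka_aasign-khmer", consonants))  # consonant + vowel sign
print(classify_precomposed("ka_kha-khmer", consonants))     # consonant + consonant
```

The consonant list is passed in rather than hard-coded, since which names count as consonants depends on the script and on the naming scheme in use.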

EDIT: We should also ensure that all the control characters are somehow included for all complex scripts:
zerowidthspace
wordjoiner
graphemejoinercomb
nbspace
zerowidthjoiner
zerowidthnonjoiner
dottedCircle
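A quick way to check a font against this list could look like the following Python sketch. The glyph names are the ones listed above; the code points are my assumption, matched from the Unicode character names (ZERO WIDTH SPACE, WORD JOINER, COMBINING GRAPHEME JOINER, NO-BREAK SPACE, ZERO WIDTH JOINER, ZERO WIDTH NON-JOINER, DOTTED CIRCLE), and `missing_controls` is a hypothetical helper:

```python
# Glyph names from the list above, mapped to assumed Unicode code points.
REQUIRED_CONTROLS = {
    "zerowidthspace": 0x200B,
    "wordjoiner": 0x2060,
    "graphemejoinercomb": 0x034F,
    "nbspace": 0x00A0,
    "zerowidthjoiner": 0x200D,
    "zerowidthnonjoiner": 0x200C,
    "dottedCircle": 0x25CC,
}

def missing_controls(encoded_codepoints):
    """Return the names of required control glyphs that are not encoded."""
    present = set(encoded_codepoints)
    return [name for name, cp in REQUIRED_CONTROLS.items() if cp not in present]

# Example: a font that only encodes the space-like characters and dottedCircle.
print(missing_controls([0x200B, 0x00A0, 0x25CC]))
```

The input is just a collection of encoded code points, so it does not depend on any particular font library.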

I can’t open Robert’s GlyphData.xml in EditGlyphData.

EDIT: I tend to keep ru-thai and lu-thai after roReua and loLing; that is, in the consonant group. They’re kind of weird things that don’t fit into any other category. They’re normally grouped with vowel signs in textbooks as they’re vocalic consonants, and can work like a dependent vowel following a syllable-initial consonant (as in อังกฤษ). But they don’t need a base consonant (they can appear at the start of a word, as in ฤดู), and they don’t take diacritics (behaving like independent vowels). Unicode puts them in with the consonants, so that’s where I tend to leave them. They should not have .short alternates since nothing will ever go underneath them. Burmese and Khmer have different signs for the independent and dependent versions, so there’s not the same question of where to categorise them. The Lao characters are not yet encoded, and attestations are scarce, but they’d likely be encoded and behave like Thai.

RobPratley · January 10, 2018, 2:24pm

Hi Ben,

Thanks for your comments - You’ve much more experience in these matters than I at the moment.

I did initially separate Consonants from Vowels, but put them together at the moment following the conventions currently in place in Glyphs (for better or worse…). In agreement, splitting these would be desirable akin to how Indic scripts are currently covered. Whether this includes categorical separation of Independent and dependent vowels is perhaps another discussion.

I agree regarding the ligated forms of -aa and -aq for designs where the hair has a form that requires them, though the fact that some typefaces use less dynamic approaches to this particular element means that it is not a necessity for a working implementation. I’d argue for omitting these from any sidebar categories for this reason, so that people are not required to include glyphs that satisfy design rather than strictly ‘functional’ requirements (compared to ba_aasign etc., which do require a new construction and should be prescribed). So long as the necessary feature is built in the presence of ka_aasign-khmer etc., then this should not present any headaches.

Good note on numerals: these were separated based on my admitted ignorance at this stage of their utility in contemporary Khmer typesetting, and therefore wanting to separate what might be considered necessary vs. good to have in a serviceable font. I think we’re both of the mind that the more support a font includes, the better, though as these entries may be used for determining a basic character set, some distinction may be good in the same way that ‘minority’ is used in Burmese. (Though, as mentioned, it may be my ignorance of usage that should mean that divination/lunar characters are indeed a basic requisite.)

re: the GlyphData file - this is more a test to try and build it at the moment so won’t work, my apologies. I’ll fix that up at some point in the next couple of days. I’ll also update the files a bit taking into account some of the comments discussed above. Perhaps it may be useful to arrange a chat sometime soon to flesh out some of this.

Bendy · January 10, 2018, 2:58pm

The Khmer magic numerals raise a useful question and I can see why it’d be nice to separate them somehow from the standard numerals. The presence of these in the Unicode charts, like many of the encoded Burmese characters, leads people to believe they are part of a standard implementation. What I gather is that the numerals for Khmer divination lore and lunar calendar would be used only in the unlikeliest of situations, when transcribing astrological charts, or perhaps for specialist academic study. I would not generally recommend including these in a general-purpose font, unless specifically asked.

So, do we want to prescribe these as part of the basic glyph set? Or perhaps we should move them to a category like ‘Specialist’ (I think ‘Other’ is a bit vague to be helpful). Are we aiming to provide a framework for people to create general-purpose fonts, or to provide something that caters for any circumstance, no matter how exceptional? This also relates to my question above about switching between standard language coverage and providing for all the minority languages in Burma. I don’t have any good answers to this, or any particular preference, but I think it needs to be managed smoothly in Glyphs. Let’s hope some others can chime in.

Mark · January 10, 2018, 3:48pm

Thanks everyone for making up your minds about that!

Speaking of the dottedCircle: does anyone have a recommendation for how to properly use it in multi-script fonts? It turned out to be a real pain concerning its size and anchors once you need it for Thai, Lao, Khmer, Burmese, Devanagari, …
Is it a good approach to have many versions of it and localise them somehow? That only works with language settings anyway, right?

GeorgSeifert · January 10, 2018, 3:48pm

I thought that dividing the glyphs by “GDEF” class (base and mark glyphs) would be a good first step. That would mean Vowels and Consonants are in one group. They behave similarly, as they are spacing and can receive marks, and they occupy roughly the same space.
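As a rough illustration of that split, here is a Python sketch that uses the Unicode general category as a stand-in for the GDEF class. This is an assumption for demonstration only: real GDEF classes are set per glyph in the font, and the helper names `gdef_like_class` and `split_by_class` are hypothetical. Only nonspacing marks (category Mn) are treated as marks, so spacing vowel signs (category Mc) land in the same group as the consonants:

```python
import unicodedata

def gdef_like_class(ch):
    """Approximate a base/mark split from the Unicode general category.

    Nonspacing marks (Mn) count as 'mark'; everything else, including
    spacing vowel signs (Mc), counts as 'base'.
    """
    return "mark" if unicodedata.category(ch) == "Mn" else "base"

def split_by_class(chars):
    """Group characters into 'base' and 'mark' lists, keeping input order."""
    groups = {"base": [], "mark": []}
    for ch in chars:
        groups[gdef_like_class(ch)].append(ch)
    return groups

# KHMER LETTER KA, VOWEL SIGN AA (spacing), VOWEL SIGN I and NIKAHIT (nonspacing)
sample = "\u1780\u17b6\u17b7\u17c6"
for cls, chars in split_by_class(sample).items():
    print(cls, [f"U+{ord(c):04X}" for c in chars])
```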

GeorgSeifert · January 10, 2018, 3:50pm

And all categories can have subcategories. So we can define Consonants > Basic and Consonants > Historic…