Is it a good idea to have several versions of it and localise them somehow? That would only work with language settings anyway, right?
I would not try to have different versions of dottedCircle, unless you really have to (e.g. when the baseline is different between scripts). You’re right, the language settings are not reliable for this kind of thing. The insertion of dottedCircle is determined by the shaping engine and outside the control of the font in many cases. DottedCircle seems to inherit its script settings from adjacent script runs, so problems can crop up pretty easily, especially in multi-script settings.
Maybe using different anchor names for each script would solve some of the problems (bottomright.thai, bottomright.burmese)?
And all categories can have subcategories.
Oh, that’s good, I didn’t realise that.
I thought that dividing the glyphs by “GDEF” class (base and mark glyphs) would be a good first step. That would mean vowels and consonants are in one group. They behave similarly, as they are spacing and can receive marks, and they occupy roughly the same space.
I don’t really agree they behave similarly:
Their ‘dependent’ aspect means they can’t exist alone like the bases, they need to hang onto a consonant; in all senses they are different kinds of things from the consonants.
In many situations, they don’t receive marks. In standard Thai and Lao, none of the spacing vowel signs would operate as bases for diacritics: ◌ะ ◌า เ◌ แ◌ โ◌ ใ◌ ไ◌ ◌ະ ◌າ ເ◌ ແ◌ ໂ◌ ໄ◌ ໃ◌. As far as I know, the same rules apply to Khmer េ ែ ៃ and Burmese ေ. There may be others.
Several reasons why I’d keep the dependent spacing vowel glyphs separate from the consonant bases:
They are always listed alongside the diacritic versions in any primer/textbook/Wikipedia page/character list, and not among the consonants.
The spacing/nonspacing aspect of the vowels doesn’t really have any significance. For example, in Thai ั◌ and ◌ะ are the same vowel: one is used between two consonants, and the other in syllable-final position. Other examples are Thai เ◌็◌ and เ◌ะ, แ◌็◌ and แ◌ะ; and Lao ເ◌ັ◌ and ເ◌ະ, ◌ົ◌ and ໂ◌ະ, ◌ັອ◌ and ເ◌າະ. The fact that the bits are sometimes above and sometimes next to the base doesn’t mean they’re different categories; each pair represents the same vowel.
EDIT: also consider the case of Burmese where -uMark and -uMark.post are alternates for the same character, not just the same vowel. These should stay together in the sort order.
For the same reason that we keep the consonants in alphabetic order, we should keep the vowels in their conventional arrangement as far as possible, i.e. all together.
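To make the difference concrete, here is a rough Python sketch comparing the two groupings on a tiny Thai sample. It is only an illustration under some assumptions: the Unicode general category stands in for the font's GDEF class (a real font sets GDEF itself and may differ), and the role labels (`consonant`, `vowel`, `tone`) and the `group_by` helper are made up for this example, not taken from any font tool.

```python
import unicodedata

# Illustrative sample: (character, role). The role labels are assumptions
# made for this sketch, not data read from a font or an app.
SAMPLE = [
    ("\u0e01", "consonant"),  # THAI CHARACTER KO KAI
    ("\u0e31", "vowel"),      # THAI CHARACTER MAI HAN-AKAT (nonspacing)
    ("\u0e30", "vowel"),      # THAI CHARACTER SARA A (spacing)
    ("\u0e48", "tone"),       # THAI CHARACTER MAI EK
]


def is_mark(ch):
    """Approximate the base/mark split with the Unicode general category."""
    return unicodedata.category(ch).startswith("M")


def group_by(sample, key):
    """Collect characters into lists keyed by key(ch, role), keeping input order."""
    groups = {}
    for ch, role in sample:
        groups.setdefault(key(ch, role), []).append(ch)
    return groups


# Grouping by base/mark puts the two forms of the same vowel in different groups.
by_class = group_by(SAMPLE, lambda ch, role: "mark" if is_mark(ch) else "base")

# Grouping by role keeps the vowels together, whether spacing or not.
by_role = group_by(SAMPLE, lambda ch, role: role)

for name, groups in (("by class", by_class), ("by role", by_role)):
    print(name, {k: [f"U+{ord(c):04X}" for c in v] for k, v in groups.items()})
```

With the base/mark split, U+0E30 lands next to the consonant while U+0E31 lands with the tone mark; with the role-based grouping, both sit in the vowel group.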