GlyphData.xml lacks recently encoded characters, causing glyphs to be uncategorized

I noticed that characters added in Unicode 13 and later don’t get categorized properly, and show up under Other.

I understand that I can add these characters to the data file myself, but I think the default data should be updated with some reasonable frequency. My suggestion is that the list of encoded characters, and the associated script values, should be updated with each Unicode release, or once a year.

As an example of the issue, all of the glyphs below should appear under category Letter, Han.

The entirely of the CJK Unified Ideographs Extension G block is likewise affected.

Fixed. Thanks for pointing this out.

I upgraded to version 3.2 (3188) to try it out. Thank you for assigning the characters from CJK ext G to Han script.

But there are more still to add:

  • u2A6D7… u2A6DF (added to CJK ext B in Unicode 13.0 & 14.0)
  • u2B735…u2B739 (added to CJK ext C in Unicode 14.0 & 15.0)

When urgently needed characters (UNCs) are proposed, they get assigned to the reserved code points at the end of the CJK extension blocks, so these blocks get new characters until they are full.

There is also a whole block left out:

  • u31350…u323AF (CJK ext H created in Unicode 15.0)

For now, no new Han script characters are planed for Unicode 15.1—but this could change at the last minute, as CJK ext I might sneak in.

1 Like

Thank for checking this. I added them, too.

On a new Mac using a fresh install of Glyphs 3.1.2, I checked again today and found the following Han script characters are misclassified as Other.

  • 2A6D7..2A6DF (part of CJK ext B; additions from Unicode 13.0 & 14.0)
  • 2B735..2B739 (part of CJK ext C; additions from Unicode 14.0 & 15.0)
  • 31350..323AF (CJK ext H block; created in Unicode 15.0)
  • 2EBF0..2EE5F (CJK ext I block; created in Unicode 15.1)

The first two rows are included in Glyphs 3.2, I’ll add the other two.

1 Like