I noticed that characters added in Unicode 13 and later don’t get categorized properly, and show up under Other.
I understand that I can add these characters to the data file myself, but I think the default data should be updated with some reasonable frequency. My suggestion is that the list of encoded characters, and the associated script values, should be updated with each Unicode release, or once a year.
As an example of the issue, all of the glyphs below should appear under category Letter, Han.
I upgraded to version 3.2 (3188) to try it out. Thank you for assigning the characters from CJK ext G to Han script.
But there are more still to add:
u2A6D7… u2A6DF (added to CJK ext B in Unicode 13.0 & 14.0)
u2B735…u2B739 (added to CJK ext C in Unicode 14.0 & 15.0)
When urgently needed characters (UNCs) are proposed, they get assigned to the reserved code points at the end of the CJK extension blocks, so these blocks get new characters until they are full.
There is also a whole block left out:
u31350…u323AF (CJK ext H created in Unicode 15.0)
For now, no new Han script characters are planed for Unicode 15.1—but this could change at the last minute, as CJK ext I might sneak in.