Resources for mapping character sets to supported languages

Hi guys,

I’ve been reading the other posts on character sets and supported languages, but I haven’t found a good solution, so I need your help.
I’m working out information on custom type for my website. I’ve defined these character sets:

  • Basic Latin
  • Latin-1 Supplement
  • Latin Extended-A
  • Latin Extended-B
  • Latin Extended-C
  • Latin Extended-D
  • Latin Extended Additional

But it’s still impossible for me to determine which languages these sets support. Wikipedia mentions ‘major alphabets’, but not all of them, and it doesn’t always list the supported languages, let alone the languages that are “not fully supported” because some glyphs are missing.
What’s more, every Wikipedia page is different. Some don’t document the supported languages at all, just the character sets.

I want to map each character set to its supported languages.
My goal is to offer, for instance:

  • Basic Latin: X characters - X supported languages - for price X
  • Latin Extended-A: X characters - X supported languages - for price Y
  • Latin Extended Additional: X characters - X supported languages - for price Z
    and so on.

This should make it understandable for any client because they know what they’ll get.

I’ve checked these websites:

Do you know of any other resources where I can find this information?

Thanks in advance.

Did you check the Alphabet Type tools?

I did, but this requires that you upload a font so it can check if the required character set is available in that font file. It doesn’t say what languages are supported.

Off the top of my head, I wouldn’t know what all the Eastern European languages are, or which characters they include. :wink:

There is no good answer to the question of how a certain character set relates to a list of languages. We had this discussion here a few years back. It comes down to what you count as a language, and what you consider a requirement for that language.

For instance, you might say that if you have ÄÖÜẞäöüß in your font, you support German, because these are the extra letters listed in the official spelling guide. In fact you can write German perfectly well without ẞß, but you will fail on most long texts if you do not have ÀàÉé.
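To make that point concrete, here is a minimal Python sketch. The two requirement sets are only my reading of the German example above (they are not an authoritative list), and the names GERMAN_OFFICIAL, GERMAN_PRACTICAL and font_chars are made up for illustration. The same font "supports German" under one definition and fails under the other.

```python
import string
import unicodedata

def charset(text):
    """Return the set of characters in text, normalized to precomposed (NFC) form."""
    return set(unicodedata.normalize("NFC", text))

BASIC_LETTERS = charset(string.ascii_letters)

# Assumption: two possible definitions of "German", taken from the example above.
GERMAN_OFFICIAL = BASIC_LETTERS | charset("ÄÖÜẞäöüß")
GERMAN_PRACTICAL = (GERMAN_OFFICIAL - charset("ẞß")) | charset("ÀàÉé")

def missing_chars(font_chars, required):
    """Characters the definition requires but the font lacks, sorted by code point."""
    return sorted(set(required) - set(font_chars))

def supports(font_chars, required):
    """True if the font contains every required character."""
    return not missing_chars(font_chars, required)

# Hypothetical font: Basic Latin letters plus the official German extras.
font_chars = BASIC_LETTERS | charset("ÄÖÜẞäöüß")

print(supports(font_chars, GERMAN_OFFICIAL))    # passes the "official" definition
print(missing_chars(font_chars, GERMAN_PRACTICAL))  # what the "practical" one still wants
```

Whatever tool produces the language list, it is making a choice like this for every language, which is why two tools rarely agree.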

So take all info you get out of this with a ton of salt, because, simply put, it will always be wrong in the end.

Having said that, if you want the Underware info the right way around, run Test > Language Report in the mekkablue scripts. You can easily update the script with additional languages.

Another resource I occasionally use: Letter database: languages, character sets, names etc

Makes sense, Rainer. For me it’s just a matter of definition. If a client wants a font that covers the Western-European languages, it would be unprofessional to deliver a font that lacks some characters because it’s hard to define a character set. :wink:

I’ll try to mix all the info together and get a good setup that works for me.
Sites like Alphabet Type and Underware Plus are then a resource for the final check of whether I did it right.

Have you checked this? GitHub - rosettatype/hyperglot: Hyperglot: a database and tools for detecting language support in fonts

@reneverkaart I just noticed you didn’t have this on your list: Evertype: The Alphabets of Europe

Cool, very good. Thanks

I’ve checked this one too. It helped me a lot in defining the characters inside a certain language. It was just so much work to combine this with the other languages. The same accents are used by so many languages that it’s crazy to map them all into a group…

Would it be useful to export a typeface with each of the character sets you mentioned separately and test them one by one with hyperglot or something similar? That way you might see which set supports which languages.
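A rough way to try the same idea without exporting fonts is to work from the Unicode block ranges directly. The sketch below is only an illustration under some assumptions: LANGUAGE_EXTRAS is a tiny hand-made table (the accented letters of German and Polish beyond A-Z), not data from hyperglot or any other database, and it pretends each block is drawn completely. The block ranges themselves are the standard Unicode ones.

```python
import unicodedata

# Unicode block ranges for the character sets listed in the first post.
UNICODE_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
    "Latin Extended-C": (0x2C60, 0x2C7F),
    "Latin Extended-D": (0xA720, 0xA7FF),
}

# Assumption: illustrative, incomplete table of letters needed beyond A-Z.
LANGUAGE_EXTRAS = {
    "German": "ÄÖÜäöüß",
    "Polish": "ĄĆĘŁŃÓŚŹŻąćęłńóśźż",
}

def block_of(char):
    """Name of the listed block containing char, or None if it is in none of them."""
    code_point = ord(char)
    for name, (first, last) in UNICODE_BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None

def blocks_needed(chars):
    """Set of blocks that the given characters fall into (precomposed forms)."""
    return {block_of(c) for c in unicodedata.normalize("NFC", chars)}

def languages_covered(offered_blocks):
    """Languages whose extra letters all lie inside the offered blocks."""
    offered = set(offered_blocks)
    return sorted(
        lang for lang, extras in LANGUAGE_EXTRAS.items()
        if blocks_needed(extras) <= offered
    )

print(blocks_needed(LANGUAGE_EXTRAS["Polish"]))
print(languages_covered(["Basic Latin", "Latin-1 Supplement"]))
print(languages_covered(["Basic Latin", "Latin Extended-A"]))
```

Note that Polish needs Ó and ó from Latin-1 Supplement on top of Latin Extended-A, so selling Latin Extended-A on its own would not cover it. Languages cut across blocks, which is part of why a per-block language list is hard to produce.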

A seemingly rational approach is to offer a reasonable char set and extend it upon request. Here, “reasonable” is a balance between development costs and coverage (I suggest counting people/users, rather than merely a number of languages). Building such a default set may take time, but you already have all the good links for that. Check what well-established foundries offer, check whether rare characters are actually used. For instance, some add Aringacute and screw up vertical metrics for that one char (probably because Underware list it for Danish), but my research shows that it’s hardly ever used.

Sure alexs, the number of users is important, but this is another part of the price offer. :wink:

If a client asks for coverage of a certain region I’d like to also document the correct languages so that there can never be an issue over the project scope.
I’ve been doing my job since 1995 and if there’s something that has ALWAYS saved the day, it’s having a solid scope.

This is another great resource I just found:

I tested it with Open Sans Regular and it gives a great output of the available characters and supported languages.

I’m sure this will work, but it’s strange to me to see how difficult it is to get this information together. This would mean I need to create different fonts with these character sets just to see what languages they support. It becomes more and more evident to me that character sets vs languages are not important to anybody selling (custom) fonts. Perhaps it’s just me being too professional. :rofl:

There is also a good topic about language coverage and how it relates to glyph sets and shaping rules on the TypeDrawers forum.

Thanks Florian, I dove into it, and the explanation from Ray Larabie makes a lot of sense and stuck with me the most:

Consider Hyperglot: