Resources for mapping character sets to supported languages

reneverkaart · August 25, 2022, 12:12pm

Hi guys,

I’ve been reading the other posts on character sets and supported languages, but I haven’t found a good solution, so I need your help.
I’m working out information for my website on custom type. I’ve defined these character sets:

Basic Latin
Latin-1 Supplement
Latin Extended-A
Latin Extended-B
Latin Extended-C
Latin Extended-D
Latin Extended Additional

But it’s still impossible for me to determine the languages supported by these sets. Wikipedia talks about ‘major alphabets’, but it doesn’t cover all of them and doesn’t always list the supported languages, let alone the languages that are “not fully supported” because some glyphs are missing.
On top of that, every Wikipedia page is different. Some don’t document the supported languages at all, just the character sets.

I want to map each character set to its supported languages.
My goal is to offer, for instance:

Basic Latin: X characters - X supported languages - for price X
Latin Extended-A: X characters - X supported languages - for price Y
Latin Extended Additional: X characters - X supported languages - for price Z
and so on.

This should make it understandable for any client because they know what they’ll get.
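As a rough sketch of how the “X characters” column could be filled in, the snippet below counts the assigned, non-control code points in each Unicode block with Python’s standard `unicodedata` module. The `BLOCKS` table and the `count_characters` helper are illustrative names, not part of any tool mentioned in this thread. The counts are code points, not glyphs, and for the newer blocks they depend on the Unicode version bundled with your Python.

```python
import unicodedata

# Unicode block ranges (inclusive), as defined by the Unicode Standard.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
    "Latin Extended-C": (0x2C60, 0x2C7F),
    "Latin Extended-D": (0xA720, 0xA7FF),
}


def count_characters(first, last):
    """Count code points that are assigned (not Cn) and not control characters (not Cc)."""
    return sum(
        1
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) not in ("Cc", "Cn")
    )


for name, (first, last) in BLOCKS.items():
    print(f"{name}: {count_characters(first, last)} characters")
```

Note that a block count like this includes the space, punctuation and symbols in the block, so it is only a starting point for a price list, not a definition of what a font must contain.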

I’ve checked these websites:

Unicode - Wikipedia (extended info but very difficult to map info together)
Underware (great, but this is the “wrong” way around and a lot of work to collect all “East & West European languages”)
ISO/IEC 8859-2 - Wikipedia
Font map (great, but the languages are missing)
FontShop (too cryptic)
Latin-1 Supplement - Wikipedia
Latin Extended-A - Wikipedia
and more similar stuff…

Do you know of any other resources where I can find this information?

Thanks in advance.

HugoJ · August 25, 2022, 12:52pm

Did you check the Alphabet Type tools?

reneverkaart · August 25, 2022, 12:55pm

I did, but this requires that you upload a font so it can check if the required character set is available in that font file. It doesn’t say what languages are supported.

Off the top of my head, I wouldn’t know what all the Eastern European languages are, or which characters they include.

mekkablue · August 25, 2022, 1:07pm

There is no good answer to the question of how a certain character set relates to a list of languages. We had this discussion here a few years back. It comes down to what you count as a language, and what you consider a requirement for that language.

For instance, you might say that if you have ÄÖÜẞäöüß in your font, you support German, because these are the extra letters listed in the official spelling guide. In fact you can write German perfectly well without ẞß, but you will fail on most long texts if you do not have ÀàÉé.
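The German example can be put into a small sketch: the same character set passes or fails depending on which definition of “German” you check it against. The two definitions below only restate the letters named in this post; they are not an authoritative description of German, and the names `DEFINITIONS`, `chars` and `missing` are made up for illustration.

```python
import string
import unicodedata


def chars(text):
    """Return the set of characters in `text`, normalized to NFC (precomposed letters)."""
    return set(unicodedata.normalize("NFC", text))


# Two competing ideas of what "supporting German" requires beyond A-Z and a-z.
# Illustrative only: both lists are taken from the forum post above.
DEFINITIONS = {
    "official extra letters": chars("ÄÖÜẞäöüß"),
    "practical, with loanwords, without ẞß": chars("ÄÖÜäöüÀàÉé"),
}


def missing(charset, required):
    """Return the required characters that `charset` lacks, sorted by code point."""
    return sorted(required - charset)


# A hypothetical font: basic letters plus the officially listed German extras.
font_charset = chars(string.ascii_letters + "ÄÖÜẞäöüß")

for label, required in DEFINITIONS.items():
    gaps = missing(font_charset, required)
    print(f"{label}: {'OK' if not gaps else 'missing ' + ' '.join(gaps)}")
```

The font passes the first definition and fails the second, which is the point being made: the answer depends on the requirement list, not only on the font.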

So take all info you get out of this with a ton of salt, because, simply put, it will always be wrong in the end.

Having said that, if you want the Underware info the right way around, run Test > Language Report in the mekkablue scripts. You can easily update the script with additional languages.

Another resource I occasionally use: Letter database: languages, character sets, names etc

reneverkaart · August 25, 2022, 1:12pm

Makes sense, Rainer. For me it’s just a matter of definition. If a client wants a font that covers the Western-European languages, it would be unprofessional to deliver a font that lacks some characters because it’s hard to define a character set.

I’ll try to mix all the info together and get a good setup that works for me.
Sites like Alphabet Type and Underware Plus are then a resource for the final check of whether I did it right.

obiobik · August 25, 2022, 1:13pm

Have you checked this? GitHub - rosettatype/hyperglot: Hyperglot: a database and tools for detecting language support in fonts

George_Thomas · August 25, 2022, 2:09pm

@reneverkaart I just noticed you didn’t have this on your list: Evertype: The Alphabets of Europe

reneverkaart · August 25, 2022, 2:14pm

Cool, very good. Thanks

reneverkaart · August 25, 2022, 2:18pm

I’ve checked this one too. It helped me a lot in defining the characters needed for a certain language. It was just so much work to combine this with the other languages. The same accents are used by so many languages that it’s crazy to map them all into a group…

obiobik · August 26, 2022, 8:56am

Would it be useful to export a typeface with each of the character sets you mentioned separately and try them one by one with hyperglot or something similar? That way you might see which set supports which languages.
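A variation on this idea that avoids exporting fonts: build each character set straight from its Unicode block range and test it against per-language requirements. In the sketch below, `REQUIRED` is a tiny hand-written placeholder that lists only the extra letters beyond A to Z for three languages and is certainly incomplete; real data would have to come from a database such as Hyperglot. `first_sufficient_block` is a hypothetical helper name, and the sets are treated as cumulative (each block is added on top of the previous ones).

```python
import unicodedata

# Ordered Unicode blocks (inclusive ranges); each one is stacked on the previous.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
]

# Placeholder requirements: extra letters only, not a complete orthography.
REQUIRED = {
    "German": "ÄÖÜäöüß",
    "Polish": "ĄĆĘŁŃÓŚŹŻąćęłńóśźż",
    # Romanian S and T with comma below are written as escapes (U+0218 to U+021B)
    # so they cannot be confused with the cedilla forms.
    "Romanian": "ĂÂÎăâî\u0218\u0219\u021A\u021B",
}


def first_sufficient_block(required):
    """Return the first block at which the cumulative set covers `required`, else None."""
    # Normalize to NFC so accented letters are compared in precomposed form.
    needed = set(unicodedata.normalize("NFC", required))
    covered = set()
    for name, first, last in BLOCKS:
        covered.update(chr(cp) for cp in range(first, last + 1))
        if needed <= covered:
            return name
    return None


for language, letters in REQUIRED.items():
    print(f"{language}: covered from {first_sufficient_block(letters)}")
```

Whole blocks contain far more than most fonts ship, so this only shows the smallest block set that could cover a language, not what a given font actually supports.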

alexs · August 26, 2022, 12:35pm

A seemingly rational approach is to offer a reasonable char set and extend it upon request. Here, “reasonable” is a balance between development costs and coverage (I suggest counting people/users, rather than merely a number of languages). Building such a default set may take time, but you already have all the good links for that. Check what well-established foundries offer, check whether rare characters are actually used. For instance, some add Aringacute and screw up vertical metrics for that one char (probably because Underware list it for Danish), but my research shows that it’s hardly ever used.

reneverkaart · August 26, 2022, 12:41pm

Sure alexs, the number of users is important, but this is another part of the price offer.

If a client asks for coverage of a certain region I’d like to also document the correct languages so that there can never be an issue over the project scope.
I’ve been doing my job since 1995 and if there’s something that has ALWAYS saved the day, it’s having a solid scope.

reneverkaart · August 26, 2022, 1:46pm

This is another great resource I just found:
https://bulletproof.italic.space/languages

I tested it with Open Sans Regular and it gives a great output of the available characters and supported languages.

reneverkaart · August 26, 2022, 3:33pm

I’m sure this will work, but it’s strange to me to see how difficult it is to get this information together. This would mean I need to create different fonts with these character sets just to see what languages they support. It becomes more and more evident to me that character sets vs languages are not important to anybody selling (custom) fonts. Perhaps it’s just me being too professional.

FlorianPircher · August 26, 2022, 3:40pm

There is also a good topic about language coverage and how it relates to glyph sets and shaping rules on the TypeDrawers forum.

reneverkaart · August 26, 2022, 3:43pm

Thanks Florian, I dove into it, and the explanation from Ray Larabie makes a lot of sense and stuck with me most:

mrbrezina · September 19, 2022, 2:03pm

Consider Hyperglot: https://hyperglot.rosettatype.com

Honza · December 4, 2023, 5:29pm

It’s a fantastic tool, as I can copy & paste symbols directly into the Glyphs app to test them. Thank you for making it available like that; for a complete noob type designer like me it’s priceless.