American English vs Central and Western European Languages

Hello,

I am designing a font and I am supporting the western and central European character sets.

American English is the only language I speak fluently. I have heard that letter combinations found in words are mostly the same between American English and the languages of western and central Europe. Is this true?

Thank you in advance,
-Sam

No, not really. Well… define ‘mostly’.

The majority of letters found in words in Western and Central European languages are the 26 letters also found in English (A to Z). The majority of the combinations will also be ones that exist in at least some English words – but far from all. Hungarian, for example, uses ⟨zzs⟩, ⟨sz⟩, ⟨szs⟩ and various other combinations of s and z that are never found in English words; Finnish uses double vowels like ⟨aa, öö, yy⟩ all over the place; German uses ⟨dsch⟩ to represent the ‘j’ sound; Portuguese uses ⟨lh, nh⟩; Irish uses ⟨bh, mh, fh, bhf⟩; and so on and so forth.

And then there are the letters (or more commonly, letters with diacritics) that aren’t used in English at all, like Icelandic ⟨þ, ð, ý⟩, Danish/Norwegian ⟨æ, ø, å⟩, German ⟨ß⟩, Dutch ⟨ij⟩, Welsh ⟨ŵ⟩, Polish ⟨ł⟩, Czech ⟨ř, ů⟩, etc.

If you take a random page of text in any given Western or Central European language, certainly at least 60% of the letter pairs on the page will also appear at least somewhat regularly in a number of English words; but there will be a fair number of combinations that either never appear at all in English, or are exceedingly rare.

I don’t quite understand the point of your question. What do you need to know this for? Kerning?

If you’re sticking to Latin, then there are no surprises in combinations of the standard 26 letters (without diacritics). Of course, you’ll need to take into account a lot more combinations (and specific ones, at that) when you move on to diacritics and other characters (such as German ß, Icelandic ð, Catalan ŀ). Those only appear in language-specific combinations. For example, you never need to consider ð followed/preceded by w or z, because the latter two don’t occur in Icelandic.

You will also need to take accents into account when kerning, such as Tó, which you often can’t kern as tightly as To (especially in heavier, condensed weights).

Edit: I see Janus beat me to it :smiley:

However, ⟨ðw⟩ does occur in Old and Middle English. For example, the Middle English version of thwart was varyingly – and quite randomly – written ðwert, þuert, thwerrt, thuart and half a dozen other ways. :wink:

Hello Janus Bahs Jacquet and Sebastian Carewe.

I am indeed asking this question for kerning reasons. I have all of my characters (accented and not) paired into kerning groups. I have heard and read contradictory information about letter combos in American English vs Central and Western European languages.

Experienced type designers have told me that it is a waste of time to kern and/or create ligatures for every possible letter pair because a good chunk of them won’t be relevant to the languages you’re supporting. I’m taking this advice and trying to identify an efficient kerning and ligature creation process.

I only speak American English. Thus, I’m feeling stuck/overwhelmed and don’t know how to proceed from where I currently am.

In this thread, which you started two years ago, there are some good resources, especially Tim’s pair list:

There is no better resource in terms of kerning pair frequency/existence.

For a basic use, you can extract the pair_frequencies.txt file, search for the character you are interested in, and then see which other characters it occurs with.
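If you would rather not search by hand, the same lookup could be scripted. This is only a rough sketch: it assumes you have already pulled the two-character pairs out of the file into a list of strings (I am not describing the file's actual layout here, so that parsing step is left to you), and the function name `partners` and the sample pairs are made up for illustration.

```python
def partners(pairs, char):
    """Return (before, after): the characters seen directly to the left
    and directly to the right of `char` in a list of two-character pairs."""
    before = sorted({p[0] for p in pairs if len(p) == 2 and p[1] == char})
    after = sorted({p[1] for p in pairs if len(p) == 2 and p[0] == char})
    return before, after


# Made-up sample pairs, NOT taken from the real pair list.
pairs = ["að", "ða", "ði", "eð", "ðu", "st"]
before, after = partners(pairs, "ð")
print("left of ð:", before)
print("right of ð:", after)
```

Anything that never shows up in either list for a given character is a pair you can probably give low priority when kerning.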

What do you mean by this?

Hi Sebastian,

I will check that out again. I remember trying to use Kern On a couple years back but it didn’t work for some reason.

Answer to your follow-up question:

I mean that there are inconsistencies in the information I’ve heard and read about letter combos in the languages of western and Central Europe vs American English.

One thing I’ve been told is that base letter combos are the same while pronunciation and occurrence frequency varies per language. This doesn’t line up with what you and Janus are telling me now.

I’m not suggesting you use Kern On (but you can, it’s excellent!), but for what you’re trying to do (gather information), you can just look at the aforementioned file inside the Kern On plugin, as Tim described.

I don’t understand what this means, or rather, what this implies. Of course, certain letter combinations will be used more in some languages than in others. ⟨gy⟩ occurs all the time in Hungarian, but almost never in German, for example, while both letters exist in both languages. However: What does this mean for you? Just kern them anyway, since the combination does occur in at least one language (in the example for ⟨gy⟩, Hungarian and English come to mind).

I still don’t really understand your question, or what you’re trying to achieve exactly.

Hi Sebastian,

I apologize if I am having trouble communicating exactly what I mean. I remember trying to ask the same question a couple years back on that thread you posted. I tried using Tim’s resource and for some reason I couldn’t extract the file back in 2023. That said, I can access the file now and I see a load of different pairs. Thank you!

In short, I’m still trying to identify the best and most efficient way to kern a font based on a specific set of supported languages (without using the brute-force approach of handling unnecessary pairs). This seems daunting because American English is the only language I speak.

As someone who is still relatively inexperienced, answers aren’t always clear and sometimes it is easy to go down a rabbit hole of information on the internet. You also sometimes get different answers to the same questions depending on who you ask.

The pieces are fitting together more and more and the optimal process for making a font is becoming clearer.

Which is a language that loves abbreviations, YKWIM? There are also names, brands, loan words and so on, which means that

is the brute force, all-to-all. “Useless” pairs don’t add as much more work as it might seem

@SamMorgan: This might be helpful. It is a set of word lists for many languages, organized by frequency: the hermitdave/FrequencyWords repository on GitHub ("Repository for Frequency Word List Generator and processed files").
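To turn word lists like these into letter-pair counts, something along these lines might work. It is a sketch under assumptions: it assumes each line has the form `word count` separated by whitespace, the function name `pair_counts` is my own, and the three Hungarian sample lines use invented counts rather than real data.

```python
import unicodedata
from collections import Counter


def pair_counts(lines):
    """Count adjacent letter pairs, weighting each pair by its word's frequency.

    Assumes every usable line looks like "word count"; other lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip lines that do not match the assumed layout
        # NFC keeps an accented letter as one character where possible.
        word = unicodedata.normalize("NFC", parts[0])
        freq = int(parts[1])
        for left, right in zip(word, word[1:]):
            if left.isalpha() and right.isalpha():
                counts[left + right] += freq
    return counts


# Invented sample lines in the assumed "word count" layout.
sample = ["egy 120", "nagy 80", "így 15"]
counts = pair_counts(sample)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Running this over the lists for each language you support, and merging the counters, would give you a ranked pair list to kern from the top down. Note that these lists are lowercase-heavy, so capital-to-lowercase pairs (names, sentence starts) would need separate handling.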

Hi George,

A list of predefined pairs is what Sebastian was suggesting earlier. The list exists as part of the Kern-On demo download.

Hi Alex,

It sounds like you’re arguing that no pair is useless because abbreviations can consist of any letter pair. Do I have you?

For a display font, do you think abbreviations would be very common? In my mind, I picture singular words and titles.