Justification issues

Galifer · February 6, 2020, 3:36pm

So I’m having some justification issues with my font. I think knowing the precise rules the justification process follows would be very useful. Is there an explanation of that somewhere? I couldn’t find any.

I’m mainly using my font in Pages on Mac.

So my font is non-English. I’m using normal English keyboard keystrokes for my glyphs, but they do not represent what they usually do.

I want it to justify the text treating all glyphs equally, so it can split text anywhere, but without splitting a mark from what it’s marking. I have a number of nonspacing marks.

The problem I have is that it forces some items to be kept together, refusing to break them, and sometimes forces marks to be separated, leaving them isolated at the top of a line rather than at the bottom of the previous line with the glyph they are marking!

An example. I will keep this simple and refer to glyphs as either liga (regular ligatures), nonspacing mark liga, or by name in the case of non-liga.

At the bottom of one line, I have:

period + liga + nonspacing mark liga + hyphen

Then there’s a huge gap to the bottom of the line (exactly what I’m trying to avoid).
Then:

liga + parenright (nonspacing mark) + liga + period + liga + nonspacing mark liga + hyphen + comma + liga + hyphen + liga + hyphen_hyphen.liga + comma + nonspacing mark liga + comma [etc.]

The gap is big enough to fit this much of the next line I showed above:

liga + parenright (nonspacing mark) + liga + period + liga + nonspacing mark liga + hyphen + comma + liga + hyphen + liga + hyphen_hyphen.liga

To try to get some of that line to be sent back to the previous line to fill the huge gap, I try using spaces, to make it think they are breaks between words, so it can justify better. The strange thing is, adding a space only fixes things in some positions.

Putting a space after:

liga
does nothing!

Putting a space after

liga + parenright (nonspacing mark)
puts ‘liga + parenright (nonspacing mark)’ back to the previous line, great.

Instead of doing that, putting a space after

liga + parenright (nonspacing mark) + liga
does nothing!

Space after:

liga + parenright (nonspacing mark) + liga + period
that works.

Space after:

liga + parenright (nonspacing mark) + liga + period + liga
doesn’t work. So far it looks like the marks might be stopping it, which would be fine. But then, space after:
liga + parenright (nonspacing mark) + liga + period + liga + nonspacing mark liga + hyphen + comma + liga + hyphen
doesn’t work.

So is it because ‘hyphen’ is ‘punctuation’? Apparently not, because space after:

liga + parenright (nonspacing mark) + liga + period + liga + nonspacing mark liga + hyphen + comma
works fine!

I can sort of fix it by hitting return instead of space, but obviously that doesn’t solve things since the line ending with ‘return’ will not get justified!

If I need to change the attributes of my glyphs, perhaps I can change this behaviour? Would love to know how!

Here is an example of it breaking marks apart from their preceding glyph:

liga + hyphen + nonspacing mark liga + liga

It separates by putting this on the next line:

nonspacing mark liga + liga

It doesn’t accept that the mark is marking the hyphen. I even tried going to ‘info’ on the hyphen glyph and un-ticking ‘punctuation’ to see if that helped: no change.

mekkablue · February 7, 2020, 10:28am

Justification of a line cannot be controlled within the font. This is something the layout engine (or more generally, the renderer) takes care of.

By doing something with the Latin script (not English, that would be a language) that the layout engine does not expect, in other words, a hack, you will likely confuse the engine. You could try and abandon the Latin character range, either by moving the whole script assignment into the private use area, or using a script that is more similar to yours.
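
To illustrate why an engine would not see such glyphs as marks, here is a minimal Python sketch. It assumes the marks are typed with ordinary Latin-range keys such as `)` (as described in the first post) and compares their Unicode properties with those of a true combining mark. U+0301 is only an illustrative choice, and how Pages actually uses these properties is not documented here.

```python
import unicodedata

def describe(ch):
    """Return (code point, general category, combining class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))

# Characters typed from a Latin keyboard: Unicode classifies them as
# punctuation (categories starting with "P"), with combining class 0.
for ch in ")-.,":
    print(describe(ch))

# A real nonspacing mark (COMBINING ACUTE ACCENT) for comparison:
# category "Mn" and a non-zero combining class.
print(describe("\u0301"))
```

Whatever the glyph looks like in the font, the text itself still contains punctuation characters, so nothing tells the engine that they belong to the preceding glyph.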

Galifer · February 7, 2020, 11:41am

At the moment I don’t know what it expects. I know we can’t directly control it, but I figured if I know on what laws it operates, I can work out how to work with those rules.

So I would love to know what kind of rules it uses. How does it differentiate between glyphs when it sorts its rules? For example, does it classify them for justification, according to something we can change in font info? For example according to the category or subcategory we have assigned to the glyph? Or, does it go by the glyph title, such as giving specific rules to period, hyphen, regular letters, ligatures etc.? Either of those would give me potential work-arounds, the former being the most flexible.

Or, some other variable?

Thanks!

mekkablue · February 7, 2020, 12:52pm

I don’t recommend this method because if you find out how it works and you manage to make a hack work for now, that’s fine, but the way it works is likely to change, and has changed a lot in the past.

I would rather make it work in HarfBuzz, and then file a bug report to Apple to make it work in Apple’s engine.

But that requires setting up your own writing system and using the Unicode PUA.

It will also be hard to get that information in the first place, not least because Apple usually does not communicate the internal workings of their software. And we cannot help you here with it either because we are not Apple.

Galifer · February 7, 2020, 1:00pm

Thanks, I’ll look up HarfBuzz. But in general, is there not a basic way that different programs work with this? At least some basic form of standard, even if it doesn’t cover everything? I haven’t found anything by googling so far.

mekkablue · February 7, 2020, 1:52pm

What you are looking for is HarfBuzz, and perhaps also Microsoft’s Universal Shaping Engine. Apple uses something called CoreText.

What you can do from OpenType’s point of view is documented on Microsoft.com/typography

GeorgSeifert · February 7, 2020, 5:46pm

Line breaking is controlled by Unicode properties of the character. For Latin characters it might rely on the language setting of the text.
To break after any character, you need to pick codes that are not associated with any language, because for all languages that I know of (that use alphabets) it would use word breaks (break at the most recent space).
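
As a rough illustration of breaking by character properties, here is a deliberately simplified Python sketch. It assumes the engine follows something like the Unicode Line Breaking Algorithm (UAX #14), which may or may not match what Pages does. The small class table is hardcoded from UAX #14 for a few ASCII characters, everything else is treated as an ordinary letter (AL), and only one rule is modelled: no break before closing punctuation, exclamation, infix separators (comma, period) or `/`, even after spaces.

```python
# UAX #14 line-break classes for a few ASCII characters (hardcoded subset).
LINE_BREAK_CLASS = {" ": "SP", "-": "HY", ",": "IS", ".": "IS",
                    ")": "CP", "}": "CL", "!": "EX", "/": "SY"}

# Classes that UAX #14 does not allow a break before, even after spaces.
NO_BREAK_BEFORE = {"CL", "CP", "EX", "IS", "SY"}

def line_break_class(ch):
    """Look up the class; anything not in the table is assumed to be AL."""
    return LINE_BREAK_CLASS.get(ch, "AL")

def space_allows_break(text, i):
    """Simplified check: can the line break after the space at index i?"""
    if text[i] != " ":
        raise ValueError("index must point at a space")
    j = i
    while j < len(text) and text[j] == " ":
        j += 1  # skip any run of spaces
    if j == len(text):
        return False  # nothing after the space to move to the next line
    return line_break_class(text[j]) not in NO_BREAK_BEFORE

print(space_allows_break("a- ,b", 2))  # space between hyphen and comma
print(space_allows_break("a-, b", 3))  # space after the comma
print(space_allows_break("a )b", 1))   # space before a closing parenthesis
```

Under these assumptions a space directly before a comma, period or `)` does not create a break opportunity, while a space before a letter does. All other UAX #14 rules are ignored in this sketch.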

Galifer · February 7, 2020, 8:22pm

Thanks @GeorgSeifert. Is there anywhere I can see the rules? Even common ones if there are variations, should give me the basics!

Does that also mean I can change the Unicode value of any glyph, and thereby change the justification behaviour? For example, even for non-ligatures, perhaps I could change the Unicode value of period, hyphen and so on?

Or I could assign them all the Unicode value for space - 0020?

Please also note that I reported strange behaviour of it not working with spaces depending on what is before the space. This issue is really bugging me: I made the space very small so that I could manually add spaces to manually justify the text, but this issue prevents that method. See the example in my first post above.

Galifer · February 7, 2020, 8:35pm

Having another look, it seems to me that what is stopping it from working is this: if a hyphen is followed by a comma or period, then even with a space in between (after the hyphen), the app (in my case Pages) will not break the line there.

Deleting the Unicode values from those glyphs does not change the behaviour. So I guess it’s taking the classification straight from the keystrokes. Can you think of any way around this without changing the keystrokes for input?

For example, can I make some other change? Or is there a keystroke that is more powerful than space to force it to understand it’s a ‘word break’ kind of thing, so it knows it can move what follows to the next line?

GeorgSeifert · February 7, 2020, 9:13pm

You misunderstand how Unicode works quite a bit. You can change the Unicode of period. Unicode defines what you get when you type something. So if you change the Unicode of ‘period’, it wouldn’t be a period any more.
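
A small Python sketch of that relationship, under stated assumptions: the two `cmap` dictionaries are hypothetical stand-ins for a font’s character-to-glyph table, and the Private Use code point U+E000 is only an example. The key press stores a code point in the text; the font merely looks up a glyph for that code point.

```python
import unicodedata

# Hypothetical cmap tables (code point -> glyph name), for illustration only.
CMAP_ORIGINAL = {0x002E: "period"}    # glyph "period" encoded as U+002E
CMAP_REASSIGNED = {0xE000: "period"}  # same glyph moved to a PUA code point

def glyph_for(ch, cmap):
    """Return the glyph a font with this cmap would show for ch."""
    return cmap.get(ord(ch), ".notdef")

typed = "."  # what the period key puts into the document

# The character's properties come from Unicode, not from the font.
print(unicodedata.name(typed), unicodedata.category(typed))

print(glyph_for(typed, CMAP_ORIGINAL))    # the font has a glyph for U+002E
print(glyph_for(typed, CMAP_REASSIGNED))  # no glyph for U+002E any more

# A Private Use character carries the category "Co".
print(unicodedata.category("\ue000"))
```

In this model, re-encoding the glyph changes which typed character displays it, but it does not change the properties of the character that the period key inserts.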

I don’t really understand what you are trying to achieve with that feature code. Can you post a screenshot of the final result?

Galifer · February 7, 2020, 9:17pm

Perhaps I am misunderstanding, but I do not know in what way yet. I did change the unicode of period - I deleted the Unicode in the font info for period as I mentioned above. And Pages is still treating it in exactly the same way.

So, is it because I left it blank and Pages guessed it? Or… ?

Galifer · February 7, 2020, 9:20pm

I’m trying to make a space act like a … ‘word break’ I guess it might be called? So instead of huge spaces at the bottom of lines, it can break up the text into smaller chunks. I’d like the space to always do that. But, specifically, it is not doing that if the space is preceded by a hyphen and followed by a period or comma.

I checked this in a system font (normal English) and the behaviour is the same. And as I said, it’s the same even if I strip all those of their Unicode identification.

GeorgSeifert · February 7, 2020, 9:25pm

If the font behaves the same way as before, either the OTF export ‘fixed’ that and re-added them or you have a font cache problem: https://www.glyphsapp.com/tutorials/eliminating-font-cache-problems

Galifer · February 7, 2020, 9:49pm

I always make a new font with new font name, so, no cache problem.

I also just ran a test to see if it was guessing because it was blank. I changed it to a different Unicode value, but it seems that’s connected to keyboard input: the period glyph is no longer triggered by the period key! So that method is not for me if it affects the input method.

But anyway, it seems we have learned that deleting the Unicode value in the font info doesn’t actually remove it: the glyphs are still recognised with their standard Unicode values after exporting. If that is helpful knowledge for anyone!

I have one idea that is awkward, but do you have any nicer ideas? Basically, all I am looking for is a way of inserting ‘word breaks’ that will not be ignored when followed by a comma or period, as the space is. Perhaps there is some Unicode character more powerful than a space that does not break the line like return?