Btw, I believe A.smcp is wrong, too. I’d clearly vote for A.sc (as per the original post).
Pretty much. Actually, the majority of PDF producers don’t embed the text string in the PDF, only the glyphs, without a CMAP table (which, when present, is often simply generated from glyph names).
Sorry to be persistent but could you be more specific please? What is “the majority of PDF producers”?
Anything other than Adobe apps and a few other applications like LibreOffice.
Sorry, would you please be so kind and mention one specific scenario?
Honestly, it is no big difference if you would get lowercase or uppercase copying text out of a PDF. Actually with the uppercase you would at least also get part of the highlighted intention of the author/typographer and it’s easy to spot them and change them to mixed case afterwards. Actually outputting them in lowercase might also lead to getting names in all lowercase, as the authors might not have minded the correct underlying differentiation if outputted as all-small-caps.
And then we also have the implementation in html/etc. where you’d always get the original text, ignoring any opentype magic.
I would highly doubt that this pdf argument is really in favour of the current implementation or possibly even argue that it has no relevance as long as the letter itself is identifiable, outside of its upper or lowercase variation. And please give us a specific scenario where it really would be troublesome.
Try with Apple’s TextEdit or Pages, or XeTeX, or Chrome, or Firefox, with two fonts one with a->a.sc and one with a->A.sc and see what text you get if you copy from the PDF.
The thing with Apple’s text engine is a little more complex (I have not checked in recent system versions though, so this may be outdated): it inherits the character from the first appearance of the smallcap glyph and sticks with it for subsequent appearances of the same glyph. So copying a long smallcap text typically gives you miXeD cAse.
My conclusion from the various (buggy) PDF implementations is that it doesn’t make sense to fix bugs in other software with fonts. The software is not supposed to take the glyph name into account in the first place. And there are ways to produce (and support) properly encoded PDFs. The fact that it is not done properly is not the fault of fonts or font editors.
I don’t disagree, but this is about the only reason to include glyph names in shipped fonts at all, so if people don’t care about this anymore then fonts shouldn’t be exported with glyph names, which would save a few bytes per file (it can reach a few KB for fonts with 1000s of glyphs).
Thanks! I can now confirm this.
I suppose you are talking about fonts that include LC as well as UC named small caps? From what I just tested, with only one version in the font the copied text is strictly based on the glyph name case, no matter what comes before.
Yay! Does that mean we will see support for the A.sc naming scheme in Glyphs soon?
So, from what I understand, we have three options:
- using a.sc as well as A.sc: This may copy miXed CasE (if I understand Rainer correctly), which is, of course, the worst case
- using a.sc: This always copies LC, which is not ideal
- using A.sc: This always copies UC, which is also not ideal but certainly better than always copying LC, so this is the best solution
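To make the a.sc versus A.sc difference concrete, here is a minimal Python sketch of a PDF consumer that has no text information and falls back on glyph names only. It assumes the common convention of dropping everything from the first period of the glyph name (as described in the Adobe Glyph List Specification) and handles only single ASCII letters, `space`, and `uniXXXX` names. The function names are hypothetical, and real PDF viewers may behave differently.

```python
import re

# Assumption: a tiny subset of name-to-character rules, for illustration only.
NAMED = {"space": " "}

def char_from_glyph_name(glyph_name: str) -> str:
    """Guess a character from a glyph name alone (simplified)."""
    # Drop the suffix: "A.sc" -> "A", "a.sc" -> "a".
    base = glyph_name.split(".", 1)[0]
    if base in NAMED:
        return NAMED[base]
    if len(base) == 1 and base.isascii() and base.isalpha():
        return base
    match = re.fullmatch(r"uni([0-9A-F]{4})", base)
    if match:
        return chr(int(match.group(1), 16))
    return "\ufffd"  # unknown name: replacement character

def copied_text(glyph_names) -> str:
    """Text a name-only consumer would hand to the clipboard."""
    return "".join(char_from_glyph_name(name) for name in glyph_names)

print(copied_text(["n.sc", "a.sc", "t.sc", "o.sc"]))  # nato
print(copied_text(["N.sc", "A.sc", "T.sc", "O.sc"]))  # NATO
```

Under these assumptions the case of the copied text follows the case of the glyph name, whatever the author originally typed.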
Though this should only happen if you are messy within your font file, using sometimes one and sometimes the other, right? If so, I don’t see a problem with both co-existing in the current version (though in major updates such as Glyphs 3 the lowercase approach could be removed).
If you copy from anything other than PDF, you always get the original text regardless of glyph names. So even if the glyph is named A.sc but mapped from a, you will get a. The same happens with PDF files generated by applications that embed the original text (like LibreOffice, and I suppose InDesign, but it has been years since I last tested it).
The case of the copied text depends on the character of the first smallcap, not the glyph name at all.
Example: Imagine the phrase ‘Sara Atlas’ and you turn that into small caps (with a font that has one set of small caps for both smcp and c2sc) in TextEdit and make a PDF from it, then you copy the text out of the PDF in Preview.app, and paste it somewhere, you’ll get ‘Sara atlaS’. All small cap a’s will be lowercase because the first appearance of a small cap a (the second letter in ‘Sara’) was lowercase. All occurrences of small cap s’s will be uppercase because the first small cap s was an uppercase letter (the S in ‘Sara’).
Not on my Mac now, so I cannot verify if that is still the case in 10.14 or later. But the glyph name has no influence, as far as I remember. I seem to remember that you could circumvent it with two sets of small caps, one for smcp and one for c2sc. Should be possible to automate by simply employing those suffixes (e.g. S.c2sc and s.smcp). But I certainly do not recommend it just to fix a bug in a third party app.
For the user it won’t make much of a difference because copying text out of PDFs is pretty much broken in pretty much any implementation I have seen so far. It’s like fixing a small scratch on a total-loss car wreck. And the irony of it is that while it may be a functioning workaround for one PDF implementation, it may make things worse in others.
Again, it’s not the type designer’s job to fix broken PDF viewers and creators. Don’t waste your time on it other than perhaps pointing users to bug report pages.
Okay, I understand that you don’t want to fix broken PDF viewers and creators but I’d like to understand what is going on here, at least. I am really puzzled.
You mean, by applying smcp as well as
How would Preview know whether the first small cap a was lowercase? I thought the whole issue here is that this character information is not written into the PDF? Even without testing, I cannot even theoretically understand how this case-jumbling behaviour could possibly come about.
I can’t confirm this behavior, but I can guess how it happens. Assuming all smallcap glyphs use the same names regardless of the original characters (so the glyph string in the PDF file is S.sc A.sc R.sc A.sc space A.sc T.sc L.sc A.sc S.sc), a smart-but-not-so-smart PDF producer creating the CMAP table would:
- See the first S.sc, figure it was an S in the original text and map it to S; subsequent appearances of S.sc will be skipped because CMAP can only have one entry per glyph.
- See the first occurrence of A.sc and figure that it was a in the original text, etc.
- So the CMAP would be something like:
- Here the PDF producer is trying to be smart and use the original text to guide the creation of the CMAP table, but CMAP has limitations and you end up with situation like this.
- A dumber PDF producer would solely use glyph names and map them all to upper case, but then you still get different results copying from PDF and copying from HTML (or anything else; they all will give you the original text).
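The first-occurrence behaviour described above can be simulated in a few lines of Python. This is only a sketch of the guessed mechanism, not a description of any real PDF producer: the function names and the two naming schemes (`single_set`, `two_sets`) are hypothetical, and non-letters get a placeholder `uniXXXX` name.

```python
def single_set(ch: str) -> str:
    """Hypothetical naming: one small-cap glyph per letter, e.g. A.sc."""
    return ch.upper() + ".sc" if ch.isalpha() else f"uni{ord(ch):04X}"

def two_sets(ch: str) -> str:
    """Hypothetical naming: separate glyphs, e.g. s.smcp and S.c2sc."""
    if not ch.isalpha():
        return f"uni{ord(ch):04X}"
    return ch + (".c2sc" if ch.isupper() else ".smcp")

def build_cmap(text: str, glyph_for) -> dict:
    """One entry per glyph; the first character seen for a glyph wins."""
    cmap = {}
    for ch in text:
        cmap.setdefault(glyph_for(ch), ch)
    return cmap

def copy_from_pdf(text: str, glyph_for) -> str:
    """Map the glyph run back to text through the per-glyph table."""
    cmap = build_cmap(text, glyph_for)
    return "".join(cmap[glyph_for(ch)] for ch in text)

print(copy_from_pdf("Sara Atlas", single_set))  # Sara atlaS
print(copy_from_pdf("Sara Atlas", two_sets))    # Sara Atlas
```

With a single set of small caps, S.sc is pinned to S and A.sc to a, which reproduces the jumbled result. With separate glyphs for smcp and c2sc, each glyph only ever stands for one character, so the round trip is exact.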
Ah, I am beginning to understand. Thanks for the detailed explanation, Khaled!
I was not aware that a dumb PDF generator may have access to the raw text. The only dumb PDF generator I have worked with is Acrobat Distiller. So, in our scenario, the glyphs in the embedded font do have (Unicode) character information attached but the text in the PDF does not, right?
With the current macOS, I cannot reproduce the mixed-case problem, however. Looks like Apple has switched to the proper dumb system (which is good news)?
As an update to what I wrote above, as a font maker, I have three options:
- using a.sc as well as A.sc: This would be the ideal case for the copy & paste problem but I consider the added font development effort and file size inappropriate for this small benefit
- using a.sc: This always copies LC, which is not ideal. No additional font development effort.
- using A.sc: This always copies UC, which is much better than always copying LC, but it (unnecessarily!) requires additional font development effort because I cannot use the automatic feature generation. I am currently not using this because I consider the additional effort (compared to 2.) inappropriate for this small benefit (compared to 2.)
So, @GeorgSeifert: Why, oh why does Glyphs refuse to support A.sc? It would allow me to create fonts with a better behaviour (than a.sc) without additional effort.
I’ll have a look.
Hi, let me add another perspective to this discussion. I think the way we perceive small caps differs based on local typographic tradition. My observation is, the traditional use of small caps in many European countries is using them as a stylistic version of lower case letters (Pʟᴀᴛᴏɴ, NATO), similarly to Italic or Bold. While the tradition in Anglophone countries (or at least US) is different: they use (all) small caps as a nicer alternative to upper case (ᴘʟᴀᴛᴏɴ, ɴᴀᴛᴏ). Therefore apart from the @TimAhrens option 1 (using a.sc as well as A.sc) both options 2 and 3 will be incorrect for certain languages or regions. Btw. I would be very interested which way of using small caps is more frequent, Pʟᴀᴛᴏɴ or ᴘʟᴀᴛᴏɴ…
(I hope the small caps are displayed properly; I fake them using Unicode small capital phonetic letters.)
I don’t know how things are in the US, but in the UK I’ve always seen small caps as a lower-case styling (for instance in caps and small caps headings, in the opening words of chapters in book setting and as an alternative form of emphasis to bold or italic). For sure, we also use them with non-lining figures as a way of harmonising postcodes with the text of an address, and for acronyms and titles. But it’s always made sense for me to apply small caps styling to lowercase.