Character substitutions in text
TeX handles some character sequence substitutions by (ab)using the ligature mechanism, e.g., `` → “. This works reasonably well for Computer Modern, which defines these in its ligature table, but falls apart once we start trying to use non-TeX fonts. Furthermore, there’s the added complication that most fonts put the characters ' and ` in character positions 39 and 96, while TeX expects those positions to typeset ’ and ‘.
I’m thinking that a solution to this would be to have a character sequence substitution that’s run-time configurable as part of the text-input pipeline. This would happen after commands have been interpreted but before the text stream is processed for ligatures and line breaks. The standard route would be to import a tab-delimited table of input and output sequences of Unicode characters. The standard TeX input would look like:
` | ‘
`` | “
' | ’
'' | ”
-- | –
--- | —
!` | ¡
?` | ¿
~ | \u00a0
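To make the idea concrete, here is a minimal sketch in Rust of loading such a table and applying it to a text stream. It is not the finl implementation: the function names (`decode_escapes`, `parse_table`, `substitute`), the four-hex-digit form of the `\u` escape, and the longest-match-first rule (so that `---` wins over `--`) are all assumptions made for illustration.

```rust
/// The table above as tab-delimited text (input, tab, output).
const TEX_TABLE: &str =
    "`\t‘\n``\t“\n'\t’\n''\t”\n--\t–\n---\t—\n!`\t¡\n?`\t¿\n~\t\\u00a0\n";

/// Decode `\uXXXX` escapes in a table cell.
/// Assumption: exactly four hex digits follow `\u`; anything else is kept literally.
fn decode_escapes(cell: &str) -> String {
    let chars: Vec<char> = cell.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '\\' && i + 6 <= chars.len() && chars[i + 1] == 'u' {
            let hex: String = chars[i + 2..i + 6].iter().collect();
            if hex.chars().all(|c| c.is_ascii_hexdigit()) {
                let decoded = u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32);
                if let Some(c) = decoded {
                    out.push(c);
                    i += 6;
                    continue;
                }
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}

/// Parse tab-delimited rules; lines without a tab or with an empty input are skipped.
fn parse_table(src: &str) -> Vec<(Vec<char>, String)> {
    let mut rules: Vec<(Vec<char>, String)> = Vec::new();
    for line in src.lines() {
        if let Some((input, output)) = line.split_once('\t') {
            let key: Vec<char> = decode_escapes(input).chars().collect();
            if !key.is_empty() {
                rules.push((key, decode_escapes(output)));
            }
        }
    }
    // Longest input first, so the first rule found at a position is the longest match.
    rules.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
    rules
}

/// Replace every occurrence of a rule input with its output, scanning left to right.
fn substitute(text: &str, rules: &[(Vec<char>, String)]) -> String {
    let chars: Vec<char> = text.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        match rules.iter().find(|(key, _)| chars[i..].starts_with(key)) {
            Some((key, replacement)) => {
                out.push_str(replacement);
                i += key.len();
            }
            None => {
                out.push(chars[i]);
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let rules = parse_table(TEX_TABLE);
    // prints: “Hello,” she said—twice.
    println!("{}", substitute("``Hello,'' she said---twice.", &rules));
}
```

A linear scan over the rules is fine for a table this small; a trie keyed on characters would be the natural next step if the tables grow.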
Note that we no longer have an active-character concept to allow using ~ for non-breaking spaces, which is why ~ appears in the table. Also, the timing of when the substitutions take place means that we cannot use this mechanism to insert commands into the input stream. A mapping like TeX → \TeX will not typeset the TeX logo but will instead typeset the sequence \TeX, backslash included. (Actually, given the use of \ to open a Unicode hex sequence, it might produce something like a tab followed by eX, depending on what other escape sequences are employed.)
Other TeX conventions, like the AMS Cyrillic transliteration, where ligatures are used to map sequences like yu → ю, can easily be managed. Similarly, Silvio Levy’s ASCII input scheme for polytonic Greek can also be easily managed. These would allow for easy input of non-Latin alphabets for users who primarily write in Latin alphabets and work on operating systems where switching keyboard layouts to allow input of non-Latin scripts is difficult.
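As a rough illustration of how a transliteration table would behave, the Rust sketch below applies a handful of Latin-to-Cyrillic rules with longest-match selection. Only yu → ю comes from the text above; the other rows in `RULES`, and the names `RULES` and `transliterate`, are illustrative assumptions and not a transcription of the AMS tables.

```rust
/// A tiny illustrative rule set (hypothetical apart from "yu").
const RULES: &[(&str, &str)] = &[
    ("yu", "ю"),
    ("sh", "ш"),
    ("shch", "щ"),
    ("g", "г"),
    ("i", "и"),
];

/// Rewrite `text` by replacing, at each position, the longest rule input that matches.
/// Characters with no matching rule are copied through unchanged.
fn transliterate(text: &str, rules: &[(&str, &str)]) -> String {
    let mut out = String::new();
    let mut rest = text;
    while !rest.is_empty() {
        let best = rules
            .iter()
            .filter(|(from, _)| !from.is_empty() && rest.starts_with(*from))
            .max_by_key(|(from, _)| from.len());
        match best {
            Some((from, to)) => {
                out.push_str(to);
                rest = &rest[from.len()..];
            }
            None => {
                // No rule applies here: copy one character and move on.
                let mut chars = rest.chars();
                if let Some(c) = chars.next() {
                    out.push(c);
                }
                rest = chars.as_str();
            }
        }
    }
    out
}

fn main() {
    println!("{}", transliterate("yug", RULES));
    println!("{}", transliterate("shchi", RULES));
}
```

Choosing the longest match matters here for the same reason it does with `--` and `---`: without it, `shch` would be consumed as `sh` followed by unmatched letters.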
[…] piece of code (something relatively trivial) for finl is now on crates.io: finl-charsub is the character substitution module for finl. It may have use outside of finl when fixed character sequence replacements are needed in […]