Latest news
It’s been a while since I’ve posted an update, so here’s the current status: I’m working on the parser, but I took a bit of a detour to build some Unicode support with the interface I need for finl. The Unicode Segmentation crate is a nice piece of code, but it assumes that it will be the primary interface for iterating over input, whereas I want to augment the CharIndices iterator that I’m already using in the parser. So I had to write my own implementation of Unicode grapheme segmentation.

I had initially assumed that I would also need to implement Unicode category support for this, but that turned out to be wrong. I did it anyway, because I find this kind of coding fun. My implementation of Unicode category identification is significantly faster than the Unicode Categories crate (by a factor of ten in my benchmarks). That matters because I call it on every command name, so it will make a noticeable difference in speed.

My segmentation code, on the other hand, runs much slower than the Unicode Segmentation crate, but that may be an artifact of how the two are implemented. I return a new String from a call on the CharIndices iterator, while they, because they’re creating an iterator over a &str, can return &str references and avoid the copy and allocation. To make sure that my algorithm itself isn’t flawed (it does at least pass all the tests), I’m also implementing it as an iterator over a &str for benchmarking purposes. After that, I need to write some documentation, and then I’ll put a 1.0 crate for finl_unicode out into the world.

I’m torn between hoping that I won’t have to implement more Unicode support and hoping that I will. This stuff is fun to code.¹
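To make the allocation difference concrete, here is a small sketch of the two designs. It is not finl_unicode’s code. To keep it short, it uses a hypothetical, drastically simplified clustering rule (a base character plus any following combining diacritical marks, U+0300 to U+036F) instead of the full UAX #29 grapheme rules. The names is_extender, NextCluster, next_cluster and Clusters are made up for illustration. The first version extends a Peekable<CharIndices>, the kind of iterator a parser may already be driving, and has to copy each cluster into a new String. The second walks a &str and hands back borrowed slices without allocating.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical, drastically simplified clustering rule (NOT full UAX #29):
/// a character "extends" the previous one if it is a combining
/// diacritical mark in U+0300..=U+036F.
fn is_extender(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Design 1 (hypothetical names): an extension on a Peekable<CharIndices>.
/// The iterator yields chars, not slices, so each cluster is copied into
/// a newly allocated String.
trait NextCluster {
    fn next_cluster(&mut self) -> Option<String>;
}

impl NextCluster for Peekable<CharIndices<'_>> {
    fn next_cluster(&mut self) -> Option<String> {
        let (_, base) = self.next()?;
        let mut cluster = String::from(base);
        // Consume following extenders; a non-extender is left for the next call.
        while let Some((_, c)) = self.next_if(|&(_, c)| is_extender(c)) {
            cluster.push(c);
        }
        Some(cluster)
    }
}

/// Design 2 (hypothetical names): an iterator over a &str that yields
/// borrowed &str slices of the input, so there is no per-cluster allocation.
struct Clusters<'a> {
    rest: &'a str,
}

impl<'a> Iterator for Clusters<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let mut chars = self.rest.char_indices();
        chars.next()?; // the base character; None once the input is exhausted
        // The cluster ends at the byte offset of the first non-extender.
        let end = chars
            .find(|&(_, c)| !is_extender(c))
            .map_or(self.rest.len(), |(i, _)| i);
        let (cluster, rest) = self.rest.split_at(end);
        self.rest = rest;
        Some(cluster)
    }
}
```

Both produce the same clusters for "e\u{301}te\u{301}" (a decomposed "été"), but only the second avoids a heap allocation per cluster. That per-cluster allocation is the kind of cost that could account for much of a benchmark gap.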
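The post doesn’t say how finl_unicode makes category lookups fast. One common technique for Unicode property lookups in general is a two-stage table: the high bits of the code point select a 256-code-point block, and identical blocks are stored only once. The sketch below assumes a boolean property and builds the table at runtime from std’s char::is_alphabetic. That is the Alphabetic property, which is not the same as general category L. A real implementation would generate the tables at build time from the Unicode Character Database and would typically store a category value per code point rather than a single bit. TwoStageTable, build and contains are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical two-stage lookup table for a boolean Unicode property.
struct TwoStageTable {
    stage1: Vec<u16>,      // for each 256-code-point block, an index into stage2
    stage2: Vec<[u64; 4]>, // deduplicated 256-bit blocks (one bit per code point)
}

impl TwoStageTable {
    /// Builds the table by evaluating `pred` on every valid char.
    fn build(pred: impl Fn(char) -> bool) -> Self {
        let mut stage1 = Vec::with_capacity(0x110000 >> 8);
        let mut stage2: Vec<[u64; 4]> = Vec::new();
        let mut seen: HashMap<[u64; 4], u16> = HashMap::new();
        for block in 0..(0x110000u32 >> 8) {
            let mut bits = [0u64; 4];
            for low in 0..256u32 {
                // Surrogates (U+D800..=U+DFFF) are not chars; from_u32 returns None.
                if let Some(c) = char::from_u32((block << 8) | low) {
                    if pred(c) {
                        bits[(low >> 6) as usize] |= 1u64 << (low & 63);
                    }
                }
            }
            // Reuse an identical block if one is already stored.
            let idx = *seen.entry(bits).or_insert_with(|| {
                stage2.push(bits);
                (stage2.len() - 1) as u16
            });
            stage1.push(idx);
        }
        TwoStageTable { stage1, stage2 }
    }

    /// Lookup: two array indexes and a bit test.
    fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        let block = &self.stage2[self.stage1[(cp >> 8) as usize] as usize];
        let low = cp & 0xFF;
        block[(low >> 6) as usize] & (1u64 << (low & 63)) != 0
    }
}
```

Because unassigned ranges and large uniform ranges such as CJK ideographs collapse into a handful of shared blocks, the table stays small while each lookup costs the same regardless of where the code point sits.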
¹ If anyone wants to hire me to write Unicode support code, I’m listening.