The purescript-unicode library is a direct port of Haskell’s Unicode functionality. purescript-unicode’s
Data.Char.Unicode module contains all of the Unicode-related functions and datatypes provided by Haskell’s
Originally, I had wanted to use purescript-parsing to parse a programming language. I knew Haskell’s parsec library had convenient functionality for parsing programming languages, but it wasn’t implemented in
purescript-parsing. I decided to port the functionality from parsec to purescript-parsing. However, I ran into a roadblock. parsec was using many Unicode-related functions from Haskell’s
I took some time to think about how to proceed, and I came up with a plan to port all of the Unicode functionality from Haskell’s
Data.Char module to a separate PureScript package. This is what became purescript-unicode. I then sent a PR to purescript-parsing that adds functionality which depends on purescript-unicode.
purescript-unicode Usage Example
Here is a short example of actually using purescript-unicode:
    >>> generalCategory 'a'
    Just LowercaseLetter
    >>> generalCategory '0'
    Just DecimalNumber
    >>> generalCategory '♥'
    Just OtherSymbol
    >>> generalCategory '本'
    Just OtherLetter

    >>> isControl '\04'
    true
    >>> isControl 'a'
    false

    >>> isPrint '\04'
    false
    >>> isPrint 'a'
    true

    >>> isSpace ' '
    true
    >>> isSpace 'a'
    false

    >>> isUpper 'Z'
    true
    >>> isUpper 'a'
    false
    >>> isUpper '日'
    false

    >>> isAlpha 'a'
    true
    >>> isAlpha '日'
    true
    >>> isAlpha ' '
    false
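The same predicates can also be used from an ordinary program rather than the REPL. The following is only a minimal sketch, and it makes a few assumptions: a release of purescript-unicode in which these functions take a `Char` (as in the session above), a compiler and package set that provide `Effect`, and `toCharArray` from `Data.String.CodeUnits` in purescript-strings. The helper `countWhere` and the sample string are hypothetical names introduced for illustration.

```purescript
module Main where

import Prelude

import Data.Array (filter, length)
import Data.Char.Unicode (isAlpha, isSpace, isUpper)
import Data.String.CodeUnits (toCharArray)
import Effect (Effect)
import Effect.Console (logShow)

-- Hypothetical helper (not part of purescript-unicode): count the
-- characters of a string that satisfy a predicate.
-- Note: toCharArray splits on UTF-16 code units, so this sketch is only
-- meaningful for characters in the Basic Multilingual Plane.
countWhere :: (Char -> Boolean) -> String -> Int
countWhere p = length <<< filter p <<< toCharArray

main :: Effect Unit
main = do
  let sample = "Hello 日本 World"
  -- Number of uppercase letters in the sample.
  logShow (countWhere isUpper sample)
  -- Number of alphabetic characters, including non-Latin letters.
  logShow (countWhere isAlpha sample)
  -- Number of whitespace characters.
  logShow (countWhere isSpace sample)
```

Because each predicate has the type `Char -> Boolean`, it can be passed directly to functions such as `filter` without any wrapping.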
How is purescript-unicode Implemented?
purescript-unicode works very similarly to Haskell’s
There are several possible future improvements.
- Performance: Performance has been completely ignored in the Data.Char.Unicode.Internal module. There are TODOs at the bottom of the file with particularly bad examples of this. It would be nice to fix all these TODOs. It would also be nice to have proper benchmarks for this library.
- Internal File Generation: The shell script that generates the Data.Char.Unicode.Internal module is somewhat hacky. Internally it’s using awk, so it’s not portable to machines without awk. Ideally, this shell script could be rewritten as a purescript-node program. That way it could be runnable by anyone as long as they have