Recently I had a need to parse HTML documents and extract information from them. The code I wrote was somewhat complicated, so I knew I needed tests.
I wanted to embed HTML documents directly into the tests. One easy way to do this is to use a Quasiquoter that produces an xml-conduit Document. After searching Hackage, I found only one package that provides Quasiquoters for Documents: xml-hamlet.
xml-hamlet is nice. It could do with a little more documentation, but otherwise it is easy to use. However, it has one major drawback: it forces the programmer to write in Hamlet syntax. That felt like too much for just some simple tests.
In my current freelance project, my coworker Kadzuya Okamoto created a package called heterocephalus. It provides most of Hamlet's functionality, but without requiring the programmer to write in Hamlet syntax.
I decided to use heterocephalus to create a Quasiquoter that produces Documents, just like xml-hamlet, but that lets the programmer write in normal HTML/XML syntax. I released it as a package called xml-html-qq.
Here's an example of using variable interpolation, which is one of the control statements available:
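The snippet below is a minimal sketch of interpolation with Text.XML.QQ.xml, assuming the xml-html-qq, xml-conduit, and text packages are available. The names greeting and doc are made up for illustration, and the #{...} form is heterocephalus's interpolation syntax. The result is treated as an Either SomeException Document.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Control.Exception (SomeException)
import Data.Text (Text)
import Text.XML (Document, documentRoot, elementNodes)
import Text.XML.QQ (xml)

-- Hypothetical value to splice into the document.
greeting :: Text
greeting = "hello world"

-- #{greeting} is replaced with the value of the variable in scope.
doc :: Either SomeException Document
doc = [xml|<html>#{greeting}</html>|]

main :: IO ()
main =
  case doc of
    Left err -> putStrLn ("XML parse error: " ++ show err)
    -- Show the child nodes of the root element.
    Right d -> print (elementNodes (documentRoot d))
```

As far as I can tell, the interpolated value is not escaped, so it should not contain characters like < or & unless that is intended.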
If you try to use improper XML, you will get a Left SomeException. Here is an example of a Document where one tag is not closed:
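A small sketch of the failure case, under the same package assumptions as before. The name badDoc is hypothetical. The point is only that the error shows up as a Left value at runtime, which can be pattern matched on.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Control.Exception (SomeException)
import Text.XML (Document)
import Text.XML.QQ (xml)

-- The <p> tag is opened but never closed, so this is not well-formed XML.
badDoc :: Either SomeException Document
badDoc = [xml|<html><p></html>|]

main :: IO ()
main =
  case badDoc of
    Left err -> putStrLn ("Invalid XML: " ++ show err)
    Right _ -> putStrLn "Parsed successfully"
```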
Text.HTML.QQ.html produces values of type Document. It also allows heterocephalus-style control statements.
Text.HTML.QQ.html uses a different parser than Text.XML.QQ.xml, so it is permissive of documents that aren't perfect. Here's an example of using it in GHCi:
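The following is a rough sketch written as a small module that can be loaded into GHCi with :load, assuming xml-html-qq and xml-conduit are installed. The name doc is illustrative. Note that the result is a plain Document with no Either wrapper.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.HTML.QQ (html)
import Text.XML (Document, documentRoot, elementName, nameLocalName)

-- html yields a Document directly, not an Either.
doc :: Document
doc = [html|<html></html>|]

main :: IO ()
main = print (nameLocalName (elementName (documentRoot doc)))
```

After loading the module, evaluating doc at the GHCi prompt shows the whole Document value.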
Here's an example of using variable interpolation:
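A minimal sketch of interpolation with Text.HTML.QQ.html, assuming the xml-html-qq, xml-conduit, and text packages. The names pageTitle and doc are hypothetical, and #{...} is the heterocephalus interpolation syntax.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Data.Text (Text)
import Text.HTML.QQ (html)
import Text.XML (Document, documentRoot, elementNodes)

-- Hypothetical value to splice into the page.
pageTitle :: Text
pageTitle = "hello world"

-- The variable is interpolated before the HTML is parsed.
doc :: Document
doc = [html|<html><title>#{pageTitle}</title></html>|]

main :: IO ()
main = print (elementNodes (documentRoot doc))
```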
Elements with unclosed tags will work as expected:
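A sketch of the permissive behaviour, assuming xml-html-qq and xml-conduit are installed. The name doc is illustrative. The same unclosed <p> that the XML quasiquoter rejects should be accepted here, and the exact shape of the resulting tree depends on the underlying HTML parser.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.HTML.QQ (html)
import Text.XML (Document, documentRoot)

-- The <p> tag is never closed, but the HTML parser tolerates it.
doc :: Document
doc = [html|<html><p>unclosed paragraph</html>|]

main :: IO ()
main = print (documentRoot doc)
```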
xml-html-qq exports some additional Quasiquoters similar to html. Check the Haddocks for more information.
There are two additional features that would be nice to have in this library:
(GitHub Issue) Functions for creating documents that escape HTML and XML characters.
If you are interested in working on either of these, drop me a line on the GitHub issues.