2017-02-17
Recently I had a need to parse HTML documents and extract information from them. The code I wrote was somewhat complicated, so I knew I needed tests.
I wanted to embed HTML Documents directly into the tests. One easy way to do this is to use a Quasiquoter that produces an xml-conduit Document. After searching on Hackage, I found only one package that provides Quasiquoters for Documents: xml-hamlet.
xml-hamlet is nice. It could do with a little more documentation, but otherwise it is easy to use. However, it has one major drawback: it forces the programmer to write in Hamlet syntax. I felt like this was a little much for just some simple tests.
In my current freelance project, my coworker Kadzuya Okamoto created a package called heterocephalus. It provides most of Hamlet's functionality, but without requiring the programmer to write in Hamlet syntax.
I decided to use heterocephalus to create a Quasiquoter that produces Documents, just like xml-hamlet, but that lets the programmer write in normal HTML/XML syntax. I released it as a package called xml-html-qq.
xml-html-qq Examples
xml-html-qq provides two main Quasiquoters: Text.XML.QQ.xml and Text.HTML.QQ.html.
xml
Text.XML.QQ.xml produces values of type Either SomeException Document. It allows heterocephalus-style control statements. Here's an example of using it in GHCi:
> :set -XQuasiQuotes
> import Control.Exception (SomeException)
> import Text.XML (Document)
> import Text.XML.QQ (xml)
> [xml|<root><node>hello</node></root>|] :: Either SomeException Document
Right (Document ... )
Here's an example of using variable interpolation, which is one of the control statements available:
> let foo = "hello" :: String
> [xml|<root><node>#{foo}</node></root>|] :: Either SomeException Document
Right (Document ... )
If you try to use improper XML, you will get a Left SomeException, for example when the Document contains a tag that is not closed.
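As a minimal sketch of the unclosed-tag case, the module below feeds the xml Quasiquoter a document whose <node> tag is never closed. The names unclosedDoc and describe are hypothetical and exist only for this illustration, and the exact exception message is deliberately not reproduced here because it depends on the underlying parser.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Control.Exception (SomeException)
import Text.XML (Document)
import Text.XML.QQ (xml)

-- <node> is opened but never closed, so the XML parser is expected
-- to reject this input and the result should be a Left.
unclosedDoc :: Either SomeException Document
unclosedDoc = [xml|<root><node>hello</root>|]

-- Hypothetical helper: summarise the result without relying on the
-- exact wording of the exception.
describe :: Either SomeException Document -> String
describe (Left err) = "parse failed: " ++ show err
describe (Right _)  = "parsed successfully"

main :: IO ()
main = putStrLn (describe unclosedDoc)
```

In a test, pattern matching on Left is usually enough, which keeps the test independent of the parser's error text.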
html
Text.HTML.QQ.html produces values of type Document. It also allows heterocephalus-style control statements. Text.HTML.QQ.html uses a different parser than Text.XML.QQ.xml, so it is permissive of documents that aren't perfect. Here's an example of using it in GHCi:
> :set -XQuasiQuotes
> import Text.XML (Document)
> import Text.HTML.QQ (html)
> [html|<html><body><p>hello</p></body></html>|] :: Document
Document ...
Here's an example of using variable interpolation:
> let foo = "hello" :: String
> [html|<html><body><p>#{foo}</p></body></html>|] :: Document
Document ...
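Since the original motivation was embedding HTML fixtures in tests, here is a rough sketch of how that might look. The function paragraphTexts is a hypothetical stand-in for the extraction code under test, and the sketch assumes the cursor API from xml-conduit's Text.XML.Cursor module; it is not taken from the real project.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Data.String (fromString)
import qualified Data.Text as T
import Text.HTML.QQ (html)
import Text.XML (Document)
import Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical function under test: collect the text nodes that are
-- direct children of every <p> element below the document root.
paragraphTexts :: Document -> [T.Text]
paragraphTexts doc =
  fromDocument doc $// element (fromString "p") &/ content

-- The fixture is written as ordinary HTML, right next to the test.
fixture :: Document
fixture = [html|<html><body><p>hello</p></body></html>|]

main :: IO ()
main = print (paragraphTexts fixture)
```

Keeping the fixture inline means the test input and the expectation can be read side by side, without a separate HTML file.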
Elements with unclosed tags work as expected.
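A small sketch of the unclosed-tag case for html follows. The <p> tag is never closed, yet the expression still has type Document rather than an Either. The name lenientDoc is hypothetical, and the exact tree the parser builds around the unclosed <p> is not shown here; the sketch only inspects the root element's name.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import qualified Data.Text as T
import Text.HTML.QQ (html)
import Text.XML (Document, documentRoot, elementName, nameLocalName)

-- <p> is opened but never closed; the permissive HTML parser is
-- expected to accept this and still produce a Document.
lenientDoc :: Document
lenientDoc = [html|<html><body><p>hello</body></html>|]

-- Print the local name of the root element of the parsed document.
main :: IO ()
main = putStrLn (T.unpack (nameLocalName (elementName (documentRoot lenientDoc))))
```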
Other Functions
xml-html-qq exports some additional Quasiquoters similar to xml and html. Check the Haddocks for more information.
Future Work
There are two additional features that would be nice to have in this library:
(GitHub Issue) Quasiquoters that produce Elements or [Node]. This would make it much easier to programmatically combine different Elements into one big document.
(GitHub Issue) Functions for creating documents that escape HTML and XML characters.
If you are interested in working on either of these, drop me a line on the GitHub issues.
tags: haskell