xml-html-qq

2017-02-17

Recently I had a need to parse HTML documents and extract information from them. The code I wrote was somewhat complicated, so I knew I needed tests.

I wanted to embed HTML Documents directly into the tests. One easy way to do this is to use a Quasiquoter that produces an xml-conduit Document. After searching on Hackage, it appears that there is only one package that provides Quasiquoters for Documents: xml-hamlet.

xml-hamlet is nice. It could do with a little more documentation, but otherwise it is easy to use. However, it has one major drawback: it forces the programmer to write in Hamlet syntax. I felt this was too much overhead for a few simple tests.

In my current freelance project, my coworker Kadzuya Okamoto created a package called heterocephalus. It provides most of Hamlet's functionality, but without requiring the programmer to write in Hamlet syntax.

I decided to use heterocephalus to create a Quasiquoter that produced Documents, just like xml-hamlet, but that let the programmer write in normal HTML/XML syntax. I released it as a package called xml-html-qq.

xml-html-qq Examples

xml-html-qq provides two main Quasiquoters: Text.XML.QQ.xml, and Text.HTML.QQ.html.

xml

Text.XML.QQ.xml produces values of type Either SomeException Document. It allows heterocephalus-style control statements. Here's an example of using it in GHCi:

> :set -XQuasiQuotes
> import Control.Exception (SomeException)
> import Text.XML (Document)
> import Text.XML.QQ (xml)
> [xml|<root><node>hello</node></root>|] :: Either SomeException Document
Right (Document ... )

Here's an example of using variable interpolation, which is one of the control statements available:

> let foo = "hello" :: String
> [xml|<root><node>#{foo}</node></root>|] :: Either SomeException Document
Right (Document ... )

If you try to use improper XML, you will get a Left SomeException. Here is an example of a Document where one tag is not closed:

> [xml|<root><node></root>|] :: Either SomeException Document
Left ...
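
Because xml returns an Either, a test has to handle both outcomes before it can look inside the Document. The following is a minimal sketch of that, not code from the package: rootLocalName, goodDoc and badDoc are names made up for illustration, and the record fields come from xml-conduit's Text.XML.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Control.Exception (SomeException)
import qualified Data.Text as T
import Text.XML (Document (..), Element (..), Name (..))
import Text.XML.QQ (xml)

-- A well-formed document; expected to be a Right.
goodDoc :: Either SomeException Document
goodDoc = [xml|<root><node>hello</node></root>|]

-- The <node> tag is never closed; expected to be a Left.
badDoc :: Either SomeException Document
badDoc = [xml|<root><node></root>|]

-- Hypothetical helper: the local name of the root element,
-- or Nothing when the quasiquoted text failed to parse.
rootLocalName :: Either SomeException Document -> Maybe String
rootLocalName (Left _) = Nothing
rootLocalName (Right doc) =
  Just (T.unpack (nameLocalName (elementName (documentRoot doc))))
```

In a test suite, the Left case would usually be turned into a test failure that shows the exception, rather than being collapsed to Nothing as it is here.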

html

Text.HTML.QQ.html produces values of type Document. It also allows heterocephalus-style control statements. Text.HTML.QQ.html uses a different parser than Text.XML.QQ.xml, so it tolerates documents that are not well-formed. Here's an example of using it in GHCi:

> :set -XQuasiQuotes
> import Text.XML (Document)
> import Text.HTML.QQ (html)
> [html|<html><body><p>hello</p></body></html>|] :: Document
Document ...

Here's an example of using variable interpolation:

> let foo = "hello" :: String
> [html|<html><body><p>#{foo}</p></body></html>|] :: Document
Document ...

Elements with unclosed tags work as expected:

> [html|<html><body><br></body></html>|] :: Document
Document ...
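
Since the original goal was to test code that extracts information from HTML, here is a rough sketch of how an embedded Document might be fed to such code. It uses xml-conduit's Text.XML.Cursor combinators; paragraphTexts and sampleDoc are hypothetical names, and the sketch assumes the html parser yields element names without a namespace.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Data.Text (Text)
import Text.HTML.QQ (html)
import Text.XML (Document)
import Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical extraction function under test: collect the text
-- nodes that are direct children of every <p> element.
paragraphTexts :: Document -> [Text]
paragraphTexts doc = fromDocument doc $// element "p" &/ content

-- Test input embedded directly in the source, including an unclosed <br>.
sampleDoc :: Document
sampleDoc = [html|<html><body><p>hello</p><br><p>world</p></body></html>|]
```

A test would then compare paragraphTexts sampleDoc against the list of strings it expects to find.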

Other Functions

xml-html-qq exports some additional Quasiquoters similar to xml and html. Check the Haddocks for more information.

Future Work

There are two additional features that would be nice to have in this library:

  1. (GitHub Issue) Quasiquoters that produce Elements or [Node]. This would make it much easier to programmatically combine different Elements into one big document.

  2. (GitHub Issue) Functions for creating documents that escape HTML and XML characters.

If you are interested in working on either of these, drop me a line on the GitHub issues.
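
To make the second item a little more concrete, this is roughly the kind of escaping such a function would need to perform. It is only an illustration using base: escapeMarkup is a made-up name and is not part of xml-html-qq, and a real implementation would more likely work on Text.

```haskell
-- Hypothetical helper, not part of xml-html-qq: replace the characters
-- that are special in HTML and XML with their entity references.
escapeMarkup :: String -> String
escapeMarkup = concatMap escapeChar
  where
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]
```

Each character is handled independently in a single pass, so ampersands introduced by earlier replacements are never escaped a second time.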

tags: haskell