Introduction to HaXml

[2015-08-06]

A quick introduction into the HaXml library, reading data from XML and representing them as Haskell types

I was looking for a HaXml tutorial or code examples on the web but I could not find any description of the basic idea behind the library. There are a few tutorials out there, but their focus is more on the power of combinators than on the library basics. There is also a code fragment in the book real-world haskell but (at the time of writing this post) it is outdated.

The aim of this tutorial is to describe the basic HaXml functionality - reading XML data from a file into custom types and saving the data from these types back to another file.

What we want to achieve is:

main :: IO ()
main = do
    students <- fReadXml "sample-in.xml" :: IO Students
    print students
    fWriteXml "sample-out.xml" students

The sample-in.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Students>
 <Student gender="M">
  <name>Zachary Popme</name>
  <Address>
   <street>roma</street>
   <number>10</number>
  </Address>
 </Student>
 <Student gender="F">
  <name>Daria Marchenko</name>
  <Address>
   <street>uptown</street>
   <number>4</number>
  </Address>
 </Student>
</Students>

First we need to install the library with:

cabal update
cabal install haxml

We could also work with the XML data directly and not map it to any types. This can be achieved with the ANYContent type. We could read everything from the XML like this:

main :: IO ()
main = do
    c <- fReadXml "sample-in.xml" :: IO ANYContent
    print c

Nonetheless, our goal is to map the XML to custom data-types. These types will have 1 to 1 correspondence with the XML data, although this is of course not necessary.

data Students = Students {enrolled :: [Student]} deriving Show
data Student = Student {name :: String, gender :: String, address :: Address} deriving Show
data Address = Address {street :: String, number :: Int} deriving Show

HaXml reads the contents (elements, attributes, comments, declarations etc.) of any XML and wraps it into HaXml data-types. For illustration purposes we can have a look at the Address element:

CElem (Elem (N "Address") []
  [
    CString False "\n\t\t\t",
    CElem (Elem (N "street") [] [CString False "uptown"]),
    CString False "\n\t\t\t",
    CElem (Elem (N "number") [] [CString False "4"]),
    CString False "\n\t\t" ]),
    CString False "\n\t" ]),
    CString False "\n"
  ]

We need a way to extract the data from these HaXml types and store them in our types. We can do it by implementing two typeclasses HTypeable and XmlContent.

Let’s describe these in turn.

HTypeable

The class is quite simple:

class HTypeable a where
toHType :: a -> HType

HType adds additional metadata to the type it augments. For example:

instance HTypeable Integer where
toHType   _    = Prim "Integer" "integer"

The Prim type holds the XML-element name for a given primitive (“integer” in case of the type Integer). HaXml takes advantage of metadata in HTypes when mapping from Haskell types to XML.

We have defined three records - Student, Students and Address. For each record we have to define an instance of HTypeable. Let’s begin with Address:

instance HTypeable Address where
toHType (Address s n) = Defined "Address" [] [Constr "Address" [] [toHType s, toHType n]]

The arguments to the Defined data constructor are as follows:

  • name of type,
  • a list of HTypes of the type variables,
  • a list of all data constructors for the type.

The Constr has 3 arguments:

  • name of the constructor (will be used as the XML tag-name),
  • a list of HTypes of the type variables,
  • list of all arguments to the data constructor (as HTypes).

A more detailed description of Constr can be found in the code.

The other two instances are similar to the first one:

instance HTypeable Students where
toHType (Students ss) = Defined "Students" [] [Constr "Students" [] [toHType ss]]

instance HTypeable Student where
toHType s =
    let Student n g a = s in Defined "Student" [] [Constr "Student" [] [toHType n, toHType g, toHType a]]

XmlContent

Now we can start defining the XmlContent instances which have two functions - parseContents (from XML to types) and toContents (from types to XML).

instance XmlContent Address where
    parseContents = inElement "Address" (Address <$> parseStreet <*> parseNumber)
      where
        parseStreet = inElement "street" text
        parseNumber = read <$> inElement "number" text

    toContents a@(Address s n) =
        [mkElemC (showConstr 0 $ toHType a) [mkElemC "street" $ toText s, mkElemC "number" $ toText . show $ n]]

Let’s first look at toContents. Its implementation will be (almost) the same in all instances. toContents is called when a new Address XML-element should be created. We define the XML-element with mkElemC. The first argument is the tag-name (we can use the showConstr function which pulls the name from a HType). Afterwards we create two sub-elements - “street” and “number”.

parseContents traverses the HaXml types which wrap the XML data. It uses the HaXml functions to navigate through all the types like CElem, Elem, CString etc. (we saw those in the example above). The inElement function are basically two functions - element which moves to the next XML-element and interior which goes “into” the contents of that element. The text function extracts the text from the element. Since XmlContent is a functor (inElement returns XMLParser a) we can use the applicative operators as shown above.

The Students instance is pretty much the same:

instance XmlContent Students where
    parseContents = inElement "Students" (Students <$> parseContents)

    toContents v@(Students ss) =
         [mkElemC (showConstr 0 $ toHType v) (toContents ss)]

For reading and writing attribute values I’ll define helper functions (if there is a better HaXml-way to do it let me know!). The first argument of attrToText is the name of the attribute, the second is a list of attributes which we go through in order to find an attribute with this name. If no attribute is found, the function returns Nothing.

mkAttrElemC creates a CElem with the appropriate attributes.

attrToText :: String -> [Attributes] -> Maybe String
attrToText n as = foldl1 (<|>) attrText
    where attrText = map (fromAttrToStr n) as

mkAttrElemC :: String -> [Attribute] -> [Content ()] -> Content ()
mkAttrElemC x as cs = CElem (Elem (N x) as cs) ()

instance XmlContent Student where
    parseContents = do
      e <- element ["Student"]
      interior e (Student <$> parseName <*> parseGender e <*> parseContents )
        where
          parseName = inElement "name" text
          parseGender e = return $ fromMaybe "unknown" $ attrToText "gender" $ attrs e

    toContents v@(Student n g a) =
        [mkAttrElemC (showConstr 0 $ toHType v) [mkAttr "gender" g]  (mkElemC "name" (toText n) : toContents a)]

We read the name (parseName), afterwards we use the HaXml Student element e (it’s just an Elem type) for reading all of its attributes and return the value of “gender” attribute. The last function is parseContents which will trigger parseContents in the Address XmlContent instance.

Generating code with DrIFT

The last point to mention is that we can partially automate generating this boilerplate. This can be achieved with DrIFT.

Install with:

cabal update
cabal install drift

We prepare the template:


data Students = Students {enrolled :: [Student]} deriving Show
data Student = Student {name :: String, gender :: String, address :: Address} deriving Show
data Address = Address {street :: String, number :: Int} deriving Show

{-!for Address derive : XmlContent!-}       
{-!for Students derive : XmlContent!-}
{-!for Student derive : XmlContent!-}

Now we can generate all the instances by calling:

DrIFT xmlparser.hs

from the command line. It prints the generated source code to stdout. A full working example can be found here xmlparser.hs