Introduction to HaXml
A quick introduction into the HaXml library, reading data from XML and representing them as Haskell types
I was looking for a HaXml tutorial or code examples on the web but I could not find any description of the basic idea behind the library. There are a few tutorials out there, but their focus is more on the power of combinators than on the library basics. There is also a code fragment in the book real-world haskell but (at the time of writing this post) it is outdated.
The aim of this tutorial is to describe the basic HaXml functionality - reading XML data from a file into custom types and saving the data from these types back to another file.
What we want to achieve is:
main :: IO ()
= do
main <- fReadXml "sample-in.xml" :: IO Students
students print students
"sample-out.xml" students fWriteXml
The sample-in.xml looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Students>
<Student gender="M">
<name>Zachary Popme</name>
<Address>
<street>roma</street>
<number>10</number>
</Address>
</Student>
<Student gender="F">
<name>Daria Marchenko</name>
<Address>
<street>uptown</street>
<number>4</number>
</Address>
</Student>
</Students>
First we need to install the library with:
cabal update
cabal install haxml
We could also work with the XML data directly and not map it to any types. This can be achieved with the ANYContent
type. We could read everything from the XML like this:
main :: IO ()
= do
main <- fReadXml "sample-in.xml" :: IO ANYContent
c print c
Nonetheless, our goal is to map the XML to custom data-types. These types will have 1 to 1 correspondence with the XML data, although this is of course not necessary.
data Students = Students {enrolled :: [Student]} deriving Show
data Student = Student {name :: String, gender :: String, address :: Address} deriving Show
data Address = Address {street :: String, number :: Int} deriving Show
HaXml reads the contents (elements, attributes, comments, declarations etc.) of any XML and wraps it into HaXml data-types. For illustration purposes we can have a look at the Address
element:
CElem (Elem (N "Address") []
[CString False "\n\t\t\t",
CElem (Elem (N "street") [] [CString False "uptown"]),
CString False "\n\t\t\t",
CElem (Elem (N "number") [] [CString False "4"]),
CString False "\n\t\t" ]),
CString False "\n\t" ]),
CString False "\n"
]
We need a way to extract the data from these HaXml types and store them in our types. We can do it by implementing two typeclasses HTypeable
and XmlContent
.
Let’s describe these in turn.
HTypeable
The class is quite simple:
class HTypeable a where
toHType :: a -> HType
HType
adds additional metadata to the type it augments. For example:
instance HTypeable Integer where
= Prim "Integer" "integer" toHType _
The Prim
type holds the XML-element name for a given primitive (“integer” in case of the type Integer
). HaXml takes advantage of metadata in HTypes
when mapping from Haskell types to XML.
We have defined three records - Student
, Students
and Address
. For each record we have to define an instance of HTypeable
. Let’s begin with Address
:
instance HTypeable Address where
Address s n) = Defined "Address" [] [Constr "Address" [] [toHType s, toHType n]] toHType (
The arguments to the Defined
data constructor are as follows:
- name of type,
- a list of
HTypes
of the type variables, - a list of all data constructors for the type.
The Constr
has 3 arguments:
- name of the constructor (will be used as the XML tag-name),
- a list of
HTypes
of the type variables, - list of all arguments to the data constructor (as
HTypes
).
A more detailed description of Constr
can be found in the code.
The other two instances are similar to the first one:
instance HTypeable Students where
Students ss) = Defined "Students" [] [Constr "Students" [] [toHType ss]]
toHType (
instance HTypeable Student where
=
toHType s let Student n g a = s in Defined "Student" [] [Constr "Student" [] [toHType n, toHType g, toHType a]]
XmlContent
Now we can start defining the XmlContent
instances which have two functions - parseContents
(from XML to types) and toContents
(from types to XML).
instance XmlContent Address where
= inElement "Address" (Address <$> parseStreet <*> parseNumber)
parseContents where
= inElement "street" text
parseStreet = read <$> inElement "number" text
parseNumber
@(Address s n) =
toContents a0 $ toHType a) [mkElemC "street" $ toText s, mkElemC "number" $ toText . show $ n]] [mkElemC (showConstr
Let’s first look at toContents
. Its implementation will be (almost) the same in all instances. toContents
is called when a new Address
XML-element should be created. We define the XML-element with mkElemC
. The first argument is the tag-name (we can use the showConstr
function which pulls the name from a HType
). Afterwards we create two sub-elements - “street” and “number”.
parseContents
traverses the HaXml types which wrap the XML data. It uses the HaXml functions to navigate through all the types like CElem
, Elem
, CString
etc. (we saw those in the example above). The inElement
function are basically two functions - element
which moves to the next XML-element and interior
which goes “into” the contents of that element. The text
function extracts the text from the element. Since XmlContent
is a functor (inElement
returns XMLParser a
) we can use the applicative operators as shown above.
The Students
instance is pretty much the same:
instance XmlContent Students where
= inElement "Students" (Students <$> parseContents)
parseContents
@(Students ss) =
toContents v0 $ toHType v) (toContents ss)] [mkElemC (showConstr
For reading and writing attribute values I’ll define helper functions (if there is a better HaXml-way to do it let me know!). The first argument of attrToText
is the name of the attribute, the second is a list of attributes which we go through in order to find an attribute with this name. If no attribute is found, the function returns Nothing
.
mkAttrElemC
creates a CElem
with the appropriate attributes.
attrToText :: String -> [Attributes] -> Maybe String
= foldl1 (<|>) attrText
attrToText n as where attrText = map (fromAttrToStr n) as
mkAttrElemC :: String -> [Attribute] -> [Content ()] -> Content ()
= CElem (Elem (N x) as cs) ()
mkAttrElemC x as cs
instance XmlContent Student where
= do
parseContents <- element ["Student"]
e Student <$> parseName <*> parseGender e <*> parseContents )
interior e (where
= inElement "name" text
parseName = return $ fromMaybe "unknown" $ attrToText "gender" $ attrs e
parseGender e
@(Student n g a) =
toContents v0 $ toHType v) [mkAttr "gender" g] (mkElemC "name" (toText n) : toContents a)] [mkAttrElemC (showConstr
We read the name (parseName
), afterwards we use the HaXml Student
element e
(it’s just an Elem
type) for reading all of its attributes and return the value of “gender” attribute. The last function is parseContents
which will trigger parseContents
in the Address XmlContent
instance.
Generating code with DrIFT
The last point to mention is that we can partially automate generating this boilerplate. This can be achieved with DrIFT.
Install with:
cabal update
cabal install drift
We prepare the template:
data Students = Students {enrolled :: [Student]} deriving Show
data Student = Student {name :: String, gender :: String, address :: Address} deriving Show
data Address = Address {street :: String, number :: Int} deriving Show
{-!for Address derive : XmlContent!-}
{-!for Students derive : XmlContent!-}
{-!for Student derive : XmlContent!-}
Now we can generate all the instances by calling:
DrIFT xmlparser.hs
from the command line. It prints the generated source code to stdout. A full working example can be found here xmlparser.hs