Generating Haskell data types for complex XSDs using HaXml

[2025-03-17]

We demonstrate that HaXml can be used to generate fairly complex data bindings in Haskell, though the process is not straightforward.

The goal is to generate Haskell data/newtype bindings for complex XSDs. Manually defining Haskell bindings for an XSD with hundreds of types is not feasible. It has been a decade since I wrote the post Introduction to HaXml, so I wanted to revisit the HaXml library and find a way to generate Haskell bindings for a large XSD.

As an example we will use threeDS_V2.2-0.3.xsd (link to older version). A sample XML document for this XSD looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ECommerceConnect xmlns:ns2="http://www.w3.org/2000/09/xmldsig#">
    <EMV3DSRequest id="9">
        <MerchantID>
            123456789012345
        </MerchantID>
        <TerminalID>
            X123456
        </TerminalID>
        <InitBRWRequest>
            <InitAuthData>
                <CardNum>1234512345</CardNum>
                <ExpYear>2029</ExpYear>
                <ExpMonth>08</ExpMonth>
                <TotalAmount>123</TotalAmount>
                <Currency>123</Currency>
            </InitAuthData>
            <eventCallbackUrl>https://localhost/1234567</eventCallbackUrl>
            <challengeWindowSize>00</challengeWindowSize>
            <skipAutoBrowserInfoCollect>Y</skipAutoBrowserInfoCollect>
        </InitBRWRequest>
    </EMV3DSRequest>
</ECommerceConnect>

Some XSDs reference other XSDs. It’s nearly impossible to merge/flatten dependent XSDs unless they use the same namespaces. Hence, to generate bindings you need every XSD that is referenced and used by your XSD.

HaXml has a helper utility for generating bindings called XsdToHaskell (github link).

The utility needs the following dependencies:

polyparse
pretty
HaXml

Ideally, your XSD dependencies should form a directed acyclic graph. It doesn’t matter from which XSD you start generating bindings, but it’s easier to start with the XSDs that have no dependencies and then proceed to the dependent ones; otherwise you can’t resolve the compilation errors in a module until the modules it imports exist.
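The ordering described above is a topological sort. Here is a minimal sketch of it using Data.Graph from the containers package. The function name generationOrder is hypothetical, and the file name xenc-schema.xsd is an assumption. The dependency list is written by hand here rather than extracted from the XSDs.

```haskell
import Data.Graph (SCC (..), flattenSCC, stronglyConnComp)

-- | Order schema files so that each one comes after the files it imports.
-- 'Left' carries a group of files that import each other in a cycle.
generationOrder :: [(FilePath, [FilePath])] -> Either [FilePath] [FilePath]
generationOrder deps = traverse single (stronglyConnComp nodes)
  where
    -- Each file is its own key; its out-edges point at the files it imports.
    nodes = [(file, file, imports) | (file, imports) <- deps]
    single scc = case scc of
      AcyclicSCC file -> Right file
      _ -> Left (flattenSCC scc) -- any cyclic component
```

Given threeDS_V2.2-0.3.xsd with its two imports, plus the two imported files with no imports of their own, the result is a Right list with threeDS_V2.2-0.3.xsd last. A Left result means the dependencies are not a DAG, and that group of files has to be untangled by hand.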

In our case two external XSDs are referenced, but only one of them (xmldsig-core-schema.xsd) is actually used inside threeDS_V2.2-0.3.xsd, so let’s start with it. After compiling XsdToHaskell, all we need to do is the following:

./XsdToHaskell < xmldsig-core-schema.xsd > XmldSig.hs

Once XmldSig.hs is generated, we need to clean up the generated code. If we open XmldSig.hs, it starts with something like:


Parse Success!

-----------------

Schema {schema_elementFormDefault = ......)]}

-----------------

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module V'
  ( module V'
  ) where
 
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import ........

Okay, let’s remove the console output and the Schema dump, and give the module a proper name. We get something like this:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module XmldSig where -- added module name

import ........
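If you regenerate the file often, the same clean-up could be scripted. The following is only a sketch, with a hypothetical function cleanGenerated. It assumes the output always has the shape shown above: everything before the first line starting with {-# is noise, and the module header runs from the line starting with module to the first line ending in where.

```haskell
import Data.List (isPrefixOf, isSuffixOf)

-- | Drop the console output in front of the first pragma and replace the
-- generated module header with "module <name> where".
cleanGenerated :: String -> String -> String
cleanGenerated moduleName = unlines . renameModule . dropPreamble . lines
  where
    -- Everything before the first pragma line is parser chatter.
    dropPreamble = dropWhile (not . ("{-#" `isPrefixOf`))
    renameModule ls =
      case break ("module " `isPrefixOf`) ls of
        (before, header@(_ : _)) ->
          -- Skip the old header up to and including the "where" line.
          let body = drop 1 (dropWhile (not . endsWithWhere) header)
          in before ++ ("module " ++ moduleName ++ " where") : body
        (before, []) -> before
    endsWithWhere l = "where" `isSuffixOf` trimEnd l
    trimEnd = reverse . dropWhile (== ' ') . reverse
```

Used as a filter, for example interact (cleanGenerated "XmldSig"), it turns the raw output into the cleaned form. It does not touch the imports, which still need a manual review.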

We make sure XmldSig.hs compiles, and we move to threeDS_V2.2-0.3.xsd. Let’s generate the bindings for threeDS_V2.2-0.3.xsd:

./XsdToHaskell < threeDS_V2.2-0.3.xsd > ThreeDS.hs

When we open ThreeDS.hs we see the following:


Parse Success!

-----------------

Schema {schema_.......}


-----------------

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module V'
  ( module V'
  ) where
 
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import Text.XML.HaXml.Schema.Schema as Schema
import Text.XML.HaXml.OneOfN
import qualified Text.XML.HaXml.Schema.PrimitiveTypes as Xs
import Xmldsig'core'schema'xsd as Ds
import Xenc'schema'xsd

There are two imports which are not part of HaXml: Xmldsig'core'schema'xsd and Xenc'schema'xsd. The schema part of these module names indicates that they are external dependencies which may be needed by the generated bindings. In this case the xenc schema is not used, but the xmldsig XSD is, so we need to import the XmldSig module we generated earlier in its place.
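For larger schemas it can help to list such imports mechanically. A minimal sketch follows, with a hypothetical function externalImports. It assumes one import per line and treats every module outside the Text.XML.HaXml hierarchy as external.

```haskell
import Data.List (isPrefixOf)

-- | Module names imported by a generated file that do not belong to HaXml.
externalImports :: String -> [String]
externalImports source =
  [ name
  | line <- lines source
  , ("import" : rest) <- [words line]              -- only import lines
  , name : _ <- [dropWhile (== "qualified") rest]  -- skip the keyword
  , not ("Text.XML.HaXml" `isPrefixOf` name)
  ]
```

Applied to the import list above, it returns Xmldsig'core'schema'xsd and Xenc'schema'xsd. Whether each of them is actually used still has to be checked by hand.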

Clean-up of ThreeDS.hs looks like this:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module ThreeDS where -- added module name
 
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import Text.XML.HaXml.Schema.Schema as Schema
import Text.XML.HaXml.OneOfN
import qualified Text.XML.HaXml.Schema.PrimitiveTypes as Xs
import XmldSig as Ds -- added dependency xmldsig 

The next step is to import ThreeDS and see if it compiles correctly.

....
import ThreeDS as TDS
....
....
main :: IO ()
....

Unfortunately compilation returns the following error:

parse error on input ‘.’

After looking at the generated code we see an incorrectly generated name:

    , eCommerceConnect_Ds.signature :: Maybe Ds.Signature

The fix is easy; we change it to:

    , eCommerceConnect_Ds_signature :: Maybe Ds.Signature   
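If a schema produces many such names, the rename could be applied line by line. This is a sketch under assumptions, with hypothetical function names. It only touches record field declarations, that is, lines starting with "{ " or ", ". It rewrites only the part before " :: ", so the qualified type on the right is left alone.

```haskell
import Data.List (isPrefixOf)

-- | Replace dots with underscores in the label of a record field line,
-- leaving the type after " :: " untouched. Other lines pass through as-is.
fixFieldLabel :: String -> String
fixFieldLabel line
  | isFieldLine = case splitAtTypeSig "" line of
      Just (label, sig) -> map dotToUnderscore label ++ sig
      Nothing -> line
  | otherwise = line
  where
    -- Field lines in a record declaration start with "{ " or ", ".
    isFieldLine = case dropWhile (== ' ') line of
      (c : ' ' : _) -> c == '{' || c == ','
      _ -> False
    dotToUnderscore c = if c == '.' then '_' else c

-- | Split a line just before its first " :: ", if it has one.
splitAtTypeSig :: String -> String -> Maybe (String, String)
splitAtTypeSig _ [] = Nothing
splitAtTypeSig seen rest@(c : cs)
  | " :: " `isPrefixOf` rest = Just (reverse seen, rest)
  | otherwise = splitAtTypeSig (c : seen) cs
```

Mapping it over the lines of the file (unlines . map fixFieldLabel . lines) reproduces the manual fix above. Any other place in the module that mentions the old name is not covered by this sketch.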

We retry compilation and we get another error:

Not in scope: type constructor or class ‘Xsd.XsdString’

This can be fixed by importing HaXml’s PrimitiveTypes module. Note that importing PrimitiveTypes from HaXml will clash with the Token type generated in ThreeDS.hs, hence the import also hides Token.

import Text.XML.HaXml.Schema.PrimitiveTypes as Xsd hiding (Token)

We try to compile and yet again we get a compilation error:

Not in scope: type constructor or class ‘Ds.Signature’

This is due to an ambiguity between the name and type attributes of an element in the XSD: the generator emitted the element name where the type name was needed. Maybe Ds.Signature should have been Maybe Ds.SignatureType, and thus the fix is simple:

eCommerceConnect_Ds_signature :: Maybe Ds.Signature

changes to:

eCommerceConnect_Ds_signature :: Maybe Ds.SignatureType
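A plain search-and-replace for this fix is risky, because Ds.Signature is also a prefix of the correct name Ds.SignatureType. Here is a small sketch of a replacement that only matches whole names. The function replaceName is hypothetical, and its notion of a name character (letters, digits, underscore, prime and dot) is a simplifying assumption.

```haskell
import Data.Char (isAlphaNum)
import Data.List (isPrefixOf)

-- | Replace every whole-name occurrence of 'old' with 'new'. An occurrence
-- counts only if it is neither preceded nor followed by a name character.
replaceName :: String -> String -> String -> String
replaceName old new = go True
  where
    go _ [] = []
    go atBoundary text@(c : cs)
      | atBoundary, old `isPrefixOf` text, endsHere (drop (length old) text) =
          new ++ go False (drop (length old) text)
      | otherwise = c : go (not (isNameChar c)) cs
    -- The match must be followed by the end of input or a non-name character.
    endsHere rest = case rest of
      [] -> True
      (r : _) -> not (isNameChar r)
    isNameChar ch = isAlphaNum ch || ch `elem` "_'."
```

With old set to Ds.Signature and new set to Ds.SignatureType, the field declaration above is rewritten, while an existing Ds.SignatureType is left unchanged.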

Finally, all compilation errors are resolved.

We have generated all the types, but now we need to do something useful with them - read and write XMLs!

HaXml uses the XmlContent typeclass, which tells the parser how to process XML. Luckily, the generated bindings already contain these instructions, and we can plug them into an XmlContent instance. We also need to supply an HTypeable instance (which I describe in my previous HaXml post), but it won’t be used because the generated bindings contain all the parsing instructions. In order to read and write our sample XML we need to define the following instances:

instance HTypeable ECommerceConnect where
  toHType x = undefined -- doesn't matter

instance XmlContent ECommerceConnect where
  parseContents = elementECommerceConnect
  toContents = elementToXMLECommerceConnect

The functions elementECommerceConnect and elementToXMLECommerceConnect can be found in the generated module.

Once these are defined we can use HaXml’s standard API to read and write XML. We use the function readXml to read the sample XML. The Haskell representation looks like this:

ECommerceConnect {eCommerceConnect_choice0 = OneOf3 (EMV3DSRequest {eMV3DSRequest_id = XsdString "9", eMV3DSRequest_merchantID = XsdString "\n            123456789012345\n        ", eMV3DSRequest_terminalID = XsdString "\n            X123456\n        ", eMV3DSRequest_choice2 = OneOf3 (InitBRWRequest {initBRWRequest_initAuthData = InitAuthData {initAuthData_choice0 = OneOf2 (TypeCardNum (Long 1234512345),TypeExpYear (XsdString "2029"),TypeExpMonth (XsdString "08")), initAuthData_totalAmount = Long 123, initAuthData_currency = TypeCurrency (XsdString "123")}, initBRWRequest_eventCallbackUrl = TypeCallbackUrl (AnyURI "https://localhost/1234567"), initBRWRequest_challengeWindowSize = Just (TypeСhallengeWindowSize (XsdString "00")), initBRWRequest_skipAutoBrowserInfoCollect = Just (SkipAutoBrowserInfoCollect (XsdString "Y"))})}), eCommerceConnect_Ds_signature = Nothing}

We can use the HaXml function showXml to write the above ECommerceConnect value back to XML. We get the following XML back:

"<?xml version='1.0' ?>\n<ECommerceConnect><EMV3DSRequest id="9"><MerchantID>123456789012345</MerchantID><TerminalID>X123456</TerminalID><InitBRWRequest><InitAuthData><CardNum>1234512345</CardNum><ExpYear>2029</ExpYear><ExpMonth>08</ExpMonth><TotalAmount>123</TotalAmount><Currency>123</Currency></InitAuthData><eventCallbackUrl>https://localhost/1234567</eventCallbackUrl><challengeWindowSize>00</challengeWindowSize><skipAutoBrowserInfoCollect>Y</skipAutoBrowserInfoCollect></InitBRWRequest></EMV3DSRequest></ECommerceConnect>"

The complete example looks like this:

module Main (main) where

import ThreeDS as TDS
import Data.Text.IO as TX
import Text.XML.HaXml.XmlContent
import Text.XML.HaXml.XmlContent.Parser
import Data.Text (unpack)

instance HTypeable ECommerceConnect where
  toHType x = undefined

instance XmlContent ECommerceConnect where
  parseContents = elementECommerceConnect
  toContents = elementToXMLECommerceConnect

deserialize :: String -> ECommerceConnect
deserialize = either error id . readXml -- abort with HaXml's error message on invalid input

main :: IO ()
main = do
  xml <- TX.readFile "./resources/sample.xml"
  let ecomm = deserialize $ unpack xml
  print ecomm
  print $ showXml False ecomm

Writing the HTypeable and XmlContent instances could be seen as boilerplate. You can circumvent it by re-implementing readXml and showXml yourself, but I don’t recommend it: by getting rid of XmlContent you lose the ability to read and write generically. For ECommerceConnect, reading without XmlContent would look like this:

...
import Text.XML.HaXml.Parse (xmlParse')
import Text.XML.HaXml.Posn (posInNewCxt)
import Text.XML.HaXml.XmlContent.Parser
...

readXml' :: String -> ECommerceConnect
readXml' xml =
   let (Right (Document _ _ ecc' _)) = xmlParse' "./resource/err" xml
       (Right ecc) = fst $ runParser elementECommerceConnect [CElem ecc' (posInNewCxt "./resource/err" Nothing)]
   in  ecc

This is error-prone and relies on implementation details of HaXml.

Full code can be found here.

Conclusion

The biggest weakness of generating bindings with HaXml is dependencies between XSDs: the more dependencies, the more manual intervention is needed. There is also no general way to automate the process, because you can’t predict which changes will be necessary. If your XSDs are not highly interdependent and don’t change frequently, HaXml can be used for generating Haskell bindings.