Generating Haskell data types for complex XSDs using HaXml
We demonstrate that HaXml can be used to generate fairly complex data bindings in Haskell, though the process is not straightforward.
The goal is to generate Haskell data/newtype bindings for complex XSDs. Manually defining Haskell bindings for XSDs with hundreds of XSD types is not feasible. It has been a decade since I wrote the post Introduction to HaXml. I wanted to revisit the HaXml library and find a way to generate Haskell bindings for a large XSD.
As an example we will use threeDS_V2.2-0.3.xsd (link to older version). An XML for this XSD is the following:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ECommerceConnect xmlns:ns2="http://www.w3.org/2000/09/xmldsig#">
<EMV3DSRequest id="9">
<MerchantID>
123456789012345</MerchantID>
<TerminalID>
X123456</TerminalID>
<InitBRWRequest>
<InitAuthData>
<CardNum>1234512345</CardNum>
<ExpYear>2029</ExpYear>
<ExpMonth>08</ExpMonth>
<TotalAmount>123</TotalAmount>
<Currency>123</Currency>
</InitAuthData>
<eventCallbackUrl>https://localhost/1234567</eventCallbackUrl>
<challengeWindowSize>00</challengeWindowSize>
<skipAutoBrowserInfoCollect>Y</skipAutoBrowserInfoCollect>
</InitBRWRequest>
</EMV3DSRequest>
</ECommerceConnect>
Some XSDs reference other XSDs. It’s nearly impossible to merge/flatten dependent XSDs unless they use the same namespaces. Hence, for generating bindings you will need to have all XSDs that are referenced and used from your XSD.
HaXml has a helper utility for generating bindings called XsdToHaskell (github link).
The utility needs the following dependencies:
polyparse
pretty
HaXml
Ideally, your XSD dependencies should form a directed acyclic graph. It doesn’t matter from which XSD you start generating bindings, but it’s easier to start with XSDs without dependencies and then proceed to the dependent XSDs; otherwise you won’t be able to resolve compilation errors.
In our case we have 2 external XSDs that are referenced, but only one of them (xmldsig-core-schema.xsd) is actually used inside threeDS_V2.2-0.3.xsd, so let’s start with it. After compiling XsdToHaskell, all we need to do is the following:
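To make the "dependencies first" ordering concrete, here is a minimal sketch that sorts schema files with `stronglyConnComp` from the `containers` package. The dependency table is written by hand for the schemas in this post; the file name `xenc-schema.xsd` and the function name `generationOrder` are assumptions made for illustration, not something HaXml provides.

```haskell
module Main (main) where

import Data.Graph (SCC (..), stronglyConnComp)

-- | Hand-written table: (schema file, schema files it references).
-- "xenc-schema.xsd" is an assumed file name, used only for illustration.
schemaDeps :: [(String, [String])]
schemaDeps =
  [ ("threeDS_V2.2-0.3.xsd", ["xmldsig-core-schema.xsd", "xenc-schema.xsd"])
  , ("xmldsig-core-schema.xsd", [])
  , ("xenc-schema.xsd", [])
  ]

-- | Order schemas so that each one comes after the schemas it references.
-- Returns Left with the offending group if the references form a cycle.
generationOrder :: [(String, [String])] -> Either [String] [String]
generationOrder deps = traverse single (stronglyConnComp nodes)
  where
    nodes = [(name, name, refs) | (name, refs) <- deps]
    single (AcyclicSCC name) = Right name
    single (CyclicSCC names) = Left names

main :: IO ()
main =
  case generationOrder schemaDeps of
    Left cycleMembers -> putStrLn ("cyclic references: " ++ unwords cycleMembers)
    Right order -> mapM_ putStrLn order
```

`stronglyConnComp` returns components in reverse topological order, so referenced schemas come out before the schemas that use them. A `Left` result signals the cyclic case, where no clean generation order exists.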
./XsdToHaskell < xmldsig-core-schema.xsd > XmldSig.hs
Once XmldSig.hs has been generated, we need to clean up the generated code. If we open XmldSig.hs, it starts with something like:
Parse Success!
-----------------
Schema {schema_elementFormDefault = ......)]}
-----------------
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module V'
  ( module V'
  ) where
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import ........
Okay, let’s remove the console output and the Schema dump, and give the module a proper name. We get something like this:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module XmldSig where -- added module name
import ........
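The mechanical part of this clean-up could be scripted. The sketch below assumes the generated file always has the shape shown above: console output, then the pragma lines, then a module header that ends on a line whose last word is `where`. The function name `cleanGenerated` is made up for this example, and the filter does not attempt any of the later manual fixes.

```haskell
module Main (main) where

import Data.List (isPrefixOf)

-- | Drop everything before the first pragma line and replace the generated
-- module header with "module <name> where". Assumes the layout shown above.
cleanGenerated :: String -> String -> String
cleanGenerated name raw =
  unlines (pragmas ++ ["module " ++ name ++ " where"] ++ body)
  where
    isPragma = ("{-#" `isPrefixOf`)
    -- skip "Parse Success!" and the Schema dump
    fromPragma = dropWhile (not . isPragma) (lines raw)
    (pragmas, rest) = span isPragma fromPragma
    -- the generated header ends on the line whose last word is "where"
    endsHeader l = not (null (words l)) && last (words l) == "where"
    body = drop 1 (dropWhile (not . endsHeader) rest)

main :: IO ()
main = interact (cleanGenerated "XmldSig")
```

Compiled as a small filter, this could sit between XsdToHaskell and the output file. The module name is hard-coded here only to keep the sketch short.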
We make sure XmldSig.hs compiles, and we move to threeDS_V2.2-0.3.xsd. Let’s generate the bindings for threeDS_V2.2-0.3.xsd:
./XsdToHaskell < threeDS_V2.2-0.3.xsd > ThreeDS.hs
When we open ThreeDS.hs we see the following:
Parse Success!
-----------------
Schema {schema_.......}
-----------------
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module V'
  ( module V'
  ) where
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import Text.XML.HaXml.Schema.Schema as Schema
import Text.XML.HaXml.OneOfN
import qualified Text.XML.HaXml.Schema.PrimitiveTypes as Xs
import Xmldsig'core'schema'xsd as Ds
import Xenc'schema'xsd
There are 2 imports which are not part of HaXml: Xmldsig'core'schema'xsd and Xenc'schema'xsd. The schema keyword in these 2 imports indicates that they are external dependencies which could be needed in the generated bindings. In this case the xenc schema is not used, but the xmldsig XSD is, so we need to add the import of XmldSig.
Clean-up of ThreeDS.hs looks like this:
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# OPTIONS_GHC -fno-warn-duplicate-exports #-}
module ThreeDS where -- added module name
import Text.XML.HaXml.Schema.Schema (SchemaType(..),SimpleType(..),Extension(..),Restricts(..))
import Text.XML.HaXml.Schema.Schema as Schema
import Text.XML.HaXml.OneOfN
import qualified Text.XML.HaXml.Schema.PrimitiveTypes as Xs
import XmldSig as Ds -- added dependency xmldsig
The next step is to import ThreeDS and see if it compiles correctly.
....
import ThreeDS as TDS
....
....
main :: IO ()
....
Unfortunately compilation returns the following error:
parse error on input ‘.’
After looking at the generated code we see an incorrectly generated name:
, eCommerceConnect_Ds.signature :: Maybe Ds.Signature
The fix is easy, we just change it to:
, eCommerceConnect_Ds_signature :: Maybe Ds.Signature
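For files with many such names, the rename could be sketched as a text filter. This is a heuristic, not part of HaXml: it assumes that any identifier-like token starting with a lowercase letter is a field name, so dots inside it are replaced with underscores, while tokens starting with an uppercase letter (qualified names such as `Ds.Signature`) are left alone. The name `fixFieldDots` is made up for this example, and the filter would also rewrite dots in comments or in composition written without spaces.

```haskell
module Main (main) where

import Data.Char (isAlphaNum, isLower)

-- | Characters that may appear inside a (possibly dotted) identifier token.
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c `elem` "_'."

-- | Replace '.' with '_' inside tokens that start with a lowercase letter.
-- Everything else, including whitespace, is copied through unchanged.
fixFieldDots :: String -> String
fixFieldDots [] = []
fixFieldDots s@(c : cs)
  | isIdentChar c =
      let (tok, rest) = span isIdentChar s
       in fixToken tok ++ fixFieldDots rest
  | otherwise = c : fixFieldDots cs

-- | Only lowercase-initial tokens are treated as field names.
fixToken :: String -> String
fixToken tok@(c : _)
  | isLower c = map (\x -> if x == '.' then '_' else x) tok
fixToken tok = tok

main :: IO ()
main = interact fixFieldDots
```

Because the rule is a guess about what the generator emits, the result should still be checked by compiling the module.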
We retry compilation and we get another error:
Not in scope: type constructor or class ‘Xsd.XsdString’
This can be fixed by importing HaXml PrimitiveTypes. Note that importing PrimitiveTypes from HaXml will clash with the Token type generated in ThreeDS.hs, hence the import also hides Token:
import Text.XML.HaXml.Schema.PrimitiveTypes as Xsd hiding (Token)
We try to compile and yet again we get a compilation error:
Not in scope: type constructor or class ‘Ds.Signature’
This is due to ambiguity between the element name and element type attributes in the XSD. We find that Maybe Ds.Signature should have been Maybe Ds.SignatureType, so the fix is simple:
eCommerceConnect_Ds_signature :: Maybe Ds.Signature
changes to:
eCommerceConnect_Ds_signature :: Maybe Ds.SignatureType
Finally, all compilation errors are resolved.
We have generated all the types, but now we need to do something useful with them - read and write XMLs!
HaXml uses the typeclass XmlContent, which instructs the parser how to process XMLs. Luckily, the generated bindings also contain these instructions, and we can plug them into XmlContent. We also need to supply an HTypeable typeclass instance (which I describe in my previous HaXml post), but it won’t be used, as the generated bindings contain all the parsing instructions. In order to read and write our sample XML we need to define the following instances:
instance HTypeable ECommerceConnect where
  toHType x = undefined -- doesn't matter

instance XmlContent ECommerceConnect where
  parseContents = elementECommerceConnect
  toContents = elementToXMLECommerceConnect
The functions elementECommerceConnect and elementToXMLECommerceConnect can be found in the generated module.
Once these are defined we can use the standard HaXml API to read and write XMLs. We use the function readXml to read the sample XML. The Haskell representation looks like this:
ECommerceConnect {eCommerceConnect_choice0 = OneOf3 (EMV3DSRequest {eMV3DSRequest_id = XsdString "9", eMV3DSRequest_merchantID = XsdString "\n 123456789012345\n ", eMV3DSRequest_terminalID = XsdString "\n X123456\n ", eMV3DSRequest_choice2 = OneOf3 (InitBRWRequest {initBRWRequest_initAuthData = InitAuthData {initAuthData_choice0 = OneOf2 (TypeCardNum (Long 1234512345),TypeExpYear (XsdString "2029"),TypeExpMonth (XsdString "08")), initAuthData_totalAmount = Long 123, initAuthData_currency = TypeCurrency (XsdString "123")}, initBRWRequest_eventCallbackUrl = TypeCallbackUrl (AnyURI "https://localhost/1234567"), initBRWRequest_challengeWindowSize = Just (TypeСhallengeWindowSize (XsdString "00")), initBRWRequest_skipAutoBrowserInfoCollect = Just (SkipAutoBrowserInfoCollect (XsdString "Y"))})}), eCommerceConnect_Ds_signature = Nothing}
We can use the HaXml function showXml to write the above Haskell ECommerceConnect value back to XML. We get the following XML back:
"<?xml version='1.0' ?>\n<ECommerceConnect><EMV3DSRequest id="9"><MerchantID>123456789012345</MerchantID><TerminalID>X123456</TerminalID><InitBRWRequest><InitAuthData><CardNum>1234512345</CardNum><ExpYear>2029</ExpYear><ExpMonth>08</ExpMonth><TotalAmount>123</TotalAmount><Currency>123</Currency></InitAuthData><eventCallbackUrl>https://localhost/1234567</eventCallbackUrl><challengeWindowSize>00</challengeWindowSize><skipAutoBrowserInfoCollect>Y</skipAutoBrowserInfoCollect></InitBRWRequest></EMV3DSRequest></ECommerceConnect>"
The complete example looks like this:
module Main (main) where

import ThreeDS as TDS
import qualified Data.Text.IO as TX
import Text.XML.HaXml.XmlContent
import Text.XML.HaXml.XmlContent.Parser
import Data.Text (unpack)

-- HTypeable is required by XmlContent but is never used here
instance HTypeable ECommerceConnect where
  toHType _ = undefined

-- Plug the generated parser and printer into HaXml's standard API
instance XmlContent ECommerceConnect where
  parseContents = elementECommerceConnect
  toContents = elementToXMLECommerceConnect

-- Partial on purpose: fails with a pattern-match error if parsing fails
deserialize :: String -> ECommerceConnect
deserialize s = let (Right ecomm) = readXml s in ecomm

main :: IO ()
main = do
  xml <- TX.readFile "./resources/sample.xml"
  let ecomm = deserialize $ unpack xml
  print ecomm
  print $ showXml False ecomm
Writing the HTypeable and XmlContent instances could be seen as boilerplate, and you can circumvent it by re-implementing readXml and showXml yourself; however, I don’t recommend it. By getting rid of XmlContent you will lose the generality of reading and writing. For ECommerceConnect, reading without XmlContent would be:
...
import Text.XML.HaXml.Parse (xmlParse')
import Text.XML.HaXml.Posn (posInNewCxt)
import Text.XML.HaXml.XmlContent.Parser
...
readXml' :: String -> ECommerceConnect
readXml' xml =
  let (Right (Document _ _ ecc' _)) = xmlParse' "./resource/err" xml
      (Right ecc) = fst $ runParser elementECommerceConnect [CElem ecc' (posInNewCxt "./resource/err" Nothing)]
  in ecc
This is error-prone and relies on implementation details of HaXml.
Full code can be found here
Conclusion
The biggest weakness of generating bindings with HaXml is dependencies between XSDs: the more dependencies, the more manual intervention is needed. There is also no way to automate the process, and you can’t predict what changes will be necessary. If you don’t have a highly interdependent set of XSDs and changes are not frequent, then HaXml can be used for generating Haskell bindings.