Friday, March 10, 2006

Delivery of Large XML Messages

XML Destinations with SSIS

The Situation

A possible situation you might encounter while developing integration solutions is the need to delivery a large single XML message. The requirements could be as follows: we will need to build our message from data retrieved from a SQL database view, we will need to deliver this data in a single xml message to multiple clients, the different client messages have unique XSD schemas, the file will be delivered to different file destinations, will be delivered on a 24 hour cycle.

The Solution

We need to start identifying possible solutions for each area of this scenario. First, the different xml outputs suggest the possible use of XSLT to map our canonical data to different xml outputs. The 24 hour job cycle requires some sort of job scheduling; also, the possible parallel job execution will require some sort of environment that can manage large amounts of activity.
So let’s make our SSIS solution meet our requirements.

First, lets create the variables we will use in our process to make this a externally configurable and thus a reusable package.

Next, we need to add our to retrieve our data from our database view. Once we have referenced our view we need to add a and select it to be used as a destination. Within the script component is where we are going to do all of our work. Select all of the columns needed for the XSLT transformation and xml output. Next we need to create some custom code to convert out view results to an xml version.

Once we have made our way to the Script view we need to create a few data members to hold our dataset we are going to populate dynamically in the ProcessInputRow method and one for holding the PropertyInfo of the BufferWrapper.

'Private members
Dim batchCount As Integer = 0
Dim firstRun As Boolean = True
Dim ds As DataSet
Dim pis As System.Reflection.PropertyInfo()
Dim xslt As System.Xml.Xsl.XslCompiledTransform

We need to initialize our dataset when the script object gets initialized.

Public Sub New()
ds = New DataSet("Root")
If ds.Tables.Count = 0 Then
ds.Tables.Add("Inventory")
End If

pis = GetType(Input0Buffer).GetProperties()
End Sub

Next we need to build the rest of our dataset by looping through the propertyInfo object and build the data columns and to cache our XSLT for performance considerations during our processing. These need to go under the Public Sub New() and Public Overrides Sub PreExecute() code areas.

Public Sub New()
...
'Create Data columns
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
ds.Tables(0).Columns.Add(pi.Name)
End If
Next
End Sub

Public Overrides Sub PreExecute()
If firstRun Then
'Remove old file before writing or the file process append
System.IO.File.Delete(Me.Variables.DestinationFile)

'Load the XSLT document
Dim XSLTStream As New IO.StreamReader(Me.Variables.XSLTPath)

'Use an XmlReader and pass in a Stream as the Source
Dim xsltXR As System.Xml.XmlReader = New System.Xml.XmlTextReader(XSLTStream)
xslt = New System.Xml.Xsl.XslCompiledTransform()

'Load the stylesheet.
xslt.Load(xsltXR)
'Close the xslt reader
xsltXR.Close()
firstRun = False
End If
MyBase.PreExecute()
End Sub


If you notice, the firstRun variable and condition. This is to allow for the conditioned code to only run once durning the processing. The reason for the coding residing here instead of all in the New() area is due to the fact the the Me.Variables.XSLTPath is not available at initialization time.

Now that we have the dataset ready to go and the XSLT object loaded let us populate the dataset with the values returned from the OLE data source. The following code has one other addition. It needs to implement processing batch. This is due to the fact that transforming an XML document of any size becomes a huge bottleneck for this solution. Because of this the batch processing gives us the ability to apply the XSLT to controlled sized XML documents.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

'Create data row
Dim dr As DataRow = ds.Tables(0).NewRow()
'Populate data row
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
dr(pi.Name) = pi.GetValue(Row, Nothing)
End If
Next

'Add Row to data table
ds.Tables(0).Rows.Add(dr)

If batchCount > Me.Variables.BatchSize Then
Export()
'Clear rows from datatable
ds.Tables(0).Rows.Clear()
batchCount = 0
End If

batchCount = batchCount + 1
End Sub

As we process row-by-row we will eventually hit the batch size set. When this happens we call an internal method named Export(). The Export() method contains the code that will now retrieve the XML stream and load it into a the XPathDocument.

'Create a new XPathDocument, loading the source from the DataSet
Dim dsSt As System.IO.Stream = New System.IO.MemoryStream()
ds.WriteXml(dsSt)
dsSt.Position = 0

'Load Xpathdocument with dataset stream
Dim xp As New System.Xml.XPath.XPathDocument(dsSt)
Dim xpn As System.Xml.XPath.XPathNavigator = xp.CreateNavigator()

'Finished with dataset stream
dsSt.Close()
dsSt.Dispose()

Now that we have the loaded XPathNavigator we need to apply our XSLT to generate our output xml.

'Create a Memory Stream to hold the transformed xml
Dim mSt As New System.IO.MemoryStream()

'Create the XmlTextWriter to output to append to the file.
Dim writer As New System.Xml.XmlTextWriter(mSt, System.Text.Encoding.UTF8)
mSt.Position = 0

'Transform the file
xslt.Transform(xp, Nothing, writer)

'Reset the stream position
mSt.Position = 0

'Retrieve the transformed XML Document from the Memory Stream
Dim outputXPDoc As New System.Xml.XPath.XPathDocument(mSt)
Dim outputXPNav As Xml.XPath.XPathNavigator = outputXPDoc.CreateNavigator()

'Finished with memeory stream
mSt.Close()
mSt.Dispose()
writer.Close()

We need to be find out if the Xml output document already exists. If it doesn’t we should create it; if it does we need to append our new batch to the old file.

'Check the if the xml document exists
If IO.File.Exists(Me.Variables.DestinationFile) Then
'Read file from disk and load to xmldocument
Dim xDocFromDisk As New System.Xml.XmlDocument()
xDocFromDisk.Load(Me.Variables.DestinationFile)
Dim xNavFromDisk As XPath.XPathNavigator = xDocFromDisk.CreateNavigator()

'Navigate to the destination element
xNavFromDisk.MoveToFollowing(Me.Variables.DestLocalName, "")

'Iterate through the requested items and appending the child elements
Dim iter As Xml.XPath.XPathNodeIterator = outputXPNav.Select(Me.Variables.ChildrenXPath)
While iter.MoveNext()
'Append selected node from source xpath document
xNavFromDisk.AppendChild(iter.Current)
End While

'Return to top of xml document
xNavFromDisk.MoveToRoot()

'Stream the entire XML document to the XmlWriter.
Dim xml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
xNavFromDisk.WriteSubtree(xml)
xml.Close()
Else
'The File Does not exist so use the current xpathdocument
'Stream the entire XML document to the XmlWriter.
Dim outxml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
outputXPNav.WriteSubtree(outxml)
outxml.Close()
End If


Finally we need to process any of the rows in the dataset queue once all rows have been processed.

Public Overrides Sub PostExecute()
'Process all remaining items in dataset
Export()
ds.Dispose()

MyBase.PostExecute()
End Sub

Performance

I have run this package in a situation where four parallel SQL jobs where running the same SSIS package with 4 different destinations. The process retrieved 12,000 rows from a SQL view. These rows where converted and transformed and written to the file system every 3 minutes. Each of packages completed in about 30 seconds with little memory impact. The processor showed high utilization during this time due to the transformation task required for the XML output. Other possible considerations could be taken to increase performance by converting the internal canonical format from a Dataset to an XmlDocument object. There were signs that this change could increase performance slightly.


The Complete Code

' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Reflection
Imports System.Xml
Imports System.Xml.Xsl

Public Class ScriptMain
Inherits UserComponent

'Private members
Dim batchCount As Integer = 0
Dim firstRun As Boolean = True
Dim ds As DataSet
Dim pis As System.Reflection.PropertyInfo()
Dim xslt As System.Xml.Xsl.XslCompiledTransform

Public Sub New()
ds = New DataSet("Root")
If ds.Tables.Count = 0 Then
ds.Tables.Add("Inventory")
End If

pis = GetType(Input0Buffer).GetProperties()

'Create Data columns
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
ds.Tables(0).Columns.Add(pi.Name)
End If
Next

End Sub


Public Overrides Sub PreExecute()
If firstRun Then
System.IO.File.Delete(Me.Variables.DestinationFile)

'Load the XSLT document
Dim XSLTStream As New IO.StreamReader(Me.Variables.XSLTPath)

' Use an XmlReader and pass in a Stream as the Source
Dim xsltXR As System.Xml.XmlReader = New System.Xml.XmlTextReader(XSLTStream)
xslt = New System.Xml.Xsl.XslCompiledTransform()
' Load the stylesheet.
xslt.Load(xsltXR)
'Close the xslt reader
xsltXR.Close()
firstRun = False
End If
MyBase.PreExecute()
End Sub

Public Overrides Sub PostExecute()
'Process all remaining items in dataset
Export()
ds.Dispose()

MyBase.PostExecute()
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
'Create data row
Dim dr As DataRow = ds.Tables(0).NewRow()
'Populate data row
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
dr(pi.Name) = pi.GetValue(Row, Nothing)
End If
Next
'Add Row to data table
ds.Tables(0).Rows.Add(dr)

If Me.batchCount > Me.Variables.BatchSize Then
Export()
'Clear rows from datatable
ds.Tables(0).Rows.Clear()
batchCount = 0
End If

batchCount = batchCount + 1
End Sub

Private Sub Export()

' The following code may be appended to the preceding example
' Create a new XPathDocument, loading the source from the DataSet
Dim dsSt As System.IO.Stream = New System.IO.MemoryStream()
ds.WriteXml(dsSt)
dsSt.Position = 0
'Load Xpathdocument with dataset stream
Dim xp As System.Xml.XPath.XPathDocument = New System.Xml.XPath.XPathDocument(dsSt)
Dim xpn As System.Xml.XPath.XPathNavigator = xp.CreateNavigator()

'Finished with dataset stream
dsSt.Close()
dsSt.Dispose()

'Create a Memory Stream to hold the transformed xml
Dim mSt As New System.IO.MemoryStream()

' Create the XmlTextWriter to output to append to the file.
Dim writer As New System.Xml.XmlTextWriter(mSt, System.Text.Encoding.UTF8)
mSt.Position = 0

' Transform the file
xslt.Transform(xp, Nothing, writer)

'Reset the stream position
mSt.Position = 0

'Retrieve the transformed XML Document from the Memory Stream
Dim outputXPDoc As New System.Xml.XPath.XPathDocument(mSt)
Dim outputXPNav As Xml.XPath.XPathNavigator = outputXPDoc.CreateNavigator()

'Finished with memeory stream
mSt.Close()
mSt.Dispose()
writer.Close()

'Check the if the xml document exists
If IO.File.Exists(Me.Variables.DestinationFile) Then

'Read file from disk and load to xmldocument
Dim xDocFromDisk As New System.Xml.XmlDocument()
xDocFromDisk.Load(Me.Variables.DestinationFile)
Dim xNavFromDisk As XPath.XPathNavigator = xDocFromDisk.CreateNavigator()

'Navigate to the destination element
xNavFromDisk.MoveToFollowing(Me.Variables.DestLocalName, "")

'Iterate through the requested items
Dim iter As Xml.XPath.XPathNodeIterator = outputXPNav.Select(Me.Variables.ChildrenXPath)
While iter.MoveNext()
'Append selected node from source xpath document
xNavFromDisk.AppendChild(iter.Current)
End While

'Return to top of xml document
xNavFromDisk.MoveToRoot()

' Stream the entire XML document to the XmlWriter.
Dim xml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
xNavFromDisk.WriteSubtree(xml)
xml.Close()

Else
'The File Does not exist so use the current xpathdocument
' Stream the entire XML document to the XmlWriter.
Dim outxml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
outputXPNav.WriteSubtree(outxml)
outxml.Close()
End If
End Sub
End Class


Special Thanks to Gsus
http://weblogs.asp.net/gsusx/