Monday, March 12, 2007

MVP Summit '07

Another year has passed and it is time for the MVP summit in Redmond. This year my good friend Jesus Rodriguez will be join Sonu Arora (Program Manager BizTalk Server) on stage to discuss some of the changes to the Adapter Framework Components with .NET 3.0. I am sure I will get a day by day full run down of events from him with all the great new Buzz from the conference. Check back for updates on what new things come out during this week.

Friday, March 10, 2006

Delivery of Large XML Messages

XML Destinations with SSIS

The Situation

A possible situation you might encounter while developing integration solutions is the need to delivery a large single XML message. The requirements could be as follows: we will need to build our message from data retrieved from a SQL database view, we will need to deliver this data in a single xml message to multiple clients, the different client messages have unique XSD schemas, the file will be delivered to different file destinations, will be delivered on a 24 hour cycle.

The Solution

We need to start identifying possible solutions for each area of this scenario. First, the different xml outputs suggest the possible use of XSLT to map our canonical data to different xml outputs. The 24 hour job cycle requires some sort of job scheduling; also, the possible parallel job execution will require some sort of environment that can manage large amounts of activity.
So let’s make our SSIS solution meet our requirements.

First, lets create the variables we will use in our process to make this a externally configurable and thus a reusable package.

Next, we need to add our to retrieve our data from our database view. Once we have referenced our view we need to add a and select it to be used as a destination. Within the script component is where we are going to do all of our work. Select all of the columns needed for the XSLT transformation and xml output. Next we need to create some custom code to convert out view results to an xml version.

Once we have made our way to the Script view we need to create a few data members to hold our dataset we are going to populate dynamically in the ProcessInputRow method and one for holding the PropertyInfo of the BufferWrapper.

'Private members
Dim batchCount As Integer = 0
Dim firstRun As Boolean = True
Dim ds As DataSet
Dim pis As System.Reflection.PropertyInfo()
Dim xslt As System.Xml.Xsl.XslCompiledTransform

We need to initialize our dataset when the script object gets initialized.

Public Sub New()
ds = New DataSet("Root")
If ds.Tables.Count = 0 Then
ds.Tables.Add("Inventory")
End If

pis = GetType(Input0Buffer).GetProperties()
End Sub

Next we need to build the rest of our dataset by looping through the propertyInfo object and build the data columns and to cache our XSLT for performance considerations during our processing. These need to go under the Public Sub New() and Public Overrides Sub PreExecute() code areas.

Public Sub New()
...
'Create Data columns
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
ds.Tables(0).Columns.Add(pi.Name)
End If
Next
End Sub

Public Overrides Sub PreExecute()
If firstRun Then
'Remove old file before writing or the file process append
System.IO.File.Delete(Me.Variables.DestinationFile)

'Load the XSLT document
Dim XSLTStream As New IO.StreamReader(Me.Variables.XSLTPath)

'Use an XmlReader and pass in a Stream as the Source
Dim xsltXR As System.Xml.XmlReader = New System.Xml.XmlTextReader(XSLTStream)
xslt = New System.Xml.Xsl.XslCompiledTransform()

'Load the stylesheet.
xslt.Load(xsltXR)
'Close the xslt reader
xsltXR.Close()
firstRun = False
End If
MyBase.PreExecute()
End Sub


If you notice, the firstRun variable and condition. This is to allow for the conditioned code to only run once durning the processing. The reason for the coding residing here instead of all in the New() area is due to the fact the the Me.Variables.XSLTPath is not available at initialization time.

Now that we have the dataset ready to go and the XSLT object loaded let us populate the dataset with the values returned from the OLE data source. The following code has one other addition. It needs to implement processing batch. This is due to the fact that transforming an XML document of any size becomes a huge bottleneck for this solution. Because of this the batch processing gives us the ability to apply the XSLT to controlled sized XML documents.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

'Create data row
Dim dr As DataRow = ds.Tables(0).NewRow()
'Populate data row
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
dr(pi.Name) = pi.GetValue(Row, Nothing)
End If
Next

'Add Row to data table
ds.Tables(0).Rows.Add(dr)

If batchCount > Me.Variables.BatchSize Then
Export()
'Clear rows from datatable
ds.Tables(0).Rows.Clear()
batchCount = 0
End If

batchCount = batchCount + 1
End Sub

As we process row-by-row we will eventually hit the batch size set. When this happens we call an internal method named Export(). The Export() method contains the code that will now retrieve the XML stream and load it into a the XPathDocument.

'Create a new XPathDocument, loading the source from the DataSet
Dim dsSt As System.IO.Stream = New System.IO.MemoryStream()
ds.WriteXml(dsSt)
dsSt.Position = 0

'Load Xpathdocument with dataset stream
Dim xp As New System.Xml.XPath.XPathDocument(dsSt)
Dim xpn As System.Xml.XPath.XPathNavigator = xp.CreateNavigator()

'Finished with dataset stream
dsSt.Close()
dsSt.Dispose()

Now that we have the loaded XPathNavigator we need to apply our XSLT to generate our output xml.

'Create a Memory Stream to hold the transformed xml
Dim mSt As New System.IO.MemoryStream()

'Create the XmlTextWriter to output to append to the file.
Dim writer As New System.Xml.XmlTextWriter(mSt, System.Text.Encoding.UTF8)
mSt.Position = 0

'Transform the file
xslt.Transform(xp, Nothing, writer)

'Reset the stream position
mSt.Position = 0

'Retrieve the transformed XML Document from the Memory Stream
Dim outputXPDoc As New System.Xml.XPath.XPathDocument(mSt)
Dim outputXPNav As Xml.XPath.XPathNavigator = outputXPDoc.CreateNavigator()

'Finished with memeory stream
mSt.Close()
mSt.Dispose()
writer.Close()

We need to be find out if the Xml output document already exists. If it doesn’t we should create it; if it does we need to append our new batch to the old file.

'Check the if the xml document exists
If IO.File.Exists(Me.Variables.DestinationFile) Then
'Read file from disk and load to xmldocument
Dim xDocFromDisk As New System.Xml.XmlDocument()
xDocFromDisk.Load(Me.Variables.DestinationFile)
Dim xNavFromDisk As XPath.XPathNavigator = xDocFromDisk.CreateNavigator()

'Navigate to the destination element
xNavFromDisk.MoveToFollowing(Me.Variables.DestLocalName, "")

'Iterate through the requested items and appending the child elements
Dim iter As Xml.XPath.XPathNodeIterator = outputXPNav.Select(Me.Variables.ChildrenXPath)
While iter.MoveNext()
'Append selected node from source xpath document
xNavFromDisk.AppendChild(iter.Current)
End While

'Return to top of xml document
xNavFromDisk.MoveToRoot()

'Stream the entire XML document to the XmlWriter.
Dim xml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
xNavFromDisk.WriteSubtree(xml)
xml.Close()
Else
'The File Does not exist so use the current xpathdocument
'Stream the entire XML document to the XmlWriter.
Dim outxml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
outputXPNav.WriteSubtree(outxml)
outxml.Close()
End If


Finally we need to process any of the rows in the dataset queue once all rows have been processed.

Public Overrides Sub PostExecute()
'Process all remaining items in dataset
Export()
ds.Dispose()

MyBase.PostExecute()
End Sub

Performance

I have run this package in a situation where four parallel SQL jobs where running the same SSIS package with 4 different destinations. The process retrieved 12,000 rows from a SQL view. These rows where converted and transformed and written to the file system every 3 minutes. Each of packages completed in about 30 seconds with little memory impact. The processor showed high utilization during this time due to the transformation task required for the XML output. Other possible considerations could be taken to increase performance by converting the internal canonical format from a Dataset to an XmlDocument object. There were signs that this change could increase performance slightly.


The Complete Code

' Microsoft SQL Server Integration Services user script component
' This is your new script component in Microsoft Visual Basic .NET
' ScriptMain is the entrypoint class for script components

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.Reflection
Imports System.Xml
Imports System.Xml.Xsl

Public Class ScriptMain
Inherits UserComponent

'Private members
Dim batchCount As Integer = 0
Dim firstRun As Boolean = True
Dim ds As DataSet
Dim pis As System.Reflection.PropertyInfo()
Dim xslt As System.Xml.Xsl.XslCompiledTransform

Public Sub New()
ds = New DataSet("Root")
If ds.Tables.Count = 0 Then
ds.Tables.Add("Inventory")
End If

pis = GetType(Input0Buffer).GetProperties()

'Create Data columns
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
ds.Tables(0).Columns.Add(pi.Name)
End If
Next

End Sub


Public Overrides Sub PreExecute()
If firstRun Then
System.IO.File.Delete(Me.Variables.DestinationFile)

'Load the XSLT document
Dim XSLTStream As New IO.StreamReader(Me.Variables.XSLTPath)

' Use an XmlReader and pass in a Stream as the Source
Dim xsltXR As System.Xml.XmlReader = New System.Xml.XmlTextReader(XSLTStream)
xslt = New System.Xml.Xsl.XslCompiledTransform()
' Load the stylesheet.
xslt.Load(xsltXR)
'Close the xslt reader
xsltXR.Close()
firstRun = False
End If
MyBase.PreExecute()
End Sub

Public Overrides Sub PostExecute()
'Process all remaining items in dataset
Export()
ds.Dispose()

MyBase.PostExecute()
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
'Create data row
Dim dr As DataRow = ds.Tables(0).NewRow()
'Populate data row
For Each pi As PropertyInfo In pis
If Not pi.Name.Contains("_IsNull") Then
dr(pi.Name) = pi.GetValue(Row, Nothing)
End If
Next
'Add Row to data table
ds.Tables(0).Rows.Add(dr)

If Me.batchCount > Me.Variables.BatchSize Then
Export()
'Clear rows from datatable
ds.Tables(0).Rows.Clear()
batchCount = 0
End If

batchCount = batchCount + 1
End Sub

Private Sub Export()

' The following code may be appended to the preceding example
' Create a new XPathDocument, loading the source from the DataSet
Dim dsSt As System.IO.Stream = New System.IO.MemoryStream()
ds.WriteXml(dsSt)
dsSt.Position = 0
'Load Xpathdocument with dataset stream
Dim xp As System.Xml.XPath.XPathDocument = New System.Xml.XPath.XPathDocument(dsSt)
Dim xpn As System.Xml.XPath.XPathNavigator = xp.CreateNavigator()

'Finished with dataset stream
dsSt.Close()
dsSt.Dispose()

'Create a Memory Stream to hold the transformed xml
Dim mSt As New System.IO.MemoryStream()

' Create the XmlTextWriter to output to append to the file.
Dim writer As New System.Xml.XmlTextWriter(mSt, System.Text.Encoding.UTF8)
mSt.Position = 0

' Transform the file
xslt.Transform(xp, Nothing, writer)

'Reset the stream position
mSt.Position = 0

'Retrieve the transformed XML Document from the Memory Stream
Dim outputXPDoc As New System.Xml.XPath.XPathDocument(mSt)
Dim outputXPNav As Xml.XPath.XPathNavigator = outputXPDoc.CreateNavigator()

'Finished with memeory stream
mSt.Close()
mSt.Dispose()
writer.Close()

'Check the if the xml document exists
If IO.File.Exists(Me.Variables.DestinationFile) Then

'Read file from disk and load to xmldocument
Dim xDocFromDisk As New System.Xml.XmlDocument()
xDocFromDisk.Load(Me.Variables.DestinationFile)
Dim xNavFromDisk As XPath.XPathNavigator = xDocFromDisk.CreateNavigator()

'Navigate to the destination element
xNavFromDisk.MoveToFollowing(Me.Variables.DestLocalName, "")

'Iterate through the requested items
Dim iter As Xml.XPath.XPathNodeIterator = outputXPNav.Select(Me.Variables.ChildrenXPath)
While iter.MoveNext()
'Append selected node from source xpath document
xNavFromDisk.AppendChild(iter.Current)
End While

'Return to top of xml document
xNavFromDisk.MoveToRoot()

' Stream the entire XML document to the XmlWriter.
Dim xml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
xNavFromDisk.WriteSubtree(xml)
xml.Close()

Else
'The File Does not exist so use the current xpathdocument
' Stream the entire XML document to the XmlWriter.
Dim outxml As New XmlTextWriter(Me.Variables.DestinationFile, System.Text.Encoding.UTF8)
outputXPNav.WriteSubtree(outxml)
outxml.Close()
End If
End Sub
End Class


Special Thanks to Gsus
http://weblogs.asp.net/gsusx/

Friday, February 10, 2006

Little Known Bug (BizTalk Envelope Schemas)

There is a little known bug in BizTalk 2004 when it comes to mapping schemas to a destination schema that is marked as an envelope schema.  This issue is only seen when trying to apply the map in a port.  There is always the work around of creating an orchestration to apply the map in.  The situation I was in this week was that we had 10 other projects that contained schemas that needed to be mapped to the envelope schema.  So to solve this issue I created a separate project that contained a copy of the envelope schema, with this copy I removed the envelope schema property and then with this new project I created a referenced from my other 10 projects and then completed my maps and then applied them on the send ports.

Thursday, December 08, 2005

Cubism, Pointillism: A Developers Canvas

Picasso

A short time ago I spent a few days in Barcelona. While there, I visited the Pablo Picasso museum. The work there showed the progress of Picasso's development and how he went from creating simple drawings to creating some wonderful works of art. Picasso is known for his work in the artistic movement knows as Cubism. Cubism's key concept is viewing an object from multiple angles at once. This concept of viewing multiple angles of a subject at once is an approach I have begun to work on when analyzing business modeling and solution architecting. Most people, when looking at Picasso's work, only see a distorted image made up of misshaped shapes. It is true that most people could never see the genius of Picasso's work. They just see the work and know that it is different and not understood as other styles of art. Business modeling and solution architecture is very similar to the approach taken by the cubist artist. You can never look at a process or issue and be able to design a solution in one straight forward look. Cubism instills this idea into its basic concept. Ideas and designs must flow from a multiple angle design approach. An auditory approach has to be taken into consideration if you want to avoid multiple iterations of the development lifecycle. Auditory thinking is associated with sequential thinking and visual thinking is associated with spatial thinking. Whoa!? This method has to be taken to allow for the unordered receiving of solution requirements. Never is there a sequential reception of requirements for a solution. If it is a business solution, a messaging queuing solution or some other crazy idea, it has come up during a conversation between Gsus and me. Spatial thinkers’ and organizers’ traits allow them to model and conceptualize advanced scenarios in their world between their ears. This type of person is your best architect for solutions. This trait needs to be nurtured in those who do not possess these skills to help their ability to view an issue, process, or scenario from multiple angles at one time.

Pointillism, the Drag-and-Drop coding approach

For those of you that love the new path of development studios, the drag and drop code blocks, take note. Microsoft has been making this practice more and more main stream for mission-critical solutions. I just want to know, when did object-oriented program actually go to dragging objects? The advent of Visual Studio and BizTalk's orchestration design and mapper tool has made certain programmers seem like wizards. People can read some basic how-to guide about what an item in a tool box does and then drag it out to the design area of their project. Well as a result of the lack of proper architecting, or more likely the lack of architecting altogether, this ease of bolting together a solution in a visual technique is causing solutions to be developed with detrimental performance in plan for them. This ease of developing a solution that works, in theory, has become a detriment to software platforms available today. So to all of you pointillism.NET designers, pick up a book on some advanced development ideas and apply them to the artwork you are working on.