﻿<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>SSIS bits 'n' bobs</title>
	<updated>2008-08-28T05:53:56Z</updated>
	<id>http://ssisblog.replicationanswers.com/atom.aspx</id>
	<link rel="self" href="http://ssisblog.replicationanswers.com/atom.aspx" />
	<link rel="alternate" href="http://ssisblog.replicationanswers.com" />
	<generator uri="http://app.onlinequickblog.com/" version="2.0">Quick Blog</generator>
	<entry>
		<title>Escaping Double Quotes in Flat Files: SSIS is different to DTS</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2008/01/10/escaping-double-quotes-in-flat-files-ssis-is-different-to-dts.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2008-01-10:6cf81507-5d22-4534-bd5c-15d68f344b04</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="Flat File" />
		<category term="General" />
		<updated>2008-01-10T22:44:14Z</updated>
		<published>2008-01-10T22:40:00Z</published>
		<content type="html"><![CDATA[Just a heads-up as I noticed today that the import behaviour of escaped double-quotes in a string is different in DTS and SSIS. <BR><BR>If we have the following single entry in a text file:<BR><BR><STRONG><FONT size=3>"xxx""999""xxx"<BR></FONT></STRONG><BR>Using a <U>Transform Data task in DTS</U>, the resulting database entry will be:<BR><BR><STRONG><FONT size=3>xxx"999"xxx</FONT></STRONG><BR><BR>Using a <U>Data Flow task in SSIS </U>we will get:<BR><BR><STRONG><FONT size=3>xxx""999""xxx</FONT></STRONG><BR><BR>In both packages the double quote is defined as a text delimiter and apart from that there don't seem to be any relevant configuration possibilities to finetune the behaviour.<BR><BR>It seems DTS interpretets this as an intentionally escaped double-quote, and removes the extra characters automatically. SSIS on the other hand assumes that this is a valid string entry.<BR><BR>I'm currently converting DTS packages to SSIS at the moment and as the import behaviour has changed in this way, my solution is to run an immediate update (ExecuteSQL task) on the destination data and replace<BR>adjacent double quotes with a single one - not nice. I could alternatively strip them out in the data flow, using a derived column transformation - also not nice. Hopefully this'll be exposed as a property of the Flat File connection manager some time.<BR>]]></content>
	</entry>
	<entry>
		<title>When to Not use SSIS - directory / file iterations!!!!</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2008/01/04/when-to-not-use-ssis--directory--file-iterations.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2008-01-04:09d760e4-cb7a-47ee-aff1-d1a071ebc03b</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="General" />
		<updated>2008-01-04T19:04:57Z</updated>
		<published>2008-01-04T18:51:00Z</published>
		<content type="html"><![CDATA[I've become a bit too much of an enthusiast about using SSIS for everything recently and I came unstuck with a simple file iteration problem........<BR><BR>I simply wanted to iterate through a set of network directories and do some file copying. Basically I needed to sync up the UAT and live directories of source files, so that the testing in the UAT environment could be a true parallel test of live. So, if the process came across a recent file in a live directory that wasn't in the corresponding test<BR>subdirectory, it should be copied over. <BR><BR>I agree that this sounds like a straightforward issue. It's the sort of thing I can easily do in VBScript or VB6.0 but I'm trying to do everything in SSIS at the moment to centralise the code for maintenance, so I decided to do it that way. <BR><BR>In the designer we see that there is a task which seems custom-made for this type of problem - "Foreach Loops based on File Enumerator" so away I went and created the package below:<BR><BR><IMG src="http://images.quickblogcast.com/105183-98006/SSISLooping.JPG" width=363 border=0><BR><BR>In the SSIS package above, the outer Foreach_FeedDirectory loop iterates through a recordset of directories and stores the directory path in a local variable. The inner ForeachFeedSourceFile loop iterates over the files in the specific directory. <BR><BR>Here's where you begin to come unstuck - all the usual file attributes you'd expect to see are simply not exposed to the Foreach File Enumerator. In the SSIS loop we can get the filename with or without the full path / extension and that's about it - no "Created Date", no "Last Modified Date", no "Size" etc etc. <EM><STRONG>To get any of these useful attributes you have resort to scripting!<BR></STRONG></EM><BR>So in the package shown above, there is an inner script task which uses System.IO.FileInfo, System.IO.File.Exists, System.IO.File.Copy etc to decide whether to do the copy or not and applies to the file name found in the inner Foreach File Enumerator.<BR><BR>Let's recap: I am looping through each directory. Once I find a diectory I loop through the files. Once I find a file I use the FileInfo method to get the file info in script, then copy over if it is recent and if it doesn't exist in the destination directories.<BR><BR>On my 33 directories, some of which contain 1000s of files, this takes approx&nbsp; <FONT size=3><STRONG>1.5 to 2 hours to run in SSIS <IMG src="http://ssisblog.replicationanswers.com/emoticons/sad.png" border=0></STRONG></FONT><BR><BR>As this is a process I need to run often, this is a complete disaster <IMG src="http://ssisblog.replicationanswers.com/emoticons/sad.png" border=0>. I can see that the iteration over the files is the bottleneck and this gives precious little useful info in the task, so as a trial, I start thinking of c# as a replacement. Now I'm definitely not a c# developer (and never will be <IMG src="http://ssisblog.replicationanswers.com/emoticons/smile.png" border=0>), but I managed to knock together the console application below to replace the entire package in an hour or so using my pal Google. Amazingly it <STRONG><FONT size=3>c#</FONT> <FONT size=3>does the same task but takes just 70secs!!!!!!</FONT></STRONG><BR><BR>So - if you're just doing iterative directory / file manipulation and have to incorporate scripting, I'd definitely recommend resorting to c# rather than SSIS.<BR><BR><U><STRONG><FONT size=3>C# code:<BR></FONT></STRONG></U><BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; private static void CheckDirectory(String sLiveDirectory, String sTestDirectory)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Console.WriteLine("Processing: " + sLiveDirectory);<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; DirectoryInfo dirCustom = new DirectoryInfo(sLiveDirectory);<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FileInfo[] filCustom;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; filCustom = dirCustom.GetFiles();<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; foreach (FileInfo filFile in filCustom)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TimeSpan ts = DateTime.Now - filFile.CreationTime;<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (ts.Days &lt; 2)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (!File.Exists(sTestDirectory + filFile.Name))<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; File.Copy(filFile.FullName, sTestDirectory + filFile.Name);<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Console.WriteLine("Copying: " + filFile.FullName);<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<BR>]]></content>
	</entry>
	<entry>
		<title>Simple Method for Importing Ragged files / Files with Different Row Types in SSIS / Files with a problematic final row</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2008/01/02/simple-method-for-importing-ragged-files--files-with-different-row-types-in-ssis--files-with-a-problematic-final-row.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2008-01-02:60316419-efec-47c1-874f-f49d28c87102</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="Flat File" />
		<category term="General" />
		<updated>2008-01-03T20:00:15Z</updated>
		<published>2008-01-02T19:35:00Z</published>
		<content type="html"><![CDATA[<P>&nbsp;</P>
<P><FONT face=Arial size=2>There may come a time when you are not in control of the input file format and it is decidely non-standard. I recently had to process such a feed file which had a metadata row at the start and the end of the file. The file could be represented something like this, with only the x,y,z records being of importance (along with the column headers if we can get this info).</FONT></P>
<P><FONT face=Arial size=2>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=0 border=1>
<TBODY>
<TR>
<TD bgColor=#c0c0c0><FONT size=2>Col1</FONT>&nbsp;</TD>
<TD bgColor=#c0c0c0>&nbsp;<FONT size=2>Col2</FONT></TD>
<TD bgColor=#c0c0c0>&nbsp;<FONT size=2>Col3</FONT></TD></TR>
<TR>
<TD bgColor=#ff0000>&nbsp;<FONT size=2>xxx start</FONT></TD>
<TD bgColor=#ff0000>&nbsp;</TD>
<TD bgColor=#ff0000>&nbsp;</TD></TR>
<TR>
<TD>&nbsp;x</TD>
<TD>&nbsp;y</TD>
<TD>&nbsp;z</TD></TR>
<TR>
<TD>&nbsp;x</TD>
<TD>&nbsp;y</TD>
<TD>&nbsp;z</TD></TR>
<TR>
<TD>&nbsp;x</TD>
<TD>&nbsp;y</TD>
<TD>&nbsp;z</TD></TR>
<TR>
<TD bgColor=#ff0000>&nbsp;<FONT size=2>xxx end</FONT></TD>
<TD bgColor=#ff0000>&nbsp;</TD>
<TD bgColor=#ff0000>&nbsp;</TD></TR></TBODY></TABLE><BR>The problem is that for the red rows there is only one column: "xxx start" and no comma separators, while the second row would be "x,y,z".<BR><BR></FONT><FONT face=Arial size=2>This is not an easy problem to describe but I tried various keyword combinations on Googling to see how others were coping with this issue.<BR><BR><FONT face=Arial size=2>Some people advise using the script task: </FONT><A href="http://www.sql-server-performance.com/article_print.aspx?id=1056&amp;type=art"><U><FONT face=Arial color=#0000ff size=2>http://www.sql-server-performance.com/article_print.aspx?id=1056&amp;type=art</FONT></U></A> <BR><FONT face=Arial size=2>Some use the Conditional Split: </FONT><A href="http://www.sqlis.com/54.aspx"><U><FONT face=Arial color=#0000ff size=2>http://www.sqlis.com/54.aspx</FONT></U></A> <BR><FONT face=Arial size=2>I've also seen packages which call vbscript code to open the text file, read all the lines and then write them back into the text file minus the problemmatic rows.</FONT></P>
<P><FONT face=Arial size=2>All of these are interesting technical solutions, however they weren't suitable for a lazy guy like me <IMG src="http://ssisblog.replicationanswers.com/emoticons/smile.png" border=0></FONT> </P><FONT face=Arial size=2>
<OL>
<LI><FONT face=Arial size=2>I didn't want to write any code</FONT> </LI>
<LI><FONT face=Arial size=2>in my case there are dozens of columns, so I wanted the designer to define the columns for me by looking at the text file, rather than defining them one-by-one by hand</LI>
<LI></FONT><FONT face=Arial size=2>finally </FONT></FONT><FONT face=Arial size=2>I didn't want to use script outside the package which would make maintenance more difficult.</FONT> </LI></OL>
<P><FONT face=Arial size=2>In short, my aim was to have the entire process encapsulated in a single package with no calls being made to outside processes, and have it as simple as possible to build and maintain.</FONT></P>
<P><FONT face=Arial size=2>So, how to do this?</FONT> </P>
<P><FONT face=Arial size=2>The way I do it now is quite simple (below):</FONT> </P>
<P><FONT face=Arial color=#000000 size=2></FONT>&nbsp;<IMG src="http://images.quickblogcast.com/105183-98006/Ragged.JPG" width=315 border=0></P>
<P><FONT face=Arial size=2>Initially I import the entire file into a staging table. The Flat File source is defined as "Ragged right" and has one column which is 8000 chars long. This is imported into a staging table which has one column - defined as varchar(8000). This way the entire file is always imported successfully, along with the 2 problem rows.</FONT></P>
<P><FONT face=Arial size=2>Next we remove the extra rows. In my case this is a simple delete statement as the length of the problem rows is constant and is significantly shorter than a data row. You'll need some way of distinguish this row(s) from the others and if you're lucky it'll be as simple as my case. If not, you might have to filter using the column delimiters or use some other similar logic.</FONT></P>
<P><FONT face=Arial size=2>Next I export to a temporary file. This is a Flat File defined as "Ragged right", again with one column. The staging table above is the source and data is exported from it to the staging Flat File.</FONT></P>
<P><FONT face=Arial size=2>Finally, this staging file is treated as a standard Flat File and imported as per usual in another data flow task. This means that the column names can be determined from the source file by the designer.</FONT></P>
<P><FONT face=Arial size=2>It's all really simple and transparent. If / when a recordset source can be used, we can do away with the persisted staging Flat File to make it even neater in the future.</FONT></P></FONT>]]></content>
	</entry>
	<entry>
		<title>Thorough Package Logging - Some Lessons Learned and Recommendations</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/28/thorough-package-logging--some-lessons-learned-and-recommendations.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-28:5003e2d4-f891-4719-89e8-9f3a7c5013f5</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="Logging" />
		<category term="General" />
		<updated>2007-11-28T09:16:24Z</updated>
		<published>2007-11-28T09:08:00Z</published>
		<content type="html"><![CDATA[<DIV>
<P><FONT face=Arial size=2>At some point you'll need to decide on a logging methodology, and this is my 2ps worth: </FONT></P>
<P><FONT face=Arial size=2>(1) As a starting point I'd say take a look at Jamie's excellent article on event-handler logging: </FONT><A href="http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx"><U><FONT face=Arial color=#0000ff size=2>http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx</FONT></U></A><FONT face=Arial size=2>. Initially I used the same setup and we standardised all packages on it but with hindsight I'd now recommend to do things slightly differently. The main difference would be to not use property expressions to do the insert to the log table but instead<U><B> for event-handling, call a stored procedure</B></U><B></B></FONT><B><FONT face=Arial>.</FONT></B><FONT face=Arial size=2> Why? Because the property expression will become completely illegible when you try to escape single and double quotes - both of which might appear in a server-generated error message. <BR>We currently just log the OnError and OnPostExecute events, but I'm looking again to see if any of the others would be particularly useful. Finally I'd say that this goes hand-in-hand with using a decent naming convention (Jamie has some standards in his blog). The reason being that the events will log the task name so if they are all called "Data Flow", "Data Flow 1", "Data Flow 2" etc it's just a complete pain when debugging!</FONT></P>
<P><FONT face=Arial size=2>(2) As well as event logging to a central table, </FONT><U><FONT face=Arial size=2><STRONG>do some 'simple workflow&nbsp;message logging</STRONG></FONT></U><FONT face=Arial size=2><STRONG>' </STRONG>to the same table</FONT><FONT face=Arial size=2>. In my package template I have 2 tasks which sandwich the rest. This makes reports nice and easy.</FONT></P>
<P><FONT face=Arial color=#000000 size=2></FONT><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/MainLogginsTasks.JPG" width=285 border=0>&nbsp;</P>
<P><FONT face=Arial size=2>(3) <STRONG><U>Make sure</U></STRONG></FONT><STRONG><U> <FONT face=Arial size=2>each end&nbsp;task in the control flow is a logging message task</FONT></U></STRONG><FONT face=Arial size=2>. In this task use a message that is either constant, or which begins consistently. That way you know that a package always has a start and end message pair or a start and an error message pair. This makes reporting really simple.</FONT></P>
<P><FONT face=Arial size=2>(3)</FONT><U></U><U><B> <FONT face=Arial size=2>Add another parameter to the logging table - "PackageSessionGUID"</FONT></B></U><B></B><FONT face=Arial size=2>. Have an intial task which generates a guid (use newid() against a SQL OLEDB connection) and which then saves the value to a package variable. Use this variable in all subsequent logging. Why? Because this allows you to <EM>distinguish between parallel executes </EM>of a package.</FONT></P>
<P><FONT face=Arial size=2>(4) Also</FONT> <FONT face=Arial size=2><U><STRONG>log to a text file</STRONG></U></FONT><FONT face=Arial size=2>. This can be really useful when the problem lies with the normal logging setup. EG what happens if the connection to the database which is used as a logging repository is not correct? Normally you'd not get any info because you rely on this to write the log entries, but the text file works as a kind of backup log.</FONT></P>
<P><FONT face=Arial size=2>(5) Also</FONT><U></U><U><B> <FONT face=Arial size=2>log the output of the job itself</FONT></B></U><B></B><FONT face=Arial size=2>, assuming the package is scheduled.</FONT><BR><FONT face=Arial size=2>This will help debug any other problems associated with the package and it's batch file.</FONT> </P></DIV>]]></content>
	</entry>
	<entry>
		<title>Retain NULL values vs Keep NULLs - which to use?</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/24/retain-null-values-vs-keep-nulls--which-to-use.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-24:4016c176-5da6-4bb8-93a3-630444bdd955</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="General" />
		<updated>2007-11-24T12:02:34Z</updated>
		<published>2007-11-24T11:32:00Z</published>
		<content type="html"><![CDATA[<DIV><BR>There is some confusion as to what the various NULL settings all do in SSIS. In fact in my team we had created 15 packages before realising the full implications of the various default settings. Anyway, hopefully this&nbsp;blog will help to&nbsp;clarify a bit......<BR><BR>Suppose you are doing a bog-standard import of a text file like the one below. We have a data flow task with a flat-file source going to a OLEDB destination. The flat file contains 3 columns: MyDate, MyInt and MyText. In the file there are 2 rows - one containing normal values and the other one just containing blank column values. So, what happens in the case of a blank value?<BR><BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/SimpleDataFlow.JPG" width=192 border=0><BR><BR>By default, what will happen is shown in the first dataset below (2nd row). A blank date will become "1st Jan 1753", an integer will become&nbsp;"0" and text will become "". Is this what you want? Usually not - especially if "0" is also a legitimate explicit numerical value in the text file!<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/RetainNullsDataset1.JPG" width=262 border=0><BR><BR><BR>So, we can designate that the null values from the flat file (really an absence of a value) will become a null in the resulting OLEDB destination. There is a little <STRONG><U><FONT size=3>checkbox on the flat file source</FONT> </U></STRONG>which controls this:<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/RetainNullsFlatFile.JPG" width=341 border=0><BR><BR>The imported dataset will now look like this:<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/RetainNullsDataset2.JPG" width=262 border=0><BR><BR>So far so good. However there is another similar setting available here. This is a <STRONG><U><FONT size=3>checkbox on the OLEDB destination</FONT></U></STRONG> labelled as "Keep NULLs" - what is that for? Basically this controls what happens when you have defaults on the destination table. If there is a default constraint on each column - eg the date should be today's date, the number should be -1 and the text "n/a" - do we want these default values to be overwritten by NULL values or to simply apply their defaults?<BR><BR>The option is shown below. In this particluar case the defaults won't get applied and will be overwritten by NULLs from the source file.<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/KeepNullsOLEDBDest.JPG" width=355 border=0><BR><BR>If it was unchecked, and the defaults mentioed above existed, we would have the values below.<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/KeepNullsOffOLEDBDest.JPG" width=259 border=0><BR><BR>Hopefully this makes these 2 setting and the various permutations s a little clearer <img src="http://ssisblog.replicationanswers.com/emoticons/smile.png" border="0" /><BR><BR></DIV>]]></content>
	</entry>
	<entry>
		<title>3 Tips when using (SQL) Indirect Configurations</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/15/3-tips-when-using-sql-indirect-configurations.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-15:7e839bf0-ff4f-4bc2-9c4b-21992e36c54e</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="Configurations" />
		<category term="General" />
		<updated>2007-11-15T22:40:08Z</updated>
		<published>2007-11-15T22:23:00Z</published>
		<content type="html"><![CDATA[<DIV>Plenty of info on setting up indirect configurations in BOL and in other sites so I won't bother posting up info on how to get it all going.<BR><BR>In my case I like to hold the configuration info in SQL Server rather than XML files, so it is set up centrally in one table. I'll do a more detailed blog on this some other time but basically there is a separate configurations database in Dev, Test and Live which holds relevant connection info, so a package will automatically configure itself when placed on the deployment server - the so called "<A href="http://www.sqlmag.com/Articles/ArticleID/47688/pg/2/2.html" target=_blank>keeping packages in the dark</A>" method.<BR><BR>So, what else to look out for?<BR><BR>(1) <U><STRONG>Make sure there is a unique index on the "PackagePath" column</STRONG></U>. <BR>There isn't one in the table which is automatically created by the design environment. Perhaps it should be a composite constraint on "ConfigurationFilter, PackagePath and ConfiguredValueType" but in all likelihood the "ConfigurationFilter and ConfiguredValueType" will be constant anyway - it depends on how you use the table. Anyway, you'll need this unique constraint because sooner or later someone will add a duplicate PackagePath with a different ConfiguredValue with interesting consequences!<BR><BR>(2) <U><STRONG>Once it's set up, create a rollback trigger on this table</STRONG>.</U> <BR>Future entries can then only be entered by script with an initial explicit disabling of the trigger. Why? Because in some cases the configuration values can be written from a package into the table directly from the designer<BR>without warning and the trigger will prevent this occurring. It also acts as an extra security on the table which will be the most crucial table for our packages.<BR><BR>(3) <STRONG><U>Don't store connection strings directly in the configuration file</U></STRONG>. <BR><BR>ie never have:<BR><BR><STRONG>\Package.Connections[ServerName_DatabaseName].Properties[ConnectionString]<BR></STRONG><BR>But instead have the connection string stored in a package variable and have the variable's value stored in the configuration file:<BR><BR><STRONG>\Package.Variables[User::s_DatabaseName_ConnectionString].Properties[Value] <BR></STRONG><BR>Why jump through an extra hoop and do it this way? <BR><BR>Well if the PackagePath maps directly to a connection manager which exists in Package A but not in Package B, Package B will error when running. However if the PackagePath mapping to the variable is missing, you'll just get a warning but no error so the package will run ok. <BR><BR>The one exception to this is the connectionstrings to the Configurations Database which are hardcoded for each environment (more on this in a blog on logging some other time).<BR><BR>I'm sure I came across other problems with using the SQL table, but I didn't note them down. If anything else comes back to me I'll pots it up in a comment here.<BR></DIV>]]></content>
	</entry>
	<entry>
		<title>Coping with No Column Names in the OLEDB Data Source Editor</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/13/coping-with-no-column-names-in-the-oledb-data-source-editor.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-13:0bcdf0af-7491-4386-ae79-6cc72a388e35</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="General" />
		<updated>2007-11-15T22:38:59Z</updated>
		<published>2007-11-13T20:59:00Z</published>
		<content type="html"><![CDATA[<DIV>Sooner or later you'll create a DataFlow and come across this type of dialog box for the OLE DB source:<BR><BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/NoMetaDataOLEDBSourceNew.JPG" width=539 border=0><BR><BR><BR>No source columns listed and noting to map -&nbsp;great <IMG src="http://ssisblog.replicationanswers.com/emoticons/sad.png" border=0><BR>&nbsp;<BR>So, why does this problem occur, when all previous attempts have worked? Well the most likely case is that you are selecting from a temporary table as the last statement in the source stored procedure, and in this case there is no metadata returned to the designer. Just a simple proc like the one below will cause this problem:<BR><BR>CREATE PROC myproc AS</DIV>
<DIV>SELECT TOP 10 organisation_name INTO #MyOrganisations FROM dbo.Organisation</DIV>
<DIV>SELECT * FROM #MyOrganisations<BR><BR>So, what can we do? There are 2 main possibilities (there is also a third one that I'll post up some other time).<BR><BR><FONT size=3>(1)</FONT> One possibility is to <STRONG><U>change the SQL Command Text </U></STRONG>as follows and add the italicised text:<BR><BR><EM>SET FMTONLY OFF<BR>SET NOCOUNT ON</EM><BR>exec myproc<BR><BR>This forces the parser to actually run the query to return the metadata and reverse the SET FMTONLY ON&nbsp;that it normally uses. This works, and you'll see the column names as per normal. However there is a serious downside that seems to be missed when this solution is advised.&nbsp;My colleague&nbsp;John Gillies added an auditing step to the proc "myproc" to test this and found that the proc itself was <STRONG><EM><FONT size=2>executed 5 times when running the package</FONT>!!!</EM></STRONG> This could seriously slow things down if the initial stored proc itself is slow anyway.<BR><BR><FONT size=3>(2)</FONT> <STRONG><U>Use a Table Variable instead<BR></U></STRONG><BR>CREATE PROC myproc AS</DIV>
<DIV>declare @mytable table (organisation_name varchar(100))</DIV>
<DIV>INSERT INTO @mytable(organisation_name)<BR>SELECT TOP 10 organisation_name FROM dbo.Organisation</DIV>
<DIV>SELECT * FROM @mytable<BR><BR>You'll now need to set the SQL Command Text to be:<BR><BR><EM>SET NOCOUNT ON<BR></EM>exec myproc<BR><BR>Otherwise you'll get the dreaded "A rowset based on the SQL command was not returned by the OLE DB provider" error. <BR><BR>Using this technique will make sure that the proc only <FONT size=2><STRONG><EM>runs once only!</EM></STRONG></FONT> OK - you can't put nonclustered indexes on a table variable and all the other restrictions, so if really needed you can use option (1) but in most cases I'd reason that option (2) is the preferable solution.<BR><BR></DIV>]]></content>
	</entry>
	<entry>
		<title>But I only wanted one log file!</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/12/but-i-only-wanted-one-log-file.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-12:4265891d-315d-46eb-8f0f-af2f14e9af77</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="Logging" />
		<category term="Configurations" />
		<updated>2007-11-13T21:44:40Z</updated>
		<published>2007-11-12T20:43:00Z</published>
		<content type="html"><![CDATA[<DIV>Suppose you are using a Log File&nbsp;into which you&nbsp;do your logging. Setting it up is straightforward, and you want it to go to "<STRONG>C:\SSIS\Logs\PackageName.log</STRONG>" during testing. <BR><BR>Later on you decide to go for a more sophisticated version and use a configuration file to determine where the log will be placed. This makes sense, as your deployment server might not have the "<STRONG>C:\SSIS\Logs\PackageName.log</STRONG>" directory, and perhaps you set up the configuration file to have the log's connection string set as&nbsp;"<STRONG>D:\SSIS\Logs\PackageName.log</STRONG>".<BR><BR>When you run the package, where will your log file be found?<BR><BR>You'd assume that it'll be "<STRONG>D:\SSIS\Logs</STRONG>" because the <STRONG><EM>real </EM></STRONG>log file location will be read from the configuration file - yes?<BR><BR>Actually what happens is that you'll have 2 log files - one on "<STRONG>C:\SSIS\Logs</STRONG>" and one on "<STRONG>D:\SSIS\Logs</STRONG>"! The first will be quite small, while the second contains most of the info.<BR><BR>At first I was surprised to see this after a colleague (Reno Messina) pointed it out. Then I decided it made sense. The first log file ("<STRONG>C:\SSIS\Logs</STRONG>") has to be there to log the process up to the loading of the configurations. Otherwise if the log location in the configuration file itself was incorrect we'd have no way of getting any info at all and be even more confused <IMG src="http://ssisblog.replicationanswers.com/emoticons/smile.png" border=0><BR><BR>So, if you know the location of your log file on the destination server, set this up in the package as well as the configuration file. That way your log info will all be combined together. If not, then make sure it's a location that exists on all your servers, just to be sure that the initial log info will go somewhere without breaking the package.</DIV>]]></content>
	</entry>
	<entry>
		<title>Data Flows and Problems with Partial Inserts</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/11/data-flows-and-problems-with-partial-inserts.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-11:1ae6065f-4446-48c5-8e69-01f61d36c881</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="DataFlow" />
		<category term="General" />
		<updated>2007-11-11T15:22:28Z</updated>
		<published>2007-11-11T14:31:00Z</published>
		<content type="html"><![CDATA[<DIV>For a data flow task using the fast load option I had assumed that the process copies <EM><STRONG>all </STRONG></EM>of the records or in the case of failure <STRONG><EM>none at all</EM></STRONG>. In fact this behaviour had been confirmed to me by an SSIS guru. However, I've since noticed that partial imports are in fact possible, and we currently have big problems with them <img src="http://ssisblog.replicationanswers.com/emoticons/sad.png" border="0" />.<BR></DIV>
<DIV>Suppose you have a large text file and are doing a flat-file import into a SQL table. The package could be simply one single data flow task. <BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/SimpleDataFlow.JPG" width=192 border=0><BR><BR>The data in the flat file is one varchar column and a counter column, and all the text goes into a NOT NULL varchar column on the SQL destination. The text file always has values for the varchar column, apart from one particular row, so it looks like that below - what will happen as a result of row 3?<BR><BR></DIV>
<DIV>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=0 border=1>
<TBODY>
<TR>
<TD vAlign=top align=left>&nbsp;www</TD>
<TD>1&nbsp;</TD></TR>
<TR>
<TD>&nbsp;xxx</TD>
<TD>&nbsp;2</TD></TR>
<TR>
<TD>&nbsp;</TD>
<TD>&nbsp;3</TD></TR>
<TR>
<TD>&nbsp;zzz</TD>
<TD>&nbsp;4</TD></TR></TBODY></TABLE></DIV>
<DIV><BR></DIV>
<DIV>In fact, rows up to the problem row ("www" and "xxx") are inserted then the package errors. There's no rollback and you have a partial insert. Changing the Rows per batch or Commit size doesn't stop this happening, and we've already started out withe the Fast Load option which uses a "Insert Bulk" methodology, so changing to a non-Fast_Load which does a row-by-row iterative approach also won't improve things.<BR><BR>So - what can we do to ensure all-or-nothing in the case of NULLs?<BR></DIV>
<DIV>As far as I can tell, We have 2 potential solutions:<BR></DIV>
<DIV><BR>(1) change the "Transaction Option" to "Required" for the Data Flow task.<BR></DIV>
<DIV><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/TransactionOptionRequired.JPG" width=244 border=0><BR><BR>This works ok but has 2 disadvantages:<BR></DIV>
<DIV><BR>(a) you must ensure that MSDTC is running. There is potentially now another point of failure because someone else might change this service and not realise the impact. And this service will need to be running on all the servers you intend to move this package to.<BR></DIV>
<DIV><BR>(b) all the transformation components will be contained within the transaction, which might cause unacceptable blocking to other users.<BR></DIV>
<DIV><BR>(2) Have a task which runs on failure of the data flow and which cleans up the partial import. In fact this is what I have now gone for. I add a derived column to the data flow which adds a column with&nbsp;a flag value of -1. An ExecuteSQL task runs on failure and which deletes the -1 records. Success of the data flow results in another ExecuteSQL task running to update the flag to 0. We only use "0" records for processing further down the pipeline.<BR><BR>Hopefully this can be useful for anyone else experiencing this problem. Interestingly, this behaviour is the <STRONG>opposite </STRONG>of what happens with a check constraint. When a check constraint is violated, the whole insert fails without any treaks being made. So, perhaps this will be configurable at some later time.&nbsp;<BR></DIV>]]></content>
	</entry>
	<entry>
		<title>Upgrade Advisor - don't trust it 100%!</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/11/upgrade-advisor--dont-trust-it-101.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-11:f049145b-a672-4b1b-b9f9-8d7a7e2a8e40</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="General SQL (not SSIS)" />
		<updated>2007-11-11T13:29:54Z</updated>
		<published>2007-11-11T13:13:00Z</published>
		<content type="html"><![CDATA[<DIV>We recently tested the upgrade path for 6 large SQL 2000 databases. We'd faithfully used the Upgrade Advisor and implemented all the changes it recommended to TSQL code throughout the databases. So - everything should work - yes? <BR><BR>Well we thought so, until I tried an additional test. I scripted out all the user stored procedures (upgraded ones) in SQL 2005, then deleted them, then I ran the same script. I hoped to not find anything at all but it failed for 2 generic types of error:<BR><BR></DIV>
<DIV><U>(1) Illogical syntax no longer acceptable in SQL 2005 but works fine in SQL 2000:<BR><BR></U></DIV>
<DIV>Declare @count int<BR>Set Set @count = 1<BR>Select @count<BR><BR></DIV>
<DIV>Declare @MyDate datetime<BR>Set @mydate = cast('1/1/99' as datetime(103))<BR>Select @mydate <BR><BR></DIV>
<DIV>Yes - I know. This code makes no sense. But if you have 100s or 1000s of procedures, someone may have written something strange like this somewhere.<BR><BR></DIV>
<DIV><U>(2) Reserved Keywords.<BR><BR></U></DIV>
<DIV>We have several columns called "External". I was surprised that this wasn't picked up by the advisor, and it's easily fixed by just adding square brackets, but is still a bit of a pain.<BR><BR>If I find anything else it missed I'll post it up.</DIV>]]></content>
	</entry>
	<entry>
		<title>Fun with Precedence Constraints</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/11/fun-with-precedence-constraints.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-11:1f5248b1-2b97-4c66-b684-943250b82b0e</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="General" />
		<updated>2007-11-11T12:57:59Z</updated>
		<published>2007-11-11T12:41:00Z</published>
		<content type="html"><![CDATA[<DIV>
<DIV><FONT face=Arial size=2>I was QA-ing a colleagues packages on Friday and came across one which had a Control Flow a bit like this:<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/LogicalAndErrorHandler.JPG" width=387 border=0><BR><BR>This logic looks OK at first sight - he wants a script to be executed if one of the ExecSQL tasks fails. Also, his testing had proven that it worked ok. But can you spot the flaw here? <BR><BR>Precedence constraints are by default combined using a logical <STRONG>AND</STRONG>. That means that <EM><STRONG>both </STRONG></EM>previous steps must fail in order for the script to execute. If just one fails and the other succeeds, the Show Message task won't fire. If you right-click on the failure constraint (either one) and change it to a logical OR, then the diagram will change to be like it is below and everything's hunky-dory.<BR><BR><IMG src="http://ssisblog.replicationanswers.com/images/105183-98006/LogicalOrErrorHandler.JPG" width=388 border=0></FONT></DIV></DIV>]]></content>
	</entry>
	<entry>
		<title>So, what's all this got to do with replication?</title>
		<link rel="alternate" href="http://ssisblog.replicationanswers.com/2007/11/10/yet-another-ssis-blogger.aspx" />
		<id>tag:ssisblog.replicationanswers.com,2007-11-10:7f4ba51d-88bd-450f-bd57-9f30845c9095</id>
		<author>
			<name>Paul Ibison</name>
		</author>
		<category term="General" />
		<updated>2007-11-13T21:47:42Z</updated>
		<published>2007-11-10T14:02:00Z</published>
		<content type="html"><![CDATA[<DIV>Well - nothing really&nbsp;give or take&nbsp;the odd URL! <BR><BR>But the last 12 months I've been exclusively working on SSIS so I'd like to finally write up some of the things we've learned in my team. <BR><BR>Some of it is obvious looking back but wasn't clear to us at the time. Some other points don't seem to be covered elsewhere, or at least are not clear in existing articles.&nbsp;I may or may not get&nbsp;something&nbsp;posted this weekend, but either way please keep an eye out over the next few weeks because I already have about 7 topics in mind.<BR></DIV>]]></content>
	</entry>
</feed>