When to Not use SSIS - directory / file iterations!!!!
I've become a bit too much of an enthusiast about using SSIS for everything recently and I came unstuck with a simple file iteration problem........
I simply wanted to iterate through a set of network directories and do some file copying. Basically I needed to sync up the UAT and live directories of source files, so that the testing in the UAT environment could be a true parallel test of live. So, if the process came across a recent file in a live directory that wasn't in the corresponding test
subdirectory, it should be copied over.
I agree that this sounds like a straightforward issue. It's the sort of thing I can easily do in VBScript or VB6.0 but I'm trying to do everything in SSIS at the moment to centralise the code for maintenance, so I decided to do it that way.
In the designer we see that there is a task which seems custom-made for this type of problem - "Foreach Loops based on File Enumerator" so away I went and created the package below:

In the SSIS package above, the outer Foreach_FeedDirectory loop iterates through a recordset of directories and stores the directory path in a local variable. The inner ForeachFeedSourceFile loop iterates over the files in the specific directory.
Here's where you begin to come unstuck - all the usual file attributes you'd expect to see are simply not exposed to the Foreach File Enumerator. In the SSIS loop we can get the filename with or without the full path / extension and that's about it - no "Created Date", no "Last Modified Date", no "Size" etc etc. To get any of these useful attributes you have resort to scripting!
So in the package shown above, there is an inner script task which uses System.IO.FileInfo, System.IO.File.Exists, System.IO.File.Copy etc to decide whether to do the copy or not and applies to the file name found in the inner Foreach File Enumerator.
Let's recap: I am looping through each directory. Once I find a diectory I loop through the files. Once I find a file I use the FileInfo method to get the file info in script, then copy over if it is recent and if it doesn't exist in the destination directories.
On my 33 directories, some of which contain 1000s of files, this takes approx 1.5 to 2 hours to run in SSIS
As this is a process I need to run often, this is a complete disaster
. I can see that the iteration over the files is the bottleneck and this gives precious little useful info in the task, so as a trial, I start thinking of c# as a replacement. Now I'm definitely not a c# developer (and never will be
), but I managed to knock together the console application below to replace the entire package in an hour or so using my pal Google. Amazingly it c# does the same task but takes just 70secs!!!!!!
So - if you're just doing iterative directory / file manipulation and have to incorporate scripting, I'd definitely recommend resorting to c# rather than SSIS.
C# code:
private static void CheckDirectory(String sLiveDirectory, String sTestDirectory)
{
Console.WriteLine("Processing: " + sLiveDirectory);
DirectoryInfo dirCustom = new DirectoryInfo(sLiveDirectory);
FileInfo[] filCustom;
filCustom = dirCustom.GetFiles();
foreach (FileInfo filFile in filCustom)
{
TimeSpan ts = DateTime.Now - filFile.CreationTime;
if (ts.Days < 2)
{
if (!File.Exists(sTestDirectory + filFile.Name))
{
File.Copy(filFile.FullName, sTestDirectory + filFile.Name);
Console.WriteLine("Copying: " + filFile.FullName);
}
}
}
}
I simply wanted to iterate through a set of network directories and do some file copying. Basically I needed to sync up the UAT and live directories of source files, so that the testing in the UAT environment could be a true parallel test of live. So, if the process came across a recent file in a live directory that wasn't in the corresponding test
subdirectory, it should be copied over.
I agree that this sounds like a straightforward issue. It's the sort of thing I can easily do in VBScript or VB6.0 but I'm trying to do everything in SSIS at the moment to centralise the code for maintenance, so I decided to do it that way.
In the designer we see that there is a task which seems custom-made for this type of problem - "Foreach Loops based on File Enumerator" so away I went and created the package below:
In the SSIS package above, the outer Foreach_FeedDirectory loop iterates through a recordset of directories and stores the directory path in a local variable. The inner ForeachFeedSourceFile loop iterates over the files in the specific directory.
Here's where you begin to come unstuck - all the usual file attributes you'd expect to see are simply not exposed to the Foreach File Enumerator. In the SSIS loop we can get the filename with or without the full path / extension and that's about it - no "Created Date", no "Last Modified Date", no "Size" etc etc. To get any of these useful attributes you have resort to scripting!
So in the package shown above, there is an inner script task which uses System.IO.FileInfo, System.IO.File.Exists, System.IO.File.Copy etc to decide whether to do the copy or not and applies to the file name found in the inner Foreach File Enumerator.
Let's recap: I am looping through each directory. Once I find a diectory I loop through the files. Once I find a file I use the FileInfo method to get the file info in script, then copy over if it is recent and if it doesn't exist in the destination directories.
On my 33 directories, some of which contain 1000s of files, this takes approx 1.5 to 2 hours to run in SSIS
As this is a process I need to run often, this is a complete disaster
So - if you're just doing iterative directory / file manipulation and have to incorporate scripting, I'd definitely recommend resorting to c# rather than SSIS.
C# code:
private static void CheckDirectory(String sLiveDirectory, String sTestDirectory)
{
Console.WriteLine("Processing: " + sLiveDirectory);
DirectoryInfo dirCustom = new DirectoryInfo(sLiveDirectory);
FileInfo[] filCustom;
filCustom = dirCustom.GetFiles();
foreach (FileInfo filFile in filCustom)
{
TimeSpan ts = DateTime.Now - filFile.CreationTime;
if (ts.Days < 2)
{
if (!File.Exists(sTestDirectory + filFile.Name))
{
File.Copy(filFile.FullName, sTestDirectory + filFile.Name);
Console.WriteLine("Copying: " + filFile.FullName);
}
}
}
}

Why don't you use robcopy.exe to sync the directories - it's a one liner. I use it all the time, there are loads of copying options and its an official MS utility - found in the server 2003 resource kit tools, so there is no reason why even corporate administrators would not install it.
Reply to this
If only it were that simple in an investment bank
Reply to this