SSIS – A Generic Way of Landing Source Data

I like the old school of thought when comes to coding- reusability and have found managing SSIS packages developed over time pretty painful, especially when they generally do the same thing. I am impressed with BIML although the lack of knowledgebase on it makes for a steep learning curve. It bears the intentions of reusable code, and hence facilitates quick ETL development time once you’ve got your head round it.

A typical use of ETL package I have seen in the past year working as a BI Developer (previously a nagging ol’ DBA) is landing copies of source data in to a BI/Datawarehouse database. BIML offers a perfect method of ensuring we keep to the same standard of building and maintaining these packages. Not happy with just one solution, I thought of another — what about creating just one generic landing package?

One stumbling block came in mind: Column Mappings – handling differing metadata. This meant that I could not go down the traditional transformation route:

Capture

Scripting Task Handles Dynamic Column Mappings

Capture_bc

I see an opportunity here to use scripting task that bulk copies source data to destination table which provides a workaround for column mappings. This forms basis of the generic landing package.

Snippet code from Bulk Import Table Scripting Task:

static void CopyData(DataTable SourceTable, SqlConnection Destination, string DestinationTableName)
{
try
{
using (SqlBulkCopy s = new SqlBulkCopy(Destination))
{
s.DestinationTableName = DestinationTableName;
s.NotifyAfter = 10000;
s.WriteToServer(SourceTable);
s.Close();
}
}
catch
{
throw;
}
}

Metadata Read In Or Determined On-the-fly

Source Command Query and Destination Table information are retrieved from SSIS variables, which could be read in as SSIS parameter passed in from outside the package or worked out via system metadata views queried from a SQL Execute Task which is executed prior the scripting task. Either way it would give the package scope to get at the source and destination metadata flexibly. Note though that the metadata is retrieved from a metadata connection manager allowing for metadata to be retrieved from a known, stable schema.

Snippet of code reading command query and destination table name from ssis variables.

string SourceCmd = Dts.Variables["vSQLCT"].Value.ToString();
DataTable dt = new DataTable();
SqlCommand cmd = new SqlCommand(@"" + SourceCmd, source);
SqlDataAdapter ad = new SqlDataAdapter(cmd);
ad.Fill(dt);
string DestinationTableName = Dts.Variables["pWorkSchema"].Value.ToString() + "." + Dts.Variables["pToTable"].Value.ToString();
dest.Open();

Change Tracked Source and Other Parameters

Below are parameters used to set the behaviour of the package:

Capture_param

pChangeTrack if set to True, Source Command Query Text will query ChangeTable function and uses the pVersion parameter to determine which version to retrieve. It also assumes that a landing table is required to persist historical data, using the parameters pLandingSchema, and pToTable.

pCreateTables – Creates work (bulk inserted table) table and if needed an equivalent landing table. This should be used as a one-off as setting this to true will delete all historical rows in the landing table.

Running the Package

To be able to run several ETL from varying sources, I have built a few tables. The records for each ETL run is then looped through and executed via t-sql in a scheduled job.

DECLARE @schema AS SQL_VARIANT
,@table AS SQL_VARIANT
,@version AS INT
,@maxexecorder AS INT
,@i AS INT = 1
,@foldername AS NVARCHAR(128)
,@projectname AS NVARCHAR(128)
,@applicationame AS NVARCHAR(50) = 'AdventureWorks2012';
DECLARE @output_execution_id BIGINT;

SELECT @maxexecorder = max([ExecOrder])
FROM [dbo].[LandApplication] a
JOIN dbo.LandObjects b ON A.[LandApplicationID] = B.LandApplicationID
WHERE a.ApplicationName = @applicationame
AND b.Disabled = 0;

WHILE (@i <= @maxexecorder)
BEGIN
SELECT @schema = cast(SourceSchema AS NVARCHAR(128))
,@table = cast(SourceTable AS NVARCHAR(128))
,@version = isnull(version, - 1)
FROM [dbo].[LandApplication] a
JOIN dbo.LandObjects b ON A.[LandApplicationID] = B.LandApplicationID
LEFT JOIN [dbo].[LandObjectInstance] c ON b.LandID = c.landid
WHERE a.ApplicationName = @applicationame
AND b.Disabled = 0
AND execorder = @i;

SET @foldername = N'SLIMLandETL';
SET @projectname = N'SLIMLandETL_Adworks.etl';

EXEC [SSISDB].[catalog].[set_object_parameter_value] @object_type = 30
,@parameter_name = N'pSchema'
,@object_name = N'SLIMLandETL.dtsx'
,@folder_name = @foldername
,@project_name = @projectname
,@value_type = V
,@parameter_value = @schema;

EXEC [SSISDB].[catalog].[set_object_parameter_value] @object_type = 30
,@parameter_name = N'pTable'
,@object_name = N'SLIMLandETL.dtsx'
,@folder_name = @foldername
,@project_name = @projectname
,@value_type = V
,@parameter_value = @table;

EXEC [SSISDB].[catalog].[set_object_parameter_value] @object_type = 30
,@parameter_name = N'pToSchema'
,@object_name = N'SLIMLandETL.dtsx'
,@folder_name = @foldername
,@project_name = @projectname
,@value_type = V
,@parameter_value = N'work';

EXEC [SSISDB].[catalog].[set_object_parameter_value] @object_type = 30
,@parameter_name = N'pVersion'
,@object_name = N'SLIMLandETL.dtsx'
,@folder_name = @foldername
,@project_name = @projectname
,@value_type = V
,@parameter_value = @version;

EXEC [SSISDB].[catalog].[set_object_parameter_value] @object_type = 30
,@parameter_name = N'pApplicationName'
,@object_name = N'SLIMLandETL.dtsx'
,@folder_name = @foldername
,@project_name = @projectname
,@value_type = V
,@parameter_value = @applicationame;

EXEC dbo.execute_ssis_package_slimlandetl @foldername
,@projectname
,@output_execution_id OUTPUT;

SET @i = @i + 1;
END

 

Next Steps

dwLanding.zip (115 downloads)

The attached download is by no means the complete article; I have yet to complete the controlling tables (i.e. tables in which hold data for each ETL run) and also add in auditing and error handling. But should provide a good start to quickly land/stage data from SQL sources into a SQL BI database instance without the need to create separate SSIS packages.

 

 

Share this Indie Post:

ariati

Everything about databases for me! and oohh a whole lot of chocolates ❤️❤️

Leave a Reply