Monday, May 28, 2007

SSIS - Populating Dimension

Dear Friends,

I was looking for several ways to populate a dimension inside a dataflow and is not so simple find the right solution to that...

It does make sense populating dimensions only in case of the input database has redundance in their tables, or either, not normalized.

The database relation needed to this example is:

and the transformations inside the dataflow are:

Now, I will describe with more detail each of the dataflow transforms.

First step is to create a global variable to store the value of the identity key. I’m manually creating the identity key in my dimension table, so, I need to know the value on each row of the dataflow. (This value will be incremented in the script component transform)

Add a Execute SQL Task to the control flow to get the last identity key:

Add a LookUp transform, to LookUp for InstrumentName in the dimension table "Intrumento"

Add a MultiCast transform for the rows with new InstrumentNames founded.

Add an Aggregate transform to get the distinct InstrumentNames founded.

Add a Script Component transform to increment the INST_IDENTITY global variable

Script Component explanation:
PreExecute Method
Receives the global variable INST_IDENTITY with the last Identity value for the table Instrumento
Initialize the local variable counter, that will be incremented for each row.

Input0_ProcessInputRow Method
Increment counter variable and save it for each row in Row.INSTIDENTITY output column

PostExecute Method
Refresh the global variavle INST_IDENTITY with the counter value.
Add another MultiCast to allow you to Sort the new rows with Identity column created in the step before, and to sort these new records (InstrumentNames) on the Database.
The mappings for the OLEDB Destination that must be created, are:

Add a Merge Join, and configure as a LEFT JOIN like this:

Finally, you have all the rows with the respective Identity value.
I hope to receive some feedback from you!


Anonymous said...

IF a row fails while insertion we will lost the value of the identity column, is there any way to put in the vlue of the identity coulmn when it inserted.

Pedro said...

If you still need some elp, please send me an email!

Pedro said...

This is a old post... and in fact what is doint in the same flow is inserting dimension and fact in the same dataflow... in order to go only once to the sources. If you use a mirror database, you could resolve the problem of need to go only once to the source...

Even better to deal with erros and restart the ETL process.

If you want a new and more apropriate description I can create an updated post for you.

Even If there is only one user interested in an update post, I'll do it!


Jane said...

Hi Pedro,

I'm following your instruction but get stucked at the Script transaform. Where is the Row.INSTIDENTITY in Input0_ProcessInputRow function coming from?


Pedro said...

HI Jane,
thanks for your visit.
The Row.INSTIDENTITY column was added manually in the script component editing...
Dont forget that this post has more than 2 years and was focused in a specific requirement. Also you should first populate Dimensions and next the facts in a speratly process and not doing inall the same data flow.
Let me know If i can help you better!

Jane said...

Thanks for the super quick respond!! This article is exactly what I have been looking for. I've got the Script Transform part figured out (with your help of course :)). I will continue it from here and will keep you updated!!

Pedro said...

Very good!
If you have some issues just let me know. In my profile you have my email.

Amir said...

This worked great for me Pedro! Works just as advertised and i think is the best way of dynamically populating a dimension table. Thanks!

Pedro said...

Hi Amir,
It'is an old post but I'm very happy that it helped you resolve your issue!!
thanks for your visit!


Related Posts Plugin for WordPress, Blogger...