Sunday, October 14, 2007

BI CaseStudy - Slowly Changing Dimension (Type2)

The Problem
Store historic data in a relation ManyToMany between two Dimensions.

The Scenario
For each day I need to import via ETL about 5.000 rows of financial movements into a FactTable.
This data must be viewed by several dimensions: Entidade, Carteira, Mesa, Folder and FolderGroup. Each dimension has several variables that is related only by the dimension, and these variables must me store along time. So I need to store historic data for each dimension and In this project there is two important areas:
1. Entidade, Carteira, Mesa and Folder is a typically a Slowly Changing Dimension (Type2) and these dimension are structured as a snowflake. (Yes… I know Kimball doesn’t recommend the use of snowflakes, but I think for this case the data could be better understandable and the data model will be more easy to maintain in the future)
2. FolderGroup is a ManyToMany relation between Dimension Folder and Dimension Group and a Bridge to link the both.

Now, I’ll show the current structure and then I’ll explain my problem.
As I told you before, my problem is that I cannot create a Slowly Changing Dimension (Type 2) when there is a relation ManyToMany between Dimensions.
In my project I have the typically SCD (type2) that is working good and I fell data model very “clean” and understandable.



Sample Data from Slowly Changing Dimension (Type 2)
I divided Entidade on Entidade and EntidadeHistoric because the attributes never change and by this way I’ll avoid redundant data in the dimension. If for any unexpected reason I need to change ENT_Name or ENT_RelatorioKVaR, it will be a typically SCD (Type1) and I’ll do a classic update to the dimension row. The other reason of this division is that I only need to save historic data for hENT_RacioCooke. (I have other fields to store but for this example I’ll omit it)

The value 9020 is the Foreign Key to the Time Table and correspond to the date 10-10-2003.
So, from this picture we can see that for Lisbon, the field hENT_RacioCooke, between date 8888 and 9020 is 0,5 and between 9021 until now is 0,1.

Sample data for my Snowflake schema
I decided to create the relations below as Snowflake, because I felt that could be a better way to maintain the data in the future.
The business keys from each level (ENT_ID, CRT_ID, MESA_ID and FOLDER_ID) are in a string format to be coherent with the source data of ETL and to be more comprehensive for the admin users that will be analyzing data directly to OLAP and OLTP databases.


Based on the images posted before, I will show you how the SCD (Type 2) is working on my project.
I decided not include all the structure and focus only in the essential data to understand my pending problem.


As you can see in the above picture, this is my SCD (type 2). I’ll test if this structure is working good creating the follow SQL query:

And the result is:

Imagine you need to add a new field to the table Dados with the formula : D_MeasureX_ENT=D_MarketValue*hENT_RacioCooke.
For the same D_MarketValue value (10.000) in the days 9019 and 9023, the new field D_MeasureX_ENT is different because the RacioCooke between 8888 and 9020 was 0,5 and between 9021 until now is 0,1.


OK.. now that you understood my project, I will start describing my real problem. All I write above could be changed and discussed with us, but for now is working perfectly and for me is urgent resolve my current problem… but I accept suggestions for it!

In the project I have to group folders in groups and to stores the historic as I did before for Entidade, Carteira, Mesa and Folder. I remember you that each historic depends on each dimension, and I need to group 3 or 4 folders into a group and this group has a specific RacioCooke as Entidade, Carteira, Mesa and Folder. And the problem is that a folder in fact Dados could be in more than one group.
While in the SCD described before there is a relation OneToMany, and you could store your surrogateKey (Foreign Key) in the Fact Dados, in this case you couldn’t because there is a relation ManyToMany. I tried to use the structure below, but when I tested, didn’t work because it would create duplicate records in my fact Dados.

The Big Question
What is the hGRP_RacioCooke for the row 2131 in Fact DADOS for GRP01 (Investment) and for GRP02 (Negotiation)?

The Solution
I already found the solution and I will explain that later.
Kind Regards

7 comments:

Darren Gosbell said...

I am having trouble understanding your issue.

Can you possibly put the problem in the following form:

1) here are my tables (I think your last image probably shows this)

2) here is the SQL query that I tried and the output I got.

3) here is the output I would like to see.

Pedro said...

Darren,
Thanks for your comment.
I will try to explain better, but what I need to know is where I put my surrogatekey to store the historic for the folderGroup... How can I deal with the historic when there is a relation ManyToMany between 2 dimensions?
I can store historic with other dimensions Entidade, Carteira, Mesa and Folder, but with FolderGroup I'm having problems!!
Understood?!
Thanks Darren!

Amin said...
This comment has been removed by the author.
Pedro said...

Dear Amin,
Of course!! Send me the details of what you pretend to do and I will try to explain you my point of view!!
This specific problem that I wrote some time ago, is different for the normal SCD Type 2! :-)
Cheers!!!

Amin said...
This comment has been removed by the author.
Amin said...

Dear Pedro,
I have a big problem with implementing slowly changing dimension in SSAS,
the main problem is relating to the usage of ScdStartDate/ScdEndDate/ScdStatus/ScdOriginalID within SSAS,
Do you know how can I implement a SCD type 2 with these properties?

Thanks,
Amin

26 January, 2008 09:03
------------------------
Dear Pedro,
I realized that these properties have not implemented in client softwares and they are metadata for clients which has not used in clients yet, but If you had any exprience against this, I really appreciate to know.
What use ScdStartDate/ScdEndDate/ScdStatus/ScdOriginalID?

27 January, 2008 10:23

Amin said...
This comment has been removed by the author.

LinkWithin

Related Posts Plugin for WordPress, Blogger...