
Tuesday, April 27, 2010

BI Conference - TechDays 2010 - After my session

Finally I have some time to describe what I experienced last week at the TechDays Portugal conference. Today I'm writing my post number 100, and I could not do it better than by describing this fantastic experience.

The facilities
The day before my session I went to see the event facilities and acclimatize myself to the room where I would present my session. There were many people working on this event, and it is important that everyone realizes that organizing such an event is not easy. It requires a lot of human labour and a lot of dedication, mainly from people like Luis Martins, João Bilhim and Leonor (Microsoft) and several other amazingly dedicated persons.


The people
I met many people and also spoke face to face with people I normally only contact via the Web. It is indeed a very interesting event, where people share knowledge and opinions; that, for sure, is more important than whether the lunch is good or bad! Jorge Paulino, Ricardo Peres, André Lage, Rodrigo Pinto (the MVP guys), Jorge Moura, Nuno Batalha, Tiago Rente, and a lot of other enthusiastic professionals.

The Session
Most people certainly did not realize that it was my first public session in front of so many people... and a year ago it would never have crossed my mind that I would be presenting a session at TechDays. It was a very difficult challenge trying to get a message about Data Mining across to almost 300 people, mainly because the profiles of the audience were very diverse... the feedback received varied between "easy" and "very difficult" session contents. Thanks to all the people who attended my session. Well… if you're curious, take a look at the presentation below:

I hope you enjoy it... I can translate it to English if needed.

Tuesday, April 20, 2010

BI Conference - TechDays 2010 - Before my session

Dear friends,
There are only a few hours left before presenting my session at TechDays. It will be a great challenge for me, and I hope I can get across the message I set out to deliver.

I don't know yet how many people will be watching my session; it could be from 40 to 500 attendees… and for that reason I cannot measure right now the pressure level that I'll be under today… :-)


There'll be a lot of interesting sessions, almost all given by Portuguese professionals… It'll be proof that here in Portugal we have a lot of expert professionals.

I am sure that next year there will be a lot of new speakers who are just watching this year. Mainly the master's and post-graduate BI students… are you there!?!! :-)

Well… I'll leave now, wishing good luck to myself and a very good TechDays to all of you. :-)
I’ll be back after the event...
Pedro

Saturday, December 19, 2009

BI Tools - First Look to Excel 2010

Creating dashboards has never been so easy... the new version of Excel 2010 (Beta) is really amazing. Exploring a multidimensional cube has never been as intuitive and simple as it is now. The image below shows a simple dashboard that I made with this new version, in 20 minutes, for the Northwind sample database...

Excel Dashboard
There are still some features to improve in this version; anyway, the new features promise to make the life of the decision maker easier... and now I sincerely believe that the democratization of business intelligence is beginning, and Business Intelligence will soon be available to all employees of an organization. And all because Excel is the best-known tool in the world: almost everyone who uses a computer knows what Excel is and what Excel does. Many BI vendors know it, and a few years ago they began to build add-ins for Excel in order to work around some of Excel's previous limitations. And a question arises... will these small businesses survive the constant improvement of Excel? Only time will tell whether they will be able to adapt with new solutions or not... I really hope so, because these small companies do a big and important job… and Microsoft knows that!

Excel Slicer / Excel Sparkline
Among the most interesting features of Excel 2010, I highlight the slicers, the "sparklines" and the possibility of creating named sets through Excel. Slicers let you visually filter the data in a PivotTable and quickly extract exactly the data you're looking for; they can be moved and resized like charts and graphs, and each slicer can be connected to several pivot tables… for me this feature is really amazing. The "sparklines" are a copy of features shipped earlier by XLCubed (check the SQLBits session where XLCubed presented "MicroCharts") and they are at a very early stage. And the last feature that I highlight today is the possibility of creating named sets from Excel, grouping a set of items, plus several other options like customizing the MDX statements of a pivot table.

I hope this post was helpful to you.
Regards, Pedro

Monday, September 29, 2008

BI Education - SQLBI Methodology

Marco helped me two years ago with a dimensional model design problem. He probably does not remember me, but I follow his work, mainly everything related to dimensional modelling. Last week Marco published the first draft of an alternative proposal to the Inmon and Kimball approaches. I spent yesterday and today reading all 44 pages of this first draft, and it brought back all the issues and problems that I have felt in the past… it's really a very good approach, and not only for BI professionals using Microsoft tools.

I'm commenting on Marco Russo and Alberto Ferrari's draft only because I want to contribute my opinion to their project and help publicize it. And I would be happy if professionals and friends like Subhash, Bruno, Amin, Tomislav, Alex, Ella and all the other experts gave their opinions on this subject.

It's an approach close to the real world, and my next comments are only to clarify things for myself and to make this draft even better. As they say in the draft:
“Always remember that these are not fixed and immutable rules. Your specific implementation of the architecture will probably be constrained by other considerations. You – as the BI analyst – are the one and only person that will be able to take these decisions in a correct way. “

Download the draft from the official website

Some comments/questions

1. Is the main use of the configuration database to deal with bookkeeping information and with data that comes from more than one source? An example is dealing with the same customer that comes from two different sources with different names… What other functionalities could this configuration database have? I think this configuration database should be shared between several BI solutions, using SQL schemas to split them. If you have one configuration database for each BI solution, you'll have more cost and more work to maintain it.


2. I agree that the mirror database filters only the required data, and that the staging area is a relational database that acts as a bridge between the mirror and the denormalized data warehouse database. If you need to process all the cubes again, does the data warehouse database store all the historical data, or do you need to go back to the mirror database or the operational sources? (Sometimes the historic data in operational sources is available only once.)

3. SQL schemas can create problems in third-party products, and I always avoid them. What do you think about that? In the data warehouse or data mart databases you change the schemas used in the staging area; that can be good for understanding the solution, but I'm afraid it could be an issue in ETL and OLAP designs. What do you think?

4. Your definition of the Metadata OLTP DB confuses me… you have views over the OLTP operational systems to work as a bridge to the DW when not using a staging area, but normally metadata contextualizes your data, describing what each field, each table and each database is… I think you could explain this better.

5. Following your draft, in the Kimball approach the data warehouse database has a denormalized structure, whereas in the Inmon approach it has a normalized structure. Am I correct? There is some confusion on this point…

6. I don't think the status column should be used in an SCD Type 2. I only understand the use of this column in this type of dimension for users who go directly into the database and need to identify the current row quickly, because in practice, if you look up by the status column, you may have a problem when you need to import rows from a previous date (late arriving facts)… you would map those rows to the current status, and that might not be true. (See the lookup sketch after this list.)

7. My definition of ODS was a little different. Your ODS description (only used by software) may lead to several versions of the "truth". What do you think?

8. I completely agree with you about natural and surrogate keys in the date dimension. In my last project I used a surrogate key as an identity integer, and that made me feel lost: I always needed to join the fact table with the date dimension to understand the meaning of each fact.

9. On page 36, where you wrote "Mirroring is not part of the ETL.", I think creating the mirror is the extract step of an ETL (Extract, Transform, Load)… you don't do any transformations, only the basics like renaming columns, extracting the data and, if you can, applying some filters. Changing a column datatype could generate an error… and we must avoid accessing the source operational systems more than once.

10. On page 42, in which database do you store the logs added by the SSIS log provider (sysdtslog90)? I like the answer you gave to the question "what is the difference between log and audit?", and also the use of audit information directly in the cubes but hidden from business users. Some people are a little confused on this subject.

11. How do BI analysts control the data lineage? And how can we do an impact analysis of a change in a database?
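
To make comment 6 clearer, here is a minimal sketch of the lookup I have in mind: resolving the dimension surrogate key by the date range that was valid at the fact date, instead of by a status flag. The table and column names are hypothetical, just to show the idea.

-- Hypothetical SCD Type 2 lookup: pick the version valid at the fact date.
-- A lookup on Status = 'Current' would wrongly map late arriving facts
-- to the current row instead of the historically correct one.
SELECT  f.FactDate,
        f.Amount,
        d.Customer_SK
FROM    StagingFacts AS f
JOIN    DimCustomer  AS d
          ON  d.Customer_BK = f.Customer_BK
          AND f.FactDate BETWEEN d.ValidFrom AND d.ValidTo;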

Improvements
1. I think it could be interesting to include some naming standards for database objects (tables, views and stored procedures). Which nomenclature do you normally use for each object in each database (Mirror, Staging, ODS, DW and DMart)?
2. More focus on the business and technical metadata of each database.
3. Create an additional shared database for SSAS cube performance, using usage-based optimization (activating the query logs of the OLAP cubes and creating aggregations based on those logs); see the sketch after this list.
4. In the configuration database described, it could be interesting to use data profiling.
5. Include a data mining structure.
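
On improvement 3, the idea is simply to point the query log of every cube to one shared database and mine it for aggregation candidates. A sketch of the kind of query I have in mind, assuming the default OlapQueryLog table that Analysis Services creates when query logging is enabled:

-- Find the most frequent (and slowest) dataset patterns recorded by the
-- SSAS query log, as input for usage-based optimization.
SELECT   MSOLAP_Database,
         Dataset,
         COUNT(*)      AS QueryCount,
         AVG(Duration) AS AvgDurationMs
FROM     dbo.OlapQueryLog
GROUP BY MSOLAP_Database, Dataset
ORDER BY QueryCount DESC, AvgDurationMs DESC;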
The original image was modified by me; I took it from draft 1.0.

I think some of these comments may be basic for some of you, but because there aren't any standards or any ISO for this subject, BI analysts still have some doubts about the best practices to follow when implementing a BI solution.

So, I hope to see Marco and Alberto here to clarify the points described above. What these guys are doing is a very important step, because they are creating rules and standards and making us speak the same language.
Good Luck!!


Updated 29-09-2008 with Alberto Ferrari's response. I will reply in the comments section.
Alberto and Marco's response:
Pedro,
I'll try to give you some answers, even if the space here is not enough for a complete discussion:
1. Yes, the config DB can be shared between different solutions for the same customer, even because, in the DWH world, "different solutions for one customer" means "one complex solution for one customer, composed of different subject areas". :)

2. In some situations we store several snapshots of OLTP tables in the OLTP Mirror, in order to be able to rebuild the history. Clearly the DWH contains all the history but, sometimes, you will need to rebuild the DWH too...

3. I would not avoid schemas in order to satisfy third-party tools. If some tool needs to query the DWH then it will have its views and, if it cannot use schemas, then its views will live in the dbo schema.

4. OLTP Metadata is just a set of views. If you cannot build them in the OLTP DB and you cannot afford an OLTP Mirror, then an OLTP Metadata might help.

5. The Kimball approach is de-normalized, but the extent of normalization that is needed in an Inmon one is not very clear; it all depends on the analyst. Inmon leaves all these "details" to the analyst, giving just some advice.

6. I agree with you. What else could I say? :)

7. What is your definition of ODS?

8. Yep. Even if, sometimes, I had to change the key, using some very creative ways of expressing a date. If you have several billion rows then the INT representation of YYYYMMDD might be a problem and a smallint solution might be helpful. Clearly 20080101 is not a smallint... so you'll need some different way to express the date.

9. I don't understand your note. During mirroring you should not do any kind of transformation and/or column rename. If a column is called C1P8D3B0 and is of type INT, then you will have that column name and type in the OLTP mirror, and a VIEW that will change the name and type when you extract from the mirror.

10. I think the log DB is the perfect place for those logs.

We are about to publish a second whitepaper that might be interesting to look at; it will provide a demonstration of the methodology on the very (ab)used AdventureWorks DB.
Thanks a lot for your comments, I will use them and your improvements section for a future review of the whitepaper.
Alberto

Thursday, August 14, 2008

Data Mining - Rafal Event

In April 2008 Rafal Lukawiecki came to Portugal and unfortunately I didn't attend his event. I confess that I had never heard of Rafal, but today, after seeing the four webcasts, I feel that I missed a fantastic event. So... do you know what I did? I connected my laptop to my LCD television and watched all the Rafal webcasts!! It was really fantastic... now I understand my friend Subhash!


But who is Rafal? See this YouTube video...
http://fr.youtube.com/watch?v=Vs46QZwnEcc

After my Data Mining class in my master's BI degree, I started to look at data mining with new eyes: data mining is the main intelligence that a BI project can have... you can have a perfect ETL, DW or Reporting module, but the intelligence in BI is the capability to predict the future, to analyse past and present behaviours and to make the difference by predicting what will happen next!

For those who really want to learn Data Mining, independently of the tools you usually use, my suggestion is to start here... by watching these four Rafal webcasts. Fortunately, in my master's BI degree I had one of the greatest Data Mining gurus here in Portugal, Professor Duarte Trigueiros... He is also fantastic and as communicative as Rafal. On one hand, Rafal speaks closer to the business logic and is sometimes more intuitive; on the other hand, Professor Duarte Trigueiros is more statistical. The combination of both approaches is fantastic, and it was a privilege and a point of pride for me...

Introduction to Data Mining
Working with Data Mining
Using Data Mining in Your IT Systems (Part 1)
Using Data Mining in Your IT Systems (Part 2)
http://www.microsoft.com/emea/spotlight/event.aspx?id=99

After the Rafal webcasts, I don't have anything more to write about it! But I'm currently doing a project for my master's BI degree to predict stock market behaviour... it could be a good way to get rich fast!! I'll tell you about it in a next post...

Thursday, March 20, 2008

BI Education - The Data Warehouse Lifecycle Toolkit

Dear Friends,
I'm very busy with my job and my master's BI degree, but I will try to come here more frequently. I have too many things to blog about and I don't know where to start.

For now, I'll write a few words about the recent book from Ralph Kimball. I confess that I have always been a fan of the Kimball books, but normally those books are many years old. Although the subjects of almost all of them are still in use, I think they were a little dated and could be better adapted and integrated.

The Ralph Kimball Group launched a new and fantastic book, "The Data Warehouse Lifecycle Toolkit" (second edition), last January 2008. I have only read four chapters so far, but I really feel that this book is better than the others from the Ralph Kimball Group. Check the contents of the book on the Amazon or Wiley website.
Obviously, I bought this book, and it has a fantastic price on Amazon (UK) at
http://www.amazon.co.uk/exec/obidos/ASIN/0470149779/ref=nosim/schildnet0a I paid about 18€ for the book, shipping included, and received it in 2 days!!! (I'm in Portugal… don't forget it.) Normally I buy books on Amazon directly from the Amazon partners because it is cheaper, but in this case there is no difference between the official Amazon price and the partners' price.
I’m preparing a post about Reporting Services 2005 and 2008 querying an OLTP database.
Regards!!

Tuesday, March 4, 2008

BI Conference - SQLBits - Birmingham (UK) - After

Dear Friends,
I'm just arriving from Birmingham (UK) and I have only one word to describe the experience: FANTASTIC! I have several things to write, but if you weren't there you missed a fantastic day!
As you already know, I currently live in Portugal, but last Friday I picked up my girlfriend and went to Birmingham (UK) to attend the SQLBits conference and return to the city where I finished my degree some years ago. Microsoft Portugal is not investing much in the SQL BI tools, and last summer it had only one person specialized in them… As proof, at the next TechDays 2008 in Portugal there will be just a few sessions on the SQL BI tools, and probably quite basic ones… The chance to hear some "gurus" and the authors of my favourite books from Wrox and Wiley was an opportunity that I couldn't miss!
I will describe in general how this journey to Birmingham went, and I will cover my favourite sessions in detail in a next post.

The Conference
There were several sessions at this conference, and I attended the ones related to the Microsoft SQL Business Intelligence tools. Darren Green, Allan Mitchell and Chris Webb didn't disappoint me in their sessions, and the money spent on this journey was a very good investment in my career and a good opportunity to "open my eyes" more and more to the world.

Allan Mitchell Session
Allan spoke about the new features of SSIS 2008 and compared them with SSIS 2005. He showed some details that some developers usually don't pay much attention to, such as performance… he showed some details from behind the scenes, and in my opinion he did it well. He covered:
MultiCast
Pipeline Limiter
LookUp Component (Cache Options)
Data Profiling Task
Change Data Capture

Chris Webb Session
This session was very interesting… I was expecting to see something about new features and the advantages of SSRS, and what I saw was the opposite. I totally agree with him, and this session proved that this type of event is not for selling products; it is a place where we can get very good and sincere tips and tricks, from developers to developers. And Microsoft was one of the sponsors…

Darren Green Session
Darren's session was very cool. He spoke about a subject that I normally avoid in SSIS but will now pay more attention to… custom data flow components. I bought the Wrox book that Darren co-wrote, which covers this subject, but I hadn't read those chapters… It is really interesting, not only for understanding the pipeline engine better but also for developing more specific and complex transformations.





The City and BullRing
As I told you before, I finished my degree in Birmingham some years ago. There have been several improvements in the city, but it is still beautiful! A very good improvement is the Bullring Centre, a big shopping centre with an area equivalent to 26 football stadiums!!!





The Hotel
Normally, hotels in England are not comfortable or modern, but this hotel is very modern, and I met a Portuguese man who works there!



The NightLife
The pressure from the security guards at the night bars and discos is incredible…



So... in summary, it was a fantastic journey!! I will try to give more details about the conference in a next post!
Cheers!!

Monday, February 18, 2008

BI Conference - SQLBits - Birmingham (UK)

Dear Friends,
Today I'm very happy!! I received the confirmation for the next SQLBits conference in Birmingham (UK). And I'm happy for several reasons...

  • I will return to Birmingham, where I finished my IT degree some years ago (Newman College, 2001)
  • I will attend the sessions of Chris Webb and Darren Green, who are among my main sources of inspiration, and of Allan Mitchell, who is one of the authors of my favorite SSIS book from Wrox.
  • Exchange knowledge with the BI community and see old and new friends.
  • Learn more about BI and SQL 2008 tools
  • Enjoy Birmingham lifestyle again!
I'm sure you have a lot of reasons, like me, to attend the conference. Don't waste more time; there are already more than 400 people confirmed!!

For registrations follow this link: http://www.sqlbits.com
There will be several sessions; I selected these ones:

Slot 1:
Master Data Management (MDM)
Taken by Sutha Thiru
Slot 2:
Extending SSIS with custom Data Flow components
Taken by Darren Green
Slot 3:
Understanding LINQ in .NET Framework V3.5 and Beyond
Taken by Mike Taulty
Slot 4:
Making more of SSIS in 2008
Taken by Allan Mitchell
Slot 5:
Using Analysis Services as a data source for Reporting Services reports
Taken by Chris Webb

The agenda: www.sqlbits.com/information/MainAgenda.aspx
I will book my flight and hotel tonight!!

And if for some reason you cannot go to Birmingham, prepare to go to Glasgow on 10-May-2008... I will try to be there... so, vote for the sessions you want to see!! Follow the link image!

See you around!
Regards!

Sunday, October 14, 2007

BI CaseStudy - Slowly Changing Dimension (Type2)

The Problem
Store historic data in a Many-to-Many relation between two dimensions.

The Scenario
Every day I need to import, via ETL, about 5,000 rows of financial movements into a fact table.
This data must be viewed by several dimensions: Entidade, Carteira, Mesa, Folder and FolderGroup. Each dimension has several variables that are related only to that dimension, and these variables must be stored over time. So I need to store historic data for each dimension, and in this project there are two important areas:
1. Entidade, Carteira, Mesa and Folder are typical Slowly Changing Dimensions (Type 2), and these dimensions are structured as a snowflake. (Yes… I know Kimball doesn't recommend the use of snowflakes, but I think in this case the data is easier to understand and the data model will be easier to maintain in the future.)
2. FolderGroup is a Many-to-Many relation between the Folder dimension and the Group dimension, with a bridge table to link them both.

Now I'll show the current structure, and then I'll explain my problem.
As I told you before, my problem is that I cannot create a Slowly Changing Dimension (Type 2) when there is a Many-to-Many relation between dimensions.
In my project I have the typical SCD (Type 2), which is working well, and I feel the data model is very "clean" and understandable.



Sample Data from Slowly Changing Dimension (Type 2)
I divided Entidade into Entidade and EntidadeHistoric because some attributes never change, and this way I avoid redundant data in the dimension. If for any unexpected reason I need to change ENT_Name or ENT_RelatorioKVaR, it will be a typical SCD (Type 1) and I'll do a classic update to the dimension row. The other reason for this split is that I only need to save historic data for hENT_RacioCooke. (I have other fields to store, but for this example I'll omit them.)

The value 9020 is the foreign key to the time table and corresponds to the date 10-10-2003.
So, from this picture we can see that for Lisbon the field hENT_RacioCooke is 0.5 between dates 8888 and 9020, and 0.1 from 9021 until now.
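
To make the split concrete, here is a minimal sketch of how these two tables could be declared. The table and column names follow the sample data above; the datatypes are my assumption.

-- Sketch only: datatypes are assumptions based on the sample data shown above.
CREATE TABLE Entidade (
    ENT_SK            int         NOT NULL PRIMARY KEY,  -- surrogate key
    ENT_ID            varchar(20) NOT NULL,              -- business key (string, as in the source)
    ENT_Name          varchar(50) NOT NULL,              -- SCD Type 1: updated in place
    ENT_RelatorioKVaR varchar(50) NULL                   -- SCD Type 1: updated in place
);

CREATE TABLE EntidadeHistoric (
    hENT_SK         int           NOT NULL PRIMARY KEY,  -- surrogate key of the historic row
    ENT_SK          int           NOT NULL REFERENCES Entidade (ENT_SK),
    hENT_RacioCooke decimal(9, 4) NOT NULL,              -- the value tracked over time
    hENT_StartDate  int           NOT NULL,              -- FK to the time table (e.g. 8888)
    hENT_EndDate    int           NOT NULL               -- FK to the time table (e.g. 9020)
);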

Sample data for my Snowflake schema
I decided to create the relations below as a snowflake, because I felt it could be a better way to maintain the data in the future.
The business keys of each level (ENT_ID, CRT_ID, MESA_ID and FOLDER_ID) are in string format, to be coherent with the source data of the ETL and to be more comprehensible for the admin users who will analyze the data directly in the OLAP and OLTP databases.


Based on the images posted before, I will show you how the SCD (Type 2) works in my project.
I decided not to include the whole structure and to focus only on the data essential to understand my pending problem.


As you can see in the picture above, this is my SCD (Type 2). I'll test whether this structure is working well with the following SQL query:
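
(The query was posted as an image; below is a sketch of the kind of query it was, reconstructed from the names above, where the fact table Dados is assumed to carry the historic surrogate key hENT_SK and a time key D_TimeKey.)

-- Reconstruction sketch: join the fact to the historic dimension rows and show
-- the validity range of each hENT_RacioCooke value alongside the fact data.
SELECT  d.D_TimeKey,
        e.ENT_Name,
        h.hENT_StartDate,
        h.hENT_EndDate,
        h.hENT_RacioCooke,
        d.D_MarketValue
FROM    Dados            AS d
JOIN    EntidadeHistoric AS h ON h.hENT_SK = d.hENT_SK
JOIN    Entidade         AS e ON e.ENT_SK  = h.ENT_SK
ORDER BY d.D_TimeKey;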

And the result is:

Imagine you need to add a new field to the table Dados with the formula D_MeasureX_ENT = D_MarketValue * hENT_RacioCooke.
For the same D_MarketValue (10,000) on days 9019 and 9023, the new field D_MeasureX_ENT is different, because the RacioCooke was 0.5 between 8888 and 9020 and has been 0.1 from 9021 until now.
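
In SQL terms, that new field simply extends the previous query (again a sketch, with the same assumed names):

-- D_MeasureX_ENT depends on which historic version of RacioCooke was valid
-- on the fact date, so the same D_MarketValue yields different results.
SELECT  d.D_TimeKey,
        d.D_MarketValue,
        h.hENT_RacioCooke,
        d.D_MarketValue * h.hENT_RacioCooke AS D_MeasureX_ENT
FROM    Dados            AS d
JOIN    EntidadeHistoric AS h ON h.hENT_SK = d.hENT_SK;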


OK… now that you understand my project, I will start describing my real problem. Everything I wrote above can be changed and discussed among us; for now it is working perfectly, and for me it is urgent to resolve my current problem… but I accept suggestions for that part too!

In the project I have to group folders into groups and store the historic data as I did before for Entidade, Carteira, Mesa and Folder. Remember that each history depends on its dimension; I need to group 3 or 4 folders into a group, and this group has a specific RacioCooke, just like Entidade, Carteira, Mesa and Folder. And the problem is that a folder in the fact Dados can be in more than one group.
While in the SCD described before there is a One-to-Many relation, so you can store the surrogate key (foreign key) in the fact Dados, in this case you can't, because there is a Many-to-Many relation. I tried to use the structure below, but when I tested it, it didn't work, because it would create duplicate records in my fact Dados.
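
(The attempted structure was posted as an image; roughly, it had this shape. Names and datatypes are my reconstruction from the description, not the exact original.)

-- Rough reconstruction of the attempted structure: a bridge between Folder and
-- Group, with the historic RacioCooke kept per group.
CREATE TABLE GroupHistoric (
    hGRP_SK         int           NOT NULL PRIMARY KEY,
    GRP_ID          varchar(20)   NOT NULL,   -- business key, e.g. 'GRP01'
    hGRP_RacioCooke decimal(9, 4) NOT NULL,
    hGRP_StartDate  int           NOT NULL,   -- FK to the time table
    hGRP_EndDate    int           NOT NULL
);

CREATE TABLE BridgeFolderGroup (
    FOLDER_SK int NOT NULL,   -- FK to the Folder dimension
    GRP_SK    int NOT NULL    -- FK to the Group dimension
);

-- Joining the fact to GroupHistoric through this bridge multiplies each fact
-- row by the number of groups its folder belongs to: the duplication problem.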

The Big Question
What is the hGRP_RacioCooke for row 2131 in the fact DADOS, for GRP01 (Investment) and for GRP02 (Negotiation)?

The Solution
I have already found the solution, and I will explain it later.
Kind Regards

Sunday, October 7, 2007

BI Tools - First look at Microsoft BI Tools

Dear Friends,
Following a suggestion from Rui Santos (a master's colleague), I'll try to write a draft about the Microsoft BI tools using my current BI project.
As you probably know, the current version, SQL Server 2005, has two separate environments, one for management and one for development.
The development environment is called Business Intelligence Development Studio (BIDS) and is focused mainly on the business intelligence area. It includes a module for data integration (SSIS), another for data analysis (SSAS) and another for reporting (SSRS). The management environment is called SQL Server Management Studio (SSMS), and it's there that you do all the work of managing your databases.

I don't know the other BI tools yet, but what I hear from the community is that the Microsoft tools are more "user-friendly" than the others. In my master's I will try the SAS tools, and then I will know whether that is true or not.
I'm very satisfied with Microsoft, and if BI is more popular today, it is because Microsoft is investing a lot in BI technologies. Why am I saying this? Well… it is my opinion, but there have been big improvements: particularly SQL Server 2005, SharePoint 2007 (the 2003 version was usually used just for file sharing), the acquisition of ProClarity (tools to analyze BI data), the acquisition of the Dundas software (charting and reporting tools), Office 2007… and others…


For these reasons, I think BI is the right path for our careers, particularly Microsoft BI! But I think that if we know the BI concepts, with some extra work we can use other tools like SAS, WebFocus, SAP and others without problems.

My current BI project is very complex, and the old system was too difficult to understand (http://pedrocgd.blogspot.com/2007/06/ssis-temporary-image-to-msdn-forum.html) because it looked like a data warehouse built inside Excel and Access files… can you imagine that?!
Overall, the big difficulties were understanding the old system and defining the best architecture.

Below I will show you some screenshots of Microsoft BI tools in my project.

SSIS - SQL Integration Services
As I wrote in a previous post, the goal of BI is to provide the business with high data quality, using intelligence! This module provides the right tools to transform, clean and audit data from several data sources and to ensure that this data is inserted correctly and is compatible with the destination structure. That is my concept of ETL.

Control Flow

Control Flow (Sequence container expanded)


SSAS - SQL Analysis Services
This module consists of two major, and complementary, pieces of functionality: OLAP and Data Mining.
In this module you can add KPIs (corporate measures), perspectives and translations, and you must know MDX, the language for querying multidimensional databases.
MDX is similar to SQL, but while in SQL we query data from OLTP databases, with MDX we query data from OLAP databases.


Data Source View

Cube Designer

Calculations Designer (MDX Sintax)

Cube Browser

Cube Browser with some data


SSRS - SQL Reporting Services
Using MDX, it's possible to create queries against an OLAP database and display the results here in a table, a matrix or a chart.

Data designer (wizard)

MDX Data Designer

Report Designer


Don't forget, this is a draft about each SQL BI module. If you think a more specific description of some subject could be useful, I will describe it better. (In the right panel you can see links to my favourite books.)
Regards!
