Monday, September 29, 2008

BI Education - SQLBI Methodology

Marco helped me two years ago in a dimensional model design problem. He probably does not remember me, but I’m following his work, mainly all the stuff relating to dimensional modelling. Marco published last week the first draft for an alternative purpose to Inmon and Kimball approaches. I spent yesterday and today reading all the 44 pages of this first draft. And came to me all the issues and problems that I felt in the past… it’s really a very good approach not only for BI professionals using Microft tools.

I’m comment Marco Russo and Albert Ferrari draft, only because I want to contribute with my opinion for their project and advertise it. And I would be happy if professionals and friends like Subhash, Bruno, Amin, Tomislav, Alex, Ella and all other experts give their opinions about this subject.

It’s an approach near the real world and my next comments are only to clarify myselft and make this draft even better. Like they say in the draft:
“Always remember that these are not fixed and immutable rules. Your specific implementation of the architecture will probably be constrained by other considerations. You – as the BI analyst – are the one and only person that will be able to take these decisions in a correct way. “

Download the draft in the official website

Some comments/questions

1. The main use of the configuration database is for dealing with bookkeeping information and with data that come from more than one source? An example is to deal with the same customer that comes from two different sources with a different name… Which more funcionalities this configuration database could have? I think this configuration database should be shared with several BI solutions and use SQL schemas to split them. If you have one database configuration for each BI solution you’ll have more cost and more work to maintain it.


2. I’m agree that mirror database, filters only the required data, and the staging area is a relational database that will be a bridge between mirror and datawarehouse denormalized database. If you need to process all the cubes again, datawarehouse database stores all the historical data or you need to go again to mirror database or operational sources?! (sometimes the historic data in operational sources is available only once)

3. SQL Schemas could create problems in third party products, and I always avoid them. What you think about it? In Data warehouse or Data Mart databases you change the schemas used in Staging Area, could be good for understand the solution, but I’m afraid that could be an issue in ETL and OLAP designs. What you think?

4. Your definition of Metadata OLTP DB makes me a confusion… do you have views to OLTP operational systems to work as a bridge to DW in case of not using a staging area, but normally metadata is contextualizing your data… describing what each field, each table, each database is … I think you could explain this better.

5. Following your draft, in Kimball approach, the Data Warehouse database has a denormalized structure in spite Inmon aproach that has a normalized structure. Am I correct? There is some confusion in this point…

6. I don’t think that in SCD type 2 the status column should be used. I only understand the use of this column in this type of dimensions, only for users that goes inside database and need a fast understand of the current row, because in practice, if you lookup for the status column, you may have a problem if you need to import rows from a previous date (late arriving facts)… and you will map this rows to the status and could not be true

7. My defition of ODS was a little different. Your ODS description (only used by softwares) may cause several views of “true”. What do you think?

8. I’m completed agree with you about Natural and Surrogate Key in date dimension. In my last project I used Surrogate Key as an identity integer, that make me felt lost. I always needed to join the fact table with date dimension to understand the mean of each fact.

9. In page 36 when you wrote “Mirroring is not part of the ETL. “, I think create the mirror is the extract step of an ETL (ExtractTransformLoad)… you don’t do any transformations… only the basics like change column names and extract the data, and if you can apply some filters. Changing column datatype could generate an error… and we must avoid accessing source operational systems more than once

10. In pag 42, in which database you store logs added by SSIS Log provider? (Sysdtslogs). I like the answer you gave to the question “what is the difference between log and audit?” and also the use of audit information directly in the cubes but hidden for business users. Some people are a little confused in this subject.

11. How BI analysts control the data lineage? And how we can make an impact analyst on a change in a database?

Improvements
1. I think could be interesting include some standards in database objects (tables, views and stopred procedures names). Which nomenclatures you normal use to each object in each database (Mirror, Staging, ODS and DW e DMart)?
2. More focus on business and technical metadata of each database.
3. Create an addittional shared database for SSAS cubes performance using used based optimization. (Activating query logs of olap cubes and creating aggregations based on those logs)
4. In configuration database described, could be interesting the use of data profiling
5. Include a data mining structure
The original image was mofified by me and got it from draft 1.0.

I think that some comments could be basic for some of you, but because there aren’t any standards and any ISO for this subject, BI Analyst still some has doubts about the best pratiques to follow implementing a BI solution

So, I hope to receive Marco and Albert here to clarify me the points described before. It’s a very important step that these guys are doing, because they are creating rules, standards and make we speak the same language.
Good Luck!!


Updated 29-09-2008 with Albert Ferrari response. I will reply in the comments section.
Albert and Marco response:
Pedro,
I'll try to give you some answers, even if the space here is not enough for a complete discussion:
1. Yes, the config DB can be shared between different solutions for the same customer even because, in the DWH world, "different solutions for one customer" means "One complex solution for one customer, composed of different subject areas". :)

2. In some situation we store in the OLTP Mirror several snapshots of OLTP tables, in order to be able to rebuild the history. Clearly the DWH contains all the history but, sometimes, you will need to rebuild the DWH too...

3. I would not avoid schemas in order to satisfy third party tools. If some tool need to query the DWH then it will have its views and, if it cannot use schemas, then its views will live in the dbo schema.

4. OLTP Metadata is just a set of views. If you cannot build them in the OLTP DB and you cannot afford an OLTP Mirror, then an OLTP Metadata might help.

5. The Kimball approach is de-normalized but the extent of normalization that is needed in an Inmon one is not very clear, it all depends from the analyst. Inmon leaves all these "details" to the analyst, giving just some advices.

6. I agree with you. What else could I say? :)

7. What is your definition of ODS?

8. Yep. Even if, sometimes, I had to change the key, using some very creative way of expressing a date. If you have several billions rows then the INT representation of YYYYMMDD might be a problem and a smallint solution might be helpful. Clearly 20080101 is not a smallint... so you'll need some differente ways to express the date.

9.I don't understand your note. During mirroring you should not do any kind of transformation and/or column rename. If a column is called C1P8D3B0 ans is of type INT, then you will have that column name and type in the OLTP mirror and a VIEW that will change name and type when you will extract from the mirror.

10. I think the log DB is the perfect place for thos logs.

We are about to publish a second whitepaper that might be interesting to look at, it will provide a demostration of the methodology on the very (ab)used AdventureWorks DB.
Thanks alot for your comments, I will use them and your improvement section for a future review of the whitepaper.
Alberto

Thursday, September 25, 2008

BI Conference - Microsoft Events


Dear Friends,
Microsoft will organize next month a big event about Business Intelligence. Unfortunatelly I don’t have budget to go to this year, but if you are near Seattle (USA) don’t lose it… My friend Bruno Ferreira, that is currently working for http://www.ciall.com/ in Irland (Dublin) will be there… and is because that he’s a very good professional and he’s working for passion like me! Maybe next year I will join you there!!!

And for now… and because I don’t have budget for more events this year, I will be next week here in Lisbon to see Steve Ballmer, Microsoft CEO.

If you are near Lisbon, dont lose this event with Steve Ballmer and several good sessions, mainly in Performance Point Server, that I'll not miss!!!
Regards,
Pedro

Thursday, September 18, 2008

BI Book - Free SQL 2008 Book

Dear Friends,
Only to alert you that Microsoft Press is sharing with comunity a free SQL 2008 book.

It's available for a free download here:
http://csna01.libredigital.com/?urss1q2we6
If you want to begin the SQL 2008 Microsoft Certifications, could be a good start... and free!
Regards!
Pedro

Wednesday, September 17, 2008

BI Conference - SQL Bits III

Once again I went to the SQLBIts conference as I promised. You can refresh the previous event following this link http://pedrocgd.blogspot.com/2008/03/tmp.html
I started to write this post in the University room booked to attend the SQLBits III event. Almost persons that attended to the event already returned to their homes, and unfortunately I had flight only Sunday morning at 8 am. I'm joined with my friends Bruno Ferreira (Portuguese like me that came from Irland), Alex Whittles (that came from London with his boss) and Tomislav Piasevoli (that came from Zagreb, Croacia). The event was significantly more interesting with these 3 excelent guys! I attached to this post several photos only to refresh the memory of people that was there and allow others to imagine how the event was.I have several things to write about this weekend in London, which was extremely good in some points and a litle dispappoint in others.

The University of Hertfordshire
This is the University where the conference toke place. Is a nice place but too quite for a person that goes alone from another country… Bruno was also there but he went to Irland on Saturday after the conference…

Last SQL Bits, in Birmingham (UK), the location of conference was better because it was near the city centre and because I went with my girlfriend, she could went to shopping while I was attending the event



The Training Day (Friday)
SQL Bits III organization included an extra day for intensive training that cost between 150£ and 200£ depending on the selected subjects. For those persons that never saw MSFT BI Tools and Business Intelligence approachs from Kimball an Inmon, I think that was good, but for others was extremely basic. But was my fault, because Chris Webb advised me… but I’m disappointed because I didn’t see the same organization and the same environment in training day (on Friday) than the free conference day (on Saturday) … and for those (like me and Bruno) who payed for the training day we make a question for ourselves… for which day we payed 150£?! Why we make the question? The reason is simple… the free conference had several partners (Microsoft, Quest, Xlcubed and others), extra events like dinner with beers and pizzas, more than 400 persons, a party and some other interesting stuff. We also thought the training day would be an intensive training using the laptop and making some advanced labs mainly in MDX and Data Mining.

The Conference Day (Saturday)
Some photos of conference environment and a photo of Bruno picking some pizza and beer! Bruno is the guy with black t-shirt! You can also see the taller guy… He’s Alex Whittles that I referenced before and he’s a cool and friendly guy!




The Conference Day Sessions
I’ll describe a a little resume of each session that I attended, maybe I will go into detail in a next post, but for now, I’ll do a short briefing.

Alan Mitchell session
Getting more value from your data through data mining
After seeing the Rafal Webcasts, it’s difficult to find a better session about Data Mining, but Allan was really great and focus on excel 2007 add-In for Data Mining. Allan showed how the business user can apply a Data Mining algorithm without statistical and developing skills. Don’t need to know formulas, don’t need to know programming, it’s all there!!!


The hard task will be the ETL that is the most important step in a Data Mining project. If the dataset is not well created, even the top statistical expert of the world cannot identify real patterns and trends

Gary Crawford session
Data Vizualisation - Make your reports useful to business - No more "Speed Dials"
Maybe some people didn’t saw anything new, but I loved this session. Gary gave some very good tips that must be taken when you create a report or dashboard. He started from default charts generated by excel wizard and gave some design tips to create more friendly and understandble charts for business reports

He also showed some gauges, thermometers and traffic lights usually used in some business reports. He explain some best pratiques that should be followed in order to avoid confused and unperceptable reports

And as he told… running a business is not driving a car. Cars are real time, need real time response. Business reports don’t move as you read them. He finished giving some tips for a good and more usefull report or dashboard


Andrew Sadler session
Microsoft's integrated end-to-end BI offering
I never heard about Andrew Sadler before, and was really a very good surprise for me. He was fast, precisely and efficiency, in the way he explained the integration that could be done with Sharepoint, Performance Point Server and ProClarity

In the demo prepared for the session, he used the Sharepoint portal as the main interface/front office and behind the scenes he created several different types of reports and charts in Performance Point Server (including proClarity) showed inside normal webparts of Sharepoint portal

Andrew also show some tips using MDX that could be applied on filtering the reports and charts created in the previous steps


Andrew Wiles session
Does your cube support your users reporting needs?
As Andrew Sadler, Andrew Wiles was also a very good surprise for me and for almost persons that was attending the session. He focused in user reporting needs as Gary in a previous session, not in design but in funcionalities… for example how a simple indicator YearToDate could be available to the business user? You will respond that I can create a calculated member or allow user to do that using the MDX functon YTD… but Andrew showed a new approach that is abstracting all this stuff using reporting dimensions… for budget reports I believe that could be helpfull… I’m still tring to understand all the the stuff and how could apply in the real world…. I’m ansious to see the material that Andrew promised to share with us…

This session was really the more original and harder! Check Alex post for further details http://www.purplefrogsystems.com/blog/?p=18

Colin Hardie session
Choices, Choices - Loading and Modelling SCD in SQL 2008 Suite
Colin Hardie work for the IMGroup (the main Microsoft partner) and showed the approach about Slowly Changing Dimensions (SCD) from Kimball and the capabilities of SCD transform of SQL Integration Services.

Nothing new, only the different alternatives to deal with Slowly Changing Dimensions, but Colin is very comunicative and also fast and precisely!

In the final, Simon Sabin, invited the attendences to eat some pizza, drink some beers and singing a song in auditorium!

Thanks to the organanizers, that although some little disappointing points, I'm sure that they did everything they could to do a fantastic event as it was in SQLBits III. Thanks to Simon, Chris Webb, Allan Mitchell, Tony, Darren Green and Jamie.
See you guys in next SQLBits!

Friday, September 5, 2008

BI Certification – Microsoft MCTS 70-445 Exam Passed

Yes!!!
One of the goals that I’ve proposed to achieve this year is almost accomplished! Today I made the Microsoft exam 70-445 (MCTS), and next week I will do the exam 70-446 (MCITP).

The exam 70-445 that I done today includes 3 courses:
Course 2791: Implementing and Maintaining SQL 2005 Analysis Services (3 days)
Course 2792: Implementing and Maintaining SQL 2005 Integration Services (3 days)
Course 2793: Implementing and Maintaining SQL 2005 Reporting Services (3 days)
http://www.microsoft.com/learning/en/us/exams/70-445.mspx

The following list includes the topic areas covered on this exam. The percentage indicates the portion of the exam that addresses a particular skill.

Managing SSAS (12 percent)
Developing SSAS Solutions by Using BIDS (18 percent)
Implementing data mining by Using BIDs (12 percent)
Managing SSRS (13 percent)Developing Reporting Solutions by Using SSRS (19 percent)
Developing Business Intelligence Solutions by Using SSIS (16 percent)
Administering SSIS Packages (11 percent)


The exam wasn’t easy, and I confess that during the exam I was a little afraid, because the questions were very specific and it required a lot of attention! Happily I passed with 842 (84,2%) (minimum is 700)
I have some experience in these subjects for that reason I only studied with MCTS Training Kit. This kit has a book, several labs and about 200 exams to train.

My exam had 44 questions and I felt that almost questions were focused on Reporting Services and Data Mining. I suggest that you get this training book and go inside into the properties of each module. (SSIS, SSAS, SSRS and Data Mining) And some subjects that you must study hard:

Report Model and Report Builder
Data mining algorithms and parameters
Data Mining lift and profit chart (Check the accuracy of your DM Model)
SSIS Transactions
SSAS Security issues
SSAS Performance (Usage based optimization / design aggregations)
SSAS MOLAP (automatic and schedule), ROLAP and HOLAP
DMX Syntax (for data mining models)
SSRS Subreports

If you follow the training kit and focus on the tips above, and are familiar with Microsoft tools I think is sufficient to pass. But some visitors will ask... why SQL 2005 and not SQL 2008? Because SQL Exams is not available yet, and it’s a way to prepare to SQL 2008 and learn more about Microsoft tools. Following this, I already made my registration for the new exams that will be available this month.
(If you register now, you will have 40% discount in the exam price)

Register for the "First to Know" Visual Studio 2008 and SQL Server 2008 Exam Offer
Be Among the First to Certify
http://www.microsoft.com/learning/mcp/sqlvs/offer/default.mspx
Regards!!
Pedro

LinkWithin

Related Posts Plugin for WordPress, Blogger...