Data Warehousing



Author: Grzegorz Goli

Data warehousing explained

Data warehousing is a huge market estimated for $28 billion worth. It is also rapidly developing branch as it is estimated that the growth per annum can be as high as 20%.
This vast and promising market consists of: tools, technologies and methodologies that let not only the construction, usage, management, and maintenance of the hardware and software to be used for a data warehouse, but also as the actual data itself.

But let's start with stating what really data warehousing is?
To answer this question one must go back and take a look at the idea of data warehousing that dates back to the late 1980s when the "business data warehouse" was developed by IBM researchers Barry Devlin and Paul Murphy.

The data warehousing idea was meant to provide an architectural model for the flow of data from operational systems to decision support environments. Additionally, data warehousing software made an attempt to address the various problems connected with the flow. Most of them concerned the high costs associated with it. As a data warehousing architecture was non-existing, an enormous level of redundancy was needed in order to maintain multiple decision support environments. Multiple decision support environments operated independently in large corporations. Each environment served different users but often required much of the same stored data. Thus the process of gathering, cleaning and integrating data was usually partially replicated for every and each environment. The data was taken from different sources, mainly from long term operational systems already existing. What is more, as new decision support requirements appeared, the operational systems had to be often re- examined. Frequently it was the case that those new requirements necessitated gathering, cleaning and integrating the new data from so called data marts which were tailored for ready access by users.

In other words, today's data warehousing software provides a repository of data stored in a dedicated database (a data warehouse). Data warehouses are created to assist reporting and analysis. The above definition focuses mainly on data storage.
However, before data warehouse being fed, data can be analyzed, extracted, transformed, loaded and managed with the ETL processes.

One of the pioneers of data warehousing, Bill Inmon has defined it as: "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Subject- oriented means that data is arranged in a particular way thanks to which all the data elements relating to the same object or event in real world are combined together. By integrated Inmon meant that a coherent whole of data warehouse is created by gathering and merging data from various sources.
In other words, the data warehouse includes data from most or even all of a given organization’s operational systems and that data is made consistent. Non- volatile simply means that the data once committed cannot be over-written or deleted. The data is static, to be read only and retained for future reporting. Obviously, more data can be added anytime.

Data Warehousing software

ETL Data warehousing tutorial
ETL Data warehousing tutorialThe ETL tutorial demonstrates the real-life data warehousing scenarios with sample solutions. The course depicts typical datawarehousing problems and shows many ways to handle and...
ETL Tools Info
ETL Tools InfoBusiness Intelligence, Data warehousing and ETL Process information with vast business and technical knowledge about the most interesting solutions on the market. It is a good...
Database and Data Warehouse Tuning Principles
Database and Data Warehouse Tuning PrinciplesDatabase and Data Warehouse Tuning principles, experiments and troubleshooting techniques is an over 200 slides presentation in the form of tutorial on how to tune the DWH system....
Data warehousing articles
Data warehousing articlesThis knowledge base provides an overview of Data Warehousing and Business Intelligence topics with articles covering Data Integration, data migration and quality management,...
ETL Tools comparison
ETL Tools comparisonCommercial and open source ETL and data warehousing Tools comparison provides main advantages and disadvantages of the most popular data integration and ETL software on the...
Practical PDI Solutions
Practical PDI SolutionsTaming Your Data: Practical Data Integration Solutions with Kettle site provides the abstract, slides, and code sample of a the Roland Bauman's 2009 OSCON presentation on Pentaho...
DataWarehousing DataMining and BI resources
DataWarehousing DataMining and BI resourcesThe datawarehousingsites blog provides helpful information on Business Intelligence, Data Warehousing and Data Mining sites and resources. The information is very well...
Data warehouse real-world examples
Data warehouse real-world examplesThe data warehousing tutorial illustrates two real Data Warehousing scenarios. The first business case is a design of a real world data warehouse for Levis Strauss. Another is a...
Datamart definition
Datamart definitionThe article illustrates the role, concepts and various designs of a data mart in a data warehouse environment. It also shows the use of data marts in an organization by...
Data mart definition and overview
Data mart definition and overviewThis Data Mart article page contains general definition of data mart in a data warehouse system, it outlines the differences between them and the purpose of maintaining them in an...
Data Quality Through Metadata
Data Quality Through MetadataThe Implementing Data Quality Through Metadata article elaborates on the importance of the data quality in a data warehouse environment. It outlines the role of proper...
Enterprise open source for business
Enterprise open source for businessThe main goal of the article is to find the most appropriate model of enterprise Open Source which would be stable, reliable and cost effective. The article reveals pros and cons...
How to avoid database explosion
How to avoid database explosionThe author of that article points to a potential problem of Exploding multidimensional databases in the OLAP architectures. The article helps understand the problem of a database...
Data warehouse vs OLTP database
Data warehouse vs OLTP databaseThis tutorial guides users through the differences between a Data Warehouse database and OLTP database. The article lists major goals that each of the two databases try to meet....
Aspects of a DW architecture
Aspects of a DW architectureThe guide lists the different aspects of a data warehouse architecture. It takes into consideration data consistency, reporting data store and data modeling architecture. The...
Data warehouse architecture training articles
Data warehouse architecture training articlesData warehousing training and tutorial articles on the pages of Kimball Group. The data warehouse architecture articles cover all the relevant areas a data warehouse architect...
Developing a Data Warehouse architecture
Developing a Data Warehouse architectureConsiderations about the development of a data warehouse and data warehouse architecture requirements by Warren Thornthwaite. The tutorial contains real-life examples and takes...
Data mining techniques
Data mining techniquesThe data mining techniques article contains information about the main concepts of Data Mining, the data mining concepts, concepts of bagging, voting, boosting, building models...
The Data Warehousing Institute
The Data Warehousing InstituteTDWI (The Data Warehousing Institute) provides education, training, certification, news, and research for executives and information technology (IT) professionals worldwide and it...
BI DW Vendor selection forum
BI DW Vendor selection forumIndependent Data warehousing vendor selection discussion group. The forum is quite active and it's a good place to ask any questions regarding BI strategy in an...
DW Architecture
DW ArchitectureData Warehouse architecture and concepts explained with examples. The site contains comparisons between different DW designs (Star, Snowflake and Fact Constellation schema) and...
LearnDataModeling
LearnDataModelingThe LearnDataModeling site is aimed to help learning data warehousing. It contains information and learning materials about business intelligence, data modelling, database design...
Data Warehouse interview questions
Data Warehouse interview questionsData Warehouse most common Interview Questions with career tips, interview procedures in major companies and technical FAQs. The questions are grouped by technology and the...
Impact of IBM buying Cognos
Impact of IBM buying CognosAuthor of that article analyses potential impact of Cognos acquisition by IBM and tries to predict the future of Data Integration (Cognos ETL, Datastage and Information Server)...
Implementing Slowly Changing Dimensions
Implementing Slowly Changing DimensionsThe tutorial shows how to implement Slowly Changing Dimensions in a data warehouse. It lists all the major differences between different types of SCD (SCD Type 0 through to Type...
Todo BI
Todo BITodo BI, a StrateBI portal and blog provides information on Business Intelligence, Data Warehouse, CRM and ETL. Great source of information on Open Source Business Intelligence...
1keydata Business Intelligence
1keydata Business Intelligence1keydata Data warehousing portal aims to help people get a good high-level understanding of what it takes to implement a successful data warehouse project. A lot of the...