Data Warehousing



Author: Grzegorz Goli

Data warehousing explained

Data warehousing is a huge market estimated for $28 billion worth. It is also rapidly developing branch as it is estimated that the growth per annum can be as high as 20%.
This vast and promising market consists of: tools, technologies and methodologies that let not only the construction, usage, management, and maintenance of the hardware and software to be used for a data warehouse, but also as the actual data itself.

But let's start with stating what really data warehousing is?
To answer this question one must go back and take a look at the idea of data warehousing that dates back to the late 1980s when the "business data warehouse" was developed by IBM researchers Barry Devlin and Paul Murphy.

The data warehousing idea was meant to provide an architectural model for the flow of data from operational systems to decision support environments. Additionally, data warehousing software made an attempt to address the various problems connected with the flow. Most of them concerned the high costs associated with it. As a data warehousing architecture was non-existing, an enormous level of redundancy was needed in order to maintain multiple decision support environments. Multiple decision support environments operated independently in large corporations. Each environment served different users but often required much of the same stored data. Thus the process of gathering, cleaning and integrating data was usually partially replicated for every and each environment. The data was taken from different sources, mainly from long term operational systems already existing. What is more, as new decision support requirements appeared, the operational systems had to be often re- examined. Frequently it was the case that those new requirements necessitated gathering, cleaning and integrating the new data from so called data marts which were tailored for ready access by users.

In other words, today's data warehousing software provides a repository of data stored in a dedicated database (a data warehouse). Data warehouses are created to assist reporting and analysis. The above definition focuses mainly on data storage.
However, before data warehouse being fed, data can be analyzed, extracted, transformed, loaded and managed with the ETL processes.

One of the pioneers of data warehousing, Bill Inmon has defined it as: "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Subject- oriented means that data is arranged in a particular way thanks to which all the data elements relating to the same object or event in real world are combined together. By integrated Inmon meant that a coherent whole of data warehouse is created by gathering and merging data from various sources.
In other words, the data warehouse includes data from most or even all of a given organization’s operational systems and that data is made consistent. Non- volatile simply means that the data once committed cannot be over-written or deleted. The data is static, to be read only and retained for future reporting. Obviously, more data can be added anytime.

Data Warehousing software

ETL Data warehousing tutorial
ETL Data warehousing tutorialThe ETL tutorial demonstrates the real-life data warehousing scenarios with sample solutions. The course depicts typical datawarehousing problems and shows many ways to handle and...
ETL Tools Info
ETL Tools InfoBusiness Intelligence, Data warehousing and ETL Process information with vast business and technical knowledge about the most interesting solutions on the market. It is a good...
Database and Data Warehouse Tuning Principles
Database and Data Warehouse Tuning PrinciplesDatabase and Data Warehouse Tuning principles, experiments and troubleshooting techniques is an over 200 slides presentation in the form of tutorial on how to tune the DWH system....

Data warehousing articles
Data warehousing articlesThis knowledge base provides an overview of Data Warehousing and Business Intelligence topics with articles covering Data Integration, data migration and quality management,...
ETL Tools comparison
ETL Tools comparisonCommercial and open source ETL and data warehousing Tools comparison provides main advantages and disadvantages of the most popular data integration and ETL software on the...
Practical PDI Solutions
Practical PDI SolutionsTaming Your Data: Practical Data Integration Solutions with Kettle site provides the abstract, slides, and code sample of a the Roland Bauman's 2009 OSCON presentation on Pentaho...
DataWarehousing DataMining and BI resources
DataWarehousing DataMining and BI resourcesThe datawarehousingsites blog provides helpful information on Business Intelligence, Data Warehousing and Data Mining sites and resources. The information is very well...
Data warehouse real-world examples
Data warehouse real-world examplesThe data warehousing tutorial illustrates two real Data Warehousing scenarios. The first business case is a design of a real world data warehouse for Levis Strauss. Another is a...
Datamart definition
Datamart definitionThe article illustrates the role, concepts and various designs of a data mart in a data warehouse environment. It also shows the use of data marts in an organization by...
Data mart definition and overview
Data mart definition and overviewThis Data Mart article page contains general definition of data mart in a data warehouse system, it outlines the differences between them and the purpose of maintaining them in an...