Data Warehousing



Data warehousing explained

Data warehousing is a huge market estimated for $28 billion worth. It is also rapidly developing branch as it is estimated that the growth per annum can be as high as 20%.
This vast and promising market consists of: tools, technologies and methodologies that let not only the construction, usage, management, and maintenance of the hardware and software to be used for a data warehouse, but also as the actual data itself.

But let's start with stating what really data warehousing is?
To answer this question one must go back and take a look at the idea of data warehousing that dates back to the late 1980s when the "business data warehouse" was developed by IBM researchers Barry Devlin and Paul Murphy.

The data warehousing idea was meant to provide an architectural model for the flow of data from operational systems to decision support environments. Additionally, data warehousing software made an attempt to address the various problems connected with the flow. Most of them concerned the high costs associated with it. As a data warehousing architecture was non-existing, an enormous level of redundancy was needed in order to maintain multiple decision support environments. Multiple decision support environments operated independently in large corporations. Each environment served different users but often required much of the same stored data. Thus the process of gathering, cleaning and integrating data was usually partially replicated for every and each environment. The data was taken from different sources, mainly from long term operational systems already existing. What is more, as new decision support requirements appeared, the operational systems had to be often re- examined. Frequently it was the case that those new requirements necessitated gathering, cleaning and integrating the new data from so called data marts which were tailored for ready access by users.

In other words, today's data warehousing software provides a repository of data stored in a dedicated database (a data warehouse). Data warehouses are created to assist reporting and analysis. The above definition focuses mainly on data storage.
However, before data warehouse being fed, data can be analyzed, extracted, transformed, loaded and managed with the ETL processes.

One of the pioneers of data warehousing, Bill Inmon has defined it as: "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Subject- oriented means that data is arranged in a particular way thanks to which all the data elements relating to the same object or event in real world are combined together. By integrated Inmon meant that a coherent whole of data warehouse is created by gathering and merging data from various sources.
In other words, the data warehouse includes data from most or even all of a given organization’s operational systems and that data is made consistent. Non- volatile simply means that the data once committed cannot be over-written or deleted. The data is static, to be read only and retained for future reporting. Obviously, more data can be added anytime.

Data Warehousing software

ETL Data warehousing tutorial
ETL Data warehousing tutorialThe ETL tutorial demonstrates real life data warehousing scenarios with solutions. The section depicts typical datawarehousing problems and shows many ways to handle them in...
ETL Tools Info
ETL Tools InfoBusiness Intelligence, Data warehousing and ETL Process information with vast business and technical knowledge about the most interesting solutions on the market. It is a good...
Data warehousing articles
Data warehousing articlesThis knowledge base provide an overview of Data Warehousing and Business Intelligence with articles covering Data Integration, data migration and quality management, Enterprise...
ETL Tools comparison
ETL Tools comparisonCommercial and open source ETL and data warehousing Tools comparison provides main advantages and disadvantages of the most popular data integration and ETL software on the...
Practical PDI Solutions
Practical PDI SolutionsTaming Your Data: Practical Data Integration Solutions with Kettle site provides the abstract, slides, and code sample of a the Roland Bauman's 2009 OSCON presentation on Pentaho...
DataWarehousing DataMining and BI resources
DataWarehousing DataMining and BI resourcesThe datawarehousingsites blog provides helpful information on Business Intelligence, Data Warehousing and Data Mining sites and resources. The information is very well...
Data warehouse real-world examples
Data warehouse real-world examplesThe data warehousing tutorial illustrates two real Data Warehousing scenarios. The first business case is a design of a real world data warehouse for Levis Strauss. Another is a...
A Detailed Look at Data Replication Options for DR Planning
A Detailed Look at Data Replication Options for DR PlanningThe article helps explore the different approaches for integrating data replication into your DR strategy. As IT organizations are increasingly faced with more stringent SLAs,...
Data Quality Through Metadata
Data Quality Through MetadataThe Implementing Data Quality Through Metadata article elaborates on the importance of the data quality in a data warehouse environment. It outlines the role of proper...