Data warehousing explained
Data warehousing is a huge market estimated for $28 billion worth. It is also rapidly developing branch as it is estimated that the growth per annum can be as high as 20%.
This vast and promising market consists of: tools, technologies and methodologies that let not only the construction, usage, management, and maintenance of the hardware and software to be used for a data warehouse, but also as the actual data itself.
But let's start with stating what really data warehousing is?
To answer this question one must go back and take a look at the idea of data warehousing that dates back to the late 1980s when the "business data warehouse" was developed by IBM researchers Barry Devlin and Paul Murphy.
The data warehousing idea was meant to provide an architectural model for the flow of data from operational systems to decision support environments. Additionally, data warehousing software made an attempt to address the various problems connected with the flow. Most of them concerned the high costs associated with it. As a data warehousing architecture was non-existing, an enormous level of redundancy was needed in order to maintain multiple decision support environments. Multiple decision support environments operated independently in large corporations. Each environment served different users but often required much of the same stored data. Thus the process of gathering, cleaning and integrating data was usually partially replicated for every and each environment. The data was taken from different sources, mainly from long term operational systems already existing. What is more, as new decision support requirements appeared, the operational systems had to be often re- examined. Frequently it was the case that those new requirements necessitated gathering, cleaning and integrating the new data from so called data marts which were tailored for ready access by users.
In other words, today's data warehousing software provides a repository of data stored in a dedicated database (a data warehouse). Data warehouses are created to assist reporting and analysis. The above definition focuses mainly on data storage.
However, before data warehouse being fed, data can be analyzed, extracted, transformed, loaded and managed with the ETL processes.
One of the pioneers of data warehousing, Bill Inmon has defined it as: "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Subject- oriented means that data is arranged in a particular way thanks to which all the data elements relating to the same object or event in real world are combined together. By integrated Inmon meant that a coherent whole of data warehouse is created by gathering and merging data from various sources.
In other words, the data warehouse includes data from most or even all of a given organization’s operational systems and that data is made consistent. Non- volatile simply means that the data once committed cannot be over-written or deleted. The data is static, to be read only and retained for future reporting. Obviously, more data can be added anytime.
Data Warehousing software
The ETL tutorial demonstrates the real-life data warehousing scenarios with sample solutions. The course depicts typical datawarehousing problems and shows many ways to handle and...
Business Intelligence, Data warehousing and ETL Process information with vast business and technical knowledge about the most interesting solutions on the market. It is a good...
Google BigQuery is a fully managed, serverless data warehouse that allows businesses to analyze large datasets in real time using SQL-like queries. It is part of the Google...
Databricks Data Lakehouse is a unified platform that combines the capabilities of data lakes and data warehouses. It provides a single, integrated environment where...
Snowflake is a cloud-native data warehousing platform designed for high performance, scalability, and ease of use. It separates storage, compute, and services, allowing...
Database and Data Warehouse Tuning principles, experiments and troubleshooting techniques is an over 200 slides presentation in the form of tutorial on how to tune the DWH system....
This knowledge base provides an overview of Data Warehousing and Business Intelligence topics with articles covering Data Integration, data migration and quality management,...
Taming Your Data: Practical Data Integration Solutions with Kettle site provides the abstract, slides, and code sample of a the Roland Bauman's 2009 OSCON presentation on Pentaho...
The datawarehousingsites blog provides helpful information on Business Intelligence, Data Warehousing and Data Mining sites and resources. The information is very well...
The data warehousing tutorial illustrates two real Data Warehousing scenarios. The first business case is a design of a real world data warehouse for Levis Strauss. Another is a...
The article illustrates the role, concepts and various designs of a data mart in a data warehouse environment. It also shows the use of data marts in an organization by...
This Data Mart article page contains general definition of data mart in a data warehouse system, it outlines the differences between them and the purpose of maintaining them in an...
The Implementing Data Quality Through Metadata article elaborates on the importance of the data quality in a data warehouse environment. It outlines the role of proper...
The main goal of the article is to find the most appropriate model of enterprise Open Source which would be stable, reliable and cost effective. The article reveals pros and cons...
The author of that article points to a potential problem of Exploding multidimensional databases in the OLAP architectures. The article helps understand the problem of a database...
This tutorial guides users through the differences between a Data Warehouse database and OLTP database. The article lists major goals that each of the two databases try to meet....
The guide lists the different aspects of a data warehouse architecture. It takes into consideration data consistency, reporting data store and data modeling architecture. The...
Data warehousing training and tutorial articles on the pages of Kimball Group. The data warehouse architecture articles cover all the relevant areas a data warehouse architect...
Considerations about the development of a data warehouse and data warehouse architecture requirements by Warren Thornthwaite. The tutorial contains real-life examples and takes...
The data mining techniques article contains information about the main concepts of Data Mining, the data mining concepts, concepts of bagging, voting, boosting, building models...
TDWI (The Data Warehousing Institute) provides education, training, certification, news, and research for executives and information technology (IT) professionals worldwide and it...
Independent Data warehousing vendor selection discussion group. The forum is quite active and it's a good place to ask any questions regarding BI strategy in an...
Data Warehouse architecture and concepts explained with examples. The site contains comparisons between different DW designs (Star, Snowflake and Fact Constellation schema) and...
The LearnDataModeling site is aimed to help learning data warehousing. It contains information and learning materials about business intelligence, data modelling, database design...
Data Warehouse most common Interview Questions with career tips, interview procedures in major companies and technical FAQs. The questions are grouped by technology and the...
Author of that article analyses potential impact of Cognos acquisition by IBM and tries to predict the future of Data Integration (Cognos ETL, Datastage and Information Server)...
The tutorial shows how to implement Slowly Changing Dimensions in a data warehouse. It lists all the major differences between different types of SCD (SCD Type 0 through to Type...
Todo BI, a StrateBI portal and blog provides information on Business Intelligence, Data Warehouse, CRM and ETL. Great source of information on Open Source Business Intelligence...
1keydata Data warehousing portal aims to help people get a good high-level understanding of what it takes to implement a successful data warehouse project. A lot of the...