Data warehouse software comparison
What are the strengths and weaknesses of each data warehousing platform? Which one comes out best overall? And, finally, which platform is worth investing in?
These questions are difficult to answer, and the answers depend on the customer. In any case, they cannot be answered without a proper comparison.
The aim of this comparison was to evaluate the most popular solutions currently available on the market. Seven vendors were selected.
DW software evaluation criteria
Many vendors are fighting for market share, each with its own solutions, versions, editions, and licenses. The rivalry is strong, so the comparison had to be limited to a shortlist. What decided whether a vendor was included or excluded? Data warehousing demands a few features that determine core functionality, and every DW platform chosen had to provide all of them:
- the ability to manage structured analytic data within the platform's engine
- the ability to integrate with different management systems
- the ability to optimize query structure
- the ability to optimize load processes
- the ability to use SQL directly to query relational and multidimensional data.
Beyond core functionality, three further conditions applied:
- Self-sufficiency: the included products must not rely on external applications. They have to be fully functional on their own.
- Revenue: each vendor's data warehousing business must be profitable and generate at least $30 million in revenue.
- Market share, measured by customer interest.
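The inclusion rules above can be read as a simple filter over candidate vendors. The following is only a minimal sketch of that idea: the feature identifiers, the `Candidate` class, and the three placeholder candidates are assumptions made for illustration, not data from the comparison. Market share is left out because the criteria give no numeric threshold for it.

```python
from dataclasses import dataclass

# Feature names paraphrase the core-functionality list; the identifiers
# themselves are assumptions made for this sketch.
REQUIRED_FEATURES = frozenset({
    "structured_analytic_data",
    "integration",
    "query_optimization",
    "load_optimization",
    "direct_sql",
})

MIN_DW_REVENUE_USD = 30_000_000  # revenue threshold stated in the criteria


@dataclass(frozen=True)
class Candidate:
    name: str
    features: frozenset
    standalone: bool       # fully functional without external applications
    dw_revenue_usd: int    # data warehousing revenue
    dw_profitable: bool


def qualifies(c: Candidate) -> bool:
    """Return True if the candidate passes every hard inclusion criterion."""
    return (
        REQUIRED_FEATURES <= c.features
        and c.standalone
        and c.dw_profitable
        and c.dw_revenue_usd >= MIN_DW_REVENUE_USD
    )


# Placeholder candidates with invented figures, not real vendor data.
candidates = [
    Candidate("Vendor A", REQUIRED_FEATURES, True, 45_000_000, True),
    Candidate("Vendor B", REQUIRED_FEATURES - {"direct_sql"}, True, 80_000_000, True),
    Candidate("Vendor C", REQUIRED_FEATURES, True, 12_000_000, True),
]

shortlist = [c.name for c in candidates if qualifies(c)]
print(shortlist)
```

In this sketch only "Vendor A" stays on the shortlist: "Vendor B" lacks direct SQL access and "Vendor C" falls below the revenue threshold.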
|Vendor name|Products and versions considered|
|IBM|InfoSphere Balanced Warehouse 9.5|
|SAP|NetWeaver BI (Business Warehouse) 7.0|
|Teradata|Active Enterprise Data Warehouse 5550, Data Warehouse Appliance 2550, Data Mart Appliance 550, Teradata Database 12|
|Microsoft|SQL Server 2008|
|Oracle|Optimized Warehouses, Database 11g, Warehouse Builder 11g|
|Sybase|Analytic Appliance, IQ 12.7|
|Netezza|Performance Server 1000 Series Data Warehousing Appliance 4.5|
All the included vendors were examined from three points of view. Although the list of criteria is long, every item boils down to one of three general aspects.
- The first, offering, concerns the breadth of each vendor's products and services. Here, architectures, structures, and functionality were analyzed.
- Data management evolves constantly, along with customers' requirements. How does each vendor meet these changing needs? The strategy criteria concern the plans and scenarios each vendor intends to follow.
- Even the best solution loses much of its value without compatibility with other products. Therefore, each vendor's market presence was analyzed: its partnerships, but also its financial condition and trends.
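One way to picture how three aspects can be combined into a single ranking is a weighted sum. The sketch below is an assumption-laden illustration, not the method used in this comparison: the weights, the 0 to 5 scale, the function names, and the two placeholder vendors are all invented for the example.

```python
# Assumed weights; the comparison does not state how the aspects were weighted.
WEIGHTS = {"offering": 0.5, "strategy": 0.25, "market_presence": 0.25}


def overall_score(scores: dict) -> float:
    """Combine per-aspect scores (assumed 0-5 scale) into one weighted score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing aspect scores: {sorted(missing)}")
    return sum(WEIGHTS[aspect] * scores[aspect] for aspect in WEIGHTS)


def rank(vendors: dict) -> list:
    """Return vendor names ordered from highest to lowest overall score."""
    return sorted(vendors, key=lambda name: overall_score(vendors[name]), reverse=True)


# Placeholder scores, not results from the comparison.
example = {
    "Vendor A": {"offering": 4.0, "strategy": 3.0, "market_presence": 5.0},
    "Vendor B": {"offering": 3.0, "strategy": 5.0, "market_presence": 2.0},
}

for name in rank(example):
    print(name, overall_score(example[name]))
```

Changing the weights changes the ranking, which is one reason any such comparison remains partly subjective.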
Data warehousing platforms overview
A first look at the results of the review brings mixed feelings. One significant division appeared at the very beginning. The first group of vendors holds the lion's share of the market, offering solutions that suit most customers. The vendors in the second group are no worse - they are a very good fit for some specialized types of customers, in niche but still important deployments.
The first group of vendors - Teradata, Oracle, IBM, and Microsoft - comes out ahead in the comparison:
- Teradata's solutions deserve praise for their scalability, maturity, and flexibility, which are better than ever. Also worth noting is the growth in functionality brought by the 2550 and 1250 platforms, adjusted to the whole variety of customers' demands.
- Oracle and IBM are both expanding their solutions, and their growing scalability and affordability deserve recognition.
- Finally, Microsoft looks like a vendor for the future: the Redmond giant continues to expand its empire and, through strategic acquisitions and development, strengthens its already strong presence.
The second group, made up of the three remaining vendors - SAP, Netezza, and Sybase - offers specialized solutions whose popularity is limited by customers' requirements.
- SAP is known for its lack of flexibility. However, its popularity and technological progress are worth noting.
- Data warehouse appliances for the midmarket are still Netezza's most important goal. Growth in Netezza's popularity for enterprise data warehousing should be only a matter of time.
- The scalability of its tried and true columnar database, along with well-tuned queries against aggregates over the largest tables, makes Sybase one of the most interesting vendors of enterprise data warehousing solutions. However, it was ranked lower, mainly because of its lack of support for shared-nothing MPP and its poor support for specialized user-defined functions.
Please refer to the descriptions below for more detailed information on the listed platforms.
Teradata
Even though all the vendors included in the comparison offer efficient and trustworthy solutions, each one differs a bit from the rest. Teradata came out - perhaps a bit surprisingly - as the leader of the ranking, but no one can say it is undeserved. The first and most significant feature of Teradata's solutions is their scalability. The sales strategy was undoubtedly another key to this success. Teradata put emphasis on making its products accessible: competitive pricing attracted interest from customers in different market segments. In turn, the need to meet very different expectations made the solutions more versatile. Teradata's portfolio covers not only the 5550, 2550, and 550 enterprise data warehousing solutions, but also standalone licenses, databases, tools, and utilities.
In enterprise data warehousing, Teradata's main target, a few important features let Teradata stand out and overshadow its rivals:
- The real strength of Teradata's options shows when they are compared with those of other vendors. Its EDW packaging, pricing, and licensing options make it one of the most adaptable and, therefore, most suitable solutions.
- Teradata's designers understand the need for broad interoperability. The solutions are well prepared to support and manage the work of external applications and middleware.
- A wide range of products and services makes Teradata a strong choice for customers of different sizes and requirements. All of them can adjust the solution to suit them best.
- The database management system, optimized specifically for analysis, attests to the strength of the Teradata portfolio.
- Almost legendary scalability lets Teradata customers scale data warehouses through massively parallel processing (MPP) to a few petabytes across more than a thousand nodes, and deploy them in different enterprise data warehouse or business intelligence topologies.
- Teradata enterprise data warehouses let customers manage diverse workloads: reporting, query, OLAP, in-database analytics, and extract, transform, and load (ETL).
- Finally, Teradata's functionality in in-database analysis, workload management, query optimization, indexing, partitioning, compression, and caching is worth mentioning.
Despite all the praise Teradata truly deserves, its self-reliance seems to have gone a bit too far. Teradata's solutions require a strictly defined hardware and software stack - Teradata hardware units, Intel Xeon processors, and the BYNET interconnect - and do not run on other platforms. Furthermore, Teradata doesn't provide a SaaS on-demand EDW. The first restriction seems particularly painful, as it significantly limits the availability of the solutions.
To sum up, Teradata's market position is already established, and the company's recent strategy focuses on maximizing the best features of each solution. Continued work on scalability and modularity, along with competitive pricing, can only lead to even better sales and popularity.
Oracle
Oracle Database 11g and Oracle Warehouse Builder (OWB) are the tools that have earned Oracle its strong position in the comparison, along with a significant market share. Oracle's effective partnerships are also worth noting, as are the high performance and affordability of its solutions.
What factors are responsible for Oracle's success?
- Every enterprise data warehousing solution from Oracle comes with an efficient database management system.
- Scalability: Oracle EDWs can be scaled out to several nodes holding hundreds of terabytes.
- Streamlined support for different EDW and BI topologies makes Oracle's solutions more efficient, as does their ability to process mixed workloads (including reporting, query, OLAP, ETL, and in-database analytics).
- Oracle's solutions - unlike Teradata's - can be deployed equally well on different hardware and software platforms; they are not tied to a standard configuration. Furthermore, they integrate without complications with Siebel, Hyperion, and Fusion Middleware.
- Oracle's in-database analytics are especially worth noting, as are its well-developed query optimization, partitioning, compression, and caching.
- Oracle's solutions meet the expectations of customers from companies of different sizes and with diverse requirements.
- Oracle lets its customers choose among different Optimized Warehouse EDW appliances.
Although Oracle's solutions have plenty of advantages, their weak points are noticeable as well. First of all, they cannot be deployed as a single-tier grid of nodes. Furthermore, scaling out beyond 1 petabyte or to a grid of hundreds of nodes leaves much to be desired. Finally, there is the matter of isolation: Oracle's enterprise data warehousing solutions are provided separately from the rest of the product family - BI, OLAP, MDM, and data integration software.
Are these disadvantages significant enough to rule out Oracle's solutions? No, especially given the company's plans. Extending partnerships, simplifying maintenance, and adding features are only a part of Oracle's strategy, which suggests that the imperfections may be fixed quickly. All in all, Oracle provides solutions that are well worth trusting, and the outlook points to further growth of its market position.
IBM
IBM is widely known for its customer care and for providing solutions tailored to clients of diverse sizes and requirements, so no one should be surprised by IBM's high position in the ranking, as long as improving the IOD (Information On Demand) portfolio remains its main goal. Accordingly, all new solutions are designed to supplement, improve, and broaden the portfolio, offering a wide range of services. This approach has secured IBM a place among the market leaders.
In a word, what distinguishes IBM from other vendors is the functionality of its solutions. All of them are prepared to suit even the most demanding customers. What features are especially worth mentioning?
- Among IBM's diverse solutions is DB2, an efficient enterprise database.
- IBM deserves praise for its versatility. Information Server solutions meet very different expectations, providing services able to satisfy even the most demanding requirements regardless of a customer's size or industry.
- IBM's EDW appliances can be scaled out to hundreds of terabytes and deployed in diverse topologies (EDW and BI).
- Full integration with Optim, Rational, FileNet, WebSphere, Cognos, and InfoSphere.
- Like Oracle's, IBM's solutions can be implemented on different software and hardware platforms; they are not tied to standard platforms and are, as a consequence, available to more customers.
- IBM appliances are well prepared to support and manage mixed workloads - OLAP, ETL, in-database analytics, ad hoc query, and reporting.
- Furthermore, IBM provides efficient database security, information governance tools, and life-cycle management.
Even though the outlook seems "IBM-friendly", its solutions can still be improved, and a few shortcomings definitely should be fixed. InfoSphere Balanced Warehouse lacks petabyte scalability. That limits IBM's use among enterprises and service providers, whose very-high-end demands it does not fully satisfy.
Microsoft
Among the vendors making the most significant headway, Microsoft definitely comes first. Important acquisitions made by Microsoft - especially DATAllegro - along with an aggressive scalability push, have made the Redmond giant one of the most solid EDW vendors in the comparison.
SQL Server 2008 is a platform definitely worth noting, featuring:
- Well-prepared scaling into the range of tens of terabytes.
- SQL Server, Microsoft's main enterprise database, is readily adaptable, so it can be used by customers from companies of diverse sizes and industries.
- The flexibility of Microsoft's solutions is also worth noting: SQL Server can easily be deployed in different EDW and BI topologies. Furthermore, the database management system can be integrated with specific service-oriented architectures (SOA), platforms, middleware, BI, and other solutions.
- Microsoft SQL Server is also very well prepared to support diverse OLAP, BI, query, and advanced analytics workloads.
- Finally, its workload management functionality, cost-based query optimization, indexing, partitioning, compression, and caching are among the best of breed.
Unfortunately, Microsoft's solutions still leave a lot to be improved, starting with scaling: petabyte-scale massively parallel processing is not possible. Secondly, information life-cycle management is weak. Furthermore, the information governance, data quality, federation, and hierarchy management tools seem a bit half-baked. Finally, Microsoft's EDW solutions run only on the Windows platform, which limits how widely they can be used.
A few other gaps can be mentioned as well - Microsoft provides neither its own EDW appliances nor SaaS-based EDW services - but the company has declared it will fix that quickly. So far, these gaps are covered by effective partnerships with Dell, HP, and SaaS providers. All in all, if Microsoft's announcements are to be believed, the "Redmond Giant" is working hard on its own solutions, so customers can expect a much improved offering in the near future.
SAP
What features might settle the choice in favor of SAP?
- Together, NetWeaver Business Intelligence 7.1 and BI Accelerator let SAP provide extremely efficient EDW appliances.
- An impressive ability to persist data to a choice of database management systems (IBM DB2, Oracle Database, Microsoft SQL Server, and MaxDB).
- Three forms of availability for SAP's EDW offerings: as an appliance (BIA), as software (NetWeaver BI/BW), and as SaaS components (Business ByDesign).
- Support for row-based storage, plus columnar and cache persistence through BIA.
- Full integration with service-oriented architecture capabilities, application platforms, BI, middleware, desktop software, and PM.
- A broad range of EDW services provided by SAP itself or through effective partnerships.
However, SAP doesn't seem to do its best to win new customers; it concentrates rather on existing users. There are also a few significant gaps: no cost-based query optimization, no query predicate pushdown to the storage layer, and no compression by data type. Finally, the overall versatility of SAP's enterprise data warehousing appliances is somewhat weak - low-cost solutions for the midmarket have been omitted entirely.
Apart from those few weak points, SAP is expected to get better with each new release of its solutions (and, of course, after the acquisition of Business Objects). In a word, SAP is also a good choice, but it could certainly be significantly better, which is only a matter of time.
Sybase
Among the vendors' offerings, Sybase's portfolio stands out slightly, as it consists mainly of low-cost solutions; the midmarket and budget-constrained enterprises are considered Sybase's main target. What features argue for investing in Sybase's solutions?
- EDW offerings can be fully integrated with IQ, Sybase's columnar database.
- Full support for integration and modeling tools.
- Sybase's EDW solutions are fully compatible with diverse software and hardware platforms. They are not tied to default ones.
- Extensibility options increase the EDW's versatility and broaden its uses. Emphasis has also been placed on more flexible workload management, cost-based query optimization, indexing, partitioning, and caching.
- Sybase deserves praise for its scalability options.
- Lastly, Sybase's network of partnerships looks quite impressive, which helps ensure effective support in every situation.
As is typical of a niche vendor, Sybase's solutions have a few weak points that prevent wider use. There is no support for shared-nothing MPP or for specialized in-database analytics beyond predefined or user-defined functions. Furthermore, Sybase's EDW offerings can be integrated with MDM, OLAP, BI, predictive analytics, and data mining solutions only through partnerships with other vendors in the same market segment. Finally, Sybase does not provide its own life-cycle management systems.
To sum up, Sybase's solutions stand out first of all because of their price. Its low-cost EDW offerings may have a really successful future, provided Sybase does its best to extend their capabilities and broaden their range of functionality.
The vendors listed above are the unquestionable leaders. Their solutions are proven and trustworthy, and their services comprehensive. That was bound to result in great popularity.
Netezza
Netezza, the representative of the "strong performers" group in this comparison, doesn't have to yield to the market leaders, which is all the more admirable given its quick advance from complete beginner to established solution provider. Netezza has repeatedly proven the functionality and capability of its solutions in deployments in mission-critical environments.
What features make Netezza a good choice?
- The first thing to mention is the steady growth of Netezza's portfolio. Each successive solution is better than the last, so customer expectations can be met quickly.
- What distinguishes Netezza from the other vendors is its hybrid of shared-nothing massively parallel processing and symmetric multiprocessing approaches (across the storage and host tiers).
- Also worth noting is its unique approach to implementing physical data storage.
- Netezza deserves praise for its in-database analytic framework, which offers a wide range of powerful capabilities.
- Support for BI, OLAP, query, and advanced analytic workloads seems very well developed.
- Cost-based query optimization is available and highly flexible.
- Finally, its workload management capability is also worth mentioning.
Those were the advantages. On the other hand, Netezza's solutions could be better, but their progress is held back by the company's own strategy. Generally, Netezza's weak points fall into two categories: go-to-market and technical. From the go-to-market point of view, Netezza doesn't really seem to be making an effort to move into the leaders group. Its appliances for the midmarket are still quite expensive, and nothing suggests this will change soon.
From the technical point of view, Netezza's solutions show their youth and immaturity. They require predefined hardware and software platforms and do not work with other configurations. Although almost all the gaps are covered by external partnerships, that is not a model appropriate for the leaders group. Finally, there is no support for high-speed interconnects or MOLAP, nor any database encryption functionality.
All in all, Netezza already offers a few interesting solutions, and more can be expected to appear quickly. Even though there is still a lot to do, Netezza has great prospects, which makes it a vendor well worth considering.
Despite all the rankings and orderings, it must be remembered that all of the included vendors are best of breed and offer great solutions in their own right. It is somewhat like rating Brazilian football players - even the worst of them is much better than most players from other regions of the world.
After all, it is important to remember that all comparisons - even those based on the clearest and most objective criteria - are somewhat subjective, as are the conclusions drawn from them. This comparison may therefore serve as a good hint or general overview, but the final choice depends - as always - on each customer's specific expectations.