As my colleague Vijay has already posted, I am very happy to see that IBM has finally embraced the paradigm shift by modernizing their database and entering the world of columnar in-memory data processing. The status quo of processing OLTP and OLAP separately was the critical business bottleneck that prompted SAP to completely rethink information processing architecture and introduce SAP HANA two years ago. SAP HANA is not just a database; it is a platform that gives customers the capabilities to eliminate batch processing, streamline multiple data and application logic processing workloads (without requiring multiple data copies), and become real-time businesses by converging OLTP and OLAP.
However, amidst this excitement about BLU come many unsubstantiated claims and over-reaching statements by IBM, which have already caused much confusion among its own customer base. After some deep digging into the public DB2 BLU materials, the following are my assessments.
1. DB2 BLU is Not a Real-Time Solution
IBM has made much of the fact that row and columnar databases co-exist in DB2 BLU. But there is a big difference between the co-existence of row and column stores and the true convergence of OLTP and OLAP in one columnar store, as offered by SAP HANA. DB2 BLU does not solve the inherent latency between OLTP and OLAP, which is the critical bottleneck that makes real-time business very difficult. Providing operational reporting as business is happening will still require a cumbersome, manual process, because queries still need to hit the row table on disk or rely on a DBA to convert the application's tables from row store to column store. DBAs would need to take the system down, use a utility to drop all the data from the row table, and then perform a bulk insert into a newly created column table. Also, once a table is converted, data cannot be written into the column store without using bulk load. By contrast, SAP HANA supports both dynamic reads and dynamic writes in a column store, including bulk insert, enabling not only analytics but also analytical planning applications and truly extending the use cases for real-time operations.
2. Real-time Business Needs More than Analytic Acceleration
This is an inherent issue that comes from separating OLTP and OLAP into two different database stores. IBM DB2 BLU may, and probably really does, do a good job of accelerating selected tables, which is a nice improvement. This is something SAP achieved 10 years ago with SAP NetWeaver Business Warehouse Accelerator. SAP Sybase Adaptive Server Enterprise (ASE) has also had this same in-memory acceleration capability for years. However, this is still fundamentally different from SAP HANA, which is optimized for mixed workloads (OLTP+OLAP). In contrast, DB2 BLU is limited solely to query acceleration on pre-selected tables. Unfortunately, IBM DB2 BLU does little, if anything, to address the inherent latency between OLTP and OLAP. Its operational queries still need to hit a row-based table stored on disk, while trend and historical queries hit its BLU-based in-memory data mart. Memory is just a caching layer and query response is still disk bound; as a result, IBM DB2 BLU cannot deliver real-time operational reporting on business as it is happening.
SAP HANA has erased the line between OLTP and OLAP. With SAP HANA, data is optimistically loaded into memory when needed and is accessible to all queries, including ad-hoc queries, thereby eliminating disk-based latency, where access can be 100,000 times slower. Additionally, HANA enables dynamic addition of columns without needing to remove the data and tables, providing complete business flexibility. There is no way you can deliver the real-time results that customers need to seize new opportunities without a true in-memory system that really handles both OLTP and OLAP.
3. BLU is Not Big Data Ready
IBM DB2 BLU is neither Big Data ready nor enterprise ready. It is a single-server solution only and, given its inability to scale out and the 10TB cap it imposes, it is not suited to Big Data. SAP HANA has demonstrated 100TB and 1 Petabyte performance test results (100 times the DB2 BLU cap) and is well suited to run mission-critical enterprise workloads of any size. Conveniently, the speed claims against HANA are made on a 10TB data mart (assuming 10X compression to 1TB of main memory), which is the limit of the current DB2 BLU database.
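For readers who want to check the sizing figures quoted above, here is a minimal back-of-the-envelope sketch. It uses only the numbers from this post; the decimal unit convention (1 PB = 1000 TB) and all variable names are my own assumptions for illustration.

```python
# Back-of-the-envelope check of the sizing figures quoted in the text.
# Assumption: decimal units, i.e. 1 PB = 1000 TB.
TB_PER_PB = 1000

blu_cap_tb = 10            # DB2 BLU data size cap cited above
assumed_compression = 10   # the "10X compression" assumed in the claim
hana_test_tb = 1 * TB_PER_PB  # the 1 Petabyte HANA test cited above

# Main memory needed to hold the 10TB data mart at 10X compression.
memory_needed_tb = blu_cap_tb / assumed_compression

# How many times larger the 1 PB test is than the BLU cap.
scale_ratio = hana_test_tb / blu_cap_tb

print(f"Memory for a {blu_cap_tb}TB mart at {assumed_compression}X: {memory_needed_tb}TB")
print(f"1PB test vs. BLU cap: {scale_ratio:.0f}x")
```

With binary units (1 PB = 1024 TB) the ratio would come out slightly above 100, so the "100 times" figure is best read as a round number.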
4. 10x vs 10,000x Faster
Any serious DBA would admit that without knowing the dataset, schema, and SQL statement, it is very difficult to compare query performance. IBM has claimed that some queries are 5-10x faster than on SAP HANA, and that HANA requires 4x more RAM, 10x more disk, or 10x more CPU, all without disclosing dataset or query details. In contrast, many SAP HANA customers, a number of them former DB2 users, have seen queries run 10,000 or more times faster.
With this said, I am the first to tell you that this is still apples vs. oranges and is not a true indicator for comparing query performance. There is a published benchmark for SAP HANA, BW-EML (enhanced mixed load), which measures in-memory query performance during a live data insert scenario. This benchmark was developed from real customer use cases, not in isolated theoretical lab scenarios. IBM DB2 has participated in many previous SAP benchmarks, and we invite DB2 BLU to participate in this performance test before anyone compares results.
Some of the SAP HANA capabilities that optimize the use of DRAM include:
- Dynamic data loading to ensure that only the requested data column is loaded into memory when it is queried or upon database start.
- Active query against highly compressed columnar data without decompression.
- Multicore vectorized processing that applies to both scale-up and scale-out, unlike DB2 BLU, which is limited to a single node.
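To make the idea of querying compressed columnar data without decompressing it more concrete, here is a minimal sketch of dictionary encoding, a common column-store compression technique. This is an illustration under my own assumptions, not SAP HANA's actual implementation; the function names and sample data are hypothetical.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Compress a column into a sorted dictionary plus one integer code per row."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def scan_equals(dictionary, codes, target):
    """Return row positions where the column equals target.

    The target is translated to its integer code once (binary search in the
    sorted dictionary); the scan then compares integers only and never
    rebuilds the original string values.
    """
    pos = bisect_left(dictionary, target)
    if pos == len(dictionary) or dictionary[pos] != target:
        return []  # value does not occur in the column at all
    return [row for row, code in enumerate(codes) if code == pos]


# Hypothetical sample column.
regions = ["EMEA", "APJ", "EMEA", "AMER", "APJ", "EMEA"]
dictionary, codes = dictionary_encode(regions)
matches = scan_equals(dictionary, codes, "EMEA")
print(dictionary, codes, matches)
```

Because the predicate is evaluated on small integer codes, such a scan touches far less memory than a scan over the raw values, and it lends itself to the kind of vectorized processing mentioned above.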
5. HANA Would Require New Hardware vs. Re-using Existing Hardware?
When considering hardware requirements, one must compare the overall TCO, including the hardware footprint and the personnel costs of maintaining an older generation of hardware vs. a new generation. SAP HANA customers chose to deploy the latest hardware technology because it offers superior TCO and performance. As one CIO of a major international retailer told me yesterday, he can run his entire multi-billion, multi-brand enterprise on one Ivy Bridge based “HANA Hawk” server, as opposed to spending approximately £35M on existing hardware, labor, and data center costs. Why wouldn’t you run your business on the latest hardware innovation if you can achieve superior performance and TCO? Existing databases and hardware choices may be more expensive than the smaller footprint of SAP HANA, which fundamentally removes layers of redundant data processing steps.
SAP HANA is highly optimized for multicore vectorized processing and has been certified for Nehalem and Westmere processors. SAP also decided to leapfrog the Sandy Bridge processor and adopt the more advanced Ivy Bridge processor, as it offers superior performance and TCO.
6. SAP HANA Must Back-up to Disk and Can't Leverage Customers' Existing 3rd party Storage Solutions?
SAP has already announced an open hardware architecture to support tailored datacenter integration. Solutions such as Symantec NetBackup 7.5 have just passed certification, with other solutions in the certification process. This gives customers the choice to integrate the SAP HANA platform with their existing network storage infrastructure to reduce TCO. Additionally, this will be part of the open initiatives for the SAP HANA platform, which include the existing certification program for third-party business intelligence (BI), ETL, and backup and recovery tools.
7. SAP HANA Doesn’t Provide Security to Restrict DBA from Unauthorized Data Access?
Not true again. Public documentation supporting separation of duty for DBA is posted here. Please read page 32. DBAs are only granted system privileges for administrative actions for the DB. The DBA can only access DB content and records if the user who owns it grants him/her the privileges to see them. Additionally, SAP HANA provides an additional layer of oversight through audit functionality.
8. DB2 BLU “Database” vs. SAP HANA “Platform”
With DB2 BLU, IBM seems to have focused only on the database. However, one must look at much more than that. SAP HANA is much more than just a database. It is also a full-fledged application platform with predictive analytics, natural language processing, geo-spatial processing, rules, planning, an application server, and a web server built in (and SAP doesn’t charge extra add-on licensing to use this functionality). SAP HANA also comes with libraries of embedded business, statistical, and predictive algorithms, along with geo-content available for use. This not only simplifies the overall architecture and reduces TCO, it also speeds up processing and application development. As noted earlier, SAP HANA erases the line between OLTP and OLAP and gives enterprise applications the mixed workload environment they now need for real-time information processing across transactions and analytics, and across structured and unstructured data. In contrast, IBM DB2 BLU and the family of PureData systems still require several layers and separate servers to be used and integrated by services professionals before an application can be built and deployed. IBM may have made some progress and may have begun to enter the in-memory world where SAP has been for years. However, there is no comparison between DB2 BLU and SAP HANA.
9. DB2 BLU Can’t Be Used with Mission-Critical Apps such as SAP Business Suite and SAP BW
First off, IBM DB2 10.5 with BLU is not yet certified for SAP. The reason for this is not just timing, but the inherent technical limitations of DB2 BLU. Today it would not be possible for SAP Business Suite or SAP BW applications to leverage DB2 BLU technology, because its column store does not support OLTP writes; it only supports block updates. So, if it is ever certified for or used with SAP Business Suite, DB2, without BLU, will really only be able to leverage the row store for SAP Business Suite. If someone did indeed want to use DB2 BLU with SAP Business Suite, they would have to invest DBA time to find and convert the necessary tables to a column store, as well as invest in ongoing maintenance to support such customizations.
That is not to say customers cannot run SAP Business Suite on IBM DB2. SAP is open; it is just that they would not be able to use BLU without additional customizations, and then only for read-only use cases, given that BLU does not support writing to the column store. Further, IBM DB2 BLU cannot really be used natively with SAP NetWeaver Business Warehouse or SAP Business Warehouse based applications either, due to its write limitations. The only way it seems to fit in here is as a read-only near-line store (NLS) for SAP NetWeaver Business Warehouse. And there are other products much better suited to this, such as SAP Sybase IQ, which is already fully integrated.
So while it is nice to see IBM attempt to come into the in-memory world and finally jump on the bandwagon, they appear to have a lot of work to do and a long way to go before they can offer a product that will be of true value to customers. Hopefully this discussion of the points above has helped clear up some of the confusion that IBM’s unsubstantiated claims and over-reaching statements have caused. We look forward to competing fairly and in partnership with IBM, for whom we have great respect.