There are several reasons why SAP HANA works, and they are tied to the advance of information technology and, in particular, to the economics of IT. The advantages of HANA come from these economics. Let’s see how IT has advanced through the client-server architecture, n-tiered architecture, virtualization, and the cloud, and see how HANA fits in.
In 1988, enterprise computing was mainframe-based, and a mainframe ran the entire stack on a single node: OS, database, application, and user interface (UI). Within that stack, programmers worked very hard to cut the overhead of hops between the layers, down to the level of counting instructions.
However, the economics of microprocessors were overwhelming. System architects therefore built client-server systems and offloaded some work from the mainframes. The overhead of hopping across the two tiers was enormous, but microprocessor economics made it work. Then the database layer was split out into a 3-tiered architecture; more work was offloaded and the hop overhead increased again.
By the late 1990s the n-tiered architecture had been developed, and it became possible to eliminate the mainframe entirely by replacing it with hundreds of commodity microprocessor-based servers.
While all of this was occurring, Moore’s Law was at work as well. Moore’s Law holds that the number of transistors on an integrated circuit doubles roughly every two years, yielding roughly a 2x performance improvement with each jump. Even though the use of automated systems was booming, the servers that made up the architecture kept up. With n-tier spreading the workload and Moore’s Law driving up the processing power of each server, by the late 1990s the architecture and the processors had matured enough to solve most enterprise compute problems, and the mainframe was done, if not quite dead.
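To make the compounding concrete, here is a minimal sketch of the doubling rule described above. The function name `moore_factor` and the fixed two-year doubling period are illustrative assumptions, and real per-server performance never tracked transistor counts this cleanly.

```python
def moore_factor(years, doubling_period_years=2.0):
    """Growth factor after `years`, assuming one doubling per period.

    Idealized model: treats the 2x-every-two-years rule as exact.
    """
    return 2.0 ** (years / doubling_period_years)


# Compounded growth over a few horizons (hypothetical horizons, idealized rule).
for years in (2, 4, 10):
    print(f"{years:>2} years -> {moore_factor(years):.0f}x")
```

Under this idealized rule, a decade yields five doublings, which is why a tiered architecture sized for one generation of servers can look badly oversized a few generations later.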
But the business software boom that made companies like SAP and Oracle what they are today continued, and the number of servers required to service the demand in each data center went from hundreds to several hundreds. The server farm was born.
Then n-tier morphed slightly into today’s web-based architecture, with browser-based clients, the user interface rendered in a web server, applications spread across multiple application servers, and databases in clusters, all communicating over a network.
About 6-7 years ago, an interesting thing occurred. Moore’s Law, with its compounded growth, started to produce processors with computing power that was not immediately consumed by application growth. When upgraded servers with 2x the compute were added into the tiered architecture, all of a sudden the server farm became under-utilized. This caused IT shops around the world to start projects consolidating servers. Then multi-core servers emerged, another 2x increment was available, and the problem repeated itself. The servers IT just consolidated needed to be consolidated again.
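The consolidation cycle can be sketched with simple arithmetic. Everything in this example is hypothetical: the farm size, the abstract "work units", and the 60% utilization target are placeholders chosen to illustrate the mechanism, not measurements.

```python
def utilization_pct(demand_units, servers, capacity_per_server):
    """Percentage of total farm capacity that the workload actually uses."""
    return 100 * demand_units / (servers * capacity_per_server)


def servers_needed(demand_units, capacity_per_server, target_pct):
    """Smallest server count that keeps utilization at or below target_pct."""
    usable = capacity_per_server * target_pct    # units x percent per server
    return -(-demand_units * 100 // usable)      # integer ceiling division


demand = 6000  # hypothetical steady workload, in abstract work units

# Original farm: 100 servers with 100 units of capacity each.
print(utilization_pct(demand, 100, 100))   # farm is comfortably busy

# Same 100 servers upgraded to 2x compute: utilization halves.
print(utilization_pct(demand, 100, 200))

# Consolidate back to the 60% target, then again after multi-core adds 2x.
print(servers_needed(demand, 200, 60))
print(servers_needed(demand, 400, 60))
```

With demand held flat, each 2x step in per-server compute halves the number of servers the target requires, which is the repeated consolidation described above.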
With VMware, the servers in the n-tiered architecture became containers that could be easily relocated, and the n-tiers could be consolidated without changing the system architecture, but at a cost. In exchange for the convenience of having containers, virtualization added a 5%-20% performance penalty. In addition, because the underlying systems architecture did not change, applications continued to hop over virtual machine boundaries: from the application layer to the OS, from the OS to the network layer, over the network to another network layer, to that server's OS, to the database layer, and back. This kludge was convenient but not efficient, and Moore’s Law masked the fat with ever more processing power.
Finally, to bring us up to date, we throw in the Cloud, where infrastructure is deployed to manage the provisioning and placement of virtual server containers. The Cloud moves the fat around with ease.
HANA was designed to optimize out the fat. HANA deploys as much application code as possible in the same address space as the database, deploying both as engines using lightweight threads. Hops are reduced in size or eliminated completely, and fat is trimmed. Further, HANA optimizes these engines to share data in-memory instead of passing data blocks around. HANA also optimizes to the bare metal: data is sent in full cache lines, often using special parallel instructions to drive the cores as efficiently as possible.
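The difference between passing data blocks across a tier boundary and sharing data in one address space can be illustrated with a toy sketch. This is not HANA code or a HANA API: `sum_via_hop` and `sum_in_place` are hypothetical names, and `pickle` merely stands in for the serialization that a network hop between tiers would require.

```python
import pickle


def sum_via_hop(rows):
    """Simulate a tier boundary: pack the data, 'send' it, unpack, compute."""
    payload = pickle.dumps(rows)        # sender serializes a data block
    received = pickle.loads(payload)    # receiver rebuilds a private copy
    return sum(received), received


def sum_in_place(rows):
    """Simulate a shared address space: compute directly on the caller's data."""
    return sum(rows), rows


rows = list(range(1_000))
total_hop, seen_hop = sum_via_hop(rows)
total_local, seen_local = sum_in_place(rows)

print(total_hop == total_local)   # both paths compute the same answer
print(seen_hop is rows)           # the hop worked on a copy
print(seen_local is rows)         # the in-place path touched the original
```

Both paths return the same result, but the hop pays for a serialize, copy, and deserialize step that the in-place path skips entirely. Multiply that by every boundary in an n-tier stack and the "fat" described above becomes visible.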
These optimizations add up. We tend to focus, at the micro level, on the reduced latency of I/O from memory rather than from block devices, but what is really going on is bigger. The sum of these optimizations is often 1,000x, sometimes 10,000x, sometimes even 100,000x of fat removed.
So while we can highlight the many little efficiencies that distinguish HANA from the competition, the basis for this radical performance advance is not radical at all. It is simply a fresh start. HANA is a new database:
- optimized for the current and best high performance computing technology available rather than designed for a hardware architecture that is aging out;
- deeply integrating, rather than tacking on, the advances in database software that have emerged over the last 30 years;
- with a few innovations from SAP thrown in for good measure.