I believe the future will make the SAP HANA appliance model obsolete, and HANA will become a normal application in the datacenter. Things like downloading the SAP HANA software from SAPNET and installing it in a VM for productive use will be normal activities.
SAP HANA is still a "young child", maturing and evolving, and as it evolves, so do the architecture options and operational practices. As adoption increases, so does the knowledge about the technology, and the deployment options and related best practices become broader.
In my analysis, starting to bring HANA to the market through a very controlled appliance model was a wise choice by SAP. But since HANA is targeted to become a mainstream component in datacenters worldwide, it will need to evolve, and SAP will have to manage that evolution, pushed by increasing customer requests for more openness and choice.
The SAP HANA Tailored Datacenter Integration is just a first step on the journey towards increased openness and deployment options. SAP HANA customers, partners and component suppliers must understand this technology evolution journey, not only to best position HANA in their technology portfolios, but also to anticipate the next steps that will enable them to lead and innovate in this new reality.
SAP HANA Tailored Datacenter Integration solutions will become increasingly relevant, both in the evolution of older SAP HANA implementations and in new core-business implementation scenarios.
Setting the scene
Let me explain my understanding of the "SAP HANA journey", why TDI is a sign of the future, and why this understanding is crucial for customers to make the best deployment decisions for their specific implementation scenarios.
Over recent months, I've been faced with the following question:
- Will SAP HANA imply a return to the early mainframe era with huge central servers and internal storage?
Why does this question arise? Let me split it into its two parts:
1. Because today SAP recommends scaling up first before considering scaling out;
2. Because in the early days of SAP HANA many customers opted for single-server installations with only internal disks.
I've received these questions from both business partners and customers.
Background on Scale-up vs Scale-out implications on SAP HANA
If you've been absent from the HANA discussions, or are just starting to look at the SAP HANA topic, you may feel a bit lost among these discussion points.
Let me briefly fill you in regarding the concept and
challenges of scale-up vs scale-out on a HANA system.
SAP HANA is an “in-memory” database, which means that all
the data in the database is permanently in the RAM of the servers where the
database instances are running.
So, if you consider that some customers today may be running SAP systems with databases of over 40 TB, then even with a compression factor of 4 you end up with many customers requiring over 10 TB of RAM (there are customer cases where compression ratios of 1:8 have been observed, but I'll stick to 1:4 for easier calculation in my example).
If you add the fact that HANA only uses about 50% of the RAM for data storage, with the rest used for temporary structures and calculations, you may need systems with as much as 20 TB of RAM (if you want to learn more about sizing, start by checking out "SAP Note 1637145 - SAP BW on HANA: Sizing SAP In-Memory Database", and make sure to read the attached PDFs).
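To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The 40 TB example, the 1:4 compression factor and the "about 50% of RAM for data" rule are the ones used in the text; real sizing must follow SAP Note 1637145 and its attachments, so treat this purely as an illustration of the orders of magnitude.

```python
# Back-of-the-envelope HANA memory sizing, following the example in the text.
# Real sizing must follow SAP Note 1637145 and its attached PDFs; the numbers
# below only illustrate the orders of magnitude discussed above.

def estimate_ram_tb(source_db_tb, compression_factor=4, data_share_of_ram=0.5):
    """Estimate total RAM needed for an SAP HANA system.

    source_db_tb       -- size of the source (uncompressed) database in TB
    compression_factor -- assumed columnar compression (the text uses 1:4)
    data_share_of_ram  -- fraction of RAM usable for data (~50% per the text;
                          the rest goes to temporary structures and calculations)
    """
    compressed_data_tb = source_db_tb / compression_factor
    return compressed_data_tb / data_share_of_ram

print(estimate_ram_tb(40))       # 40 TB DB, 1:4 compression -> ~20.0 TB of RAM
print(estimate_ram_tb(40, 8))    # with the 1:8 ratio seen at some customers -> ~10.0 TB
```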
Today's existing systems are limited to 1 TB for analytics use cases (BW on HANA is one example), and 4 TB for transactional use cases (Business Suite on HANA is one example). With the new Intel Ivy Bridge chipsets, these limits are expected to increase by a factor of 3.
HANA has a "shared-nothing" cluster architecture, which means that you can split tables (or partitions of columnar tables) across multiple nodes of a cluster, and each cluster node has exclusive access to the data it holds. Note that this is completely different from the Oracle RAC cluster architecture (a shared-everything architecture), where all nodes access the same data at the same time. The HANA architecture is more similar to the GreenPlum database architecture than to Oracle RAC.
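To picture what "shared nothing" means in practice, here is a small conceptual sketch in Python (an illustration only, not HANA code): each partition of a table is owned by exactly one node, inserts are routed to the owning node, and a scan over the whole table has to fan out to every node and merge the results. The node names and the hash-routing rule are invented for the example.

```python
# Conceptual sketch of a shared-nothing layout (illustration only, not HANA code).
# Each node exclusively owns the partitions assigned to it; no node reads another
# node's data directly -- requests are routed to the owning node.

NODES = ["node1", "node2", "node3"]

def owning_node(partition_key, nodes=NODES):
    """Hash partitioning: the key alone decides which node owns the row."""
    return nodes[hash(partition_key) % len(nodes)]

# Inserts go only to the owning node...
rows = {node: [] for node in NODES}
for customer_id in range(10):
    rows[owning_node(customer_id)].append({"customer_id": customer_id})

# ...and a full-table scan has to visit every node and merge the partial results.
total = sum(len(local_rows) for local_rows in rows.values())
print({node: len(local) for node, local in rows.items()}, "total:", total)
```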
An SAP HANA database has both columnar and row storage, meaning that in the same database you can have coexisting tables that store data in rows and tables that store data in columns. Note that until HANA, companies would typically operate transactional applications on one database with row tables, and then use an ETL to load data into a data warehouse system where the analytics would run on databases with columnar tables. HANA makes it possible to merge these two worlds in the same database, eliminating the need for an ETL. The end result is having optimal options within the same database for simultaneous transactional and analytics needs, enabling real-time analytics on top of operational data, for example for large-scale real-time operational monitoring.
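The difference between the two storage layouts is easy to see with a toy example, sketched below in Python (again conceptual, not HANA internals): the same three records are laid out once row by row, as a transactional system would favour, and once column by column, where an aggregate over a single attribute only touches one contiguous array. The table and column names are invented for the example.

```python
# Toy illustration of row storage vs. column storage (conceptual only).

records = [
    {"order_id": 1, "customer": "A", "amount": 100},
    {"order_id": 2, "customer": "B", "amount": 250},
    {"order_id": 3, "customer": "A", "amount": 40},
]

# Row store: each record is kept together -- good for "read/update one order".
row_store = [(r["order_id"], r["customer"], r["amount"]) for r in records]

# Column store: each attribute is kept together -- good for scans and aggregates
# over a single column, and the layout that makes columnar compression effective.
column_store = {
    "order_id": [r["order_id"] for r in records],
    "customer": [r["customer"] for r in records],
    "amount":   [r["amount"] for r in records],
}

# An analytic query like "total amount" only touches the 'amount' column here,
# instead of reading every full row.
print(sum(column_store["amount"]))   # 390
```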
If you consider the GreenPlum architecture, SAP HANA goes beyond it and has more potential use cases, as it has been designed to support not only analytic workloads but also transactional workloads. This implies that in HANA you have transactional consistency across nodes, bringing a new level of integration challenges, something you do not have in GreenPlum or in other purely analytical databases today (like SAP's own Sybase IQ).
The fact is that SAP HANA, still on version 1.0, is still maturing, and with each new Service Pack new solutions arrive to overcome newly found limitations. Here are some of those limitations:
- Today, SAP HANA is on version 1.0 SP7, and in this version it is only possible to store row tables on the single master node of the cluster (consider that for analytic use cases, SAP currently enforces a maximum ratio of 128 GB of RAM per CPU socket, and of this RAM only about 50% is used for permanent data storage);
- All connections to the HANA system, as well as query distribution, are controlled by the single "master nameserver", making the server running this service a potential bottleneck for massive scale-out deployments (by massive I mean many dozens of servers);
- Single commits that involve writing to tables (or partitions) on different HANA nodes, or joins of data across different HANA nodes, carry a very significant performance cost: instead of working with data at nanosecond speeds inside a server NUMA node, you keep moving back and forth across the network at millisecond-grade speeds, which represents a very significant performance hit and calls for better tools, work and mechanisms to minimize the impact (a lot has been delivered in this area, but there is still a need for further improvement; see the back-of-the-envelope sketch after this list);
- Configuring and operating an SAP HANA system in a scale-out cluster is more complex than operating a single-node configuration;
- Due to the specifics of the current implementation of SAP HANA persistency and "savepoints", there are still some challenges to overcome regarding the recoverability of the system in case of a disaster;
- Some of these challenges are already documented, and some already have a solution; here are some examples (SAP Notes):
- 1743225 - HANA: Potential failure of connections with scale out nodes (network configuration needs special attention in scale out configurations)
- 1905462 - HANA scale-out landscape : topology mismatch leads to recovery failure (backup and recovery still not yet fully predictable on scale-out)
- 1945833 - Processing dimension is intermittently slow in HANA scale out landscape (the behavior of certain features is not yet stable in scale-out configurations)
- 1825774 - SAP Business Suite Powered by SAP HANA - Multi-Node Support (no scale-out support yet)
- 1855041 - Sizing Recommendation for Master Node in BW-on-HANA (recommends minimum of 1 TB HANA nodes for larger scale BW on HANA implementations, in order to ensure stable operations…)
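To get a feel for the cross-node cost mentioned in the third point of the list above, here is a rough back-of-the-envelope comparison in Python. The latency figures are illustrative assumptions only (on the order of 100 ns for a local DRAM access and a fraction of a millisecond for a network round trip, in line with the text), not measurements of any specific HANA landscape.

```python
# Rough order-of-magnitude comparison of local vs. cross-node data access.
# The latencies below are illustrative assumptions, not measurements.
LOCAL_DRAM_ACCESS_S  = 100e-9   # ~100 ns inside a server / NUMA node
NETWORK_ROUND_TRIP_S = 500e-6   # ~0.5 ms per round trip between cluster nodes

def join_cost_seconds(lookups, remote_fraction):
    """Cost of 'lookups' point accesses when a given fraction must cross the network."""
    local  = lookups * (1 - remote_fraction) * LOCAL_DRAM_ACCESS_S
    remote = lookups * remote_fraction * NETWORK_ROUND_TRIP_S
    return local + remote

lookups = 1_000_000
print(join_cost_seconds(lookups, 0.0))   # everything node-local:  ~0.1 s
print(join_cost_seconds(lookups, 0.1))   # 10% cross-node traffic: ~50 s

# Real engines batch requests and ship operators to the data to reduce round
# trips, but the gap above is why cross-node joins and commits are expensive.
```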
These examples confirm that HANA scale-out is still on its growth path towards becoming the platform of choice for massive mission-critical transactional applications.
All these aspects push customers either to select mostly scale-up solutions, which limit their growth potential to the current maximum capacity of existing servers, or, when going for scale-out deployments, to consider HANA only for analytical (non-mission-critical) workloads.
The first has two consequences: increased cost, since the biggest x86 boxes fully loaded with memory and CPU cost more than twice as much as two half-sized boxes, and HANA being kept out as an option for the largest core business systems.
The second prevents customers from experiencing the greatest benefits of HANA regarding real-time operational monitoring, planning and forecasting, as in this scenario HANA will not be the preferred persistency layer for their core business operational systems.
The point is: you cannot look at these aspects in a static way, as if they were written in stone, and limit your decisions based on what others have chosen in the past. A customer thinking today about a HANA implementation may face a decision and implementation cycle of around 18 months. At the speed HANA is evolving, this is an eternity!
Remember that technology evolves, and what is true today may have changed by tomorrow. Since guessing the future is no way to conduct business, you need a more pragmatic and well-supported analysis, so that an understanding of the technology's direction gives you a reasonable perspective of what the future might look like.
SAP's previews of what is planned for future releases regarding Datacenter Integration features are definitely something to consider before making decisions on infrastructure architecture.
Background on file vs block
Regarding the disks discussion, I’ve already written a
blog post on that, which you can read here: http://sapinfrastructureintegration.blogspot.com.es/2014/01/choosing-right-sap-hana-architecture.html
Nevertheless, let me add some further background on the "file vs block" discussion, from the "evolutionary journey" perspective, of course.
Many of you may know a piece of software that SAP named TREX (Search and Classification Engine). This software has existed for many years and represents a big part of the foundation on which SAP HANA was built.
Why is this important?
If you understand where HANA comes from, maybe you can also understand why some things started in a certain way, or where they may evolve to. TREX leveraged a "shared filesystem" to store all its data when running in a multiple-host / scale-out configuration. This was because TREX stored data in small files in a directory structure, which formed the index that existed in memory. So, as TREX was already based on a shared filesystem, I would say SAP had many other development priorities for making HANA the "new platform" than just changing the persistency access model.
Any smart company leverages existing knowledge in order not to reinvent the wheel. So if SAP already had a software component in its portfolio on top of which the HANA vision could be built, why not leverage it before starting to develop completely new code?
As a curiosity, did you know that the main process in HANA is called "IndexServer", which was the name of the executable that managed the in-memory indexes of the TREX search engine?
The following picture, extracted from the IBM Redbook on SAP HANA, helps you better understand how SAP HANA was born and the software SAP leveraged to build the SAP HANA code.
Also, if you want to understand TREX better, you can check the "TREX 7.1 Installation Guide – Multiple Hosts" at http://service.sap.com/installnw74 > "3 – Installation – Standalone Engines" > "Installation: SAP NetWeaver Search and Classification TREX" > "Installing TREX 7.1 Multiple Hosts" (NOTE: access to this document requires an S-User!).
As a conclusion, I believe SAP HANA started with "file over NAS" because that was the architecture of TREX (which made complete sense for TREX, as it stored its index in the form of small files in a filesystem), and so SAP leveraged that scenario and invested its development efforts in building new possibilities at the development and data-modeling layers, as the "persistency access model" was not a first-stage priority.
As HANA evolves and is challenged with other types of use cases, workloads and datacenter scenarios, SAP is being asked by its customers for more options as a precondition to considering HANA as the primary platform for their datacenters. This makes SAP look more closely at these aspects, tune them, and find better solutions.
The point: the fact that HANA was born with NFS access to its persistency doesn't necessarily mean that this is the best option for all implementation scenarios!
Background on internal disks vs shared disks
As for the fact that some server manufacturers started with servers with internal disks only, clustered through a parallel filesystem: if you understand some of the use cases of parallel filesystems in the HPTC world (High Performance Technical Computing), and remember that HANA started in 2010 positioned as a purpose-built database for analytics workloads, then aspects like data protection, disaster recovery, cross-node consistency, ease of change and maintenance, and simple scalability weren't at the top of the list of concerns.
Note that in many pure analytics scenarios, you do not expect the system to ensure transactional consistency across nodes, as you assume that consistency has been ensured by the previous load process (for example through an ETL) that dumped the data into these systems. So it's only about massive parallel reads, and even backups are not that critical, since the data still exists somewhere other than the analytics database and could always be reloaded in case of corruption.
Considering HANA just as an analytics database, the internal
storage story with a parallel filesystem made sense.
But SAP's vision for HANA goes beyond "just" an analytics use case: SAP states that HANA will be the next-generation platform on top of which all its applications will run.
SAP also has the ambition to become a top-tier player in the Database Management Systems space, and this will put HANA to the test of the aggressive RTO and RPO requirements that transactional systems impose over large distances.
This is making the server manufacturers that started with the internal-disks-only scenario realize that there is a need for other options. Even the ones with a bigger stake in the "internal disks with parallel filesystem" story have started to develop offerings for HANA with external storage. So it is clear that other HANA use cases will imply different architecture designs, where the internal-disks design may not always sustain the Service Level Agreements needed for operational mission-critical applications.
Some of these server vendors' documentation on TDI makes it clear that they have realized an internal-disks cluster cannot be the only solution for all HANA scenarios.
As a conclusion, in a scenario of broader HANA adoption, customers have the power to decide what is best for them, and many will want to keep their existing datacenter standards for architecture and operations. As block access to shared storage is today's enterprise standard for mission-critical transactional workloads, I believe it will be the preferred choice, and it will see increased adoption in the new HANA world.
Note that block access to the HANA persistency has been available since SAP HANA 1.0 SP5.
An evolutionary analysis of SAP HANA
Having understood a bit of the history behind HANA, let's start to consolidate an analysis of the SAP HANA evolutionary journey.
HANA started as an analytics-only platform, leveraging TREX as a foundation. It has evolved to become today the preferred platform for SAP BW, and all SAP applications are being ported to run on top of it.
SAP Business Suite, comprising components like ERP, CRM and SCM, supports the core business processes of many customers, requiring levels of performance, availability, recoverability, security and stability that push existing technologies to their limits.
As SAP puts its best efforts into bringing those workloads onto HANA, the platform will need to evolve to offer the choice, robustness, and datacenter integration and operation capabilities that customers are used to for their existing mission-critical applications.
The promise over the early years
SAP started to position HANA as the next big thing. It would be the fastest platform you've ever seen. It promised more than 100 times faster performance for analytics on your existing applications.
Those of you who, like me, have long years of consulting experience managing large enterprise customers know that the most common causes of the performance and availability problems observed in existing SAP systems were:
- Poor architecture designs:
- not enough spindles for the expected workloads;
- poor network connections between database and application servers;
- low memory on database servers;
- bad separation / aggregation of components;
- etc.
- Poor integration of components:
- wrong / sub-optimal configuration of network ports;
- wrong / sub-optimal configuration of fibre channel HBAs;
- excessive resource sharing without accounting for the reservations needed by critical applications;
- etc.
So, if I were SAP, knowing all the above, would "I" ever leave the success of my most important strategic initiative in the hands of "infrastructure integrators"?
Would "I" risk having customers saying my software isn't as fast as "I" promised, with "me" troubleshooting thousands of different configuration combinations that "my" hardware partners and all their implementation partners could have come up with? No way!
"SAP HANA WILL BE DELIVERED AS AN APPLIANCE ONLY!!!
This way I’ll ensure control of the variables, and restrict the configuration
options in terms of support requests."
Also, this option would enable "me" to focus "my" resources more on developing new platform functionality, boosting early adoption and higher returns, instead of having to put in place, right from the start, a very extensive and costly support organization.
But all of this is just me thinking...
The adoption and knowledge growth acceleration
A fact we all need to realize when comparing the deployment options of traditional Netweaver applications and HANA is that Netweaver has been around for many years; the ABAP Application Server as we know it is now over 20 years old and has more than 80,000 installations worldwide.
This means there is enough experience to write a "sizing best practices guide", or "architecture recommendations for maximum availability and performance", and so on.
The point is, many people claim that "SAP should provide more sizing guidelines and architecture recommendations", as well as "provide more options when deploying SAP HANA". Have you considered for a moment that maybe there isn't yet enough knowledge about this new platform to provide the same level of options as you have with Netweaver?
HANA was announced in 2010. The ABAP Application Server has been around since the launch of R/2 in 1981! Can you compare the level of knowledge gathered over 30 years with what SAP has done with HANA in less than 4 years?
The conclusion is that HANA is OUTSTANDING, considering its young age and what SAP has delivered so far.
Connecting back to the reason why it made sense (at least in my head) to start offering HANA in an "appliance model only": with more customers using HANA, with customers having used HANA for several years now and gone through the learnings that come with growth, more real-life knowledge gets gathered about the system's operational behavior and challenges.
One thing is the operational practices planned at the design table; another is the reality in the customer's datacenter.
Today, we are only starting to have relevant "operations knowledge" of HANA: compared with the ABAP Application Server reality, the number of customers with more than two full years of experience managing and operating a productive HANA system is still small. So, it is still early to understand all the lifecycle challenges of evolving a HANA system and coping with its growth demands.
So, from my perspective, it makes sense that SAP is starting to open up the appliance model a little, but still with caution.
In my view, some conditions are starting to be fulfilled for more "openness" to come:
1. There is already real-world experience of operating and evolving productive HANA systems;
2. There is already a large enough SAP HANA customer base to support the expansion of the support matrix;
3. There is already a level of demand in the market where not having a wide enough range of deployment options may hinder growth expectations, pushing SAP to open up a bit more;
4. The level of functionality is not the biggest barrier to adoption, so development resources can be focused on other aspects.
In my view, the fulfilment of these (or similar) conditions has enabled SAP to say: "OK, as the appliance model doesn't fit the datacenter setup of most of the largest customers, we'll let you choose your preferred server and storage vendors independently."
This is the SAP HANA Tailored Datacenter Integration, and it is the first sign that HANA is maturing and growing in terms of market adoption.
More use cases, more options are requested
As time goes by, the reasonable expectation, considering the massive effort and investment SAP has put into the SAP HANA initiative and how much has been delivered in such a short time, is that HANA adoption will continue to grow.
With this adoption growth, knowledge will also grow, but most of all the HANA software's reputation will settle, and customers will increasingly be able to distinguish HANA software problems from infrastructure integration problems.
It is also expected that as the use cases expand, and as HANA spreads across more customer types and customer-specific uses, the requests for increased flexibility will grow.
In such a scenario, the investment in developing and supporting more deployment options will probably:
- On one side, be justified against the risk of not winning some customers;
- And on the other, pay off against the new expected revenues, as with more experience the cost of development tends to become lower.
So, my expectation is that within a time frame of up to 3 years, SAP HANA will become just another normal application in the datacenter: one that you can download from SAPNET and install, and one that can be deployed on the shared (or even virtualized) infrastructures you use for other equivalent applications.
Conclusion
I do believe that promoting a higher level of adoption and contributing to the acceleration of HANA's maturity will be the best way to open up the deployment options of SAP HANA and to make the appliance model obsolete at some point in time.
This means that, in practical terms, Tailored Datacenter Integration today allows customers to choose their preferred server vendor and storage vendor separately, and to integrate the two components. This approach is no rocket science, as it is how most customers have deployed SAP Netweaver-based applications until today: procuring each of the infrastructure components from their preferred vendors, and then installing the SAP applications on top of them.
So, this is "THE" option for all those customers for whom the appliance model doesn't fit either their datacenter setup or their operational practices.
I have no doubt this is the way forward, and it will make storage vendors play a more important role in SAP HANA implementations, as storage vendors have been the experts in data availability, protection and replication for many years.
Make no mistake though: HANA represents a significantly different paradigm and a different workload pattern (mostly disk writes with HANA, instead of mostly disk reads with legacy Netweaver-based applications), and the solutions that have made storage vendors successful in the traditional SAP Netweaver customer base do not guarantee success in this new SAP HANA era.
So, what I try to do every time I meet customers is to demystify HANA and make it simple to understand.
Having a clear understanding of HANA's history and future evolution will enable you to make the best choices for your specific implementation scenarios, based not only on the few things supported today, but also on what the future is likely to look like.
Learn more about EMC and SAP HANA Tailored Datacenter Integration at: https://community.emc.com/docs/DOC-31784