2015-09-23

SAP HANA TDI KPIs: mandatory or indicative? And what about the network implications of synchronous replication?

At the suggestion of some friends, I've split this blog post in two to give proper highlight to the architecture considerations I included in the second part of the original version of this post. That second part is now published at: http://sapinfrastructureintegration.blogspot.com/2015/10/exceptions-are-exceptions-one-example.html


          Part 1: The current facts

I've been receiving this question a lot: when I run SAP HANA with either storage replication or HANA System Replication in synchronous mode, depending on the network latency, it will not meet the TDI KPIs. Will SAP accept this? What would be the performance impact at the application level?

The question came from a customer who wanted all the performance he could get, but also the maximum data resiliency technology can provide today.

In a nutshell, there are certain situations where the "Laws of Physics" do not allow you to have everything you want, and you need to choose: either maximum performance, or maximum data resiliency.


So, going through this reasoning, I first collected the currently available information from SAP in this regard, only to reach the conclusion that YES, it is acceptable to fail the KPIs, as SAP says "it's the customer's decision to define whether the performance penalty is acceptable to his specific business scenario". Then I entered a discussion with him on finding the right balance between technical and business requirements for his architecture blueprint.



Where did I get the information that it is acceptable to fail KPIs?

Let me share here what I wrote some weeks ago to a customer who asked what latency is acceptable when doing storage replication. A competitor was telling him that if he used storage replication and did not fulfill all the TDI KPIs, he could not run any production workload on that infrastructure, which is FALSE!

Then I'll also share the reasoning that followed once this question was cleared up, with a concrete customer example that I hope helps you build your own case if you're going through a similar discussion in your organization.


          Network latency impact on SAP HANA performance when replicating synchronously

Indeed, network latency is a critical factor in SAP HANA performance. But this is as true for storage replication as it is when the customer decides to implement SAP HANA System Replication in synchronous mode.

My advice here is to carefully evaluate the latency between the two sites being synchronously replicated. Ideally, latency should be below 1 ms. More than that can really start to impact write performance, so if the customer's application runs massive data loads, load times can become longer because of it. There will be, though, absolutely no impact on read performance, since reads are served from the memory of the server. So, depending on your application's specific workload profile, you may feel this increased latency a lot, or not at all.
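If you want a first, rough feel for the inter-site latency before engaging proper assessment tooling, a small script like the sketch below can help. It is a minimal sketch, assuming you run a simple TCP echo service on the secondary site; the hostname and port are placeholders, and it is no substitute for the measurements your network team or vendor will do.

# Minimal sketch: rough round-trip latency between two sites over TCP.
# Assumes a simple echo service runs on the secondary site; the host and
# port below are placeholders, not a real endpoint.
import socket
import time

def measure_rtt_ms(host: str, port: int, samples: int = 100) -> float:
    """Return the median round-trip time, in milliseconds."""
    rtts = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        payload = b"x" * 4096  # mimic a 4 KB log buffer
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(payload)
            received = 0
            while received < len(payload):  # wait for the full echo
                received += len(sock.recv(4096))
            rtts.append((time.perf_counter() - start) * 1000.0)
    rtts.sort()
    return rtts[len(rtts) // 2]

if __name__ == "__main__":
    print(f"median RTT: {measure_rtt_ms('dr-site.example.com', 7000):.3f} ms")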

In the end, all the details are in SAP's whitepapers.
Two in particular are relevant to this matter. The first is the SAP HANA Network Requirements whitepaper: http://scn.sap.com/docs/DOC-63221
On page 28 it says:
  • There is no straightforward recommendation regarding the bandwidth and the latency a system replication network must provide. A rough estimation of the required performance is given in the How-to Guide Network Required for SAP HANA System Replication, which is referred to from the SAP Note 1999880 - FAQ: SAP HANA system replication
  • Latency: The redo log shipping time for 4 KB log buffers must be less than a millisecond or in a low single-digit millisecond range – depending on the application requirements (relevant for synchronous replication only).

This applies to storage replication as well. So let's say that if you cannot stay below 1 ms, 2 or 3 ms may still be acceptable, but it will depend on your specific business scenario.
Transactional scenarios (like SAP ERP on HANA) are more sensitive to latency. Analytical scenarios are more sensitive to throughput.

Following the note mentioned above takes you through a sequence of documents, the final one being Network Recommendations for SAP HANA System Replication at http://scn.sap.com/docs/DOC-56044
Here, the same point as above is made (a back-of-the-envelope sketch follows the quote):
  • All changes to data are captured in the redo log. The SAP HANA database asynchronously persists the redo log with I/O orders of 4 KB to 1 MB size into log segment files in the log volume (i. e. on disk). A transaction writing a commit into the redo log waits until the buffer containing the commit has been written to the log volume. This wait time for 4 KB log buffers should be less than a millisecond or in a low single-digit millisecond range.
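To make this wait concrete, here is a back-of-the-envelope sketch. All numbers are assumed, illustrative values, not measurements: with synchronous replication, a committing transaction waits for the local log write and for the round trip to the secondary site, which caps the rate of any single serialized commit stream.

# Back-of-the-envelope: effect of synchronous replication latency on a
# single serialized commit stream. All numbers are illustrative assumptions.
# Simplification: local write and remote round trip are modeled as
# sequential; in practice they overlap, so this is a conservative bound.

local_log_write_ms = 0.4                   # assumed local 4 KB log buffer write
inter_site_rtt_ms = [0.0, 1.0, 2.0, 3.0]   # candidate site-to-site round trips

for rtt in inter_site_rtt_ms:
    commit_ms = local_log_write_ms + rtt   # the commit waits for both
    commits_per_s = 1000.0 / commit_ms     # max rate of one serialized stream
    print(f"RTT {rtt:.1f} ms -> commit wait {commit_ms:.1f} ms, "
          f"~{commits_per_s:,.0f} serialized commits/s")

Concurrent sessions commit in parallel, so total system throughput degrades more gracefully, but every individual transaction still pays the full wait on each commit, which is exactly why OLTP-style workloads feel this latency the most.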


          Even without replication, it is your choice whether it is acceptable to fail KPIs

Let me add as a final note the transcript of the response to question 7 on page 7 of the SAP HANA TDI FAQ published at SCN: SAP HANA TDI - FAQ | SCN

There it is written:

Q: Which cases where one or more KPIs are not fulfilled does SAP consider as uncritical or acceptable?
  • This is always up to the customer to decide if falling below the KPIs is acceptable for his/her daily operation of the SAP HANA system. The customer must decide whether the performance of his/her SAP HANA system is sufficient for his/her needs.
  • The following questions and examples might help for making that decision:
    • Is the given SAP HANA system a non-productive system?
      • The KPIs apply for production systems only. In general, for non-productive systems weaker performance is acceptable
    • Which SAP HANA scenarios are run on the given system?
      • OLAP scenarios only (e.g. SAP BW-on-HANA):
        • The performance of queries is usually not affected if all required tables have been loaded into memory
        • Latency times of the log volume and throughput rates for writing the data volume are mainly relevant when loading data from source systems
        • Throughput rates for reading from the data or the log volume affect the overall system restart time (e.g. after applying an SAP HANA revision update)
      • OLTP scenarios only (e.g. SAP Business-Suite-on-HANA):
        • Latency times of the log volume affect the duration of every transaction that changes one or more tables in the database
        • Throughput rates for reading from the data or the log volume affect the overall system restart time (e.g. after applying an SAP HANA revision update)
      • See the Storage Requirements whitepaper for details: SAP HANA TDI - Storage Requirements | SCN

So, as you can see, SAP HANA is really becoming a normal application in the datacenter, and more choice is given to customers every single day. Today, even without replication, it is each customer's choice whether failing certain KPIs is acceptable.

And this is relevant, for example, when you think of very small SAP HANA systems that are not very critical, which you see more and more as customers make SAP HANA mainstream and migrate every single system to HANA. There may be a couple of systems that are very big and very critical, but there are a ton of others where you could trade a bit of performance for a better TCO.

2015-09-02

SAP HANA in the Cloud? "No way" or "what cloud"?



Being "Cloud" one of today’s buzz words, that is also shaking-up the SAP world alongside HANA, I’ve come across many customer with a strong “antagonism” to any topic with “cloud” on the title.

I see as a root cause of this antagonism the huge confusion created by all the players that want to be in this market, who call "cloud" everything that moves in their portfolio in their struggle to be relevant.

So, when talking with CIOs, Enterprise Architects and other IT leaders, I still see lots of misunderstanding in relation to "all things Cloud for SAP".

In this blog, I'll briefly define my understanding of "Cloud", and then use this definition for a high-level discussion of "what is the right cloud model for each company and business scenario".

Note that you’ll see that my discussion is generic from a pure cloud perspective, and not very “HANA” specific. The point here is that, all I’ll be discussing in this blog post, can be part of a datacenter wide strategy discussion on which SAP HANA today definitely fit in as well.

SAP HANA is becoming increasingly normal, so you should consider it an integral part of your cloud strategy.

I know there are certain limitations in regard to "very huge" HANA systems. But when talking with customers running 70 or more productive systems, with full landscapes of more than 300 SIDs and many multi-terabyte databases, the systems that require special attention are usually fewer than 10%. So you can set a rule for the other 90+%, and then deal with the exceptions.

Let me tell you as well that I've seen companies look at those huge SAP HANA systems first and use them to define the standard, and it has led to massive costs that could have been avoided. But the benefits of IT standardization are another conversation I'll leave for another occasion!


           What is "Cloud"? 

So, let’s get to it and start putting some names on the “things”.

In fact, "Cloud" is a buzzword subject to many misunderstandings today, and because of that it may not be "the first option" many organizations around the world would think of.

Although the "Cloud" buzzword is relatively new, the concept has been around for many years, and it's all about the industrialization of IT.

Looking beyond the noise and the smoke: Cloud may not be new, but it is indeed real today.

I remember presenting at conferences for another company many years ago about the "industrialization of IT", knowing that all I had was a reference architecture, and that everything would have to be custom built through massive consulting projects. This is no longer the case today.

And the industrialization of IT is real because today we have the technologies that enable managing IT as a Service at its multiple levels.

Hardware has evolved to be manageable and configurable by software, and most components in the IT stack provide scripting interfaces that enable "automation tools" to manage them.
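As a trivial illustration of what such scripting interfaces look like, here is a minimal sketch of provisioning a storage volume through a REST API. The endpoint, payload fields and token are invented placeholders, not any specific vendor's API; every real component documents its own interface.

# Hypothetical sketch: creating a volume through a storage component's
# REST API. Endpoint, fields and token are invented placeholders, not a
# real vendor API.
import requests

API = "https://storage-array.example.com/api/v1"   # placeholder endpoint
TOKEN = "changeme"                                  # placeholder credential

def create_volume(name: str, size_gb: int) -> dict:
    """Ask the (hypothetical) array to carve out a new volume."""
    response = requests.post(
        f"{API}/volumes",
        json={"name": name, "size_gb": size_gb},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(create_volume("hana-log-01", 512))

Automation and orchestration tools chain hundreds of such calls into workflows, which is what turns a self-service portal request into a provisioned system without human intervention.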

This discussion has become increasingly confusing for customers because each provider organization abuses the "cloud" word to position itself as relevant, despite disparate maturity, standardization, automation, geographic coverage and competitive positioning.

Among the misunderstandings of "cloud", there are two I consider fundamental to overcome:

  • That once a workload is virtualized, it is already in some sort of cloud;
  • And the idea that cloud is only “public cloud”. 


           Just Virtualizing Systems is not Cloud !!!

Let me then start setting some foundations by referencing the U.S. National Institute of Standards and Technology (N.I.S.T.). This organization came up with a definition of cloud to which all players in the market can refer, giving cloud customers a framework through which to evaluate their candidate providers in terms of completeness of offerings.

This standard has been widely accepted by the key players in the “cloud business”, and can be found at: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf

If you read through the N.I.S.T. definition of cloud computing, it clearly identifies the essential characteristics that need to be in place for something to be called "cloud":

  • On-demand self-service;
  • Broad network access;
  • Resource pooling;
  • Rapid elasticity;
  • Measured service.

You can read more about them both in the N.I.S.T. documentation and in other documents available on the internet.

These characteristics imply a more comprehensive scenario than just virtualizing applications. Virtualization can indeed be a key enabler of cloud, as it supports for example "rapid elasticity" and "resource pooling", but to be in the presence of a cloud you'll also need a significant level of automation.

Indeed, automation is a key component of any real cloud. If you look at it, Information Technology came with the promise of automating the business processes of organizations, but the management of the IT systems themselves has remained quite manual. In fact, for most organizations around the world (end customers and service providers alike), the cost of labor is one of the biggest items in IT operating expenses.

So, "Cloud" is all about bringing to IT what IT brought to business processes many years ago: minimize the human intervention in regular IT operations. The benefits from this reduction of human intervention has been realized for many years in the manufacturing industries, having demonstrated to reduce defects, improve response times, reduce operating costs and simplifying change.

These levels of automation are implicit in characteristics like "on-demand self-service", where it is assumed that there is no manual work behind the "self-service portal": everything is orchestrated and automated through workflows.

For me, a "cloud" that is hosted at a service provider but that has a lot of manual work behind the "portal", and consequently high costs and slow response times, is not a cloud. Many service providers around the world advertise cloud offerings for SAP HANA that indeed operate this way in the background. They will always fall short of the expectations created by the "true cloud players".

Another key characteristic of cloud is "measured service", as this is what enables your organization to run IT as a Service and become an IT services broker to your business organizations.

This would, for example, enable you, when a business user comes asking for a system to start a new project, to check your "private cloud" and "public cloud" costs, security, service levels and response times, and give the business user the best option based on the business scenario he will be implementing. This is where I believe IT really becomes a partner to business organizations in driving innovation and further organizational agility.

I believe we can agree at this stage that "virtualization" and "cloud" are not synonyms. Virtualization is definitely a key ingredient for building efficient clouds, but it is not "cloud" just by itself.


           Cloud is not only “Public” Cloud – look for Private and Hybrid as well.

The clarification of the second myth referred to above comes from the N.I.S.T. definition of cloud computing as well.

That document also describes the cloud service and deployment models.

In terms of service models, N.I.S.T. defines the following:
  • Software as a Service
  • Platform as a Service
  • Infrastructure as a Service

And in terms of deployment models, the following are defined:
  • Private cloud;
  • Community Cloud;
  • Public Cloud;
  • Hybrid Cloud.

Again, it is not my goal to dive deep into the NIST definition (you can read all about it online); the point here is that private IT organizations can work in a "cloud operating model", delivering a Private Cloud to their business organizations with characteristics similar to public cloud offerings.

To do this, what private organizations need is access to the infrastructure components, the tools and the know-how on the processes used by leading public cloud providers. And this is available today!

I’m pleased to have gotten to know some IT organizations that “got their act together” and stepped it up to be competitive in terms of agility and quality of service with public cloud providers! In the end it’s all about the people.

But there is another choice coming on strong to the market that might bring together the best of both worlds: what if the "public cloud provider" offered its services to manage a private cloud hosted on your premises (under your full span of control), in the same way it manages its public cloud?

This is what I call the “managed private cloud”.

The point here: cloud is not only public cloud. It's all about a service model built around the automation of IT.
I believe, then, that accepting this definition, we can agree that "cloud" must indeed be part of every IT organization's strategy discussions.


           Should I go IaaS, PaaS or SaaS? So many "aaS" options are so confusing...



Now, looking at the NIST defined service models, what is the right one for each of your business processes / application stacks / organizations?

Let me share what I’ve learned on this topic from working in IT operations myself, and participating in many interesting discussions with the leaders of those IT organizations.

Many organizations look to differentiate themselves through the use of information. As a result, an increasing number of organizations see "code development" capabilities as a core competence, since those capabilities let them manage and analyze information. In the information age, being able to rapidly turn information into actionable insights for the business is a key competitive differentiator.

On the other side, there are also organizations that struggle to get the skills they need at this level, and often see themselves as hostages of "under-performing" IT teams. In many cases this has led top management to embark on "comprehensive outsourcing deals", very popular through the late 90's and early 2000's, expecting the outsourcer to solve the problems they hadn't been able to solve themselves.

History has shown that most of these comprehensive outsourcing deals, apart from the short-term financial impact of turning CAPEX into OPEX and providing a cash injection, have fallen short of most operational, innovation and strategic expectations.

So, the externalization of IT functions, be it through outsourcing or cloud, has been a tool in the hands of CEOs, CFOs, CIOs and C-level executives in general to solve many different problems (strategic, financial, etc.), including IT skills and process problems.

I have worked both on architecting solutions for these "old large outsourcing contracts" and on the delivery of those contracts, and I have a ton of stories to tell about what I learned from it.

But the point is that these problems haven't gone away, and they still exist in many organizations.
So choosing between SaaS, PaaS and IaaS may also align with where the problem lies in the customer's IT organization.

I would say that a decision to adopt IaaS, PaaS or SaaS should be based fundamentally on the business scenario to be supported. But I have to acknowledge that in the boardroom, this is not the only argument in the discussion.

This is why I say that SaaS may be a way to solve your problems when functional and development teams do not deliver: by going SaaS, you externalize all of that to the provider. Of course, this should be balanced with an analysis of the application, as a SaaS model is a good fit when the application's requirements are mostly standard across the company's industry.

As for PaaS, apart from providing pervasive access to a development platform, it might also be seen as a way to compensate for weak DB and OS administration capabilities, as those become the responsibility of the provider.

Remember, these are just examples of discussions I’ve observed, and I believe it is important to put the SaaS/PaaS/IaaS options in perspective.

Combine this with the fact that in many parts of the world there are severe technical and regulatory limitations on the public cloud model, and the scenario of implementing an IaaS private or hybrid cloud model is in fact the most accepted one for business-critical applications. I see this today, and I foresee it will stay like this for the coming years.


           What is the right cloud delivery model for my Business Critical applications?

So, what kind of cloud will you be putting each of your workloads on?

Will everything be public?

It depends on the specifics of each customer, country and use case, and the following criteria should provide good guidance on how to decide: economics, trust, and functional fit. (A small sketch of such a "workload filter" follows the examples below.)

For example:

  • Where is it more economical to run this workload? In the public cloud or in my private cloud?
    • As internal IT usually does not have a profitability goal, the end cost for the company may be lower running it on-premise. On the other hand, many private IT organizations have neither the scale nor the most efficient operating model, which makes it more costly in the long run. Not all companies are alike, so the result of this evaluation is absolutely unique to each customer scenario.
  • What confidentiality and compliance requirements apply to this business process/application? Is it subject to data sovereignty regulations, or other risk-related regulations like "Sarbanes-Oxley" or "Basel"?
    • Not all countries have the same regulations, and each organization manages its business in a different and unique way. There may be scenarios where industry, country or corporate regulations impose certain restrictions on "data placement". Also, the risk profile of each company defines its approach to risk, and its evaluation of probability, impact and acceptable risk containment/management practices.
  • Does the public cloud provider offer functionality that maps to the unique needs of the organization?
    • Some business processes are quite standard across an industry, which leads organizations to look at application adoption as a way to gain access to standard, up-to-date industry best practices. There are also situations where the business processes and functional requirements are very unique or custom to the organization, even a source of competitive differentiation, in which case a standardized public cloud offering will most likely not be a good fit.
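The sketch below turns these three questions into a toy "workload filter". It is a minimal illustration only: the attributes, the rules and the two example workloads (borrowed from the customer example further down) are assumptions, not a real decision tool.

# Minimal sketch of a "cloud workload filter": apply the economics / trust /
# functional questions above to suggest a placement. The attributes, rules
# and example workloads are illustrative assumptions, not a real tool.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    public_cheaper: bool     # economics: is the public cloud cheaper here?
    data_restricted: bool    # trust: data sovereignty / compliance limits?
    standard_process: bool   # functional fit: standard across the industry?

def placement(w: Workload) -> str:
    if w.data_restricted:
        return "private / hosted managed private cloud"
    if w.standard_process and w.public_cheaper:
        return "public cloud"
    return "hybrid - evaluate case by case"

for w in [
    Workload("employee performance evaluation", public_cheaper=True,
             data_restricted=False, standard_process=True),
    Workload("order to cash (invoicing, AR)", public_cheaper=True,
             data_restricted=True, standard_process=False),
]:
    print(f"{w.name}: {placement(w)}")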


Again, this is just an example of a reflection. If you go through it, you'll realize that most likely you'll find workloads suitable for public cloud offerings, and others that will be subject to more restrictive conditions.

Apart from this, we also have to remember that not all the world has cheap and stable high-bandwidth connectivity today, and that the growing news of data privacy violations and questionable access by public entities to private data puts many organizations at a crossroads: they must carefully decide what should stay under the organization's direct span of control (the private cloud), and what is acceptable to put in a fully public environment.

Let me share one example provided by a customer I met some weeks ago, on his reasoning about what could be placed on one side or the other.

This customer gave as an example 2 particular systems:
  • one to support employee performance evaluation, career planning and skills development;
  • and another system for “order to cash processes” like invoicing and accounts receivable.

If they lose access to the employee evaluation system, the company will not stop operating. Also, the kind of data in such a system does not include critical company secrets, so a reliable public cloud offering might be the right solution for it.

On the other hand, for the invoicing and accounts receivable system, if access were lost, business operations would be paralyzed and cash inflows could be negatively impacted.
That system also holds business-critical data: the markets the organization operates in, its revenues by market, its customers, its planned promotions and new product launches, etc. This is definitely the kind of system an organization will find hardest to accept being operated outside its span of control, or without access to its own auditing services.

If we also remember that, apart from the US, Western Europe and certain specific countries in Asia, the majority of the world still has both technical and legal constraints around public cloud, I believe on-premise (or hosted/managed private environments) will remain the mainstream option for organizations' core business systems for quite some years to come.

So, there will be cases where a pure public cloud offering, with no control over the software stack or the data placement location, is an acceptable choice, and cases at the other extreme where you'll need to keep full control of the software stack and data location. But between these two extremes there is a multitude of possibilities, such as a hybrid model with a local provider: the data placed at the local provider within the country's borders, in association with a global provider delivering the cloud services on top of that infrastructure, in either a dedicated or a shared environment.

The point: today there are multiple options, so embrace the discussion and find out what the best options are for your organization from a financial, strategic, innovation and operational perspective.


           IaaS is the Cloud Service Model that Provides you the most flexibility

Assuming that through your "cloud workload filter" you define which services are acceptable to run in a public cloud and which are not, you'll end up with the conclusion that you'll be operating a hybrid cloud model, with certain services running under your span of control (either hosted by your own IT or managed in a hosted private environment at a service provider).

How do you ensure you do not end up locked in to a certain vendor, and that you don't lose the flexibility a hybrid cloud model should be providing you in the first place?

The answer: adopt a cloud service model that gives you control over the "mobility" of your workloads between public clouds, or between public and private clouds.

Today, only Infrastructure as a Service (IaaS) offers this.
Let’s look at it for an instant:

  • SaaS: the application logic is the property of the service provider. If your organization wants to move that process back under its span of control, it implies a major migration project: configuring a new application to support that functionality, plus migrating the data from the public provider to your controlled environment. This is definitely time- and resource-consuming, representing a disruptive change. So we can say that once you choose to adopt a SaaS offering in a public cloud, there will be a number of barriers to exit, which must be evaluated beforehand and included in your contract's "exit clauses".
  • PaaS: the provider owns everything up to the run-time platform where you deploy your code, which means that most likely your data will also be hosted on the provider's infrastructure. Here, moving these processes from public to private cloud becomes simpler whenever the platform you've chosen to run your code on can be easily deployed on-premise as well. But even in this case, moving functionality from public to private will be a disruptive event implying certain costs and application downtime.

Both in SaaS and PaaS, when talking about hybrid cloud scenarios, we are thinking about the possibility of business processes or applications interacting and expanding between private and public cloud. But in neither of these cases can we talk about "dynamic workload mobility". At least with the current public cloud offerings and technologies (that I know of, and that are foreseen for the medium term), there is NO dynamic/real-time workload mobility.

But this is a reality today in the IaaS model.

When considering Infrastructure as a Service, apart from having complete isolation at the data level with today's virtualization technologies like the ones provided by VMware, the whole stack needed to run your business processes is packed in a container, the virtual machine, which you can absolutely control from a confidentiality point of view and move between compatible clouds.

But the key benefit here is that if you choose public cloud providers who operate on VMware, and have VMware as the enabling technology for your private cloud in an IaaS model as well, you can today use technologies like "long-distance vMotion" (check out the announcements on this topic from VMworld US 2015) to move applications dynamically and without disruption from the private to the public cloud and back, avoiding lock-in to a specific vendor.

There are also storage-provider technologies that facilitate the movement and copying of virtual machines between clouds; they are in use by many service providers around the world and can be combined with "long-distance vMotion". But to my current knowledge, the scenarios I know of in production still imply a minimal application downtime for this workload mobility, as "long-distance vMotion" is a brand-new feature that will still take some time to become widely implemented and integrated with key hardware infrastructure components.

So, from a risk perspective, IaaS is the only model that truly enables you to operate a hybrid cloud that doesn't get you "technically" locked in to one public cloud vendor.


          IaaS is the only true hybrid cloud model without vendor lock-in

Cloud isn’t then just Public Cloud.

Cloud is about the industrialization of IT, its architectural enablers being virtualization, automation and orchestration software, and, most important, people and processes.

Today it is possible to acquire a pre-configured infrastructure with all the software already deployed on it: an out-of-the-box self-service portal, chargeback tools, automated deployment workflows, and all the other tools that truly enable a private cloud according to the characteristics described above in the N.I.S.T. definition of cloud computing.

Considering the profile of your organization, and the different types of business processes you’ll be supporting, cloud must definitely be a part of your IT strategy conversations.

In many cases, implementing a private cloud and adopting a cloud operating model in your own IT organization may prove to be the right solution for your most critical and customized business applications. But in many cases I'm finding that, with the increased security certifications obtained by cloud companies like Virtustream, the option of a hosted/managed private cloud receives very strong positive feedback.

Hosted managed private cloud offerings bring most of the benefits of a public cloud, but under your organization's span of control.

With cloud being a key topic in your architecture discussions today, adopting a TDI and virtualized infrastructure for your on-premise SAP HANA deployment will also prepare you for the adoption of a hybrid cloud IaaS scenario, where you'll have full control of the mobility of your workloads between your private cloud and multiple cloud providers operating under this model.

This scenario will for example enable:
  • Higher utilization of private assets: provision only for the average workload, and leverage the public cloud for peaks;
  • Simplification of operations: with the same base architectures on-premise and at your cloud provider, you can for example set up your disaster recovery to the cloud in a fully automated way;
  • Reduction of the risk of change: for a new project you can deploy a system either on-premise or in the cloud, whichever is more convenient, and move it later in either direction depending on your "workload filter" and on how critical and confidential the application turns out to be;
  • Driving business innovation: one of the key aspects of business innovation is being able to experiment at low cost. This scenario gives you full flexibility, so that you never have to say no to your business. You can truly operate as a services broker, taking the best of your private infrastructure as well as of the public providers you are working with.


           Conclusion

I hope that at this point you agree with me that "Cloud" must be in every organization's strategy and architecture discussions.

Which one is the right one for you? There is definitely not a “one size fits all” answer.

The good thing is that there are choices that can map to your specific organization, industry, location, and business scenario.

I hope this discussion helps you get started with your own ideas on cloud.

From my side, it is my true belief that a hybrid IaaS model will be the leading option for mission-critical SAP applications, whether based on SAP HANA or AnyDB.

If you want to get started engaging with candidate providers to work with you on building your cloud strategy and architecture, and since these are the ones closest to me, let me suggest the following:

  • EMC Federation Enterprise Hybrid Cloud for SAP: this solution provides a fully enabled cloud infrastructure, factory-built and fully integrated, delivered pre-configured in your datacenter, leveraging the best technologies from the EMC Federation (EMC, VCE, VMware, Virtustream), so that you can start operating a private cloud without having to build all the functionality on your own from scratch;
  • Virtustream Hosted and Managed Cloud offerings for SAP: Virtustream is a recognized leader in cloud for SAP and SAP HANA, has recently been acquired by the EMC Federation, and is one of only 3 SAP Premium HANA Cloud Partners.