2015-04-29

SAP HANA Storage Whitepaper version 2.6 released

Quite some time ago, SAP published its "storage requirements whitepaper" for SAP HANA, which explains in an interesting level of detail how SAP HANA uses storage, and to what extent storage characteristics are relevant for SAP HANA operations.

So far, no news.

What is new is that with each version of this "Storage Whitepaper", SAP has been simplifying the SAP HANA storage requirements (for example, in this latest version by reducing the capacity requirements for the /hana/shared/ filesystem). This is a clear reflection of the increased experience with SAP HANA operations and the increased stability of the software, and so a clear sign of maturity.

If you haven't read this document yet, I would really suggest you take some time to go through it in detail.

If you have already seen this document at some point, be sure to review it again and stay up to date on the latest version.

The SAP HANA Storage Requirements Whitepaper is published at: http://scn.sap.com/docs/DOC-62595

References to this and other important documents regarding SAP HANA infrastructure integration can be found in the right pane of my blog, under the reference page: SAP HANA Technical Documentation

2015-04-23

The right architecture for SAP S/4 HANA and the Internet of Things?

SAP HANA is changing long-standing ERP paradigms for SAP customers.

But SAP HANA itself is evolving very fast, bringing new functionality and possibilities with every new release. Among the new variables that need to be considered are aspects like:
  • the introduction of data temperatures and tiering data to the right “economically viable” repository;
  • integration of machine data;
  • the mandate for the new HANA warm store to run on a separate server from the HANA system, using shared storage;
  • the new HANA functionality (named HANA "Vora") promising to enable Hadoop for the enterprise;
  • the increased value of data and the increased need for availability and data protection in organizations.

Despite the expected data footprint reduction from the migration of SAP ERP to S/4 HANA, the volume of data managed by SAP HANA may only increase, which demands that we think about infrastructure from a different perspective.

In this blog post, I'll explain where the SAP HANA data footprint reduction will come from, and talk a bit about the relevance of data storage and virtualization in the new HANA world.

I'll also separate the marketing noise from reality and take a closer look at what is possible today as well as what is coming, confirming the increased openness SAP HANA provides today compared with the reality of two years ago.

I'll conclude by making the case for why implementing SAP HANA on a TDI and virtualized infrastructure is the right choice for companies planning to implement HANA today.


               Setting the scene

I just read an interesting article in a UK online journal talking about the promise of lower storage consumption as an IT driver for HANA adoption.

I would find it strange for a company to implement SAP HANA just because of that, as I think the rationale for HANA adoption should be a business-related one.

At least, if I were working on the customer side today, that would be my thinking: what does this bring additionally to my business, balanced against the change costs it will imply?

Nevertheless, stepping back a bit from all the marketing noise that has been around since the S/4 announcement, let's think about it more calmly.

Where does the claim of an SAP HANA storage footprint reduction come from?

Surely many of you have seen the following image:

For the sake of organizing my ideas, let me call the reduction factors in the image above Jump1, Jump2 and Jump3.


               Where does the data reduction come from when “just” moving from Oracle to HANA?

Analyzing them, where does Jump1 come from?
  • Looking at a traditional SAP ERP implementation on Oracle, one thing we know is that, with all data held in memory, the need for a massive amount of indexes to speed up access to the data goes away.
    • If you think about it, more than 50% of the storage footprint of an Oracle database supporting SAP ERP is just indexes! So here alone is a 50% reduction in space.
  • On top of that, SAP HANA compresses the data in memory, which contributes further to the storage footprint reduction.
  • Also, in the “traditional DB world”, some records were stored in multiple database tables (data redundancy) to serve different application needs and avoid data access bottlenecks. For example, you had a table with the raw data, another with that same data aggregated by year, and another aggregated by some other variable.
    • In the new in-memory world of HANA there is no need for this either, as SAP is replacing all those tables with views on top of a single copy of the data, and this elimination of data redundancy also significantly reduces the HANA data footprint.


This is mainly where SAP expects the 5x data footprint reduction of Jump1 to come from.

And it’s a fair expectation!

Of course, like any generic statement, reality varies case by case. I've seen situations with only about 50% data reduction (call it 2x) which, once you add the free space the HANA server needs for calculations and its own functioning, ended up requiring exactly the same amount of RAM as the disk footprint of the original Oracle database. That was, of course, an extreme case: a database from an industry vertical where SAP hasn't yet done all the table redundancy clean-up, and where the source database was already compressed.
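
To make this concrete, here is a minimal back-of-the-envelope sketch (in Python) of a commonly used sizing rule of thumb, assuming RAM must hold the compressed data plus roughly the same again as working space for calculations; the figures are purely illustrative, not SAP's official sizing method.

def hana_ram_estimate(source_db_tb, reduction_factor):
    """Estimate HANA RAM (in TB) from a source DB disk footprint and a reduction factor."""
    compressed_tb = source_db_tb / reduction_factor   # data after compression / clean-up
    return compressed_tb * 2                          # plus roughly the same again as working space

# Claimed typical case: 5x reduction
print(hana_ram_estimate(10.0, 5.0))   # 10 TB on Oracle -> ~4 TB of RAM

# The extreme case described above: only 2x reduction
print(hana_ram_estimate(10.0, 2.0))   # 10 TB on Oracle -> ~10 TB of RAM, the same as the source disk footprint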


Where does the footprint reduction come from when moving from Suite on HANA to S/4?

Jump2 footprint reduction comes from the well-known “magic” of Simple Finance.

The image below is surely also known to many:


The data redundancies I started describing above, when it comes to basic business objects in the ERP, meant in the finance example that data which “could reside” in only 4 tables was duplicated across about 23 tables.

With the redesign of SAP's well-known FI/CO modules, now rebranded as sFin, SAP reduced those 23 tables to 4. This is what accounts for the additional 2.5x data footprint reduction.
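
As a purely illustrative piece of arithmetic, assuming the two factors compound multiplicatively (as the image suggests), the combined effect would be:

jump1 = 5.0   # Oracle -> Suite on HANA (claimed)
jump2 = 2.5   # Suite on HANA -> S/4 with sFin (claimed)
print(f"Combined reduction: {jump1 * jump2:.1f}x")                  # ~12.5x
print(f"10 TB on Oracle -> ~{10 / (jump1 * jump2):.1f} TB on S/4")  # ~0.8 TB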

Here I would be careful about just one aspect: today we only have sFin. sLog (Simplified Logistics) has been announced for launch in the second half of 2015.

But there is a lot more to SAP applications than just sFin and sLog. So let's wait and see what reality brings us before we start launching the fireworks. And with this I'm not insinuating in any way that the reduction will be smaller than claimed; in some modules it may actually be more.

Again, if you move to S/4 today, the only module that will see this dramatic reduction is sFin, as none of the others are yet available.

Meaning: manage your expectations carefully. This is a nice statement of direction, but the reality is not there yet.


               What about the split between current and historical data?

Jump3 "will" come from SAP Business Suite's adoption of the functionality announced with SAP HANA SPS09, called dynamic tiering.

And I put "will" in quotes because dynamic tiering is not yet ready for SAP Business Suite or S/4, and is today still restricted to BW (SAP product management mentioned that dynamic tiering needs to support "hybrid tables" before it can be considered for Business Suite).

So what I'll describe here is "imagining the future", as today's reality is just the possibility of keeping some data only on disk and loading it into memory on demand, working somewhat like the old ABAP buffers (the least-used objects get "destaged" to make room for new objects being loaded into memory).
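
To illustrate the principle only (this is not HANA's actual column unload logic, just a sketch of "least used gets destaged"):

from collections import OrderedDict

class InMemoryStore:
    def __init__(self, capacity):
        self.capacity = capacity        # max number of objects held in memory
        self.loaded = OrderedDict()     # object name -> data, oldest access first

    def access(self, name, load_from_disk):
        if name in self.loaded:
            self.loaded.move_to_end(name)                     # mark as most recently used
        else:
            if len(self.loaded) >= self.capacity:
                evicted, _ = self.loaded.popitem(last=False)  # destage the least-used object
                print(f"destaging {evicted} to disk")
            self.loaded[name] = load_from_disk(name)          # load on demand
        return self.loaded[name]

store = InMemoryStore(capacity=2)
store.access("MARA", lambda n: f"<{n} data>")
store.access("VBAK", lambda n: f"<{n} data>")
store.access("ACDOCA", lambda n: f"<{n} data>")   # prints: destaging MARA to disk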


In that presentation you’ll find the following slides:


So the idea is that in S/4 HANA, the ILM (Information Lifecycle Management) functionality will be redesigned to take advantage of this feature (somewhere in the future...), so that the HANA system will be able to automatically determine the relevance of a specific data record and place it either in memory or in the “Warm Store”. Remember, I just said this is the “future idea”, so we are not there yet.

If you look at this slide, the warm store is another service of the HANA database, where the primary image will be on disk.

What does this mean? The warm store will be a compressed columnar database, but optimized to run on disk.
If SAP implements this well, it will address a long-standing problem: one of the things that made SAP databases grow to an almost unmanageable size was the difficulty of defining the data governance policies that should have driven data archiving practices.

Meaning: many SAP customers never did any archiving, not because it was technically challenging, but because no one in the organization was willing to put their neck on the line and define what data could be removed from the database.

So, here we are no longer talking about data footprint reduction but rather about placing the data on the most “economically sensible” medium, according to that data’s value.

For example, if you want to make real-time business decisions, it may be fundamental to access this year's data very fast, but do you need the data from 10 years ago to be available at the same speed, and therefore at the same cost? Maybe not.
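
As a toy illustration of the idea (the thresholds below are arbitrary assumptions, not SAP's actual ILM rules), a data temperature policy could be as simple as:

from datetime import date

def data_temperature(record_date, today=None):
    """Classify a record by age: hot (in-memory), warm (disk store) or cold (archive)."""
    today = today or date.today()
    age_days = (today - record_date).days
    if age_days <= 365:
        return "hot"     # keep in HANA main memory
    elif age_days <= 5 * 365:
        return "warm"    # candidate for the disk-based warm store
    else:
        return "cold"    # candidate for archive / Hadoop-type storage

print(data_temperature(date(2015, 1, 15), today=date(2015, 4, 23)))   # hot
print(data_temperature(date(2005, 1, 15), today=date(2015, 4, 23)))   # cold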

Here SAP is finally introducing the concepts of data temperatures and data tiering, concepts that companies like EMC developed and successfully implemented many years ago. The difference is that SAP is trying to implement this logic in the database code.

We'll need to wait and see how successful they are, because if data tiering doesn't turn out to be dynamic, many of the benefits will be lost for the same reasons many customers never archived: lack of governance, lack of technical knowledge, or unwillingness to deal with that additional level of complexity.

Nevertheless, data storage has never been more important. What changes is the profile of that storage, as new variables increase in importance in the new HANA reality.


               The VMware effect on SAP HANA Data Volumes

So, let's now put all of this in perspective.

Do you remember what happened to the number of servers in organizations when VMware made deploying them so easy? They skyrocketed!

Translating this to HANA: if SAP makes not only data tiering but also data acquisition simple, integrating structured and unstructured data, capturing machine data, and making HANA a “business information hub” for the organization, then two of the “Big Data V’s” will hit these systems harder than anything we’ve seen so far: Volume and Variety.


The performance and lifecycle effects on storage capacity

Let me add two final variables to this discussion before diving into my conclusions:
  • A system that needed 32 CPU cores to run its database on Oracle may run on 120 CPU cores on HANA;
    • Imagine loading machine data into a 120-core system (or even 240 cores and more). How many log writes will such a system generate?
    • HANA has to comply with the ACID principles of Atomicity, Consistency, Isolation and Durability, so whatever happens in the HANA world will have to be persisted on a “persistent medium”.
    • Maybe it’s sexier to call it persistence, but this is storage! It may be a different type of storage, oriented more towards speed than capacity, but this is where storage companies are heading, as their offerings will be needed more than ever!
  • What about High Availability? Disaster Recovery? Data protection or Backup and Recovery (whatever you like to call it)? And application change management?
    • All these activities have demanded additional storage capacity over the years, with customers requiring as much as 16 times the productive database capacity to support these requirements (DR site, test systems, etc.); a rough sizing sketch follows this list.
    • One thing I haven’t seen thoroughly discussed yet is how SAP Application Lifecycle Management will evolve in the new S/4 reality, as this will be decisive in determining the true impact of HANA on the storage footprint.
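
Here is a hypothetical illustration of how those multipliers build up across a landscape; the copy counts and factors below are assumptions for illustration only, not SAP sizing guidance:

productive_tb = 2.0

landscape = {
    "production":        1.0,   # the productive database itself
    "HA standby":        1.0,   # replica for high availability
    "DR site":           1.0,   # copy at the disaster recovery site
    "QA / test systems": 3.0,   # several full-size copies for change management
    "development":       0.5,   # usually smaller than production
    "retained backups":  6.0,   # multiple retained backup generations
}

total_tb = sum(factor * productive_tb for factor in landscape.values())
print(f"Total capacity: {total_tb:.1f} TB "
      f"({total_tb / productive_tb:.1f}x the productive footprint)")   # 25.0 TB (12.5x)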


One thing I know for sure: the value of information for organizations will only grow faster.

So I do not see organizations accepting data loss anymore, even less in this new HANA world.
Having all data accessible at “nanosecond-grade” speeds, and increasing the dependency of business processes on real-time data, will imply increasingly demanding architectures in terms of disaster avoidance and business continuity.


Conclusion

In conclusion, yes, HANA may drive some data footprint reduction.

And it must, to be viable, as 1 TB of RAM does not cost the same as 1 TB of disk!

Determining the right value of data, and putting it on the right “economically suitable” medium, is fundamental for the SAP HANA ROI equation.

So I see the “HANA data volume reduction” more from the perspective of HANA’s viability itself (someone wrote a few weeks ago that the list price of 12 TB of RAM is over 1 million USD, a lot more than the same capacity on disk).
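
A rough comparison of keeping everything in RAM versus tiering part of it to disk makes the point; the RAM price per TB below is derived from the quoted list price, while the disk price and the warm-data share are placeholder assumptions:

ram_usd_per_tb = 1_000_000 / 12    # from the quoted list price: over 1M USD for 12 TB of RAM
disk_usd_per_tb = 3_000            # assumed cost of enterprise disk capacity

data_tb = 10.0
warm_share = 0.7                   # assume 70% of the data can live in the warm store

all_in_ram = data_tb * ram_usd_per_tb
tiered = (data_tb * (1 - warm_share) * ram_usd_per_tb
          + data_tb * warm_share * disk_usd_per_tb)

print(f"All in RAM: {all_in_ram:,.0f} USD")   # ~833,333 USD
print(f"Tiered:     {tiered:,.0f} USD")       # ~271,000 USD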

But thinking about how much easier it becomes to load and manipulate data in HANA, combined with the volume and variety expected, for example, from SAP HANA's integration with machine data, I'm not sure that over a 10-year period the volume of stored data will actually be less than it is today with SAP ERP on Oracle.

If I may make a guess, I think it will not only increase, but it will increase at an accelerated pace!

What I take from all this discussion is that the “infrastructure things” customers buy in this new “in-memory” world will probably be different from the ones they were used to buying until today, but maybe the budget will stay the same.

Providers will need to adapt to this new reality to stay relevant and in business.


But considering SAP’s own statements that hardware costs are only the tip of the iceberg of total IT costs, there are so many savings to be realized in other areas that I wouldn’t obsess over the infrastructure part. What I see is that the early adopters’ obsession with CAPEX reduction, now that some of them have two or three years of operations experience, has revealed a significant increase in all the costs hidden below the waterline (as per the slide above).

The money spent on Operations and Change Management is massive in many organizations, and over a period of 5 years can easily be 4 or 5 times the investment cost of the infrastructure.

Let me suggest you all have a look at a presentation from SAP focused exactly on this: the new HANA economics.

As you can see, SAP is also evolving its understanding, and aspects like SAP HANA Tailored Datacenter Integration and virtualization are just a natural step in SAP HANA's maturity, so they are options you should consider from the start.

If you agree with SAP’s analysis there, a couple of things stand out that confirm what has been my reasoning for quite some time now:
  • When implementing HANA, choose a Tailored Datacenter Integration deployment, as it will drive out costs;
  • When possible, use commodity hardware (for example, the newly validated Intel E5-based servers, available for configurations up to 1.5 TB);
  • Virtualize your systems whenever possible (vSphere 6 is coming in a couple of months, supporting virtualized HANA scale-out and scale-up systems up to 4 TB).


And be prepared for the unexpected: your business may change in unexpected directions, and making massive “monolithic and inflexible” investments will not help it become more agile.

Since all the footprint reduction described here implies implementing data temperatures and tiering data out of RAM onto more affordable media, implementing HANA in a VM, with the Warm Store in another VM and the Hadoop store in another, all using shared storage, will be the right way to go.


Looking at the picture above, I believe it is clear that "an appliance" cannot answer the architectural needs of this new SAP HANA reality.

And don’t take my word for it; it is stated crystal clear in the SAP documents I’ve been mentioning throughout this blog post!

So, I would expect to see:

  • a rise in HDFS-capable storage used in conjunction with HANA to store the less valuable “machine data”, managed by a fully virtualized Hadoop cluster;
  • a rise in demand for cost-effective, flash-optimized storage to support the warm store, as the majority of the structured data volume will reside there in the future;
  • speed-optimized “multi-channel” storage to support HANA’s massive log generation and the fast restart times required by application availability requirements;
  • SAP HANA Tailored Datacenter Integration becoming the preferred deployment model for SAP HANA;
  • and virtualization seeing increased adoption.



I hope this discussion helps you put both your architecture and your data placement strategies into perspective for this new HANA world.