Quite some time ago, SAP published its "Storage Requirements Whitepaper" for SAP HANA, which explains in an interesting level of detail how SAP HANA uses storage and to what extent storage characteristics are relevant for SAP HANA operations.
So far, no news.
What is new is that with each new version of this "Storage Whitepaper", SAP has been simplifying the SAP HANA storage requirements (for example, the latest version reduces the capacity requirements for the /hana/shared/ filesystem). This is a clear reflection of the increased experience with SAP HANA operations and the increased stability of the software, and so a clear sign of maturity.
If you haven't read this document yet, I really suggest you take some time to go through it in detail.
If you have already seen this document at some point, be sure to review it so you are up to date on the latest version.
The SAP HANA Storage Requirements Whitepaper is published at: http://scn.sap.com/docs/DOC-62595
References to this and other important documents regarding SAP HANA infrastructure integration can be found in the right pane of my blog, under the reference page: SAP HANA Technical Documentation
The right architecture for SAP S/4 HANA and the Internet of Things?
SAP HANA is changing long-standing ERP paradigms for SAP customers.
But SAP HANA itself is evolving very fast, bringing new functionality and possibilities with every new release. Among the new variables that need to be considered are aspects like:
- the introduction of data temperatures and tiering data to the right “economically viable” repository;
- integration of machine data;
- the mandate for the new HANA warm store to run on a separate server from the HANA system, using shared storage;
- the new HANA functionality (named HANA "Vora") promising to enable HADOOP for the enterprise;
- the increased value of data and the increased need for availability and data protection in organizations.
Despite the expected data footprint reduction from the migration of SAP ERP to S/4 HANA, the volume of data managed by SAP HANA may only increase, which forces us to think about infrastructure from a different perspective.
In this blog post I'll explain where the SAP HANA data footprint reduction will come from, and talk a bit about the relevance of data storage and virtualization in the new HANA world.
Also, separating the marketing noise from the reality, I'll take a closer look at what is possible today as well as what is coming, confirming the increased openness SAP HANA provides today compared with the reality of just two years ago.
I'll conclude by making the case for why implementing SAP HANA on a TDI and virtualized infrastructure is the right choice for companies planning to implement HANA today.
Setting the scene
I just read an interesting article in a UK online journal about the promise of lower storage consumption as an IT driver for HANA adoption.
I would find it strange for a company to implement SAP HANA just because of that, as I think the rationale for HANA adoption should be a business-related one.
At least, if I were working on the customer side today, that would be my thinking: what does this additionally bring to my business, balanced against the change costs it will imply?
Nevertheless, stepping back a bit from all the marketing noise that has been around since the S/4 announcement, let's think about it more calmly.
Where does the claim for SAP HANA storage footprint reduction come from?
Surely many of you have seen the following image:
For the sake of organizing my ideas, let me call the reduction factors in the image above Jump1, Jump2 and Jump3.
Where does the data reduction
come from when “just” moving from Oracle to HANA?
Analyzing them, where does Jump1 come from?
- Looking at a traditional SAP ERP implementation on Oracle, one thing we know is that with all data in memory, the need for a massive amount of indexes to speed up access to the data goes away.
- If you think about it, more than 50% of the storage footprint of an Oracle database supporting SAP ERP is just indexes! So, here is a 50% reduction in space.
- On the other side, SAP HANA compresses the data in memory, which provides an additional contribution to the storage footprint reduction (see the sketch after this list).
- Also, in the "traditional DB world" some records were loaded into multiple database tables (data redundancy) to serve different application needs and avoid data access bottlenecks. For example, you had a table with the raw data, another with that same data aggregated by year, and another aggregated by some other variable.
- So, in the new in-memory world of HANA there is no need for this either, as SAP is replacing all those tables with views on top of a single data record and, through the elimination of data redundancy, also significantly reducing the data footprint of HANA.
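To make the compression point a bit more concrete, here is a minimal, purely illustrative sketch of dictionary encoding, one of the techniques columnar stores like HANA rely on. The data and the resulting sizes are made up, and HANA's real implementation is far more sophisticated:

```python
# Illustrative only: dictionary encoding of a repetitive column.
# Columnar stores such as SAP HANA combine this with further techniques;
# the numbers below are not HANA measurements.
column = ["OPEN", "OPEN", "CLOSED", "OPEN", "CLOSED"] * 100_000  # repetitive business data

# Build a dictionary of distinct values and store only small integer IDs.
dictionary = {value: idx for idx, value in enumerate(sorted(set(column)))}
encoded = [dictionary[value] for value in column]

raw_bytes = sum(len(v) for v in column)   # rough size of the raw strings
encoded_bytes = len(encoded) * 1          # 2 distinct values fit in 1 byte each
print(f"raw ~{raw_bytes} bytes, encoded ~{encoded_bytes} bytes "
      f"(+ a tiny dictionary of {len(dictionary)} entries)")
```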
And this is mainly where SAP expects the 5x data footprint reduction from Jump1 alone.
And it’s a fair expectation!
Of course, as with any generic statement, in reality every case is different, and I have seen cases with only about 50% data reduction (call it 2x) which, once you add the free space the HANA server needs for calculations and its own functioning, ended up requiring exactly the same amount of RAM as the disk footprint of the original Oracle database. That was, of course, an extreme case: a database for an industry vertical where SAP has not yet done all the table redundancy clean-up, and where the source database was already compressed.
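To put rough numbers on this, here is a back-of-the-envelope sizing sketch. The figures and the 2x working-memory headroom are illustrative assumptions of mine, not an official SAP sizing formula; for real projects use SAP's sizing reports:

```python
# Illustrative back-of-the-envelope sizing, not an official SAP formula.
def hana_ram_estimate(source_db_gb: float, reduction_factor: float,
                      work_headroom: float = 2.0) -> float:
    """Compressed data footprint plus headroom for calculations and temporary space."""
    compressed_gb = source_db_gb / reduction_factor
    return compressed_gb * work_headroom

oracle_db_gb = 4000  # hypothetical 4 TB source database

print("Expected case (5x):", hana_ram_estimate(oracle_db_gb, 5.0), "GB RAM")  # 1600 GB
print("Extreme case  (2x):", hana_ram_estimate(oracle_db_gb, 2.0), "GB RAM")  # 4000 GB, same as the source disk footprint
```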
Where does the footprint reduction come from when moving from Suite on
HANA to S/4?
Jump2's footprint reduction comes from the well-known "magic" of Simple Finance.
The image below is surely also known to many:
The data redundancies I started to describe above, when we talk about basic business objects in the ERP, meant in the finance example that data that could reside in only 4 tables was duplicated across about 23 tables.
With the redesign of SAP's well-known FI/CO modules, now rebranded as sFin, SAP reduced those 23 tables to 4, and this is what accounts for the additional 2.5x data footprint reduction.
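Conceptually, the totals and index-like tables no longer need to be persisted because they can be derived on the fly from the line items, much like a view. Here is a toy illustration of that idea; the table and field names are invented and are not SAP's actual sFin data model:

```python
# Toy illustration of deriving aggregates on the fly instead of storing them.
# Field and table names are invented; this is not SAP's actual sFin model.
from collections import defaultdict

line_items = [  # a single "document line item" table
    {"year": 2014, "account": "4000", "amount": 120.0},
    {"year": 2014, "account": "4000", "amount": 80.0},
    {"year": 2015, "account": "4000", "amount": 200.0},
]

# Previously a separate "totals by year" table would be kept in sync;
# in the in-memory model it is simply computed when needed, like a view.
totals_by_year = defaultdict(float)
for item in line_items:
    totals_by_year[(item["year"], item["account"])] += item["amount"]

print(dict(totals_by_year))  # {(2014, '4000'): 200.0, (2015, '4000'): 200.0}
```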
Here I would be careful about just one aspect: today we only have sFin. sLog (Simplified Logistics) has been announced for launch in the second half of 2015.
But there is a lot more to SAP applications than just sFin and sLog. So, let's wait and see what reality brings us before we start launching the fireworks. And with this I'm not insinuating in any way that the reduction will be smaller than claimed; in some modules it may actually be more.
Again, if you move to S/4 today, the only module that will see this dramatic reduction is sFin, as none of the others are available yet.
Meaning: manage your expectations carefully. This is a nice statement of direction, but the reality is not there yet.
What about the split between
actual and historical?
Jump3 "will" come from the adoption by SAP Business Suite of
the functionality announced with SAP HANA SPS09, called data tiering.
And I put "will" in quotes because dynamic tiering is not yet ready for SAP Business Suite or S/4, and its use is today still restricted to BW (SAP product management has mentioned that dynamic tiering needs to support "hybrid tables" before it can be considered for the Business Suite).
So, what I'll describe here is "imagining the future", as today the reality is just the possibility of keeping some data only on disk and loading it into memory on demand, somewhat like the old ABAP buffers (the least-used objects get "destaged" to make room for new objects being loaded into memory).
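As a purely conceptual illustration of that "least used gets destaged" behaviour, and not HANA's actual unload algorithm, a tiny least-recently-used store could look like this:

```python
# Conceptual LRU "destaging" sketch; not SAP HANA's actual column unload logic.
from collections import OrderedDict

class MemoryStore:
    def __init__(self, capacity: int):
        self.capacity = capacity      # how many objects fit "in memory"
        self.loaded = OrderedDict()   # object name -> data, oldest access first

    def access(self, name, load_from_disk):
        if name in self.loaded:
            self.loaded.move_to_end(name)                     # mark as recently used
        else:
            if len(self.loaded) >= self.capacity:
                evicted, _ = self.loaded.popitem(last=False)  # destage least recently used
                print(f"destaging {evicted} to disk")
            self.loaded[name] = load_from_disk(name)          # load on demand
        return self.loaded[name]

store = MemoryStore(capacity=2)
for table in ["BSEG", "ACDOCA", "BSEG", "MSEG"]:
    store.access(table, load_from_disk=lambda t: f"<columns of {t}>")
```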
If you look at the presentations SAP released at the time of the SAP HANA SPS09 announcement, you can find one presentation specifically on this topic.
In that presentation you’ll find the following slides:
So, the idea is that in S/4 HANA the ILM (Information Lifecycle Management) functionality will be redesigned to take advantage of this feature (somewhere in the future...): the HANA system will be able to automatically determine the relevance of a specific data record and place it either in memory or in the "Warm Store". Remember, I just said this is the "future idea".
So we are not there yet.
If you look at this slide, the warm store is another service
of the HANA database, where the primary image will be on disk.
What does this mean? This will be a columnar compressed
database, but optimized to run on disk.
If SAP implements this well and according to expectations, it will address one of the things that made SAP databases grow to an almost unmanageable size: the difficulty of defining the data governance policies that then lead to data archiving practices.
Meaning: many SAP customers never did any archiving, not because it was technically challenging, but because no one in the organization would put their neck on the guillotine by defining what data could be removed from the database.
So, here we are no longer talking about data footprint
reduction but rather about placing the data on the most “economically sensible”
medium, according to that data’s value.
For example, if you want to make real-time business decisions, maybe it is fundamental that this year's data can be accessed very fast, but do you need the data from 10 years ago to be available at the same speed, and therefore at the same cost? Maybe not.
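A naive rule of thumb for such a policy could look like the sketch below; the age thresholds and tiers are arbitrary assumptions of mine, as a real ILM policy would be driven by business rules rather than a hard-coded cut-off:

```python
# Arbitrary illustration of an age-based data temperature rule.
from datetime import date

def temperature(posting_date: date, today: date = date(2015, 4, 23)) -> str:
    age_years = (today - posting_date).days / 365.25
    if age_years < 1:
        return "hot (in-memory store)"
    if age_years < 5:
        return "warm (disk-based warm store)"
    return "cold (Hadoop / archive)"

print(temperature(date(2015, 2, 1)))   # hot
print(temperature(date(2012, 6, 30)))  # warm
print(temperature(date(2004, 1, 15)))  # cold
```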
Here SAP is finally introducing the concepts of data temperatures and data tiering, concepts that companies like EMC developed and successfully implemented many years ago. The difference here is that SAP is trying to implement this logic in the database code.
We'll need to wait and see how successful they are in implementing this, because if data tiering does not turn out to be dynamic, many of the benefits will be lost for the same reasons many customers never archived: lack of governance, lack of technical knowledge, or not wanting to deal with that additional level of complexity.
Nevertheless, data storage has never been more important. What changes here is the profile of that storage, as new variables increase in importance in the new HANA reality.
The VMware effect on SAP HANA Data Volumes
So, let's now put all of this in perspective.
Do you remember what happened to the number of servers in organizations when VMware made deploying them so easy? It skyrocketed!
Translating this to HANA: if SAP makes not only data tiering but also data acquisition simple, integrating structured and unstructured data, capturing machine data, and making HANA a "business information hub" for the organization, then two of the "Big Data V's" will hit these systems harder than anything we have seen so far: Volume and Variety.
The performance and lifecycle effects on storage capacity
Adding two final variables to this discussion before diving into my conclusions on this phenomenon:
- A system whose database needed 32 CPU cores on Oracle may run on 120 CPU cores on HANA;
- Imagine loading machine data into a 120-core system (or even 240 cores and more). How many log writes will such a system generate?
- HANA has to comply with the ACID principles of Atomicity, Consistency, Isolation and Durability, so whatever happens in the HANA world will have to be persisted on a "persistent medium".
- Maybe it's sexier to call it persistency, but this is storage! It may be a different type of storage, more oriented to speed than to capacity, but that is where storage companies are heading, as their offerings will be needed more than ever!
- What about High Availability? Disaster Recovery? Data protection or backup and recovery (whatever you like to call it)? And application change management?
- All these activities have demanded additional storage capacity over the years, with some customers needing as much as 16 times the productive database capacity to support these requirements (DR site, test systems, etc.; see the rough calculation after this list);
- One thing I haven't seen thoroughly discussed yet is how SAP Application Lifecycle Management will evolve in the new S/4 reality, as this will be decisive in determining the true impact of HANA on the storage footprint.
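To illustrate that multiplier effect, here is a rough, made-up calculation of how non-production copies and protection requirements can inflate the capacity needed around a single productive HANA database. The landscape and the factors are assumptions, not a sizing recommendation:

```python
# Made-up landscape; illustrates how copies and protection multiply capacity.
prod_data_tb = 2.0  # hypothetical productive HANA data footprint

copies = {
    "production":          1.0,
    "HA standby":          1.0,
    "DR site":             1.0,
    "QA / test / sandbox": 3.0,   # several near-full-size copies
    "backups (retained)":  8.0,   # multiple generations kept on backup storage
    "logs / snapshots":    2.0,
}

total_tb = prod_data_tb * sum(copies.values())
print(f"{sum(copies.values()):.0f}x multiplier -> {total_tb:.0f} TB "
      f"for a {prod_data_tb:.0f} TB productive database")
```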
One thing I know for sure: the value of information for
organizations will only grow faster.
So, I do not see organizations accepting data loss anymore, and even less so in this new HANA world.
Having all data accessible at "nanosecond-grade" speeds, and increasing the dependency of business processes on real-time data, will demand increasingly robust architectures in terms of disaster avoidance and business continuity.
Conclusion
In conclusion, yes, HANA may drive some data footprint
reduction.
And it must, to be viable! As 1 TB of RAM does not cost the
same as 1 TB of disk.
Determining the right value of data, and putting it on
the right “economically suitable” medium, is fundamental for the SAP HANA ROI
equation.
So, I see the "HANA data volume reduction" more from the perspective of HANA's own viability (someone wrote a few weeks ago that the list price of 12 TB of RAM is over 1 million USD, a lot more than the same capacity on disk).
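A quick sanity check on that statement, with deliberately rough assumed prices (these are my own placeholder numbers, not vendor quotes):

```python
# Rough, assumed prices for illustration only; not vendor quotes.
ram_usd_per_tb = 1_000_000 / 12   # implied by the "12 TB for over 1M USD" claim
disk_usd_per_tb = 1_000           # assumed enterprise disk price per TB

capacity_tb = 12
print(f"RAM : {capacity_tb * ram_usd_per_tb:,.0f} USD")
print(f"Disk: {capacity_tb * disk_usd_per_tb:,.0f} USD")
print(f"Ratio: ~{ram_usd_per_tb / disk_usd_per_tb:.0f}x")
```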
But thinking about how much easier it becomes to load and manipulate data in HANA, together with the volume and variety expected, for example, from SAP HANA integration with machine data, I am not sure that over a 10-year period the volume of stored data will actually be less than it is today with SAP ERP on Oracle.
If I may make a guess, I think it will not only increase,
but it will increase at an accelerated pace!
What I take from all this discussion is that the "infrastructure things" customers buy in this new "in-memory" world will probably be different from the ones they were used to buying until today, but maybe the budget will stay the same.
Providers will need to adapt to this new reality to stay
relevant and in business.
But considering SAP's own statements that hardware costs are only the tip of the iceberg of total IT costs, there are so many savings to be realized in other areas that I wouldn't get obsessed with the infrastructure part of it. What I see is that the early adopters' obsession with CAPEX reduction, now that some of them have reached 2 or 3 years of operations experience, has revealed a significant increase in all the costs hidden below the waterline (as per the slide above).
The money spent on Operations and Change Management is massive in many organizations, and can easily, over a period of 5 years, be 4 or 5 times the investment cost of the infrastructure.
Let me suggest you all have a look at a presentation from SAP focused exactly on this: the new HANA economics.
As you can see, SAP is also evolving its understanding, and aspects like SAP HANA Tailored Datacenter Integration and virtualization are just a natural step in SAP HANA's maturity evolution, and therefore options you should consider from the start.
If you agree with SAP’s analysis there, a couple of things
stand out that confirm what has been my reasoning for quite some time now:
- When implementing HANA, choose a Tailored Datacenter Integration (TDI) deployment, as it will drive out costs;
- When possible, use commodity hardware (for example the newly validated Intel E5-based servers, available for configurations up to 1.5 TB);
- Virtualize your systems whenever possible (vSphere 6, coming in a couple of months, will support virtualized HANA scale-out, and scale-up systems up to 4 TB).
And be prepared for the unexpected: your business may change in unexpected directions, and making massive "monolithic and inflexible" investments will not help it become more agile.
Since all the footprint reduction described here implies implementing data temperatures and tiering data out of RAM to more affordable media, implementing HANA in a VM, with the warm store on another VM and the HADOOP store on another, all using shared storage, will be the right way to go.
Looking at the picture above, I believe it is clear that "an appliance" cannot respond to the architecture needs of this new SAP HANA reality.
And don't just take my word for it: it is stated crystal clear in the SAP documents I have been mentioning throughout this blog post!
So, I would expect to see:
- a rise in HDFS-capable storage used in conjunction with HANA to store the less valuable "machine data", managed by a fully virtualized HADOOP cluster;
- a rise in the demand for cost-effective, flash-optimized storage to support the warm store, as the majority of the structured data volume will reside there in the future;
- speed-optimized "multi-channel" storage to support HANA's massive log generation and the fast restart times demanded by application availability requirements;
- SAP HANA Tailored Datacenter Integration becoming the preferred deployment model for SAP HANA;
- and virtualization seeing increased adoption.
I hope this discussion helps you put both your architecture and your data placement strategies for this new HANA world into perspective.