It’s been quite some time since I’ve blogged, but as you all know, changing jobs always requires some adjustment time, and the traction the Virtustream value proposition is getting in the European market has simply made my agenda overflow! The good thing about it is that my learning has only grown, thanks to the massive number of customer engagements I’ve had in just a two-month period.
In these interactions, the question of whether the best option when deploying applications based on SAP HANA is to scale up or scale out keeps popping up far more frequently than I would ever expect. Despite SAP’s recommendations in this regard, and because reality sometimes outpaces the current rules, this becomes a key discussion point when deciding whether or not to move forward with SAP HANA adoption plans.
In this blog post I want to share my most recent learning and experiences on this topic, as well as my reasoning behind the pros and cons of each scenario, while supporting my arguments with the currently available information from SAP.
My goal here is not to provide definitive guidance, but rather to show you my reasoning process and arguments, so that you can have your own discussion in a more informed way, leveraging knowledge of what others are going through.
In the end, each customer scenario is different, and so
there is no “one rule that fits all”.
Setting the Scene
Time goes by, and this topic keeps popping up in many of the customer engagements I participate in. I already wrote about it long ago, but the fact that I keep being faced with absurd situations pushed me to write about it more extensively.
And the question is: “For my specific scenario, should I implement SAP HANA in a scale-up or a scale-out model?”
In my experience, there are multiple angles to this question, and depending on the angle, the answer may be different.
One of the aspects that disturbs me the most is how many “technical architects” I come across ignore the variables in the picture above and come up with solutions that, when we factor in the increasing openness of SAP HANA and the reasonable expectation that the available options will expand over a three-year period, simply do not make any business sense to my mind. After all, IT architecture is not just a “techies” discipline! Being technically grounded, IT architects should be key players in driving business and IT alignment, so making choices and imposing scenarios that completely ignore the business side of IT simply leads to bad solutions.
To get background on SAP HANA scalability, you can read
SAP’s scalability whitepaper at: http://scn.sap.com/docs/DOC-60340
I will not repeat that content here, so if you are new to the HANA scalability topic, I recommend you read that whitepaper before reading this blog post.
While I’ll focus mostly on the technical aspects of this discussion, in the back of my head there are always two things: on one side, the fact that IT’s primary goal is to serve and support business needs; on the other, facts like the dramatic increase in server RAM capacity over just the last two years, with the evolution of Intel CPUs from Westmere to Ivy Bridge and now Haswell, which enables organizations today to dramatically simplify their architectures.
To get started, let’s remember SAP’s current recommendations regarding SAP HANA scale-up vs scale-out:
- Scale-up first as much as you can before considering scale-out;
- Scaling out is only generally available for analytic workloads (BW on HANA or DataMart scenarios);
- For transactional workloads, like SAP Business Suite on HANA (ERP, CRM, SRM, SCM, etc), today only scale-up is generally available.
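To make these three recommendations concrete, here is a minimal sketch of how they translate into a first-pass decision aid. This is purely illustrative: the function name, the workload labels and the 24 TB “largest box” default are my assumptions based on figures discussed further down in this post, not SAP terminology or official limits.

```python
# Minimal sketch of the three recommendations above as a first-pass decision aid.
# Workload labels, function name and defaults are illustrative assumptions, not SAP terminology.

def first_pass_deployment_advice(workload: str, sizing_tb: float, max_single_node_tb: float = 24.0) -> str:
    """Return a rough scale-up / scale-out indication for a HANA sizing.

    workload: "analytic" (BW on HANA, DataMart) or "transactional" (Suite on HANA).
    sizing_tb: required in-memory capacity in TB.
    max_single_node_tb: the biggest single box you are willing or able to buy
    (24 TB, the SGI UV300, was the largest box at the time of writing).
    """
    if sizing_tb <= max_single_node_tb:
        return "Scale up: the dataset fits a single node, so follow SAP's 'scale-up first' rule."
    if workload == "analytic":
        return "Scale out: beyond the largest single node, scale-out is generally available for analytics."
    # Transactional workloads beyond the largest box fall outside general availability:
    return ("Engage SAP: Suite on HANA scale-out exists only under controlled availability "
            "(see SAP Note 1781986), so this needs an explicit entry into the program.")

print(first_pass_deployment_advice("analytic", 40))
print(first_pass_deployment_advice("transactional", 30))
```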
And we have all heard SAP’s vision for the future of the enterprise: a single version of the truth, a single HANA database hosting all business applications, enabling real-time reporting on operational data.
So, why am I coming back with this question, isn’t it clear
already?
Cost and Reality Check disrupt
current rules on SAP HANA scalability
Well, there are two variables that disrupt everything stated above:
- Cost
- Reality
Cost, because customers won’t buy something just because someone tells them they should. I’ve written a lot about that as well and don’t want to repeat myself here, but IT exists to serve business purposes, so the cost of IT cannot be higher than the value gained from the services IT provides; this is the simple reason many organizations live with “good enough” solutions.
How many times, whether as a buyer or a seller in IT, have you seen customers complaining that systems don’t perform, sellers trying to push more gear to solve it, and those projects never getting budget? In many of those situations, as IT stakeholders went up the chain to the CIO level, or even the CFO, LoB or CEO levels, to get funding for those projects, what you heard from them was: “we can live with what we have, and cannot afford more costs there”.
And this brings sellers the challenge of needing to build a TCO business case, to show how the benefits being sold will pay off the investment and provide savings in the long run, which in many cases is not easy.
So, balancing costs against business benefits is a fundamental aspect of any IT adoption decision, which in turn drives a search for the best balance between architecture choices and “good enough” business results, one that enables the lowest possible cost for each specific set of business requirements.
And Reality, because the reality I’m seeing at customers is not compatible with the simple rules stated above as communicated by SAP.
On the “reality check” part, let me give you two examples from real customer cases I’ve been involved in over just the last two months.
In one case, we were talking about an analytics scenario with native HANA applications, so no ABAP there. The initial sizing was 12 TB, with expected growth over three years to 40 TB of in-memory data and 800 TB of disk-based data.
We need to remember that the biggest box available for SAP HANA today is the SGI UV300 server, which can hold up to 24 TB of RAM (with 32 GB DIMMs). It is also important to note that the “disk-based data” in this scenario would sit in SAP HANA Dynamic Tiering, which today only allows single-server implementations of the “Extended Store”, which ends up limiting the capacity of the extended store to about 100 TB.
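To see how far this scenario sits outside those limits, here is a quick back-of-the-envelope check using only the numbers quoted above (a sketch, not an official capacity table):

```python
# Back-of-the-envelope check of the analytics scenario above against the stated limits.
# All numbers come from the scenario described in the text; nothing here is an official SAP limit table.

in_memory_growth_tb = 40        # expected in-memory data after three years
disk_based_tb = 800             # expected disk-based data
largest_single_node_tb = 24     # SGI UV300 with 32 GB DIMMs, as stated above
dynamic_tiering_limit_tb = 100  # approximate single-server Extended Store limit, as stated above

print(f"In-memory need is {in_memory_growth_tb / largest_single_node_tb:.1f}x the largest single box")
print(f"Disk-based need is {disk_based_tb / dynamic_tiering_limit_tb:.1f}x the single-server Extended Store limit")
# Roughly 1.7x and 8.0x respectively -- hence 'scale-out by design' from day one.
```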
So, we are talking about a system that breaks the barriers of “supported”. And I say “supported” because in many cases the limitations are not a matter of technology, but rather of existing experience, knowledge or confidence in such extreme scenarios.
We all know that SAP is an organization that is conservative by nature, and when they state something as “generally available” it means it has been extensively tested, is well documented, and basically any customer should be able to do it with no major surprises. The other side of this is that we face a context of such an accelerated pace of change, where customer scenarios keep pushing the boundaries of “generally available”, and that’s where “controlled availability” or “restricted availability” come in. For these extreme cases, if you limit yourself to the options that are “generally available”, you may just be killing your organization’s adoption plans. To which I would say: it’s better to push the boundaries a bit than to simply give up.
So, on many of these extreme scenarios, scaling-up is not an
option, and you must consider scale-out “by design” and from the start.
The other case was a customer evaluating the migration of an SAP ERP with IS-U to HANA. The sizing pointed to a need for 25 to 35 TB of in-memory capacity at go-live (I’m still working on qualifying this further to understand whether the sizing assumptions were correct, as it would not be the first time I’ve seen some gross mistakes in this area).
So, here as well, we are outside the limits of the currently
possible.
We might discuss whether SGI would be able to load their 32
socket servers with 64GB DIMMs and then scale to 48 TB of in-memory capacity.
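For reference, here is the rough arithmetic behind that 48 TB figure; the 24-DIMM-slots-per-socket count is my own assumption for illustration, not a vendor specification:

```python
# Rough check of the 48 TB figure mentioned above.
# The 24-DIMM-slots-per-socket figure is my assumption for illustration, not a vendor specification.

sockets = 32
dimm_slots_per_socket = 24   # assumed
dimm_size_gb = 64

total_ram_tb = sockets * dimm_slots_per_socket * dimm_size_gb / 1024
print(f"{sockets} sockets x {dimm_slots_per_socket} x {dimm_size_gb} GB DIMMs = {total_ram_tb:.0f} TB")
# -> 48 TB, which would cover the upper end of the 25-35 TB sizing, but only from a single vendor.
```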
Being a fan of SGI technology, and putting myself in the shoes of a customer (which I once was): once you go beyond 16 sockets you are limited to a single vendor, as no one else can provide boxes that big, and if you are starting at these sizes, you can only expect to continuously outgrow the limits of the technology. So I would defend a “multi-vendor limit”: for example, at 16 sockets you still have multiple vendors (like SGI and Bull), which gives you bargaining power since you are not locked in to a single vendor, which in turn makes prices more reasonable.
Defining this limit also gives you “in-box” growth paths: if SAP confirms that you have a memory bottleneck but your CPU utilization is rather low, you may be allowed to grow memory only, breaking the current memory-to-CPU ratio rules (note that this only applies to growth scenarios, not initial sizing – for more info read, for example, SAP Note 1903576 and its attachment).
So, why not start with scale-out by design?
Oh, yes. For the first use case you could, as it’s an analytics use case; but for this one you can’t, as SAP HANA scale-out is not generally available for transactional use cases.
Scaling out transactional workloads
(Suite on HANA) is possible!!!
But did you know that SAP HANA scale-out is released under “controlled availability” for transactional use cases? And that there is an installation guide for “SAP Business Suite powered by SAP HANA Scale-out” attached to SAP Note 1781986? (Check out my list of relevant SAP notes on this topic at: http://sapinfrastructureintegration.blogspot.com/p/important-sap-notes.html)
So, scaling out transactional workloads on HANA is not a
discussion of technical possibilities.
It is possible, there are already a few customers doing it
and there is SAP documentation on it.
Being in controlled availability means the solution is not yet so widespread that anyone can do it without supervision or support. It means that SAP must approve your entry into the program in order for you to have support, which allows them to validate that your scenario and arguments make sense and, at the same time, to commit to supporting you and enabling you to operate the solution.
And scale-out for Suite on HANA has been in controlled availability for a long time, as you can see from the “very old” SAP slide above (anything older than a year in the HANA world is already too old! :) ).
Some of the proposed solutions
I’ve seen make no sense…
Before going further into my perspective on scale-out, let me tell you the “indication” that was given to both of these customers, with which I struggle to agree.
In the analytics use case, the customer was told to break their “analysis groups” into independent “universes”, load each group into a different HANA system, and thus have multiple smaller scale-up single nodes instead of a single large scale-out cluster.
In the transactional use case, a similar solution was suggested: break your business units into groups and put each one in a separate ERP, each with a smaller single scale-up HANA system.
Heck, what the hell?!?!?! So, what about all that talk of the single system, the single source of truth, that SAP has been promoting over the last couple of years?!?!?!?! Am I missing something here?
In both these scenarios, the implication is a proliferation of smaller systems and the need to build an “integration system”. In the transactional use case, for example, we go back to the scenario of one ERP for each company plus a consolidation ERP for the holding. Wasn’t this one of the things SAP was promising to end with the adoption of SAP HANA?
Which one is better or worse? To integrate the data at the
database level through a scale-out architecture, or to integrate it at a
functional level by creating interfaces across the multiple instances?
I see it from the following perspective: once data exceeds the capacity of a single server, you’ll need to distribute it across multiple servers anyway. If you divide the universe of data across disparate independent systems, you’ll then need to take care of the integration at the functional level. If you are a company managing a portfolio of businesses with dynamic acquisition and divestiture activity, you will surely already have a consolidation system and the functional knowledge to manage integration at that level, so I would understand breaking the massive single instance into more, smaller, independent systems.
But if you really need the integrated view across all those universes of data, I would say it is easier to manage a scale-out cluster (whether for analytic or transactional workloads) than to break the data across smaller single systems and keep living with the “data latency” problem caused by ETL processes and delayed reporting – the very problem Hasso Plattner talked about so much as something that having all data in HANA would solve.
In the end, when dealing with very large systems, complexity will exist. So, as a CIO or senior leader in an IT organization, you need to evaluate where you'd rather have the complexity:
- At the functional level, having your development and functional teams deal with the integration of data from multiple systems, with interface development, aggregation systems, etc. – well, all the things you've had over the last two decades on SAP;
- At the technical level, having your infrastructure architects and HANA DBAs deal with it, through expensive-SQL-statement analysis and infrastructure optimization.
The questions to weigh are then:
- If you integrate at the functional level, whether you can live with the "reporting latency" introduced by the need to move data around and aggregate it, and whether you trust the capacity of your functional and development teams to manage interfaces and data integration;
- If you integrate at the technical level, whether you trust your DBA and infrastructure teams' ability to properly design, architect, build and operate such an environment.
But let me continue this blog post ignoring the fact that I work for a cloud provider specialized in SAP HANA, and carry on the reasoning as if I were involved in decisions about on-premise implementations.
Will the HANA reality be of many
small systems??? In some cases, yes.
Playing a bit of devil’s advocate here, right? ;-)
But we need to make up our minds: either we believe in the benefits of the “single pond” where we build a single source of truth, or we accept that there are still reasons for customers to have multiple systems, as happened in the past. So, to my mind, the message in the PowerPoints is not matching the reality I’m seeing in the field.
As a sort of disclosure: I believe there are plenty of reasons for organizations to maintain separate databases for each of their business systems, and I’ve written about how VMware-based virtualization is a great match here, as it brings massive operational simplification and efficiencies. The example I gave in a previous blog post was a corporation that manages a portfolio of businesses and keeps buying and selling companies, where keeping separate, independent ERP systems for each business is crucial to simplify the divestiture when the time comes. We all know that splitting the data within a single SAP ERP is the key pain point in separating out a business group (I’ve been in such projects more than once, and they were not fast, cheap or simple at all).
But I also understand that there are special cases where having a single database for the whole business brings massive benefits, and we need to tackle both of these scenarios with an open mind and no bias toward one or the other, in order to truly make the best choice for each customer organization. For example, companies with vertical or horizontal integration, where the different businesses are interrelated and can potentially cross-sell to joint customers, will certainly benefit from an integrated, real-time, global view across businesses, which points toward having a single global instance for all business units.
So, you may see me at one customer, depending on their particular scenario, advising them to keep all their smaller systems and run them all on VMware, while at another with a different business scenario you may see me advising a single global instance with all data in a single database.
In the end, the “best solution” really depends on each company’s business scenario.
Data Temperatures with Dynamic Tiering
and SAP HANA Vora
I have to say that I’m ignoring the whole topic of data temperatures here. It would take us in a completely different direction, so I’ll choose to leave its implications out, although I must say it is a very relevant angle to the problem I’m describing.
I have to note, though, that the split between current and historical data on SAP ERP on HANA is still not a reality, and that the “data temperature” discussion for transactional systems today is still much more limited in options than for analytic use cases like SAP BW on HANA.
Key barrier to scaling-out transactional workloads on HANA is operations
experience
But enough with the problem analysis and considerations! Let me share my perspective on scaling out transactional workloads on SAP HANA.
I believe the key reason to avoid scaling out for transactional scenarios is the lack of knowledge in the market on how to implement and manage these systems.
The technology is there, it is available, and it works.
But if the people implementing and operating these systems don’t understand the specifics of scaling out transactional workloads, you may easily end up with an application that, because of the performance impact of a bad scale-out architecture, implementation and operation, ends up lagging far behind expectations performance-wise.
Note that I haven’t said “won’t work”. I said “lagging behind expectations”.
And I’m making this note because I believe it’s all about
expectation management.
One customer example of proper
expectations management with SAP HANA adoption
Let me take the example in the picture below to make the case.
In this example a customer was moving BPC from Oracle to
HANA. BTW, this customer was EMC Corporation (read the full story at: http://www.emc.com/collateral/analyst-reports/emc-it-breaks-grounds-sap-hana.pdf ). I was an EMC employee when this happened, and was privileged to be in close contact with the EMC IT SAP Architecture and Basis guys driving this when the project went live.
Once they moved BPC “as is” to HANA, one of their critical financial processes immediately went from 53 minutes to 7 minutes. Then, after they applied EHP1 for BPC, which brings the HANA-optimized code for that process, it dropped to 17 seconds. EMC IT also tested the impact of virtualization, running the process both on physical and on virtual servers, and verified that running it on physical would take about 1.5 seconds less.
So, when talking with the business users, their expectation was simply to be able to run that process faster. When they saw 7 minutes they were very happy. When they saw 17 seconds, they were exhilarated! And when asked whether 1.5 seconds would make any difference to them, and confronted with the difference in cost between the two scenarios, virtualization being so much cheaper became a no-brainer option.
But when talking with the technical teams, their concern was all about how big the difference was between running the process on physical vs virtual, which for the business is simply an irrelevant discussion.
I’m telling this story because this happened in November
2013, way before SAP supported SAP HANA in production on VMware. And I was
privileged to observe closely this story as it developed.
I wrote a lot about the benefits of virtualization then, and about how fundamental proper expectations management was. I believe we are facing a similar “expectations management” discussion with regard to scaling out transactional workloads on SAP HANA.
Managing SoH scale-out is not
more complex than managing Oracle DBs
From my perspective, managing a SAP HANA scale-out cluster is in no way more complex than managing an Oracle database.
It’s just a matter of understanding the conditions and
tools.
Starting with the Oracle example: I remember a colleague who came from a Microsoft background managing SQL Server and started on a project where all systems were Oracle.
To make him understand that he needed to do expensive-SQL-statement analysis, that he needed to work with development teams to build new indexes, to determine whether a table or index would benefit from a dedicated tablespace, to define whether a certain tablespace would benefit from a dedicated LUN, when it was time to do a table reorganization, or how to create the rollback segments so that processes wouldn’t break… you should have seen his face. He was so lost and frustrated!!!
Now imagine that he was the SAP Basis admin at an organization that had just implemented SAP ERP, that the system was growing very fast, and that he only had his SQL Server background. The result would have been a disaster, because he didn’t understand the need for all these optimization processes (for example, with SAP ERP on SQL Server you don’t determine table placement at the disk level; SQL Server automatically stripes all data across all available data files), nor did he know where to look for problems – and even if he had known where to look, he wouldn’t have known how to interpret the numbers or what actions to take to solve them. Having attended the SAP ADM315, ADM505 and ADM506 training courses would have helped him a lot.
From my perspective, the discussion on scaling out transactional workloads on SAP HANA is of a similar nature to the story I just told about the SQL Server DBA landing in an Oracle landscape.
Similar in nature, but much simpler! I believe managing SAP HANA in a scale-out cluster is nowhere near as complex as managing an Oracle 8 database was!!! Remember the “PCTFREE” and “PCTUSED” parameters on Oracle??? Those were tough days compared with SAP HANA system administration. When I remember the project I worked on in 1998, an R/3 system with a 500 GB database and 1,200 concurrent users… getting the system running smoothly was no easy job! (Those doing “Basis” back then know how huge such a system was at that time...)
Understanding latency
implications on SAP HANA performance
So, let’s dive into the SAP HANA scale-out challenges for transactional workloads. First of all, it is largely an “expectations management” problem, and it comes down to the effect of network latency on the processing of data in the cluster.
When you look at the picture above, and consider that SAP HANA is optimized to process data in RAM, you see that the latency for data transfers between RAM and the CPU on the same NUMA node is about 60 nanoseconds, while if you transmit data across the network between two servers you will always be looking at latencies in the microsecond-to-millisecond range.
This means that moving data between nodes across the network is a very “expensive” operation for SAP HANA, performance-wise, when compared with processing data in the RAM of the same server. This matters because SAP HANA has a shared-nothing cluster architecture, which means each piece of data exists on only one cluster node (I’ve also written about this a long time ago, so please read that post for some history and context), and in a scale-out cluster the data is striped across multiple nodes.
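To give a feel for the orders of magnitude involved, here is a toy comparison. The 60 ns figure is the one quoted above; the network numbers are generic ballpark assumptions for different kinds of interconnect, not measurements from any specific HANA landscape:

```python
# Toy comparison of local RAM access versus a network round trip between cluster nodes.
# The RAM latency is the figure quoted above; the network figures are generic ballpark
# assumptions, not measurements of any specific HANA setup.

local_ram_ns = 60                                   # RAM-to-CPU on the same NUMA node
network_roundtrip_ns = {
    "low-latency interconnect (~10 us)": 10_000,
    "typical Ethernet hop (~100 us)": 100_000,
    "congested or routed path (~1 ms)": 1_000_000,
}

for label, ns in network_roundtrip_ns.items():
    print(f"{label}: ~{ns / local_ram_ns:,.0f}x slower than local RAM access")
# Even the best case is orders of magnitude more expensive, which is why table
# distribution matters so much for transactional workloads on a scale-out cluster.
```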
The challenge in a transactional system is that you have a lot of joins between tables across many different business areas, and you also have “business objects” like invoices, materials and others that are stored in many different tables, so you face a very high probability that a large number of processes will require cross-node communication.
The consequence is that when you report against data held in tables spread across different server nodes, you need to move data over the network to gather it all together and calculate the results. And that “costs” a lot more time than if all the tables were in the RAM of a single server. Does this mean those operations will be slower than on “anyDB”? Well, the reality may be similar to the scenario I described when EMC decided to virtualize BPC on HANA.
The technical guys may be very concerned with how many milliseconds more it will take on scale-out vs scale-up, while the business guys will be thrilled with the improvements in business visibility and the reduced lead time to consolidated information, and they might consider reports taking a bit longer to generate a small price to pay, especially in the context of their starting scenario of a legacy ERP on AnyDB.
The same problem happens when you write data: if you write an object that is stored in multiple tables, and those tables are distributed across different server nodes, the write has to happen on each of those nodes, and the commit is only given once all nodes confirm that the data has been written, so writing may become slower.
Hence my point that this is no different from managing an Oracle database.
As in Oracle (in fact, as in any database), once you start dealing with very large systems you need to look at parameters and variables that you simply ignore in smaller scenarios, where the standard/default configurations are good enough.
When faced with bigger systems, once you learn where to look
and how to manage this additional level of detail, then it becomes routine and
a no-brainer.
The same must happen with HANA. But the knowledge in the
market is just not there yet.
SAP should develop and make available a course similar to the old “ADM315” (or any course for SAP Basis people focused on performance analysis and optimization), but focused on HANA, covering the monitoring tools available today to evaluate performance in a scale-out cluster, as well as the tools and mechanisms to address potential performance problems.
There is already some training in this area, but it is mostly focused on developers, so it would be useful to have something really aimed at SAP Basis people.
For example, today it is possible to mirror (replicate) tables across HANA nodes. Why is this relevant? Imagine you have a very large table partitioned across multiple nodes, on which you run a report that implies a join with a configuration table that is just 100 rows long. By mirroring that table across nodes, you avoid cross-node joins; all joins are done locally on each node, so you truly benefit from the massively parallel scale-out architecture of SAP HANA.
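Here is a toy model of why this pays off, with invented row counts purely for illustration (the point is the asymmetry, not the exact numbers):

```python
# Toy illustration of why replicating a small configuration table pays off.
# Row counts and sizes are invented for illustration; they are not from any real sizing.

fact_rows_per_node = 50_000_000      # slice of the large partitioned table held on each node
config_rows = 100                    # tiny configuration table
avg_row_bytes = 200
nodes = 4

# Without replication: every node that does not own the config table must pull it
# (or ship intermediate results) over the network for each join.
without_replication_bytes = (nodes - 1) * config_rows * avg_row_bytes

# With replication: the config table lives on every node, so each join against the
# local 50M-row slice is node-local and only small aggregated results travel afterwards.
with_replication_bytes = 0

print(f"Cross-node traffic just for the config table, per query: "
      f"{without_replication_bytes} bytes vs {with_replication_bytes} bytes")
# The real win is not the few bytes of this tiny table, but removing a synchronous
# cross-node dependency from every local join on each node's fact-table slice.
```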
Also, the tools in HANA (like the “Plan Visualizer”) for analyzing where time is spent when a specific statement is executed are mind-blowing compared with what I knew in NetWeaver, and they provide a level of detail that enables you to clearly identify whether a delay is caused by the network, by disk or by any other reason.
It is therefore my belief that the key barrier to further scale-out adoption is lack of knowledge, with all the risk implications that carries for business-critical applications.
And from my perspective, although I accept that scaling up is the short-term solution to avoid that risk, considering the increasing number of customer cases I’m seeing with very large datasets needing massive amounts of memory, and considering that most of the data-temperature options are not yet ready for transactional workloads (Dynamic Tiering, NLS and Vora do not solve this problem today), documenting and transmitting knowledge on SAP HANA scale-out performance analysis and optimization would be the way to go for these customers (at least for the customer examples I mentioned here).
Ok, so when to scale up and
when to scale out?
I couldn’t finish this blog without adding some comments on the variables I identified at the beginning – that is, when it is a good option to scale up and when to scale out.
So, let's look at what many say about scale-out:
- In some cases (with certain server vendors), from a CAPEX perspective, 2 x 4-socket boxes are a lot less expensive than 1 x 8-socket box (as an example);
- If you have to provide HA for, say, an 18 TB system, with scale-out you can deploy 6+6+6+(6 for HA) instead of 2 x 24 TB with scale-up (see the quick arithmetic sketch after this list);
- If you have a very dynamic change environment, where you buy and sell companies, launch and divest business lines, having a standard building block in the datacenter enables you to easily reallocate these smaller boxes, while the big box will always be there, as you can’t even slice it with VMware.
- Looking at it from a service provider perspective (and the same goes for very large customers with 50+ productive SAP systems), scale-out gives a lower cost of vacancy, as it’s easier to play around with multiple smaller boxes.
- You could argue that scale-out drives higher OPEX:
  - More OS images to manage and patch: if you implement automation tools, doing one or many should be the same, and these 50+ system customers will always be managing a ton of OS images anyway, so a handful of OS images more or less makes little difference…
  - The need for continuous table distribution analysis and redistribution: in the Oracle world everyone regards it as a normal activity to look at table partitioning optimization in BW, do “hot-spot analysis”, and reorganize tables across tablespaces or put tables in dedicated tablespaces. Table redistribution on HANA looks exactly like that. If you make it part of your core knowledge, it becomes a no-brainer.
- Future S/4 will be based on multi-tenant database containers. So we consolidate multiple applications on the same SAP HANA system only to break it up again into multiple tenants? Why would we then put them all in a single box? Rather split them across scale-out nodes. The only caveat here is the network connection between the nodes: if the tenants have a lot of cross-tenant communication, it would definitely be faster in a single scale-up node.
- Another topic gaining traction is “data aging” or “data tiering”, meaning that in the future you will most likely have a smaller and smaller in-memory data footprint. So you would be buying a ton of iron to solve a problem whose impact will shrink over the next years. I would rather scale out and then repurpose. Add to this the fact that many of the large scale-out sizing exercises already account for three years of growth… so who knows what will happen in three years.
- Imagine as well that you have a fully virtualized datacenter and need a 9 TB system. Why not have a 3 x 3 TB scale-out system on VMware? I know this is not available today, but I believe we’ll get there at some point. Or that you have a 4 TB system that has outgrown the vSphere 6 maximums? Do you move from virtual to physical and buy a huge box, or just scale out?
- Imagine customers with massive growth rates on their SAP systems (I know one with a 90 TB ERP…). Continuing to scale up will, at some point, lead you to a wall. Scale-out keeps your scaling options open while enabling you to invest as you grow.
- Also, from a bargaining perspective, if you buy many smaller servers your bargaining power as a customer is higher, and you avoid vendor lock-in…
All of these are examples of arguments about scale-out I’ve heard, and I have to say, some of them are very strong.
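And here is the quick arithmetic behind the HA bullet above (the 18 TB example), counting only raw deployed RAM and ignoring licensing and application servers; a sketch based on that bullet’s own numbers:

```python
# Quick arithmetic behind the 18 TB HA bullet above. Only raw deployed RAM is counted;
# licensing, application servers and so on are deliberately ignored.

required_tb = 18

# Scale-out: three 6 TB worker nodes plus one 6 TB standby node (n+1 with host auto-failover).
scale_out_deployed_tb = 6 + 6 + 6 + 6          # 24 TB deployed for 18 TB of data

# Scale-up: one 24 TB box plus an identical standby box (e.g. for system replication).
scale_up_deployed_tb = 24 * 2                  # 48 TB deployed for the same 18 TB of data

print(f"Scale-out deploys {scale_out_deployed_tb} TB, scale-up deploys {scale_up_deployed_tb} TB "
      f"for {required_tb} TB of data")
print(f"Stand-by overhead: {scale_out_deployed_tb - required_tb} TB vs {scale_up_deployed_tb - required_tb} TB")
```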
But let’s look at scaling-up:
- A strong argument is definitely consolidating multiple AnyDB systems on a single SAP HANA system. But again, this will be based on MDC, so does it really make sense to put all of these in a single box?
- Another strong argument is that there are always developers on the development teams who produce bad code that isn’t properly checked for quality, especially in “run mode”, doing maintenance on existing systems under pressure from the business to do it faster and cheaper. In a single scale-up system this problem is not as evident as in a scale-out one.
- You may also say that it’s easier to buy more iron than to upgrade an SAP system. Everything around MDC and data tiering will still take some years to become real and mature, and you need to make a decision today, so a scale-up system would be a good fit.
- Also, if your system is expected to be rather stable and you do not foresee a very dynamic change environment affecting your SAP systems, you may argue that in the long run a scale-up system may bring OPEX savings that compensate for the larger CAPEX.
You would say that the arguments above are highly biased towards
scale-out, right?
There are 2 recent experiences I've had, which I believe add another perspective to this discussion:
- There are server vendors with modular architectures (for example Bull and SGI) that enable you to scale up as you go and rearrange the building blocks as you need.
  - Bull servers, for example, are composed of 2-socket modules that are aggregated together, so a 16-socket server is the aggregation of 8 x 2-socket modules.
  - SGI servers grow in 4-socket modules, so a 16-socket server is the aggregation of 4 x 4-socket modules.
  - With both, you can add modules to grow, or reuse the modules by breaking down the server if you no longer need such a large box.
- I’ve also learned that, if you negotiate properly, the cost of these servers can be equivalent (meaning not massively more expensive, and in some cases maybe even cheaper!!!) to similar capacity from other vendors in smaller servers.
- Then you have service providers like Virtustream that can reallocate compute capacity between tenants as you grow, and so provide a risk-free evolution path for these very large scenarios. Meaning you don’t need to figure all this out by yourself: Virtustream will do this analysis with you and provide the right solution in a purely OPEX model, while taking care of SAP HANA systems administration for you, eliminating all this complexity and risk. This enables you to choose between scale-up and scale-out based on your business requirements, not on the architecture and systems-administration constraints they would entail.
Conclusions: final questions and… what about Cloud?
So, today I would say that my personal preference is to scale up for all sizes up to 16 sockets (as there are multiple server vendor alternatives in the market), and to scale out beyond that.
Why do I say this? Well, it’s not as simple as stated above, as I would need to factor in many other considerations.
Questions I would ask to provide better advice to customers include:
- What is your growth forecast for the next three years? After three years the technology is obsolete anyway, and since SAP HANA runs on x86 servers you just need to do a homogeneous system copy, which in a TDI scenario – with external storage – may mean simply attaching the storage to a new server, with minimal downtime and risk.
- Do you have any extreme performance requirements? The number of transactions per minute may indicate how much of a performance problem a scale-out setup could become. There are a lot of customers with a lot of data but not an extreme volume of transactions, which means SAP will allow them to scale beyond the current CPU/memory ratios and increase only memory – provided it is confirmed that CPU utilization is really low.
- Do you need high availability, and what are your SLAs? In a scale-out scenario you need less stand-by capacity than in a scale-up scenario. So, if the failover time of a node with SAP HANA Host Auto-Failover in a scale-out scenario with external storage is acceptable to the customer, it may allow them to save some money. This may change once SAP enables SAP HANA System Replication with the standby node active for reporting, but we can’t keep postponing decisions based on futurology, and that option is not available today.
- How resilient do you need your data to be? This comes down to the RPO in a disaster scenario. Many organizations put a DR setup in place simply because they cannot afford to operate without their SAP systems. But the facts show that, with today’s datacenter standards, it is almost impossible for a Tier 4 datacenter to go out of service. This leads many customers to take a disaster-avoidance approach and build high availability across different buildings of a Tier 4 datacenter provider instead of having asynchronous replication to a remote location. I have to say this varies a lot by region and by industry: in regions more exposed to extreme natural disasters, remote data replication is more likely to be required, while in western Europe it is increasingly common for organizations to assume a disaster-avoidance scenario and just do synchronous replication across metro distances. This has implications for the mechanisms to put in place and the associated costs.
- And as a final aspect, I would look at what operations automation has to be put in place: for example, some customers are making it standard practice to do more frequent data refreshes of their QA systems. Doing it on a weekly basis, for instance, implies a high degree of automation and has implications for the infrastructure architecture. It is a lot simpler and faster to automate this for a single scale-up node than for a scale-out cluster.
And the conclusion from this journey through my thoughts and learnings on this topic is obvious: the right choice depends!
As always, I hope this helps you make up your mind about what is right for your organization, and makes you aware of the possibilities.
Anyway, remember that today you can avoid all this complexity and risk by putting all your SAP systems in an “Enterprise Class Cloud”, leveraging the unique offerings of companies like Virtustream, which have pushed security and compliance to a level that truly makes it safe for organizations to run their mission-critical systems there. That also opens up the possibility of working with architects like me to evaluate your business scenario and assist you in making the right decisions for your business.
As a final word, I have to say I feel truly privileged and overwhelmed by the amount of talent and innovation at Virtustream, and by how it is leading the emergence of a new “Cloud Generation”, which I would call “Enterprise Class Cloud” or “Cloud 2.0”!
Stay tuned, as I’ll be writing very soon on what I’ve
learned so far about Virtustream, what sets its Cloud offer apart, and in what
ways it breaks barriers to adoption of SAP HANA, and overcomes long-time
concerns of organizations when faced with the scenario of moving enterprise
mission critical applications like SAP Business Suite to the cloud.