2014-10-31

ScaleIO - A New Way to configure storage for SAP HANA?

Just a couple of days ago, EMC's ScaleIO became supported by SAP for use in productive environments, both for SAP HANA and traditional SAP landscapes.
SAP's support statement can be found on the following SAP Note: https://service.sap.com/sap/support/notes/800326

For many, this will have little meaning, but for some this will be a major breakthrough.

Let me state from the start that it's my true belief that for large enterprise customers, a SAN-based storage configuration is the best way to deploy SAP systems, and in particular SAP HANA. My posts on this blog explain why I believe that; it's a combination of performance, TCO, flexibility and operations aspects.

But the fact is that mainly in the service provider business, some organizations are looking to "white label" servers with direct attach storage, forming large pools of resources, as a cheaper infrastructure option in this very competitive "public cloud" world.

And in that case, you will want a couple of things:
  • Performance, of course;
  • But also redundancy;
  • Scalability;
  • and ease of management.

In the SAP world, so far, the only way to get that redundancy and scalability was to use IBM's GPFS, a clustered file system providing many redundancy and scalability characteristics.

There are lots of merits and value in GPFS, but I think GPFS is the wrong tool for environments like databases (for example, see my blog post on block vs file here). Also, the feedback I get from customers using GPFS is that it's a nightmare to manage, implies unacceptable downtime to scale, and carries operations costs that are not among the most affordable in the market.

Well, ScaleIO is a technology that allows you to aggregate the direct attach storage of many servers into a single "virtual SAN-like" block storage pool.

And as I've written in an earlier blog post, for database workloads I believe that "block" access to disk is the right option. After all, databases read and write blocks, and the same happens with SAP HANA.

So, what ScaleIO allows is to:
  • get redundancy into direct attach storage by ensuring a copy of each server's data is kept on another server;
  • get awesome performance by distributing the data of a volume assigned to a server across all servers in the pool (all application servers are at the same time storage servers for their peers), which makes the solution perform better the more nodes you have in the cluster (of course, LAN network planning plays an important role here);
  • get scalability, which is natural in ScaleIO's design: you can add servers to the pool and volumes are re-balanced automatically behind the scenes, and the same automatic re-balancing happens when you remove a server;
  • and best of all, many of these operations are done online.
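ScaleIO's actual placement and re-balancing algorithms are proprietary, so the following is only a toy illustration in Python of the general idea described above: a volume is split into chunks, each chunk and its mirror copy land on different servers across the pool, and adding a node triggers a re-balance. All names and numbers here are made up for the example.

```python
# Toy illustration of distributed chunk placement (NOT ScaleIO's real algorithm):
# chunks are spread round-robin across all servers, the mirror copy always goes
# to a different server, and adding a server re-distributes part of the chunks.

def place_chunks(num_chunks, servers):
    """Round-robin primary placement; the mirror goes to the next server."""
    layout = {}
    n = len(servers)
    for chunk in range(num_chunks):
        primary = servers[chunk % n]
        mirror = servers[(chunk + 1) % n]  # copy on a *different* server
        layout[chunk] = (primary, mirror)
    return layout

def rebalance(layout, servers):
    """Recompute placement for the new server list, as after adding a node."""
    return place_chunks(len(layout), servers)

pool = ["node1", "node2", "node3"]
layout = place_chunks(8, pool)
assert all(p != m for p, m in layout.values())  # copies are never co-located

# Add a node: the same chunks now spread over 4 servers instead of 3.
new_layout = rebalance(layout, pool + ["node4"])
moved = sum(1 for c in layout if layout[c] != new_layout[c])
print(f"{moved} of {len(layout)} chunks moved after adding node4")
```

The point of the sketch is only the shape of the behavior: more nodes means more spindles serving each volume, and growth means moving a fraction of the chunks, not rewriting everything.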
More information will come out explaining what ScaleIO is, how it works, and how to configure it for SAP HANA environments. EMC is working on a whitepaper documenting how to configure ScaleIO for SAP HANA, which should not take long to be published.

Meanwhile, check out SAP Note 800326 for the official support statement from SAP for ScaleIO, and if you want to learn more about this technology, its benefits and how it might fit in your datacenter strategy, drop me a message and I'll be happy to help.

One final note: if you are a fan of server-based storage, need to scale out, would like an alternative to GPFS, and would like more choice for this kind of setup, either with IBM servers or other server manufacturers, the best news of all is that NOW you have it!

Stay tuned, as more news will come on this soon.

SAP HANA Network Requirements Whitepaper has been published



Just wanted to bring your attention to the publishing of the “SAP HANA Network Requirements” whitepaper, which can be found at: http://www.saphana.com/docs/DOC-4805

I’ve talked with some of you in the past regarding questions such as:

  • What is the needed throughput for certain LAN segments;
  • What is the maximum latency admitted for synchronous replication;
  • What network segmentation must be implemented for HANA Network Integration;
  • Etc.


All of these and other questions are answered in this whitepaper.


Some aspects of the document particularly caught my attention:

  • The recommendation of a maximum of 1 ms round-trip time in the network connecting two sites when intending to implement synchronous replication;
  • The demand for the HANA internal network (for inter-node communication in a scale-out cluster) to deliver a minimum of 2x9 GBit/s in full duplex;
  • The recommendation to have up to 9 network segments (with the implied demand for server network ports) for performance and security reasons…
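To put the 1 ms round-trip figure in perspective, here is a back-of-the-envelope calculation in Python. With synchronous replication, the commit cannot return until the secondary site has acknowledged the log write, so the network round-trip time is added to every commit. The local log-write latency used below is an illustrative assumption of mine, not a number from the whitepaper.

```python
# Back-of-the-envelope: synchronous replication adds one network round-trip
# (RTT) to every commit, on top of the local log-write latency.

def commit_time_ms(local_write_ms, rtt_ms):
    """Approximate commit latency: local log write + one network round-trip."""
    return local_write_ms + rtt_ms

local_write = 0.5  # illustrative local log-write latency in ms (assumption)
for rtt in (0.3, 1.0, 5.0):
    total = commit_time_ms(local_write, rtt)
    overhead = rtt / total * 100
    print(f"RTT {rtt:4.1f} ms -> commit ~{total:.1f} ms "
          f"({overhead:.0f}% of commit time spent on the network)")
```

Even this crude model makes the reasoning behind the 1 ms limit clear: past that point, the distance between sites, rather than the storage, starts to dominate commit latency.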


Well, if this is a topic of interest to you, be sure to reserve some quiet time to take a close look at this document, as it is indeed very detailed.
There are some aspects where I would argue against it or add something, but overall it is a very good document for everyone who wants to get started on these topics.

Also, remember that I keep an exhaustive list of relevant SAP technical documents to make it easier for you all to find what you are looking for.

Happy reading!

2014-10-04

Does SAP HANA Require NAS storage? No it does NOT!

Although I've written quite a bit about this, many are still confused about whether SAP HANA needs NAS storage.

So, to make it clear, NO IT DOES NOT!

What SAP HANA requires is a shared file system, in the same way the old SAP Netweaver systems had a shared file system called SAPMNT.

The same happens with HANA: it is called /hana/shared and must be accessible by all HANA nodes, just as SAPMNT was in the Netweaver world.

So, using a NAS storage / gateway is a way to achieve this goal, but it is not the only one.

For example, in the same way it was done for Netweaver's SAPMNT, you can use a Linux server with Pacemaker to export an NFS share out of a block device in a highly available way.

The consideration you need to make here is the size of your cluster: if you are going for a scale-out cluster with a large number of nodes, you don't want this NFS share to become a bottleneck and a problem, so ensuring proper network connectivity (latency, throughput and availability) requires more careful planning than just throwing it at any existing Linux machine.
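As a rough sketch of what such a setup could look like, here is a hypothetical Pacemaker configuration (pcs syntax) for a highly available NFS export of the shared file system. All resource names, IP addresses, device paths and client ranges below are illustrative assumptions; adapt them to your distribution, storage layout and network.

```shell
# Hypothetical sketch: HA NFS export for /hana/shared with Pacemaker.
# Assumes a two-node Linux cluster already set up with corosync/pacemaker
# and a shared block device visible to both nodes.

# File system on the shared block device, mounted on the active node only
pcs resource create hana_shared_fs ocf:heartbeat:Filesystem \
    device=/dev/mapper/hana_shared_lv directory=/srv/hana_shared fstype=xfs

# NFS server daemon and the export of the shared directory
pcs resource create nfs_daemon ocf:heartbeat:nfsserver \
    nfs_shared_infodir=/srv/nfsinfo
pcs resource create hana_shared_export ocf:heartbeat:exportfs \
    directory=/srv/hana_shared clientspec="10.0.0.0/24" \
    options=rw,no_root_squash fsid=1

# Virtual IP that the HANA nodes use to mount the share
pcs resource create nfs_vip ocf:heartbeat:IPaddr2 ip=10.0.0.50 cidr_netmask=24

# Keep everything on the same node and start it in order
pcs resource group add hana_shared_group \
    hana_shared_fs nfs_daemon hana_shared_export nfs_vip

# On each HANA node, the mount would then look something like (/etc/fstab):
# 10.0.0.50:/srv/hana_shared  /hana/shared  nfs  rw,hard  0 0
```

The design point is that the file system, the NFS daemon, the export and the virtual IP fail over together as one group, so the HANA nodes always reach /hana/shared through the same address.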

I've written a lot on why you should use block devices for HANA; as a short reference, you can check my earlier posts on that topic.

All this said, using a unified storage system that provides both block and file connectivity may also serve your needs: you use the block connectivity for all the data and log devices, and the NAS functionality for /hana/shared.

Again, this is a possibility, not a need.

Finally, considering that most HANA projects I'm seeing these days are for Suite on HANA, which implies single-server implementations, doing this in a TDI setup might be the simplest and most integrated option: connect two servers to external storage for high availability, and, while installing Linux, add the clustering software package just with the goal of protecting and exporting the /hana/shared NFS share. (You can read more about running SAP Business Suite on HANA in a TDI setup on my blog post here: http://sapinfrastructureintegration.blogspot.com/2014/09/running-sap-business-suite-on-hana-in.html.)

Again, these are all possibilities. Fortunately, as HANA has matured a lot over the last year, you now have these well-known options available for HANA too, giving you choice so that you can standardize your datacenter practices.

As a conclusion, HANA is a lot more open today than it was just a year ago, so don't go for proprietary solutions that, having been the first, are neither the easiest nor the best for most customer cases. Take your time to evaluate the current architectural options, and decide on the most standard possible application architecture and building blocks across your datacenter, including HANA of course!

Having this uniform architecture across the datacenter will drive down your risk, enable more agile changes, and in the end deliver a more streamlined and cost-effective operation.

Hope this helps, and feel free to shoot me any deeper technical questions you may have in this regard.