Is it time for a new data tier?

Posted on June 23, 2013

A common bottleneck in many virtualization implementations is storage. While it is generally quite easy to scale CPU and memory in an environment by adding to existing hosts or by adding additional hosts to the environment, it is often more difficult to make an impactful storage purchase. Calculating how much storage to add is also often not straightforward. Many factors play into the equation used to determine the amount of storage required for both capacity and performance. Add to that that multi-host environments typically use shared storage, with many hosts connecting to a single redundant array, and things like host, cluster, and array controller queue depths, available storage network bandwidth, array controller capacity, timing of workload bursts, and back-end storage system capacity all need to be known when deciding how much storage to add. This is further complicated with tiered storage systems, as you also need to calculate the performance and capacity required for each tier of storage to make sure that you are adding in the right place. Having administered a tiered storage system for the last 5 years, this is the type of math I did every year when doing capacity planning and budgeting, usually with little visibility into what projects were coming down the pipe and how much capacity and performance would be required on a per-server or per-application basis.

The situation many storage administrators find themselves in over time is that while they carefully plan certain purchases for capacity and others for performance, they invariably end up with more capacity than they need - sometimes two or three times more (or even more!) - in order to meet the performance requirements. Purchasing more spindles to add performance when there is already so much extra capacity can be difficult math to explain to management, who see the world in terms of budget dollars while eyeing an "unused" resource. Wouldn't it be nice if there were a better way to scale performance in a manner more in line with the scaling of capacity? In other words, to scale in a more linear fashion.
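To make that math concrete, here is a minimal sketch of the sizing problem. The per-disk figures are illustrative assumptions (roughly a 15K SAS drive of the era), not measurements from any particular array:

```python
# Hedged sketch: sizing spindles for IOPS versus sizing for capacity.
# disk_iops and disk_tb are assumed values for illustration only.
import math

def spindles_needed(required_iops, required_tb, disk_iops=180, disk_tb=0.6):
    """Return (disks needed for performance, disks needed for raw capacity)."""
    for_perf = math.ceil(required_iops / disk_iops)
    for_cap = math.ceil(required_tb / disk_tb)
    return for_perf, for_cap

# Hypothetical workload: 20,000 IOPS against only 10 TB of data.
perf, cap = spindles_needed(20_000, 10)
overshoot = (perf * 0.6) / 10  # capacity purchased vs. capacity needed
print(perf, cap, round(overshoot, 1))  # 112 17 6.7
```

In this example the performance requirement forces 112 disks where capacity alone would need 17, leaving several times the usable space actually required - exactly the "unused" resource management ends up eyeing.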

I have been on the lookout for many years and have seen a few solutions. The most typical one has historically been to add cache to the array controllers. The issues with this were that some vendors did not offer extra cache, the costs were even higher than disk for relatively small increases, and/or the controller software couldn't effectively use the extra memory anyway due to OS limitations (such as 32-bit versus 64-bit memory addressing). More recently, array vendors have begun integrating flash cards (such as Fusion-IO, for example) within the controller to accelerate storage performance. While the storage market has been very active over the past 2-3 years, most of the solutions available involve purchasing new hardware, which I am leery of doing. It's not that I object to buying new hardware - I just don't relish the idea of buying different hardware from a different vendor. With a large investment in storage systems from vendor X, the idea of purchasing from a different vendor and replacing primary storage is not appealing. Imagine sitting in a meeting with storage vendor Y and essentially being told that the solution is to replace your existing tier 1 storage system with theirs - not an easy sell. Now add to this the high costs of new arrays or filers and budgeting issues that make this an either/or decision: I either spend my storage budget on a new system which may or may not address my challenges, or spend it on my current system, which I know and am comfortable with, hating that I have several times the capacity I need in order to deliver performance. Some companies may brush off these types of expenses as no big deal, but my employer certainly would not. The result is storage systems with dozens or even hundreds of disks. So what other options are there?


Recently an increasing number of companies have begun marketing software-based solutions to the problem, or what I call the "bring your own hardware" approach. The solutions I have seen basically break down along the following lines:

  • Your physical server with their storage software and some combination of flash/SSD and HDD at the top of the rack, presenting volumes as block or file. Basically, build your own array.
  • A virtual appliance with their storage software and a combination of flash/SSD and HDD within the SAN, presenting volumes as block or file.
  • A virtual appliance with their storage software and a combination of flash/SSD and HDD within the host, presenting volumes as block or file.
  • A virtual appliance with their storage software using flash/SSD on the host as cache.
  • A virtual appliance with their storage software using RAM as cache.
  • A virtual appliance with their storage software on each host leveraging local flash/SSD and HDD to create a distributed storage system (sometimes called a virtual SAN).
  • A kernel module in the storage path of the hypervisor which leverages local flash/SSD.

The various solutions each have their own pros and cons, and specific solutions take different approaches to address them (if they address them at all). Most perform read acceleration only, as it is the easiest to do, and with the cost of flash drives gradually coming down, most solutions are leveraging local flash. This local tier of storage is often called Tier 0, and it is ironically being marketed as a new tier of storage despite having been there first. What is different, however, is how this tier is used. Instead of being the final resting place for data at rest, it is most often used as a caching layer or transitive tier to deliver increased I/O with lower latency. Other effects of using this tier include freeing up resources on the back-end array, which also reduces latency due to offloading, and even improving back-end write performance, again due to the freeing up of resources.
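The read-acceleration idea behind this tier can be sketched in a few lines: serve hot blocks from fast local media and fall back to the slower shared array on a miss. This is an illustration of the general concept (with LRU eviction and a dict standing in for local flash/RAM), not any vendor's implementation:

```python
# Minimal read-cache sketch for the "tier 0" concept.
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity_blocks, backend_read):
        self.capacity = capacity_blocks
        self.backend_read = backend_read  # slow path: the shared array
        self.cache = OrderedDict()        # fast path: local flash/RAM
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)  # mark as recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.backend_read(block_id)    # incurs array latency
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data

# Usage: repeated reads of a hot working set hit local media,
# offloading the back-end array.
cache = ReadCache(capacity_blocks=2, backend_read=lambda b: f"data-{b}")
for b in [1, 2, 1, 1, 3, 1]:
    cache.read(b)
print(cache.hits, cache.misses)  # 3 3
```

Every hit served locally is a request the array never sees, which is where the offloading and back-end latency benefits described above come from.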


Regardless of the exact approach being used, I think that one that leverages locally installed flash/SSD for transitive data makes sense. I also like the idea of leveraging local RAM for read caching, as server RAM prices have come down enough that it is not unusual to see virtualization hosts configured with 128 GB or more. Call this tier 0 if you wish. Not only does this allow for scaling storage along with compute and memory, it has the potential to dramatically lower latency to the application, which often translates to a better user experience. While read caching is the most mature capability, I am interested in seeing write acceleration as well. Most vendors do not yet offer this capability, and where they do I am very much interested in understanding how they deal with various failure scenarios. Write acceleration in RAM is difficult unless the RAM is battery-backed, synchronously replicated, and/or leveraging flash as a persistent additional tier. Data integrity is king, so anything that puts it at risk is too risky for me. But I will take read caching in the meantime. I do have concerns about solutions that run as resource-consuming VMs that could potentially become resource-starved, require a reboot, hang, or impact other VMs, for example. I am also concerned about solutions that might cause loss of existing functionality or require extensive changes to the environment.
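The failure-scenario concern can be illustrated with a hypothetical write-back design: the cache only acknowledges a write once it also lands in a second failure domain. Here a simple in-memory peer stands in for battery-backed or synchronously replicated NVRAM/flash; this is a sketch of the principle, not a specific product:

```python
# Hedged sketch: why accelerated writes should land in two failure
# domains before being acknowledged. All names here are hypothetical.
class Replica:
    """Stand-in for a peer node's battery-backed/persistent cache."""
    def __init__(self):
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

class ReplicatedWriteCache:
    def __init__(self, replica):
        self.local = {}         # fast local tier (RAM/flash)
        self.replica = replica  # peer that must confirm before we ack
        self.dirty = set()      # blocks not yet destaged to the array

    def write(self, block_id, data):
        self.local[block_id] = data
        self.replica.store(block_id, data)  # synchronous replication
        self.dirty.add(block_id)
        return "ack"  # safe: data now exists in two failure domains

    def flush_to_array(self, array_write):
        for block_id in sorted(self.dirty):
            array_write(block_id, self.local[block_id])  # destage later
        self.dirty.clear()

peer = Replica()
cache = ReplicatedWriteCache(peer)
cache.write(7, "payload")
# Losing the local node now loses no data: the peer still holds block 7.
print(peer.blocks[7])  # payload
```

Drop the synchronous `replica.store()` step and an acknowledged write lives only in one host's RAM until destaging - exactly the data-integrity risk that makes unprotected write-back caching a non-starter.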

What do you think of leveraging resources local to the server? Do you have issues with data anywhere other than on the shared array? What do you think of the solutions currently on the market?

Carefully consider:

  • what challenge(s) you are trying to solve
  • whether improving write performance is a requirement
  • how the various solutions help solve some of those problems
  • what changes will be required in the virtual environment (new port groups, datastores, etc.)
  • whether an agent will be required within the guest
  • whether any functionality will be lost (vMotion, DRS, SDRS, HA, FT, replication, etc.)
  • how backups may be impacted
  • what process changes will be required
  • whether your data will be put at increased risk
  • whether availability will be impacted in any way and whether it is built into the solution
  • how fault tolerant the solution is (a different question from the one above)
  • whether you are willing to make trade-offs elsewhere in order to solve one problem
  • how simple the solution is to implement and manage
  • what the impact of different types of failure will be
  • whether the solution will scale
  • whether costs will scale (licensing model)
  • the support options available (8x5 support for production may not be acceptable unless there are mitigating factors)

With new SSDs on order, I will be doing my own research and testing over the next few months and will write about them as I go. In the meantime, I am interested in learning about the solutions that some of you have chosen to boost storage performance without adding unneeded capacity, and what you think of your solution. What do you like or dislike, and do you see it as a viable, scalable long-term solution, or is it filling a gap for now?


Posted by Peter
