Rethinking Storage to Avoid Drowning in Data

By Michael Vella

When bailing out a boat, few would stop to consider the long-term merits of the bucket in hand. In a similar sense, it's hard to fault carriers, who for years have been facing an ever-rising tide of data to store, for valuing expediency over long-term strategy when it comes to storage.

This seems especially true considering that in recent years data storage needs - often the most expensive part of a data center's operations - have been growing rapidly while IT budgets have remained relatively flat. Thus carriers are facing a choice: Either take money from other projects, or learn to store data more efficiently.

Fortunately, constantly improving disk densities and the attendant proliferation of cheaper, network-attached storage (NAS) options have enabled CIOs to expand their storage capacity without breaking the bank. However, at some point, this "throw-more-hardware-at-it" approach begins to yield diminishing returns, and many carriers, especially large ones, are reconsidering their strategy for storage.

"A few years ago, people were wildly adding storage, and we were no exception to that," says Anthony Abbattista, VP, technology solutions, of Northbrook, Ill.-based Allstate Insurance Co. "Hardware has become commoditized. The question is: Do you have the software environment and the operations environment to take advantage of it?"

An interesting aspect of storage is that carriers are seemingly being pulled in two directions. One imperative is to get more from existing investments by consolidating data, while a competing imperative is to warehouse and widely disseminate data for use in analytics. Chad Hersh, a principal at New York-based Novarica, says the underlying cause for this friction between efficiency and functionality is largely architectural.

"Existing architectures are well designed to store large amounts of data, but not designed to do a lot with that data," he says, noting that the size is but one concern. To be sure, carriers now collect a mishmash of data types, from simple text, to audio recordings with customer service representatives, to pictures and, increasingly, video from claims.

THE HOLISTIC APPROACH

At Allstate, efforts to rethink storage coincided with a larger architectural initiative to combine the company's data centers over five years. The insurer has consolidated from 13 mainframe data warehouses to just two, including one housed within a new, highly energy-efficient data center in Rochelle, Ill.

According to Abbattista, this effort to minimize the physical footprint allowed the company to take a "clean slate" approach to storage. "The [old] warehouses followed the structure of the company, not the data."

He says going to a consolidated footprint also made the company more efficient with energy. "The data center migration helped us look not only at storage, but also at processors and virtualization," he says. "It's about better use of resources - it's really easy to just put racks on floors."

This data-centric approach has paid off handsomely, as the company now has 10 times the storage capacity (five to six petabytes) that it had just four years ago, while storage expenses have declined 12%. Abbattista says a multi-pronged approach was necessary in order to store more for less.

One obvious avenue was to take advantage of market conditions by using lots of cheap drives. "Having ever-growing storage needs, we started to go to network attached devices, and that got us part of the way," Abbattista says, but adds that speed and reliability issues around NAS limit its utility to less critical data.

Indeed, Abbattista says the company became aggressive in its efforts to get data into correct tiers of storage in order to reduce total storage cost. Insurers tier data based on the levels of protection or performance required, frequency of use and compliance requirements. For example, mission-critical or frequently accessed data might be assigned to expensive RAID (redundant arrays of independent disks) arrays in tier 1, while other data is consigned to lower tiers and less-expensive storage options.
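
To make the idea concrete, here is a minimal, purely illustrative sketch of how such a tiering policy might be expressed in code; the tier names, thresholds and data-set attributes are hypothetical and are not drawn from Allstate's actual rules.

```python
from dataclasses import dataclass

# Hypothetical attributes a tiering policy might weigh.
@dataclass
class DataSet:
    name: str
    mission_critical: bool      # e.g., policy admin or claims systems
    accesses_per_day: int       # frequency of use
    retention_required: bool    # compliance/retention mandate

def assign_tier(ds: DataSet) -> str:
    """Map a data set to a storage tier (illustrative thresholds only)."""
    if ds.mission_critical or ds.accesses_per_day > 1000:
        return "tier-1"   # high-end RAID arrays / premium SAN
    if ds.accesses_per_day > 10 or ds.retention_required:
        return "tier-2"   # lower-cost SAN
    return "tier-3"       # NAS or archive for rarely touched data

print(assign_tier(DataSet("claims-images", False, 5, True)))  # prints "tier-2"
```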

By analyzing workflow and utilizing different tiers of storage, Allstate was able to trim costs. "It forced us to think about business terms and how we were going to access information," he says. "We realized we were using tier-1 SANs (storage area networks) for some things we should not have been using it for."

As a result, Allstate created a new, second-tier SAN, and brought in an additional vendor. Abbattista says the company purposely uses a different vendor in each tier. While he acknowledges this arrangement adds complexity, he believes it provides for better competition. "The tier-2 SAN is good enough for about 90% of SAN production needs," he says.

THE NEW CRAZE

Allstate recently began to use deduplication, which employs sophisticated algorithms to weed out redundancies in data and compress it to a fraction of its former size. Major storage vendors IBM, EMC and NetApp offer deduplication within their product lines, but take different approaches. By using deduplication appliances for backup purposes, carriers can avoid making multiple copies of the same file. "We've found that a lot of backup strategies, no matter how good they are, tend to multiply the base data," Abbattista says.
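
For readers who want a concrete picture of the underlying idea, the toy sketch below shows block-level deduplication by content fingerprinting. It is an illustration under simplifying assumptions, not any vendor's actual implementation; the class name and block size are invented for the example.

```python
import hashlib

class DedupStore:
    """Toy block-level deduplication store (illustrative only)."""
    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}      # fingerprint -> actual block bytes
        self.files = {}       # file name -> list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()   # fingerprint the chunk
            self.blocks.setdefault(fp, block)        # store each unique block once
            refs.append(fp)                          # duplicates become references
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.files[name])

store = DedupStore()
store.write("backup-mon", b"policy data " * 1000)
store.write("backup-tue", b"policy data " * 1000)   # identical backup content
assert store.read("backup-tue") == b"policy data " * 1000
print(len(store.blocks))   # 3 unique blocks stored instead of the 6 written
```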

One indication of the growing importance of deduplication is the scrum over the acquisition of Data Domain, a company that specializes in deduplication for disk storage and archiving. Both EMC and NetApp made bids for Data Domain, and the issue remained unresolved at press time. Abbattista admires Data Domain's approach to the technology because it performs its work in system memory and thus saves I/O operations.

Larry Freeman, senior marketing manager, storage efficiency solutions at NetApp, says deduplication does enable the data center to do more with less, noting that a large UK insurer was able to reduce 100 terabytes of actuarial data by 60% using the technology.

Despite impressive results such as these, Freeman is quick to caution that dedupe is but a milestone on the road to storage efficiency. "You really have to pick your spots and deploy it strategically," he says.

Some data sets have a higher percentage of duplicates than others. Freeman says user data and office files tend to have a pretty high percentage of duplicate data, while databases usually don't. NetApp also provides customers with a diagnostic tool that lets them know in advance how much space they can save by deduplicating a given set of data, Freeman says.

However, this is far from the only consideration. There are performance implications to dedupe, as it always involves creating a metadata database that describes the data being backed up. Insurers may well be wary of adding another process that can slow down sensitive applications. "You can't dedupe everything because you can create more overhead where it's not necessary," Freeman says.

Abbattista acknowledges that the thought of adding another structure - the proverbial "one more thing that could go wrong" - gave him pause when considering deduplication. "It took a lot of convincing for me to get excited about it," he says. "One thing I always think about is 'can I get the data back?'"

Freeman says vendors have endeavored mightily to address this last concern. Data fingerprints are validated and false positives are weeded out before deduplication occurs. Moreover, Freeman stresses that by working at the meta level, dedupe doesn't change the actual data, but instead changes the pointers that point to the data. This is no small issue in the heavily regulated insurance industry, as well as for any public company in the Sarbanes-Oxley era.
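
A hedged sketch of that safeguard, again invented for illustration: before a block is treated as a duplicate, the fingerprint match is confirmed with a byte-for-byte comparison, so a hash collision (a false positive) never alters the stored data; only the reference is reused.

```python
import hashlib

def dedupe_block(block: bytes, store: dict) -> str:
    """Store one block, validating fingerprint matches byte-for-byte (toy example)."""
    fp = hashlib.sha256(block).hexdigest()       # the block's fingerprint
    existing = store.get(fp)
    if existing is None:
        store[fp] = block                        # first occurrence: keep the data
    elif existing != block:
        # False positive (hash collision): do NOT collapse the blocks.
        # A real system would keep both copies under distinct keys.
        raise ValueError("fingerprint collision detected; data left unchanged")
    # True duplicate: the data itself is untouched, only the reference is reused.
    return fp

blocks: dict = {}
ref1 = dedupe_block(b"claim photo bytes", blocks)
ref2 = dedupe_block(b"claim photo bytes", blocks)
assert ref1 == ref2 and len(blocks) == 1         # one physical copy, two references
```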

"We did a lot of work developing dedupe to make sure we were not changing the basic structure of the data," he says.

THIN IS IN

If dedupe is the storage technology du jour, thin provisioning, which assigns storage on an as-needed basis, is the methodology of the moment. Indeed, many posit thin provisioning as the best way to reduce the physical amount of storage needed for a given application.

In data centers, it is common practice to over-provision, or buy more storage than is truly needed, to account for growth. For example, a database administrator may receive a terabyte of space for a particular database and never use half of it. In fact, studies indicate average utilization of enterprise storage systems is 20% to 30%.

Thin provisioning creates a pool of storage that applications and databases can draw from. This is done in the background, and fools applications - and users - into thinking that they have more storage than they do.

By thin-provisioning, data centers can add or subtract capacity to fit the needs of applications and the business. "Everybody's guessing," Freeman says. "Nobody really knows how much storage an application will need for a year, five years or ten years. Thin provisioning takes the whole guessing game out of the equation and lets the storage system configure based on how much they are actually writing."
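
The following toy sketch, with made-up class names and numbers, illustrates the mechanism Freeman describes: volumes report a large logical size, but physical blocks are drawn from a shared pool only when data is actually written, and blocks freed later (for example, by dedupe) can be released back to that pool.

```python
class ThinPool:
    """Toy thin-provisioning pool: physical blocks are allocated only on write."""
    def __init__(self, physical_blocks: int):
        self.free = physical_blocks           # blocks actually purchased

    def allocate(self, n: int) -> None:
        if n > self.free:
            raise RuntimeError("pool exhausted: time to add physical disk")
        self.free -= n

    def release(self, n: int) -> None:       # e.g., blocks freed by deduplication
        self.free += n

class ThinVolume:
    def __init__(self, pool: ThinPool, logical_blocks: int):
        self.pool = pool
        self.logical_blocks = logical_blocks  # what the application "sees"
        self.used = 0

    def write(self, n_blocks: int) -> None:
        self.pool.allocate(n_blocks)          # physical space consumed only now
        self.used += n_blocks

# Two volumes each "see" 1,000 blocks, backed by only 800 physical blocks.
pool = ThinPool(physical_blocks=800)
db = ThinVolume(pool, logical_blocks=1000)
archive = ThinVolume(pool, logical_blocks=1000)
db.write(300)
archive.write(200)
print(pool.free)   # 300 physical blocks remain unallocated
```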

Abbattista appreciates thin provisioning because he no longer needs to keep too much headroom on hand for a rainy day. "We're doing a level of thin provisioning. We'll over-allocate physical space logically, and allocate on demand rather than on the initial database lay down. That's saving us a tremendous amount of space."

Freeman notes that dedupe and thin provisioning work well together for storage efficiency because, as data is deduplicated, the newly freed storage can be sent back to a free pool for use by other applications. "When someone goes to thin provisioning we usually see their utilization go up by 50%," he says.

THE CLOUD

Yet, are these measures cause for everyone to rethink storage? Hersh says storage issues are more acute at larger insurance operations. "Right now this primarily affects large carriers - they are the ones experimenting with archiving everything," he says.

Allstate's Abbattista concurs. "We are a serious infrastructure shop," he says. "Putting it together at each tier might not be for everybody."

So what are the options for smaller insurers hoping to realize storage efficiencies? Hersh advises looking to the cloud, but acknowledges such a move would present unique challenges with bandwidth and security. "If you look at smaller carriers, they are just not equipped to deal with this," he says. "It's more than an architectural change; it's a whole new world. Either they are going to have to learn how to store large amounts of data, or look at things like the cloud or SaaS."

Eric Bulis, SVP and CIO at New York-based SBLI USA Mutual Life, wouldn't dismiss the notion of cloud-based computing out of hand. "We are monitoring developments in the space and have not yet had time to do significant research," he says. "However, at this point, we would consider it for non-critical applications that do not have significant performance requirements, but that do have significant storage requirements."

Marj Hutchings, director of Internet operations at San Francisco-based Esurance, is less enthusiastic. "If we had a cloud within our data center we would do it, but we wouldn't move storage to a cloud that was not managed or hosted by us," she says.

THE ANALYTICAL IMPERATIVE

While such efficiency efforts are crucial, Novarica's Hersh maintains that when crafting storage strategies, carriers need to focus on larger business objectives. "They are trying to reduce claims leakage, reduce fraud, market to more profitable customers and improve underwriting," he says. "To do that you need to get better at analyzing and collecting data. Nobody wants to rely on equity markets for returns."

Hersh contends that having a better grip on storage will only become more important in the future. He singles out telematics, or pay-as-you-go auto insurance, as a prime example of why carriers need to address storage issues to remain competitive. "Talk about something that is data intensive - you are potentially collecting driving data from every insured car on the road."

VENDORS ENVISION SINGLE-FABRIC DATA CENTER

The fine distinctions between computing, networking and storage products, and the vendors who sell them, may become harder to discern in the data centers of tomorrow. Indeed, major storage and networking solution providers are promoting their own take on a highly vertical, highly virtualized, unified data center.

The most obvious manifestation of this trend was evident in March when networking leviathan Cisco Systems announced its unified computing system. The Cisco model aims to integrate all major data center functions under a single management structure in order to give companies a utility model of computing where processing power, networking bandwidth and storage space are parceled out as needed.

So what's the value proposition for CIOs? Proponents claim that the homogeneity engendered by a single-fabric data center will boost operational flexibility and energy efficiency while slashing costs. Just by diminishing the amount of cabling and the number of network connections while boosting utilization, Cisco estimates, such a setup would reduce data center capital expenditures by 20%. The company also foresees a 30% reduction in operational expenditures by eliminating the need for separate storage and network management teams.

Cisco is hardly alone in espousing such a unified vision. Server giant Hewlett-Packard is touting its BladeSystem Matrix, which melds software, servers and storage into a "wire-once" infrastructure. Likewise, software behemoth Oracle is expected to offer a more tightly integrated package to data centers with its pending acquisition of Sun Microsystems Inc.

While many carriers are unlikely to abandon the freedoms afforded by a multivendor strategy without a fight, these endeavors by vendors may speed adoption of new technologies.

"The best hope for carriers to move forward quickly with these solutions is for the vendors to continue to verticalize," says Chad Hersh, a principal at New York-based Novarica.


(c) 2009 Insurance Networking News and SourceMedia, Inc. All Rights Reserved.
