Rapid Recovery From Disaster

Michael Galvin knows more about disaster recovery and business continuity than most of us care to know. That's because the vice president and chief infrastructure officer of Empire Blue Cross and Blue Shield has first-hand experience with the worst catastrophe this country has ever endured: September 11, 2001.

The New York-based health insurance company occupied 10 floors and 480,000 square feet in the North Tower of the World Trade Center. And, although the company tragically lost 11 people that day and several employees were injured, most of the 1,900 employees and consultants escaped from the building unharmed.

Empire also lost 265 servers, 2,300 desktops and 400 laptop computers. But customers never even noticed a difference in service. "Everything is designed and engineered for redundancy and reliability," says Galvin. "We always build for the worst-case scenario, although, quite honestly, no one expected to lose a whole building."

The redundancy and reliability that Empire engineered into its systems included automatic call rerouting, server backup and a flexible, standards-based infrastructure. The company's mainframes and mid-tier systems were not affected.

But as important as Empire's technology design and engineering were to its rapid recovery from the terrorist attacks, the spirit and the willingness of the people to work together and with the company's business partners were also instrumental, Galvin says.

"In some conferences I've spoken at, people are amazed that we were able to stay in business the second the tragedy happened," Galvin says. "But business continuity is not a three-ring, five-inch-thick binder of procedures that tells you how to recover. It's really about making the investment upfront in your design, engineering, people and partnerships," he says. "Those are the four most important components in disaster recovery and business continuity."

Deeper understanding

Empire Blue Cross and Blue Shield was unusually prepared for the unthinkable tragedy that occurred on Sept. 11. But many companies would not have been so well-prepared, and realizing this, they have used that day as the motivation to improve their disaster recovery and business continuity plans.

"Most organizations had some form of disaster planning prior to 9/11. But 9/11 made us consider a much broader scope of risk, and a much deeper understanding of what the planning process is," says Victor Sordillo, global technical services manager at the Chubb Group of Insurance Cos., Warren, N.J.

The insurance industry, in particular, first looked externally to their exposure to claims, he says. "Most insurance companies never considered weapons of mass destruction as a loss scenario." Then, after assessing their claims exposure, insurers looked internally and asked themselves, "If we were there, how would we have responded? How would we have been prepared? And could we have survived a disaster of this type?" Sordillo adds.

Overall, North American companies spent 17% more in 2002 on loss-control services, even while they faced budgetary constraints and a difficult economic environment. That's according to a Chubb survey of 385 risk managers conducted in February. In addition, many risk managers are shifting loss-control dollars from traditional areas, such as workers compensation and product liability, to emerging risks, such as security, disaster preparedness and corporate governance, Chubb found.

The average level of spending on full-featured business continuity plans in financial services firms is approximately 5% of the total IT budget, according to Celent Communications, a Boston-based research and advisory firm. "Business continuity definitely moved to the front burner after Sept. 11," says Matthew Josefowicz, senior analyst at Celent. "Everybody started reviewing their plans."

Historically, disaster recovery and business continuity were not sponsored at a high level in an organization, and many plans were not kept current, he says. But that is changing.

For instance, Chubb already had thorough disaster recovery and security processes in place, but after Sept. 11, it expanded its business continuity planning efforts. It re-examined its program, established a dedicated department and beefed up its IT and physical security. It also began to include its senior level executives in the process.

Other companies, especially large firms, have done the same. Many senior executives were complacent and satisfied that they had good plans, says Chubb's Sordillo. "Now, they're asking for proof of preparation."

Auditors and regulators are also asking more about business continuity plans, says Nancy Edwards, assistant vice president and director of business continuity at State Auto Insurance Cos., Columbus, Ohio. "Auditors have been asking questions for three or four years now, but the urgency of their questions and the level of detail they demand have increased," she says.

Auditors' questions are more sophisticated, says Edwards, who was reassigned last June to directing State Auto's business continuity planning at the senior level. They're saying, "Yes, you've shown me that your data can be recovered. Now, how are you going to access it, and get an invoice out, and pay claims, and all those good things that really make the money move?"

Indeed, Sept. 11 elevated disaster recovery and business continuity to the C-level, sources concur. "It became an operational issue and even a risk and financial issue, rather than just an IT issue," says Celent's Josefowicz, "because if you cannot demonstrate continuity to your investors, there's a potential liability there."

Business unit managers are beginning to understand the risks of not being able to get their people back in business, says Pat McAnally, senior director of marketing at SunGard Planning Solutions, the professional services arm of SunGard Availability Services, a Wayne, Pa.-based business continuity software and consulting provider.

As a result, planning for business continuity has spread from the IT department, which traditionally has planned for data center recovery, into the business units, she says. "They're developing processes and procedures: deciding who has to go where, and who needs to come up first and second, and how critical e-mail is, for example."

The focus of disaster recovery and business continuity planning has been primarily on data and systems backup, Josefowicz says. "After 9/11, a lot of attention was paid to staff relocation and productivity systems, such as e-mail."

At USAA, business continuity expanded significantly after Sept. 11, according to John Blaha, assistant vice president, business continuation at the San Antonio-based financial services company. Blaha was assigned to his position in July 2001. And, as a former NASA astronaut, he's about as top-notch a person as USAA could have selected for the job. "I feel very comfortable in this role, because I lived it from 1980 to 1997 working for NASA. For me, it's just common sense," he says.

Under Blaha's direction, USAA has deployed a strong business continuity management team and trained the group extensively in their roles and responsibilities.

The company also has bolstered its communications capabilities. For instance, key recovery decision-makers now have GETS (Government Emergency Telecommunications Service) cards to enable them to obtain priority routing if the phone system is overloaded in an emergency. And, the company has installed satellite and radio communications in critical locations, including the homes of vital people on the recovery team.

USAA also conducts regular testing of its plans. In March, the company completed an enterprise recovery out of its backup center, and met all its recovery time objectives, which ranged from zero to 72 hours. That recovery included 200 critical business operations, four terabytes of data, and 111 third-party vendor connections. All were up and running within 63 hours.
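A recovery exercise of this kind amounts to comparing each operation's measured recovery time with its recovery time objective (RTO). The sketch below shows one way that check might look. The operation names and hour values are hypothetical illustrations, not USAA's data, and `Operation` and `missed_objectives` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str               # hypothetical business operation name
    rto_hours: float        # recovery time objective for the operation
    recovered_hours: float  # time actually taken during the exercise


def missed_objectives(ops):
    """Return the operations whose measured recovery exceeded their RTO."""
    return [op for op in ops if op.recovered_hours > op.rto_hours]


# Illustrative values only (assumed for this sketch).
exercise = [
    Operation("claims-intake", rto_hours=4, recovered_hours=3.5),
    Operation("member-email", rto_hours=24, recovered_hours=20),
    Operation("archive-reporting", rto_hours=72, recovered_hours=63),
]

missed = missed_objectives(exercise)
if missed:
    print("missed:", [op.name for op in missed])
else:
    print("all objectives met")
```

Running a check like this after every exercise gives the "proof of preparation" that executives and auditors are asking for, and any operation it flags shows where the plan needs work.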

Testing is critical to ensure that business continuity plans will be effective if and when they're needed, Blaha says. "Exercises serve as a good training ground. They provide incentive for the people involved to correctly accomplish their change management. It does no good to build all that (capability) and pay all that money, if you don't keep (the process) up to date."

CEO commitment is another key to USAA's reputation as the gold standard in business continuation planning. In fact, it's the most important component, according to Blaha. "What we have done, we have done because the CEO of USAA and the president of the information technology company were committed to putting in place the real capability they now have," he says.

"They were committed, financially, and willing to buy a 'life insurance policy.' Their commitment and their desire made it happen. Without that it doesn't happen. It doesn't happen at all," he adds.

Leading-edge technology

Like USAA, other companies have reinforced their communications systems to include satellite and radio technologies following Sept. 11. "Cell phones were always considered a backup," says Chubb's Sordillo. "But when cell phone lines were blocked from overuse, companies began to consider that maybe secondary backup wasn't enough. Now, they're looking at two or three backups," he says.

Empire Blue Cross and Blue Shield leveraged its IP infrastructure after Sept. 11. During the recovery period, the company used e-mail and the Web to communicate and enable people to work from home. For the first six weeks after the disaster, employees telecommuted, says Galvin. "We had people calling up wanting to work from home," he says. "We used Web casting and collaboration tools so people could work together."

The company also used IP Voice and IP Video at a temporary site after the attacks. Because the company's infrastructure is standards-based, Empire was able to go to the Midtown W Hotel, and work with the hotel's engineers to convert two floors to a high-speed LAN environment, Galvin says.

The company located about 400 developers on those two floors and connected them through a virtual private network to its data center. "So we had people back to work almost immediately," he says.

Indeed, leading-edge technology played a role in Empire's Sept. 11 recovery. And Galvin expects another new technology, the Storage Area Network (SAN), will be deployed in the future.

Recovery of the back-end and e-mail systems took up to 10 days because data restoration was manually intensive, Galvin says. With SAN technology, data is managed across the network instead of being stored on one server.

This way, "if something goes down, it won't affect your business, because it automatically switches over to another silo," he says.

High-availability systems are becoming mandatory for insurers operating in a real-time environment, sources say. This technology enables two systems to be connected and configured in such a way that the secondary system takes over if the primary system fails (see "High availability systems become a necessity," page 29).
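The primary/secondary arrangement described here can be pictured with a small sketch. It is an illustration under simplifying assumptions, not any vendor's implementation: `Node` and `FailoverPair` are hypothetical names, and a real high-availability system would also need health monitoring and data replication between the two systems.

```python
class Node:
    """A hypothetical system that can serve requests while healthy."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class FailoverPair:
    """Send work to the primary; fall back to the secondary on failure."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def handle(self, request):
        try:
            return self.primary.handle(request)
        except ConnectionError:
            # Primary failed, so the secondary takes over this request.
            return self.secondary.handle(request)


pair = FailoverPair(Node("primary"), Node("secondary"))
print(pair.handle("claim-1"))   # handled by the primary
pair.primary.healthy = False    # simulate a primary outage
print(pair.handle("claim-2"))   # handled by the secondary
```

The caller talks only to the pair, so an outage on the primary does not require the caller to change anything.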

"Business is just different today than it was five years ago-with different demands," says Ken Smith, president of SunGard Planning Solutions.

"Companies have much shorter recovery time objectives, and greater expectations from the public at-large. There was a time when a check could go out in a week-and that was okay," he says. "But if you're a help desk today, you can't be down for two weeks."

Business vulnerability has increased due to 9/11, says State Auto's Edwards. "People say, 'We've got this great plan and we can recover our mainframe computer in very good time,'" she says. "But then you start to think, 'If the mainframe comes up, who will be able to access it, and where, if the entire building is gone?'"

Herculean task

Technology is more complex as well, Edwards notes, which is a challenge to business continuity. "Six years ago, if we recovered our mainframe, we were in great shape because we could set up dumb tubes someplace where somebody could access it, and they were off to the races getting their work done.

"Now, there's a PC between you and the mainframe, and all this middleware and all these software programs," she says. "We used to have to recover one big machine, the mainframe. Now, we've got more than 130 servers to recover too. And getting 130 servers recovered in two days flat is a Herculean task."

Decentralization of operations is a relatively simple strategy to ensure faster recovery, sources say. "The best thing that people have found is that splitting a 'one-and-only' location into two trumps just about everything else you can do," Edwards says. "Even though the second location would suddenly be horribly burdened trying to do the work of two locations, at least you've got a trained workforce ready to go."

And the workforce ultimately recovers the business, as Empire's Galvin witnessed on Sept. 11. For example, an Empire employee at a remote location automatically forwarded the domain controllers in the World Trade Center to another site before the buildings went down, a decision that saved the company a considerable amount of manual recovery of employee profiles.

Another employee who managed workstations and servers phoned Empire's business partners as the tragedy was unfolding to order hundreds of servers, workstations and laptops, with instructions on where to deliver them.

And, an engineer quickly set up microwave communications between two buildings that served as temporary sites during the recovery. "It wasn't always the high-level person who had the ideas," Galvin stresses. "This was a lower-level person who made a recommendation, and we bought into it, and bang, it worked."

If there's one lesson Galvin wants to share about business continuity planning, it's "invest in your people and their skills, and listen to them," he says. "Whatever you put into your people, you'll get back tenfold."
